DISCOVER SEMANTIC TOPICS IN PATENTS WITHIN A SPECIFIC DOMAIN

Patent topic discovery is critical for innovation-oriented enterprises to hedge patent application risks and raise the success rate of patent applications. Topic models are commonly recognized as an efficient tool for this task by researchers from both academia and industry. However, many existing...

Bibliographic details
Published in:	Journal of web engineering 2017-12, Vol.16 (7-8), p.653
Main authors:	Ma, Wen; Luo, Xiangfeng; JUNYU XUAN; Xue, RuiRong; Guo, Yike
Format:	Article
Language:	eng
Subjects:	Accuracy; Dirichlet problem; Documents; Patent applications; Representations; Semantics; Words (language)
Online access:	Full text

container_end_page
container_issue	7-8
container_start_page	653
container_title	Journal of web engineering
container_volume	16
creator	Ma, Wen ; Luo, Xiangfeng ; JUNYU XUAN ; Xue, RuiRong ; Guo, Yike
description	Patent topic discovery is critical for innovation-oriented enterprises to hedge patent application risks and raise the success rate of patent applications. Topic models are commonly recognized as an efficient tool for this task by researchers from both academia and industry. However, many well-known existing topic models, e.g., Latent Dirichlet Allocation (LDA), which are designed for documents represented by word vectors, exhibit low accuracy and poor interpretability on the patent topic discovery task. The reasons are that 1) the semantics of documents within a specific domain are still under-explored, and 2) domain background knowledge is not successfully utilized to guide the process of topic discovery. To improve accuracy and interpretability, we propose a new patent representation and organization with additional inter-word relationships mined from the title, abstract, and claims of patents. This representation can endow each patent with more semantics than a word vector. Meanwhile, we build a Backbone Association Link Network (Backbone ALN) to incorporate domain background semantics and further enhance the semantics of patents. With the new semantic-rich patent representations, we propose a Semantic LDA model to discover semantic topics from patents within a specific domain. It can discover semantic topics with association relations between words rather than a single word vector. Finally, the accuracy and interpretability of the proposed model are verified on real-world patent datasets from the United States Patent and Trademark Office. The experimental results show that the Semantic LDA model yields better performance than conventional models (e.g., LDA). Furthermore, our proposed model can be easily generalized to other related text mining corpora.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_3055525010</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3055525010</sourcerecordid><originalsourceid>FETCH-LOGICAL-p183t-a53e5ea32c28c1262b2827d131a257199e4bbc02072514eaf45beea550be9fb33</originalsourceid><addsrcrecordid>eNotjcFKAzEURYMoWKv_EHAdSF7ymgm4GdLUBtqZwURdlmTMLIrY2mn_30Fd3Xvgcu4VmQlUiqHRi-vfzpnBytySu3Hcc640AM7I09IH2765Fxrctm6itzS2nbeB-oZ2dXRNDPTdx_WENQ2ds341bZbttvbNPbkZ0udYHv5zTl5XLto127TP3tYbdhSVPLOEsmBJEnqoegELyFCB_hBSJEAtjCkq554D14BClTQozKUkRJ6LGbKUc_L45z2eDt-XMp53-8Pl9DVd7iRHREAuuPwBZyU-Ww</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3055525010</pqid></control><display><type>article</type><title>DISCOVER SEMANTIC TOPICS IN PATENTS WITHIN A SPECIFIC DOMAIN</title><source>ProQuest Central</source><creator>Ma, Wen ; Luo, Xiangfeng ; JUNYU XUAN ; Xue, RuiRong ; Guo, Yike</creator><creatorcontrib>Ma, Wen ; Luo, Xiangfeng ; JUNYU XUAN ; Xue, RuiRong ; Guo, Yike</creatorcontrib><description>Patent topic discovery is critical for innovation-oriented enterprises to hedge the patent application risks and raise the success rate of patent application. Topic models are commonly recognized as an efficient tool for this task by researchers from both academy and industry. However, many existing well-known topic models, e.g., Latent Dirichlet Allocation (LDA), which are particularly designed for the documents represented by word-vectors, exhibit low accuracy and poor interpretability on patent topic discovery task. The reason is that 1) the semantics of documents are still under-explored in a specific domain 2) and the domain background knowledge is not successfully utilized to guide the process of topic discovery. 
In order to improve the accuracy and the interpretability, we propose a new patent representation and organization with additional inter-word relationships mined from title, abstract, and claim of patents. The representation can endow each patent with more semantics than word-vector. Meanwhile, we build a Backbone Association Link Network (Backbone ALN) to incorporate domain background semantics to further enhance the semantics of patents. With new semantic-rich patent representations, we propose a Semantic LDA model to discover semantic topics from patents within a specific domain. It can discover semantic topics with association relations between words rather than a single word vector. At last, accuracy and interpretability of the proposed model are verified on real-world patents datasets from the United States Patent and Trademark Office. The experimental results show that Semantic LDA model yields better performance than other conventional models (e.g., LDA). Furthermore, our proposed model can be easily generalized to other related text mining corpus.</description><identifier>ISSN: 1540-9589</identifier><identifier>EISSN: 1544-5976</identifier><language>eng</language><publisher>Milan: River Publishers</publisher><subject>Accuracy ; Dirichlet problem ; Documents ; Patent applications ; Representations ; Semantics ; Words (language)</subject><ispartof>Journal of web engineering, 2017-12, Vol.16 (7-8), p.653</ispartof><rights>2017. This work is published under https://creativecommons.org/licenses/by-nc/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/3055525010?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,21367,33721,43781</link.rule.ids></links><search><creatorcontrib>Ma, Wen</creatorcontrib><creatorcontrib>Luo, Xiangfeng</creatorcontrib><creatorcontrib>JUNYU XUAN</creatorcontrib><creatorcontrib>Xue, RuiRong</creatorcontrib><creatorcontrib>Guo, Yike</creatorcontrib><title>DISCOVER SEMANTIC TOPICS IN PATENTS WITHIN A SPECIFIC DOMAIN</title><title>Journal of web engineering</title><description>Patent topic discovery is critical for innovation-oriented enterprises to hedge the patent application risks and raise the success rate of patent application. Topic models are commonly recognized as an efficient tool for this task by researchers from both academy and industry. However, many existing well-known topic models, e.g., Latent Dirichlet Allocation (LDA), which are particularly designed for the documents represented by word-vectors, exhibit low accuracy and poor interpretability on patent topic discovery task. The reason is that 1) the semantics of documents are still under-explored in a specific domain 2) and the domain background knowledge is not successfully utilized to guide the process of topic discovery. In order to improve the accuracy and the interpretability, we propose a new patent representation and organization with additional inter-word relationships mined from title, abstract, and claim of patents. The representation can endow each patent with more semantics than word-vector. 
Meanwhile, we build a Backbone Association Link Network (Backbone ALN) to incorporate domain background semantics to further enhance the semantics of patents. With new semantic-rich patent representations, we propose a Semantic LDA model to discover semantic topics from patents within a specific domain. It can discover semantic topics with association relations between words rather than a single word vector. At last, accuracy and interpretability of the proposed model are verified on real-world patents datasets from the United States Patent and Trademark Office. The experimental results show that Semantic LDA model yields better performance than other conventional models (e.g., LDA). Furthermore, our proposed model can be easily generalized to other related text mining corpus.</description><subject>Accuracy</subject><subject>Dirichlet problem</subject><subject>Documents</subject><subject>Patent applications</subject><subject>Representations</subject><subject>Semantics</subject><subject>Words (language)</subject><issn>1540-9589</issn><issn>1544-5976</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNotjcFKAzEURYMoWKv_EHAdSF7ymgm4GdLUBtqZwURdlmTMLIrY2mn_30Fd3Xvgcu4VmQlUiqHRi-vfzpnBytySu3Hcc640AM7I09IH2765Fxrctm6itzS2nbeB-oZ2dXRNDPTdx_WENQ2ds341bZbttvbNPbkZ0udYHv5zTl5XLto127TP3tYbdhSVPLOEsmBJEnqoegELyFCB_hBSJEAtjCkq554D14BClTQozKUkRJ6LGbKUc_L45z2eDt-XMp53-8Pl9DVd7iRHREAuuPwBZyU-Ww</recordid><startdate>20171201</startdate><enddate>20171201</enddate><creator>Ma, Wen</creator><creator>Luo, Xiangfeng</creator><creator>JUNYU XUAN</creator><creator>Xue, RuiRong</creator><creator>Guo, Yike</creator><general>River 
Publishers</general><scope>8FE</scope><scope>8FG</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope></search><sort><creationdate>20171201</creationdate><title>DISCOVER SEMANTIC TOPICS IN PATENTS WITHIN A SPECIFIC DOMAIN</title><author>Ma, Wen ; Luo, Xiangfeng ; JUNYU XUAN ; Xue, RuiRong ; Guo, Yike</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-p183t-a53e5ea32c28c1262b2827d131a257199e4bbc02072514eaf45beea550be9fb33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Accuracy</topic><topic>Dirichlet problem</topic><topic>Documents</topic><topic>Patent applications</topic><topic>Representations</topic><topic>Semantics</topic><topic>Words (language)</topic><toplevel>online_resources</toplevel><creatorcontrib>Ma, Wen</creatorcontrib><creatorcontrib>Luo, Xiangfeng</creatorcontrib><creatorcontrib>JUNYU XUAN</creatorcontrib><creatorcontrib>Xue, RuiRong</creatorcontrib><creatorcontrib>Guo, Yike</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer 
Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><jtitle>Journal of web engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ma, Wen</au><au>Luo, Xiangfeng</au><au>JUNYU XUAN</au><au>Xue, RuiRong</au><au>Guo, Yike</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>DISCOVER SEMANTIC TOPICS IN PATENTS WITHIN A SPECIFIC DOMAIN</atitle><jtitle>Journal of web engineering</jtitle><date>2017-12-01</date><risdate>2017</risdate><volume>16</volume><issue>7-8</issue><spage>653</spage><pages>653-</pages><issn>1540-9589</issn><eissn>1544-5976</eissn><abstract>Patent topic discovery is critical for innovation-oriented enterprises to hedge the patent application risks and raise the success rate of patent application. Topic models are commonly recognized as an efficient tool for this task by researchers from both academy and industry. However, many existing well-known topic models, e.g., Latent Dirichlet Allocation (LDA), which are particularly designed for the documents represented by word-vectors, exhibit low accuracy and poor interpretability on patent topic discovery task. The reason is that 1) the semantics of documents are still under-explored in a specific domain 2) and the domain background knowledge is not successfully utilized to guide the process of topic discovery. 
In order to improve the accuracy and the interpretability, we propose a new patent representation and organization with additional inter-word relationships mined from title, abstract, and claim of patents. The representation can endow each patent with more semantics than word-vector. Meanwhile, we build a Backbone Association Link Network (Backbone ALN) to incorporate domain background semantics to further enhance the semantics of patents. With new semantic-rich patent representations, we propose a Semantic LDA model to discover semantic topics from patents within a specific domain. It can discover semantic topics with association relations between words rather than a single word vector. At last, accuracy and interpretability of the proposed model are verified on real-world patents datasets from the United States Patent and Trademark Office. The experimental results show that Semantic LDA model yields better performance than other conventional models (e.g., LDA). Furthermore, our proposed model can be easily generalized to other related text mining corpus.</abstract><cop>Milan</cop><pub>River Publishers</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1540-9589
ispartof	Journal of web engineering, 2017-12, Vol.16 (7-8), p.653
issn	1540-9589 1544-5976
language	eng
recordid	cdi_proquest_journals_3055525010
source	ProQuest Central
subjects	Accuracy ; Dirichlet problem ; Documents ; Patent applications ; Representations ; Semantics ; Words (language)
title	DISCOVER SEMANTIC TOPICS IN PATENTS WITHIN A SPECIFIC DOMAIN
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-14T00%3A46%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=DISCOVER%20SEMANTIC%20TOPICS%20IN%20PATENTS%20WITHIN%20A%20SPECIFIC%20DOMAIN&rft.jtitle=Journal%20of%20web%20engineering&rft.au=Ma,%20Wen&rft.date=2017-12-01&rft.volume=16&rft.issue=7-8&rft.spage=653&rft.pages=653-&rft.issn=1540-9589&rft.eissn=1544-5976&rft_id=info:doi/&rft_dat=%3Cproquest%3E3055525010%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3055525010&rft_id=info:pmid/&rfr_iscdi=true
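
The `url` field above is an OpenURL whose `rft.*` query keys repeat the citation data of this record in percent-encoded form. As a minimal sketch, assuming only Python's standard `urllib.parse` module, the citation fields could be recovered as shown below. The helper name `referent_fields` is hypothetical, and the URL is shortened here to a subset of the `rft.*` keys that appear in the record.

```python
from urllib.parse import urlsplit, parse_qs

# OpenURL from the `url` field of this record, shortened to the keys used below.
OPENURL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.atitle=DISCOVER%20SEMANTIC%20TOPICS%20IN%20PATENTS%20WITHIN%20A%20SPECIFIC%20DOMAIN"
    "&rft.jtitle=Journal%20of%20web%20engineering"
    "&rft.au=Ma,%20Wen&rft.date=2017-12-01&rft.volume=16&rft.issue=7-8"
    "&rft.spage=653&rft.issn=1540-9589&rft.eissn=1544-5976"
)


def referent_fields(url: str) -> dict:
    """Return the rft.* keys of an OpenURL as a flat dict (hypothetical helper)."""
    # parse_qs percent-decodes values and maps each key to a list of values.
    query = parse_qs(urlsplit(url).query)
    # Keep only the rft.* keys, strip the prefix, and take the first value of each.
    return {
        key[len("rft."):]: values[0]
        for key, values in query.items()
        if key.startswith("rft.")
    }


fields = referent_fields(OPENURL)
print(fields["jtitle"], fields["volume"], fields["issue"], fields["spage"])
```

Taking only the first value per key is a simplification: it is adequate for this record, where each `rft.*` key occurs once, but a record that repeats a key such as `rft.au` would need the full value list.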