Learning-Free Unsupervised Extractive Summarization Model

Text summarization is an information condensation technique that abbreviates a source document to a few representative sentences, with the aim of creating a coherent summary that captures the relevant information of the source text. The field has developed rapidly since the advent of deep learning. However, summarization models based on deep neural networks have several critical shortcomings. First, they require a large amount of labeled training data; this is especially problematic for low-resource languages, for which publicly available labeled data do not exist. Second, substantial computational resources are needed to train neural models with enormous numbers of parameters. In this study, we propose the Learning-Free Integer Programming Summarizer (LFIP-SUM), an unsupervised extractive summarization model. The advantage of our approach is that no parameter training is necessary because the model does not require any labeled training data. To achieve this, we formulate an integer programming problem over pre-trained sentence embedding vectors and use principal component analysis both to automatically determine the number of sentences to extract and to evaluate the importance of each sentence. Experimental results demonstrate that the proposed model achieves generally acceptable performance compared with deep-learning summarization models even though it learns no parameters during model construction.
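The PCA step described in the abstract can be made concrete with a short sketch. This is a hedged reconstruction, not the authors' published formulation: it assumes pre-trained sentence embeddings arrive as an (n_sentences x dim) NumPy array, picks the summary length k as the number of principal components needed to explain a variance threshold, and scores each sentence by its variance-weighted projection onto those components. The helper name pca_sentence_scores and the 0.8 threshold are illustrative assumptions.

```python
# Minimal sketch of the PCA step, assuming embeddings from any pre-trained
# sentence encoder; not the paper's exact formula.
import numpy as np
from sklearn.decomposition import PCA

def pca_sentence_scores(embeddings: np.ndarray, var_threshold: float = 0.8):
    """Return (k, scores): k = number of sentences to extract, chosen as the
    number of principal components needed to explain `var_threshold` of the
    variance; scores = one importance value per sentence."""
    pca = PCA()
    projected = pca.fit_transform(embeddings)  # (n_sentences, n_components)

    # Number of components needed to reach the variance threshold.
    cum_var = np.cumsum(pca.explained_variance_ratio_)
    k = int(np.searchsorted(cum_var, var_threshold) + 1)

    # Score each sentence by its variance-weighted projection onto the
    # leading k components -- one plausible reading of "evaluate the
    # importance of each sentence" with PCA.
    weights = pca.explained_variance_ratio_[:k]
    scores = np.abs(projected[:, :k]) @ weights
    return k, scores

# Toy usage with random vectors standing in for real sentence embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(12, 384))
k, scores = pca_sentence_scores(embeddings)
print(k, scores.round(3))
```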

Bibliographic details
Published in: IEEE Access, 2021, Vol. 9, pp. 14358-14368
Main authors: Jang, Myeongjun; Kang, Pilsung
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 14368
container_issue
container_start_page 14358
container_title IEEE access
container_volume 9
creator Jang, Myeongjun
Kang, Pilsung
description Text summarization is an information condensation technique that abbreviates a source document to a few representative sentences, with the aim of creating a coherent summary that captures the relevant information of the source text. The field has developed rapidly since the advent of deep learning. However, summarization models based on deep neural networks have several critical shortcomings. First, they require a large amount of labeled training data; this is especially problematic for low-resource languages, for which publicly available labeled data do not exist. Second, substantial computational resources are needed to train neural models with enormous numbers of parameters. In this study, we propose the Learning-Free Integer Programming Summarizer (LFIP-SUM), an unsupervised extractive summarization model. The advantage of our approach is that no parameter training is necessary because the model does not require any labeled training data. To achieve this, we formulate an integer programming problem over pre-trained sentence embedding vectors and use principal component analysis both to automatically determine the number of sentences to extract and to evaluate the importance of each sentence. Experimental results demonstrate that the proposed model achieves generally acceptable performance compared with deep-learning summarization models even though it learns no parameters during model construction.
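The integer-programming step in the description can likewise be sketched. The exact objective and constraints of LFIP-SUM are defined in the paper; this hedged version, built with the PuLP modeling library, simply maximizes total PCA importance subject to extracting exactly k sentences, with an illustrative cosine-similarity cap (sim_cap) that forbids selecting near-duplicate pairs. The helper name select_sentences and the redundancy constraint are assumptions for illustration.

```python
# Hedged sketch of sentence selection as a 0/1 integer program (PuLP),
# not the paper's exact formulation.
import numpy as np
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, PULP_CBC_CMD

def select_sentences(scores, embeddings, k, sim_cap=0.9):
    n = len(scores)
    # Cosine similarity between every pair of sentence embeddings.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T

    prob = LpProblem("extractive_summary", LpMaximize)
    x = [LpVariable(f"x{i}", cat="Binary") for i in range(n)]  # pick sentence i?

    # Objective: total importance of the selected sentences.
    prob += lpSum(float(scores[i]) * x[i] for i in range(n))
    # Extract exactly k sentences (k comes from the PCA step).
    prob += lpSum(x) == k
    # Redundancy cap: never select both members of a near-duplicate pair
    # (assumes the cap leaves the problem feasible).
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > sim_cap:
                prob += x[i] + x[j] <= 1

    prob.solve(PULP_CBC_CMD(msg=0))
    return [i for i in range(n) if (x[i].value() or 0) > 0.5]
```

Chaining the two sketches, k, scores = pca_sentence_scores(embeddings) followed by select_sentences(scores, embeddings, k) yields the indices of the extracted sentences, which would then be emitted in their original document order.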
doi_str_mv 10.1109/ACCESS.2021.3051237
format Article
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2021, Vol.9, p.14358-14368
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_2483240545
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek (Electronic Journals Library) - freely accessible e-journals
subjects Artificial neural networks
Computational modeling
Data mining
Deep learning
Feature extraction
integer linear programming
Integer programming
Machine learning
Mathematical models
natural language processing
Parameters
Principal components analysis
sentence representation vector
Task analysis
Text summarization
Training
Training data
title Learning-Free Unsupervised Extractive Summarization Model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T23%3A33%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning-Free%20Unsupervised%20Extractive%20Summarization%20Model&rft.jtitle=IEEE%20access&rft.au=Jang,%20Myeongjun&rft.date=2021&rft.volume=9&rft.spage=14358&rft.epage=14368&rft.pages=14358-14368&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3051237&rft_dat=%3Cproquest_ieee_%3E2483240545%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2483240545&rft_id=info:pmid/&rft_ieee_id=9321308&rft_doaj_id=oai_doaj_org_article_e4ac439b04f749eab4b8eb09602bfd3e&rfr_iscdi=true