Learning-Free Unsupervised Extractive Summarization Model

Text summarization is an information condensation technique that abbreviates a source document to a few representative sentences, with the aim of creating a coherent summary that captures the relevant information of the source text. The field has developed rapidly since the advent of deep learning. However, summarization models based on deep neural networks have several critical shortcomings. First, they require a large amount of labeled training data; this is especially problematic for low-resource languages, for which publicly available labeled data do not exist. Second, substantial computational resources are needed to train neural models with enormous numbers of parameters. In this study, we propose the Learning-Free Integer Programming Summarizer (LFIP-SUM), an unsupervised extractive summarization model. The advantage of our approach is that no parameter training is necessary because the model does not require any labeled training data. To achieve this, we formulate an integer programming problem over pre-trained sentence embedding vectors and use principal component analysis both to automatically determine the number of sentences to extract and to evaluate the importance of each sentence. Experimental results demonstrate that the proposed model achieves generally acceptable performance compared with deep-learning summarization models even though it learns no parameters during model construction.
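The PCA step described in the abstract can be made concrete with a short sketch. This is a hedged reconstruction, not the authors' published formulation: it assumes pre-trained sentence embeddings arrive as an (n_sentences x dim) NumPy array, picks the summary length k as the number of principal components needed to explain a variance threshold, and scores each sentence by its variance-weighted projection onto those components. The helper name pca_sentence_scores and the 0.8 threshold are illustrative assumptions.

```python
# Minimal sketch of the PCA step, assuming embeddings from any pre-trained
# sentence encoder; not the paper's exact formula.
import numpy as np
from sklearn.decomposition import PCA

def pca_sentence_scores(embeddings: np.ndarray, var_threshold: float = 0.8):
    """Return (k, scores): k = number of sentences to extract, chosen as the
    number of principal components needed to explain `var_threshold` of the
    variance; scores = one importance value per sentence."""
    pca = PCA()
    projected = pca.fit_transform(embeddings)  # (n_sentences, n_components)

    # Number of components needed to reach the variance threshold.
    cum_var = np.cumsum(pca.explained_variance_ratio_)
    k = int(np.searchsorted(cum_var, var_threshold) + 1)

    # Score each sentence by its variance-weighted projection onto the
    # leading k components -- one plausible reading of "evaluate the
    # importance of each sentence" with PCA.
    weights = pca.explained_variance_ratio_[:k]
    scores = np.abs(projected[:, :k]) @ weights
    return k, scores

# Toy usage with random vectors standing in for real sentence embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(12, 384))
k, scores = pca_sentence_scores(embeddings)
print(k, scores.round(3))
```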

Bibliographic details
Published in: IEEE Access, 2021, Vol. 9, pp. 14358-14368
Main authors: Jang, Myeongjun; Kang, Pilsung
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 14368
container_issue
container_start_page 14358
container_title IEEE access
container_volume 9
creator Jang, Myeongjun
Kang, Pilsung
description Text summarization is an information condensation technique that abbreviates a source document to a few representative sentences, with the aim of creating a coherent summary that captures the relevant information of the source text. The field has developed rapidly since the advent of deep learning. However, summarization models based on deep neural networks have several critical shortcomings. First, they require a large amount of labeled training data; this is especially problematic for low-resource languages, for which publicly available labeled data do not exist. Second, substantial computational resources are needed to train neural models with enormous numbers of parameters. In this study, we propose the Learning-Free Integer Programming Summarizer (LFIP-SUM), an unsupervised extractive summarization model. The advantage of our approach is that no parameter training is necessary because the model does not require any labeled training data. To achieve this, we formulate an integer programming problem over pre-trained sentence embedding vectors and use principal component analysis both to automatically determine the number of sentences to extract and to evaluate the importance of each sentence. Experimental results demonstrate that the proposed model achieves generally acceptable performance compared with deep-learning summarization models even though it learns no parameters during model construction.
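The integer-programming step in the description can likewise be sketched. The exact objective and constraints of LFIP-SUM are defined in the paper; this hedged version, built with the PuLP modeling library, simply maximizes total PCA importance subject to extracting exactly k sentences, with an illustrative cosine-similarity cap (sim_cap) that forbids selecting near-duplicate pairs. The helper name select_sentences and the redundancy constraint are assumptions for illustration.

```python
# Hedged sketch of sentence selection as a 0/1 integer program (PuLP),
# not the paper's exact formulation.
import numpy as np
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, PULP_CBC_CMD

def select_sentences(scores, embeddings, k, sim_cap=0.9):
    n = len(scores)
    # Cosine similarity between every pair of sentence embeddings.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T

    prob = LpProblem("extractive_summary", LpMaximize)
    x = [LpVariable(f"x{i}", cat="Binary") for i in range(n)]  # pick sentence i?

    # Objective: total importance of the selected sentences.
    prob += lpSum(float(scores[i]) * x[i] for i in range(n))
    # Extract exactly k sentences (k comes from the PCA step).
    prob += lpSum(x) == k
    # Redundancy cap: never select both members of a near-duplicate pair
    # (assumes the cap leaves the problem feasible).
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > sim_cap:
                prob += x[i] + x[j] <= 1

    prob.solve(PULP_CBC_CMD(msg=0))
    return [i for i in range(n) if (x[i].value() or 0) > 0.5]
```

Chaining the two sketches, k, scores = pca_sentence_scores(embeddings) followed by select_sentences(scores, embeddings, k) yields the indices of the extracted sentences, which would then be emitted in their original document order.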
doi_str_mv 10.1109/ACCESS.2021.3051237
format Article
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2021, Vol.9, p.14358-14368
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_2483240545
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek (Electronic Journals Library) - freely accessible e-journals
subjects Artificial neural networks
Computational modeling
Data mining
Deep learning
Feature extraction
integer linear programming
Integer programming
Machine learning
Mathematical models
natural language processing
Parameters
Principal components analysis
sentence representation vector
Task analysis
Text summarization
Training
Training data
title Learning-Free Unsupervised Extractive Summarization Model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T23%3A33%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning-Free%20Unsupervised%20Extractive%20Summarization%20Model&rft.jtitle=IEEE%20access&rft.au=Jang,%20Myeongjun&rft.date=2021&rft.volume=9&rft.spage=14358&rft.epage=14368&rft.pages=14358-14368&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3051237&rft_dat=%3Cproquest_ieee_%3E2483240545%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2483240545&rft_id=info:pmid/&rft_ieee_id=9321308&rft_doaj_id=oai_doaj_org_article_e4ac439b04f749eab4b8eb09602bfd3e&rfr_iscdi=true