Learning-Free Unsupervised Extractive Summarization Model
Text summarization is an information condensation technique that abbreviates a source document to a few representative sentences, with the aim of creating a coherent summary that contains the relevant information of the source corpus. This promising field has developed rapidly since the advent of deep learning...
Saved in:
Published in: | IEEE access 2021, Vol.9, p.14358-14368 |
---|---|
Main authors: | Jang, Myeongjun ; Kang, Pilsung |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 14368 |
---|---|
container_issue | |
container_start_page | 14358 |
container_title | IEEE access |
container_volume | 9 |
creator | Jang, Myeongjun ; Kang, Pilsung |
description | Text summarization is an information condensation technique that abbreviates a source document to a few representative sentences, with the aim of creating a coherent summary that contains the relevant information of the source corpus. This promising field has developed rapidly since the advent of deep learning. However, summarization models based on deep neural networks have several critical shortcomings. First, a large amount of labeled training data is necessary. This problem is especially severe for low-resource languages, for which publicly available labeled data do not exist. In addition, significant computational power is required to train neural models with enormous numbers of network parameters. In this study, we propose a model called Learning Free Integer Programming Summarizer (LFIP-SUM), which is an unsupervised extractive summarization model. The advantage of our approach is that parameter training is unnecessary because the model does not require any labeled training data. To achieve this, we formulate an integer programming problem based on pre-trained sentence embedding vectors. We also use principal component analysis to automatically determine the number of sentences to be extracted and to evaluate the importance of each sentence. Experimental results demonstrate that the proposed model achieves generally acceptable performance compared with deep learning summarization models, although it learns no parameters during model construction. (An illustrative sketch of this pipeline appears after the record fields below.) |
doi_str_mv | 10.1109/ACCESS.2021.3051237 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2021, Vol.9, p.14358-14368 |
issn | 2169-3536 (ISSN); 2169-3536 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_2483240545 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Artificial neural networks; Computational modeling; Data mining; Deep learning; Feature extraction; integer linear programming; Integer programming; Machine learning; Mathematical models; natural language processing; Parameters; Principal components analysis; sentence representation vector; Task analysis; Text summarization; Training; Training data |
title | Learning-Free Unsupervised Extractive Summarization Model |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T23%3A33%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning-Free%20Unsupervised%20Extractive%20Summarization%20Model&rft.jtitle=IEEE%20access&rft.au=Jang,%20Myeongjun&rft.date=2021&rft.volume=9&rft.spage=14358&rft.epage=14368&rft.pages=14358-14368&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3051237&rft_dat=%3Cproquest_ieee_%3E2483240545%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2483240545&rft_id=info:pmid/&rft_ieee_id=9321308&rft_doaj_id=oai_doaj_org_article_e4ac439b04f749eab4b8eb09602bfd3e&rfr_iscdi=true |
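The abstract above names three ingredients: pre-trained sentence embeddings, principal component analysis to set the summary length and score sentence importance, and an integer program that selects the sentences. The sketch below shows one way those pieces can fit together. It is a minimal reconstruction, not the paper's actual formulation: the random vectors stand in for real pre-trained sentence embeddings, and the 50% explained-variance threshold, the pairwise redundancy penalty, and the PuLP/CBC solver are all assumptions made for illustration.

```python
# Hypothetical sketch of a PCA + integer-programming extractive summarizer
# in the spirit of LFIP-SUM. The embeddings are random stand-ins; swap in
# real pre-trained sentence vectors to apply this to actual text.
import numpy as np
from sklearn.decomposition import PCA
import pulp

rng = np.random.default_rng(0)
n_sent, dim = 8, 32
X = rng.normal(size=(n_sent, dim))  # stand-in for sentence embeddings

# PCA step: the number of components needed to reach an (assumed) 50%
# explained-variance threshold sets the summary length k, and the size of
# each sentence's projection onto those components scores its importance.
pca = PCA(n_components=0.5).fit(X)
k = pca.n_components_
importance = np.abs((X - X.mean(axis=0)) @ pca.components_.T).sum(axis=1)

# Pairwise cosine similarity, used to penalize redundant picks.
norms = np.linalg.norm(X, axis=1)
sim = (X @ X.T) / np.outer(norms, norms)

# Integer program: choose exactly k sentences maximizing total importance
# minus a redundancy penalty over selected pairs.
prob = pulp.LpProblem("extractive_summary", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(n_sent)]
y = {(i, j): pulp.LpVariable(f"y{i}_{j}", cat="Binary")
     for i in range(n_sent) for j in range(i + 1, n_sent)}
prob += (pulp.lpSum(importance[i] * x[i] for i in range(n_sent))
         - pulp.lpSum(sim[i, j] * y[i, j] for (i, j) in y))
prob += pulp.lpSum(x) == k
for (i, j), yij in y.items():  # enforce y_ij = 1 iff both i and j are picked
    prob += yij >= x[i] + x[j] - 1
    prob += yij <= x[i]
    prob += yij <= x[j]
prob.solve(pulp.PULP_CBC_CMD(msg=False))

print("summary sentence indices:",
      [i for i in range(n_sent) if x[i].value() == 1])
```

The auxiliary binaries y_ij are the standard linearization of the product x_i * x_j, which keeps the pairwise redundancy term inside a linear integer program that an off-the-shelf solver can handle.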