Enhancing unsupervised neural networks based text summarization with word embedding and ensemble learning

• Word2vec representation improves the summarization task compared to bag of words.
• Feature learning using unsupervised neural networks improves the summarization task.
• Unsupervised neural networks trained on word2vec vectors give promising results.
• Ensemble learning with word2vec representation obtains the best results.

Detailed description

Saved in:
Bibliographic details
Published in: Expert systems with applications 2019-06, Vol.123, p.195-211
Main authors: Alami, Nabil, Meknassi, Mohammed, En-nahnahi, Noureddine
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 211
container_issue
container_start_page 195
container_title Expert systems with applications
container_volume 123
creator Alami, Nabil
Meknassi, Mohammed
En-nahnahi, Noureddine
description
• Word2vec representation improves the summarization task compared to bag of words.
• Feature learning using unsupervised neural networks improves the summarization task.
• Unsupervised neural networks trained on word2vec vectors give promising results.
• Ensemble learning with word2vec representation obtains the best results.
The vast amounts of data being collected and analyzed have become an invaluable source of information, which needs to be easily handled by humans. Automatic Text Summarization (ATS) systems enable users to get the gist of information and knowledge in a short time so that they can make critical decisions quickly. Deep neural networks have proven their ability to achieve excellent performance in many real-world Natural Language Processing and computer vision applications; however, they have received little attention in ATS. The key problem of traditional approaches is that they involve high-dimensional and sparse data, which makes it difficult to capture relevant information. One technique for overcoming this problem is learning features via dimensionality reduction. Word embedding, in turn, is a neural network technique that generates a much more compact word representation than the traditional Bag-of-Words (BOW) approach. In this paper, we seek to enhance the quality of ATS by integrating unsupervised deep neural network techniques with the word embedding approach. First, we develop a word-embedding-based text summarization model and show that the Word2Vec representation gives better results than the traditional BOW representation. Second, we propose further models that combine Word2Vec with unsupervised feature learning methods in order to merge information from different sources, and we show that unsupervised neural network models trained on the Word2Vec representation give better results than those trained on the BOW representation. Third, we propose three ensemble techniques. The first ensemble combines BOW and Word2Vec using a majority voting technique. The second aggregates the information provided by the BOW approach and unsupervised neural networks. The third aggregates the information provided by Word2Vec and unsupervised neural networks. We show that the ensemble methods improve the quality of ATS; in particular, the ensemble based on the Word2Vec approach gives the best results. Finally, we perform different experiments to evaluate the performance of the investigated models, using two kinds of datasets that are publicly available for evaluating the ATS task. Results of statistical studies confirm that word-embedding-based models outperform those based on the BOW approach; in particular, the ensemble learning technique with the Word2Vec representation surpasses all the other investigated models.
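The abstract's central contrast — sparse BOW counts versus dense averaged word embeddings — can be illustrated with a minimal sketch. The 3-dimensional embeddings below are hypothetical toy values (real systems load pretrained Word2Vec vectors of 100–300 dimensions); sentence vectors are built by mean-pooling word vectors, one common way to use Word2Vec for sentence-level similarity, though the paper's exact construction may differ.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy embeddings -- a real system would load pretrained
# Word2Vec vectors rather than hand-written 3-d values like these.
EMB = {
    "neural":    [0.9, 0.1, 0.0],
    "networks":  [0.8, 0.2, 0.1],
    "improve":   [0.1, 0.9, 0.2],
    "enhance":   [0.2, 0.8, 0.3],
    "text":      [0.0, 0.2, 0.9],
    "summaries": [0.1, 0.1, 0.8],
}

def bow_vector(tokens, vocab):
    # Sparse, high-dimensional representation: one count per vocabulary word.
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def embedding_vector(tokens):
    # Dense representation: mean of the word vectors (OOV words are skipped).
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return [0.0] * 3
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

sents = [["neural", "networks", "improve", "text", "summaries"],
         ["enhance", "text", "summaries"]]
# "improve" vs "enhance": BOW sees no overlap between these near-synonyms,
# while the embedding average places the two sentences close together.
bow_vocab = sorted({w for s in sents for w in s})
b0, b1 = bow_vector(sents[0], bow_vocab), bow_vector(sents[1], bow_vocab)
e0, e1 = embedding_vector(sents[0]), embedding_vector(sents[1])
print(round(cosine(b0, b1), 3), round(cosine(e0, e1), 3))
```

On this toy pair the embedding-based cosine comes out higher than the BOW cosine, which is the intuition behind the abstract's claim that Word2Vec captures relevance that sparse counts miss.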
doi_str_mv 10.1016/j.eswa.2019.01.037
format Article
publisher New York: Elsevier Ltd
fulltext fulltext
identifier ISSN: 0957-4174
ispartof Expert systems with applications, 2019-06, Vol.123, p.195-211
issn 0957-4174
1873-6793
language eng
recordid cdi_proquest_journals_2193150625
source ScienceDirect Journals (5 years ago - present)
subjects Aggregates
Artificial neural networks
Auto-encoder
Computer vision
Embedding
Ensemble learning
Extreme learning machine
Natural language processing
Neural networks
Performance evaluation
Representations
Text summarization
Variational auto-encoder
Word2vec
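Among the subject terms above, "Ensemble learning" refers to the majority-voting combination the abstract describes: several base summarizers (e.g. one using BOW features, one using Word2Vec features, one using autoencoder codes) each nominate sentences, and the sentences chosen by the most models form the summary. The sketch below is a hypothetical minimal version of that idea, not the paper's exact procedure; the vote sets and tie-breaking rule are assumptions.

```python
from collections import Counter

# Hypothetical sentence selections from three base summarizers; each set
# holds the sentence indices that model would extract from the document.
votes_per_model = [
    {0, 2, 4},   # model A (e.g. BOW-based)
    {0, 2, 5},   # model B (e.g. Word2Vec-based)
    {1, 2, 4},   # model C (e.g. autoencoder-based)
]

def majority_vote(selections, k):
    """Keep the k sentence indices chosen by the most base summarizers,
    breaking ties by position in the document (smaller index first)."""
    tally = Counter(i for sel in selections for i in sel)
    ranked = sorted(tally, key=lambda i: (-tally[i], i))
    return sorted(ranked[:k])

print(majority_vote(votes_per_model, 3))
```

With these votes, sentence 2 is unanimous and sentences 0 and 4 each get two votes, so a 3-sentence summary keeps indices 0, 2, and 4.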
title Enhancing unsupervised neural networks based text summarization with word embedding and ensemble learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T00%3A13%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enhancing%20unsupervised%20neural%20networks%20based%20text%20summarization%20with%20word%20embedding%20and%20ensemble%20learning&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Alami,%20Nabil&rft.date=2019-06-01&rft.volume=123&rft.spage=195&rft.epage=211&rft.pages=195-211&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2019.01.037&rft_dat=%3Cproquest_cross%3E2193150625%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2193150625&rft_id=info:pmid/&rft_els_id=S0957417419300375&rfr_iscdi=true