Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features

The rapid growth in digital information has raised considerable challenges, particularly for automated content analysis. Social media platforms such as Twitter carry a great deal of information about their users' events, opinions, personalities, etc. Paraphrase Identification (PI) is concerned with...

Bibliographic details
Published in: Information processing & management 2017-05, Vol.53 (3), p.640-652
Main authors: AL-Smadi, Mohammad, Jaradat, Zain, AL-Ayyoub, Mahmoud, Jararweh, Yaser
Format: Article
Language: eng
Online access: Full text
container_end_page 652
container_issue 3
container_start_page 640
container_title Information processing & management
container_volume 53
creator AL-Smadi, Mohammad
Jaradat, Zain
AL-Ayyoub, Mahmoud
Jararweh, Yaser
description The rapid growth in digital information has raised considerable challenges, particularly for automated content analysis. Social media platforms such as Twitter carry a great deal of information about their users' events, opinions, personalities, etc. Paraphrase Identification (PI) is concerned with recognizing whether two texts have the same or a similar meaning, whereas Semantic Text Similarity (STS) is concerned with the degree of that similarity. This research proposes a state-of-the-art approach for paraphrase identification and semantic text similarity analysis in Arabic news tweets. The approach consists of several phases: text processing, feature extraction, and text classification. Lexical, syntactic, and semantic features are extracted to overcome the weaknesses and limitations of current technologies in solving these tasks for the Arabic language. A Maximum Entropy (MaxEnt) classifier and a Support Vector Regression (SVR) model are trained on these features and evaluated on a dataset prepared for this research. The experimental results show that the approach achieves good results compared to the baseline.
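The description above names the pipeline only at a high level: text processing, lexical/syntactic/semantic feature extraction, a MaxEnt classifier for the yes/no paraphrase decision, and SVR for the graded similarity score. The following is a minimal Python sketch of the lexical part and of a binary MaxEnt classifier, offered as an illustration under stated assumptions rather than the authors' method. The Arabic normalization rules, the three features (word Jaccard overlap, character-trigram Jaccard overlap, length ratio), and the function names `normalize`, `lexical_features` and `train_maxent` are hypothetical. Syntactic and semantic features are omitted. The sketch relies on the standard equivalence between a binary Maximum Entropy classifier and logistic regression.

```python
import math
import re

# Hypothetical Arabic normalization (common practice, NOT taken from the paper):
# strip diacritics (U+064B..U+0652) and tatweel (U+0640), unify alef variants,
# map ta marbuta to ha and alef maqsura to ya.
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalize(text: str) -> str:
    text = _DIACRITICS.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)
    text = text.replace("\u0629", "\u0647").replace("\u0649", "\u064A")
    return text.lower()


def tokenize(text: str) -> list[str]:
    # \w matches Unicode letters and digits, including Arabic script.
    return re.findall(r"\w+", normalize(text))


def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def char_ngrams(text: str, n: int = 3) -> set[str]:
    s = " ".join(tokenize(text))
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def lexical_features(t1: str, t2: str) -> list[float]:
    """Three illustrative (assumed) lexical features for a text pair."""
    w1, w2 = tokenize(t1), tokenize(t2)
    longer = max(len(w1), len(w2)) or 1
    return [
        jaccard(set(w1), set(w2)),                  # word overlap
        jaccard(char_ngrams(t1), char_ngrams(t2)),  # character-trigram overlap
        min(len(w1), len(w2)) / longer,             # length ratio
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: list[float], x: list[float]) -> float:
    # w holds one weight per feature followed by the bias term;
    # zip() stops at len(x), so the bias is added separately.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_maxent(X, y, lr: float = 0.5, epochs: int = 2000) -> list[float]:
    """Binary MaxEnt (equivalently, logistic regression) fit by batch
    gradient ascent on the mean log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Invented toy pair, not from the paper's dataset.
    pair = ("انفجار في وسط المدينة", "انفجار وسط المدينة")
    print(lexical_features(*pair))
```

For the STS side, the same feature vectors would go to a regressor that outputs a graded score instead of a label. The record states that the authors used SVR; in Python one option would be scikit-learn's `sklearn.svm.SVR`, though the record does not say which implementation was used.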
doi_str_mv 10.1016/j.ipm.2017.01.002
format Article
publisher Oxford: Elsevier Ltd
coden IPMADK
rights 2017 Elsevier Ltd
Copyright Pergamon Press Inc. May 2017
collection CrossRef
Library & Information Science Abstracts (LISA)
tpages 13
toplevel peer_reviewed
online_resources
fulltext fulltext
identifier ISSN: 0306-4573
ispartof Information processing & management, 2017-05, Vol.53 (3), p.640-652
issn 0306-4573
1873-5371
language eng
recordid cdi_proquest_journals_1885709465
source ScienceDirect Journals (5 years ago - present)
subjects Arabic language
Content analysis
Digital media
Experimentation
Feature extraction
Maximum entropy
Maximum entropy method
Natural language processing
News
Paraphrase identification
Regression analysis
Semantic analysis
Semantic text similarity
Semantics
Similarity
Social networks
Studies
Support vector machines
title Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T09%3A39%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Paraphrase%20identification%20and%20semantic%20text%20similarity%20analysis%20in%20Arabic%20news%20tweets%20using%20lexical,%20syntactic,%20and%20semantic%20features&rft.jtitle=Information%20processing%20&%20management&rft.au=AL-Smadi,%20Mohammad&rft.date=2017-05&rft.volume=53&rft.issue=3&rft.spage=640&rft.epage=652&rft.pages=640-652&rft.issn=0306-4573&rft.eissn=1873-5371&rft.coden=IPMADK&rft_id=info:doi/10.1016/j.ipm.2017.01.002&rft_dat=%3Cproquest_cross%3E4321564771%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1885709465&rft_id=info:pmid/&rft_els_id=S0306457316302382&rfr_iscdi=true