A Machine Learning Model to Identify Duplicate Questions in Social Media Forums

Bibliographic details
Published in: International journal of innovative technology and exploring engineering, 2020-02, Vol. 9 (4), pp. 370-373
Main authors: Panda, Sandeep Kumar; Bhalerao, Vivek; AR, Sathya
Format: Article
Language: English
Online access: Full text
container_end_page 373
container_issue 4
container_start_page 370
container_title International journal of innovative technology and exploring engineering
container_volume 9
creator Panda, Sandeep Kumar
Bhalerao, Vivek
AR, Sathya
description In recent years, online forums where questions and answers are discussed have attracted a growing number of users, and many discussions on these forums are repetitive. Quora released a dataset of such duplicate questions as a competition on Kaggle. The dataset provided by Quora requires considerable preprocessing before machine learning models can be trained to good accuracy. This preprocessing includes feature extraction, vectorization and tokenization, after which the data is ready for training the desired models. Analyzing each model's predictions yields information about its efficiency and other performance factors. This information is then compared across models to choose the best one. The models can later be combined into a single model with the best accuracy. This paper proposes a machine learning model that predicts duplicate questions.
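The abstract names the preprocessing steps (tokenization, vectorization, feature extraction) and the idea of combining several models, but gives no implementation details. The sketch below shows one common way to turn a question pair into features, assuming a simple regex tokenizer, bag-of-words count vectors, and three hand-picked similarity features (cosine, Jaccard, length difference). The function names and the soft-voting combiner are hypothetical illustrations, not the authors' method.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Assumption: lowercase word tokenizer; the paper's actual scheme is unspecified.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def vectorize(tokens: list[str]) -> Counter:
    """Bag-of-words vectorization: map each token to its count."""
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(q1: str, q2: str) -> dict[str, float]:
    """Hypothetical hand-crafted features for one question pair."""
    t1, t2 = tokenize(q1), tokenize(q2)
    s1, s2 = set(t1), set(t2)
    union = s1 | s2
    return {
        "cosine": cosine(vectorize(t1), vectorize(t2)),
        "jaccard": len(s1 & s2) / len(union) if union else 0.0,
        "len_diff": abs(len(t1) - len(t2)),
    }


def soft_vote(probabilities: list[float], weights: list[float] | None = None) -> float:
    """Combine per-model duplicate probabilities by (weighted) averaging."""
    weights = weights or [1.0] * len(probabilities)
    return sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)


if __name__ == "__main__":
    print(pair_features("How do I learn Python?", "How can I learn Python?"))
    print(soft_vote([0.2, 0.8]))
```

In a full pipeline these feature dictionaries would be fed to several classifiers, and `soft_vote` averages their predicted duplicate probabilities into one score; weighting lets a better-validated model count for more.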
doi_str_mv 10.35940/ijitee.D1362.029420
format Article
affiliation Department of Computer Science and Engineering, Faculty of Science and Technology, IcfaiTech, ICFAI Foundation for Higher Education, Hyderabad, India
date 2020-02-28
pages 370-373 (4 pages)
fulltext fulltext
identifier ISSN: 2278-3075
ispartof International journal of innovative technology and exploring engineering, 2020-02, Vol.9 (4), p.370-373
issn 2278-3075
2278-3075
language eng
recordid cdi_crossref_primary_10_35940_ijitee_D1362_029420
source EZB-FREE-00999 freely available EZB journals
title A Machine Learning Model to Identify Duplicate Questions in Social Media Forums
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T11%3A50%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Machine%20Learning%20Model%20to%20Identify%20Duplicate%20Questions%20in%20Social%20Media%20Forums&rft.jtitle=International%20journal%20of%20innovative%20technology%20and%20exploring%20engineering&rft.au=Panda,%20Sandeep%20Kumar&rft.aucorp=Department%20of%20Computer%20Science%20and%20Engineering,%20Faculty%20of%20Science%20and%20Technology,%20IcfaiTech,%20ICFAI%20Foundation%20for%20Higher%20Education,%20Hyderabad,%20India&rft.date=2020-02-28&rft.volume=9&rft.issue=4&rft.spage=370&rft.epage=373&rft.pages=370-373&rft.issn=2278-3075&rft.eissn=2278-3075&rft_id=info:doi/10.35940/ijitee.D1362.029420&rft_dat=%3Ccrossref%3E10_35940_ijitee_D1362_029420%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true