A Hybrid Document Features Extraction with Clustering based Classification Framework on Large Document Sets

As document collections grow day by day, finding the essential document clusters for a classification problem is a major challenge because of high inter- and intra-document variation. Also, most conventional classification models, such as SVM, neural network and Bayes...

Bibliographic details
Published in:	International journal of advanced computer science & applications 2020, Vol.11 (7)
Main authors:	Devi, S Anjali; Siva, S
Format:	Article
Language:	English
Subjects:	Classification; Clustering; Documents; Feature extraction; Neural networks; Similarity
Online access:	Full text

container_end_page
container_issue	7
container_start_page
container_title	International journal of advanced computer science & applications
container_volume	11
creator	Devi, S Anjali; Siva, S
description	As document collections grow day by day, finding the essential document clusters for a classification problem is a major challenge because of high inter- and intra-document variation. In addition, most conventional classification models, such as SVM, neural network and Bayesian models, have a high true negative rate and error rate in document classification. To improve the computational efficacy of traditional document classification models, a hybrid feature-extraction-based document clustering and classification approach is developed for large document sets. In the proposed work, a hybrid GloVe feature selection model is used to improve the contextual similarity of keywords in the large document corpus. A hybrid document clustering similarity index is then optimized to find the essential document clusters based on these contextual keywords. Finally, a hybrid document classification model classifies the clustered documents in the large corpus. Experiments on different datasets show that the proposed clustering-based classification model achieves a higher true positive rate and accuracy, and a lower error rate, than the conventional models.
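The abstract outlines a three-stage pipeline (embedding-based keyword features, similarity-driven clustering, then classification of the clustered documents) but does not specify the hybrid GloVe model, the similarity index or the classifier. The Python sketch below only illustrates that general cluster-then-classify idea under loudly stated assumptions: the tiny `EMBEDDINGS` table is a hypothetical stand-in for pretrained GloVe vectors, plain cosine similarity with a k-means variant replaces the paper's hybrid similarity index, and a majority-label nearest-centroid rule replaces its hybrid classifier. The helper names (`doc_vector`, `kmeans_cosine`, `fit`, `predict`) are illustrative, and this is not the authors' method.

```python
import math
from collections import Counter

# HYPOTHETICAL toy 3-d word vectors standing in for pretrained GloVe embeddings.
EMBEDDINGS = {
    "goal": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "team": [0.85, 0.15, 0.05],
    "stock": [0.1, 0.9, 0.0],
    "market": [0.05, 0.95, 0.1],
    "shares": [0.1, 0.85, 0.05],
}
DIM = 3


def doc_vector(text):
    """Mean of the embeddings of in-vocabulary tokens (zero vector if none)."""
    vecs = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans_cosine(vectors, k, iters=10):
    """k-means variant: assign by highest cosine, recompute centroids as means.

    ASSUMPTION: a generic stand-in for the paper's unspecified hybrid index.
    """
    centroids = [list(v) for v in vectors[:k]]  # deterministic init: first k docs
    assign = []
    for _ in range(iters):
        assign = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def fit(docs, labels, k):
    """Cluster training docs, then label each cluster by its majority class."""
    vectors = [doc_vector(d) for d in docs]
    centroids, assign = kmeans_cosine(vectors, k)
    cluster_label = {}
    for c in range(k):
        member_labels = [lab for lab, a in zip(labels, assign) if a == c]
        if member_labels:
            cluster_label[c] = Counter(member_labels).most_common(1)[0][0]
    return centroids, cluster_label


def predict(model, text):
    """Assign a new doc to the most similar labelled centroid."""
    centroids, cluster_label = model
    v = doc_vector(text)
    best = max(cluster_label, key=lambda c: cosine(v, centroids[c]))
    return cluster_label[best]


# Hypothetical toy corpus; the record does not name the paper's datasets.
train_docs = ["team goal match", "stock market shares", "match team", "market shares stock"]
train_labels = ["sports", "finance", "sports", "finance"]
model = fit(train_docs, train_labels, k=2)
print(predict(model, "goal team"))      # expected: sports
print(predict(model, "shares market"))  # expected: finance
```

Averaging word vectors and clustering by cosine similarity is only a common baseline; reproducing the reported results would require real pretrained vectors and the datasets used in the paper.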
doi_str_mv	10.14569/IJACSA.2020.0110748
format	Article
publisher	Science and Information (SAI) Organization Limited, West Yorkshire
rights	2020. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa	free_for_read
fulltext	fulltext
identifier	ISSN: 2158-107X
ispartof	International journal of advanced computer science & applications, 2020, Vol.11 (7)
issn	2158-107X 2156-5570
language	eng
recordid	cdi_proquest_journals_2655153962
source	EZB-FREE-00999 freely available EZB journals
subjects	Classification; Clustering; Documents; Feature extraction; Neural networks; Similarity
title	A Hybrid Document Features Extraction with Clustering based Classification Framework on Large Document Sets
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T23%3A28%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Hybrid%20Document%20Features%20Extraction%20with%20Clustering%20based%20Classification%20Framework%20on%20Large%20Document%20Sets&rft.jtitle=International%20journal%20of%20advanced%20computer%20science%20&%20applications&rft.au=Devi,%20S%20Anjali&rft.date=2020&rft.volume=11&rft.issue=7&rft.issn=2158-107X&rft.eissn=2156-5570&rft_id=info:doi/10.14569/IJACSA.2020.0110748&rft_dat=%3Cproquest_cross%3E2655153962%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2655153962&rft_id=info:pmid/&rfr_iscdi=true