An Automatic Text Document Classification using Modified Weight and Semantic Method

Text mining is the process of extracting useful information from structured or unstructured sources. In text mining, feature extraction is one of the vital parts. This paper analyses some feature extraction methods and proposes an enhanced method for feature extraction. Term Frequ...

Bibliographic details
Published in: International journal of innovative technology and exploring engineering, 2019-10, Vol.8 (12), p.2608-2622
Format: Article
Language: English
Online access: Full text
container_end_page 2622
container_issue 12
container_start_page 2608
container_title International journal of innovative technology and exploring engineering
container_volume 8
description Text mining is the process of extracting useful information from structured or unstructured sources, and feature extraction is one of its vital parts. This paper analyses several feature extraction methods and proposes an enhanced one. The Term Frequency-Inverse Document Frequency (TF-IDF) method assigns a weight to a term based only on the term's occurrence. Here it is modified to increase the weight of the most important words and decrease the weight of the less important words; the modified method is called M-TF-IDF. Because M-TF-IDF does not consider the semantic similarity between terms, Latent Semantic Analysis (LSA) is used for feature extraction and dimensionality reduction. To analyze the performance of the proposed feature extraction methods, two benchmark datasets (Reuters-21578 R8 and 20 Newsgroups) and two real-time datasets (a descriptive-type answer dataset and a crime news dataset) are used. The proposed method is applied to descriptive-type answer evaluation. Manual evaluation of descriptive-type papers may lead to discrepancies in the marks, which this type of evaluation eliminates. The method has been tested with answers written by learners of the authors' department. It allows more accurate assessment and more effective evaluation of the learning process, and its benefits include reduced time and effort, efficient use of resources, a reduced burden on the faculty and increased reliability of results. The proposed method is also used to analyze news documents about events in and around Madurai, a sensitive city in the southern part of Tamil Nadu, India. The documents were collected from the Hindu archives. Each news document is classified as crime-related or not, and the method is also used to identify the month in which the crime rate is highest. This analysis may help to reduce the crime rate in the future. The Support Vector Machine (SVM) classification algorithm is used to classify the datasets. The experimental analysis and results show that the proposed feature extraction methods outperform the existing feature extraction methods.
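For orientation, the term weighting that the abstract takes as its starting point can be sketched in a few lines of Python. This is the standard textbook TF-IDF baseline only: the abstract does not state the M-TF-IDF formula, so the modified weighting is not reproduced here. The function name `tf_idf`, the toy corpus and the `tf * log(N / df)` variant are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document.

    Uses the textbook form tf(t, d) * log(N / df(t)). This is an
    assumption for illustration, not the paper's M-TF-IDF weighting.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already lower-cased and tokenised.
corpus = [
    "police report theft in the city".split(),
    "city council opens new library".split(),
    "theft suspect arrested by police".split(),
]
weights = tf_idf(corpus)
```

In this baseline a term that occurs in fewer documents receives a larger weight than an equally frequent term that occurs in many, which is the occurrence-only behaviour the abstract says M-TF-IDF adjusts.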
doi_str_mv 10.35940/ijitee.K2123.1081219
format Article
fulltext fulltext
identifier ISSN: 2278-3075
ispartof International journal of innovative technology and exploring engineering, 2019-10, Vol.8 (12), p.2608-2622
issn 2278-3075
2278-3075
language eng
recordid cdi_crossref_primary_10_35940_ijitee_K2123_1081219
source EZB-FREE-00999 freely available EZB journals
title An Automatic Text Document Classification using Modified Weight and Semantic Method
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T17%3A25%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20Automatic%20Text%20Document%20Classification%20using%20Modified%20Weight%20and%20Semantic%20Method&rft.jtitle=International%20journal%20of%20innovative%20technology%20and%20exploring%20engineering&rft.date=2019-10-10&rft.volume=8&rft.issue=12&rft.spage=2608&rft.epage=2622&rft.pages=2608-2622&rft.issn=2278-3075&rft.eissn=2278-3075&rft_id=info:doi/10.35940/ijitee.K2123.1081219&rft_dat=%3Ccrossref%3E10_35940_ijitee_K2123_1081219%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true