An Automatic Text Document Classification using Modified Weight and Semantic Method
Text mining is the process of extracting useful information from structured or unstructured sources. In text mining, feature extraction is one of the vital parts. This paper analyses some feature extraction methods and proposes an enhanced method for feature extraction. Term Frequ...
Published in: | International journal of innovative technology and exploring engineering 2019-10, Vol.8 (12), p.2608-2622 |
---|---|
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 2622 |
---|---|
container_issue | 12 |
container_start_page | 2608 |
container_title | International journal of innovative technology and exploring engineering |
container_volume | 8 |
description | Text mining is the process of extracting useful information from structured or unstructured sources, and feature extraction is one of its vital parts. This paper analyses some feature extraction methods and proposes an enhanced method for feature extraction. The Term Frequency-Inverse Document Frequency (TF-IDF) method assigns a weight to a term based only on the occurrence of the term. Here it is extended to increase the weight of the most important words and decrease the weight of the less important words; the extended method is called M-TF-IDF. Because M-TF-IDF does not consider the semantic similarity between terms, Latent Semantic Analysis (LSA) is used for feature extraction and dimensionality reduction. To analyze the performance of the proposed feature extraction methods, two benchmark datasets (Reuters-21578 R8 and 20 Newsgroups) and two real-time datasets (a descriptive-type answer dataset and a crime news dataset) are used. The proposed method is applied to descriptive-type answer evaluation. Manual evaluation of descriptive-type papers may lead to discrepancies in marks, which this type of evaluation eliminates. The proposed method has been tested with answers written by learners of the authors' department. It allows more accurate assessment and more effective evaluation of the learning process. Its benefits include reduced time and effort, efficient use of resources, reduced burden on the faculty and increased reliability of results. The proposed method is also used to analyze news documents about Madurai city and its surroundings. Madurai is a sensitive place in the southern part of Tamil Nadu, India. The documents were collected from the archives of The Hindu and classified as crime or non-crime. The method is also used to check in which month the crime rate is highest. This analysis can be used to help reduce the crime rate in future.
The Support Vector Machine (SVM) classification algorithm is used to classify the datasets. The experimental analysis and results show that the proposed feature extraction methods outperform the existing feature extraction methods. |
doi_str_mv | 10.35940/ijitee.K2123.1081219 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 2278-3075 |
ispartof | International journal of innovative technology and exploring engineering, 2019-10, Vol.8 (12), p.2608-2622 |
issn | 2278-3075 2278-3075 |
language | eng |
recordid | cdi_crossref_primary_10_35940_ijitee_K2123_1081219 |
source | EZB-FREE-00999 freely available EZB journals |
title | An Automatic Text Document Classification using Modified Weight and Semantic Method |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T17%3A25%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20Automatic%20Text%20Document%20Classification%20using%20Modified%20Weight%20and%20Semantic%20Method&rft.jtitle=International%20journal%20of%20innovative%20technology%20and%20exploring%20engineering&rft.date=2019-10-10&rft.volume=8&rft.issue=12&rft.spage=2608&rft.epage=2622&rft.pages=2608-2622&rft.issn=2278-3075&rft.eissn=2278-3075&rft_id=info:doi/10.35940/ijitee.K2123.1081219&rft_dat=%3Ccrossref%3E10_35940_ijitee_K2123_1081219%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
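
The description above starts from TF-IDF term weighting before introducing M-TF-IDF. As a rough illustration only, the following Python sketch computes the textbook baseline weight tf(t, d) * log(N / df(t)). The record does not state how M-TF-IDF modifies this weight, so that variant is not reproduced here; the function name `tf_idf` and the toy corpus are hypothetical and are not taken from the paper or its datasets.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses the textbook form tf(t, d) * log(N / df(t)). This is the baseline
    TF-IDF only, not the paper's M-TF-IDF variant (an assumption-free
    stand-in, since the record gives no formula for the modified weight).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Relative term frequency scaled by inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus for illustration only.
corpus = [
    "police report crime in city".split(),
    "city council opens new park".split(),
    "crime rate falls in city".split(),
]
scores = tf_idf(corpus)
```

In this toy corpus a term that occurs in every document, such as "city", receives weight zero, while a term confined to one document, such as "police", receives the largest weight in that document. This is the occurrence-only behaviour that the description says M-TF-IDF adjusts.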