A case-comparison study of automatic document classification utilizing both serial and parallel approaches
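Published in: | Journal of physics. Conference series, 2014-01, Vol. 540 (1), p. 12001-10 |
---|---|
Main authors: | Wilges, B; Bastos, R C; Mateus, G P; Dantas, M A R |
Format: | Article |
Language: | eng |
Subjects: | Automatic classification; Automation; Classification; Computer programs; Construction; Data mining; Distributed processing; Physics; Serials; Software; Software development tools; Source code |
Online access: | Full text |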
container_end_page | 10 |
---|---|
container_issue | 1 |
container_start_page | 12001 |
container_title | Journal of physics. Conference series |
container_volume | 540 |
creator | Wilges, B; Bastos, R C; Mateus, G P; Dantas, M A R |
description | A well-known problem faced by any organization nowadays is the high volume of available data and the process required to transform this volume into differential information. This paper presents a case-comparison study of an automatic document classification (ADC) approach, utilizing both serial and parallel paradigms. The serial approach was implemented with the RapidMiner software tool, which is recognized as the world-leading open-source system for data mining. The parallel approach used the Hadoop software environment, which is based on the MapReduce programming model. The main goal of this case-comparison study is to exploit the differences between these two paradigms, especially when large volumes of data, such as Web text documents, are used to build a category database. In the literature, many studies point out that distributed processing of unstructured documents with Hadoop has been yielding efficient results. Results from our research indicate a threshold to such efficiency. |
doi_str_mv | 10.1088/1742-6596/540/1/012001 |
format | Article |
publisher | IOP Publishing, Bristol |
rights | 2014. This work is published under the Creative Commons Attribution 3.0 license (http://creativecommons.org/licenses/by/3.0/). Peer reviewed; free to read; 10 pages. |
fulltext | fulltext |
identifier | ISSN: 1742-6588 |
ispartof | Journal of physics. Conference series, 2014-01, Vol.540 (1), p.12001-10 |
issn | 1742-6588 (print); 1742-6596 (online) |
language | eng |
recordid | cdi_proquest_miscellaneous_1718969707 |
source | IOP Publishing Free Content; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; IOPscience extra; Alma/SFX Local Collection; Free Full-Text Journals in Chemistry |
subjects | Automatic classification; Automation; Classification; Computer programs; Construction; Data mining; Distributed processing; Physics; Serials; Software; Software development tools; Source code |
title | A case-comparison study of automatic document classification utilizing both serial and parallel approaches |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T01%3A53%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20case-comparison%20study%20of%20automatic%20document%20classification%20utilizing%20both%20serial%20and%20parallel%20approaches&rft.jtitle=Journal%20of%20physics.%20Conference%20series&rft.au=Wilges,%20B&rft.date=2014-01-01&rft.volume=540&rft.issue=1&rft.spage=12001&rft.epage=10&rft.pages=12001-10&rft.issn=1742-6588&rft.eissn=1742-6596&rft_id=info:doi/10.1088/1742-6596/540/1/012001&rft_dat=%3Cproquest_cross%3E2576723983%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2576723983&rft_id=info:pmid/&rfr_iscdi=true |