A case-comparison study of automatic document classification utilizing both serial and parallel approaches

A well-known problem faced by any organization nowadays is the high volume of data that is available and the required process to transform this volume into differential information. In this study, a case-comparison study of automatic document classification (ADC) approach is presented, utilizing bot...

Bibliographic details
Published in:	Journal of physics. Conference series 2014-01, Vol.540 (1), p.12001-10
Main authors:	Wilges, B; Bastos, R C; Mateus, G P; Dantas, M A R
Format:	Article
Language:	eng
Subjects:	Automatic classification; Automation; Classification; Computer programs; Construction; Data mining; Distributed processing; Physics; Serials; Software; Software development tools; Source code
Online access:	Full text

container_end_page	10
container_issue	1
container_start_page	12001
container_title	Journal of physics. Conference series
container_volume	540
creator	Wilges, B; Bastos, R C; Mateus, G P; Dantas, M A R
description	A well-known problem faced by any organization today is the high volume of available data and the process required to transform that volume into differential information. This paper presents a case-comparison study of automatic document classification (ADC) using both serial and parallel paradigms. The serial approach was implemented with the RapidMiner software tool, which is recognized as the world-leading open-source system for data mining. The parallel approach used the Hadoop software environment, which follows the MapReduce programming model. The main goal of this case-comparison study is to exploit differences between these two paradigms, especially when large volumes of data, such as Web text documents, are used to build a category database. Many studies in the literature report that distributed processing of unstructured documents with Hadoop yields efficient results. Results from our research indicate a threshold to such efficiency.
doi_str_mv	10.1088/1742-6596/540/1/012001
format	Article
publisher	Bristol: IOP Publishing
date	2014-01-01
tpages	10
rights	2014. This work is published under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
toplevel	peer_reviewed; online_resources
oa	free_for_read
fulltext	fulltext
identifier	ISSN: 1742-6588
ispartof	Journal of physics. Conference series, 2014-01, Vol.540 (1), p.12001-10
issn	1742-6588; 1742-6596
language	eng
recordid	cdi_proquest_miscellaneous_1718969707
source	IOP Publishing Free Content; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; IOPscience extra; Alma/SFX Local Collection; Free Full-Text Journals in Chemistry
subjects	Automatic classification; Automation; Classification; Computer programs; Construction; Data mining; Distributed processing; Physics; Serials; Software; Software development tools; Source code
title	A case-comparison study of automatic document classification utilizing both serial and parallel approaches
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T01%3A53%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20case-comparison%20study%20of%20automatic%20document%20classification%20utilizing%20both%20serial%20and%20parallel%20approaches&rft.jtitle=Journal%20of%20physics.%20Conference%20series&rft.au=Wilges,%20B&rft.date=2014-01-01&rft.volume=540&rft.issue=1&rft.spage=12001&rft.epage=10&rft.pages=12001-10&rft.issn=1742-6588&rft.eissn=1742-6596&rft_id=info:doi/10.1088/1742-6596/540/1/012001&rft_dat=%3Cproquest_cross%3E2576723983%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2576723983&rft_id=info:pmid/&rfr_iscdi=true
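
The abstract in this record contrasts a serial tool with the MapReduce programming model. As a rough illustration of that contrast only, the sketch below counts terms per category in a toy corpus twice: once with explicit map, shuffle, and reduce phases, and once with a single serial loop. It is not the authors' implementation and does not use the Hadoop or RapidMiner APIs; the corpus `DOCS` and every function name are hypothetical, and all phases run in one process, where a real cluster would distribute the map and reduce calls across nodes.

```python
from collections import Counter, defaultdict
from itertools import chain

# Hypothetical toy corpus: (category, text) pairs standing in for labeled Web documents.
DOCS = [
    ("sports", "the team won the match"),
    ("sports", "the match was close"),
    ("tech", "the cluster runs the job"),
]


def map_document(category, text):
    """Map phase: emit ((category, term), 1) for every term in one document."""
    return [((category, term), 1) for term in text.lower().split()]


def shuffle(pairs):
    """Shuffle phase: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def mapreduce_term_counts(docs):
    """Run the three phases in sequence (in-process stand-in for a cluster job)."""
    mapped = chain.from_iterable(map_document(c, t) for c, t in docs)
    return reduce_counts(shuffle(mapped))


def serial_term_counts(docs):
    """Serial baseline: one pass over the corpus with a single in-memory counter."""
    counts = Counter()
    for category, text in docs:
        for term in text.lower().split():
            counts[(category, term)] += 1
    return dict(counts)


if __name__ == "__main__":
    print(mapreduce_term_counts(DOCS) == serial_term_counts(DOCS))
```

Both functions produce the same per-category term counts. The difference is structural: each map and reduce call depends only on its own input, so those calls can be distributed, at the cost of the grouping step in between. That overhead is one plausible reason a parallel setup may pay off only past some data volume, which is consistent with the threshold the abstract mentions.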