A case-comparison study of automatic document classification utilizing both serial and parallel approaches

A well-known problem faced by any organization nowadays is the high volume of data that is available and the required process to transform this volume into differential information. In this study, a case-comparison study of automatic document classification (ADC) approach is presented, utilizing bot...

Bibliographic details
Published in: Journal of physics. Conference series 2014-01, Vol.540 (1), p.12001-10
Main authors: Wilges, B; Bastos, R C; Mateus, G P; Dantas, M A R
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 10
container_issue 1
container_start_page 12001
container_title Journal of physics. Conference series
container_volume 540
creator Wilges, B
Bastos, R C
Mateus, G P
Dantas, M A R
description A well-known problem faced by any organization nowadays is the high volume of data that is available and the process required to transform this volume into differential information. This paper presents a case-comparison study of an automatic document classification (ADC) approach, utilizing both serial and parallel paradigms. The serial approach was implemented by adopting the RapidMiner software tool, which is recognized as the world-leading open-source system for data mining. For the parallel approach, which follows the MapReduce programming model, the Hadoop software environment was used. The main goal of this case-comparison study is to exploit differences between these two paradigms, especially when large volumes of data such as Web text documents are utilized to build a category database. In the literature, many studies point out that distributed processing of unstructured documents with Hadoop has been yielding efficient results. Results from our research indicate a threshold to such efficiency.
doi_str_mv 10.1088/1742-6596/540/1/012001
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1718969707</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2576723983</sourcerecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2576723983</pqid></control><display><type>article</type><title>A case-comparison study of automatic document classification utilizing both serial and parallel approaches</title><source>IOP Publishing Free Content</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>IOPscience extra</source><source>Alma/SFX Local Collection</source><source>Free Full-Text Journals in Chemistry</source><creator>Wilges, B ; Bastos, R C ; Mateus, G P ; Dantas, M A R</creator><identifier>ISSN: 1742-6588</identifier><identifier>EISSN: 1742-6596</identifier><identifier>DOI: 10.1088/1742-6596/540/1/012001</identifier><language>eng</language><publisher>Bristol: IOP Publishing</publisher><ispartof>Journal of physics. Conference series, 2014-01, Vol.540 (1), p.12001-10</ispartof><rights>2014. This work is published under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa></display></record>
fulltext fulltext
identifier ISSN: 1742-6588
ispartof Journal of physics. Conference series, 2014-01, Vol.540 (1), p.12001-10
issn 1742-6588
1742-6596
language eng
recordid cdi_proquest_miscellaneous_1718969707
source IOP Publishing Free Content; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; IOPscience extra; Alma/SFX Local Collection; Free Full-Text Journals in Chemistry
subjects Automatic classification
Automation
Classification
Computer programs
Construction
Data mining
Distributed processing
Physics
Serials
Software
Software development tools
Source code
title A case-comparison study of automatic document classification utilizing both serial and parallel approaches
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T01%3A53%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20case-comparison%20study%20of%20automatic%20document%20classification%20utilizing%20both%20serial%20and%20parallel%20approaches&rft.jtitle=Journal%20of%20physics.%20Conference%20series&rft.au=Wilges,%20B&rft.date=2014-01-01&rft.volume=540&rft.issue=1&rft.spage=12001&rft.epage=10&rft.pages=12001-10&rft.issn=1742-6588&rft.eissn=1742-6596&rft_id=info:doi/10.1088/1742-6596/540/1/012001&rft_dat=%3Cproquest_cross%3E2576723983%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2576723983&rft_id=info:pmid/&rfr_iscdi=true