Parallel Processing of Large-Scale XML-Based Application Documents on Multi-core Architectures with PiXiMaL

Very large scientific datasets are becoming increasingly available in XML formats. Our earlier benchmarking results show that parsing XML is time-consuming compared with binary formats optimized for large-scale documents. This performance bottleneck will be exacerbated as the size of XML...

Bibliographic Details
Main Authors:	Head, M.R., Govindaraju, M.
Format:	Conference Proceeding
Language:	eng
Subjects:	Algorithm design and analysis ; automata ; Delay ; Humans ; Large-scale systems ; Microprocessors ; Middleware ; Multicore processing ; parallel ; Parallel processing ; parsing ; Web services ; XML
Online Access:	Order full text

container_end_page	268
container_issue
container_start_page	261
container_title
container_volume
creator	Head, M.R. ; Govindaraju, M.
description	Very large scientific datasets are becoming increasingly available in XML formats. Our earlier benchmarking results show that parsing XML is time-consuming compared with binary formats optimized for large-scale documents. This performance bottleneck will be exacerbated as the size of XML data increases in e-science applications. Our focus in this paper is on addressing this performance bottleneck. In recent times, the microprocessor industry has made rapid strides towards chip multiprocessors (CMPs). The widely available XML parsers have been unable to take advantage of the opportunities presented by CMPs, instead passing the task of parallelization to the application programmer. The paradigms used thus far to process large XML documents on uniprocessors are not applicable to CMPs. We present the design, implementation, and performance analysis of PiXiMaL, a parallel processing library for large-scale XML data files. In particular, we discuss an effective scheme to parallelize the tokenization process to achieve an overall performance increase when parsing large-scale XML documents that are increasingly in use today. Our approach is to build a DFA-based parser that recognizes a useful subset of the XML specification and to convert the DFA into an NFA that can be applied to any subset of the input.
doi_str_mv	10.1109/eScience.2008.77
format	Conference Proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_4736766</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4736766</ieee_id><sourcerecordid>4736766</sourcerecordid><originalsourceid>FETCH-LOGICAL-i1327-10777cfec1d2dbd49b9caa0ef929067087a5e3707c96082ebb8888afc0d160983</originalsourceid><addsrcrecordid>eNotjF1LwzAUhiMiKLp7wZv8gc6Tps1JLuf8hA4HU9jdSNPTLZq1I-kQ_71FfZ6Ll-fmZexawFQIMLe0cp46R9McQE8RT9jEoAZUppSjePrbosiLQkoN8pxNUvqAkaKUqoQL9rm00YZAgS9j7ygl32153_LKxi1lK2cD8fWiyu5soobPDofgnR183_H73h331A2Jj7E4hsFnro_EZ9Ht_EBuOEZK_MsPO770a7-w1RU7a21INPnfS_b--PA2f86q16eX-azKvJA5ZgIQ0bXkRJM3dVOY2jhrgVqTG1AIGm1JEgGdUaBzqms9YlsHjVBgtLxkN3-_nog2h-j3Nn5vCpQKlZI_Cmta0Q</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Parallel Processing of Large-Scale XML-Based Application Documents on Multi-core Architectures with PiXiMaL</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Head, M.R. ; Govindaraju, M.</creator><creatorcontrib>Head, M.R. ; Govindaraju, M.</creatorcontrib><description>Very large scientific datasets are becoming increasingly available in XML formats. Our earlier benchmarking results show that parsing XML is a time consuming process when compared with binary formats optimized for largescale documents. This performance bottleneck will get exacerbated as size of XML data increases in e-science applications. Our focus in this paper is on addressing this performance bottleneck. In recent times, the microprocessor industry has made rapid strides towards chip multi processors (CMPs). The widely available XML parsers have been unable to take advantage of the opportunities presented by CMPs, instead, passing the task of parallelization to the application programmer. 
The paradigms used thus far to process large size XML documents on uniprocessors are not applicable for CMPs. We present the design, implementation, and performance analysis of PiXiMaL, a parallel processing library for large-scale XML-data files. In particular, we discuss an effective scheme to parallelize the tokenization process to achieve an overall performance increase when parsing large-scale XML documents that are increasingly in use today. Our approach is to build a DFA-based parser that recognizes a useful subset of the XML specification and converts the DFA into an NFA which can be applied on any subset of the input.</description><identifier>ISBN: 9781424433803</identifier><identifier>ISBN: 1424433800</identifier><identifier>EISBN: 9780769535357</identifier><identifier>EISBN: 0769535356</identifier><identifier>DOI: 10.1109/eScience.2008.77</identifier><language>eng</language><publisher>IEEE</publisher><subject>Algorithm design and analysis ; automata ; Delay ; Humans ; Large-scale systems ; Microprocessors ; Middleware ; Multicore processing ; parallel ; Parallel processing ; parsing ; Web services ; XML</subject><ispartof>2008 IEEE Fourth International Conference on eScience, 2008, p.261-268</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4736766$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4736766$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Head, M.R.</creatorcontrib><creatorcontrib>Govindaraju, M.</creatorcontrib><title>Parallel Processing of Large-Scale XML-Based Application Documents on Multi-core Architectures with 
PiXiMaL</title><title>2008 IEEE Fourth International Conference on eScience</title><addtitle>ESCIENCE</addtitle><description>Very large scientific datasets are becoming increasingly available in XML formats. Our earlier benchmarking results show that parsing XML is a time consuming process when compared with binary formats optimized for largescale documents. This performance bottleneck will get exacerbated as size of XML data increases in e-science applications. Our focus in this paper is on addressing this performance bottleneck. In recent times, the microprocessor industry has made rapid strides towards chip multi processors (CMPs). The widely available XML parsers have been unable to take advantage of the opportunities presented by CMPs, instead, passing the task of parallelization to the application programmer. The paradigms used thus far to process large size XML documents on uniprocessors are not applicable for CMPs. We present the design, implementation, and performance analysis of PiXiMaL, a parallel processing library for large-scale XML-data files. In particular, we discuss an effective scheme to parallelize the tokenization process to achieve an overall performance increase when parsing large-scale XML documents that are increasingly in use today. 
Our approach is to build a DFA-based parser that recognizes a useful subset of the XML specification and converts the DFA into an NFA which can be applied on any subset of the input.</description><subject>Algorithm design and analysis</subject><subject>automata</subject><subject>Delay</subject><subject>Humans</subject><subject>Large-scale systems</subject><subject>Microprocessors</subject><subject>Middleware</subject><subject>Multicore processing</subject><subject>parallel</subject><subject>Parallel processing</subject><subject>parsing</subject><subject>Web services</subject><subject>XML</subject><isbn>9781424433803</isbn><isbn>1424433800</isbn><isbn>9780769535357</isbn><isbn>0769535356</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2008</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotjF1LwzAUhiMiKLp7wZv8gc6Tps1JLuf8hA4HU9jdSNPTLZq1I-kQ_71FfZ6Ll-fmZexawFQIMLe0cp46R9McQE8RT9jEoAZUppSjePrbosiLQkoN8pxNUvqAkaKUqoQL9rm00YZAgS9j7ygl32153_LKxi1lK2cD8fWiyu5soobPDofgnR183_H73h331A2Jj7E4hsFnro_EZ9Ht_EBuOEZK_MsPO770a7-w1RU7a21INPnfS_b--PA2f86q16eX-azKvJA5ZgIQ0bXkRJM3dVOY2jhrgVqTG1AIGm1JEgGdUaBzqms9YlsHjVBgtLxkN3-_nog2h-j3Nn5vCpQKlZI_Cmta0Q</recordid><startdate>200812</startdate><enddate>200812</enddate><creator>Head, M.R.</creator><creator>Govindaraju, M.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>200812</creationdate><title>Parallel Processing of Large-Scale XML-Based Application Documents on Multi-core Architectures with PiXiMaL</title><author>Head, M.R. 
; Govindaraju, M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i1327-10777cfec1d2dbd49b9caa0ef929067087a5e3707c96082ebb8888afc0d160983</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Algorithm design and analysis</topic><topic>automata</topic><topic>Delay</topic><topic>Humans</topic><topic>Large-scale systems</topic><topic>Microprocessors</topic><topic>Middleware</topic><topic>Multicore processing</topic><topic>parallel</topic><topic>Parallel processing</topic><topic>parsing</topic><topic>Web services</topic><topic>XML</topic><toplevel>online_resources</toplevel><creatorcontrib>Head, M.R.</creatorcontrib><creatorcontrib>Govindaraju, M.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Head, M.R.</au><au>Govindaraju, M.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Parallel Processing of Large-Scale XML-Based Application Documents on Multi-core Architectures with PiXiMaL</atitle><btitle>2008 IEEE Fourth International Conference on eScience</btitle><stitle>ESCIENCE</stitle><date>2008-12</date><risdate>2008</risdate><spage>261</spage><epage>268</epage><pages>261-268</pages><isbn>9781424433803</isbn><isbn>1424433800</isbn><eisbn>9780769535357</eisbn><eisbn>0769535356</eisbn><abstract>Very large scientific datasets are becoming increasingly available in XML formats. 
Our earlier benchmarking results show that parsing XML is a time consuming process when compared with binary formats optimized for largescale documents. This performance bottleneck will get exacerbated as size of XML data increases in e-science applications. Our focus in this paper is on addressing this performance bottleneck. In recent times, the microprocessor industry has made rapid strides towards chip multi processors (CMPs). The widely available XML parsers have been unable to take advantage of the opportunities presented by CMPs, instead, passing the task of parallelization to the application programmer. The paradigms used thus far to process large size XML documents on uniprocessors are not applicable for CMPs. We present the design, implementation, and performance analysis of PiXiMaL, a parallel processing library for large-scale XML-data files. In particular, we discuss an effective scheme to parallelize the tokenization process to achieve an overall performance increase when parsing large-scale XML documents that are increasingly in use today. Our approach is to build a DFA-based parser that recognizes a useful subset of the XML specification and converts the DFA into an NFA which can be applied on any subset of the input.</abstract><pub>IEEE</pub><doi>10.1109/eScience.2008.77</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISBN: 9781424433803
ispartof	2008 IEEE Fourth International Conference on eScience, 2008, p.261-268
issn
language	eng
recordid	cdi_ieee_primary_4736766
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Algorithm design and analysis ; automata ; Delay ; Humans ; Large-scale systems ; Microprocessors ; Middleware ; Multicore processing ; parallel ; Parallel processing ; parsing ; Web services ; XML
title	Parallel Processing of Large-Scale XML-Based Application Documents on Multi-core Architectures with PiXiMaL
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T02%3A09%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Parallel%20Processing%20of%20Large-Scale%20XML-Based%20Application%20Documents%20on%20Multi-core%20Architectures%20with%20PiXiMaL&rft.btitle=2008%20IEEE%20Fourth%20International%20Conference%20on%20eScience&rft.au=Head,%20M.R.&rft.date=2008-12&rft.spage=261&rft.epage=268&rft.pages=261-268&rft.isbn=9781424433803&rft.isbn_list=1424433800&rft_id=info:doi/10.1109/eScience.2008.77&rft_dat=%3Cieee_6IE%3E4736766%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9780769535357&rft.eisbn_list=0769535356&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4736766&rfr_iscdi=true
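
The description field above mentions building a DFA-based parser and deriving from it an automaton that can be applied to any subset of the input. The following is a minimal sketch of that general idea in Python, not PiXiMaL's code. The three-state toy DFA (it recognizes only tag delimiters and double-quoted attribute values), the function names step, run, chunk_map, split and parallel_end_state, and the use of a thread pool are all assumptions made for illustration. Each chunk is scanned from every possible start state, because a chunk taken from the middle of a document does not know which state the scanner would be in when it arrives there; the per-chunk results are then stitched together from left to right.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy DFA states; a real XML tokenizer needs many more.
CONTENT, TAG, QUOTE = 0, 1, 2
STATES = (CONTENT, TAG, QUOTE)


def step(state, ch):
    """Apply one DFA transition for a single character."""
    if state == CONTENT:
        return TAG if ch == "<" else CONTENT
    if state == TAG:
        if ch == ">":
            return CONTENT
        return QUOTE if ch == '"' else TAG
    # state == QUOTE: only a closing quote leaves the attribute value
    return TAG if ch == '"' else QUOTE


def run(state, text):
    """Sequential scan: feed every character of text through the DFA."""
    for ch in text:
        state = step(state, ch)
    return state


def chunk_map(chunk):
    """Scan a chunk from every possible start state (start -> end state)."""
    return {s: run(s, chunk) for s in STATES}


def split(text, n):
    """Cut text into at most n contiguous chunks of near-equal size."""
    size = max(1, -(-len(text) // n))  # ceiling division
    return [text[i:i + size] for i in range(0, len(text), size)]


def parallel_end_state(text, n_chunks=4):
    """Scan chunks independently, then stitch the results left to right."""
    chunks = split(text, n_chunks)
    with ThreadPoolExecutor() as pool:
        maps = list(pool.map(chunk_map, chunks))
    state = CONTENT  # the document itself starts in CONTENT
    for m in maps:
        state = m[state]  # pick the path that matches the true start state
    return state


if __name__ == "__main__":
    doc = '<a href="x>y">text</a><b'
    # The chunked scan must agree with the plain sequential scan.
    print(parallel_end_state(doc, 3) == run(CONTENT, doc))
```

In CPython a thread pool will not speed up this CPU-bound loop; the sketch only shows how chunk results can be combined. The cost of scanning each chunk once per start state is the price paid for being able to start anywhere in the input.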