A tale of two toolkits, report the first: benchmarking time series classification algorithms for correctness and efficiency

sktime is an open-source, Python-based, sklearn-compatible toolkit for time series analysis developed by researchers at the University of East Anglia (UEA), University College London and the Alan Turing Institute. A key initial goal for sktime was to provide time series classification functionality...

Bibliographic details
Main authors:	Bagnall, Anthony ; Király, Franz ; Löning, Markus ; Middlehurst, Matthew ; Oastler, George
Format:	Article
Language:	eng
Subjects:	Computer Science - Learning ; Statistics - Machine Learning

creator	Bagnall, Anthony ; Király, Franz ; Löning, Markus ; Middlehurst, Matthew ; Oastler, George
description	sktime is an open-source, Python-based, sklearn-compatible toolkit for time series analysis developed by researchers at the University of East Anglia (UEA), University College London and the Alan Turing Institute. A key initial goal for sktime was to provide time series classification functionality equivalent to that available in a related Java package, tsml, also developed at UEA. We describe the implementation of six such classifiers in sktime and compare them to their tsml equivalents. We demonstrate correctness through equivalence of accuracy on a range of standard test problems and compare the build time of the different implementations. We find a significant difference in accuracy for only one of the six algorithms we examine (Proximity Forest). This difference is causing us some pain in debugging. We found a much wider range of differences in efficiency. Again, this was not unexpected, but it does highlight ways both toolkits could be improved.
doi_str_mv	10.48550/arxiv.1909.05738
format	Article
creationdate	2019-09-12
rights	http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa	free_for_read
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.1909.05738
language	eng
recordid	cdi_arxiv_primary_1909_05738
source	arXiv.org
subjects	Computer Science - Learning ; Statistics - Machine Learning
title	A tale of two toolkits, report the first: benchmarking time series classification algorithms for correctness and efficiency
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-16T11%3A39%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20tale%20of%20two%20toolkits,%20report%20the%20first:%20benchmarking%20time%20series%20classification%20algorithms%20for%20correctness%20and%20efficiency&rft.au=Bagnall,%20Anthony&rft.date=2019-09-12&rft_id=info:doi/10.48550/arxiv.1909.05738&rft_dat=%3Carxiv_GOX%3E1909_05738%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true