Two-level clustering approach to training data instance selection: A case study for the steel industry

Huge amounts of information from different industrial processes are now stored in databases, and companies can improve their production efficiency by mining new knowledge from this information. However, when these databases become too large, it is not efficient to process all the availa...

Bibliographic details
Main authors:	Koskimaki, H.; Juutilainen, I.; Laurinen, P.; Roning, J.
Format:	Conference proceedings
Language:	eng
Subjects:	Biological system modeling; Clustering algorithms; Data models; Distance measurement; Predictive models; Steel; Training data

container_end_page	3049
container_issue
container_start_page	3044
container_title
container_volume	10
creator	Koskimaki, H.; Juutilainen, I.; Laurinen, P.; Roning, J.
description	Huge amounts of information from different industrial processes are now stored in databases, and companies can improve their production efficiency by mining new knowledge from this information. However, when these databases become too large, it is not efficient to process all the available data with practical data mining applications. As a solution, approaches for intelligent selection of training data for model fitting have to be developed. In this article, training instances are selected to fit predictive regression models developed for optimizing steel manufacturing process settings in advance, and the selection is approached from a clustering point of view. Because basic k-means clustering was found to consume too much time and memory for the purpose, a new algorithm was developed to divide the data coarsely, after which k-means clustering could be performed. The instances were selected using the cluster structure, giving more weight to observations from scattered and separated clusters. The study shows that this approach to data set selection can even improve the prediction accuracy of the models. Only a quarter of the data, selected with our approach, was needed to achieve results comparable with a reference case, and the procedure can be easily developed for an actual industrial environment.
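The two-level idea in the description above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption, because this record does not describe the authors' actual algorithm: the coarse division is stood in for by simple grid binning, the cluster weight is taken as mean within-cluster distance plus distance to the nearest other cluster centre, and the weighted draw uses Efraimidis-Spirakis keys. The function names (`coarse_partition`, `kmeans`, `select_instances`) and parameters (`cell_size`, `fraction`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def coarse_partition(points, cell_size):
    """Level 1 (assumed stand-in): bucket points into grid cells."""
    cells = defaultdict(list)
    for p in points:
        cells[tuple(math.floor(x / cell_size) for x in p)].append(p)
    return list(cells.values())


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def kmeans(points, k, rng, iterations=20):
    """Level 2: plain Lloyd's k-means inside one coarse group."""
    k = min(k, len(points))
    centers = rng.sample(points, k)
    clusters = [points]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old centre if a cluster ended up empty.
        centers = [centroid(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


def select_instances(points, cell_size, k, fraction, seed=0):
    """Pick about `fraction` of the points, favouring scattered, separated clusters."""
    rng = random.Random(seed)
    clusters = []
    for group in coarse_partition(points, cell_size):
        clusters.extend(kmeans(group, k, rng))
    centers = [centroid(c) for c in clusters]

    keyed = []
    for i, members in enumerate(clusters):
        # Assumed weight: within-cluster scatter plus gap to nearest other centre.
        scatter = sum(math.dist(p, centers[i]) for p in members) / len(members)
        gaps = [math.dist(centers[i], c) for j, c in enumerate(centers) if j != i]
        separation = min(gaps) if gaps else 0.0
        weight = scatter + separation + 1e-9
        for p in members:
            # Efraimidis-Spirakis key: larger weights tend to get larger keys.
            keyed.append((rng.random() ** (1.0 / weight), p))

    budget = max(1, round(fraction * len(points)))
    keyed.sort(key=lambda kp: kp[0], reverse=True)
    return [p for _, p in keyed[:budget]]
```

Calling `select_instances(points, cell_size=5.0, k=3, fraction=0.25)` on a list of coordinate tuples returns a quarter of the points. Running k-means only inside each coarse group keeps every distance computation local, which is the reason the description gives for dividing the data coarsely first.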
doi_str_mv	10.1109/IJCNN.2008.4634228
format	Conference Proceeding
fullrecord	<record><control><sourceid>proquest_6IE</sourceid><recordid>TN_cdi_ieee_primary_4634228</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4634228</ieee_id><sourcerecordid>34536313</sourcerecordid><originalsourceid>FETCH-LOGICAL-g261t-f79f232965e01e087b1ce2492c11d56524a2224dfa77908308ff230e716ebf403</originalsourceid><addsrcrecordid>eNpFkEtPwzAQhM1Loi38Abj4xC3FXjsPc6sqHkVVuZRz5Drr1shNSuyA-u8JahGXXe3Mp9FoCbnhbMw5U_ez1-liMQbGirHMhAQoTsiQS5CSF8CzUzLoJ0-kZPnZv8GK8z9DKHFJhiF8MAZCKTEgdvndJB6_0FPjuxCxdfWa6t2ubbTZ0NjQ2GpX_4qVjpq6OkRdG6QBPZromvqBTqjRoVdiV-2pbVoaN78X9pmurvrQdn9FLqz2Aa-Pe0Tenx6X05dk_vY8m07myRoyHhObKwsCVJYi48iKfMUNglRgOK_SLAWpAUBWVue5YoVghe15hjnPcGUlEyNyd8jt-392GGK5dcGg97rGpgulkKnIBBc9eHsAHSKWu9Ztdbsvj18VP_cpZ58</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype><pqid>34536313</pqid></control><display><type>conference_proceeding</type><title>Two-level clustering approach to training data instance selection: A case study for the steel industry</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Koskimaki, H. ; Juutilainen, I. ; Laurinen, P. ; Roning, J.</creator><creatorcontrib>Koskimaki, H. ; Juutilainen, I. ; Laurinen, P. ; Roning, J.</creatorcontrib><description>Nowadays, huge amounts of information from different industrial processes are stored into databases and companies can improve their production efficiency by mining some new knowledge from this information. However, when these databases becomes too large, it is not efficient to process all the available data with practical data mining applications. As a solution, different approaches for intelligent selection of training data for model fitting have to be developed. 
In this article, training instances are selected to fit predictive regression models developed for optimization of the steel manufacturing process settings beforehand, and the selection is approached from a clustering point of view. Because basic k-means clustering was found to consume too much time and memory for the purpose, a new algorithm was developed to divide the data coarsely, after which k-means clustering could be performed. The instances were selected using the cluster structure by weighting more the observations from scattered and separated clusters. The study shows that by using this kind of approach to data set selection, the prediction accuracy of the models will get even better. It was noticed that only a quarter of the data, selected with our approach, could be used to achieve results comparable with a reference case, while the procedure can be easily developed for an actual industrial environment.</description><identifier>ISSN: 2161-4393</identifier><identifier>ISSN: 1522-4899</identifier><identifier>ISBN: 1424418208</identifier><identifier>ISBN: 9781424418206</identifier><identifier>ISBN: 9781424432196</identifier><identifier>ISBN: 1424432197</identifier><identifier>EISSN: 2161-4407</identifier><identifier>EISBN: 1424418216</identifier><identifier>EISBN: 9781424418213</identifier><identifier>DOI: 10.1109/IJCNN.2008.4634228</identifier><language>eng</language><publisher>IEEE</publisher><subject>Biological system modeling ; Clustering algorithms ; Data models ; Distance measurement ; Predictive models ; Steel ; Training data</subject><ispartof>2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 2008, Vol.10, 
p.3044-3049</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4634228$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,314,777,781,786,787,2052,27905,27906,54901</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4634228$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Koskimaki, H.</creatorcontrib><creatorcontrib>Juutilainen, I.</creatorcontrib><creatorcontrib>Laurinen, P.</creatorcontrib><creatorcontrib>Roning, J.</creatorcontrib><title>Two-level clustering approach to training data instance selection: A case study for the steel industry</title><title>2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence)</title><addtitle>IJCNN</addtitle><description>Nowadays, huge amounts of information from different industrial processes are stored into databases and companies can improve their production efficiency by mining some new knowledge from this information. However, when these databases becomes too large, it is not efficient to process all the available data with practical data mining applications. As a solution, different approaches for intelligent selection of training data for model fitting have to be developed. In this article, training instances are selected to fit predictive regression models developed for optimization of the steel manufacturing process settings beforehand, and the selection is approached from a clustering point of view. Because basic k-means clustering was found to consume too much time and memory for the purpose, a new algorithm was developed to divide the data coarsely, after which k-means clustering could be performed. 
The instances were selected using the cluster structure by weighting more the observations from scattered and separated clusters. The study shows that by using this kind of approach to data set selection, the prediction accuracy of the models will get even better. It was noticed that only a quarter of the data, selected with our approach, could be used to achieve results comparable with a reference case, while the procedure can be easily developed for an actual industrial environment.</description><subject>Biological system modeling</subject><subject>Clustering algorithms</subject><subject>Data models</subject><subject>Distance measurement</subject><subject>Predictive models</subject><subject>Steel</subject><subject>Training data</subject><issn>2161-4393</issn><issn>1522-4899</issn><issn>2161-4407</issn><isbn>1424418208</isbn><isbn>9781424418206</isbn><isbn>9781424432196</isbn><isbn>1424432197</isbn><isbn>1424418216</isbn><isbn>9781424418213</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2008</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNpFkEtPwzAQhM1Loi38Abj4xC3FXjsPc6sqHkVVuZRz5Drr1shNSuyA-u8JahGXXe3Mp9FoCbnhbMw5U_ez1-liMQbGirHMhAQoTsiQS5CSF8CzUzLoJ0-kZPnZv8GK8z9DKHFJhiF8MAZCKTEgdvndJB6_0FPjuxCxdfWa6t2ubbTZ0NjQ2GpX_4qVjpq6OkRdG6QBPZromvqBTqjRoVdiV-2pbVoaN78X9pmurvrQdn9FLqz2Aa-Pe0Tenx6X05dk_vY8m07myRoyHhObKwsCVJYi48iKfMUNglRgOK_SLAWpAUBWVue5YoVghe15hjnPcGUlEyNyd8jt-392GGK5dcGg97rGpgulkKnIBBc9eHsAHSKWu9Ztdbsvj18VP_cpZ58</recordid><startdate>20080101</startdate><enddate>20080101</enddate><creator>Koskimaki, H.</creator><creator>Juutilainen, I.</creator><creator>Laurinen, P.</creator><creator>Roning, 
J.</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20080101</creationdate><title>Two-level clustering approach to training data instance selection: A case study for the steel industry</title><author>Koskimaki, H. ; Juutilainen, I. ; Laurinen, P. ; Roning, J.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-g261t-f79f232965e01e087b1ce2492c11d56524a2224dfa77908308ff230e716ebf403</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Biological system modeling</topic><topic>Clustering algorithms</topic><topic>Data models</topic><topic>Distance measurement</topic><topic>Predictive models</topic><topic>Steel</topic><topic>Training data</topic><toplevel>online_resources</toplevel><creatorcontrib>Koskimaki, H.</creatorcontrib><creatorcontrib>Juutilainen, I.</creatorcontrib><creatorcontrib>Laurinen, P.</creatorcontrib><creatorcontrib>Roning, J.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems 
Abstracts Professional</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Koskimaki, H.</au><au>Juutilainen, I.</au><au>Laurinen, P.</au><au>Roning, J.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Two-level clustering approach to training data instance selection: A case study for the steel industry</atitle><btitle>2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence)</btitle><stitle>IJCNN</stitle><date>2008-01-01</date><risdate>2008</risdate><volume>10</volume><spage>3044</spage><epage>3049</epage><pages>3044-3049</pages><issn>2161-4393</issn><issn>1522-4899</issn><eissn>2161-4407</eissn><isbn>1424418208</isbn><isbn>9781424418206</isbn><isbn>9781424432196</isbn><isbn>1424432197</isbn><eisbn>1424418216</eisbn><eisbn>9781424418213</eisbn><abstract>Nowadays, huge amounts of information from different industrial processes are stored into databases and companies can improve their production efficiency by mining some new knowledge from this information. However, when these databases becomes too large, it is not efficient to process all the available data with practical data mining applications. As a solution, different approaches for intelligent selection of training data for model fitting have to be developed. In this article, training instances are selected to fit predictive regression models developed for optimization of the steel manufacturing process settings beforehand, and the selection is approached from a clustering point of view. Because basic k-means clustering was found to consume too much time and memory for the purpose, a new algorithm was developed to divide the data coarsely, after which k-means clustering could be performed. The instances were selected using the cluster structure by weighting more the observations from scattered and separated clusters. 
The study shows that by using this kind of approach to data set selection, the prediction accuracy of the models will get even better. It was noticed that only a quarter of the data, selected with our approach, could be used to achieve results comparable with a reference case, while the procedure can be easily developed for an actual industrial environment.</abstract><pub>IEEE</pub><doi>10.1109/IJCNN.2008.4634228</doi><tpages>6</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 2161-4393
ispartof	2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 2008, Vol.10, p.3044-3049
issn	2161-4393 1522-4899 2161-4407
language	eng
recordid	cdi_ieee_primary_4634228
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Biological system modeling; Clustering algorithms; Data models; Distance measurement; Predictive models; Steel; Training data
title	Two-level clustering approach to training data instance selection: A case study for the steel industry
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T15%3A41%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Two-level%20clustering%20approach%20to%20training%20data%20instance%20selection:%20A%20case%20study%20for%20the%20steel%20industry&rft.btitle=2008%20IEEE%20International%20Joint%20Conference%20on%20Neural%20Networks%20(IEEE%20World%20Congress%20on%20Computational%20Intelligence)&rft.au=Koskimaki,%20H.&rft.date=2008-01-01&rft.volume=10&rft.spage=3044&rft.epage=3049&rft.pages=3044-3049&rft.issn=2161-4393&rft.eissn=2161-4407&rft.isbn=1424418208&rft.isbn_list=9781424418206&rft.isbn_list=9781424432196&rft.isbn_list=1424432197&rft_id=info:doi/10.1109/IJCNN.2008.4634228&rft_dat=%3Cproquest_6IE%3E34536313%3C/proquest_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1424418216&rft.eisbn_list=9781424418213&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=34536313&rft_id=info:pmid/&rft_ieee_id=4634228&rfr_iscdi=true