Two-level clustering approach to training data instance selection: A case study for the steel industry

Huge amounts of information from different industrial processes are now stored in databases, and companies can improve their production efficiency by mining new knowledge from this information. However, when these databases become too large, it is not efficient to process all the availa...

Bibliographic details
Main authors: Koskimaki, H., Juutilainen, I., Laurinen, P., Roning, J.
Format: Conference proceeding
Language: eng
container_end_page 3049
container_issue
container_start_page 3044
container_title
container_volume 10
creator Koskimaki, H.
Juutilainen, I.
Laurinen, P.
Roning, J.
description Huge amounts of information from different industrial processes are now stored in databases, and companies can improve their production efficiency by mining new knowledge from this information. However, when these databases become too large, it is not efficient to process all the available data with practical data mining applications. As a solution, approaches for intelligent selection of training data for model fitting have to be developed. In this article, training instances are selected to fit predictive regression models developed for optimizing steel manufacturing process settings in advance, and the selection is approached from a clustering point of view. Because basic k-means clustering was found to consume too much time and memory for this purpose, a new algorithm was developed to divide the data coarsely, after which k-means clustering could be performed. The instances were selected using the cluster structure, giving more weight to observations from scattered and separated clusters. The study shows that this kind of approach to data set selection can even improve the prediction accuracy of the models. Only a quarter of the data, selected with our approach, was needed to achieve results comparable with a reference case, and the procedure can easily be developed further for an actual industrial environment.
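The pipeline summarized in the description (coarse division, then k-means within each part, then selection weighted toward scattered and separated clusters) could be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the record does not say how the coarse division works, so a fixed-width grid is assumed here as a stand-in, and the weighting formula, the function names (`coarse_partition`, `two_level_clusters`, `select_instances`) and the parameters (`cell_size`, `k_per_cell`, `fraction`) are all hypothetical.

```python
import math
import random
from collections import defaultdict


def coarse_partition(points, cell_size):
    # Level 1 (assumed stand-in, not the paper's algorithm): bucket point
    # indices by the grid cell each point falls into.
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(math.floor(x / cell_size) for x in p)].append(i)
    return list(cells.values())


def centroid(pts):
    # Coordinate-wise mean of a non-empty list of points.
    return tuple(sum(xs) / len(pts) for xs in zip(*pts))


def kmeans(pts, k, iters=20, seed=0):
    # Level 2: plain Lloyd's k-means on one coarse cell. Returns groups of
    # positions into pts, with empty clusters dropped.
    rng = random.Random(seed)
    centers = rng.sample(pts, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for j, p in enumerate(pts):
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(j)
        centers = [centroid([pts[j] for j in g]) if g else centers[c]
                   for c, g in enumerate(groups)]
    return [g for g in groups if g]


def two_level_clusters(points, cell_size, k_per_cell):
    # Run k-means separately inside every coarse cell and map the groups
    # back to indices into the full data set.
    clusters = []
    for cell in coarse_partition(points, cell_size):
        pts = [points[i] for i in cell]
        for g in kmeans(pts, min(k_per_cell, len(pts))):
            clusters.append([cell[j] for j in g])
    return clusters


def select_instances(points, clusters, fraction, seed=0):
    # Assumed weighting: a cluster's share of the budget grows with its
    # scatter (mean distance to its centroid) and its separation (distance
    # to the nearest other centroid), each scaled by its maximum.
    rng = random.Random(seed)
    cents = [centroid([points[i] for i in c]) for c in clusters]
    scatter = [sum(math.dist(points[i], m) for i in c) / len(c)
               for c, m in zip(clusters, cents)]
    separation = [min((math.dist(m, o) for b, o in enumerate(cents) if b != a),
                      default=0.0) for a, m in enumerate(cents)]
    max_sc = max(scatter) or 1.0
    max_sep = max(separation) or 1.0
    weights = [len(c) * (1 + s / max_sc + d / max_sep)
               for c, s, d in zip(clusters, scatter, separation)]
    budget = fraction * len(points)
    total = sum(weights)
    chosen = []
    for c, w in zip(clusters, weights):
        # Every cluster contributes at least one instance.
        n = min(len(c), max(1, round(budget * w / total)))
        chosen.extend(rng.sample(c, n))
    return sorted(chosen)


# Synthetic demo data: one dense blob and one scattered blob.
demo_rng = random.Random(1)
data = [(demo_rng.gauss(0, 0.1), demo_rng.gauss(0, 0.1)) for _ in range(300)]
data += [(demo_rng.gauss(5, 1.0), demo_rng.gauss(5, 1.0)) for _ in range(100)]
clusters = two_level_clusters(data, cell_size=2.0, k_per_cell=3)
picked = select_instances(data, clusters, fraction=0.25)
print(len(picked), "of", len(data), "instances selected")
```

Running k-means only inside each coarse cell keeps every distance computation local to a small subset, which is one plausible way to address the time and memory cost the description mentions. The selected count only approximates `fraction`, because each cluster is rounded separately and always contributes at least one instance.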
doi_str_mv 10.1109/IJCNN.2008.4634228
format Conference Proceeding
publisher IEEE
isbn 1424418208
9781424418206
9781424432196
1424432197
eisbn 1424418216
9781424418213
creationdate 2008
genre proceeding
tpages 6
fulltext fulltext_linktorsrc
identifier ISSN: 2161-4393
ispartof 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 2008, Vol.10, p.3044-3049
issn 2161-4393
1522-4899
2161-4407
language eng
recordid cdi_ieee_primary_4634228
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Biological system modeling
Clustering algorithms
Data models
Distance measurement
Predictive models
Steel
Training data
title Two-level clustering approach to training data instance selection: A case study for the steel industry
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T15%3A41%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Two-level%20clustering%20approach%20to%20training%20data%20instance%20selection:%20A%20case%20study%20for%20the%20steel%20industry&rft.btitle=2008%20IEEE%20International%20Joint%20Conference%20on%20Neural%20Networks%20(IEEE%20World%20Congress%20on%20Computational%20Intelligence)&rft.au=Koskimaki,%20H.&rft.date=2008-01-01&rft.volume=10&rft.spage=3044&rft.epage=3049&rft.pages=3044-3049&rft.issn=2161-4393&rft.eissn=2161-4407&rft.isbn=1424418208&rft.isbn_list=9781424418206&rft.isbn_list=9781424432196&rft.isbn_list=1424432197&rft_id=info:doi/10.1109/IJCNN.2008.4634228&rft_dat=%3Cproquest_6IE%3E34536313%3C/proquest_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1424418216&rft.eisbn_list=9781424418213&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=34536313&rft_id=info:pmid/&rft_ieee_id=4634228&rfr_iscdi=true