Two-level clustering approach to training data instance selection: A case study for the steel industry
Main authors: | Koskimaki, H.; Juutilainen, I.; Laurinen, P.; Roning, J. |
---|---|
Format: | Conference proceedings |
Language: | eng |
container_end_page | 3049 |
---|---|
container_issue | |
container_start_page | 3044 |
container_title | |
container_volume | 10 |
creator | Koskimaki, H.; Juutilainen, I.; Laurinen, P.; Roning, J. |
description | Nowadays, huge amounts of information from different industrial processes are stored in databases, and companies can improve their production efficiency by mining new knowledge from this information. However, when these databases become too large, it is not efficient to process all the available data with practical data mining applications. As a solution, approaches for the intelligent selection of training data for model fitting need to be developed. In this article, training instances are selected to fit predictive regression models developed for optimizing steel manufacturing process settings in advance, and the selection is approached from a clustering point of view. Because basic k-means clustering was found to consume too much time and memory for this purpose, a new algorithm was developed to divide the data coarsely, after which k-means clustering could be performed. The instances were selected using the cluster structure, giving more weight to observations from scattered and well-separated clusters. The study shows that this kind of approach to data set selection can even improve the prediction accuracy of the models. Only a quarter of the data, selected with our approach, was needed to achieve results comparable to a reference case, and the procedure can easily be implemented in an actual industrial environment. |
doi_str_mv | 10.1109/IJCNN.2008.4634228 |
format | Conference Proceeding |
isbn | 1424418208 9781424418206 9781424432196 1424432197 |
eisbn | 1424418216 9781424418213 |
publisher | IEEE |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2161-4393 |
ispartof | 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 2008, Vol.10, p.3044-3049 |
issn | 2161-4393 1522-4899 2161-4407 |
language | eng |
recordid | cdi_ieee_primary_4634228 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Biological system modeling; Clustering algorithms; Data models; Distance measurement; Predictive models; Steel; Training data |
title | Two-level clustering approach to training data instance selection: A case study for the steel industry |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T15%3A41%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Two-level%20clustering%20approach%20to%20training%20data%20instance%20selection:%20A%20case%20study%20for%20the%20steel%20industry&rft.btitle=2008%20IEEE%20International%20Joint%20Conference%20on%20Neural%20Networks%20(IEEE%20World%20Congress%20on%20Computational%20Intelligence)&rft.au=Koskimaki,%20H.&rft.date=2008-01-01&rft.volume=10&rft.spage=3044&rft.epage=3049&rft.pages=3044-3049&rft.issn=2161-4393&rft.eissn=2161-4407&rft.isbn=1424418208&rft.isbn_list=9781424418206&rft.isbn_list=9781424432196&rft.isbn_list=1424432197&rft_id=info:doi/10.1109/IJCNN.2008.4634228&rft_dat=%3Cproquest_6IE%3E34536313%3C/proquest_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1424418216&rft.eisbn_list=9781424418213&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=34536313&rft_id=info:pmid/&rft_ieee_id=4634228&rfr_iscdi=true |
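The abstract outlines a two-level scheme: a cheap coarse partition of the data, k-means within that structure, and instance selection that favours observations from scattered and well-separated clusters. It does not specify the coarse algorithm or the exact weighting, so the Python sketch below is hypothetical and is not the authors' method. It assumes one-pass leader clustering as a stand-in for the coarse step, plain Lloyd's k-means inside each coarse group, and a made-up cluster weight (mean within-cluster spread multiplied by the distance to the nearest other centroid). It then draws a quarter of the data with Efraimidis-Spirakis weighted sampling without replacement. All function names and parameters (`leader_partition`, `radius`, `k_per_group`, and so on) are illustrative.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def leader_partition(data: List[Point], radius: float) -> List[List[int]]:
    """Coarse level (STAND-IN, not the paper's algorithm): one-pass leader clustering.

    Each point joins the first leader within `radius`; otherwise it starts a new group.
    """
    leaders: List[Point] = []
    groups: List[List[int]] = []
    for i, p in enumerate(data):
        for g, lead in enumerate(leaders):
            if dist(p, lead) <= radius:
                groups[g].append(i)
                break
        else:  # no leader close enough: p becomes a new leader
            leaders.append(p)
            groups.append([i])
    return groups


def kmeans(points: List[Point], k: int, rng: random.Random, iters: int = 10) -> List[int]:
    """Fine level: plain Lloyd's k-means on one coarse group; returns a label per point."""
    k = min(k, len(points))
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = mean(members)
    return labels


def two_level_clusters(data: List[Point], radius: float, k_per_group: int,
                       rng: random.Random) -> List[List[int]]:
    """Coarse partition first, then k-means inside each coarse group."""
    clusters: List[List[int]] = []
    for group in leader_partition(data, radius):
        labels = kmeans([data[i] for i in group], k_per_group, rng)
        for c in sorted(set(labels)):
            clusters.append([group[j] for j, lab in enumerate(labels) if lab == c])
    return clusters


def select_instances(data: List[Point], clusters: List[List[int]], n: int,
                     rng: random.Random) -> List[int]:
    """Pick n indices, favouring scattered and well-separated clusters.

    HYPOTHETICAL weight: (mean distance to own centroid) * (distance to nearest other centroid).
    """
    cents = [mean([data[i] for i in cl]) for cl in clusters]
    weights: Dict[int, float] = {}
    for c, cl in enumerate(clusters):
        spread = sum(dist(data[i], cents[c]) for i in cl) / len(cl)
        others = [dist(cents[c], cents[o]) for o in range(len(cents)) if o != c]
        separation = min(others) if others else 1.0
        for i in cl:
            weights[i] = (spread + 1e-3) * (separation + 1e-3)
    # Efraimidis-Spirakis weighted sampling without replacement: key = u ** (1 / w).
    keys = {i: rng.random() ** (1.0 / w) for i, w in weights.items()}
    return sorted(keys, key=keys.get, reverse=True)[:n]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic 2-D data: a tight blob plus a looser, distant blob.
    data = [(rng.gauss(0, 0.2), rng.gauss(0, 0.2)) for _ in range(300)]
    data += [(rng.gauss(5, 1.0), rng.gauss(5, 1.0)) for _ in range(100)]
    clusters = two_level_clusters(data, radius=2.0, k_per_group=3, rng=rng)
    chosen = select_instances(data, clusters, n=len(data) // 4, rng=rng)  # a quarter
    print(len(clusters), "clusters,", len(chosen), "instances selected")
```

The coarse pass touches each point once per existing leader, so k-means only ever runs on the smaller groups it produces. This is the kind of time and memory saving the abstract motivates. The `radius` and `k_per_group` values above are arbitrary and would need tuning on real process data.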