Combining Supervised and Unsupervised Machine Learning Methods for Phenotypic Functional Genomics Screening
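Published in: | SLAS discovery, 2020-07, Vol.25 (6), p.655-664, Article 2472555220919345 |
---|---|
Authors: | Omta, Wienand A. ; van Heesbeen, Roy G. ; Shen, Ian ; de Nobel, Jacob ; Robers, Desmond ; van der Velden, Lieke M. ; Medema, René H. ; Siebes, Arno P.J.M. ; Feelders, Ad J. ; Brinkkemper, Sjaak ; Klumperman, Judith S. ; Spruit, Marco René ; Brinkhuis, Matthieu J.S. ; Egan, David A. |
Format: | Article |
Language: | eng |
Keywords: | artificial intelligence ; Biochemical Research Methods ; Biochemistry & Molecular Biology ; Biotechnology & Applied Microbiology ; Chemistry ; Chemistry, Analytical ; classification ; Life Sciences & Biomedicine ; phenotypic profiles ; Physical Sciences ; Science & Technology ; supervised machine learning |
Online access: | Full text |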
container_end_page | 664 |
---|---|
container_issue | 6 |
container_start_page | 655 |
container_title | SLAS discovery |
container_volume | 25 |
creator | Omta, Wienand A. ; van Heesbeen, Roy G. ; Shen, Ian ; de Nobel, Jacob ; Robers, Desmond ; van der Velden, Lieke M. ; Medema, René H. ; Siebes, Arno P.J.M. ; Feelders, Ad J. ; Brinkkemper, Sjaak ; Klumperman, Judith S. ; Spruit, Marco René ; Brinkhuis, Matthieu J.S. ; Egan, David A. |
description | There has been an increase in the use of machine learning and artificial intelligence (AI) for the analysis of image-based cellular screens. The accuracy of these analyses, however, is greatly dependent on the quality of the training sets used for building the machine learning models. We propose that unsupervised exploratory methods should first be applied to the data set to gain a better insight into the quality of the data. This improves the selection and labeling of data for creating training sets before the application of machine learning. We demonstrate this using a high-content genome-wide small interfering RNA screen. We perform an unsupervised exploratory data analysis to facilitate the identification of four robust phenotypes, which we subsequently use as a training set for building a high-quality random forest machine learning model to differentiate four phenotypes with an accuracy of 91.1% and a kappa of 0.85. Our approach enhanced our ability to extract new knowledge from the screen when compared with the use of unsupervised methods alone. |
doi_str_mv | 10.1177/2472555220919345 |
format | Article |
fullrecord | <record><control><sourceid>proquest_webof</sourceid><recordid>TN_cdi_webofscience_primary_000534051500001</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sage_id>10.1177_2472555220919345</sage_id><els_id>S2472555222065741</els_id><sourcerecordid>2402422436</sourcerecordid><originalsourceid>FETCH-LOGICAL-c430t-74d47c42d972d86b39faef04cc5180150c9f12a5d67b5a6ec558a60a8962a3a33</originalsourceid><addsrcrecordid>eNqNkV1rFDEUhoNYbNn23ivJpSBj8zkf3slgq7DFQu11yCRnuqk7yZrMVPrvzTjbFQSl5CLh8LxJznMQek3Je0qr6pyJikkpGSMNbbiQL9DJXCqkLMnLw1myY3SW0j0hhFYlz-sVOuZMEMJKdoK-t2HonHf-Dt9MO4gPLoHF2lt869OfwpU2G-cBr0HH3_AVjJtgE-5DxNcb8GF83DmDLyZvRhe83uLLXBycSfjGRIA5dIqOer1NcLbfV-j24tO39nOx_nr5pf24LozgZCwqYUVlBLNNxWxddrzpNfREGCNpTagkpukp09KWVSd1CUbKWpdE103JNNecr9Db5d5dDD8mSKMaXDKw3WoPYUoqN88EYyLLWCGyoCaGlCL0ahfdoOOjokTNltXflnPkzf72qRvAHgJPTjPwbgF-Qhf6ZBx4Awcsz0FyQWTuYx5Jpuvn060b9ay3DZMfc7RYoknfgboPU8ze0_9-_mHhIct_cBDV_jnrIphR2eD-Hf4FIMm3lA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2402422436</pqid></control><display><type>article</type><title>Combining Supervised and Unsupervised Machine Learning Methods for Phenotypic Functional Genomics Screening</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Web of Science - Science Citation Index Expanded - 2020</source><source>Alma/SFX Local Collection</source><creator>Omta, Wienand A. ; van Heesbeen, Roy G. ; Shen, Ian ; de Nobel, Jacob ; Robers, Desmond ; van der Velden, Lieke M. ; Medema, René H. ; Siebes, Arno P.J.M. ; Feelders, Ad J. ; Brinkkemper, Sjaak ; Klumperman, Judith S. ; Spruit, Marco René ; Brinkhuis, Matthieu J.S. ; Egan, David A.</creator><creatorcontrib>Omta, Wienand A. ; van Heesbeen, Roy G. ; Shen, Ian ; de Nobel, Jacob ; Robers, Desmond ; van der Velden, Lieke M. ; Medema, René H. ; Siebes, Arno P.J.M. 
; Feelders, Ad J. ; Brinkkemper, Sjaak ; Klumperman, Judith S. ; Spruit, Marco René ; Brinkhuis, Matthieu J.S. ; Egan, David A.</creatorcontrib><description>There has been an increase in the use of machine learning and artificial intelligence (AI) for the analysis of image-based cellular screens. The accuracy of these analyses, however, is greatly dependent on the quality of the training sets used for building the machine learning models. We propose that unsupervised exploratory methods should first be applied to the data set to gain a better insight into the quality of the data. This improves the selection and labeling of data for creating training sets before the application of machine learning. We demonstrate this using a high-content genome-wide small interfering RNA screen. We perform an unsupervised exploratory data analysis to facilitate the identification of four robust phenotypes, which we subsequently use as a training set for building a high-quality random forest machine learning model to differentiate four phenotypes with an accuracy of 91.1% and a kappa of 0.85. 
Our approach enhanced our ability to extract new knowledge from the screen when compared with the use of unsupervised methods alone.</description><identifier>ISSN: 2472-5552</identifier><identifier>EISSN: 2472-5560</identifier><identifier>DOI: 10.1177/2472555220919345</identifier><identifier>PMID: 32400262</identifier><language>eng</language><publisher>Los Angeles, CA: Elsevier Inc</publisher><subject>artificial intelligence ; Biochemical Research Methods ; Biochemistry & Molecular Biology ; Biotechnology & Applied Microbiology ; Chemistry ; Chemistry, Analytical ; classification ; Life Sciences & Biomedicine ; phenotypic profiles ; Physical Sciences ; Science & Technology ; supervised machine learning</subject><ispartof>SLAS discovery, 2020-07, Vol.25 (6), p.655-664, Article 2472555220919345</ispartof><rights>2020 Society for Laboratory Automation and Screening</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>true</woscitedreferencessubscribed><woscitedreferencescount>9</woscitedreferencescount><woscitedreferencesoriginalsourcerecordid>wos000534051500001</woscitedreferencesoriginalsourcerecordid><citedby>FETCH-LOGICAL-c430t-74d47c42d972d86b39faef04cc5180150c9f12a5d67b5a6ec558a60a8962a3a33</citedby><cites>FETCH-LOGICAL-c430t-74d47c42d972d86b39faef04cc5180150c9f12a5d67b5a6ec558a60a8962a3a33</cites><orcidid>0000-0003-1054-6683 ; 0000-0002-9237-221X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>315,782,786,27931,27932,28255</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/32400262$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Omta, Wienand A.</creatorcontrib><creatorcontrib>van Heesbeen, Roy G.</creatorcontrib><creatorcontrib>Shen, Ian</creatorcontrib><creatorcontrib>de Nobel, Jacob</creatorcontrib><creatorcontrib>Robers, 
Desmond</creatorcontrib><creatorcontrib>van der Velden, Lieke M.</creatorcontrib><creatorcontrib>Medema, René H.</creatorcontrib><creatorcontrib>Siebes, Arno P.J.M.</creatorcontrib><creatorcontrib>Feelders, Ad J.</creatorcontrib><creatorcontrib>Brinkkemper, Sjaak</creatorcontrib><creatorcontrib>Klumperman, Judith S.</creatorcontrib><creatorcontrib>Spruit, Marco René</creatorcontrib><creatorcontrib>Brinkhuis, Matthieu J.S.</creatorcontrib><creatorcontrib>Egan, David A.</creatorcontrib><title>Combining Supervised and Unsupervised Machine Learning Methods for Phenotypic Functional Genomics Screening</title><title>SLAS discovery</title><addtitle>SLAS DISCOV</addtitle><addtitle>J Biomol Screen</addtitle><description>There has been an increase in the use of machine learning and artificial intelligence (AI) for the analysis of image-based cellular screens. The accuracy of these analyses, however, is greatly dependent on the quality of the training sets used for building the machine learning models. We propose that unsupervised exploratory methods should first be applied to the data set to gain a better insight into the quality of the data. This improves the selection and labeling of data for creating training sets before the application of machine learning. We demonstrate this using a high-content genome-wide small interfering RNA screen. We perform an unsupervised exploratory data analysis to facilitate the identification of four robust phenotypes, which we subsequently use as a training set for building a high-quality random forest machine learning model to differentiate four phenotypes with an accuracy of 91.1% and a kappa of 0.85. 
Our approach enhanced our ability to extract new knowledge from the screen when compared with the use of unsupervised methods alone.</description><subject>artificial intelligence</subject><subject>Biochemical Research Methods</subject><subject>Biochemistry & Molecular Biology</subject><subject>Biotechnology & Applied Microbiology</subject><subject>Chemistry</subject><subject>Chemistry, Analytical</subject><subject>classification</subject><subject>Life Sciences & Biomedicine</subject><subject>phenotypic profiles</subject><subject>Physical Sciences</subject><subject>Science & Technology</subject><subject>supervised machine learning</subject><issn>2472-5552</issn><issn>2472-5560</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>AOWDO</sourceid><recordid>eNqNkV1rFDEUhoNYbNn23ivJpSBj8zkf3slgq7DFQu11yCRnuqk7yZrMVPrvzTjbFQSl5CLh8LxJznMQek3Je0qr6pyJikkpGSMNbbiQL9DJXCqkLMnLw1myY3SW0j0hhFYlz-sVOuZMEMJKdoK-t2HonHf-Dt9MO4gPLoHF2lt869OfwpU2G-cBr0HH3_AVjJtgE-5DxNcb8GF83DmDLyZvRhe83uLLXBycSfjGRIA5dIqOer1NcLbfV-j24tO39nOx_nr5pf24LozgZCwqYUVlBLNNxWxddrzpNfREGCNpTagkpukp09KWVSd1CUbKWpdE103JNNecr9Db5d5dDD8mSKMaXDKw3WoPYUoqN88EYyLLWCGyoCaGlCL0ahfdoOOjokTNltXflnPkzf72qRvAHgJPTjPwbgF-Qhf6ZBx4Awcsz0FyQWTuYx5Jpuvn060b9ay3DZMfc7RYoknfgboPU8ze0_9-_mHhIct_cBDV_jnrIphR2eD-Hf4FIMm3lA</recordid><startdate>20200701</startdate><enddate>20200701</enddate><creator>Omta, Wienand A.</creator><creator>van Heesbeen, Roy G.</creator><creator>Shen, Ian</creator><creator>de Nobel, Jacob</creator><creator>Robers, Desmond</creator><creator>van der Velden, Lieke M.</creator><creator>Medema, René H.</creator><creator>Siebes, Arno P.J.M.</creator><creator>Feelders, Ad J.</creator><creator>Brinkkemper, Sjaak</creator><creator>Klumperman, Judith S.</creator><creator>Spruit, Marco René</creator><creator>Brinkhuis, Matthieu J.S.</creator><creator>Egan, David A.</creator><general>Elsevier Inc</general><general>SAGE 
Publications</general><general>Elsevier</general><scope>6I.</scope><scope>AAFTH</scope><scope>AOWDO</scope><scope>BLEPL</scope><scope>DTL</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-1054-6683</orcidid><orcidid>https://orcid.org/0000-0002-9237-221X</orcidid></search><sort><creationdate>20200701</creationdate><title>Combining Supervised and Unsupervised Machine Learning Methods for Phenotypic Functional Genomics Screening</title><author>Omta, Wienand A. ; van Heesbeen, Roy G. ; Shen, Ian ; de Nobel, Jacob ; Robers, Desmond ; van der Velden, Lieke M. ; Medema, René H. ; Siebes, Arno P.J.M. ; Feelders, Ad J. ; Brinkkemper, Sjaak ; Klumperman, Judith S. ; Spruit, Marco René ; Brinkhuis, Matthieu J.S. ; Egan, David A.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c430t-74d47c42d972d86b39faef04cc5180150c9f12a5d67b5a6ec558a60a8962a3a33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>artificial intelligence</topic><topic>Biochemical Research Methods</topic><topic>Biochemistry & Molecular Biology</topic><topic>Biotechnology & Applied Microbiology</topic><topic>Chemistry</topic><topic>Chemistry, Analytical</topic><topic>classification</topic><topic>Life Sciences & Biomedicine</topic><topic>phenotypic profiles</topic><topic>Physical Sciences</topic><topic>Science & Technology</topic><topic>supervised machine learning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Omta, Wienand A.</creatorcontrib><creatorcontrib>van Heesbeen, Roy G.</creatorcontrib><creatorcontrib>Shen, Ian</creatorcontrib><creatorcontrib>de Nobel, Jacob</creatorcontrib><creatorcontrib>Robers, Desmond</creatorcontrib><creatorcontrib>van der Velden, Lieke M.</creatorcontrib><creatorcontrib>Medema, René H.</creatorcontrib><creatorcontrib>Siebes, Arno 
P.J.M.</creatorcontrib><creatorcontrib>Feelders, Ad J.</creatorcontrib><creatorcontrib>Brinkkemper, Sjaak</creatorcontrib><creatorcontrib>Klumperman, Judith S.</creatorcontrib><creatorcontrib>Spruit, Marco René</creatorcontrib><creatorcontrib>Brinkhuis, Matthieu J.S.</creatorcontrib><creatorcontrib>Egan, David A.</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>Web of Science - Science Citation Index Expanded - 2020</collection><collection>Web of Science Core Collection</collection><collection>Science Citation Index Expanded</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>SLAS discovery</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Omta, Wienand A.</au><au>van Heesbeen, Roy G.</au><au>Shen, Ian</au><au>de Nobel, Jacob</au><au>Robers, Desmond</au><au>van der Velden, Lieke M.</au><au>Medema, René H.</au><au>Siebes, Arno P.J.M.</au><au>Feelders, Ad J.</au><au>Brinkkemper, Sjaak</au><au>Klumperman, Judith S.</au><au>Spruit, Marco René</au><au>Brinkhuis, Matthieu J.S.</au><au>Egan, David A.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Combining Supervised and Unsupervised Machine Learning Methods for Phenotypic Functional Genomics Screening</atitle><jtitle>SLAS discovery</jtitle><stitle>SLAS DISCOV</stitle><addtitle>J Biomol Screen</addtitle><date>2020-07-01</date><risdate>2020</risdate><volume>25</volume><issue>6</issue><spage>655</spage><epage>664</epage><pages>655-664</pages><artnum>2472555220919345</artnum><issn>2472-5552</issn><eissn>2472-5560</eissn><abstract>There has been an increase in the use of machine learning and artificial intelligence (AI) for the analysis of image-based cellular screens. 
The accuracy of these analyses, however, is greatly dependent on the quality of the training sets used for building the machine learning models. We propose that unsupervised exploratory methods should first be applied to the data set to gain a better insight into the quality of the data. This improves the selection and labeling of data for creating training sets before the application of machine learning. We demonstrate this using a high-content genome-wide small interfering RNA screen. We perform an unsupervised exploratory data analysis to facilitate the identification of four robust phenotypes, which we subsequently use as a training set for building a high-quality random forest machine learning model to differentiate four phenotypes with an accuracy of 91.1% and a kappa of 0.85. Our approach enhanced our ability to extract new knowledge from the screen when compared with the use of unsupervised methods alone.</abstract><cop>Los Angeles, CA</cop><pub>Elsevier Inc</pub><pmid>32400262</pmid><doi>10.1177/2472555220919345</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0003-1054-6683</orcidid><orcidid>https://orcid.org/0000-0002-9237-221X</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2472-5552 |
ispartof | SLAS discovery, 2020-07, Vol.25 (6), p.655-664, Article 2472555220919345 |
issn | 2472-5552 (ISSN) ; 2472-5560 (EISSN) |
language | eng |
recordid | cdi_webofscience_primary_000534051500001 |
source | Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Web of Science - Science Citation Index Expanded - 2020; Alma/SFX Local Collection |
subjects | artificial intelligence ; Biochemical Research Methods ; Biochemistry & Molecular Biology ; Biotechnology & Applied Microbiology ; Chemistry ; Chemistry, Analytical ; classification ; Life Sciences & Biomedicine ; phenotypic profiles ; Physical Sciences ; Science & Technology ; supervised machine learning |
title | Combining Supervised and Unsupervised Machine Learning Methods for Phenotypic Functional Genomics Screening |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-04T02%3A34%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_webof&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Combining%20Supervised%20and%20Unsupervised%20Machine%20Learning%20Methods%20for%20Phenotypic%20Functional%20Genomics%20Screening&rft.jtitle=SLAS%20discovery&rft.au=Omta,%20Wienand%20A.&rft.date=2020-07-01&rft.volume=25&rft.issue=6&rft.spage=655&rft.epage=664&rft.pages=655-664&rft.artnum=2472555220919345&rft.issn=2472-5552&rft.eissn=2472-5560&rft_id=info:doi/10.1177/2472555220919345&rft_dat=%3Cproquest_webof%3E2402422436%3C/proquest_webof%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2402422436&rft_id=info:pmid/32400262&rft_sage_id=10.1177_2472555220919345&rft_els_id=S2472555222065741&rfr_iscdi=true |
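The description summarizes the random forest model's performance as an accuracy of 91.1% and a kappa of 0.85. Cohen's kappa rescales the observed agreement p_o (for a classifier, its accuracy) by the agreement p_e expected by chance from the row and column totals of the confusion matrix: kappa = (p_o - p_e) / (1 - p_e). The minimal Python sketch below computes both values from a confusion matrix; the function name `accuracy_and_kappa` and the example counts are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence, Tuple


def accuracy_and_kappa(confusion: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    Hypothetical helper for illustration. Rows are true classes and
    columns are predicted classes.
    """
    k = len(confusion)
    if k == 0 or any(len(row) != k for row in confusion):
        raise ValueError("confusion matrix must be square and non-empty")
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")

    # Observed agreement p_o: share of samples on the diagonal (= accuracy).
    p_o = sum(confusion[i][i] for i in range(k)) / total

    # Chance agreement p_e: sum over classes of (row total / N) * (column total / N).
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa


if __name__ == "__main__":
    # Hypothetical counts for four phenotype classes (NOT data from the paper).
    example = [
        [85, 5, 5, 5],
        [5, 85, 5, 5],
        [5, 5, 85, 5],
        [5, 5, 5, 85],
    ]
    acc, kappa = accuracy_and_kappa(example)
    print(f"accuracy={acc:.3f}, kappa={kappa:.3f}")  # accuracy=0.850, kappa=0.800
```

Rearranging the formula gives p_e = (p_o - kappa) / (1 - kappa). Assuming both reported figures come from the same confusion matrix, p_o = 0.911 and kappa = 0.85 imply a chance agreement of roughly 0.4 (0.061 / 0.15 ≈ 0.41, with some slack because kappa is rounded), above the 0.25 that four equally frequent true classes would give.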