Nanomaterial Synthesis Insights from Machine Learning of Scientific Articles by Extracting, Structuring, and Visualizing Knowledge

Nanomaterials of varying compositions and morphologies are of interest for many applications from catalysis to optics, but the synthesis of nanomaterials and their scale-up are most often time-consuming and Edisonian processes. Information gleaned from the scientific literature can help inform and a...

Bibliographic Details
Published in: Journal of chemical information and modeling 2020-06, Vol.60 (6), p.2876-2887
Main Authors: Hiszpanski, Anna M, Gallagher, Brian, Chellappan, Karthik, Li, Peggy, Liu, Shusen, Kim, Hyojin, Han, Jinkyu, Kailkhura, Bhavya, Buttler, David J, Han, Thomas Yong-Jin
Format: Article
Language: eng
Subjects:
Online Access: Full text
container_end_page 2887
container_issue 6
container_start_page 2876
container_title Journal of chemical information and modeling
container_volume 60
creator Hiszpanski, Anna M
Gallagher, Brian
Chellappan, Karthik
Li, Peggy
Liu, Shusen
Kim, Hyojin
Han, Jinkyu
Kailkhura, Bhavya
Buttler, David J
Han, Thomas Yong-Jin
description Nanomaterials of varying compositions and morphologies are of interest for many applications from catalysis to optics, but the synthesis of nanomaterials and their scale-up are most often time-consuming and Edisonian processes. Information gleaned from the scientific literature can help inform and accelerate nanomaterials development, but again, searching the literature and digesting the information are time-consuming manual processes for researchers. To help address these challenges, we developed scientific article-processing tools that extract and structure information from the text and figures of nanomaterials articles, thereby enabling the creation of a personalized knowledgebase for nanomaterials synthesis that can be mined to help inform further nanomaterials development. Starting with a corpus of ∼35k nanomaterials-related articles, we developed models to classify articles according to the nanomaterial composition and morphology, extract synthesis protocols from within the articles' text, and extract, normalize, and categorize chemical terms within synthesis protocols. We demonstrate the efficiency of the proposed pipeline on an expert-labeled set of nanomaterials synthesis articles, achieving 100% accuracy on composition prediction, 95% accuracy on morphology prediction, 0.99 AUC on protocol identification, and up to a 0.87 F1-score on chemical entity recognition. In addition to processing articles' text, microscopy images of nanomaterials within the articles are also automatically identified and analyzed to determine the nanomaterials' morphologies and size distributions. To enable users to easily explore the database, we developed a complementary browser-based visualization tool that provides flexibility in comparing across subsets of articles of interest. 
We use these tools and information to identify trends in nanomaterials synthesis, such as the correlation of certain reagents with various nanomaterial morphologies, which is useful in guiding hypotheses and reducing the potential parameter space during experimental design.
doi_str_mv 10.1021/acs.jcim.0c00199
format Article
fullrecord <record><control><sourceid>proquest_osti_</sourceid><recordid>TN_cdi_osti_scitechconnect_1669214</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2390147306</sourcerecordid><originalsourceid>FETCH-LOGICAL-c462t-a078bc421a14727cd661d2193a67864722abdf81a6be8677ea03287bbbdc7d83</originalsourceid><addsrcrecordid>eNpdkU1vEzEQhi1ERUvhzglZcOmBBH9s_XGsqpZWDXBIhbhZ3llv4mjXLrZXEI788jok5cDJY-uZd-R5EHpDyZwSRj9ayPMN-HFOgBCq9TN0Qs8bPdOCfH_-VJ9rcYxe5rwhhHMt2At0zBlTQlF1gv58sSGOtrjk7YCX21DWLvuMb0P2q3XJuE9xxJ8trH1weOFsCj6scOzxErwLxfce8EUqHgaXcbvFV79KslAq9AEvS5qgTOnvxYYOf_N5soP_vYu4C_Hn4LqVe4WOejtk9_pwnqL766v7y5vZ4uun28uLxQwawcrMEqlaaBi1tJFMQicE7RjV3AqpRH1itu16Ra1onRJSOks4U7Jt2w5kp_gperePjbl4k8EXB2uIITgohgqhGW0qdLaHHlL8MblczOgzuGGwwcUpG8Y1qeM5ERV9_x-6iVMK9QeGNVRoLeuuK0X2FKSYc3K9eUh-tGlrKDE7h6Y6NDuH5uCwtrw9BE_t6Lp_DU_S-CM4GpqO</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2416997962</pqid></control><display><type>article</type><title>Nanomaterial Synthesis Insights from Machine Learning of Scientific Articles by Extracting, Structuring, and Visualizing Knowledge</title><source>American Chemical Society Journals</source><creator>Hiszpanski, Anna M ; Gallagher, Brian ; Chellappan, Karthik ; Li, Peggy ; Liu, Shusen ; Kim, Hyojin ; Han, Jinkyu ; Kailkhura, Bhavya ; Buttler, David J ; Han, Thomas Yong-Jin</creator><creatorcontrib>Hiszpanski, Anna M ; Gallagher, Brian ; Chellappan, Karthik ; Li, Peggy ; Liu, Shusen ; Kim, Hyojin ; Han, Jinkyu ; Kailkhura, Bhavya ; Buttler, David J ; Han, Thomas Yong-Jin ; Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)</creatorcontrib><description>Nanomaterials of varying compositions and morphologies are of interest for many applications from catalysis to optics, but the synthesis of nanomaterials and their scale-up are most often time-consuming and Edisonian processes. 
Information gleaned from the scientific literature can help inform and accelerate nanomaterials development, but again, searching the literature and digesting the information are time-consuming manual processes for researchers. To help address these challenges, we developed scientific article-processing tools that extract and structure information from the text and figures of nanomaterials articles, thereby enabling the creation of a personalized knowledgebase for nanomaterials synthesis that can be mined to help inform further nanomaterials development. Starting with a corpus of ∼35k nanomaterials-related articles, we developed models to classify articles according to the nanomaterial composition and morphology, extract synthesis protocols from within the articles' text, and extract, normalize, and categorize chemical terms within synthesis protocols. We demonstrate the efficiency of the proposed pipeline on an expert-labeled set of nanomaterials synthesis articles, achieving 100% accuracy on composition prediction, 95% accuracy on morphology prediction, 0.99 AUC on protocol identification, and up to a 0.87 F1-score on chemical entity recognition. In addition to processing articles' text, microscopy images of nanomaterials within the articles are also automatically identified and analyzed to determine the nanomaterials' morphologies and size distributions. To enable users to easily explore the database, we developed a complementary browser-based visualization tool that provides flexibility in comparing across subsets of articles of interest. 
We use these tools and information to identify trends in nanomaterials synthesis, such as the correlation of certain reagents with various nanomaterial morphologies, which is useful in guiding hypotheses and reducing the potential parameter space during experimental design.</description><identifier>ISSN: 1549-9596</identifier><identifier>EISSN: 1549-960X</identifier><identifier>DOI: 10.1021/acs.jcim.0c00199</identifier><identifier>PMID: 32286818</identifier><language>eng</language><publisher>United States: American Chemical Society</publisher><subject>biological databases ; Chemical synthesis ; Composition ; Design of experiments ; Design parameters ; gold ; Knowledge bases (artificial intelligence) ; Machine learning ; MATERIALS SCIENCE ; Morphology ; nanocubes ; Nanomaterials ; Object recognition ; Reagents ; Scientific papers</subject><ispartof>Journal of chemical information and modeling, 2020-06, Vol.60 (6), p.2876-2887</ispartof><rights>Copyright American Chemical Society Jun 22, 2020</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c462t-a078bc421a14727cd661d2193a67864722abdf81a6be8677ea03287bbbdc7d83</citedby><cites>FETCH-LOGICAL-c462t-a078bc421a14727cd661d2193a67864722abdf81a6be8677ea03287bbbdc7d83</cites><orcidid>0000-0002-6374-116X ; 0000-0002-3000-2782 ; 0000000230002782 ; 000000026374116X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,885,2765,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/32286818$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://www.osti.gov/servlets/purl/1669214$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Hiszpanski, Anna M</creatorcontrib><creatorcontrib>Gallagher, 
Brian</creatorcontrib><creatorcontrib>Chellappan, Karthik</creatorcontrib><creatorcontrib>Li, Peggy</creatorcontrib><creatorcontrib>Liu, Shusen</creatorcontrib><creatorcontrib>Kim, Hyojin</creatorcontrib><creatorcontrib>Han, Jinkyu</creatorcontrib><creatorcontrib>Kailkhura, Bhavya</creatorcontrib><creatorcontrib>Buttler, David J</creatorcontrib><creatorcontrib>Han, Thomas Yong-Jin</creatorcontrib><creatorcontrib>Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)</creatorcontrib><title>Nanomaterial Synthesis Insights from Machine Learning of Scientific Articles by Extracting, Structuring, and Visualizing Knowledge</title><title>Journal of chemical information and modeling</title><addtitle>J Chem Inf Model</addtitle><description>Nanomaterials of varying compositions and morphologies are of interest for many applications from catalysis to optics, but the synthesis of nanomaterials and their scale-up are most often time-consuming and Edisonian processes. Information gleaned from the scientific literature can help inform and accelerate nanomaterials development, but again, searching the literature and digesting the information are time-consuming manual processes for researchers. To help address these challenges, we developed scientific article-processing tools that extract and structure information from the text and figures of nanomaterials articles, thereby enabling the creation of a personalized knowledgebase for nanomaterials synthesis that can be mined to help inform further nanomaterials development. Starting with a corpus of ∼35k nanomaterials-related articles, we developed models to classify articles according to the nanomaterial composition and morphology, extract synthesis protocols from within the articles' text, and extract, normalize, and categorize chemical terms within synthesis protocols. 
We demonstrate the efficiency of the proposed pipeline on an expert-labeled set of nanomaterials synthesis articles, achieving 100% accuracy on composition prediction, 95% accuracy on morphology prediction, 0.99 AUC on protocol identification, and up to a 0.87 F1-score on chemical entity recognition. In addition to processing articles' text, microscopy images of nanomaterials within the articles are also automatically identified and analyzed to determine the nanomaterials' morphologies and size distributions. To enable users to easily explore the database, we developed a complementary browser-based visualization tool that provides flexibility in comparing across subsets of articles of interest. We use these tools and information to identify trends in nanomaterials synthesis, such as the correlation of certain reagents with various nanomaterial morphologies, which is useful in guiding hypotheses and reducing the potential parameter space during experimental design.</description><subject>biological databases</subject><subject>Chemical synthesis</subject><subject>Composition</subject><subject>Design of experiments</subject><subject>Design parameters</subject><subject>gold</subject><subject>Knowledge bases (artificial intelligence)</subject><subject>Machine learning</subject><subject>MATERIALS SCIENCE</subject><subject>Morphology</subject><subject>nanocubes</subject><subject>Nanomaterials</subject><subject>Object recognition</subject><subject>Reagents</subject><subject>Scientific 
papers</subject><issn>1549-9596</issn><issn>1549-960X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNpdkU1vEzEQhi1ERUvhzglZcOmBBH9s_XGsqpZWDXBIhbhZ3llv4mjXLrZXEI788jok5cDJY-uZd-R5EHpDyZwSRj9ayPMN-HFOgBCq9TN0Qs8bPdOCfH_-VJ9rcYxe5rwhhHMt2At0zBlTQlF1gv58sSGOtrjk7YCX21DWLvuMb0P2q3XJuE9xxJ8trH1weOFsCj6scOzxErwLxfce8EUqHgaXcbvFV79KslAq9AEvS5qgTOnvxYYOf_N5soP_vYu4C_Hn4LqVe4WOejtk9_pwnqL766v7y5vZ4uun28uLxQwawcrMEqlaaBi1tJFMQicE7RjV3AqpRH1itu16Ra1onRJSOks4U7Jt2w5kp_gperePjbl4k8EXB2uIITgohgqhGW0qdLaHHlL8MblczOgzuGGwwcUpG8Y1qeM5ERV9_x-6iVMK9QeGNVRoLeuuK0X2FKSYc3K9eUh-tGlrKDE7h6Y6NDuH5uCwtrw9BE_t6Lp_DU_S-CM4GpqO</recordid><startdate>20200622</startdate><enddate>20200622</enddate><creator>Hiszpanski, Anna M</creator><creator>Gallagher, Brian</creator><creator>Chellappan, Karthik</creator><creator>Li, Peggy</creator><creator>Liu, Shusen</creator><creator>Kim, Hyojin</creator><creator>Han, Jinkyu</creator><creator>Kailkhura, Bhavya</creator><creator>Buttler, David J</creator><creator>Han, Thomas Yong-Jin</creator><general>American Chemical Society</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><scope>OIOZB</scope><scope>OTOTI</scope><orcidid>https://orcid.org/0000-0002-6374-116X</orcidid><orcidid>https://orcid.org/0000-0002-3000-2782</orcidid><orcidid>https://orcid.org/0000000230002782</orcidid><orcidid>https://orcid.org/000000026374116X</orcidid></search><sort><creationdate>20200622</creationdate><title>Nanomaterial Synthesis Insights from Machine Learning of Scientific Articles by Extracting, Structuring, and Visualizing Knowledge</title><author>Hiszpanski, Anna M ; Gallagher, Brian ; Chellappan, Karthik ; Li, Peggy ; Liu, Shusen ; Kim, Hyojin ; Han, Jinkyu ; Kailkhura, 
Bhavya ; Buttler, David J ; Han, Thomas Yong-Jin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c462t-a078bc421a14727cd661d2193a67864722abdf81a6be8677ea03287bbbdc7d83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>biological databases</topic><topic>Chemical synthesis</topic><topic>Composition</topic><topic>Design of experiments</topic><topic>Design parameters</topic><topic>gold</topic><topic>Knowledge bases (artificial intelligence)</topic><topic>Machine learning</topic><topic>MATERIALS SCIENCE</topic><topic>Morphology</topic><topic>nanocubes</topic><topic>Nanomaterials</topic><topic>Object recognition</topic><topic>Reagents</topic><topic>Scientific papers</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Hiszpanski, Anna M</creatorcontrib><creatorcontrib>Gallagher, Brian</creatorcontrib><creatorcontrib>Chellappan, Karthik</creatorcontrib><creatorcontrib>Li, Peggy</creatorcontrib><creatorcontrib>Liu, Shusen</creatorcontrib><creatorcontrib>Kim, Hyojin</creatorcontrib><creatorcontrib>Han, Jinkyu</creatorcontrib><creatorcontrib>Kailkhura, Bhavya</creatorcontrib><creatorcontrib>Buttler, David J</creatorcontrib><creatorcontrib>Han, Thomas Yong-Jin</creatorcontrib><creatorcontrib>Lawrence Livermore National Lab. 
(LLNL), Livermore, CA (United States)</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><collection>OSTI.GOV - Hybrid</collection><collection>OSTI.GOV</collection><jtitle>Journal of chemical information and modeling</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hiszpanski, Anna M</au><au>Gallagher, Brian</au><au>Chellappan, Karthik</au><au>Li, Peggy</au><au>Liu, Shusen</au><au>Kim, Hyojin</au><au>Han, Jinkyu</au><au>Kailkhura, Bhavya</au><au>Buttler, David J</au><au>Han, Thomas Yong-Jin</au><aucorp>Lawrence Livermore National Lab. 
(LLNL), Livermore, CA (United States)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Nanomaterial Synthesis Insights from Machine Learning of Scientific Articles by Extracting, Structuring, and Visualizing Knowledge</atitle><jtitle>Journal of chemical information and modeling</jtitle><addtitle>J Chem Inf Model</addtitle><date>2020-06-22</date><risdate>2020</risdate><volume>60</volume><issue>6</issue><spage>2876</spage><epage>2887</epage><pages>2876-2887</pages><issn>1549-9596</issn><eissn>1549-960X</eissn><abstract>Nanomaterials of varying compositions and morphologies are of interest for many applications from catalysis to optics, but the synthesis of nanomaterials and their scale-up are most often time-consuming and Edisonian processes. Information gleaned from the scientific literature can help inform and accelerate nanomaterials development, but again, searching the literature and digesting the information are time-consuming manual processes for researchers. To help address these challenges, we developed scientific article-processing tools that extract and structure information from the text and figures of nanomaterials articles, thereby enabling the creation of a personalized knowledgebase for nanomaterials synthesis that can be mined to help inform further nanomaterials development. Starting with a corpus of ∼35k nanomaterials-related articles, we developed models to classify articles according to the nanomaterial composition and morphology, extract synthesis protocols from within the articles' text, and extract, normalize, and categorize chemical terms within synthesis protocols. We demonstrate the efficiency of the proposed pipeline on an expert-labeled set of nanomaterials synthesis articles, achieving 100% accuracy on composition prediction, 95% accuracy on morphology prediction, 0.99 AUC on protocol identification, and up to a 0.87 F1-score on chemical entity recognition. 
In addition to processing articles' text, microscopy images of nanomaterials within the articles are also automatically identified and analyzed to determine the nanomaterials' morphologies and size distributions. To enable users to easily explore the database, we developed a complementary browser-based visualization tool that provides flexibility in comparing across subsets of articles of interest. We use these tools and information to identify trends in nanomaterials synthesis, such as the correlation of certain reagents with various nanomaterial morphologies, which is useful in guiding hypotheses and reducing the potential parameter space during experimental design.</abstract><cop>United States</cop><pub>American Chemical Society</pub><pmid>32286818</pmid><doi>10.1021/acs.jcim.0c00199</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0002-6374-116X</orcidid><orcidid>https://orcid.org/0000-0002-3000-2782</orcidid><orcidid>https://orcid.org/0000000230002782</orcidid><orcidid>https://orcid.org/000000026374116X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1549-9596
ispartof Journal of chemical information and modeling, 2020-06, Vol.60 (6), p.2876-2887
issn 1549-9596
1549-960X
language eng
recordid cdi_osti_scitechconnect_1669214
source American Chemical Society Journals
subjects biological databases
Chemical synthesis
Composition
Design of experiments
Design parameters
gold
Knowledge bases (artificial intelligence)
Machine learning
MATERIALS SCIENCE
Morphology
nanocubes
Nanomaterials
Object recognition
Reagents
Scientific papers
title Nanomaterial Synthesis Insights from Machine Learning of Scientific Articles by Extracting, Structuring, and Visualizing Knowledge
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T10%3A07%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_osti_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Nanomaterial%20Synthesis%20Insights%20from%20Machine%20Learning%20of%20Scientific%20Articles%20by%20Extracting,%20Structuring,%20and%20Visualizing%20Knowledge&rft.jtitle=Journal%20of%20chemical%20information%20and%20modeling&rft.au=Hiszpanski,%20Anna%20M&rft.aucorp=Lawrence%20Livermore%20National%20Lab.%20(LLNL),%20Livermore,%20CA%20(United%20States)&rft.date=2020-06-22&rft.volume=60&rft.issue=6&rft.spage=2876&rft.epage=2887&rft.pages=2876-2887&rft.issn=1549-9596&rft.eissn=1549-960X&rft_id=info:doi/10.1021/acs.jcim.0c00199&rft_dat=%3Cproquest_osti_%3E2390147306%3C/proquest_osti_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2416997962&rft_id=info:pmid/32286818&rfr_iscdi=true