A Text Mining Approach to Uncover the Structure of Subject Metadata in the Biodiversity Heritage Library

Bibliographic Details
Published in: Proceedings of the ASIST Annual Meeting 2023-10, Vol.60 (1), p.926-928
Main Authors: Cheng, Yi‐Yun, Parulian, Nikolaus Nova, Dinh, Ly
Format: Article
Language: English
Subjects:
Online Access: Full text
container_end_page 928
container_issue 1
container_start_page 926
container_title Proceedings of the ASIST Annual Meeting
container_volume 60
creator Cheng, Yi‐Yun
Parulian, Nikolaus Nova
Dinh, Ly
description We propose a bottom‐up, data‐driven pipeline to uncover the structure of biodiversity subject metadata using a combination of text mining approaches. In this study, we analyze 721,035 subject terms in the Biodiversity Heritage Library (BHL). We use named entity recognition and word‐embedding methods to systematically label and group terms based on their vector‐space distances. The results show that the subject terms from BHL are clustered into several prominent themes relating to environmental regulations, geographic locations, organisms, and subject access points. We hope that our approach can serve as a first step toward grouping similar subject terms in large‐scale, constantly growing digital collections with aggregated metadata from multiple sources. Ultimately, we hope the next phases of this project can become a basis for biodiversity digital libraries to standardize their vocabularies.
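The grouping step described in the abstract (clustering subject terms by their vector-space distances) can be illustrated with a minimal sketch. It is not the authors' pipeline: the record does not say which embedding model or clustering method was used, so the toy three-dimensional vectors, the cosine-similarity threshold of 0.9, and the single-linkage grouping via union-find are all assumptions chosen for illustration. The names `TERM_VECTORS`, `cosine_similarity`, and `group_terms` are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings (invented values, NOT BHL data). In practice these
# would come from a trained word-embedding model applied to each subject term.
TERM_VECTORS = {
    "Birds": [0.9, 0.1, 0.0],
    "Ornithology": [0.85, 0.15, 0.05],
    "Brazil": [0.05, 0.9, 0.1],
    "South America": [0.1, 0.88, 0.12],
    "Fishery law": [0.0, 0.1, 0.95],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_terms(vectors, threshold=0.9):
    """Single-linkage grouping: any two terms whose cosine similarity is at
    least `threshold` are merged into the same group (assumed method)."""
    parent = {term: term for term in vectors}

    def find(term):
        # Walk to the root of the term's set, halving the path as we go.
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    # Compare every pair of terms and union the sufficiently similar ones.
    for a, b in combinations(vectors, 2):
        if cosine_similarity(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect terms by their root and return a deterministic ordering.
    groups = {}
    for term in vectors:
        groups.setdefault(find(term), []).append(term)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    print(group_terms(TERM_VECTORS))
    # [['Birds', 'Ornithology'], ['Brazil', 'South America'], ['Fishery law']]
```

With these toy vectors the terms fall into an "organisms" group, a "geographic locations" group, and a singleton that loosely stands in for "environmental regulations", mirroring the kinds of themes the abstract reports. Single linkage is only one option; at the scale of 721,035 terms, an all-pairs comparison like this would be far too slow, and an approximate nearest-neighbour index or a library clustering routine would be the more realistic choice.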
doi_str_mv 10.1002/pra2.900
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2879746657</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2879746657</sourcerecordid><originalsourceid>FETCH-LOGICAL-c990-275e5c46639401370f6dbb66ac78b9e3a4c50303cb0d56869313312f128c9a283</originalsourceid><addsrcrecordid>eNpNkDFPwzAQhS0EElWpxE-wxMKScrYTOx5LBRSpiKFhjhzHaV1BHGwH0X-PSxmY7obvvbv3ELomMCcA9G7wis4lwBmaUCZYJikj5__2SzQLYQ8AJBecUzlBuwWuzHfEL7a3_RYvhsE7pXc4OvzWa_dlPI47gzfRjzqO3mDX4c3Y7I1OGhNVq6LCtv-F7q1rbVIEGw94ZbyNamvw2jZe-cMVuujUezCzvzlF1eNDtVxl69en5-VinWkpIaOiMIXOOWcyB8IEdLxtGs6VFmUjDVO5LoAB0w20BS-5ZIQxQjtCSy0VLdkU3ZxsU47P0YRY793o-3SxpqWQIlkXIlG3J0p7F4I3XT14-5G-rAnUxybrY5N1apL9AK3pZK8</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2879746657</pqid></control><display><type>article</type><title>A Text Mining Approach to Uncover the Structure of Subject Metadata in the Biodiversity Heritage Library</title><source>Alma/SFX Local Collection</source><creator>Cheng, Yi‐Yun ; Parulian, Nikolaus Nova ; Dinh, Ly</creator><creatorcontrib>Cheng, Yi‐Yun ; Parulian, Nikolaus Nova ; Dinh, Ly</creatorcontrib><description>We propose a bottom‐up, data‐driven pipeline to uncover the structure of biodiversity subject metadata using a combination of text mining approaches. In this study, we analyze 721,035 subject terms in the Biodiversity Heritage Library (BHL). We utilize named entity recognition and word‐embedding methods to systematically label and group terms based on their vector‐space distances. The results show that the subject terms from BHL are clustered into several prominent themes relating to environmental regulations, geographic locations, organisms, and subject access points. 
We hope that our approach can serve as a first step to group similar subject terms together in large‐scale, constant growing digital collections with aggregated metadata from multiple sources. Ultimately, we hope the next phases of this project can become a basis for biodiversity digital libraries to standardize their vocabularies.</description><identifier>ISSN: 2373-9231</identifier><identifier>EISSN: 2373-9231</identifier><identifier>EISSN: 1550-8390</identifier><identifier>DOI: 10.1002/pra2.900</identifier><language>eng</language><publisher>Silver Spring: Wiley Subscription Services, Inc</publisher><subject>Biodiversity ; Data mining ; Geographical locations ; Libraries ; Metadata</subject><ispartof>Proceedings of the ASIST Annual Meeting, 2023-10, Vol.60 (1), p.926-928</ispartof><rights>2023 ASIS&amp;T</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c990-275e5c46639401370f6dbb66ac78b9e3a4c50303cb0d56869313312f128c9a283</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>315,781,785,27929,27930</link.rule.ids></links><search><creatorcontrib>Cheng, Yi‐Yun</creatorcontrib><creatorcontrib>Parulian, Nikolaus Nova</creatorcontrib><creatorcontrib>Dinh, Ly</creatorcontrib><title>A Text Mining Approach to Uncover the Structure of Subject Metadata in the Biodiversity Heritage Library</title><title>Proceedings of the ASIST Annual Meeting</title><description>We propose a bottom‐up, data‐driven pipeline to uncover the structure of biodiversity subject metadata using a combination of text mining approaches. In this study, we analyze 721,035 subject terms in the Biodiversity Heritage Library (BHL). We utilize named entity recognition and word‐embedding methods to systematically label and group terms based on their vector‐space distances. 
The results show that the subject terms from BHL are clustered into several prominent themes relating to environmental regulations, geographic locations, organisms, and subject access points. We hope that our approach can serve as a first step to group similar subject terms together in large‐scale, constant growing digital collections with aggregated metadata from multiple sources. Ultimately, we hope the next phases of this project can become a basis for biodiversity digital libraries to standardize their vocabularies.</description><subject>Biodiversity</subject><subject>Data mining</subject><subject>Geographical locations</subject><subject>Libraries</subject><subject>Metadata</subject><issn>2373-9231</issn><issn>2373-9231</issn><issn>1550-8390</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNpNkDFPwzAQhS0EElWpxE-wxMKScrYTOx5LBRSpiKFhjhzHaV1BHGwH0X-PSxmY7obvvbv3ELomMCcA9G7wis4lwBmaUCZYJikj5__2SzQLYQ8AJBecUzlBuwWuzHfEL7a3_RYvhsE7pXc4OvzWa_dlPI47gzfRjzqO3mDX4c3Y7I1OGhNVq6LCtv-F7q1rbVIEGw94ZbyNamvw2jZe-cMVuujUezCzvzlF1eNDtVxl69en5-VinWkpIaOiMIXOOWcyB8IEdLxtGs6VFmUjDVO5LoAB0w20BS-5ZIQxQjtCSy0VLdkU3ZxsU47P0YRY793o-3SxpqWQIlkXIlG3J0p7F4I3XT14-5G-rAnUxybrY5N1apL9AK3pZK8</recordid><startdate>202310</startdate><enddate>202310</enddate><creator>Cheng, Yi‐Yun</creator><creator>Parulian, Nikolaus Nova</creator><creator>Dinh, Ly</creator><general>Wiley Subscription Services, Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>JQ2</scope></search><sort><creationdate>202310</creationdate><title>A Text Mining Approach to Uncover the Structure of Subject Metadata in the Biodiversity Heritage Library</title><author>Cheng, Yi‐Yun ; Parulian, Nikolaus Nova ; Dinh, 
Ly</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c990-275e5c46639401370f6dbb66ac78b9e3a4c50303cb0d56869313312f128c9a283</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Biodiversity</topic><topic>Data mining</topic><topic>Geographical locations</topic><topic>Libraries</topic><topic>Metadata</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cheng, Yi‐Yun</creatorcontrib><creatorcontrib>Parulian, Nikolaus Nova</creatorcontrib><creatorcontrib>Dinh, Ly</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Computer Science Collection</collection><jtitle>Proceedings of the ASIST Annual Meeting</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cheng, Yi‐Yun</au><au>Parulian, Nikolaus Nova</au><au>Dinh, Ly</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Text Mining Approach to Uncover the Structure of Subject Metadata in the Biodiversity Heritage Library</atitle><jtitle>Proceedings of the ASIST Annual Meeting</jtitle><date>2023-10</date><risdate>2023</risdate><volume>60</volume><issue>1</issue><spage>926</spage><epage>928</epage><pages>926-928</pages><issn>2373-9231</issn><eissn>2373-9231</eissn><eissn>1550-8390</eissn><abstract>We propose a bottom‐up, data‐driven pipeline to uncover the structure of biodiversity subject metadata using a combination of text mining approaches. In this study, we analyze 721,035 subject terms in the Biodiversity Heritage Library (BHL). We utilize named entity recognition and word‐embedding methods to systematically label and group terms based on their vector‐space distances. 
The results show that the subject terms from BHL are clustered into several prominent themes relating to environmental regulations, geographic locations, organisms, and subject access points. We hope that our approach can serve as a first step to group similar subject terms together in large‐scale, constant growing digital collections with aggregated metadata from multiple sources. Ultimately, we hope the next phases of this project can become a basis for biodiversity digital libraries to standardize their vocabularies.</abstract><cop>Silver Spring</cop><pub>Wiley Subscription Services, Inc</pub><doi>10.1002/pra2.900</doi><tpages>3</tpages></addata></record>
fulltext fulltext
identifier ISSN: 2373-9231
ispartof Proceedings of the ASIST Annual Meeting, 2023-10, Vol.60 (1), p.926-928
issn 2373-9231
2373-9231
1550-8390
language eng
recordid cdi_proquest_journals_2879746657
source Alma/SFX Local Collection
subjects Biodiversity
Data mining
Geographical locations
Libraries
Metadata
title A Text Mining Approach to Uncover the Structure of Subject Metadata in the Biodiversity Heritage Library
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-13T05%3A41%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Text%20Mining%20Approach%20to%20Uncover%20the%20Structure%20of%20Subject%20Metadata%20in%20the%20Biodiversity%20Heritage%20Library&rft.jtitle=Proceedings%20of%20the%20ASIST%20Annual%20Meeting&rft.au=Cheng,%20Yi%E2%80%90Yun&rft.date=2023-10&rft.volume=60&rft.issue=1&rft.spage=926&rft.epage=928&rft.pages=926-928&rft.issn=2373-9231&rft.eissn=2373-9231&rft_id=info:doi/10.1002/pra2.900&rft_dat=%3Cproquest_cross%3E2879746657%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2879746657&rft_id=info:pmid/&rfr_iscdi=true