The Australian Reference Genome Atlas (ARGA): Finding, sharing and reusing Australian genomics data in an occurrence-driven context

Fundamental to the capacity of Australia’s 15,000 biosciences researchers to answer questions in taxonomy, phylogeny, evolution, conservation, and applied fields like crop improvement and biosecurity, is access to trusted genomics (and genetics) datasets. Historically, researchers turned to single p...

Detailed description

Bibliographic details
Published in:	Biodiversity Information Science and Standards 2023-09, Vol.7
Main authors:	Hall, Kathryn; Andrews, Matt; Connolly, Keeva; Kankanamge, Yasima; Mangion, Christopher; Mok, Winnie; Nauheimer, Lars; Sterjov, Goran; Ward, Nigel; Brenton, Peter
Format:	Article
Language:	eng
Subjects:	Biotechnology; Crop improvement; Datasets; Evolutionary conservation; Genomes; Genomics; Phylogeny; Provenance; Researchers
Online access:	Full text

container_title	Biodiversity Information Science and Standards
container_volume	7
creator	Hall, Kathryn; Andrews, Matt; Connolly, Keeva; Kankanamge, Yasima; Mangion, Christopher; Mok, Winnie; Nauheimer, Lars; Sterjov, Goran; Ward, Nigel; Brenton, Peter
description	Fundamental to the capacity of Australia’s 15,000 biosciences researchers to answer questions in taxonomy, phylogeny, evolution, conservation, and applied fields like crop improvement and biosecurity, is access to trusted genomics (and genetics) datasets. Historically, researchers turned to single points of origin, like GenBank (part of the United States' National Center for Biotechnology Information), to find the reference or comparative data they needed, but the rapidity of data generation using next-gen methods, and the enormous size and diversity of datasets derived from next-gen sequencing methods, mean that single databases no longer contain all data of a specific class, which may be attributable to individual taxa, nor the full breadth of data types relevant for that taxon. Comprehensively searching for taxonomically relevant data, and indeed, data of types germane to the research question, is a significant challenge for researchers. Data are openly available online, but the data may be stored under synonyms or indexed via unconventional taxonomies. Data repositories are largely disconnected and researchers must visit multiple sites to have confidence that their searches have been exhaustive. Databases may focus on single data types and not store or reference other data assets, though they may be relevant for the taxon of interest. Additionally, our survey of the genomics community indicated that researchers are less likely to trust data with inadequately evidenced provenance metadata. This means that genomics data are hard to find and are often untrusted. Moreover, even once found, the data are in formats that do not interoperate with occurrence and ecological datasets, such as those housed in the Atlas of Living Australia. 
We built the Australian Reference Genome Atlas (ARGA) to overcome the barriers faced by researchers in finding and collating genomics data for Australia’s species, and we have built it so that researchers can search for data within taxonomically accepted contexts and defined intersections and conjunctions with verified and expert ecological datasets. Using a series of ingestion scripts, the ARGA data team has implemented new and customised data mappings that effectively integrate genomics data, ecological traits, and occurrence data within an extended Darwin Core Event framework (GBIF 2018). Here, we will demonstrate how the architecture we derived for the ARGA application works, and how it can be extended as new data sources emerge. We then demonstrate how our flexible model can be used to: locate genomics data for taxa of interest; explore data within an ecological context; and calculate metrics for data availability for provincial bioregions.
doi_str_mv	10.3897/biss.7.112129
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2861714389</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2861714389</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1059-9b370483000a12760118115e5865385899c5996f35df7871660b9fbdd02020533</originalsourceid><addsrcrecordid>eNpNkM1LAzEQxYMoWGqP3gNeFNyaj2aTeFtKW4WCUOo5ZLPZNqXN1mRX9Ow_btZ6KHOYB_PmzfAD4BajMRWSP5UuxjEfY0wwkRdgQBhlGUqTyzN9DUYx7hBCRBIicjEAP-uthUUX26D3Tnu4srUN1hsLF9Y3hzRr9zrC-2K1KB6e4dz5yvnNI4xbHZKA2lcw2C72-ixm0y87E2GlWw2dTz7YGNOFv-ysCu7Temga39qv9gZc1Xof7ei_D8H7fLaevmTLt8XrtFhmBiMmM1lSjiaCpvc1JjxHGAuMmWUiZ1QwIaVhUuY1ZVXNBcd5jkpZl1WFSCpG6RDcnXKPofnobGzVrumCTydVgoE5niSSyZWdXCY0MQZbq2NwBx2-FUaqR6161IqrE2r6CwXXcBI</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2861714389</pqid></control><display><type>article</type><title>The Australian Reference Genome Atlas (ARGA): Finding, sharing and reusing Australian genomics data in an occurrence-driven context</title><source>Pensoft Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Hall, Kathryn ; Andrews, Matt ; Connolly, Keeva ; Kankanamge, Yasima ; Mangion, Christopher ; Mok, Winnie ; Nauheimer, Lars ; Sterjov, Goran ; Ward, Nigel ; Brenton, Peter</creator><creatorcontrib>Hall, Kathryn ; Andrews, Matt ; Connolly, Keeva ; Kankanamge, Yasima ; Mangion, Christopher ; Mok, Winnie ; Nauheimer, Lars ; Sterjov, Goran ; Ward, Nigel ; Brenton, Peter</creatorcontrib><description>Fundamental to the capacity of Australia’s 15,000 biosciences researchers to answer questions in taxonomy, phylogeny, evolution, conservation, and applied fields like crop improvement and biosecurity, is access to trusted genomics (and genetics) datasets. 
Historically, researchers turned to single points of origin, like GenBank (part of the United States' National Center for Biotechnology Information), to find the reference or comparative data they needed, but the rapidity of data generation using next-gen methods, and the enormous size and diversity of datasets derived from next-gen sequencing methods, mean that single databases no longer contain all data of a specific class, which may be attributable to individual taxa, nor the full breadth of data types relevant for that taxon. Comprehensively searching for taxonomically relevant data, and indeed, data of types germane to the research question, is a significant challenge for researchers. Data are openly available online, but the data may be stored under synonyms or indexed via unconventional taxonomies. Data repositories are largely disconnected and researchers must visit multiple sites to have confidence that their searches have been exhaustive. Databases may focus on single data types and not store or reference other data assets, though they may be relevant for the taxon of interest. Additionally, our survey of the genomics community indicated that researchers are less likely to trust data with inadequately evidenced provenance metadata. This means that genomics data are hard to find and are often untrusted. Moreover, even once found, the data are in formats that do not interoperate with occurrence and ecological datasets, such as those housed in the Atlas of Living Australia. We built the Australian Reference Genome Atlas (ARGA) to overcome the barriers faced by researchers in finding and collating genomics data for Australia’s species, and we have built it so that researchers can search for data within taxonomically accepted contexts and defined intersections and conjunctions with verified and expert ecological datasets. 
Using a series of ingestion scripts, the ARGA data team has implemented new and customised data mappings that effectively integrate genomics data, ecological traits, and occurrence data within an extended Darwin Core Event framework (GBIF 2018). Here, we will demonstrate how the architecture we derived for ARGA application works, and how it can be extended as new data sources emerge. We then demonstrate how our flexible model can be used to: locate genomics data for taxa of interest; explore data within an ecological context; and calculate metrics for data availability for provincial bioregions.</description><identifier>ISSN: 2535-0897</identifier><identifier>EISSN: 2535-0897</identifier><identifier>DOI: 10.3897/biss.7.112129</identifier><language>eng</language><publisher>Sofia: Pensoft Publishers</publisher><subject>Biotechnology ; Crop improvement ; Datasets ; Evolutionary conservation ; Genomes ; Genomics ; Phylogeny ; Provenance ; Researchers</subject><ispartof>Biodiversity Information Science and Standards, 2023-09, Vol.7</ispartof><rights>2023. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0002-8785-4513</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Hall, Kathryn</creatorcontrib><creatorcontrib>Andrews, Matt</creatorcontrib><creatorcontrib>Connolly, Keeva</creatorcontrib><creatorcontrib>Kankanamge, Yasima</creatorcontrib><creatorcontrib>Mangion, Christopher</creatorcontrib><creatorcontrib>Mok, Winnie</creatorcontrib><creatorcontrib>Nauheimer, Lars</creatorcontrib><creatorcontrib>Sterjov, Goran</creatorcontrib><creatorcontrib>Ward, Nigel</creatorcontrib><creatorcontrib>Brenton, Peter</creatorcontrib><title>The Australian Reference Genome Atlas (ARGA): Finding, sharing and reusing Australian genomics data in an occurrence-driven context</title><title>Biodiversity Information Science and Standards</title><description>Fundamental to the capacity of Australia’s 15,000 biosciences researchers to answer questions in taxonomy, phylogeny, evolution, conservation, and applied fields like crop improvement and biosecurity, is access to trusted genomics (and genetics) datasets. Historically, researchers turned to single points of origin, like GenBank (part of the United States' National Center for Biotechnology Information), to find the reference or comparative data they needed, but the rapidity of data generation using next-gen methods, and the enormous size and diversity of datasets derived from next-gen sequencing methods, mean that single databases no longer contain all data of a specific class, which may be attributable to individual taxa, nor the full breadth of data types relevant for that taxon. 
Comprehensively searching for taxonomically relevant data, and indeed, data of types germane to the research question, is a significant challenge for researchers. Data are openly available online, but the data may be stored under synonyms or indexed via unconventional taxonomies. Data repositories are largely disconnected and researchers must visit multiple sites to have confidence that their searches have been exhaustive. Databases may focus on single data types and not store or reference other data assets, though they may be relevant for the taxon of interest. Additionally, our survey of the genomics community indicated that researchers are less likely to trust data with inadequately evidenced provenance metadata. This means that genomics data are hard to find and are often untrusted. Moreover, even once found, the data are in formats that do not interoperate with occurrence and ecological datasets, such as those housed in the Atlas of Living Australia. We built the Australian Reference Genome Atlas (ARGA) to overcome the barriers faced by researchers in finding and collating genomics data for Australia’s species, and we have built it so that researchers can search for data within taxonomically accepted contexts and defined intersections and conjunctions with verified and expert ecological datasets. Using a series of ingestion scripts, the ARGA data team has implemented new and customised data mappings that effectively integrate genomics data, ecological traits, and occurrence data within an extended Darwin Core Event framework (GBIF 2018). Here, we will demonstrate how the architecture we derived for ARGA application works, and how it can be extended as new data sources emerge. We then demonstrate how our flexible model can be used to: locate genomics data for taxa of interest; explore data within an ecological context; and calculate metrics for data availability for provincial bioregions. 
</description><subject>Biotechnology</subject><subject>Crop improvement</subject><subject>Datasets</subject><subject>Evolutionary conservation</subject><subject>Genomes</subject><subject>Genomics</subject><subject>Phylogeny</subject><subject>Provenance</subject><subject>Researchers</subject><issn>2535-0897</issn><issn>2535-0897</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNpNkM1LAzEQxYMoWGqP3gNeFNyaj2aTeFtKW4WCUOo5ZLPZNqXN1mRX9Ow_btZ6KHOYB_PmzfAD4BajMRWSP5UuxjEfY0wwkRdgQBhlGUqTyzN9DUYx7hBCRBIicjEAP-uthUUX26D3Tnu4srUN1hsLF9Y3hzRr9zrC-2K1KB6e4dz5yvnNI4xbHZKA2lcw2C72-ixm0y87E2GlWw2dTz7YGNOFv-ysCu7Temga39qv9gZc1Xof7ei_D8H7fLaevmTLt8XrtFhmBiMmM1lSjiaCpvc1JjxHGAuMmWUiZ1QwIaVhUuY1ZVXNBcd5jkpZl1WFSCpG6RDcnXKPofnobGzVrumCTydVgoE5niSSyZWdXCY0MQZbq2NwBx2-FUaqR6161IqrE2r6CwXXcBI</recordid><startdate>20230906</startdate><enddate>20230906</enddate><creator>Hall, Kathryn</creator><creator>Andrews, Matt</creator><creator>Connolly, Keeva</creator><creator>Kankanamge, Yasima</creator><creator>Mangion, Christopher</creator><creator>Mok, Winnie</creator><creator>Nauheimer, Lars</creator><creator>Sterjov, Goran</creator><creator>Ward, Nigel</creator><creator>Brenton, Peter</creator><general>Pensoft 
Publishers</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FH</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>LK8</scope><scope>M7P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><orcidid>https://orcid.org/0000-0002-8785-4513</orcidid></search><sort><creationdate>20230906</creationdate><title>The Australian Reference Genome Atlas (ARGA): Finding, sharing and reusing Australian genomics data in an occurrence-driven context</title><author>Hall, Kathryn ; Andrews, Matt ; Connolly, Keeva ; Kankanamge, Yasima ; Mangion, Christopher ; Mok, Winnie ; Nauheimer, Lars ; Sterjov, Goran ; Ward, Nigel ; Brenton, Peter</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1059-9b370483000a12760118115e5865385899c5996f35df7871660b9fbdd02020533</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Biotechnology</topic><topic>Crop improvement</topic><topic>Datasets</topic><topic>Evolutionary conservation</topic><topic>Genomes</topic><topic>Genomics</topic><topic>Phylogeny</topic><topic>Provenance</topic><topic>Researchers</topic><toplevel>online_resources</toplevel><creatorcontrib>Hall, Kathryn</creatorcontrib><creatorcontrib>Andrews, Matt</creatorcontrib><creatorcontrib>Connolly, Keeva</creatorcontrib><creatorcontrib>Kankanamge, Yasima</creatorcontrib><creatorcontrib>Mangion, Christopher</creatorcontrib><creatorcontrib>Mok, Winnie</creatorcontrib><creatorcontrib>Nauheimer, Lars</creatorcontrib><creatorcontrib>Sterjov, Goran</creatorcontrib><creatorcontrib>Ward, Nigel</creatorcontrib><creatorcontrib>Brenton, Peter</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural 
Science Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Natural Science Collection (ProQuest)</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Biological Science Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><jtitle>Biodiversity Information Science and Standards</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hall, Kathryn</au><au>Andrews, Matt</au><au>Connolly, Keeva</au><au>Kankanamge, Yasima</au><au>Mangion, Christopher</au><au>Mok, Winnie</au><au>Nauheimer, Lars</au><au>Sterjov, Goran</au><au>Ward, Nigel</au><au>Brenton, Peter</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The Australian Reference Genome Atlas (ARGA): Finding, sharing and reusing Australian genomics data in an occurrence-driven context</atitle><jtitle>Biodiversity Information Science and Standards</jtitle><date>2023-09-06</date><risdate>2023</risdate><volume>7</volume><issn>2535-0897</issn><eissn>2535-0897</eissn><abstract>Fundamental to the capacity of Australia’s 15,000 biosciences researchers to answer questions in taxonomy, phylogeny, evolution, conservation, and applied fields like crop improvement and biosecurity, is access to trusted genomics (and genetics) datasets. 
Historically, researchers turned to single points of origin, like GenBank (part of the United States' National Center for Biotechnology Information), to find the reference or comparative data they needed, but the rapidity of data generation using next-gen methods, and the enormous size and diversity of datasets derived from next-gen sequencing methods, mean that single databases no longer contain all data of a specific class, which may be attributable to individual taxa, nor the full breadth of data types relevant for that taxon. Comprehensively searching for taxonomically relevant data, and indeed, data of types germane to the research question, is a significant challenge for researchers. Data are openly available online, but the data may be stored under synonyms or indexed via unconventional taxonomies. Data repositories are largely disconnected and researchers must visit multiple sites to have confidence that their searches have been exhaustive. Databases may focus on single data types and not store or reference other data assets, though they may be relevant for the taxon of interest. Additionally, our survey of the genomics community indicated that researchers are less likely to trust data with inadequately evidenced provenance metadata. This means that genomics data are hard to find and are often untrusted. Moreover, even once found, the data are in formats that do not interoperate with occurrence and ecological datasets, such as those housed in the Atlas of Living Australia. We built the Australian Reference Genome Atlas (ARGA) to overcome the barriers faced by researchers in finding and collating genomics data for Australia’s species, and we have built it so that researchers can search for data within taxonomically accepted contexts and defined intersections and conjunctions with verified and expert ecological datasets. 
Using a series of ingestion scripts, the ARGA data team has implemented new and customised data mappings that effectively integrate genomics data, ecological traits, and occurrence data within an extended Darwin Core Event framework (GBIF 2018). Here, we will demonstrate how the architecture we derived for ARGA application works, and how it can be extended as new data sources emerge. We then demonstrate how our flexible model can be used to: locate genomics data for taxa of interest; explore data within an ecological context; and calculate metrics for data availability for provincial bioregions.</abstract><cop>Sofia</cop><pub>Pensoft Publishers</pub><doi>10.3897/biss.7.112129</doi><orcidid>https://orcid.org/0000-0002-8785-4513</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2535-0897
ispartof	Biodiversity Information Science and Standards, 2023-09, Vol.7
issn	2535-0897 2535-0897
language	eng
recordid	cdi_proquest_journals_2861714389
source	Pensoft Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects	Biotechnology; Crop improvement; Datasets; Evolutionary conservation; Genomes; Genomics; Phylogeny; Provenance; Researchers
title	The Australian Reference Genome Atlas (ARGA): Finding, sharing and reusing Australian genomics data in an occurrence-driven context
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T03%3A07%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20Australian%20Reference%20Genome%20Atlas%20(ARGA):%20Finding,%20sharing%20and%20reusing%20Australian%20genomics%20data%20in%20an%20occurrence-driven%20context&rft.jtitle=Biodiversity%20Information%20Science%20and%20Standards&rft.au=Hall,%20Kathryn&rft.date=2023-09-06&rft.volume=7&rft.issn=2535-0897&rft.eissn=2535-0897&rft_id=info:doi/10.3897/biss.7.112129&rft_dat=%3Cproquest_cross%3E2861714389%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2861714389&rft_id=info:pmid/&rfr_iscdi=true
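
Illustrative sketch (not part of the catalogue record). The abstract describes ingestion scripts that map genomics records into an extended Darwin Core Event framework and resolve data stored under synonyms. The Python below is a minimal sketch of that general idea only; it is not ARGA's code or its actual mappings. The source field names ("accession", "organism"), the synonym table, the placeholder taxon names and the helper functions are all hypothetical. The output keys are Darwin Core term names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical synonym table mapping names as stored in a source repository
# to an accepted name. Placeholder names only; not ARGA's taxonomy.
SYNONYMS = {"Aus bus": "Xus yus"}


def accepted_name(name):
    """Normalise whitespace, then resolve a synonym to its accepted name."""
    cleaned = " ".join(name.split())
    return SYNONYMS.get(cleaned, cleaned)


def to_event_record(source, dataset_name):
    """Map one source record (hypothetical shape) to Darwin Core-style keys."""
    accession = source["accession"]
    record = {
        "eventID": f"{dataset_name}:{accession}",
        "datasetName": dataset_name,
        "scientificName": accepted_name(source["organism"]),
        # Keep the name exactly as the source stored it, for provenance.
        "verbatimIdentification": source["organism"],
        "associatedSequences": accession,
        "basisOfRecord": "MaterialSample",
    }
    # Copy optional occurrence context only when the source provides it.
    for key in ("eventDate", "decimalLatitude", "decimalLongitude"):
        if key in source:
            record[key] = source[key]
    return record


def index_by_taxon(records):
    """Group sequence accessions under their accepted scientific name."""
    index = defaultdict(list)
    for rec in records:
        index[rec["scientificName"]].append(rec["associatedSequences"])
    return dict(index)


if __name__ == "__main__":
    raw = [
        {"accession": "EXAMPLE_0001", "organism": "Aus  bus"},
        {"accession": "EXAMPLE_0002", "organism": "Xus yus",
         "eventDate": "2020-01-01"},
    ]
    mapped = [to_event_record(r, "example-repo") for r in raw]
    print(index_by_taxon(mapped))
```

In this sketch, records filed under a synonym and under the accepted name end up in the same group, which is the kind of taxonomically consistent lookup the abstract describes. Adding a new data source would mean writing another mapping function that produces the same keys.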