The Australian Reference Genome Atlas (ARGA): Finding, sharing and reusing Australian genomics data in an occurrence-driven context
Fundamental to the capacity of Australia’s 15,000 biosciences researchers to answer questions in taxonomy, phylogeny, evolution, conservation, and applied fields like crop improvement and biosecurity, is access to trusted genomics (and genetics) datasets. Historically, researchers turned to single p...
Published in: | Biodiversity Information Science and Standards 2023-09, Vol.7 |
---|---|
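Main authors: | Hall, Kathryn; Andrews, Matt; Connolly, Keeva; Kankanamge, Yasima; Mangion, Christopher; Mok, Winnie; Nauheimer, Lars; Sterjov, Goran; Ward, Nigel; Brenton, Peter |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |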
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | Biodiversity Information Science and Standards |
container_volume | 7 |
creator | Hall, Kathryn; Andrews, Matt; Connolly, Keeva; Kankanamge, Yasima; Mangion, Christopher; Mok, Winnie; Nauheimer, Lars; Sterjov, Goran; Ward, Nigel; Brenton, Peter |
description | Fundamental to the capacity of Australia’s 15,000 biosciences researchers to answer questions in taxonomy, phylogeny, evolution, conservation, and applied fields like crop improvement and biosecurity, is access to trusted genomics (and genetics) datasets. Historically, researchers turned to single points of origin, like GenBank (part of the United States' National Center for Biotechnology Information), to find the reference or comparative data they needed, but the rapidity of data generation using next-gen methods, and the enormous size and diversity of datasets derived from next-gen sequencing methods, mean that single databases no longer contain all data of a specific class, which may be attributable to individual taxa, nor the full breadth of data types relevant for that taxon. Comprehensively searching for taxonomically relevant data, and indeed, data of types germane to the research question, is a significant challenge for researchers. Data are openly available online, but the data may be stored under synonyms or indexed via unconventional taxonomies. Data repositories are largely disconnected and researchers must visit multiple sites to have confidence that their searches have been exhaustive. Databases may focus on single data types and not store or reference other data assets, though they may be relevant for the taxon of interest. Additionally, our survey of the genomics community indicated that researchers are less likely to trust data with inadequately evidenced provenance metadata. This means that genomics data are hard to find and are often untrusted. Moreover, even once found, the data are in formats that do not interoperate with occurrence and ecological datasets, such as those housed in the Atlas of Living Australia.
We built the Australian Reference Genome Atlas (ARGA) to overcome the barriers faced by researchers in finding and collating genomics data for Australia’s species, and we have built it so that researchers can search for data within taxonomically accepted contexts and defined intersections and conjunctions with verified and expert ecological datasets. Using a series of ingestion scripts, the ARGA data team has implemented new and customised data mappings that effectively integrate genomics data, ecological traits, and occurrence data within an extended Darwin Core Event framework (GBIF 2018). Here, we will demonstrate how the architecture we derived for the ARGA application works, and how it can be extended as new data sources emerge. We then demonstrate how our flexible model can be used to: locate genomics data for taxa of interest; explore data within an ecological context; and calculate metrics for data availability for provincial bioregions. |
doi_str_mv | 10.3897/biss.7.112129 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2861714389</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2861714389</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1059-9b370483000a12760118115e5865385899c5996f35df7871660b9fbdd02020533</originalsourceid><addsrcrecordid>eNpNkM1LAzEQxYMoWGqP3gNeFNyaj2aTeFtKW4WCUOo5ZLPZNqXN1mRX9Ow_btZ6KHOYB_PmzfAD4BajMRWSP5UuxjEfY0wwkRdgQBhlGUqTyzN9DUYx7hBCRBIicjEAP-uthUUX26D3Tnu4srUN1hsLF9Y3hzRr9zrC-2K1KB6e4dz5yvnNI4xbHZKA2lcw2C72-ixm0y87E2GlWw2dTz7YGNOFv-ysCu7Temga39qv9gZc1Xof7ei_D8H7fLaevmTLt8XrtFhmBiMmM1lSjiaCpvc1JjxHGAuMmWUiZ1QwIaVhUuY1ZVXNBcd5jkpZl1WFSCpG6RDcnXKPofnobGzVrumCTydVgoE5niSSyZWdXCY0MQZbq2NwBx2-FUaqR6161IqrE2r6CwXXcBI</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2861714389</pqid></control><display><type>article</type><title>The Australian Reference Genome Atlas (ARGA): Finding, sharing and reusing Australian genomics data in an occurrence-driven context</title><source>Pensoft Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Hall, Kathryn ; Andrews, Matt ; Connolly, Keeva ; Kankanamge, Yasima ; Mangion, Christopher ; Mok, Winnie ; Nauheimer, Lars ; Sterjov, Goran ; Ward, Nigel ; Brenton, Peter</creator><creatorcontrib>Hall, Kathryn ; Andrews, Matt ; Connolly, Keeva ; Kankanamge, Yasima ; Mangion, Christopher ; Mok, Winnie ; Nauheimer, Lars ; Sterjov, Goran ; Ward, Nigel ; Brenton, Peter</creatorcontrib><description>Fundamental to the capacity of Australia’s 15,000 biosciences researchers to answer questions in taxonomy, phylogeny, evolution, conservation, and applied fields like crop improvement and biosecurity, is access to trusted genomics (and genetics) datasets. 
Historically, researchers turned to single points of origin, like GenBank (part of the United States' National Center for Biotechnology Information), to find the reference or comparative data they needed, but the rapidity of data generation using next-gen methods, and the enormous size and diversity of datasets derived from next-gen sequencing methods, mean that single databases no longer contain all data of a specific class, which may be attributable to individual taxa, nor the full breadth of data types relevant for that taxon. Comprehensively searching for taxonomically relevant data, and indeed, data of types germane to the research question, is a significant challenge for researchers. Data are openly available online, but the data may be stored under synonyms or indexed via unconventional taxonomies. Data repositories are largely disconnected and researchers must visit multiple sites to have confidence that their searches have been exhaustive. Databases may focus on single data types and not store or reference other data assets, though they may be relevant for the taxon of interest. Additionally, our survey of the genomics community indicated that researchers are less likely to trust data with inadequately evidenced provenance metadata. This means that genomics data are hard to find and are often untrusted. Moreover, even once found, the data are in formats that do not interoperate with occurrence and ecological datasets, such as those housed in the Atlas of Living Australia.
We built the Australian Reference Genome Atlas (ARGA) to overcome the barriers faced by researchers in finding and collating genomics data for Australia’s species, and we have built it so that researchers can search for data within taxonomically accepted contexts and defined intersections and conjunctions with verified and expert ecological datasets. Using a series of ingestion scripts, the ARGA data team has implemented new and customised data mappings that effectively integrate genomics data, ecological traits, and occurrence data within an extended Darwin Core Event framework (GBIF 2018). Here, we will demonstrate how the architecture we derived for ARGA application works, and how it can be extended as new data sources emerge. We then demonstrate how our flexible model can be used to:
locate genomics data for taxa of interest;
explore data within an ecological context; and
calculate metrics for data availability for provincial bioregions.</description><identifier>ISSN: 2535-0897</identifier><identifier>EISSN: 2535-0897</identifier><identifier>DOI: 10.3897/biss.7.112129</identifier><language>eng</language><publisher>Sofia: Pensoft Publishers</publisher><subject>Biotechnology ; Crop improvement ; Datasets ; Evolutionary conservation ; Genomes ; Genomics ; Phylogeny ; Provenance ; Researchers</subject><ispartof>Biodiversity Information Science and Standards, 2023-09, Vol.7</ispartof><rights>2023. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0002-8785-4513</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Hall, Kathryn</creatorcontrib><creatorcontrib>Andrews, Matt</creatorcontrib><creatorcontrib>Connolly, Keeva</creatorcontrib><creatorcontrib>Kankanamge, Yasima</creatorcontrib><creatorcontrib>Mangion, Christopher</creatorcontrib><creatorcontrib>Mok, Winnie</creatorcontrib><creatorcontrib>Nauheimer, Lars</creatorcontrib><creatorcontrib>Sterjov, Goran</creatorcontrib><creatorcontrib>Ward, Nigel</creatorcontrib><creatorcontrib>Brenton, Peter</creatorcontrib><title>The Australian Reference Genome Atlas (ARGA): Finding, sharing and reusing Australian genomics data in an occurrence-driven context</title><title>Biodiversity Information Science and Standards</title><description>Fundamental to the capacity of Australia’s 15,000 biosciences researchers to answer questions in taxonomy, phylogeny, evolution, conservation, and applied fields like crop improvement and biosecurity, is 
access to trusted genomics (and genetics) datasets. Historically, researchers turned to single points of origin, like GenBank (part of the United States' National Center for Biotechnology Information), to find the reference or comparative data they needed, but the rapidity of data generation using next-gen methods, and the enormous size and diversity of datasets derived from next-gen sequencing methods, mean that single databases no longer contain all data of a specific class, which may be attributable to individual taxa, nor the full breadth of data types relevant for that taxon. Comprehensively searching for taxonomically relevant data, and indeed, data of types germane to the research question, is a significant challenge for researchers. Data are openly available online, but the data may be stored under synonyms or indexed via unconventional taxonomies. Data repositories are largely disconnected and researchers must visit multiple sites to have confidence that their searches have been exhaustive. Databases may focus on single data types and not store or reference other data assets, though they may be relevant for the taxon of interest. Additionally, our survey of the genomics community indicated that researchers are less likely to trust data with inadequately evidenced provenance metadata. This means that genomics data are hard to find and are often untrusted. Moreover, even once found, the data are in formats that do not interoperate with occurrence and ecological datasets, such as those housed in the Atlas of Living Australia.
We built the Australian Reference Genome Atlas (ARGA) to overcome the barriers faced by researchers in finding and collating genomics data for Australia’s species, and we have built it so that researchers can search for data within taxonomically accepted contexts and defined intersections and conjunctions with verified and expert ecological datasets. Using a series of ingestion scripts, the ARGA data team has implemented new and customised data mappings that effectively integrate genomics data, ecological traits, and occurrence data within an extended Darwin Core Event framework (GBIF 2018). Here, we will demonstrate how the architecture we derived for ARGA application works, and how it can be extended as new data sources emerge. We then demonstrate how our flexible model can be used to:
locate genomics data for taxa of interest;
explore data within an ecological context; and
calculate metrics for data availability for provincial bioregions.</description><subject>Biotechnology</subject><subject>Crop improvement</subject><subject>Datasets</subject><subject>Evolutionary conservation</subject><subject>Genomes</subject><subject>Genomics</subject><subject>Phylogeny</subject><subject>Provenance</subject><subject>Researchers</subject><issn>2535-0897</issn><issn>2535-0897</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNpNkM1LAzEQxYMoWGqP3gNeFNyaj2aTeFtKW4WCUOo5ZLPZNqXN1mRX9Ow_btZ6KHOYB_PmzfAD4BajMRWSP5UuxjEfY0wwkRdgQBhlGUqTyzN9DUYx7hBCRBIicjEAP-uthUUX26D3Tnu4srUN1hsLF9Y3hzRr9zrC-2K1KB6e4dz5yvnNI4xbHZKA2lcw2C72-ixm0y87E2GlWw2dTz7YGNOFv-ysCu7Temga39qv9gZc1Xof7ei_D8H7fLaevmTLt8XrtFhmBiMmM1lSjiaCpvc1JjxHGAuMmWUiZ1QwIaVhUuY1ZVXNBcd5jkpZl1WFSCpG6RDcnXKPofnobGzVrumCTydVgoE5niSSyZWdXCY0MQZbq2NwBx2-FUaqR6161IqrE2r6CwXXcBI</recordid><startdate>20230906</startdate><enddate>20230906</enddate><creator>Hall, Kathryn</creator><creator>Andrews, Matt</creator><creator>Connolly, Keeva</creator><creator>Kankanamge, Yasima</creator><creator>Mangion, Christopher</creator><creator>Mok, Winnie</creator><creator>Nauheimer, Lars</creator><creator>Sterjov, Goran</creator><creator>Ward, Nigel</creator><creator>Brenton, Peter</creator><general>Pensoft Publishers</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FH</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>LK8</scope><scope>M7P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><orcidid>https://orcid.org/0000-0002-8785-4513</orcidid></search><sort><creationdate>20230906</creationdate><title>The Australian Reference Genome Atlas (ARGA): Finding, sharing and reusing Australian 
genomics data in an occurrence-driven context</title><author>Hall, Kathryn ; Andrews, Matt ; Connolly, Keeva ; Kankanamge, Yasima ; Mangion, Christopher ; Mok, Winnie ; Nauheimer, Lars ; Sterjov, Goran ; Ward, Nigel ; Brenton, Peter</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1059-9b370483000a12760118115e5865385899c5996f35df7871660b9fbdd02020533</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Biotechnology</topic><topic>Crop improvement</topic><topic>Datasets</topic><topic>Evolutionary conservation</topic><topic>Genomes</topic><topic>Genomics</topic><topic>Phylogeny</topic><topic>Provenance</topic><topic>Researchers</topic><toplevel>online_resources</toplevel><creatorcontrib>Hall, Kathryn</creatorcontrib><creatorcontrib>Andrews, Matt</creatorcontrib><creatorcontrib>Connolly, Keeva</creatorcontrib><creatorcontrib>Kankanamge, Yasima</creatorcontrib><creatorcontrib>Mangion, Christopher</creatorcontrib><creatorcontrib>Mok, Winnie</creatorcontrib><creatorcontrib>Nauheimer, Lars</creatorcontrib><creatorcontrib>Sterjov, Goran</creatorcontrib><creatorcontrib>Ward, Nigel</creatorcontrib><creatorcontrib>Brenton, Peter</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Natural Science Collection (ProQuest)</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Biological Science 
Collection</collection><collection>Biological Science Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><jtitle>Biodiversity Information Science and Standards</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hall, Kathryn</au><au>Andrews, Matt</au><au>Connolly, Keeva</au><au>Kankanamge, Yasima</au><au>Mangion, Christopher</au><au>Mok, Winnie</au><au>Nauheimer, Lars</au><au>Sterjov, Goran</au><au>Ward, Nigel</au><au>Brenton, Peter</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The Australian Reference Genome Atlas (ARGA): Finding, sharing and reusing Australian genomics data in an occurrence-driven context</atitle><jtitle>Biodiversity Information Science and Standards</jtitle><date>2023-09-06</date><risdate>2023</risdate><volume>7</volume><issn>2535-0897</issn><eissn>2535-0897</eissn><abstract>Fundamental to the capacity of Australia’s 15,000 biosciences researchers to answer questions in taxonomy, phylogeny, evolution, conservation, and applied fields like crop improvement and biosecurity, is access to trusted genomics (and genetics) datasets. Historically, researchers turned to single points of origin, like GenBank (part of the United States' National Center for Biotechnology Information), to find the reference or comparative data they needed, but the rapidity of data generation using next-gen methods, and the enormous size and diversity of datasets derived from next-gen sequencing methods, mean that single databases no longer contain all data of a specific class, which may be attributable to individual taxa, nor the full breadth of data types relevant for that taxon. 
Comprehensively searching for taxonomically relevant data, and indeed, data of types germane to the research question, is a significant challenge for researchers. Data are openly available online, but the data may be stored under synonyms or indexed via unconventional taxonomies. Data repositories are largely disconnected and researchers must visit multiple sites to have confidence that their searches have been exhaustive. Databases may focus on single data types and not store or reference other data assets, though they may be relevant for the taxon of interest. Additionally, our survey of the genomics community indicated that researchers are less likely to trust data with inadequately evidenced provenance metadata. This means that genomics data are hard to find and are often untrusted. Moreover, even once found, the data are in formats that do not interoperate with occurrence and ecological datasets, such as those housed in the Atlas of Living Australia.
We built the Australian Reference Genome Atlas (ARGA) to overcome the barriers faced by researchers in finding and collating genomics data for Australia’s species, and we have built it so that researchers can search for data within taxonomically accepted contexts and defined intersections and conjunctions with verified and expert ecological datasets. Using a series of ingestion scripts, the ARGA data team has implemented new and customised data mappings that effectively integrate genomics data, ecological traits, and occurrence data within an extended Darwin Core Event framework (GBIF 2018). Here, we will demonstrate how the architecture we derived for ARGA application works, and how it can be extended as new data sources emerge. We then demonstrate how our flexible model can be used to:
locate genomics data for taxa of interest;
explore data within an ecological context; and
calculate metrics for data availability for provincial bioregions.</abstract><cop>Sofia</cop><pub>Pensoft Publishers</pub><doi>10.3897/biss.7.112129</doi><orcidid>https://orcid.org/0000-0002-8785-4513</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2535-0897 |
ispartof | Biodiversity Information Science and Standards, 2023-09, Vol.7 |
issn | 2535-0897 2535-0897 |
language | eng |
recordid | cdi_proquest_journals_2861714389 |
source | Pensoft Open Access Journals; EZB-FREE-00999 freely available EZB journals |
subjects | Biotechnology; Crop improvement; Datasets; Evolutionary conservation; Genomes; Genomics; Phylogeny; Provenance; Researchers |
title | The Australian Reference Genome Atlas (ARGA): Finding, sharing and reusing Australian genomics data in an occurrence-driven context |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T03%3A07%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20Australian%20Reference%20Genome%20Atlas%20(ARGA):%20Finding,%20sharing%20and%20reusing%20Australian%20genomics%20data%20in%20an%20occurrence-driven%20context&rft.jtitle=Biodiversity%20Information%20Science%20and%20Standards&rft.au=Hall,%20Kathryn&rft.date=2023-09-06&rft.volume=7&rft.issn=2535-0897&rft.eissn=2535-0897&rft_id=info:doi/10.3897/biss.7.112129&rft_dat=%3Cproquest_cross%3E2861714389%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2861714389&rft_id=info:pmid/&rfr_iscdi=true |
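
The abstract describes ingestion scripts that map genomics records into an extended Darwin Core Event framework and then compute data-availability metrics per bioregion. The record gives no implementation details, so the following is only a minimal sketch of that idea under stated assumptions: the source rows, the `region` extension field, the species checklist, and both function names are hypothetical and are not ARGA's actual schema or code. Only `eventID`, `scientificName`, `eventDate`, and `associatedSequences` are real Darwin Core terms.

```python
from collections import defaultdict

# Hypothetical source rows; field names are illustrative, not ARGA's schema.
SOURCE_ROWS = [
    {"accession": "ACC-001", "species": "Species a", "region": "Region 1", "collected": "2020-01-15"},
    {"accession": "ACC-002", "species": "Species a", "region": "Region 2", "collected": "2021-03-02"},
    {"accession": "ACC-003", "species": "Species b", "region": "Region 1", "collected": "2019-11-30"},
]

# Hypothetical checklist of species known from each region (e.g. from occurrence data).
SPECIES_BY_REGION = {
    "Region 1": {"Species a", "Species b", "Species c", "Species d"},
    "Region 2": {"Species a", "Species e"},
}


def to_event_record(row):
    """Map one source row onto Darwin Core-style terms."""
    return {
        "eventID": f"event:{row['accession']}",
        "scientificName": row["species"],
        "eventDate": row["collected"],
        "associatedSequences": row["accession"],
        # Assumption: a custom extension field, not a Darwin Core term.
        "region": row["region"],
    }


def coverage_by_region(records, species_by_region):
    """Fraction of each region's listed species with at least one genomic record."""
    seen = defaultdict(set)
    for rec in records:
        seen[rec["region"]].add(rec["scientificName"])
    return {
        region: len(seen[region] & species) / len(species)
        for region, species in species_by_region.items()
    }


records = [to_event_record(row) for row in SOURCE_ROWS]
coverage = coverage_by_region(records, SPECIES_BY_REGION)
```

Keeping the mapping step separate from the metric step means a new data source would, in this sketch, only need its own `to_event_record`-style mapping; the coverage calculation works on the shared record shape.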