Cannabis Pangenome Annotation Data
Abstract: Cannabis sativa is a globally significant seed-oil, fiber, and drug-producing plant species. However, a century of prohibition has severely restricted legal breeding and germplasm resource development, leaving potential hemp-based nutritional and fiber applications unrealized. Existing culti...
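Main authors: | Lynch, Ryan; Padgitt-Cobb, Lillian; Garfinkel, Andrea R.; Knaus, Brian; Hartwick, Nolan; Allsing, Nicholas; Aylward, Anthony; Mamerto, Allen; Kitony, Justine Kipruto; Colt, Kelly; Murray, Emily; Duong, Tiffany; Trippe, Aaron; Crawford, Seth; Vining, Kelly; Michael, Todd |
---|---|
Format: | Dataset |
Language: | eng |
Subjects: | |
Online access: | Order full text |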
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Lynch, Ryan; Padgitt-Cobb, Lillian; Garfinkel, Andrea R.; Knaus, Brian; Hartwick, Nolan; Allsing, Nicholas; Aylward, Anthony; Mamerto, Allen; Kitony, Justine Kipruto; Colt, Kelly; Murray, Emily; Duong, Tiffany; Trippe, Aaron; Crawford, Seth; Vining, Kelly; Michael, Todd |
description | Abstract: Cannabis sativa is a globally significant seed-oil, fiber, and drug-producing plant species. However, a century of prohibition has severely restricted legal breeding and germplasm resource development, leaving potential hemp-based nutritional and fiber applications unrealized. Existing cultivars are highly heterozygous and lack competitiveness in the overall fiber and grain markets, relegating hemp to less than 200,000 hectares globally [1]. The relaxation of drug laws in recent decades has generated widespread interest in expanding and reincorporating cannabis into agricultural systems, but progress has been impeded by the limited understanding of genomics and breeding potential. No studies to date have examined the genomic diversity and evolution of cannabis populations using haplotype-resolved, chromosome-scale assemblies from publicly available germplasm. Here we present a cannabis pangenome, constructed with 181 new and 12 previously released genomes from a total of 156 biological samples from both male (XY) and female (XX) plants, including 42 trio phased and 36 haplotype-resolved, chromosome-scale assemblies. We discovered widespread regions of the cannabis pangenome that are surprisingly diverse for a single species, with high levels of genetic and structural variation, and propose a novel population structure and hybridization history. Conversely, the cannabinoid synthase genes contain very low levels of diversity, despite being embedded within a variable region containing multiple pseudogenized paralogs and distinct transposable element arrangements. Additionally, we identified variants of acyl-lipid thioesterase (ALT) genes [2] that are associated with fatty acid chain length variation and the production of the rare cannabinoids, tetrahydrocannabinol varin (THCV) and cannabidiol varin (CBDV). We conclude the Cannabis sativa gene pool has only been partially characterized, and that the existence of wild relatives in Asia remains likely, while its potential as a crop species remains largely unrealized. References: 1. Nions, U. Commodities at a glance: Special issue on industrial hemp. Commod Glance (2023) doi:10.18356/9789210019958. 2. Pulsifer, I. P. et al. Acyl-lipid thioesterase1-4 from Arabidopsis thaliana form a novel family of fatty acyl-acyl carrier protein thioesterases with divergent expression patterns and substrate specificities. Plant Mol. Biol. 84, 549–563 (2014). Transposable element analysis: To identify transposable elements, we used the EDTA pip |
doi_str_mv | 10.25452/figshare.plus.25909024 |
format | Dataset |
fullrecord | <record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_25452_figshare_plus_25909024</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_25452_figshare_plus_25909024</sourcerecordid><originalsourceid>FETCH-datacite_primary_10_25452_figshare_plus_259090243</originalsourceid><addsrcrecordid>eNpjYJA3NNAzMjUxNdJPy0wvzkgsStUryCktBopZGlgaGJlwMig5J-blJSZlFisEJOalp-bl56YqOObl5ZcklmTm5ym4JJYk8jCwpiXmFKfyQmluBnM31xBnD90UoGRyZklqfEFRZm5iUWW8oUE82LZ4mG3xINviYbYZk68TAHy6Pyo</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>Cannabis Pangenome Annotation Data</title><source>DataCite</source><creator>Lynch, Ryan ; Padgitt-Cobb, Lillian ; Garfinkel, Andrea R. ; Knaus, Brian ; Hartwick, Nolan ; Allsing, Nicholas ; Aylward, Anthony ; Mamerto, Allen ; Kitony, Justine Kipruto ; Colt, Kelly ; Murray, Emily ; Duong, Tiffany ; Trippe, Aaron ; Crawford, Seth ; Vining, Kelly ; Michael, Todd</creator><creatorcontrib>Lynch, Ryan ; Padgitt-Cobb, Lillian ; Garfinkel, Andrea R. ; Knaus, Brian ; Hartwick, Nolan ; Allsing, Nicholas ; Aylward, Anthony ; Mamerto, Allen ; Kitony, Justine Kipruto ; Colt, Kelly ; Murray, Emily ; Duong, Tiffany ; Trippe, Aaron ; Crawford, Seth ; Vining, Kelly ; Michael, Todd</creatorcontrib><description>Abstract: Cannabis sativa is a globally significant seed-oil, fiber, and drug-producing plant species. However, a century of prohibition has severely restricted legal breeding and germplasm resource development, leaving potential hemp-based nutritional and fiber applications unrealized. Existing cultivars are highly heterozygous and lack competitiveness in the overall fiber and grain markets, relegating hemp to less than 200,000 hectares globally [1].
The relaxation of drug laws in recent decades has generated widespread interest in expanding and reincorporating cannabis into agricultural systems, but progress has been impeded by the limited understanding of genomics and breeding potential. No studies to date have examined the genomic diversity and evolution of cannabis populations using haplotype-resolved, chromosome-scale assemblies from publicly available germplasm. Here we present a cannabis pangenome, constructed with 181 new and 12 previously released genomes from a total of 156 biological samples from both male (XY) and female (XX) plants, including 42 trio phased and 36 haplotype-resolved, chromosome-scale assemblies. We discovered widespread regions of the cannabis pangenome that are surprisingly diverse for a single species, with high levels of genetic and structural variation, and propose a novel population structure and hybridization history. Conversely, the cannabinoid synthase genes contain very low levels of diversity, despite being embedded within a variable region containing multiple pseudogenized paralogs and distinct transposable element arrangements. Additionally, we identified variants of acyl-lipid thioesterase (ALT) genes [2] that are associated with fatty acid chain length variation and the production of the rare cannabinoids, tetrahydrocannabinol varin (THCV) and cannabidiol varin (CBDV). We conclude the Cannabis sativa gene pool has only been partially characterized, and that the existence of wild relatives in Asia remains likely, while its potential as a crop species remains largely unrealized. References: 1. Nions, U. Commodities at a glance: Special issue on industrial hemp. Commod Glance (2023) doi:10.18356/9789210019958. 2. Pulsifer, I. P. et al. Acyl-lipid thioesterase1-4 from Arabidopsis thaliana form a novel family of fatty acyl-acyl carrier protein thioesterases with divergent expression patterns and substrate specificities. Plant Mol. Biol.
84, 549–563 (2014). Transposable element analysis: To identify transposable elements, we used the EDTA pipeline with default settings. EDTAOutput.tar.gz includes EDTA transposon annotations for 78 scaffolded, chromosome-level cannabis genomes. Structural variation analysis: The 78 fully scaffolded assembly haplotypes were each aligned to the EH23a assembly using minimap2 (Li 2018). SyRI was then used to call structural variations on each alignment (Goel et al. 2019) and plotsr was used to visualize alignments and SVs (Goel and Schneeberger 2022). DUP_query_coord.bed.tar.gz includes duplications for 78 assemblies with EH23a as reference. INVTR_query_coord.bed.tar.gz includes inverted translocations for 78 assemblies with EH23a as reference. INVs_query_coord.bed.tar.gz includes inversions for 78 assemblies with EH23a as reference. TRANS_query_coord.bed.tar.gz includes translocations for 78 assemblies with EH23a as reference. csat_orientations.tsv is a scaffold orientation file for 78 assemblies with EH23a as reference.</description><identifier>DOI: 10.25452/figshare.plus.25909024</identifier><language>eng</language><publisher>Figshare</publisher><subject>Genomics and transcriptomics ; Horticultural crop improvement (incl.
selection and breeding) ; Plant cell and molecular biology</subject><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0003-3157-4990 ; 0000-0001-6272-2875 ; 0000-0003-1665-4343 ; 0000-0001-7916-7791 ; 0000-0003-1379-0614 ; 0009-0003-8297-4393 ; 0000-0002-1145-1646 ; 0000-0001-8913-4678 ; 0009-0004-3734-519X ; 0009-0004-9349-4325 ; 0000-0003-0355-9005 ; 0000-0003-2032-6078 ; 0000-0003-3524-856X ; 0009-0000-6061-4977</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,1893</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.25452/figshare.plus.25909024$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Lynch, Ryan</creatorcontrib><creatorcontrib>Padgitt-Cobb, Lillian</creatorcontrib><creatorcontrib>Garfinkel, Andrea R.</creatorcontrib><creatorcontrib>Knaus, Brian</creatorcontrib><creatorcontrib>Hartwick, Nolan</creatorcontrib><creatorcontrib>Allsing, Nicholas</creatorcontrib><creatorcontrib>Aylward, Anthony</creatorcontrib><creatorcontrib>Mamerto, Allen</creatorcontrib><creatorcontrib>Kitony, Justine Kipruto</creatorcontrib><creatorcontrib>Colt, Kelly</creatorcontrib><creatorcontrib>Murray, Emily</creatorcontrib><creatorcontrib>Duong, Tiffany</creatorcontrib><creatorcontrib>Trippe, Aaron</creatorcontrib><creatorcontrib>Crawford, Seth</creatorcontrib><creatorcontrib>Vining, Kelly</creatorcontrib><creatorcontrib>Michael, Todd</creatorcontrib><title>Cannabis Pangenome Annotation Data</title><description>Abstract: Cannabis sativa is a globally significant seed-oil, fiber, and drug-producing plant species.
However, a century of prohibition has severely restricted legal breeding and germplasm resource development, leaving potential hemp-based nutritional and fiber applications unrealized. Existing cultivars are highly heterozygous and lack competitiveness in the overall fiber and grain markets, relegating hemp to less than 200,000 hectares globally [1]. The relaxation of drug laws in recent decades has generated widespread interest in expanding and reincorporating cannabis into agricultural systems, but progress has been impeded by the limited understanding of genomics and breeding potential. No studies to date have examined the genomic diversity and evolution of cannabis populations using haplotype-resolved, chromosome-scale assemblies from publicly available germplasm. Here we present a cannabis pangenome, constructed with 181 new and 12 previously released genomes from a total of 156 biological samples from both male (XY) and female (XX) plants, including 42 trio phased and 36 haplotype-resolved, chromosome-scale assemblies. We discovered widespread regions of the cannabis pangenome that are surprisingly diverse for a single species, with high levels of genetic and structural variation, and propose a novel population structure and hybridization history. Conversely, the cannabinoid synthase genes contain very low levels of diversity, despite being embedded within a variable region containing multiple pseudogenized paralogs and distinct transposable element arrangements. Additionally, we identified variants of acyl-lipid thioesterase (ALT) genes [2] that are associated with fatty acid chain length variation and the production of the rare cannabinoids, tetrahydrocannabinol varin (THCV) and cannabidiol varin (CBDV). We conclude the Cannabis sativa gene pool has only been partially characterized, and that the existence of wild relatives in Asia remains likely, while its potential as a crop species remains largely unrealized. References: 1. Nions, U.
Commodities at a glance: Special issue on industrial hemp. Commod Glance (2023) doi:10.18356/9789210019958. 2. Pulsifer, I. P. et al. Acyl-lipid thioesterase1-4 from Arabidopsis thaliana form a novel family of fatty acyl-acyl carrier protein thioesterases with divergent expression patterns and substrate specificities. Plant Mol. Biol. 84, 549–563 (2014). Transposable element analysis: To identify transposable elements, we used the EDTA pipeline with default settings. EDTAOutput.tar.gz includes EDTA transposon annotations for 78 scaffolded, chromosome-level cannabis genomes. Structural variation analysis: The 78 fully scaffolded assembly haplotypes were each aligned to the EH23a assembly using minimap2 (Li 2018). SyRI was then used to call structural variations on each alignment (Goel et al. 2019) and plotsr was used to visualize alignments and SVs (Goel and Schneeberger 2022). DUP_query_coord.bed.tar.gz includes duplications for 78 assemblies with EH23a as reference. INVTR_query_coord.bed.tar.gz includes inverted translocations for 78 assemblies with EH23a as reference. INVs_query_coord.bed.tar.gz includes inversions for 78 assemblies with EH23a as reference. TRANS_query_coord.bed.tar.gz includes translocations for 78 assemblies with EH23a as reference. csat_orientations.tsv is a scaffold orientation file for 78 assemblies with EH23a as reference.</description><subject>Genomics and transcriptomics</subject><subject>Horticultural crop improvement (incl.
selection and breeding)</subject><subject>Plant cell and molecular biology</subject><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2024</creationdate><recordtype>dataset</recordtype><sourceid>PQ8</sourceid><recordid>eNpjYJA3NNAzMjUxNdJPy0wvzkgsStUryCktBopZGlgaGJlwMig5J-blJSZlFisEJOalp-bl56YqOObl5ZcklmTm5ym4JJYk8jCwpiXmFKfyQmluBnM31xBnD90UoGRyZklqfEFRZm5iUWW8oUE82LZ4mG3xINviYbYZk68TAHy6Pyo</recordid><startdate>20240530</startdate><enddate>20240530</enddate><creator>Lynch, Ryan</creator><creator>Padgitt-Cobb, Lillian</creator><creator>Garfinkel, Andrea R.</creator><creator>Knaus, Brian</creator><creator>Hartwick, Nolan</creator><creator>Allsing, Nicholas</creator><creator>Aylward, Anthony</creator><creator>Mamerto, Allen</creator><creator>Kitony, Justine Kipruto</creator><creator>Colt, Kelly</creator><creator>Murray, Emily</creator><creator>Duong, Tiffany</creator><creator>Trippe, Aaron</creator><creator>Crawford, Seth</creator><creator>Vining, Kelly</creator><creator>Michael, Todd</creator><general>Figshare</general><scope>DYCCY</scope><scope>PQ8</scope><orcidid>https://orcid.org/0000-0003-3157-4990</orcidid><orcidid>https://orcid.org/0000-0001-6272-2875</orcidid><orcidid>https://orcid.org/0000-0003-1665-4343</orcidid><orcidid>https://orcid.org/0000-0001-7916-7791</orcidid><orcidid>https://orcid.org/0000-0003-1379-0614</orcidid><orcidid>https://orcid.org/0009-0003-8297-4393</orcidid><orcidid>https://orcid.org/0000-0002-1145-1646</orcidid><orcidid>https://orcid.org/0000-0001-8913-4678</orcidid><orcidid>https://orcid.org/0009-0004-3734-519X</orcidid><orcidid>https://orcid.org/0009-0004-9349-4325</orcidid><orcidid>https://orcid.org/0000-0003-0355-9005</orcidid><orcidid>https://orcid.org/0000-0003-2032-6078</orcidid><orcidid>https://orcid.org/0000-0003-3524-856X</orcidid><orcidid>https://orcid.org/0009-0000-6061-4977</orcidid></search><sort><creationdate>20240530</creationdate><title>Cannabis Pangenome Annotation Data</title><author>Lynch, Ryan ; 
Padgitt-Cobb, Lillian ; Garfinkel, Andrea R. ; Knaus, Brian ; Hartwick, Nolan ; Allsing, Nicholas ; Aylward, Anthony ; Mamerto, Allen ; Kitony, Justine Kipruto ; Colt, Kelly ; Murray, Emily ; Duong, Tiffany ; Trippe, Aaron ; Crawford, Seth ; Vining, Kelly ; Michael, Todd</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-datacite_primary_10_25452_figshare_plus_259090243</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Genomics and transcriptomics</topic><topic>Horticultural crop improvement (incl. selection and breeding)</topic><topic>Plant cell and molecular biology</topic><toplevel>online_resources</toplevel><creatorcontrib>Lynch, Ryan</creatorcontrib><creatorcontrib>Padgitt-Cobb, Lillian</creatorcontrib><creatorcontrib>Garfinkel, Andrea R.</creatorcontrib><creatorcontrib>Knaus, Brian</creatorcontrib><creatorcontrib>Hartwick, Nolan</creatorcontrib><creatorcontrib>Allsing, Nicholas</creatorcontrib><creatorcontrib>Aylward, Anthony</creatorcontrib><creatorcontrib>Mamerto, Allen</creatorcontrib><creatorcontrib>Kitony, Justine Kipruto</creatorcontrib><creatorcontrib>Colt, Kelly</creatorcontrib><creatorcontrib>Murray, Emily</creatorcontrib><creatorcontrib>Duong, Tiffany</creatorcontrib><creatorcontrib>Trippe, Aaron</creatorcontrib><creatorcontrib>Crawford, Seth</creatorcontrib><creatorcontrib>Vining, Kelly</creatorcontrib><creatorcontrib>Michael, Todd</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lynch, Ryan</au><au>Padgitt-Cobb, Lillian</au><au>Garfinkel, Andrea R.</au><au>Knaus, Brian</au><au>Hartwick, Nolan</au><au>Allsing, Nicholas</au><au>Aylward, Anthony</au><au>Mamerto, Allen</au><au>Kitony, Justine Kipruto</au><au>Colt, Kelly</au><au>Murray, Emily</au><au>Duong, 
Tiffany</au><au>Trippe, Aaron</au><au>Crawford, Seth</au><au>Vining, Kelly</au><au>Michael, Todd</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>Cannabis Pangenome Annotation Data</title><date>2024-05-30</date><risdate>2024</risdate><abstract>Abstract: Cannabis sativa is a globally significant seed-oil, fiber, and drug-producing plant species. However, a century of prohibition has severely restricted legal breeding and germplasm resource development, leaving potential hemp-based nutritional and fiber applications unrealized. Existing cultivars are highly heterozygous and lack competitiveness in the overall fiber and grain markets, relegating hemp to less than 200,000 hectares globally [1]. The relaxation of drug laws in recent decades has generated widespread interest in expanding and reincorporating cannabis into agricultural systems, but progress has been impeded by the limited understanding of genomics and breeding potential. No studies to date have examined the genomic diversity and evolution of cannabis populations using haplotype-resolved, chromosome-scale assemblies from publicly available germplasm. Here we present a cannabis pangenome, constructed with 181 new and 12 previously released genomes from a total of 156 biological samples from both male (XY) and female (XX) plants, including 42 trio phased and 36 haplotype-resolved, chromosome-scale assemblies. We discovered widespread regions of the cannabis pangenome that are surprisingly diverse for a single species, with high levels of genetic and structural variation, and propose a novel population structure and hybridization history. Conversely, the cannabinoid synthase genes contain very low levels of diversity, despite being embedded within a variable region containing multiple pseudogenized paralogs and distinct transposable element arrangements.
Additionally, we identified variants of acyl-lipid thioesterase (ALT) genes [2] that are associated with fatty acid chain length variation and the production of the rare cannabinoids, tetrahydrocannabinol varin (THCV) and cannabidiol varin (CBDV). We conclude the Cannabis sativa gene pool has only been partially characterized, and that the existence of wild relatives in Asia remains likely, while its potential as a crop species remains largely unrealized. References: 1. Nions, U. Commodities at a glance: Special issue on industrial hemp. Commod Glance (2023) doi:10.18356/9789210019958. 2. Pulsifer, I. P. et al. Acyl-lipid thioesterase1-4 from Arabidopsis thaliana form a novel family of fatty acyl-acyl carrier protein thioesterases with divergent expression patterns and substrate specificities. Plant Mol. Biol. 84, 549–563 (2014). Transposable element analysis: To identify transposable elements, we used the EDTA pipeline with default settings. EDTAOutput.tar.gz includes EDTA transposon annotations for 78 scaffolded, chromosome-level cannabis genomes. Structural variation analysis: The 78 fully scaffolded assembly haplotypes were each aligned to the EH23a assembly using minimap2 (Li 2018). SyRI was then used to call structural variations on each alignment (Goel et al. 2019) and plotsr was used to visualize alignments and SVs (Goel and Schneeberger 2022).
DUP_query_coord.bed.tar.gz includes duplications for 78 assemblies with EH23a as reference. INVTR_query_coord.bed.tar.gz includes inverted translocations for 78 assemblies with EH23a as reference. INVs_query_coord.bed.tar.gz includes inversions for 78 assemblies with EH23a as reference. TRANS_query_coord.bed.tar.gz includes translocations for 78 assemblies with EH23a as reference. csat_orientations.tsv is a scaffold orientation file for 78 assemblies with EH23a as reference.</abstract><pub>Figshare</pub><doi>10.25452/figshare.plus.25909024</doi><orcidid>https://orcid.org/0000-0003-3157-4990</orcidid><orcidid>https://orcid.org/0000-0001-6272-2875</orcidid><orcidid>https://orcid.org/0000-0003-1665-4343</orcidid><orcidid>https://orcid.org/0000-0001-7916-7791</orcidid><orcidid>https://orcid.org/0000-0003-1379-0614</orcidid><orcidid>https://orcid.org/0009-0003-8297-4393</orcidid><orcidid>https://orcid.org/0000-0002-1145-1646</orcidid><orcidid>https://orcid.org/0000-0001-8913-4678</orcidid><orcidid>https://orcid.org/0009-0004-3734-519X</orcidid><orcidid>https://orcid.org/0009-0004-9349-4325</orcidid><orcidid>https://orcid.org/0000-0003-0355-9005</orcidid><orcidid>https://orcid.org/0000-0003-2032-6078</orcidid><orcidid>https://orcid.org/0000-0003-3524-856X</orcidid><orcidid>https://orcid.org/0009-0000-6061-4977</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.25452/figshare.plus.25909024 |
ispartof | |
issn | |
language | eng |
recordid | cdi_datacite_primary_10_25452_figshare_plus_25909024 |
source | DataCite |
subjects | Genomics and transcriptomics; Horticultural crop improvement (incl. selection and breeding); Plant cell and molecular biology |
title | Cannabis Pangenome Annotation Data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T08%3A13%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Lynch,%20Ryan&rft.date=2024-05-30&rft_id=info:doi/10.25452/figshare.plus.25909024&rft_dat=%3Cdatacite_PQ8%3E10_25452_figshare_plus_25909024%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
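
The record lists several `*_query_coord.bed.tar.gz` archives holding structural variant coordinates. The following is a minimal sketch of how such an archive might be summarized per chromosome. It assumes, without confirmation from the record, that each archive contains plain tab-separated BED files whose first three columns are sequence name, start, and end; the function names `summarize_bed_lines` and `summarize_bed_archive` are hypothetical helpers, not part of the dataset.

```python
import io
import tarfile
from collections import defaultdict


def summarize_bed_lines(lines):
    """Return {chrom: (interval_count, total_bp)} for BED-like text lines."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and optional BED header lines.
        if not line or line.startswith(("#", "track ", "browser ")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # too few columns to be a BED record
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            continue  # non-numeric coordinates, e.g. a column header row
        stats[fields[0]][0] += 1
        stats[fields[0]][1] += end - start  # BED intervals are half-open
    return {chrom: (count, bp) for chrom, (count, bp) in stats.items()}


def summarize_bed_archive(path):
    """Summarize every regular file inside a gzip-compressed tar archive.

    Assumption: every regular file in the archive is UTF-8 BED text.
    """
    summary = {}
    with tarfile.open(path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and links
            raw = archive.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="utf-8")
            summary[member.name] = summarize_bed_lines(text)
    return summary
```

Calling `summarize_bed_archive("DUP_query_coord.bed.tar.gz")` on a downloaded archive would, under the layout assumed above, return one entry per file with duplication counts and total duplicated base pairs per sequence. If the archives are organized differently, the member filter and column indices would need adjusting.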