Tracking Down Chimeric Assemblies In The TrackIt DNA Ladder Using Nanopore Sequencing
Main author: | |
---|---|
Format: | Dataset |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Eccles, David Andrew |
description | Dataset Description
These files represent two different LSK114 sequencing runs on a TrackIt 1kb Plus DNA Ladder sample, and associated data analysis.
July 20 2023 Flongle Run (191 Mb; 465k reads)
pod5_files_2023-Jul-20_DAE_DNA_Ladder.tar.gz - raw POD5 format files
called_2023-Jul-20_DAE_DNA_Ladder_duplex.bam - duplex called reads, called using Dorado v0.4.0 with the 2023-09-22 bacterial methylation model
sequence_QC_2023-Jul-20_DAE_DNA_Ladder.pdf - sequence length / quality QC plots
LAST_2023-Jul-20_DAE_DNA_Ladder_reads_vs_reference.tar.gz - alignment summary statistics from LAST mapping of reads to their associated reference
lengths_summary_2023-Jul-20_DAE_DNA_Ladder.txt - length / QC summary statistics
October 12 2023 P2 Solo Run (1.95 Gb; 1.11M reads)
pod5_files_2023-Oct-12_DNA-Ladder-1kbplus_fail.tar.gz - raw POD5 format files (all failed reads)
pod5_files_2023-Oct-12_DNA-Ladder-1kbplus_pass_000-059.tar.gz - raw POD5 format files (passed reads, bundle #000-059)
pod5_files_2023-Oct-12_DNA-Ladder-1kbplus_pass_060-119.tar.gz - raw POD5 format files (passed reads, bundle #060-119)
pod5_files_2023-Oct-12_DNA-Ladder-1kbplus_pass_120-179.tar.gz - raw POD5 format files (passed reads, bundle #120-179)
pod5_files_2023-Oct-12_DNA-Ladder-1kbplus_pass_180-222.tar.gz - raw POD5 format files (passed reads, bundle #180-222)
called_2023-Oct-12_DNA-Ladder-1kbplus_duplex.bam - duplex called reads [October 12, 2023], called using Dorado v0.4.0 with the 2023-09-22 bacterial methylation model
sequence_QC_2023-Oct-12_DNA-Ladder-1kbplus.pdf - sequence length / quality QC plots
LAST_2023-Oct-12_DNA-Ladder-1kbplus_reads_vs_reference.tar.gz - alignment summary statistics from LAST mapping of reads to their associated reference
lengths_summary_2023-Oct-12_DNA-Ladder-1kbplus.txt - length / QC summary statistics
ladder_seqs.fa - assembled DNA ladder sequences, based on simplex reads
Methods
Sample preparation
DNA was prepared for sequencing following the ONT Ligation Sequencing DNA V14 (SQK-LSK114) protocol, modified to omit DNA repair and to keep the sample in the same 1.5 ml tube throughout to reduce sample loss.
Tris-buffered Saline (TBS) buffer preparation
A 1 M NaCl stock was made by adding 2.922 g of NaCl to a 50 ml Falcon tube, then making up to 50 ml with MilliPore water
A 50 mM TBS stock was created by adding 750 μl of the 1 M NaCl solution to a 15 ml Falcon tube, then making up to 15 ml with Qiagen Elution Buffer (EB, i.e. 10 mM Tris-HCl at pH 8.0)
The pH was confirmed to be 7.9-8 |
doi_str_mv | 10.5281/zenodo.10020101 |
format | Dataset |
fullrecord | <record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_5281_zenodo_10020101</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_5281_zenodo_10020101</sourcerecordid><originalsourceid>FETCH-LOGICAL-d811-4659930362febcd2ce181d3a8aea79814aa15b9d55317bddff316976f56940e33</originalsourceid><addsrcrecordid>eNo1jz1vgzAURb10qJLOXf0HSPwwBjwi0g8klA4lM3rYj8ZKMKmhqtpf36ZJpytd3XOlw9g9iJWKc1h_kx_tuAIhYgECbtmuCWgOzr_xzfjpebl3AwVneDFNNHRHRxOvPG_2xP-G1cw324LXaC0FvpvO4Bb9eBoD8Vd6_yBvfrslu-nxONHdNReseXxoyueofnmqyqKObA4QJanSWgqZxj11xsaGIAcrMUfCTOeQIILqtFVKQtZZ2_cSUp2lvUp1IkjKBVtfbi3OaNxM7Sm4AcNXC6I9-7YX3_bfV_4AqGZPHw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>Tracking Down Chimeric Assemblies In The TrackIt DNA Ladder Using Nanopore Sequencing</title><source>DataCite</source><creator>Eccles, David Andrew</creator><creatorcontrib>Eccles, David Andrew</creatorcontrib><description>Dataset Description
These files represent two different LSK114 sequencing runs on a TrackIt 1kb Plus DNA Ladder sample, and associated data analysis.
July 20 2023 Flongle Run (191 Mb; 465k reads)
pod5_files_2023-Jul-20_DAE_DNA_Ladder.tar.gz - raw POD5 format files
called_2023-Jul-20_DAE_DNA_Ladder_duplex.bam - duplex called reads, called using Dorado v0.4.0 with the 2023-09-22 bacterial methylation model
sequence_QC_2023-Jul-20_DAE_DNA_Ladder.pdf - sequence length / quality QC plots
LAST_2023-Jul-20_DAE_DNA_Ladder_reads_vs_reference.tar.gz - alignment summary statistics from LAST mapping of reads to their associated reference
lengths_summary_2023-Jul-20_DAE_DNA_Ladder.txt - length / QC summary statistics
October 12 2023 P2 Solo Run (1.95 Gb; 1.11M reads)
pod5_files_2023-Oct-12_DNA-Ladder-1kbplus_fail.tar.gz - raw POD5 format files (all failed reads)
pod5_files_2023-Oct-12_DNA-Ladder-1kbplus_pass_000-059.tar.gz - raw POD5 format files (passed reads, bundle #000-059)
pod5_files_2023-Oct-12_DNA-Ladder-1kbplus_pass_060-119.tar.gz - raw POD5 format files (passed reads, bundle #060-119)
pod5_files_2023-Oct-12_DNA-Ladder-1kbplus_pass_120-179.tar.gz - raw POD5 format files (passed reads, bundle #120-179)
pod5_files_2023-Oct-12_DNA-Ladder-1kbplus_pass_180-222.tar.gz - raw POD5 format files (passed reads, bundle #180-222)
called_2023-Oct-12_DNA-Ladder-1kbplus_duplex.bam - duplex called reads [October 12, 2023], called using Dorado v0.4.0 with the 2023-09-22 bacterial methylation model
sequence_QC_2023-Oct-12_DNA-Ladder-1kbplus.pdf - sequence length / quality QC plots
LAST_2023-Oct-12_DNA-Ladder-1kbplus_reads_vs_reference.tar.gz - alignment summary statistics from LAST mapping of reads to their associated reference
lengths_summary_2023-Oct-12_DNA-Ladder-1kbplus.txt - length / QC summary statistics
ladder_seqs.fa - assembled DNA ladder sequences, based on simplex reads
Methods
Sample preparation
DNA was prepared for sequencing following the ONT Ligation Sequencing DNA V14 (SQK-LSK114) protocol, modified to omit DNA repair and to keep the sample in the same 1.5 ml tube throughout to reduce sample loss.
Tris-buffered Saline (TBS) buffer preparation
A 1 M NaCl stock was made by adding 2.922 g of NaCl to a 50 ml Falcon tube, then making up to 50 ml with MilliPore water
A 50 mM TBS stock was created by adding 750 μl of the 1 M NaCl solution to a 15 ml Falcon tube, then making up to 15 ml with Qiagen Elution Buffer (EB, i.e. 10 mM Tris-HCl at pH 8.0)
The pH was confirmed to be 7.9-8.1 using a pH indicator strip (e.g. MColorpHast 6.5 - 10.0; MER1095430001)
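The stock concentrations in the buffer preparation above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original record: the NaCl molar mass (58.44 g/mol) is a standard value that the description does not state, and the function names are illustrative only.

```python
NACL_MOLAR_MASS_G_PER_MOL = 58.44  # standard value; an assumption, not given in the description

def molarity(mass_g, molar_mass_g_per_mol, volume_l):
    """Concentration (mol/l) of a solute dissolved and made up to volume_l."""
    return mass_g / molar_mass_g_per_mol / volume_l

def diluted(stock_conc, stock_volume_l, final_volume_l):
    """Concentration after dilution: C2 = C1 * V1 / V2."""
    return stock_conc * stock_volume_l / final_volume_l

# 2.922 g NaCl made up to 50 ml
nacl_stock_molar = molarity(2.922, NACL_MOLAR_MASS_G_PER_MOL, 0.050)
# 750 ul of that stock made up to 15 ml with EB
tbs_nacl_molar = diluted(nacl_stock_molar, 750e-6, 0.015)
# The remaining 14.25 ml is EB (10 mM Tris-HCl), so Tris is diluted slightly
tbs_tris_millimolar = diluted(10.0, 0.015 - 750e-6, 0.015)

print(f"NaCl stock: {nacl_stock_molar:.3f} M")
print(f"NaCl in TBS: {tbs_nacl_molar * 1000:.1f} mM")
print(f"Tris in TBS: {tbs_tris_millimolar:.2f} mM")
```

Under these assumptions the stock comes out at 1 M and the TBS at 50 mM NaCl, with Tris at about 9.5 mM.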
End prep
1 μg of DNA ladder (i.e. 10 μl of 0.1 μg/μl DNA ladder) was transferred into a 1.5 ml Eppendorf DNA LoBind tube
The volume was topped up to 43.5 μl with TBS (i.e. 33.5 μl TBS)
3.5 μl Ultra II End-prep Reaction Buffer and 3 μl Ultra II End-prep Enzyme Mix were added
After mixing by gentle pipetting, the mixture was incubated at room temperature (RT) for 5 minutes, then at 65 °C for 5 minutes
Bead cleanup
The mixture was combined with 60 μl Ampure XP beads, and incubated on a rotator mixer at RT for 5 minutes
The tube was transferred to a magnetic rack [https://www.printables.com/model/532085-open-walled-magnetic-rack]
After the supernatant became clear and colourless, it was pipetted off
The magnetic beads were washed twice with 150 μl of an 80% ethanol solution
The sample was dried briefly for 30 s, then eluted in 60 μl TBS
Adapter ligation and final bead cleanup
To the sample tube were added 25 μl ONT Ligation Buffer (LNB), 5 μl NEBNext Quick T4 DNA Ligase (reduced from the protocol-suggested 10 μl because that was all that was left in the tube), and 5 μl ONT Ligation Adapter (LA)
The tube was mixed by gentle pipetting, spun down for 1-3 s on a mini centrifuge, then incubated for 10 minutes at RT
The mixture was combined with 40 μl Ampure XP beads (100 μl for the Flongle sample), and incubated on a rotator mixer at RT for 5 minutes
The tube was transferred to a magnetic rack [https://www.printables.com/model/532085-open-walled-magnetic-rack]
After the supernatant became clear and colourless, it was pipetted off
The magnetic beads were washed twice with 250 μl of ONT Long Fragment Buffer (LFB) for the P2 Solo run, or 250 μl of ONT Short Fragment Buffer (SFB) for the Flongle run
The sample was dried briefly for 30 s, then eluted for 10 minutes at 37 °C in 15 μl ONT Elution Buffer (EB)
Addition of sequencing library buffers
A flow cell was prepared by flushing with ONT Flow Cell Flush (FCF) mixed with ONT Flow Cell Tether (FCT). For the P2 Solo, I used 500 μl of a 1170 μl FCF solution that had 30 μl FCT added to it; for the Flongle, I used 60 μl of a 117 μl FCF solution that had 3 μl FCT added to it
1 μl of the eluted library was quantified on a Quantus Fluorometer, and approximately 50 fmol (assuming 1 kb average length) was transferred to a new 1.5 ml tube
For the P2 Solo run, the volume was topped up to 32 μl with TBS; for the Flongle run, the volume was topped up to 12 μl with TBS
To the sample tube were added ONT Sequencing Buffer (SB; P2 Solo: 100 μl; Flongle: 30 μl) and ONT Library Beads (LIB; P2 Solo: 68 μl; Flongle: 20 μl)
The flow cell was re-flushed with additional FCF/FCT mixture (500 μl for the P2 Solo; 30 μl for the Flongle)
The sequencing library was then added to the flow cell (200 μl for the P2 Solo; 30 μl for the Flongle)
The prepared flow cell was left for 10 minutes to allow the library to settle before starting sequencing
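The 50 fmol target and the library mix volumes above can be worked through as follows. This is a rough sketch: the average mass of 650 g/mol per base pair of double-stranded DNA is a common approximation that is assumed here rather than taken from the description, and the function names are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 650.0  # assumed approximation for dsDNA; not stated in the description

def fmol_to_ng(fmol, length_bp, bp_mass=AVG_BP_MASS_G_PER_MOL):
    """Mass in ng of `fmol` femtomoles of dsDNA fragments of length_bp."""
    grams = fmol * 1e-15 * length_bp * bp_mass
    return grams * 1e9

def mix_volume_ul(library_ul, sequencing_buffer_ul, library_beads_ul):
    """Total volume of library plus SB plus LIB."""
    return library_ul + sequencing_buffer_ul + library_beads_ul

target_ng = fmol_to_ng(50, 1000)             # 50 fmol at 1 kb average length
p2_solo_mix_ul = mix_volume_ul(32, 100, 68)  # volumes listed for the P2 Solo
flongle_mix_ul = mix_volume_ul(12, 30, 20)   # volumes listed for the Flongle

print(f"50 fmol of 1 kb DNA is about {target_ng:.1f} ng")
print(f"P2 Solo mix: {p2_solo_mix_ul} ul; Flongle mix: {flongle_mix_ul} ul")
```

With the listed volumes, the P2 Solo mix totals the 200 μl that was loaded, while the Flongle mix totals 62 μl, of which 30 μl was loaded.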
DNA Sequencing and basecalling
Sequencing was carried out using MinKNOW v23.04.6, sequencing in fast mode at 400 bases per second with a 20bp minimum sequence length and 5 kHz sampling rate, with reads output as POD5 files
The Flongle flow cell was run for a full standard run length (24h), whereas the PromethION flow cell was run for 1.5 hours (after which the counts of 15kb reads exceeded 200)
Sequenced reads were re-called in standard (simplex) mode using Dorado v0.4.0 with the 2023-09-22 bacterial methylation model [res_dna_r10.4.1_e8.2_400bps_sup@2023-09-22_bacterial-methylation]
Bioinformatics Analysis of Ladder Sequences
Sequence assembly
Assembly process for bands that are 3k in length and greater (done on LFB-depleted P2 Solo sequences):
Filter >q20 reads for a 100bp region around the target length (e.g. 4950-5050bp for the 5k band) [High quality reads were not sufficient for the 15kb band; all reads were needed]
Chop the reads up with a 1000bp overlap (e.g. 3000bp for the 5k band). This works around a Canu expectation that any read overlaps should be less than X% of the read.
Assemble the reads with Canu v2.2 [#REF], treating them as "pacbio" reads (for correction and homopolymer compression), with the GenomeSize parameter set to the expected band length (e.g. GenomeSize=5000).
Extract the first reported assembled contig.
Map the contig to the nanopore adapter sequences, and trim to exclude any matching sequence.
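The filtering and chopping steps above could look something like the following sketch. It is not the author's script: it assumes each read already carries a mean quality score, and it interprets "3000bp for the 5k band" as a chunk length of 3000 bp with neighbouring chunks sharing 1000 bp. All names are hypothetical.

```python
def in_length_window(length, target, half_width=50):
    """True when length is within target +/- half_width (a 100 bp window)."""
    return half_width >= abs(length - target)

def filter_reads(reads, target, min_quality=20.0):
    """Keep (name, sequence, mean_quality) records above min_quality and inside the window."""
    return [r for r in reads
            if r[2] > min_quality and in_length_window(len(r[1]), target)]

def chop(sequence, chunk_len, overlap):
    """Split a sequence into chunks of chunk_len that overlap by `overlap` bases."""
    if overlap >= chunk_len:
        raise ValueError("overlap must be smaller than chunk_len")
    step = chunk_len - overlap
    last_start = max(len(sequence) - overlap, 1)
    return [sequence[start:start + chunk_len]
            for start in range(0, last_start, step)]

# Made-up reads: only r1 is both high quality and close to 5000 bp
reads = [
    ("r1", "ACGT" * 1250, 22.5),
    ("r2", "ACGT" * 1250, 15.0),
    ("r3", "ACGT" * 1000, 25.0),
]
kept = filter_reads(reads, target=5000)
chunks = [chunk for _, seq, _ in kept
          for chunk in chop(seq, chunk_len=3000, overlap=1000)]
print([name for name, _, _ in kept])
print([len(chunk) for chunk in chunks])
```

For a 5000 bp read this yields two 3000 bp chunks covering positions 0-3000 and 2000-5000.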
Assembly process for bands under 3k in length (done on LFB-depleted P2 Solo sequences) [Canu has a default genome size and read length cutoff of 1kb, and performs poorly on sequences shorter than this]:
Filter >q20 reads for a 100bp region around the target length (e.g. 4950-5050bp for the 5k band) [High quality reads were not in sufficient abundance for the 100bp band; all reads were needed]
Assemble using a kmer-based de Bruijn assembler, trimming off low-count kmers
Extract the first reported trimmed assembled chain
Map the assembled chain to the nanopore adapter sequences, and trim to exclude any matching sequence
Use web BLASTn [#REF] to help trim any additional trailing non-matching sequence
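The description does not name the kmer-based assembler, so the following is only a toy illustration of the general idea: count kmers, discard those seen fewer than a threshold number of times, then walk the remaining kmers into a single chain. The function names, the made-up sequences, and the greedy walk are all assumptions for illustration.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every kmer of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def trim_low_count(counts, min_count):
    """Keep only kmers seen at least min_count times (likely error-free)."""
    return {kmer for kmer, n in counts.items() if n >= min_count}

def extend_chain(kmers, seed):
    """Greedily extend seed in both directions while exactly one unused kmer fits."""
    k = len(seed)
    chain, used = seed, {seed}
    while True:  # extend to the right
        suffix = chain[len(chain) - k + 1:]
        options = [suffix + b for b in "ACGT"
                   if suffix + b in kmers and suffix + b not in used]
        if len(options) != 1:
            break
        used.add(options[0])
        chain += options[0][-1]
    while True:  # extend to the left
        prefix = chain[:k - 1]
        options = [b + prefix for b in "ACGT"
                   if b + prefix in kmers and b + prefix not in used]
        if len(options) != 1:
            break
        used.add(options[0])
        chain = options[0][0] + chain
    return chain

reference = "ATGGCGTACCTGAAC"                   # made-up sequence
reads = [reference] * 3 + ["ATGGCGTTCCTGAAC"]   # last read carries one error
counts = count_kmers(reads, k=5)
kept = trim_low_count(counts, min_count=2)
seed = max(kept, key=lambda kmer: counts[kmer])
chain = extend_chain(kept, seed)
print(chain)
```

In this toy case the kmers that span the single sequencing error occur only once, so trimming removes them and the chain reconstructs the error-free sequence.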
Mapping
Use a kmer-based lightweight mapper to map reads to assembled bands
Created LAST mismatch matrix using `last-train` on the 5k reads together, using the full assembled ladder sequences as a reference: `last-train -Q 1 ladder_seqs.fa 5k_reads.fq.gz`
Mapped all reads to the assembled ladder sequences (only the reference corresponding to the most likely band source), retaining (for each read) the mapping that had the longest combined proportion of read and reference sequence mapped:lastal -p bacterial.mat -P 10 ladder_seqs.fa reads_2023-Oct-12_DNA-Ladder-1kbplus_called_all.fq.gz | \ ~/scripts/maf2csv.pl | \ awk -F ',' '{print $0","($8/100 * $13/100)}' | \ sort -t ',' -k 16rg,16 | sort -t ',' -k 1,1 -u | sort -t ',' -k 1r,1 | \ perl -pe 's/,[^,]*$/\n/' > LAST_reads_vs_ladder_longestMatch.csv.gz</description><identifier>DOI: 10.5281/zenodo.10020101</identifier><language>eng</language><publisher>Zenodo</publisher><subject>DNA Ladder ; dorado ; duplex ; Flongle ; nanopore ; P2 Solo ; Sequencing</subject><creationdate>2023</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0003-4634-4995</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,1887</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.5281/zenodo.10020101$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Eccles, David Andrew</creatorcontrib><title>Tracking Down Chimeric Assemblies In The TrackIt DNA Ladder Using Nanopore Sequencing</title><subject>DNA Ladder</subject><subject>dorado</subject><subject>duplex</subject><subject>Flongle</subject><subject>nanopore</subject><subject>P2 Solo</subject><subject>Sequencing</subject><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2023</creationdate><recordtype>dataset</recordtype><sourceid>PQ8</sourceid><recordid>eNo1jz1vgzAURb10qJLOXf0HSPwwBjwi0g8klA4lM3rYj8ZKMKmhqtpf36ZJpytd3XOlw9g9iJWKc1h_kx_tuAIhYgECbtmuCWgOzr_xzfjpebl3AwVneDFNNHRHRxOvPG_2xP-G1cw324LXaC0FvpvO4Bb9eBoD8Vd6_yBvfrslu-nxONHdNReseXxoyueofnmqyqKObA4QJanSWgqZxj11xsaGIAcrMUfCTOeQIILqtFVKQtZZ2_cSUp2lvUp1IkjKBVtfbi3OaNxM7Sm4AcNXC6I9-7YX3_bfV_4AqGZPHw</recordid><startdate>20231019</startdate><enddate>20231019</enddate><creator>Eccles, David Andrew</creator><general>Zenodo</general><scope>DYCCY</scope><scope>PQ8</scope><orcidid>https://orcid.org/0000-0003-4634-4995</orcidid></search><sort><creationdate>20231019</creationdate><title>Tracking Down Chimeric Assemblies In The TrackIt DNA Ladder Using Nanopore Sequencing</title><author>Eccles, David Andrew</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-d811-4659930362febcd2ce181d3a8aea79814aa15b9d55317bddff316976f56940e33</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>eng</language><creationdate>2023</creationdate><topic>DNA 
Ladder</topic><topic>dorado</topic><topic>duplex</topic><topic>Flongle</topic><topic>nanopore</topic><topic>P2 Solo</topic><topic>Sequencing</topic><toplevel>online_resources</toplevel><creatorcontrib>Eccles, David Andrew</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Eccles, David Andrew</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>Tracking Down Chimeric Assemblies In The TrackIt DNA Ladder Using Nanopore Sequencing</title><date>2023-10-19</date><risdate>2023</risdate><pub>Zenodo</pub><doi>10.5281/zenodo.10020101</doi><orcidid>https://orcid.org/0000-0003-4634-4995</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.5281/zenodo.10020101 |
ispartof | |
issn | |
language | eng |
recordid | cdi_datacite_primary_10_5281_zenodo_10020101 |
source | DataCite |
subjects | DNA Ladder dorado duplex Flongle nanopore P2 Solo Sequencing |
title | Tracking Down Chimeric Assemblies In The TrackIt DNA Ladder Using Nanopore Sequencing |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T16%3A45%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Eccles,%20David%20Andrew&rft.date=2023-10-19&rft_id=info:doi/10.5281/zenodo.10020101&rft_dat=%3Cdatacite_PQ8%3E10_5281_zenodo_10020101%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |