How challenging RADseq data turned out to favor coalescent-based species tree inference. A case study in Aichryson (Crassulaceae)

Bibliographic details
Published in: Molecular Phylogenetics and Evolution, 2022-02, Vol. 167, Article 107342
Main authors: Hühn, Philipp; Dillenberger, Markus S.; Gerschwitz-Eidt, Michael; Hörandl, Elvira; Los, Jessica A.; Messerschmid, Thibaud F.E.; Paetzold, Claudia; Rieger, Benjamin; Kadereit, Gudrun
Format: Article
Language: English
Online access: Full text
Description
Summary:
• Modified RADseq protocol yields a strongly reduced number of length-extended loci.
• Evaluation of assembly metrics eases clustering threshold selection using ipyrad.
• Locus filtering by length facilitates detection of biased data.
• Dataset reduction improves overall data quality.
• Informative RADseq loci support coalescent-based phylogenetic inference with ASTRAL.

Analysing multiple genomic regions while incorporating detection and qualification of discordance among regions has become standard for understanding phylogenetic relationships. In plants, which usually have comparatively large genomes, this is made feasible by combining reduced-representation library (RRL) methods with high-throughput sequencing, which enables the cost-effective acquisition of genomic data for thousands of loci from hundreds of samples. One popular RRL method is RADseq. A major disadvantage of established RADseq approaches is the rather short fragment and sequencing range, which yields loci that individually carry little phylogenetic information. This issue hampers the application of coalescent-based species tree inference. The modified RADseq protocol presented here, which targets ca. 5,000 loci of 300-600 nt length sequenced with the latest short-read sequencing (SRS) technology, has the potential to overcome this drawback. To illustrate the advantages of this approach we use the study group Aichryson Webb & Berthelot (Crassulaceae), a plant genus that diversified on the Canary Islands. The data analysis approach used here aims at careful quality control of the long-locus dataset. It involves an informed selection of thresholds for accurate clustering; a thorough exploration of locus properties, such as locus length, coverage and variability, to identify potentially biased data; and a comparative phylogenetic inference of filtered datasets, accompanied by an evaluation of the resulting bootstrap (BS) support and gene and site concordance factor values, to improve the overall resolution of the resulting phylogenetic trees. The final dataset contains variable loci with an average length of 373 nt and facilitates species tree estimation using a coalescent-based summary approach. Additional improvements brought by the approach are critically discussed.
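
The locus filtering described in the summary (by length, coverage and variability) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Locus` class, the function names, and the `min_samples` and `min_variable` defaults are hypothetical choices made for illustration. Only the 300-600 nt length window is taken from the summary. Coverage is approximated here as the number of samples carrying the locus.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Locus:
    """One aligned locus: sample name -> aligned sequence (hypothetical container)."""
    name: str
    seqs: Dict[str, str]


def locus_length(locus: Locus) -> int:
    """Alignment length, taken from the first sequence (all are assumed equal length)."""
    return len(next(iter(locus.seqs.values())))


def variable_sites(locus: Locus) -> int:
    """Count alignment columns with more than one unambiguous base (A, C, G, T)."""
    count = 0
    for column in zip(*locus.seqs.values()):
        bases = {b.upper() for b in column if b.upper() in "ACGT"}
        if len(bases) > 1:
            count += 1
    return count


def filter_loci(
    loci: List[Locus],
    min_len: int = 300,      # lower bound of the target range given in the summary
    max_len: int = 600,      # upper bound of the target range given in the summary
    min_samples: int = 4,    # assumed coverage threshold, not from the source
    min_variable: int = 1,   # assumed: keep only loci with at least one variable site
) -> List[Locus]:
    """Keep loci that pass the length, sample coverage and variability thresholds."""
    kept = []
    for locus in loci:
        if not (min_len <= locus_length(locus) <= max_len):
            continue
        if len(locus.seqs) < min_samples:
            continue
        if variable_sites(locus) < min_variable:
            continue
        kept.append(locus)
    return kept
```

In practice the thresholds would be chosen after inspecting the distributions of locus length, coverage and variability, as the summary describes, rather than fixed in advance.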
ISSN: 1055-7903
EISSN: 1095-9513
DOI: 10.1016/j.ympev.2021.107342