Multivariate autoregressive model for a study of phylogenetic diversity
Published in: | Gene 2009-04, Vol.435 (1), p.104-118 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Abstract: | We present a computationally efficient model that parameterizes DNA sequences so as to describe their auto- and cross-correlation structure comprehensively. The approach is based on a four-channel multivariate autoregressive (MVAR) model. The model was applied to genes of the globin family from 6 vertebrate species. First, each sequence was coded as four signals (one per nucleotide), which were fitted to a four-channel MVAR model. The vectors of model coefficients were then calculated from the correlation matrices as functions of the nucleotide distance. Between-chromosome and inter-species differences were best distinguished by the cross-coefficients linking the signals of different nucleotides. For clustering, several metrics were tested and two clustering procedures (Nearest Neighbor and UPGMA) were then applied. Clustering trees and consensus trees were constructed for exons, introns and whole genes. The results agreed with the known dependencies between the chromosomes of the globin family. Orthologous genes from different species were grouped together, and within these groups phylogenetically close organisms were placed near one another. |
---|---|
ISSN: | 0378-1119, 1879-0038 |
DOI: | 10.1016/j.gene.2009.01.009 |
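
The pipeline summarized in the abstract (four indicator signals, a four-channel MVAR fit from correlation matrices, then a distance between coefficient vectors) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the model order, the use of raw indicator signals without mean removal, the Yule-Walker formulation and the Euclidean metric are all assumptions made here. Because the four indicator channels always sum to one, the Yule-Walker system can be singular, so the sketch uses a least-squares solve instead of a matrix inverse.

```python
import numpy as np

NUCLEOTIDES = "ACGT"  # channel order is an arbitrary choice for this sketch


def encode(seq):
    """Return a (4, N) array of 0/1 indicator signals, one row per nucleotide."""
    seq = seq.upper()
    return np.array([[1.0 if base == n else 0.0 for base in seq]
                     for n in NUCLEOTIDES])


def lagged_correlation(x, lag):
    """Estimate R(lag) = E[x_t x_{t-lag}^T] as a 4x4 matrix (lag >= 0)."""
    n = x.shape[1]
    return x[:, lag:] @ x[:, :n - lag].T / (n - lag)


def fit_mvar(x, order):
    """Fit x_t = sum_j A_j x_{t-j} + e_t via the Yule-Walker equations.

    Returns an array of shape (order, channels, channels); entry [j-1] is A_j,
    so element [j-1, a, b] couples channel b at distance j to channel a.
    """
    c = x.shape[0]
    r = [lagged_correlation(x, k) for k in range(order + 1)]

    def R(k):
        # R(-k) equals R(k) transposed
        return r[k] if k >= 0 else r[-k].T

    # Yule-Walker: [R(1) ... R(p)] = [A_1 ... A_p] G, block (j, k) of G is R(k - j)
    G = np.block([[R(k - j) for k in range(1, order + 1)]
                  for j in range(1, order + 1)])
    rhs = np.hstack([R(k) for k in range(1, order + 1)])
    # Least squares copes with the singular systems that indicator signals produce
    sol, *_ = np.linalg.lstsq(G.T, rhs.T, rcond=None)
    A = sol.T
    return np.stack([A[:, j * c:(j + 1) * c] for j in range(order)])


def mvar_features(seq, order=3):
    """Flatten all auto- and cross-coefficients into one feature vector."""
    return fit_mvar(encode(seq), order).ravel()


def distance(seq_a, seq_b, order=3):
    """Euclidean distance between coefficient vectors (one possible metric)."""
    diff = mvar_features(seq_a, order) - mvar_features(seq_b, order)
    return float(np.linalg.norm(diff))
```

A matrix of such pairwise distances could then be passed to a hierarchical clustering routine. In SciPy, for example, `scipy.cluster.hierarchy.linkage` with `method="average"` corresponds to UPGMA and `method="single"` to nearest-neighbor clustering.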