Analysis of variance for genomic sequences in unbalanced designs


Bibliographic details
Published in: Brazilian Journal of Probability and Statistics, 2007-12, Vol. 21 (2), pp. 203-223
Authors: Pinheiro, Hildete P.; de Souza, Roberta; Pinheiro, Aluísio S.; da Silva, Cibele Q.; dos Reis, Sérgio F.
Format: Article
Language: English
Description
Abstract: In studies of genetic divergence among organisms, the analysis is generally carried out directly on the DNA molecule. The outcome at each position is therefore categorical, taking one of four values (the nucleotides A, C, G and T). Light and Margolin (1971) developed an analysis of variance for categorical data (CATANOVA), and Pinheiro et al. (2000) employed a similar measure of variation and extended the CATANOVA procedure to several positions in the DNA sequence for balanced designs. Here we consider a methodology for multiple-category data with a different number of sample units (i.e., sequences) in each group, that is, for unbalanced sampling designs. To test the null hypothesis of homogeneity among groups, the asymptotic distribution of the test statistic is derived and its power is evaluated. An application to real data is illustrated using resampling methods to generate the empirical distribution of the test statistic, and a simulation study is performed to evaluate the asymptotic behavior of the distribution of the test statistic.
ISSN: 0103-0752, 2317-6199
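The Python sketch below illustrates the kind of computation the abstract describes. It is not the authors' statistic or their asymptotic test. For one aligned site it computes the Gini-type decomposition used by Light and Margolin: TSS = n/2 - (1/(2n)) * sum_i n_i^2, WSS = n/2 - (1/2) * sum_k (1/n_k) * sum_i n_ki^2, and BSS = TSS - WSS. Here n_ki is the number of sequences in group k carrying category i at that site, n_k is the (possibly unequal) size of group k, n_i is the pooled count of category i, and n is the total number of sequences. Two further choices are assumptions for illustration only: the multi-site statistic simply sums BSS over sites, and its null distribution is approximated by permuting group labels, which keeps the unbalanced group sizes fixed. The function names (site_variation, between_groups_statistic, permutation_pvalue) are hypothetical. Sequences are assumed aligned and of equal length, and any symbol, including gaps or N, is treated as its own category.

```python
import random
from collections import Counter


def site_variation(column, labels):
    """Gini-type CATANOVA decomposition for one aligned site.

    column: symbol observed at this site in each sequence.
    labels: group label of each sequence; group sizes may differ.
    Returns (BSS, TSS), with BSS = TSS - WSS.
    """
    n = len(column)
    pooled = Counter(column)  # pooled category counts n_i
    tss = n / 2 - sum(c * c for c in pooled.values()) / (2 * n)

    # Category counts n_ki within each group k.
    by_group = {}
    for symbol, group in zip(column, labels):
        by_group.setdefault(group, Counter())[symbol] += 1
    wss = n / 2 - sum(
        sum(c * c for c in counts.values()) / sum(counts.values())  # (1/n_k) * sum_i n_ki^2
        for counts in by_group.values()
    ) / 2
    return tss - wss, tss


def between_groups_statistic(seqs, labels):
    """Hypothetical multi-site statistic: BSS summed over all aligned sites."""
    length = len(seqs[0])
    return sum(
        site_variation([s[j] for s in seqs], labels)[0] for j in range(length)
    )


def permutation_pvalue(seqs, labels, n_perm=999, seed=0):
    """Monte Carlo p-value obtained by shuffling group labels.

    Shuffling keeps every group's size, so the unbalanced design is preserved.
    """
    rng = random.Random(seed)
    observed = between_groups_statistic(seqs, labels)
    shuffled = list(labels)  # copy, so the caller's labels are not modified
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point ties with the observed value count as hits.
        if between_groups_statistic(seqs, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

For example, permutation_pvalue(["ACGT", "ACGA", "TCGA"], ["g1", "g1", "g2"]) returns the observed summed BSS together with a Monte Carlo p-value. Because TSS depends only on the pooled counts, it does not change when labels are permuted, so ranking permutations by summed BSS is equivalent to ranking them by the ratio of summed BSS to summed TSS.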