Analysis of variance for genomic sequences in unbalanced designs

In the study of genetic divergence among organisms, generally the analysis is done directly from the DNA molecule. Therefore, a possible outcome is categorical, being one out of four categories (looking at the nucleotide level). Light and Margolin (1971) developed an analysis of variance for categor...
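
The CATANOVA of Light and Margolin (1971) named in the abstract can be illustrated for a single nucleotide position. The block below is a minimal sketch, not the multi-position statistic of the article: it computes the Light-Margolin ratio R^2 = BSS/TSS for one categorical response with groups of unequal size, and approximates its null distribution by permuting group labels. The function names `catanova_r2` and `permutation_pvalue`, the toy data, and the choice of permutation resampling are illustrative assumptions.

```python
import numpy as np


def catanova_r2(labels, groups):
    """Light-Margolin R^2 = BSS/TSS for one categorical response.

    Assumes at least two distinct categories occur, so that TSS > 0.
    """
    labels = np.asarray(labels)
    groups = np.asarray(groups)
    n = labels.size
    cats = np.unique(labels)

    # Total variation: n/2 - (1/(2n)) * sum over categories of squared counts.
    total_counts = np.array([(labels == c).sum() for c in cats])
    tss = n / 2 - (total_counts ** 2).sum() / (2 * n)

    # Within-group variation: the same quantity computed inside each group
    # and summed; group sizes may differ (unbalanced design).
    wss = 0.0
    for g in np.unique(groups):
        sub = labels[groups == g]
        counts = np.array([(sub == c).sum() for c in cats])
        wss += sub.size / 2 - (counts ** 2).sum() / (2 * sub.size)

    return (tss - wss) / tss


def permutation_pvalue(labels, groups, n_perm=2000, seed=0):
    """Permutation p-value for homogeneity among groups (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    observed = catanova_r2(labels, groups)
    hits = 0
    for _ in range(n_perm):
        # Shuffling group membership mimics the null of no group effect.
        if catanova_r2(labels, rng.permutation(groups)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: one site, two groups of sizes 3 and 5 (unbalanced).
site = ["A", "A", "A", "C", "C", "C", "C", "C"]
group = [0, 0, 0, 1, 1, 1, 1, 1]
r2, p = permutation_pvalue(site, group, n_perm=500)
```

In the toy data the two groups carry different nucleotides throughout, so all variation lies between groups; data with identical composition in every group would give a ratio of zero.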

Bibliographic details
Published in: Brazilian journal of probability and statistics, 2007-12, Vol. 21 (2), p. 203-223
Main authors: Pinheiro, Hildete P., de Souza, Roberta, Pinheiro, Aluísio S., da Silva, Cibele Q., dos Reis, Sérgio F.
Format: Article
Language: English
Online access: Full text
container_end_page 223
container_issue 2
container_start_page 203
container_title Brazilian journal of probability and statistics
container_volume 21
creator Pinheiro, Hildete P.
de Souza, Roberta
Pinheiro, Aluísio S.
da Silva, Cibele Q.
dos Reis, Sérgio F.
description In the study of genetic divergence among organisms, generally the analysis is done directly from the DNA molecule. Therefore, a possible outcome is categorical, being one out of four categories (looking at the nucleotide level). Light and Margolin (1971) developed an analysis of variance for categorical data (CATANOVA) and Pinheiro et al. (2000) employed a similar measure of variation and extended the CATANOVA procedure taking into account several positions in the DNA sequence for balanced designs. Here we consider a methodology for multiple category data with a different number of sample units (i.e., sequences) in each group, that is, the sampling design is unbalanced. In order to test the null hypothesis of homogeneity among groups, the asymptotic distribution of the test statistic is derived and its power is evaluated. An application to real data is illustrated using resampling methods to generate the empirical distribution of the test statistic and a simulation study is performed to evaluate the asymptotic behavior of the distribution of the test statistic.
format Article
publisher Brazilian Statistical Association
rights Copyright ©2007. Brazilian Statistical Association
fulltext fulltext
identifier ISSN: 0103-0752
ispartof Brazilian journal of probability and statistics, 2007-12, Vol.21 (2), p.203-223
issn 0103-0752
2317-6199
language eng
recordid cdi_jstor_primary_43601099
source JSTOR Mathematics & Statistics; JSTOR Archive Collection A-Z Listing
subjects Analysis of variance
Cytochromes
Design analysis
Genomes
Genomics
Nucleotide sequences
Null hypothesis
Sample size
Statistical variance
Watersheds
title Analysis of variance for genomic sequences in unbalanced designs
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T17%3A57%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Analysis%20of%20variance%20for%20genomic%20sequences%20in%20unbalanced%20designs&rft.jtitle=Brazilian%20journal%20of%20probability%20and%20statistics&rft.au=Pinheiro,%20Hildete%20P.&rft.date=2007-12-01&rft.volume=21&rft.issue=2&rft.spage=203&rft.epage=223&rft.pages=203-223&rft.issn=0103-0752&rft.eissn=2317-6199&rft_id=info:doi/&rft_dat=%3Cjstor%3E43601099%3C/jstor%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_jstor_id=43601099&rfr_iscdi=true