Analysis of variance for genomic sequences in unbalanced designs
In the study of genetic divergence among organisms, generally the analysis is done directly from the DNA molecule. Therefore, a possible outcome is categorical, being one out of four categories (looking at the nucleotide level). Light and Margolin (1971) developed an analysis of variance for categor...
Published in: | Brazilian journal of probability and statistics 2007-12, Vol.21 (2), p.203-223 |
---|---|
Main authors: | Pinheiro, Hildete P.; de Souza, Roberta; Pinheiro, Aluísio S.; da Silva, Cibele Q.; dos Reis, Sérgio F. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 223 |
---|---|
container_issue | 2 |
container_start_page | 203 |
container_title | Brazilian journal of probability and statistics |
container_volume | 21 |
creator | Pinheiro, Hildete P.; de Souza, Roberta; Pinheiro, Aluísio S.; da Silva, Cibele Q.; dos Reis, Sérgio F. |
description | In the study of genetic divergence among organisms, generally the analysis is done directly from the DNA molecule. Therefore, a possible outcome is categorical, being one out of four categories (looking at the nucleotide level). Light and Margolin (1971) developed an analysis of variance for categorical data (CATANOVA) and Pinheiro et al. (2000) employed a similar measure of variation and extended the CATANOVA procedure taking into account several positions in the DNA sequence for balanced designs. Here we consider a methodology for multiple category data with a different number of sample units (i.e., sequences) in each group, that is, the sampling design is unbalanced. In order to test the null hypothesis of homogeneity among groups, the asymptotic distribution of the test statistic is derived and its power is evaluated. An application to real data is illustrated using resampling methods to generate the empirical distribution of the test statistic and a simulation study is performed to evaluate the asymptotic behavior of the distribution of the test statistic. |
format | Article |
publisher | Brazilian Statistical Association |
rights | Copyright ©2007. Brazilian Statistical Association |
linktohtml | https://www.jstor.org/stable/43601099 |
linktopdf | https://www.jstor.org/stable/pdf/43601099 |
fulltext | fulltext |
identifier | ISSN: 0103-0752 |
ispartof | Brazilian journal of probability and statistics, 2007-12, Vol.21 (2), p.203-223 |
issn | 0103-0752; 2317-6199 |
language | eng |
recordid | cdi_jstor_primary_43601099 |
source | JSTOR Mathematics & Statistics; JSTOR Archive Collection A-Z Listing |
subjects | Analysis of variance; Cytochromes; Design analysis; Genomes; Genomics; Nucleotide sequences; Null hypothesis; Sample size; Statistical variance; Watersheds |
title | Analysis of variance for genomic sequences in unbalanced designs |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T17%3A57%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Analysis%20of%20variance%20for%20genomic%20sequences%20in%20unbalanced%20designs&rft.jtitle=Brazilian%20journal%20of%20probability%20and%20statistics&rft.au=Pinheiro,%20Hildete%20P.&rft.date=2007-12-01&rft.volume=21&rft.issue=2&rft.spage=203&rft.epage=223&rft.pages=203-223&rft.issn=0103-0752&rft.eissn=2317-6199&rft_id=info:doi/&rft_dat=%3Cjstor%3E43601099%3C/jstor%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_jstor_id=43601099&rfr_iscdi=true |
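
The description field above mentions a homogeneity test for groups of sequences of unequal size, with resampling used to obtain the empirical distribution of the test statistic. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the Light and Margolin (1971) variation measure for a categorical sample, n/2 - sum(n_c^2)/(2n), applied at each sequence position. Summing the between-group variation over positions, using label permutation as the resampling scheme, and all function names and toy sequences are assumptions made purely for illustration.

```python
import random
from collections import Counter


def variation(symbols):
    """Light-Margolin variation of a categorical sample: n/2 - sum(n_c^2)/(2n)."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    return n / 2 - sum(c * c for c in counts.values()) / (2 * n)


def between_group_variation(groups):
    """Total minus within-group variation, summed over sequence positions.

    `groups` is a list of groups; each group is a list of equal-length
    sequences. Group sizes may differ (unbalanced design).
    Summing over positions is an assumption of this sketch.
    """
    pooled = [seq for group in groups for seq in group]
    length = len(pooled[0])
    bss = 0.0
    for k in range(length):
        total = variation([seq[k] for seq in pooled])
        within = sum(variation([seq[k] for seq in group]) for group in groups)
        bss += total - within
    return bss


def permutation_p_value(groups, n_perm=999, seed=0):
    """Permutation test of homogeneity: reshuffle sequences across groups,
    keeping the original (possibly unequal) group sizes."""
    rng = random.Random(seed)
    observed = between_group_variation(groups)
    pooled = [seq for group in groups for seq in group]
    sizes = [len(group) for group in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        # Small tolerance guards against floating-point ties.
        if between_group_variation(shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: two groups of unequal size, four positions each.
    toy_groups = [
        ["ACGT", "ACGT", "ACGA"],
        ["TCGT", "TCGA", "TCGT", "TCGT", "TGGT"],
    ]
    stat, p = permutation_p_value(toy_groups, n_perm=499, seed=1)
    print(f"between-group variation = {stat:.4f}, permutation p-value = {p:.4f}")
```

The permutation scheme is used here only because it needs no distributional assumptions and handles unequal group sizes directly; the asymptotic distribution derived in the article is not reproduced in this sketch.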