Analysis of variance for genomic sequences in unbalanced designs

Bibliographic details
Published in:	Brazilian journal of probability and statistics, 2007-12, Vol. 21 (2), pp. 203-223
Main authors:	Pinheiro, Hildete P.; de Souza, Roberta; Pinheiro, Aluísio S.; da Silva, Cibele Q.; dos Reis, Sérgio F.
Format:	Article
Language:	English
Subjects:	Analysis of variance; Cytochromes; Design analysis; Genomes; Genomics; Nucleotide sequences; Null hypothesis; Sample size; Statistical variance; Watersheds
Online access:	Full text

container_end_page	223
container_issue	2
container_start_page	203
container_title	Brazilian journal of probability and statistics
container_volume	21
creator	Pinheiro, Hildete P.; de Souza, Roberta; Pinheiro, Aluísio S.; da Silva, Cibele Q.; dos Reis, Sérgio F.
description	In studies of genetic divergence among organisms, the analysis is generally performed directly on the DNA molecule. At the nucleotide level, the outcome at each position is therefore categorical, taking one of four values (A, C, G or T). Light and Margolin (1971) developed an analysis of variance for categorical data (CATANOVA), and Pinheiro et al. (2000) used a similar measure of variation to extend the CATANOVA procedure to several positions of the DNA sequence in balanced designs. Here we consider a methodology for multiple-category data with a different number of sample units (i.e., sequences) in each group, that is, an unbalanced sampling design. To test the null hypothesis of homogeneity among groups, the asymptotic distribution of the test statistic is derived and its power is evaluated. An application to real data illustrates the use of resampling methods to generate the empirical distribution of the test statistic, and a simulation study evaluates the asymptotic behavior of the distribution of the test statistic.
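The CATANOVA decomposition summarized in the description can be illustrated with a minimal Python sketch. It assumes the Gini-type sums of squares of Light and Margolin (1971), TSS = n/2 - (1/(2n)) Σ_i n_i² and WSS = Σ_j [n_j/2 - (1/(2n_j)) Σ_i n_ij²], with BSS = TSS - WSS, and their C statistic C = (n-1)(I-1) BSS/TSS with (I-1)(J-1) degrees of freedom as it is commonly stated. Summing BSS over aligned positions and permuting sequence labels are illustrative assumptions, not necessarily the statistic or resampling scheme used by Pinheiro et al. All function names are hypothetical.

```python
from collections import Counter
import random


def gini_ss(counts, n):
    """Gini-type sum of squares for category counts summing to n: n/2 - sum(c^2)/(2n)."""
    return n / 2 - sum(c * c for c in counts) / (2 * n)


def catanova_position(column, groups):
    """Return (TSS, WSS, BSS) for one aligned position.

    column: one categorical symbol (e.g. a nucleotide) per sequence.
    groups: one group label per sequence; group sizes may differ (unbalanced).
    """
    n = len(column)
    tss = gini_ss(Counter(column).values(), n)
    wss = 0.0
    for g in dict.fromkeys(groups):  # distinct labels in first-seen order
        sub = [x for x, lab in zip(column, groups) if lab == g]
        wss += gini_ss(Counter(sub).values(), len(sub))
    return tss, wss, tss - wss


def light_margolin_c(column, groups, n_categories=4):
    """C = (n-1)(I-1) BSS/TSS and its chi-square degrees of freedom (I-1)(J-1)."""
    n = len(column)
    tss, _, bss = catanova_position(column, groups)
    df = (n_categories - 1) * (len(set(groups)) - 1)
    if tss <= 1e-12:  # monomorphic position: no variation to partition
        return 0.0, df
    return (n - 1) * (n_categories - 1) * bss / tss, df


def total_bss(seqs, groups):
    """Hypothetical multi-position statistic: BSS summed over all aligned positions."""
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must be aligned"
    return sum(catanova_position([s[k] for s in seqs], groups)[2] for k in range(length))


def permutation_pvalue(seqs, groups, n_perm=999, seed=0):
    """Shuffle group labels (group sizes stay fixed) to build an empirical null."""
    rng = random.Random(seed)
    observed = total_bss(seqs, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if total_bss(seqs, labels) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    seqs = ["AC", "AC", "AG", "TG", "TG"]  # toy aligned sequences
    groups = ["x", "x", "x", "y", "y"]     # unbalanced: 3 vs 2 sequences
    print(light_margolin_c([s[0] for s in seqs], groups))
    print(permutation_pvalue(seqs, groups, n_perm=199))
```

Permuting labels rather than resampling with replacement is one simple choice; it keeps the group sizes n_j of the unbalanced design fixed under the null hypothesis of homogeneity.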
format	Article
fullrecord	<record><control><sourceid>jstor</sourceid><recordid>TN_cdi_jstor_primary_43601099</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><jstor_id>43601099</jstor_id><sourcerecordid>43601099</sourcerecordid><originalsourceid>FETCH-jstor_primary_436010993</originalsourceid><addsrcrecordid>eNqFjk0KwjAUhIMoGLRHEN4FAvmpLdkpongA9yW2aUlpE82zQm9vBPfOZuCbbzELQqUSJSuE1ktCueCK8XIv1yRD7HmK0jKXnJLD0ZthRocQWnib6IyvLbQhQmd9GF0NaJ-TTRDBeZj83QxfpYHGous8bsmqNQPa7Ncbsrucb6cr6_EVYvWIbjRxrnJVpBdaq3_7B9wSNro</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Analysis of variance for genomic sequences in unbalanced designs</title><source>JSTOR Mathematics & Statistics</source><source>JSTOR Archive Collection A-Z Listing</source><creator>Pinheiro, Hildete P. ; de Souza, Roberta ; Pinheiro, Aluísio S. ; da Silva, Cibele Q. ; dos Reis, Sérgio F.</creator><creatorcontrib>Pinheiro, Hildete P. ; de Souza, Roberta ; Pinheiro, Aluísio S. ; da Silva, Cibele Q. ; dos Reis, Sérgio F.</creatorcontrib><description>In the study of genetic divergence among organisms, generally the analysis is done directly from the DNA molecule. Therefore, a possible outcome is categorical, being one out of four categories (looking at the nucleotide level). Light and Margolin (1971) developed an analysis of variance for categorical data (CATANOVA) and Pinheiro et al. (2000) employed a similar measure of variation and extended the CATANOVA procedure taking into account several positions in the DNA sequence for balanced designs. Here we consider a methodology for multiple category data with a different number of sample units (i.e., sequences) in each group, that is, the sampling design is unbalanced. In order to test the null hypothesis of homogeneity among groups, the asymptotic distribution of the test statistic is derived and its power is evaluated. 
An application to real data is illustrated using resampling methods to generate the empirical distribution of the test statistic and a simulation study is performed to evaluate the asymptotic behavior of the distribution of the test statistic.</description><identifier>ISSN: 0103-0752</identifier><identifier>EISSN: 2317-6199</identifier><language>eng</language><publisher>Brazilian Statistical Association</publisher><subject>Analysis of variance ; Cytochromes ; Design analysis ; Genomes ; Genomics ; Nucleotide sequences ; Null hypothesis ; Sample size ; Statistical variance ; Watersheds</subject><ispartof>Brazilian journal of probability and statistics, 2007-12, Vol.21 (2), p.203-223</ispartof><rights>Copyright ©2007. Brazilian Statistical Association</rights><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.jstor.org/stable/pdf/43601099$$EPDF$$P50$$Gjstor$$H</linktopdf><linktohtml>$$Uhttps://www.jstor.org/stable/43601099$$EHTML$$P50$$Gjstor$$H</linktohtml><link.rule.ids>314,780,784,803,832,58015,58019,58248,58252</link.rule.ids></links><search><creatorcontrib>Pinheiro, Hildete P.</creatorcontrib><creatorcontrib>de Souza, Roberta</creatorcontrib><creatorcontrib>Pinheiro, Aluísio S.</creatorcontrib><creatorcontrib>da Silva, Cibele Q.</creatorcontrib><creatorcontrib>dos Reis, Sérgio F.</creatorcontrib><title>Analysis of variance for genomic sequences in unbalanced designs</title><title>Brazilian journal of probability and statistics</title><description>In the study of genetic divergence among organisms, generally the analysis is done directly from the DNA molecule. Therefore, a possible outcome is categorical, being one out of four categories (looking at the nucleotide level). 
Light and Margolin (1971) developed an analysis of variance for categorical data (CATANOVA) and Pinheiro et al. (2000) employed a similar measure of variation and extended the CATANOVA procedure taking into account several positions in the DNA sequence for balanced designs. Here we consider a methodology for multiple category data with a different number of sample units (i.e., sequences) in each group, that is, the sampling design is unbalanced. In order to test the null hypothesis of homogeneity among groups, the asymptotic distribution of the test statistic is derived and its power is evaluated. An application to real data is illustrated using resampling methods to generate the empirical distribution of the test statistic and a simulation study is performed to evaluate the asymptotic behavior of the distribution of the test statistic.</description><subject>Analysis of variance</subject><subject>Cytochromes</subject><subject>Design analysis</subject><subject>Genomes</subject><subject>Genomics</subject><subject>Nucleotide sequences</subject><subject>Null hypothesis</subject><subject>Sample size</subject><subject>Statistical variance</subject><subject>Watersheds</subject><issn>0103-0752</issn><issn>2317-6199</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2007</creationdate><recordtype>article</recordtype><sourceid/><recordid>eNqFjk0KwjAUhIMoGLRHEN4FAvmpLdkpongA9yW2aUlpE82zQm9vBPfOZuCbbzELQqUSJSuE1ktCueCK8XIv1yRD7HmK0jKXnJLD0ZthRocQWnib6IyvLbQhQmd9GF0NaJ-TTRDBeZj83QxfpYHGous8bsmqNQPa7Ncbsrucb6cr6_EVYvWIbjRxrnJVpBdaq3_7B9wSNro</recordid><startdate>20071201</startdate><enddate>20071201</enddate><creator>Pinheiro, Hildete P.</creator><creator>de Souza, Roberta</creator><creator>Pinheiro, Aluísio S.</creator><creator>da Silva, Cibele Q.</creator><creator>dos Reis, Sérgio F.</creator><general>Brazilian Statistical Association</general><scope/></search><sort><creationdate>20071201</creationdate><title>Analysis of variance for genomic sequences in 
unbalanced designs</title><author>Pinheiro, Hildete P. ; de Souza, Roberta ; Pinheiro, Aluísio S. ; da Silva, Cibele Q. ; dos Reis, Sérgio F.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-jstor_primary_436010993</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2007</creationdate><topic>Analysis of variance</topic><topic>Cytochromes</topic><topic>Design analysis</topic><topic>Genomes</topic><topic>Genomics</topic><topic>Nucleotide sequences</topic><topic>Null hypothesis</topic><topic>Sample size</topic><topic>Statistical variance</topic><topic>Watersheds</topic><toplevel>online_resources</toplevel><creatorcontrib>Pinheiro, Hildete P.</creatorcontrib><creatorcontrib>de Souza, Roberta</creatorcontrib><creatorcontrib>Pinheiro, Aluísio S.</creatorcontrib><creatorcontrib>da Silva, Cibele Q.</creatorcontrib><creatorcontrib>dos Reis, Sérgio F.</creatorcontrib><jtitle>Brazilian journal of probability and statistics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pinheiro, Hildete P.</au><au>de Souza, Roberta</au><au>Pinheiro, Aluísio S.</au><au>da Silva, Cibele Q.</au><au>dos Reis, Sérgio F.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Analysis of variance for genomic sequences in unbalanced designs</atitle><jtitle>Brazilian journal of probability and statistics</jtitle><date>2007-12-01</date><risdate>2007</risdate><volume>21</volume><issue>2</issue><spage>203</spage><epage>223</epage><pages>203-223</pages><issn>0103-0752</issn><eissn>2317-6199</eissn><abstract>In the study of genetic divergence among organisms, generally the analysis is done directly from the DNA molecule. Therefore, a possible outcome is categorical, being one out of four categories (looking at the nucleotide level). 
Light and Margolin (1971) developed an analysis of variance for categorical data (CATANOVA) and Pinheiro et al. (2000) employed a similar measure of variation and extended the CATANOVA procedure taking into account several positions in the DNA sequence for balanced designs. Here we consider a methodology for multiple category data with a different number of sample units (i.e., sequences) in each group, that is, the sampling design is unbalanced. In order to test the null hypothesis of homogeneity among groups, the asymptotic distribution of the test statistic is derived and its power is evaluated. An application to real data is illustrated using resampling methods to generate the empirical distribution of the test statistic and a simulation study is performed to evaluate the asymptotic behavior of the distribution of the test statistic.</abstract><pub>Brazilian Statistical Association</pub></addata></record>
fulltext	fulltext
identifier	ISSN: 0103-0752
ispartof	Brazilian journal of probability and statistics, 2007-12, Vol.21 (2), p.203-223
issn	0103-0752 2317-6199
language	eng
recordid	cdi_jstor_primary_43601099
source	JSTOR Mathematics & Statistics; JSTOR Archive Collection A-Z Listing
subjects	Analysis of variance; Cytochromes; Design analysis; Genomes; Genomics; Nucleotide sequences; Null hypothesis; Sample size; Statistical variance; Watersheds
title	Analysis of variance for genomic sequences in unbalanced designs
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T17%3A57%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Analysis%20of%20variance%20for%20genomic%20sequences%20in%20unbalanced%20designs&rft.jtitle=Brazilian%20journal%20of%20probability%20and%20statistics&rft.au=Pinheiro,%20Hildete%20P.&rft.date=2007-12-01&rft.volume=21&rft.issue=2&rft.spage=203&rft.epage=223&rft.pages=203-223&rft.issn=0103-0752&rft.eissn=2317-6199&rft_id=info:doi/&rft_dat=%3Cjstor%3E43601099%3C/jstor%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_jstor_id=43601099&rfr_iscdi=true