A new method of solution for the occupancy problem and its application to operon size prediction

Bibliographic details
Published in: Journal of theoretical biology 2004-04, Vol.227 (3), p.315-322
Main authors: Lamboy, Warren F., Moreno-Hagelsieb, Gabriel
Format: Article
Language: English
Description
Abstract: The problem of estimating the expected number of transcription units containing a specific number of genes arises in the context of operon size prediction in prokaryotic genomes, where operons are defined to be transcription units containing two or more genes. It turns out that this problem is mathematically identical to the balls-in-urns occupancy problem in probability theory. In that problem, a fixed number of indistinguishable balls are randomly placed in a known number of distinguishable urns, subject to the restriction that no urns may remain empty, and an estimate is desired for the expected number of urns containing a specific number of balls. In this paper we present a new simple technique for solving the occupancy problem when empty urns are allowed and extend it to the case when each urn must contain the same non-zero minimum number of balls. Treating transcription units as equivalent to urns, and genes as equivalent to balls, we then use that result to solve the problem of estimating the expected number of transcription units that contain a specific number of genes, and then apply that result to predicting the expected number of transcription units present in an entire genome. Since these predictions can be made for any completely sequenced and annotated prokaryotic genome, they provide a starting point for the comparison of regulatory complexity across such genomes.
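
The occupancy calculation described in the abstract can be illustrated with a short sketch. It is not the authors' technique, which this record does not reproduce; it is a standard stars-and-bars count that assumes every distinct arrangement of indistinguishable balls in distinguishable urns is equally likely. The function names `count_arrangements` and `expected_urns_with` and the parameter `min_per_urn` are hypothetical, chosen only for this example.

```python
from fractions import Fraction
from math import comb


def count_arrangements(balls, urns, min_per_urn=0):
    """Count placements of indistinguishable balls in distinguishable urns
    where every urn holds at least min_per_urn balls (stars and bars)."""
    if urns == 0:
        # With no urns there is exactly one arrangement, and only if no balls remain.
        return 1 if balls == 0 else 0
    free = balls - min_per_urn * urns  # balls left after meeting each minimum
    if free < 0:
        return 0
    return comb(free + urns - 1, urns - 1)


def expected_urns_with(k, balls, urns, min_per_urn=0):
    """Expected number of urns holding exactly k balls.

    Assumption (not taken from the paper): all arrangements are equally likely.
    By linearity of expectation the answer is
    urns * P(one given urn holds exactly k balls).
    """
    total = count_arrangements(balls, urns, min_per_urn)
    if total == 0:
        raise ValueError("no valid arrangement for these parameters")
    if k < min_per_urn or k > balls:
        return Fraction(0)
    # Fix k balls in one urn, then arrange the rest in the remaining urns.
    rest = count_arrangements(balls - k, urns - 1, min_per_urn)
    return Fraction(urns * rest, total)


if __name__ == "__main__":
    # Toy numbers only: 4 genes (balls) in 2 transcription units (urns),
    # each unit containing at least one gene.
    for size in range(1, 4):
        print(size, expected_urns_with(size, balls=4, urns=2, min_per_urn=1))
```

In the operon setting, setting `min_per_urn=1` corresponds to requiring that no transcription unit is empty, and summing `expected_urns_with(k, ...)` over all `k >= 2` gives the expected number of operons under the same equal-likelihood assumption.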
ISSN: 0022-5193, 1095-8541
DOI: 10.1016/j.jtbi.2003.11.009