Re-clustering the database for crystallization of macromolecules
Published in: | Journal of Crystal Growth 1998-02, Vol.183 (4), p.653-668 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | The current version of the Biological Macromolecule Crystallization Database (BMCD version 3.0) was statistically analyzed using clustering techniques, in an effort to find trends that may be useful in the crystallization of new macromolecules. Our previous statistical analysis of the BMCD was performed on version 1.0 [C.T. Samudzi, M.J. Fivash, J.M. Rosenberg, J. Crystal Growth 123 (1992) 47]. That database contained information on a total of 1025 crystallization experiments for 820 biological macromolecules (about 35% of those entries were incomplete and thus unsuitable for analysis). Version 3.0 of the BMCD is more than 90% complete and contains information on about 2300 crystallization experiments for approximately 1500 biological macromolecules [G.L. Gilliland, M. Tung, D.M. Bakerslee, J.E. Ladner, Acta Cryst. D 50 (1994) 408]. With significantly more data in the BMCD, the question is whether the trends have changed. The SAS software [SAS Institute Inc., SAS/STAT, Version 6, 4th ed., vol. 1] was used throughout the analysis. The following crystallization parameters were used to define an experiment: pH, temperature, molecular weight, macromolecular concentration, precipitant type and crystallization method. Using these parameters, a measure of the differences between experiments was developed. Groups, or clusters, of similar experiments were identified as those lying close together according to this difference measure. The database was successfully resolved into 25 clusters. The pseudo-F statistic for 25 clusters was 306.30, which is statistically significant (p < 0.0001). Although eight of these clusters can be treated as outliers, the other 17 clusters provide useful information for recognizing new patterns and developing strategies for the crystallization of macromolecules. |
---|---|
ISSN: | 0022-0248, 1873-5002 |
DOI: | 10.1016/S0022-0248(97)00492-2 |
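
The abstract reports a pseudo-F statistic for the 25-cluster solution but does not give its formula or the difference measure used. As a rough illustration only, the sketch below computes the pseudo-F in its common Calinski-Harabasz form, the ratio of between-cluster to within-cluster variance, F = (B / (k - 1)) / (W / (n - k)). It assumes squared Euclidean distance on numeric features; the function name `pseudo_f` and the toy data are hypothetical, and the categorical parameters (precipitant type, crystallization method) are left out because the record does not say how they were encoded.

```python
from collections import defaultdict


def pseudo_f(points, labels):
    """Calinski-Harabasz pseudo-F: (B / (k - 1)) / (W / (n - k)).

    points: list of equal-length numeric tuples (assumed feature vectors)
    labels: cluster label for each point
    """
    n = len(points)
    dim = len(points[0])

    # Group the points by their cluster label.
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more points than clusters")

    def centroid(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    grand = centroid(points)
    between = 0.0  # B: size-weighted squared distance of centroids to grand mean
    within = 0.0   # W: squared distance of points to their own centroid
    for rows in groups.values():
        c = centroid(rows)
        between += len(rows) * sq_dist(c, grand)
        within += sum(sq_dist(r, c) for r in rows)

    if within == 0.0:
        raise ValueError("within-cluster sum of squares is zero")
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two tight, well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = [0, 0, 1, 1]
print(pseudo_f(toy_points, toy_labels))
```

Larger values indicate tighter, better-separated clusters. In practice the numeric parameters (pH, temperature, molecular weight, concentration) would need to be put on a common scale before any distance is computed, since their units differ by orders of magnitude.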