BactInt: A domain driven transfer learning approach for extracting inter-bacterial associations from biomedical text
Published in: | Computational biology and chemistry 2024-04, Vol.109, p.108012-108012, Article 108012 |
---|---|
Format: | Article |
Language: | English |
Summary: | The healthy or dysbiotic state of an ecosystem such as the human body is known to be influenced not only by the bacterial groups present in it, but also by the associations among them. Evidence reported in biomedical text serves as a reliable source for identifying and ascertaining such inter-bacterial associations. However, the complexity of the reported text and the ever-increasing volume of information necessitate the development of methods for automated and accurate extraction of such knowledge.
An information extraction model for bacterial associations, based on BioBERT (a biomedical domain-specific language model), is presented; it utilizes patterns learned from other publicly available datasets. Additionally, a specialized sentence corpus has been developed to significantly improve the prediction accuracy of the ‘transfer learned’ model using a fine-tuning approach.
The final model was seen to outperform all other variations (non-transfer-learned and non-fine-tuned models) as well as models trained on BioGPT (a domain-trained Generative Pre-trained Transformer). To further demonstrate its utility, a case study was performed using bacterial association network data obtained from experimental studies.
This study attempts to demonstrate the applicability of transfer learning in a niche field of life sciences, where understanding inter-bacterial relationships is crucial for comprehending microbial community structures across different ecosystems. The study also discusses how such a model can be further improved by fine-tuning with limited training data. The results presented and the datasets made available are expected to be a valuable addition to the fields of medical informatics and bioinformatics.
•Inter-bacterial associations constitute the building blocks of a bacterial community.
•Automated and accurate methods are essential to extract such information from text.
•Transfer learning using information from similar domains can improve predictions.
•Fine-tuning such a model using bacterial association data can further increase accuracy.
•A BERT language model fine-tuned on in-house bacterial association data is presented. |
---|---|
ISSN: | 1476-9271 1476-928X |
DOI: | 10.1016/j.compbiolchem.2023.108012 |