BactInt: A domain driven transfer learning approach for extracting inter-bacterial associations from biomedical text

Bibliographic details
Published in: Computational Biology and Chemistry, 2024-04, Vol. 109, p. 108012-108012, Article 108012
Main authors: Das Baksi, Krishanu, Pokhrel, Vatsala, Pudavar, Anand Eruvessi, Mande, Sharmila S., Kuntal, Bhusan K.
Format: Article
Language: English
Description
Summary: The healthy or dysbiotic state of an ecosystem such as the human body is known to be influenced not only by which bacterial groups are present in it, but also by the associations among them. Evidence reported in biomedical text serves as a reliable source for identifying and ascertaining such inter-bacterial associations. However, the complexity of the reported text and the ever-increasing volume of information necessitate the development of methods for automated and accurate extraction of such knowledge. A BioBERT (biomedical domain-specific language model) based information extraction model for bacterial associations is presented that utilizes patterns learned from other publicly available datasets. Additionally, a specialized sentence corpus has been developed to significantly improve the prediction accuracy of the 'transfer learned' model using a fine-tuning approach. The final model was seen to outperform all other variations (non-transfer-learned and non-fine-tuned models) as well as models trained on BioGPT (a domain-trained Generative Pre-trained Transformer). To further demonstrate its utility, a case study was performed using bacterial association network data obtained from experimental studies. This study attempts to demonstrate the applicability of transfer learning in a niche field of life sciences where understanding inter-bacterial relationships is crucial for comprehending microbial community structures across different ecosystems. The study also discusses how such a model can be improved further by fine-tuning with limited training data. The results presented and the datasets made available are expected to be a valuable addition to the fields of medical informatics and bioinformatics.
• Inter-bacterial associations constitute the building blocks of a bacterial community.
• Automated and accurate methods are essential to extract such information from text.
• Transfer learning using information from similar domains can improve predictions.
• Fine-tuning such a model using bacterial association data can further increase accuracy.
• A BERT language model fine-tuned on in-house bacterial association data is presented.
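
As a rough illustration of the kind of input preparation a sentence-level association extraction model such as the one summarized above might rely on, the following minimal Python sketch turns a sentence into candidate bacterial pairs with the two mentions replaced by placeholder tokens. It is not taken from the article: the lexicon `BACTERIA`, the function `candidate_pairs`, and the placeholder tokens `@BACTERIA1$` and `@BACTERIA2$` are all hypothetical names chosen for this example, and simple exact string matching stands in for whatever entity recognition the authors actually used.

```python
import re
from itertools import combinations

# Hypothetical mini-lexicon (assumption); a real pipeline would use a
# taxonomy-backed named-entity recognition step instead.
BACTERIA = ["Bacteroides fragilis", "Escherichia coli", "Lactobacillus"]


def candidate_pairs(sentence, lexicon=BACTERIA):
    """Return (name1, name2, masked_sentence) for each pair of distinct,
    non-overlapping bacterial mentions found in the sentence."""
    # Collect every exact, case-sensitive match with its character span.
    mentions = []
    for name in lexicon:
        for match in re.finditer(re.escape(name), sentence):
            mentions.append((match.start(), match.end(), name))
    mentions.sort()  # order mentions by position in the sentence

    pairs = []
    for (s1, e1, n1), (s2, e2, n2) in combinations(mentions, 2):
        if n1 == n2 or s2 < e1:
            continue  # skip self-pairs and overlapping spans
        # Replace both mentions with placeholder tokens (hypothetical names)
        # so that a classifier sees the context rather than the entity names.
        masked = (
            sentence[:s1] + "@BACTERIA1$"
            + sentence[e1:s2] + "@BACTERIA2$"
            + sentence[e2:]
        )
        pairs.append((n1, n2, masked))
    return pairs


if __name__ == "__main__":
    text = "Lactobacillus inhibits the growth of Escherichia coli in the gut."
    for first, second, masked in candidate_pairs(text):
        print(first, "|", second, "|", masked)
```

Each masked sentence could then be passed to a sequence classifier (for example, a fine-tuned BERT-style model) that predicts whether an association is reported and of what type; that classification step is outside the scope of this sketch.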
ISSN: 1476-9271, 1476-928X
DOI:10.1016/j.compbiolchem.2023.108012