A machine learning framework for extracting information from biological pathway images in the literature
Main authors: | , , |
---|---|
Format: | Dataset |
Language: | English |
Summary: | 466 target chemicals_selected chemicals:
Target chemicals that satisfy the selection criteria for biochemical reactions not covered by MetaNetX and KEGG.
466 target chemicals_statistics:
Numbers of MetaCyc reactions, papers, and pathway images collected for the 466 target chemicals from the bio-based chemicals map (Jang et al., Trends in Biotechnology, 2023).
arrow detection_bounding box labels:
Bounding box labels for 6,471 images in the training and validation datasets and 100 images in the test dataset. The corresponding images are available in "arrow detection_training and validation datasets.zip".
arrow detection_test dataset:
Test dataset for arrow detection using a Faster R-CNN model. A total of 100 images were prepared from 89 papers retrieved from PubMed Central (PMC).
arrow detection_training and validation datasets:
Training and validation datasets for arrow detection using a Faster R-CNN model. A total of 6,471 images were prepared, comprising 2,332 images from five different sources and 4,139 augmented images.
EBPI outputs:
Reaction information extracted using EBPI from 49,846 biological pathway images across 466 target chemicals.
text classification_training, validation and test datasets:
Dataset for text classification using BioBERT. A total of 59,370 terms were prepared, comprising 15,101 “gene” terms, 21,417 “protein” terms, and 22,852 “others” terms, by combining data from MetaCyc with PaddleOCR results from the papers. |
DOI: | 10.5281/zenodo.10875300 |
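The BioBERT text-classification dataset described above is moderately imbalanced across its three labels. The Python sketch below recomputes the stated total from the per-class counts and derives class proportions and inverse-frequency class weights. The weighting formula and the helper names `class_proportions` and `inverse_frequency_weights` are illustrative assumptions; the description does not say whether class weighting was used when training the classifier.

```python
# Per-class term counts for the BioBERT text-classification dataset, as stated in the summary.
TERM_COUNTS = {"gene": 15_101, "protein": 21_417, "others": 22_852}


def class_proportions(counts):
    """Return each class's share of the total number of terms."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * count), so rarer classes weigh more.

    Hypothetical helper: this weighting scheme is an assumption for illustration,
    not a documented part of how the dataset was used.
    """
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


if __name__ == "__main__":
    print("total terms:", sum(TERM_COUNTS.values()))
    for label, p in class_proportions(TERM_COUNTS).items():
        print(f"{label}: {p:.1%} of terms")
    for label, w in inverse_frequency_weights(TERM_COUNTS).items():
        print(f"{label}: weight {w:.3f}")
```

With these counts, "gene" terms make up roughly a quarter of the dataset, so they receive the largest weight under this scheme.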
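The arrow-detection training and validation set includes 4,139 augmented images, but the description does not say which augmentations were applied or how the bounding box labels are stored. As a minimal sketch, assuming horizontal flipping as one augmentation and boxes stored as `(x_min, y_min, x_max, y_max)` pixel coordinates, the hypothetical helper `hflip_boxes` below shows how box labels must be mirrored together with the image so that they still enclose the arrows.

```python
from typing import List, Tuple

# Assumed label format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def hflip_boxes(boxes: List[Box], image_width: float) -> List[Box]:
    """Mirror boxes left-to-right to match a horizontally flipped image.

    Hypothetical helper. A point at x moves to image_width - x, so the old
    x_max becomes the new x_min; y coordinates are unchanged.
    """
    flipped = []
    for x_min, y_min, x_max, y_max in boxes:
        flipped.append((image_width - x_max, y_min, image_width - x_min, y_max))
    return flipped


if __name__ == "__main__":
    # One hypothetical arrow box in a 200-pixel-wide image.
    print(hflip_boxes([(10.0, 20.0, 50.0, 40.0)], image_width=200.0))
    # prints [(150.0, 20.0, 190.0, 40.0)]
```

Flipping also reverses the direction an arrow points; that does not affect box coordinates, but it would matter if arrow direction were part of the labels.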