Relation Extraction From Unstructured Species Descriptions Using TaxonNERD and LLaMA 2 7B
Published in: | Biodiversity Information Science and Standards 2024-11, Vol.8 (3), p.103 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Ontologies are essential tools for organizing information on taxonomy, ecology, and inter-species relationships, helping to standardize ecological data and facilitate integration of large datasets. Combining ontologies with advanced Natural Language Processing (NLP) techniques, such as Named Entity Recognition (NER) and Relation Extraction (RE), has greatly improved the discovery of insights from unstructured scientific texts, particularly in biodiversity (Gabud et al. 2023, Abdelmageed et al. 2022, Hearst 1992).
This study combines ontologies and NLP to analyze complex trophic interactions among animal species (Gabud et al. 2023), using a dataset (National Biodiversity Institute of Costa Rica (INBio) 2015) containing species descriptions in English and Spanish. We applied TaxoNERD to identify taxonomic entities (Le Guillarme and Thuiller 2021) and fine-tuned the Large Language Model Meta AI (LLaMA 2 7B) to extract feeding interactions and predator-prey relationships (CheeKean 2023). We chose LLaMA 2 7B for its effectiveness in handling complex language patterns and its adaptability to diverse scientific domains.
Our results (Fig. 1) showed a recall of 0.73 and a precision of 0.68, indicating that the model identifies most of the feeding relationships present in the text. The lower precision, however, means that roughly one in three extracted relations is a false positive, so reducing spurious extractions is the main area for improvement (Touvron et al. 2023). Previous studies also emphasize the need for further refinement of relation extraction models to enhance accuracy (Mora-Cross et al. 2023). The resulting structured dataset offers valuable insights into species’ diets and roles, contributing to biodiversity research and conservation efforts (Mora-Cross et al. 2023, Touvron et al. 2023).
Moreover, this research highlights the potential of integrating AI-driven tools with ontological frameworks to manage and analyze biodiversity data at scale (Abdelmageed et al. 2022). By transforming unstructured text into structured data, we make ecological information more accessible, supporting better decision-making in conservation strategies (Abdelmageed et al. 2022, Hearst 1992). This approach scales well with the growing volume of biodiversity data, offering a more efficient and accurate method for analyzing species interactions, which are crucial for ecosystem management and endangered species protection (Gabud et al. 2023). |
---|---|
ISSN: | 2535-0897 |
DOI: | 10.3897/biss.8.142382 |
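
The precision and recall quoted in the abstract can be tied back to counts of extracted relations. The following is a minimal sketch, not the authors' evaluation code. It assumes that relations are scored as exact-match (subject, relation, object) triples, which the abstract does not state. The function names `precision_recall_f1` and `f1_from_scores` and the example triples are hypothetical and are not taken from the INBio dataset.

```python
from typing import Set, Tuple

# A relation is modelled here as (subject taxon, relation label, object taxon).
# This exact-match triple representation is an assumption for illustration.
Triple = Tuple[str, str, str]


def precision_recall_f1(predicted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Score predicted triples against gold triples by exact match."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = f1_from_scores(precision, recall)
    return precision, recall, f1


def f1_from_scores(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical gold annotations and model output (illustrative only).
gold = {
    ("Panthera onca", "preys_on", "Tayassu pecari"),
    ("Harpia harpyja", "preys_on", "Bradypus variegatus"),
    ("Bothrops asper", "preys_on", "Rodentia"),
}
predicted = {
    ("Panthera onca", "preys_on", "Tayassu pecari"),
    ("Harpia harpyja", "preys_on", "Bradypus variegatus"),
    ("Panthera onca", "preys_on", "Harpia harpyja"),  # spurious extraction
}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")

# F1 implied by the scores reported in the abstract (precision 0.68, recall 0.73).
print(f"implied F1 = {f1_from_scores(0.68, 0.73):.2f}")
```

Under these assumptions, the reported precision of 0.68 and recall of 0.73 correspond to an F1 of about 0.70. The abstract itself does not report an F1 score.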