Objectively Evaluating the Reliability of Cell Type Annotation Using LLM-Based Strategies
Format: Article
Language: English
Abstract: Reliability in cell type annotation is challenging in single-cell RNA-sequencing data analysis because both expert-driven and automated methods can be biased or constrained by their training data, especially for novel or rare cell types. Although large language models (LLMs) are useful, our evaluation found that only a few matched expert annotations due to biased data sources and inflexible training inputs. To overcome these limitations, we developed the LICT (Large language model-based Identifier for Cell Types) software package using a multi-model fusion and "talk-to-machine" strategy. Tested across various single-cell RNA sequencing datasets, our approach significantly improved annotation reliability, especially in datasets with low cellular heterogeneity. Notably, we established objective criteria to assess annotation reliability using the "talk-to-machine" approach, which addresses discrepancies between our annotations and expert ones, enabling reliable evaluation even without reference data. This strategy enhances annotation credibility and sets the stage for advancing future LLM-based cell type annotation methods.
DOI: 10.48550/arxiv.2409.15678
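
The multi-model fusion and "talk-to-machine" idea described in the abstract can be pictured as follows: several LLMs independently propose a cell type label from a cluster's marker genes, the proposals are fused by vote, and a follow-up question probes whether each model stands by the fused label given the same evidence. The sketch below illustrates only that general pattern; the function names, prompts, and user-supplied query callables are hypothetical and are not taken from LICT's actual interface.

```python
# Illustrative sketch only: a generic "vote, then cross-question" annotation
# loop over several LLMs. Each entry in llm_queries is a user-supplied callable
# (e.g. a thin wrapper around an API client) that maps a prompt string to a
# reply string. Nothing here is LICT's real API.
from collections import Counter
from typing import Callable, Sequence


def annotate_cluster(
    marker_genes: Sequence[str],
    tissue: str,
    llm_queries: Sequence[Callable[[str], str]],
) -> dict:
    """Fuse cell type labels from several models and score their agreement."""
    prompt = (
        f"Which cell type in {tissue} is most consistent with these "
        f"marker genes: {', '.join(marker_genes)}? Answer with one label."
    )
    # Multi-model fusion: collect one proposal per model, take the majority label.
    votes = [query(prompt).strip() for query in llm_queries]
    label, support = Counter(votes).most_common(1)[0]

    # "Talk-to-machine"-style follow-up: ask each model whether the fused label
    # is compatible with the same marker evidence, and record the agreement rate
    # as a reference-free reliability signal.
    check = (
        f"Are the marker genes {', '.join(marker_genes)} consistent with the "
        f"cell type '{label}' in {tissue}? Answer yes or no."
    )
    confirmations = sum(
        query(check).strip().lower().startswith("yes") for query in llm_queries
    )
    return {
        "label": label,
        "vote_support": support / len(votes),
        "confirmation_rate": confirmations / len(llm_queries),
    }


if __name__ == "__main__":
    # Toy demonstration with stubbed "models"; real use would wrap LLM clients.
    stub = lambda p: "yes" if "yes or no" in p else "CD8+ T cell"
    print(annotate_cluster(["CD8A", "CD3D", "GZMB"], "human blood", [stub] * 3))
```

In this sketch the confirmation rate plays the role of an annotation-reliability signal that requires no reference atlas, echoing in spirit the abstract's claim of objective, reference-free evaluation criteria.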