Enriching Biomedical Knowledge for Low-resource Language Through Large-Scale Translation
Format: | Article |
Language: | English |
Abstract: | Biomedical data and benchmarks are highly valuable yet very limited in
low-resource languages other than English, such as Vietnamese. In this paper, we
use a state-of-the-art English-Vietnamese translation model to translate and
produce both pretraining and supervised data in the biomedical domain. Thanks to
this large-scale translation, we introduce ViPubmedT5, a pretrained
encoder-decoder Transformer model trained on 20 million translated abstracts from
the high-quality public PubMed corpus. ViPubmedT5 achieves state-of-the-art
results on two biomedical benchmarks, summarization and acronym disambiguation.
Further, we release ViMedNLI, a new Vietnamese NLP benchmark translated from
MedNLI with the recently released English-Vietnamese translation model and
carefully refined by human experts, together with evaluations of existing
methods against ViPubmedT5. |
DOI: | 10.48550/arxiv.2210.05598 |
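The abstract above outlines a translate-then-pretrain pipeline: English PubMed abstracts are machine-translated into Vietnamese with an English-Vietnamese translation model before pretraining ViPubmedT5. Below is a minimal sketch of that translation step using the Hugging Face transformers library; the checkpoint name `VietAI/envit5-translation` and the `"en: "` source-language prefix are assumptions about a publicly available En-Vi model, not details stated in this record.

```python
# Sketch of the translation step: English biomedical abstracts -> Vietnamese.
# Model identifier and input prefix are assumptions, not confirmed by the record.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "VietAI/envit5-translation"  # assumed En-Vi checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)


def translate_en_to_vi(abstracts):
    """Translate a batch of English abstracts into Vietnamese."""
    # The "en: " prefix marks the source language for this model family
    # (an assumption about the checkpoint's expected input format).
    inputs = tokenizer(
        ["en: " + text for text in abstracts],
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=512,
    )
    outputs = model.generate(**inputs, max_length=512)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


# Example usage with one PubMed-style sentence.
print(translate_en_to_vi(["Aspirin reduces the risk of myocardial infarction."]))
```

In the paper's setting, the translated abstracts would then serve as the Vietnamese pretraining corpus for ViPubmedT5; the sketch only covers the translation of input text.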