Can bidirectional encoder become the ultimate winner for downstream applications of foundation models?
Main authors:
Format: Article
Language: English
Online access: Order full text
Summary: Over the past few decades, Artificial Intelligence (AI) has progressed from the initial machine learning stage to the deep learning stage, and now to the stage of foundation models. Foundation models are characterized by pre-training, transfer learning, and self-supervised learning, and a pre-trained model can be fine-tuned and applied to a variety of downstream tasks. Within this framework, models such as Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT) have greatly advanced natural language processing (NLP) and have inspired many BERT-based variants. BERT overcame the limitation of purely unidirectional (left-to-right) language modeling in pre-training by introducing a masked language model: it uses bidirectional context to predict the masked words in a sequence, which improves the model's feature extraction ability. This makes the model particularly useful for downstream tasks, especially specialized applications, because a bidirectional encoder can better capture domain knowledge. We therefore aim to show how this technique has evolved under the foundation-model paradigm, how it has improved performance across NLP tasks, and why capturing contextual information is important for downstream performance. This article analyzes unidirectional and bidirectional models, represented by GPT and BERT, and compares their differences in light of each model's purpose. It also briefly reviews BERT and the improvements made by several BERT-based models, and compares their performance on the Stanford Question Answering Dataset (SQuAD) and the General Language Understanding Evaluation (GLUE) benchmark.
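The abstract's central claim, that a masked language model lets the encoder condition on context to both the left and the right of a hidden word, can be illustrated with a minimal sketch. The example below assumes the Hugging Face `transformers` library and the public `bert-base-uncased` checkpoint; neither is prescribed by the article itself.

```python
# Minimal sketch: masked-word prediction with a bidirectional encoder.
# Assumption: Hugging Face `transformers` is installed and the public
# `bert-base-uncased` checkpoint can be downloaded.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT attends to tokens on both sides of [MASK] when ranking candidates,
# whereas a left-to-right (GPT-style) decoder scoring the same position
# would only see the words preceding it.
sentence = "The doctor prescribed a [MASK] to treat the infection."
for pred in fill_mask(sentence)[:3]:
    print(f"{pred['token_str']:>12}  score={pred['score']:.3f}")
```

A causal decoder predicting the same position could only condition on "The doctor prescribed a", which is the contrast the abstract draws when arguing that bidirectional encoders extract stronger features for understanding-oriented downstream benchmarks such as SQuAD and GLUE.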
DOI: 10.48550/arxiv.2411.18021