L2 proficiency assessment using self-supervised speech representations
Format: | Article |
---|---|
Language: | English |
Abstract: | There has been a growing demand for automated spoken language assessment
systems in recent years. A standard pipeline for this process starts with a
speech recognition system and derives features, either hand-crafted or based
on deep learning, that exploit the transcription and the audio. Though these
approaches can yield high-performance systems, they require speech recognition
systems that work for L2 speakers and are preferably tuned to the specific
form of test being deployed. Recently, a scheme based on self-supervised speech
representations, requiring no speech recognition, was proposed. This work
extends the initial analysis of this approach to a large-scale proficiency
test, Linguaskill, which comprises multiple parts, each designed to assess
different attributes of a candidate's speaking proficiency. The performance of
the self-supervised wav2vec 2.0 system is compared to a high-performance
hand-crafted assessment system and to a BERT-based text system, both of which
use speech transcriptions. Though the wav2vec 2.0 based system is found to be
sensitive to the nature of the response, it can be configured to yield
performance comparable to systems that require a speech transcription, and it
yields gains when appropriately combined with standard approaches. |
---|---|
DOI: | 10.48550/arxiv.2211.08849 |
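The abstract reports that the wav2vec 2.0 system gives gains when combined with standard approaches, but it does not say how the combination is done. The sketch below is only an illustration under that gap. It assumes simple score-level fusion: a linear interpolation of two graders' per-candidate scores, with the weight chosen on a held-out development set by minimising RMSE. It also computes the Pearson correlation, a common way to compare grader outputs against reference scores. The function names (`fuse`, `pick_weight`, `pearson`, `rmse`) are hypothetical and do not come from the paper.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def rmse(pred, ref):
    """Root-mean-square error between predicted and reference scores."""
    return math.sqrt(mean((p - r) ** 2 for p, r in zip(pred, ref)))


def fuse(scores_a, scores_b, weight=0.5):
    """Hypothetical score-level fusion: weight * a + (1 - weight) * b per candidate."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(scores_a, scores_b)]


def pick_weight(dev_a, dev_b, dev_ref, steps=10):
    """Grid-search the interpolation weight that minimises RMSE on a dev set."""
    weights = [i / steps for i in range(steps + 1)]
    return min(weights, key=lambda w: rmse(fuse(dev_a, dev_b, w), dev_ref))


if __name__ == "__main__":
    # Toy, made-up scores for illustration only (not data from the paper).
    w2v_scores = [3.1, 4.0, 2.5, 5.2]    # e.g. a wav2vec 2.0 based grader
    std_scores = [3.4, 3.8, 2.9, 4.8]    # e.g. a transcription-based grader
    reference = [3.3, 4.0, 2.6, 5.0]     # reference (human) scores
    w = pick_weight(w2v_scores, std_scores, reference)
    combined = fuse(w2v_scores, std_scores, w)
    print(w, pearson(combined, reference), rmse(combined, reference))
```

In practice the weight would be tuned on data disjoint from the evaluation set. A learned combiner, for example a regression over both systems' features, is an alternative to fixed interpolation.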