LOW COMPLEXITY VOICE ACTIVITY DETECTION ALGORITHM

Bibliographic details
First author:	SERWY, Roger
Format:	Patent
Language:	eng ; fre
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Summary:	A first voice activity detection (VAD) system outputs a pulse stream for zero crossings in an audio signal. The pulse density of the pulse stream is evaluated to identify speech. Noise may be added to the audio signal before zero crossings are evaluated. A second VAD system rectifies each audio signal sample and processes each rectified sample by updating a first statistic and evaluating the rectified sample against a first threshold condition that is a function of the first statistic. Rectified samples meeting the first threshold condition may be used to update a second statistic, and the rectified sample is then evaluated against a second threshold condition that is a function of the second statistic. Rectified samples meeting the second threshold condition may be used to update a third statistic. The audio signal sample may be selected as speech if the second statistic is less than a downscaled third statistic.
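
The two schemes in the summary can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the summary does not say how pulse density maps to a speech decision, what the three statistics are, or what form the threshold conditions take. The function names (`zero_crossing_pulses`, `pulse_density_vad`, `rectified_statistics_vad`), the window length, the "low density means speech" rule, the use of exponential moving averages, the "sample exceeds statistic" conditions, and all default parameter values are hypothetical choices made for this sketch.

```python
import random


def zero_crossing_pulses(samples, noise_amplitude=0.0, rng=None):
    """Return a 0/1 pulse stream: 1 where the sign differs from the previous sample.

    If noise_amplitude is nonzero, uniform noise is added to each sample
    before the sign is taken (the noise shape is an ASSUMPTION).
    """
    rng = rng or random.Random(0)
    pulses = []
    previous_sign = None
    for x in samples:
        if noise_amplitude:
            x += rng.uniform(-noise_amplitude, noise_amplitude)
        sign = x >= 0.0
        crossed = previous_sign is not None and sign != previous_sign
        pulses.append(1 if crossed else 0)
        previous_sign = sign
    return pulses


def pulse_density_vad(pulses, window=64, max_density=0.2):
    """Return one flag per full window of pulses.

    ASSUMPTION: a window counts as speech when its pulse density
    (pulses per sample) is at or below max_density.
    """
    flags = []
    for start in range(0, len(pulses) - window + 1, window):
        density = sum(pulses[start:start + window]) / window
        flags.append(density <= max_density)
    return flags


def rectified_statistics_vad(samples, alpha=0.05, scale=0.5):
    """Return one speech flag per sample from three cascaded statistics.

    ASSUMPTIONS: each statistic is an exponential moving average with
    smoothing factor alpha, and each threshold condition is
    "rectified sample exceeds the statistic".
    """
    stat1 = stat2 = stat3 = 0.0
    flags = []
    for x in samples:
        r = abs(x)                      # rectify the sample
        stat1 += alpha * (r - stat1)    # first statistic sees every sample
        if r > stat1:                   # first threshold condition
            stat2 += alpha * (r - stat2)
            if r > stat2:               # second threshold condition
                stat3 += alpha * (r - stat3)
        # Speech if the second statistic is below the downscaled third.
        flags.append(stat2 < scale * stat3)
    return flags
```

For example, under these assumptions `rectified_statistics_vad([4.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.5], alpha=0.5, scale=0.5)` flags only the last sample: the initial peak raises all three statistics to 2.0, and the later mid-level samples pull the second statistic down to 0.875 while the third stays at 2.0, so the second falls below half of the third.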