FACE DETECTION GUIDED SOUND SOURCE LOCALIZATION PAN ANGLE POST PROCESSING FOR SMART CAMERA TALKER TRACKING AND FRAMING
Format: Patent
Language: English
Abstract: A videoconferencing system includes a camera that acquires image data and a microphone array that acquires audio data. The image data is used in conjunction with sound source localization (SSL) data to locate a talker depicted in the image data. SSL processes the audio data and determines SSL pan angle values, each indicative of an estimated direction of a sound. Columns of pixels in an image are associated with bins, and a bin's count is incremented for each SSL pan angle value of the audio data that falls within that bin. A bounding box is determined in the image data that encompasses a face depicted in the image data, and a range of pixels is determined for the bounding box, such as the range extending from its leftmost column to its rightmost column. The bin with the highest bin count that also overlaps the range of pixels of a bounding box is deemed to contain the talker.
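The abstract does not say how pan angles map to pixel columns, how wide a bin is, or how ties are resolved. The Python sketch below illustrates the bin-and-overlap selection under explicitly assumed conditions: a rectilinear (pinhole) camera co-located with and aligned to the microphone array, a hypothetical horizontal field of view `hfov_deg`, fixed-width bins of `bin_width_px` columns, and face boxes reduced to their column range. All names (`FaceBox`, `pan_to_column`, `bin_counts`, `locate_talker`) are hypothetical and are not taken from the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class FaceBox:
    """Hypothetical face bounding box; only its column extent is used."""
    left: int   # leftmost pixel column (inclusive)
    right: int  # rightmost pixel column (inclusive)


def pan_to_column(pan_deg: float, image_width: int, hfov_deg: float) -> Optional[int]:
    """Map an SSL pan angle to a pixel column.

    Assumption: pinhole camera whose optical axis matches the array's
    0-degree direction, positive angles to the right. Returns None when
    the angle lies outside the camera's horizontal field of view.
    """
    half_fov = math.radians(hfov_deg) / 2
    theta = math.radians(pan_deg)
    if abs(theta) > half_fov:
        return None
    focal_px = (image_width / 2) / math.tan(half_fov)
    col = int(image_width / 2 + focal_px * math.tan(theta))
    return min(max(col, 0), image_width - 1)  # clamp edge cases into the image


def bin_counts(pan_angles: Sequence[float], image_width: int,
               hfov_deg: float, bin_width_px: int) -> List[int]:
    """Increment a bin's count for every pan angle whose column falls in it."""
    counts = [0] * math.ceil(image_width / bin_width_px)
    for pan in pan_angles:
        col = pan_to_column(pan, image_width, hfov_deg)
        if col is not None:
            counts[col // bin_width_px] += 1
    return counts


def locate_talker(pan_angles: Sequence[float], faces: Sequence[FaceBox],
                  image_width: int, hfov_deg: float,
                  bin_width_px: int = 32) -> Optional[FaceBox]:
    """Return the face overlapping the highest-count bin that overlaps any face."""
    counts = bin_counts(pan_angles, image_width, hfov_deg, bin_width_px)
    # Visit bins from highest to lowest count; stable sort breaks ties by bin index.
    for b in sorted(range(len(counts)), key=lambda i: counts[i], reverse=True):
        if counts[b] == 0:
            break  # remaining bins received no SSL pan angle values
        bin_left = b * bin_width_px
        bin_right = min(bin_left + bin_width_px, image_width) - 1
        for face in faces:
            # Column ranges overlap when neither lies entirely left of the other.
            if face.left <= bin_right and bin_left <= face.right:
                return face  # assumption: first overlapping face wins
    return None
```

For example, with a 1920-pixel-wide image and a 90-degree field of view, pan angles `[40, 40, 40, -30]` put three counts in a bin near column 1765, but if the only face spans columns 300 to 500, that bin overlaps no face, so the sketch falls through to the lower-count bin near column 405 and returns that face. Returning the first overlapping face when several share a bin is an arbitrary choice here; a real system might prefer the largest overlap.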