Context-based speaker counters for speaker segmentation clustering systems
Main authors: | , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Online access: | Order full text |
Summary: | Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining the number of speakers in a video and its corresponding audio using visual context are disclosed. In one aspect, a method includes: detecting a plurality of speakers within a video; for each detected speaker, determining a bounding box that comprises the detected person in the image frame and an object within a threshold distance of the detected person; determining a unique descriptor of the person based in part on image information depicting the object within the bounding box; determining a cardinal number of unique speakers in the video; and providing the cardinal number of unique speakers to a speaker segmentation clustering system. |
---|
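The steps in the summary can be illustrated with a minimal Python sketch. The record does not say how bounding boxes are grown, how descriptors are computed, or how they are compared, so everything here is an assumption made for illustration: the names `Box`, `context_box`, and `count_unique_speakers` are hypothetical, descriptors are taken to be plain numeric vectors, and two descriptors are treated as the same speaker when their cosine similarity reaches an arbitrary threshold.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def box_gap(a: Box, b: Box) -> float:
    """Shortest distance between two boxes; 0.0 if they touch or overlap."""
    dx = max(a.x1 - b.x2, b.x1 - a.x2, 0.0)
    dy = max(a.y1 - b.y2, b.y1 - a.y2, 0.0)
    return math.hypot(dx, dy)


def context_box(person: Box, objects: list[Box], threshold: float) -> Box:
    """Grow the person's box to cover every object within `threshold` of it."""
    box = person
    for obj in objects:
        if box_gap(person, obj) <= threshold:
            box = Box(min(box.x1, obj.x1), min(box.y1, obj.y1),
                      max(box.x2, obj.x2), max(box.y2, obj.y2))
    return box


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / norm


def count_unique_speakers(descriptors: list[list[float]],
                          min_similarity: float = 0.9) -> int:
    """Greedy grouping (an assumption, not the patent's stated method):
    a descriptor starts a new speaker unless it is close to a known one."""
    representatives: list[list[float]] = []
    for d in descriptors:
        if not any(cosine(d, r) >= min_similarity for r in representatives):
            representatives.append(d)
    return len(representatives)


if __name__ == "__main__":
    person = Box(0, 0, 10, 10)
    nearby_objects = [Box(12, 0, 15, 5), Box(50, 50, 60, 60)]
    print(context_box(person, nearby_objects, threshold=5))
    print(count_unique_speakers([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]))
```

In this sketch the resulting count would be handed to the downstream speaker segmentation clustering system as its target number of speakers; how that hand-off works is not described in the record.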