A speech/music discriminator based on RMS and zero-crossings

Over the last several years, major efforts have been made to develop methods for extracting information from audiovisual media, so that the media can be stored in and retrieved from databases automatically, based on their content. In this work we deal with the characterization of an audio signal, which may be part of a larger audiovisual system or may be autonomous, as for example in the case of an audio recording stored digitally on disk. Our goal was to develop a system that first segments the audio signal and then classifies each segment into one of two main categories: speech or music. Among the system's requirements are its processing speed and its ability to function in a real-time environment with a small response delay. Because of the restriction to two classes, the set of extracted characteristics is considerably reduced and the required computations are straightforward. Experimental results show that efficiency is exceptionally good, without sacrificing performance. Segmentation is based on the distribution of the mean signal amplitude, whereas classification uses an additional, frequency-related characteristic. The classification algorithm may be used either in conjunction with the segmentation algorithm, in which case it verifies or refutes a music-to-speech or speech-to-music change, or autonomously, on given audio segments. The basic characteristics are computed over 20 ms intervals, so segment boundaries are specified to an accuracy of 20 ms. The smallest segment length is one second. The segmentation and classification algorithms were benchmarked on a large data set, with correct segmentation about 97% of the time and correct classification about 95%.
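
The two basic characteristics named in the abstract, the frame-wise RMS (mean signal amplitude) and the zero-crossing rate computed over 20 ms intervals, can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function name frame_features, the NumPy-based framing, the synthetic test signal, and the exact zero-crossing normalization are assumptions made for the example, and the paper's actual segmentation and classification rules are not reproduced here.

import numpy as np

def frame_features(x, sample_rate, frame_ms=20):
    """Per-frame RMS and zero-crossing rate over non-overlapping frames (illustrative sketch)."""
    frame_len = int(sample_rate * frame_ms / 1000)      # 20 ms worth of samples
    n_frames = len(x) // frame_len
    rms = np.empty(n_frames)
    zcr = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * frame_len:(i + 1) * frame_len]
        rms[i] = np.sqrt(np.mean(frame ** 2))           # mean signal amplitude (RMS)
        # fraction of consecutive sample pairs whose sign changes
        zcr[i] = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
    return rms, zcr

# Hypothetical usage: 3 s of a 220 Hz tone plus noise, sampled at 16 kHz
sr = 16000
t = np.arange(3 * sr) / sr
x = 0.5 * np.sin(2 * np.pi * 220 * t) + 0.05 * np.random.randn(t.size)
rms, zcr = frame_features(x, sr)
print(rms.shape, zcr.shape)                             # (150,) (150,): one value per 20 ms frame

In the paper, features of this kind feed a segmentation stage based on the amplitude distribution and a classification stage that additionally uses the zero-crossing rate; those decision rules are described in the article itself.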

Bibliographic details
Published in: IEEE Transactions on Multimedia, 2005-02, Vol. 7 (1), p. 155-166
Main authors: Panagiotakis, C.; Tziritas, G.
Format: Article
Language: English
Online access: Order full text
DOI: 10.1109/TMM.2004.840604
ISSN: 1520-9210
EISSN: 1941-0077
Source: IEEE Electronic Library (IEL)

Subjects:
Algorithms
Applied sciences
Artificial intelligence
Audio databases
Audio segmentation
Audio signals
Audio-visual systems
Broadcasting. Videocommunications. Audiovisual
Classification
Classification algorithms
Computation
Computer science; control theory; systems
Content based retrieval
Data mining
Exact sciences and technology
Multiple signal classification
Music
Music information retrieval
Real time systems
Segmentation
Segments
Speech
Speech and sound recognition and synthesis. Linguistics
speech/music classification
Studies
Telecommunications
Telecommunications and information theory
zero-crossing rate