Automatic speech and speaker recognition: large margin and kernel methods


Bibliographic Details

Format:	Book
Language:	English
Published:	Chichester: Wiley, 2009
Edition:	1. publ.
Subjects:	Automatic speech recognition; Automatische Sprechererkennung (automatic speaker recognition); Automatische Spracherkennung (automatic speech recognition)
Online access:	Table of contents

MARC


LEADER	00000nam a2200000zc 4500
001	BV035291889
003	DE-604
005	20111004
007	t
008	090205s2009 xxuad|| |||| 00||| eng d
010			|a 2008038551
020			|a 9780470696835 |c cloth |9 978-0-470-69683-5
035			|a (OCoLC)245025429
035			|a (DE-599)BVBBV035291889
040			|a DE-604 |b ger |e aacr
041	0		|a eng
044			|a xxu |c US
049			|a DE-703 |a DE-29T |a DE-11 |a DE-83
050		0	|a TK7895.S65
082	0		|a 006.4/54
084			|a ST 306 |0 (DE-625)143654: |2 rvk
084			|a ZN 6070 |0 (DE-625)157501: |2 rvk
245	1	0	|a Automatic speech and speaker recognition |b large margin and kernel methods |c [ed. by] Joseph Keshet...
250			|a 1. publ.
264		1	|a Chichester |b Wiley |c 2009
300			|a XIII, 253 S. |b Ill., graph. Darst.
336			|b txt |2 rdacontent
337			|b n |2 rdamedia
338			|b nc |2 rdacarrier
500			|a Includes bibliographical references and index
650		4	|a Automatic speech recognition
650	0	7	|a Automatische Sprechererkennung |0 (DE-588)4143704-4 |2 gnd |9 rswk-swf
650	0	7	|a Automatische Spracherkennung |0 (DE-588)4003961-4 |2 gnd |9 rswk-swf
689	0	0	|a Automatische Spracherkennung |0 (DE-588)4003961-4 |D s
689	0	1	|a Automatische Sprechererkennung |0 (DE-588)4143704-4 |D s
689	0		|5 DE-604
700	1		|a Keshet, Joseph |e Sonstige |4 oth
856	4	2	|m Digitalisierung UB Bayreuth |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017096920&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
999			|a oai:aleph.bib-bvb.de:BVB01-017096920

Record in the search index

_version_	1804138588790587392
adam_text	Contents
List of Contributors, p. xi
Preface, p. xv
Part I: Foundations, p. 1
  1 Introduction, p. 3
    Samy Bengio and Joseph Keshet
    1.1 The Traditional Approach to Speech Processing, p. 3
    1.2 Potential Problems of the Probabilistic Approach, p. 5
    1.3 Support Vector Machines for Binary Classification, p. 7
    1.4 Outline, p. 8
    References, p. 9
  2 Theory and Practice of Support Vector Machines Optimization, p. 11
    Shai Shalev-Shwartz and Nathan Srebro
    2.1 Introduction, p. 11
    2.2 SVM and ℓ2-regularized Linear Prediction, p. 12
      2.2.1 Binary Classification and the Traditional SVM, p. 12
      2.2.2 More General Loss Functions, p. 13
      2.2.3 Examples, p. 13
      2.2.4 Kernels, p. 14
      2.2.5 Incorporating a Bias Term, p. 15
    2.3 Optimization Accuracy From a Machine Learning Perspective, p. 16
    2.4 Stochastic Gradient Descent, p. 18
      2.4.1 Sub-gradient Calculus, p. 20
      2.4.2 Rate of Convergence and Stopping Criteria, p. 21
    2.5 Dual Decomposition Methods, p. 22
      2.5.1 Duality, p. 23
    2.6 Summary, p. 25
    References, p. 26
  3 From Binary Classification to Categorial Prediction, p. 27
    Koby Crammer
    3.1 Multi-category Problems, p. 27
    3.2 Hypothesis Class, p. 31
    3.3 Loss Functions, p. 32
      3.3.1 Combinatorial Loss Functions, p. 33
    3.4 Hinge Loss Functions, p. 35
    3.5 A Generalized Perceptron Algorithm, p. 36
    3.6 A Generalized Passive-Aggressive Algorithm, p. 39
      3.6.1 Dual Formulation, p. 40
    3.7 A Batch Formulation, p. 41
    3.8 Concluding Remarks, p. 43
    3.9 Appendix. Derivations of the Duals of the Passive-Aggressive Algorithm and the Batch Formulation, p. 44
      3.9.1 Derivation of the Dual of the Passive-Aggressive Algorithm, p. 44
      3.9.2 Derivation of the Dual of the Batch Formulation, p. 46
    References, p. 48
Part II: Acoustic Modeling, p. 51
  4 A Large Margin Algorithm for Forced Alignment, p. 53
    Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer and Dan Chazan
    4.1 Introduction, p. 54
    4.2 Problem Setting, p. 54
    4.3 Cost and Risk, p. 55
    4.4 A Large Margin Approach for Forced Alignment, p. 56
    4.5 An Iterative Algorithm, p. 57
    4.6 Efficient Evaluation of the Alignment Function, p. 62
    4.7 Base Alignment Functions, p. 64
    4.8 Experimental Results, p. 66
    4.9 Discussion, p. 67
    References, p. 67
  5 A Kernel Wrapper for Phoneme Sequence Recognition, p. 69
    Joseph Keshet and Dan Chazan
    5.1 Introduction, p. 69
    5.2 Problem Setting, p. 70
    5.3 Frame-based Phoneme Classifier, p. 71
    5.4 Kernel-based Iterative Algorithm for Phoneme Recognition, p. 71
    5.5 Nonlinear Feature Functions, p. 75
      5.5.1 Acoustic Modeling, p. 75
      5.5.2 Duration Modeling, p. 77
      5.5.3 Transition Modeling, p. 78
    5.6 Preliminary Experimental Results, p. 78
    5.7 Discussion: Can we Hope for Better Results?, p. 79
    References, p. 80
  6 Augmented Statistical Models: Using Dynamic Kernels for Acoustic Models, p. 83
    Mark J. F. Gales
    6.1 Introduction, p. 84
    6.2 Temporal Correlation Modeling, p. 84
    6.3 Dynamic Kernels, p. 86
      6.3.1 Static and Dynamic Kernels, p. 87
      6.3.2 Generative Kernels, p. 88
      6.3.3 Simple Example, p. 90
    6.4 Augmented Statistical Models, p. 92
      6.4.1 Generative Augmented Models, p. 92
      6.4.2 Conditional Augmented Models, p. 94
    6.5 Experimental Results, p. 95
    6.6 Conclusions, p. 97
    Acknowledgements, p. 97
    References, p. 98
  7 Large Margin Training of Continuous Density Hidden Markov Models, p. 101
    Fei Sha and Lawrence K. Saul
    7.1 Introduction, p. 101
    7.2 Background, p. 103
      7.2.1 Maximum Likelihood Estimation, p. 103
      7.2.2 Conditional Maximum Likelihood, p. 104
      7.2.3 Minimum Classification Error, p. 104
    7.3 Large Margin Training, p. 105
      7.3.1 Discriminant Function, p. 105
      7.3.2 Margin Constraints and Hamming Distances, p. 106
      7.3.3 Optimization, p. 106
      7.3.4 Related Work, p. 107
    7.4 Experimental Results, p. 107
      7.4.1 Large Margin Training, p. 108
      7.4.2 Comparison with CML and MCE, p. 109
      7.4.3 Other Variants, p. 109
    7.5 Conclusion, p. 112
    References, p. 113
Part III: Language Modeling, p. 115
  8 A Survey of Discriminative Language Modeling Approaches for Large Vocabulary Continuous Speech Recognition, p. 117
    Brian Roark
    8.1 Introduction, p. 117
    8.2 General Framework, p. 119
      8.2.1 Training Data and the GEN Function, p. 120
      8.2.2 Feature Mapping, p. 123
      8.2.3 Parameter Estimation, p. 127
    8.3 Further Developments, p. 130
      8.3.1 Novel Features, p. 130
      8.3.2 Novel Objectives, p. 131
      8.3.3 Domain Adaptation, p. 132
    8.4 Summary and Discussion, p. 133
    References, p. 134
  9 Large Margin Methods for Part-of-Speech Tagging, p. 139
    Yasemin Altun
    9.1 Introduction, p. 139
    9.2 Modeling Sequence Labeling, p. 140
      9.2.1 Feature Representation, p. 141
      9.2.2 Empirical Risk Minimization, p. 142
      9.2.3 Conditional Random Fields and Sequence Perceptron, p. 143
    9.3 Sequence Boosting, p. 144
      9.3.1 Objective Function, p. 145
      9.3.2 Optimization Method, p. 145
    9.4 Hidden Markov Support Vector Machines, p. 149
      9.4.1 Objective Function, p. 149
      9.4.2 Optimization Method, p. 151
      9.4.3 Algorithm, p. 151
    9.5 Experiments, p. 153
      9.5.1 Data and Features for Part-of-Speech Tagging, p. 153
      9.5.2 Results of Sequence AdaBoost, p. 154
      9.5.3 Results of Hidden Markov Support Vector Machines, p. 155
    9.6 Discussion, p. 156
    References, p. 156
  10 A Proposal for a Kernel Based Algorithm for Large Vocabulary Continuous Speech Recognition, p. 159
    Joseph Keshet
    10.1 Introduction, p. 159
    10.2 Segment Models and Hidden Markov Models, p. 161
    10.3 Kernel Based Model, p. 163
    10.4 Large Margin Training, p. 164
    10.5 Implementation Details, p. 166
      10.5.1 Iterative Algorithm, p. 166
      10.5.2 Recognition Feature Functions, p. 167
      10.5.3 The Decoder, p. 169
      10.5.4 Complexity, p. 169
    10.6 Discussion, p. 170
    Acknowledgements, p. 170
    References, p. 170
Part IV: Applications, p. 173
  11 Discriminative Keyword Spotting, p. 175
    David Grangier, Joseph Keshet and Samy Bengio
    11.1 Introduction, p. 175
    11.2 Previous Work, p. 177
    11.3 Discriminative Keyword Spotting, p. 180
      11.3.1 Problem Setting, p. 180
      11.3.2 Loss Function and Model Parameterization, p. 182
      11.3.3 An Iterative Training Algorithm, p. 184
      11.3.4 Analysis, p. 185
    11.4 Experiments and Results, p. 188
      11.4.1 The TIMIT Experiments, p. 188
      11.4.2 The WSJ Experiments, p. 190
    11.5 Conclusions, p. 191
    Acknowledgements, p. 193
    References, p. 193
  12 Kernel-based Text-independent Speaker Verification, p. 195
    Johnny Mariéthoz, Samy Bengio and Yves Grandvalet
    12.1 Introduction, p. 196
    12.2 Generative Approaches, p. 197
      12.2.1 Rationale, p. 197
      12.2.2 Gaussian Mixture Models, p. 198
    12.3 Discriminative Approaches, p. 199
      12.3.1 Support Vector Machines, p. 199
      12.3.2 Kernels, p. 200
    12.4 Benchmarking Methodology, p. 201
      12.4.1 Data Splitting for Speaker Verification, p. 201
      12.4.2 Performance Measures, p. 202
      12.4.3 NIST Data, p. 203
      12.4.4 Pre-processing, p. 203
    12.5 Kernels for Speaker Verification, p. 203
      12.5.1 Mean Operator Sequence Kernels, p. 204
      12.5.2 Fisher Kernels, p. 205
      12.5.3 Beyond Fisher Kernels, p. 210
    12.6 Parameter Sharing, p. 212
      12.6.1 Nuisance Attribute Projection, p. 213
      12.6.2 Other Approaches, p. 214
    12.7 Is the Margin Useful for This Problem?, p. 215
    12.8 Comparing all Methods, p. 216
    12.9 Conclusion, p. 218
    References, p. 219
  13 Spectral Clustering for Speech Separation, p. 221
    Francis R. Bach and Michael I. Jordan
    13.1 Introduction, p. 221
    13.2 Spectral Clustering and Normalized Cuts, p. 223
      13.2.1 Similarity Matrices, p. 223
      13.2.2 Normalized Cuts, p. 223
      13.2.3 Spectral Relaxation, p. 225
      13.2.4 Rounding, p. 226
      13.2.5 Spectral Clustering Algorithms, p. 227
      13.2.6 Variational Formulation for the Normalized Cut, p. 229
    13.3 Cost Functions for Learning the Similarity Matrix, p. 229
      13.3.1 Distance Between Partitions, p. 230
      13.3.2 Cost Functions as Upper Bounds, p. 230
      13.3.3 Functions of Eigensubspaces, p. 231
      13.3.4 Empirical Comparisons Between Cost Functions, p. 233
    13.4 Algorithms for Learning the Similarity Matrix, p. 234
      13.4.1 Learning Algorithm, p. 236
      13.4.2 Related Work, p. 236
      13.4.3 Testing Algorithm, p. 236
      13.4.4 Handling very Large Similarity Matrices, p. 237
      13.4.5 Simulations on Toy Examples, p. 239
    13.5 Speech Separation as Spectrogram Segmentation, p. 239
      13.5.1 Spectrogram, p. 240
      13.5.2 Normalization and Subsampling, p. 241
      13.5.3 Generating Training Samples, p. 241
      13.5.4 Features and Grouping Cues for Speech Separation, p. 242
    13.6 Spectral Clustering for Speech Separation, p. 244
      13.6.1 Basis Similarity Matrices, p. 244
      13.6.2 Combination of Similarity Matrices, p. 244
      13.6.3 Approximations of Similarity Matrices, p. 245
      13.6.4 Experiments, p. 245
    13.7 Conclusions, p. 247
    References, p. 248
Index, p. 251
any_adam_object	1
building	Verbundindex
bvnumber	BV035291889
callnumber-first	T - Technology
callnumber-label	TK7895
callnumber-raw	TK7895.S65
callnumber-search	TK7895.S65
callnumber-sort	TK 47895 S65
callnumber-subject	TK - Electrical and Nuclear Engineering
classification_rvk	ST 306 ZN 6070
ctrlnum	(OCoLC)245025429 (DE-599)BVBBV035291889
dewey-full	006.4/54
dewey-hundreds	000 - Computer science, information, general works
dewey-ones	006 - Special computer methods
dewey-raw	006.4/54
dewey-search	006.4/54
dewey-sort	16.4 254
dewey-tens	000 - Computer science, information, general works
discipline	Computer Science; Electrical Engineering / Electronics / Communications Engineering
edition	1. publ.
format	Book
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01688nam a2200433zc 4500</leader><controlfield tag="001">BV035291889</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20111004 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">090205s2009 xxuad\|\| \|\|\|\| 00\|\|\| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2008038551</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780470696835</subfield><subfield code="c">cloth</subfield><subfield code="9">978-0-470-69683-5</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)245025429</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV035291889</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxu</subfield><subfield code="c">US</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-703</subfield><subfield code="a">DE-29T</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-83</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">TK7895.S65</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.4/54</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 306</subfield><subfield code="0">(DE-625)143654:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ZN 6070</subfield><subfield code="0">(DE-625)157501:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield 
code="a">Automatic speech and speaker recognition</subfield><subfield code="b">large margin and kernel methods</subfield><subfield code="c">[ed. by] Joseph Keshet...</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1. publ.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Chichester</subfield><subfield code="b">Wiley</subfield><subfield code="c">2009</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XIII, 253 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references and index</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Automatic speech recognition</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Sprechererkennung</subfield><subfield code="0">(DE-588)4143704-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Automatische Sprechererkennung</subfield><subfield code="0">(DE-588)4143704-4</subfield><subfield 
code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Keshet, Joseph</subfield><subfield code="e">Sonstige</subfield><subfield code="4">oth</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bayreuth</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017096920&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-017096920</subfield></datafield></record></collection>
id	DE-604.BV035291889
illustrated	Illustrated
indexdate	2024-07-09T21:30:35Z
institution	BVB
isbn	9780470696835
language	English
lccn	2008038551
oai_aleph_id	oai:aleph.bib-bvb.de:BVB01-017096920
oclc_num	245025429
open_access_boolean
owner	DE-703 DE-29T DE-11 DE-83
owner_facet	DE-703 DE-29T DE-11 DE-83
physical	XIII, 253 S. Ill., graph. Darst.
publishDate	2009
publishDateSearch	2009
publishDateSort	2009
publisher	Wiley
record_format	marc
spelling	Automatic speech and speaker recognition large margin and kernel methods [ed. by] Joseph Keshet... 1. publ. Chichester Wiley 2009 XIII, 253 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier Includes bibliographical references and index Automatic speech recognition Automatische Sprechererkennung (DE-588)4143704-4 gnd rswk-swf Automatische Spracherkennung (DE-588)4003961-4 gnd rswk-swf Automatische Spracherkennung (DE-588)4003961-4 s Automatische Sprechererkennung (DE-588)4143704-4 s DE-604 Keshet, Joseph Sonstige oth Digitalisierung UB Bayreuth application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017096920&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis
spellingShingle	Automatic speech and speaker recognition large margin and kernel methods Automatic speech recognition Automatische Sprechererkennung (DE-588)4143704-4 gnd Automatische Spracherkennung (DE-588)4003961-4 gnd
subject_GND	(DE-588)4143704-4 (DE-588)4003961-4
title	Automatic speech and speaker recognition large margin and kernel methods
title_auth	Automatic speech and speaker recognition large margin and kernel methods
title_exact_search	Automatic speech and speaker recognition large margin and kernel methods
title_full	Automatic speech and speaker recognition large margin and kernel methods [ed. by] Joseph Keshet...
title_fullStr	Automatic speech and speaker recognition large margin and kernel methods [ed. by] Joseph Keshet...
title_full_unstemmed	Automatic speech and speaker recognition large margin and kernel methods [ed. by] Joseph Keshet...
title_short	Automatic speech and speaker recognition
title_sort	automatic speech and speaker recognition large margin and kernel methods
title_sub	large margin and kernel methods
topic	Automatic speech recognition Automatische Sprechererkennung (DE-588)4143704-4 gnd Automatische Spracherkennung (DE-588)4003961-4 gnd
topic_facet	Automatic speech recognition Automatische Sprechererkennung Automatische Spracherkennung
url	http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017096920&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA
work_keys_str_mv	AT keshetjoseph automaticspeechandspeakerrecognitionlargemarginandkernelmethods
