Automatic speech and speaker recognition: large margin and kernel methods

Bibliographic Details
Format: Book
Language: English
Published: Chichester Wiley 2009
Edition: 1. publ.
Subjects:
Online Access: Table of contents

MARC

LEADER 00000nam a2200000zc 4500
001 BV035291889
003 DE-604
005 20111004
007 t
008 090205s2009 xxuad|| |||| 00||| eng d
010 |a 2008038551 
020 |a 9780470696835  |c cloth  |9 978-0-470-69683-5 
035 |a (OCoLC)245025429 
035 |a (DE-599)BVBBV035291889 
040 |a DE-604  |b ger  |e aacr 
041 0 |a eng 
044 |a xxu  |c US 
049 |a DE-703  |a DE-29T  |a DE-11  |a DE-83 
050 0 |a TK7895.S65 
082 0 |a 006.4/54 
084 |a ST 306  |0 (DE-625)143654:  |2 rvk 
084 |a ZN 6070  |0 (DE-625)157501:  |2 rvk 
245 1 0 |a Automatic speech and speaker recognition  |b large margin and kernel methods  |c [ed. by] Joseph Keshet... 
250 |a 1. publ. 
264 1 |a Chichester  |b Wiley  |c 2009 
300 |a XIII, 253 S.  |b Ill., graph. Darst. 
336 |b txt  |2 rdacontent 
337 |b n  |2 rdamedia 
338 |b nc  |2 rdacarrier 
500 |a Includes bibliographical references and index 
650 4 |a Automatic speech recognition 
650 0 7 |a Automatische Sprechererkennung  |0 (DE-588)4143704-4  |2 gnd  |9 rswk-swf 
650 0 7 |a Automatische Spracherkennung  |0 (DE-588)4003961-4  |2 gnd  |9 rswk-swf 
689 0 0 |a Automatische Spracherkennung  |0 (DE-588)4003961-4  |D s 
689 0 1 |a Automatische Sprechererkennung  |0 (DE-588)4143704-4  |D s 
689 0 |5 DE-604 
700 1 |a Keshet, Joseph  |e Sonstige  |4 oth 
856 4 2 |m Digitalisierung UB Bayreuth  |q application/pdf  |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017096920&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA  |3 Inhaltsverzeichnis 
999 |a oai:aleph.bib-bvb.de:BVB01-017096920 

Record in the search index

_version_ 1804138588790587392
adam_text Contents
List of Contributors
Preface

I Foundations

1 Introduction
  Samy Bengio and Joseph Keshet
  1.1 The Traditional Approach to Speech Processing
  1.2 Potential Problems of the Probabilistic Approach
  1.3 Support Vector Machines for Binary Classification
  1.4 Outline
  References

2 Theory and Practice of Support Vector Machines Optimization
  Shai Shalev-Shwartz and Nathan Srebro
  2.1 Introduction
  2.2 SVM and ℓ2-regularized Linear Prediction
    2.2.1 Binary Classification and the Traditional SVM
    2.2.2 More General Loss Functions
    2.2.3 Examples
    2.2.4 Kernels
    2.2.5 Incorporating a Bias Term
  2.3 Optimization Accuracy From a Machine Learning Perspective
  2.4 Stochastic Gradient Descent
    2.4.1 Sub-gradient Calculus
    2.4.2 Rate of Convergence and Stopping Criteria
  2.5 Dual Decomposition Methods
    2.5.1 Duality
  2.6 Summary
  References

3 From Binary Classification to Categorial Prediction
  Koby Crammer
  3.1 Multi-category Problems
  3.2 Hypothesis Class
  3.3 Loss Functions
    3.3.1 Combinatorial Loss Functions
  3.4 Hinge Loss Functions
  3.5 A Generalized Perceptron Algorithm
  3.6 A Generalized Passive-Aggressive Algorithm
    3.6.1 Dual Formulation
  3.7 A Batch Formulation
  3.8 Concluding Remarks
  3.9 Appendix. Derivations of the Duals of the Passive-Aggressive Algorithm and the Batch Formulation
    3.9.1 Derivation of the Dual of the Passive-Aggressive Algorithm
    3.9.2 Derivation of the Dual of the Batch Formulation
  References

II Acoustic Modeling

4 A Large Margin Algorithm for Forced Alignment
  Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer and Dan Chazan
  4.1 Introduction
  4.2 Problem Setting
  4.3 Cost and Risk
  4.4 A Large Margin Approach for Forced Alignment
  4.5 An Iterative Algorithm
  4.6 Efficient Evaluation of the Alignment Function
  4.7 Base Alignment Functions
  4.8 Experimental Results
  4.9 Discussion
  References

5 A Kernel Wrapper for Phoneme Sequence Recognition
  Joseph Keshet and Dan Chazan
  5.1 Introduction
  5.2 Problem Setting
  5.3 Frame-based Phoneme Classifier
  5.4 Kernel-based Iterative Algorithm for Phoneme Recognition
  5.5 Nonlinear Feature Functions
    5.5.1 Acoustic Modeling
    5.5.2 Duration Modeling
    5.5.3 Transition Modeling
  5.6 Preliminary Experimental Results
  5.7 Discussion: Can we Hope for Better Results?
  References

6 Augmented Statistical Models: Using Dynamic Kernels for Acoustic Models
  Mark J. F. Gales
  6.1 Introduction
  6.2 Temporal Correlation Modeling
  6.3 Dynamic Kernels
    6.3.1 Static and Dynamic Kernels
    6.3.2 Generative Kernels
    6.3.3 Simple Example
  6.4 Augmented Statistical Models
    6.4.1 Generative Augmented Models
    6.4.2 Conditional Augmented Models
  6.5 Experimental Results
  6.6 Conclusions
  Acknowledgements
  References

7 Large Margin Training of Continuous Density Hidden Markov Models
  Fei Sha and Lawrence K. Saul
  7.1 Introduction
  7.2 Background
    7.2.1 Maximum Likelihood Estimation
    7.2.2 Conditional Maximum Likelihood
    7.2.3 Minimum Classification Error
  7.3 Large Margin Training
    7.3.1 Discriminant Function
    7.3.2 Margin Constraints and Hamming Distances
    7.3.3 Optimization
    7.3.4 Related Work
  7.4 Experimental Results
    7.4.1 Large Margin Training
    7.4.2 Comparison with CML and MCE
    7.4.3 Other Variants
  7.5 Conclusion
  References

III Language Modeling

8 A Survey of Discriminative Language Modeling Approaches for Large Vocabulary Continuous Speech Recognition
  Brian Roark
  8.1 Introduction
  8.2 General Framework
    8.2.1 Training Data and the GEN Function
    8.2.2 Feature Mapping
    8.2.3 Parameter Estimation
  8.3 Further Developments
    8.3.1 Novel Features
    8.3.2 Novel Objectives
    8.3.3 Domain Adaptation
  8.4 Summary and Discussion
  References

9 Large Margin Methods for Part-of-Speech Tagging
  Yasemin Altun
  9.1 Introduction
  9.2 Modeling Sequence Labeling
    9.2.1 Feature Representation
    9.2.2 Empirical Risk Minimization
    9.2.3 Conditional Random Fields and Sequence Perceptron
  9.3 Sequence Boosting
    9.3.1 Objective Function
    9.3.2 Optimization Method
  9.4 Hidden Markov Support Vector Machines
    9.4.1 Objective Function
    9.4.2 Optimization Method
    9.4.3 Algorithm
  9.5 Experiments
    9.5.1 Data and Features for Part-of-Speech Tagging
    9.5.2 Results of Sequence AdaBoost
    9.5.3 Results of Hidden Markov Support Vector Machines
  9.6 Discussion
  References

10 A Proposal for a Kernel Based Algorithm for Large Vocabulary Continuous Speech Recognition
  Joseph Keshet
  10.1 Introduction
  10.2 Segment Models and Hidden Markov Models
  10.3 Kernel Based Model
  10.4 Large Margin Training
  10.5 Implementation Details
    10.5.1 Iterative Algorithm
    10.5.2 Recognition Feature Functions
    10.5.3 The Decoder
    10.5.4 Complexity
  10.6 Discussion
  Acknowledgements
  References

IV Applications

11 Discriminative Keyword Spotting
  David Grangier, Joseph Keshet and Samy Bengio
  11.1 Introduction
  11.2 Previous Work
  11.3 Discriminative Keyword Spotting
    11.3.1 Problem Setting
    11.3.2 Loss Function and Model Parameterization
    11.3.3 An Iterative Training Algorithm
    11.3.4 Analysis
  11.4 Experiments and Results
    11.4.1 The TIMIT Experiments
    11.4.2 The WSJ Experiments
  11.5 Conclusions
  Acknowledgements
  References

12 Kernel-based Text-independent Speaker Verification
  Johnny Mariéthoz, Samy Bengio and Yves Grandvalet
  12.1 Introduction
  12.2 Generative Approaches
    12.2.1 Rationale
    12.2.2 Gaussian Mixture Models
  12.3 Discriminative Approaches
    12.3.1 Support Vector Machines
    12.3.2 Kernels
  12.4 Benchmarking Methodology
    12.4.1 Data Splitting for Speaker Verification
    12.4.2 Performance Measures
    12.4.3 NIST Data
    12.4.4 Pre-processing
  12.5 Kernels for Speaker Verification
    12.5.1 Mean Operator Sequence Kernels
    12.5.2 Fisher Kernels
    12.5.3 Beyond Fisher Kernels
  12.6 Parameter Sharing
    12.6.1 Nuisance Attribute Projection
    12.6.2 Other Approaches
  12.7 Is the Margin Useful for This Problem?
  12.8 Comparing all Methods
  12.9 Conclusion
  References

13 Spectral Clustering for Speech Separation
  Francis R. Bach and Michael I. Jordan
  13.1 Introduction
  13.2 Spectral Clustering and Normalized Cuts
    13.2.1 Similarity Matrices
    13.2.2 Normalized Cuts
    13.2.3 Spectral Relaxation
    13.2.4 Rounding
    13.2.5 Spectral Clustering Algorithms
    13.2.6 Variational Formulation for the Normalized Cut
  13.3 Cost Functions for Learning the Similarity Matrix
    13.3.1 Distance Between Partitions
    13.3.2 Cost Functions as Upper Bounds
    13.3.3 Functions of Eigensubspaces
    13.3.4 Empirical Comparisons Between Cost Functions
  13.4 Algorithms for Learning the Similarity Matrix
    13.4.1 Learning Algorithm
    13.4.2 Related Work
    13.4.3 Testing Algorithm
    13.4.4 Handling very Large Similarity Matrices
    13.4.5 Simulations on Toy Examples
  13.5 Speech Separation as Spectrogram Segmentation
    13.5.1 Spectrogram
    13.5.2 Normalization and Subsampling
    13.5.3 Generating Training Samples
    13.5.4 Features and Grouping Cues for Speech Separation
  13.6 Spectral Clustering for Speech Separation
    13.6.1 Basis Similarity Matrices
    13.6.2 Combination of Similarity Matrices
    13.6.3 Approximations of Similarity Matrices
    13.6.4 Experiments
  13.7 Conclusions
  References

Index
any_adam_object 1
building Verbundindex
bvnumber BV035291889
callnumber-first T - Technology
callnumber-label TK7895
callnumber-raw TK7895.S65
callnumber-search TK7895.S65
callnumber-sort TK 47895 S65
callnumber-subject TK - Electrical and Nuclear Engineering
classification_rvk ST 306
ZN 6070
ctrlnum (OCoLC)245025429
(DE-599)BVBBV035291889
dewey-full 006.4/54
dewey-hundreds 000 - Computer science, information, general works
dewey-ones 006 - Special computer methods
dewey-raw 006.4/54
dewey-search 006.4/54
dewey-sort 16.4 254
dewey-tens 000 - Computer science, information, general works
discipline Informatik
Elektrotechnik / Elektronik / Nachrichtentechnik
edition 1. publ.
format Book
fullrecord <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01688nam a2200433zc 4500</leader><controlfield tag="001">BV035291889</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20111004 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">090205s2009 xxuad|| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2008038551</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780470696835</subfield><subfield code="c">cloth</subfield><subfield code="9">978-0-470-69683-5</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)245025429</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV035291889</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxu</subfield><subfield code="c">US</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-703</subfield><subfield code="a">DE-29T</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-83</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">TK7895.S65</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.4/54</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 306</subfield><subfield code="0">(DE-625)143654:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ZN 6070</subfield><subfield code="0">(DE-625)157501:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Automatic 
speech and speaker recognition</subfield><subfield code="b">large margin and kernel methods</subfield><subfield code="c">[ed. by] Joseph Keshet...</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1. publ.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Chichester</subfield><subfield code="b">Wiley</subfield><subfield code="c">2009</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XIII, 253 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references and index</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Automatic speech recognition</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Sprechererkennung</subfield><subfield code="0">(DE-588)4143704-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Automatische Sprechererkennung</subfield><subfield code="0">(DE-588)4143704-4</subfield><subfield 
code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Keshet, Joseph</subfield><subfield code="e">Sonstige</subfield><subfield code="4">oth</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bayreuth</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=017096920&amp;sequence=000002&amp;line_number=0001&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-017096920</subfield></datafield></record></collection>
id DE-604.BV035291889
illustrated Illustrated
indexdate 2024-07-09T21:30:35Z
institution BVB
isbn 9780470696835
language English
lccn 2008038551
oai_aleph_id oai:aleph.bib-bvb.de:BVB01-017096920
oclc_num 245025429
open_access_boolean
owner DE-703
DE-29T
DE-11
DE-83
owner_facet DE-703
DE-29T
DE-11
DE-83
physical XIII, 253 S. Ill., graph. Darst.
publishDate 2009
publishDateSearch 2009
publishDateSort 2009
publisher Wiley
record_format marc
spelling Automatic speech and speaker recognition large margin and kernel methods [ed. by] Joseph Keshet...
1. publ.
Chichester Wiley 2009
XIII, 253 S. Ill., graph. Darst.
txt rdacontent
n rdamedia
nc rdacarrier
Includes bibliographical references and index
Automatic speech recognition
Automatische Sprechererkennung (DE-588)4143704-4 gnd rswk-swf
Automatische Spracherkennung (DE-588)4003961-4 gnd rswk-swf
Automatische Spracherkennung (DE-588)4003961-4 s
Automatische Sprechererkennung (DE-588)4143704-4 s
DE-604
Keshet, Joseph Sonstige oth
Digitalisierung UB Bayreuth application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017096920&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis
spellingShingle Automatic speech and speaker recognition large margin and kernel methods
Automatic speech recognition
Automatische Sprechererkennung (DE-588)4143704-4 gnd
Automatische Spracherkennung (DE-588)4003961-4 gnd
subject_GND (DE-588)4143704-4
(DE-588)4003961-4
title Automatic speech and speaker recognition large margin and kernel methods
title_auth Automatic speech and speaker recognition large margin and kernel methods
title_exact_search Automatic speech and speaker recognition large margin and kernel methods
title_full Automatic speech and speaker recognition large margin and kernel methods [ed. by] Joseph Keshet...
title_fullStr Automatic speech and speaker recognition large margin and kernel methods [ed. by] Joseph Keshet...
title_full_unstemmed Automatic speech and speaker recognition large margin and kernel methods [ed. by] Joseph Keshet...
title_short Automatic speech and speaker recognition
title_sort automatic speech and speaker recognition large margin and kernel methods
title_sub large margin and kernel methods
topic Automatic speech recognition
Automatische Sprechererkennung (DE-588)4143704-4 gnd
Automatische Spracherkennung (DE-588)4003961-4 gnd
topic_facet Automatic speech recognition
Automatische Sprechererkennung
Automatische Spracherkennung
url http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017096920&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA
work_keys_str_mv AT keshetjoseph automaticspeechandspeakerrecognitionlargemarginandkernelmethods