Automatic speech and speaker recognition: large margin and kernel methods
Format: | Book |
---|---|
Language: | English |
Published: | Chichester: Wiley, 2009 |
Edition: | 1. publ. |
Subjects: | |
Online access: | Table of contents |
MARC
LEADER | 00000nam a2200000zc 4500 | ||
---|---|---|---|
001 | BV035291889 | ||
003 | DE-604 | ||
005 | 20111004 | ||
007 | t | ||
008 | 090205s2009 xxuad|| |||| 00||| eng d | ||
010 | |a 2008038551 | ||
020 | |a 9780470696835 |c cloth |9 978-0-470-69683-5 | ||
035 | |a (OCoLC)245025429 | ||
035 | |a (DE-599)BVBBV035291889 | ||
040 | |a DE-604 |b ger |e aacr | ||
041 | 0 | |a eng | |
044 | |a xxu |c US | ||
049 | |a DE-703 |a DE-29T |a DE-11 |a DE-83 | ||
050 | 0 | |a TK7895.S65 | |
082 | 0 | |a 006.4/54 | |
084 | |a ST 306 |0 (DE-625)143654: |2 rvk | ||
084 | |a ZN 6070 |0 (DE-625)157501: |2 rvk | ||
245 | 1 | 0 | |a Automatic speech and speaker recognition |b large margin and kernel methods |c [ed. by] Joseph Keshet... |
250 | |a 1. publ. | ||
264 | 1 | |a Chichester |b Wiley |c 2009 | |
300 | |a XIII, 253 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Includes bibliographical references and index | ||
650 | 4 | |a Automatic speech recognition | |
650 | 0 | 7 | |a Automatische Sprechererkennung |0 (DE-588)4143704-4 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Automatische Spracherkennung |0 (DE-588)4003961-4 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Automatische Spracherkennung |0 (DE-588)4003961-4 |D s |
689 | 0 | 1 | |a Automatische Sprechererkennung |0 (DE-588)4143704-4 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Keshet, Joseph |e Sonstige |4 oth | |
856 | 4 | 2 | |m Digitalisierung UB Bayreuth |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017096920&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-017096920 |
Record in the search index
_version_ | 1804138588790587392 |
---|---|
adam_text | Contents
List of Contributors
Preface
I Foundations
1 Introduction
Samy Bengio and Joseph Keshet
1.1 The Traditional Approach to Speech Processing
1.2 Potential Problems of the Probabilistic Approach
1.3 Support Vector Machines for Binary Classification
1.4 Outline
References
2 Theory and Practice of Support Vector Machines Optimization
Shai Shalev-Shwartz and Nathan Srebro
2.1 Introduction
2.2 SVM and ℓ2-regularized Linear Prediction
2.2.1 Binary Classification and the Traditional SVM
2.2.2 More General Loss Functions
2.2.3 Examples
2.2.4 Kernels
2.2.5 Incorporating a Bias Term
2.3 Optimization Accuracy From a Machine Learning Perspective
2.4 Stochastic Gradient Descent
2.4.1 Sub-gradient Calculus
2.4.2 Rate of Convergence and Stopping Criteria
2.5 Dual Decomposition Methods
2.5.1 Duality
2.6 Summary
References
3 From Binary Classification to Categorial Prediction
Koby Crammer
3.1 Multi-category Problems
3.2 Hypothesis Class
3.3 Loss Functions
3.3.1 Combinatorial Loss Functions
3.4 Hinge Loss Functions
3.5 A Generalized Perceptron Algorithm
3.6 A Generalized Passive-Aggressive Algorithm
3.6.1 Dual Formulation
3.7 A Batch Formulation
3.8 Concluding Remarks
3.9 Appendix. Derivations of the Duals of the Passive-Aggressive Algorithm and the Batch Formulation
3.9.1 Derivation of the Dual of the Passive-Aggressive Algorithm
3.9.2 Derivation of the Dual of the Batch Formulation
References
II Acoustic Modeling
4 A Large Margin Algorithm for Forced Alignment
Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer and Dan Chazan
4.1 Introduction
4.2 Problem Setting
4.3 Cost and Risk
4.4 A Large Margin Approach for Forced Alignment
4.5 An Iterative Algorithm
4.6 Efficient Evaluation of the Alignment Function
4.7 Base Alignment Functions
4.8 Experimental Results
4.9 Discussion
References
5 A Kernel Wrapper for Phoneme Sequence Recognition
Joseph Keshet and Dan Chazan
5.1 Introduction
5.2 Problem Setting
5.3 Frame-based Phoneme Classifier
5.4 Kernel-based Iterative Algorithm for Phoneme Recognition
5.5 Nonlinear Feature Functions
5.5.1 Acoustic Modeling
5.5.2 Duration Modeling
5.5.3 Transition Modeling
5.6 Preliminary Experimental Results
5.7 Discussion: Can we Hope for Better Results?
References
6 Augmented Statistical Models: Using Dynamic Kernels for Acoustic Models
Mark J. F. Gales
6.1 Introduction
6.2 Temporal Correlation Modeling
6.3 Dynamic Kernels
6.3.1 Static and Dynamic Kernels
6.3.2 Generative Kernels
6.3.3 Simple Example
6.4 Augmented Statistical Models
6.4.1 Generative Augmented Models
6.4.2 Conditional Augmented Models
6.5 Experimental Results
6.6 Conclusions
Acknowledgements
References
7 Large Margin Training of Continuous Density Hidden Markov Models
Fei Sha and Lawrence K. Saul
7.1 Introduction
7.2 Background
7.2.1 Maximum Likelihood Estimation
7.2.2 Conditional Maximum Likelihood
7.2.3 Minimum Classification Error
7.3 Large Margin Training
7.3.1 Discriminant Function
7.3.2 Margin Constraints and Hamming Distances
7.3.3 Optimization
7.3.4 Related Work
7.4 Experimental Results
7.4.1 Large Margin Training
7.4.2 Comparison with CML and MCE
7.4.3 Other Variants
7.5 Conclusion
References
III Language Modeling
8 A Survey of Discriminative Language Modeling Approaches for Large Vocabulary Continuous Speech Recognition
Brian Roark
8.1 Introduction
8.2 General Framework
8.2.1 Training Data and the GEN Function
8.2.2 Feature Mapping
8.2.3 Parameter Estimation
8.3 Further Developments
8.3.1 Novel Features
8.3.2 Novel Objectives
8.3.3 Domain Adaptation
8.4 Summary and Discussion
References
9 Large Margin Methods for Part-of-Speech Tagging
Yasemin Altun
9.1 Introduction
9.2 Modeling Sequence Labeling
9.2.1 Feature Representation
9.2.2 Empirical Risk Minimization
9.2.3 Conditional Random Fields and Sequence Perceptron
9.3 Sequence Boosting
9.3.1 Objective Function
9.3.2 Optimization Method
9.4 Hidden Markov Support Vector Machines
9.4.1 Objective Function
9.4.2 Optimization Method
9.4.3 Algorithm
9.5 Experiments
9.5.1 Data and Features for Part-of-Speech Tagging
9.5.2 Results of Sequence AdaBoost
9.5.3 Results of Hidden Markov Support Vector Machines
9.6 Discussion
References
10 A Proposal for a Kernel Based Algorithm for Large Vocabulary Continuous Speech Recognition
Joseph Keshet
10.1 Introduction
10.2 Segment Models and Hidden Markov Models
10.3 Kernel Based Model
10.4 Large Margin Training
10.5 Implementation Details
10.5.1 Iterative Algorithm
10.5.2 Recognition Feature Functions
10.5.3 The Decoder
10.5.4 Complexity
10.6 Discussion
Acknowledgements
References
IV Applications
11 Discriminative Keyword Spotting
David Grangier, Joseph Keshet and Samy Bengio
11.1 Introduction
11.2 Previous Work
11.3 Discriminative Keyword Spotting
11.3.1 Problem Setting
11.3.2 Loss Function and Model Parameterization
11.3.3 An Iterative Training Algorithm
11.3.4 Analysis
11.4 Experiments and Results
11.4.1 The TIMIT Experiments
11.4.2 The WSJ Experiments
11.5 Conclusions
Acknowledgements
References
12 Kernel-based Text-independent Speaker Verification
Johnny Mariéthoz, Samy Bengio and Yves Grandvalet
12.1 Introduction
12.2 Generative Approaches
12.2.1 Rationale
12.2.2 Gaussian Mixture Models
12.3 Discriminative Approaches
12.3.1 Support Vector Machines
12.3.2 Kernels
12.4 Benchmarking Methodology
12.4.1 Data Splitting for Speaker Verification
12.4.2 Performance Measures
12.4.3 NIST Data
12.4.4 Pre-processing
12.5 Kernels for Speaker Verification
12.5.1 Mean Operator Sequence Kernels
12.5.2 Fisher Kernels
12.5.3 Beyond Fisher Kernels
12.6 Parameter Sharing
12.6.1 Nuisance Attribute Projection
12.6.2 Other Approaches
12.7 Is the Margin Useful for This Problem?
12.8 Comparing all Methods
12.9 Conclusion
References
13 Spectral Clustering for Speech Separation
Francis R. Bach and Michael I. Jordan
13.1 Introduction
13.2 Spectral Clustering and Normalized Cuts
13.2.1 Similarity Matrices
13.2.2 Normalized Cuts
13.2.3 Spectral Relaxation
13.2.4 Rounding
13.2.5 Spectral Clustering Algorithms
13.2.6 Variational Formulation for the Normalized Cut
13.3 Cost Functions for Learning the Similarity Matrix
13.3.1 Distance Between Partitions
13.3.2 Cost Functions as Upper Bounds
13.3.3 Functions of Eigensubspaces
13.3.4 Empirical Comparisons Between Cost Functions
13.4 Algorithms for Learning the Similarity Matrix
13.4.1 Learning Algorithm
13.4.2 Related Work
13.4.3 Testing Algorithm
13.4.4 Handling very Large Similarity Matrices
13.4.5 Simulations on Toy Examples
13.5 Speech Separation as Spectrogram Segmentation
13.5.1 Spectrogram
13.5.2 Normalization and Subsampling
13.5.3 Generating Training Samples
13.5.4 Features and Grouping Cues for Speech Separation
13.6 Spectral Clustering for Speech Separation
13.6.1 Basis Similarity Matrices
13.6.2 Combination of Similarity Matrices
13.6.3 Approximations of Similarity Matrices
13.6.4 Experiments
13.7 Conclusions
References
Index
|
any_adam_object | 1 |
building | Verbundindex |
bvnumber | BV035291889 |
callnumber-first | T - Technology |
callnumber-label | TK7895 |
callnumber-raw | TK7895.S65 |
callnumber-search | TK7895.S65 |
callnumber-sort | TK 47895 S65 |
callnumber-subject | TK - Electrical and Nuclear Engineering |
classification_rvk | ST 306 ZN 6070 |
ctrlnum | (OCoLC)245025429 (DE-599)BVBBV035291889 |
dewey-full | 006.4/54 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.4/54 |
dewey-search | 006.4/54 |
dewey-sort | 16.4 254 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Computer science Electrical engineering / Electronics / Communications engineering |
edition | 1. publ. |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01688nam a2200433zc 4500</leader><controlfield tag="001">BV035291889</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20111004 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">090205s2009 xxuad|| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2008038551</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780470696835</subfield><subfield code="c">cloth</subfield><subfield code="9">978-0-470-69683-5</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)245025429</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV035291889</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxu</subfield><subfield code="c">US</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-703</subfield><subfield code="a">DE-29T</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-83</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">TK7895.S65</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.4/54</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 306</subfield><subfield code="0">(DE-625)143654:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ZN 6070</subfield><subfield code="0">(DE-625)157501:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield 
code="a">Automatic speech and speaker recognition</subfield><subfield code="b">large margin and kernel methods</subfield><subfield code="c">[ed. by] Joseph Keshet...</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1. publ.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Chichester</subfield><subfield code="b">Wiley</subfield><subfield code="c">2009</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XIII, 253 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references and index</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Automatic speech recognition</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Sprechererkennung</subfield><subfield code="0">(DE-588)4143704-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Automatische Sprechererkennung</subfield><subfield code="0">(DE-588)4143704-4</subfield><subfield 
code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Keshet, Joseph</subfield><subfield code="e">Sonstige</subfield><subfield code="4">oth</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bayreuth</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017096920&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-017096920</subfield></datafield></record></collection> |
id | DE-604.BV035291889 |
illustrated | Illustrated |
indexdate | 2024-07-09T21:30:35Z |
institution | BVB |
isbn | 9780470696835 |
language | English |
lccn | 2008038551 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-017096920 |
oclc_num | 245025429 |
open_access_boolean | |
owner | DE-703 DE-29T DE-11 DE-83 |
owner_facet | DE-703 DE-29T DE-11 DE-83 |
physical | XIII, 253 S. Ill., graph. Darst. |
publishDate | 2009 |
publishDateSearch | 2009 |
publishDateSort | 2009 |
publisher | Wiley |
record_format | marc |
spelling | Automatic speech and speaker recognition large margin and kernel methods [ed. by] Joseph Keshet... 1. publ. Chichester Wiley 2009 XIII, 253 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier Includes bibliographical references and index Automatic speech recognition Automatische Sprechererkennung (DE-588)4143704-4 gnd rswk-swf Automatische Spracherkennung (DE-588)4003961-4 gnd rswk-swf Automatische Spracherkennung (DE-588)4003961-4 s Automatische Sprechererkennung (DE-588)4143704-4 s DE-604 Keshet, Joseph Sonstige oth Digitalisierung UB Bayreuth application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017096920&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Automatic speech and speaker recognition large margin and kernel methods Automatic speech recognition Automatische Sprechererkennung (DE-588)4143704-4 gnd Automatische Spracherkennung (DE-588)4003961-4 gnd |
subject_GND | (DE-588)4143704-4 (DE-588)4003961-4 |
title | Automatic speech and speaker recognition large margin and kernel methods |
title_auth | Automatic speech and speaker recognition large margin and kernel methods |
title_exact_search | Automatic speech and speaker recognition large margin and kernel methods |
title_full | Automatic speech and speaker recognition large margin and kernel methods [ed. by] Joseph Keshet... |
title_fullStr | Automatic speech and speaker recognition large margin and kernel methods [ed. by] Joseph Keshet... |
title_full_unstemmed | Automatic speech and speaker recognition large margin and kernel methods [ed. by] Joseph Keshet... |
title_short | Automatic speech and speaker recognition |
title_sort | automatic speech and speaker recognition large margin and kernel methods |
title_sub | large margin and kernel methods |
topic | Automatic speech recognition Automatische Sprechererkennung (DE-588)4143704-4 gnd Automatische Spracherkennung (DE-588)4003961-4 gnd |
topic_facet | Automatic speech recognition Automatische Sprechererkennung Automatische Spracherkennung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017096920&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT keshetjoseph automaticspeechandspeakerrecognitionlargemarginandkernelmethods |