Data mining with decision trees theory and applications

Saved in:
Bibliographic Details
Main Authors: Roḳaḥ, Liʾor (author), Maimon, Oded Z. (author)
Format: Book
Language: English
Published: New Jersey [u.a.] World Scientific [2015]
Edition: 2nd Edition
Series: Series in machine perception and artificial intelligence 81
Subjects:
Online Access: Table of contents

MARC

LEADER 00000nam a2200000zcb4500
001 BV042191765
003 DE-604
005 20220107
007 t|
008 141117s2015 xxka||| |||| 00||| eng d
020 |a 9789814590075  |c hardback  |9 978-981-4590-07-5 
035 |a (OCoLC)897060282 
035 |a (DE-599)BVBBV042191765 
040 |a DE-604  |b ger  |e aacr 
041 0 |a eng 
044 |a xxk  |c GB 
049 |a DE-29T  |a DE-703  |a DE-11 
082 0 |a 006.312 
084 |a ST 530  |0 (DE-625)143679:  |2 rvk 
100 1 |a Roḳaḥ, Liʾor  |e Verfasser  |0 (DE-588)1060772868  |4 aut 
245 1 0 |a Data mining with decision trees  |b theory and applications  |c Lior Rokach, Oded Maimon 
250 |a 2nd Edition 
264 1 |a New Jersey [u.a.]  |b World Scientific  |c [2015] 
300 |a xxi, 305 Seiten  |b Illustrationen 
336 |b txt  |2 rdacontent 
337 |b n  |2 rdamedia 
338 |b nc  |2 rdacarrier 
490 1 |a Series in machine perception and artificial intelligence  |v 81 
650 0 7 |a Data Mining  |0 (DE-588)4428654-5  |2 gnd  |9 rswk-swf 
650 0 7 |a Entscheidungsbaum  |0 (DE-588)4347788-4  |2 gnd  |9 rswk-swf 
689 0 0 |a Data Mining  |0 (DE-588)4428654-5  |D s 
689 0 1 |a Entscheidungsbaum  |0 (DE-588)4347788-4  |D s 
689 0 |5 DE-604 
700 1 |a Maimon, Oded Z.  |e Verfasser  |0 (DE-588)143250833  |4 aut 
830 0 |a Series in machine perception and artificial intelligence  |v 81  |w (DE-604)BV006668231  |9 81 
856 4 2 |m HBZ Datenaustausch  |q application/pdf  |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027630832&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA  |3 Inhaltsverzeichnis 
943 1 |a oai:aleph.bib-bvb.de:BVB01-027630832 
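The 856 field above carries the link to the scanned table of contents as a plain HTTP URL. Its query string can be unpacked with Python's standard library; a minimal sketch (no network request involved, just string parsing of the URL recorded in subfield $u):

```python
from urllib.parse import urlsplit, parse_qs

# The table-of-contents URL from MARC field 856 $u above.
toc_url = ("http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01"
           "&local_base=BVB01&doc_number=027630832&sequence=000002"
           "&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA")

params = parse_qs(urlsplit(toc_url).query)
# parse_qs returns a list per key; take the first value of each.
doc_number = params["doc_number"][0]
service_type = params["service_type"][0]
print(doc_number, service_type)  # → 027630832 MEDIA
```

The `doc_number` recovered here matches the Aleph system number in field 943, which is how the link server ties the PDF back to this record.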

Record in the search index

_version_ 1819740000561922048
adam_text Title: Data mining with decision trees Author: Rokach, Lior Year: 2015

Contents

About the Authors vi
Preface for the Second Edition vii
Preface for the First Edition ix
1. Introduction to Decision Trees 1
   1.1 Data Science 1
   1.2 Data Mining 2
   1.3 The Four-Layer Model 3
   1.4 Knowledge Discovery in Databases (KDD) 4
   1.5 Taxonomy of Data Mining Methods 8
   1.6 Supervised Methods 9
      1.6.1 Overview 9
   1.7 Classification Trees 10
   1.8 Characteristics of Classification Trees 12
      1.8.1 Tree Size 14
      1.8.2 The Hierarchical Nature of Decision Trees 15
   1.9 Relation to Rule Induction 15
2. Training Decision Trees 17
   2.1 What is Learning? 17
   2.2 Preparing the Training Set 17
   2.3 Training the Decision Tree 19
3. A Generic Algorithm for Top-Down Induction of Decision Trees 23
   3.1 Training Set 23
   3.2 Definition of the Classification Problem 25
   3.3 Induction Algorithms 26
   3.4 Probability Estimation in Decision Trees 26
      3.4.1 Laplace Correction 27
      3.4.2 No Match 28
   3.5 Algorithmic Framework for Decision Trees 28
   3.6 Stopping Criteria 30
4. Evaluation of Classification Trees 31
   4.1 Overview 31
   4.2 Generalization Error 31
      4.2.1 Theoretical Estimation of Generalization Error 32
      4.2.2 Empirical Estimation of Generalization Error 32
      4.2.3 Alternatives to the Accuracy Measure 34
      4.2.4 The F-Measure 35
      4.2.5 Confusion Matrix 36
      4.2.6 Classifier Evaluation under Limited Resources 37
         4.2.6.1 ROC Curves 39
         4.2.6.2 Hit-Rate Curve 40
         4.2.6.3 Qrecall (Quota Recall) 40
         4.2.6.4 Lift Curve 41
         4.2.6.5 Pearson Correlation Coefficient 41
         4.2.6.6 Area Under Curve (AUC) 43
         4.2.6.7 Average Hit-Rate 44
         4.2.6.8 Average Qrecall 44
         4.2.6.9 Potential Extract Measure (PEM) 45
      4.2.7 Which Decision Tree Classifier is Better? 48
         4.2.7.1 McNemar's Test 48
         4.2.7.2 A Test for the Difference of Two Proportions 50
         4.2.7.3 The Resampled Paired t Test 51
         4.2.7.4 The k-fold Cross-validated Paired t Test 51
   4.3 Computational Complexity 52
   4.4 Comprehensibility 52
   4.5 Scalability to Large Datasets 53
   4.6 Robustness 55
   4.7 Stability 55
   4.8 Interestingness Measures 56
   4.9 Overfitting and Underfitting 57
   4.10 No Free Lunch Theorem 58
5. Splitting Criteria 61
   5.1 Univariate Splitting Criteria 61
      5.1.1 Overview 61
      5.1.2 Impurity-based Criteria 61
      5.1.3 Information Gain 62
      5.1.4 Gini Index 62
      5.1.5 Likelihood Ratio Chi-squared Statistics 63
      5.1.6 DKM Criterion 63
      5.1.7 Normalized Impurity-based Criteria 63
      5.1.8 Gain Ratio 64
      5.1.9 Distance Measure 64
      5.1.10 Binary Criteria 64
      5.1.11 Twoing Criterion 65
      5.1.12 Orthogonal Criterion 65
      5.1.13 Kolmogorov-Smirnov Criterion 66
      5.1.14 AUC Splitting Criteria 66
      5.1.15 Other Univariate Splitting Criteria 66
      5.1.16 Comparison of Univariate Splitting Criteria 66
   5.2 Handling Missing Values 67
6. Pruning Trees 69
   6.1 Stopping Criteria 69
   6.2 Heuristic Pruning 69
      6.2.1 Overview 69
      6.2.2 Cost Complexity Pruning 70
      6.2.3 Reduced Error Pruning 70
      6.2.4 Minimum Error Pruning (MEP) 71
      6.2.5 Pessimistic Pruning 71
      6.2.6 Error-Based Pruning (EBP) 72
      6.2.7 Minimum Description Length (MDL) Pruning 73
      6.2.8 Other Pruning Methods 73
      6.2.9 Comparison of Pruning Methods 73
   6.3 Optimal Pruning 74
7. Popular Decision Trees Induction Algorithms 77
   7.1 Overview 77
   7.2 ID3 77
   7.3 C4.5 78
   7.4 CART 79
   7.5 CHAID 79
   7.6 QUEST 80
   7.7 Reference to Other Algorithms 80
   7.8 Advantages and Disadvantages of Decision Trees 81
8. Beyond Classification Tasks 85
   8.1 Introduction 85
   8.2 Regression Trees 85
   8.3 Survival Trees 86
   8.4 Clustering Tree 89
      8.4.1 Distance Measures 89
      8.4.2 Minkowski: Distance Measures for Numeric Attributes 90
         8.4.2.1 Distance Measures for Binary Attributes 90
         8.4.2.2 Distance Measures for Nominal Attributes 91
         8.4.2.3 Distance Metrics for Ordinal Attributes 91
         8.4.2.4 Distance Metrics for Mixed-Type Attributes 92
      8.4.3 Similarity Functions 92
         8.4.3.1 Cosine Measure 93
         8.4.3.2 Pearson Correlation Measure 93
         8.4.3.3 Extended Jaccard Measure 93
         8.4.3.4 Dice Coefficient Measure 93
      8.4.4 The OCCT Algorithm 93
   8.5 Hidden Markov Model Trees 94
9. Decision Forests 99
   9.1 Introduction 99
   9.2 Back to the Roots 99
   9.3 Combination Methods 108
      9.3.1 Weighting Methods 108
         9.3.1.1 Majority Voting 108
         9.3.1.2 Performance Weighting 109
         9.3.1.3 Distribution Summation 109
         9.3.1.4 Bayesian Combination 109
         9.3.1.5 Dempster-Shafer 110
         9.3.1.6 Vogging 110
         9.3.1.7 Naïve Bayes 110
         9.3.1.8 Entropy Weighting 110
         9.3.1.9 Density-based Weighting 111
         9.3.1.10 DEA Weighting Method 111
         9.3.1.11 Logarithmic Opinion Pool 111
         9.3.1.12 Gating Network 112
         9.3.1.13 Order Statistics 113
      9.3.2 Meta-combination Methods 113
         9.3.2.1 Stacking 113
         9.3.2.2 Arbiter Trees 114
         9.3.2.3 Combiner Trees 116
         9.3.2.4 Grading 117
   9.4 Classifier Dependency 118
      9.4.1 Dependent Methods 118
         9.4.1.1 Model-guided Instance Selection 118
         9.4.1.2 Incremental Batch Learning 122
      9.4.2 Independent Methods 122
         9.4.2.1 Bagging 122
         9.4.2.2 Wagging 124
         9.4.2.3 Random Forest 125
         9.4.2.4 Rotation Forest 126
         9.4.2.5 Cross-validated Committees 129
   9.5 Ensemble Diversity 130
      9.5.1 Manipulating the Inducer 131
         9.5.1.1 Manipulation of the Inducer's Parameters 131
         9.5.1.2 Starting Point in Hypothesis Space 132
         9.5.1.3 Hypothesis Space Traversal 132
            9.5.1.3.1 Random-based Strategy 132
            9.5.1.3.2 Collective-Performance-based Strategy 132
      9.5.2 Manipulating the Training Samples 133
         9.5.2.1 Resampling 133
         9.5.2.2 Creation 133
         9.5.2.3 Partitioning 134
      9.5.3 Manipulating the Target Attribute Representation 134
      9.5.4 Partitioning the Search Space 136
         9.5.4.1 Divide and Conquer 136
         9.5.4.2 Feature Subset-based Ensemble Methods 137
            9.5.4.2.1 Random-based Strategy 138
            9.5.4.2.2 Reduct-based Strategy 138
            9.5.4.2.3 Collective-Performance-based Strategy 139
            9.5.4.2.4 Feature Set Partitioning 139
      9.5.5 Multi-Inducers 142
      9.5.6 Measuring the Diversity 143
   9.6 Ensemble Size 144
      9.6.1 Selecting the Ensemble Size 144
      9.6.2 Pre-selection of the Ensemble Size 145
      9.6.3 Selection of the Ensemble Size while Training 145
      9.6.4 Pruning: Post Selection of the Ensemble Size 146
         9.6.4.1 Pre-combining Pruning 146
         9.6.4.2 Post-combining Pruning 146
   9.7 Cross-Inducer 147
   9.8 Multistrategy Ensemble Learning 148
   9.9 Which Ensemble Method Should be Used? 148
   9.10 Open Source for Decision Trees Forests 149
10. A Walk-through-guide for Using Decision Trees Software 151
   10.1 Introduction 151
   10.2 Weka 152
      10.2.1 Training a Classification Tree 153
      10.2.2 Building a Forest 158
   10.3 R 159
      10.3.1 Party Package 159
      10.3.2 Forest 162
      10.3.3 Other Types of Trees 163
      10.3.4 The Rpart Package 164
      10.3.5 RandomForest 165
11. Advanced Decision Trees 167
   11.1 Oblivious Decision Trees 167
   11.2 Online Adaptive Decision Trees 168
   11.3 Lazy Tree 168
   11.4 Option Tree 169
   11.5 Lookahead 172
   11.6 Oblique Decision Trees 172
   11.7 Incremental Learning of Decision Trees 175
      11.7.1 The Motives for Incremental Learning 175
      11.7.2 The Inefficiency Challenge 176
      11.7.3 The Concept Drift Challenge 177
   11.8 Decision Trees Inducers for Large Datasets 179
      11.8.1 Accelerating Tree Induction 180
      11.8.2 Parallel Induction of Tree 182
12. Cost-sensitive Active and Proactive Learning of Decision Trees 183
   12.1 Overview 183
   12.2 Type of Costs 184
   12.3 Learning with Costs 185
   12.4 Induction of Cost Sensitive Decision Trees 188
   12.5 Active Learning 189
   12.6 Proactive Data Mining 196
      12.6.1 Changing the Input Data 197
      12.6.2 Attribute Changing Cost and Benefit Functions 198
      12.6.3 Maximizing Utility 199
      12.6.4 An Algorithmic Framework for Proactive Data Mining 200
13. Feature Selection 203
   13.1 Overview 203
   13.2 The Curse of Dimensionality 203
   13.3 Techniques for Feature Selection 206
      13.3.1 Feature Filters 207
         13.3.1.1 FOCUS 207
         13.3.1.2 LVF 207
         13.3.1.3 Using a Learning Algorithm as a Filter 207
         13.3.1.4 An Information Theoretic Feature Filter 208
         13.3.1.5 RELIEF Algorithm 208
         13.3.1.6 Simba and G-flip 208
         13.3.1.7 Contextual Merit (CM) Algorithm 209
      13.3.2 Using Traditional Statistics for Filtering 209
         13.3.2.1 Mallows Cp 209
         13.3.2.2 AIC, BIC and F-ratio 209
         13.3.2.3 Principal Component Analysis (PCA) 210
         13.3.2.4 Factor Analysis (FA) 210
         13.3.2.5 Projection Pursuit (PP) 210
      13.3.3 Wrappers 211
         13.3.3.1 Wrappers for Decision Tree Learners 211
   13.4 Feature Selection as a means of Creating Ensembles 211
   13.5 Ensemble Methodology for Improving Feature Selection 213
      13.5.1 Independent Algorithmic Framework 215
      13.5.2 Combining Procedure 216
         13.5.2.1 Simple Weighted Voting 216
         13.5.2.2 Using Artificial Contrasts 218
      13.5.3 Feature Ensemble Generator 220
         13.5.3.1 Multiple Feature Selectors 220
         13.5.3.2 Bagging 221
   13.6 Using Decision Trees for Feature Selection 221
   13.7 Limitation of Feature Selection Methods 222
14. Fuzzy Decision Trees 225
   14.1 Overview 225
   14.2 Membership Function 226
   14.3 Fuzzy Classification Problems 227
   14.4 Fuzzy Set Operations 228
   14.5 Fuzzy Classification Rules 229
   14.6 Creating Fuzzy Decision Tree 230
      14.6.1 Fuzzifying Numeric Attributes 230
      14.6.2 Inducing of Fuzzy Decision Tree 232
   14.7 Simplifying the Decision Tree 234
   14.8 Classification of New Instances 234
   14.9 Other Fuzzy Decision Tree Inducers 234
15. Hybridization of Decision Trees with other Techniques 237
   15.1 Introduction 237
   15.2 A Framework for Instance-Space Decomposition 237
      15.2.1 Stopping Rules 240
      15.2.2 Splitting Rules 241
      15.2.3 Split Validation Examinations 241
   15.3 The Contrasted Population Miner (CPOM) Algorithm 242
      15.3.1 CPOM Outline 242
      15.3.2 The Grouped Gain Ratio Splitting Rule 244
   15.4 Induction of Decision Trees by an Evolutionary Algorithm (EA) 246
16. Decision Trees and Recommender Systems 251
   16.1 Introduction 251
   16.2 Using Decision Trees for Recommending Items 252
      16.2.1 RS-Adapted Decision Tree 253
      16.2.2 Least Probable Intersections 257
   16.3 Using Decision Trees for Preferences Elicitation 259
      16.3.1 Static Methods 261
      16.3.2 Dynamic Methods and Decision Trees 262
      16.3.3 SVD-based CF Method 263
      16.3.4 Pairwise Comparisons 264
      16.3.5 Profile Representation 266
      16.3.6 Selecting the Next Pairwise Comparison 267
      16.3.7 Clustering the Items 269
      16.3.8 Training a Lazy Decision Tree 270
Bibliography 273
Index 303
any_adam_object 1
author Roḳaḥ, Liʾor
Maimon, Oded Z.
author_GND (DE-588)1060772868
(DE-588)143250833
author_facet Roḳaḥ, Liʾor
Maimon, Oded Z.
author_role aut
aut
author_sort Roḳaḥ, Liʾor
author_variant l r lr
o z m oz ozm
building Verbundindex
bvnumber BV042191765
classification_rvk ST 530
ctrlnum (OCoLC)897060282
(DE-599)BVBBV042191765
dewey-full 006.312
dewey-hundreds 000 - Computer science, information, general works
dewey-ones 006 - Special computer methods
dewey-raw 006.312
dewey-search 006.312
dewey-sort 16.312
dewey-tens 000 - Computer science, information, general works
discipline Informatik
edition 2nd Edition
format Book
fullrecord <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01681nam a2200409zcb4500</leader><controlfield tag="001">BV042191765</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20220107 </controlfield><controlfield tag="007">t|</controlfield><controlfield tag="008">141117s2015 xxka||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9789814590075</subfield><subfield code="c">hardback</subfield><subfield code="9">978-981-4590-07-5</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)897060282</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV042191765</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxk</subfield><subfield code="c">GB</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-29T</subfield><subfield code="a">DE-703</subfield><subfield code="a">DE-11</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.312</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Roḳaḥ, Liʾor</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1060772868</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Data mining with decision trees</subfield><subfield code="b">theory and applications</subfield><subfield code="c">Lior Rokach, Oded Maimon</subfield></datafield><datafield 
tag="250" ind1=" " ind2=" "><subfield code="a">2nd Edition</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">New Jersey [u.a.]</subfield><subfield code="b">World Scientific</subfield><subfield code="c">[2015]</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xxi, 305 Seiten</subfield><subfield code="b">Illustrationen</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Series in machine perception and artificial intelligence</subfield><subfield code="v">81</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Entscheidungsbaum</subfield><subfield code="0">(DE-588)4347788-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Entscheidungsbaum</subfield><subfield code="0">(DE-588)4347788-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Maimon, Oded Z.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)143250833</subfield><subfield 
code="4">aut</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Series in machine perception and artificial intelligence</subfield><subfield code="v">81</subfield><subfield code="w">(DE-604)BV006668231</subfield><subfield code="9">81</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=027630832&amp;sequence=000002&amp;line_number=0001&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-027630832</subfield></datafield></record></collection>
id DE-604.BV042191765
illustrated Illustrated
indexdate 2024-12-24T04:16:20Z
institution BVB
isbn 9789814590075
language English
oai_aleph_id oai:aleph.bib-bvb.de:BVB01-027630832
oclc_num 897060282
open_access_boolean
owner DE-29T
DE-703
DE-11
owner_facet DE-29T
DE-703
DE-11
physical xxi, 305 Seiten Illustrationen
publishDate 2015
publishDateSearch 2015
publishDateSort 2015
publisher World Scientific
record_format marc
series Series in machine perception and artificial intelligence
series2 Series in machine perception and artificial intelligence
spellingShingle Roḳaḥ, Liʾor
Maimon, Oded Z.
Data mining with decision trees theory and applications
Series in machine perception and artificial intelligence
Data Mining (DE-588)4428654-5 gnd
Entscheidungsbaum (DE-588)4347788-4 gnd
subject_GND (DE-588)4428654-5
(DE-588)4347788-4
title Data mining with decision trees theory and applications
title_auth Data mining with decision trees theory and applications
title_exact_search Data mining with decision trees theory and applications
title_full Data mining with decision trees theory and applications Lior Rokach, Oded Maimon
title_fullStr Data mining with decision trees theory and applications Lior Rokach, Oded Maimon
title_full_unstemmed Data mining with decision trees theory and applications Lior Rokach, Oded Maimon
title_short Data mining with decision trees
title_sort data mining with decision trees theory and applications
title_sub theory and applications
topic Data Mining (DE-588)4428654-5 gnd
Entscheidungsbaum (DE-588)4347788-4 gnd
topic_facet Data Mining
Entscheidungsbaum
url http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027630832&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA
volume_link (DE-604)BV006668231
work_keys_str_mv AT rokahliʾor dataminingwithdecisiontreestheoryandapplications
AT maimonodedz dataminingwithdecisiontreestheoryandapplications
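The fullrecord field above stores the same record as MARCXML. A minimal sketch of extracting the title statement (field 245) from such a record with Python's standard `xml.etree.ElementTree`; a trimmed copy of the XML is inlined here (only the 245 field is reproduced), since the default MARC21 namespace is the part that usually trips up ad-hoc parsers:

```python
import xml.etree.ElementTree as ET

# Trimmed MARCXML in the same shape as the fullrecord field above;
# only datafield 245 (title statement) is reproduced.
marcxml = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Data mining with decision trees</subfield>
      <subfield code="b">theory and applications</subfield>
      <subfield code="c">Lior Rokach, Oded Maimon</subfield>
    </datafield>
  </record>
</collection>"""

# Every element lives in the MARC21/slim namespace, so queries need a prefix map.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}
root = ET.fromstring(marcxml)

# Collect the 245 subfields keyed by their one-letter code.
title = {
    sf.get("code"): sf.text
    for sf in root.findall(".//marc:datafield[@tag='245']/marc:subfield", NS)
}
print(title["a"], "-", title["b"])
# → Data mining with decision trees - theory and applications
```

Subfield `$a` is the title proper, `$b` the subtitle, and `$c` the statement of responsibility, which is exactly how the `title_short`, `title_sub`, and `title_full` index fields above are derived.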