1 What's it all about? 1
1.1 Data mining and machine learning 2
Describing structural patterns 4
1.2 Simple examples: The weather problem and others 8
Contact lenses: An idealized problem 11
Irises: A classic numeric dataset 13
CPU performance: Introducing numeric prediction 15
Labor negotiations: A more realistic example 16
Soybean classification: A classic machine learning success 17
1.3 Fielded applications 20
Decisions involving judgment 21
Marketing and sales 25
1.4 Machine learning and statistics 26
1.5 Generalization as search 27
Enumerating the concept space 28
1.6 Data mining and ethics 32
2 Input: Concepts, instances, attributes 37
2.1 What's a concept? 38
2.2 What's in an example? 41
2.3 What's in an attribute? 45
2.4 Preparing the input 48
Gathering the data together 48
Getting to know your data 54
3 Output: Knowledge representation 57
3.1 Decision tables 58
3.2 Decision trees 58
3.3 Classification rules 59
3.4 Association rules 63
3.5 Rules with exceptions 64
3.6 Rules involving relations 67
3.7 Trees for numeric prediction 70
3.8 Instance-based representation 72
4 Algorithms: The basic methods 77
4.1 Inferring rudimentary rules 78
Missing values and numeric attributes 80
4.2 Statistical modeling 82
Missing values and numeric attributes 85
4.3 Divide and conquer: Constructing decision trees 89
Calculating information 93
Highly branching attributes 94
4.4 Covering algorithms: Constructing rules 97
A simple covering algorithm 98
Rules versus decision lists 103
4.5 Mining association rules 104
Generating rules efficiently 108
4.6 Linear models 112
Numeric prediction 112
4.7 Instance-based learning 114
5 Credibility: Evaluating what's been learned 119
5.1 Training and testing 120
5.2 Predicting performance 123
5.3 Cross-validation 125
5.4 Other estimates 127
5.5 Comparing data mining schemes 129
5.6 Predicting probabilities 133
Quadratic loss function 134
Informational loss function 135
5.7 Counting the cost 137
Cost-sensitive learning 144
5.8 Evaluating numeric prediction 147
5.9 Minimum description length principle 150
5.10 Applying MDL to clustering 154
6 Implementations: Real machine learning schemes 157
6.1 Decision trees 159
Numeric attributes 159
Estimating error rates 164
Complexity of decision tree induction 167
From trees to rules 168
C4.5: Choices and options 169
6.2 Classification rules 170
Criteria for choosing tests 171
Missing values, numeric attributes 172
Good rules and bad rules 173
Generating good rules 174
Generating good decision lists 175
Probability measure for rule evaluation 177
Evaluating rules using a test set 178
Obtaining rules from partial decision trees 181
Rules with exceptions 184
6.3 Extending linear classification: Support vector machines 188
Maximum margin hyperplane 189
Nonlinear class boundaries 191
6.4 Instance-based learning 193
Reducing the number of exemplars 194
Pruning noisy exemplars 194
Weighting attributes 195
Generalizing exemplars 196
Distance functions for generalized exemplars 197
Generalized distance functions 199
6.5 Numeric prediction 201
Nominal attributes 204
Pseudo-code for model tree induction 205
Locally weighted linear regression 208
6.6 Clustering 210
Iterative distance-based clustering 211
Incremental clustering 212
Probability-based clustering 218
Extending the mixture model 223
Bayesian clustering 225
7 Moving on: Engineering the input and output 229
7.1 Attribute selection 232
Scheme-independent selection 233
Searching the attribute space 235
Scheme-specific selection 236
7.2 Discretizing numeric attributes 238
Unsupervised discretization 239
Entropy-based discretization 240
Other discretization methods 243
Entropy-based versus error-based discretization 244
Converting discrete to numeric attributes 246
7.3 Automatic data cleansing 247
Improving decision trees 247
Detecting anomalies 249
7.4 Combining multiple models 250
Error-correcting output codes 260
8 Nuts and bolts: Machine learning algorithms in Java 265
8.1 Getting started 266
8.2 Javadoc and the class library 271
Classes, instances, and packages 272
The weka.classifiers package 274
8.3 Processing datasets using the machine learning programs 277
Scheme-specific options 282
Meta-learning schemes 286
8.4 Embedded machine learning 297
A simple message classifier 299
8.5 Writing new learning schemes 306
An example classifier 306
Conventions for implementing classifiers 314
Conventions for writing filters 317
9 Looking forward 321
9.1 Learning from massive datasets 322
9.2 Visualizing machine learning 325
Visualizing the input 325
Visualizing the output 327
9.3 Incorporating domain knowledge 329
9.4 Text mining 331
Finding key phrases for documents 331
Finding information in running text 333
9.5 Mining the World Wide Web 335