lets-talk-about-classifiers

Parents, talk to your kids about classifiers.

thinking about regular expressions, precision, recall, and classifiers

Regular expressions are classifiers! They're just rule-based classifiers.

In this sense, they define a set, and they give you a deterministic way to ask "is this thing in this set?"
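To make that "set membership" framing concrete, here's a minimal sketch in Python (the phone-number pattern is just an invented example, not a robust one): the regex defines a set of strings, and matching is a deterministic yes/no test.

```python
import re

# A hand-written rule: this regex defines a set of strings
# (here, a made-up "looks like a US phone number" pattern).
PHONE = re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}")

def is_phone_number(s):
    """Deterministic membership test: is this string in the set?"""
    return PHONE.fullmatch(s) is not None

print(is_phone_number("812-555-1234"))   # True
print(is_phone_number("call me maybe"))  # False
```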

Important things to note about that:

- we say that regexes describe NFAs -- they're nondeterministic in the sense that the automaton can be in many states at once, not in the answer they produce! Given the same input, the answer is always the same.

- as you probably know from a theory class, regexes can't describe all sets of strings. (they can't match balanced parentheses, for example; that takes a context-free grammar)

For any formalism, only certain things are expressible.

You could also write down a little flow chart to classify things: another deterministic set of rules, for when you feel like you definitely know what you're looking for. Regexes, flow charts, and decision trees are just little programming languages, and they do exactly what you tell them to.
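Here's the flow-chart flavor of the same idea, again just a toy sketch with made-up rules: a few hand-written if/else branches standing in for boxes and arrows.

```python
def looks_like_spam(message):
    """A tiny hand-written 'flow chart' classifier (made-up rules).

    Like a regex, it does exactly what you tell it to: same input,
    same answer, every time.
    """
    text = message.lower()
    if "free money" in text:
        return True
    if text.count("!") > 3:
        return True
    if "meeting" in text or "lunch" in text:
        return False
    return False

print(looks_like_spam("FREE MONEY!!! click now"))  # True
print(looks_like_spam("lunch tomorrow?"))          # False
```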

They can have a PRECISION and a RECALL over a given data set: precision is the fraction of the things your classifier flagged that actually belong in the set, and recall is the fraction of the things in the set that it managed to flag. These are important performance measures for any kind of classifier, so we can say that one classifier is better or worse than another... (another consideration: how complicated is your classifier? you might get one that's too complicated and likely to OVERFIT, say if you tuned the heck out of it to match a particular dataset...)
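As a quick sketch of how you'd actually compute those two numbers (the labels below are made up):

```python
def precision_recall(gold, predicted):
    """Compute precision and recall for a binary classifier.

    gold, predicted: parallel lists of booleans (True = in the set).
    """
    true_pos   = sum(1 for g, p in zip(gold, predicted) if g and p)
    pred_pos   = sum(1 for p in predicted if p)
    actual_pos = sum(1 for g in gold if g)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall    = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall

gold      = [True, True, True, False, False]
predicted = [True, False, False, True, False]
print(precision_recall(gold, predicted))  # (0.5, 0.3333...)
```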

But probably most importantly, for something like a regular expression, you're making the claim that you know how to write the rules down yourself: that you already know exactly what you're looking for.

What are things that you might want to classify?

What are some kinds of classifiers?

    • decision trees
    • Naïve Bayes
    • logistic regression (aka maxent)
    • SVMs
    • memory-based learners / KNN ...
    • many, many more!
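For contrast with the hand-written rules above, here's a hedged sketch of training one of these learned classifiers, a Naïve Bayes text classifier; it assumes scikit-learn is available, and the four training examples are obviously made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training data: texts and their labels.
texts  = ["free money now", "win a free prize", "lunch at noon?", "meeting moved to 3pm"]
labels = ["spam", "spam", "ham", "ham"]

# Instead of writing rules by hand, let the model estimate them from data.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["free lunch prize"]))  # e.g. ['spam']
```

The point of the contrast: nobody wrote a "free means spam" rule here; the model picked it up from the (tiny) training data, and it could be wrong in ways a hand-written regex never would be.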

And now for some maths.

(talk about probability here)

What's a probability distribution?

Rules of Probability.

Conditional probability. Joint probability.

What's an event? An event is a subset of the possible outcomes.
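A quick numeric sketch of those ideas using a fair six-sided die (the only assumption is the standard 1/6-per-face distribution): the distribution assigns a probability to each outcome, an event is a subset of outcomes, and joint and conditional probabilities come from sums and ratios.

```python
from fractions import Fraction

# A probability distribution over the outcomes of a fair six-sided die:
# nonnegative numbers that sum to 1.
dist = {face: Fraction(1, 6) for face in range(1, 7)}
assert sum(dist.values()) == 1

def prob(event):
    """An event is a subset of outcomes; its probability is the sum."""
    return sum(dist[o] for o in event)

even       = {2, 4, 6}
at_least_4 = {4, 5, 6}

p_even        = prob(even)                  # P(even) = 1/2
p_joint       = prob(even & at_least_4)     # P(even AND >= 4) = 1/3
p_conditional = p_joint / prob(at_least_4)  # P(even | >= 4) = 2/3
print(p_even, p_joint, p_conditional)
```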

MONTY HALL PROBLEM if we feel like it.
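If we do, a simulation is a nice sanity check on the counterintuitive answer; this sketch assumes the standard setup (three doors, the host always opens a losing door you didn't pick).

```python
import random

def play(switch, doors=3):
    """One round of Monty Hall. Returns True if the player wins the car."""
    car = random.randrange(doors)
    pick = random.randrange(doors)
    # Host opens a door that is neither the player's pick nor the car.
    opened = random.choice([d for d in range(doors) if d != pick and d != car])
    if switch:
        # Switch to the remaining unopened door.
        pick = next(d for d in range(doors) if d != pick and d != opened)
    return pick == car

trials = 100_000
print("stay:  ", sum(play(switch=False) for _ in range(trials)) / trials)  # ~1/3
print("switch:", sum(play(switch=True) for _ in range(trials)) / trials)   # ~2/3
```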

Bayes Rule: P(A|B) = P(B|A) P(A) / P(B).
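A tiny worked example with made-up numbers: how likely is a message to be spam, given that it contains the word "free"?

```python
from fractions import Fraction

# Made-up numbers for a spam example.
p_spam      = Fraction(1, 5)   # P(spam): 20% of all messages are spam
p_free_spam = Fraction(1, 2)   # P("free" | spam)
p_free_ham  = Fraction(1, 20)  # P("free" | not spam)

# P("free") by the law of total probability.
p_free = p_free_spam * p_spam + p_free_ham * (1 - p_spam)

# Bayes Rule: P(spam | "free") = P("free" | spam) * P(spam) / P("free")
p_spam_given_free = p_free_spam * p_spam / p_free
print(p_spam_given_free)  # 5/7
```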

references and sources

Markus Dickinson's probability slides: http://jones.ling.indiana.edu/~mdickinson/12/645/slides/02-prob/02-prob.pdf

http://jones.ling.indiana.edu/~mdickinson/12/645/slides/02-prob/inclass1.pdf

Mike's notes on statistical NLP: http://www.cs.indiana.edu/classes/b651/Notes/stats.html