cfgs-and-pcfgs

Readings: Chapters 12, 13 and 14, J&M.

http://xkcd.com/1090/

So what *is* a CFG?

What makes it context-free?

... discussion of what makes a CFG ...

In-class exercise where we built a CKY parser from scratch and parsed sentences like "people fish tanks" and "people fish tanks with rods" ...
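
Here's a minimal sketch of the kind of CKY recognizer we built, in Python, assuming a tiny hand-written grammar in Chomsky Normal Form (the grammar below is made up for illustration, not the exact one from class):

    from collections import defaultdict

    # Toy CNF grammar (hypothetical): binary rules LHS -> (B, C), lexical rules LHS -> word.
    binary_rules = [
        ("S",  ("NP", "VP")),
        ("VP", ("V",  "NP")),
        ("VP", ("VP", "PP")),
        ("NP", ("NP", "PP")),
        ("PP", ("P",  "NP")),
    ]
    lexical_rules = [
        ("NP", "people"), ("NP", "fish"), ("NP", "tanks"), ("NP", "rods"),
        ("V",  "fish"),
        ("P",  "with"),
    ]

    def cky_recognize(words):
        """Fill the CKY chart bottom-up; return the labels that span the whole sentence."""
        n = len(words)
        chart = defaultdict(set)   # chart[(i, j)] = nonterminals deriving words[i:j]
        for i, w in enumerate(words):
            for lhs, word in lexical_rules:
                if word == w:
                    chart[(i, i + 1)].add(lhs)
        for span in range(2, n + 1):               # width of the span
            for i in range(0, n - span + 1):       # left edge
                j = i + span                       # right edge
                for k in range(i + 1, j):          # split point
                    for lhs, (b, c) in binary_rules:
                        if b in chart[(i, k)] and c in chart[(k, j)]:
                            chart[(i, j)].add(lhs)
        return chart[(0, n)]

    print(cky_recognize("people fish tanks".split()))            # contains 'S'
    print(cky_recognize("people fish tanks with rods".split()))  # contains 'S'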

What good are formal grammars? Or parsers? Why do we want these?

Any time we want to get a model of "who did what to whom...", they're pretty useful.

question answering, textual entailment, building some more abstract semantic representation...

"What goes with what?"

As a side note, what are the ways that a language tells you "what goes with what"?

- Well, you could use order.

- The other common thing a language can do is use morphology! You can know which words go with which other words, because they agree!

--> then the sentence tends to be rearrangeable: "free word order".

- similarly, there are languages that mark "case" on nouns. Different languages have different case systems, of course.

In German...

Die Katze jagt den Hasen. ("The cat hunts the hare" -- "den Hasen" is marked accusative, so it has to be the object.)

Hey, who does the cat hunt again?

Die Maus jagt die Katze. (The object is fronted: "The mouse, the cat hunts.")

NB: this behavior is called "topicalization" or "fronting". Can you do it in English?

"I always liked Bob. But Bill, I never liked."

"Tomorrow, I'll go to the store."

... can you say those things?

The central idea here is that there are quite a lot of sentences that we might have to understand, and we need some compact representation for all the possibilities, so that we can understand novel sentences.

use in translation:

--> SCFGs for simultaneous parsing/generation!

--> RBMT (rule-based machine translation): parse a sentence first, use some rules to transform input to output? Use transducers, maybe?

preprocessing step for many other tasks: use the tree to extract features!

Word-sense disambiguation?

Maybe certain shapes of trees are informative for some text classification tasks? ...

How about anaphor/coreference resolution?

Grammars aren't just for parsers, though! We can also use them to generate!

They encode our ideas about grammaticality and obligatoriness.

"He took [a rake] out of [the closet]".

"He ate [a big bowl] of [pudding]."

(one of these noun phrases doesn't have an article!!)

So where do you get CFGs?

- You could build them by hand! ... well... hrm. Could you?

Let's look at a real sentence!

http://www.cs.indiana.edu/classes/b651/Notes/syntax1.html (slide 2, first bullet)

What's the structure of that sentence?

It's basically the same as:

They have done so very reluctantly and after courts questioned the legality of a system that has impoverished tens of millions...

So what do we have in there? We've got a main clause (MATRIX CLAUSE for linguists), and that has a subordinate clause that's sort of inside an adverbial phrase...

- You could extract them from a treebank! Where do you get a treebank from?!

Well, there are treebanks that you can get, for many languages...

Let's see how many rules you can get for all the sentences that are under 15 words, from Sophie's World in the SMULTRON corpus...
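
A sketch of what the rule extraction could look like in Python with NLTK, assuming you have each treebank sentence as a Penn-style bracketed string (the actual SMULTRON loading is glossed over, and these two trees are just stand-ins):

    from collections import Counter
    from nltk import Tree

    # Stand-in bracketed trees; in practice, read these from the corpus files.
    bracketed = [
        "(S (NP (PRP He)) (VP (VBD took) (NP (DT a) (NN rake))))",
        "(S (NP (PRP He)) (VP (VBD ate) (NP (NN pudding))))",
    ]

    rule_counts = Counter()
    for s in bracketed:
        tree = Tree.fromstring(s)
        for production in tree.productions():   # one CFG rule per local tree
            rule_counts[production] += 1

    print(len(rule_counts), "distinct rules")
    for rule, count in rule_counts.most_common():
        print(count, rule)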

Could we parse with our CKY parser with this grammar?

Problems and solutions at this point, recap:

- too hard to write a broad-coverage grammar by hand

--> so we'll generate a grammar from a treebank, OK.

- We saw how many different interpretations there were, just for "people fish tanks with rods" with that tiny grammar. How are we going to know which one of those interpretations is the right one?

--> We need some scoring mechanism! We'll take the interpretation with the highest score!

- Again, we saw how many different interpretations there were: there's a combinatorial explosion of possible constituent chunks for us to use -- isn't this going to get slow, if we have to loop through all of them?

--> We need some scoring mechanism! We'll try the ones with high scores first!!

So clearly some parses have to score better than others.

Let's use PCFGs!

What's a PCFG?

It's just like a CFG, except that the rules have probabilities associated with them.

It's the probability of the right-hand side, given the left-hand side of the rule. Thus the sum of the probabilities for the rules with a given LHS must be 1: we think about this from the generation perspective.
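
In standard notation (not anything specific to the slides), for every nonterminal A:

    P(A \to \beta) = P(\beta \mid A), \qquad \sum_{\beta} P(A \to \beta) = 1

So if, say, the grammar had NP -> DT NN (0.5), NP -> PRP (0.3), and NP -> NP PP (0.2) -- made-up numbers -- those three have to sum to 1, because they exhaust the ways an NP can be rewritten.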

Last time, we introduced PCFGs.

Let's remind ourselves, what's a PCFG?

Oh, how convenient! Here's one! (thank you, Chris Manning!)

What are some questions you can ask with a PCFG?

- You might want to parse a sentence!

(how can you do that?)

- You might want to get the probability of a tree.

(why?)

The probability of a given analysis of a sentence is the product of the probabilities of all of the rules used in the analysis (this assumes statistical independence of all of the rules).

- You might want to get the probability of a string!

(why? how? ...)
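
Writing both down (standard notation): a tree's probability is the product of its rule probabilities, and a string's probability sums over all the trees that yield it:

    P(T) = \prod_{r \in T} P(r), \qquad P(w_1 \ldots w_m) = \sum_{T:\, \mathrm{yield}(T) = w_1 \ldots w_m} P(T)

The inside algorithm computes that sum without enumerating every tree, and replacing the sum with a max gives you the single best parse (probabilistic CKY).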

Could you imagine wanting to use a PCFG instead of an n-gram language model? When might that be a good idea?

- predictive text?

- spelling correction?

- translation?

- speech recognition? ...

- You might want to generate a sentence!

why? how?
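
One way to do the "how": sample a rule at each step, top-down. A sketch in Python, with a made-up toy PCFG stored as a dict from each LHS to its (RHS, probability) pairs:

    import random

    # Toy PCFG (made-up numbers); probabilities for each LHS sum to 1.
    pcfg = {
        "S":  [(("NP", "VP"), 1.0)],
        "NP": [(("people",), 0.5), (("tanks",), 0.3), (("NP", "PP"), 0.2)],
        "VP": [(("fish", "NP"), 0.7), (("VP", "PP"), 0.3)],
        "PP": [(("with", "NP"), 1.0)],
    }

    def generate(symbol="S"):
        """Expand `symbol` top-down, sampling one rule at each step."""
        if symbol not in pcfg:              # a terminal: just emit the word
            return [symbol]
        rhss, probs = zip(*pcfg[symbol])
        rhs = random.choices(rhss, weights=probs, k=1)[0]
        words = []
        for child in rhs:
            words.extend(generate(child))
        return words

    print(" ".join(generate()))   # e.g. "people fish tanks with tanks"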

What are some operations you might want to do with a PCFG?

- You might want to extract one from a treebank.

How do we do that again?

- How do we get the rules?

- How do we set the weights?

Well, you could just count and divide...
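
"Count and divide" is just maximum likelihood estimation: the probability of a rule is how often you saw it, divided by how often you saw its left-hand side. A sketch continuing the NLTK extraction from before (same stand-in trees):

    from collections import Counter
    from nltk import Tree

    bracketed = [
        "(S (NP (PRP He)) (VP (VBD took) (NP (DT a) (NN rake))))",
        "(S (NP (PRP He)) (VP (VBD ate) (NP (NN pudding))))",
    ]

    rule_counts = Counter()
    lhs_counts = Counter()
    for s in bracketed:
        for production in Tree.fromstring(s).productions():
            rule_counts[production] += 1
            lhs_counts[production.lhs()] += 1

    # P(LHS -> RHS) = count(LHS -> RHS) / count(LHS)
    rule_probs = {rule: count / lhs_counts[rule.lhs()]
                  for rule, count in rule_counts.items()}

    for rule, p in sorted(rule_probs.items(), key=lambda x: str(x[0])):
        print(f"{p:.2f}  {rule}")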

- What if you had some more text, and you wanted to tune your probabilities to match the text better?

You could use the Inside-Outside algorithm! (we won't really cover this, but you should get the intuition that it's possible)

http://en.wikipedia.org/wiki/Inside-outside_algorithm

The thing here is, you really want to use a good model to label your unlabeled data. But you don't have one yet. OK, well, we'll use a bad model and see if we can use that to get a good model...

In general, this is what's called an Expectation-Maximization algorithm. We can use EM in situations where we want to iteratively improve our model by:

- using the existing model to label some more data (EXPECTATION STEP)

- then using the labels to learn another model! (MAXIMIZATION STEP)

There's a whole world of algorithms and probabilistic models for this stuff. Work on this has been going on for a long time.

Can we think of any shortcomings with PCFGs?

There's a really good example in J&M, about the probability of an NP going to a pronoun.

In English, where are pronouns likely to show up? In subject position.

What could you do with this information?

It's almost like there are two different kinds of noun phrases in English... (in fact, there are probably more than that!)

You could do what's called parent annotation, where instead of learning probabilities for NP, you learn probabilities for NP^S and NP^VP.

Can we think of how you would do that? Can you think of possible problems with it?
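
Here's one way you might do it, as a tree transform sketched with NLTK-style trees (NLTK has tree-transform utilities of its own, but the idea is just relabeling each node with its parent's category):

    from nltk import Tree

    def parent_annotate(tree, parent_label=None):
        """Return a copy of `tree` with each node relabeled as LABEL^PARENT."""
        if isinstance(tree, str):            # a word: leave it alone
            return tree
        new_label = tree.label()
        if parent_label is not None:
            new_label = f"{tree.label()}^{parent_label}"
        children = [parent_annotate(child, tree.label()) for child in tree]
        return Tree(new_label, children)

    t = Tree.fromstring("(S (NP (PRP He)) (VP (VBD ate) (NP (NN pudding))))")
    print(parent_annotate(t))
    # (S (NP^S (PRP^NP He)) (VP^S (VBD^VP ate) (NP^VP (NN^NP pudding))))

Note that this version also annotates the POS tags with their phrasal parents, which is the "similar idea" mentioned next; you could restrict it to phrasal categories if you only want NP^S vs. NP^VP.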

You can apply a similar idea to parts of speech: maybe different POS tags behave differently if we're inside a certain kind of phrase.

evaluating constituency parsers

Precision: for every bracket that your parser predicts, is it in the gold standard?

Recall: for every bracket that's in the gold standard, did your parser predict it?
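
A sketch of labeled bracket precision and recall, assuming each parse is a simple (label, children) tuple tree and a bracket is a (label, start, end) span (real PARSEVAL scoring also ignores POS-level brackets and does a few other normalizations):

    def brackets(tree, start=0):
        """Return ([(label, start, end), ...], end) for all constituents in `tree`."""
        label, children = tree
        end = start
        spans = []
        for child in children:
            if isinstance(child, str):       # a word
                end += 1
            else:
                child_spans, end = brackets(child, end)
                spans.extend(child_spans)
        spans.append((label, start, end))
        return spans, end

    def precision_recall(gold_tree, predicted_tree):
        gold = set(brackets(gold_tree)[0])
        predicted = set(brackets(predicted_tree)[0])
        correct = len(gold & predicted)
        return correct / len(predicted), correct / len(gold)

    # Hypothetical gold and predicted parses of "people fish tanks with rods":
    gold = ("S", [("NP", ["people"]),
                  ("VP", [("V", ["fish"]),
                          ("NP", [("NP", ["tanks"]),
                                  ("PP", [("P", ["with"]), ("NP", ["rods"])])])])])
    predicted = ("S", [("NP", ["people"]),
                       ("VP", [("VP", [("V", ["fish"]), ("NP", ["tanks"])]),
                               ("PP", [("P", ["with"]), ("NP", ["rods"])])])])

    p, r = precision_recall(gold, predicted)
    print(f"precision={p:.2f} recall={r:.2f}")   # both 8/9 here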

... what would it mean to have "accuracy" for a parser? ...

You should, as a person working in or near machine learning, get a good sense for what precision and recall mean.

interesting extensions to PCFGs: synchronous CFGs

Popularized by David Chiang! See his tutorial below.

Let's build a little synchronous CFG that can translate "Él escribe una carta" into English!
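
("Él escribe una carta" is "He writes a letter".) Here's a sketch of what a toy SCFG could look like in Python: each rule pairs a Spanish right-hand side with an English one, and "NP:1" on both sides means those two nonterminals are linked and get rewritten together. The grammar and code are made up for illustration; a real system would use a proper synchronous parser.

    # Toy synchronous CFG (hypothetical).
    SCFG = {
        "S":  [(["NP:1", "VP:2"], ["NP:1", "VP:2"])],
        "VP": [(["V:1", "NP:2"], ["V:1", "NP:2"])],
        "NP": [(["Él"], ["He"]), (["DT:1", "N:2"], ["DT:1", "N:2"])],
        "DT": [(["una"], ["a"])],
        "N":  [(["carta"], ["letter"])],
        "V":  [(["escribe"], ["writes"])],
    }

    def translate(symbol, words):
        """Derive a prefix of `words` from `symbol` on the Spanish side;
        return (english_words, remaining_words), or None if no rule applies."""
        symbol = symbol.split(":")[0]
        for src_rhs, tgt_rhs in SCFG.get(symbol, []):
            rest, linked, ok = list(words), {}, True
            for item in src_rhs:
                if item.split(":")[0] in SCFG:          # a linked nonterminal
                    result = translate(item, rest)
                    if result is None:
                        ok = False
                        break
                    linked[item.split(":")[1]], rest = result
                elif rest and rest[0] == item:          # a Spanish terminal
                    rest = rest[1:]
                else:
                    ok = False
                    break
            if ok:
                # read off the English side, substituting the linked nonterminals
                out = []
                for item in tgt_rhs:
                    if item.split(":")[0] in SCFG:
                        out.extend(linked[item.split(":")[1]])
                    else:
                        out.append(item)
                return out, rest
        return None

    english, leftover = translate("S", "Él escribe una carta".split())
    print(" ".join(english))   # He writes a letter

The interesting SCFG rules are the ones whose two sides reorder the linked nonterminals, e.g. something like NP -> <DT:1 N:2 ADJ:3, DT:1 ADJ:3 N:2> for "una carta larga" / "a long letter".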

references

http://www.cs.indiana.edu/classes/b651/Notes/syntax1.html

http://www.cs.indiana.edu/classes/b651/Notes/syntax2.html

http://spark-public.s3.amazonaws.com/nlp/slides/Parsing-Probabilistic.pdf

http://www.isi.edu/~chiang/papers/synchtut.pdf (pretty advanced stuff)

http://www.isi.edu/natural-language/people/bayes-with-tears.pdf (pretty advanced stuff, alexr needs to understand this soon)