Let's take a sentence! Any old sentence.
Can we mark the "head" word of each of the phrases? Sure we can!
Now let's just draw an arrow from each head to each "governed" or "subordinate" word. OK.
If we get rid of the phrases, we've still got kind of a nice structure for the sentence, right?
Every word in the sentence has some head? That's cool.
Maybe this is nicer for some purposes?
What does this buy us?
- free word order is handled much more cleanly.
- Try writing CFG rules for a free-word-order language. It's hard!!
- do we really believe in constituents, anyway? Some linguists do.
Which constituents? ...
So this sort of thing is really old, dating back to traditional Latin grammar, and even earlier, to Pāṇini's ideas about Sanskrit syntax:
http://en.wikipedia.org/wiki/P%C4%81%E1%B9%87ini
This is really clean: we don't have to believe in constituents, we just have to agree that some words are describing other words.
What's hard to represent with dependency syntax?
(probably coordination: you have to make a choice about how to wire it together)
(also you have to make a choice, in doing this at all, about what kinds of dependency relations you have.)
- which way do the arrows point?
- is there a HEAD node?
(whatever, these are just notational differences)
- do we allow non-projective structures?
Wait, what's a non-projective structure?
"John saw a dog yesterday which was a Yorkshire Terrier."
What did John see yesterday? A dog, and the relative clause "which was a Yorkshire Terrier" modifies "dog". But "yesterday" depends on "saw" and sits between "dog" and its relative clause, so if you draw the arcs above the sentence, they cross. Crossing arcs: that's non-projectivity.
These things correspond nicely to the things that are hard to express with a CFG. That's kind of nice.
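To make "projective" concrete, here's a toy check for crossing arcs (my own sketch, not from any real parser):

def is_projective(heads):
    """heads: dict mapping each word position d (1..n) to its head position (0 = ROOT)."""
    spans = [(min(h, d), max(h, d)) for d, h in heads.items()]
    for (l1, r1) in spans:
        for (l2, r2) in spans:
            if l1 < l2 < r1 < r2:  # one arc starts inside the other and ends outside it
                return False
    return True

In the Yorkshire Terrier sentence, the arc from "dog" out to the relative clause crosses the arc from "saw" to "yesterday", so a check like this comes back False. So how do you parse with dependencies, anyway? A few options: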
- well, you could have a CKY-style parser
- Eisner 1996 --> this is pretty new! Jason Eisner is still working on parsing; he's a professor at JHU!
- MST-style parser. Score every possible head-dependent arc, then build a maximum spanning tree over all the words in the sentence; there's a toy sketch of this just after the MaltParser bullets below. (Ryan McDonald, at Google NYC)
- constraint-based: have a bunch of rules that describe what can link to what, under what circumstances...
- ... or transition-based parsing!
MaltParser is one such!
- greedy, deterministic parsing
- very high accuracy, super fast.
- discriminative classifiers (in fact, any classifier you happen to want)
- big win
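First, a toy sketch of that MST idea from the list above. This isn't McDonald's implementation (the real thing runs the Chu-Liu/Edmonds algorithm and learns the arc scores); here networkx does the tree-finding and score() is a made-up stand-in:

import networkx as nx

def mst_parse(n_words, score):
    """score(h, d): how good an arc from head h to dependent d looks; word 0 is ROOT."""
    G = nx.DiGraph()
    for h in range(n_words + 1):
        for d in range(1, n_words + 1):    # no arcs point INTO 0, so ROOT stays the root
            if h != d:
                G.add_edge(h, d, weight=score(h, d))
    tree = nx.maximum_spanning_arborescence(G)
    return set(tree.edges())               # the highest-scoring dependency tree

Now, back to MaltParser and transition-based parsing.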
Here's the machinery. A parser configuration has three parts:
- a stack (words we've started working on)
- a buffer (the input words still to be read)
- a set of arcs (the dependency arcs built so far)
On top of that you need a transition system: the moves that take you from one configuration to the next. There are several different possibilities for transition systems!
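As a minimal sketch (the names are mine, not MaltParser's):

from dataclasses import dataclass, field

@dataclass
class Configuration:
    stack: list                             # word indices; starts as [0], the artificial ROOT
    buffer: list                            # word indices still waiting to be read
    arcs: set = field(default_factory=set)  # (head, dependent) pairs built so far

Parsing starts with just ROOT on the stack and the whole sentence in the buffer, and it's done when the buffer is empty and only ROOT is left on the stack.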
Let's run through an example of parsing a sentence!!
This example is courtesy of Sandra Kübler, Ryan McDonald, and Joakim Nivre, from their lovely book Dependency Parsing.
It turns out that this sort of minimal one, with just three operations -- shift, left arc, right arc -- is sufficient to get any projective dependency parse. In fact, this transition system can *only* create projective dependency parses. But it can produce *any* of them. It's both sound and complete wrt projective dependency parses.
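Here's a minimal sketch of that three-operation system, in the common formulation where both arc operations look at the top two words on the stack (Kübler, McDonald & Nivre state it with arcs between stack and buffer instead, but it comes to the same thing). choose_transition stands in for the classifier, and I'm assuming it only ever proposes legal moves:

def arc_standard_parse(n_words, choose_transition):
    """Word 0 is the artificial ROOT; the real words are 1..n_words."""
    stack, buffer, arcs = [0], list(range(1, n_words + 1)), set()
    while buffer or len(stack) > 1:
        action = choose_transition(stack, buffer, arcs)
        if action == "SHIFT":                  # move the next input word onto the stack
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":             # stack top governs the word beneath it
            arcs.add((stack[-1], stack[-2]))   # (head, dependent); beneath-word can't be ROOT
            del stack[-2]                      # the dependent is finished, so it goes away
        else:                                  # RIGHT-ARC: the word beneath governs the top
            arcs.add((stack[-2], stack[-1]))
            stack.pop()                        # again, the dependent goes away
    return arcs

Notice that both arc operations throw the dependent away, so a word has to finish collecting its own dependents before it can be attached to its head. Hold that thought.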
That doesn't mean that there aren't better transition systems. One that works rather better is what's called the arc-eager transition system, which has four operations instead of three, and whose right-arc operation doesn't get rid of the dependent. How can this be better if the basic one is sound and complete with respect to projective dependency parses? Because "sound and complete" only says every projective parse is reachable; it says nothing about how easy the decisions along the way are to make. Arc-standard has to delay attaching a right dependent until that dependent has found all of its own dependents; arc-eager can add an arc as soon as the evidence for it shows up, which makes the classifier's decisions easier.
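A matching sketch of arc-eager, same caveats as before (preconditions simplified; choose_transition is assumed to pick only legal moves):

def arc_eager_parse(n_words, choose_transition):
    stack, buffer, arcs = [0], list(range(1, n_words + 1)), set()
    while buffer:                              # simplified stopping condition
        action = choose_transition(stack, buffer, arcs)
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":             # buffer front governs stack top
            arcs.add((buffer[0], stack[-1]))   # (legal only if the top has no head yet)
            stack.pop()
        elif action == "RIGHT-ARC":            # stack top governs buffer front, but the
            arcs.add((stack[-1], buffer[0]))   # dependent sticks around for more dependents
            stack.append(buffer.pop(0))
        else:                                  # REDUCE: pop a word that already has its head
            stack.pop()
    return arcs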
So how do we learn to make those transition decisions? This just amounts to training the classifier. But you have to extract all the "instances" from the treebank.
Instances are just all of the configurations that we were in, coupled with the correct transition decision to make in that configuration. If you always make the correct transition decision, then you're going to get the right parses!
This leaves us with two problems: given a gold-standard tree, how do we know which transition was the correct one in each configuration? And what features do we use to describe a configuration?
For the first one, there's an algorithm for that! (It's usually called an "oracle".)
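A minimal sketch of such an oracle for the arc-standard sketch above: take LEFT-ARC when it matches the gold tree, take RIGHT-ARC only once the top word has already collected all of its gold dependents, otherwise SHIFT.

def oracle(stack, buffer, arcs, gold_heads):
    """gold_heads: dict mapping each word to its gold head (0 = ROOT)."""
    if len(stack) >= 2:
        top, below = stack[-1], stack[-2]
        if below != 0 and gold_heads[below] == top:
            return "LEFT-ARC"
        if gold_heads[top] == below and all(
                (top, d) in arcs for d, h in gold_heads.items() if h == top):
            return "RIGHT-ARC"                 # safe: top won't need any more dependents
    return "SHIFT"

Plug it in as choose_transition (say, lambda s, b, a: oracle(s, b, a, gold_heads)), record the configuration and the chosen action at every step, and those are your training instances.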
Let's do this live: take a dependency parse for "I ate the bagels" and turn it into a series of configurations and transitions.
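Here's what the sketches above produce (word indices I=1, ate=2, the=3, bagels=4; gold heads: ate -> I, ROOT -> ate, bagels -> the, ate -> bagels):

stack                      buffer                   transition
[ROOT]                     [I, ate, the, bagels]    SHIFT
[ROOT, I]                  [ate, the, bagels]       SHIFT
[ROOT, I, ate]             [the, bagels]            LEFT-ARC  (ate -> I)
[ROOT, ate]                [the, bagels]            SHIFT
[ROOT, ate, the]           [bagels]                 SHIFT
[ROOT, ate, the, bagels]   []                       LEFT-ARC  (bagels -> the)
[ROOT, ate, bagels]        []                       RIGHT-ARC (ate -> bagels)
[ROOT, ate]                []                       RIGHT-ARC (ROOT -> ate)
[ROOT]                     []                       (done)

Eight transitions, four arcs, and the arcs are exactly the gold parse.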
And the features? Well, that's something you'll have to decide, but common ones include...
- the word on the top of the stack?
- the word at the front of the buffer?
- the word after that in the buffer?
- the part of speech of the word {on the top of the stack, at the front of the buffer}?
- dependency relationships already going into or out of either of those words...
To get these, what do you have to know?
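A rough sketch of feature extraction (the feature names here are made up; MaltParser has its own feature specification language). The signature answers the question: you need the words, their part-of-speech tags, and the arcs built so far, so you'll want a POS tagger to have run before you parse.

def extract_features(stack, buffer, arcs, words, tags):
    """words and tags are lists indexed by position; entry 0 is a dummy for ROOT."""
    feats = []
    if stack:
        s0 = stack[-1]
        feats += ["s0.word=" + words[s0], "s0.tag=" + tags[s0]]
        feats.append("s0.ndeps=%d" % sum(1 for h, d in arcs if h == s0))
    if buffer:
        b0 = buffer[0]
        feats += ["b0.word=" + words[b0], "b0.tag=" + tags[b0]]
    if len(buffer) > 1:
        feats.append("b1.tag=" + tags[buffer[1]])
    return feats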
Get this file: pierre.conll
Get a pre-trained parsing model here: http://www.maltparser.org/mco/english_parser/engmalt.html (also has instructions for how to use it)
Get MaltParser here: http://www.maltparser.org/download.html
Now parse pierre.conll!
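The invocation looks roughly like this (exact jar and model file names depend on the versions you downloaded, so check the pages above; note that -c takes the model name without its .mco extension):

java -jar maltparser-1.9.2.jar -c engmalt.linear-1.7 -i pierre.conll -o pierre-parsed.conll -m parse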
Do you like the parse that came out?
Jurafsky and Martin section 12.7
https://d19vezwu8eufl6.cloudfront.net/nlp/slides%2FParsing-09-LPCFGs-Intro.pdf
https://d19vezwu8eufl6.cloudfront.net/nlp/slides%2FParsing-10-LPCFGs-Charniak-Model.pdf
https://d19vezwu8eufl6.cloudfront.net/nlp/slides%2FParsing-12-PCFGs-Unlexicalized-Return.pdf
https://d19vezwu8eufl6.cloudfront.net/nlp/slides%2FParsing-14-Dependency-Intro.pdf
https://d19vezwu8eufl6.cloudfront.net/nlp/slides%2FParsing-15-Dependency-Transition-Based.pdf