What do we mean by "morphology"? What kinds of linguistic phenomena count as morphology?
When we say "morphology", we're talking about the internal structure of words. Words aren't atomic things, although in many NLP applications we treat them as if they were! (For English, that can work pretty well a lot of the time. Not so for many other languages.)
When we talk about morphology, we often use the word "morpheme" (by analogy: lexeme, grapheme, phoneme... there are a bunch of linguistics terms like that). So a morpheme is the smallest meaningful chunk of a word. Roots are morphemes, and so are affixes (prefixes and suffixes). It can get more complex than that.
Let's find the parts of a word! Some words in English break down a lot, and most less so.
"conceptualism"
"cat"
"operationalize"
Inflectional morphology takes a stem word and adds grammatical properties to it without changing the basic meaning.
cat -> cats, cat's, cats'
So the "citation form" here is the singular. But you can make it possessive or plural or both.
What's the rule here for turning cat into cat +PLURAL ?
loss?
box? ox?
complex?
rice? sand? fish?
There are lots of things going on here. How did they happen, historically?
Irregular words tend to come in clumps: groups of irregulars that all act in the same way. People spend years and books studying this; there's a pretty good book on the topic, Words and Rules (by Steven Pinker), although it's not uncontroversial. alexr is probably not linguistically well-informed enough to have a good opinion about it, but the subject matter is pretty interesting. The claim there is that while you inflect most words by remembering a base form and applying a rule, for irregulars you've memorized the inflected forms themselves (rather than a special-purpose rule for each word). Sounds plausible.
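Here's a rough sketch of that "memorized forms plus a rule" idea for English plurals. The irregular table is a tiny hand-picked sample and the spelling rules are incomplete (an assumption for illustration, not a real pluralizer). It also knows nothing about mass nouns, so it will happily produce "rices".

```python
# Memorized exceptions: a small, hand-picked sample, not a complete list.
IRREGULAR_PLURALS = {
    "ox": "oxen",
    "fish": "fish",
    "child": "children",
    "mouse": "mice",
}

def pluralize(noun):
    """Look up a memorized plural first; otherwise apply the regular rule."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Sibilant endings take -es (loss -> losses, box -> boxes).
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Consonant + y becomes -ies (city -> cities), but day -> days.
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"

for word in ["cat", "loss", "box", "ox", "complex", "fish"]:
    print(word, "->", pluralize(word))
```

Note that "box" and "ox" look alike on the surface, but only one of them is in the memorized table; that's the whole point of the lookup coming first.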
German has a bunch of different ways to make nouns plural.
Some words ("Box") get -en, some get -e, some get a sound change and -e ...
... and some get only a sound change (umlaut): Äpfel from Apfel.
What are some other things you might want to stick into the grammar of a language, aside from plurality? What's pretty common in languages that we know about?
- gender? (how many genders?)
--> some languages have "gender systems" that include animacy
--> or maybe whether a thing is human or not
So for example in Spanish...
http://en.wikipedia.org/wiki/File:Gatos.png (awwww, cats with gender)
There's also grammatical case for nouns.
What about verbs inflecting? (this is the same, roughly, as "conjugating", like you might be familiar with from a language class)
walk, walks, walked, walking ...
swim, swam, swum
bring, brought
arise, arose
be -> am, was, were (to be is super irregular in English!)
But the point here is that there just aren't that many irregular verbs in English. Maybe a few hundred.
What do you know about the verb, from the form here? What is encoded?
In English, and languages that you know, what does the verb have to agree with?
In some languages, the verb also agrees with the object! Arabic, for example.
Let's also look at some more complex inflectional morphology going on. Mike Gasser gives us some good examples from Swahili.
http://www.indiana.edu/~gasser/Q450/fs.html#(3)
What do we mean by "tense"?
Tense usually comes bundled with two ideas that people are less familiar with: aspect and mood. Verbs can take different forms based on when something happened in time (tense), the degree to which it's finished (aspect), and also the probability of or ability to do something (mood/modality). In English, we use auxiliary or modal verbs to express most of these things, but ...
"He has jumped the shark." (this has "perfect" aspect, since it's completed, in the past)
http://en.wikipedia.org/wiki/Tense%E2%80%93aspect%E2%80%93mood
https://www.google.com/search?tbm=isch&q=verbing+weirds+language
Derivational morphology is when you change the meaning of the root (and often its part of speech) -- you want to say that the result is a different "stem".
Examples from Mike Gasser:
http://www.indiana.edu/~gasser/Q450/fs.html#(4)
When we're talking about morphology, languages fall into a few different clusters. Some languages have very simple morphology, and these are called analytic or isolating languages. We usually use these terms in a relative way; you might say that English is less inflectional (more analytic) than other Germanic or Romance languages.
It still has some inflection going on, as we saw earlier.
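Some languages are what's called agglutinative, and this means that we just stack morphemes together to produce new words; each morpheme keeps a recognizable shape and usually carries one piece of meaning.
Some languages are fusional, which means that there are significant changes when you put morphemes together: a single affix can mark several grammatical properties at once, and the boundaries between morphemes get hard to find.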
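To make the contrast concrete, here's a small sketch. The Swahili tables cover only a few subject, tense, and object markers (for human referents), and the Spanish table covers only four endings of regular -ar verbs. The function and table names are made up for illustration.

```python
# Agglutinative (Swahili): one slot per grammatical property, simply concatenated.
SWAHILI_SUBJECT = {"1sg": "ni", "2sg": "u", "3sg": "a"}
SWAHILI_TENSE = {"present": "na", "past": "li", "future": "ta"}
SWAHILI_OBJECT = {"1sg": "ni", "2sg": "ku", "3sg": "m"}

def swahili_verb(subject, tense, obj, stem):
    """Stack subject + tense + object markers in front of the verb stem."""
    return (SWAHILI_SUBJECT[subject] + SWAHILI_TENSE[tense]
            + SWAHILI_OBJECT[obj] + stem)

# Fusional (Spanish, regular -ar verbs): ONE ending encodes person, number,
# and tense all at once, so we have to index by the whole feature bundle.
SPANISH_AR_ENDINGS = {
    ("1sg", "present"): "o",
    ("3sg", "present"): "a",
    ("1sg", "preterite"): "é",
    ("3sg", "preterite"): "ó",
}

def spanish_verb(stem, subject, tense):
    """Attach the single fused ending for this (subject, tense) bundle."""
    return stem + SPANISH_AR_ENDINGS[(subject, tense)]

print(swahili_verb("1sg", "present", "2sg", "penda"))  # ninakupenda ("I love you")
print(spanish_verb("habl", "3sg", "preterite"))        # habló ("he/she spoke")
```

In the Swahili case you can point at the piece that means "present"; in the Spanish case there is no separate piece for "preterite" inside -ó.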
Which one do you think is harder for us to build systems to handle?
Mike's slides on morphology and FSTs...
http://www.cs.indiana.edu/classes/b651/Notes/morph.html
http://www.cs.indiana.edu/classes/b651/Notes/morph2.html
Remember FSTs?
Let's talk about FSTs for a minute.
What does an FST have?
- it has an input alphabet
- it has an output alphabet
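- it has some states, including a start state and one or more final states
- it has transitions between states, each labeled with an input symbol and an output symbol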
- ALSO THERE CAN BE WEIGHTS
Can we build an FST to analyze and generate English words with their grammatical markers?
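As a first stab at that question, here's a minimal unweighted FST sketch. Everything in it is an assumption for illustration: the `FST` class, the `+N`/`+SG`/`+PL` tag names, and the state names are invented here, and it only knows the regular -s plural (no "boxes", no "oxen"). Real toolkits do a lot more.

```python
EPS = ""  # epsilon: consume nothing (as input) or emit nothing (as output)

class FST:
    def __init__(self, start):
        self.start = start
        self.finals = set()
        self.arcs = {}  # state -> list of (in_sym, out_sym, next_state)

    def add_arc(self, src, in_sym, out_sym, dst):
        self.arcs.setdefault(src, []).append((in_sym, out_sym, dst))

    def transduce(self, symbols):
        """Return every output sequence the machine accepts for `symbols`."""
        results = []
        stack = [(self.start, 0, ())]  # (state, input position, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(symbols) and state in self.finals:
                results.append(out)
            for in_sym, out_sym, dst in self.arcs.get(state, ()):
                emitted = out + ((out_sym,) if out_sym != EPS else ())
                if in_sym == EPS:
                    stack.append((dst, pos, emitted))
                elif pos < len(symbols) and symbols[pos] == in_sym:
                    stack.append((dst, pos + 1, emitted))
        return results

    def invert(self):
        """Swap input and output labels: a generator becomes an analyzer."""
        inv = FST(self.start)
        inv.finals = set(self.finals)
        for src, arcs in self.arcs.items():
            for in_sym, out_sym, dst in arcs:
                inv.add_arc(src, out_sym, in_sym, dst)
        return inv

def build_noun_fst(stems):
    """Lexical side: stem letters, +N, then +SG or +PL. Surface side: the word."""
    fst = FST(start="q0")
    fst.finals.add("end")
    for stem in stems:
        state = "q0"
        for i, ch in enumerate(stem):
            nxt = f"{stem}:{i}"
            fst.add_arc(state, ch, ch, nxt)   # copy each stem letter through
            state = nxt
        fst.add_arc(state, "+N", EPS, "noun")  # the tag has no surface form
    fst.add_arc("noun", "+SG", EPS, "end")     # singular: nothing added
    fst.add_arc("noun", "+PL", "s", "end")     # plural: surface -s
    return fst

generator = build_noun_fst(["cat", "dog"])
analyzer = generator.invert()
print(generator.transduce(list("cat") + ["+N", "+PL"]))  # [('c', 'a', 't', 's')]
print(analyzer.transduce(list("cats")))  # [('c', 'a', 't', '+N', '+PL')]
```

The nice property is that you write the machine once and get both directions: running it forward generates, and inverting it analyzes.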
For some languages, it's not obvious which word you're looking at, given the text of the word...
... Buckwalter morphological analyzer (MA) for Arabic ...
But this isn't just a problem for languages where the surface text doesn't include the vowels. This is a general problem.
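You can see the ambiguity even in English with a toy analyzer. This sketch builds a surface-form index by generating every regular form from a hypothetical three-entry lexicon; the tag names are made up and irregular forms are ignored.

```python
from collections import defaultdict

# Hypothetical mini-lexicon: (lemma, part of speech).
LEXICON = [("walk", "N"), ("walk", "V"), ("cat", "N")]

def generate(lemma, pos):
    """Yield (surface, analysis) pairs for regular English forms only."""
    if pos == "N":
        yield lemma, f"{lemma}+N+SG"
        yield lemma + "s", f"{lemma}+N+PL"
    elif pos == "V":
        yield lemma, f"{lemma}+V+BASE"
        yield lemma + "s", f"{lemma}+V+3SG"
        yield lemma + "ed", f"{lemma}+V+PAST"

def build_analyzer(lexicon):
    """Index every generated surface form by the analyses that produce it."""
    index = defaultdict(list)
    for lemma, pos in lexicon:
        for surface, analysis in generate(lemma, pos):
            index[surface].append(analysis)
    return index

ANALYSES = build_analyzer(LEXICON)
print(ANALYSES["walks"])  # ['walk+N+PL', 'walk+V+3SG']
print(ANALYSES["cats"])   # ['cat+N+PL']
```

The analyzer can only list the candidates for "walks"; picking the right one needs context from the rest of the sentence.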
What other task that we did earlier in the class is similar to this problem? ...
Mike Gasser's slides on morphology: http://www.indiana.edu/~gasser/Q450/fs.html
... and his book: http://www.indiana.edu/~hlw/Inflection/morphemes.html
http://www.indiana.edu/~hlw/Inflection/nouns.html
http://www.indiana.edu/~hlw/Inflection/verbs.html
http://www.indiana.edu/~hlw/Derivation/derivation.html
http://en.wikipedia.org/wiki/Isolating_language
http://en.wikipedia.org/wiki/Morphology_(linguistics)
http://wiki.apertium.eu/index.php/Session_3:_Morphological_disambiguation