Ideas for projects

Pretty much anything from the NLTK list of projects would be OK!

Try some Recognizing Textual Entailment (RTE): given two sentences, does the first one entail the second? Find a data set for this, and compare your system's answers with what a human would say.
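A possible starting point for the entailment idea is a word-overlap baseline: guess "entails" when most of the hypothesis words also appear in the text. This is only a rough sketch; the function names and the 0.8 threshold are assumptions made up for illustration, and a real project would tune the threshold on a labeled data set.

```python
def overlap_score(text, hypothesis):
    """Fraction of distinct hypothesis words that also occur in the text."""
    text_words = set(text.lower().split())
    hyp_words = set(hypothesis.lower().split())
    if not hyp_words:
        return 0.0
    return len(hyp_words & text_words) / len(hyp_words)


def predict_entailment(text, hypothesis, threshold=0.8):
    """Baseline guess: True if enough of the hypothesis is covered by the text.

    The threshold is an arbitrary assumption, not a tuned value.
    """
    return overlap_score(text, hypothesis) >= threshold


text = "a man is playing a guitar on stage"
covered = predict_entailment(text, "a man is playing a guitar")
not_covered = predict_entailment(text, "a woman is singing")
```

A baseline like this ignores word order and negation ("a man is not playing a guitar" still overlaps heavily), which is exactly what makes it a useful thing to beat.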

Build a tiny machine translation system: IBM Model 1? Phrase-based? Learn translation tables, language models. Build a decoder!
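To give a feel for the "learn translation tables" part, here is a minimal sketch of IBM Model 1 training with EM on a three-sentence toy corpus. It is a simplification under stated assumptions: there is no NULL word, no language model and no decoder, and the corpus and the name `train_ibm1` are invented for illustration.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate t[(f, e)], an approximation of P(f | e), with EM.

    pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    Simplified sketch: no NULL alignment word is used.
    """
    foreign_vocab = {f for fs, _ in pairs for f in fs}
    # Start from a uniform translation table.
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        # E-step: distribute each foreign word over the English words.
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts per English word.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


toy_corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
table = train_ibm1(toy_corpus)
```

After a few iterations `table[("haus", "house")]` should be larger than `table[("das", "house")]`, even though both pairs co-occur exactly once: "das" is explained better by "the", which it co-occurs with twice.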

Learn about speech recognition. Cepstral features. Gaussian mixture models, etc. Also needs a decoder and a language model, probably! Probably easier: set up a free speech recognition system.

Learn about and set up a speech synthesis system.

Spam detection, flame detection, advertising detection... Could you detect political positions?
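Most of these detection tasks can be framed as text classification. As one possible baseline (not the only reasonable choice), here is a sketch of a multinomial Naive Bayes classifier with add-one smoothing in plain Python. The training examples and function names are made up for illustration; a real project would need a proper labeled corpus.

```python
import math
from collections import Counter


def train_nb(docs):
    """docs: list of (label, text). Returns per-label word counts and label counts."""
    word_counts = {}
    label_counts = Counter()
    for label, text in docs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability under Naive Bayes."""
    vocab = {w for counts in word_counts.values() for w in counts}
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / n_docs)  # log prior
        denom = sum(counts.values()) + len(vocab)       # add-one smoothing
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training data, purely for illustration.
toy_docs = [
    ("spam", "win free money now"),
    ("spam", "free prize click now"),
    ("ham", "lunch meeting at noon"),
    ("ham", "see you at the meeting"),
]
wc, lc = train_nb(toy_docs)
guess_spam = classify("free money", wc, lc)
guess_ham = classify("meeting at noon", wc, lc)
```

The same code works for flames, ads or political positions if you swap the labels; whether bag-of-words features are enough for those is part of what the project would find out.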

text generation problems:
- summarize some text! How about summarizing short stories? http://www.mitpressjournals.org/doi/pdfplus/10.1162/coli.2010.36.1.36102

Build a part-of-speech tagger or parser for some interesting language. Where will you get appropriate data?

Poetry Bot! (at least two groups have been talking about this already)
- what kinds of interesting constraints could you enforce, during text generation?
- rhymes? internal rhymes? alliteration/assonance?
- staying on topic? what kind of semantic information could you use?
- meter? rhyme structure?
- enumerate all possible sonnets in the English language, in alphabetical order?
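One way to enforce constraints like rhyme and meter is generate-and-filter: propose candidate words, then keep only the ones that pass a check. The sketch below uses a tiny hand-written pronunciation table in the style of the CMU Pronouncing Dictionary (phonemes with stress digits on the vowels). The table `TOY_PRON` and all function names are assumptions for illustration; a real bot would load a full pronouncing dictionary instead.

```python
# Toy pronunciation table (hypothetical, hand-written for this example).
# Vowels carry a stress digit: 1 = primary stress, 0 = unstressed.
TOY_PRON = {
    "night": ["N", "AY1", "T"],
    "light": ["L", "AY1", "T"],
    "delight": ["D", "IH0", "L", "AY1", "T"],
    "moon": ["M", "UW1", "N"],
    "june": ["JH", "UW1", "N"],
    "table": ["T", "EY1", "B", "AH0", "L"],
}


def rhyme_part(word):
    """Phonemes from the last primary-stressed vowel to the end of the word."""
    phones = TOY_PRON[word.lower()]
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            return tuple(phones[i:])
    return tuple(phones)


def rhymes(a, b):
    """Two different words rhyme if their rhyme parts match."""
    return a.lower() != b.lower() and rhyme_part(a) == rhyme_part(b)


def syllables(word):
    """Count syllables as the number of phonemes carrying a stress digit."""
    return sum(p[-1].isdigit() for p in TOY_PRON[word.lower()])


def filter_rhyming(candidates, target):
    """Keep only the candidates that are in the table and rhyme with target."""
    return [w for w in candidates if w.lower() in TOY_PRON and rhymes(w, target)]


line_endings = filter_rhyming(["light", "moon", "delight", "zebra"], "night")
```

The syllable counts and stress digits are also what you would need for meter checks, and comparing first phonemes instead of last ones gives a crude alliteration test.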

A chat bot with some kind of interesting internal mental model of the conversation it's having, with variable emotional states?