Latent Dirichlet allocation



Say you have K topics.
Go through each document and assign every word in it a topic at random.
This gives a random initial representation of every document (as a mixture of topics) and of every topic (as a distribution over words).
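To make the random starting point concrete, here is a minimal Python sketch. The function names (`random_init`, `doc_topic_proportions`, `topic_word_proportions`) and the toy documents are made up for illustration, and it assumes every document is a non-empty list of word strings.

```python
import random
from collections import Counter

def random_init(docs, K, seed=0):
    """Assign every word in every document a topic drawn uniformly at random."""
    rng = random.Random(seed)
    return [[rng.randrange(K) for _ in doc] for doc in docs]

def doc_topic_proportions(z_d, K):
    """Proportion of the words in one document assigned to each topic."""
    counts = Counter(z_d)
    return [counts[t] / len(z_d) for t in range(K)]

def topic_word_proportions(docs, z, K):
    """For each topic, the proportion of its assignments that come from each word."""
    counts = [Counter() for _ in range(K)]
    for doc, z_d in zip(docs, z):
        for w, t in zip(doc, z_d):
            counts[t][w] += 1
    result = []
    for c in counts:
        total = sum(c.values())
        # A topic that received no words yields an empty distribution.
        result.append({w: n / total for w, n in c.items()})
    return result

# Toy data, purely illustrative.
docs = [["cat", "dog", "cat"], ["stock", "bond", "stock", "dog"]]
K = 2
z = random_init(docs, K)                                 # z[d][i] = topic of word i in document d
theta = [doc_topic_proportions(z_d, K) for z_d in z]     # one topic mixture per document
phi = topic_word_proportions(docs, z, K)                 # one word distribution per topic
```

At this stage `theta` and `phi` carry no real information, since they come from purely random assignments; they are only a starting point.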
For each document d:
    For each word w in d:
        For each topic t, compute two things:
            1) p(topic t | document d) = the proportion of words in document d that are currently assigned to topic t,
            2) p(word w | topic t) = the proportion of all assignments to topic t, across all documents, that come from word w.
        Reassign w a new topic, choosing topic t with probability proportional to p(topic t | document d) * p(word w | topic t). According to our generative model, this is essentially the probability that topic t generated word w, so it makes sense to resample the current word’s topic with this probability.
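The loop above can be sketched in Python roughly as follows. This is a sketch under stated assumptions, not a reference implementation: `gibbs_sweep` is a hypothetical name, and the `smooth` constant is an addition not in the description above. It keeps every weight strictly positive, playing the role of the prior pseudo-counts used in standard LDA samplers. The current word’s own assignment is removed from the counts before the two proportions are computed, since all the other assignments are the ones being treated as given.

```python
import random

def gibbs_sweep(docs, z, K, rng, smooth=1e-2):
    """One pass over every word, resampling its topic in place.

    docs:   list of documents, each a list of word strings.
    z:      z[d][i] is the topic currently assigned to word i of document d.
    smooth: small positive constant (an assumption, not part of the description)
            so that no topic ever gets a weight of exactly zero.
    """
    vocab_size = len({w for doc in docs for w in doc})

    # Build count tables from the current assignments.
    doc_topic = [[0] * K for _ in docs]       # words in document d assigned to topic t
    topic_word = [dict() for _ in range(K)]   # times word w is assigned to topic t
    topic_total = [0] * K                     # total assignments to topic t
    for d, doc in enumerate(docs):
        for w, t in zip(doc, z[d]):
            doc_topic[d][t] += 1
            topic_word[t][w] = topic_word[t].get(w, 0) + 1
            topic_total[t] += 1

    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            # Take the current word's own assignment out of the counts.
            old = z[d][i]
            doc_topic[d][old] -= 1
            topic_word[old][w] -= 1
            topic_total[old] -= 1

            weights = []
            for t in range(K):
                # 1) p(topic t | document d), smoothed
                p_t_given_d = (doc_topic[d][t] + smooth) / (len(doc) - 1 + K * smooth)
                # 2) p(word w | topic t), smoothed
                p_w_given_t = (topic_word[t].get(w, 0) + smooth) / (
                    topic_total[t] + vocab_size * smooth
                )
                weights.append(p_t_given_d * p_w_given_t)

            # Draw the new topic with probability proportional to the weights.
            new = rng.choices(range(K), weights=weights)[0]
            z[d][i] = new
            doc_topic[d][new] += 1
            topic_word[new][w] = topic_word[new].get(w, 0) + 1
            topic_total[new] += 1
    return z

# Toy usage: random start, then repeated sweeps (data is illustrative only).
docs = [["cat", "dog", "cat", "pet"],
        ["stock", "bond", "stock", "market"],
        ["dog", "pet", "cat"]]
K = 2
rng = random.Random(0)
z = [[rng.randrange(K) for _ in doc] for doc in docs]
for _ in range(50):
    gibbs_sweep(docs, z, K, rng)
```

Rebuilding the count tables at the start of every sweep is wasteful but keeps the sketch short; a longer-lived sampler would keep them between sweeps.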



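In other words, we assume that all topic assignments except the current word’s are correct, and then update the current word’s assignment using our model of how documents are generated.
Resampling each word’s topic in turn from this probability, with every other assignment held fixed, is called Gibbs sampling. Repeating the pass many times lets the assignments settle into a roughly steady state, from which the topic mixture of each document and the word distribution of each topic can be read off.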



