Yesterday I went to a lunch seminar by Bin Yu. It ended up being closely related to my contrastive LDA work. Very serendipitous.
She described LDA as being “heavy modeling” and described a simple discriminative approach. The setup is as follows. Suppose we want to find a list of keywords that describe “China” (she called this summarization). The user inputs a few tokens about “China”: China, Chinese, Beijing, etc. From these tokens, the algorithm labels each document in the corpus as 1 (if the tokens occur in the document at a frequency above some threshold) or 0 (if they don’t occur at all). Then each document is represented by a word count vector and LASSO is performed to predict the labels. If we want 15 keywords, then the regularization parameter in LASSO is tuned to give 15 non-zero features.