Bin Yu’s talk

Yesterday I went to a lunch seminar by Bin Yu. It ended up being closely related to my contrastive LDA work. Very serendipitous.

She described LDA as “heavy modeling” and presented a simple discriminative alternative. The setup is as follows. Suppose we want a list of keywords that describe “China” (she called this summarization). The user supplies a few seed tokens about “China”: China, Chinese, Beijing, etc. Using these tokens, the algorithm labels each document in the corpus as 1 (if the tokens occur in the document at a frequency above some threshold) or 0 (if they don’t occur at all). Each document is then represented by its word count vector, and LASSO is fit to predict the labels from those vectors. If we want 15 keywords, the \lambda in LASSO is tuned until exactly 15 features have non-zero coefficients; those features are the keywords.
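To make the pipeline concrete, here is a minimal pure-Python sketch of how I understood it, not her implementation. Several details are my own assumptions: the threshold is a raw count of seed-token occurrences, documents that mention the seeds but fall below the threshold are dropped, the seed tokens are excluded from the features (otherwise they would trivially predict the label), \lambda is found by bisection, and the names `label_document`, `lasso`, and `keywords` as well as the toy corpus are made up for illustration.

```python
from collections import Counter


def label_document(tokens, seeds, threshold):
    """Return 1, 0, or None from the number of seed-token occurrences."""
    hits = sum(1 for t in tokens if t in seeds)
    if hits >= threshold:
        return 1
    if hits == 0:
        return 0
    return None  # assumption: in-between documents are left out


def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(X, y, lam, n_sweeps=100):
    """Coordinate descent for (1/2n)||y - Xw - b||^2 + lam * ||w||_1."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # centering absorbs b
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residual when w = 0
    scale = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if scale[j] == 0.0:
                continue
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + scale[j] * w[j]
            new_wj = soft_threshold(rho, lam) / scale[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                w[j] = new_wj
    return w


def keywords(docs, seeds, k, threshold=2, n_steps=50):
    """Tune lambda by bisection until (at most) k word features are non-zero."""
    seeds = set(seeds)
    labelled = []
    for doc in docs:
        tokens = doc.lower().split()
        label = label_document(tokens, seeds, threshold)
        if label is not None:
            labelled.append((tokens, label))
    # Assumption: seed tokens are not allowed as features.
    vocab = sorted({t for tokens, _ in labelled for t in tokens} - seeds)
    X = [[float(Counter(tokens)[v]) for v in vocab] for tokens, _ in labelled]
    y = [float(label) for _, label in labelled]
    n = len(y)
    y_mean = sum(y) / n
    # Smallest lambda at which every coefficient is exactly zero.
    lam_hi = max(
        abs(sum(X[i][j] * (y[i] - y_mean) for i in range(n))) / n
        for j in range(len(vocab))
    )
    lam_lo = 0.0
    w = [0.0] * len(vocab)
    for _ in range(n_steps):
        lam = (lam_lo + lam_hi) / 2
        w = lasso(X, y, lam)
        nonzero = sum(1 for wj in w if wj != 0.0)
        if nonzero == k:
            break
        if nonzero > k:
            lam_lo = lam  # too many features: penalize more
        else:
            lam_hi = lam  # too few features: penalize less
    ranked = sorted((j for j in range(len(vocab)) if w[j] != 0.0),
                    key=lambda j: -abs(w[j]))
    return [vocab[j] for j in ranked[:k]]


# Toy corpus, invented for illustration only.
DOCS = [
    "china beijing yuan trade tariff",
    "chinese beijing yuan trade export",
    "china china yuan market",
    "football league trade match",
    "pasta recipe market tomato",
    "movie review export festival",
    "china football pasta",  # one seed hit, below threshold: dropped
]
SEEDS = ["china", "chinese", "beijing"]

print(keywords(DOCS, SEEDS, k=1))
```

On a real corpus one would use sparse count matrices and a library LASSO solver rather than these nested lists. Also, the number of non-zero coefficients is not guaranteed to change monotonically with \lambda, so the bisection can end with fewer than k keywords.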
