machine learning versus statistics

My last two posts were both very much statistical in nature. Writing them has crystallized for me the difference between statistics and machine learning. Statistics tends to focus on explaining and modeling data, whereas machine learning (in addition to modeling) is also interested in making accurate predictions, even when it’s not clear how to explain the model.

A lot of classical statistics was developed for settings where there’s not very much data. Hence, to get the most out of the data, a lot of thought goes into building complex models that capture much of the inductive bias and allow pooling of data. Machine learning deals with a lot more data, and cares more about having flexible models that can accommodate it.

Representation learning and feature extraction seem to be mostly in the domain of ML. Hypothesis testing is very much classical statistics.

In statistics, people are content to run MCMC for a long time, since the data is relatively small. In ML, algorithmic efficiency matters more, so people turn to variational and other approximations.

Takeaway: for most day-to-day uses (esp. in compbio), it is important to have a good understanding of basic hypothesis testing, p-values, FDR, power, etc. But the basics are enough, and one rarely needs advanced statistics. For more advanced applications and more complex data, ML tends to be the better framework/toolkit. Plus, the infusion of algorithmic ideas rather than asymptotic analysis makes ML much more fun to work on than statistics!
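To make "the basics" a bit more concrete, here is a minimal sketch of FDR control using the Benjamini-Hochberg step-up procedure, which is one common way to do it. The function name `benjamini_hochberg` and the p-values are made up for illustration; in practice you would probably reach for a library routine instead.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject the hypotheses with the max_k smallest p-values.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests (illustrative only, not real data).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, alpha=0.05)
print(rejected)
```

Note that three of these p-values fall between 0.039 and 0.042 and would pass a naive 0.05 cutoff, yet they do not survive the correction. That gap between per-test significance and FDR-controlled significance is the kind of basic point that matters day to day.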
