Andreas Müller | Bringing Machine Learning to the Masses

January 9, 2017

*Andreas Müller will teach a new DSI course this spring, Special Topics in Computer Science | Applied Machine Learning.*

This fall Andy Müller finally gave his 13,000 Twitter fans what they’d been clamoring for: a layman’s guide to machine learning. The book, Introduction to Machine Learning with Python, is a continuation of work that has made him the go-to engineer for questions on using Python for data science.

For five years Müller has led the daily upkeep of scikit-learn, Python’s suite of data-analysis tools, and attracted a growing following on Twitter for his authoritative advice. He will soon wear a third hat, teaching Columbia’s first course in applied machine learning this spring while running a clinic for researchers grappling with big data problems. Soft-spoken and approachable, he seems to enjoy the challenge of helping problem-solvers overcome technical hurdles.

He started out in pure math, “no numbers at all,” as he puts it, “just pure algebra.” After finishing his master’s in mathematics at the University of Bonn, Müller approached a robotics professor about a Ph.D. position. He wanted to solve real-world problems, and a newspaper story about a robot soccer team hatched in the professor’s lab had caught his interest.

But the professor steered Müller toward perception and machine learning, where his math skills would be more useful, and in 2014, Müller graduated with a Ph.D. in computer science. He spent a year developing computer-vision software in Amazon’s Berlin office before moving to New York to take a research position at NYU. He arrived at Columbia this fall and recently took time out from emails and coding to talk about work, play, and why software developers should strive to eat their own dog food.

What did you do at Amazon?
I helped develop a computer-vision application for Amazon’s Marketplace that checks whether clothing offered for sale meets Amazon standards: white background and no mannequins or nudity, among other things. A second project involved forecasting daily apparel sales for the coming year. Demand, as you can imagine, changes over time. The goal is to stock just enough product to deliver quickly.

*Words in larger-size fonts appear most frequently in Müller’s new book, Introduction to Machine Learning with Python: A Guide for Data Scientists.*

Any takeaway lessons?
I came to appreciate how hard it is to make good, fine-grained predictions from massive data sets.

In the word cloud version of your book (left), the words “data,” “feature,” and “model” jump out. Why those words?
Well, they appear most often, but also represent core machine-learning concepts. I had a friend ask if “model” should come first. That makes sense for some tasks, but I would call that statistical inference. Machine learning is about making predictions, often in the future, based on past data. The model matters in so far as it makes good predictions. If you’re asking an inference question, like “Can this medicine treat cancer?” you’re in completely different territory. There may be no past data to rely on. Your inference will be accurate only if your model matches the process you’re trying to investigate.

That visualization comes from a program you developed yourself in Python. Why?
I wanted to make it easy to produce word clouds in any shape. In this example, the most frequent words in the Wikipedia entry for “rainbow” are visualized as a parrot. I can usually tell if someone has used my program if smaller, less common words are packed between the large letters of the most common words.

So, it’s popular?
I’ve seen printed fliers on the streets of NYC that used it, and once saw it on a video screen in a Philadelphia hotel lobby, so I think a bunch of people are using it.

*Müller’s new book is geared toward programmers without extensive math background.*

How did you get started with scikit-learn?
I was in graduate school looking for tools to apply to my research, and scikit-learn had the best, easiest ones to use. I contributed a few simple pull requests and then asked if the developers would fly me to a sprint they were hosting at the 2011 Neural Information Processing Systems (NIPS) conference in Granada. When it ended they asked me to be the new release manager.

What does a release manager do?
I oversee the project’s daily maintenance with a few others. We write code, review code contributed by others, and find and fix bugs. While the software is constantly evolving, it’s occasionally “released” under a new version number in an easy-to-install package with our stamp of approval. I guess I’m more of an editor.

Do you get paid?
I was supported by grants from the Moore and Sloan foundations when I was at NYU. At Columbia, I juggle my scikit-learn work with my paid teaching responsibilities. Though software development is core infrastructure for data science research and applications, it’s basically unfunded.

You once competed in a Kaggle contest to build a system for flagging hateful troll comments. How did it work?
I used a variation on the classic “bag of words” approach. You throw all the words into a big bag and count how often each word appears, or how often two words appear consecutively, called “bigrams.” While “your” and “mother” are not that informative on their own, the phrase “your mother” is. Once the text was converted to bigrams, I used a classification model from scikit-learn, a linear support vector machine, to flag insulting comments.
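To make the counting step concrete, here is a minimal sketch in plain Python. It is not Müller’s contest code: the function names `tokenize` and `bag_of_ngrams` and the sample sentence are invented for illustration, and the classification step is left out.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_ngrams(text):
    """Count single words and consecutive word pairs (bigrams)."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    # Pair each token with its successor. This simple version ignores
    # sentence boundaries, so a pair can span two sentences.
    counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts


# Hypothetical sample text, chosen only to show the counts.
features = bag_of_ngrams("Your mother called. Call your mother back.")
print(features["your"], features["mother"], features["your mother"])
```

In scikit-learn, `CountVectorizer(ngram_range=(1, 2))` produces the same kind of word and bigram counts as a sparse matrix, and a `LinearSVC` classifier can then be trained on those counts together with labeled comments.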

Did you win?
I came in sixth place, but the evaluation metric I built led to an entirely new evaluation tool in scikit-learn. It’s now an integral part of the library. This shows why software developers should use their own tools. We call it eating your own dog food.

What goes into maintaining scikit-learn?
We receive a hundred new comments each day. I try to check them all out, but rarely have time to go through each one in detail. The comments typically include 10 to 20 pull requests, which are edits submitted by others. It’s like managing a Wikipedia page. You need to carefully review which changes to accept.

Everyone seems to define data science differently. What does data science mean to you?
I like to think about methods versus applications—methods are data-independent algorithms that can accomplish many different tasks, and applications are the research questions in psychology, astronomy, medicine, and so on, that data scientists try to solve. Data scientists are comfortable with the methods, and try to apply them to domain-specific problems.

*At the request of a scikit-learn user, Müller drew the above flow-chart to explain which algorithms to use on various problems. It was meant to be a joke, he wrote on his blog at the time, but beginners ended up finding it useful.*

What’s a typical day like?
I start by reading emails, and catching up with the scikit-learn project. There’s usually a talk or meeting in the afternoon, but mostly I’m in front of my computer, coding, reviewing code, and writing course material.

What was the hardest concept in Python to master?
I started Python during my Ph.D., in 2009. I already knew C++ and some Perl, so Python was straightforward except for array-centric computing, because I hadn’t done numeric computations before. You try to think about how arrays of numbers can be transformed to speed up computations.
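A small illustration of what array-centric computing means in practice, assuming NumPy is installed. The function names and the sample prices are hypothetical; the point is only that one array expression replaces an explicit loop.

```python
import numpy as np


def scale_with_loop(values, factor):
    """Multiply each number by factor, one element at a time."""
    result = []
    for value in values:
        result.append(value * factor)
    return result


def scale_with_array(values, factor):
    """Multiply the whole array by factor in a single expression."""
    return np.asarray(values) * factor


# Hypothetical data, used only to compare the two styles.
prices = [10.0, 20.0, 30.0]
looped = scale_with_loop(prices, 1.5)
vectorized = scale_with_array(prices, 1.5)
print(looped)
print(vectorized)
```

Both versions compute the same numbers. The array version hands the loop to NumPy’s compiled code, which is typically much faster on large inputs.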

Do you think machine learning will advance enough that machines eventually outsmart humans?
Machines are better at Go than humans, but even 20 years ago, a machine could have beaten me at chess. So, is it smarter? Computers are generally better at narrow, repetitive tasks like screening patient X-rays for cancer. Unlike a doctor, a computer has the chance to learn from millions of records. No wonder it performs better.

As we feed computers more data and expand their skillsets, will they take over our jobs?
That’s happening already. The better the algorithm, the fewer employees needed. Amazon’s automated image-quality controls are replacing human curators. Something similar occurred during the industrial revolution when machines took over low-skilled jobs. The issue then, and today, is how to retrain people at the bottom of the economic ladder who are most likely to lose their jobs. Unless we are careful, increasing adoption of AI may add to income inequality.

Do you have any advice for job-hunters?
Whether you’re looking for a job in computer science or data science, make sure you understand data structures and programming patterns. You will be asked to implement a linked list on a whiteboard. Don’t ask me why.

What’s a linked list?
It’s the simplest data structure you learn in CS 101. It can store a list, and lets you add and remove items.
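For readers facing that whiteboard, here is one minimal sketch of a singly linked list in Python. The class and method names are illustrative choices, not a standard API, and a production version would likely add more operations.

```python
class Node:
    """One item in the list, holding a value and a link to the next node."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    """A singly linked list that supports adding and removing items."""

    def __init__(self):
        self.head = None

    def add(self, value):
        """Insert a value at the front of the list in constant time."""
        self.head = Node(value, self.head)

    def remove(self, value):
        """Unlink the first node holding value; return True if one was found."""
        previous, current = None, self.head
        while current is not None:
            if current.value == value:
                if previous is None:
                    self.head = current.next
                else:
                    previous.next = current.next
                return True
            previous, current = current, current.next
        return False

    def __iter__(self):
        """Yield values from the head of the list to the tail."""
        current = self.head
        while current is not None:
            yield current.value
            current = current.next


items = LinkedList()
for word in ["data", "feature", "model"]:
    items.add(word)
items.remove("feature")
print(list(items))  # ['model', 'data']
```

Because `add` inserts at the front, the most recently added item comes out first. Removing an item means walking the links until the value is found and then pointing the previous node past it.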

What do you do for fun outside of work?
Photography. I recently got a Sony alpha 7R-II and am saving for a 70-200 F2, because acronyms and serial numbers are great. I did kiteboarding in Berlin, but it’s tricky to find large open spaces in Manhattan.

What makes Columbia’s data science program unique?
I haven’t taught or taken a course here yet, but I think the mix of world-class research, collaboration across schools, and close ties to industry is quite attractive.

Further reading: We Make the Software You Make the Robots, O’Reilly Media, 2015.

— Kim Martineau