Andreas Mueller: New Associate Research Scientist at the Data Science Institute
Andreas Mueller is working hard to make machine learning easier to use. His book, “Introduction to Machine Learning with Python,” describes a user-friendly approach to machine learning using Python and scikit-learn. He is a core developer of scikit-learn, an open-source machine-learning library, and has co-managed this toolkit for several years. That explains why he has 30,000 Twitter followers, who contact him for authoritative advice on applications of machine learning, a field that enables computers to learn without being explicitly programmed.
Last year, Mueller also received a National Science Foundation (NSF) grant to create software to make machine learning easier to use. The NSF is funding his project to develop automatic machine learning, which means users will not have to select an algorithm for their data. Mueller’s software will automatically select the best algorithm for the user’s project. It is not easy to select the right algorithm, says Mueller, especially for laypeople who can be baffled by the complex choice of data processing, knobs and settings.
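To give a rough sense of what automatic algorithm selection involves, here is a minimal sketch using existing scikit-learn tools. It is not Mueller's software: the candidate list, the example dataset (iris) and the names `candidates`, `scores` and `best_name` are illustrative assumptions. The sketch scores a few standard algorithms with cross-validation and picks the one with the highest average accuracy.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Small built-in example dataset (an assumption; any labeled table would do).
X, y = load_iris(return_X_y=True)

# Hypothetical candidate list: each entry bundles preprocessing and a model,
# so the user never has to choose the "knobs and settings" by hand.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "k_nearest_neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier()
    ),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Score every candidate with 5-fold cross-validation (mean accuracy).
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

# Select the candidate with the highest mean cross-validated accuracy.
best_name = max(scores, key=scores.get)
print(f"Selected: {best_name} (mean accuracy {scores[best_name]:.3f})")
```

A real automatic machine learning tool would search a far larger space of algorithms and settings, but the basic idea is the same: try the options on the user's data and keep the one that performs best.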
“Researchers who develop new algorithms often don't provide them in a way that’s useful for a wide audience,” adds Mueller, an associate research scientist at the Data Science Institute. “While there have been some attempts at creating software packages that do this task, they are not tuned towards simple day-to-day use. This grant will help me to create something that is end-user friendly instead of research-quality software.”
Once developed, Mueller’s software tools will supplement scikit-learn, reducing the level of expertise required to apply models to a problem. The tools will have an interface requiring minimal user interaction as well as easy-to-understand documentation on how to use the software. Machine learning has been at the forefront of recent technological advancements: self-driving cars, computer vision and speech-recognition systems are three examples. It’s increasingly used in academia, industry and government, but its use by non-technical people has been curtailed by the complexity of choosing the right algorithm.
“This project will make it much easier for biology majors, astronomy majors and people from the liberal arts to use machine learning to solve problems in their fields,” says Mueller.
At DSI, along with his research, Mueller teaches the applied machine learning course as well as a Capstone course, a semester-long class where students divide into teams to work on data-intensive projects. This semester, he is supervising Capstone projects on renewable energy, generic entity resolution and medical complications following hip surgery. He’s also an academic adviser for the DSI master’s students.
Before coming to Columbia, he was a Machine Learning Scientist at the Amazon Development Center in Germany, where he designed and implemented large-scale machine learning and computer vision applications. He also worked as a research scientist at the NYU Center for Data Science, where he developed open-source tools for machine learning and data science. He has a diploma in math as well as a Ph.D. in computer science from the University of Bonn, Germany.
He says he started out in pure math, “no numbers at all, just pure algebra.” After finishing his master’s in mathematics at the University of Bonn, though, he approached a robotics professor about a Ph.D. position. Mueller wanted to work on real-world problems, and a newspaper story about a robotic soccer team developed in the professor’s lab had intrigued him. The professor, however, steered him toward computer science and machine learning, where his math skills would be more useful. In 2014, Mueller graduated from Bonn with his Ph.D. in computer science. He spent a year developing computer-vision software in Amazon’s Berlin office before moving to New York to take the research job at NYU.
Now, as an associate research scientist at DSI, Mueller is teaching, working on his NSF research, advising students and making machine learning more accessible to a wide audience.
“It’s become my mission,” he says, “to create open source software to make machine learning available to everyone who has a problem to solve.”
By Robert Florida