AI’s 10 to Watch | Daniel Hsu

February 15, 2016

*Computer science professor Daniel Hsu has been recognized for his work in machine learning, a branch of artificial intelligence. (Ryan John Lee Photo)*

An electrocardiogram tells doctors how fast and steadily the heart is beating, helping them pinpoint the source of heart problems. In the not-so-distant future, thanks to the algorithms that Daniel Hsu develops, computer-aided analysis could make these diagnoses faster and more accurate.

A computer science professor at Columbia Engineering and a member of the Data Science Institute, Hsu was recently named a rising star in artificial intelligence research by IEEE Intelligent Systems magazine. Beyond classifying heartbeat recordings, Hsu’s work has been applied to automated language translation, personalized medicine and privacy and data-transparency systems.

Hsu specializes in a branch of machine learning called interactive learning. In standard machine learning, humans train a learning algorithm on hand-labeled data, for example emails marked as spam or not spam. Once the algorithm learns to recognize spam, it can apply this sorting rule to all incoming mail. In active learning, one form of interactive learning, the algorithm starts with a much smaller set of hand-labeled data. When it encounters an email it cannot confidently classify, it requests a label.

Because it asks only for the labels it needs, active learning can shrink the number of labels required, in some cases exponentially, which greatly speeds up the work of training algorithms to do useful things. As a graduate student at the University of California, San Diego in the late 2000s, Hsu helped develop an active learning method that was later applied to electrocardiograms, reducing the amount of training data needed by 90 percent.
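A toy problem shows why asking for labels selectively can save so much effort. This is our own illustration, not Hsu's method. Suppose points on a line are labeled 0 below some unknown threshold and 1 at or above it, and no label is ever wrong. A learner that labels every point needs as many labels as there are points. A learner that asks only about the middle of the range it is still unsure of (a binary search) needs only about log2 of that number. The function name and data below are hypothetical.

```python
from typing import Callable, List, Tuple


def learn_threshold_actively(
    points: List[float], oracle: Callable[[float], int]
) -> Tuple[int, int]:
    """Toy active learner for a 1-D threshold (hypothetical example).

    Assumes every point below an unknown threshold has label 0 and every
    point at or above it has label 1, with no labeling noise. Returns the
    index (in sorted order) of the first point labeled 1, or len(points)
    if none is, together with the number of labels requested.
    """
    xs = sorted(points)
    lo, hi = 0, len(xs)  # the first positive index lies somewhere in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1  # ask the "human" oracle for exactly one label
        if oracle(xs[mid]) == 1:
            hi = mid  # the boundary is at mid or to its left
        else:
            lo = mid + 1  # the boundary is strictly to the right of mid
    return lo, queries


# Usage: 1,024 unlabeled points with a true threshold at 0.37.
points = [i / 1024 for i in range(1024)]
boundary, queries = learn_threshold_actively(points, lambda x: int(x >= 0.37))
print(boundary, queries)  # prints: 379 10 (a passive learner would use 1,024 labels)
```

Real active learners must also cope with noisy labels and far richer classifiers, which is where the theory becomes hard. This sketch assumes both problems away.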

*Hsu’s algorithms have been used in the automated analysis of electrocardiograms, among other applications. (Dr. Michael Rosengarten, McGill, EKG World Encyclopedia)*

His theoretical work on interactive learning also applies to recommending news, movies and ads. If Netflix or The New York Times recommends content and fine-tunes its system based only on how you respond to those suggestions, it can end up never showing you movies or stories you might like even more. With colleagues at Microsoft Research, Hsu has developed methods for breaking this feedback loop within a framework known as contextual bandits.
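A few lines of Python can show both the feedback loop and the exploration that breaks it. The sketch below uses epsilon-greedy, a simple textbook contextual-bandit strategy. It is not the method Hsu and his Microsoft colleagues developed. Most of the time it recommends the item with the best observed click rate for a reader's context. With probability epsilon it recommends something at random instead. The reader types, items and click probabilities are made up for illustration.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Epsilon-greedy contextual bandit over a small, discrete set of contexts."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, action) -> times recommended
        self.totals = defaultdict(float)  # (context, action) -> total clicks

    def estimate(self, context, action):
        n = self.counts[(context, action)]
        return self.totals[(context, action)] / n if n else 0.0

    def best(self, context):
        # Greedy choice: highest observed click rate; ties go to the earliest action.
        return max(self.actions, key=lambda a: self.estimate(context, a))

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)  # explore
        return self.best(context)                 # exploit

    def update(self, context, action, reward):
        self.counts[(context, action)] += 1
        self.totals[(context, action)] += reward


# Hypothetical click probabilities, unknown to the recommender.
CLICK_PROB = {
    ("sports_fan", "sports"): 0.9, ("sports_fan", "movies"): 0.1, ("sports_fan", "news"): 0.2,
    ("film_buff", "sports"): 0.1, ("film_buff", "movies"): 0.9, ("film_buff", "news"): 0.2,
}


def simulate(epsilon, rounds=5000, seed=1):
    world = random.Random(seed)
    bandit = EpsilonGreedyBandit(["sports", "movies", "news"], epsilon, seed)
    for _ in range(rounds):
        context = world.choice(["sports_fan", "film_buff"])
        action = bandit.choose(context)
        reward = 1.0 if world.random() < CLICK_PROB[(context, action)] else 0.0
        bandit.update(context, action, reward)  # feedback only on what was shown
    return bandit


print(simulate(epsilon=0.0).best("film_buff"))  # prints "sports": no exploration
print(simulate(epsilon=0.1).best("film_buff"))
```

With epsilon set to 0, the simulated film buff is shown sports forever. Movies are never recommended, so their high click rate is never observed, which is exactly the feedback loop described above. A small amount of exploration is enough to discover the better recommendation. Methods with theoretical guarantees explore more carefully than uniform random choice does.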

“Daniel’s work comes up with some of the first methods that are easy to use but endowed with strong theoretical guarantees,” said Alekh Agarwal, a researcher at Microsoft who collaborated on this work.

In another area of machine learning, Hsu has developed algorithms for hidden Markov models (HMMs), a type of latent variable model used to infer hidden states from a sequence of observations. At the heart of speech-recognition systems, HMMs help smartphone assistants like Siri and Cortana infer written words (the hidden states) from a stream of sounds (the observations).
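The sketch below does not reproduce Hsu's algorithms. It shows the standard Viterbi algorithm, a dynamic program that takes an HMM whose probabilities are already known and infers the most likely sequence of hidden states behind a sequence of observations. The two-state model is a common textbook toy, with hidden weather and observed daily activities, not a speech recognizer, and its probabilities are illustrative.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path and its probability."""
    # best[t][s]: probability of the most likely path that ends in state s at step t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: the state before s on that most likely path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] * trans_p[r][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            back[t][s] = prev
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Illustrative model: hidden weather, observed activities.
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3}, "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}
path, prob = viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
print(path, prob)  # ['Sunny', 'Rainy', 'Rainy'], probability about 0.01344
```

At each step the algorithm keeps only the best path into each state. Its running time therefore grows linearly with the length of the observation sequence, rather than exponentially with the number of possible state sequences.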

Hsu’s work on HMMs has been applied in genomics to understand the role of gene regulation in disease, and how the chromatin packaging a cell’s DNA may be implicated. Biologists have linked patterns of chromatin marks, or chemical changes within the chromatin, to genetic variations that cause disease. A software tool built with Hsu’s algorithms has made the process of inferring regulatory changes in the cell from sequences of chromatin marks faster and more accurate.

As algorithms get better and faster at synthesizing data, one downside is a loss of privacy: emails, Internet searches and location trackers may together reveal sensitive information we never meant to share. Hsu also develops algorithms to protect this data. Statistical noise, or random errors, can be inserted into personal data to protect privacy, but the noise also makes the data less valuable to those aggregating it for insights. Hsu recently helped develop a method that preserves privacy while adding less noise, so the data keeps more of its value. He has also helped develop a tool that brings greater transparency to how our data is used on the Web.
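The Laplace mechanism from differential privacy makes this trade-off between privacy and usefulness concrete. It is a standard textbook technique, not the specific method Hsu helped develop. To release a count while hiding whether any one person is in the data, it adds random noise whose typical size is 1/epsilon. A smaller epsilon means stronger privacy but a noisier, less useful answer. The records and the query below are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws, each with mean `scale`,
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    # Adding or removing one person changes a count by at most 1 (sensitivity 1),
    # so Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def is_senior(age):
    return age >= 65  # hypothetical query: how many people are 65 or older?


def mean_abs_error(epsilon, trials=2000, seed=0):
    rng = random.Random(seed)
    ages = list(range(18, 90))  # hypothetical records: one age per person
    true_count = sum(1 for a in ages if is_senior(a))
    errors = [abs(private_count(ages, is_senior, epsilon, rng) - true_count)
              for _ in range(trials)]
    return sum(errors) / trials


print(mean_abs_error(1.0))  # weaker privacy, smaller error
print(mean_abs_error(0.1))  # stronger privacy, larger error
```

The expected absolute error of Laplace noise equals its scale. The two averages should therefore come out near 1 and 10. That tenfold loss of accuracy is the cost that research on this problem tries to reduce.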

“He’s a theoretician but has a remarkable intuition for system design,” said Roxana Geambasu, a computer science professor at Columbia Engineering and a member of the Data Science Institute who worked with Hsu on the data-transparency tool. “He quickly grasps practical restrictions, system requirements and adapts his machine-learning-statistics recommendations accordingly. He’s also incredibly fun to work with.”

AI’s 10 to Watch: Algorithms for Machine Learning, Intelligent Systems, Jan./Feb. 2016

— Kim Martineau