Aaron Schein was weaned on politics. Both his mother and father studied at MIT with Noam Chomsky, the noted political commentator and linguist. And both were politically active, frequently discussing progressive politics at home. Schein grew up reading newspapers and books on political science and history, and his first job, at 15, was working as a street canvasser for John Kerry’s 2004 presidential campaign. “I probably should’ve canvassed harder,” Schein says wryly. And his high school counselors still recall when he showed up on the first day of school carrying a copy of “The Communist Manifesto.”
It thus came as no surprise when Schein entered the University of Massachusetts, Amherst, as a political science major. He later added linguistics as a second major, and later still, in graduate school, he switched to computer science. He’s expected to earn his doctorate from UMass in the fall, after which he’ll begin his postdoctoral fellowship at the Data Science Institute (DSI), where he’ll be advised by Professors David Blei and John Paisley, both experts in statistics and machine learning.
As a postdoctoral fellow, Schein will call upon all his interests to develop statistical models to understand and predict factors that drive voter turnout, especially new voter turnout, in American political elections. It’s important to understand the dynamics of new voter turnout, he says, since it can decide elections. Traditionally, quantitative political scientists have based statistical models on polling data pertaining to who voted in past elections. “But such a model naturally does poorly in predicting new voters,” he says, “given that new voters don’t have voting records.”
Most statisticians and pollsters failed to predict the outcome of the 2016 presidential election, and Schein is eager to see how they do in predicting the midterm congressional elections. He’ll begin his postdoc after the midterm elections, but he’s hoping to analyze data from the vote. He suspects the election will be decided not by swing voters but by how well candidates can appeal to their bases and persuade them to vote, especially new voters. And he hopes the techniques he develops with his advisers will help improve the science of polling.
“The question is how can we come up with new methods to forecast voter turnout, particularly who the new voters will be?” he asks.
With guidance from Blei and Paisley, Schein hopes to build a more accurate and predictive turnout model for new voters by taking into account new sources of data: news articles and media coverage of candidates, their policy positions, and the rhetoric or word choice in their speeches and writings.
His interest in using nontraditional information to predict election outcomes stems from his time as a political science major at UMass, where he focused on language and politics, specifically on the role that narrative plays in communicating political opinion. He took a class in computational linguistics, where he learned Python and some tools for counting word frequencies in text documents. His final project for that class was a study of how different news outlets – the New York Times, the BBC, Al Jazeera, and Dawn, a daily newspaper in Pakistan – used different language when reporting on the U.S. drone campaign in Afghanistan and Pakistan. He found that the Western outlets (NYT, BBC) paid less attention to civilian casualties than the regional ones (Al Jazeera, Dawn), as evidenced by differing frequencies of words such as “civilian.” Though his method for this project was simplistic, it nonetheless made him realize the power of computational methods for measuring constructs like narrative and framing.
“My hope is that if we build richer models that include things like the way in which the media is covering the candidates, their policy platforms, and the language they use, we might do a better job of predicting who will come to the polls to vote for the first time,” he says.
To get the data for his model, Schein is partnering with a polling company that has amassed demographic and survey data on U.S. voters from recent local and national elections. The data cover variables such as age, race, gender and income, as well as responses to survey questions about political opinions.
“I’m personally invested in this research because I think traditional polling methods encode an inherent bias against progressive politicians,” he says, “in that they often make it look like progressives have no chance of winning. But in reality, progressive platforms may attract many new voters to the polls.”
Schein’s polyglot background will help him merge politics and data science at DSI. As a college senior, he studied Farsi in preparation for applying to Ph.D. programs in international relations; he had then hoped to specialize in Iranian foreign policy. The combination of Farsi and computational linguistics skills, though, gave him the chance to work at the MITRE Corporation, a federally funded research and development center dedicated to problems crucial to national security and foreign policy. There, he worked on a project called Social Radar for Smart Power, which was written about in Wired, and also developed natural language processing (NLP) tools for analyzing the Persian-language blogosphere. Because the U.S. and Iran haven’t had diplomatic relations since 1979, U.S. policymakers have had to rely on indirect methods of understanding public opinion in Iran, Schein says. In this work, he analyzed open-source blog posts to gauge Iranian public opinion on a range of topics. While working at MITRE, he learned software engineering, machine learning and other tools of data science, which helped him switch to computer science for his doctoral degree.
In 2013, after his first year in the Ph.D. program, he also worked as a software engineering intern at Google. He recalls Google employees speaking in hushed tones about new advances in neural networks, which over the next few years would become all the rage. He was part of a team attempting to build a system that could answer reading comprehension questions like those on the SAT. “My team was part of the machine intelligence group, which was full of brilliant people,” he says. “Being there really opened my eyes to the broader landscape of AI and machine learning and what the future of technology would look like.”
At DSI, Schein feels fortunate to be advised by Professors Blei and Paisley, in his view two of the leading data scientists in the nation. For the past two years, he’s been a visiting student in Blei’s lab, which he has found inspiring.
“I’ve often left research meetings with Dave or John feeling like a cloud was lifted,” he says. “They think and communicate very lucidly, and that forces you to do the same.”
Along with being a techie and a political wonk, Schein loves music and is eager to acquaint himself with the music scene in NYC, where he now lives. At UMass, he played piano in a cover band called Loopy Belief and the Propagators, named after a classic algorithm in machine learning called “loopy belief propagation.” The members of the band, unsurprisingly, were all computer science students. Basketball is another of his passions, and he played (“well, rode the bench,” he says) for his high school team. He continues to play pickup ball in local parks. Though a New Yorker now, he’s an avid fan of the Boston Celtics as well as the Maine Red Claws, the Celtics’ minor league team, whose fans are known as Crustacean Nation.
With his many interests and his varied background, Schein says he’s happy to be a postdoctoral fellow at DSI, which he characterizes as inherently and seamlessly interdisciplinary.
“You feel a fearless collaborative energy from everyone at DSI,” he says. “It feels like the border between DSI and the rest of Columbia doesn’t exist. There are so many researchers working at the top of their fields at Columbia who seem eager to collaborate with the data scientists at DSI. I’m excited to collaborate with the folks in political science, sociology, journalism, public policy and experts all across Columbia.”
— Robert Florida