DSI Research Scientist Kriste Krstovski Uses Data Science to Transform Several Fields

June 11, 2020

Kriste Krstovski has a strong background in quantitative reasoning. By training, he is a computer scientist, with specialties in natural language processing, machine learning, and information retrieval. Now, as an associate research scientist at the Data Science Institute (DSI) and adjunct assistant professor at Columbia Business School, he is using advanced techniques such as probabilistic modeling and Bayesian latent-variable models of text to enhance a variety of fields.

He’s working on several research projects that show the depth and scope of his interests and expertise. His research portfolio spans business, finance, and sociology, as well as the internet and the news media. Specific topics he studies include inequality in the labor market, the effect of news articles on the stock market, and how “fake news” goes viral. He is especially interested in textual data and in developing methods that help people mine large quantities of text to answer research questions.

“We live in a complex society in a global world where on a daily basis we ingest a plethora of information from written text,” says Krstovski. “Developing AI tools that help us understand how textual data affects and improves certain aspects of society can ultimately enhance our daily lives.”

Additionally, as an adjunct assistant professor at Columbia Business School, Krstovski is teaching a new generation of business students to understand and apply the most advanced data science techniques while helping enhance the school’s data-driven curriculum.

Teaching:

For the business school, he helped create a doctoral-level course, Big Data in Finance, which explores the use of data science methods to analyze big data. The course is taught by six professors. Krstovski teaches the data science section, in which students learn to analyze data compiled from financial reports, filings, news, and product reviews. The immense volume of data generated by the financial industry is difficult to analyze, and techniques such as natural language processing (NLP), machine learning (ML), and information retrieval (Krstovski’s main areas of interest and expertise) are effective ways for students to find meaning in an ocean of data. In the spring, he will also introduce a new doctoral-level course on NLP that covers an array of topics, including statistical models of text such as word embeddings and topic models.

Costis Maglaras, dean of Columbia Business School, says Krstovski’s efforts are “raising the data-driven research and teaching profile of Columbia Business School.”

“Data science is becoming increasingly more integral to our curriculum, as many of today’s businesses use data science to solve real-world problems,” adds Maglaras, a member of the Data Science Institute who formerly served on its executive committee. “Columbia Business School is emerging as a leader through researchers and teachers like Kriste, who use techniques like NLP and ML to inform business research and teaching.”

Advising:

Krstovski is also a faculty adviser for DSI’s capstone project, in which faculty advisers introduce, lead, and supervise students on independent, semester-long projects carried out with guidance from industry affiliates. In 2019, he advised a capstone team that partnered with Unilever to build NLP and machine learning tools for analyzing patents. Unilever wanted to evaluate patents filed by its competitors, and the goal of the tools was to provide insights that could help the company plan its own research. In a separate project in 2018, he advised a capstone team that developed a proof of concept for an intelligent bot that could automatically prioritize emergency calls in critical situations. The bot could monitor incoming calls, decide which were most urgent, and route those calls to dispatchers first.

Research:

Given his strong quantitative knowledge and skills in NLP, ML, and information retrieval, Krstovski is a sought-after researcher and academic adviser who is collaborating on several research projects at Columbia. For one project, he is working with Yao Lu, sociology professor and DSI member, to study inequality in the high-skilled job market. Despite the educational strides made by women and racial minorities over the past several decades, inequality based on race and gender persists in the U.S. labor market, especially in high-skilled jobs. To understand this inequality, the two are using NLP and ML to scan millions of online resumes of skilled employees. They are also scanning employee reviews on sites such as Glassdoor and using data science to identify the factors and environments that shape gender and racial inequality in the high-skilled labor market.

Krstovski is also collaborating on a second project that studies how news articles affect stock-market returns. The project, led by business professors and DSI members Harry Mamaysky and Paul Glasserman, examines which news topics most influence a stock’s performance. The team also intends to develop a generative, topic-based model of company returns during trading and non-trading hours.

He and DSI member Bruce Kogut, business and sociology professor, are working on a project to identify which aspects of news stories, including “fake news,” cause them to go viral. The two are using NLP and ML to examine thousands of news stories from online media channels, paying particular attention to the emotions evoked by the stories and the images in them.

In addition, he’s collaborating on a project led by DSI’s Smaranda Muresan that won first place in the Fact Extraction and Verification (FEVER) challenge. The team used NLP techniques to create an end-to-end fact-checking system for claims based on evidence retrieved from Wikipedia. The team will present the system at this year’s Association for Computational Linguistics (ACL) conference.

Developing technology:

Krstovski also recently led the development of the search capability, or information retrieval component, for a Columbia website called Covid-19 Hub. The site, a repository of research on Covid-19, makes it easy for anyone to locate Covid-19 research projects at Columbia. The search engine was built in a matter of days, a turnaround made possible by Krstovski’s strong background in information retrieval. During his doctoral studies, he belonged to one of the world’s leading information retrieval laboratories, the Center for Intelligent Information Retrieval at UMass Amherst. As a predoctoral fellow at the Harvard-Smithsonian Center for Astrophysics, he also worked on latent-variable models of text for a NASA-sponsored information retrieval project.

In the end, Krstovski says it’s hard to balance the various aspects of his professional responsibilities—researching, teaching, advising students, and helping Columbia develop technologies—but that he enjoys the challenge of using AI to transform and enrich various disciplines.

“Being a member of DSI has allowed me to expand my research knowledge and skills into many other fields,” he says. “I enjoy the transdisciplinary environment that DSI nurtures, which challenges and motivates me to work with domain experts from across the university—all with the ultimate aim of using data for good.”

— Robert Florida