A Q & A with Casey Huang | Detecting Fraud with Data Science
June 27, 2016 
               
Born and raised in Shanghai, Casey Huang left mainland China as a teenager to study risk management at the Chinese University of Hong Kong. After graduating in 2013, she went to work at the Hong Kong hedge fund Orient Asset Management, doing quantitative analysis, portfolio rebalancing and risk monitoring. She came to New York in fall 2014 to study for a master’s in data science at Columbia and graduated in December with the highest grade point average in her class. She now works at the risk assessment firm Verisk Analytics in Jersey City.
How did you get interested in data science?

I was working as a hedge fund analyst in Hong Kong when I stumbled across Google’s trend page, which lists stories trending in the U.S. and abroad. I was impressed by how much power data can give us. Since I had a statistics background, I thought that improving my coding skills would open up new career options beyond finance. Columbia’s program seemed to offer everything I’d need, so I went for it.
What made you leave Shanghai to study in Hong Kong for college?
Hong Kong’s academic system is similar to that of the U.S., and I thought it would give me freedom to follow my curiosity. Culturally, it was a big adjustment, but I like being challenged and the diversity has made me more open-minded. It also prepared me for an even more diverse city: New York!
You picked up guitar in New York. How is learning to play an instrument like solving a big data problem?
Both require passion, practice and a certain amount of logic—understanding how algorithms and chords do what they do. Sometimes I feel frustrated that I’m not getting anywhere, but then something clicks and I realize I’ve learned something new. Sweet moments.
You taught yourself Python. How did you manage that?
I already knew R and C, and started learning Python when I came to Columbia. I needed Python for Algorithms for Data Science and realized it would also be useful in my other classes. I read the documentation to understand the data structures and basic functions, and tried some examples. When I got stuck on a coding problem, I googled it. The more problems I tried, the better I got.
What kind of work do you do at Verisk Analytics?
I’m in their Data Excellence rotational program for new graduates. My first placement is with the insurance claims department, which helps to design products that allow insurance companies to investigate and combat fraud. My work includes building new data tables, cleaning data, building predictive fraud detection models, making dashboards and presenting results to management. This is the first of three departments I’ll work in to become familiar with the company’s data assets and products.
I’m currently helping to build an intelligent indicator that will help our customers (insurance companies) spot suspicious claims and expedite the genuine ones. It’s an exciting time for us. With comprehensive claims data and the potential to bring in outside data, we can start building accurate prediction models.
What’s your favorite type of data?
I like working with various types: text, time-series and images. Each data type captures extra information that, when integrated into a model, may improve the decision-making process. I like learning new techniques to interpret and synthesize different data. I spent a lot of time at Columbia and during my internships processing text data, from Wikipedia entries to social media posts. Text continues to be a powerful mode of communication. It’s interesting to see what people are talking about, apply methods that allow programs to understand and organize massive amounts of text, and exploit it through targeted marketing and other use cases.

What were your favorite classes?
Algorithms for Data Science gave me the tools to solve harder problems and look for optimal solutions. In Machine Learning for Data Science I learned the fundamentals of why models do what they do. It’s a theoretical perspective you won’t learn on the job. In the Data Science Capstone class, I experienced what it’s like to work with a team on a project from start to finish, from collecting, cleaning, exploring and analyzing the data to presenting it to a live audience.
What problem did your capstone team take on?
We partnered with Synergic Partners, a consulting firm in Spain, to analyze Twitter communities organized around Big Data. We identified four major communities: a diverse global community, a second defined by language (French or Spanish speakers), a third defined by specialty interest (marketing, Search Engine Optimization or Internet of Things), and a fourth focused on data-science-related events. We also identified influencers who bridge these communities, people like Kirk Borne, Booz Allen’s chief data scientist. Data collection took up half of the three months we had for the project, so we had to focus on what we and our mentors thought were the most promising directions.
Any advice for incoming students?
Learn as much of the foundational and theoretical elements as you can. No matter which direction you choose for your career, you will benefit from a solid background in algorithms, databases and statistics. It also helps to know whether you want to focus on data engineering, machine learning, business intelligence or visualization. If, like me, you have no idea, New York offers plenty of opportunities to explore, from internships to Kaggle projects and social events.
— Kim Martineau