My First Year as a Data Scientist: A Q&A with Sunanda Koduvayur
Sunanda Koduvayur was a physicist on the hunt for materials to replace silicon in computer chips when she realized she was having more fun playing with the data. The days spent analyzing her results were the most exciting. So, she traded in long hours tinkering with scanning electron microscopes and fixing leaks in vacuum chambers to become a data scientist. She finished our data science certificate program last spring and now works at DataXu, an ad tech startup in Boston that buys online ads for clients such as Ford, AmEx and Lexus, as well as traditional ad agencies.
At DataXu, Koduvayur wrangles data and applies machine learning and predictive models to bid on and purchase ads that will get maximum exposure for the firm’s clients. She holds a PhD in experimental condensed matter physics from Purdue University and completed her postdoctoral fellowship at Princeton University. She recently returned to campus to speak on a panel, My First Year as a Data Scientist, cosponsored by the Columbia Data Science Society and moderated by incoming master’s student Jerui Song. In the excerpt below, Koduvayur offers advice for aspiring data scientists.
What does your average workday look like?
It ranges from pulling data from our data systems (Hadoop Distributed File System) and cleaning and exploring it, to applying predictive and statistical models to glean business insights. As a member of the innovation and analytics team, I get to explore further than the immediate business question at hand. We are constantly looking for hidden stories that can lead us to new solutions and offerings.
Can you describe a recent project?
I recently modeled the lifetime of a cookie, the tag that gets stored in a user’s web browser and records her activity each time she visits the site. Advertising doesn’t always lead to an immediate purchase, and so cookies allow companies to track the lag between ad exposure and purchase. Understanding the delay allows companies to refine their message and avoid annoying consumers (who wants to be nagged about unpurchased items in their shopping cart?). Modeling the life of a cookie helped us change some of our assumptions about average lag from ad exposure to purchase.
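To make the idea of lag from ad exposure to purchase concrete, here is a minimal sketch in Python. It is an illustration only and not the model Koduvayur describes: the event format, the `exposure_to_purchase_lags` function, and the sample data are all hypothetical. It measures, for each cookie, the number of days between the first recorded ad exposure and the first purchase that follows it.

```python
from datetime import datetime
from statistics import median


def exposure_to_purchase_lags(events):
    """Return {cookie_id: lag_in_days} from first exposure to first later purchase.

    `events` is a hypothetical list of (cookie_id, kind, timestamp) tuples,
    where kind is either "exposure" or "purchase".
    """
    first_exposure = {}
    lags = {}
    # Walk through events in time order so "first" really means earliest.
    for cookie_id, kind, ts in sorted(events, key=lambda e: e[2]):
        if kind == "exposure":
            # Keep only the earliest exposure per cookie.
            first_exposure.setdefault(cookie_id, ts)
        elif kind == "purchase" and cookie_id in first_exposure and cookie_id not in lags:
            delta = ts - first_exposure[cookie_id]
            lags[cookie_id] = delta.total_seconds() / 86400  # seconds per day
    return lags


# Made-up sample data, for illustration only.
events = [
    ("a", "exposure", datetime(2015, 3, 1)),
    ("a", "purchase", datetime(2015, 3, 4)),
    ("b", "exposure", datetime(2015, 3, 2)),
    ("b", "exposure", datetime(2015, 3, 5)),
    ("b", "purchase", datetime(2015, 3, 9)),
    ("c", "exposure", datetime(2015, 3, 3)),  # never purchases
]

lags = exposure_to_purchase_lags(events)
median_lag = median(lags.values())
print(lags, median_lag)
```

Cookies that never lead to a purchase (like "c" here) are simply left out, which biases the estimate toward shorter lags. A fuller treatment would need to account for those unfinished cases and for cookies that are deleted before a purchase happens.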
What are the best and worst parts of your job?
Worst part: quick project turnover. You never get the chance to go as deep into the data as you can in academia. Best part: quick project turnover. Day to day and week to week, the projects I work on are constantly evolving. It’s impossible to get bored.
How much coding and statistics do you do?
About 80 percent of my time is spent on regressions and simple clustering. As a member of the innovation team, I also look for ways to incorporate new machine learning techniques into our work.
How do you keep learning on the job?
Most companies allow you to set aside a percentage of your quarterly goals toward personal development. Ask about this when you interview. If they don't care about your personal development, you probably don't want to be there! Go to hackathons and meetups to stay current.
What was your hiring process like?
It was half technical, half culture. The technical part is usually a timed test. Most companies are tool agnostic. So while you should be familiar with a few of the in-market tools, be super strong in one. Also, have at least one cool project to discuss. Talk about the challenges you encountered, how you solved them and what excited you most.
Graduate school or job?
I wish I had gotten into the workforce earlier, but I can't say grad school was useless. The training definitely helped me, plus data science is one of the few fields where you can work in industry and still contribute to leading conferences and journals. You may not need grad school, but it can help you decide whether to choose a research or application track as a data scientist.
Where do you see data science heading?
In the last year or so I’ve seen the field mature tremendously. Employers now require stronger programming skills of entry-level employees, and greater attention is being paid to the questions we ask of models to make sure we measure the right things. There is a lot of opportunity for people who want to build data science teams and align the company's goals with analytics.
What soft skills help?
Be able to take a step back after immersing yourself in numbers for weeks on end. The ability to understand what the numbers are telling you is paramount. Einstein said, “If you can't explain it simply, you don't understand it well enough.” I try to live by those words. Remember that your model will be judged by its accuracy and your ability to explain it to the least technical person in the office.
Can you give one last piece of advice?
Use your time in school to master the tools that are necessary to succeed. Go deeply into the theory behind many of the models and statistical tests you learn, because you will never have the time to learn them on the job. Practice your networking skills and take advantage of Columbia’s great career-services department.