Not only did Tanvi Pareek graduate from Vellore Institute of Technology (VIT) with a bachelor’s degree in computer science, but she was also a nationally ranked basketball player in India before enrolling in the M.S. in Data Science program at Columbia University. Today, the 2021 alumna works as a data scientist. Here, she shares a bit about her experience in graduate school.

What helped to pique your interest in data science?

At VIT, I could choose my electives, and some of the courses I picked were machine learning, data visualization, AI, and web mining. These courses exposed me to the basics and pushed me to explore more. Doing internships as a data scientist really helped me understand that this is something that interests me and that I would love to pursue a career in this field. The power of data in making decisions is what inspired me the most. I wanted to learn the process of collating enormous amounts of data and applying the right techniques to get valuable insights. Pursuing the M.S. in Data Science at Columbia really helped me with this.

How did your undergraduate experience prepare you for graduate studies in data science?

Doing my undergrad in computer science really helped me, and the choice of electives let me pick data science-related courses. I would say having knowledge of Python, machine learning, and statistics is essential and, more importantly, getting hands-on experience by doing small projects and internships really helped me a lot.

Why did you choose to come to Columbia for the M.S. in Data Science program?

Firstly, [Columbia] has one of the best data science programs and a well-organized curriculum, which includes the choice of electives. I did some research regarding the professors and courses and research assistant opportunities, which were a really big factor in choosing Columbia. Furthermore, the availability of the Industry Affiliates Program would help me acquire adequate practical knowledge before stepping into the real world.

How did the coronavirus pandemic impact your experience during the M.S. in Data Science program?

I continued staying in New York City with my roommates. I obviously feel bad that I couldn’t get the classroom experience, and I missed interacting with new people and professors. The job and the internship search were also difficult and that added to the stress. My Columbia experience was a roller coaster ride. This has really helped me build my perseverance!

Tell us about your summer internship.

I did one internship last summer at Hindsight Technology Solutions. It was challenging for sure, as both my projects were problems I had never solved before, but along with another intern, I was able to provide a working solution at the end of the internship, which was really exciting. I got to learn about applying statistical techniques to real-world problems and also learned about hierarchical classification.

What was your favorite course during the program?

I had two favorite courses: Machine Learning taught by John Paisley and Applied Machine Learning taught by Andreas Mueller. The ML course helped me understand the basics as well as the details of the algorithms really well, which, in my view, is very important. The AML course taught me how to put that theoretical knowledge into practice. The assignments were very lengthy, but very, very helpful, I have to say!

Tell us about your capstone project.

Our capstone project, Identifying Patients’ Missing HS (hidradenitis suppurativa) Diagnosis, was with Columbia University Irving Medical Center dermatology professor Lynn Petukhova. It was a great experience, as this was my first time working with health data. What set this project apart from all the projects I had done previously was the feature engineering process as well as the model evaluation process. The subject matter expertise and all the information specific to HS helped improve the model performance.

— Sharnice Ottley