Looking for a dream job in data science?
There’s now an app for that.

Indeedoor combines job ads with industry ratings to list the best data science jobs in NYC.

Indeedoor, designed by a team of data science students at Columbia, merges job ads from Indeed with industry ratings on Glassdoor to list the best data science jobs in and around New York City. It also offers resume-optimization tips and insights into the values that drive employee ratings of their industry.

Data scientists go by many names, from data architect to business analyst to data visualization developer. Indeedoor pulls in job ads for those terms and more. It also provides a glimpse at how data science jobs vary geographically. Data scientist employers in Atlanta, for example, home of the U.S. Centers for Disease Control and Prevention, are most likely to use “health” and “medical” in their ads, while “machine” and “learning” top the list in tech-centric San Jose, Calif.

The app includes a “Ratings analysis” model that analyzes employee ratings of their industry, allowing job seekers to explore the variables that influence “overall” and “recommend to friend” ratings. The team discovered that employees in the Cable, Internet and Telephone Provider industry weight the variable “culture and values” relatively low in their ratings, while investment bankers weight culture heavily when recommending their industry to friends. Other surprises: employees in the real estate and news industries rate “work life balance” highly despite the odd and sometimes long hours.

With its focus on New York data science jobs, Indeedoor also offers insight into the minds of area employers. From a small list of recent job postings, the team found that Python and Hadoop were the most sought-after skills, and Statistics, Mathematics and Machine Learning the most desired coursework.

Mike Malecki looked on as the app was presented in his data visualization class this spring.

For now, Indeedoor is limited to data science jobs in New York, but other cities and professions could be added with further development, the team said. The app was built as a final project in Mike Malecki’s Exploratory Data Analysis and Visualization class. Here, Indeedoor’s creators, Jade Bailey-Assam, Lucy Drotning, Christine Lee, Shruti Pandey and Janet Prumachuk, explain how they made the app and what they learned.

Who should use this app?

Bailey-Assam: Anyone looking for a job in data science, especially in New York City. You can see job listings merged with company ratings and optimize your resume within a single app.

Pandey: The app also lets you explore the data science field by geographic location and value systems across industries.

How did you get the idea to merge data science job listings with employee satisfaction ratings?

Bailey-Assam: Job seekers commonly search job listings, then research the company, or vice versa. We thought it would be useful to see both sets of data in one place.

Pandey: Leveraging both datasets allows you to look for a job in a more unified way.

Any surprises?

Prumachuk: We found that companies actively recruiting data scientists appear to have happier employees. For instance, employees give investment banks that are hiring data scientists a 3.6 rating out of 5, compared to a 3.3 average rating for all investment banks.

Employers look for data scientists by many names. What terms did you include?

Prumachuk: Indeed’s API call for “data+science” is overly broad, returning too many jobs that have nothing to do with data science. We improved relevancy by filtering on “data” and “analytics” in the job title.
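
The title filter Prumachuk describes can be pictured with a short Python sketch. This is not the team's code, which was written in R. It assumes a posting is kept when its title contains either keyword as a whole word, and the `title` field name and the sample postings are made up for illustration.

```python
import re

# Assumption: a title qualifies if it contains either keyword as a whole word.
KEYWORDS = ("data", "analytics")
_PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def is_relevant(title: str) -> bool:
    """Return True when the job title mentions one of the keywords."""
    return _PATTERN.search(title) is not None


def filter_jobs(jobs):
    """Keep only postings whose (hypothetical) 'title' field passes the check."""
    return [job for job in jobs if is_relevant(job.get("title", ""))]


# Made-up postings for illustration only.
sample = [
    {"title": "Senior Data Scientist"},
    {"title": "Science Teacher"},
    {"title": "Marketing Analytics Manager"},
    {"title": "Database Administrator"},
]
kept = filter_jobs(sample)
```

Matching on whole words is a design choice in this sketch: it drops titles such as “Database Administrator” that merely contain the letters “data.” A looser substring match would keep them.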

In the “Resume help” section you list the most common words to pop up in job listings in the categories: “skills,” “coursework,” “expertise” and “buzzwords.” Any surprises?

Bailey-Assam: It was interesting to see Python, Hadoop and MapReduce top the list for skills, and nice to see Statistics and Machine Learning top the list for coursework, since both are required in our data science certification program. I was surprised to learn “digital” isn’t much of a buzzword.

How did you pull your data in? What were some challenges?

Prumachuk: We used REST APIs provided by Indeed and Glassdoor. Requests are made in the form of an HTTP GET. We refresh our company statistics from Glassdoor each week.
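
For readers unfamiliar with REST calls, a GET request is essentially a URL with the search terms encoded in its query string. The Python sketch below builds such a URL without sending it. The endpoint `api.example.com` and the parameter names `q`, `l`, `limit` and `format` are placeholders, not the actual Indeed or Glassdoor interfaces.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real APIs define their own URLs and parameters.
BASE_URL = "https://api.example.com/jobs"


def build_search_url(query: str, location: str, limit: int = 25) -> str:
    """Return the full URL for an HTTP GET search request."""
    params = {"q": query, "l": location, "limit": limit, "format": "json"}
    # urlencode escapes each value and encodes spaces as '+'.
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("data science", "New York, NY")
```

Encoding the search phrase this way is what turns “data science” into the “data+science” form mentioned earlier. Passing the finished URL to a function such as `urllib.request.urlopen` would issue the request itself.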

Bailey-Assam: To create the “Resume help” section, I needed full-length job listings for text mining, but the Indeed API provided only short snippets. I found a second API that included a link to the original job postings. By scraping about 10 web pages, I compiled a large enough sample to analyze for a proof of concept.
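
A rough idea of the text mining step, sketched in Python rather than the team's R: count how many postings mention each keyword, grouped by category. The category word lists and the two sample postings are invented for illustration, since the article does not show the app's actual lists.

```python
import re
from collections import Counter

# Hypothetical keyword groups; the app's real categories are not published here.
CATEGORIES = {
    "skills": {"python", "hadoop", "mapreduce"},
    "coursework": {"statistics", "mathematics"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def count_keywords(postings):
    """Count how many postings mention each keyword, per category."""
    counts = {name: Counter() for name in CATEGORIES}
    for text in postings:
        words = set(tokenize(text))  # each keyword counts once per posting
        for name, keywords in CATEGORIES.items():
            counts[name].update(words & keywords)
    return counts


# Made-up postings for illustration only.
postings = [
    "Seeking analyst with Python and Hadoop experience; statistics coursework preferred.",
    "Python developer with a mathematics or statistics background.",
]
result = count_keywords(postings)
top_skill = result["skills"].most_common(1)
```

Counting each keyword once per posting keeps a single repetitive ad from dominating the ranking, which matters with a sample as small as the one described.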

Pandey: After scraping the data, we realized that the CSV files were slowing down our app. So I set up a MySQL server and loaded the data into a relational database, which makes searches and other operations faster by decoupling them from our analysis in R.
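
The move from CSV files to a relational database might look something like the following Python sketch. It swaps in SQLite, which ships with Python, for the MySQL server the team used, and the table names, columns and rows are all made up. The point is only that the database, not the analysis code, handles the lookup that pairs listings with ratings.

```python
import csv
import io
import sqlite3

# Made-up rows standing in for the scraped CSV files.
JOBS_CSV = "title,company\nData Scientist,Acme\nData Analyst,Globex\n"
RATINGS_CSV = "company,overall\nAcme,4.1\nGlobex,3.2\n"


def load_csv(conn, table, schema, text):
    """Create a table and bulk-insert the rows of a CSV string."""
    conn.execute(f"CREATE TABLE {table} ({schema})")
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # skip the header row, use it to size the insert
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)


# SQLite in memory is a stand-in for the MySQL server described above.
conn = sqlite3.connect(":memory:")
load_csv(conn, "jobs", "title TEXT, company TEXT", JOBS_CSV)
load_csv(conn, "ratings", "company TEXT, overall REAL", RATINGS_CSV)
conn.execute("CREATE INDEX idx_ratings_company ON ratings (company)")

# The database performs the join, so the analysis code only sees the result.
rows = conn.execute(
    "SELECT j.title, j.company, r.overall "
    "FROM jobs j JOIN ratings r ON r.company = j.company "
    "ORDER BY j.title"
).fetchall()
```

An index on the join column is one common way a database speeds up this kind of search compared with rescanning flat files on every request.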

Why did you build Indeedoor in Shiny? 

Pandey: Shiny lets you build interactive apps while allowing you to write complex algorithms in R in the backend. Anyone with a web browser can access our data. The power is in the interactivity – Shiny makes it easy.

Prumachuk: Shiny also uses Twitter Bootstrap, which renders your application properly on mobile phones and in any browser.

What obstacles did you face in building the app?

Prumachuk: We thought about analyzing LinkedIn profiles and using Glassdoor data to make company recommendations. When LinkedIn announced it would start restricting access to its data in May, we decided to pull job postings from Indeed instead.

Pandey: Data scraping. Some text fields were not very clean and got difficult to handle. The APIs were not clearly documented.

With more time, would you have changed anything?

Prumachuk: We would have added location awareness and filtering options to the map, converted the industry graph to d3.js and animated the transitions. We also could have added a resume analysis tool to look at sentiment or latent themes.

Bailey-Assam: I would have added a visualization to the “Resume help” section and possibly changed how I grouped keywords into categories.

Pandey: We would have made the app more general. Data science was a proof of concept but our ideas and techniques could apply to any job search – in accounting, HR, marketing. With time, we would have pulled in job listings from other fields.

What was the most important thing you learned?

Bailey-Assam: The power of collaboration and tackling a problem with different skill sets and personalities.

Lee: Each person brought different ideas to the table, and as a class we were challenged to create useful applications with data analytics and programming tools.

The students who developed Indeedoor are: (left to right) Lucy Drotning, Jade Bailey-Assam, Janet Prumachuk, Shruti Pandey and Christine Lee (not pictured).

What was the best piece of advice you got during this project?  

Bailey-Assam: Make something useful, not just pretty.

What happens next?

Prumachuk: We have something tangible to show employers and some reusable components. We also plan to track usage over the summer.

Bailey-Assam: I plan to turn the “resume help” section into a standalone app when I take “Algorithms for Data Science” this fall. I hope to write an algorithm that will text mine and group job keywords automatically.

Pandey: I’d like to extend the app to other job categories.

Lee: I’d like to create a section that highlights the most active data science employers by region, for example, in San Francisco, Washington D.C., and Boston. 

— Kim Martineau