Sela Rozov is only in fifth grade, but she ably presented her science-fair data-science project to a group of teachers, students and data scientists who gathered at Columbia University for a workshop called Data Science in the Classroom. For her project, Sela collected and evaluated Facebook data on two advertisements, trying to understand which ad lured more viewers and why.

“It was really fun to work with data and I really liked making the visualizations for my project,” said Sela, who attends the F.E. Bellows Elementary School in Mamaroneck, N.Y.

The workshop, held Aug. 13 in the Smith Learning Theater at Teachers College, explored how teachers can instruct children like Sela in the fundamentals of data science, an emerging field that’s widely taught at universities but not in high schools and elementary schools. The Data Science Institute (DSI) at Columbia, co-sponsor of the workshop along with the Learning Analytics Program at Teachers College, is trying to change that by helping K-12 teachers learn practical strategies for introducing data science into their elementary and secondary school classrooms.

Gary Natriello, Gottesman Professor of Educational Research at Teachers College and Director of the Learning Analytics Program, welcomed everyone to the workshop, after which three DSI leaders spoke about the myriad benefits of having K-12 students study data science: Jeannette M. Wing, Avanessians Director of the Data Science Institute (DSI) and Professor of Computer Science at Columbia, offered welcoming remarks about her vision for the institute and detailed how data science is enhancing all fields at Columbia; Sharon Sputz, Executive Director of Strategic Programs at DSI, discussed how data science can help children ask the right questions about subjects and answer the questions more accurately by using data; and Tian Zheng, Professor and Chair, Department of Statistics at Columbia and Associate Director for Education at DSI, stressed how important it is for school children to think about how data are being used to affect society and their lives.

Zheng has two children, ages 10 and 7, and said she introduced experiential, hands-on learning into her data-science classes at Columbia after observing her children’s teachers. “I hope all the teachers will leave today’s workshops with some ideas for hands-on data projects you can try in your classes,” she said.

In the afternoon, two data scientists led workshops for the teachers on how to do precisely what Zheng said: engage students with hands-on data science projects. Benjamin Shapiro, a Postdoctoral Fellow, School of Interactive Computing at Georgia Institute of Technology, began his workshop by discussing communities that are developing free data science technologies for K-12 teachers. He mentioned CODAP by the Concord Consortium and the work of Hans Rosling as examples of powerful tools that teachers are using to get K-12 students interested in data science. He also cited educational tools from his research (available on his website), such as a platform teachers can use to integrate physical-activity data from their students into their lesson plans. He also designed a visualization tool for teachers who’d like to discuss data about controversial social issues in America, such as New York City’s stop-and-frisk policy.

“Integrating data science into K-12 classrooms is important,” Shapiro said, “but, as years of research have told us, it’s critical to do so by working closely with teachers and teacher educators to develop activities, curricula, and learning technologies for data science education that are personally and culturally relevant to students.”

The other workshop was led by Ipek Ensari, a Postdoctoral Fellow at DSI who has developed hands-on projects teachers can use to interest young students in data science. Earlier this summer, with help from a team of graduate students, Ensari designed an interactive data project to engage children who attended Columbia’s Family STEM Day. During her workshop she demonstrated that project, “The Dog Data Scientists and Citizen Science,” and showed teachers how to adapt its educational components in their classes.

Working with her team of grad students, especially DSI master’s student Ameya Karnad, Ensari built a dashboard that visualizes data about dogs registered in New York City; the team collected the data from New York City’s Open Data website. The dashboard is intended to prompt children to begin asking intuitive questions: Which dogs in NYC bite the most? What’s the most popular dog name? Which breeds are most popular in the city? Once they have formulated a question, Ensari said, the children can explore the dashboard to try to answer it. Once they have an answer, they check their results for plausibility and try to interpret their findings by asking why they might have arrived at their conclusions. The final step is to disseminate their findings: they summarize their work and tweet it out on the project’s Twitter page.

“Teachers can use this project to experientially teach children a multitude of critical thinking skills about research,” said Ensari. “The dashboard teaches them how to formulate a research question, how to look for the answer in the data, how to make accurate judgements (i.e., derive results and conclusions) and how to disseminate results in a way that’s understandable to scientists and laymen. Children are natural data scientists, inquisitive about data, always asking why, and Sela is a perfect example of that; her curiosity prompted her to undertake a science project on persuasion, something that’s relevant to all of us in our daily lives.”

Hui Soo Chae, Senior Director of Research at Teachers College, who helped organize the workshop and manage the outreach to NY area teachers, said there was great interest among K-12 educators in learning more about how data science can be integrated into the curriculum, “and the large turnout of teachers at the workshop reflected this interest.”

At the close of the conference, Sela, the fifth grader, gave her slideshow presentation about how viewers are persuaded to click on Facebook ads. As a case study, she analyzed two Facebook ads that promoted Rock the Ridge, a 50-mile ultra-marathon along the Mohonk Preserve in New Paltz, N.Y. She worked with officials at the preserve to get access to data about how the two Facebook ads performed. The two ads were similar, except one featured a photo of a deer while the other had a photo of a fox. Sela wanted to know which ad was more persuasive – a question that guided her research.

In total, 250,000 people clicked on the two ads, and Sela divided the data by gender, age group and two locations: Ulster County, where the race is held, and nearby Westchester County. She used software to design data visualizations illustrating how the various demographic groups responded to the ads, showing which groups preferred the deer ad and which preferred the fox ad. She did some basic A/B testing on the Facebook data and used elementary statistics to make predictions about the ads. She rightly predicted, for example, that more people in Westchester than in Ulster would click on the ads, since this was the first time the preserve advertised in Westchester, while many people in Ulster already knew about the Rock the Ridge ultra-marathon.
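
For teachers who want to try a similar exercise, the kind of basic A/B comparison described above can be sketched in a few lines of Python. The article does not say which tools or statistics Sela used, so this is only an illustration under stated assumptions: the click and view counts are invented for the example, the function name is hypothetical, and the two-proportion z-test is one common way to compare two click-through rates, not necessarily her method.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Compare two click-through rates.

    Returns (rate_a, rate_b, z, p_value), where p_value is two-sided.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate: the click rate if both ads performed identically.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    # Standard error of the difference between the two rates.
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_a - rate_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical counts, made up for illustration only (not Sela's data).
deer_ad = {"clicks": 120, "views": 4000}
fox_ad = {"clicks": 90, "views": 4000}

rate_deer, rate_fox, z, p = two_proportion_z_test(
    deer_ad["clicks"], deer_ad["views"], fox_ad["clicks"], fox_ad["views"]
)
print(f"Deer ad click rate: {rate_deer:.2%}")
print(f"Fox ad click rate:  {rate_fox:.2%}")
print(f"z = {z:.2f}, two-sided p-value = {p:.3f}")
```

A small p-value suggests the gap between the two click rates is unlikely to be chance alone. The same function could be run separately for each gender, age group or county to see where the two ads diverge.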

During the course of the project, Sela was lucky to have advice and encouragement from her father, Yadin Rozov, a data lover who is working on a master’s degree in data science at DSI. Overall, she learned a great deal about data: how to use software to design data visualizations; how to do A/B testing and use statistics to make predictions; how to gather data from a major company; and what free tools are available to analyze that data. Perhaps most importantly, she also had a ton of fun.

“It was really fun to see how different groups of people responded to the two ads,” she said. “When adults say that children are natural data scientists, they are right because I did it!”

Sela already has a related project in mind for next year’s science fair. At home she has two dogs, one of which, a 12-pound terrier mix named Ranger, likes to jump on the family couch when no one’s home. She wants to learn enough code to program a video camera to monitor Ranger, and to equip the camera with audio so it can reprimand him whenever he makes a go for the couch.

“When we aren’t home and he tries to jump on the couch,” says Sela, “I’ll program the camera to yell: ‘HEY YOU! GET OFF THE COUCH!’”

— Robert Florida