Synthesizing Sandy: Automating the Gathering of Vital News During Disasters
As the magnitude of Hurricane Sandy became clear, so did the problem of wading through torrents of information about the storm’s aftermath. Journalists, bloggers and everyday people all had stories to share. The ability to sum it all up in real time would have provided vital information to those in need. It could also have improved the government's response.
But synthesizing all that data takes time, which is why computers are now on the task. Kathleen McKeown, a computer scientist at Columbia University’s Data Science Institute, is building a system that selects and summarizes key accounts of a disaster—in this case, Sandy—and weaves them into a master narrative. She spoke about highlights of her work on Jan. 29 at Columbia.
The news after Sandy fell into three main genres: mainstream news told in the third person, personal narrative shared via blogs, and conversation over social media like Twitter and message boards. McKeown and her colleagues are using natural language processing to sift through text in all three genres to extract, normalize and summarize the most relevant, compelling details.
It sounds easier than it is. To parse the news, a computer needs to identify which stories cover the same event and which events stemmed from the main one, such as the blackouts and flooding after Sandy, which may not have been flagged as “Sandy” events in the news. Applying this work to other disasters, say tornadoes or mass shootings, might require targeting other sub-events.
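To give a feel for the first of those tasks, here is a minimal sketch of grouping stories that appear to cover the same event. It is not McKeown's method, which the article does not describe in detail; it simply compares the words two headlines share (Jaccard similarity). The function names, the stopword list, the 0.3 threshold and the sample headlines are all made up for illustration.

```python
import re

# Small, hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "in", "of", "to", "and", "after", "as", "on", "for"}

def tokens(text):
    """Return the set of lowercase words in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def jaccard(a, b):
    """Share of words two sets have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def group_stories(stories, threshold=0.3):
    """Greedy single pass: a story joins the first group whose first
    member is similar enough, otherwise it starts a new group."""
    groups = []
    for story in stories:
        t = tokens(story)
        for g in groups:
            if jaccard(t, tokens(g[0])) >= threshold:
                g.append(story)
                break
        else:
            groups.append([story])
    return groups

# Invented headlines, for illustration only.
stories = [
    "Flooding shuts down subway tunnels in lower Manhattan",
    "Subway tunnels in lower Manhattan remain closed by flooding",
    "Blackouts leave thousands without power in New Jersey",
    "Thousands in New Jersey still without power after blackouts",
]
groups = group_stories(stories)
for g in groups:
    print(g)
```

Word overlap of this kind can pair up two flooding stories, but it cannot tell that a blackout story and a flooding story both stem from the same storm when neither mentions it by name. That gap is the harder problem the paragraph above describes.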
On blogs and social media, a computer needs to cut to the chase and identify important conflict and action as well as extract quotes and colorful descriptions, all while overcoming the challenge of translating slang. “While there is work on what makes scientific writing good, recognizing what makes a story compelling is new research,” said McKeown in a recent Q&A.
The work on disaster summarization builds on her earlier creation, Newsblaster, software that identifies related news stories and sums up their main points. Think Google News meets CliffsNotes. One future application of this research could be the translation of streaming social media in disaster zones. Many relief workers abroad do not speak the local language; timely translated updates would allow responders to act more quickly and effectively.