Leaders of the Northeast Big Data Innovation Hub’s spokes and connectors came together for their first workshop last week. Here are some topics they covered, along with emerging plans for the future. A video recap of the summaries can be found online, at 5:55:40, on the workshop website.

Columbia’s George Hripcsak led the health analytics group.

Health Led by informatics researcher George Hripcsak, chairman of Columbia’s Department of Biomedical Informatics, the group will analyze patient and biological data at scale, and examine ways of harnessing data from social media, environmental sensors and other alternative sources to deliver individualized treatment. 

· The group will hold its first workshop, “Data Science, Learning and Applications to Biomedical and Health Applications,” on January 7 and 8 at the New York Academy of Sciences.

· Health challenges to address include health outcomes, cost and access with a focus on behavioral change and disparities, especially those created by technology.

· The group discussed how to integrate disparate data sources and engage the public through citizen science.

· Members will submit four to six proposals, now being circulated for editing, under the National Science Foundation’s $10 million spokes solicitation program.

UBuffalo’s Abani Patra led the energy analytics group.

Energy Led by Abani Patra, at the State University of New York, Buffalo, the group will explore how data analytics can help manage the massive amounts of real-time information coming from an increasingly diverse energy supply (wind, solar, natural gas and other sources) and an increasingly local delivery system.

· The group discussed challenges on a one- to five-year time scale that could include collaborations among academics, industry and non-profits. Could the group create a platform for data and information exchange across the energy sector (e.g., ConEd, NYSERDA, Company X, EPRI)?

· The group discussed promoting unconventional data usage (e.g. system disruption data, traffic data) and incorporating and advancing “data science” in the design of micro grids.

· Could a database of expertise (people and skills) be created in the region? Could the use of unused data in organizational decision and policy making be promoted?

·  A workshop is planned to address grand challenges and strategies; a short retreat focused on grant writing was proposed.

The Regional Plan Association’s Sanjay Seth (left) co-led the cities and regions group.

Cities and Regions Led by urban research analyst Sanjay Seth, at the Regional Plan Association, and urban informatics researcher Constantine Kontokosta, deputy director of NYU’s Center for Urban Science and Progress, the group will look at how data analytics can improve the delivery of public services and make cities more equitable, sustainable and resilient. 

· The White House Smart City initiative is offering $160 million across several federal agencies that could be a source of funding.

· Areas of focus could include transportation, climate change adaptation and mitigation, building energy efficiency and mobility.

· Workshops could include successful case studies, a hackathon and building of community around thematic areas.

Finance Led by computer scientist Michael Kearns, director of University of Pennsylvania’s Warren Center for Network and Data Sciences, the group will apply data analytics to our increasingly automated financial markets to understand their underlying connections and vulnerabilities.

UPenn’s Michael Kearns led the financial analytics group.

· One problem to address in the wake of the 2008 financial crisis is systemic risk and how models might be developed to predict aggregated risk. A complication is that much of the data is held by companies unwilling to share without incentives. Potential partners include the U.S. Federal Reserve and Treasury and other regulators.

· Another problem is how to combat illegal activities such as insider trading, money laundering and cyberthreats. Potential partners include commercial banks and the FDIC.

· Cryptocurrencies could be another avenue to explore. Could data scientists team up with banks and tech companies to see what the future looks like?

Big Data in Education Led by computer scientist Beverly Woolf at the University of Massachusetts, Amherst, and computer scientist Ryan Baker at Columbia’s Teachers College, the group will look at turning behavioral feedback from online courses into techniques for teaching subjects more effectively.

UMass at Amherst’s Beverly Woolf led the education analytics group.

· The group discussed whether big data from online instructional resources might allow researchers to address ‘wicked problems’ in education, e.g., performance gaps that produce cycles of underachievement and cultural-racial differences in learning.   

· Could data analysis identify outlier children who have learning difficulties and require special teaching strategies? 

· Could online resources be developed that adapt to students’ needs and emotions and learn which components of a system are most effective? By analyzing this unique data, researchers might learn how to teach difficult topics to students.

· Future workshops could focus on teaching analytics that support student retention and graduation.

· Hundreds of companies are building online instructional systems. However, two companies that focused on big data have already gone out of business: InBloom and Amplify. The group discussed the need to learn from successes and failures.

MIT’s Chris Hill co-led the discovery science group.

Discovery Science Led by computational earth scientist Chris Hill at MIT, and computer scientist Manish Parashar, director of the Rutgers Discovery Informatics Institute, the group will look for ways to accelerate discovery in the natural sciences by applying machine learning tools and large scale hardware and software systems to massive amounts of observational data.  

· One challenge identified by the group is bringing together experimental data and research questions.

· Data preservation, accelerating the timeline for scientific discovery and citizen science/data collection are all possible areas of focus.

· Potential workshop topics include artificial intelligence and digital data curation—infrastructure and sharing data among domains and regions.

Education Led by computer scientist James Hendler, director of the Rensselaer Institute for Data Exploration and Applications at RPI, the group will develop data science education materials for K-12, college, and continuing and online instruction, and with the New York Hall of Science and other organizations develop public exhibits related to data analytics.

The New York Hall of Science’s Stephen Uzzo helped lead the education group last week.

· The group proposed a resource review on big data literacy projects underway. What education, skills, mentoring and tools efforts are out there? This list could eventually be published online, and a related workshop developed.

· The group discussed a workshop on applying big data literacy to K-12 education, corporate training and lifelong learning and developing a literacy framework.

· The group will start circulating a Google doc among group members and other interested parties to assemble a resource review.

Data Sharing Led by computer scientist Sam Madden at MIT’s Computer Science and Artificial Intelligence Laboratory, the group will study platforms and formats for regional data sharing, including software to allow researchers to annotate and publish their own data.

MIT’s Sam Madden led the data sharing group.

· The group discussed themes common to Maslow’s “Hierarchy of Needs” and data sharing: the hierarchy starts with data collection, storage and networking and culminates in fully actualized applications.

· What data is there to share (big sciences, medicines) and how do we reduce the friction to sharing? Can people access what they want without totally releasing the data?

· The group discussed how it might alleviate data science drudgery, where roughly 80 percent of the effort goes to cleaning and collection and 20 percent to analysis.

· The group discussed leveraging private sector companies to share data and ways to incentivize commercial data sharing.

· The group discussed creating one or more “tiger teams” to address challenge problems in a spoke area. Is it possible to create a standard operating procedure?

· A data sharing workshop could be held next summer with participants bringing their own dataset (BYOD!). Another workshop could study success stories (LANDSAT; Rio flood/mudslide tracking; Haiti Red Cross).

Syracuse University’s Jennifer Stromer-Galley co-led the ethics and policy group.

Ethics and Policy Led by digital communication researcher Jennifer Stromer-Galley at Syracuse University, and technology researchers Mark Latonero and Karen Levy at the Data & Society Research Institute, the group will focus on questions tied to the ethical collection and use of big data, including consumer and health information.

· The group discussed whether it could define “big” as a trip wire for ethical policies. One challenge is anticipating what will raise ethical concerns as technology advances.

· The group discussed the need for guidance on human subjects research within companies like Facebook. A future workshop could bring together tech companies, government and academic researchers to come up with best practices for ethical collaboration.

· Other potential workshop ideas include meeting with spoke members to identify ethical challenges in their domains and discussing research ethics involving large-scale data from human subjects that doesn’t fall under the Common Rule.

Penn State’s Adam Smith led the privacy and security group.

Privacy and Security Led by computer scientist Adam Smith, at Penn State, the group will focus on how to keep data safe but accessible at a wide scale, all while protecting individual privacy.

· The group discussed the security and privacy challenges involved in big data, including the collection and storage of sensitive data, the delegation of computation to the cloud and the sharing of data.

· Potential connector workshops could address barriers to data sharing and security of cyber-physical systems.

· Potential partners include members of academia, industry, nonprofit and government groups.

— Kim Martineau