Columbia Researchers Join Efforts to Pool Consumer Genetic Data
Millions of people now have access to their genome, but the power of those data to reveal new insights into how genes influence health and behavior has yet to be fully tapped.
A new, nonprofit crowdsourcing project, DNA.Land, wants to change that. To encourage people to share their genetic information, researchers promise to provide new findings about their genome in return. As the database grows, gaps in individual genomes will be filled, researchers say, unlocking new information about distant ancestors and individual health risks.
With enough data, researchers hope to reconstruct human migration patterns, explore the genetic origins of disease and tailor drugs to individual patients, among other ambitious research goals. A joint project by geneticists at Columbia University and the New York Genome Center, DNA.Land has collected more than 7,500 genomes since its launch on Oct. 10.
“We never imagined the project would take off this quickly,” said DNA.Land cofounder Yaniv Erlich, a professor of computational genetics at Columbia Engineering, the Data Science Institute and the Center for Computational Biology and Bioinformatics.
The consumer genomics market has exploded in the last two years as DNA testing has grown cheaper and more convenient. For as little as $100 and a vial of saliva, anyone can order up a copy of their genetic blueprint. Companies such as 23andMe, Ancestry.com DNA, and Family Tree DNA have reportedly recruited a million customers each. By 2025, between 100 million and 2 billion people are projected to have had their genomes sequenced.
For now, much of the data remain scattered across multiple sites. Some companies aggregate them internally to help customers find additional relatives and health information. A growing number of third-party sites, including DNA.Land, are attempting to pool the data and think bigger. “Are we just going to let these data sit in silos?” Erlich asked recently in The Atlantic. “Or can we partner with these large communities to enable some really large science?”
Erlich and DNA.Land’s cofounder, Joe Pickrell, got the idea for the project from two similar crowdsourcing sites, GEDMatch and openSNP, which have together collected 173,000 genomes. Thousands of other participants have been recruited through medical research projects such as Genes for Good and the Personal Genome Project.
“Can you get to the point that instead of paying for each study from scratch, we can use the crowd to collect and repurpose this data?” Erlich asked recently in Nature News.
Security is always a concern when storing and processing large amounts of personal data. This is especially true in genomics, where highly sensitive medical information could be used to embarrass or discriminate against individuals if made public.
Erlich himself has demonstrated how vulnerable genetic data in the digital age can be. In a 2013 study in Science, he and his colleagues showed that the identities of research participants could be unmasked by linking their ‘anonymous’ DNA data with information easily found on the Internet.
DNA.Land attempts to address concerns about privacy and security. The website follows security best practices such as salting passwords and storing them with a strong hashing scheme; encrypting client communications during all procedures; and designating an independent board to vet the person managing the database, said Erlich.
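For readers curious what salting and hashing a password looks like in practice, here is a minimal sketch in Python. It is not DNA.Land's code: the article does not say which hashing scheme the site uses, so PBKDF2 with SHA-256, the iteration count, the salt length and the function names `hash_password` and `verify_password` are all assumptions made for illustration.

```python
import hashlib
import hmac
import os

# Hypothetical settings for illustration only; not DNA.Land's actual configuration.
ITERATIONS = 200_000   # work factor: more iterations make guessing slower
SALT_BYTES = 16        # length of the random salt


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for a new password."""
    # A fresh random salt means two users with the same password
    # end up with different stored digests.
    salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare it safely."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    # compare_digest runs in constant time to avoid leaking timing information.
    return hmac.compare_digest(candidate, expected)
```

Only the salt and the digest would be stored, never the password itself. At login, a site taking this approach would rerun `verify_password` with the stored salt, so a stolen database does not directly reveal anyone's password.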
DNA.Land’s consent policy is short and easy to read but warns participants that the database is not impregnable. “We are doing our best, but we cannot guarantee the chance of a data breach is zero,” Erlich said.
The researchers are confident enough to have put their own skin in the game. Among the first to upload their genomes to the site, they were also quick to publicize the benefits. Almost immediately, Erlich discovered he was related to a colleague at the New York Genome Center. “@GenomeNathan + me ~= 4th-cousins,” he announced to his followers on Twitter five days before DNA.Land launched.
If participants are willing to share their tweets, the researchers hope to draw on those data to understand the genetic basis of personality and social traits.
— Kim Martineau