David Paul Morris/Bloomberg via Getty
Cloud processing of DNA sequence data promises to speed up discovery of disease-linked gene variants.
The dream for tomorrow's medicine is to understand the links between DNA and disease, and to tailor therapies accordingly. But scientists working to realize such "personalized" or "precision" medicine have a problem: how to keep genetic data and medical records secure while still enabling the massive, cloud-based analyses needed to make meaningful associations. Now, tests of an emerging form of data encryption suggest that the dilemma can be solved.
At a workshop on 16 March hosted by the University of California, San Diego (UCSD), cryptographers analysed test genetic data. Working with small data sets, and using a method known as homomorphic encryption, they could find disease-associated gene variants in about ten minutes. Computers were still bogged down for hours by more-realistic tasks, such as finding a disease-linked variant in a stretch of DNA a few hundred-thousandths the size of the whole genome, but experts in cryptography were encouraged nonetheless.
"This is a promising result," says Xiaoqian Jiang, a computer scientist at UCSD who helped to set up the workshop. "But challenges still exist in scaling it up."
Physicians and researchers think that understanding how genes influence disease will require genetic and health data to be collected from millions of people. They have already started planning projects, such as US President Barack Obama's Precision Medicine Initiative and Britain's 100,000 Genomes Project. Such a massive task will probably require harnessing the processing power of networked cloud computers, but online security breaches in the past few years illustrate the dangers of entrusting huge, sensitive data sets to the cloud. Administrators at the US National Institutes of Health's database of Genotypes and Phenotypes (dbGaP), a catalogue of genetic and medical data, are so concerned about security that they forbid users of the data from storing it on computers that are directly connected to the Internet.
Homomorphic encryption could address those fears by allowing researchers to deposit only a mathematically scrambled, or encrypted, form of data in the cloud. It involves encrypting data on a local computer, then uploading that scrambled data to the cloud. Computations on the encrypted data are performed in the cloud and an encrypted result is then sent back to a local computer, which decrypts the answer. If would-be thieves were to intercept the encrypted data at any point along the way, the underlying data would remain safe.
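The encrypt locally, compute in the cloud, decrypt locally workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the Paillier cryptosystem, which supports addition on encrypted values only and is not the scheme tested at the workshop, and the tiny hard-coded primes make it completely insecure. The function names (`keygen`, `encrypt`, `decrypt`, `cloud_sum`) and the sample genotype counts are hypothetical.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    """Build a toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    g = n + 1                 # standard simple choice of generator
    mu = pow(phi, -1, n)      # modular inverse of phi mod n (Python 3.8+)
    return (n, g), (phi, mu)  # (public key, private key)

def encrypt(pub, m):
    """Run on the local computer: scramble the integer m with fresh randomness."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Run on the local computer: recover the integer hidden in ciphertext c."""
    n, _ = pub
    phi, mu = priv
    u = pow(c, phi, n * n)
    return ((u - 1) // n * mu) % n

def cloud_sum(pub, ciphertexts):
    """Run in the cloud: multiplying ciphertexts adds the hidden values.
    Only the public key is needed; the plaintext numbers are never visible here."""
    n, _ = pub
    n2 = n * n
    total = 1
    for c in ciphertexts:
        total = (total * c) % n2
    return total

if __name__ == "__main__":
    pub, priv = keygen()
    # Hypothetical data: copies of a variant allele (0, 1 or 2) carried by each patient.
    genotypes = [0, 1, 2, 1, 0, 2]
    uploaded = [encrypt(pub, x) for x in genotypes]  # local: encrypt, then upload
    encrypted_total = cloud_sum(pub, uploaded)       # cloud: compute on scrambled data
    print(decrypt(pub, priv, encrypted_total))       # local: decrypt the answer
```

Anyone intercepting `uploaded` or `encrypted_total` sees only large scrambled integers; the allele count is recoverable only with the private key held on the local machine. Fully homomorphic schemes extend the same idea from addition to almost any computation.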
"If we can show that these techniques work, then it will give increased reassurance that this high-volume data will be computed on and stored in a way that protects individual privacy," says Lucila Ohno-Machado, a computer scientist at UCSD and a workshop organizer.
Homomorphic data encryption, first proposed in 1978, differs from other types of encryption in that it would allow the cloud to manipulate scrambled data; in essence, the cloud would never actually see the numbers it was working with. And, unlike other encryption schemes, it would give the same result as calculations on unencrypted data.
But it remained largely a theoretical concept until 2009, when cryptographer Craig Gentry at the IBM Thomas J. Watson Research Center in Yorktown Heights, New York, proved that it was possible to carry out almost any type of computation on homomorphically encrypted data. This was done by transforming each data point into a piece of encrypted information, or ciphertext, that was larger and more complex than the original bit of data. A single bit of unencrypted data would become encrypted into a ciphertext of a few megabytes, the size of a digital photograph. It was a breakthrough, but calculations could take 14 orders of magnitude longer than working on unencrypted data. Gentry had rendered the approach possible, but it remained impractical.