Data Sharing: Lessons from the Cancer Moonshot

The five-year anniversary of the funding of the Cancer Moonshot is coming up this December. The archived White House website for the moonshot states: “Here’s the ultimate goal: to make a decade’s worth of advances in cancer prevention, diagnosis, and treatment, in five years.” Put simply, the aim was to accomplish in five years what would normally take ten. A critical element of the strategy was to share cancer-related data to accelerate research.

One of the success stories of the Cancer Moonshot is the BloodPAC Consortium, whose mission is to accelerate the development, validation, and accessibility of liquid biopsies to improve the outcomes of patients with cancer. The BloodPAC Consortium is now an independent 501(c)(3) organization with over 50 consortium members. The consortium operates the BloodPAC Data Commons to support data sharing among its members and with the liquid biopsy research and development community.

Another Cancer Moonshot success story is the NCI Genomic Data Commons (GDC), which provides a home for cancer genomics data and is used by over 100,000 researchers each year. Processing cancer genomics data to understand mutations is complex. One of the benefits of hosting your data in the GDC is that all the data is processed uniformly with a common set of bioinformatics pipelines, which significantly simplifies the use of the data by the research community.

Both of these projects rely on data commons for data sharing. A data commons is a software platform for exploring, analyzing, and sharing data in a secure and compliant manner. In other words, a data commons is an engine for data sharing that can accelerate research. It is one of the technologies that can make it possible to accomplish in five years what would normally take ten.

Other cancer data sharing success stories include the American Association for Cancer Research (AACR) Project GENIE as well as the American Society of Clinical Oncology (ASCO) CancerLinQ. Both of these projects provide large-scale data sharing that has accelerated cancer research.

A long way to go

Unfortunately, these four success stories don’t tell the whole story, and the lack of robust data sharing is still holding back research discoveries that can benefit patients. Perhaps the most important question that we can ask is: what can change?

First, all research funders can require and enforce data sharing, and they can fund the data commons and other infrastructure needed to support it. Federally funded research has taken some important steps in this direction, but philanthropically funded research still has a long way to go.

Second, research projects can use data sharing technologies that enable patients to share their data directly. A good example of this technology is the CMS Blue Button. For this to work, healthcare providers must make it easier for patients to share their data using these types of technologies.

Third, we can develop data ecosystems that link together data from different medical research centers and support federated learning. With federated learning, data remains in place and the computation is sent to the data, with the results returned. This way, research data that cannot be shared easily with others due to privacy and other concerns can stay within the security and compliance boundaries of the healthcare provider.
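The pattern of sending the computation to the data can be illustrated with a toy sketch. This is not a full federated learning system, only a minimal example of the same idea under stated assumptions: the `Site` class, the site names, and the age values are all hypothetical, and each site returns only a summary (a sum and a count) rather than its records.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Site:
    """Hypothetical research center; its records never leave this object."""
    name: str
    records: List[float]  # illustrative values, e.g. patient ages

    def run(
        self, computation: Callable[[List[float]], Tuple[float, int]]
    ) -> Tuple[float, int]:
        # The computation travels to the data; only its result is returned.
        return computation(self.records)


def local_sum_and_count(values: List[float]) -> Tuple[float, int]:
    """Summary computed inside each site's boundary."""
    return sum(values), len(values)


def federated_mean(sites: List[Site]) -> float:
    """Combine per-site summaries into a pooled mean without pooling records."""
    partials = [site.run(local_sum_and_count) for site in sites]
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    if count == 0:
        raise ValueError("no records available at any site")
    return total / count


# Made-up data for two hypothetical centers.
sites = [
    Site("center_a", [61.0, 58.0, 70.0]),
    Site("center_b", [49.0, 66.0]),
]
pooled_mean = federated_mean(sites)
```

Federated learning applies the same exchange to model training, with sites returning model updates instead of sums and counts. A real deployment would also need safeguards that this sketch omits, such as authentication, review of the computations that are allowed to run, and protection against summaries that reveal individuals when a site holds very few records.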

Sharing data during the pandemic was bumpy at best. A persistent infrastructure that includes data commons, whether located at individual medical research centers or serving whole geographical regions, would provide a good foundation for biomedical research in general, as well as for the type of surge research that is needed in times of crisis, such as a pandemic.
