Data Processing/Generation

Data processing overview

Data processing by the project team

Depending on their funded aims, teams might have special data workflows, such as:

  • Generating sequencing data and deriving data with multiple variant-calling pipelines
  • Producing high-resolution images and creating summary features from images
  • Combining different types of data

The anticipated workflow should ideally be discussed during onboarding, especially if it is complicated or atypical. Teams should provide documentation or other information describing their workflow.

With information on the data-generating process, the DCC can better work with each team to answer the following questions:

  • What forms of data will be generated, and how can the data artifacts from this workflow best be ingested and managed?
  • Are there recommendations for this workflow that would help avoid problems downstream?
  • What other resources, if any, can the DCC provide?

Data processing by the DCC

Data uploaded to Synapse may also be processed by the DCC for:

  • Quality control
  • File format conversions
  • Other data transformations that allow data to be loaded and shared in cBioPortal or other analysis applications
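
To make the first two kinds of processing more concrete, the following is a minimal sketch of a quality-control check followed by a file format conversion (CSV to tab-separated text). It is an illustration only and not the DCC's actual pipeline: the column names in REQUIRED_COLUMNS and the function names are hypothetical, and the code uses only the Python standard library rather than any Synapse or cBioPortal tooling.

```python
import csv
import io

# Hypothetical schema, for illustration only; real required columns
# depend on the data type and the target application.
REQUIRED_COLUMNS = ("sample_id", "gene", "variant")


def check_rows(rows, required=REQUIRED_COLUMNS):
    """Return a list of human-readable QC problems found in the data rows."""
    problems = []
    for row_no, row in enumerate(rows, start=1):
        # csv.DictReader stores surplus fields under the key None.
        if None in row:
            problems.append(f"row {row_no}: more fields than header columns")
        for col in required:
            if not (row.get(col) or "").strip():
                problems.append(f"row {row_no}: missing value for '{col}'")
    return problems


def csv_to_tsv(csv_text, required=REQUIRED_COLUMNS):
    """Validate CSV text, then convert it to tab-separated text.

    Returns (tsv_text, problems); tsv_text is None when QC fails.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in required if c not in header]
    if missing:
        return None, [f"missing column '{c}'" for c in missing]

    rows = list(reader)
    problems = check_rows(rows, required)
    if problems:
        return None, problems

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=header, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue(), []
```

In this sketch, conversion runs only if every check passes, so a file with problems is reported back rather than converted and passed downstream.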