Data Submission Guidelines



To submit data, please contact data@niagads.org with the required documentation. Please use the following guidelines when submitting data to NIAGADS.

Required documentation for all data submissions:

NOTE: All documents related to the application should be provided in English. For institutions where English is not the primary language, please provide translations of documents along with the original document. Translated documents should be signed by the institutional signing official.

NIA AD Genetics Sharing Plan

Please include the NIA AD Genetics Sharing Plan, signed by both the PI and a supervisor with signatory authority for the PI’s institution.

Informed Consent Form(s) and IRB approved Consent Levels

Please provide the IRB-approved informed consent form(s) that are in compliance with the NIH Genomic Data Sharing Policy (http://gds.nih.gov/) in PDF format, as well as IRB-approved consent levels according to the Consent Level Guidelines document.

Institutional Certification

A signing official from the investigator’s institution must provide an Institutional Certification Document.

Our submission process parallels that of dbGaP, so submitters of data are asked to use the same document.

Phenotype Data File

Please use tab-delimited plain text (.txt) or Excel (.xls/.xlsx) file formats, along with a data dictionary listing each variable and its description.

Pedigree Data File

Please use tab-delimited plain text (.txt) or Excel (.xls/.xlsx) file formats following the standard pedigree file format.

Use standard labels for the following fields: FAMID (family ID), SUBJID (subject ID), FATHER (father ID), MOTHER (mother ID), SEX (1 for male and 2 for female).

Example:

FAMID  SUBJID  FATHER  MOTHER  SEX
100    1       0       0       1
100    2       0       0       2
100    3       1       2       1
100    4       1       2       1
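Before sending a pedigree file, it may help to run a quick consistency check. The Python sketch below is only an illustration, not a NIAGADS tool: the function name check_pedigree is hypothetical, and it assumes a tab-delimited file in which 0 marks an unknown parent, as the example rows suggest.

```python
import csv
import io

REQUIRED = ["FAMID", "SUBJID", "FATHER", "MOTHER", "SEX"]


def check_pedigree(text):
    """Return a list of problems found in tab-delimited pedigree text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [name for name in REQUIRED if name not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    rows = list(reader)
    # A subject is identified by the (FAMID, SUBJID) pair.
    known = {(row["FAMID"], row["SUBJID"]) for row in rows}
    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        if row["SEX"] not in ("1", "2"):
            problems.append(f"line {line_no}: SEX must be 1 or 2")
        for parent in ("FATHER", "MOTHER"):
            # Assumption: 0 means the parent is unknown or not in the data set.
            if row[parent] != "0" and (row["FAMID"], row[parent]) not in known:
                problems.append(
                    f"line {line_no}: {parent} {row[parent]} not found in family"
                )
    return problems


example = (
    "FAMID\tSUBJID\tFATHER\tMOTHER\tSEX\n"
    "100\t1\t0\t0\t1\n"
    "100\t2\t0\t0\t2\n"
    "100\t3\t1\t2\t1\n"
    "100\t4\t1\t2\t1\n"
)
print(check_pedigree(example))  # an empty list means no problems were found
```

The check only covers the five standard columns; any additional columns in the file are ignored.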

Note: If you decide to use the alternative genotype format (see below) to store your genotype data and the file also contains pedigree information (the five columns above), the pedigree data does not need to be saved in a separate file as required here.

NIAGADS Data Request Guidelines

Last Update: 4/15/2020

README file

This file should contain citations and abstracts of publications that describe the study, a concise description of study design, final subject and marker count, platform, list of included files and formats, and primary contact information. Please use plain text (.txt), PDF (.pdf), or Microsoft Word (.doc or .docx).

If you are submitting Polymorphism Genotyping data:

APOE Genotypes

When available, please provide the APOE genotype. Describe the lab(s) that performed genotyping and the genotyping methodology in the README file mentioned above.

Preferred Format

Computer files containing genotype or genetic mapping data should be plain text files in the genetic pedigree file format. We ask the contributor(s) to use either the PLINK (.ped and .map files) or MERLIN (.ped, .map, and .dat files) pedigree formats. For more detailed definitions, please refer to the following two URLs or the example file formats listed below.

PLINK: http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped

MERLIN: http://www.sph.umich.edu/csg/abecasis/merlin/tour/input_files.html

There must be separate documentation (preferably in the README file) that clearly explains the format used for the files mentioned above. For PLINK pedigree format, including a URL to the definition of the file format is enough; for MERLIN pedigree format, both a detailed definition of the fields and a URL to the format definition should be included. The columns in each file should be listed and explained. Summary statistics, such as the number of individuals and the number of markers, should also be included. If the genotyping was divided into "plates" or similar groupings, explain the scheme; for example, if the genotype files are named "plate1", "plate2", and so on, describe what the names mean.
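The summary statistics requested above can be computed directly from the files. As a rough sketch, assuming the default PLINK text layout (six leading columns in the .ped file followed by two alleles per marker, and one marker per line in the .map file), the hypothetical helper summarize_plink counts individuals and markers and flags .ped rows whose length does not match the marker count:

```python
def summarize_plink(ped_text, map_text):
    """Count individuals and markers in PLINK .ped/.map text (default layout assumed)."""
    map_rows = [line.split() for line in map_text.splitlines() if line.strip()]
    ped_rows = [line.split() for line in ped_text.splitlines() if line.strip()]
    n_markers = len(map_rows)
    # Assumed layout: family ID, individual ID, father, mother, sex, phenotype,
    # then two allele columns for every marker listed in the .map file.
    expected_fields = 6 + 2 * n_markers
    malformed = [
        row_no
        for row_no, row in enumerate(ped_rows, start=1)
        if len(row) != expected_fields
    ]
    return {
        "individuals": len(ped_rows),
        "markers": n_markers,
        "malformed_ped_rows": malformed,
    }


# Made-up two-marker, two-person data set, for illustration only.
map_text = "1 rs1001 0 1000\n1 rs1002 0 2000\n"
ped_text = "100 1 0 0 1 1 A T A A\n100 2 0 0 2 1 A A C C\n"
print(summarize_plink(ped_text, map_text))
```

Files written with non-default PLINK options (for example, with some leading columns omitted) would need a different expected field count.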

Loci Labeling

Microsatellite markers, SNPs, genes, etc., should be labeled with the common usage employed at NCBI. For microsatellites, use the "DnSmmmm" format so that the marker appears in the Marshfield or deCODE maps. For SNPs, use dbSNP rs numbers, and not ss numbers. For genes, use the official NCBI Entrez Gene name and numerical gene ID, not an alias. For example, DRD2 is an official name and has aliases D2DR, D2R, etc.
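Marker labels can be screened with a few regular expressions before submission. The patterns below are assumptions inferred from the naming rules in this section (in particular, "DnSmmmm" is read as D, a chromosome, S, and a marker number), and classify_marker is a hypothetical name; the sketch does not confirm that a label actually exists in dbSNP or in the Marshfield or deCODE maps.

```python
import re

# Assumed patterns, inferred from the labeling rules in this section.
RS_ID = re.compile(r"^rs[0-9]+$", re.IGNORECASE)
SS_ID = re.compile(r"^ss[0-9]+$", re.IGNORECASE)
# "DnSmmmm": D, chromosome (1-22, X or Y), S, then the marker number.
MICROSATELLITE = re.compile(r"^D([1-9]|1[0-9]|2[0-2]|X|Y)S[0-9]+$")


def classify_marker(label):
    """Return a rough category for a marker label."""
    if RS_ID.match(label):
        return "dbSNP rs number"
    if SS_ID.match(label):
        return "ss number (replace with the rs number)"
    if MICROSATELLITE.match(label):
        return "microsatellite"
    return "unrecognized"


for label in ["rs1001", "ss12345", "D21S11", "D2DR"]:
    print(label, "->", classify_marker(label))
```

Gene names are not covered here, because checking them against official Entrez Gene names requires a lookup rather than a pattern.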

Alternative Genotype Format

If it is difficult to format data into the formats above, data sets can be formatted as additional plain text files according to the following guidelines:

● Data should be formatted as tables in plain text, with one record per line and spaces or tabs separating the fields.

● Include a line at the top of each file that indicates the labels of the columns. These labels should begin with letters and not contain spaces (i.e., standard rules for the definition of variables in computer programs). The field labels should be distinct and case insensitive (e.g., SEX, Sex, and sex are considered to be the same).

● Use standard labels for the following fields: FAMID (family ID), SUBJID (subject ID), FATHER (father ID), MOTHER (mother ID), SEX (1 for male and 2 for female), AGE, DX (for diagnosis: 1 for control, 2 for case).

Example:

FAMID  SUBJID  FATHER  MOTHER  SEX  AGE  DX  RS1001  RS1002
100    1       0       0       1    45   0   A/T     A/A
100    2       0       0       2    43   0   A/A     C/C
100    3       1       2       1    12   0   A/T     A/C
100    4       1       2       1    10   0   A/T     A/C
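The header rules for the alternative format lend themselves to an automated check. The following Python sketch is one possible reading of those rules, not an official validator: check_table is a hypothetical name, and the label pattern (a letter followed by letters, digits, or underscores) is an assumption based on the "standard rules for variables" wording.

```python
import re

# Assumption: a valid label is a letter followed by letters, digits, or underscores.
LABEL = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def check_table(text):
    """Return a list of problems found in a space- or tab-separated table."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ["empty file"]
    labels = lines[0].split()  # split() handles both spaces and tabs
    problems = []
    seen = set()
    for label in labels:
        if not LABEL.match(label):
            problems.append(f"invalid label: {label}")
        # Labels are compared case-insensitively, so SEX and Sex collide.
        if label.upper() in seen:
            problems.append(f"duplicate label (case-insensitive): {label}")
        seen.add(label.upper())
    for row_no, line in enumerate(lines[1:], start=2):
        n_fields = len(line.split())
        if n_fields != len(labels):
            problems.append(f"row {row_no}: {n_fields} fields, expected {len(labels)}")
    return problems


example = """FAMID SUBJID FATHER MOTHER SEX AGE DX RS1001 RS1002
100 1 0 0 1 45 0 A/T A/A
100 2 0 0 2 43 0 A/A C/C
100 3 1 2 1 12 0 A/T A/C
100 4 1 2 1 10 0 A/T A/C
"""
print(check_table(example))  # an empty list means no problems were found
```

Row numbers count non-blank lines, with the header as row 1.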

If you are submitting Next Generation Sequencing (NGS) data:

Required Files:

● Called reads prior to quality control in FASTQ format, compressed using the gzip or bzip2 program.

● Mapped reads in BAM format (see SAMtools, http://samtools.sourceforge.net).

● For studies focusing on genetic variants (e.g. genomic resequencing): called variants in the Variant Call Format (VCF).

● For read abundance studies (e.g. RNA-seq for gene expression profiling, ChIP-seq), provide summaries in tab-separated file format with explanations.

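Whether a FASTQ file is really gzip- or bzip2-compressed can be checked from its first bytes rather than its extension. The sketch below relies on the standard signatures of the two formats (gzip files start with the bytes 1f 8b, bzip2 files with "BZh"); the function name detect_compression and the file names are made up for the example.

```python
import bz2
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def detect_compression(path):
    """Return 'gzip', 'bzip2', or None based on the file's leading bytes."""
    with open(path, "rb") as handle:
        head = handle.read(3)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(BZIP2_MAGIC):
        return "bzip2"
    return None


# Demonstration with a single made-up FASTQ record written to temporary files.
record = b"@read1\nACGT\n+\nIIII\n"
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "reads.fastq.gz")
    bz_path = os.path.join(tmp, "reads.fastq.bz2")
    with gzip.open(gz_path, "wb") as out:
        out.write(record)
    with bz2.open(bz_path, "wb") as out:
        out.write(record)
    print(detect_compression(gz_path), detect_compression(bz_path))
```

This only identifies the container; it does not verify that the decompressed content is valid FASTQ.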


Additional Relevant Information

The contributor(s) should also provide additional information to facilitate future analysis and enable replication of primary findings by other researchers:

● Sequencer information (technology, machine type, version, protocol)

● How the FASTQ files are generated (e.g., pipeline version, base calling software, settings).

● How the BAM files are generated (e.g., workflow, read alignment software, parameters, reference genome, quality control parameters).

● How the VCF files are generated (e.g., variant calling program, stringency settings).

● For targeted enrichment sequencing and whole exome sequencing, provide information on the enrichment target regions using the UCSC Genome Browser BED format (see http://genome.ucsc.edu/FAQ/FAQformat.html). The genomic coordinates should be based on the reference genome version used for read alignment. Note how coordinates are defined in BED files: they use a “zero-based” coordinate system, and the ending position does not include the rightmost base pair in the interval.

● For RNA-seq experiments, how the transcript-level summaries are generated (e.g., statistical/computational steps to generate the summaries, transcript and isoform assemblers, software used).

● For ChIP-seq experiments, how the summaries are generated (e.g., peak caller software).

● Other information that the submitter deems relevant.
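The BED coordinate convention mentioned in the list above is a common source of off-by-one errors. As a small illustration, assuming the target regions are first recorded as 1-based intervals with both ends included, the conversion to BED looks like this (the function names and the chr19 example are hypothetical):

```python
def to_bed_interval(start_1based, end_1based):
    """Convert a 1-based, inclusive interval to BED's 0-based, half-open form."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    # The start shifts down by one; the end is unchanged because BED's end
    # position is exclusive.
    return start_1based - 1, end_1based


def bed_line(chrom, start_1based, end_1based):
    """Format one tab-separated BED line with the three required fields."""
    start, end = to_bed_interval(start_1based, end_1based)
    return f"{chrom}\t{start}\t{end}"


# The first 100 bases of a chromosome become start=0, end=100 in BED.
print(bed_line("chr19", 1, 100))
```

With this convention the interval length is simply end minus start.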

Other considerations:

Family and Individual Identifiers

If using dashes to separate the parts of an individual identifier (for example, "SJ-12321-1"), please explain what each part means. Do not use identification numbers with leading zeros: some programs, such as Excel, may convert ID text into numbers without notice and drop the zeros.
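Identifiers at risk of this conversion can be found by reading the file with every field kept as text. The following sketch assumes a tab-delimited file with a SUBJID column; ids_with_leading_zeros is a hypothetical helper, and it flags only identifiers made up entirely of digits, since those are the ones a spreadsheet would turn into numbers.

```python
import csv
import io


def ids_with_leading_zeros(text, column="SUBJID"):
    """Return all-digit IDs in the given column that start with a zero."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    flagged = []
    for row in reader:
        value = row.get(column) or ""
        # The csv module keeps every field as a string, so "007" is preserved here.
        if value.isdigit() and len(value) > 1 and value.startswith("0"):
            flagged.append(value)
    return flagged


sample = "FAMID\tSUBJID\n100\t007\n100\t12\n100\t0\n100\tSJ-012-1\n"
print(ids_with_leading_zeros(sample))
```

A single "0" is not flagged, and neither are dashed identifiers such as "SJ-012-1", which a spreadsheet would normally leave as text.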

Information for the reference genome

If a genetic or physical map is supplied, please explain the source. For example, "NCBI physical map build 34.3".

Be consistent

Make sure all files use exactly the same system. In the past, we have noticed that this rule is most often violated when data are sent at different times. Please make sure subsequent submissions use the same format as the original data.

If you have any further questions about data submission or would like to submit data, please contact data@niagads.org.

