STATA_BC_PLINK.RJLA.NOV2007

BIOSTATISTIC/BIOINFORMATIC TOOLS FOR GENETICS DATA:
DATA MANAGEMENT AND ANALYSIS
RICHARD ANNEY
NEUROPSYCHIATRIC GENETICS RESEARCH GROUP
WORKSHEET, TUTORIALS AND SLIDES AVAILABLE ON
P:\Personal Folders\anneyr\stata9\talk
http://www.medicine.tcd.ie/psychiatry/research/neuropsychiatry/
:NEUROPSYCHIATRIC GENETICS
[BIOSTATISTICS|BIOINFORMATICS] CORE
Overview
STATA9
• A STATISTICAL SOFTWARE PACKAGE
• LESS PRETTY THAN THE SPSS GUI
• POWERFUL AND “SCRIPT” FRIENDLY
• LESS CLICKING ON DROP-DOWN MENUS, MORE SCRIPTING
STATA9: SET UP FOLDER STRUCTURE
• SET UP FOLDERS TO STORE YOUR:
  • DO-FILES
    • CR (CREATE) FILES
    • AN (ANALYSIS) FILES
  • DTA-FILES
  • LOG-FILES
  • INPUT-FILES (TXT)
  • OUTPUT-FILES
PROBLEM 1:
BASIC CASE-CONTROL ASSOCIATION STUDY
• HOW DO I GET FILES INTO STATA?
• HOW DO I MERGE MY DATA WITH ANOTHER FILE?
• CAN I GENERATE A FEW BASIC STATISTICS ON MY MARKERS?
• CAN I PERFORM A CASE-CONTROL STUDY?
• IS MY QUANTITATIVE VARIABLE ASSOCIATED WITH A GENOTYPE?
STATA9: LOOK AT ME!! MAIN WINDOW
STATA9: LOOK AT ME!! DO-WINDOW
STATA9: LOOK AT ME!! DTA-EDITOR WINDOW
PROBLEM 1:
BASIC CASE-CONTROL ASSOCIATION STUDY
cr00 genotype_qtlsnp.do
1. ADDING TAB-DELIMITED TEXT FILES TO STATA USING THE INSHEET COMMAND, SORTING ON THE KEY VARIABLE USING THE SORT COMMAND, AND SAVING AS *.DTA FILES USING THE SAVE COMMAND
2. CONVERTING “STRINGS” TO NUMERIC VARIABLES USING THE GENERATE AND REPLACE COMMANDS
3. MERGING ON THE KEY VARIABLE USING THE MERGE COMMAND
4. TABULATING THE MERGE USING THE TABULATE COMMAND, AND ORDERING VARIABLES USING THE ORDER COMMAND
• THE COMBINED *.DTA FILE
• TABULATING THE MERGE INDICATOR (_merge) WITH THE TABULATE COMMAND:
  • 1 = ONLY IN THE 1st FILE
  • 2 = ONLY IN THE 2nd FILE
  • 3 = IN BOTH THE 1st & 2nd FILES
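The cr00 steps and the merge codes above can be mimicked outside Stata to see what the indicator means. Below is a minimal Python sketch, not Stata's merge command, assuming a one-to-one match on a hypothetical key variable `id` with made-up genotype and phenotype rows; `merge_on_key` is an illustrative helper name.

```python
from collections import Counter


def merge_on_key(master, using, key):
    """One-to-one merge of two lists of dict rows on `key`.

    Each output row carries a `_merge` code like Stata's merge indicator:
    1 = key only in the master (1st) file, 2 = only in the using (2nd) file,
    3 = key in both files. Assumes `key` is unique within each file.
    """
    using_by_key = {row[key]: row for row in using}
    matched = set()
    merged = []
    for row in master:
        k = row[key]
        if k in using_by_key:
            # Master values win when both files share a variable name.
            merged.append({**using_by_key[k], **row, "_merge": 3})
            matched.add(k)
        else:
            merged.append({**row, "_merge": 1})
    for k, row in using_by_key.items():
        if k not in matched:
            merged.append({**row, "_merge": 2})
    merged.sort(key=lambda r: r[key])  # sorted on the key, like a sorted *.dta
    return merged


# Hypothetical genotype and phenotype files keyed on "id".
genotypes = [{"id": 1, "rs1": "AG"}, {"id": 2, "rs1": "GG"}, {"id": 4, "rs1": "AA"}]
phenotypes = [{"id": 1, "status": 1}, {"id": 2, "status": 0}, {"id": 3, "status": 1}]

merged = merge_on_key(genotypes, phenotypes, "id")
merge_table = Counter(r["_merge"] for r in merged)  # analogue of: tabulate _merge
print(sorted(merge_table.items()))  # [(1, 1), (2, 1), (3, 2)]
```

Rows coded 1 or 2 are the ones worth inspecting before analysis, since they have genotype data without phenotype data or the reverse.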
an00 genotype_qtlsnp.do
• CREATING THE LOG FILE USING THE LOG COMMAND
• OPENING THE *.DTA FILE USING THE USE COMMAND
• CREATING GENOTYPE VARIABLES FROM ALLELE VARIABLES USING THE GTYPE PROTOCOL
• TABULATING THE GENOTYPE VARIABLES USING THE TABULATE COMMAND
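Building a genotype from two allele columns amounts to pairing and ordering the alleles. A minimal Python sketch follows; it is not the gtype protocol itself, and the missing-allele codes ("0" or blank), the alphabetical allele order and the example data are all assumptions.

```python
from collections import Counter

MISSING = {"0", "", None}  # assumed missing-allele codes


def make_genotype(allele1, allele2):
    """Combine two allele calls into one unordered genotype, e.g. ('G', 'A') -> 'AG'.

    Returns None if either allele is missing. Alleles are ordered alphabetically,
    which is an assumption; the gtype protocol may order them differently.
    """
    if allele1 in MISSING or allele2 in MISSING:
        return None
    first, second = sorted((str(allele1).upper(), str(allele2).upper()))
    return first + second


# Hypothetical allele columns for one SNP.
allele_a = ["A", "G", "G", "0", "A"]
allele_b = ["G", "A", "G", "A", "A"]
genotypes = [make_genotype(a, b) for a, b in zip(allele_a, allele_b)]
print(Counter(genotypes))  # analogue of: tabulate genotype, missing
```

Ordering the alleles makes "AG" and "GA" the same genotype, which is what a genotype tabulation needs.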
1. TEST HARDY-WEINBERG EQUILIBRIUM (HWE) USING THE GTAB COMMAND
2. TEST HWE USING THE GENHW COMMAND
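For reference, the classic Pearson chi-square HWE test compares observed genotype counts with those expected from the allele frequency (p^2, 2pq, q^2). This Python sketch assumes a biallelic SNP and 1 degree of freedom; gtab and genhw may use other tests (for example exact tests), so their p-values need not match.

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    n_aa, n_ab, n_bb are genotype counts (homozygote, heterozygote, other homozygote).
    Returns (chi-square statistic, p-value on 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Classes with zero expected count (monomorphic SNP) are skipped.
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chisq / 2))
    return chisq, p_value


print(hwe_chisq(25, 50, 25))  # exactly at HWE: (0.0, 1.0)
print(hwe_chisq(50, 0, 50))   # no heterozygotes: chi-square of 100
```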
1. TEST PAIR-WISE LINKAGE DISEQUILIBRIUM USING THE PWLD COMMAND
2. TEST ASSOCIATION WITH A BINARY TRAIT USING THE GENCC COMMAND
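A basic binary-trait association test is the allelic 2x2 chi-square with its odds ratio. The Python sketch below is that allelic test only (genotype-based tests that gencc may also report are not shown), and the allele counts are hypothetical.

```python
import math


def allelic_test(case_a1, case_a2, ctrl_a1, ctrl_a2):
    """Allelic 2x2 chi-square test and odds ratio for allele 1 (cases vs controls).

    Arguments are allele counts (two per genotyped individual).
    Returns (chi-square, 1-df p-value, odds ratio).
    """
    a, b, c, d = case_a1, case_a2, ctrl_a1, ctrl_a2
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the 2x2 table is empty")
    chisq = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chisq / 2))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return chisq, p_value, odds_ratio


# Hypothetical counts: allele 1 seen 60/100 times in cases, 40/100 in controls.
print(allelic_test(60, 40, 40, 60))  # chi-square 8.0, odds ratio 2.25
```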
• QTLSNP COMMAND MODELS
• CODOMINANT (THREE MODELS)
• DOMINANT
• RECESSIVE
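One way to read the three models is as different codings of the same genotype, with the quantitative trait then compared across the resulting groups. In this sketch, treating CODOMINANT as three separate genotype groups is an assumption (the slide does not spell out qtlsnp's "three models"), as are the helper names, the choice of G as risk allele and the example values.

```python
from statistics import mean


def code_genotype(genotype, risk_allele):
    """Code a two-letter genotype such as 'AG' under three inheritance models.

    codominant: number of risk alleles (0/1/2), treated as three separate groups
    dominant:   1 if at least one risk allele, else 0
    recessive:  1 only if two risk alleles, else 0
    """
    copies = genotype.count(risk_allele)
    return {
        "codominant": copies,
        "dominant": int(copies >= 1),
        "recessive": int(copies == 2),
    }


def trait_means(genotypes, trait, risk_allele, model):
    """Mean quantitative trait per genotype group under one model; skips missing."""
    groups = {}
    for g, y in zip(genotypes, trait):
        if g is None or y is None:
            continue
        level = code_genotype(g, risk_allele)[model]
        groups.setdefault(level, []).append(y)
    return {level: mean(values) for level, values in sorted(groups.items())}


# Hypothetical genotypes and trait values; "G" assumed to be the risk allele.
gts = ["AA", "AG", "GG", "AG", None]
qt = [1.0, 2.0, 4.0, 3.0, 9.9]
for model in ("codominant", "dominant", "recessive"):
    print(model, trait_means(gts, qt, "G", model))
```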
1. TEST WHETHER A QUANTITATIVE VARIABLE IS ASSOCIATED WITH GENOTYPE UNDER DIFFERENT INHERITANCE MODELS USING THE QTLSNP COMMAND: CODOMINANT
1. TEST WHETHER A QUANTITATIVE VARIABLE IS ASSOCIATED WITH GENOTYPE UNDER DIFFERENT INHERITANCE MODELS USING THE QTLSNP COMMAND: DOMINANT
2. NOT ASSOCIATED, SO OUTPUT IS MINIMAL
1. TEST WHETHER A QUANTITATIVE VARIABLE IS ASSOCIATED WITH GENOTYPE UNDER DIFFERENT INHERITANCE MODELS USING THE QTLSNP COMMAND: RECESSIVE
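A simple way to compare trait means between the two groups of a dominant or recessive coding is Welch's t statistic. This is a swapped-in technique for illustration, not necessarily what qtlsnp computes; turning t into a p-value needs the t distribution (for example from scipy.stats.t), which is left out to keep the sketch dependency-free. The trait values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group0, group1):
    """Welch's t statistic and degrees of freedom for a difference in means.

    group0 / group1 are trait values for the two levels of a dominant or
    recessive coding; each needs at least two values and non-zero spread.
    """
    n0, n1 = len(group0), len(group1)
    v0, v1 = variance(group0) / n0, variance(group1) / n1  # squared standard errors
    t = (mean(group1) - mean(group0)) / math.sqrt(v0 + v1)
    df = (v0 + v1) ** 2 / (v0 ** 2 / (n0 - 1) + v1 ** 2 / (n1 - 1))
    return t, df


# Hypothetical trait values: no risk allele (0) vs carriers (1) under a dominant model.
t, df = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), round(df, 1))  # 3.674 4.0
```

Welch's form does not assume equal variances in the two genotype groups, which matters when one group (often the rare homozygotes) is small.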
BC|SNPmax©
• DATABASE AND ANALYSIS PLATFORM
• MASTER DATABASE FOR STORING ALL OUR “MASTER” GENETIC AND PHENOTYPE DATASETS
• ONGOING PROCESS TO UPLOAD AND MANAGE DATA
BC|SNPmax: Structure
• FIVE DOMAINS:
1. GENOTYPES/SNPS
2. MAPS
3. PEDIGREES
4. AFFECTION
5. PHENOTYPES
FROM OUTPUT TO GEN-FILE (VIA STATA)
• TWO EXAMPLES
1. BASIC EXCEL FILE
2. TAQ-MAN FILE
FROM OUTPUT TO GEN-FILE (VIA STATA):
BASIC EXCEL FILE
FROM OUTPUT TO GEN PED AFF-FILE (VIA STATA):
BASIC EXCEL FILE
FROM OUTPUT TO GEN-FILE (VIA STATA):
TAQ-MAN FILE
BC|SNPmax
BC|SNPmax: Types of Analysis
• QUALITY
  • PED-CHECK
  • MERLIN
  • BASIC MEASURES (MAF, HWE, CALL)
• CASE-CONTROL
  • ALLELE ASSOCIATION
  • MENDEL
  • PHASE
  • SNPHAP
  • PLINK
  • R-PACKAGE
• FAMILY-BASED
  • MENDEL
  • MERLIN
  • GENEHUNTER
  • SIMWALK
  • FBAT/PBAT
  • TRANSMIT
  • QTDT
  • PLINK
  • HAPLOVIEW
  • R-PACKAGE
BC|SNPmax: Types of Analysis
• FOR MOST ANALYSES YOU NEED TO SELECT MATCHED FILES:
  • GEN
  • PED
  • MAP (b128 NOW UPLOADED)
  • AFF
PLINK… GETTING STARTED
PLINK…
• RUNNING PLINK FROM YOUR OWN COMPUTER
• WHY?
  1. MULTIPLE ANALYSES
  2. KEEP A RECORD OF YOUR WORK IN BAT AND SCRIPT FILES
  3. EASE OF USE
  4. EASE OF REPEATING TASKS
  5. SCRIPTS, NOT DROP-DOWN MENUS
  6. RUNNING >1 CHROMOSOME (BC|SNPmax ADDRESSED)
  7. POST-ANALYSIS INTEGRATION USING PERL AND STATA
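Keeping work in BAT files and running more than one chromosome can be combined by generating one PLINK command per chromosome. The Python sketch below only builds the command text; the fileset prefix `mydata`, the choice of `--tdt` and the output naming are assumptions, and on Windows the executable may need to be called `plink.exe`.

```python
def plink_batch(bfile, chromosomes, analysis="--tdt", plink="plink"):
    """Build one PLINK command per chromosome, e.g. for saving as a .bat file.

    bfile is the prefix of a binary fileset (prefix.bed/.bim/.fam); each run
    writes its own output prefix so results for different chromosomes do not
    overwrite one another.
    """
    lines = []
    for chrom in chromosomes:
        lines.append(f"{plink} --bfile {bfile} --chr {chrom} {analysis} --out {bfile}_chr{chrom}")
    return "\n".join(lines) + "\n"


# "mydata" is a hypothetical fileset prefix; autosomes 1-22.
script = plink_batch("mydata", range(1, 23))
print(script.splitlines()[0])  # plink --bfile mydata --chr 1 --tdt --out mydata_chr1
```

Writing `script` to a file gives a record of exactly which analyses were run, which is the point of using BAT files rather than a GUI.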
PLINK…
• FOLDER STRUCTURE
• ANALYSIS
• DATASET
• OUTPUT
PLINK… DATASETS
• PED & MAP
• BINARY FILES
• BINARY PED (BED)
• BINARY MAP (BIM)
• FAMILY FILES (FAM)
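To make the text formats concrete, the sketch below writes PED and MAP lines in PLINK's standard layout (six pedigree columns followed by two allele columns per SNP; MAP: chromosome, SNP ID, genetic distance, base-pair position). The family, SNP ID and positions are hypothetical. A text fileset `mydata.ped`/`mydata.map` can then be converted to BED/BIM/FAM with `plink --file mydata --make-bed --out mydata`.

```python
def ped_line(fid, iid, pat, mat, sex, pheno, genotypes):
    """One line of a PLINK text PED file.

    The first six columns are family ID, individual ID, paternal ID, maternal ID,
    sex (1 = male, 2 = female) and phenotype; each SNP then adds two allele
    columns. A genotype of None is written as '0 0' (missing).
    """
    fields = [str(x) for x in (fid, iid, pat, mat, sex, pheno)]
    for g in genotypes:
        fields.extend(("0", "0") if g is None else (g[0], g[1]))
    return " ".join(fields)


def map_line(chrom, snp, bp, genetic_distance=0):
    """One line of a PLINK MAP file: chromosome, SNP ID, genetic distance, bp position."""
    return f"{chrom}\t{snp}\t{genetic_distance}\t{bp}"


# Hypothetical family trio: father, mother, affected child (phenotype 2 = affected).
print(ped_line("FAM1", "1", "0", "0", 1, 1, ["AG", "CC"]))
print(ped_line("FAM1", "2", "0", "0", 2, 1, ["AA", "CT"]))
print(ped_line("FAM1", "3", "1", "2", 2, 2, ["GG", None]))
print(map_line(1, "rs0001", 123456))
```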
EXAMPLE ANALYSES IN PLINK…
• FAMILY-BASED
  • TDT
  • POO
• SUMMARY STATS
  • MISSINGNESS
  • HWE
  • MAF
  • MENDEL ERRORS
• INCLUSION THRESHOLDS
• POPULATION STRATIFICATION
• ASSOCIATION
  • CASE/CONTROL
  • QTL
  • GxE
• DATA TRANSFORMATION
• DATA FILTERING AND PRUNING
• DATA MERGING
• PERMUTATION
• EPISTASIS
• HAPLOTYPE ANALYSIS
• NEW PROXY-ASSOCIATION (FROM SNP TO HAPLOTYPE)
• R-PACKAGE
• NEW MODIFY OUTPUT
• NEW MULTIPLE CORRECTION TESTING (--adjust)
  • PLOG10
  • P<x
  • GENOMIC CONTROL
  • QQ-PLOT
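The --adjust outputs listed above rest on simple formulas. This Python sketch shows Bonferroni correction, the genomic-control inflation factor (median 1-df chi-square divided by about 0.4549), -log10(p) and a P<x filter; it is not PLINK's implementation, which also reports other corrections, and the p-values are hypothetical. A common convention, not applied here, is to set lambda to 1 when it falls below 1.

```python
import math
from statistics import median

CHISQ1_MEDIAN = 0.4549  # approximate median of a 1-df chi-square distribution


def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def genomic_control(chisq_values):
    """Inflation factor lambda from the median statistic, and GC-deflated statistics."""
    lam = median(chisq_values) / CHISQ1_MEDIAN
    return lam, [x / lam for x in chisq_values]


def plog10(p_values):
    """-log10(p), the scale usually used on QQ-plots and genome-wide plots."""
    return [-math.log10(p) for p in p_values]


def p_below(p_values, threshold):
    """Keep only results with p < threshold."""
    return [p for p in p_values if p < threshold]


pvals = [0.001, 0.02, 0.5, 0.8]  # hypothetical per-SNP p-values
print(bonferroni(pvals))
print(p_below(pvals, 0.05))
```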
PLINK… : RUNNING TDT IN PLINK
• CAN RUN FROM THE COMMAND LINE OR USING gPLINK (GUI)
• RECOMMEND BAT AND SCRIPT FILES
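The statistic behind the TDT is simple enough to check by hand: with T transmissions and U non-transmissions of an allele from heterozygous parents, chi-square = (T - U)^2 / (T + U) on 1 degree of freedom and the odds ratio is T/U. A minimal Python sketch with hypothetical counts follows; PLINK's --tdt output includes these quantities, though this is not its code.

```python
import math


def tdt(transmitted, untransmitted):
    """McNemar-type transmission disequilibrium test for one allele.

    transmitted (T) / untransmitted (U): how often heterozygous parents did or
    did not pass the allele to an affected child.
    Returns (chi-square, 1-df p-value, odds ratio T/U).
    """
    t, u = transmitted, untransmitted
    if t + u == 0:
        raise ValueError("no informative transmissions from heterozygous parents")
    chisq = (t - u) ** 2 / (t + u)
    p_value = math.erfc(math.sqrt(chisq / 2))
    odds_ratio = t / u if u else float("inf")
    return chisq, p_value, odds_ratio


print(tdt(60, 40))  # hypothetical counts: chi-square 4.0, OR 1.5, p about 0.0455
```

Only heterozygous parents contribute, so the effective sample size is T + U rather than the number of families.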
PLINK… : SUMMARY TABLES IN STATA
• INSHEET THE TDT.CLEAN FILE
• ADD GENE NAMES
• ADD CHROMOSOME POSITION
• ADJUST OR TO RISK
• GENERATE GRAPHS OF DATA
• GENERATE TABLES BY GENE
• GENERATE TABLES BY POSITION
• GENERATE TABLES BY P-VALUE
• SELECT COLUMNS FOR OTHER ANALYSES (GENMAPP)
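The summary tables are built in Stata in this workflow; as an illustration only, the Python sketch below shows one plausible reading of "ADJUST OR TO RISK" (inverting odds ratios below 1 and swapping alleles so every row names the risk allele), followed by a table sorted by p-value. The column names SNP, A1, A2, OR and P are assumed to match the TDT output, and the rows are hypothetical.

```python
def orient_to_risk(rows):
    """Re-express each result so that OR >= 1, then sort by p-value.

    rows: list of dicts with hypothetical keys SNP, A1, A2, OR and P, where OR is
    for allele A1. An OR below 1 is inverted and A1/A2 swapped, so the listed
    allele is the one associated with higher risk.
    """
    oriented = []
    for row in rows:
        r = dict(row)  # copy so the input rows are left unchanged
        if 0 < r["OR"] < 1:
            r["OR"] = 1 / r["OR"]
            r["A1"], r["A2"] = r["A2"], r["A1"]
        oriented.append(r)
    return sorted(oriented, key=lambda r: r["P"])


results = [
    {"SNP": "rs1", "A1": "A", "A2": "G", "OR": 0.5, "P": 0.02},
    {"SNP": "rs2", "A1": "C", "A2": "T", "OR": 1.25, "P": 0.001},
]
for r in orient_to_risk(results):
    print(r["SNP"], r["A1"], r["OR"], r["P"])
```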
THE END!