Sweep Commands - Broad Institute

Sweep Commands
To run a command, go to the Temp/ folder and type
../Other/Ilya_Other/sweep/scripts/run-sweep <command name> [<arguments>]
AnalyzeCores
LRH Test
./run AnalyzeCores [<core options>] [<lrh test options>]
[--background <file>] <project> <outfile> (<pop> <chr>)*
Runs the LRH test on all the SNPs of chromosome <chr> genotyped in population
<pop>, using the data in <project>. More than one pop/chr pair can be specified; if none
are given, reads a tab-separated list of pop/chr pairs from standard input. Outputs its
results to <outfile>. Ignores alleles with frequencies < 5% or > 95%.
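When no pop/chr pairs are given on the command line, the list is read from standard input. A minimal Python sketch of producing such a list, assuming one tab-separated pop/chr pair per line (the text does not spell out the exact layout); the population names and chromosome numbers are placeholders, not values taken from any project:

```python
def format_pop_chr_pairs(pairs):
    """Return the pairs as text, one tab-separated pop/chr pair per line (assumed layout)."""
    return "".join(f"{pop}\t{chrom}\n" for pop, chrom in pairs)

# Hypothetical populations and chromosomes, for illustration only.
pairs = [("CEU", 1), ("CEU", 2), ("YRI", 1)]
pair_list = format_pop_chr_pairs(pairs)
print(pair_list, end="")
```

The resulting text could then be piped to the command in place of the trailing (<pop> <chr>)* arguments.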
Core Options (defaults in bold, ‘include’ options have ‘exclude’ analogs):
• --core-window-size N: (e.g., N = 1000000) Maximum number of bases to
examine in each direction of the core. (Normally other criteria will be limiting –
e.g. EHH will drop below the 0.04 threshold – but this hard limit is set just in
case).
• --include-mono-core-snps: (default: true) Include monomorphic SNPs in cores
• --include-multi-core-snps: (default: false) Include multi-allelic SNPs in cores
(incompatible with Gabriel et al. core selection)
• --include-mono-side-snps: (default: true) Include monomorphic SNPs on both
sides of the core in the analysis
• --include-multi-side-snps: (default: true) Include multi-allelic SNPs on both sides
of the core in the analysis
• --core-size MIN-MAX or --core-size N: (e.g., --core-size 3-10) Core size limits
• --match-side-snp-density N bases/kb/mb: (default: don’t match) Pick side SNPs
at a density of 1 every N bases/kb/mb
• --match-side-snp-density X cM: (default: don’t match) Pick side SNPs at a
density of 1 every X cM
• --dont-match-side-snp-density: (default: don’t match) Don’t filter side SNPs to
attain a fixed density
LRH Test Options (defaults in bold):
• --match-markers-at (AllEHH X | distance X (bases | Kb | Mb | cM) | EHHBar X),
(e.g. --match-markers-at AllEHH 0.04): For each core allele, select representative
markers where AllEHH is approximately X, where the SNP is X bases/Kb/Mb/cM from
the core, or where EHHBar is approximately X. The marker’s actual
AllEHH/distance/EHHBar must be within 25% of X.
Other options:
• --background <file>: Annotate ln(EHH) and ln(REHH) scores with significance
using the given background file (see below)
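The 25% rule for --match-markers-at can be illustrated with a short Python sketch. It assumes "within 25% of X" means a relative deviation of at most 0.25 from X and that the closest qualifying marker is chosen; neither detail is stated in the text, and the function names are hypothetical:

```python
def within_tolerance(actual, target, rel_tol=0.25):
    """True if `actual` deviates from `target` by at most rel_tol * |target| (assumed reading)."""
    return abs(actual - target) <= rel_tol * abs(target)

def pick_marker(values, target, rel_tol=0.25):
    """Index of the value closest to `target` among those within tolerance, else None."""
    candidates = [
        (abs(value - target), index)
        for index, value in enumerate(values)
        if within_tolerance(value, target, rel_tol)
    ]
    return min(candidates)[1] if candidates else None

# Made-up AllEHH values at successive markers moving away from the core.
ehh_values = [0.90, 0.31, 0.045, 0.02]
chosen = pick_marker(ehh_values, target=0.04)
print(chosen)
```

Here only the third marker (0.045) lies within 25% of 0.04, so it would be the representative marker under these assumptions.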
CalcLRHBackground
LRH Test Background
./run CalcLRHBackground <bin_count> <lrh_file_1> ...
<lrh_file_N> <out_background_file>
Measures mean and std. dev. of ln(EHH) and ln(REHH) scores in the given <lrh_files>
grouped into <bin_count> allele frequency bins, and outputs the results into
<out_background_file>. Excludes alleles that have REHH of 0 or 100, since these are
unreliable.
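As a rough illustration of the background calculation, the Python sketch below groups ln(REHH) scores into allele frequency bins and reports the mean and standard deviation per bin. Equal-width bins over [0, 1], the population standard deviation, and the (allele_freq, rehh) record layout are all assumptions; the tool's actual binning and file format are not described here. ln(EHH) would be handled the same way:

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

def bin_index(freq, bin_count):
    """Map an allele frequency in [0, 1] to one of bin_count equal-width bins (assumed scheme)."""
    return min(int(freq * bin_count), bin_count - 1)

def lrh_background(records, bin_count):
    """records: iterable of (allele_freq, rehh). Returns {bin: (mean, std)} of ln(REHH)."""
    by_bin = defaultdict(list)
    for freq, rehh in records:
        if rehh == 0 or rehh == 100:  # excluded as unreliable, per the text
            continue
        by_bin[bin_index(freq, bin_count)].append(math.log(rehh))
    return {b: (mean(v), pstdev(v)) for b, v in sorted(by_bin.items())}

# Made-up records: two are dropped because REHH is 0 or 100.
records = [(0.10, 1.0), (0.15, math.e ** 2), (0.60, 100), (0.65, 0), (0.70, math.e)]
background = lrh_background(records, bin_count=2)
print(background)
```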
CalcLRHSignificance
LRH Test Significance Annotator
./run CalcLRHSignificance <background_file> <lrh_file_1> ...
<lrh_file_N> <out_sig_file>
Rescales all the ln(EHH) and ln(REHH) scores in the <lrh_files> according to the
background file so as to have zero mean and unit variance within small allele frequency
bins. Outputs the results to <out_sig_file>.
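The rescaling amounts to a per-bin z-score. A minimal Python sketch, assuming the background is available as a mapping from bin index to (mean, std) and that bins are equal-width over [0, 1]; this in-memory layout is hypothetical and is not the tool's file format:

```python
def standardize(score, freq, background, bin_count):
    """Rescale `score` using the mean/std of its allele-frequency bin.

    Returns None when the bin has zero spread, since the z-score is undefined there.
    """
    bin_id = min(int(freq * bin_count), bin_count - 1)  # assumed equal-width bins
    mu, sigma = background[bin_id]
    if sigma == 0:
        return None
    return (score - mu) / sigma

# Hypothetical background: bin index -> (mean, std) of ln(REHH).
background = {0: (1.0, 0.5), 1: (2.0, 1.0)}
z_low = standardize(2.0, freq=0.1, background=background, bin_count=2)
z_high = standardize(1.0, freq=0.9, background=background, bin_count=2)
print(z_low, z_high)
```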
iHS
iHS Test
./run iHS [<ihs test options>] <project> <outfile> (<pop> <chr>)*
Runs the iHS test on all the biallelic SNPs with ancestral data (except if
--use-minor-allele-freq is used) of chromosome <chr> genotyped in population <pop>,
using the data in <project>. More than one pop/chr pair can be specified; if none are
given, reads a tab-separated list of pop/chr pairs from standard input. Outputs its
results to <outfile>. Ignores core SNPs with frequencies < 5% or > 95%. The iHH
integral centered around each SNP is computed in three ways: from the SNP to the left
side, from the SNP to the right side and from the SNP extending in both directions.
Options (defaults in bold):
• --integrate-to EHH X: (e.g. --integrate-to EHH 0.05) Set upper bound of iHH
integral (both ancestral and derived) to the point where the EHH drops to X.
• --integrate-from AllEHH X: (e.g. --integrate-from AllEHH 1.0) Set lower bound
of iHH integral (both ancestral and derived) to the point where AllEHH is X.
• --integrate-wrt cM | bases: Set “x”-axis with respect to which to integrate.
• --allow-integrate-to-edge or --disallow-integrate-to-edge: If EHH doesn’t drop to
the level specified by --integrate-to, simply integrate as far out as possible; this is
useful for simulations, where the 1 Mb simulation window size is comparable to
the length over which EHH decays to 0.05.
• --use-ancestral-allele-freq: Report the frequency of the ancestral allele; the
frequency is used by CalcIHSBackground and CalcIHSSignificance to see what
bin to place data in.
• --use-minor-allele-freq: Report the frequency of the minor allele; the frequency
is used by CalcIHSBackground and CalcIHSSignificance to see what bin to place
data in. iHH_A refers to the minor allele, while iHH_D refers to the major one.
Core SNPs without ancestral data can thus be assigned iHS values.
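To make the integration bounds concrete, here is a minimal Python sketch of a one-sided iHH integral and an unstandardised iHS score. It assumes trapezoid-rule integration with no interpolation at the cutoff, and the commonly used definition of unstandardised iHS as ln(iHH_A / iHH_D); the text does not state how the program integrates or which sign convention it uses, and the EHH values are made up:

```python
import math

def integrate_ehh(positions, ehh, cutoff=0.05):
    """Trapezoid-rule area under EHH from the core outward.

    positions: increasing distances from the core SNP (cM or bases).
    ehh: EHH at those positions, starting at the core.
    Integration stops after the first marker at which EHH has dropped to `cutoff`.
    """
    area = 0.0
    for i in range(1, len(positions)):
        if ehh[i - 1] <= cutoff:
            break
        area += 0.5 * (ehh[i - 1] + ehh[i]) * (positions[i] - positions[i - 1])
    return area

positions = [0.0, 1.0, 2.0, 3.0]
ehh_ancestral = [1.0, 0.5, 0.05, 0.0]
ehh_derived = [1.0, 1.0, 0.5, 0.04]

ihh_a = integrate_ehh(positions, ehh_ancestral)
ihh_d = integrate_ehh(positions, ehh_derived)
unstandardised_ihs = math.log(ihh_a / ihh_d)  # assumed sign convention
print(ihh_a, ihh_d, unstandardised_ihs)
```

The left-side and two-sided integrals mentioned above would follow the same pattern, applied to the markers on the other side of the core or to both sides summed.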
CalcIHSBackground
iHS Test Background
./run CalcIHSBackground [--one-sided | --two-sided] <bin_count>
<ihs_file_1> ... <ihs_file_N> <out_background_file>
Measures mean and std. dev. of either one-sided or two-sided unstandardised iHS scores
in the given <ihs_files> grouped into <bin_count> allele frequency bins, and outputs the
results into <out_background_file>.
CalcIHSSignificance
iHS Test Significance Annotator
./run CalcIHSSignificance <background_file> <ihs_file_1> ...
<ihs_file_N> <out_sig_file>
Rescales all the unstandardised iHS scores in the <ihs_files> according to the background
file so as to have zero mean and unit variance within small allele frequency bins. Outputs
the results to <out_sig_file>.
CrossPopAllEHH
XPop Test
./run CrossPopAllEHH [<options>] <project> <outfile>
(<pop1> <pop2> <chr>)*
Runs the XPop test on all the SNPs of chromosome <chr> genotyped in both populations
<pop1> and <pop2>, using the data in <project>. More than one pop1/pop2/chr trio can
be specified; if none are given, reads a tab-separated list of pop1/pop2/chr trios from
standard input. Outputs its results to <outfile>. Ignores alleles with frequencies < 5% or
> 95%.
Options:
• --extends-to AllEHH X: (e.g. --extends-to AllEHH 0.04) Set the right bound of
integration to the point where the pop1+pop2 AllEHH drops to X
• --integrate-wrt cM | distance | delta: Set “x”-axis with respect to which to
integrate; the ‘delta’ option means that, instead of integrating EHH, the program
simply reports the value of EHH for each population at the integration right bound.
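The shared right bound set by --extends-to can be sketched as follows in Python. The sketch assumes the bound is the first marker at which the pooled (pop1+pop2) AllEHH has dropped to X, that each population's AllEHH is then integrated by the trapezoid rule up to that marker, and that the log ratio is taken as pop1 over pop2; these details, the function names and the numbers are illustrative assumptions:

```python
import math

def right_bound(pooled_all_ehh, threshold=0.04):
    """Index of the first marker where the pooled AllEHH has dropped to `threshold`."""
    for index, value in enumerate(pooled_all_ehh):
        if value <= threshold:
            return index
    return len(pooled_all_ehh) - 1  # never drops that far: fall back to the last marker

def trapezoid(xs, ys, end):
    """Trapezoid-rule integral of ys over xs from index 0 through index `end`."""
    return sum(0.5 * (ys[i - 1] + ys[i]) * (xs[i] - xs[i - 1]) for i in range(1, end + 1))

def xpop_log_ratio(xs, all_ehh_pop1, all_ehh_pop2, pooled_all_ehh, threshold=0.04):
    """ln of the ratio of the two populations' AllEHH integrals up to the shared bound."""
    end = right_bound(pooled_all_ehh, threshold)
    return math.log(trapezoid(xs, all_ehh_pop1, end) / trapezoid(xs, all_ehh_pop2, end))

xs = [0.0, 1.0, 2.0, 3.0]
score = xpop_log_ratio(
    xs,
    all_ehh_pop1=[1.0, 0.8, 0.6, 0.4],
    all_ehh_pop2=[1.0, 0.4, 0.2, 0.1],
    pooled_all_ehh=[1.0, 0.5, 0.04, 0.01],
)
print(score)
```

Under the ‘delta’ setting, the two AllEHH values at index `end` would be reported directly instead of the integrals.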
CalcXPopBackground
XPop Test Background
./run CalcXPopBackground <xpop_file_1> ... <xpop_file_N>
<out_background_file>
Measures mean and std. dev. of the AllEHH integral logratios in the given <xpop_files>.
Outputs the results into <out_background_file>.
CalcXPopSignificance
XPop Test Significance Annotator
./run CalcXPopSignificance <background_file> <xpop_file_1> ...
<xpop_file_N> <out_sig_file>
Rescales all the AllEHH integral logratios in the <xpop_files> according to the
background file so as to have zero mean and unit variance within small allele frequency
bins. Outputs the results to <out_sig_file>.
ExportGenes
Find the RefSeq genes in certain genome windows
./run ExportGenes <project> <species> <baseIndiv> <pos_file>
Using the species-wide data in <project> (or downloading the data from the UCSC
genome website if missing), finds the genes of <species> and <baseIndiv> (e.g. “human”
and “hg17”) in the given genome regions. The <pos_file> should have a header line
followed by tab-separated lines. The first three columns should be Chromosome / Start /
Stop. The program writes its results in an extra final column called “Genes in region”.
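A <pos_file> of the shape described above could be produced with a short Python sketch like the following. The exact header spellings and the region coordinates are placeholders chosen for illustration; only the layout (header line, then tab-separated Chromosome / Start / Stop columns) comes from the text:

```python
import csv
import io

def write_pos_file(handle, regions):
    """Write a header line followed by tab-separated Chromosome / Start / Stop rows."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Chromosome", "Start", "Stop"])  # header text is an assumption
    writer.writerows(regions)

# Placeholder regions; an in-memory buffer stands in for a real file.
buffer = io.StringIO()
write_pos_file(buffer, [(2, 136000000, 137000000), (15, 46000000, 46500000)])
pos_text = buffer.getvalue()
print(pos_text, end="")
```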
ExportSnpPhase
Export the SNP data of a certain genome region in Sweep1’s .snp and .phase format
./run ExportSnpPhase <project> <pop> <chr> <minPos> <maxPos>
Exports the genotyped data for <pop> stored in <project>, in the specified region, to the
files “region.snp” and “region.phase”.
ExportLRHFor
Export EHH / EHHBar / REHH decay curves for a particular SNP
./run ExportLRHFor <project> <pop> <chr> <core_pos>
Exports the EHH / EHHBar / REHH decay curves extending 1 Mb on both sides of the
SNP at position <core_pos> on chromosome <chr>, using the genotype data for population
<pop> stored in <project>.
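For readers unfamiliar with what a decay curve contains, the Python sketch below computes EHH at increasing distance from a core using the standard definition: the probability that two randomly chosen chromosomes carrying the same core haplotype are identical over the extended region. It is an illustration under that assumption, not the program's own code; the haplotype strings are made up and only the right-hand side of the core is shown:

```python
from collections import Counter
from math import comb

def ehh(extended_haplotypes):
    """Fraction of chromosome pairs that are identical across the extended region."""
    n = len(extended_haplotypes)
    if n < 2:
        return 0.0
    counts = Counter(extended_haplotypes)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)

def ehh_decay(haplotypes, core_index):
    """EHH at each successive marker to the right of the core, starting with the core itself."""
    length = len(haplotypes[0])
    return [
        ehh([h[core_index:end] for h in haplotypes])
        for end in range(core_index + 1, length + 1)
    ]

# Four made-up chromosomes that all carry the same core allele at index 0.
haplotypes = ["1000", "1000", "1001", "1011"]
decay = ehh_decay(haplotypes, core_index=0)
print(decay)
```

EHH starts at 1 at the core and falls as recombination and mutation break up the shared haplotype; REHH compares this curve against the EHH of the other core haplotypes.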
CreateCompoundPopulation
DumpMarkerH
ExtractData
Import
ImportHapMapAncestral
ImportSteveSims
ListCores
LocateSelection