
„Omics“ Module Exam
Date:
Friday 7th, 2014
Time:
10.15h
Location:
Seminar room 0.34, Genetics building Zülpicher Str. 47a
Duration:
120 min
Maximum points: 60 points, 2/3 of the final grade (seminar: max 30 pts)
Pass mark: 50% of the max total points = 45 pts
Additional remarks: This is a written exam. No computers allowed, no calculators required.
Topics covered in the exam
(Achim Tresch)
Elementary Statistics
Data description (measures of location and scale, quantiles), graphical representation of
continuous (univariate and bivariate) data (boxplots, scatterplots, MA-plots)
Measures of dependence (Pearson and Spearman correlation: which is preferable under which
condition?)
k-means clustering (2-step procedure, center calculation, assignment of points to centers)
Testing: decision boundary, false positives, false negatives, p-value, significance of a test, null
hypothesis, acceptance/rejection region
Tests for location: t-Test vs. Wilcoxon rank sum test (which is preferable under which
condition?)
(Fisher test, Chi-squared test will be covered in Kay Hofmann’s part)
Multiple testing (why, Bonferroni procedure)
Classification: test error, training error, overfitting, bias-variance tradeoff, Linear classifiers
Cross validation (how does it work, purpose)
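For revision, the two-step k-means procedure named in the list above (assignment of points to centers, then center calculation) can be sketched as follows. This is only a minimal illustration, assuming 2-D points and user-supplied starting centers; the function names `nearest` and `kmeans` are made up for this example and are not taken from the lecture material.

```python
def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    dists = [(point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2 for c in centers]
    return dists.index(min(dists))


def kmeans(points, centers, max_iter=100):
    """Alternate the two k-means steps until the centers stop moving."""
    centers = list(centers)
    for _ in range(max_iter):
        # Step 1: assign every point to its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Step 2: recompute each center as the mean of its assigned points.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(old)  # assumption: an empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    # Final assignment with respect to the returned centers.
    return centers, [nearest(p, centers) for p in points]


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
final_centers, final_labels = kmeans(points, [(0, 0), (10, 10)])
```

The result depends on the starting centers, which is why the procedure is usually restarted several times in practice.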
Sequencing:
systematic and random errors in NGS: Mappability of reads, sequence (GC) bias, positional
bias (chromatin structure), …
Sequencing techniques: Sanger, Illumina bridge amplification + sequencing by synthesis, third
generation sequencing (Pacific Biosciences)
advantages of paired-end reads
Normalization procedures for gene expression data: housekeeping genes, spike-ins, VSN, quantile
normalization, lowess (basic idea behind quantile normalization)
Statistics for RNA-Seq data: RPKM values
Mapping: Hash tables (how do they work? key, index (hash value), collisions)
NOT covered in the exam: suffix arrays, Burrows-Wheeler transform
ChIP-Seq: How does it work?
Peak detection: Poisson distribution, “Peak score” = (log) likelihood ratio (NOT: negative
binomial distribution)
NOT: motif search
Hidden Markov Models: What do they learn (what are the parameters)? What do they
output (Viterbi path)?
Epigenetics: bisulfite sequencing, mapping strategies for mapping bisulfite-converted reads
to a reference sequence
lollipop plot, methylation rate calculation
biological role of DNA methylation? (correlation with chromatin structure and histone
modifications, role in differentiation, aging and degeneration)
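As a reminder of the methylation rate calculation listed above: after bisulfite conversion an unmethylated cytosine is read as T, while a methylated cytosine is still read as C. The rate at one reference cytosine is therefore the share of C calls among all C and T calls. The sketch below is a simplified illustration; the function name `methylation_rate` and the input format (one string of base calls per position) are assumptions made for this example.

```python
def methylation_rate(calls):
    """Methylation rate at one reference cytosine.

    `calls` is a string holding the base each covering read shows at that
    position (assumed format). Calls other than C or T are ignored.
    """
    n_methylated = calls.count("C")    # protected from conversion
    n_unmethylated = calls.count("T")  # converted by bisulfite treatment
    informative = n_methylated + n_unmethylated
    if informative == 0:
        return None  # no informative reads at this position
    return n_methylated / informative


rate = methylation_rate("CCCT")  # three methylated reads, one unmethylated
```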
(Kay Hofmann)
Generally no calculations. Questions on the problems, failures of a method. Finding errors in
statements.
Proteomics
Tandem MS: technology
protein identification – what can go wrong?
quantification of MS data: SILAC, iTRAQ, label-free methods
post-translational modifications: which, how can they be identified?
protein-protein interactions: large-scale methods for analysing protein-protein interactions (Y2H)
Gene set enrichment: Gene Ontology? DAG? Statistical tests for gene set enrichment (Fisher
test)
Which databases are there for which purposes (which database contains which information)?
protein function prediction, guilt by association
(Andreas Beyer)
Interaction networks, integrative Data Analysis
Epistasis: definition + meaning
eMAP: methods for finding interactions (experiment + computation)
ANOVA + linear models (NO formulas, however: ability to transform an experimental design
into a linear model)
allele incompatibility: definition
Classification: random forests – advantages / problems relative to e.g. linear models
QTLs
NOT: Boolean algebra
(Thomas Wiehe)
Genomics
Coverage
Genome assembly – problems
Alignments
gene prediction (using HMMs) (no concrete calculations)
Sequencing and assembly of genomes
shotgun sequencing
Genome assembly
Overlap alignment (theory - how to calculate an alignment; alignment matrix), tiling paths,
coverage
Theory: connection between sequencing effort (number of fragments) and coverage
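The connection between sequencing effort and coverage mentioned above can be made concrete with the usual idealised shotgun model: N fragments of length L from a genome of length G give a mean coverage c = N·L/G, and if fragment starts are uniformly random the expected fraction of the genome covered at least once is about 1 − e^(−c). The sketch below assumes exactly this simplified model (no repeats, no sequencing bias); the function names are illustrative only.

```python
import math


def coverage(n_fragments, fragment_len, genome_len):
    """Mean coverage c = N * L / G."""
    return n_fragments * fragment_len / genome_len


def fraction_covered(c):
    """Expected fraction of bases covered at least once (Poisson approximation)."""
    return 1.0 - math.exp(-c)


def fragments_needed(target_coverage, fragment_len, genome_len):
    """Number of fragments required to reach a given mean coverage."""
    return math.ceil(target_coverage * genome_len / fragment_len)


c = coverage(1000, 100, 10000)          # 1000 fragments of 100 bp, 10 kb genome
covered_at_1x = fraction_covered(1.0)   # roughly 63% of the genome
```

The model shows why 1x coverage is far from sufficient: more than a third of the genome is expected to remain unsequenced.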
Genome annotation
Sequence signals (donor, acceptor, etc.)
Sequence logos; information content
Accuracy measurements of gene predictions
What other elements, besides protein coding genes, belong to the 'annotation' of a genome?
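For the sequence logo topic above, the information content of one alignment column is commonly computed as the maximum possible entropy (2 bits for DNA) minus the observed entropy of the base frequencies. The sketch below assumes a DNA column containing only A, C, G and T and leaves out the small-sample correction that some logo tools apply; `column_information` is a name chosen for this example.

```python
import math


def column_information(column, alphabet="ACGT"):
    """Information content (in bits) of one alignment column.

    `column` is a string with one base per aligned sequence; it is assumed
    to contain only characters from `alphabet`.
    """
    n = len(column)
    entropy = 0.0
    for base in alphabet:
        freq = column.count(base) / n
        if freq > 0:
            entropy -= freq * math.log2(freq)
    # Maximum entropy is log2(alphabet size), i.e. 2 bits for DNA.
    return math.log2(len(alphabet)) - entropy


conserved = column_information("AAAA")   # fully conserved position
uniform = column_information("ACGT")     # all four bases equally frequent
```

In a logo, the total height of the letter stack at a position equals this value, so conserved positions such as splice donor and acceptor sites stand out.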
Metagenomics
Concept and scope
What kind of data are produced and analyzed
How to assign genomes to 'species'. Phylogeny mapping (placing a sequence into a
phylogeny).