EXAM2015

Exam Questions
1) Why do we use divergent crosses for QTL analyses, i.e. crosses between
breeds or lines that are very different for our trait of interest? (2 points)
2) We have identified a QTL affecting aggression in dogs. We have
sequenced a region of 1 Mb underlying the QTL in 10 aggressive and 10
non-aggressive dogs. Sequencing of this region has identified hundreds of
SNPs that differ between the QTL genotypes. How can we prioritize the
SNPs? (4 points)
3) Describe briefly the steps to perform a genome-wide association analysis
with PLINK (4p).
4) Is the consensus important in secondary structure prediction? Explain. (2
points)
5) Analyze the following protein sequence. What is its function? Do you
have an idea of its structure? Give the evidence that you find (patterns,
profiles, structure predictions…)
>prot_A
YLSNHNYVHRDLAARNILVNQNLCCKVSDFGLTRLLDDFDGTYETQGGKIPIR
WTAPEAIAHRIFTTASDVWSFGIVMWE
VLSFGDKPYGEMSNQEVMKSIEDGYRLPPPVDCPAPLYELMKNCWAYDRARR
PHFQKLQAHLEQLLANPHSLRTIANFD
(4 points)
6) What does the '-r' option mean in the cp (copy) command?
a- Which other command also uses this option?
(1pt)
b- Explain what the following command does.
grep -l ^Bio > file.txt
(1 pt)
7) Write commands that do the following:
a. Prints the first 5 lines of the file long.seq on the screen
(1 pt)
b. Does a case-insensitive search for the string "length" in all files in the
current directory.
(1 pt)
c. Puts the first 7 lines of the file long.seq into a file called first-and-last.txt
(1 pt)
d. Puts the last 7 lines of the file long.seq into a file called first-and-last.txt (1 pt)
8) After performing a sequencing run using an Illumina NGS instrument, you
decide to assess the quality of the generated data using FastQC. This is the
resulting plot for “quality scores across all bases”:
You need the read data to have as few errors as possible for a downstream
analysis.
What is the problem with the data as it is, and what would you do to fix this?
(2p)
9) You have been tasked with mapping a set of NGS reads generated from
genomic DNA against a reference sequence.
The read data consists of a large number of entries like this:
a) What is the file format of the read data? (1 p)
The reference sequence was from a different individual of the same species as
the sequenced individual. Part of the mapping was visualized in Tablet:
b) What term is commonly used for the total number of blue lines (i.e. reads,
marker 1 in the figure) that are stacked under a given position in the reference
sequence? (1 p)
c) Suggest explanations for the phenomena observed at markers 2 and 3. (2 p)
10) Your PI wants you to sequence the genome of the parasite Babesia microti. Do a
quick literature search, establish the genome size and familiarise yourself with
the layout of the genome. Answer these four questions (0.5p each):
1) What is the genome size estimated to be?
2) How is the genome arranged: number of chromosomes, number of genes,
average gene length and G/C content?
3) Has the genome been sequenced previously? If so, what is your proposed
methodology for the sequencing experiment?
4) Given your chosen methodology, briefly describe the initial pipeline for
analysing the data (expect raw FASTQ data from the sequencing centre).
11) Study the picture below. It is an output graph from PRINSEQ, a sequence data
QC software suite.
Two datasets are shown. The input data should resemble each other (same
technology and same library preparation). What is a possible cause of
the difference between the datasets? (1.5p)
12) Which type of job usually passes through the job queue on a cluster the fastest,
and why?
a) a job booked for 2 days on a whole node. (1P)
b) a job booked for 5 hours on a single core. (1P)
13) Describe the concept and purpose of the CRAM format. (2P)
14) Describe ways in which you can improve the statistical detection of differentially
expressed (DE) genes in RNAseq data. What is the most important thing? What
can you do when planning the experiment? What should you take into account
when choosing the DE analysis algorithm? (4P)
15) Use the following chromosomal sequence to answer the questions
below (explain how you arrived at your answers):
a) Determine the species this sequence comes from (1P)
b) Translate the coding part (note: think exons and introns) (3P)
c) Is there any known disease, in any species, that is connected to this gene?
(2P)
>chromosomal DNA
GGCACTCTTCCCACCTAGAAGCGGCTCCTCGCGCTCCTTCTGGAACCTCTGTCAGGTT
CGGCCTCCTCGCCTCCACTCCAGCCTCCACCATGTCCATCAGGGTGACCCAGAAGTCC
TACAAGATGTCCACCTCCAGCCCCCGGGCCTTCAGCAGCCGCTCCTACACGAGCGGGC
CCAGCTCCCGCATCAGCTCCTCCGCCTTCTCCCGGGTGGGCAGCAGCAGCGGCAGCTT
CCGGGGTGGCCTGAACAGCAGCATGAGTGTGGTCGGGGGCTACGGCGGGCCCGGGGT
CGTGGGGAGCATCACGGCCGTCTCAGTGAACCAGAGCCTGCTGAACCCCCTGAAGCTG
GAGGTGGACCCCAACATCCAGGCGGTGCGCACCCAGGAGAAGGAGCAGATCAAGAGC
CTCAACAACAAGTTTGCCTCCTTCATCGACAAGGTGAGCCCCCCACCCTCCCCCGCGG
GGCGGGCAGTGCCTGGGGCTGGCGAGGGGCTCCGCCTGTGTCTTGGTGGCC