Prof. Kamakaka's Lecture 14 Notes

Studying differences/similarities in Individuals

Methods used to study differences between individuals

RFLP

SNP

DNA Repeats

Genetic polymorphism

•Genetic Polymorphism: A difference in DNA sequence among individuals, groups, or populations.

Genetic mutations are a kind of genetic polymorphism.

Genetic Variation

Single nucleotide

Polymorphism

(point mutation)

Repeat heterogeneity

Repeats and DNA fingerprint

Variation between people can be a small DNA change, such as a single nucleotide polymorphism (SNP), in a target site.

RFLPs and point mutations are proof of variation at the DNA level.

Satellite sequences: a short sequence of DNA repeated many times.

Interspersed tandem

Chr1

Chr2

Repeats

Micro-satellite

Extremely small repeats (2-10 bp), e.g. GCGCGCGC or AGCAGCAGCAGC

Mini-satellite- larger repeats (20-100 bp long)

Repeats can be dispersed or in tandem.

[Figure: EcoRI (E) restriction maps of Chr1 and Chr2 with repeats; fragment sizes labeled 2, 5, 6, 3, 1 and 0.5]

Mini Satellite Repeats and Blots

Mini Satellite sequences: a short sequence (20-100 bp long) of DNA repeated many times

[Figure: EcoRI (E) restriction maps of Chr1 and Chr2; fragment sizes labeled 2, 5, 3, 1, 6 and 0.5; the Southern blot shows bands at 5, 3 and 1]

Take Genomic DNA

Digest with EcoRI

Probe southern blot with repeat probe

Repeat expansion/contraction

Tandem repeats expand and contract during recombination.

Mistakes in pairing lead to changes in tandem repeat numbers

These can be detected by Southern blotting because, as the number of repeats at a specific site expands, the restriction fragment at that site increases in size

An allele of a mini-satellite varies by the number of repeats

One repeat to many repeats- (varying in length from 0.5 to 20 kb)

Chr1 Individual 1

2

E E

Chr1 Individual 2

3

E E

5

3

1

There are on average between 2 and 10 alleles (repeat numbers) per mini-satellite locus

5

3

1

Homozygous/heterozygous?

E

E

E

E

Chr1

Chr1

Micro-satellite and PCR

Microsatellite repeat expansion and contraction is investigated using PCR and gels instead of gels and southern blots

AGCGTCA GCGCGCGC TTATTGA

TCGCAGT CGCGCGCG AATAACT

Primers AGCGTCA and AATAACT: 22 bp PCR fragment

AGCGTCA GCGCGCGCGCGCGC TTATTGA

TCGCAGT CGCGCGCGCGCGCG AATAACT

Primers AGCGTCA and AATAACT: 28 bp PCR fragment
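The link between repeat number and PCR product size can be sketched in a few lines of Python. This is only an illustration: the function name pcr_product_length is hypothetical, and it assumes the primers sit directly next to the repeat tract, as in the example above.

```python
def pcr_product_length(left_flank, repeat_unit, n_repeats, right_flank):
    """Length in bp of the amplified fragment: both flanks plus the repeat tract."""
    return len(left_flank) + len(repeat_unit) * n_repeats + len(right_flank)

# Flanking sequences copied from the example above; "GC" is the repeat unit.
short_allele = pcr_product_length("AGCGTCA", "GC", 4, "TTATTGA")
long_allele = pcr_product_length("AGCGTCA", "GC", 7, "TTATTGA")
print(short_allele, long_allele)  # 22 28
```

Each extra GC repeat adds 2 bp to the product, which is why the two alleles separate on a gel.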

DNA fingerprint

1 2 3 4

1

2

The use of microsatellite analysis in genetic profiling. In this example, 2 totally different microsatellites (1) and (2) located on the short arm of chromosome 6 have been amplified by the polymerase chain reaction (PCR).

The PCR products are labeled with a blue or green fluorescent marker and run in a gel each lane showing the genetic profile of a different individual .

Each individual has a different genetic profile because each person has a different number of microsatellite length repeats, the number of repeats giving rise to bands of different sizes after PCR.

              Locus1   Locus2
Individual1   2,4      6,6
Individual2   2,6      1,4
Individual3   5,6      4,5

FBI and Microsatellite

The FBI uses a set of 13 different microsatellite markers in forensic analysis.

13 sets of specific PCR primers are used to determine the allele present in the test sample for each marker.

The marker used, the number of alleles at each marker and the probability of obtaining a random match for a marker is shown.

[Figure: locations of the 13 markers, labeled A to M]

How often would you expect an individual to be mis-identified if all 13 markers are analyzed?

One row per locus:

No. of alleles   Probability of random match
11               0.112
19               0.036
7                0.081
7                0.195
10               0.062
10               0.075
10               0.158
11               0.065
10               0.067
8                0.085
8                0.089
15               0.028
20               0.039

P = 0.112 x 0.036 x 0.081 x 0.195 x ... = 1.7 x 10^-15
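The combined probability can be checked with a short calculation. The sketch below assumes, as the slide's product does, that the 13 markers are inherited independently; the variable names are illustrative only.

```python
import math

# Per-marker random-match probabilities, in the order listed in the table.
match_probabilities = [
    0.112, 0.036, 0.081, 0.195, 0.062, 0.075, 0.158,
    0.065, 0.067, 0.085, 0.089, 0.028, 0.039,
]

# With independent markers, the chance of a random match at all of them
# is the product of the individual probabilities.
combined = math.prod(match_probabilities)
print(f"{combined:.1e}")  # 1.7e-15
```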

DNA finger printing

Variation between people can be a small DNA change, such as a single nucleotide polymorphism (SNP), in a target site.

RFLPs and SNPs are proof of variation at the DNA level.

Satellite sequences: a short sequence of DNA repeated many times.

Microsatellites are 2-4 bp repeats arranged in tandem, repeated 15-100 times in a row

Minisatellites are 20-100 bp repeats arranged in tandem (0.5 to 20 kb long)

Class   Size        No. of loci    Method
Micro   ~200 bp     200,000        PCR
Mini    0.2-20 kb   30,000         Southern blot
SNP     1 bp        100 million    PCR/microarray

The innocence project

Prop 69 2004

Commits serious crime

Not in database

Commits minor crime

DNA in database DNA from crime scene has

Partial match. Focus on family.

Jeffreys

In 1986, the Enderby murder case, a case local to Leicester, saw the first use of DNA profiling in criminology. Two young girls had been raped and murdered, one in 1983 and one in 1986. After the second murder, a young man was arrested and gave a full confession. The police thought he must have committed the first murder as well, so they asked Professor Jeffreys to analyse forensic samples – semen from the first and second victims, samples from the victims, and blood from the prime suspect.

"The police were right – both girls had been raped by the same man," says Professor Jeffreys. "But it wasn't the man who had confessed. At first I thought there was something wrong with the technology, but we and the Home Office's Forensic Science Service did additional testing and it was clear that it was not his semen. He had given a false confession and was released – so the first time DNA profiling was used in criminology, it was to prove innocence."

Blood samples from more than 5000 men in the local community were collected. The murderer nearly got away with it – sending a proxy in to give a blood sample – but eventually he was apprehended and got two life sentences.

Individuals

Methods used to study differences between individuals

DNA Repeats

RFLP

SNP

Analysis of globin gene (deletion)

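[Figure: map of the globin gene (Exon1, Exon2) with MstII (M) and HindIII (H) sites, a 0.3 kb deletion (D), and fragments labeled 0.2 kb and 1.1 kb; the MstII and HindIII Southern blot lanes show bands labeled 1.1, 1, 0.2, 1.35 and 1.05]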

Very small Deletion (of a restriction site)

1kb

E

2kb

E

3kb

E

4kb

E

5kb

GeneR

3kb H

GeneC

8kb H

GeneX

E

4.5kb

GeneA

9kb

E

0.5kb

WT

EcoRI EcoRI

Deletion

RFLP analysis

RFLP= Restriction fragment length polymorphism

Refers to variation (presence or absence) in restriction sites between individuals, caused by mutations in those restriction sites

These are extremely useful and valuable for geneticists (and lawyers)

On average two individuals (humans) vary at 1 bp in every 300-1000 bp

The human genome is 3x10^9 bp

This means that they will differ in more than 3 million bp!
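The arithmetic behind that figure can be written out as a quick check. This is a back-of-the-envelope sketch using only the numbers quoted above; the variable names are illustrative.

```python
GENOME_SIZE_BP = 3_000_000_000  # 3 x 10^9 bp, as stated above

# One difference per 1000 bp (low estimate) or per 300 bp (high estimate).
low_estimate = GENOME_SIZE_BP // 1000
high_estimate = GENOME_SIZE_BP // 300
print(low_estimate, high_estimate)  # 3000000 10000000
```

So two people are expected to differ at somewhere between 3 and 10 million positions.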

By chance these changes will create or destroy the recognition sites for restriction enzymes

RFLP

Let's generate an EcoRI map for the region in one individual

GAATTC

3kb

GAATTC

4kb

GAATTC

The same region of a second individual may appear as

7kb

GAATTC GAGTTC GAATTC

1 2

Normal GAATTC

Mutant GAGTTC

EcoRI

RFLP

The internal EcoRI site is missing in the second individual

For X1 the sequence at this site is

GAATTC

CTTAAG

This is the sequence recognized by EcoRI

The equivalent site in the X2 individual is different

GAGTTC

CTCAAG

This sequence IS NOT recognized by EcoRI and is therefore not cut
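The effect of losing the site can be illustrated with a small digest simulation. This is a minimal sketch, not a full restriction-mapping tool: the function digest is hypothetical, the two toy sequences are made up, and only the central site (GAATTC versus GAGTTC) comes from the notes.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a linear DNA string at every occurrence of `site`.

    EcoRI cuts G^AATTC, i.e. one base into its site (cut_offset=1).
    Returns the fragment lengths in order along the molecule.
    """
    cut_points = []
    start = sequence.find(site)
    while start != -1:
        cut_points.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    edges = [0] + cut_points + [len(sequence)]
    return [right - left for left, right in zip(edges, edges[1:])]

# Toy sequences: identical flanks, only the central site differs.
x1 = "TTTT" + "GAATTC" + "CCCCCCCC"
x2 = "TTTT" + "GAGTTC" + "CCCCCCCC"
print(digest(x1))  # [5, 13]
print(digest(x2))  # [18]
```

The single A-to-G change removes the cut, so two fragments in X1 become one longer fragment in X2.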

Now if we examine a large number of humans at this site we may find that 25% possess the EcoRI site and 75% lack this site.

We can say that a restriction fragment length polymorphism exists in this region

These polymorphisms usually do not have any phenotypic consequences

They are typically silent mutations that do not alter the protein sequence, because of redundancy in codon usage or because they lie in introns or non-genic regions

RFLP

RFLPs are identified by Southern blots

In this region of the human X chromosome, two forms of the X chromosome (X1 and X2) are segregating in the population.

B R

4 5

R

1

R B

3 6 3.5

R

2

X1

Digest DNA with

EcoRI or BamHI and probe with

Probe1/ probe2

What do we get?

B R

4 8

1

R B

6 3.5

2

R

X2

RFLP in individuals

If we used probe1 for Southern blots with a BamHI digest what would be the results for X1/X1, X1/X2 and X2/X2 individuals?

Probe1 BamHI: X1/X1 = 18; X2/X2 = 18; X1/X2 = 18

If we used probe1 for Southern blots with an EcoRI digest what would be the results for X1/X1, X1/X2 and X2/X2 individuals?

Probe1 EcoRI: X1/X1 = 5, 3; X2/X2 = 8; X1/X2 = 8, 5, 3

Probe2 EcoRI: 9.5

B

4

R

5

R

1

3

R

6

B

3.5

R

2

X1

9.5

B

4

R R

1

8

R

6

B

3.5

R

2

X2

8

9.5

RFLP

RFLPs are found by trial and error

They require an appropriate probe and appropriate enzyme

They are very valuable because they can be used just like any other genetic marker to map genes

They are employed in recombination analysis (mapping) in the same way as conventional morphological allele markers are employed

The presence of a specific restriction site at a specific locus on one chromosome and its absence at a specific locus on another chromosome can be viewed as two allelic forms of a gene

The phenotype in this case is a Southern blot rather than white eye/red eye

4

R

6

1

R

2 2

R R

4

R

8

1

2

R R

X1

X2

1

R

4

1

R

4

R

5

2

R

R

2

R

3

2

R

4

R

6

1

R

2 2

R R

4

R

8

1

2

R R

X1

X2

1

R

4

1

R

4

R

5

2

R

R

2

R

3

2

R

Probe1

6+2

8

X1/X1 individual

X2/X2 individual

Probe2

5

3

Using RFLPs to map human disease genes

8

6

EcoRI

Probe1

5

3

EcoRI

Probe2

9

1

EcoRI

Probe3

1

3 2

Which RFLP pattern segregates specifically with all of the diseased individuals BUT NOT WITH NORMAL INDIVIDUALS?

1, 2 or 3

Which band segregates with the phenotype

Top or bottom?

Using DNA probes for different RFLPs you can screen individuals for an RFLP pattern that shows co-inheritance with the disease

Conclusion: the actual mutation resides at or near RFLP1 bottom band

RFLPs and Mapping unknown Genes

Let's review standard mapping:

To map any two genes with respect to one another, the individual used in the cross must be heterozygous at both loci.

Genes W and B are responsible for wing and bristle development

Centromere

W B

Telomere

To find the map distance between these two genes we need allelic variants at each locus

W=wings w= No wings

B=Bristles b= no bristles

To measure genetic distance between these two genes, the double heterozygote is crossed to the double homozygote

Mapping

To map a gene with respect to another, you perform crosses and measure recombination frequency between the two genes.

Gene W and B are responsible for wing and bristle development

Centromere

W B

Telomere

To find the map distance between these two genes we need

ALLELIC variants at each locus

W=wings w= No wings

B=Bristles b= no bristles

To measure distance between these two genes, the double heterozygote is crossed to the double homozygote

WB/wb

Female

Wings

Bristles

X wb/wb

Males

No wings

No bristles

----W--------B---

----w--------b---

----w--------b---

----w--------b---

Mapping

Female gamete   Genotype (male gamete wb)   Phenotype                Number
WB              WB/wb                       Wings, bristles          51
wb              wb/wb                       No wings, no bristles    43
Wb              Wb/wb                       Wings, no bristles       3
wB              wB/wb                       No wings, bristles       4

Map distance = (# recombinants / total progeny) x 100

(7/101) x 100 = 6.9, i.e. about 7 M.U.
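The same calculation as a small Python sketch, using the progeny counts from the test cross above. The function name map_distance is illustrative.

```python
def map_distance(parental_counts, recombinant_counts):
    """Recombination frequency expressed in map units (percent recombinants)."""
    recombinants = sum(recombinant_counts)
    total = recombinants + sum(parental_counts)
    return 100 * recombinants / total

# 51 and 43 parental progeny, 3 and 4 recombinant progeny.
distance = map_distance(parental_counts=[51, 43], recombinant_counts=[3, 4])
print(round(distance, 1))  # 6.9
```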

Mapping

Both the normal and mutant alleles of gene B (B and b) are sequenced and we find

B

Telomere

B

E

2

E

3

E

GAATTC b

AAATTC

E 5 E

The mutation disrupts the sequence and alters an EcoRI site!

If DNA is isolated from B/B, B/b and b/b individuals, cut with EcoRI and probed in a Southern blot, the pattern that we will obtain is

B/B Bristle B/b Bristle b/b No bristle

Mapping

To find the map distance between two genes we need

ALLELIC variants at each locus

Therefore in the cross (WB/wb x wb/wb), the genotype at the B locus can be distinguished either by the presence and absence of bristles OR by a Southern blot

WB/wb female: wings, bristles; Southern blot: 5 and 2 kb bands

x

wb/wb male: no wings, no bristles; Southern blot: 5 kb band

You can use RFLPs instead of genes as markers along a chromosome

Just like Genes, RFLPs mark specific positions on chromosomes and can be used for mapping.

Mapping

              Female gamete   Genotype (male gamete wb)   Phenotype             Number
Parental      WB              WB/wb                       Wings, 5kb 2kb        51
Parental      wb              wb/wb                       No wings, 5kb         43
Recombinant   Wb              Wb/wb                       Wings, 5kb            3
Recombinant   wB              wB/wb                       No wings, 5kb 2kb     4

Mapping

To find the map distance between genes, multiple alleles are required.

We know the distance between W and B by the classical method because multiple alleles exist at each locus (W & w, B & b). It is 7MU.

We know the distance between B and R by the classical method: it is 20MU.

Centromere

W

7MU

B

20MU

C R Telomere

Now suppose you find a new gene C .

You could map this gene with respect to Genes W, B and R using classical methods.

However, what if it is difficult to study the function of this new gene (the phenotype is difficult to see with the naked eye)

If the researcher identifies an RFLP in this gene you can map the gene mutation by simply following the RFLP.

Mapping

C

E 8 E c

E

6

E

2

E

With this RFLP, the C gene can be mapped with respect to other genes in any cross.

Genotype/phenotype relationships for the W and C genes

WW and Ww = Red eyes ww = white eyes

CC = 8kb band

C/c = 8, 6, 2 kb bands cc = 6, 2 kb bands

To determine map distance between W and C, the following cross is performed

W C / w c   x   w c / w c

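Mapping

[Figure: chromosome map with W, B (7MU from W), C and R (20MU from B); cross W C(8) / w c(6,2) x w c(6,2) / w c(6,2)]

              Female gamete   Genotype (male gamete wc)   Phenotype        Number
Parental      WC              WC/wc                       Red/8,6,2        45
Parental      wc              wc/wc                       white/6,2        45
Recombinant   Wc              Wc/wc                       Red/6,2          5
Recombinant   wC              wC/wc                       white/8,6,2      5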

Map distance between W and C is 10MU
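The scoring of this cross can be sketched in code: each progeny class is scored by eye colour (the W locus) and by whether the 8 kb band is present (the C allele). The data layout and the function is_recombinant are illustrative assumptions; the counts are those given for the cross above.

```python
# Each progeny class: (eye colour, EcoRI bands in kb, number of progeny).
progeny = [
    ("red", (8, 6, 2), 45),    # WC/wc, parental
    ("white", (6, 2), 45),     # wc/wc, parental
    ("red", (6, 2), 5),        # Wc/wc, recombinant
    ("white", (8, 6, 2), 5),   # wC/wc, recombinant
]

def is_recombinant(eye_colour, bands):
    """W (red eyes) starts out linked to C (the 8 kb band) in this cross.

    A progeny is recombinant when exactly one of the two markers is present.
    """
    has_W = eye_colour == "red"
    has_C = 8 in bands
    return has_W != has_C

recombinants = sum(n for colour, bands, n in progeny if is_recombinant(colour, bands))
total = sum(n for _, _, n in progeny)
print(100 * recombinants / total)  # 10.0
```

The RFLP band pattern plays exactly the role that a visible phenotype plays in the W and B cross.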

Mapping

Prior to RFLP analysis, only a few classical markers existed in humans (approximately 200)

Now over 7000 RFLPs have been mapped in the human genome.

Newly inherited disorders are now mapped by determining whether they are linked to previously identified RFLPs

Centromere

W or w

7MU

B or b

20MU

R or r

Telomere

Centromere RFLP1

Probe1

7kb or

4kb

RFLP2

Probe2

1kb or

2kb

RFLP3

Probe3

3kb or

9kb

Telomere

Individuals

Methods used to study differences between individuals

RFLP

SNP

DNA Repeats

Genetic polymorphism

Genetic Polymorphism: A difference in DNA sequence among individuals, groups, or populations.

Genetic Mutation: A change in the nucleotide sequence of a DNA molecule.

Genetic mutations are a subset of genetic polymorphism

Genetic Variation

Single nucleotide

Polymorphism

(point mutation)

Repeat heterogeneity

SNP

A Single Nucleotide Polymorphism is a source of variation in a genome.

A SNP ("snip") is a single base change in DNA.

SNPs are the simplest form and most common source of genetic polymorphism in the human genome (90% of all human DNA polymorphisms).

There are two types of nucleotide base substitutions resulting in SNPs:

Transition: substitution between purines (A, G) or between pyrimidines (C, T). Transitions constitute two thirds of all SNPs.

Transversion: substitution between a purine and a pyrimidine.

While a single base can change to any of the other three bases, most SNPs have only two alleles.
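The two substitution classes can be told apart with a few lines of code. This is a minimal sketch; the function name substitution_type is illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(ref_base, alt_base):
    """Classify a single-base change as a transition or a transversion."""
    if ref_base == alt_base:
        raise ValueError("the two bases are identical, so there is no substitution")
    pair = {ref_base, alt_base}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"

print(substitution_type("A", "G"))  # transition
print(substitution_type("C", "T"))  # transition
print(substitution_type("A", "T"))  # transversion
```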

SNPs-

Single Nucleotide Polymorphisms

-----------------------

ACGGCTAA

-----------------------

ATGGCTAA

Instead of using restriction enzymes, these are found by direct sequencing/PCR

They are extremely useful for mapping

                      Markers
Classical Mendelian   ~200
RFLPs                 7000
SNPs                  1.4x10^6

SNPs occur every 300-1000 bp along the 3 billion bp long human genome

Many SNPs have no effect on cell function

Note: RFLPs are a subclass of SNPs

SNPs

Humans are genetically >99 per cent identical: it is the tiny percentage that is different

Much of our genetic variation is caused by single-nucleotide differences in our DNA : these are called single nucleotide polymorphisms, or SNPs.

As a result, each of us has a unique genotype that typically differs in about three million nucleotides from every other person.

SNPs occur about once every 300-1000 base pairs in the genome, and the frequency of a particular polymorphism tends to remain stable in the population.

Because only about 3 to 5 percent of a person's DNA sequence codes for the production of proteins, most SNPs are found outside of "coding sequences".

How did SNPs arise?

F2a----AC G GACTGAC----CCTTACGTTG----TACTACGCAT----

|

F1 ----ACTGACTGAC----CCTTACGTTG----TACTACGCAT----

P ----ACTGACTGAC----CCTTACGTTG----TACTACGCAT----

|

F1 ----ACTGACTGAC----CCTTACGTTG----TACTA G GCAT----

| |

F2b----ACTGACTGAC----CC A TACGTTG----TACTA G GCAT----

Compare the two F2 progeny

Haplotype1 (F2a) = SNP allele1

----AC G GACTGAC----CCTTACGTTG----TACTACGCAT----

Haplotype2 (F2b) = SNP allele2

----ACTGACTGAC----CC A TACGTTG----TACTA G GCAT----
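Comparing the two haplotypes position by position picks out the SNPs. The sketch below joins the three sequence blocks of each F2 haplotype (dropping the gaps shown as dashes) and lists every mismatch; the function name snp_positions is illustrative.

```python
def snp_positions(haplotype_a, haplotype_b):
    """Return (position, base_a, base_b) for every mismatch; positions are 1-based."""
    if len(haplotype_a) != len(haplotype_b):
        raise ValueError("haplotypes must be aligned and of equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(haplotype_a, haplotype_b), start=1)
        if a != b
    ]

# The three sequence blocks of each F2 haplotype, joined with the gaps removed.
f2a = "ACGGACTGAC" + "CCTTACGTTG" + "TACTACGCAT"
f2b = "ACTGACTGAC" + "CCATACGTTG" + "TACTAGGCAT"
print(snp_positions(f2a, f2b))
# [(3, 'G', 'T'), (13, 'T', 'A'), (26, 'C', 'G')]
```

The three differences correspond to the three mutations that arose independently along the two lineages.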

Each of the 10^13 cells in the human body receives approximately a thousand DNA lesions per day (Lindahl and Barnes 2000)

When these mutations are not repaired they are fixed in the genome of that particular cell

If a mutation is fixed in germ cells that go on to be fertilized and form an embryo they will be propagated to progeny

SNPs, RFLPs, point mutations

GAATTC

GAATTC

GAATTC GAATTC GAATTC

GAATTC

SNP

GAGTTC

RFLP

SNP

Pt mut

SNP

GAATTC GAATTC GACTTC

RFLP

Pt mut

SNP

Coding Region SNPs

Types of coding region SNPs

Synonymous: the substitution causes no amino acid change to the protein it produces. This is also called a silent mutation.

Non-Synonymous: the substitution results in an alteration of the encoded amino acid. A missense mutation changes the protein by causing a change of codon. A nonsense mutation results in a misplaced termination.

More than half of all coding sequence SNPs result in non-synonymous codon changes.
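These categories can be made concrete by translating the codon before and after the change. The sketch below builds the standard genetic code from its usual compact string form; the function name classify_coding_snp and the example codons are chosen for illustration.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop codon.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_coding_snp(ref_codon, alt_codon):
    """Compare the amino acids encoded before and after a single-base change."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"

print(classify_coding_snp("GAG", "GAA"))  # synonymous (Glu -> Glu)
print(classify_coding_snp("GAG", "GTG"))  # missense (Glu -> Val)
print(classify_coding_snp("GAG", "TAG"))  # nonsense (Glu -> stop)
```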

Alzheimer’s SNP

Occasionally, a SNP may actually cause a disease.

SNPs within a coding sequence are of particular interest to researchers because they are more likely to alter the biological function of a protein.

One of the genes associated with Alzheimer's, apolipoprotein E or ApoE, is a good example of how SNPs affect disease development.

This gene contains two SNPs that result in three possible alleles for this gene: E2, E3, and E4.

Each allele differs by one DNA base, and the protein product of each allele differs by one amino acid.

Each individual inherits one maternal copy of ApoE and one paternal copy of ApoE.

Research has shown that an individual who inherits at least one E4 allele will have a greater chance of getting Alzheimer's.

Apparently, the change of one amino acid in the E4 protein alters its structure and function enough to make disease development more likely. Inheriting the E2 allele, on the other hand, seems to indicate that an individual is less likely to develop Alzheimer's.

Intergenic SNPs

Researchers have found that most SNPs are not responsible for a disease state because they are intergenic SNPs.

Instead, they serve as biological markers for pinpointing a disease on the human genome map , because they are usually located near a gene found to be associated with a certain disease.

Scientists have long known that diseases caused by single genes and inherited according to the laws of Mendel are actually rare.

Most common diseases, like diabetes, are caused by multiple genes.

Finding all of these genes is a difficult task.

Recently, there has been focus on the idea that all of the genes involved can be traced by using SNPs.

By comparing the SNP patterns in affected and non-affected individuals —patients with diabetes and healthy controls, for example—scientists can catalog ALL of the DNA sequence variations in affected Vs unaffected individuals to identify mutations that underlie susceptibility for diabetes

How do you identify SNPs in individuals- PCR

PCR is quick, sensitive and robust, and is useful when dealing with small amounts of DNA, or where rapid and high-throughput screening is required.

PCR:

* The polymerase chain reaction involves many rounds of DNA synthesis.

* All DNA synthesis reactions require a template, a primer, an enzyme and a supply of nucleotides. In the standard PCR, two primers flank the target for amplification and face inwards. DNA synthesis therefore proceeds across the region between the primers.

PCR results in exponential amplification of the target sequence.

How does it work?

The reaction begins by heating up the DNA template to 94 °C, which splits (denatures) the double strands into single strands. The sample is then cooled to about 54 °C, which allows the primers to stick (anneal) to the template. When the sample is heated up again to 72 °C, the polymerase enzyme uses the primers as starting points to copy the single strands.

Special DNA polymerase that can withstand high temperatures is used

The cycle of denaturation, primer annealing and primer extension is repeated over and over again (using a machine that automates the heating and cooling of the samples), each time producing more copies of the original template.

During repeated rounds of these reactions, the number of newly synthesized DNA strands increases exponentially . After 25 to 30 cycles, the initial template DNA will have been copied several million-fold.

Doubling occurs in every cycle of the PCR leading to exponential amplification of the target.

After 25 cycles there are over 8 000 000 copies!
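The exponential growth is easy to tabulate. The sketch below assumes an idealized reaction in which every molecule is copied in every cycle; the efficiency parameter is an illustrative addition for reactions that fall short of perfect doubling.

```python
def copies_after(cycles, starting_copies=1, efficiency=1.0):
    """Copies of the target after a number of PCR cycles.

    efficiency=1.0 means perfect doubling every cycle; real reactions
    usually fall somewhat short of that.
    """
    return starting_copies * (1 + efficiency) ** cycles

for cycles in (10, 25, 30):
    print(cycles, f"{copies_after(cycles):,.0f}")
# 10 1,024
# 25 33,554,432
# 30 1,073,741,824
```

Under perfect doubling a single template passes the million-fold mark after 20 cycles and exceeds 10^9 copies by cycle 30.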

The PCR is useful where the amount of starting material is limited or poorly preserved.

Examples of PCR applications include cloning DNA from single cells, prenatal screening for mutations in early human embryos, and the forensic analysis of DNA sequences in samples such as fingerprints, blood stains, semen or hairs.

The PCR is also very useful where many samples have to be processed in parallel. For example, the large-scale analysis of single nucleotide polymorphisms involves PCR-based techniques

PCR

If a region of DNA has already been sequenced in one individual, the sequence information can be used to isolate and amplify that sequence from the DNA of other individuals in a population.

Individuals with mutations in p53 are at risk for colon cancer

To determine if an individual had such a mutation, prior to PCR one would have to clone the gene from the individual of interest (construct a genomic library, screen the library, isolate the clone and sequence the gene).

With PCR, the gene can be isolated directly from DNA isolated from that individual.

No lengthy cloning procedure necessary

Only small amounts of genomic DNA required

30 rounds of amplification can give you >10^9 copies of a gene

PCR

Heat and add primers

Heat resistant

DNA polymerase

Heat and add primers + DNA pol

5’AAAGATC GGGGGGGGGGGGGGG TCGATCTA3’

3’TTTCTAG CCCCCCCCCCCCCCC AGCTAGAT5’

PRIMER1  5’AAAGATC3’

3’AGCTAGAT5’  PRIMER2

5’AAAGATC GGGGGGGGGGGGGGG TCGATCTA3’

3’AGCTAGAT5’

5’ AAAGATC3 ’

3’TTTCTAG CCCCCCCCCCCCCCC AGCTAGAT5’

5’AAAGATC GGGGGGGGGGGGGGG TCGATCTA3’

3’TTTCTAGCCCCCCCCCCCCCCC AGCTAGAT 5’

5’ AAAGATC GGGGGGGGGGGGGGGTCGATCTA 3’

3’TTTCTAG CCCCCCCCCCCCCCC AGCTAGAT5’

How do you detect PCR?

Agarose Gels

Size of PCR product will depend upon location of PCR primers

SNPs and Primers- ASO hybridization

Individual 1

GACTCCTGAGGAGAAGTG

Individual 2

GACTCCTG T GGAGAAGTG

Raise Temperature Raise Temperature

DNA from individuals 1 and 2 is tested under CONDITIONS that only allow perfect matches of oligos to anneal to the genomic DNA.
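The logic of allele-specific oligonucleotide (ASO) hybridization can be mimicked by requiring an exact match between probe and target. This is a simplified sketch: the function and probe names are illustrative, and the probes are written in the same orientation as the target sequences for readability, whereas a real probe would be the complementary strand.

```python
def perfect_match(oligo, target):
    """True only if the oligo matches the target at every position (no mismatches)."""
    return len(oligo) == len(target) and all(o == t for o, t in zip(oligo, target))

individual_1 = "GACTCCTGAGGAGAAGTG"
individual_2 = "GACTCCTGTGGAGAAGTG"

# One allele-specific oligo per allele at the polymorphic position.
aso_probes = {"A allele": "GACTCCTGAGGAGAAGTG", "T allele": "GACTCCTGTGGAGAAGTG"}

for name, seq in (("individual 1", individual_1), ("individual 2", individual_2)):
    hits = [allele for allele, oligo in aso_probes.items() if perfect_match(oligo, seq)]
    print(name, hits)
# individual 1 ['A allele']
# individual 2 ['T allele']
```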

PCR allows only one SNP to be tested at a time

1 aAa

2 cCc

3

A

4

G aTa cGc A T

5

G

6

T

7

A

8

T

9

C

Ind1

G T G T G Ind2

Microarrays and SNPs

SNP1 A: GACTCCTGAGGAGAAGTG

SNP1 B: GACTCCTGTGGAGAAGTG

SNP2 A: GGGGGGGGGGGGGGGGGG

SNP2 B: GGGGGGGGCGGGGGGGGG

Design oligonucleotides complementary to each Polymorphism.

These oligos are arrayed on a slide

Each spot corresponds to a polymorphism

Isolate genomic DNA from individual

Label Genomic DNA and hybridize to array

Oligo probes on slide

GACTCCTGAGGAGAAGTG

SNP1

GACTCCTG T GGAGAAGTG

GGGGGGGGGGGGGGGGGG

SNP2

GGGGGGGG C GGGGGGGGG

[Figure: microarray slides hybridized with DNA from individuals 1-9. Each slide carries the two SNP1 probes (GACTCCTGAGGAGAAGTG and GACTCCTGTGGAGAAGTG) and the two SNP2 probes (GGGGGGGGGGGGGGGGGG and GGGGGGGGCGGGGGGGGG). Genotype labels shown: 1TT 2GG, 1AA, 1AT, 2CC, 2GG]

Genotype and Haplotype

In the most basic sense, a haplotype is a “haploid genotype”.

Haplotype: particular pattern of SNPs (or alleles) found on a single chromosome in a single individual.

The DNA sequence of any two people is 99 percent identical.

Sets of nearby SNPs on the same chromosome are inherited in blocks.

Therefore, while blocks may contain a large number of SNPs, a few SNPs are enough to uniquely identify the haplotype of that block.

The HapMap is a map of these specific SNPs.

SNPs that identify the haplotypes are called tag SNPs.

This makes genome scan approaches to finding regions with genes that affect diseases much more efficient and comprehensive.

Haplotyping: involves grouping individuals by haplotypes, or particular patterns of sequential SNPs, on a single chromosome.

There are thought to be a small number of haplotype patterns for each chromosome.

Microarrays or PCR are used to accomplish haplotyping.

Haplotype and SNPs

Each individual has a characteristic pattern of SNPs

SNPs occur every 300-1000 bp. There are over a million SNPs in each individual

When we generate a SNP map for an individual we DO NOT check every single SNP in that individual's DNA

SNPs are transmitted as blocks (bounded by recombination hot spots), so there is no point analyzing SNPs that are inherited together

1 aAa

2 cCc

3

A

4

G aGa cGc A T

5

G

6

T

7

A

8

T

9

C

Ind1

G T G T G Ind2

SNPs in red were not studied. Only the 9 black SNPs were studied

SNP mapping is used to narrow down the known physical location of mutations to a single gene.

The human genome sequence provided us with the list of many of the parts to make a human.

The HapMap provides us with indicators which we can focus on in looking for genes involved in common disease.

By using HapMap data to compare the SNP patterns of people affected by a disease with those of unaffected people, researchers can survey the whole genome and identify genetic contributions to common diseases more efficiently than has been possible without this genome-wide map of variation: the HapMap Project has simplified the search for gene variants.

Oligonucleotide chips contain thousands of short DNA sequences immobilised at different positions. Such chips can be used to discriminate between alternative bases at the site of a SNP.

Chips allow many SNPs to be analyzed in parallel.

Short DNA sequences on the chip represent all possible variations at a polymorphic site;

A labeled genomic DNA from an individual will only stick if there is an exact match . The base is identified by the location of the fluorescent signal.

A recessive disease pedigree

Mapping recessive disease genes with DNA markers

SNP markers are mapped evenly across the genome.

The markers are polymorphic.

We can tell, by looking at the SNP pattern of a particular grandchild, which grandparent contributed a certain part of its DNA.

If we knew that grandparent carried the disease, we could say that part of the DNA might be responsible for the disease.

1 2 3 4 5

4 different alleles at each locus

Position1 can be A or C or G or T

Position2 can be A or C or G or T

Position3 ………………..

6 7 8 9

SNPs in red were not studied. Only the 9 black SNPs were studied

Chromosomes of the four grandparents:

Grandparent 1

A-A-A-A-A-A-A-A-A

A-A-A-A-A-A-A-A-A

Grandparent 2

C-C-C-C-C-C-C-C-C

C-C-C-C-C-C-C-C-C

Grandparent 3

G-G-G-G-G-G-G-G-G

G-G-G-G-G-G-G-G-G

Grandparent 4

T-T-T-T-T-T-T-T-T

T-T-T-T-T-T-T-T-T

Mapping recessive disease genes with SNP markers

1 2 3 4 5 6 7 8 9

Grand-parent

1

A-A-A-A-A-A-A-A-A

A-A-A-A-A-A-A-A-A

2

C-C-C-C-C-C-C-C-C

C-C-C-C-C-C-C-C-C

3

G-G-G-G-G-G-G-G-G

G-G-G-G-G-G-G-G-G

4

T-T-T-T-T-T-T-T-T

T-T-T-T-T-T-T-T-T

Dad A-A-A-A-A-A-A-A-A

C-C-C-C-C-C-C-C-C

Mom

G-G-G-G-G-G-G-G-G

T-T-T-T-T-T-T-T-T

Offspring1

A-A-A-C-C-A-A-C-C

G-G-G-G-T-T-T-T-G

Offspring2

C-C-A-A-C-A-C-A-A

G-G-G-G-T-T-T-G-G

Offspring3

A-A-A-A-A-C-C-C-C

T-T-G-G-G-G-T-T-T

Offspring4

C-C-C-C-C-C-A-A-A

G-G-T-T-T-T-T-T-T

Grandparents 1 and 4 and offspring 1 and 4 have the disease

We would look at the markers and see that ONLY at position 7 do offspring 1 and 4 have the DNA from grandparents 1 and 4.

It is therefore likely that the disease gene will be somewhere near marker 7.
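The search for the linked marker can be written as a small scan over the nine positions. This is a sketch under the assumptions of the example: the haplotypes are transcribed from the pedigree above, each letter identifies the grandparent a segment came from (A = 1, C = 2, G = 3, T = 4), and the disease is recessive, so affected offspring must carry DNA from both affected grandparents at the disease locus.

```python
# Haplotype pairs at SNP positions 1-9 (paternal chromosome, maternal chromosome).
offspring = {
    1: ("AAACCAACC", "GGGGTTTTG"),
    2: ("CCAACACAA", "GGGGTTTGG"),
    3: ("AAAAACCCC", "TTGGGGTTT"),
    4: ("CCCCCCAAA", "GGTTTTTTT"),
}
affected = {1, 4}              # offspring with the disease
disease_origins = {"A", "T"}   # DNA from affected grandparents 1 and 4

def carries_both(child, position):
    """True if both chromosomes at this position come from affected grandparents."""
    paternal, maternal = offspring[child]
    return {paternal[position - 1], maternal[position - 1]} == disease_origins

# Keep positions where every affected child carries both, and no unaffected child does.
candidates = [
    pos for pos in range(1, 10)
    if all(carries_both(child, pos) == (child in affected) for child in offspring)
]
print(candidates)  # [7]
```

Only marker 7 separates affected from unaffected offspring, which is the conclusion drawn above.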

SNP typing

Oligonucleotide chips contain thousands of short DNA sequences immobilised at different positions. Such chips can be used to discriminate between alternative bases at the site of a SNP.

Chips allow many SNPs to be analysed in parallel, which is necessary for large-scale association or pharmacogenomic studies.

Key principles

* DNA chips are miniature devices with thousands of different DNA sequences immobilised at different positions on the surface. Oligonucleotide chips contain very short DNA sequences (~25 nucleotides).

* A DNA sequence containing a single nucleotide polymorphism is hybridised to the chip.

* A method is employed to discriminate between alternative bases at the polymorphic site. This is known as typing the polymorphism.

* A signal, corresponding to the specific identified base, is detected.

* A chip can be used to type many SNPs simultaneously.

How does it work?

Two chip-based typing methods are widely used. One method relies on allele-specific hybridisation. Short DNA sequences on the chip represent all possible variations at a polymorphic site; a labelled DNA will only stick if there is an exact match. The base is identified by the location of the fluorescent signal.

Alternatively, the oligonucleotide on the chip may stop one base before the variable site. In this case typing relies on allele-specific primer extension. A DNA sample stuck onto the chip is used as a template for DNA synthesis, with the immobilised oligonucleotide as a primer.

The four nucleotides, containing different fluorescent labels, are added along with DNA polymerase. The incorporated base, which is inserted opposite to the polymorphic site on the template, is identified by the nature of its fluorescent signal. In a variation of this technique, the added nucleotide is identified not by a fluorescent label but by mass spectrometry.
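The primer-extension readout can be sketched as a lookup: the polymerase adds the base complementary to the template base at the polymorphic site, and the label on that base gives the signal. The colour assignment below is hypothetical and chosen only for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical colour assignment for the four labelled nucleotides.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}

def incorporated_base(template_base):
    """Base added by the polymerase opposite the polymorphic site on the template."""
    return COMPLEMENT[template_base]

for template_base in "AT":
    added = incorporated_base(template_base)
    print(template_base, "->", added, DYE[added])
# A -> T red
# T -> A green
```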

How is it used?

The chip-based methods discussed above are particularly suitable for high-throughput SNP typing which is required for large-scale studies of populations. Two of the most important applications are 'association studies', which attempt to correlate SNP profiles with predisposition to disease, and pharmacogenomic studies, which attempt to correlate SNP profiles with drug response patterns.

A disadvantage of chip-based assays is that they are somewhat inflexible - new SNPs cannot easily be incorporated onto a chip, requiring a new chip to be made. This is being overcome by the use of bead arrays.

PCR and RFLP

WT  ---------CCTGAG GAG----------------

    ---------GGACTC CTC----------------

    MstII

Mut ----------CCTGTGGAG----------------

    ----------GGACACCTC----------------

PCR amplify DNA from normal and sickle cell patient

Digest with MstII

WT Mut

500

400

300

200

100
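Whether the amplified fragment will be cut can be predicted by searching for the enzyme's site. The sketch below assumes the MstII recognition sequence CCTNAGG (N = any base), which the notes show only in the context of the globin sequence; the function name is illustrative.

```python
import re

# Assumed MstII recognition sequence: CCTNAGG, where N is any base.
MSTII_SITE = re.compile(r"CCT[ACGT]AGG")

def is_cut_by_mstii(sequence):
    """True if the sequence contains at least one MstII site."""
    return MSTII_SITE.search(sequence) is not None

wild_type = "CCTGAGGAG"
sickle = "CCTGTGGAG"
print(is_cut_by_mstii(wild_type))  # True
print(is_cut_by_mstii(sickle))     # False
```

The A-to-T change destroys the site, so the PCR product from the sickle cell allele stays uncut while the wild-type product is cleaved into smaller fragments.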
