Parade2013

Alexei Fedorov, Ph.D.
Associate Professor
Head of Bioinformatics Lab
Department of Medicine
Vice Director
Program in Bioinformatics and Genomics/Proteomics
Tel: (419)-383-5270
Email: alexei.fedorov@utoledo.edu
http://bpg.utoledo.edu/~afedorov/lab/
May 2011
Bioinformatics Lab in 2013-2014
PhD students
Shuhao Qiu
Masters students
Ahmed Al-Khudair
Current grants
NSF Career Development 2007-2012
“Investigation of intron cellular roles”
MAJOR GOAL:
Bioinformatics Investigation of the Human Genome
Education in Bioinformatics
(TWO TYPES OF STUDENTS)
• Computer/math background → gain experience in
Biology (Sam, Andy)
• Biological background → gain experience in
programming (Dave, Maryam)
• Example of computational projects:
Binary-abstracted Markov models and their
application to sequence classification
http://etd.ohiolink.edu/view.cgi?acc_num=mco1271271172
http://bpg.utoledo.edu/~sshepard/defense/ video
Genomic MRI
http://bpg.utoledo.edu/gmri/
http://www.jove.com/Details.php?ID=2663
Job prospects
(example: Ashwin Prakash)
PhD – November 2011, HSC UT
PhD research fellow -- from January 2011
Johns Hopkins School of Medicine
Declined offers:
• Cold Spring Harbor Laboratory
• Baylor College of Medicine
The PI’s students received the following
awards:
• Jason Bechtel, Outstanding MSBS student in 2008 at HSC UT.
• Theodor Rais, Second/Third Poster award by Ohio Bioinformatics
Consortium, 2009.
• Samuel Shepard, Outstanding PhD student in 2010 at HSC UT.
• Lorraine Walters, Undergraduate Research Recognition Award, UT
May 2012.
• Arnab Saha-Mandal, 1) Outstanding MSBS student in 2013 at HSC
UT; and 2) Canadian Institute of Health Research fellowship support
($20,000).
• Jasmine Serpen, 1) Ohio Governor's Thomas Edison Award for
Excellence in Biotechnology & Biomedical Technologies-1st place;
and 2) OSERA Biomedical Research/Bioengineering Award-1st
place (for high school students).
Program in Bioinformatics and
Genomics/Proteomics (BPG)
• http://hsc.utoledo.edu/depts/bioinfo/
• BPG offers a Certificate in association with
the degrees of Doctor of Philosophy (Ph.D.)
or Doctor of Medicine (M.D.). BPG also
offers a Master of Science in Biomedical
Sciences (MSBS).
Two courses in Spring semester:
• Application of Bioinformatics, Proteomics, and
Genomics (BIPG 640) or “Advanced Bioinformatics”
(should be taken after “Fundamental Bioinformatics” of Dr. Trumbly)
• Introduction to Bioinformatic Computation (BIPG 610)
The main goal of this course is to provide basic
programming skills to biological and medical students who
may lack a background in computer sciences. Programming
will be specifically taught using important biological
examples, focusing in particular on the PERL language.
No programming skills are required!
In the “Introduction to Bioinformatic Computation” course, rather than doing
“cookbook” lab exercises, students participate in real-world, challenging
problems whose resolution advances the field of genome biology. In addition to
learning programming and other bioinformatic skills the students of this course
acquire knowledge in how to present the final product of bioinformatic research
and how to write a scientific paper on the subject.
• In 2005 the class developed a program to identify novel genes for non-coding
RNAs in humans and other mammals. This work resulted in the publication of an
article in Nucleic Acids Research¹, coauthored by the group of students who
were actively working on this project.
• In 2006 the course students created a novel public database (ASMD) and a
novel computational resource, “Splicing Potential”. Ten students were coauthors of two manuscripts²,³.
• In 2007 the class participated in the “Genomic MRI” project. Seven of these
students are co-authors of a paper in BMC Genomics, 2008⁴.
• The 2008 class continued the “Genomic MRI” project. They performed whole-genome
comparisons for human, chimpanzee, and macaque, and also analyzed the
distribution of 4 million SNPs inside and outside MRI regions. The results are being
prepared for publication in Genome Research with 6 students among the
authors.
Publications with IBC students
54. Prakash A., Shepard S., Mileyeva-Biebesheimer O., He J., Hart B., Chen M., Amarachiniha
S., Bechtel J., Fedorov A. “Molecular forces shaping human genomic sequence at midrange scales”, BMC Genomics 2009, 10:513.
53. Bechtel J.M., Wittenschlaeger T., Dwyer T., Song J., Arunachalam S., Ramakrishnan S.K.,
Shepard S., Fedorov A. Genomic mid-range inhomogeneity correlates with an abundance
of RNA secondary structures. BMC Genomics 2008, 9:284.
52. Bechtel J. M., Rajesh P., Ilikchyan I., Deng Y., Mishra P.K., Wang G., Wu X., Afonin K.,
Grose W., Wang Y., Khuder S., and Fedorov A. Calculation of Splicing Potential from the
Alternative Splicing Mutation Database. BMC Research Notes 2008, 1:4.
51. Bechtel J. M., Rajesh P., Ilikchyan I., Deng Y., Mishra P.K., Wang G., Wu X., Afonin K.,
Grose W., Wang Y., Khuder S., and Fedorov A. The Alternative Splicing Mutation
Database: a hub for investigations of alternative splicing using mutational evidence.
BMC Research Notes 2008, 1:3.
44. Fedorov A, Stombaugh J., Harr M.W., Yu S., Nasalean L., Shepelev V. Computer
identification of snoRNA genes using a Mammalian Orthologous Intron Database. Nucl.
Acids Res. 2005. 33, 4578-4583.
http://www.utoledo.edu/centers/brim/index.html
COURSE: Bioinformatics of Biomarkers and Individualized Medicine,
Spring 2012
• Course time line: 14 Weeks
• No prerequisites, recommended: Introduction
to bioinformatics and molecular biology
• Reserve materials: None
• Unit 1 Biomarker discovery and
validation
• Unit 2 Individualized Medicine
Investigation of the human genome
BASE COUNT
846302 a 578512 c 575805 g 843114 t
1703 others
ORIGIN
1 gaattcaaaa aagaaagaca atgacttgta gctgaagcta tgatcaggaa aagatggggt
61 ggacggcatt tgagaaaatc aggacagtgg tgtacttatc aaataagaag atctgggcag
121 aagattgttg aaaaagcaga cacagcactg agtagcagca tggagcagaa aagcataagg
181 aacaagtagt gcagtgtgcc tgaacatagg atgggaaatt aggaaagata aatggaggct
241 gactgtggga agccttacat tccaggctta gtggaataag taaatattta aatctcatga
301 gttcttttct ctctgctttc tatttttcac gacctgaact cacctcccag tgaggagatg
361 tttccaccta gcactaaaca gtaactagtt cagactatat atttaaaaaa aaaaaaaaaa
421 aaaaaaaaaa gcagaacagc tcagatcatc cagtgaagtg gtgctactat tatactatta
481 acggggagat gaaagccaga taagatggag aagtaggaaa tttacgaaac attttaaaag
541 aaaatttatt tattcatcaa tatttacata aatgtttatt aattctaagt actatagtag
601 gcacccattt attactttca aaaattgaca atatacaagt taataaaatc atattagttt
661 cctcttctaa taaaattatc tcactcaaat tcatataact aaaaatacat ttaataaatt
721 ttatttttaa aatataggcc acttctactc tattcatttt tgcacttaac attctcttgc
781 tttcaaaaat gtatgaaaaa tttcagttta gtccccacca aatctcaatt tagaccccgg
841 ataaagagta aataaattaa agagctgtca gaattaaaac actactacag gtctccttca
901 ctttatggca tagatgaagg caggaaatac tggctgaaaa ttttgtttat gtcaaagatt
961 ttgatgatta ccatcagaga tctgatatct cagggaagaa aagcctttca tataccactt
1021 aaaaaattct gccaggcgcg gtggctcacg cctgtaatcc cagcactttg ggaggctgag
1081 gtgggcagat cacctgaggt cagaagttcg agaccagcct gaccaacatg gagaaaccct
1141 gtctctacta aaaatacaaa atcagccggg cgtggtggcg catgcctgta atcccagcta
1201 cttgggaggc tgaggcagga gaatcacttg aacccaggag gcagaggttg cggtgagccg
1261 agatcacacc attgcactcc agcctgggca acaagggcga aactctgtct caaaaaaaaa
1321 aaaacttctg gggaaatggt ggcctgcctt gtaacatcta tgtgtcttag agggccatgg
1381 tatgacaccc ttgggcagtc atttatagag tccttccctg accagggaat catcctgcca
... after the first 50 pages ..
141601 cagcaccaaa tcctctcatt gcctttttaa aaaatgttgt ccaatttaac atcaagacac
141661 tgtccatgca atctgttgaa aaatctggct atttgcaaac aaagaaaaaa tgtatagcct
141721 cccacactat atatcaaaat aaacccaagt gtataaaaga gaaaatttta agtgaaacca
141781 aaacttgaaa atattgagat gaatattagt tagagctttg agtaggaaag gattttttga
141841 acagataacc aacagaggaa gtcagaaaac agtaatcatt tccttaatga aaatacaaaa
141901 cttaagtact tcaaaaaagt cattacaata cttaaaaacc ttacaacaat catgtggaaa
141961 gcatttatta caaataattc agaaaaagga tttatatccc taataactaa agaagtgagg
142021 aagaatgcta agatcacatt ttttaaaaag tagctaaagg ataatataaa tgactaacag
142081 acctgaggaa aaaagctaac ctcacaagta ttcaaccaaa taaaataacc tcgagatacc
142141 acttaaaaac ctatcgaaat aacgaagtgt ttggaaaatg acaagattca aaatctggta
142201 agagcagcat ttttccccat tgtggaggga gtgtgtaaat tggtgtggtc tttctgaaaa
142261 gcaattaggc aatcttgtat caaaaatctt caaagtgttc ttactctttg atgaagaatt
142321 ccacacgtgt gaatcctaaa acaattaaaa gtatgaacat atttttatgc acaaagatgt
142381 ttagccaaaa ggaaaacgac ctaaatgacg aatgatgtgc aactgcatgg ataaattgtt
142441 gtatatcaaa atgatgaaat attttgcagc tttgaaaagg taattttgaa aaaactttaa
142501 agacctcaaa aatgcccaaa atatattaat tgaaaaggat acaaaacttt attatttcac
142561 tacgtaatga aacagaatac agttgatcct tgaacaacgc tggtttgaac tgcactcgtc
142621 cacttacatt cagatttttt tctttttgct tttttttttt gagacgaagt ctcactctgt
142681 cacccaggct ggagggcagt ggcaccattc tggctcacta caacctgcgt ataccaggtt
142741 caagcaattc tcctgcctca gcctcccaag tagctggaat tacaggcgcc tgtcaccacg
142801 tccagctaat ttttgtattt ttagtagaga cggagtttca ccatgttggc caggctggtc
142861 tcgaactcct ggcctcaagt aatccacctg cctcagcctc ccaaagtgct gggattacag
142921 gcatcagccg ggtgcggtgg cttatgcctg caatcccatc ctggctaaca cggtgaaacc
142981 ctgtctctac taaaatacaa aaaattagct gagtgtggtg gcacatgcct atagttccag
143041 ctacttggga ggctgaggga tgagaattgc ttgaacctgg gaggcagagg ttgcagtgag
143101 ccgagatcac accactgtac tccagcctgg gcaacagagc aagactccat ctcaaaaaaa
143161 aaaaaaaaaa aaaaaagaaa aagaaaaaga aaaaggtatg ttatgaatgc agaaagtata
143221 tgttgatgct agtctattgt gtaatttacc accataaaat atacacaggt ctattataga
143281 agttaaaatg tatcaaaatg tatacacaaa cacttagaga tagtacatgg tatcattccc
143341 agttgagaaa aatgtaagca aacatgaaga tgcagtatta aatcataact gtataaaatt
... after next 200 pages
683041 ggaggtgggg agcgcctctg cccagccgcc ccatctggga ggtggggagc gcctctgtcc
683101 agccaccaac ccatctggga agtgaggagc gcctctgcct ggccaccccg tctgggaagt
683161 gaggagcacc tctgccgggc tgccccgtct gggaagtgtt cccaacagct ctgaagagac
683221 agcgaccatc gagaatgggc catgatgacg atggtggttt tgtcgaaaag aaaaggggga
683281 aatgtgggga aaagaaagag agatcagatt gttactgtgt ctgtgtagaa agaagtagac
683341 ataggagact ccattttgtt ctgtactaag aaaaattctt ctgccttggg atgctgttaa
683401 tctataacct tacccccaaa cccctgctct ctgaaacatg tgctgtgtca actcagggtt
683461 aaatggatta agggcgatgc aagatgtgct ttgttaaaca gatgcttgaa gacagaaaaa
683521 aaaaaagaaa gagaaaaaaa aaatcattga aggattattt atgccctatg gcatcccttt
683581 ctccaacact tgtcacctaa tgaccaggga tcaataccca caaatacagt aagacctatt
683641 tttaaaggtt ttcagcttaa ctgttttgtc tcttaataaa tttttatata ggaaaaaaaa
683701 aagaatgttg aatattggcc cccactctct tctggcttgt agagtttctg cagagagatc
683761 cactgttagt ctgatggctt ccctttgtgg gtaacccagt ctttctttct gcccttaaca
683821 ttttttcctt catttcaacc atggtgaatc tgacaattat gtgtcttggt gttgctcttc
683881 tcaaggagta tctttgtggt gttctctgta tttcctgaat ttgaatattg gcctgtgtgg
683941 ataggttggg gaagttctcc tggataatat cctgaagagt gttttccaac ttggttccat
684001 tctcccagtc actttcaggt acaccaatca aatgtaggtt tggtcttttc acatagtccc
684061 atatttcttg gaggctttgc tcattccttt tcattctttt ttctctaatc ttgtcttcaa
684121 gctttatttc attaagttag tttatatttg actgtgcttt atacttgaca aagcactttc
684181 acatttcttg tcttttttgg gcctgataat tactctgcaa gttaaaaagg aaaaactcca
684241 agtaccatta cgctccgtga ggacagggac tattttgttc attgttgcaa cctaagcact
684301 taatatgttg cctggtccag agtagatact catatataaa tacttgctga ataaagggat
684361 gaatgggtgg gtggttagat gaatggaatt tgccttaatt ttcaagatgg attcaatttc
684421 caattccact tactggtgag aagccttgtc taagtcttta aaccttactt tcctcatcta
684481 taaaacagtg acaatgatat tgtttctgct accacaatgg aaaaaaggac agaattactt
684541 agtgtcatag tgatcaggaa taaagccagg gcttgaagca tctcctgatt cctagggcat
684601 tgtttgtccc aatgtatatg gcagagggag aaagaaaacc gttgagtctt aatctgtcag
684661 gcactatttt atgaacttta aaatcctcat agcagggcca ggtgcagtgg ctcacacctg
684721 taatcccagc actttgggag gccaaggcag gcagatcact tgaggtcagg accagcctgt
684781 ccaacgtggt gaaaccacat ctctactaaa aatacaaaaa ttagccaggc gtggtggtgc
684841 atgcctataa tcccagctac ttgggaggct gaggcaggag aaatgcttga acctgggagg
684901 cagaggttgt ggtgagctga gattgtgcca ctgtactcca gcctgggcaa cagaacaaga
Human chromosome 1
4,814,628 lines = 100,000 pages = 100 books (1000 pages each)
Nature, Sept 6th, 2012, v.489, p 46
Lab 2013
THE 1000 GENOME PROJECT
A GUIDE TO YOUR ANCESTRY
The pattern of human genetic variation is believed to be a key to revealing much about human population
history and diversity. The 1000 Genomes Project has sequenced 1092 genomes from different populations. By
identifying the sequences that correspond to LWK, GBR, JPT and FIN, we aim to learn more about
population genetic patterns and to get a picture of the genetic diversity that exists within these populations.
The 1000 Genomes Project's effort to catalogue human genetic variation is used in this project to calculate and
compare the genetic differences between 14 populations. I am presenting the results that our bioinformatics
lab's team has obtained so far; we are working on putting them into a paper.
Using Perl programming to compute the differences between each pair of individuals' genomes from the 1000
Genomes Project for the 14 populations:
• ASW: HapMap African ancestry individuals from SW US
• CEU: CEPH individuals
• CHB: Han Chinese in Beijing
• CHS: Han Chinese South
• CLM: Colombian in Medellin, Colombia
• FIN: HapMap Finnish individuals from Finland
• GBR: British individuals from England and Scotland
• IBS: Iberian populations in Spain
• JPT: Japanese individuals
• LWK: Luhya individuals
• MXL: HapMap Mexican individuals from LA California
• PUR: Puerto Rican in Puerto Rico
• TSI: Toscani individuals
• YRI: Yoruba individuals
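The pairwise comparison described above can be sketched in a few lines. This is only an illustration in Python, not the lab's Perl code: it assumes each individual is represented as a list of genotype codes (0, 1 or 2 copies of the alternate allele) over the same variant sites, and that a "difference" is a site where the two codes are not equal. The names `count_differences`, `pairwise_differences` and the toy individuals are hypothetical.

```python
from itertools import combinations


def count_differences(genotypes_a, genotypes_b):
    """Count the variant sites at which two individuals' genotype codes differ."""
    if len(genotypes_a) != len(genotypes_b):
        raise ValueError("genotype lists must cover the same sites")
    return sum(1 for a, b in zip(genotypes_a, genotypes_b) if a != b)


def pairwise_differences(population):
    """Return {(id1, id2): differences} for every pair within one population."""
    return {
        (id1, id2): count_differences(population[id1], population[id2])
        for id1, id2 in combinations(sorted(population), 2)
    }


# Toy data (made up): three individuals, five variant sites each.
toy_population = {
    "ind1": [0, 1, 2, 0, 1],
    "ind2": [0, 1, 0, 0, 2],
    "ind3": [1, 1, 2, 0, 1],
}
toy_result = pairwise_differences(toy_population)
```

With real data the genotype lists would hold millions of sites per individual, so a streaming or array-based representation would likely be needed.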
Figure 1: Frequency distribution of differences in 14 populations (GBR, JPT, LWK, FIN, ASW, PUR, CLM, CHS, MXL, CEU, IBS, YRI, TSI and CHB). Each peak represents one population. The differences for each population were calculated between every pair of individuals within that population (example: GBR has 89 individuals,
which makes 3916 pairs of individuals). The differences ranged from 2.75 million to 5.28 million and are plotted in bins of 0.01 million.
[Plot: x axis, number of differences (2.7 to 5.5 million); y axis, Number Of Pairs (0 to 1000). Legend (population-number of individuals): ASW-60, CEU-84, CHB-96, CHS-99, CLM-59, FIN-93, GBR-89, IBS-13, JPT-88, LWK-96, MXL-65, PUR-54, TSI-96, YRI-87]
The graph above illustrates the distribution of genetic differences among the 14 populations.
The X axis shows the number of differences (2.7 million – 5.5 million). The Y axis shows the number of pairs (two individuals
compared by counting the genetic differences between their genomes).
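The pair count and the binning behind this kind of histogram can be checked with a short sketch. It assumes differences are whole numbers and that each bin is labeled by its lower edge; `pair_count`, `histogram` and `BIN_SIZE` are illustrative names, not taken from the lab's code.

```python
import math
from collections import Counter

BIN_SIZE = 10_000  # 0.01 million differences per bin


def pair_count(n_individuals):
    """Number of unordered pairs among n individuals: n * (n - 1) / 2."""
    return math.comb(n_individuals, 2)


def histogram(differences, bin_size=BIN_SIZE):
    """Map each bin's lower edge to the number of pairs that fall in that bin."""
    counts = Counter((d // bin_size) * bin_size for d in differences)
    return dict(sorted(counts.items()))


gbr_pairs = pair_count(89)  # GBR has 89 individuals
toy_hist = histogram([2756691, 2757000, 2777456])  # example values only
```

The same formula explains why IBS, with 13 individuals, contributes only 78 pairs to the plot.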
Figure 2:
The graph below shows that the 14 populations fall into 4 distinct origins, which we call 4 ancestries: 1_African, 2_Hybrid,
3_European, 4_Asian.
[Plot: same axes and legend as Figure 1, with the peaks labeled 1 to 4 by ancestry]
Figure 3:
The three populations of African origin have total differences distributed close to each other. The LWK population (Luhya individuals)
showed some pairs of individuals with almost half the number of differences (2.7 million – 4.8 million); almost all of these have been declared as
siblings or other relatives.
Some of them are not declared as relatives by the 1000 Genomes Project, so our results suggest that there might be some undeclared
relatives in the 1000 Genomes Project.
We further examined some populations for declared relationships between individuals; relatives showed
the smallest differences in their genetic variation. For example, in the LWK population, as shown in the table
below, the relatives fall at the top of the list when the total differences are sorted from lowest to highest. The green highlighted
cells mark individuals declared as related to each other in the 1000 Genomes appendix. For the pairs that
are not highlighted, we suggest that they are somehow related but have not been declared by the 1000 Genomes Project.
#    ID1_LWK   ID2_LWK   Total_LWK differences   Relationship
1    NA19374   NA19373   2756691                 Siblings
2    NA19352   NA19347   2777456                 Siblings
3    NA19470   NA19443   2848500                 Aunt/Uncle
4    NA19397   NA19396   2871776                 Siblings
5    NA19444   NA19434   3004459                 Siblings
6    NA19334   NA19331   3007478                 ?
7    NA19382   NA19381   3070661                 uncertain parent/child relationship
8    NA19453   NA19445   3077137                 ?
9    NA19470   NA19469   3111728                 Niece/Nephew
10   NA19331   NA19313   3119208                 ?
11   NA19382   NA19380   3970915                 Half Siblings
12   NA19453   NA19444   4106949                 ?
13   NA19334   NA19313   4178970                 Unknown relation
14   NA19469   NA19443   4236592                 Niece/Nephew
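Sorting pairs from lowest to highest difference, as in the table above, and flagging unusually low pairs can be sketched as follows. The cutoff (a fraction of the population's median pairwise difference) is an assumption made here for illustration; the slides do not state a threshold. The pair labels and values are toy data, not real samples.

```python
from statistics import median


def flag_possible_relatives(pair_diffs, fraction=0.8):
    """Sort pairs by difference (ascending) and keep those below
    fraction * median, a hypothetical cutoff for 'unusually similar'."""
    cutoff = fraction * median(pair_diffs.values())
    ranked = sorted(pair_diffs.items(), key=lambda item: item[1])
    return [(pair, diff) for pair, diff in ranked if diff < cutoff]


# Toy pairwise differences (made up) for four individuals A to D.
toy_diffs = {
    ("A", "B"): 2_750_000,
    ("A", "C"): 4_900_000,
    ("A", "D"): 5_000_000,
    ("B", "C"): 5_100_000,
    ("B", "D"): 4_950_000,
    ("C", "D"): 3_900_000,
}
candidates = flag_possible_relatives(toy_diffs)
```

Flagged pairs are only candidates; they would still need to be checked against the declared relationships.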
Figure 4:
The CLM, PUR and MXL populations show a very wide distribution, ranging from 3.1 to 4.86 million. Our results indicate that these
populations have a wide range of mixed ancestry. The PUR population has a second peak on the right side (between
4.74 and 4.9 million); we expect that these individuals have a different ancestry. Further investigation of these individuals is being conducted to
determine where their ancestry comes from.
Figure 5:
Populations FIN, GBR, TSI, CEU and IBS. All of these populations are of European origin. The IBS population appears as a very
low curve because only 13 individuals from this population have been sequenced.
Figure 6:
The populations of Asian origin are genetically close to each other: their distributions have very similar shapes, ranging
between 3.4 million and 3.69 million.
We are further investigating the pairs of individuals with the highest differences, which
we suggest possibly have a different origin. We examined the 40 highest pairs in some populations
and found that some individuals appeared in high-difference pairs significantly more often than others.
An example is shown in the figure below.
[Plot: CLM distribution; x axis 2.7 to 4.8 (million differences), y axis 0 to 80]
The list below shows the pairs of CLM individuals with the highest genetic differences. When we looked at
the individuals, we noticed that some of them were repeated significantly more often than others.
HG01551 and HG01342 each appear 20 times among the highest-difference pairs, while
others were repeated 2 or 3 times. So we decided to investigate the possibility of these individuals having another origin.
ID1       Differences   ID2
HG01551   4479513       HG01136
HG01365   4480834       HG01342
HG01342   4481529       HG01250
HG01551   4481637       HG01250
HG01551   4483529       HG01375
HG01551   4485279       HG01125
HG01488   4487693       HG01342
HG01366   4488647       HG01342
HG01551   4490996       HG01259
HG01342   4493212       HG01271
HG01342   4493218       HG01277
HG01377   4494064       HG01342
HG01462   4494414       HG01390
HG01551   4496682       HG01365
HG01461   4497146       HG01342
HG01342   4498051       HG01125
HG01551   4499694       HG01148
HG01551   4499713       HG01345
HG01375   4500523       HG01342
HG01551   4501432       HG01134
HG01551   4503181       HG01495
HG01389   4506393       HG01342
HG01342   4508562       HG01148
HG01551   4510222       HG01377
HG01342   4514486       HG01134
HG01551   4519187       HG01389
HG01342   4520380       HG01124
HG01440   4527415       HG01342
HG01342   4533004       HG01275
HG01342   4535490       HG01272
HG01551   4537772       HG01272
HG01551   4541901       HG01488
HG01551   4542804       HG01461
HG01551   4558088       HG01462
HG01551   4561600       HG01275
HG01390   4562418       HG01342
HG01462   4564478       HG01342
HG01551   4577349       HG01440
HG01551   4608288       HG01390
HG01551   4678948       HG01342
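Counting how often each individual shows up among the highest-difference pairs can be sketched like this. It is a minimal illustration that assumes the pairwise differences are already available as a dictionary keyed by pair; `top_pair_appearances` and the toy individuals X, Y, A, B are hypothetical.

```python
from collections import Counter


def top_pair_appearances(pair_diffs, top_n=40):
    """Count how many of the top_n highest-difference pairs each individual is in."""
    ranked = sorted(pair_diffs.items(), key=lambda item: item[1], reverse=True)
    counts = Counter()
    for (id1, id2), _diff in ranked[:top_n]:
        counts[id1] += 1
        counts[id2] += 1
    return counts


# Toy data (made up): individual X is in most of the highest-difference pairs.
toy_pairs = {
    ("X", "A"): 4_600_000,
    ("X", "B"): 4_500_000,
    ("Y", "A"): 4_550_000,
    ("A", "B"): 3_500_000,
    ("X", "Y"): 4_700_000,
}
appearances = top_pair_appearances(toy_pairs, top_n=4)
most_repeated = appearances.most_common(1)
```

Individuals with a count far above the rest are the ones worth a closer look.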
The idea was to take those frequently repeated high-difference individuals together with 10 controls from the same population
that showed an average number of genetic differences within that population. We then randomly took
individuals from other populations and calculated the genetic differences between our 10 controls + 2 high repeats
and 1 control from each of the other populations.
The comparison below was between 10 controls from CLM plus the 2 frequently repeated high-difference individuals
(HG01551 and HG01342), against one control individual from the YRI population (Yoruba individuals, “African Ancestry”).
HG01551 and HG01342 had the lowest differences, indicating that these two persons might be of African origin.
We further compared the CLM controls with an individual from an African population (LWK) and another individual from
an Asian population (CHS).
The two high-repeat individuals showed the lowest genetic difference against the LWK control and the highest
difference against the CHS individual. This suggests that our two individuals from the CLM population
originally belong to an African origin.
[Figure panels: CLM - LWK; CLM - CHS]
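The control comparison above amounts to ranking a group of individuals by their difference to a single reference individual from another population. A minimal sketch, with made-up values and hypothetical names (`rank_against_reference`, `candidate1`, `control1`):

```python
def rank_against_reference(diffs_to_reference):
    """Return individual IDs sorted by difference to one reference, lowest first."""
    return sorted(diffs_to_reference, key=diffs_to_reference.get)


# Toy differences (made up) between four individuals and one reference individual.
toy_vs_reference = {
    "control1": 5_200_000,
    "control2": 5_150_000,
    "candidate1": 4_600_000,
    "candidate2": 4_700_000,
}
ranking = rank_against_reference(toy_vs_reference)
closest_two = ranking[:2]
```

If the candidates rank closest to a reference from one ancestry and farthest from a reference of another, that is consistent with, but does not prove, shared ancestry with the first.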
Conclusions
• Total variants showed substantial geographic differentiation.
• The total number of differences distinguishes populations that
are more geographically and ancestrally remote.
• Populations are grouped by the predominant component of
ancestry: Europe (CEU, TSI, GBR, FIN and IBS), Africa (YRI,
LWK and ASW), East Asia (CHB, JPT and CHS) and the
Americas (MXL, CLM and PUR).
• Relatives within the same population have significantly fewer
genotype differences (“almost half the number”)
compared to non-relatives.
• The study of human genetic variation has evolutionary
significance. It can help to understand ancient human population
migrations as well as how different human groups are
biologically related to one another.