GENOME PROJECT

BIOINFORMATICS AND BIOLOGICAL DATABASES
Summary
Modern biology, in particular genomic research, is data- and
computation-intensive. In biology in general, and in genomic research in
particular, it is now common practice to build and query large databases
of biological data. Most biological databases are accessible through the
Web, and most offer application-specific software tools for data analysis.
Added value is key to producing a stable base, and the need to link these
databases together, and to link other databases back to these resource
services (culture collections, etc.), is always stressed. Important as
links to other international collections are, in this era it is indispensable to
establish national biological databases for a country's plant, animal and
human gene pools. Bioinformatics is a newly emerging interdisciplinary
research area, which may be defined as the interface between biological
and computational sciences. Thus, people working in this field usually
have training in either biology or computer science, and they learn about
the other field by dealing with its problems or using its tools.
This scientific field deals with the computational
management of all kinds of biological information, whether it may be
about genes and their products, whole organisms or even ecological
systems. Most of the bioinformatics work that is being done can be
described as analysing biological data, although a growing number of
projects deal with the organisation of biological information.
Database support for scientific data management also needs attention.
Requirements and properties of scientific databases; data models for
statistical and scientific databases; semantic and object-oriented
modelling of application domains; statistical database query languages
and query optimisation; advanced logic languages; and case studies
such as the Human Genome Project and Earth-orbiting satellites are among
the topics that should urgently be included in curricula.
Modern biology, in particular genomic research, is data- and
computation-intensive. In biology in general, and in genomic research in
particular, it is now common practice to build and query large databases of
biological data. Most biological databases are accessible through the Web,
and most offer application-specific software tools for data analysis.
Progress has been made in building controlled vocabularies (for example,
GO, MIPS, Enzyme and GenProtEC for gene function, SCOP and CATH for
protein structure, and the US National Library of Medicine's UMLS for
disease states and clinical indications), but much work remains to be done.
Added value is key to producing a stable base, and the need to link these
databases together, and to link other databases back to these resource
services (culture collections, etc.), is always stressed. Important as links
to other international collections are, in this era it is indispensable to
establish national biological databases for a country's plant, animal and
human gene pools.
Bioinformatics is a newly emerging interdisciplinary research area,
which may be defined as the interface between biological and computational
sciences. Thus, people working in this field usually have training in either
biology or computer science, and they learn about the other field by dealing
with its problems or using its tools. This scientific field deals
with the computational management of all kinds of biological information,
whether it may be about genes and their products, whole organisms or even
ecological systems. Most of the bioinformatics work that is being done can be
described as analysing biological data, although a growing number of projects
deal with the organisation of biological information.
Because of the large amount of data produced in the field of molecular
biology, most of the current bioinformatics projects deal with structural and
functional aspects of genes and proteins. Many of these projects are also
related to the Human Genome Project.
The basic steps can be summarised as follows:
First, the data produced by the thousands of research teams all over the world
are collected and organised in databases specialised for particular subjects.
Well-known examples are GDB, SWISS-PROT, GenBank, and PDB; the last of
these, for example, deals with three-dimensional structures of biological
molecules.
In the next step, computational tools are needed to analyse the
collected data in the most efficient manner. For example, many
bioinformaticists are working on the prediction of the biological functions of
genes and proteins (or parts of them) based on structural data.
In recent years, many new databases storing biological information
have appeared. However, this does not have only positive effects: nowadays
many scientists complain that it gets harder to find useful information in the
resulting 'data labyrinth'. This may largely be due to the fact that the
information gets more and more scattered over an increasing number of
heterogeneous resources. To ameliorate this situation, there have been a
number of efforts to develop computational tools that integrate the
scattered information in new types of web resources. The principal idea is that
these databases should enable the scientific user to get a quick idea about
the current knowledge that has been gathered about a particular subject.
One such resource contains data about human genes, their products and the
diseases in which they are involved. What is special about it is that it
contains only selected information that has been automatically extracted from
a variety of heterogeneous databases (a process similar to "data mining"). In
addition, this resource features advanced user navigation guidance that leads
the user rapidly to the wanted information; e.g. the system suggests
modifications of unsuccessful queries and performs a spell check of keywords
that could not be found.
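The spell check of unmatched keywords described above can be illustrated with a minimal sketch. It is not the method used by any particular resource: the keyword list, the function name `suggest_keywords` and the similarity cutoff are all assumptions made for illustration, and the matching relies on Python's standard `difflib` module.

```python
import difflib

# Hypothetical keyword index; a real resource would build this from its own records.
KNOWN_KEYWORDS = ["hemoglobin", "insulin", "dystrophin", "huntingtin", "collagen"]


def suggest_keywords(query, vocabulary=KNOWN_KEYWORDS, limit=3, cutoff=0.6):
    """Return the query itself if it is indexed, otherwise similar indexed keywords.

    The cutoff (0..1) is an assumed similarity threshold; lower values
    return more, but less reliable, suggestions.
    """
    term = query.strip().lower()
    if term in vocabulary:
        return [term]
    # get_close_matches ranks candidates by similarity, best match first.
    return difflib.get_close_matches(term, vocabulary, n=limit, cutoff=cutoff)


if __name__ == "__main__":
    for word in ["insulin", "hemoglobbin", "zzzz"]:
        print(word, "->", suggest_keywords(word))
```

A misspelt query such as "hemoglobbin" is mapped to the indexed keyword it most resembles, while a query with no plausible match returns an empty list, which the interface could use to suggest a different search.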
The genome mapping and sequencing projects are generating an
enormous amount of data concerning genetic expression in biological
organisms. In general, these data are poorly understood and only partially
characterised, making the field ripe for "data mining." Genome analysis is a
very broad area of study covering all computational methods for the analysis
of these data. The algorithmic basis of these methodologies should be
examined with particular emphasis on their theoretical foundations (including
statistics) with respect to biological science. The analysis of macromolecular
sequence data, including applications in homology searching and protein
coding region recognition, is also important. Some methods utilised for such
purposes are: sequence comparison and database searching; string and
pattern matching; linguistic methods; and discriminant and parametric
methods, such as weight matrix (perceptron), neural network, and hidden
Markov model approaches.
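As an illustration of the weight matrix approach mentioned above, the following is a minimal sketch of a log-odds position weight matrix built from aligned example sites and slid along a sequence. The example sites, the uniform background frequency of 0.25 and the pseudocount of 1 are assumptions chosen for the illustration, not values taken from this text, and the function names are hypothetical.

```python
import math


def build_log_odds_matrix(sites, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Build one {base: log2 odds} column per position from equal-length sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        scores = {}
        for base in alphabet:
            # Pseudocounts keep unseen bases from producing log(0).
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        matrix.append(scores)
    return matrix


def score_window(matrix, window):
    """Sum the per-position scores of a window as wide as the matrix.

    Characters outside the matrix alphabet (e.g. 'N') raise KeyError.
    """
    return sum(column[base] for column, base in zip(matrix, window))


def best_hit(matrix, sequence):
    """Return (offset, score) of the highest-scoring window in the sequence."""
    width = len(matrix)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    offsets = range(len(sequence) - width + 1)
    best = max(offsets, key=lambda i: score_window(matrix, sequence[i:i + width]))
    return best, score_window(matrix, sequence[best:best + width])


if __name__ == "__main__":
    # Made-up aligned sites, used only to exercise the functions.
    example_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
    pwm = build_log_odds_matrix(example_sites)
    print(best_hit(pwm, "GGCGTATAATGCCG"))
```

Positive scores indicate windows that resemble the example sites more than the assumed background; neural network and hidden Markov model approaches generalise this idea by modelling dependencies and variable-length patterns that a fixed-width matrix cannot capture.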
Database support for scientific data management also needs attention.
Requirements and properties of scientific databases; data models for
statistical and scientific databases; semantic and object-oriented modelling of
application domains; statistical database query languages and query
optimisation; advanced logic languages; and case studies such as the Human
Genome Project and Earth-orbiting satellites are among the topics that should
urgently be included in curricula.
Project: Regional genetic databases for the Turkish population
The goal: Establishment of online databases that include phenotype and
genotype data of the Turkish population. The objective is to carry out genetic
and health studies in order to identify the genes that cause and influence
common diseases. Developing and using this genetic information will prepare
the ground for future directions in genomic medicine and pharmacogenetics.
Considering the size and population of the country, it would be best to have
regional centres reporting to one main centre.
Long-term goal: Practical implementation of genomic medicine in public
health on a massive scale. If data from the majority of the adult population
are included in the databases that emerge in the course of the project, the
impact of genomic medicine can be monitored at the level of the whole
population.
OBJECTIVES:
1. The achievement of a new level in Turkish health care, expressing itself in
saving on expenditure and more efficient medical assistance;
enhancement of the competitiveness of the medical services and
preparation of health care in Turkey for new developmental directions and
changes during the new century.
2. Increase in the population's health awareness through objective,
genotype-based risk assessment, allowing individuals to use this
information to improve their own health and to help their descendants.
3. Increase in the international competitiveness of the Turkish economy: the
implementation of the project will include the development of the
infrastructure of medical, gene technology and existing and future
research institutions, as well as investments in high technology, the
creation of new jobs, and the emergence of knowledge-intensive products
and services.
4. Development of biology, bioinformatics, biomedicine and their teaching.
The latter will ensure the required constant stream of qualified specialists
in the high technology sector.
5. Support of the integrated development of economic and administrative
areas through various possible applications of gene technology, and the
development of cooperation between different fields (gene technology,
information technology, agriculture, health care, etc.).
The main condition for the success of the project will be the existence of a
clear idea that corresponds to the long-term interests of Turkey and its
population, i.e. to increase the competitiveness of the country and to create
added value for every branch of the economy by developing high-level,
science-intensive gene technology and the health database for the
population.
Some important factors are:
. Basic approval and support by the government
. A relatively flexible organisation
. A developed structure and nationwide reach of primary health care, which is
the main collector of health data and tissue samples for the genome
project
. An increased educational level of the population and its support for
innovative projects
. Developed information technology and data communication infrastructure
. Relatively low labour and overhead costs
. Geographical size and logistical potential
Legal regulations
For the success of the project, it is necessary to have legal regulations
based on societal agreement, the trust of the nation and political will, and
observing international norms of ethics and good practice. In association with
this, a draft Human Genes Research Act should be approved by the
government and passed on to the parliament. The objective of this act should
be to facilitate genetic research and the establishment and maintenance of a
Gene Bank, to ensure that persons participate in the research voluntarily
and that the identity of gene donors remains confidential, and to protect
persons from misuse of genetic data and from discrimination based on the
structure of their DNA and the genetic risks arising from it.
Ethical issues
To address ethical issues that might emerge in the course of the Genome
Project, it might be useful to form an Ethics Committee with advisory
capacity, whose members are experts with prior experience in medical ethics
and medical legislation.
Upon the implementation of the project, and during the launch of later
activities and research, pursuant to the Act, a decisive role might be given to
the Ethics Committee, which could evaluate the appropriateness of carrying
out medical studies at the country level.
Relationship with the public
Regarding the relationship with the public, the Genome Project should be an
active party and should give as much information as possible to all target
groups with which it comes into contact. Dissemination of the information
should be based on the principle that the information connected with the
establishment and functioning of the Gene Bank is public and available to
everybody. The Gene Bank database and the results of genetic tests should be
accessible to researchers and clinicians but otherwise password-protected or
confidential, and services should be charged for. The project should also
include a strong educational component: everyone, whether or not they
participate in the project, should receive general knowledge of genetics,
whether through the media, at school, from governmental or non-governmental
education programs, or directly from a specialist.
Sources of financing
The immensity of the project requires different sources of financing,
including the state budget and private capital, international organisations
and funds such as trust funds, structural programs and grants, and other
sources. Technical support and know-how from major software companies and
IT specialists are extremely important.
Development of intellectual capital
For the success of the project, the development of intellectual capital is of
decisive importance, i.e. the development of relevant infrastructure,
management systems, know-how and technology in the particular field(s), the
creation of an organisation and the finding of motivated people in Turkey.
For the realisation of the project as a whole, it is important to include
strategic partner(s) that can guarantee access to the relevant know-how,
financing and sales markets for such a service worldwide.