BIOINFORMATICS AND BIOLOGICAL DATABASES

Summary

Modern biology, in particular genomic research, is data and computation intensive. In biology in general and in genomic research in particular, it is now common practice to build up and query large databases of biological data. Most biological databases are accessible through the Web, and most offer application-specific software tools for data analysis. Added value is key to producing a stable resource base. The need is always stressed to link these databases together and to link other databases back to resource services such as culture collections. Links to other international collections are important. Just as important, in this era it is indispensable to establish national biological databases for a country's plant, animal and human gene pools. Bioinformatics is a newly emerging interdisciplinary research area, which may be defined as the interface between the biological and computational sciences. Consequently, people working in this field are in most cases trained in either biology or computer science, and they learn about the other field by working on its problems or using its tools. The field deals with the computational management of all kinds of biological information, whether it concerns genes and their products, whole organisms or even ecological systems. Most current bioinformatics work consists of analysing biological data, although a growing number of projects deal with the organisation of biological information. Database support for scientific data management also needs attention.
Several topics should urgently be included in curricula:
- requirements and properties of scientific databases;
- data models for statistical and scientific databases;
- semantic and object-oriented modelling of application domains;
- statistical database query languages and query optimisation;
- advanced logic languages;
- case studies such as the Human Genome Project and Earth-orbiting satellites.

Modern biology, in particular genomic research, is data and computation intensive. In biology in general and in genomic research in particular, it is now common practice to build up and query large databases of biological data. Most biological databases are accessible through the Web, and most offer application-specific software tools for data analysis. Progress has been made in building controlled vocabularies, but much work remains to be done. Examples include GO, MIPS, Enzyme and GenProtEC for gene function, SCOP and CATH for protein structure, and the US National Library of Medicine's UMLS for disease states and clinical indications. Added value is key to producing a stable resource base. The need is always stressed to link these databases together and to link other databases back to resource services such as culture collections. Links to other international collections are important. Just as important, in this era it is indispensable to establish national biological databases for a country's plant, animal and human gene pools. Bioinformatics is a newly emerging interdisciplinary research area, which may be defined as the interface between the biological and computational sciences. Consequently, people working in this field are in most cases trained in either biology or computer science, and they learn about the other field by working on its problems or using its tools.
The field deals with the computational management of all kinds of biological information, whether it concerns genes and their products, whole organisms or even ecological systems. Most current bioinformatics work consists of analysing biological data, although a growing number of projects deal with the organisation of biological information. Molecular biology produces large amounts of data, so most current bioinformatics projects deal with the structural and functional aspects of genes and proteins. Many of these projects are also related to the Human Genome Project. The basic steps can be summarised as follows. First, the data produced by thousands of research teams all over the world are collected and organised in databases specialised for particular subjects. Well-known examples are GDB, SWISS-PROT, GenBank and PDB; the latter, for example, deals with the three-dimensional structures of biological molecules. Next, computational tools are needed to analyse the collected data as efficiently as possible. For example, many bioinformaticians are working on predicting the biological functions of genes and proteins (or parts of them) from structural data. In recent years, many new databases storing biological information have appeared. However, this has not had only positive effects: many scientists now complain that it is becoming harder to find useful information in the resulting "data labyrinth". This is largely because the information is increasingly scattered over a growing number of heterogeneous resources. To ameliorate this situation, there have been a number of efforts to develop computational tools that integrate the scattered information into new types of web resources.
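The integration step described above can be illustrated with a minimal sketch. Here two independent sources describe the same genes under differently formatted identifiers. The records are combined into one entry per gene after the identifier is normalised. The two "sources", their field names (`symbol`, `gene`, `chromosome`, `disease`) and the upper-casing rule are all hypothetical simplifications. Real integration must also handle synonyms, retired identifiers and conflicting values.

```python
from collections import defaultdict

# Hypothetical, illustrative records from two heterogeneous sources.
# Field names and identifier conventions differ between the sources on purpose.
sequence_db = [
    {"symbol": "TP53", "chromosome": "17"},
    {"symbol": "BRCA1", "chromosome": "17"},
]
disease_db = [
    {"gene": "tp53", "disease": "Li-Fraumeni syndrome"},
    {"gene": "BRCA1", "disease": "hereditary breast cancer"},
    {"gene": "BRCA1", "disease": "hereditary ovarian cancer"},
]


def integrate(sequence_records, disease_records):
    """Merge records describing the same gene into a single entry per gene."""
    # Each new gene key starts with an empty disease list.
    merged = defaultdict(lambda: {"diseases": []})
    for rec in sequence_records:
        key = rec["symbol"].upper()  # normalise identifiers across sources
        merged[key]["chromosome"] = rec["chromosome"]
    for rec in disease_records:
        key = rec["gene"].upper()
        # A gene known only to the disease source gets an entry without "chromosome".
        merged[key]["diseases"].append(rec["disease"])
    return dict(merged)


entries = integrate(sequence_db, disease_db)
print(entries["BRCA1"])
```

Keying on a normalised identifier is the simplest join strategy. A real resource would more likely map each source's identifiers onto a shared controlled vocabulary first.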
The principal idea is that such integrated resources should give scientific users a quick overview of the current knowledge about a particular subject. One such resource contains data about human genes, their products and the diseases in which they are involved. What is special about it is that it contains only selected information, automatically extracted from a variety of heterogeneous databases (a process similar to "data mining"). In addition, this resource features advanced navigation guidance that leads the user rapidly to the desired information. For example, the system suggests modifications of unsuccessful queries and performs a spell check of keywords that could not be found.

The genome mapping and sequencing projects are generating an enormous amount of data concerning genetic expression in biological organisms. In general, these data are poorly understood and only partially characterised, making the field ripe for "data mining". Genome analysis is a very broad area of study covering all computational methods for the analysis of these data. The algorithmic basis of these methods should be examined, with particular emphasis on their theoretical foundations (including statistics) with respect to biological science. The analysis of macromolecular sequence data, including applications in homology searching and protein-coding region recognition, is also important. Some methods used for such purposes are:
- sequence comparison and database searching;
- string and pattern matching;
- linguistic methods;
- discriminant and parametric methods, such as weight matrices (perceptron), neural networks and hidden Markov models.

Database support for scientific data management also needs attention.
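The keyword spell check mentioned above can be sketched as follows. A keyword that is not found is compared against the known vocabulary, and terms within a small edit distance are suggested. The text does not describe how the resource actually does this. The use of Levenshtein distance, the `max_distance` threshold and the toy vocabulary are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, using one rolling row."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def suggest(keyword, vocabulary, max_distance=2):
    """Return vocabulary terms within max_distance edits, closest first."""
    scored = [(edit_distance(keyword.lower(), term.lower()), term) for term in vocabulary]
    return [term for dist, term in sorted(scored) if dist <= max_distance]


# Hypothetical vocabulary of searchable keywords.
vocabulary = ["insulin", "insulin receptor", "interleukin", "integrin"]
print(suggest("insluin", vocabulary))
```

A linear scan is fine for a small vocabulary. A large keyword index would normally use a structure such as a BK-tree or a trigram index to avoid comparing every term.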
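Sequence comparison and database searching, listed among the methods above, rest on alignment scoring. The following is a minimal sketch of global alignment (Needleman-Wunsch) used to rank a small set of sequences against a query. The scoring values (match +1, mismatch -1, linear gap -2) and the sequence names are arbitrary assumptions. Practical homology searches usually use local alignment, substitution matrices and heuristic tools such as BLAST rather than this exhaustive global method.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, len(b) + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[len(a)][len(b)]


# Hypothetical "database" searched exhaustively with one query.
database = {"seq1": "GATTACA", "seq2": "GCATGCT", "seq3": "GATTTACA"}
query = "GATTACA"
ranked = sorted(
    database,
    key=lambda name: global_alignment_score(query, database[name]),
    reverse=True,  # best-scoring sequence first
)
print(ranked)
```

The dynamic-programming table costs time proportional to the product of the sequence lengths. That cost is why large-scale database searching relies on heuristics.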
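The weight-matrix approach mentioned above can be sketched as a log-odds position weight matrix. The matrix is built from a few aligned sites and then slid along a sequence to find the best-scoring window. The sketch derives the weights from counts with a pseudocount and a uniform background. It does not train them with a perceptron, as the historical method referred to in the text did. The example sites and the test sequence are hypothetical.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length aligned DNA sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount  # column count plus pseudocounts for A, C, G, T
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Score every window; return (best_score, best_start).

    Assumes an uppercase ACGT sequence at least as long as the matrix.
    """
    width = len(pwm)
    scores = [
        (sum(pwm[k][sequence[i + k]] for k in range(width)), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scores)


sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]  # hypothetical aligned sites
pwm = build_pwm(sites)
best_score, best_start = scan("GCGCTATAATGCGC", pwm)
print(best_start, round(best_score, 2))
```

Positive scores mark windows that resemble the aligned sites more than random background sequence. The choice of threshold for calling a site is left open here.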
Several topics should urgently be included in curricula:
- requirements and properties of scientific databases;
- data models for statistical and scientific databases;
- semantic and object-oriented modelling of application domains;
- statistical database query languages and query optimisation;
- advanced logic languages;
- case studies such as the Human Genome Project and Earth-orbiting satellites.

Project: Regional genetic databases for the Turkish population

The goal: Establishment of online databases containing phenotype and genotype data of the Turkish population. The objective is to carry out genetic and health studies in order to identify the genes that cause and influence common diseases. Developing and using this genetic information will prepare the ground for genomic medicine and pharmacogenetics. Considering the size and population of the country, it will be best to have regional centres reporting to one main centre.

Long-term goal: Practical implementation of genomic medicine in public health on a massive scale. If data on the majority of the adult population are included in the databases built during the project, the impact of genomic medicine can be monitored at the level of the whole population.

OBJECTIVES:
1. A new level in Turkish health care, expressed in savings on expenditure and more efficient medical assistance. This includes enhancing the competitiveness of medical services and preparing health care in Turkey for new directions and changes in the new century.
2. Increased health awareness in the population through objective, genotype-based risk assessment, enabling people to improve their own health and to help their descendants with this information.
3. Increased international competitiveness of the Turkish economy. Implementing the project will involve developing infrastructure in medicine, gene technology, and existing and future research institutions. It will also bring investment in high technology, the creation of new jobs, and the emergence of knowledge-intensive products and services.
4. Development of biology, bioinformatics and biomedicine, and of their teaching. The latter will ensure the constant stream of qualified specialists that the high-technology sector requires.
5. Support for the integrated development of economic and administrative areas through the various possible applications of gene technology, and development of cooperation between different fields (gene technology, information technology, agriculture, health care, etc.).

The main condition for the success of the project will be a clear idea that corresponds to the long-term interests of Turkey and its population. That idea is to increase the country's competitiveness and to create added value for every branch of the economy. It would do so by developing high-level, science-intensive gene technology and a health database for the population. Some important factors are:
- Basic approval and support by the government
- A relatively flexible organisation
- A well-developed, nationwide primary health care structure, which will be the main collector of health data and tissue samples for the genome project
- An increased educational level of the population and their support for innovative projects
- Developed information technology and data communication infrastructure
- Relatively low labour and overhead costs
- Geographical size and logistical potential

Legal regulations

For the success of the project, legal regulations are necessary. They should be based on societal agreement, the trust of the nation and political will, and should observe international norms of ethics and good practice.
To this end, a Human Genes Research Act (draft) should be approved by the government and submitted to parliament. The act should have two objectives. The first is to facilitate genetic research and the establishment and maintenance of a Gene Bank, with voluntary participation in research and confidential donor identities. The second is to protect persons from misuse of genetic data and from discrimination based on their DNA and the genetic risks arising from it.

Ethical issues

To deal with ethical issues that might emerge during the Genome Project, it might be useful to form an advisory Ethics Committee. Its members would be experts with previous experience in medical ethics and medical legislation. Under the Act, the Ethics Committee might be given a decisive role once the project is implemented and when later activities and research are launched. It could evaluate whether carrying out medical studies at the national level is appropriate.

Relationship with the public

Regarding the relationship with the public, the Genome Project should be an active party and should give as much information as possible to all target groups it comes into contact with. Dissemination of information should be based on the principle that information connected with the establishment and functioning of the Gene Bank is public and available to everybody. The Gene Bank database and the results of genetic tests should be accessible to researchers and clinicians, but otherwise password-protected or confidential, and services should be charged for.
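One common way to keep gene donors' identities confidential while still linking their samples and records is pseudonymisation with a keyed hash. The research database then stores only a code derived from the donor's identifier. The draft act described above does not prescribe this technique; the sketch below is one possible approach. The key handling, the `donor_code` helper, the example identifier and the record fields are all hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key held only by the Gene Bank's coding unit. In practice
# it would be kept in a key-management system, never generated in or stored with code.
SECRET_KEY = secrets.token_bytes(32)


def donor_code(national_id: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonymous donor code with HMAC-SHA256.

    The same identifier and key always yield the same code, so records can be
    linked, but without the key the code cannot feasibly be traced back.
    """
    digest = hmac.new(key, national_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()  # 64 hexadecimal characters


# The research database stores only the code, never the identity itself.
# The identifier and genotype field below are made up for illustration.
record = {"donor": donor_code("12345678901"), "genotype": {"marker_1": "AG"}}
print(record["donor"][:12])
```

A keyed hash is preferred over a plain hash here because donor identifiers come from a small, guessable space. Without the secret key, an attacker cannot simply hash every possible identifier and match the codes.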
The project should also include a strong educational component. Every person, whether or not they participate in the project, should receive general knowledge of genetics. This could come through the media, at school, through governmental or non-governmental education programs, or directly from a specialist.

Sources of financing

The scale of the project requires a range of financing sources. These include the state budget and private capital, international organisations and funds such as trust funds, structural programs and grants, and other sources. Technical support and know-how from major software companies and IT specialists are extremely important.

Development of intellectual capital

For the success of the project, the development of intellectual capital is of decisive importance. This means developing the relevant infrastructure, management systems, know-how and technology in the particular field(s), creating an organisation, and finding motivated people in Turkey. For the project as a whole, it is important to include strategic partner(s) that can guarantee access to the relevant know-how, financing and world markets for such a service.