Bioinformatics – Opportunities and Challenges for Mauritius

Bioinformatics – Opportunities for Mauritius
Oveeyen Moonian1
Shakuntala Baichoo1
Yasmina Jauferally-Fakim2
Zahra Mungloo-Dilmohamud1
Sunilduth Baichoo1
1 Department of Computer Science and Engineering
2 Department of Agricultural and Food Sciences
University of Mauritius
Abstract
Traditional research experiments in molecular biology have been largely overtaken by high-throughput
methods which generate far more data in relatively shorter periods of time. Genome sequences and
gene expression studies have become crucial in understanding biological processes. This has made the
handling, analysis and storage of such data only possible with computational tools, hence the
development of the interdisciplinary field of bioinformatics which brings together the sciences of
biology, biochemistry and computer science. Over the past decade or so, bioinformatics has moved to the
forefront of research on living organisms. The challenges presented by the scale of data produced in the
genomic and post-genomic era have been addressed by developers of computer programs in order to
provide efficient means for data analysis and management.
A whole new realm of bioinformatics resources has become available to scientists thus allowing for
rapid discovery of new genes and proteins. All disciplines of biological sciences, including medical,
environmental, microbial and plant sciences are set to benefit from such developments. This paper
describes some of those resources and how they are being used. It also presents an overview of the
different bioinformatics organisations which are driving forces behind the rapid implementation of
facilities in this field and the ethical issues related to bioinformatics development. The paper finally
highlights opportunities for Mauritius in the field of bioinformatics.
1. Introduction
Advances in biological sciences over the past few decades have been marked by major developments in
technical methods for studying living cells and tissues more closely. Primarily, the advent of molecular
approaches, such as genetic engineering and DNA amplification, has revealed the complexities of
cellular interactions which determine physiological and biochemical characteristics. Such methods were
however relatively limited given that only one or a few genes could be studied at a time. High-throughput
technologies have revolutionised experimental outputs to the point that data coming out of
research activities have to be analysed with powerful computational tools. DNA sequencing, microarray
technology, DNA and protein chips, and molecular markers have provided new platforms for understanding
how biological information is organised and utilised in different organisms. They have allowed an insight
into the causes of diseases, how hosts and pathogens interact, and all together depict a much more
detailed picture of living organisms.
Bioinformatics is an area where computational applications are used for interpreting biological data
mainly from sequences of DNA, RNA or proteins, and from patterns of gene expression. Determination
and comparison of protein structures have also become possible through various tools. For this purpose,
specific software and algorithms have been developed for particular uses. The field of bioinformatics has
developed very rapidly over the last decade and has become indispensable in life sciences research. It
integrates various disciplines like computer science, molecular biology and biochemistry as well as
statistics and mathematics. Data from experiments have to be captured, stored, and made easily
accessible to users. Large databases store large amounts of information that can be retrieved and
queried by scientists across the world. Many tools are integrated within web-based applications.
This paper discusses bioinformatics resources and tools that are currently used and the opportunities
the area presents for Mauritius. The rest of the paper is organized as follows: Section 2 covers the
resources available to support research in the area. Section 3 discusses the initiatives taken to develop a
Bioinformatics industry in different regions. Section 4 discusses Bioinformatics initiatives on the African
continent. Section 5 draws attention to the legal and ethical issues to be handled when developing the
area of Bioinformatics. Section 6 discusses the prospects of Bioinformatics for Mauritius. Section 7
makes recommendations for Mauritius to better seize the opportunities and meet potential challenges
and concludes the discussions.
2. Bioinformatics resources worldwide
In order to facilitate ongoing research in bioinformatics, a number of resources are available to
researchers. These tools can be broadly categorized as programming tools, databases and data analysis
tools.
2.1 Programming Tools in Bioinformatics
The main activity in bioinformatics is the analysis of biological data, which comprises
the following sub-tasks:
• Alignment of DNA sequences for comparison
• Finding motifs within DNA sequences
• Genome assembly following sequencing
• Development of methods to predict the structure and/or function of newly discovered proteins and
structural RNA sequences
• Clustering protein sequences into families of related sequences and the development of protein
models
• Aligning similar proteins and generating phylogenetic trees to determine evolutionary relationships
Programming tools are software development aids that can be used to create bioinformatics tools.
These programming tools need to deal with a huge amount of scattered and complex information
(data/text) accurately, reliably, and effectively. Some of the main programming tools are as
follows:

• BioJava (Biojava, 09): BioJava is an open source project that provides Java tools for processing
biological data, including sequence manipulation features, dynamic programming, file parsers
and simple statistical routines. It contains a collection of Java programs that represent and
manipulate biological data and assist bioinformatics research. It was started at EBI/Sanger (European
Bioinformatics Institute (EBI, 09)) in 1998 by Matthew Pocock and Thomas Down.
• BioPerl (BioPerl, 09): BioPerl consists of Perl tools for bioinformatics and provides online resources
for modules, scripts and web links for developers of Perl-based software. It has a bioinformatics
toolkit for:
– format conversion
– report processing
– data manipulation
– sequence analyses
– batch processing
• Biopython (Biopython, 09): Biopython is also an open source project with very similar goals to
BioPerl. Biopython is a set of freely available tools for biological computation written in Python. It is
a distributed collaborative effort to develop Python libraries and applications which address the
needs of current and future work in bioinformatics.

• MATLAB Bioinformatics Toolbox: Toolboxes (e.g., bioinformatics) are comprehensive collections of
MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of
problems. The Bioinformatics Toolbox extends MATLAB to provide an integrated and extendable
software environment for genome and proteome analysis. Together, MATLAB and the
Bioinformatics Toolbox give scientists and engineers a set of computational tools to solve problems
and build applications in drug discovery, genetic engineering and biological research.

• R language for Statistical Computing (R-project, 09): R is a free software environment for statistical
computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and
MacOS. Its bioinformatics counterpart is Bioconductor (Bioconductor, 09). Bioconductor
provides tools for the analysis and comprehension of genomic data. The broad goals of
Bioconductor are to:
– provide access to a wide range of powerful statistical and graphical methods for the analysis of
genomic data
– facilitate the integration of biological metadata in the analysis of experimental data: e.g.
literature data from PubMed, annotation data from LocusLink
– allow the rapid development of extensible, scalable, and interoperable software
– promote high-quality and reproducible research
– provide training in computational and statistical methods for the analysis of genomic data.
2.2 Databases for Bioinformatics
There is a very large number of databases covering a wide range of scientific data available to
researchers in bioinformatics (Zvelebil, 08). Data is highly duplicated in different databases. An
important feature of many databases is that they do not only store sequence data but they also contain
a lot of relevant non-sequence data known as annotation that can include links to related entries in
other databases, interpretation of data and relevant research citations. In addition to simply providing
information, some of the databases also provide web-based interfaces to programs for online analysis of
their data.
A distinction is sometimes made between databases of primary data and those that contain secondary
data derived from these primary sources. In some cases, the primary data include raw experimental
results such as scans of gene-expression arrays and two-dimensional proteomic gels but in many cases
they include the initial experimental interpretation e.g. nucleotide sequences. An example of a database
containing primary data is SWISS-PROT for protein sequences. Examples of secondary databases are
those that contain collections of conserved protein motifs, or comparisons of multiple sequences that
give measures of sequence similarity and relatedness and are only based on data existing at that time.
Databases can be categorized as follows:

Sequence databases
Nucleotide sequence related databases include major international collaborations such as GenBank
(NCBI), the EMBL-EBI Nucleotide Sequence Database (EBI, 09), and the DNA Data Bank of Japan (DDBJ). In
addition, there are resources that are more gene-specific, with information on introns, exons, and splice sites,
as well as motifs and transcriptional regulators and sites. There are a number of different types of
DNA sequences stored in these databases, differing in the way they have been obtained, and each
type provides different biological information. They are:
– the raw genomic sequence representing the sequence of chromosomal DNA, which is deposited
in GenBank (produced at the National Center for Biotechnology Information (NCBI)) (NCBI, 09) and
the organism-specific DNA sequence databases
– the cDNAs, which are the sequences of DNA molecules that have been synthesized by
reverse transcription of mRNA molecules, indicating the range of genes being expressed in the
sample used at the time of experimentation
– Expressed Sequence Tags (ESTs), which are partial cDNA sequences, also indicating the range of
genes being expressed in the sample used at the time of experimentation.
Protein sequence databases include the major sequence databases such as UniProtKB (UniProt, 09)
and NCBI Protein Database (NCBI-Protein, 09), both being efforts to collect information on all
protein sequences. These protein databases are often compiled from raw nucleotide sequence data.
UniProtKB is produced by analysis of all translations of the EMBL database nucleotide sequences. It
has two components, namely Swiss-Prot which is manually annotated and TrEMBL which is only
computer annotated.
In addition, a multitude of organism-specific or protein-family databases have been set up, thus
allowing a more structured organisation of information, for example FlyBase (Drosophila
melanogaster), TAIR (Arabidopsis thaliana), VectorBase, PlasmoDB (malaria), and the KEGG Pathway
Database, which provides pathway maps based on known molecular interactions.
Most of the databases also provide analysis tools for both DNA and proteins.

Microarray databases and Gene expression databases
Microarray databases are repositories of data from microarray experiments, often accompanied by
data analysis and tools to visualize the raw image. Gene expression databases also contain
expression data collected by other experimental methods such as SAGE (Serial Analysis of Gene
Expression) and EST sequencing. The databases contain expression data and often extensive
annotation, as well as techniques to visualize the numerical data and programs for statistical analysis. One
such database is the Stanford Microarray Database (SMD), which includes data from more than 7000
microarray experiments. ArrayExpress (ArrayExpress, 09) is another repository for microarray data
which additionally includes the ArrayExpress Data Warehouse that stores gene-indexed expression
profiles from a curated subset of experiments from the database.

Protein interaction databases
Proteins have to interact with other molecules, including other proteins, to carry out their functions.
The protein interaction databases provide an understanding of the functions of the proteins and
help in building up biological networks that can be used in systems biology. There are a number of
such databases, namely:
– the Database of Interacting Proteins (DIP) (DIP, 09), which contains information only on
protein-protein interactions
– the Molecular INTeraction database (MINT) (MINT, 09), which contains additional information on
protein, nucleic acid, and lipid interactions
– the Biomolecular Interaction Network Database (BIND) (BIND, 09), which describes interactions at
the atomic level for protein, DNA, and RNA
– the protein Signaling, Transcriptional Interaction and Inflammation Networks Gateway (pSTIING)
(pSTIING, 09), a web-based application as well as an interaction database for protein-protein
interactions, interactions of proteins with other molecules, and transcriptional associations
– the Munich Information Center for Protein Sequences (MIPS), which hosts a comprehensive, manually
curated database of mammalian protein-protein interactions
– Proteome (Proteome, 10), a useful reference for a list of protein interaction databases.

Structural databases
Structural databases include those containing information on the structure of small molecules,
carbohydrates, nucleic acids (DNA, RNA), and proteins. These are the results obtained using various
experimental techniques, such as X-ray crystallography or Nuclear Magnetic Resonance (NMR). The
most common structural databases are the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB)
(rcsb, 09) and the Macromolecular Structure Database (MSD) (MSD, 09) at EBI. CATH is a
classification of protein structural domains. SCOP, Structural Classification of Proteins, provides detailed
information on folds, superfamilies and families with the aim of being able to reconstruct structural
and evolutionary relationships among proteins.
2.3 Data analysis tools
A number of organisations which host databases for bioinformatics applications also provide data
analysis tools. The two main ones are the EBI and the NCBI toolboxes.
2.3.1 Toolbox at EBI
The European Bioinformatics Institute (EBI) (EBITools, 09) provides a comprehensive range of tools for
the field of bioinformatics. These are subdivided into the following categories:

Homology and Similarity Programs
BLAST (Basic Local Alignment Search Tool) enables a researcher to compare a query sequence
(protein or nucleotide) with a database of sequences, and identify sequences that resemble the
query sequence above a certain threshold.
The Smith-Waterman algorithm is used for performing local sequence alignment; that is, for
determining similar regions between two nucleotide or protein sequences. Instead of looking at the total
sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes
the similarity measure.
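The local alignment idea described above can be illustrated with a minimal Python sketch of the Smith-Waterman dynamic programming scheme. It is not the EBI implementation: the function name and the simple match/mismatch/gap scores are assumptions made for illustration, whereas production tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    The scoring values are illustrative assumptions, not those of any
    particular tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a new local alignment
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("AAAGGGTTT", "CCCGGGCCC"))
```

Resetting negative scores to zero is what makes the alignment local: only the shared segment contributes to the result, however different the flanking regions are.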

Protein Functional Analysis
EBI provides protein functional analysis via InterPro and the InterProScan tool (InterProScan,
09). InterPro is an integrated database of predictive protein "signatures" used for the classification
and automatic annotation of proteins and genomes. It classifies sequences at superfamily, family
and subfamily levels, predicting the occurrence of functional domains, repeats and important sites.
It adds in-depth annotation, including GO (Gene Ontology) terms, to the protein signatures.
The InterProScan tool allows users to query a protein sequence against InterPro and to
search InterPro by accession number or sequence. It can be used to search for protein
repeats, motifs, biochemical function and family.

Structural Analysis
The determination of a protein's 2D/3D structure is crucial in the study of its functions. EBI provides
a set of tools for protein structure analysis and secondary structure prediction. Some of them are:
– DaliLite: This program is used for pairwise structure comparison i.e. it compares the given
structure (first structure) to a reference structure (second structure).
– EMSearch: This is a search tool for electron microscopy depositions.
– MaxSprout: Allows for the reconstruction of 3D coordinates from a C-alpha trace.
– PQS and PQS-Quick: These tools are used to search for Protein Quaternary Structure.

Sequence Analysis
Sequence analysis encompasses the use of various bioinformatics methods to determine the
biological function and/or structure of genes and the proteins they code for. Unknown structure and
function can be elucidated through comparison with databases of known
structures/sequences/functions. EBI provides a number of tools for sequence analysis, some of
which are:
– ClustalW is a general-purpose multiple sequence alignment tool for nucleotides or proteins. It
produces biologically meaningful multiple sequence alignments of divergent sequences. It
calculates the best match for the selected sequences, and lines them up so that the identities,
similarities and differences can be seen. Evolutionary relationships can be seen by viewing
cladograms or phylograms.
– EMBOSS-Align contains two programs, each using a different algorithm. For an alignment that
covers the whole length of both sequences, the Needle program (based on the Needleman-Wunsch
algorithm (Needleman, 70)) is used. In order to find the best region of similarity between two
sequences, the Water program (based on the Smith-Waterman algorithm (Waterman, 76)) is used.
There are also a number of Gene finding tools and translation tools.
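To make the contrast between whole-length and best-region alignment concrete, the following is a minimal Python sketch of global alignment in the style of the Needleman-Wunsch algorithm. It is not the code of the EMBOSS Needle program; the function name and the unit match/mismatch/gap scores are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment.

    Scores are illustrative assumptions; real tools use substitution
    matrices and separate gap-open and gap-extend penalties.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Unlike local alignment, leading gaps are penalised.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to the top-left corner,
    # so that both sequences are covered over their whole length.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("GATTACA", "GATACA"))
```

The only structural differences from a local alignment are that scores may go negative and that the traceback always runs corner to corner.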
2.3.2 Tools at NCBI
The NCBI (NCBITools, 09) provides a comprehensive range of tools for the field of bioinformatics which
can be categorized as follows:

Nucleotide Sequence Analysis
The nucleotide sequence analysis tools at the NCBI can be summarised as follows:
– BLAST, used for comparing gene and protein sequences against others in public databases,
comes in several forms including PSI-BLAST, PHI-BLAST, and BLAST 2 sequences. Specialized
BLASTs are also available for human, microbial, malaria, and other genomes, as well as for
vector contamination, immunoglobulins, and tentative human consensus sequences.
– Electronic PCR allows a user to search a query DNA sequence for sequence tagged sites (STSs)
that have been used as landmarks in various types of genomic maps. It compares the query
sequence against data in NCBI's UniSTS, which is a unified, non-redundant view of STSs from a
wide range of sources.
– Model Maker allows a user to view the sequence (mRNAs, ESTs, and gene predictions) that was
aligned to assembled genomic sequence to build a gene model. It is then possible to edit the
model by selecting or removing putative exons. The mRNA sequence and potential ORFs for the
edited model can be viewed and the mRNA sequence data saved for use in other programs.
Model Maker is accessible from sequence maps that were analyzed at NCBI and displayed in
Map Viewer.
– ORF Finder identifies all possible ORFs in a DNA sequence by locating the standard and
alternative stop and start codons. The deduced amino acid sequences can then be used to
BLAST against GenBank.
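The principle behind open reading frame detection can be sketched in a few lines of Python. This is a simplified illustration, not NCBI's ORF Finder: it assumes the standard genetic code only (ATG as the start codon; TAA, TAG and TGA as stop codons), ignores alternative start codons, and the function names and the min_codons parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard genetic code only (assumption)
START_CODON = "ATG"


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=2):
    """Scan all six reading frames for ATG...stop stretches.

    Returns a list of (strand, frame, start, end, orf_sequence) tuples.
    Coordinates are 0-based, end-exclusive, and relative to the strand
    that was scanned. min_codons counts the start and stop codons.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == START_CODON:
                    start = pos               # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (pos + 3 - start) // 3 >= min_codons:
                        results.append(
                            (strand, frame, start, pos + 3, s[start:pos + 3])
                        )
                    start = None              # look for the next ORF
    return results


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATAGCC"):
        print(orf)
```

Each reported sequence could then be translated and submitted to a similarity search, as the text describes.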

Protein Sequence Analysis and Proteomics
BLAST programs are also available for comparing protein sequences.
– BLink ("BLAST Link") displays the results of BLAST searches that have been carried out for every
protein sequence in the Entrez Proteins data domain.
– CDART takes a given protein query sequence, displays the functional domains that make up
the protein, and lists proteins with similar domain architectures.
– TaxPlot is a tool for 3-way comparisons of genomes on the basis of the protein sequences they
encode. In TaxPlot, one selects a reference genome to which two other genomes are compared.
Pre-computed BLAST results are then used to plot a point for each predicted protein in the
reference genome, based on the best alignment with proteins in each of the two genomes being
compared.

Structural Analysis
Cn3D is a helper application for web browsers and allows a user to view 3-dimensional structures
from NCBI's Entrez retrieval service.
VAST Search is NCBI's structure-structure similarity search service. It compares 3D coordinates of a
newly determined protein structure to those in the MMDB/PDB (Molecular Modeling
Database/Protein Data Bank) database.

Genome Analysis
Entrez Genomes hosts whole genomes of over 1000 organisms. The genomes represent both
completely sequenced organisms and those for which sequencing is in progress. All three main
domains of life - bacteria, archaea, and eukaryota - are represented, as well as many viruses,
phages, viroids, plasmids, and organelles. Entrez Genomes provides graphical overviews of complete
genomes/chromosomes and the ability to explore regions of interest in progressively greater detail.
Clusters of Orthologous Groups (COGs) (a system of gene families) were delineated by comparing
protein sequences encoded in 43 complete genomes, representing 30 major phylogenetic lineages.
Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus
corresponds to an ancient conserved domain.

Gene Expression
Gene Expression Omnibus (GEO) provides several tools to assist with the visualization and
exploration of GEO data. Datasets may be viewed as hierarchical cluster heat maps, providing insight
into the relationships between samples and co-regulated genes.
SAGEmap provides a tool for performing statistical tests designed specifically for differential-type
analyses of Serial Analysis of Gene Expression (SAGE) data. The data include SAGE libraries
generated by individual labs as well as those generated by the Cancer Genome Anatomy Project
(CGAP), which have been submitted to GEO.
The Cancer Genome Anatomy Project (CGAP) aims to decipher the molecular anatomy of cancer
cells. CGAP develops profiles of cancer cells by comparing gene expression in normal, precancerous,
and malignant cells from a wide variety of tissues.
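The hierarchical clustering that underlies cluster heat maps of expression data, such as those mentioned above for GEO datasets, can be sketched in Python as follows. This is not GEO's implementation: Euclidean distance and average linkage are assumed here, and the gene names and expression values are invented purely for illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist(profiles[a], profiles[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clusters(profiles, n_clusters):
    """Agglomerative clustering of expression profiles.

    profiles maps a gene name to a list of expression values, one per
    sample. Clusters are merged pairwise, closest first, until only
    n_clusters remain.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[name] for name in profiles]   # every gene starts alone
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Hypothetical expression values across three samples.
    example = {
        "geneA": [1.0, 2.0, 3.0],
        "geneB": [1.1, 2.1, 2.9],
        "geneC": [9.0, 8.0, 7.0],
        "geneD": [9.2, 7.9, 7.1],
    }
    print(hierarchical_clusters(example, 2))
```

Genes that end up in the same cluster have similar profiles across samples, which is what a heat map makes visible as blocks of similar colour.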
2.3.3 Swiss Institute of Bioinformatics
The Expert Protein Analysis System (ExPASy), the proteomics server of the Swiss Institute of
Bioinformatics (SIB), hosts a variety of proteomics tools with a structural viewer. Protein
identification and characterisation can be performed using a
number of different tools that distinguish the different molecular properties of proteins such as
isoelectric point, molecular weight and amino acid composition. Similarity searches as well as pattern and
profile searches are also available. Also provided are ViralZone and HAMAP, the latter for microbial proteomes.
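Two of the molecular properties mentioned above, amino acid composition and molecular weight, are simple to compute from a sequence. The Python sketch below is an illustration only and not the ExPASy implementation: the function names are hypothetical, and the residue masses are approximate average values rounded to two decimals.

```python
from collections import Counter

# Approximate average residue masses in daltons (rounded; illustrative).
AVERAGE_RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_MASS = 18.02  # one water is added for the free N- and C-termini


def amino_acid_composition(seq):
    """Return the fraction of each residue type in the sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in sorted(counts.items())}


def molecular_weight(seq):
    """Approximate average molecular weight of a peptide in daltons.

    Raises KeyError for characters that are not one of the 20 standard
    amino acid letters.
    """
    seq = seq.upper()
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


if __name__ == "__main__":
    peptide = "MKTAYIAK"  # arbitrary example peptide
    print(amino_acid_composition(peptide))
    print(round(molecular_weight(peptide), 2))
```

Estimating the isoelectric point additionally requires pKa values for the ionisable groups, which is why it is left out of this sketch.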
3. Initiatives in other parts of the world
The opportunities presented by biotechnology and bioinformatics have motivated the setting up of
research nodes by most nations throughout the world. Many associations share resources and make up
regional networks. Examples of such associations are EMBnet (European Molecular Biology Network), RIBioNet
(Latin America) and APBioNet (Asia-Pacific).
3.1 EMBnet
EMBnet (EMBnet, 09) is a science-based group of collaborating nodes throughout Europe and a number
of nodes outside Europe. The combined expertise of the nodes allows EMBnet to provide services to the
European molecular biology community which encompasses more than can be provided by a single
node. The EMBnet website gives an overview of the organization and of its members. It provides visitors with
news of the EMBnet community and new links related to bioinformatics. It also combines the services
available on the nodes and publishes EMBnet.news, the electronic newsletter devoted to providing information
about what is happening at the national and special nodes.
Since its creation in 1988, EMBnet has evolved from an informal network of individuals in charge of
maintaining biological databases into the only organization world-wide bringing bioinformatics
professionals to work together to serve the expanding fields of genetics and molecular biology. Although
composed predominantly of academic nodes, EMBnet gains an important added dimension from its
industrial members. The success of EMBnet has attracted an increasing number of organizations outside
Europe to join the group. EMBnet has a tried-and-tested infrastructure to organise training courses, give
technical help and help its members to effectively interact and respond to the rapidly changing needs of
biological research in a way no single institute is able to do. In 2005 the organization created additional
types of node to allow more than one member per country. The "associated node" was born.
The following are some of the main achievements of EMBnet:

• Development of the first complete e-learning system for teaching Bioinformatics (EMBER)
• EMBnet’s commitment to society is reflected in its active involvement in dealing with relevant
problems and diseases (AntiSARS, RBMDB, p53FamTaG)
• EMBnet has pioneered the use of Grid technologies in the Biosciences and has been involved in seminal
Grid projects (SWEGrid, EGEE, EMBRACE, HealthGrid, WebServices)
• From very early on, EMBnet has promoted the development of distributed computing initiatives
to share workload among international servers (HASSLE, SRSfed, MRSfed, FedBLAST, SIMDAT)
• EMBnet is committed to bringing the latest software algorithms to the user, free of charge (EGCG,
Pratt, BITS, HoxPred), and continues to develop state-of-the-art public software (EMBOSS) and
powerful, easy-to-use, intuitive interfaces (CINEMA, W2H, GeneDoc, WWW2GCG, TOPS, Jalview,
wEMBOSS, Jemboss, STACKpack, EMBOSSrunner, eBiotools, WebLab, UTOPIA)
• EMBnet has made major contributions to supercomputing in the Life Sciences as a means to deliver
more powerful and advanced services (Bioccelerator, MPSRCH, INSECTS+MOLLUSCS)
• EMBnet has contributed to the development and maintenance of advanced database systems for
the Life Sciences (SRS, Bioimage, CpGisle, CLEANUP, Webin & Seqin, GQserv, PRINTS, InterPro,
STACKdb, UniProt, NyHITS, ENSEMBL, MitoDrome, YeastBASE, MRS, MitoRes)
• EMBnet was the first to come up with advanced solutions for automated database distribution using
the Internet (NDT, SynCron)
• The Ping project was for a long time the only existing project giving continuous information about
network efficiency across the whole of Europe
• EMBnet had the first gopher and World Wide Web servers in biology (CSC BioBox).
3.2 APBioNet
The Asia Pacific Bioinformatics Network (APBioNet) (APBioNet, 09) is a non-profit, non-governmental,
international organization. It focuses on the promotion of bioinformatics in the Asia Pacific Region. Since
1998, its mission has been to pioneer the growth and development of bioinformatics awareness,
training, education, infrastructure, resources and research amongst member countries and economies.
Its work includes the technical coordination and liaison with other international bodies such as the
EMBnet.
APBioNet has more than 20 organizational and 300 individual members from over 12 countries in the
region, and members include those from industry, academia, research, government, investors and
international organisations. APBioNet has coordinated or co-organised more than 20 international and
national meetings in cooperation with members in different economies. It is spearheading a number of
key bioinformatics initiatives in the region in collaboration with international organisations such as
APAN, APEC, S* Alliance and A-IMBN.
4. The African initiatives
Africa is set to take up the challenge of bringing solutions to its major problems of health and food
through the applications of biotechnology and bioinformatics. Capacity building in this area will ensure
that scientists have the right tools to address the research issues relevant to the continent. South Africa
dominates the scene with well-established bioinformatics centres, and various universities there have
engaged in this direction in order to ensure manpower training. The research output is evidence of the
high level of activity currently under way. Both Malawi and Zambia have many projects in the
health and agricultural sectors that are molecular biology based and therefore need bioinformatics to
make good progress. East Africa has several institutions engaged in the utilisation of bioinformatics
applications. ILRI, the International Livestock Research Institute in Nairobi, has a state-of-the-art centre
where several pathogen genomes have been sequenced. KEMRI, the Kenya Medical Research Institute, is
also involved in the application of bioinformatics tools in malaria research. This institute has
long-standing support from the Wellcome Trust in the UK, which coordinates major sequencing projects at the
Sanger Institute, Cambridge, UK. North Africa is also active in developing bioinformatics; the Pasteur
Institute in Tunis has close collaborations with French research centres while working on local problems.
Similarly West Africa runs several health related projects where bioinformatics tools are widely applied.
The New Partnership for Africa’s Development (NEPAD), with the objectives of stimulating Africa’s
development by bridging existing gaps in priority sectors, has identified that the future of Africa lies in
the development of Science & Technology. In this respect, in 2003, it adopted an outline of an action
plan containing a number of flagship programme areas.
It has been recognized that investment in Biosciences can help Africa to ensure food security and better
health for its population. Flagship programmes related to biosciences have been clustered to form the
Bioscience initiative which has created four regional networks in the continent. These are:
1. Biosciences Eastern and Central Africa Network (BecA Net).
2. Southern Africa Biosciences Network (SANBio).
3. West Africa Biosciences Network (WAB Net).
4. North Africa Biosciences Network (NAB Net).
Each of these networks consists of a hub and a number of nodes that work towards the development of
biosciences including bioinformatics, in the respective region. These networks provide coordination and
financial support for the nodes for capacity building and development of research projects.
The BecA Net has drawn up a 4-year business plan for achieving its objectives. The BecA Hub has a number
of service units, one of which is for bioinformatics. The BecA Hub has a bioinformatics platform,
hosted on a High Performance Computing (HPC) platform located on the BecA Nairobi campus, which
provides advanced computational capabilities in bioinformatics to all BecA Hub scientists to:

• Uncover the wealth of biological information hidden in the mass of DNA sequences, structure,
literature and other biological data
• Obtain a clearer insight into the fundamental biology of organisms
• Use this information to enhance the standard of life for mankind.
The Southern Africa Biosciences Network (SANBio) is to cater for the development of biosciences and
related areas in 12 countries of the Southern African region, including Mauritius. The strategic objectives of
SANBio are to:
• Address Southern African problems in agriculture, health, and environment through the application
of bioscience technologies
• Use new developments in biosciences to protect the environment and conserve biodiversity in
Southern Africa
• Build and strengthen human capacity in biosciences in Southern Africa
• Promote access to affordable, world-class research facilities within Southern Africa
• Harness indigenous knowledge and technology of the Southern African people for sustainable
utilization of natural resources and wealth generation.
Because it enhances research and development in biosciences, bioinformatics can play an important
role in supporting the objectives of SANBio. Capacity building is acknowledged as an important stepping
stone in biosciences in general, and in bioinformatics in particular. SANBio has recently launched a
capacity building project to train scientists in the region in the various applications of bioinformatics.
The aim is to equip university academics and researchers with the skills to teach and implement
activities in this field. Several collaborations have been set up for this purpose with the European
Molecular Biology Network and with ILRI. The University of Mauritius has been selected as
the SANBio regional node for Bioinformatics capacity building (Jauferally-Fakim et al., 09).
5. Legal and ethical issues
The prospects of bioinformatics have aroused considerable interest and enthusiasm in the research
community and among the public at large. In the agricultural industry, many plants have already been
genetically modified (Steve Windley, 08) to produce fruits that are resistant to pests, cold and other
adverse conditions. Many benefits of genetically modified plants have been reported (Wolfenbarger L. L.,
and Phifer P. R., 00), such as reduced environmental impacts from pesticides, easier soil conservation,
increased yield and phytoremediation (remediation of polluted soils, sediments, surface
waters, and aquifers).
Research in bioinformatics and genetic engineering is also being carried out on human cells to find more
effective cures. Bernard Moss reported in 1996 that the vaccinia virus, no longer required for
immunization against smallpox, now serves as a unique vector for expressing genes within the
cytoplasm of mammalian cells. As a research tool, recombinant vaccinia viruses are used to synthesize
and analyze the structure-function relationships of proteins, to determine the targets of humoral and
cell-mediated immunity, and to investigate the types of immune response needed for protection against
specific infectious diseases and cancer. The vaccine potential of recombinant vaccinia virus has been
realized in the form of an effective oral wildlife rabies vaccine, although no product for humans has
been licensed. A genetically altered vaccinia virus that is unable to replicate in mammalian cells and
produces diminished cytopathic effects retains the capacity for high-level gene expression and
immunogenicity while promising exceptional safety for laboratory workers and potential vaccine
recipients.
Rosenberg et al. reported in 2006 that they had achieved cancer regression in patients after the
transfer of genetically engineered lymphocytes. Identifying which gene is responsible for which disease
is a common research topic for many groups. Some of the causes have already been identified, and
simple tests can now determine who is predisposed to certain diseases.
The benefits that biotechnology and bioinformatics bring to the health sector, and their potential as a
response to food crises, are clear. However, researchers have raised several concerns over the
safety of genetically modified foods, in particular about the effects of interfering with the DNA of
these crops. What happens to the crops? What happens to the animals and
the humans who eat them? Are these plants a problem now? Will they be a problem in the future? Can
the bacteria and viruses used to alter the DNA in these plants also affect the bacteria in our body? These
issues offer avenues for further research.
With the progress of the Human Genome Project, it will soon be possible to identify the genes that
cause different diseases. Simple tests can determine whether a person is prone to certain diseases or
at high risk of developing certain severe diseases. This raises several ethical issues about how such
information can be used. Can a parent decide to abort a child that may be at risk? Can insurance
companies decide not to insure a person with a high risk? Can a company decide to reject a job
application on the same basis? Will people want to check their partner’s genetic information before
getting into a relationship?
Béatrice Godard and her co-authors (Godard et al, 03) examine the professional and scientific views on
the social, ethical and legal issues that impact on genetic information and testing in insurance and
employment in Europe. For this purpose, many aspects were considered, such as the concerns of
medical geneticists, of the insurers and employers, of the public, as well as the regulatory frameworks
and unresolved issues. The work was based on debates among 47 experts from 14 European countries
invited to an international workshop organized by the European Society of Human Genetics Public and
Professional Policy Committee in Manchester, UK, 25–27 February 2000. The results stress the need for
clear definitions of terms used in genetics, for declaring the grounds on which genetic information is or
is not used, and for promoting confidence between the public and the insurance industry. In Europe, there is
currently very little use of genetic information in relation to employment, but the situation should be
kept under review.
6. Prospects of Bioinformatics in Mauritius
Two of the areas impacted by bioinformatics and that are of high relevance to Mauritius are healthcare
applications and food security. However, the first line of action should target education at tertiary level.
Bioinformatics has been introduced into existing programs at UoM but it is crucial that additional
resources be allocated for implementing programs and initiating research in this area.
6.1 Healthcare Applications
Traditional drug discovery has relied on the isolation or synthesis of molecules whose activities are
then screened through a lengthy and costly process in which pharmacokinetic properties and toxicity
have to be determined. This is being replaced by a molecular targeting approach in which compounds
are screened in silico for their ability to bind to proteins and modify their function. This has become
possible thanks to improved knowledge of the molecular basis of diseases. Most large pharmaceutical
firms are already applying this technology. Drug targets can be validated through their 3-D structure
using proteomics tools.
Molecular epidemiology of infectious diseases relies on the knowledge of their genetic variability in
order to have adequate control measures. Bacterial and protozoan genomes have become available
over the past years and the sequences can be compared with appropriate comparison tools. These
methods are more promising for vaccine development as well as finding new antibiotics. In silico
vaccinology allows the identification of appropriate binding molecules to antigenic epitopes that will
enhance an immune response in the vaccinated individual.
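As a minimal illustration of what a sequence comparison tool computes, the sketch below scores the best global alignment of two short DNA sequences by dynamic programming, in the style of Needleman and Wunsch. The function name, the scoring values (match +1, mismatch -1, gap -1) and the toy sequences are assumptions chosen for illustration; production tools use more elaborate scoring schemes and heuristics.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of sequences a and b.

    Scoring values are illustrative assumptions, not those of any
    particular published tool.
    """
    # previous[j] is the best score for aligning the processed prefix
    # of a with the first j characters of b.
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = previous[j - 1] + pair  # align a[i-1] with b[j-1]
            up = previous[j] + gap             # a[i-1] aligned to a gap
            left = current[j - 1] + gap        # b[j-1] aligned to a gap
            current.append(max(diagonal, up, left))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Hypothetical fragments from two isolates of the same pathogen.
    print(global_alignment_score("ACGT", "AGT"))
```

A higher score indicates more similar sequences; comparing one isolate against many others in this way is the basic operation behind studies of genetic variability.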
6.2 Food Security
Food production relies on a limited number of plant varieties which are bred for optimal yield and
agronomic characters. Major crops, like rice, have already been sequenced while other cereals’ genomes
are in the pipeline. It is expected that the genome sequences of crops will help improve the quality of
food products and ensure adequate production in the future. Bioinformatics is a promising means of
finding useful genes and mapping them on the genomes of both plants and animals. DNA sequence data
and gene expression patterns offer promising ways of dealing with insect vectors as well as
disease-causing organisms. More effective vaccines are being designed this way.
6.3 Opportunities for Mauritius
The ICT sector has been identified as one of the important pillars of the Mauritian economy, and
software development is to play an important role in it. This activity can be extended to include
bioinformatics software development. Mauritius can participate actively in developing software for data
mining, simulation and visualization. With the advent of Next Generation Sequencing, there will be a
high demand for trained manpower to work with applications in genome assembly and annotation.
Mauritius can take advantage of such prospects in outsourcing.
However, to seize the opportunities, Mauritius will need to invest in the required resources to support
bioinformatics activity. These include the development of the required human resources and high
performance computing facilities to support the development of databases and computing tools.
Equally important, there is an urgent need to invest in research facilities to carry out studies in the fields
of genomics and proteomics. Mauritius has a high degree of endemicity with unique terrestrial and
marine species. The country can have substantial economic prospects from studying the genomics of
these different species, in particular those with medicinal properties. A database of the genomic
information about these species would be extremely valuable.
The population of Mauritius comes from different origins, thus providing unique opportunities for
understanding the effects of genotypes on diseases. This offers interesting prospects from the genomic
perspective. Recent epidemics of both human and animal diseases in the region have resulted in severe
setbacks in the economy, thus emphasizing the need for strengthening research in the area of molecular
epidemiology of pathogens.
6.4 Bioinformatics at the UoM
In order to support the above mentioned development, academic institutions need to take the lead to
drive research and capacity building in the area. The University of Mauritius, conscious of its important
role in this development, has been proactive in initiating appropriate steps. Researchers from the
Faculties of Science, Agriculture and Engineering, have joined efforts to embark on research in the field
of bioinformatics. Among other initiatives, a Bioinformatics Computing Research Group was set up
in 2006.
Additionally, there is an increasing number of programmes related to bioinformatics or with
bioinformatics components that are being offered both at undergraduate and postgraduate levels at the
different faculties of the University of Mauritius. New programmes with higher emphasis in
bioinformatics are in the pipeline.
Recently, the SANBio (SANBio, 09) Steering Committee approved the designation of the University of
Mauritius as a SANBio Node for capacity building in bioinformatics. Among other activities, the
University of Mauritius through the Faculty of Agriculture will be coordinating the implementation of
training programmes in bioinformatics in the SADC region under the auspices of NEPAD. Under this
initiative, a computer laboratory (equipped with necessary hardware and software), sponsored by
SANBio, is being set up at the University of Mauritius to support the capacity building.
7. Recommendations and Conclusion
Bioinformatics is relevant to many fields of life, namely:
• Basic science for understanding living systems at the molecular level.
• Medicine more specifically for clinical informatics.
• Agriculture and fisheries so as to improve yield and disease resistance.
• Environment so as to better understand the biosphere and do biological spill clean-up.
In Mauritius, a number of institutions are concerned with bioinformatics research due to the nature of
their activities. Among others, we have the Mauritius Sugar Industry Research Institute (MSIRI), the
Mauritius Oceanographic Institute (MOI), the Food and Agricultural Research Council (FARC), the
Ministry of Agriculture, the Ministry of Health and academic institutions such as the University of
Mauritius. Development of bioinformatics at the national level requires coordination and collaboration
among these institutions.
Bioinformatics involves large amounts of data and intensive processing power. In order to support
research in this area, there is a need to increase resources for information infrastructure and build the
appropriate computing environment. Extensive training programmes in the field, including hands-on
practice with the above-mentioned tools, can kick-start research in the area of bioinformatics, and the
University of Mauritius can play a key role in this respect.
Mauritius should aim at building the necessary infrastructure to maintain bioinformatics databases for
storing and archiving local data. Such databases should be highly protected against piracy and unethical
use. Therefore access to this data should be properly controlled. However, overprotection may stifle
useful research. Currently, the only relevant legislation in Mauritius is the Data Protection Act 2004.
More research should be conducted to fine-tune the legal aspects of data protection and use.
The field of bioinformatics presents a number of interesting challenges and opportunities for biologists,
computer scientists, information scientists and bioinformaticians. These challenges sit at the
intersection of biology and information. Ideally, larger scale work in this broad area involves a
partnership between those with expertise in relevant foundational domains (e.g. computer scientists)
and application domains (e.g. biologists) as well as bioinformaticians to serve as a bridge.
The potential benefits of addressing some of the above-mentioned challenges are numerous both in
terms of improving our understanding in general of how biological systems work and in terms of
applying the knowledge to help improve health and treat diseases.
Above all, bioinformatics has brought together researchers, organisations and institutions from different
areas with the aim of strengthening collaborative output in scientific discovery.
References
[APBioNet, 09] APBioNet Homepage, http://www.APBionet.org/, accessed on 17 Dec 2009
[ArrayExpress, 09] ArrayExpress Homepage, http://www.ebi.ac.uk/microarray-as/ae/, accessed on 16 Dec 2009
[BIND, 09] Biomolecular Interaction Database Homepage, http://www.ncbi.nlm.nih.gov/pubmed/11125103, accessed on 16 Dec 2009
[Bioconductor, 09] Bioconductor Homepage, http://www.bioconductor.org, accessed on 16 Dec 2009
[Biojava, 09] Biojava Homepage, http://www.biojava.org, accessed on 16 Dec 2009
[BioPerl, 09] BioPerl Homepage, http://www.bioperl.org, accessed on 16 Dec 2009
[Biopython, 09] BioPython Wiki, http://biopython.org/wiki/Main_Page, accessed on 16 Dec 2009
[DIP, 09] Database of Interacting Proteins Homepage, http://dip.doe-mbi.ucla.edu, accessed on 16 Dec 2009
[EBI, 09] EBI Homepage, http://www.ebi.ac.uk, accessed on 16 Dec 2009
[EBI, 09] EMBL-EBI Homepage, http://www.ebi.ac.uk/embl/, accessed on 16 Dec 2009
[EBITools, 09] EBI Tools Homepage, http://www.ebi.ac.uk/Tools/, accessed on 16 Dec 2009
[EMBnet, 09] EMBnet Homepage, http://www.Embnet.org/, accessed on 17 Dec 2009
[Godard, 03] Godard Béatrice, Raeburn Sandy, Pembrey Marcus, Bobrow Martin, Farndon Peter and Aymé Ségolène,
“Genetic information and testing in insurance and employment: technical, social and ethical issues”,
European Journal of Human Genetics (2003) 11, Suppl 2, S123–S142
[InterProScan, 09] InterProScan Sequence Search, http://www.ebi.ac.uk/Tools/InterProScan/, accessed on 16 Dec 2009
[Jauferally-Fakim, 09] Jauferally-Fakim Y., Puchooa D., Mumba L. “Status of Bioinformatics in Southern
Africa: Challenges and Opportunities”, EMBnet.news, vol 15, No. 3, October 2009.
[MINT, 09] Molecular INTeraction Database Homepage, http://mint.bio.uniroma2.it/mint/Welcome.do, accessed on 16 Dec 2009
[Moss, 96] Moss Bernard, 1996, “Genetically engineered poxviruses for recombinant gene
expression, vaccination, and safety”, Proc. Natl. Acad. Sci. USA Vol. 93, pp. 11341-11348, October 1996
[MSD, 09] Macromolecular Structure Database Home Page, http://www.ebi.ac.uk/msd/, accessed on 16 Dec 2009
[NCBI, 09] NCBI Homepage, http://www.ncbi.nlm.nih.gov, accessed on 16 Dec 2009
[NCBI-Protein, 09] NCBI Protein Database Homepage, http://www.ncbi.nlm.nih.gov/protein/, accessed on 16 Dec 2009
[NCBITools, 09] NCBI Tools, http://www.ncbi.nlm.nih.gov/Tools/index.html, accessed on 16 Dec 2009
[Needleman, 70] Needleman, S. B. & Wunsch, C. D. (1970). Journal of Molecular Biology. 48, 443-453.
[Proteome, 10] Proteome Homepage, http://proteome.wayne.edu/PIDBL.html, accessed on 11 Jan 2010
[pSTIING, 09] protein Signaling, Transcriptional Interaction and Inflammation Networks Gateway,
http://pstiing.licr.org, accessed on 16 Dec 2009
[Rcsb, 09] Structural Bioinformatics Protein Databank Homepage, http://www.rcsb.org/pdb/home/home.do, accessed on 16 Dec 2009
[Rosenberg, 06] Rosenberg S. A., Morgan R. A., Dudley M. E., Wunderlich J. R., Hughes M. S., Yang J. C.,
Sherry R. M., Royal R. E., Topalian S. L., Kammula U. S., Restifo N. P., Zhili Zheng, Azam N., Christiaan R.
de Vries, Linda J. Rogers-Freezer, Sharon A. M., 2006, “Cancer Regression in Patients After Transfer of
Genetically Engineered Lymphocytes”, Science 6 October 2006, Vol. 314. no. 5796, pp. 126 - 129
[R-project, 09] R-Project Homepage, http://www.r-project.org/, accessed on 16 Dec 2009
[SANBio, 06] “Southern African Network For Biosciences (SANBio) Business Plan 2006-2011”, Prepared
by SANBio Secretariat, c/o CSIR, Box 395, Pretoria 0001, Republic of South Africa, April 2006
[SANBio, 09] SANBio Home, http://www.san-bio.com/, accessed on 16 Dec 2009.
[Swiss-Prot, 09] Swiss-Prot Homepage, http://www.expasy.ch/sprot/, accessed on 16 Dec 2009
[UniProt, 09] UniProt Homepage, http://www.uniprot.org, accessed on 16 Dec 2009
[Waterman, 76] Waterman, M. S., Smith, T. F. & Beyer, W. A. (1976). Advances in Mathematics, 20, 367-387.
[Windley, 08] Windley Steve, 2008, “Genetically Modified Foods”, PureHealthMD.com, Pure Health
Corporation Fort Wayne IN USA, 2008.
[Wolfenbarger, 00] Wolfenbarger L. L., and Phifer P. R., 2000, “The Ecological Risks and Benefits of
Genetically Engineered Plants.”, Science 15 December 2000, Vol. 290. no. 5499, pp. 2088 - 2093
[Zvelebil, 08] Zvelebil M., Baum J.O., “Understanding Bioinformatics”, Garland Science, ISBN 0-8153-4024-9, 2008