Site: Bioinformatics Center
Institute for Chemical Research
Kyoto University
Uji, Kyoto 611-0011, Japan

Date: December 17, 2004

WTEC Attendees: F. Doyle (report author), S. Demir, C. Stokes

Host: Prof. Minoru Kanehisa

OVERVIEW

The host, Prof. Minoru Kanehisa, spent a couple of hours with the group, discussing primarily the KEGG database project as well as some curricular activities in bioinformatics. Prof. Kanehisa is the director of the Bioinformatics Center and a former President of the Japanese Society for Bioinformatics. The Center is home to the well-known KEGG (Kyoto Encyclopedia of Genes and Genomes) database (www.genome.jp). The Center includes 6 faculty and 6 instructors in the areas of bioknowledge systems, biological information networks, pathway engineering, proteome informatics, and genome informatics. The KEGG project itself involves 30 full-time researchers. Prof. Kanehisa's group includes a total of 60 researchers at Kyoto and an additional 12 at the University of Tokyo. He is going to start a new lab in Boston.

KEGG has been supported primarily by government funding sources (mostly MEXT, but also the Japan Society for the Promotion of Science (JSPS) and JST). The sizeable computational resources are provided by the Bioinformatics Center at the Institute for Chemical Research, Kyoto University. Operating costs for the KEGG project are $2M/yr (not including the computing).

As was explained, the KEGG database is quite different from those maintained by the NCBI, notably in its support for reconstruction and for the retrieval of network features. As with some other database projects, the group is beginning to address the chemical space (metabolites, glycans, lipids, etc.). The core elements of KEGG are the GENES database, the LIGAND database, and the PATHWAY element for network integration. Combining the genomic space with the chemical space allows screening for target genes (e.g., disease genes) in the former and screening for lead compounds and molecular probes in the latter.
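As a rough illustration of how a gene catalog, a compound catalog, and a pathway network can be linked so that queries run in either direction, consider the minimal sketch below. It is not KEGG's actual schema or API: the record layout, the identifiers (geneA, C1, enzyme1, and so on), and the function names are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical toy records; none of these are real KEGG entries.
GENES = {"geneA": "enzyme1", "geneB": "enzyme2"}   # gene -> enzyme it encodes
LIGAND = {"C1": "substrate", "C2": "intermediate", "C3": "product"}  # compound id -> name
# Each pathway step is (substrate compound, product compound, catalyzing enzyme).
PATHWAY = [("C1", "C2", "enzyme1"), ("C2", "C3", "enzyme2")]


def build_network(pathway):
    """Return an adjacency list: compound -> [(next compound, enzyme), ...]."""
    network = defaultdict(list)
    for substrate, product, enzyme in pathway:
        network[substrate].append((product, enzyme))
    return dict(network)


def genes_for_compound(compound, genes, pathway):
    """Chemical space -> genomic space: genes whose enzymes touch a compound."""
    enzymes = {e for s, p, e in pathway if compound in (s, p)}
    return sorted(g for g, e in genes.items() if e in enzymes)


def compounds_for_gene(gene, genes, ligand, pathway):
    """Genomic space -> chemical space: compounds handled by a gene's enzyme."""
    enzyme = genes[gene]
    ids = {c for s, p, e in pathway if e == enzyme for c in (s, p)}
    return {c: ligand[c] for c in ids}


if __name__ == "__main__":
    print(build_network(PATHWAY))
    print(genes_for_compound("C2", GENES, PATHWAY))
    print(compounds_for_gene("geneA", GENES, LIGAND, PATHWAY))
```

The two lookup functions mirror the two screening directions mentioned above: starting from a compound to find candidate genes, or starting from a gene to find associated compounds. A production database would of course carry far richer annotation than these three toy tables.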
Several research projects were described that implement sophisticated statistical and computational methods for network analysis and construction. The group relies heavily on graph-theoretical tools (representations of binary relationships in the form of nodes and edges). Abstractions can be formulated that allow nested graphs, in which the nodes of one graph are themselves graphs. The group is using such tools to make predictions (KEGG orthology), with both manual and automated methods for curating orthologs. The computational algorithm has produced 600K genes across 200 organisms, and the manual approach has yielded 6K genes thus far. One of the more distinctive studies in the group is the use of line graphs to integrate chemical and genomic spaces. This leads to a new method for chemical structure comparison that is akin to BLAST for genomic space. The group is also addressing interoperability issues and has provided both SBML and GON versions of its network models (linked from the website).

Additional research in Prof. Kanehisa's group includes technology approaches to systems biology, such as scale-free network analysis and kernel methods for network inference. Two projects were discussed in detail by two of his graduate students, Kiyoko Aoki and Jean-Marc Schwartz. These were, respectively, (i) glycan structure network models and structure search, and (ii) dynamic metabolic models using elementary flux modes.

A recurring theme in his work, and an important one for systems biology, is the value of networks in generating insight into biological behavior. Although the networks are static in this case, the utility of such information is very high, and it can lead to effective screening mechanisms for disease-related genes, lead compounds, and molecular probes.

There was an extended discussion of training in the area of bioinformatics, as Prof. Kanehisa was instrumental in formulating the curriculum that has been promoted by the Japanese Society for Bioinformatics.
This curriculum can be viewed at www.bic.kyoto-u.ac.jp/egis/course.html. He estimates that there are approximately 10 bioinformatics-focused programs in Japan, with a couple outside of universities, such as CBRC. In addition, the Center has archived lecture videos, which are stored online using the WebCT course management system. The Center has strong interactions with Humboldt University (Berlin) and Boston University through student internships and workshops.

There was a brief discussion of industry relations for the Center, and Prof. Kanehisa noted that these run primarily through the training program (education as opposed to research). There are some interactions with individual companies, but there is not an extensive contract relationship. The Center presses industry users to pay for KEGG database downloads, but compliance is problematic.