IPC Revision WG – Definition Project
Rapporteur Proposal
Project: A019
Group: G06F 19/10
Date: 10-11-2009

G06F 19/10 Bioinformatics

Definition statement

This group covers:

Methods or systems for genetic or protein-related data processing in computational molecular biology.

Bioinformatics methods or systems where digital data processing is inherent or implicit, but not explicitly mentioned.

References relevant to classification in this group

This group does not cover:

In silico methods of screening virtual chemical libraries    C40B 30/02
In silico or mathematical methods of creating virtual chemical libraries    C40B 50/02

Informative references

Attention is drawn to the following places, which may be of interest for search:

Medical diagnosis    A61B 5/00
Manufacture of microarrays, DNA chips    B01J 19/00, C12M 1/34
PCR apparatus per se    B01L 7/00, C12M 1/38
Macromolecular X-ray crystallographic or NMR structures per se    C07K 14/00
Genetic engineering involving nucleic acids    C12N 15/00
Chemical reactions involving the use of microarrays, DNA chips    C12Q 1/68
Sequencing using PCR    C12Q 1/68
Gel electrophoresis apparatus per se    G01N 27/447
Sequencing using electrophoresis    G01N 27/447
Sequencing using chromatography    G01N 30/00
Sequencing using mass spectrometry    G01N 33/68
Pattern recognition    G06K 9/00
Computer input/output arrangements    G06F 3/00
Computer architectures or program control    G06F 9/00
Information retrieval, databases per se    G06F 17/30
Computer systems using neural network models per se    G06N 3/02
Computer systems using knowledge representation per se, e.g. expert systems    G06N 5/02
Computer systems using probabilistic models per se    G06N 7/00
Finding positions and orientations in microarray images by image processing    G06T 7/00
Mass spectrometry apparatus per se    H01J 49/00

Special rules of classification within this group

In this group, the first place priority rule is applied, i.e. at each hierarchical level, classification is made in the first appropriate place.
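The first place priority rule can be illustrated with a minimal sketch. It assumes a hypothetical input: the set of subgroup symbols that have already been judged substantively appropriate for a document (topics that are merely mentioned are excluded beforehand). The names SCHEME_ORDER and first_place, and the fallback to the parent group G06F 19/10 when no subgroup is appropriate, are illustrative assumptions and not part of the definition.

```python
# Subgroups of G06F 19/10 in scheme order, as listed in this document.
SCHEME_ORDER = [
    "G06F 19/12",  # modelling or simulation in systems biology
    "G06F 19/14",  # phylogeny or evolution
    "G06F 19/16",  # molecular structure
    "G06F 19/18",  # functional genomics or proteomics
    "G06F 19/20",  # hybridisation or gene expression
    "G06F 19/22",  # sequence comparison involving nucleotides or amino acids
    "G06F 19/24",  # machine learning, data mining or biostatistics
    "G06F 19/26",  # data visualisation
    "G06F 19/28",  # programming tools or database systems
]


def first_place(appropriate_subgroups):
    """Return the first subgroup, in scheme order, that is appropriate.

    appropriate_subgroups is a hypothetical input: the set of symbols an
    examiner judged substantively appropriate (mere mention excluded).
    """
    for symbol in SCHEME_ORDER:
        if symbol in appropriate_subgroups:
            return symbol
    # Assumption: with no appropriate subgroup, use the parent group itself.
    return "G06F 19/10"


if __name__ == "__main__":
    # A document on structure-based sequence alignment fits both 19/16 and
    # 19/22; the rule selects the place that comes first in the scheme.
    print(first_place({"G06F 19/22", "G06F 19/16"}))
```

In this sketch the judgement of what counts as appropriate stays with the classifier; the code only encodes the ordering step of the rule.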
Glossary of terms

In this group, the following terms (or expressions) are used with the meaning indicated:

data mining
    discovery and analysis of patterns within a vast amount of genetic or protein-related data

data visualisation
    generation and/or display of graphical representations of genetic and protein-related data

domain
    domain of a protein is an element of the overall molecular structure that is self-stabilising and often folds independently of the rest of a polypeptide chain

drug targeting
    drug design strategy aiming at optimising the properties of a medicinal compound, based on the 3-dimensional structure of a target, for delivery to a particular tissue or organ in the body

fragment assembly
    method by which linear portions of sequence information are assembled to obtain full length gene sequence data

functional genomics
    experimental analyses aiming at assessing the function of genes in determining traits, physiology and/or development of an organism, making use of computational and high-throughput technologies

gene expression
    process by which proteins are made or transcribed from the instructions encoded in DNA

gene expression profiling
    determination of the pattern of genes expressed, i.e. transcribed, under specific circumstances or in a specific cell line

gene finding
    method of searching genomic DNA sequences to identify open reading frames which encode proteins

genome annotation
    allocation of functions to individual genes in the genome

genotype
    genetic makeup or profile of an organism with respect to a trait

genotyping
    analysis of an organism's genotype

haplotype
    set of one or more polymorphisms (sequence variations) that may be found at a particular genetic location on the same chromosome

homology
    indication of the amount of similarity between two sequences; homology determinations can include allowance for gaps, insertions, deletions and mismatches between the aligned sequences

linkage disequilibrium
    tendency of alleles located close to each other on the same chromosome to be inherited together

microarray
    plurality of nucleic acid probes attached to a substrate, which form an ordered pattern

molecular structure
    2-dimensional or 3-dimensional arrangement of atoms, groups of atoms or domains in nucleic acids, proteins, peptides and amino acids

motif
    specific nucleotide or amino acid sequence pattern

noise correction model
    model that accounts for non-signal data, such as for microarrays: optical noise, quality control problems and cross hybridisation

ontology
    classification methodology for formalising a subject's knowledge in a structured and controlled vocabulary

orthologue
    homologous sequence found in different species and derived from a common ancestor

paralogue
    homologous sequence in the same organism derived from gene duplication

pedigree
    family tree describing the occurrence of heritable traits across generations

phylogenetic tree
    tree-like graphical representation of phylogenetic relationships

phylogeny
    reconstruction of an evolutionary development and history of a species or higher taxonomic grouping of organisms; typically represented as a phylogenetic tree; methods for creating phylogenetic trees

population genetics
    study of genetic variation and genetic evolution of populations

probe design and optimisation for microarrays
    designing and selecting (i) optimal, highly specific probes, e.g. oligonucleotides, cDNA, fragments for hybridisation experiments with microarrays and (ii) optimal sets of probes, e.g. oligonucleotides, cDNA, to be chemically attached to a solid support to form an array

programming tools or database systems
    computer software to assist programming procedures within bioinformatics and database systems for managing genetic/protein-related data

protein folding
    process by which a polypeptide chain folds into a specific 3-dimensional structure

proteomics
    large-scale study of the functions of proteins and their interactions with other molecular entities in a biological system

sequence comparison
    process of comparing nucleic or amino acid sequences, generally by a linear alignment in such a way that equivalent positions in adjacent sequences are brought into the correct alignment with each other by introducing insertions in suitable positions, in order to identify similarities and/or differences amongst the compared sequences

sequencing by hybridisation
    DNA sequencing technique in which an array of short sequences of nucleotides is brought in contact with a solution of a target DNA sequence, a biochemical method determines a subset of probes that bind to the target sequence and a combinatorial method is then used to reconstruct the DNA sequence from the spectrum

SNP
    single nucleotide polymorphism: a DNA sequence variation that involves a change in a single nucleotide and is commonly present in a part of a population

structure alignment
    form of alignment to establish structural and functional equivalences between two or more proteins based on their secondary or tertiary structures

syntenic regions
    corresponding regions in a species to an observed grouping of genes in the same order and on the same chromosome in another species

systems biology
    simulation and mathematical modelling of relationships and interactions between molecular entities in subcellular systems integrating genetic and/or protein-related data to describe the dynamic behaviour of, for example, protein-protein/protein-ligand interactions, regulatory networks and metabolic networks

taxonomy
    classification of organisms to show their evolutionary relationships to other organisms

G06F 19/12 for modelling or simulation in systems biology

Definition statement

This group covers:

Simulation or mathematical modelling of relationships and interactions between molecular entities on a subcellular level, integrating genetic and/or protein-related data to describe the dynamic behaviour of protein-protein/protein-ligand interactions, regulatory or metabolic networks.

Mere mention of modelling or simulation is not sufficient to classify in this group. In such cases, see the other subgroups of group G06F 19/10 following this one in the scheme.

G06F 19/14 for phylogeny or evolution

Definition statement

This group covers:

Analysis of orthologous, paralogous, syntenic or taxonomic relationships.

Generation of pedigrees and phylogenetic trees.

Mere mention of evolutionary data is not sufficient to classify in this group. In such cases, see the other subgroups of group G06F 19/10 following this one in the scheme.

G06F 19/16 for molecular structure

Definition statement

This group covers:

Structural architecture of proteins, peptides, amino acids and nucleic acids and the prediction thereof.

Processes including structural alignment, protein folding, domain topology, molecular modelling, receptor-ligand modelling, docking methods, structural-functional relationships and drug targeting using structure data, as well as two- and three-dimensional structure prediction and/or analysis.

The structure types include secondary, tertiary and quaternary structures.

Mere mention of structural data is not sufficient to classify in this group. In such cases, see the other subgroups of group G06F 19/10 following this one in the scheme.
G06F 19/18 for functional genomics or proteomics

Definition statement

This group covers:

Assessment of the function of genes and proteins in determining traits, physiology and/or development of an organism, making use of computational and large-scale, high-throughput technologies.

Genotypic-phenotypic associations, including genotyping and genome annotation, linkage disequilibrium analysis and association studies, population genetics, alternative splicing and short interfering RNA design (siRNA, RNAi).

Binding site identification, mutagenesis analysis, protein-protein or protein-nucleic acid interactions.

Mere mention of gene or protein function is not sufficient to classify in this group. In such cases, see the other subgroups of group G06F 19/10 following this one in the scheme.

G06F 19/20 for hybridisation or gene expression

Definition statement

This group covers:

Analysis of gene expression information. This includes microarray analysis, gel electrophoresis analysis and sequencing by hybridisation.

Further covered technologies include probe design and probe optimisation, microarray normalisation, expression profiling, noise correction models and expression ratio estimation.

Mere mention of hybridisation or gene expression is not sufficient to classify in this group. In such cases, see the other subgroups of group G06F 19/10 following this one in the scheme.

Relationship between large subject matter areas

This group does not cover base calling or sequencing methods per se. These are covered by the relevant places listed under section Informative references for group G06F 19/10.

G06F 19/22 for sequence comparison involving nucleotides or amino acids

Definition statement

This group covers:

Comparison of sequence information, wherein the sequences are nucleic acids or amino acids.

The comparisons include methods of alignment, homology identification, motif identification, SNP (single nucleotide polymorphism) discovery, haplotype identification, fragment assembly and gene finding.

Mere mention of sequence data is not sufficient to classify in this group. In such cases, see the other subgroups of group G06F 19/10 following this one in the scheme.

G06F 19/24 for machine learning, data mining or biostatistics

Definition statement

This group covers:

Discovery and/or analysis of patterns within a vast amount of genetic or protein-related data, wherein the emphasis is placed on the method of analysis and is largely independent of the type of bioinformatic data.

Covered methods include bioinformatic pattern finding, knowledge discovery, rule extraction, correlation, clustering and classification.

Multivariate analysis of protein or gene-related data, e.g. analysis of variance (ANOVA), principal component analysis (PCA), support vector machines (SVM).

G06F 19/26 for data visualisation

Definition statement

This group covers:

Visual representations specifically adapted to bioinformatic data, wherein the emphasis is placed on the method of visualisation and is largely independent of the type of bioinformatic data.

Visualisation of bioinformatic data specifically includes, for example, graphics generation, map display and network display.

G06F 19/28 for programming tools or database systems

Definition statement

This group covers:

Computer software specifically adapted to assist programming procedures within bioinformatics and database systems specifically adapted for managing bioinformatic data.

This includes ontologies, heterogeneous data integration, data warehousing and computing architectures.