PREDICTING PROTEIN FUNCTION BASED ON 3D RESIDUE MOTIF SEARCH: A LINEAR PROGRAMMING APPROACH

LAC Fonseca1, RC de Melo-Minardi1, DEV Pires1,2, F Ferré1,2, W Meira Jr.1 and MM Santoro2

1 Department of Computer Science - Universidade Federal de Minas Gerais
2 Department of Biochemistry and Immunology - Universidade Federal de Minas Gerais

According to the Pfam database, about 20% of protein domain families still have unknown function, and this proportion is growing as genome and metagenome projects produce huge quantities of biological sequences. In this scenario, computational methods to predict the function of proteins in newly sequenced genomes are very important, since experimental methods to characterize protein function are expensive and labor intensive. In this work we use a linear programming approach to predict protein function (and especially enzyme function) based on homology of 3D residue motifs (or active sites). We compare the proposed method to Pints, which is the only competitor software with binaries available.

We model the residues of the query 3D motif as points, each represented by the last heavy atom (LHA) of its side chain. We generate a clique in which each edge is labeled with the distance between its adjacent vertices. We then match edges of the query graph to edges of the search space graph so as to minimize the sum of the differences between the matched distances in the query and in the search space. We optimize this under two constraints: (1) every edge of the query graph must be matched with an edge of the search space graph and (2) each edge of the search space graph must be matched to at most one edge of the query graph.

Our goal is to predict function, or to classify proteins into families, based on homology of hypothetical active sites. We built a dataset of known active sites to test whether our algorithm could retrieve proteins with the same function based on these sites. The dataset has two main segments.
The first is a database of proteins that have a SCOP family assignment, and the second is a database of about 1,827 randomly chosen PDB chains. To build our families dataset, we used the Catalytic Site Atlas (CSA), a database that documents enzyme active sites and catalytic residues in enzyme 3D structures. Since SCOP family is related to function, we tried to retrieve active sites in proteins from the same SCOP family as the original active sites from CSA. We compared our method with the Pints algorithm, and the results show that our method performs better than Pints on metrics such as AUC and accuracy. These results indicate that we were able not only to retrieve active sites but also to infer a possible function for a given protein.

As future work, we intend to implement a statistical analysis to improve our method and to state with more confidence whether a given protein really has an active site. Beyond this scope, we are now investigating whether site homology can be used to predict the action of known drugs on pathogen proteins for which this activity has not yet been described, since drugs can act on active sites. For instance, we are now trying to find sites in HIV proteins that could be homologous to the site of action of a known drug. In this example, our information about drugs comes from DrugBank, a bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information.

Supported by: CAPES, CNPq, FAPEMIG and FINEP.
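
The edge-matching formulation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it replaces the linear programming solver with an exhaustive search over residue assignments, which is only practical for very small motifs. It assumes the objective is the sum of absolute differences between matched edge lengths. The function names (pairwise_distances, best_match) and the coordinates are hypothetical, chosen for illustration only. Assigning each query residue to a distinct candidate residue induces an edge matching that satisfies both constraints: every query edge is matched, and no search space edge is used more than once.

```python
from itertools import combinations, permutations
from math import dist


def pairwise_distances(points):
    """Label every edge (i, j), i < j, of the clique over `points` with its length."""
    return {(i, j): dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}


def best_match(query, candidates):
    """Return (cost, mapping) with the smallest summed edge-length difference.

    mapping[i] is the index of the candidate point assigned to query point i.
    Assumption: cost = sum over query edges of |d_query - d_candidate|.
    """
    q_edges = pairwise_distances(query)
    c_edges = pairwise_distances(candidates)
    best_cost, best_mapping = float("inf"), None
    # Each injective assignment of query points to candidate points matches
    # every query edge to a distinct candidate edge.
    for mapping in permutations(range(len(candidates)), len(query)):
        cost = 0.0
        for (i, j), d_query in q_edges.items():
            a, b = sorted((mapping[i], mapping[j]))
            cost += abs(d_query - c_edges[(a, b)])
        if cost < best_cost:
            best_cost, best_mapping = cost, mapping
    return best_cost, best_mapping


# Hypothetical LHA coordinates: a three-residue query motif and a candidate
# structure containing a translated copy of it plus one unrelated residue.
query_motif = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
candidate_lhas = [(20.0, 20.0, 20.0), (10.0, 14.0, 10.0),
                  (10.0, 10.0, 10.0), (13.0, 10.0, 10.0)]
cost, mapping = best_match(query_motif, candidate_lhas)
print(cost, mapping)
```

The search grows factorially with the number of residues, which is one reason an optimization formulation such as the linear program named in the abstract is preferable for realistic search spaces.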