Biochemistry 3020 Experiment #1
Computers, the Web and Bioinformatics

The computer is a critical tool in laboratory research, particularly in biochemistry. Most major pieces of scientific equipment in research laboratories are connected to a computer to enable data collection, particularly over extended periods of time. The computer is used for data collection, data analysis and writing. When connected to the Internet, it can also be used to search the biochemical literature, access databases of protein and nucleic acid structures, and find research protocols and methodology. In this experiment, you will be introduced to bioinformatics and gain basic skills in it.

The Internet is a worldwide network that allows all connected computers and networks to communicate with each other. File transfer protocol (ftp) is a long-established Internet facility for placing data on, and retrieving data from, other computers on the network. The World Wide Web (WWW) is the most rapidly growing component of the Internet. It transfers data as pages in multimedia form, including text, graphs, audio and video, linked together by hypertext pointers that allow the retrieval of data stored on different computers in different locations. Documents on the Web are written in Hypertext Markup Language (HTML). Accessing documents on the Internet requires a browser, an interface program that reads hypertext documents and displays Web pages on your computer. The most common browsers are Netscape Navigator and Internet Explorer. If you are using the university computers, the university home page will be displayed as a starting point. Off campus, the starting page depends on the browser you are using: often the Netscape Navigator or Internet Explorer home page is displayed, or you may have your own home page as a starting point. Every browser has an address box at the top of the window into which you can type text.
To request a specific Web page from another computer, type its Web address, usually in the form http://www. followed by the site name. This takes you to the site's home page, which will generally display its own instructions for navigating through the site. On some pages certain words are highlighted. Clicking on these words, called hyperlinks, quickly connects you to another page that provides more specific information related to the hyperlink. Clicking on the Back button in the menu bar takes you back to the previous page. Table 1 contains some useful Web addresses for biochemistry. However, these are just a few, and such sites change continually: some are deleted or updated, and new ones are constantly being added to the Internet. To find relevant sites efficiently, use a search engine, a searchable directory that organizes Web pages by subject or classification and retrieves them according to the terms you type in. Search engines include Google, Excite, HotBot, Lycos, Netscape Search, Yahoo and AltaVista. Searching for information with these search engines is commonly called "surfing the net".

The Biochemical Literature

The Internet should not replace the library. It is critical that you become aware of what is in the library and how to use it. An important part of all research is searching the literature, both in the library and in other sources such as the Internet. The Internet should be considered a research tool, much like any instrument in the laboratory. Research begins with an idea: looking for the answer to a question, testing a particular hypothesis, studying a particular problem, and so on. This idea most often develops after extensive reading of the literature.
Reading the literature gives you a clear idea of what has already been done and what is currently known about research relevant to your question. Once that is known, a research direction is more easily focused. Knowledge of the literature, of what has already been done and what is currently being done, allows you to design and develop experiments. Such design and development also requires knowledge of the books and journals in which experimental methods are described. Throughout an experiment, the researcher may have to refer to the literature for physical constants and for known data with which to compare results. Many handbooks and encyclopedias are devoted to such constants. Biochemistry, as you know, is an interrelated discipline that overlaps and connects the biological sciences, the physical sciences and the basic medical sciences. Thus there are many textbooks, research journals, computer information retrieval services and handbooks available.

Your Textbook

Your first exposure to the biochemical literature will be the textbook for this course, a general textbook of biochemistry. There are many more advanced textbooks, and ones specific to particular areas of biochemistry, but in the beginning stages of study a general, broad-spectrum textbook is a necessary tool. It is your starting point.

Research Journals and Methodology References

Research journals are critical to biochemical research and make up the core of the biochemical literature. This is where current research is published. There are many journals in every discipline, some more prestigious than others, all intended to keep scientists up to date. Many journals are now available online as well as on CD-ROM.
The Journal of Biological Chemistry, Biochimica et Biophysica Acta, Biochemical Journal, The Journal of Biochemistry, and Biochemical and Biophysical Research Communications are among the more commonly used and prestigious biochemistry research journals. Some of the more useful biochemical methodology publications are Analytical Biochemistry (monthly), Analytical Chemistry (monthly), Biochemical Preparations (annual) and Current Protocols in Molecular Biology (two volumes, updated quarterly). Some important protocol texts include:
Laboratory Techniques in Biochemistry and Molecular Biology, T.S. Work and R.G. Burdon (Eds.)
Methods of Enzymatic Analysis, H. Bergmeyer (Ed.)
Methods in Enzymology
A Practical Guide to Molecular Cloning
These methodology references are excellent sources of descriptions of techniques and aids for designing experiments.

Reference Books and Review Publications

Textbooks, particularly introductory ones, contain little specialized and detailed biochemical information. This type of information can be found in reference books, which range from general works to very specialized series. The best of these are multi-volume series published periodically (weekly, biweekly, monthly, semi-annually or annually), which allows updated and current information to be published. Each volume covers a specialized area, with articles written by experts in the field. The Annual Review of Biochemistry is one of the most widely used review publications. Trends in Biochemical Sciences is another useful and widely read review publication containing shorter articles.

Handbooks of Chemical and Biochemical Data

Handbooks of chemical and biochemical data are critical for providing definitions of terms and important reference values such as Rf values, molecular weights and physical constants (boiling points, melting points, etc.).
Much of this data is now on the Web. However, particularly when writing a paper for a journal, Web pages are often not accepted as literature sources because their content may not be reviewed and is sometimes incorrect. Printed handbooks such as the Dictionary of Biochemistry and Molecular Biology, the Glossary of Biochemistry and Molecular Biology, the Merck Index, the Practical Handbook of Biochemistry and Molecular Biology and the Worthington Enzyme Manual are therefore more legitimate sources.

Computer-Based Searches, Web Directories and Databases

Reviewing all the journals and other literature sources during a search is a daunting task. It is easier to search abstracts, publications that provide brief summaries of published articles, reviews and patents. Examples include Chemical Abstracts and Biological Abstracts. Current Contents and Chemical Titles are two publications that keep up with newly published articles; they are published every two weeks, and both are available online. There are many scientific databases online. Some of the most useful STN databases for the life sciences are BIOSIS Previews/RN, CA (Chemical Abstracts), MEDLINE and MEDLARS. Many of these databases can be accessed free of charge, particularly from an institution, while others charge user fees. Databases are critical for retrieving bibliographic information, nucleic acid sequences, protein sequences and structures, metabolic pathways, transcription factors, enzymes and many other types of information. A convenient way to find information and tools relevant to your research is to use Web directories, which collect lists of information, tools and other services. Many of these are hyperlinked to other useful sites.
FASTA and BLAST (searching databases for similar protein and nucleic acid sequences), RasMol or RasMac (molecular graphics for viewing and manipulating protein structures from their coordinates), Chime (interactive display of protein structure coordinates), SWISS-MODEL (protein structure modeling), VAST (finding similarities between protein structures) and Molecules R Us (displaying protein structure coordinates) are some of the tools used for sequence analysis and structural modeling.

Table 1: Web Databases, Directories and Tools
Protein Data Bank (PDB): protein structures determined by X-ray crystallography and NMR (http://www.rcsb.org/pdb/)
European Bioinformatics Institute: DNA sequences (http://www.ebi.ac.uk/)
National Center for Biotechnology Information (NCBI): variety of databases and resources (http://www.nlm.nih.gov/)
Swiss-Prot (ExPASy): protein sequences and analysis (http://www.expasy.ch/tools/)
Biocatalysis/Biodegradation Database of the University of Minnesota: microbial metabolism of many chemicals (http://www.labmed.umn.edu/umbbd/index.html)
REBASE, The Restriction Enzyme Database: restriction enzyme specificity and properties (http://rebase.neb.com/)
Georgia Institute of Technology: tutorials on PDB and RasMol (http://www.chemistry.gatech.edu/faculty/williams/bCourse_information/4582/labs/rasmol_pdb.html)
The Institute for Genomic Research: collection of genomic databases (http://www.tigr.org/)
RasMol (RasMac): molecular graphics for proteins (http://www.umass.edu/microbio/rasmol/)
Predict Protein: protein sequence and structure prediction (http://www.embl-heidelberg.de/predictprotein)
Gen Quiz: protein function analysis based on sequence (http://www.sander.ebi.dc.uk/gqsrv/submit)
Pedro's Biomolecular Research Tools (http://www.public.iastate.edu/~pedro/research_tools.html)
Biology Workbench (http://biology.ncsa.uiuc.edu)
CMS Molecular Biology Resources (http://www.sdsc.edu/ResTools/smshp.html)
BioTech (http://bioech.icmb.utexas.edu)
Protocol Online (http://www.protocol-online.net)
Chem Connection (http://chemconnect.com/news/journals.html)
American Chemical Society (http://pubs.acs.org/)

Table 2: Useful Programs for Exploring Structures and Sequences
BLAST: searches for similar nucleic acid and protein sequences
Chime: interactive three-dimensional display of protein structures
Entrez (NCBI): search and retrieval system for the NCBI databases, including gene and protein sequences
FASTA: searches for similar protein and nucleic acid sequences
GenBank (NCBI): database of gene sequences
Molecules R Us: provides coordinates for protein 3D structures and their manipulation
RasMol (RasMac): displays and manipulates protein 3D structures from their coordinates
SRS (EMBL): sequence retrieval system for cross-referencing databases

Table 3: Internet Terminology
Biological databases: computer sites that contain organized, stored files of information such as literature references, nucleic acid sequences, protein sequences and protein structures.
Bookmark: a function within Netscape Navigator and other browsers that allows the user to save a Web site address for later use.
Browser: an interface program, such as Netscape, that reads hypertext and displays Web pages on your computer.
Domain: the computer user's location or local network.
E-mail: electronic mail; a means of exchanging messages between computers over the network.
Favorites: the form of bookmark used in Internet Explorer.
Freeware: software provided free of charge by the developer, usually downloadable from the Internet.
ftp: file transfer protocol; a mechanism for transferring files or data over the network.
Home page: the starting page for access to the Web. Each institution or individual may have a home page containing relevant information and, often, links to other relevant information or sites.
HTML: HyperText Markup Language; a coded language used to write Web pages.
Hyperlink: a link between Web pages, usually highlighted; clicking on it takes you to the linked page.
Hypertext: text containing hyperlinks that connect related documents on the Web.
Internet: the worldwide connection of computers; a network that allows all connected computers and networks to communicate.
Java: a programming language used on the Web to incorporate multimedia into Web pages.
Modem: an electronic device that connects computers by sending signals over phone lines.
Multimedia: media that combine different forms, from text and graphics to video and audio.
Search engine: a searchable directory on the Web that organizes Web pages and information by category and subject classification.
Server: a computer that stores data and makes it available to other computers on request. The university server, for example, stores the university's data and is accessible from other computers on or off campus.
URL: Uniform Resource Locator; the standard address form used to identify and locate a document on the Web, usually beginning with http://
Web site: a collection of documents or Web pages on a server.
WWW: World Wide Web, commonly referred to as "the Web"; the component of the Internet that uses hypertext to provide resources.

The purpose of this experiment is to help you gain knowledge and experience in retrieving information from the Web. The following is a tutorial through which you will work. You will use PubMed to search for mushroom tyrosinase and the other protein databases to search for α-lactalbumin. Once you have completed the tutorial and are familiar with the basics of searching the Internet for information on proteins, you will be given a protein to search and will answer the questions for your report.

1. Searching the Biochemical Literature on PubMed

Tutorial:
1. Log into the computer you are working on and go to the university home page. At the top of the page is the URL, or university address.
2. Highlight the whole address, delete it, and type in the URL http://www.nlm.nih.gov/ to connect to the United States National Library of Medicine (NLM), part of the National Institutes of Health. The National Center for Biotechnology Information (NCBI) is a division of the NLM.
On the left-hand side there are topics of interest such as Human Genome Resources; Library Catalogue and Services; Network of Medical Libraries; Biomedical Research and Informatics; and Environmental Health and Toxicology. Clicking on any of these will take you to those sites.
3. You should see a hyperlink (highlighted link), "PubMed", on the right-hand side. Click on it to go to the PubMed page. On the left-hand side of that page, under Overview, is a link to the PubMed Tutorial. Work through it first so that you have a good idea of how to navigate the site. The following instructions are pointers only; it is up to you to become familiar with navigating the site and its features.
4. Clicking on Entrez in the upper menu bar takes you to the Entrez cross-database search page, from which PubMed and the other databases can be searched. The bottom of the page explains how to use the PubMed search. The menu bar at the top gives a list of searchable databases, and the dialogue box on the left allows you to search specific areas for a topic. For example, if you search PubMed for lysozyme (the enzyme you will be isolating in Project I), you will get a number of articles on that topic. Beside each article is a highlighted "Related Articles" link which, when clicked, takes you to a list of related articles. To search all the databases for a topic, type it in and click on Go. The number of records found in each database appears as a number to the left of the database name. For example, if you search for lysozyme you will see that PubMed alone has more than 19,000 articles, far too many to review. You will therefore have to refine your search using Boolean operators (see step 9 below).
5. Under Overview on the left-hand side of the PubMed page is MEDLINE, NLM's premier bibliographic database. Clicking on the highlighted MEDLINE takes you to its Fact Sheet. MEDLINE can also be accessed through the NLM Gateway: http://gateway.nlm.nih.gov.
Of particular interest is the Fact Sheet "What's the Difference Between MEDLINE and PubMed?" MEDLINE has many features, but the most basic, and the one of current interest, is its search capability.
6. To search the bibliography for a particular article, enter a search term, author name, journal name or article title in the dialogue box. For example, you may want to search for lysozyme, an enzyme isolated from hen egg white and a natural component of human tears. You can choose the category you want to search or do a general search of all the databases. For some search categories, such as 3D Domains for 3D structures, you may have to download free software to view the results; this is better done on your own computer. The category box on the left and the menu bar at the top let you search categories quickly and refine your search.
7. Click on "Search" once you have typed in the term, and more than 500 citations will appear. Each entry lists the author(s), title and reference, in reverse chronological order.
8. Clicking on the author's name (in hypertext) retrieves the abstract of the article. The hypertext "See Related Articles" is another useful and time-saving feature: clicking on it provides a list of papers related to that citation without your having to search for them separately.
9. 500 papers are too many to view at one time, and the search is too broad, so reduce the number by making the search more specific with Boolean operators and the menu categories on the left or top menu bar. Boolean operators are the uppercase terms AND, OR and NOT, used to refine a search. They are processed from left to right, so parentheses should be used to nest terms that must be processed as a unit before being incorporated into the overall search strategy.
For example, if you were searching for lysozyme, you could refine the search to (human lysozyme) NOT hen egg white.

To become familiar with the site, search for the enzyme lysozyme again and use PubMed bibliographic searches to find the aspects of the enzyme listed in the questions below. After you click on Go, a number of articles will appear with a menu bar above them. The first box says Display. The box next to it is the category (Summary) box; it has a scroll arrow and contains the various formats in which you can display the articles, which helps refine your search much like Boolean terms. You can choose from formats such as Brief, Abstract, Citation, MEDLINE and Related Articles. The Show box next to Display gives the number of articles displayed per page and hence the number of pages. Sort allows you to sort the articles alphabetically or chronologically by author, journal or publication date. The Send To button allows you to send the search results to text, a file, the clipboard, e-mail or an order. Page tells you the number of pages in the search; the number displayed in the box is the page you are on, and the highlighted Next takes you to the next page. Typing in a specific page number takes you directly to that page rather than scrolling through all the pages. Next to each article is an empty box. Clicking on the box puts a check mark in it, selecting the article. When you have gone through all the articles, you can collect the ones you selected and print or save only those, giving you a bibliography of your search. Under the box is a paper icon; clicking on it retrieves the abstract of the article, as does clicking on the highlighted title and authors. Above the article there is a highlighted "full text" icon that allows you to download and print the article if it is available.
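The nesting rule in step 9 can be illustrated with a short sketch. This Python is not part of PubMed and only assembles a query string to type into the search box; the helper name group and the synonym muramidase are illustrative assumptions. Each group is wrapped in parentheses so it is processed as a unit before being combined with the rest of the query. How PubMed maps the individual words (for example, whether adjacent words are treated as a phrase) is not modelled here.

```python
def group(*terms: str, op: str = "AND") -> str:
    """Join search terms with one Boolean operator and wrap them in parentheses,
    so the group is processed as a unit inside a larger query (hypothetical helper)."""
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError("op must be 'AND', 'OR' or 'NOT' (uppercase)")
    return "(" + f" {op} ".join(terms) + ")"


# Synonyms are grouped with OR first, then narrowed with AND,
# and finally hen egg white studies are excluded with NOT.
enzyme = group("lysozyme", "muramidase", op="OR")
query = group(enzyme, "human") + " NOT " + group("hen", "egg", "white")

print(query)
# ((lysozyme OR muramidase) AND human) NOT (hen AND egg AND white)
```

Writing every group explicitly means the query does not depend on remembering the left-to-right processing order.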
Not all articles are available in full text. As you become familiar with the search features, answer the following questions:
a) How many references and pages are there?
b) What are the other sources of the enzyme?
c) How has it been purified?
d) Has it been expressed in other organisms such as E. coli?
e) What is the sequence of the gene coding for it?
f) What is the expected protein sequence?
g) What metal ion is present in the native enzyme?
h) Find and correctly cite two references that study inhibition of the enzyme.
i) What inhibitor molecules of the enzyme have been investigated?
j) What are other substrates for this enzyme?
k) What is the expected extinction coefficient of this enzyme for the next experiment?
l) What are its expected molecular mass, pI and pH optimum?
m) How can it be crystallized? What does its crystal structure look like?

2. Web Tools and Biological Databases

Primary databases and structural analysis tools are important in protein biochemistry. In this exercise we will analyze the structure of α-lactalbumin from bovine milk and compare it to human α-lactalbumin. You will then be given a protein to analyze on your own.

Tutorial:
1. Type the URL http://www.rcsb.org/pdb into the address box at the top of the browser to go to the Protein Data Bank (PDB).
2. There is much information on the PDB home page; become familiar with it by clicking on some of the hyperlinks to see what they do and where they take you. Also work through the tutorial, which is reached through the question mark above the Search box.
3. After you have become familiar with the home page, scroll down to "SearchLite" under Search in the middle of the page. Clicking on SearchLite takes you to the SearchLite page.
4. Type "human alpha lactalbumin" (or your protein of choice) in the box and click on Search.
Seven or more structures should appear, each with a white square box to its left. The key at the top defines the other symbols (the turquoise arrow, the page and the eye); click on these to see what they do.
5. Click on the first white square box on the left (structure 1A4V) and on "EXPLORE" on the right to display the "Structure Explorer" with "Summary Information" about the structure of the protein. If you need help, click on the "?" and a dialogue box will appear.
6. A number of functions appear on the left side of the screen. "View Structure" displays the "Interactive 3D Display" and "Still Images" options.
7. Click on "Still Images" first to view the structure in ribbon or cylinder form. To enlarge the structure, click on the appropriate size (e.g., 250 x 250 or 500 x 500). The α-helices and β-sheets should be visible. For some of the other features you will have to download free programs (e.g., Chime, RasMol, Swiss-PdbViewer). These programs are available from secure sites, so there should be no problem with virus contamination.
8. Once you are familiar with the structure of the protein, you can view its rotation about its axis by clicking on "Chime" under "Interactive 3D Display". The mouse controls are listed under "Chime Help" at the bottom of the screen.
9. Under "Summary Information" you will find other functions. Clicking on "Sequence Details" gives the amino acid sequence and the secondary structure assignment for the protein. To save a copy of the sequence file, click on "Download in FASTA format". A FASTA file has a header line beginning with ">" followed by the amino acid sequence written with the single-letter abbreviation for each amino acid. To display a table of bond angles and lengths, click on "Geometry". Clicking on "Structural Neighbours" displays the neighbours and their angles and lengths.
10. Other features such as "VAST" display Sequence Neighbours and Structure Neighbours.
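The FASTA file downloaded in step 9 is plain text, so it can be read with a few lines of code. As a minimal sketch (in Python, which the tutorial itself does not use; the function names are hypothetical), the example below reads a FASTA record and uses the sequence to estimate the extinction coefficient at 280 nm asked for in question k of Part 1. It assumes the per-residue values used by the ExPASy ProtParam tool (Trp 5500, Tyr 1490 and cystine 125 M^-1 cm^-1) and, by default, that every cysteine is in a disulfide bond. The example sequence is made up, not a real protein. A sequence-based estimate is only approximate and can differ from a measured value.

```python
# Per-residue molar extinction coefficients at 280 nm (M^-1 cm^-1),
# the values used by the ExPASy ProtParam tool.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125  # per disulfide-bonded pair of cysteines, not per cysteine


def parse_fasta(text: str) -> dict[str, str]:
    """Return {header: sequence} for every record in a FASTA-format string."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:  # finish the previous record
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())  # sequence lines may wrap
    if header is not None:
        records[header] = "".join(chunks)
    return records


def epsilon_280(seq: str, all_cys_paired: bool = True) -> int:
    """Estimate the molar extinction coefficient at 280 nm from the sequence.

    If all_cys_paired is True, every two cysteines count as one cystine.
    """
    n_cystine = seq.count("C") // 2 if all_cys_paired else 0
    return (seq.count("W") * EXT_TRP
            + seq.count("Y") * EXT_TYR
            + n_cystine * EXT_CYSTINE)


# Made-up example record (not a real protein).
example = """>toy_protein made-up example
ACDWYKLW
YCGH
"""

for name, seq in parse_fasta(example).items():
    print(name, seq, epsilon_280(seq))
```

With the extinction coefficient known, Beer's law (A = εcl) converts an absorbance reading at 280 nm into a molar concentration.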
Sequence Neighbours displays sequences similar to your protein of choice, while Structure Neighbours displays structures similar to your protein of choice. Clicking on "Other Sources" displays data files with references to your protein of choice.
11. Under "View Structure", just above "Chime", there is a hyperlink to RasMol (or RasMac), which lets you view the detailed structure of a protein and rotate it about its coordinates to view it from every perspective. RasMol instructions can be viewed under "Help", or you may want to use the RasMol tutorial listed in Table 1.
12. Swiss-PdbViewer (see the ExPASy address in Table 1) is another useful protein viewer.
13. BLAST, available at NCBI (www.ncbi.nlm.nih.gov), is a commonly used tool for finding proteins and nucleic acids whose sequences are similar to a query sequence. Clicking on "Basic BLAST search" brings up the dialogue box into which you can type the amino acid sequence of your protein of choice. You can also download the amino acid sequence in FASTA format into a file saved on your computer and transfer that file into the BLAST dialogue box. Either way, BLAST returns a list of proteins with amino acid sequences similar to the one you entered. Note: when doing a BLAST search, amino acids are entered using the codes in the following table.

Table 4: Amino acid codes for BLAST analysis
A alanine
B aspartate or asparagine
C cysteine
D aspartate
E glutamate
F phenylalanine
G glycine
H histidine
I isoleucine
K lysine
L leucine
M methionine
N asparagine
P proline
Q glutamine
R arginine
S serine
T threonine
U selenocysteine
V valine
W tryptophan
Y tyrosine
Z glutamate or glutamine
X any
* translation stop
- gap of indeterminate length

14. Entrez is another approach to studying proteins and nucleic acids. It can be accessed from the NCBI home page by clicking on "Proteins" to obtain the dialogue box, entering your protein of choice and clicking on "Search".
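Before a sequence is pasted into BLAST, it can be useful to check that it contains only the codes in the amino acid code table of step 13. The minimal Python sketch below is an illustrative assumption, not an NCBI tool: the dictionary simply transcribes that table, and the function names find_invalid_codes and spell_out are hypothetical. BLAST itself may accept a slightly different set of characters than the table lists.

```python
# Single-letter codes transcribed from the BLAST amino acid code table (step 13).
BLAST_CODES = {
    "A": "alanine", "B": "aspartate or asparagine", "C": "cysteine",
    "D": "aspartate", "E": "glutamate", "F": "phenylalanine",
    "G": "glycine", "H": "histidine", "I": "isoleucine",
    "K": "lysine", "L": "leucine", "M": "methionine",
    "N": "asparagine", "P": "proline", "Q": "glutamine",
    "R": "arginine", "S": "serine", "T": "threonine",
    "U": "selenocysteine", "V": "valine", "W": "tryptophan",
    "Y": "tyrosine", "Z": "glutamate or glutamine", "X": "any",
    "*": "translation stop", "-": "gap of indeterminate length",
}


def find_invalid_codes(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position, character) for each character not in the table.

    Whitespace (spaces, line breaks) is removed first and letters are upper-cased.
    """
    cleaned = "".join(seq.split()).upper()
    return [(pos, ch) for pos, ch in enumerate(cleaned, start=1)
            if ch not in BLAST_CODES]


def spell_out(seq: str) -> list[str]:
    """Translate a valid sequence into amino acid names."""
    return [BLAST_CODES[ch] for ch in "".join(seq.split()).upper()]


print(find_invalid_codes("MKW YCG\nHJ1"))  # [(8, 'J'), (9, '1')]
print(spell_out("wy"))  # ['tryptophan', 'tyrosine']
```

Reporting positions rather than just rejecting the sequence makes it easy to find a stray digit or letter left over from copying a sequence out of a report.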
The Entrez search will return relevant documents. You may also access BLAST through Entrez.

Procedure:
1. Using the techniques outlined in the above tutorial, explore the enzyme lysozyme. View its structures and look at its amino acid sequence. Provide all the information outlined in the above tutorial in your lab report.
2. Provide two recent research articles on your enzyme and give the references correctly.
3. What methods have been used to purify your protein? Briefly describe them.
4. Include the nucleotide sequence of the gene coding for your protein. Begin on the NCBI home page and enter Entrez. Click on "Nucleotides" and do a search. Review the GenBank report for the positions of introns and exons, obtain a FASTA report, download the file and complete a BLAST search for related sequences (this should cover many of the steps outlined in the tutorial).
5. Using BLAST, compare the amino acid sequence of your protein to that of another protein, then repeat the comparison for the nucleotide sequences of the genes coding for the two proteins.
6. Restriction enzyme digestion is a basic technique for analyzing and manipulating DNA, including the genes that code for proteins. A commonly used restriction enzyme is HindIII. Using the REBASE site, determine the specificity of this restriction enzyme.
7. Proteins are often separated and characterized by SDS-PAGE. Using the University of Minnesota Biocatalysis/Biodegradation Database, outline the pathway for the microbial degradation of the detergent sodium dodecyl sulfate (SDS), which is used to denature proteins for SDS-PAGE.

References:
This procedure has been adapted in part from R. Boyer, Modern Experimental Biochemistry, 3rd Ed. (2000), Benjamin Cummings (Toronto), and the following references.
Baxevanis and B. Ouellette (Eds.), Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins (1998), John Wiley & Sons (New York). A new introduction to computing.
R. Doolittle (Ed.), "Computer Methods for Macromolecule Sequence Analysis," Methods in Enzymology, Vol. 266 (1996), Academic Press (San Diego).
D. Leon, S. Uridil and J. Miranda, "Structural Analysis and Modeling of Proteins on the Web," J. Chem. Ed. 75, 731-734 (1998).
H. Salter, "Teaching Bioinformatics," Biochem. Educ. 26, 3-10 (1998).
C. Smith, "Molecular Modeling," The Scientist, August 31, pp. 17-19 (1998).