Weizmann 2012 - Introduction to Matlab & Data Analysis
Exercise #6: File handling and regular expressions

Tutor in charge of this HW: Anat Tzimmer
E-mail for questions: anat.tzimmer@weizmann.ac.il

HW instructions:
You must submit this assignment in pairs. Please name your script "hw4_yourIDnumber1_yourIDnumber2.m".
The script must begin with a program description.
Submit only your script (.m) file; please do not attach any other files.
Please do not use the “clc”, “clear” or “close” functions in your script.
A program that crashes (for any reason) will receive a failing grade.
Avoid “magic numbers” and give your variables meaningful names.
Write your code in a readable way: avoid overly long descriptions and remarks and excessive spaces (indent your code before submitting it), and do not write overly long code lines.

In this exercise you will read a text file that contains a list of genes and their GO terms. The Gene Ontology web site allows the user to upload a list of genes and receive their Gene Ontology terms as a text file, like the file you are about to read. GO terms are gene annotations describing function, related pathways, cellular localization and more. Each gene can have zero or more GO terms, depending on what is known about the gene.

The purpose of this exercise is to find all genes that are related to a given cell function, using their GO terms. Several GO terms may imply the same function of a gene. For instance, the gene FBL has the following GO terms:
GO:0006364~rRNA processing, GO:0006396~RNA processing, GO:0008033~tRNA processing.
Each GO term has an accession number (starting with ‘GO:’) and a GO term name (description). We would like to find all the relevant GO terms in all the genes according to a given list of key words.

1. You should write a function that receives 2 inputs:
   a. prot_goTerms_fname - the name of the file to read; the file contains the list of genes and their GO terms.
   b. key_word - a cell array specifying the key words for the GO term search.
   I’ll use the following key words cell array to test your work:
   key_words={'cytoskele', 'movement', 'migration', 'microtubul'};
2. Your function should open a file for writing the results. The file name should be in the format [the first key word]_ID1_ID2.txt, for example “cytoskele_00000_11111.txt”.
3. Your function should open the file named by prot_goTerms_fname and read it. Each line consists of a gene name followed by all of its GO terms.
4. You should search for every key word from the list in the gene's GO terms using regular expressions.
5. If one or more key words are found, print the results into the output file you already opened, in the following format:
   a. One blank line before the gene name.
   b. The gene name.
   c. Below the gene name, on a new line (without a blank line), the GO term accession number, a tab delimiter and the GO term name.
   d. If several GO terms were found, print each GO term on a new line in the format specified above.
6. A results file is available in the Supplementary Material on the web. Your output file should be identical to it. Submitting code that generates a file with a different set of results or in a different format will receive a failing grade. Use any file-comparison program to check for differences between your output file and the given file, and make sure there are none.
7. Don’t forget to close the files (input and output) in your function.
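The steps above can be sketched as follows. The submission itself must be a MATLAB function; this Python version only illustrates the logic, under assumptions that are not stated in the assignment: the gene name is separated from its GO terms by a tab, the terms are separated by commas as in the FBL example, and key-word matching is case-sensitive. The names `find_go_hits`, `search_go_terms`, `id1` and `id2` are hypothetical. Check any real solution against the supplementary results file.

```python
import re

# Assumed term layout (from the FBL example): "GO:0006364~rRNA processing",
# terms separated by ", ". The lookahead ends a name only at the next
# "GO:nnnnnnn~" or at end of line, so names containing commas stay intact.
GO_TERM_RE = re.compile(r"(GO:\d+)~(.*?)(?=,\s*GO:\d+~|$)")


def find_go_hits(line, key_pattern):
    """Return (gene, [(accession, name), ...]) for one input line.

    Hypothetical layout: the gene name is the text before the first tab,
    and its GO terms follow on the same line.
    """
    gene, _, terms = line.rstrip("\n").partition("\t")
    hits = [(acc, name.strip())
            for acc, name in GO_TERM_RE.findall(terms)
            if key_pattern.search(name)]
    return gene.strip(), hits


def search_go_terms(in_fname, key_words, id1, id2):
    """Write matching genes and GO terms to '<first key word>_<id1>_<id2>.txt'."""
    # One alternation of the escaped key words (case-sensitive by assumption).
    key_pattern = re.compile("|".join(re.escape(k) for k in key_words))
    out_fname = f"{key_words[0]}_{id1}_{id2}.txt"
    # "with" closes both files, even if an error occurs (step 7).
    with open(in_fname) as fin, open(out_fname, "w") as fout:
        for line in fin:
            gene, hits = find_go_hits(line, key_pattern)
            if hits:
                fout.write("\n" + gene + "\n")  # blank line, then gene name
                for acc, name in hits:
                    fout.write(f"{acc}\t{name}\n")  # accession <TAB> name
    return out_fname
```

For example, `search_go_terms("genes_go.txt", ['cytoskele', 'movement', 'migration', 'microtubul'], "00000", "11111")` would write `cytoskele_00000_11111.txt` (the input file name here is hypothetical). Ending each term name at the next `GO:` accession, rather than at every comma, keeps names that themselves contain commas in one piece. In MATLAB the same steps map onto `fopen`, `fgetl`, `regexp` (with the `'tokens'` option), `fprintf` and `fclose`.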