MySQL/PHP Workshop http://www.trii.org/courses/mysql/ • • • • • 2 MySQL lectures 2 PHP lectures Each lecture builds on concepts taught and learned in the previous lectures. The first two lectures discuss the concept of a relational database such as MySQL and show you how to manipulate the data stored in the database from the command line. It is essential to learn this first because PHP makes use of the language of the database. The third and fourth lectures will introduce you to PHP, a serverside scripting language that allows you to interact with the MySQL database from a web browser and create fancy web pages to display the data. PHP is the go-between that fetches the data from the MySQL database and then spits it out dynamically as the nicely formatted HTML page that the browser expects. Class #1: Introduction to MySQL Manda Wilson, BIC, cBio, MSKCC Trii Training: Bioinformatics and Computational Methods • Relational databases • Database design • SQL o Creating databases o Creating tables o Selecting from, deleting, and updating tables • Exercises • You should have o Class notes o Exercise handout o MySQL Pocket Reference • Useful resource: http://dev.mysql.com/doc/ Relational Databases • A database is a collection of tables • Columns define attributes of the data o All data in a column must have the same data type • A record is stored in a row table name row First Name Nadia Madhu Ajuma Wade Helen Employees Last Name Li Charu Kinsaka Randal Clark Phone 2687 7856 4489 5257 2147 column Use a Relational Database When… • You have a very large dataset • There is redundant data o Wastes disk space o Increases errors • Information must be updated in multiple locations • Security is important o Different users can be granted different permissions • Strict enforcement of data types is important Spreadsheet Example Title A House for Mr. Biswas Midnight's Children On the Road Author VS Naipaul Salman Rushdie Jack Kerouac Borrower Sarah Phone 646.555.1234 Spreadsheet Example Title A House for Mr. Biswas Midnight's Children On the Road Author VS Naipaul Salman Rushdie Jack Kerouac Borrower Sarah Phone 646.555.1234 One Flew Over the Cuckoo's Nest Sula Villette Ken Kesey Toni Morrison Charlotte Bronte Sarah 646.555.1244 Jim 646.555.4586 Data is inconsistent! Now imagine you are designing the New York Public Library database which has tens of million books and well over a million cardholders. Database Design Entity Relationship Design Entity (“thing”, “object”) Table Attributes (describe entity) Columns Entity Instance Row Relationships between entities preserved in relationships between tables. If you are interested in learning more formal design methods look up “normalization” and/or “third normal form”. Our data Ensembl Gene ID Symbol / Name Chromo some Common Name Species ENSG00000186891.3 TNFRSF18 1 1044947 1048147 8784 9606 human Homo sapiens ENSG00000078808.4 CAB45 1 1058370 1073469 51150 9606 human Homo sapiens ENSG00000176022.1 B3GALT6 1 1073703 1076476 126792 9606 human Homo sapiens ENSG00000160087.5 UBE2J2 1 1095352 1115292 118424 9606 human Homo sapiens ENSG00000162572.4 SCNN1D 1 1123634 1133467 6339 9606 human Homo sapiens ENSG00000162576.4 MGC3047 1 1194130 1199973 84308 9606 human Homo sapiens ENSG00000175756.3 AKIP 1 1215168 1216641 54998 9606 human Homo sapiens ENSG00000131586.2 MRPL20 1 1288703 1294063 55052 9606 human Homo sapiens ENSG00000179403.2 WARP 1 1322311 1327547 64856 9606 human Homo sapiens ENSG00000160072.5 ATAD3B 1 1358611 1396091 83858 9606 human Homo sapiens ENSG00000008128.5 CDC2L2 1 1582617 1604060 985 9606 human Homo sapiens ENSG00000169911.4 SLC35E2 1 1611978 1625728 9906 9606 human Homo sapiens ENSG00000008130.3 FLJ13052 1 1630975 1659805 65220 9606 human Homo sapiens ENSG00000078369.3 GNB1 1 1665027 1770792 2782 9606 human Homo sapiens ENSMUSG00000041954.1 TNFRSF18 4 154139702 154142251 21936 10090 mouse Mus musculus mouse Mus musculus ENSMUSG00000023286.1 Start Position (bp) End Position (bp) LocusLink ID Taxonomy ID UBE2J2 154057210 1540722964 140499 10090 What entities or “objects” are 4defined here? Is there any redundant data? What happens if we want to add another species attribute (e.g. genus)? Our tables QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. How do we know which organism a gene belongs to? Do we have unique identifiers? Can the organism have more than one gene? Can the gene have more than one organism? Our tables QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. QuickTi me™ and a T IFF (LZW ) decompressor are needed to see t his pict ure. Means each gene has “one and only one” organism, but that each organism can have more than one gene. This is an example of an entity relationship diagram. Our tables QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. • A primary key is a unique identifier for a record in a table. • A foreign key in one table refers to the primary key of another. • The gene table has a foreign key, Organism_ID. Each record in the gene table will have an organism id to link it to the correct species record in the organism table. Database Design Caveat • Sometimes design is “sacrificed” for speed. Data Types float integer tinyint varchar(size) o stores strings o size can be between 0 - 255, inclusive • datetime What data types should our attributes (columns) be? • + more • • • • QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. Complete Design Database name: ensmartdb QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. Column gene_id ensembl_gene_id organism_id name locuslink chromosome chromo_start chromo_end description Column organism_id taxonomy_id common_name species Gene Data Type integer varchar(50) integer varchar(35) varchar(10) tinyint integer integer varchar(255) Organism Data Type integer integer varchar(35) varchar(35) Connecting to MySQL from the Command Line mysql -uusername -p Example: >mysql -uroot To EXIT MySQL: EXIT; Basic SQL Commands • SQL statements end with a semicolon • View databases SHOW DATABASES; Importing a Database • Creating a database CREATE DATABASE trii; • From the command line: mysql -uusername -ppassword databasename < filename.sql • Example: o mysql -uroot trii < trii.sql Basic SQL Commands • Use database databasename USE databasename; • Display all tables in a database SHOW TABLES; Create Table database name CREATE TABLE organism ( organism_id INTEGER NOT NULL AUTO_INCREMENT, column taxonomy_id INTEGER NOT NULL, names common_name VARCHAR(35) NOT NULL, species VARCHAR(35) NOT NULL, PRIMARY KEY (organism_id), UNIQUE (taxonomy_id) ); View column details for a table DESC tablename; Selecting all data SELECT * FROM tablename; Select only the columns you need be faster) SELECT common_name, species FROM organism; (it will Limiting your data • Get the species name for a specific organism (you have the id) SELECT species FROM organism WHERE organism_id=1; How do we select all the gene names for chromosome 1? Insert • Inserting a gene INSERT INTO gene (ensembl_gene_id, organism_id, name, chromosome, chromo_start, chromo_end) VALUES (‘MY_NEW_GENE’, 1, ‘MY GENE’, 1, 12345, 67890); • Get the id of the gene: SELECT gene_id FROM gene WHERE name='MY GENE'; Delete/Update • Deleting a gene DELETE FROM gene WHERE gene_id=19; • Updating a gene UPDATE gene SET name=‘NEW NAME’ WHERE name=‘AKIP’; Table Joins • Sometimes you want data from more than one table. To do this we join the tables. The result is a new (temporary) table. Cross Join • SELECT gene.name, organism.species FROM gene, organism; Note: There are only two records in the gene table with the name “TNFRSF18”. One is a mouse gene and the other is a human gene. What do you think happened? Cross Join (Continued) Each row of the gene table was joined with every row in the organism table. Cross Join (Continued) • We want to use the organism id to match a gene record with the correct organism record so that we get: Remember there are two gene records with the name “TNFRSF18”. Cross Join (Continued) SELECT gene.name, organism.species FROM gene, organism WHERE gene.organism_id=organism.organism_id; Notice that we have 18 rows and that there are 18 rows in the gene table. Class #2: Introduction to MySQL (Continued) • • • • • • Table Joins Where clause (continued) Aggregate Functions Backing up your database Indexes (if time) Importing large datasets (if time)