Computer Project: The Matrix Market and Sparse Matrices

Purpose: To learn how to get and use matrices from the Matrix Market and the University of Florida Sparse Matrix Collection. Also to learn about Matlab utilities for solving systems with sparse matrices and for displaying sparse matrices.

Prerequisite: Knowledge of using Gaussian elimination to solve Ax = b (for example Sections 1.3 and 1.4 of Spence, Insel and Friedberg). Optionally, the LU factorization (for example Section 2.5 in SIF).

Built-in Matlab functions used: size, nnz, spy, tic, toc, norm, lu, \, prod, subplot and colmmd.

Files that need to be downloaded: mmread.m, loadmatrix.m (Matlab files), matrixmarket_project.doc (this file), matrixmarket_worksheet_1.doc, matrixmarket_worksheet_2.doc and possibly 7za.exf. You can find these at my web site www.math.sjsu.edu/~foster/math129a_projects.html

Background: The Matrix Market (http://math.nist.gov/MatrixMarket/) and the University of Florida (UF) Sparse Matrix Collection (http://www.cise.ufl.edu/research/sparse/matrices) are marvelous collections of matrices that come from real world applications, many of which lead to large matrices. Many of the matrices are bigger than 1000 by 1000 and indeed one matrix in the UF collection is over 5,000,000 by 5,000,000. Even though the matrices are large, linear systems involving many of them can be solved on your home computer using Matlab. You will do so as part of this assignment.

A key reason that such large problems can be quickly solved is that most of the matrices on the Matrix Market are sparse. A sparse matrix is a matrix with a high percentage of its elements equal to zero. Such matrices show up frequently in practice, especially for large problems. A matrix often characterizes interactions between various parts of a problem. In many settings one part of a problem only directly interacts with other nearby parts. The lack of direct interaction with distant parts of the problem leads to zeros in the matrix.
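
As a small illustration of how purely local interaction produces zeros, here is a minimal sketch. It is not one of the project matrices: it assumes a hypothetical problem with n parts arranged in a line, where each part interacts only with itself and its two neighbors. The names n, e, T and fraction_nonzero are chosen here for illustration only.

```matlab
% Hypothetical illustration (not one of the project matrices): n parts in a
% line, where each part interacts only with itself and its two neighbors.
n = 1000;
e = ones(n, 1);
T = spdiags([-e 2*e -e], -1:1, n, n);   % sparse tridiagonal matrix

[m, k] = size(T);                        % number of rows and columns
nz = nnz(T);                             % number of nonzero entries
fraction_nonzero = nz / (m * k);         % fraction of entries that are nonzero

fprintf('%d by %d, %d nonzeros, %.4f%% of entries nonzero\n', ...
        m, k, nz, 100 * fraction_nonzero);
```

In this sketch the matrix has 3n - 2 nonzero entries out of n^2, so the fraction of nonzeros shrinks as n grows.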
For problems that are large there are many more distant parts of a problem than nearby parts and the matrix will have many zero entries. Such problems lead to sparse matrices.

Matlab has tools that try to solve problems while maintaining as much of the zero structure as possible. This can reduce the work required by a huge amount since the portion of a calculation that refers to a zero portion of a matrix can often be skipped. (For example, 0 + x = x does not require an actual addition.) In Matlab, when a matrix is identified as a “sparse” matrix, Matlab will use a different algorithm to solve Ax = b than if the matrix is “full” (that is, mostly nonzero). The different algorithm will take advantage of the zeros in A, if possible. Fortunately the user of Matlab does not need to use different notation to solve Ax = b: x = A \ b still works!

In addition to the \ operator here are some of the other Matlab tools that you will use in this assignment:

size(A) returns the number of rows and columns of a matrix
nnz(A) returns the number of nonzero entries in a matrix
spy(A) draws a picture of the location of the nonzero entries in the matrix A; the nonzeros are drawn as dots and the zeros are left blank
tic and toc can be used to determine how long it takes to do a calculation
norm(v) calculates the length or norm of a vector v, which is the square root of the sum of the squares of the components of v
[L,U] = lu(A) determines L and U in an LU factorization of a matrix A. If A is stored as a sparse matrix then L and U will also be sparse.
full(A) will force Matlab to store a matrix as a full (that is, not sparse) matrix
prod(v) multiplies all the entries in a vector v
colmmd determines an order of the columns of a matrix that will tend to reduce “fill-in” (new nonzero entries created during elimination)
subplot allows several plots to be drawn in the same figure
condest estimates the condition number of a matrix

To do this assignment you need to be able to (1) find matrices in the Matrix Market or the UF Sparse Matrix Collection, (2) download files describing the matrices to your computer, (3) load the matrix into Matlab and (4) use the Matlab commands listed above to work with the matrices. We will give a description of these steps in the remainder of this file. As additional help, in case you find it useful, my web site has streaming videos that describe each of the above steps in greater detail, along with associated PowerPoint presentations.

To download files you should go to the Matrix Market site (http://math.nist.gov/MatrixMarket/) or to the UF Sparse Matrix Collection (http://www.cise.ufl.edu/research/sparse/matrices) and find the matrix that you want. At the Matrix Market site right click on the file name (for example gr_30_30.mtx.gz) in the “Download as Compressed Matrix Market file: (some file name).mtx.gz” section and save the file to your computer by choosing “save link target as” or “save target as” in your web browser. At the UF Sparse Matrix Collection right click on the “MM” entry in the download column and save the file to your computer by choosing “save link target as” or “save target as.” You will need to select the folder where you will save the file but do not change the name of the file. All files related to this project should be saved in the same folder on your computer.

I want you to download and collect some information (described later) on at least two different matrices. The matrices should be square and at least 900 by 900.
However, if you have a relatively new computer (less than a few years old) you should choose larger matrices (2000 by 2000 or more). In addition, one of the matrices should include a description, at least a paragraph long, of the application where the matrix arises. The “NEP” collection in the Matrix Market seems to have more detailed descriptions than some of the other collections. A second matrix can be any square matrix, 900 by 900 or larger, that seems of interest to you.

At least one of your matrices (ideally all your matrices but this is not required) must lead to an accurate solution, which we will define as a solution with norm(x - xtrue) / norm(xtrue) < 10^(-6), say. One key to choosing matrices that have accurate solutions is to choose a matrix whose “condition number” is not too large. Calculations in Matlab start with errors of around 10^(-16) and if the matrix has a condition number equal to condA then it turns out that the value of norm(x - xtrue) / norm(xtrue) should be about 10^(-16) * condA or less. We won’t present the reasons here. The Matrix Market (but not the UF collection) often lists the condition number of matrices in the collection. Also, unless the matrix is quite large, one can calculate (approximately) the condition number of a matrix using condest(A).

Finally, as extra credit you can, if you wish, try to solve the largest matrix that is practical to solve on your computer. I will let you decide what “practically solve” means. This will depend on how long you wish to wait for a solution and also on how much memory your computer has. I expect that some of you will be able to solve 10000 by 10000 or even much larger matrices on the supercomputer (as the name was used 10 or 15 years ago) that is in your backpack or on your desktop. The person(s) in the class who solves the largest system will receive double extra credit.

From either site the files that you download will be in compressed format.
If you know how to decompress files using programs such as 7-zip, winzip, powerdesk, or winrar you can decompress the file yourself. If you decompress the file yourself make sure that the decompressed file has a “.mtx” extension (for example gr_30_30.mtx). However you do not need to decompress the files yourself; it is easier to decompress the file using the loadmatrix.m program. Loadmatrix is a Matlab program that will automatically decompress the file (if necessary) and also convert the matrix to Matlab format. It is designed to work with files downloaded from either the Matrix Market site or the UF Sparse Matrix Collection site.

To use the loadmatrix.m program you need to download three files (loadmatrix.m, 7za.exf, and mmread.m) from www.math.sjsu.edu/~foster/math129a_projects.html. You should store these three files and also all the files that you download from the Matrix Market and UF sites in the same folder on your computer.

To use the loadmatrix program you need to open up Matlab and move to the folder where you are storing all your files for the project. In Matlab 5.3 one way to move to this folder is to select the “file” menu, the “set path” submenu, click on the “browse” button, navigate to the desired folder and click on “ok.” In Matlab 6 or 7, to move to the appropriate folder click on the button with three dots (called “browse for folder”) next to “current directory:” in the toolbar menu, navigate to the appropriate folder and then click “ok.” You can confirm that you are in the desired folder by typing “pwd” (print working directory) at the Matlab prompt. You can confirm that all the desired files are in the current folder by typing “dir” at the Matlab prompt.

Now to decompress the file and load the matrix in Matlab, after you get to the proper folder in Matlab, you should type at the Matlab prompt “matrixname = 'the name of the matrix'” and then type “loadmatrix”.
For example, if you downloaded the matrix named gr_30_30 then the name of the file should be “gr_30_30.mtx.gz” (if you downloaded the file from the Matrix Market) or “gr_30_30.tar.gz” (if you downloaded the file from the UF site) or “gr_30_30.mtx” (for example if you have decompressed the file yourself). Then at the Matlab prompt type

>> matrixname = 'gr_30_30'
>> loadmatrix

Do not include any extensions in the matrixname variable. Loadmatrix will store the matrix as a sparse matrix named A in Matlab. Also loadmatrix will construct a “true” solution xtrue and a corresponding right hand side b = A*xtrue. This true solution can be used to test the accuracy of Matlab’s solver x = A \ b. Of course, in the real world one would not know the true solution (why solve the system if you did?) but we are using a true solution to test Matlab’s solver. Finally, you now have the Matrix Market matrix available in Matlab!

Now you are ready to begin the mathematical part of the assignment. To do this fill out matrixmarket_worksheet_1 for one of the matrices that you selected and matrixmarket_worksheet_2 for the second. Worksheet_2 assumes that you have studied the LU factorization. If your class has not done this you can fill out worksheet_1 for both examples. If you do the extra credit fill out another copy of worksheet_1. The worksheets contain instructions and the questions that need to be answered.

In working through the worksheets you may get pictures like the ones shown below. Every matrix will produce a different picture and so yours won’t be exactly like these.
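
As a rough sketch of the kind of check described above (solving with \, timing the solve, and comparing the relative error with the condition number estimate), the following lines may be useful for practice before you load a real matrix. They do not reproduce loadmatrix. The small tridiagonal matrix and the choice xtrue = ones(n,1) are assumptions made only for illustration, and the names elapsed, relerr and bound are hypothetical.

```matlab
% Stand-in for the variables that loadmatrix provides (assumption: this only
% mimics A, xtrue and b; it is not how loadmatrix constructs them).
n = 900;
e = ones(n, 1);
A = spdiags([-e 4*e -e], -1:1, n, n);    % hypothetical sparse test matrix
xtrue = ones(n, 1);                      % hypothetical choice of true solution
b = A * xtrue;                           % right hand side consistent with xtrue

tic;                                     % start the timer
x = A \ b;                               % solve Ax = b with the sparse solver
elapsed = toc;                           % seconds taken by the solve

relerr = norm(x - xtrue) / norm(xtrue);  % relative error of the computed solution
condA = condest(A);                      % estimate of the condition number
bound = 1e-16 * condA;                   % rough error size suggested in the text

fprintf('solve time %.4f s, relative error %.2e, 1e-16*condA %.2e\n', ...
        elapsed, relerr, bound);
```

With a matrix loaded by loadmatrix you would skip the first block, since A, xtrue and b are already defined, and run only the solve and the error check. Typing spy(A) afterwards draws the nonzero pattern.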