Introduction to Linux and PC Cluster
October 5, 2010
Morris Law, IT Coordinator, Science Faculty, Hong Kong Baptist University

Outline - Linux
  Introduction to Linux
  History of UNIX and Linux
  Login, logout and changing the password
  Basic Linux commands
  Linux hierarchical file system
  Linux shell environment
  Editors: vi, pico, emacs, joe, nano
  Basic shell scripts
  Compiling, linking and running C, C++ and Fortran programs
  Foreground and background jobs
  File transfer from other PCs on different platforms
  Linux distributions

Outline - PC Cluster
  Introduction to the PC Clusters
  What is a PC cluster
  The different kinds of PC clusters
  High Performance Computing (HPC) cluster vs Single System Image (SSI) cluster
  How to build your own PC cluster
  Introduction to the existing HPC cluster in the Faculty of Science, HKBU

Introduction to UNIX/Linux
  UNIX and Linux are multi-tasking, multi-user operating systems.
  UNIX originated from UNICS and MULTICS in 1969.
  They provide a time-sharing environment.
  UNIX/Linux commands are reusable and compact.
  The file system is hierarchical, with an easy-to-manage file permission scheme.
  In 1991, Linus Torvalds released the first version of the Linux kernel for PCs.
  Linux is an open source system; it grew to be a powerful and competitive operating system on PCs, Macs and even some brand-name workstations.

History of UNIX / Linux
  1969  Unics, by Ken Thompson at Bell Laboratories, ran on a Digital Equipment PDP-7.
        Multics was developed by Bell, MIT and General Electric.
  1970  Unics moved to the PDP-11/20.
        Ritchie designed and wrote the first C compiler for UNIX.
  1973  Ritchie and Thompson rewrote the UNIX kernel in C.
  1975  Sixth Edition (V6) was released.
  1978  The first version of BSD was built by Bill Joy, University of California, Berkeley (UCB).
  1979  Seventh Edition (V7) was released and implemented on the DEC PDP-11, the Interdata 8/32 and the VAX.
        The first VAX version of BSD (3BSD) was released.
  1980  Bill Joy ported the 32V version of UNIX to DEC's VAX machine.
        4BSD was released.
  1981  4.1BSD was released.
  1983  System V, developed by AT&T and based on V7, was first released.
        4.2BSD was released.
  1984  AT&T started marketing UNIX hardware and software.
        System V release 2.
        X was developed by MIT as part of Project Athena.
  1986  System V release 3.
  1987  4.3BSD was released.
  1988  BSD Networking Release 1.
        The X Consortium was formed. Its aim was to formulate a generally accepted standard for X.
  1989  System V release 4 (SVR4), largely written by Sun Microsystems, included many features from BSD.
  1990  AT&T established UNIX System Laboratories (USL) to market System V and to handle licensing and further development.

History of UNIX/Linux
  1991  BSD Networking Release 2, which led to the development of 386BSD.
  1991  Linus Torvalds released Linux version 0.02.
  1993  USL was acquired by Novell. Novell gave the UNIX trademark to X/Open.
        Novell added NetWare support to System V.
  1993  Slackware, the oldest Linux distribution, was first released.
        The Debian project was established.
  1994  Linux kernel version 1.0 was released.
        RedHat and SUSE published version 1.0 of their Linux distributions.
  1995  Linux was ported to the DEC Alpha and to the Sun SPARC.
  1996  Linux kernel 2.0 was released. It supported multiple CPUs.
  1998  Major companies like IBM, Compaq and Oracle announced their support for Linux.
        Development of the graphical environment KDE began.
  1999  Development of the graphical environment GNOME began.
  2003  Linux kernel 2.6 was released on 18 December 2003.
  2004  The XFree86 team split up and joined the existing X Window standards body to form the X.Org Foundation.
  2005  The openSUSE project began as a free distribution from Novell's community.
        The OpenOffice.org project introduced version 2.0.

What is Linux
When Linus Torvalds was still a student at Helsinki University, he took up Minix, a small UNIX system, as a hobby and decided to develop a system that exceeded the Minix standards.
He began his work in 1991, when he released version 0.02, and worked steadily until 1994, when version 1.0 of the Linux kernel was released. The current full-featured version is 2.6 (released 18 December 2003) and development continues.

Since Linus developed only the Linux kernel, the contribution of GNU software played an important role in making Linux the popular operating system it is today. The GNU project was started in 1984 by Richard Stallman, who wanted to develop free software. The decision to develop Linux under the GNU General Public License accelerated the growth of the GNU project after Linux was released in 1991. Some GNU software is even developed on the Linux platform first, before it is ported to other platforms. During the same period, the Internet grew and became a solid ground for collaborative work by volunteers all over the world.

Many distributions of Linux were released for different hardware such as PCs, PowerPC, Macintosh and even brand-name UNIX workstations. Though the Linux kernel is free and open source, these distributions may not be free, since the software they package may include some commercial software.

Login and Logout
To log in to a UNIX/Linux system, you have to find a terminal. There are two kinds of terminal, namely the ASCII terminal and the graphical terminal. An ASCII terminal supports command-line input only, while a graphical terminal lets users enter commands with mouse and keyboard and also supports graphical display.
Once you find an ASCII terminal, you will see a login prompt like the following.
    Fedora release 13 (Goddard)
    kernel 2.6.33.3-85.fc13.x86_64 on an x86_64 (tty3)

    cf8200-07 login:

Basic Linux commands - working with files & directories
  Command   Example          Description
  cat       cat f1 f2        display the contents of files f1 and f2
  cd        cd $HOME         change to the home directory
  cp        cp f1 f2 dir1    copy files f1 and f2 into directory dir1
  ls        ls -la           list all files (including hidden ones) in long format
  mkdir     mkdir abc        make a new directory abc
  more      more a1 a2       display files a1 and a2 page by page
  mv        mv f1 dir1       move file f1 into dir1 (mv is also used to rename files)
  pwd       pwd              display the current working directory
  rm        rm -rf lab1      delete lab1 and everything in it without confirmation
  rmdir     rmdir lab1       delete the empty directory lab1

Basic Linux commands - working in the shell (1/2)
  Command   Example                         Description
  cal       cal 11 2010                     display the calendar of November 2010
  compress  compress file1                  create the compressed file file1.Z
  date      date                            display the current time and date
  df        df                              report the disk space used on each file system
  diff      diff f1 f2                      compare the text of two files
  du        du                              summarize disk usage of the current directory
  find      find ./ -name .cshrc -print     search for the file .cshrc and print its path
  grep      grep student *                  search all files in the current directory for the word student
  history   history 50                      list the last 50 commands stored in the shell
  kill      kill -9 2036                    terminate the process with pid 2036

Basic Linux commands - working in the shell (2/2)
  Command     Example                 Description
  logout      logout                  leave the system
  lpr         lpr -h f1 f2            print f1 and f2 without a header page
  man         man tar                 display the on-line manual page, e.g. for tar
  nohup       nohup matlab < a &      run matlab (a.m) so that it does not hang up after logout
  ps          ps -ef                  list all processes running in the system
  sort        sort -r -n studno       sort studno in reverse numerical order
  tar         tar cvf abc.tar abc/    create the archive file abc.tar from directory abc
  uncompress  uncompress file1.Z      the opposite of compress
  wc          wc -l f1                count the number of lines in f1
  who         who                     show who is on-line
  whoami      whoami                  identify yourself

Linux file system
Linux is a file-oriented system. In Linux, files can be regular files, directories or special files such as devices and sockets.
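The distinction between regular files, directories and special files can be checked from the shell. The following is a minimal sketch, assuming bash on a Linux machine where /dev/null exists; the function name file_kind and the scratch file are made up for illustration and are not part of the original slides.

```shell
#!/bin/bash
# Report the kind of file a path refers to, using the shell's test operators.
# file_kind is a hypothetical helper name chosen for this example.
file_kind() {
    if   [ -d "$1" ]; then echo "directory"
    elif [ -c "$1" ] || [ -b "$1" ]; then echo "device"
    elif [ -S "$1" ]; then echo "socket"
    elif [ -f "$1" ]; then echo "regular file"
    else echo "other or missing"
    fi
}

# Scratch regular file, so the example does not depend on any existing file.
scratch=$(mktemp)

kind_root=$(file_kind /)
kind_null=$(file_kind /dev/null)
kind_scratch=$(file_kind "$scratch")

echo "/ is a $kind_root"
echo "/dev/null is a $kind_null"
echo "$scratch is a $kind_scratch"

rm -f "$scratch"
```

The same information appears as the first character of each line printed by ls -l (d for a directory, - for a regular file, c or b for a device).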
The files are organized in a hierarchical directory structure similar to an inverted tree.

    / (root)
      usr
      lib
      users
        staff
          guest
            gu09        (/users/staff/guest/gu09)
        student
        visitor
      tmp
      bin
      var
      dev
        null            (/dev/null)
      sbin

File
Files are identified by their file names. File names can be up to 255 characters long. Hidden files are files whose names begin with a dot (.).
Each file in UNIX/Linux has its own ownership and permissions, which can be shown by listing the directory content in long format (ls -l).
The following shows a file, stafflist, 34 bytes in size, which was last modified on 19/09/97. It is owned by a user called morris and by the group dean (staff of the Dean's Office).

    -rw-rw-r-- 1 morris dean 34 Sep 19 1997 stafflist

The ownership can be changed with the commands chown and chgrp:

    chown cwyeung stafflist; chgrp math_stf stafflist

The first field in the above listing represents the permission bits of the file.
The first character shows its kind: `d' represents a directory, `-' represents a regular file.
The rest can be divided into 3 groups showing the user permission, group permission and other permission respectively. Each group can have read (r), write (w) and/or execute (x) permission bits. A `-' denies the corresponding permission.
In the example above, stafflist is a regular file which can be updated (rw- in the user bits and group bits) by morris and by the dean staff. It can be read by other users (r-- in the other bits). The file cannot be executed by anybody, since a `-' is found in each execute bit.

One can change the permission bits with chmod. Two methods can be used:
  Use the u, g, o, a flags with + or - to add or remove permissions.
  Use 3 octal digits, each calculated by adding 4 for `r', 2 for `w' and 1 for `x'.

    chmod g-rw,o-r stafflist
        - deny rw permission for users in the same group
        - deny r permission for other users
    chmod a+x stafflist
        - add execute permission for all users
    chmod 700 stafflist
        - octal form: rwx for the owner, no permission for the group and other users

Use ls -l to check the result.
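The chmod steps above can be tried safely on a scratch copy. This is a minimal sketch, assuming bash and the standard mktemp, ls and cut utilities; the scratch directory, the helper show_perm and the variable names are made up for illustration.

```shell
#!/bin/bash
# Work on a scratch file so that no real file is touched.
workdir=$(mktemp -d)
file="$workdir/stafflist"
echo "sample staff list" > "$file"

# Print only the 10-character type/permission field of `ls -l`.
# show_perm is a hypothetical helper name chosen for this example.
show_perm() {
    ls -l "$1" | cut -c1-10
}

chmod 664 "$file"           # octal: 6=rw for user, 6=rw for group, 4=r for others
perm_start=$(show_perm "$file")

chmod g-rw,o-r "$file"      # symbolic: remove rw from group and r from others
perm_symbolic=$(show_perm "$file")

chmod a+x "$file"           # symbolic: add x for user, group and others
perm_allx=$(show_perm "$file")

chmod 700 "$file"           # octal: 4+2+1 for user, 0 for group, 0 for others
perm_octal=$(show_perm "$file")

echo "chmod 664       -> $perm_start"
echo "chmod g-rw,o-r  -> $perm_symbolic"
echo "chmod a+x       -> $perm_allx"
echo "chmod 700       -> $perm_octal"

rm -rf "$workdir"
```

Starting from -rw-rw-r--, the two symbolic steps give -rw------- and then -rwx--x--x, while chmod 700 sets -rwx------ regardless of the previous bits. The octal form always replaces all nine permission bits, whereas the symbolic form changes only the bits it names.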
Path
To locate a file, one should use the absolute path or a relative path.
The absolute path is the path described starting from the root (/).

    /users/staff/guest/gu01/sampledir/sample.txt

A relative path is the path described starting from the current working directory (.).

    sampledir/sample.txt

refers to the same file when gu01's current working directory is /users/staff/guest/gu01.
Use pwd to find the current working directory.
In a path definition,
  the current directory can be described by `.',
  the parent directory can be described by `..',
  the home directory can be described by `~' or the environment variable `$HOME'.

Linux shell environment
The shell is the front end through which users interact with the Linux kernel.
Commands can be typed at the shell prompt to do file manipulation (file copying, renaming and deleting), start a text editor, compile and run a program, etc.
Different shells can be found in Linux. The most common shells are the Bourne Again shell (bash, the default on most Linux distributions), the Bourne shell (sh, old and standard), the Korn shell (ksh) and the C shell (csh, with C-like commands).
These shells support both foreground and background processes, pipes, filters and other standard features of Linux. Besides handling Linux commands, these shells support the execution of batch files called shell scripts.
The default shell prompt for the Bourne Again, Bourne and Korn shells is ($) and that for the C shell is (%).
A typical command line has the following syntax,

    command [-options] arg1 arg2 arg3 ...

where arg1, arg2, arg3, etc. are arguments whose meaning depends on the nature of the command.
Built-in commands are interpreted directly by the shell.
If the command contains a path, the shell will look for the command only at that path.
If no path is given, the shell will look for the command in the directories listed in the search path ($PATH).

Linux editor
The most frequently used program in Linux is an editor. Choosing an editor that suits your needs is crucial for most program developers.
Common editors in Linux are
  vi (standard Linux full-screen editor)
  emacs (macro rich)
  pico/nano (command-driven full-screen editors)
  joe (WordStar-like editor)
Since vi is installed on all UNIX systems, UNIX experts learn vi.
The Emacs editor is rich in macros for formatting text. Therefore, it is good for program developers who write code in different programming languages.
The pico, nano and joe editors support full-screen and cursor editing. They are good for novices.
X-window editors are editors which support window and mouse editing. Xemacs and gedit are two examples.

Shell script examples (1/3)
A bash shell script (CheckTemp.sh) for reporting high temperature given an input.

    #!/bin/bash
    # Usage: ./CheckTemp.sh <temperature>
    high=33
    if [ "$1" -ge "$high" ]; then
        echo "*** High temperature signal!"
    else
        echo "Normal temperature!"
    fi

Shell script examples (2/3)
A bash script (hosts.sh) for setting up hostnames and IP addresses for 256 nodes in a cluster.

    #!/bin/bash
    # Node i gets the address 10.1.k.l with k = i/16 + 1 and l = i%16 + 1.
    for i in $(seq 0 255)
    do
        k=$(expr $i / 16 + 1)
        l=$(expr $i % 16 + 1)
        echo "compute-0-$i 10.1.$k.$l"
    done

Shell script examples (3/3)
A csh script (fingerall.sh) for listing the finger information of all users on the Linux workstation.

    #!/bin/csh
    # Take the first colon-separated field (the user name) of each passwd entry.
    set username = `cat /etc/passwd | awk -F':' '{print $1}'`
    foreach i ( $username )
        echo $i
        finger $i
    end

Compiling, linking and running C, C++, Fortran programs
Compiling C programs

    cc [-o a.exe] a.c

  Without the -o option, the executable file will be named a.out.
Compiling Fortran programs

    f77 [-o t1] a.f

  The name of the executable can be set freely.
Compiling C++ programs

    g++ [-o t1] a.C

Background Jobs
Programs with a long running time should be placed in the background.
A trailing `&' runs a program in the background, and the nohup command lets it keep running after you log out.
Run the program preceded by nohup and ending with an `&'.

    nohup abc &

File transfer from other PCs in different platforms
File transfer between MS Windows and UNIX can be done by starting a secure ftp program from Windows, for example winscp.
  Install and run winscp, downloadable from www.openssh.org.
  Connect a host session with your username and password.
  Your Windows desktop is listed on the left; your Linux files and directories are listed on the right.
  Just drag and drop files or directories between them to perform the file transfer.

Assorted Linux distributions
  SuSE (commercially supported, with the open source variant OpenSuSE)
  RedHat (commercially supported)
  Caldera OpenLinux (SCO open server)
  Turbo Linux (Japanese, supports HA)
  RedFlag (Chinese based)
  Xandros (commercial, fit for netbooks and handheld devices)
  Slackware (earliest distribution)
  Debian (first community-based Linux)
  Mandriva (derived from Mandrake, good desktop interface)
  Ubuntu (charity formed in South Africa)
  Gentoo Linux (highly optimized)
  Fedora (open source variant supported by RedHat)
  CentOS (enterprise-level community support)
  Knoppix (Live CD/DVD)

Outline - PC Cluster
  Introduction to the PC Clusters
  What is a PC cluster
  The different kinds of PC clusters
  High Performance Computing (HPC) cluster vs Single System Image (SSI) cluster
  How to build your own PC cluster
  Introduction to the existing HPC cluster in the Faculty of Science, HKBU

What is a PC cluster?
An ensemble of networked, stand-alone, commodity off-the-shelf computers used together to solve a given problem.

Different kinds of PC cluster
  High Performance Computing Cluster (Beowulf cluster)
  Load Balancing
  High Availability

High Performance Computing Cluster (Beowulf)
  Started in 1994.
  Donald Becker of NASA assembled the world's first cluster with 16 DX4 PCs and 10 Mb/s Ethernet.
  Also called a Beowulf cluster.
  Built from commodity off-the-shelf hardware.
  Applications include data mining, simulations, parallel processing, weather modelling, computer graphics rendering, etc.

Examples of Beowulf cluster
  Scyld Cluster O.S.
    originated by Donald Becker, http://www.scyld.com
  ROCKS from NPACI, http://www.rocksclusters.org
  OSCAR from the open cluster group, http://oscar.sourceforge.net
  OpenSCE from Thailand, http://www.opensce.org
  SCore from the PC Cluster Consortium, Japan, http://www.pccluster.org/

Load Balancing Cluster
  The PC cluster delivers load-balancing performance.
  Commonly used with busy ftp and web servers with a large client base.
  A large number of nodes share the load.

High Availability Cluster
  Avoids downtime of services.
  Avoids a single point of failure.
  Always built with redundancy.
  Almost all load balancing clusters have HA capability.

Examples of Load Balancing and High Availability Cluster
  RedHat Cluster Suite, http://www.redhat.com/cluster_suite/
  Turbolinux Cluster Server, http://www.turbolinux.com/products/middleware/tlcs8.html
  Linux Virtual Server Project, http://www.linuxvirtualserver.org/
  Single System Image Cluster for Linux, http://www.openssi.org

Screenshots 1
An example of Beowulf Cluster: ROCKS (http://www.rocksclusters.org)
  [ROCKS snapshot: the schematic diagram of a ROCKS cluster]
  [ROCKS snapshot: installation of a compute node]
  [ROCKS snapshot: Ganglia monitoring tools]

HPCC Cluster and parallel computing applications
  Message Passing Interface
    MPICH (http://www-unix.mcs.anl.gov/mpi/mpich/)
    LAM/MPI (http://lam-mpi.org)
  Mathematical
    fftw (fast Fourier transform)
    pblas (parallel basic linear algebra subprograms)
    atlas (a collection of mathematical libraries)
    sprng (scalable parallel random number generator)
    MPITB (MPI toolbox for MATLAB)
  Quantum chemistry software
    gaussian, qchem, gamess
  Molecular dynamics solvers
    NAMD, gromacs, amber
  Weather modelling
    MM5 (http://www.mmm.ucar.edu/mm5/mm5-home.html)

NAMD2 - Software for Molecular Dynamics

Single System Image (SSI) Cluster
  MOSIX
  openMosix

MOSIX and openMosix
MOSIX: MOSIX is a software package that enhances the Linux kernel with cluster capabilities. The enhanced kernel supports clusters of any size built from X86/Pentium based boxes.
MOSIX allows for the automatic and transparent migration of processes to other nodes in the cluster, while standard Linux process control utilities, such as 'ps', show all processes as if they were running on the node where they originated.
openMosix: openMosix is a spin-off of the original Mosix. The first version of openMosix is fully compatible with the last version of Mosix, but it is going to go in its own direction.

OpenMosix installation
  Install Linux on each node.
  Download and install
    openmosix-kernel-2.4.26-openmosix1.i686.rpm
    openmosix-tools-0.3.6-2.i386.rpm
    and related packages like those at www.openmosixview.com
  Reboot with the openmosix kernel.

Screenshots 2
OpenMosix cluster management

openMosix cluster management tools
  openMosixView
  openMosixmigmon
  3dmosmon

Advantages of SSI cluster
  No need to parallelize code.
  Automatic process migration, i.e. load balancing.
  Nodes can be added or deleted at any time.
  Well aware of hardware and system resources.

PC clusters in Faculty of Science, HKBU
  PII 4-node clusters started in 1999 (obsolete)
  PIII 16-node cluster purchased in 2001.
    (obsolete)
    Plan for grid
    For test base
  HKBU 64-node P4-Xeon cluster, at #300 of the top500

TDG cluster configuration
  Master node:
    DELL PE2650, P4 Xeon 2.8GHz x 2
    4GB ECC DDR RAM
    36GB x 2 internal HD running RAID 1 (mirror)
    73GB x 10 HD array running RAID 5 with hot spare
  Compute nodes x 64, each with
    DELL PE2650, P4 Xeon 2.8GHz x 2
    2GB ECC DDR RAM
    36GB internal HD

Interconnect configuration
  Extreme BlackDiamond 6816 Gigabit Ethernet switch

16-node P4 Xeon Cluster for computational research from 2005
  16 compute nodes, each with
    P4 Xeon 3.2GHz x 2
    2GB RAM
    36GB SCSI harddisk
  ROCKS 4.0.0

Sciblade Cluster
  256-node cluster supported by funding from RGC in 2009

Hardware Configuration of the newest PC cluster: sciblade
  Master Node
    Dell PE1950, 2x Xeon E5450 3.0GHz (Quad Core)
    16GB RAM, 73GB x 2 SAS drive
  IO nodes (Storage)
    Dell PE2950, 2x Xeon E5450 3.0GHz (Quad Core)
    16GB RAM, 73GB x 2 SAS drive
    3TB storage, Dell PE MD3000
  Compute nodes x 256, each
    Dell PE M600 blade server w/ Infiniband network
    2x Xeon E5430 2.66GHz (Quad Core)
    16GB RAM, 73GB SAS drive

Hardware Configuration
  Blade Chassis x 16
    Dell PE M1000e
    Each hosts 16 blade servers
  Management Network
    Dell PowerConnect 6248 (Gigabit Ethernet) x 6
  Interconnect fabric
    Qlogic SilverStorm 9120 switch
  Console and KVM switch
    Dell AS-180 KVM
    Dell 17FP Rack console
  Emerson Liebert Nxa 120kVA UPS

PC cluster nowadays
  Node hardware
    Multi-core CPUs with L2 and L3 cache
    DDR RAM
    Large harddisk (over 500GB per disk)
    Blade / Rack mount server
  Storage
    SAN, I/O nodes, parallel file systems

PC cluster nowadays (cont)
  Interconnect
    Gigabit Ethernet, Myrinet, Infiniband, Quadrics

Reference URLs
  Clustering and HA
  Beowulf, parallel Linux cluster
  ROCKS from NPACI
  OPENMOSIX, scalable cluster computing with process migration
  High Performance Cluster Computing Centre, supported by Dell and Intel
  Linux Cluster Information Center
  The Quantian Scientific Computing Environment

Thank you!
Welcome to visit HPCCC, HKBU
http://www.sci.hkbu.edu.hk/hpccc/