Application Installation
Guntis Bārzdiņš, Ģirts Folkmanis, Artūrs Lavrenovs, Normunds Grūzītis

“Hello World”

    $ cat > hello.c
    #include <stdio.h>
    int main()
    {
        printf("Hello World!\n");
        return 0;
    }
    Ctrl/D
    $ gcc hello.c
    $ ./a.out
    Hello World!
    $

GCC: the GNU Compiler Collection
• Originally: GNU C Compiler (1987)
• Frontends for:
  – C: gcc (the command, as opposed to GCC, the collection)
  – C++: g++
  – More (ada, java, objective-c, fortran, …)
• Backends for:
  – x86, ia-64, ppc, m68k, alpha, hppa, mips, sparc, mmix, pdp-11, vax, …

[Figure: .c / .java sources → gcc / gcj front ends → AST → ASM → OBJ → EXE]

Compiling C programs
• Involves at least four kinds of files:
  – Regular source code files (*.c)
  – Header files (*.h)
  – Object files (*.o)
  – Binary executables (a.out by default; no fixed extension)
• Other kinds of files:
  – Static libraries (*.a)
  – Shared libraries (*.so)

gcc: Usage

    gcc foo.c

• Outputs the executable binary a.out

    gcc -Wall -pedantic foo.c bar.c baz.o -o foo

• -Wall: warns about legal but dubious code constructs; helps to catch a lot of bugs early
• -pedantic: warns about non-portable (non-standard) constructs
• -o: writes the executable to the specified file
• The standard C library (libc) is linked by default
• To link other libraries, e.g. the math library: -lm
  – The static library m is located in libm.a

Steps involved
1. Preprocessor (cpp)
2. Compiler (gcc)
3. Assembler (as)
4. Linker (ld)

Linking
• Combines the program’s object code with other object code to produce an executable binary file
• The other object code can come from the run-time library, other pre-compiled libraries, or object files that you have created
• On a Unix system, the executable file is called a.out by default
• If any linking error occurs, no executable file is generated

Translation Process: Static Linking
[Figure: foo.o and bar.o, together with libm.a and libc.a (printf.o, fopen.o), go into the linker (ld), which produces the fully linked executable object file a.out]

Merging Relocatable Object Files into an Executable Object File
• Relocatable object files (input):
  – system objects: system code (.text), system data (.data)
  – m.o: main() (.text), int e = 7 (.data)
  – a.o: a() (.text), int *ep = &e and int x = 15 (.data), int y (.bss)
• Executable object file (output, starting at address 0):
  – headers
  – .text: system code, main(), a(), more system code
  – .data: system data, int e = 7, int *ep = &e, int x = 15
  – .bss: uninitialized data
  – .symtab, .debug

Dynamic Linking
[Figure: bar.o, compiled with -fPIC, is linked into libfoo.so (position independent shared object). a.o and b.o are linked against libfoo.so into a partially linked executable a.out that depends on libfoo.so. At run time the OS loader (execve) starts the dynamic linker (ld-linux.so), which produces the fully linked executable in memory.]

Creating a static library
• A static library is an indexed archive of *.o files, named libsomething.a
• Create .o files: gcc -c helper.c
• Create an archive: ar rlv libsomething.a *.o
• Generate the index: ranlib libsomething.a (equivalent to ar s)
• Link the library: gcc -L/your/dir -lsomething

Creating a dynamic library
• Details differ for each platform
• gcc -shared -fPIC -o libhelper.so *.o
• Linking: the same approach as for a static library (-llibrary)
• Libraries can also be located via $LD_LIBRARY_PATH, but:
  – It is rarely used today and sometimes disabled completely
  – It raises security and compatibility issues
  – It has been replaced by ldconfig and /etc/ld.so.conf

Library distribution
• Distribution software management systems provide the libraries that the installed software depends on
• Mostly dynamic libraries
• Some static libraries, mostly for development
• Or, just to be safe, both

Program development using gcc
[Figure: Editor → source file pgm.c → Preprocessor → modified source code (in RAM) → Compiler → program object code file pgm.o → Linker (together with other object code files, if any) → executable file a.out]

Unix SW development environments
• GNU Emacs (1985)
• Vim (1991)
• Kdevelop, Eclipse, Netbeans
• Xcode
• Geany

Other programming tools
• Tools that ease single-programmer development of modular projects: make
• Version control systems for multi-programmer projects: CVS, SVN, Git
  – Git: a distributed revision control system with an emphasis on speed and data integrity, initially designed and developed by Linus Torvalds (2005) for Linux kernel development
• Packaging tools that ease installing programs: Tarball, RPM, DEB, ...

GNU make utility
• If a directory full of C files contains a text file with the default name makefile, running make in that directory causes the make program to use it.
• The power of make is that it examines timestamps. If a change is made only to part2.c and make runs again, only part2.c is recompiled. The linker then links the old part1.o, the old main.o and the new part2.o into the final program.
• make can be used to solve many other timestamp-based problems.
Makefiles
• When programming large applications:
  – It is better to modularise
  – Keep individual source files small
  – Instead of one large file, which is
    • Difficult to edit
    • Slow to compile
  – Compiling many files manually (and repeatedly) is tiresome
  – Use a Makefile

Example Makefile
• Consider a program contained in two separate source files:
  – main.c
  – sum.c
  – Both include a header sum.h
• The executable file is required to be named sum
• A simple makefile can be used:

    # Makes sum
    sum: main.o sum.o
        gcc -o sum main.o sum.o
    main.o: main.c sum.h
        gcc -c main.c
    sum.o: sum.c sum.h
        gcc -c sum.c

• The first line is a comment
• A dependency line (e.g. sum: main.o sum.o) starts with a target at column 0
• An action line (e.g. gcc -c main.c) begins with a TAB followed by a command line
• Dependency + Action = Rule

make (1976)
• Searches the current directory for GNUmakefile/BSDmakefile, makefile, Makefile
• Runs the specified (or default) target(s)

    make [-f makefile] [target]

• Declarative: the necessary end conditions are described, but the order in which the actions are taken is not specified
• Maintains dependency graphs (topological sorting of a DAG)
• Based on modification times
  – If a prerequisite is newer than a target that depends on it, the target is remade

make variables and macros
• Environment variables are available as $(PATH) etc.
• Macros can be composed of shell commands by using the command substitution operator (backticks)
• Predefined internal macros, a.k.a. automatic variables:
  – $@ name of the current target
  – $? list of dependencies newer than the target
  – $< name of the dependency file
  – $* base name of the current target
  – $% for libraries, the name of the member

More on make rules
• Implicit or suffix rules

    CC = gcc
    CFLAGS = -g
    # The special fake target for suffixes
    .SUFFIXES: .o .c
    # .c to .o compilation
    .c.o:
        $(CC) $(CFLAGS) -c $<

• Pattern rules
  – The target contains exactly one % character for matching file names
  – Prerequisites likewise use % to show how their names relate to the target

    # From html to txt
    %.txt: %.html
        lynx -dump $< > $@

make all

    all: hello clean
    clean:
        rm -f *.o
    helper.o: helper.c
        $(CC) $(CFLAGS) -c -o $@ $<
    OBJ = helper.o \
          hello.o
    hello: $(OBJ)
        $(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJ)

The general routine
• Installing software from source code generally involves the following three steps:
  1. Configuring: generating the makefiles before compilation, tailored to the system on which the executable is to be compiled and run
  2. Compiling the code
  3. Installing the executables in the appropriate places

    ./configure
    make
    make install

Revision management
• Diffutils
  – Find differences among files (diff and diff3)
  – Update a set of files from generated differences (patch)
• CVS
  – Maintains a history of changes for groups of files
  – Including: who, when, what, and optionally why
  – Newer alternative: subversion (SVN)
• git: the Linux way

CVS, Concurrent Versions System
• Modify source code locally until it is ready to be merged with what everyone sees
  – Multiple people can modify the same file without conflict
• Develop multiple versions of software simultaneously
  – And merge these branches later on
• Recall past versions of the code base
  – Based on elapsed time or version
• See committers' comments on their changes
• Client/Server

git
• The Linux kernel needed an SCM after the proprietary system BitKeeper withdrew its free licence
• Requirements: distributed workflow, speed, safety, open source
  – A set of features that no system offered at that time
• Created by Linus Torvalds in 2005
• Taking CVS as an example of what not to do
  – If in doubt, make the exact opposite decision ;)
• Now the de-facto SCM for UNIX development
• Locally stored full repository (revision history)

Packaging approaches: source vs. binary
• A software package: software that is built from source with a package management system (PMS)
• Two fundamentally different approaches for packaging-based software distributions:
  – Providing source packages containing the vendor sources plus instructions for automated build and installation
  – Providing binary packages containing the final installation files only
• Most PMS support both approaches, although often not equally well

[Table: source package vs. binary package, compared by distribution size, package size, package dependencies, installation reproducibility, installation run-time stability, installation system alignment, and installation time]

Dependencies
• Dependency management is a very helpful feature of package management systems
  – It keeps the system in a consistent state and guarantees that applications run in the expected way
• The rpm and dpkg commands have limited dependency management features
  – They can report which library a package relies on, but the library can itself rely on other packages…
• Repository-based PMS try to solve the dependency hell problem; however, the hell is still faced by the repository maintainers

Main package distribution formats in Linux
• There is no standard package manager in Linux
• Main package management systems:
  – Tarball files (.tar.gz / .tar.bz2)
    • The old-fashioned way of distributing software in Linux/Unix
    • Usually available for open source projects on their web pages
    • Compatible with all distros
    • The main package manager in Slackware, Gentoo
  – RPM (RedHat Package Manager) (.rpm)
    • Has been adopted by many other distributions (Fedora, Mandrake, SUSE)
  – DEB (Debian Package Manager) (.deb)
    • The most popular Linux package format

Installing from Tarball files
• Software packages that come as source code archives have to be compiled before they are installed
• They usually come as .tar.gz or .tar.bz2 archives
  – These preserve file system parameters: timestamps, ownership, permissions etc.
• Unpack the "tape archive":

    tar xvf <package_name>.tar
    tar xzvf <package_name>.tar.gz
    tar xjvf <package_name>.tar.bz2

• Configure, compile, install:

    cd <package_name>
    ./configure
    make
    make install

• README and INSTALL files are typically included

Originally: backup tools
• tar
  – -c: create
  – -x: extract
  – -z: use gzip compression
  – -j: use bzip2 compression
  – -v: verbose
  – -f: use file
• gzip compression is used in .tar.gz and .tgz tarballs (.tar.Z tarballs are made with the older compress tool, which gunzip can also unpack)
  – gzip file: compress file, renaming file to file.gz
  – gunzip file.gz: uncompress file, renaming file.gz to file
• bzip2 compression, which compresses slightly better but requires more CPU, is used in .tar.bz2 and .tbz2 tarballs
  – bzip2 file: compress file, renaming file to file.bz2
  – bunzip2 file.bz2: uncompress file, renaming file.bz2 to file

Apache example (historic)
1. gunzip apache_xxx.tar.gz
2. tar -xvf apache_xxx.tar
3. gunzip php-xxx.tar.gz
4. tar -xvf php-xxx.tar
5. cd apache_xxx
6. ./configure --prefix=/www --enable-module=so
7. make
8. make install
9. cd ../php-xxx
10. Now, configure your PHP. This is where you customize your PHP with various options, like which extensions will be enabled. Do a ./configure --help for a list of available options. In our example we'll do a simple configure with Apache 1 and MySQL support. Your path to apxs may differ from our example.
    ./configure --with-mysql --with-apxs=/www/bin/apxs
11. make
12. make install

If you decide to change your configure options after installation, you only need to repeat the last three steps. You only need to restart apache for the new module to take effect. A recompile of Apache is not needed.

Note that unless told otherwise, 'make install' will also install PEAR, various PHP tools such as phpize, install the PHP CLI, and more.
Managing software in RedHat-based distros
• Packages are installed using the rpm command-line utility
  – Install a package: rpm -i <package_name>.rpm
  – Update an existing package: rpm -U <package_name>.rpm
  – Remove a package: rpm -e <package_name>
• yum: Yellowdog Updater, Modified
  – Repository-based package management utility for RPM
  – Used by most rpm-based distros: RHEL, Fedora, CentOS, etc.
  – Solves dependency and update issues
  – Install a package and all its dependencies from a repository: yum -y install <package_name>

Three ways to manage packages in Debian
• dpkg: used on .deb files (like rpm)
  – .deb files are standard Unix archives that include 2 tar archives: one holds the control information and the other contains the program data
  – Install: dpkg -i <package_name>.deb
    • If an older version of the package is installed, it is updated automatically by replacing it with the new one
  – Remove: dpkg -r <package_name>
• dselect: a front-end to dpkg, superseded by APT
• apt-get: a front-end to dpkg, the most frequent way of managing software packages in Debian
  – Install: apt-get install <package_name>
  – Remove: apt-get remove <package_name>

Package management: features
• Using RPM
  – Install or upgrade a package: rpm -Uvh <package>-1.0.i386.rpm
  – Remove a package: rpm -e <package>-1.0
  – Determine the version of a package you have installed: rpm -qa | grep <package>
  – Determine the package a file belongs to: rpm -qf <file>
• Using APT
  – Update your package lists: apt-get update
  – Install all updated packages: apt-get dist-upgrade
  – Install a package: apt-get install <package>
  – Upgrade a single package: apt-get install --only-upgrade <package>
  – Remove a package (and packages that depend on it): apt-get remove <package>
  – Search for a package: apt-cache search <name>

Other Packaging Methods
• emerge in Gentoo Linux
  – Deals mostly with source files
  – Fetches packages and compiles them according to compilation parameters given in /etc/make.conf
  – e.g. emerge kde    # Fetches, compiles and installs packages for KDE
• YAST in Suse
• Aptitude: a "replacement" for apt-get
  – Better dependency management and state tracking
  – Fewer broken systems and unsuccessful installs

Gentoo Portage
• The emerge command-line tool is the key tool
• Provides ebuild scripts that download, patch, compile, and install packages
  – Modeled on the ports-based BSD distributions
  – Used also in Chrome OS
• Source code tarballs are downloaded
  – No need to wait for someone to make a binary package for your distribution
• Dependency checking, extreme customization
  – Users specify what they want, and the system is built to their needs
  – Compilation is optimized for the specific hardware, e.g. Altivec on G4 PPC chips, Pentium versus Athlon

Portage tree
• A collection of ebuilds, files that contain all the information Portage needs to maintain software (install, search, query, ...)
• ebuilds reside in /usr/portage/ by default
• The Portage tree is usually updated with rsync:

    $ emerge --sync
    $ emerge-webrsync

• To search for all packages that have "pdf" in their name:

    $ emerge --search pdf

Package lifecycle
[Figure: flowchart from BEGIN to END with an upgrade loop, split between a developer role and an administrator role. Steps shown: evaluation of application; checkout RPM specification from CVS; edit RPM specification (.spec); track vendor sources; fetch vendor sources; unpack vendor sources; configure & build application; install application; commit RPM specification to CVS; roll source RPM package; roll binary RPM package; store source RPM into repository; store binary RPM into repository; fetch source RPM from repository; unpack source RPM; fetch binary RPM from repository; install application into instance; remove application from instance.]

On-line package repositories
• Large package bases on the Web
  – Accessible via FTP or HTTP
  – http://rpm.pbone.net
  – http://www.apt-get.org
• You can learn how to build your own distribution
  – http://www.linuxfromscratch.org

Using rpm
• rpm -i: install package, check for dependencies
• rpm -e: erase package
• rpm -U: upgrade package
• rpm -q: query packages (e.g., -a = all)

rpm -q

    rpm -q -i telnet
    Name        : telnet                    Relocations: (not relocateable)
    Version     : 0.17                      Vendor: Red Hat, Inc.
    Release     : 18.1                      Build Date: Wed Aug 15 15:08:03 2001
    Install date: Fri Feb 8 16:50:03 2002   Build Host: stripples.devel.redhat.com
    Group       : Applications/Internet     Source RPM: telnet-0.17-18.1.src.rpm
    Size        : 88104                     License: BSD
    Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
    Summary     : The client program for the telnet remote login protocol.
    Description :
    Telnet is a popular protocol for logging into remote systems over the
    Internet. The telnet package provides a command line telnet client.
    Install the telnet package if you want to telnet to remote machines.
    This version has support for IPv6.

Building your own rpm: spec

    #
    # spec file for hello world app
    #
    Summary: hello world
    Name: hello
    Version: 1.0
    Release: 1
    Copyright: GPL
    Group: Applications/Test
    Source: http://www.cs.columbia.edu/IRT/software/
    URL: http://www.cs.columbia.edu/IRT/software/
    Distribution: Columbia University
    Vendor: IRT
    Packager: Henning Schulzrinne <hgs@cs.columbia.edu>
    BuildRoot: /home/hgs/src/rpm

    %description
    The world's most famous C program.
Building your own rpm: spec (continued)

    %prep
    rm -rf $RPM_BUILD_DIR/hello-1.0
    zcat $RPM_SOURCE_DIR/hello-1.0.tgz | tar -xvf -

    %build
    make

    %install
    make ROOT="$RPM_BUILD_ROOT" install

    %files
    %doc README
    /usr/local/bin/hello
    /usr/local/man/man1/hello.1

    %clean

Building your own rpm
• Create ~/.rpmmacros containing:

    %_topdir /home/hgs/src/test/rpm

• Build the package (current RPM versions use rpmbuild -ba for this):

    cd /home/hgs/src/test/rpm/SPECS
    rpm -ba --buildroot /home/hgs/tmp hello-1.0.spec

• This creates a binary and a source RPM

APT
• Most popular package manager in the world
• APT is a system created in the Debian community to manage package dependencies automatically
• APT can install, remove and upgrade packages, managing dependencies and downloading the packages
• It is a frontend to other tools and uses the underlying package management system, such as the rpm or dpkg commands
• It can fetch packages from several media (cdrom, ftp, http, nfs), and it can be used to create ad-hoc software repositories

APT – Using (1/3)

    [root]@[/] # apt-get install nautilus
    Reading Package Lists... Done
    Building Dependency Tree... Done
    The following extra packages will be installed:
      bonobo libmedusa0 libnautilus0
    The following NEW packages will be installed:
      bonobo libmedusa0 libnautilus0 nautilus
    0 packages upgraded, 4 newly installed, 0 to remove and 1 not upgraded.
    Need to get 8329kB of archives. After unpacking 17.2MB will be used.
    Do you want to continue? [Y/n]

APT – Using (2/3)

    [root]@[/] # apt-get remove gnome-panel
    Reading Package Lists... Done
    Building Dependency Tree... Done
    The following packages will be REMOVED:
      gnome-applets gnome-panel gnome-panel-data gnome-session
    0 packages upgraded, 0 newly installed, 4 to remove and 1 not upgraded.
    Need to get 0B of archives. After unpacking 14.6MB will be freed.
    Do you want to continue? [Y/n]

APT – Using (3/3)

    [root]@[/] # apt-cache search pdf
    kghostview - PostScript viewer for KDE
    tetex - The TeX text formatting system.
    xpdf - A PDF file viewer for the X Window System
    …
    [root]@[/] # apt-cache show xpdf
    …
    Filename: xpdf-1.00-3.i386.rpm
    Description: A PDF file viewer for the X Window System.
     Xpdf is an X Window System based viewer for Portable Document Format
     (PDF) files. Xpdf is a small and efficient program which uses
     standard X fonts.

FreeBSD – Hybrid package management
• Binary packages
  – Use the FreeBSD repositories
  – Built with common ./configure options
  – pkg_add -r <package_name>
  – pkg_info
  – pkg_delete <package_name>\*
• Rather simplistic, no updates (remove the package and add it again):
  – pkg_delete <package_name>
  – pkg_add -r <package_name>
• pkgng: the new generation pkg
  – Solves lots of issues
  – New advanced functionality

FreeBSD – Ports Collection (Source packages)
• Download a snapshot (index) of the ports collection from the FreeBSD mirrors:

    portsnap fetch

• Extract the index as a directory tree to /usr/ports:

    portsnap extract

• Install a software package:

    cd /usr/ports/*/lighttpd
    make install clean

  – Downloads and installs (compiles) the tarballs, including all dependencies
  – Asks for configuration options (a frontend for ./configure)

FreeBSD – what is a Port?
• The contents of a directory /usr/ports/<cat>/<name>:
  – Some info files
  – Dependency list
  – Link to download the original tarball from the developer site
  – List of tarball files and their SHA256 checksums
  – Patch files
  – Configuration files
  – Configuration options
  – Makefile which does all the work