Week 2: Interpreted vs. Compiled Code and Data Types
August 31st, 2012

Interpreted vs. Compiled Code

Interpreted languages, such as R, are very convenient to use. You can write code quickly, and you don't need to worry about the OS you're running it on. You can make changes to your code and immediately see the difference.

The breakdown comes when you need to run a lot of code. Interpreted code still has to be translated into machine instructions; the interpreter does this line by line in the background as the code runs. This means that if you use loops, the loop body must be translated again every time the loop runs. It also means that the interpreter can't look ahead of where you are and optimize larger pieces of code; it must process each line as quickly as possible.

Compiled code is more complicated to deal with. Every time a change is made, the code must be recompiled, which makes debugging and fixing code very time consuming. You must also know how to use the compiler for each OS that you want to run your code on. In return, compilers give huge benefits in speed. The compiler analyzes your code as a whole and optimizes it so that it runs as fast as possible. You can run loops without fear.

C Data Types

In compiled code you must specify exactly what each of your variables will hold. Once a variable is declared, you cannot change the type of data it holds. While interpreted languages have a few general types, compiled languages have many specific ones. Use int when possible, because it is generally the fastest.
Below is a chart of some of the common basic C data types you may want to use:

TYPE        STORAGE        SIGNED RANGE               UNSIGNED RANGE
bool        1 byte         0 or 1                     0 or 1
char        1 byte         -127 : 127                 0 : 255
short       2 bytes        -32,767 : 32,767           0 : 65,535
int         4 bytes        +/- 2,147,483,647          0 : 4,294,967,295
long        4 or 8 bytes   same as int or long long, depending on storage
long long   8 bytes        +/- 9.22 x 10^18           0 : 18.4 x 10^18
float       4 bytes        3.4E +/- 38 (7 digits)
double      8 bytes        1.7E +/- 308 (15 digits)

If you want to pass data from R to C, you are more limited in your choice of types. The base types that R can pass to C are:

R TYPE      C TYPE
integer     int
numeric     float or double
logical     int
character   char
raw         unsigned char

C variables are declared as follows:

    int x;
    float y;

If you need an array of values, use brackets to give the length you need:

    int x[4];
    char name[10];

Compiling Code

Compiling C code to run on its own and compiling C functions to call from R are two very different things.

You can use the Code::Blocks program to compile C code that you want to run on its own (on the Mac you also need Xcode, and on Linux you need gcc installed). If you have a .c source file, you can also compile it from the command line:

    gcc -o [executable name] [source file name]

This creates an executable file. The .o (object) file that the compiler produces along the way is only an intermediate step between your source and the executable; you do not need it for anything.

To compile a dynamic library that R can use to call C functions, you need to compile the file with R's command line tools. To create the library, run:

    R CMD SHLIB [source file name]

On Windows and Linux, you'll need to use the full path to R to make sure you get the 64-bit version. On the Mac, you need to call R64 instead of R. If you need the full path, create a script that contains it so you don't have to type it every time.

R loads the compiled C code using the dyn.load function.
On Windows, R CMD SHLIB creates a .dll file; on the Mac and Linux it creates a .so file. You can rename the .dll file to .so if you wish, so that you don't have to change the dyn.load command in your R source when you move it to the Linux cluster. Renaming only keeps the file name the same: you MUST recompile your C code if you move it to a different operating system, and you should recompile it whenever you change OS versions.
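To make the R workflow concrete, here is a minimal sketch of a C function that could be built with R CMD SHLIB and loaded with dyn.load. The notes do not say which calling interface is used; this sketch assumes R's .C interface, where the C function returns void and receives every argument as a pointer. The file name vecsum.c and the function name vec_sum are hypothetical.

```c
/* vecsum.c: hypothetical file name used for this sketch */

/* Sum the *n values in x and write the total into *result.
   With the .C interface (an assumption here), R numeric vectors arrive
   as double pointers and R integers as int pointers, and the function
   hands its answer back by writing through one of those pointers. */
void vec_sum(double *x, int *n, double *result)
{
    double total = 0.0;
    int i;

    for (i = 0; i < *n; i++) {
        total += x[i];
    }
    *result = total;
}

/* Build from a shell:
       R CMD SHLIB vecsum.c

   Use from R:
       dyn.load("vecsum.so")    # "vecsum.dll" on Windows unless renamed
       out <- .C("vec_sum", as.double(c(1, 2, 3.5)), as.integer(3),
                 result = double(1))
       out$result
*/
```

The as.double and as.integer calls on the R side make sure the data matches the C types the function expects.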