Week 2: Interpreted vs. Compiled Code and Data Types

August 31st, 2012
Interpreted vs. Compiled Code
Interpreted languages, such as R, are very convenient to use. You can write code quickly,
you usually don’t need to worry about the OS you’re running it on, and you can make changes to
your code and immediately see the difference. The breakdown comes when you need to run a lot
of code. Interpreted code still has to be translated into machine instructions; the interpreter
does this in the background, piece by piece, as the code runs. This means that if you use loops,
the loop body is typically translated again every time the loop runs. It also means that the
interpreter can’t look far ahead of where you are and optimize larger pieces of code; it must
process each line as quickly as possible.
Compiled code is more complicated to deal with. Every time a change is made, the code must
be recompiled, which makes debugging and fixing code more time consuming. You must also
know how to use the compiler on each OS that you want to run your code on. In return, compilers
give huge benefits in speed. The compiler analyzes your code ahead of time and optimizes it
before it ever runs, so loops cost far less than they do in an interpreted language.
C Data Types
In compiled code you must specify exactly what each of your variables will hold. Once a
variable is declared, you cannot change the type of data it holds. While interpreted
languages have a few general types, compiled languages such as C have many different types.
Use int when possible because it is generally the fastest. Below is a chart of some of the
common basic C data types you may want to use:
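TYPE        STORAGE        SIGNED RANGE                       UNSIGNED RANGE
bool        1 byte         0 or 1                             0 or 1
char        1 byte         -128 : 127                         0 : 255
short       2 bytes        -32,768 : 32,767                   0 : 65,535
int         4 bytes        -2,147,483,648 : 2,147,483,647     0 : 4,294,967,295
long        4 or 8 bytes   same as int or long long           same as int or long long
long long   8 bytes        +/- 9.22 x 10^18                   0 : 18.4 x 10^18
float       4 bytes        3.4E +/- 38 (7 digits)             -
double      8 bytes        1.7E +/- 308 (15 digits)           -
The sizes and ranges shown are the ones typical of current desktop platforms; the C standard
only guarantees minimum ranges, so the exact values depend on your compiler and OS.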
If you want to pass data from R to C, you are more limited in your choice of types. The
available base types that R can pass to C are:
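R Type       C Type
integer      int
numeric      double (float only if as.single is used)
logical      int
character    char
raw          unsigned char
With R’s .C interface, each argument arrives in C as a pointer to the type shown (character
vectors arrive as char **).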
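To make the type mapping concrete, here is a minimal sketch of a C function written for
R’s .C interface. The function name vec_sum and its argument names are hypothetical. It
assumes the .C convention that every argument is a pointer and the function returns void,
with results written back through one of the arguments.

```c
/* vec_sum.c: hypothetical example of a C function callable from R via .C().
   x      : R numeric vector, arrives as double *
   n      : R integer of length 1 holding the vector length, arrives as int *
   result : R numeric of length 1 that receives the sum */
void vec_sum(double *x, int *n, double *result)
{
    double total = 0.0;
    for (int i = 0; i < *n; i++) {
        total += x[i];
    }
    *result = total;
}
```

Once the library is compiled and loaded, a call from R would look something like
.C("vec_sum", as.double(x), as.integer(length(x)), result = double(1))$result.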
C variables are declared as follows:
int x;
float y;
If you need to make an array of values, you just use brackets to define the length you need:
int x[4];
char name[10];
Compiling Code
Compiling C code to run on its own and compiling C functions to call from R are two very
different things. You can use the Code::Blocks program to compile C code that you want to run
on its own (on the Mac you also need Xcode, and on Linux you need gcc installed). If you have
a .c source file, you can also use the command “gcc -o [executable name] [source file name]”
to compile your code. This creates the executable file. An object (.o) file is just an
intermediate step between your source and the executable; gcc only keeps it if you compile and
link in separate steps, and you do not need it for anything else.
To compile a dynamic library that R can use to call C functions, you need to compile the file
with R’s command-line tools. To create the C library, run: “R CMD SHLIB [source file name]”.
On Windows and Linux, you’ll need to use the full path to R to make sure you get the 64-bit
version. On the Mac, you need to call R64 instead of R. If you need to use the full path,
create a script that contains that path so you don’t have to type it every time.
R loads the compiled C code using the dyn.load function. On Windows, R CMD SHLIB creates
a .dll file; on the Mac and Linux it creates a .so file. You can rename the .dll file to .so
if you wish, so that you don’t have to change the dyn.load command in your R source when you
move it to the Linux cluster. You MUST recompile your C code if you move it to a different
operating system, and you should recompile it whenever you change OS versions.