Programming in R
COURSE NOTES 2
Language Translation
Dr. Ken Hoganson, © August 2014

Language Translation
• A computer works entirely in 0s and 1s, which is hard for humans.
• So we have created languages that are easier for us (humans) to work with.
• Human-friendly languages require computer time to translate into computer-executable programs.
• Ongoing trend since the computer was created: make better human interfaces to the machine, using the ever-increasing power of the computer to do the translation work “behind the scenes”.
• Think GUI interfaces and virtual-reality interfaces.

Machine Code: Just a Taste
Machine code is the bottom line in programming.
Machine code instructions are divided into fields, and each instruction has a specified format.
Simple example:

Machine Code
• This instruction has four fields:
  – Instruction type (2 bits)
  – Operation code (6 bits)
  – Register operand 1 (4 bits)
  – Register operand 2 (4 bits)
• 16-bit (two-byte) instruction

Machine Code
• Two bits for instruction type. How many types of instructions are possible within this format?
• Operation code is 6 bits. How many types of operations are possible for a format?
• The register operands are 4 bits each. How many different registers can be indicated with 4 bits? (Similar to addressing.)

Machine Code
• This instruction format is a Register-Register instruction.
• That means that it takes its inputs from two register operands.
• The operation is performed on those two data elements, and the result goes back into the register specified by the first register operand.

Machine Code Instruction
• Machine code is not hard, just painful and slow to work with.
  – Register-Register instruction format is ‘00’
  – Op code to add two registers is ‘010000’
  – To add the contents of register 2, specify ‘0010’
  – To add the contents of register 4, specify ‘0100’
• Complete instruction in 0s and 1s:
  – 00 010000 0010 0100
Do you remember where the result of the addition is stored?

Assembly Language
• Working with 0s and 1s is hard, and humans are prone to making errors.
• Languages have been created to make programming easier.
• Assembly language is the lowest-level language.
  – Uses mnemonics and abbreviations.
• Our add-two-registers instruction:
  – 00 010000 0010 0100
• Can be represented (1 to 1) with an assembly instruction:
  – ADR R2 R4
  – ADd Registers R2 and R4, result in R2

High-Level Languages
• Assembly language is a big improvement over machine code.
• Assembly is translated by an assembler program to 0s and 1s that the computer can work with.
• More powerful (and human-readable) languages have been created (which must also be translated to 0s and 1s).
• These are called High-Level Languages.
• Basic, Fortran, C, C++, C#, R, etc.

High-Level Languages
• Our add-two-registers instruction:
  – 00 010000 0010 0100
• In assembly language:
  – ADR R2 R4
  – ADd Registers R2 and R4, result in R2
• In a high-level language it might look like:
  – Number1 = Number1 + Number2
  – Better?

Many-to-1 Translation
• ADR R2 R4
• In a high-level language it might look like:
  – Sum = Number1 + Number2
• But this high-level statement has another type of translation embedded: memory addressing
  – Number1, Number2, and Sum are data values stored in memory, not registers.
  – The values for Number1 and Number2 must first be loaded from memory into registers.
  – Then the add operation can be performed.
  – Then the result is stored back to memory in Sum.
• Additional machine-level instructions are needed to carry out this one high-level language instruction.
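The many-to-1 translation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bit pattern and the ADR mnemonic come from the slides, while the LOAD and STORE mnemonics, the choice of registers R2 and R4 for the two variables, and the sample values 7 and 5 are hypothetical.

```python
# Toy model of the slides' machine. Only ADR and its bit fields come from
# the notes; LOAD and STORE are assumed mnemonics added for illustration.

def encode_adr(reg1, reg2):
    """Pack a Register-Register ADR instruction into its 16-bit pattern."""
    instruction_type = "00"   # Register-Register format (2 bits)
    op_code = "010000"        # add two registers (6 bits)
    return " ".join([instruction_type, op_code,
                     format(reg1, "04b"), format(reg2, "04b")])


def translate_sum_statement():
    """Expand `Sum = Number1 + Number2` into machine-level steps."""
    return [
        ("LOAD", "R2", "Number1"),   # memory -> register
        ("LOAD", "R4", "Number2"),   # memory -> register
        ("ADR", "R2", "R4"),         # R2 = R2 + R4, result in R2
        ("STORE", "Sum", "R2"),      # register -> memory
    ]


def run(program, memory):
    """Execute the toy program; a dict stands in for main memory."""
    registers = {}
    for op, first, second in program:
        if op == "LOAD":
            registers[first] = memory[second]
        elif op == "ADR":
            registers[first] = registers[first] + registers[second]
        elif op == "STORE":
            memory[first] = registers[second]
        else:
            raise ValueError(f"unknown operation: {op}")
    return memory


memory = {"Number1": 7, "Number2": 5, "Sum": 0}
program = translate_sum_statement()
run(program, memory)

print(encode_adr(2, 4))   # 00 010000 0010 0100
print(len(program))       # 4 machine-level steps for 1 high-level statement
print(memory["Sum"])      # 12
```

One high-level statement becomes four machine-level steps here, and only the third of them is the ADR instruction shown on the earlier slides.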

High-Level Language Translation
• High-level language instructions must be translated/converted to machine code before the computer can run them.
• This process requires a translation program:
  – Compiler
  – Interpreter
  – (An assembler is used for assembly language)
• Languages like C, C++, COBOL, Fortran and Pascal are all compiled languages.

Compiler
• The compiler takes the high-level language program (as text) as its input.
• It produces the machine code version of the program as its output.
• It does not change the high-level program; the machine code program is a new file.

Interpreter
• Some languages, like BASIC and Visual Basic, are interpreted languages, not compiled.
• The interpreter does not convert the entire program all at once.
• Instead, it converts instructions one at a time, and has the computer execute each instruction.
• Slower, because every time the program is run, it must be interpreted again.

Virtual Machine
• A third and more recent way to translate high-level programs is with a Virtual Machine (or byte-code interpreter). Java is an example.
• Separates translation into two steps:
  – Convert the program to “byte-code”.
  – The “byte-code” is then interpreted by a virtual machine.

Virtual Machine
• The virtual machine/byte-code interpreter makes programs transportable and device-independent.
• Converted byte-code can move over the internet.

Virtual Machine
• Each different processor/machine needs its own virtual machine, which will be different from CPU to CPU.
• Different because of different machine codes and operating systems.

“R” is
• A structured programming language (no objects or agents)
• With extensions for Big Data: functions and techniques for manipulating large data sets that exploit opportunities for parallel processing.
• An interpreted language, and a dialect of an earlier language called “S”. The interpreter itself is written mostly in C, which is compiled using a compiler for the platform.
• The “R” interpreter is therefore compiled code, built separately for each platform.

End of Lecture
End of today’s lecture.
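
As a closing recap of the Interpreter and Virtual Machine slides, the sketch below contrasts the two approaches on a toy language. Everything in it is an assumption made for illustration: the one-statement form `target = left + right`, the byte-code operation names (PUSH, ADD, POP), and the sample values. It is not how Java or R is actually implemented.

```python
# Assumed toy language: every statement has the form `target = left + right`.

def parse(line):
    """Split one statement into (target, left, right) variable names."""
    target, expression = [part.strip() for part in line.split("=")]
    left, right = [part.strip() for part in expression.split("+")]
    return target, left, right


def interpret(source_lines, memory):
    """Interpreter: translate and execute one statement at a time."""
    for line in source_lines:
        target, left, right = parse(line)   # redone on every run
        memory[target] = memory[left] + memory[right]
    return memory


def compile_to_bytecode(source_lines):
    """Step 1: convert the whole program to byte-code once."""
    bytecode = []
    for line in source_lines:
        target, left, right = parse(line)
        bytecode += [("PUSH", left), ("PUSH", right),
                     ("ADD", None), ("POP", target)]
    return bytecode


def run_virtual_machine(bytecode, memory):
    """Step 2: a small stack-based virtual machine interprets the byte-code."""
    stack = []
    for op, name in bytecode:
        if op == "PUSH":
            stack.append(memory[name])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "POP":
            memory[name] = stack.pop()
        else:
            raise ValueError(f"unknown byte-code operation: {op}")
    return memory


source = ["Sum = Number1 + Number2", "Total = Sum + Number1"]
start = {"Number1": 7, "Number2": 5}

interpreted = interpret(source, dict(start))
bytecode = compile_to_bytecode(source)
via_vm = run_virtual_machine(bytecode, dict(start))

print(interpreted["Total"])    # 19
print(via_vm["Total"])         # 19
print(interpreted == via_vm)   # True
```

Both paths compute the same result. The difference is that the byte-code list can be produced once and handed to any machine that has its own `run_virtual_machine`, which is the portability point made on the Virtual Machine slides.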