Language Translation

Programming in R
COURSE NOTES 2 Hoganson
Dr. Ken Hoganson, © August 2014
Language Translation
• A computer works entirely in 0s and 1s, which are
hard for humans to read and write.
• So we have created languages that are easier
for us (humans) to work with.
• Human-friendly languages require computer
time to translate into computer-executable
programs.
• Ongoing trend since the computer was created:
make better human interfaces to the machine,
using the ever-increasing power of the computer
to do the translation work “behind the scenes”.
• Think of GUIs and virtual-reality interfaces.
Machine Code: Just a Taste
• Machine code – the bottom line in
programming.
• Machine code instructions are divided
into fields, and the instruction has a
specified format.
• Simple example:
Machine Code
• This instruction has four fields:
– Instruction type (2 bits)
– Operation code (6 bits)
– Register operand 1 (4 bits)
– Register operand 2 (4 bits)
• 16-bit (two-byte) instruction
Machine Code
• Two bits for instruction type. How many
types of instructions are possible within this
format?
• The operation code is 6 bits. How many types
of operations are possible in this format?
• The register operands are 4 bits each. How
many different registers can be indicated
with 4 bits? (similar to addressing)
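The three questions above all come down to the same arithmetic: a field of n bits can hold 2^n different bit patterns. A minimal Python sketch of that calculation follows; the names FIELD_WIDTHS and distinct_values are chosen here for illustration and do not come from the slides.

```python
# Field widths taken from the 16-bit instruction format on the slides.
FIELD_WIDTHS = {
    "instruction type": 2,
    "operation code": 6,
    "register operand": 4,
}


def distinct_values(bits):
    """Return how many different bit patterns fit in a field of `bits` bits."""
    return 2 ** bits


for name, bits in FIELD_WIDTHS.items():
    print(f"{name}: {bits} bits -> {distinct_values(bits)} possible values")
```

So this format allows 4 instruction types, 64 operations per type, and 16 addressable registers.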
Machine Code
• This instruction format is a Register-Register
instruction.
• That means that it takes its inputs from two
register operands.
• The operation is performed on those two
data elements, and the result goes back
into the register specified by the first register
operand.
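The register-register behavior described above can be imitated in a few lines of Python. This is only a sketch: the list named registers stands in for the hardware register file, the function name is made up for illustration, and overflow of a fixed-width register is not modeled.

```python
def execute_reg_reg_add(registers, r1, r2):
    """Add registers r1 and r2; the result goes back into r1 (the first operand)."""
    registers[r1] = registers[r1] + registers[r2]


# A 4-bit register field can name 16 registers, so model 16 of them.
registers = [0] * 16
registers[2] = 7
registers[4] = 5

execute_reg_reg_add(registers, 2, 4)

print(registers[2])  # 12: register 2 now holds the sum
print(registers[4])  # 5: register 4 is unchanged
```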
Machine Code Instruction
• Machine code is not hard, just painful
and slow to work with.
– Register-Register instruction format is ‘00’
– Op code to add two registers is ‘010000’
– To add the contents of register 2, specify ‘0010’
– To add the contents of register 4, specify ‘0100’
• Complete instruction in 0s and 1s:
– 00 010000 0010 0100
• Do you remember where the result of the
addition is stored?
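Assembling the four fields into one 16-bit word is a matter of shifting each field into position. The Python sketch below assumes the fields are laid out left to right, most significant bit first, in the order the slide lists them; the function name encode is hypothetical.

```python
def encode(instr_type, opcode, reg1, reg2):
    """Pack the four fields into one 16-bit instruction word.

    Assumed layout, most significant bit first:
    2-bit type | 6-bit op code | 4-bit register 1 | 4-bit register 2
    """
    fields = ((instr_type, 2), (opcode, 6), (reg1, 4), (reg2, 4))
    for value, width in fields:
        if not 0 <= value < 2 ** width:
            raise ValueError(f"{value} does not fit in {width} bits")
    return (instr_type << 14) | (opcode << 8) | (reg1 << 4) | reg2


word = encode(0b00, 0b010000, 0b0010, 0b0100)
bits = format(word, "016b")
grouped = f"{bits[:2]} {bits[2:8]} {bits[8:12]} {bits[12:]}"

print(grouped)  # 00 010000 0010 0100
```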
Assembly Language
• Working with 0s and 1s is hard – and humans are
prone to making errors.
• Languages have been created to make
programming easier.
• Assembly language is the lowest level language.
– Uses mnemonics and abbreviations.
• Our add-two-registers instruction:
– 00 010000 0010 0100
• Can be represented (1 to 1) with an assembly
instruction:
– ADR R2 R4
– ADd Registers R2 and R4, result in R2
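Because the mapping is one to one, a toy assembler for this single instruction is little more than a table lookup. The Python sketch below knows only the ADR mnemonic from the slide; the OPCODES table and the assemble function are illustrative names, and a real assembler handles many more formats and error cases.

```python
# mnemonic -> (instruction type, op code), using the values from the slides
OPCODES = {"ADR": (0b00, 0b010000)}


def assemble(line):
    """Translate one assembly line such as 'ADR R2 R4' into its 0s and 1s."""
    mnemonic, operand1, operand2 = line.split()
    instr_type, opcode = OPCODES[mnemonic]
    r1 = int(operand1.lstrip("R"))
    r2 = int(operand2.lstrip("R"))
    return f"{instr_type:02b} {opcode:06b} {r1:04b} {r2:04b}"


print(assemble("ADR R2 R4"))  # 00 010000 0010 0100
```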
High-Level Languages
• Assembly language is a big
improvement over machine code.
• Assembly is translated by an assembler
program to 0s and 1s that the computer
can work with.
• More powerful (and human-readable)
languages have been created (which
must also be translated to 0s and 1s).
• These are called high-level languages.
• BASIC, Fortran, C, C++, C#, R, etc.
High-Level Languages
• Our add-two-registers instruction:
– 00 010000 0010 0100
• In assembly language:
– ADR R2 R4
– ADd Registers R2 and R4, result in R2
• In a high-level language, it might look like:
– Number1 = Number1 + Number2
– Better?
Many-to-1 translation
• ADR R2 R4
• In a high-level language, this might look like:
– Sum = Number1 + Number2
• But this high-level statement has another type of
translation embedded: memory addressing.
– Number1, Number2, and Sum are data values
stored in memory, not registers.
– The values for Number1 and Number2 must
first be loaded from memory into registers.
– Then the add operation can be performed.
– Then the result is stored back to memory in Sum.
• Additional machine-level instructions are needed to
carry out this one high-level language instruction.
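The load, add, store sequence above can be sketched in Python as a translation from one high-level statement to four low-level instructions, followed by a tiny simulation. ADR comes from the slides; the LOAD and STORE mnemonics, the choice of registers R2 and R4, and both function names are assumptions made for illustration.

```python
def translate_assignment(statement):
    """Translate 'target = a + b' into four assembly-like instructions."""
    target, expression = [part.strip() for part in statement.split("=")]
    left, right = [part.strip() for part in expression.split("+")]
    return [
        f"LOAD R2 {left}",      # hypothetical: copy memory value into R2
        f"LOAD R4 {right}",     # hypothetical: copy memory value into R4
        "ADR R2 R4",            # from the slides: R2 = R2 + R4
        f"STORE R2 {target}",   # hypothetical: copy R2 back to memory
    ]


def run(program, memory):
    """Execute the instructions against a dict that stands in for memory."""
    registers = {}
    for line in program:
        op, first, second = line.split()
        if op == "LOAD":
            registers[first] = memory[second]
        elif op == "ADR":
            registers[first] = registers[first] + registers[second]
        elif op == "STORE":
            memory[second] = registers[first]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


program = translate_assignment("Sum = Number1 + Number2")
memory = run(program, {"Number1": 3, "Number2": 4, "Sum": 0})

print(len(program))   # 4 low-level instructions for 1 high-level statement
print(memory["Sum"])  # 7
```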
High-Level Language Translation
• High-level language instructions must be
translated/converted to machine code before
the computer can run them.
• This process requires a translation program:
– Compiler
– Interpreter
– (Assembler was used for assembly language)
• Languages like C, C++, COBOL, Fortran, and
Pascal are typically compiled languages.
Compiler
• Compiler takes the high-level language
program (as text) as its input.
• It produces the machine code version of
the program as its output.
• It does not change the high-level program;
the machine code program is a new
file.
Interpreter
• Some languages, like BASIC
and Visual Basic, have traditionally
been interpreted rather than
compiled.
• The interpreter does not
convert the entire program
all at once.
• Instead, it converts
instructions one at a time,
and has the computer
execute each instruction.
• Slower, because every time
the program is run, it must
be interpreted again.
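The cost difference can be made concrete by counting how often translation happens. In the Python sketch below, translate is only a stand-in for real translation work (it parses statements of the form name = a + b), and all function names are hypothetical. Running the same two-line program three times, the interpreter path translates every line on every run, while the compiled path translates each line once.

```python
def translate(statement):
    """Stand-in for translation: parse 'name = a + b' into a tuple."""
    target, expression = statement.split("=")
    left, right = expression.split("+")
    return target.strip(), left.strip(), right.strip()


def execute(step, variables):
    """Carry out one translated step."""
    target, left, right = step
    variables[target] = variables[left] + variables[right]


def interpret(source_lines, variables, counter):
    """Interpreter style: translate one line, execute it, then move on."""
    for line in source_lines:
        step = translate(line)
        counter["translations"] += 1
        execute(step, variables)


def compile_program(source_lines, counter):
    """Compiler style: translate the whole program once, up front."""
    steps = []
    for line in source_lines:
        steps.append(translate(line))
        counter["translations"] += 1
    return steps


source = ["c = a + b", "d = c + a"]

interpreted = {"translations": 0}
for _ in range(3):
    values_i = {"a": 1, "b": 2}
    interpret(source, values_i, interpreted)

compiled = {"translations": 0}
steps = compile_program(source, compiled)
for _ in range(3):
    values_c = {"a": 1, "b": 2}
    for step in steps:
        execute(step, values_c)

print(interpreted["translations"])  # 6: two lines, translated on each of 3 runs
print(compiled["translations"])     # 2: two lines, translated once
```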
Virtual Machine
• A third and more recent way to
translate high-level programs is with
a Virtual Machine (or byte-code
interpreter). Java is an example.
• Separates translation into two steps.
– Convert the program to “byte-code”
– The “byte-code” is then interpreted by
a virtual machine.
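The slides name Java as the example. The same two-step idea can be observed from inside Python, whose standard implementation (CPython) also converts source text to byte-code and then interprets it. The sketch below uses Python rather than Java purely for illustration; the variable names are made up, and the exact byte-code instructions printed differ between Python versions.

```python
import dis

source = "total = number1 + number2"

# Step 1: convert the program text to byte-code.
code_object = compile(source, "<example>", "exec")

# Show a human-readable listing of the byte-code instructions.
dis.dis(code_object)

# Step 2: the virtual machine interprets the byte-code.
namespace = {"number1": 3, "number2": 4}
exec(code_object, namespace)

print(namespace["total"])  # 7
```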
Virtual Machine
• The virtual machine/byte-code interpreter
makes programs portable and device-independent.
• Converted byte-code can move over the
internet.
Virtual Machine
• Each different processor/machine needs its
own virtual machine, which will be different
from CPU to CPU.
• Different because of different machine
codes and operating systems.
“R” is
• A structured programming language (object
systems such as S3 and S4 are available but
optional; no agents)
• With extensions for Big Data – functions and
techniques for manipulating large data sets,
including in parallel where possible.
• An interpreted language: R is an
implementation of a language called “S”.
• The “R” interpreter itself is written mostly in C
(with some Fortran and R) and is compiled with
a compiler for each platform.
End of Lecture