Program - Duke University

Programs and Processes
Jeff Chase
Duke University
The Operating System
• An operating system:
– Runs programs; sets up execution contexts for programs
– Enables programs to interact with the outside world
– Enforces isolation among programs
– Mediates interactions among programs
User Applications
Operating System(s)
Substrate / Architecture
Today
• What is a program?
– A little bit of C on “classical OS”
• How does a program run?
• How are programs built?
• What does the computer look like to a program?
A simple C program
int
main()
{
}
What’s in a program?
• code
– instructions (“text”)
– procedures
• data
– global variables (“static”)
– constants (“immutable”)
• symbols (import/export)
– names
– interfaces
– references
A simple module
int val = 0;
int p1(char *s) {
return 1;
}
int p2() {
char *s;
int i;
s = "hello\n";
i = p1(s);
return(i);
}
[Figure: a module, e.g., a library, keeps its state behind an API of procedures P1(), P2(), P3(), P4().]
Calling the module
#include <stdio.h>
/* interface signatures (prototypes) */
extern int p1(char *s);
extern int p2();
int
main()
{
    int i;
    i = p2();
    printf("%d\n", i);
}
[Figure: the Program calls into the module's state through its procedures P1(), P2(), P3(), P4().]
        .section __TEXT,__text,regular,pure_instructions
        .globl _p1
        .align 4, 0x90
_p1:                                    ## @p1
        .cfi_startproc
## BB#0:
        pushq %rbp
Ltmp2:
        .cfi_def_cfa_offset 16
Ltmp3:
        .cfi_offset %rbp, -16
        movq %rsp, %rbp
Ltmp4:
        .cfi_def_cfa_register %rbp
        movl $1, %eax
        movq %rdi, -8(%rbp)
        popq %rbp
        ret
        .cfi_endproc
        .globl _p2
        .align 4, 0x90
_p2:                                    ## @p2
        .cfi_startproc
        ….
        ret
        .cfi_endproc
        .section __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
        .asciz "hello\n"
        .comm _val,4,2                  ## @val
        .subsections_via_symbols
Global data (“static”)
int g;
int g0 = 0;
int g1 = 1;
        .globl _g0                      ## @g0
        .zerofill __DATA,__common,_g0,4,2
        .section __DATA,__data
        .globl _g1                      ## @g1
        .align 2
_g1:
        .long 1                         ## 0x1
        .comm _g,4,2                    ## @g
The Birth of a Program (C/Ux)
myprogram.c (plus header files):
int j;
char* s = "hello\n";
int p() {
    j = write(1, s, 6);
    return(j);
}
compiler -> myprogram.s:
p:
    store this
    store that
    push
    jsr _write
    ret
    etc.
assembler -> myprogram.o (object file):
    data
    …..
linker (with libraries and other object files or archives) -> myprogram (executable file):
    program
    data
What’s in an Object File or Executable?
• header: “magic number” indicates type of file/image. Section table: an array of (offset, len, startVA) sections.
• text: program instructions (p)
• idata: immutable data (constants) (“hello\n”)
• wdata: writable global/static data (j, s)
• symbol table (j, s, p, sbuf) and relocation records: used by linker; may be removed after final link step and strip. Also includes info for debugger.
Example source:
int j = 327;
char* s = "hello\n";
char sbuf[512];
int p() {
    int k = 0;
    j = write(1, s, 6);
    return(j);
}
But Java programs are interpreted
They run on an “abstract machine” (e.g., JVM)
implemented in software.
”bytecode”
http://www.media-art-online.org/java/help/how-it-works.html
http://forensics.spreitzenbarth.de/2012/08/27/comparison-of-dalvik-and-java-bytecode/
What’s the point?
“Program” is an abstraction
• There are many different representations of
programs, even of executable programs.
• Executable programs are compiled and
packaged to run on an abstract machine.
• Details of the program depend on the platform:
the machine and system software.
• Abstraction(s) is/are crucial in computer systems
because they help accommodate rapid change.
Running a program
[Figure: the sections of a Program (code (“text”), constants, initialized data) are loaded as segments, alongside data, into the virtual memory of a Process, which also has a Thread.]
When a program launches, the OS creates an
execution context (process) to run it, with a thread
to run the program, and a virtual memory to store
the running program’s code and data.
VAS example (32-bit)
• The program uses virtual memory through its process’ Virtual Address Space:
• An addressable array of bytes…
• Containing every instruction the process thread can execute…
• And every piece of data those instructions can read/write…
– i.e., read/write == load/store on memory
• Partitioned into logical segments with distinct purpose and use.
• Every memory reference is interpreted in the context of the VAS.
– Resolves to a location in machine memory
[Figure: address space layout, from 0x7fffffff at the top down to 0x0: Reserved, Stack, Dynamic data (heap/BSS), Static data, Text (code).]
“Classic Linux Address Space”
http://duartes.org/gustavo/blog/category/linux
int P(int a){…}
void C(int x){
int y=P(x);
}
How do C and P share information?
Via a shared, in-memory stack
What info is stored on the stack?
C’s registers, call arguments, RA,
P's local vars
Review of the stack
• Each stack frame contains a function’s
– Local variables
– Parameters
– Return address
– Saved values of calling function’s registers
• The stack enables recursion
Code
void C () {
    A (0);            /* returns to 0x8048347 */
}
void B () {
    C ();             /* returns to 0x8048354 */
}
void A (int tmp){
    if (tmp) B ();    /* returns to 0x8048361 */
}
int main () {
    A (1);            /* returns to 0x804838c */
    return 0;
}
Memory
Stack (addresses run from 0xfffffff … 0x0); frames, most recent call first:
A: tmp=0, RA=0x8048347
C: const=0, RA=0x8048354
B: RA=0x8048361
A: tmp=1, RA=0x804838c
main: const1=1, const2=0
Code
void A (int bnd){
    if (bnd)
        A (bnd-1);    /* returns to 0x8048361 */
}
int main () {
    A (3);            /* returns to 0x804838c */
    return 0;
}
How can recursion go wrong?
Can overflow the stack …
Keep adding frame after frame
Memory
Stack (addresses run from 0xfffffff … 0x0); frames, most recent call first:
A: bnd=0, RA=0x8048361
A: bnd=1, RA=0x8048361
A: bnd=2, RA=0x8048361
A: bnd=3, RA=0x804838c
main: const1=3, const2=0
Code
void cap (char* b){
    for (int i=0;
         b[i]!='\0';
         i++)
        b[i]+=32;
}   /* 0x8048361 */
int main(char*arg) {
    char wrd[4];
    strcpy(wrd, arg);   /* copies arg into wrd with no bounds check */
    cap (wrd);          /* returns to 0x804838c */
    return 0;
}
What can go wrong?
Can overflow wrd variable …
Overwrite cap’s RA
Memory
Stack (addresses run from 0xfffffff … 0x0):
cap: b=0x00234, RA=0x804838c
main: wrd[3], wrd[2], wrd[1], wrd[0] (at 0x00234), const2=0
Assembler directives: quick peek
From x86 Assembly Language Reference Manual
The .align directive causes the next data generated to be aligned modulo
integer bytes.
The .ascii directive places the characters in string into the object module at the
current location but does not terminate the string with a null byte (\0).
The .comm directive allocates storage in the data section. The storage is
referenced by the identifier name. Size is measured in bytes and must be a
positive integer.
The .globl directive declares each symbol in the list to be global. Each symbol
is either defined externally or defined in the input file and accessible in other
files.
The .long directive generates a long integer (32-bit, two's complement value)
for each expression into the current section. Each expression must be a 32–bit
value and must evaluate to an integer value.
Basic hints on using Unix
• Find a properly installed Unix system: linux.cs.duke.edu, or MacOS with Xcode and its command line tools will do nicely.
• Learn a little about the Unix shell command language: e.g., look ahead to the shell lab, Lab #2. On MacOS open the standard Terminal utility.
• Learn some basic commands: cd, ls, cat, grep, more/less, pwd, rm, cp, mv, diff, and an editor of some kind (vi, emacs, …). Spend one hour.
• Learn basics of make. Look at the makefile. Run “make -n” to have it print the commands it would run. Understand what it is doing.
• Wikipedia is a good source for basics. Use the man command to learn about commands (1), syscalls (2), or C libraries (3). E.g.: type “man man”.
• Know how to run your programs under a debugger: gdb. If it crashes you can find out where. It’s easy to set breakpoints, print variables, etc.
• If your program doesn’t compile, deal with errors from the top down. Try “make >out 2>&1”. It puts all output in the file “out” to examine at leisure.
• Put source in a revision system like git or svn, but Do. Not. Share. It.
Running a program
Can a program launch multiple
running instances on the same
platform?
It depends.
On some platforms (e.g., Android) an
app is either active or it is not.
Abstraction
• Separate:
– Interface from internals
– Specification from implementation
• Abstraction is a double-edged sword.
– “Don’t hide power.”
• More than an interface…
This course is (partly) about the use of
abstraction(s) in complex software systems.
We want abstractions that are simple, rich,
efficient to implement, and long-lasting.
Interface and abstraction
Abstraction(s)
• A means to organize knowledge
– Capture what is common and essential
– Generalize and abstract away the details
– Specialize as needed
– Concept hierarchy
• A design pattern or element
– Templates for building blocks
– Instantiate as needed
• E.g.: class, subclass, and instance
Standards, wrappers, adapters
“Plug-ins”
“Plug-compatible”
Another layer of software can overcome superficial or
syntactic differences if the fundamentals are right.
Virtualization?