Linux TCP/IP Stack - ODU Computer Science

Linux TCP/IP Stack
TCP / IP vs. OSI model
TCP / IP layers (top to bottom): Process, Socket layer, Protocol Layer (TCP / IP), Interface Layer (Ethernet, etc.)
OSI layers (top to bottom): 7: Application, 6: Presentation, 5: Session, 4: Transport, 3: Network, 2: Data Link, 1: Physical Layer
TCP/IP Stack Overview
Process
  1: sosend ( … ), 5: recvfrom ( … )
Socket Layer
  2: tcp_output ( … ), 4: tcp_input ( … )
Protocol Layer (TCP Layer)
  3: ip_output ( … ), 3: ip_input ( … )
Protocol Layer (IP Layer)
  4: ethernet_output ( … ), 2: ethernet_input ( … )
Interface Layer (Ethernet Device Driver)
Physical Media
  Output Queue, Input Queue
Process Layer to TCP Layer
Process:
send (int socket, const char *buf, int length, int flags)
Kernel (uipc_syscalls.c):
sendto (int socket, const char *data_buffer, int length, int flags, struct sockaddr *destination, int destination_length)
sendit (struct proc *p, int socket, struct msghdr *mp, int flags, int *return_size)
Socket Layer (uipc_socket.c):
sosend (struct socket *s, struct mbuf *addr, struct uio *uio, struct mbuf *top, struct mbuf *control, int flags)
TCP Layer (tcp_usrreq.c):
tcp_usrreq (struct socket *s, int request, struct mbuf *m, struct mbuf *nam, struct mbuf *control)
TCP Layer (tcp_output.c):
tcp_output (struct tcpcb *tp)
Socket Layer
sendto (int socket, const char *data_buffer, int length, int flags, struct sockaddr *destination, int destination_length)
MBUF Chain
The 150 bytes of data in data_buffer are held in a chain of two mbufs. Each mbuf is 128 bytes.
First mbuf (28-byte header, 100 bytes of data):
m_next points to the second mbuf
m_nextpkt = NULL
m_len = 100
m_data points to the data
m_type = MT_DATA
m_flags = M_PKTHDR
m_pkthdr.len = 150
m_pkthdr.recvif = NULL
Second mbuf (20-byte header, 50 bytes of data, 58 bytes of unused space):
m_next = NULL
m_nextpkt = NULL
m_len = 50
m_data points to the data
m_type = MT_DATA
m_flags = 0
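The split of 150 bytes into a 100-byte and a 50-byte mbuf can be reproduced with a small user-space sketch. This is not the kernel's struct mbuf: toy_mbuf, toy_chain_from_buffer, toy_chain_free and the TOY_ constants are hypothetical names, and only the sizes (128-byte mbuf, 28-byte and 20-byte headers) are taken from the figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MLEN     108     /* data bytes in a plain mbuf: 128 - 20 */
#define TOY_MHLEN    100     /* data bytes in a packet-header mbuf: 128 - 28 */
#define TOY_M_PKTHDR 0x0002  /* illustrative flag value */

/* Simplified stand-in for struct mbuf; not the kernel layout. */
struct toy_mbuf {
    struct toy_mbuf *m_next;
    size_t m_len;                 /* bytes of data in this mbuf */
    int m_flags;
    size_t pkt_len;               /* plays the role of m_pkthdr.len */
    unsigned char data[TOY_MLEN];
};

/* Free every mbuf in a chain. */
void toy_chain_free(struct toy_mbuf *m)
{
    while (m != NULL) {
        struct toy_mbuf *next = m->m_next;
        free(m);
        m = next;
    }
}

/* Copy len bytes into a new chain; the first mbuf carries the packet header. */
struct toy_mbuf *toy_chain_from_buffer(const unsigned char *buf, size_t len)
{
    struct toy_mbuf *head = NULL;
    struct toy_mbuf **link = &head;
    size_t copied = 0;

    assert(buf != NULL || len == 0);
    do {
        struct toy_mbuf *m = calloc(1, sizeof(*m));
        size_t room = TOY_MLEN;

        if (m == NULL) {
            toy_chain_free(head);
            return NULL;
        }
        if (head == NULL) {       /* first mbuf: packet header, less room */
            m->m_flags = TOY_M_PKTHDR;
            m->pkt_len = len;
            room = TOY_MHLEN;
        }
        m->m_len = (len - copied < room) ? len - copied : room;
        if (m->m_len > 0)
            memcpy(m->data, buf + copied, m->m_len);
        copied += m->m_len;
        *link = m;                /* append to the chain */
        link = &m->m_next;
    } while (copied < len);
    return head;
}
```

Called with a 150-byte buffer, the sketch yields the two-mbuf chain of the figure, with 58 of the second mbuf's 108 data bytes left unused.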
Socket Layer - sosend passes data and control information to the protocol layer
sosend(struct socket *s, struct mbuf *addr, struct uio *uio, struct mbuf *data_buffer, struct mbuf *control, int flags)
1. Initialize a new memory buffer and variables to hold flags.
2. Is there enough space in the buffer (yes / no)? Check sbspace(s->sb_snd).
3. If yes, copy data_buffer into mbuf.
4. int error = tcp_usrreq(s, flags, mbuf, addr, control)
5. If error is 0 and there are more buffers to send, go back to step 2.
6. If error is 1, or there are no more buffers to send, free the memory buffers received.
7. Return the value of error to sendto ( ).
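The control flow of this loop (hand data down, stop on the first error, return the error) can be sketched in user space as follows. Everything here is hypothetical: toy_sosend, toy_proto_send, toy_counter and toy_counting_send are illustration names, the callback stands in for the protocol entry point, and the wait for send-buffer space is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the protocol entry point;
   returns 0 on success and non-zero on error. */
typedef int (*toy_proto_send)(const unsigned char *chunk, size_t len, void *ctx);

/* Toy send loop: pass the data down in chunks of at most chunk_size bytes
   and stop at the first error, which is returned to the caller. */
int toy_sosend(const unsigned char *data, size_t len, size_t chunk_size,
               toy_proto_send send, void *ctx)
{
    int error = 0;
    size_t off = 0;

    assert(chunk_size > 0 && send != NULL);
    while (off < len && error == 0) {      /* more buffers to send? */
        size_t n = (len - off < chunk_size) ? len - off : chunk_size;
        error = send(data + off, n, ctx);
        if (error == 0)
            off += n;
    }
    return error;
}

/* Example callback: counts calls and bytes, and fails on call number
   fail_on_call (0 means never fail). */
struct toy_counter {
    size_t calls;
    size_t bytes;
    size_t fail_on_call;
};

int toy_counting_send(const unsigned char *chunk, size_t len, void *ctx)
{
    struct toy_counter *c = ctx;

    (void)chunk;
    c->calls++;
    if (c->fail_on_call != 0 && c->calls == c->fail_on_call)
        return 1;
    c->bytes += len;
    return 0;
}
```

With 250 bytes and a chunk size of 100, the callback is invoked three times; if it fails on the second call, the loop stops there and the error value is what the caller sees.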
TCP Layer - tcp_usrreq(struct socket *s, int request, struct mbuf *data_buffer, struct mbuf *nam, struct mbuf *control)
1. Initialize the internet protocol control block inp and the TCP control block tp to store information useful for TCP.
2. Convert the socket to an internet protocol control block: inp = sotoinpcb(s)
3. Convert the internet protocol control block to a TCP control block: tp = intotcpcb(inp)
4. If request is PRU_SEND: int error = tcp_output(tp)
5. Return the value of error to the caller, sosend ( ).
TCP Layer (tcp_output.c) - tcp_output(struct tcpcb *tp)
Called by tcp_usrreq for one of the following reasons:
• To send the initial SYN
• To send a FIN (finished sending) message
• To send data
• To send a window update after data has been received
tcp_output ( ) functionality:
1. Determine whether TCP can send a segment, depending on:
• flags in the data sent by the socket layer, e.g. to send an ACK
• the size of the window advertised by the receiver
• the amount of data ready to send
• whether unacknowledged data already exists for the connection
2. Calculate the amount of data to be sent, depending on:
• the size of the receiver’s window
• the number of bytes in the send buffer
3. Check for window shrink.
4. Send a segment:
• Allocate a buffer for the TCP and IP header from the header template.
• Copy the TCP and IP header template into the buffer to be sent.
• Fill in the fields of the TCP header.
• Decrement the number of buffers to be sent, so that the end can be checked.
• Set the sequence number and acknowledgement fields.
• Set three fields in the IP header: IP length, TTL and TOS.
• Pass the datagram to IP.
TCP Layer (tcp_output.c) - tcp_output(struct tcpcb *tp)
struct socket *so = tp->t_inpcb->inp_socket
Initialize a TCP header, tcp_header.
idle is true if the maximum sequence number sent equals the oldest unacknowledged sequence number, that is, if no ACK is expected from the other end.
int idle = (tp->snd_max == tp->snd_una)
Check ACK flag: is idle true or false?
If idle is true, an acknowledgement is not expected, so set the congestion window to one segment:
tp->snd_cwnd = tp->t_maxseg;
off is the offset in bytes, from the beginning of the send buffer, of the first data byte to send. The first off bytes have already been sent and are awaiting acknowledgement.
int off = tp->snd_nxt - tp->snd_una
Determine the length of data that should be transmitted and the flags to be used.
win is the minimum of the receiver’s window and the congestion window. len is the minimum of the number of bytes in the send buffer and win, less off.
len = min(so->so_snd.sb_cc, win) - off
Determine the flags, such as TH_ACK, TH_FIN, TH_RST and TH_SYN:
flags = tcp_outflags[tp->t_state]
Test the flags in turn; if a test is false, go on to the next one:
• tp->t_flags & TF_ACKNOW is true: send an acknowledgement.
• flags & (TH_SYN | TH_RST) is true: send a sequence number or a reset.
• flags & TH_FIN is true: finished sending.
Check the flags to determine the type of message:
• window probe
• retransmission
• normal data transmission
Allocate an mbuf for the TCP and IP header, and for the data if possible.
MGETHDR (m, M_DONTWAIT, MT_HEADER)
M_DONTWAIT indicates that if memory is not available for the mbuf, the routine returns with an error instead of waiting.
Is the length of the data < 44 bytes (100 - 40 - 16)?
• yes: copy the data from the socket send buffer into the new packet header mbuf.
• no: create a new mbuf chain, copy the surplus data into it and link it to the first mbuf.
ip_output(m, tp->t_inpcb->inp_options, &tp->t_inpcb->inp_route, so->so_options & SO_DONTROUTE, 0)
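The 44-byte threshold is plain arithmetic on sizes. A minimal sketch, reading 100 as the data area of a packet-header mbuf (128 - 28, as in the mbuf figure), 40 as a 20-byte IP header plus a 20-byte TCP header without options, and 16 as room kept free for the link-layer header; the last two readings are assumptions, and toy_data_fits_in_header_mbuf and the TOY_ constants are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum {
    TOY_MHLEN = 100,       /* data area of a packet-header mbuf: 128 - 28 */
    TOY_TCPIP_HDR = 40,    /* assumed: 20-byte IP + 20-byte TCP header */
    TOY_MAX_LINKHDR = 16   /* assumed: room reserved for the link header */
};

/* Non-zero if len bytes of data fit into the same mbuf as the headers.
   Uses the strict "<" shown on the slide. */
int toy_data_fits_in_header_mbuf(size_t len)
{
    size_t room = TOY_MHLEN - TOY_TCPIP_HDR - TOY_MAX_LINKHDR;

    assert(room == 44);
    return len < room;
}
```

Under these assumptions a 43-byte payload travels in the header mbuf, while a 150-byte payload needs a second mbuf chain linked behind it.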
ip_output.c
ip_output(struct mbuf *m, struct mbuf *opt, struct route *ro, int flags, struct ip_moptions *imo)
1. Header initialization
2. Route Selection
3. Source address selection and Fragmentation
1. Header initialization
Check whether there were any errors while adding headers in higher layers. Most of the fields of the IP header are predefined by higher layer protocols.
Packets damaged?
• yes: ERROR
• no: test flags & (IP_FORWARDING | IP_RAWOUTPUT)
If the test is true (yes), save the header length in hlen for the fragmentation algorithm.
If the test is false (no), construct and initialize the IP header: set ip_v = 4, clear ip_off and assign a unique identifier to ip_id. Length, offset, TTL, protocol, TOS, etc. are set by higher layers.
The value of “flags” decides what is to be done with the data:
• IP_FORWARDING : Forward packet
• IP_ROUTETOIF : Route directly to interface
• IP_ALLOWBROADCAST : Allow broadcasting of packet
• IP_RAWOUTPUT : Packet contains pre-constructed header
If the packet has to be forwarded to another host, i.e. if the machine is acting as a router, then the IP header of the forwarded packet should not be modified by ip_output.
If the packet is not being forwarded and has to be sent to another host, then ip_output initializes the IP header.
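Because several of these flags can be set at once, flags behaves as a bit mask, and a test for IP_FORWARDING or IP_RAWOUTPUT is normally written with & rather than ==. The sketch below illustrates this; the numeric values and the toy_ function name are illustrative assumptions, not taken from the slides.

```c
#include <assert.h>

/* Illustrative bit values; the real definitions live in the kernel headers. */
#define TOY_IP_FORWARDING     0x01
#define TOY_IP_RAWOUTPUT      0x02
#define TOY_IP_ROUTETOIF      0x10
#define TOY_IP_ALLOWBROADCAST 0x20

/* Non-zero if ip_output has to construct the IP header itself, i.e. the
   packet is neither being forwarded nor carrying a pre-built header. */
int toy_must_build_header(int flags)
{
    assert(flags >= 0);
    return (flags & (TOY_IP_FORWARDING | TOY_IP_RAWOUTPUT)) == 0;
}
```

A forwarded packet that also allows broadcast has flags equal to 0x21 here; an equality comparison against IP_FORWARDING alone would miss it, while the mask test still recognizes it as forwarded.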
2. Route Selection
A cached route may be provided to ip_output as an
argument. UDP and TCP maintain a route cache
associated with each socket.
Verify the cached route for the destination address: if (cached_route == destination)
• yes: find the interface on which the packet has to be placed. ifp points to the interface’s ifnet structure.
• no: locate a route. Call rtalloc(dst_ip) to locate a route to the destination and find the interface on which the packet has to be placed. ifp points to the interface’s ifnet structure. If rtalloc(dst_ip) fails to find a route, return a host unreachable error.
Check whether the cached route is for the correct destination. If a route has not been provided, ip_output sets up a temporary route structure called iproute.
If a cached route is provided, find the interface on which the frame has to be sent.
If the packet is being routed, rtalloc locates a route to the address specified by dst. If rtalloc fails, an EHOSTUNREACH error is generated. If ip_forward called ip_output, the error is converted to an ICMP error.
If the route is found, ifp is made to point to the ifnet structure for the interface. If the next hop is not the packet’s final destination, dst is changed to point to the next hop router.
3. Source address selection and Fragmentation
Check if a valid source address is specified.
• no: select the IP address of the outgoing interface as the source address.
• yes: continue.
The final section of ip_output ensures that the IP header has a valid source IP address. This could not be done earlier because the route had not been selected yet. If there is no source IP address, the IP address of the outgoing interface is used as the source IP address.
Does the packet have to be fragmented?
• yes: fragment the packet if its size is greater than the MTU. Larger packets (packets that exceed the MTU) must be fragmented before they can be sent.
• no: if there are no checksum errors, send the data to the if_output function of the selected interface.
In either case (fragmented or not) the checksum is computed (in_cksum). If no errors are found, the data is sent to the if_output function of the output interface.
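To make the fragmentation decision concrete, the sketch below counts the fragments a datagram needs. It relies on the IP rule that every fragment except the last carries a multiple of 8 data bytes, because the fragment offset field counts 8-byte units. The function name toy_fragment_count is hypothetical and the code is not the ip_output algorithm itself.

```c
#include <assert.h>
#include <stddef.h>

/* Number of fragments needed for a datagram of total_len bytes (header
   included) with an hlen-byte IP header on a link with the given MTU.
   Returns 1 if no fragmentation is needed and 0 if the MTU is too small
   to carry the header plus 8 bytes of data. */
size_t toy_fragment_count(size_t total_len, size_t hlen, size_t mtu)
{
    size_t per_frag, data;

    assert(total_len >= hlen);
    if (total_len <= mtu)
        return 1;                              /* fits in one packet */
    if (mtu < hlen + 8)
        return 0;
    per_frag = (mtu - hlen) & ~(size_t)7;      /* data bytes per fragment */
    data = total_len - hlen;
    return (data + per_frag - 1) / per_frag;   /* round up */
}
```

A 4000-byte datagram with a 20-byte header on a 1500-byte MTU carries 3980 data bytes in fragments of 1480, 1480 and 1020 bytes, so three fragments are needed.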
Interface Layer (if_ethersubr.c)
ether_output(struct ifnet *ifp, struct mbuf *mbuf, struct sockaddr *destination, struct rtentry *routing_entry)
1. Verification
2. Protocol-Specific Processing
3. Frame Construction
4. Interface Queuing.
Interface Layer (if_ethersubr.c) - ether_output(struct ifnet *ifp, struct mbuf *mbuf, struct sockaddr *destination, struct rtentry *rt_entry)
Function: Takes the data portion of an Ethernet frame, encapsulates it with a 14-byte header and places it on the interface send queue.
Phases: Verification, Protocol-Specific Processing, Frame Construction, Interface Queuing.
Arguments:
• ifp points to the outgoing interface’s ifnet structure
• mbuf is the data to be sent
• destination is the destination address
• rt_entry points to the routing entry
Initialize the Ethernet header: struct eth_header *eh
Verification
Ethernet port up and running? ifp->if_flags & (IFF_UP | IFF_RUNNING)
• no: senderr (ENETDOWN)
• yes: continue.
Route valid? rt_entry = rtalloc1 (destination, 1)
• 0: senderr (EHOSTUNREACH)
• 1: continue.
Next hop a gateway?
• 1: rt = rt->rt_gwroute
• 0: continue.
Destination responding to ARP requests? rt->rt_flags & RTF_REJECT
If not, do not send more packets, to avoid flooding.
Protocol-Specific Processing
Functionality: Finds the Ethernet address corresponding to the IP address of the destination.
destination->sa_family is AF_INET:
• Send an ARP broadcast to find the Ethernet address corresponding to the destination IP address.
• Use m_copy( ) to keep the packet until an ack is received.
Frame Preparation
Make sure there is room for the 14-byte Ethernet header:
M_PREPEND (m, sizeof(ethernet_header), M_DONTWAIT)
Form the Ethernet header from:
• the Ethernet frame type
• the Ethernet MAC address
• the unicast Ethernet address associated with the output interface
(e.g. the default gateway for a host)
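The 14-byte header consists of a 6-byte destination address, a 6-byte source address and a 2-byte frame type in network byte order. A minimal user-space sketch of the encapsulation follows; it works on a flat byte array instead of an mbuf chain, and toy_ether_encap and the TOY_ constants are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETHER_ADDR_LEN 6
#define TOY_ETHER_HDR_LEN  14      /* 6 + 6 + 2 */
#define TOY_ETHERTYPE_IP   0x0800  /* frame type for IPv4 */

/* Write header and payload into frame (capacity cap bytes).
   Returns the frame length, or 0 if there is not enough room. */
size_t toy_ether_encap(uint8_t *frame, size_t cap,
                       const uint8_t *dst, const uint8_t *src,
                       uint16_t type, const uint8_t *payload, size_t len)
{
    assert(frame != NULL && dst != NULL && src != NULL);
    assert(payload != NULL || len == 0);
    if (cap < TOY_ETHER_HDR_LEN || len > cap - TOY_ETHER_HDR_LEN)
        return 0;
    memcpy(frame, dst, TOY_ETHER_ADDR_LEN);                      /* destination */
    memcpy(frame + TOY_ETHER_ADDR_LEN, src, TOY_ETHER_ADDR_LEN); /* source */
    frame[12] = (uint8_t)(type >> 8);                            /* type, high byte first */
    frame[13] = (uint8_t)(type & 0xff);
    if (len > 0)
        memcpy(frame + TOY_ETHER_HDR_LEN, payload, len);
    return TOY_ETHER_HDR_LEN + len;
}
```

In the kernel the same effect is reached by prepending 14 bytes to the mbuf chain and filling them in place, so the payload itself is not copied.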
Interface Queuing
Is the output queue full?
• yes: discard the frame, free the memory buffer and senderr (ENOBUFS).
• no: place the frame on the interface’s send queue (if_snd), then call lestart (ifp).
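The full-queue check can be illustrated with a small bounded FIFO that drops a frame and reports ENOBUFS when no slot is free. The names toy_ifqueue, toy_if_enqueue and toy_if_dequeue and the queue length of 4 are assumptions made for the example; the kernel's if_snd queue is a linked list of mbufs, not an array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_IFQ_MAXLEN 4   /* illustrative queue limit */

/* Fixed-size FIFO of frame pointers with a drop counter. */
struct toy_ifqueue {
    const void *frames[TOY_IFQ_MAXLEN];
    size_t head;    /* index of the oldest frame */
    size_t len;     /* frames currently queued */
    size_t drops;   /* frames discarded because the queue was full */
};

/* Returns 0 on success, ENOBUFS if the queue is full (frame is dropped). */
int toy_if_enqueue(struct toy_ifqueue *q, const void *frame)
{
    assert(q != NULL);
    if (q->len == TOY_IFQ_MAXLEN) {
        q->drops++;
        return ENOBUFS;
    }
    q->frames[(q->head + q->len) % TOY_IFQ_MAXLEN] = frame;
    q->len++;
    return 0;
}

/* Returns the oldest frame, or NULL if the queue is empty. */
const void *toy_if_dequeue(struct toy_ifqueue *q)
{
    const void *frame;

    assert(q != NULL);
    if (q->len == 0)
        return NULL;
    frame = q->frames[q->head];
    q->head = (q->head + 1) % TOY_IFQ_MAXLEN;
    q->len--;
    return frame;
}
```

The driver start routine plays the role of the dequeue side: it takes frames off the queue in arrival order and hands them to the hardware.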
Interface Layer (if_le.c) - lestart(struct ifnet *ifp)
Function: Dequeues frames from the interface output queue and arranges for them to be transmitted by the Ethernet card.
struct le_softc *le = &le_softc[ifp->if_unit]
Test le->sc_if.if_flags & IFF_RUNNING:
• 1: copy the frame in the mbuf to the hardware buffer, then set IFF_OACTIVE to indicate that the device is busy transmitting.
• 0: skip the copy.
return error