Jim Williams
HONP-112 Week 7

Part 1: Networking overview
Part 2: Data transfer methods
Part 3: Communication Channels

A network consists of at least two computers, plus other peripherals (like a printer), that are connected to each other. It also contains the intermediate equipment that makes these connections possible. Computer networks allow resources to be shared and allow data to be transferred from one location to another.

Efficiency: For example, you can have one inventory database that is shared by many different store branches.
Economy/Ecology: For example, instead of buying and maintaining a printer for every computer in a classroom, you can just have one shared printer on the network.

A network needs the following components:
◦ A Network Interface Card (NIC), or the equivalent circuitry, in each device that is connected to the network.
◦ Communications channel(s): the appropriate physical wiring.
◦ Separate machines that regulate how data is communicated along the network (some examples: hub, switch, router, gateway, bridge; some functionality may overlap). We will not discuss these details in this class.

A Local Area Network (LAN) covers a small geographical area (a house, office building, college campus, etc.).
A Wide Area Network (WAN) covers a large geographical area (a college with multiple campuses, a business with many store locations, etc.). Frequently, WANs consist of multiple LANs that are connected to each other over long distances.

A protocol is a “set of rules” that the computers on a network must understand in order to communicate with each other. Different networks can use different protocols. Some you may have heard of include:
◦ Ethernet (common for LANs and WANs)
◦ TCP/IP – Transmission Control Protocol / Internet Protocol
◦ HTTP – Hypertext Transfer Protocol (i.e. the WWW)

There are many others. Just know that a computer that understands only one protocol cannot communicate with another computer that understands only a different protocol.
However, in practice, MOST computers and other digital devices are configured to “understand” TCP/IP.

Part of the network interface circuitry in a machine includes a unique address called a Media Access Control (MAC) address. These addresses are binary patterns which are 48 bits long.
◦ How many bytes long are MAC addresses?
◦ How many possible MAC addresses are there?
◦ Do you think it is possible to run out of MAC addresses? What then?

Besides MAC addresses, another way of identifying a computer is the IP address. This is only applicable, of course, to devices that understand the TCP/IP protocol and wish to operate over the Internet. These addresses are binary patterns which are (at this time) 32 bits long.
◦ How many bytes long are IP addresses?
◦ How many possible IP addresses are there?

As Internet use has increased, there is a danger of running out of available IP addresses. Think how you would solve the issue, then research how the issue is currently being handled.

Peer-to-peer networks consist of numerous computers connected to each other, but there is no “main” computer in the system.
◦ Example: file-sharing networks like BitTorrent, LimeWire, etc.

Client-server networks consist of a main computer called the “server,” and the multiple computers attached to it are called the “clients.”
◦ Example: the BlackBoard system at this college. You have to log into the BlackBoard server from your home computers (which function as the clients).
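Returning to the address-size questions posed earlier in this section, the arithmetic can be checked with a short script. This is only a sketch: the helper name `address_space` is made up for illustration, and it assumes nothing beyond the 48-bit and 32-bit lengths stated above (8 bits per byte, 2 possible values per bit).

```python
def address_space(bits):
    """Return (length in bytes, number of distinct patterns) for an
    address that is `bits` bits long."""
    length_in_bytes = bits // 8      # 8 bits per byte
    possible_addresses = 2 ** bits   # each bit can be 0 or 1
    return length_in_bytes, possible_addresses


mac_bytes, mac_count = address_space(48)  # MAC address: 48 bits
ip_bytes, ip_count = address_space(32)    # IP address: 32 bits (at this time)

print("MAC:", mac_bytes, "bytes,", f"{mac_count:,}", "possible addresses")
print("IP: ", ip_bytes, "bytes,", f"{ip_count:,}", "possible addresses")
```

Comparing the two counts gives a feel for why the 32-bit IP address space is the one in danger of running out.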
Topologies describe the ways that machines are connected to each other on a network. We will briefly discuss some. Try to think of the advantages and disadvantages of each as we do.

Ring: Each machine on the network is attached to exactly two others.
◦ Less expensive, but less resistant to failure.
Mesh: Each machine is attached to every other machine.
◦ Very resistant to failure, but expensive to maintain.
Star: Each machine has a single connection to a central machine.
◦ Very common for client-server networks. The network will fail if the central machine goes down.
Bus: Each machine is connected to a single high-capacity data line called a “backbone.”
◦ Example: large telecommunications networks.
Tree: This is actually several star networks attached to a “backbone” that essentially connects networks to other networks.
◦ Very common for wide-area networks of large businesses, universities, etc. that may have multiple physical locations, each with its own network.

Networks would be useless unless we were able to actually send data from one machine to another. There are many ways to do this, but we will focus on one very common method that is shared by many types of network protocols.

Data communication in most cases happens asynchronously. This means that a single logical “piece” of data sent over the network (for instance a file, an e-mail message, etc.) is not necessarily transmitted “in order.” Rather, it can be “broken up” into smaller pieces and later “reassembled” once all the pieces arrive at the destination. It does not matter what order the pieces arrive in!

Imagine you had written a book, and you were MAILING (using the Post Office) a copy to your editor. But for some strange reason your editor only allows you (and all the other authors he works with) to mail one page at a time, in separate envelopes! There is of course no guarantee that all of the envelopes will arrive at the publisher in the correct order, or even that they will all arrive. Is this possible? Yes.
But think for a minute what would be needed for this scheme to work successfully.

You (the sender) would need a way to communicate directly with the editor (the receiver).
◦ The sender needs to tell the receiver to expect a new book.
◦ The sender needs to know where to mail the pages to.
◦ The receiver needs to know where the pages are being mailed from.
◦ The receiver also needs to know what book each page is from (the same author may be mailing several books).
◦ The receiver also needs to know how many pages each book should have.
◦ The receiver needs a way to notify the sender if any pages are missing. The sender can then re-send them.
◦ The receiver needs a way to confirm with the sender that the entire book was received successfully.

If we are mailing a letter, the envelope can contain the sender and receiver addresses. If we wish, we can also write the title of the book, and something like “page number 7 of 153,” on the envelope as well. To keep things safe we can also write this information on each page that we put into the envelope (in case the envelope gets lost...).

The receiver will of course first divide the many envelopes he receives into separate stacks, one for each sender and book. Then he opens each envelope and assembles the book pages into the correct order as they come in. If after a certain amount of time (days, in this case) there are still missing pages, based on what the total number of pages is, he will call the sender so the missing pages can be re-sent.

Notice the various types of communication that need to happen:
◦ Author tells editor to expect a book shipment (a good thing to check: what if the editor is on vacation or unavailable?).
◦ Editor needs to ask author for any missing pages after an agreed-upon time.
◦ Editor needs to inform author that the book has been received in its entirety, and author needs to confirm/agree. This marks the end of that transaction between the two parties.
When a file, or other single “piece” of data, is sent over a network, almost the same thing happens as in our hypothetical author/editor example. The data is broken into smaller chunks called “packets,” which are sent out to their destination on the network. The packets may not all arrive at the same time, or even by the same route, but hopefully they all eventually get to their destination.

A data “packet” consists of two main parts:
◦ A “header,” which describes what the packet contains, where it is coming from, where it is going to, etc. This is similar to the “envelope” in our example.
◦ The data itself that the packet contains. This is similar to the “page” in our example.

Different protocols have different formats for packets. We will not be concerned with the technical details. Just know the principles.

The sending machine must establish communication with the receiving machine. The receiving machine must of course respond in kind. In technical terms this is called a “handshake.”
When packets are received they must of course be reassembled. If any expected packets are missing, the receiving machine sends a special signal to the sending machine to re-send the missing packet.
When the file is successfully reassembled, the receiving machine sends an “Acknowledgement” signal to the sending machine that the file was received. The sending machine can then end the “handshake” and the transaction is complete.

Given: Our imaginary packet format is as follows:
Header:
◦ File ID Number:
◦ From:
◦ To:
◦ Total packets for this file:
◦ Packet Number for this packet:
Data:
◦ 16-byte “chunks”

File: a text file that says “I love learning about data communication.”
◦ Assume we are using ASCII (1 byte per character) as per our ASCII chart to store this data digitally.

So the file is 41 bytes long. We said that it needs to be broken up into 16-byte “chunks.” This means we need to break it up into three packets (16 + 16 + 9 bytes) before we send it over the network.
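The byte count and the number of packets can be checked with a few lines of Python. This is a minimal sketch of the imaginary scheme above, not any real protocol; the names `CHUNK_SIZE` and `split_into_chunks` are invented for illustration.

```python
import math

CHUNK_SIZE = 16  # data bytes per packet in the imaginary format above


def split_into_chunks(data, size=CHUNK_SIZE):
    """Cut `data` into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


message = "I love learning about data communication."
data = message.encode("ascii")  # ASCII: 1 byte per character

# Round up: a partly filled last chunk still needs its own packet.
packets_needed = math.ceil(len(data) / CHUNK_SIZE)
chunks = split_into_chunks(data)

print(len(data), "bytes ->", packets_needed, "packets")
for number, chunk in enumerate(chunks, start=1):
    print(number, len(chunk), repr(chunk.decode("ascii")))
```

Note that the last chunk is shorter than the others; only the first two packets carry a full 16 bytes.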
It is easiest at this point to express the file in hexadecimal bytes as per our ASCII chart. Then we can easily break it up into our three packets.

“I love learning about data communication.”
Expressed in hexadecimal characters:
49 20 6C 6F 76 65 20 6C 65 61 72 6E 69 6E 67 20 61 62 6F 75 74 20 64 61 74 61 20 63 6F 6D 6D 75 6E 69 63 61 74 69 6F 6E 2E

Here is our hexadecimal representation of the file. Remember that a single ASCII byte is represented by two hex digits. So:
Packet 1 data: 49 20 6C 6F 76 65 20 6C 65 61 72 6E 69 6E 67 20
Packet 2 data: 61 62 6F 75 74 20 64 61 74 61 20 63 6F 6D 6D 75
Packet 3 data: 6E 69 63 61 74 69 6F 6E 2E

Each packet must also contain a header. We will be completely hypothetical and say that we can represent this however we wish. Let’s just keep our header in “plain English” for this exercise. Let’s use this header:
File ID: 435-2011-10-13-08:09:76.341AM
From: 435
To: 3998
Total number of packets: 3
Packet number: (# of packet)

Here is what packet 2 for this example may look like:
File ID: 435-2011-10-13-08:09:76.341AM
From: 435
To: 3998
Total number of packets: 3
Packet Number: 2
Packet Data: 61 62 6F 75 74 20 64 61 74 61 20 63 6F 6D 6D 75

Understand the basics of how asynchronous transfer works. I may give you packets and ask you to reassemble the text file (and also possibly identify any errors with the packets, if there are any!). I may also give you a text file and tell you to break it into packets, using some clearly defined hypothetical scheme.

The next slides will discuss data communication channels and their units of performance measurement. We will also work out some hypothetical problems to determine how much time certain data transfers may take.
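Before moving on, the packet exercise above can be sketched in Python: build the packets with their headers, then reassemble them even when they arrive out of order or one is missing. This is a hypothetical illustration of our imaginary format only. The dictionary keys and the function names `make_packets` and `reassemble` are assumptions made for this sketch, not part of any real protocol.

```python
def make_packets(file_id, sender, receiver, data, chunk_size=16):
    """Break `data` (bytes) into packets: a header plus hex-encoded data."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [
        {
            "file_id": file_id,
            "from": sender,
            "to": receiver,
            "total": len(chunks),             # total packets for this file
            "number": number,                 # packet number, starting at 1
            "data": chunk.hex(" ").upper(),   # e.g. "49 20 6C ..."
        }
        for number, chunk in enumerate(chunks, start=1)
    ]


def reassemble(packets):
    """Return (text, missing). `text` is None if any packet is missing."""
    total = packets[0]["total"]
    by_number = {p["number"]: p for p in packets}  # arrival order is irrelevant
    missing = [n for n in range(1, total + 1) if n not in by_number]
    if missing:
        return None, missing  # the receiver would ask for these to be re-sent
    data = b"".join(bytes.fromhex(by_number[n]["data"])
                    for n in range(1, total + 1))
    return data.decode("ascii"), []


text = "I love learning about data communication."
packets = make_packets("435-2011-10-13-08:09:76.341AM", 435, 3998,
                       text.encode("ascii"))

arrived = [packets[2], packets[0], packets[1]]   # out of order
print(reassemble(arrived))
print(reassemble([packets[0], packets[2]]))      # packet 2 never arrived
```

The packet number in each header is what makes the arrival order irrelevant, and the total count is what lets the receiver notice that a packet is missing.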
Most communication channels (wires/lines) are suited for analog electrical signals (due to physical properties that are beyond the scope of this class).
So, before data is sent through a channel, it has to be modulated into an analog form (without getting too technical: 0s and 1s are translated into different “frequencies”). When it is received by the other machine, it must be de-modulated back to digital form.
The device that does this is called a modulator/de-modulator, or “modem” for short.

Bandwidth is the term used to describe the amount of data that can pass through a point in the network during some specified length of time. The time element is the most important thing to keep in mind because, in this context, the bandwidth is what determines the speed at which the data can travel from one point to another.

Imagine I wanted to pump 100 gallons of water from a lake into a storage tank. I have two hoses I can choose from (a narrow garden hose and a wide firefighter’s hose). Of course, 100 gallons of water can pass through either hose with no problems. But through which hose will it take less TIME for the 100 gallons to pass?

Simple physics tells us that it will take longer for the same volume of liquid to move from point A to point B when confined to a smaller space versus a larger space (because less water can reside in a smaller space at any point in time than in a larger space). So, the same volume of water will take a longer time to move through the narrow hose than through the wider hose. Looked at another way, we can pick a point on the hose and measure the volume of water which passes that point in a given time (like one second).

Bandwidth is a figure that describes how many bits can be communicated along the data communications channel in a given interval of time. Different communications channels have different bandwidths (for various physical and technical reasons). In most cases bandwidth is measured relative to bits (as opposed to bytes).
So you will frequently see figures relative to kilobits (Kb), megabits (Mb), etc. Notice the lower case “b,” which means a single bit (an upper case “B” means a byte).

Ethernet (LAN): 100 Mbps
T1 backbone (WAN): 1.5 Mbps
T3 backbone (WAN): 44 Mbps
Telephone Modem: 56 Kbps
ADSL Modem: 2 Mbps (download) / 512 Kbps (upload)
Cable Modem: 5 Mbps (download) / 384 Kbps (upload)
Fiber Optic Internet: 15 Mbps (download) / 5 Mbps (upload)

** IMPORTANT – These are just some samples. There are many variations, too numerous to mention. If you are curious about the bandwidth of your own personal network service (like Cable/DSL, etc.), contact your vendor.

Bandwidth figures do not always give an accurate representation of real-world performance. Many issues, including the current amount of other network activity, can cause performance to degrade and the data communication “speeds” to be much slower than the stated figures. While “bandwidth” describes what a network is capable of, the term “throughput” is sometimes used to describe actual performance (of course this changes as the network conditions change).

I want to download a 6.5 MB file from the Internet. I am using an ADSL service capable of 2 Mbps download bandwidth. Assuming that I am getting maximum throughput, how long, in seconds, will it take to download the file?

You can do this two ways: express the file size in terms of bits, or express the bandwidth in terms of bytes. For this example we will use the first method.

Express a 6.5 MB file size in terms of bits. There are 8 bits for every byte, so we multiply 6.5 by 8 to get the size in megabits.
6.5 MB * 8 = 52 Mb (notice the small “b”).
From now on consider the file to be 52 Mb.

Now, we know the file size is 52 Mb. We also know the bandwidth is 2 Mbps. So divide the file size by the bandwidth. The Mb will “cancel out” and we will be left with a measurement in seconds.
X = 52 Mb / (2 Mb / sec) = 26 seconds

When we discussed the file size, we did NOT account for breaking the file up into packets, additional space for packet headers, etc. We also did not adjust for any other issues that may impact the network performance. This is because I am only trying to teach the basic principles of how we can calculate the network performance, given a file size and a bandwidth. In real life there are many other factors that come into play, beyond the scope of this class. But hopefully these simplified problems will help you understand the concept.

Let’s try another problem, stated in simpler terms:
Given: File size = 45 MB, bandwidth = 384 Kbps (cable upload speed)
How many minutes will it take to upload the file?

File Size: 45 MB * 8 = 360 Mb
◦ ** file size now expressed as megabits
Bandwidth: 384 Kbps
X = 360 Mb / (384 Kb / sec)
X = 360 Mb / (0.384 Mb / sec)
◦ ** both figures expressed as megabits. (Remember K means “times a thousand” and M means “times a million,” so 384 Kb = 0.384 Mb.)
X = 937.5 sec
X = 937.5 sec / (60 sec / min) = 15.625 min
◦ ** answer expressed in minutes

For solving performance problems:
◦ Know the time units the answer is supposed to be in.
◦ Take care to do proper conversions to make sure both the file size and bandwidth are relative to the same units of measure (bits vs. bytes).
◦ Along with this, also make sure you handle the prefixes like Kilo- and Mega- correctly.

We did all these things in our second example problem. Know how to solve other problems of similar complexity.
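The two worked problems above can be reproduced with a small Python function. This is a simplified sketch under the same assumptions as the slides: maximum throughput, no packet headers, K meaning “times a thousand” and M meaning “times a million.” The names `PREFIX` and `transfer_seconds` are made up for this example.

```python
PREFIX = {"": 1, "K": 1_000, "M": 1_000_000}  # decimal prefixes, as above


def transfer_seconds(file_size, size_prefix, bandwidth, bandwidth_prefix):
    """Ideal transfer time in seconds.

    file_size is in bytes (with a prefix, e.g. 6.5 and "M" for 6.5 MB);
    bandwidth is in bits per second (e.g. 2 and "M" for 2 Mbps).
    """
    file_bits = file_size * PREFIX[size_prefix] * 8          # bytes -> bits
    bits_per_second = bandwidth * PREFIX[bandwidth_prefix]
    return file_bits / bits_per_second


# Problem 1: 6.5 MB file over a 2 Mbps download link, answer in seconds
download_sec = transfer_seconds(6.5, "M", 2, "M")
print(download_sec, "seconds")

# Problem 2: 45 MB file over a 384 Kbps upload link, answer in minutes
upload_sec = transfer_seconds(45, "M", 384, "K")
print(upload_sec, "seconds =", upload_sec / 60, "minutes")
```

Converting everything to plain bits and bits per second first is one way to avoid mixing up units; the conversion to minutes happens only at the very end.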