Web Architecture - Week 4 Technical Lecture

TCP/IP, UDP, Ports, LANs, WANs, and other curiosities

TCP and UDP

In this week's technical discussion, we discuss the concept of ports and how they work with the IP addresses you studied last week. If you have not read last week's assignment on IP addresses and need a brush up, you can revisit the read here. The video is very important this week to tie together everything from level five down to level one. We will linger on levels six and seven the last two weeks since I can tell those are the levels most of you will acquire your creativity from.

The devices and computers connected to the Internet use popular protocols called TCP/IP and UDP to communicate with each other. When a computer in New York wants to send a piece of data to a computer in England, it must know the destination IP address that it would like to send the information to. That information is sent most often via two methods, UDP and TCP (both ride on top of IP; when using TCP, the two devices first establish a connection with each other, and TCP then uses IP to carry the data back and forth over that connection).

TCP stands for Transmission Control Protocol. Using this method, the computer sending the data establishes a connection with the computer it is sending the data to, and stays connected for the duration of the transfer. The transmission control features enable the two computers to guarantee to each other that the data has arrived safely and correctly, and then they close the connection. This method of transferring data is more reliable, but it puts a higher load on the computer, which has to monitor the connection and the data going across it, and that extra bookkeeping can slow the transfer down. A real life comparison to this method would be to pick up the phone and call a friend. You have a conversation and when it is over, you both hang up, releasing the connection.

UDP stands for User Datagram Protocol. Using this method, the computer sending the data breaks the information into nice little packets and releases each packet into the network in the hope that routers will be smart enough to get each one to the right place. Yes, this means that UDP does not set up a connection with the receiving computer like TCP does, but rather sends the data out and relies on the devices in between the sending computer and the receiving computer to get the data where it is supposed to go. This method of transmission does not provide any guarantee that the data you send will ever reach its destination. On the other hand, this method of transmission has a very low overhead on the sending computer and is therefore very popular for services that do not need to work on the first try, or where a packet here and there need not reach the destination for the message to make sense. Think about static in a radio transmission, where part of the signal is not received properly: you still understand the message because your brain ignores the static and makes sense of the transmission without that information (human brains are really great at that).

A good analogy for UDP is the plain old US Postal Service. You place your mail in a mailbox and hope the Postal Service will get it to the proper location. Most of the time they do, but sometimes it gets lost along the way. You don't mind taking the risk because the historical level of service has been acceptable.
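To make the phone call and the mailbox concrete, here is a minimal sketch in Python that sends the same message both ways between two sockets on your own machine. The function names are mine and purely illustrative, and the loopback address 127.0.0.1 with an operating-system-chosen port is an assumption made so the example can run anywhere. On a real network the UDP datagram could simply vanish; on loopback it almost always arrives.

```python
import socket

LOOPBACK = "127.0.0.1"  # assumption: both ends run on this one machine


def udp_round_trip(message: bytes) -> bytes:
    """Mailbox style: drop one datagram into the network, no connection first."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind((LOOPBACK, 0))   # port 0 asks the OS for any free port
        receiver.settimeout(2.0)       # give up if the datagram never shows up
        sender.sendto(message, receiver.getsockname())
        data, _sender_address = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()


def tcp_round_trip(message: bytes) -> bytes:
    """Phone call style: connect, talk, then hang up."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    caller = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind((LOOPBACK, 0))
        listener.listen(1)
        listener.settimeout(2.0)
        caller.settimeout(2.0)
        caller.connect(listener.getsockname())   # dial the number
        conn, _caller_address = listener.accept()  # the friend picks up
        with conn:
            conn.settimeout(2.0)
            caller.sendall(message)
            caller.shutdown(socket.SHUT_WR)      # "I am done talking"
            chunks = []
            while True:
                chunk = conn.recv(4096)
                if not chunk:                    # empty read: caller hung up
                    break
                chunks.append(chunk)
        return b"".join(chunks)
    finally:
        caller.close()
        listener.close()
```

Notice that the UDP version never calls connect or accept: the sender just addresses the datagram and lets it go, which is exactly why it is so cheap and so unguaranteed.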

Now that you understand what TCP and UDP are as messaging delivery strategies, we can start discussing TCP and UDP ports in detail.

TCP and UDP Ports

As you know, every computer or device on the Internet must have a unique number assigned to it called the IP address (or else be managed by a smart device that has an IP address and can manage multiple devices as a proxy for those other devices). This IP address is used to recognize your particular computer out of the millions of other computers connected to the Internet. When information is sent over the Internet to your computer, how does your computer accept that information? It accepts that information by using TCP or UDP ports.

An easy way to understand ports is to imagine your IP address is a cable box and the ports are the different channels on that cable box. The cable company knows how to send cable to your cable box based upon a unique serial number associated with that box (like an IP Address), and then you receive the individual shows on different channels (which would be like ports in our explanation).

Ports work the same way. You have an IP address, and then many ports on that IP address - you can have a total of 65,535 TCP ports and another 65,535 UDP ports. When a program on your computer (or other device) sends or receives data over the Internet, it sends that data to an IP address and a specific port on the remote computer, and receives the data on a usually random port on its own computer. If it uses the TCP protocol to send and receive the data, then it will connect and bind itself to a TCP port. If it uses the UDP protocol to send and receive data, it will use a UDP port. Figure 1, below, is a representation of an IP address split into its many TCP and UDP ports. Note that once an application binds itself to a particular port, that port cannot be used by any other application. It becomes blocked on a first come, first served basis.

<-------------------- 192.168.1.10 -------------------->

[ports numbered up to 65535, each available once as a TCP port and once as a UDP port]

Figure 1. IP address with Ports
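The first come, first served rule is easy to see for yourself. The following is a small sketch, assuming the loopback address and a port picked by the operating system (both are my choices for illustration, and the function name is hypothetical). It binds one TCP socket to a port and then tries to bind a second socket to that same port.

```python
import socket

MAX_PORT = 2**16 - 1  # port numbers fit in 16 bits, so the highest is 65535


def second_bind_fails() -> bool:
    """Return True if a second socket is refused a port that is already bound."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))       # the OS hands us some free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))  # same address, same port
        except OSError:
            return True                    # the port is taken: first come, first served
        return False
    finally:
        first.close()
        second.close()
```

With default socket options the second bind raises an error (usually reported as "address already in use"), which is the operating system enforcing the blocking described above.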

Let's use Web servers in an example as you all know that a Web server is a computer running an application that allows other computers to connect to it and retrieve the Web pages stored there. Remind yourself of how the hierarchical domain names and URL paths uniquely address information on a Web server.

In order for a Web server to accept connections from remote computers, such as your computer that runs a Web browser and asks for a page, it must bind the Web server application to a local port. The computer hosting Web services will then use this port to listen for and accept connections from remote computers. Web servers typically bind to TCP port 80, which is what HTTP (Hypertext Transfer Protocol) uses by default, and then wait and listen for connections from remote devices on that port. Once a device is connected, the server will send the requested Web pages to that remote requesting device, and when done close the connection.

On the other hand, when you consider the remote device connecting to a Web server, it works in reverse. A Web browser on that device picks a random TCP port from a certain range of port numbers, and attempts to connect to port 80 on the IP address of the Web server. When the connection is established, the Web browser will send the request for a particular Web page and receive it from the Web server. Then both computers will close the connection to finish up the TCP session.
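Here is a rough sketch of both halves of that exchange in one Python program. It is not a real Web server, just enough to show the listening port on one side and the randomly chosen port on the other. The function names are hypothetical, and I let the operating system pick the server's port instead of using port 80 because binding to port 80 usually needs administrator rights.

```python
import socket
import threading


def serve_one_page(listener: socket.socket, page: bytes) -> None:
    """Server side: accept one connection, answer one request, hang up."""
    conn, _client_address = listener.accept()
    with conn:
        conn.recv(4096)  # read the request; this sketch ignores its contents
        header = b"HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % len(page)
        conn.sendall(header + page)


def fetch_once(page: bytes = b"<h1>hello</h1>"):
    """Run a one-shot server and a 'browser'; return both ports and the reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))     # stand-in for port 80 (assumption)
    listener.listen(1)
    listener.settimeout(2.0)
    server_port = listener.getsockname()[1]

    server = threading.Thread(target=serve_one_page, args=(listener, page))
    server.start()
    try:
        with socket.create_connection(("127.0.0.1", server_port), timeout=2.0) as browser:
            client_port = browser.getsockname()[1]  # picked by the OS, not by us
            browser.sendall(b"GET / HTTP/1.0\r\n\r\n")
            response = b""
            while True:
                chunk = browser.recv(4096)
                if not chunk:           # server closed the connection
                    break
                response += chunk
    finally:
        server.join()
        listener.close()
    return server_port, client_port, response
```

If you call fetch_once() a few times, the client port typically changes from run to run while the job of each side stays the same: the server listens on a port it chose in advance, and the browser borrows whatever port the operating system hands it.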

Suppose you wanted to run an FTP (File Transfer Protocol) server, which is a server that allows you to transfer and receive files from remote computers, on the same computer as the Web server. FTP servers use TCP ports 20 and 21 to send and receive information, so you won't have any conflicts with the Web server running on TCP port 80. Therefore, the FTP server application will bind itself to TCP ports 20 and 21 when it starts, and wait for connections in order to send and receive data.

Each application that wants to provide unique Web services can run on a different port. Most major applications have a specific port that they listen on, and they register this port information with an organization called the Internet Assigned Numbers Authority (IANA). You can see a list of applications and the ports they use at the IANA Registry. With developers registering the ports their applications use with IANA, the chances of two programs attempting to use the same port, and therefore causing a conflict, are reduced.

LAN and WAN Subnetworks Under IP

Remember that TCP can only perform its work if it has an Internet Protocol service available to rely on (and therefore software to do TCP is always bundled with IP when distributed for use). The Internet Protocol communications process itself is made up of four layers (remember that each layer in the OSI can be implemented by sublayers within that layer): Subnetwork, IP, Transport, and Application. Each layer provides service(s) to the layer above it, and depends on the service(s) offered by the layer below it. Let's focus on the subnetwork layer of the IP model. Local Area Network (LAN) and Wide Area Network (WAN) subnetwork technologies are similar, but different enough to merit different marketing strategies and technical specialists.

In fact, the acronyms LAN and WAN are based on general conventions and marketing - more than any one difference in the specific technologies. The following table is a good guide to the differences. Get the basic overview here, read about the subnetworking layer after this table, and then read more about WAN technologies. Almost everything I have presented in class to date has been presented with the LAN in mind, since I have my own LAN in my house and Providence does not have a public WAN yet like some communities (Philadelphia had to fight the big telephone companies in court to provide a free public WAN to city residents). When I lived in Christchurch, New Zealand for six months (avoiding the northern winter of course), there was one family that lived up on the high bluff that provided a WAN for 30,000 residents - out of the goodness of their heart! Perhaps your town will provide a free public WAN someday, but be careful what you ask for...

LAN versus WAN

Definition:
LAN: LAN (Local Area Network) is a computer network covering a small geographic area, like a home, office, or group of buildings.
WAN: WAN (Wide Area Network) is a computer network that covers a broad area (i.e., any network whose communications links cross metropolitan, regional, or national boundaries).

Example:
LAN: Network in an organisation can be a LAN.
WAN: The Internet as a whole is the ultimate example of a WAN.

Ownership:
LAN: Typically owned, controlled, and managed by a single person or organization.
WAN: WANs (like the Internet) are not owned by any one organization but rather exist under collective or distributed ownership and management.

Technology:
LAN: Tend to use certain connectivity technologies, primarily Ethernet and Token Ring.
WAN: WANs tend to use technology like ATM, Frame Relay and X.25 for connectivity over the longer distances.

Data transfer rates:
LAN: LANs have a high data transfer rate.
WAN: WANs have a lower data transfer rate as compared to LANs.

Geographical spread:
LAN: Have a small geographical range and do not need any leased telecommunication lines.
WAN: Have a large geographical range generally spreading across boundaries and need leased telecommunication lines.

Connection:
LAN: One LAN can be connected to other LANs over any distance via telephone lines and radio waves.
WAN: Computers connected to a wide-area network are often connected through public networks, such as the telephone system. They can also be connected through leased lines or satellites.

Set-up costs:
LAN: If there is a need to set up a couple of extra devices on the network, it is not very expensive to do that.
WAN: In this case, since networks in remote areas have to be connected, the set-up costs are higher.

What services does IP need from the LAN or WAN subnetwork layer? IP is an unreliable datagram service (meaning you can't count on each datagram packet reaching its destination without the help of other services - like, for example, TCP), so to use IP, the link layer need not offer reliable delivery or any performance guarantees. What IP needs is simply a service that will transport its information packets from one IP-speaking device to the next. However, near the end of the 1990s, more and more new subnetwork technologies were developed, such as ATM LAN Emulation and 10/100/1000 Mbps Ethernet. With these new services, an IP "Type of Service" byte (just some additional 0s and 1s within the header of each packet) was used to implement so-called Differentiated Services, which may allow different classes of service within an IP-based network, other than the generic default which is best effort (the one most of us get from our cable or phone service). At the same time, newer applications of IP were being developed to take advantage of minimum quality of service capabilities - quite often to provide enhanced "multimedia" services as multimedia became more mature on the Web. Many providers created Quality of Service (QoS) plans so that subscribers could pay different prices based on the level of service they wanted guaranteed.

Traditional IP-enabled applications are for "data" transfers, in varying amounts, with generally few time-related performance constraints, except for interactive applications such as remote computer terminals. In the Internet of the late 20th century, data-oriented applications were clearly dominant, as they had been since the Internet's inception. Rather than leave the impression that data protocols have no timing requirements, note that remote machine access and other protocols, with whose applications users interact directly, are delay-sensitive for that reason alone. If these protocols perform sluggishly, the users will be unproductive, and perhaps unhappy, too.

Emerging applications continue to create requirements for IP-based networks to support enhanced classes of service. A small list of such emerging IP-based applications includes: voice-over IP (VoIP), video-streaming and audio-streaming, and video-conferencing and audio-conferencing. Web enhancements required to support these new applications often boil down to some statistical assurance that most packets will arrive within a certain maximum amount of time (i.e., packets will have bounded delay). Time-sensitive applications such as interactive voice may also require that a certain bandwidth be reserved, or that the variation of packet delay (also known as jitter) be kept within certain levels.
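Bounded delay and jitter are just arithmetic on timestamps, so a tiny worked sketch may help. The numbers and function names below are made up for illustration, and the jitter measure used here (the average change in delay from one packet to the next) is one simple definition, not the smoothed estimator that real streaming protocols tend to use.

```python
from statistics import mean


def one_way_delays(sent_ms, received_ms):
    """Delay of each packet: arrival time minus send time, in milliseconds."""
    return [received - sent for sent, received in zip(sent_ms, received_ms)]


def within_bound(delays, max_delay_ms):
    """Bounded delay: did every packet arrive within the agreed maximum?"""
    return all(delay <= max_delay_ms for delay in delays)


def mean_jitter(delays):
    """Jitter: average absolute change in delay between consecutive packets."""
    changes = [abs(later - earlier) for earlier, later in zip(delays, delays[1:])]
    return mean(changes) if changes else 0.0


# Hypothetical voice packets sent every 20 ms.
SENT_MS = [0, 20, 40, 60, 80]
RECEIVED_MS = [50, 72, 95, 108, 135]
```

For these made-up numbers the delays come out as 50, 52, 55, 48 and 55 ms, so the stream meets a 60 ms bound but not a 50 ms one, and the packet-to-packet changes of 2, 3, 7 and 7 ms average to a jitter of 4.75 ms.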

The topic of multimedia networking has already filled many books, and for now it will suffice to observe that such applications appear likely to be heavily used on the Internet for the foreseeable future. Certainly, VoIP is real enough that Internet Telephony Service Providers (ITSPs) are sprouting up, such as Vonage, Qwest, and others. Even established telephony providers such as Bell Atlantic and AT&T are willing to make billion-dollar-scale investments in VoIP technology. If the Internet can continue to successfully enhance voice quality, which is one of the most demanding of all "multimedia" applications, the floodgates are open on a whole new class of both data and non-data applications that can leverage these new capabilities.

At a minimum, WAN and LAN link layer protocols provide for the encapsulation and transmission of higher-layer protocol packets, including IP packets. The link-layer encapsulation enables the higher-layer protocol's packet to travel through the subnetwork medium and be distinguished from other packets, which may be IP packets or packets of some other protocol stack (the term used to refer to a hierarchical interaction of protocols in software or hardware). Framing, or the process of prepending a data-link header (and, optionally, appending a trailer) to the higher-layer protocol packet, provides for the synchronous transmission of large sequences of data. Each frame starts with a pattern that allows the destination station to synchronize its clock with the transmitter. Keep this bit-level clock synchronization separate in your mind from the ordering problem in UDP: UDP packets can arrive out of order, and since UDP itself carries no sequence number or timestamp, any re-ordering has to be done by the application using information it places inside each packet.
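Framing is easier to picture once you see the bytes. The sketch below wraps a higher-layer packet in a header and a trailer and then unwraps it again. The frame layout is hypothetical (a 2-byte start pattern, a 2-byte protocol type, a 2-byte length, and a 4-byte CRC-32 trailer); it is my own invention for illustration and is not the layout of Ethernet or any other real link layer.

```python
import struct
import zlib

START_PATTERN = b"\xaa\x55"        # hypothetical marker for "a frame begins here"
HEADER = struct.Struct("!2sHH")    # start pattern, protocol type, payload length
TRAILER = struct.Struct("!I")      # frame check sequence (CRC-32)


def encapsulate(packet: bytes, protocol: int) -> bytes:
    """Prepend a header and append a trailer to a higher-layer packet."""
    header = HEADER.pack(START_PATTERN, protocol, len(packet))
    check = zlib.crc32(header + packet)
    return header + packet + TRAILER.pack(check)


def decapsulate(frame: bytes):
    """Verify the frame and hand back (protocol, packet) to the layer above."""
    start, protocol, length = HEADER.unpack_from(frame, 0)
    if start != START_PATTERN:
        raise ValueError("frame does not begin with the start pattern")
    body_end = HEADER.size + length
    (check,) = TRAILER.unpack_from(frame, body_end)
    if zlib.crc32(frame[:body_end]) != check:
        raise ValueError("frame check failed: the frame was damaged in transit")
    return protocol, frame[HEADER.size:body_end]
```

The protocol type field is what lets the receiver tell an IP packet from a packet of some other protocol stack sharing the same wire, and the trailer is what lets it notice a frame that was garbled along the way.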

The frame synchronization pattern, which is transmitted immediately prior to the actual start of the frame, allows the receiver's hardware to synchronize itself to the transmitter's exact frequency. Once the receiver has been synchronized to the transmitter's frequency, it can then maintain synchronization throughout the duration of the frame. Synchronous subnetwork protocols, such as those that are managed on a LAN or WAN, require that timing be maintained over an entire frame (up to many thousands of bits). Clock recovery begins at the framing sequence and then the bits are transmitted such that the bit stream is self-clocking, once synchronization has been established.

Why is synchronization important? If a LAN is supposed to run at 10 Mbps, there is no guarantee that every device on the network will have precisely identical clock frequencies. This is due to manufacturing differences, power supply voltage differences, and environmental variables such as the temperature, the age of the clock chip, and possibly even the ambient humidity of the air in different areas of a building. Considering these real-world factors, it would be a bad idea for each device to use its own local clock to receive data bits from another sender, since the transmitted bit timing would almost certainly not match up with the receiving station's bit frequency, ensuring data corruption.
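A little arithmetic shows how quickly two unsynchronized clocks part ways. In this sketch I assume the receiver's clock is off by 100 parts per million and that reading a bit goes wrong once the sampling point has drifted by half a bit; both numbers are illustrative assumptions of mine, not figures from any standard, and the function names are hypothetical.

```python
def bits_until_slip(offset_ppm: float, tolerance_bits: float = 0.5) -> float:
    """How many bits pass before the drift adds up to `tolerance_bits`.

    Each bit adds offset_ppm / 1,000,000 of a bit of timing error.
    """
    return tolerance_bits * 1_000_000 / offset_ppm


def seconds_until_slip(offset_ppm: float, bit_rate_bps: float,
                       tolerance_bits: float = 0.5) -> float:
    """The same drift budget expressed as time on the wire."""
    return bits_until_slip(offset_ppm, tolerance_bits) / bit_rate_bps
```

With a 100 ppm mismatch, bits_until_slip(100) works out to 5,000 bits, which on a 10 Mbps LAN is only about half a millisecond of transmission. That is well within the length of a single large frame, which is why the receiver locks onto the sender's clock at the start of every frame instead of trusting its own.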

I write all this to emphasize the two important points to you regarding our transport architecture decisions:

First, TCP builds coherency on top of a network connection by adding different packet transmissions outside of the main information being sent. These packets go back and forth between sender and recipient to acknowledge the receipt, compare its fidelity to the sent message, and request retransmissions when something is missed or garbled at the destination. TCP works remarkably well, but it is a control freak in that it slows down the transmission to keep things coherent - even telling the sender to stop transmitting packets for a while so the recipient's end can catch up. The number of these extra packets going back and forth can get quite high - and at the worst time! More correcting packets need to be sent when the Internet is sluggish, which makes it more sluggish. It has a magnifying feedback effect which can be deadly for technologies. Most routers have logic to drop a connection when the sluggishness reaches a certain threshold. The idea is that it is better to have a minimum level of service for fewer connections than an unacceptable level of service for more connections.
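You can see the extra traffic pile up with a toy model. The sketch below is a deliberately simplified "send one piece, wait for the acknowledgment, resend if none comes" scheme; real TCP is far more sophisticated (sliding windows, timers, congestion control), so treat this only as an illustration of the feedback effect. The function name and the way losses are described are my own assumptions.

```python
def send_reliably(segments, lost_attempts=frozenset()):
    """Deliver every segment in order, retransmitting whatever the network loses.

    lost_attempts holds (segment_index, attempt_number) pairs that the
    network drops; attempt numbers start at 1.
    Returns (delivered_segments, data_packets_sent, acknowledgments_sent).
    """
    delivered = []
    data_packets = 0
    acknowledgments = 0
    for index, segment in enumerate(segments):
        attempt = 0
        while True:
            attempt += 1
            data_packets += 1
            if (index, attempt) in lost_attempts:
                continue              # no acknowledgment comes back, so resend
            delivered.append(segment)
            acknowledgments += 1      # the recipient confirms this segment
            break
    return delivered, data_packets, acknowledgments
```

On a clean network, three segments cost three data packets plus three acknowledgments. If the second segment is lost twice, the same message costs five data packets plus three acknowledgments, so the network carries more traffic at exactly the moment it is already struggling.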

Second, the UDP scenario is quite different from the one above, right? No additional packets no matter what the situation - the recipient is stuck with doing the best it can with what makes it and in what order it arrives. The routers can be made more intelligent to give priority to some packets over others (think about the governing effect of that ability), but once a packet passes through a router, that router never hears of it again (unless some circular loop gets formed by a bad routing table in one router - but in that case the feedback mechanism is overwhelming and the router shuts down in very short order). There are no fixed paths between source and destination, which means the routers are very flexible and one can opt out of participating once it gets overloaded. Very nice indeed! You might get the sense that the Web has been moving heavily towards UDP and away from TCP/IP. And yet, HTTP, which you use to get your Web pages from Web servers around the world, continues to be built upon TCP/IP - it is not going away any day soon!
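Here is what "doing the best it can" might look like on the receiving end. UDP itself does not number its packets, so this sketch assumes the application has put its own sequence number in each one; that numbering, the function name, and the "?" filler are all assumptions of mine for illustration.

```python
def best_effort_reassemble(arrived, expected_count, filler="?"):
    """Rebuild a message from whatever datagrams made it, in whatever order.

    arrived is a list of (sequence_number, payload) pairs as received.
    Missing pieces are replaced by `filler`, like static in a radio signal.
    """
    by_sequence = {sequence: payload for sequence, payload in arrived}
    return [by_sequence.get(sequence, filler) for sequence in range(expected_count)]
```

For example, if four packets were sent and the network delivered number 2, then 0, then 3, the call best_effort_reassemble([(2, "C"), (0, "A"), (3, "D")], 4) gives back ["A", "?", "C", "D"]. Nothing was requested again; the recipient simply carries on with a gap where packet 1 should have been.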

WANs

So, for those of you who are interested in WANs since they seem to be coming about in many municipalities, you can spend some time reading about them here. But, know that I am not really focusing on the WAN in the rest of the course because we have enough to learn about with the LAN and the LAN is something we might set up or buy for our homes as more devices become Web-enabled (remember that refrigerator I was talking about).

What technology must we look at when using a WAN? How do WANs work?

Well, a WAN does not use Ethernet (way too expensive to connect every device in a huge area with copper-based cables and insulation) - a WAN is something slightly different which uses other technologies by convention.

The first option is to use analog lines, and in this scenario, we usually have an analog modem, pretty much like a modem that you may have used to dial up to your Internet Service Provider (ISP) years ago. This was a very common way of connecting to the Internet, and in this mechanism we have a PC connected to a modem, which can dial up from time to time to make a connection to a modem at the ISP, which is in turn connected to a LAN. By dialing up, we are extending the LAN as a WAN.

The difference between an analog modem and a dial-up modem is that an analog modem doesn't dial. On the other side we would have an analog modem as well, so we have a local client and a remote client and between the two, we have a telephone company supplied piece of copper cabling. How the internal service of this supplied copper cabling works is again out of the scope of this course but really what this means is that we can now connect a local office to a remote office.

There are disadvantages to analog modems:

1. Analog WANs in general are slow - much slower than broadband cable modems (typically you can only get up to 4 Kbps across an analog connection, which can be magnified to 56 Kbps with tricky compression and decompression processes in software that manipulates the data before transmitting it and after receiving it).

2. The other disadvantage of analog lines is that this piece of copper is not guaranteed. What that means is that every time there is rain or static or exceptionally dry conditions, there might be problems on this piece of copper line.

3. Telephone companies usually don't guarantee any degree of service across an analog line.

Some advantages could be that they are cheap - they're much cheaper than any other communication mechanisms with the exception of possibly using wireless, so they still are in fairly high demand in some parts of the world (for example, in South Africa there are still quite a number of installations of analogue circuits).

The second option is to use digital lines (T1, E1, ADSL, etc.) as the means of connecting a remote and a local server together by a digital wire, and again this would usually be supplied by your local telephone company. A digital wire can run much faster because it's a digital signal that's being transmitted, which means there is no conversion between an analog signal and a digital one. Think of a modem - when you dial up to the Internet you hear the buzzing, crackling and wheezing of the modem while it's converting your digital bits coming out of the PC into analog sound and sending them across a piece of wire - those are analog frequencies which vary across a spectrum. In digital mode, with a digital line, there is no conversion happening, which means it's much faster.

With digital transmission, on both the local and remote side there is a Network Terminating Unit, what they call an NTU. An NTU is equivalent to a modem. An NTU's job is to provide an interface to which we can connect our devices. In this scenario, we are transmitting digital data down this line rather than analog data. The disadvantage with digital is that it's expensive in most places in the world. Telephone companies like to define their digital lines as T1 (and a bundle of T1s as a T3) and E1 lines, where T1 is about 1.5 Mbps and E1 is about 2 Mbps.

The latest technology is ADSL, which is Asymmetric Digital Subscriber Line. This is a digital line, so we get the digital connection between the two, but the asymmetric part means that the download speed can be anywhere between 8 and 15 Mbps while the upload speed is restricted to between approximately 256 Kbps and 2 Mbps (this will depend on your Telecom provider). In other words, it doesn't send and receive at the same speed. ADSL is only now being rolled out in some places in the world.

The third WAN technology is another form of dial-up line bundles - digital Integrated Services Digital Network (ISDN)

ISDN offers a dial-up digital line instead of a dial-up analogue line. It uses a technology where it offers three lines at the same time:

1. a B channel
2. another B channel
3. a D channel

The D channel is the signaling channel (the B channels are the ones that carry our data) - it's the channel used to communicate between the ISDN equipment, and it's not available for us to communicate on, but it runs at 16 Kbps. Each B channel can run at 64 Kbps. So in fact, with ISDN we've got a maximum of 128 Kbps of bandwidth when we use both B channels. One advantage of ISDN is that we can either use both B channels and get 128 Kbps, or use a single B channel (64 Kbps) and reserve the remaining B channel for telephone or fax communication while simultaneously being attached to the network. The two B channels and a D channel offer us more flexibility, and the dial-up is digital rather than analog. Another advantage of ISDN, apart from the higher speed, is the connection time. The time to connect with an ISDN service is often less than 4 seconds. In other words, from the time that you dial to your ISP until the time that you are actually connected and can start surfing the Web is less than 4 seconds.
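The ISDN numbers above are easy to play with. This small sketch uses the channel speeds from the text; the function names are mine, and I assume that Kbps means 1,000 bits per second and that a "megabyte" is 1,000,000 bytes, just to keep the arithmetic clean.

```python
B_CHANNEL_KBPS = 64   # each B channel, per the text
D_CHANNEL_KBPS = 16   # the D channel, per the text


def data_rate_kbps(b_channels_for_data: int) -> int:
    """Bandwidth available to us when 0, 1, or 2 B channels carry data."""
    if b_channels_for_data not in (0, 1, 2):
        raise ValueError("ISDN as described here offers exactly two B channels")
    return b_channels_for_data * B_CHANNEL_KBPS


def line_rate_kbps() -> int:
    """Everything on the line: two B channels plus the D channel."""
    return 2 * B_CHANNEL_KBPS + D_CHANNEL_KBPS


def transfer_seconds(size_bytes: int, rate_kbps: float) -> float:
    """Time to move a file of `size_bytes` at `rate_kbps`, ignoring overhead."""
    return size_bytes * 8 / (rate_kbps * 1000)
```

Under those assumptions a 1,000,000-byte file takes 62.5 seconds over both B channels and 125 seconds over one, which is the price of keeping the second B channel free for the telephone or the fax machine.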

Others: Wi-Fi and ATM

Wi-Fi is a technology for connecting clients remotely and is the fastest growing technology offered by all the major players in this market. Wi-Fi, or 802.11g, is wireless connectivity offering connections between 11 and 54 Mbps and even higher. The advantage of wireless technology is its lack of any need for physical wire, copper, or fiber to connect to the client. In the past we've had a modem in some form, connected by a physical piece of wire to another modem; the wire is now gone and we have a dish or an antenna talking to another antenna.

Another means of connecting is Asynchronous Transfer Mode (ATM), and this certainly offers the fastest Wide Area Connection available today. Speeds start at 155 Mbps and run to approximately 622 Mbps, although with recent technology, we can expect speeds to be significantly higher. If you compare that to our LAN running at 1000 Mbps, 622 Mbps is only running roughly 38% slower than our Ethernet. So clearly this is where WANs are moving. Higher bandwidth is demanded, and this can only be delivered by these types of technology at speeds high enough to satisfy the need for bandwidth. In South Africa the Telecoms company uses a combination of microwave and ATM technology to deliver service between Johannesburg, Durban and Cape Town, the three main centers. This technology can carry voice, video and data at great enough speeds to ensure some quality of service. South Africa is representative of where the second world is headed with WANs - and the second and third world need WANs to get more people connected to the world so that they don't fall further and further behind on important health, climate, financial, and you name it global issues.
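To put the WAN options side by side, here is a short sketch that compares some of the speeds quoted in this lecture. The dictionary keys and function names are my own, the figures are the nominal ones from the text (56 Kbps analog, 128 Kbps ISDN, 1.5 Mbps T1, 155 and 622 Mbps ATM, 1000 Mbps LAN), and real-world throughput would be lower once protocol overhead is counted.

```python
# Nominal link speeds in megabits per second, as quoted in the lecture.
LINK_MBPS = {
    "analog modem": 0.056,
    "ISDN (both B channels)": 0.128,
    "T1": 1.5,
    "ATM (low end)": 155,
    "ATM (high end)": 622,
    "LAN": 1000,
}


def percent_slower(link_mbps: float, reference_mbps: float) -> float:
    """How much slower a link is than a reference link, as a percentage."""
    return (reference_mbps - link_mbps) / reference_mbps * 100


def seconds_to_send(megabytes: float, link_mbps: float) -> float:
    """Time to move a file, taking 1 megabyte as 8 megabits and ignoring overhead."""
    return megabytes * 8 / link_mbps
```

The comparison in the paragraph above is percent_slower(622, 1000), which comes to about 38 percent. By the same arithmetic a 100 megabyte file needs under a second on the LAN, a little over a second on the fastest ATM link, and roughly four hours on the analog modem.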

Now, a few more thoughts to get us back to the LAN (which you now know needs to be connected to the WAN unless it is just an intranet that is not going to be accessible to the outside world - but usually intranets are made by software and not hardware decisions because the people on the intranet want to get to the outside world - in our lab in Seattle, we ran both as separate distinct networks but kept a splitter in each wall to combine the two for the device user - very fancy and nice for keeping private stuff private, but very expensive and (so I think) wasteful for just a technology research organization)!

The bottom line is that no matter how you slice it, all LAN and WAN types need to be interconnected - it is not practical for one LAN to span the whole world, or even an entire company (unless it is very small). LAN interconnection devices typically operate at either OSI Layer 1 (the Physical layer), Layer 2 (the Data Link layer), or Layer 3 (the Network layer) - you know the devices I refer to from week 2.

Frames (the message plus the special header and trailer that identifies processing needs) that arrive on a LAN interconnection device's interface have three processing choices: 1) simply flow straight through the device as a single stream of packets, 2) be received by the device, which verifies the layer-2 frame check sequence to ensure error-free reception then forwards the packet based on its layer-2 destination address, or 3) be received by the device, pass layer-2 input error checks, and then be forwarded based on the layer-3 destination address, after having proper outgoing layer-2 header and trailer attached. Each layer depends on the one immediately below it.
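The three processing choices can be sketched as three small functions, one per layer. Everything here is a simplification of my own: the Frame fields, the addresses as plain strings, the lookup tables as dictionaries, and the CRC-32 standing in for the frame check sequence are all illustrative assumptions. In particular, a real router picks a route by matching address prefixes, not by exact lookup.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Frame:
    dst_mac: str      # layer-2 destination address
    dst_ip: str       # layer-3 destination address carried inside
    payload: bytes
    fcs: int          # frame check sequence over the payload


def make_frame(dst_mac: str, dst_ip: str, payload: bytes) -> Frame:
    return Frame(dst_mac, dst_ip, payload, zlib.crc32(payload))


def fcs_ok(frame: Frame) -> bool:
    return zlib.crc32(frame.payload) == frame.fcs


def layer1_repeat(frame: Frame, in_port: int, ports: list) -> list:
    """Choice 1: no checks at all; the stream flows out of every other port."""
    return [port for port in ports if port != in_port]


def layer2_forward(frame: Frame, mac_table: dict):
    """Choice 2: verify the frame, then forward on the layer-2 address."""
    if not fcs_ok(frame):
        return None                          # damaged frame is discarded
    return mac_table.get(frame.dst_mac)      # None if the address is unknown


def layer3_forward(frame: Frame, routes: dict, next_hop_mac: dict):
    """Choice 3: verify, pick a port by layer-3 address, attach new layer-2 framing."""
    if not fcs_ok(frame):
        return None
    port = routes.get(frame.dst_ip)
    if port is None:
        return None                          # no route known for this destination
    return port, make_frame(next_hop_mac[port], frame.dst_ip, frame.payload)
```

Notice how each choice does everything the one before it does and then a bit more, which mirrors the way each layer leans on the one beneath it.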

We start with abstract ideas that we give names to, but each abstract idea needs to be implemented using the OSI layers all the way down to the physical layer (which we spend very little time on and yet is very interesting because it is based on the natural physics of our natural world).