Web Architecture

Week 3 Technical Information

IP Addresses, Domain Name System, Name Servers, Virtual Addresses, and Ping as a useful learning tool

This week we are focusing on how a network is formed among devices on the Internet that want to communicate as a Web of devices using the Web architecture available. The concepts you are reading about in this document are the backbone of what makes the traditional Web so powerful - connecting any two entities in the world via a common communication platform.

IP Addresses

An IP address (Internet Protocol address) is a unique identifying number given to every single computer on the Internet. Like a car license plate, an IP address is a special serial number used for identification by routers and other devices that want to contact a specific virtual location on the Web. IPv4 is the current standard IP addressing scheme used in the United States. This scheme gives each registered entity requiring an address a number which is made somewhat more human-readable by breaking it into four sub-numbers with a period between each pair of sub-numbers. For example, our www.hitl.washington.edu Web server is registered as an accessible entity via its IP address, which is:

128.208.63.17

That number is unique in the world to the Web server I have made available to this class. The server sits in a copier room in Seattle, Washington that happens to have a very fast (or high-bandwidth) communications service for all devices in the room. The devices in that room communicate with each other via a router that keeps each address associated with a specific cable that is attached to each port of the router. Our Web server is just one of the devices plugged into that router. The highest numbered IP address via the IPv4 standard is:

255.255.255.255

and, not surprisingly, the lowest numbered IP address available via the IPv4 standard is:

0.0.0.0

So, you have probably guessed that means each sub-number can range from 0 to 255 - you are correct. That gives us 4,294,967,296 (256x256x256x256) possible numbers to give out as unique IP addresses. That is less than one number for every human being alive on the planet today. We don't have enough numbers for everyone to connect to the Web via a cell phone at the same time and be able to be contacted directly via an IP address. Ranges of IP addresses were distributed by country when they were first made available.
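To make that arithmetic concrete, here is a minimal Python sketch. The function name dotted_to_int is my own invention for illustration; the only library call is the standard ipaddress module, used as a cross-check. It shows that a dotted address is really one 32-bit number written in four pieces.

```python
import ipaddress


def dotted_to_int(dotted):
    """Convert a dotted IPv4 string into the single 32-bit number it stands for."""
    parts = [int(piece) for piece in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid IPv4 address: {dotted}")
    a, b, c, d = parts
    # Each sub-number fills one byte (8 bits) of the 32-bit address.
    return (a << 24) | (b << 16) | (c << 8) | d


print(256 ** 4)                            # 4294967296 possible addresses
print(dotted_to_int("0.0.0.0"))            # 0
print(dotted_to_int("255.255.255.255"))    # 4294967295
# The standard library gives the same number as the hand-rolled version.
print(dotted_to_int("128.208.63.17") == int(ipaddress.IPv4Address("128.208.63.17")))
```

Counting from 0 up to 4,294,967,295 gives exactly the 4,294,967,296 addresses mentioned above.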

China and Korea were not given enough based on their plans for using IP addresses. As a result, there is an IPv6 standard that lengthens an IP address from 32 bits to 128 bits, written as eight groups of hexadecimal digits separated by colons instead of four decimal sub-numbers separated by periods. Any organization can broadcast their devices as IPv6 devices, but receiving devices that only speak IPv4 will not be able to understand the longer addresses. You should read up on the IPv6 standard if you want to stay ahead of the curve as to where we might be headed with IP addresses (I mean, if we want every refrigerator and automobile to have an IP address, we need lots of them, right?). It is a thoughtful standard with some very nice features that will make the Web better overall. But, it costs more to adopt the IPv6 standard than to refrain from doing so. IPv4 looks like it will be with us for a long time. We'll only use IPv4 in this class for simplification. It is representative of the architecture decisions that need to be made and you can think creatively about everything you might want to think about just by considering IPv4.

Organizations use Domain Controller devices to deal with the explosion of devices connected to the Web. With a domain controller, you can request a single external IP address for many devices and program the domain controller to keep track of your organizational devices while representing them as a single IP address to the external world. Every external request that is made to your organization comes into the domain controller (think of it as a basic router-server mix of services) as it is the only externally addressed device available via the IPv4 addressing scheme - the domain controller passes on all communication messages based on an internal addressing scheme that only the domain controller needs to understand. Many organizations like this because it means there is an obvious place in which to place security measures for the whole organization. The domain controller can delete spam, check for faulty connections to devices, keep a record of all transactions on mission critical applications, etc. The domain controller is often the device that manages the Domain Name Service for the organization. What is a Domain Name Service you ask? Well, it is a service that implements use of the Domain Name System...

Domain Name System

You will see I assigned an academic paper this week for you to read. The paper is highly regarded and is representative of how Web architects communicate with each other when publishing studies and new ideas relative to Web-enabling concepts. I spent years wading through such papers before I became efficient at reading and processing them in my mind. You have to start somewhere and so I offer you a very insightful paper as your perhaps first academic paper on a Web architecture issue. Take your time and learn what you can from the paper. What follows here is a very simple discussion of the DNS concept mentioned so often in the paper.

Internet architects made many decisions (as long ago as 1965) upon which Web architecture has been built without change. Although the devices of the Web speak IPv4 addresses fluently and reliably without confusion, the numbers weren't considered an optimal communication approach to sharing device addresses among human beings. The Domain Name System protocol was developed and defined in the early 1980s and published by the Internet Engineering Task Force. These architects agreed upon the domain name as an equivalent naming scheme for addresses on the Internet - we use the domain name concept still today, often adding a 'www.' to the front of an existing domain name to suggest new features bundled upon new Web capabilities. Newer organizations don't need to use the 'www.' prefix to their domain names. Previously, www.oworld.org was the domain name for this class' server that Web routers referred to by 128.208.70.92. That is correct. OWorld is a newer organization that grew up with the Web. There was no reason for me to give it a 'www.' prefix since the oworld.org domain name did not already exist. I was just too enamored with the 'www.' craze to do my due diligence into best naming methods for domain names. The ramification was that I typed www thousands of times when I really did not need to, which I could have avoided had I set it up right from the start (yes, I eventually did set up an alias to the domain name so that you don't have to type the www to access the OWorld domain).

The official line is that "The Domain Name System (DNS) is a hierarchical naming system for computers, services, or any resource participating in the Internet". Nicely said. Let's continue to read an official sounding explanation for a while...

The DNS associates various information identifiers with domain names assigned to such participants. Most importantly, it translates domain names meaningful to humans into the numerical (binary) identifiers associated with networking equipment for the purpose of locating and addressing these devices world-wide. An often used analogy to explain the Domain Name System is that it serves as the "phone book" for the Internet by translating human-friendly computer hostnames into IP addresses. For example, www.example.com translates to 208.77.188.166.

The Domain Name System makes it possible to assign domain names to groups of Internet users in a meaningful way, independent of each user's physical location. Because of this, World-Wide Web (WWW) hyperlinks and Internet contact information can remain consistent and constant even if the current Internet routing arrangements change or the participant uses a mobile device. Internet domain names are easier to remember than IP addresses such as 208.77.188.166 (IPv4) or 2001:db8:1f70::999:de8:7648:6e8 (IPv6). People take advantage of this when they recite meaningful URLs and e-mail addresses without having to know how the machine will actually locate them.
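If you want to poke at the two example addresses yourself, here is a small Python sketch using the standard ipaddress module. Nothing here touches the network; it only parses the text of each address and reports how many bits each one uses.

```python
import ipaddress

v4 = ipaddress.ip_address("208.77.188.166")
v6 = ipaddress.ip_address("2001:db8:1f70::999:de8:7648:6e8")

# version is 4 or 6; max_prefixlen is the address length in bits.
print(v4.version, v4.max_prefixlen)   # 4 32
print(v6.version, v6.max_prefixlen)   # 6 128

# The "::" in the IPv6 text is shorthand for a run of zero groups.
# The exploded form writes out all eight groups in full.
print(v6.exploded)   # 2001:0db8:1f70:0000:0999:0de8:7648:06e8
```

Seeing the exploded form next to the short form makes it easy to appreciate why people would rather remember a domain name.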

The Domain Name System distributes the responsibility of assigning domain names and mapping those names to IP addresses by designating authoritative name servers for each domain. Authoritative name servers are assigned to be responsible for their particular domains, and in turn can assign other authoritative name servers for their sub-domains. This mechanism has made the DNS distributed, fault tolerant, and helped avoid the need for a single central register to be continually consulted and updated. In general, the Domain Name System also stores other types of information, such as the list of mail servers that accept email for a given Internet domain. By providing a world-wide, distributed keyword-based redirection service, the Domain Name System is an essential component of the functionality of the Internet.
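The delegation idea in that paragraph can be sketched as a toy model in Python. Everything in it is an assumption made for illustration: the server names (root, edu-servers, uw-servers, cs-servers), the way the table is laid out, and the address 192.0.2.10, which comes from a range reserved for documentation. Real name servers speak the DNS protocol over the network; this sketch only shows how each server either answers or points to a more specific server.

```python
# Hypothetical data: each server holds some records and delegates some sub-domains.
ZONES = {
    "root": {"records": {}, "delegations": {"edu": "edu-servers"}},
    "edu-servers": {"records": {}, "delegations": {"washington.edu": "uw-servers"}},
    "uw-servers": {
        "records": {"www.hitl.washington.edu": "128.208.63.17"},
        "delegations": {"cs.washington.edu": "cs-servers"},
    },
    "cs-servers": {
        "records": {"www.cs.washington.edu": "192.0.2.10"},  # placeholder address
        "delegations": {},
    },
}


def resolve(name, zone="root"):
    """Follow delegations from the root until some server holds a record for name.

    Returns (address, list of servers asked along the way).
    """
    path = [zone]
    while True:
        data = ZONES[zone]
        if name in data["records"]:
            return data["records"][name], path
        # Pick the most specific delegated suffix that covers the name.
        matches = [s for s in data["delegations"]
                   if name == s or name.endswith("." + s)]
        if not matches:
            raise LookupError(f"no server knows {name}")
        zone = data["delegations"][max(matches, key=len)]
        path.append(zone)


address, asked = resolve("www.cs.washington.edu")
print(address, asked)
```

No single table holds every name. Each server only needs to know its own records and who is responsible for the level below it, which is what lets the real system avoid a central register.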

The addressing process has been so useful that other identifiers such as RFID tags, UPC codes, International characters in e-mail addresses and host names, and a variety of other identifiers could all eventually utilize DNS. If you think like me, even the phone system could converge on an IPv6 addressing scheme to make phone systems around the world more consistent for the global village.

The Domain Name System also defines the technical underpinnings of the functionality of this database service. For this purpose it defines the DNS protocol, a detailed specification of the data structures and communication exchanges used in DNS, as part of the Internet Protocol Suite (TCP/IP) that we will be looking at next week. The database architecture that led to a widely-accepted specification is critical for routers to be able to communicate with each other and connect two (or more) devices anywhere in the world into one communication session. These databases need to work so quickly and handle addresses from all around the world instantaneously. As a result, the database specification is often etched into the hardware of the communications hardware responsible for reliable address processing.

Name Servers

It is critical that a domain name be translated to the correct IP address. When you send out a private e-mail message, you rely on the fact it will only reach a person in the organization you are sending it to (if I send a message to my friend billybob@wisc.edu, I only want the University of Wisconsin-Madison e-mail server to get access to its contents). Well, we have name servers that make sure that happens all around the world. A name server (also called 'nameserver') consists of a program or computer server that implements a name-service protocol. It will normally map (i.e. connect) a human-recognisable identifier of a host (for example, the domain name 'en.wikipedia.org') to its computer-recognisable identifier (such as the Internet Protocol (IP) address 208.80.154.225), and vice versa.
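Here is a hedged Python sketch of the two directions of mapping. The function names forward_lookup and reverse_lookup_name are mine. The forward direction leans on the operating system's resolver through socket.gethostbyname, so it needs a working network connection and its answer will depend on when and where you run it. The reverse direction shown here does not contact anything: it only builds the special in-addr.arpa name that a reverse query for an IPv4 address would ask about.

```python
import ipaddress
import socket


def forward_lookup(hostname, resolver=socket.gethostbyname):
    """Name to address. The resolver is swappable so the idea can be tried offline."""
    return resolver(hostname)


def reverse_lookup_name(ip):
    """Address to the name a reverse DNS query asks about (no network needed)."""
    return ipaddress.ip_address(ip).reverse_pointer


print(reverse_lookup_name("208.80.154.225"))   # 225.154.80.208.in-addr.arpa

# Uncomment to ask your own system's resolver (requires network access):
# print(forward_lookup("en.wikipedia.org"))
```

Notice that the reverse name lists the sub-numbers backwards. That puts the most general part of the address last, just like the most general part of a domain name.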

A very important type of name server is an international authoritative name server. These name servers maintain a list of all properly registered and maintained domain names that have been authorized to use an IP address. You can't write new authoritative records into the databases on these devices (or if you can, you are a world-class hacker). Only certain organizations that can register domain names have the right to send new records to the administrative groups that manage authoritative name servers (companies like Go Daddy and Register.com are popular domain registration organizations). The requests for domain name addresses get reviewed and approved regularly and the authoritative name servers are updated. Once the name servers are updated, they propagate their entries to all other name servers in the world. Each sub-name server (like perhaps a domain controller at one organization) can decide which domain names and addresses to keep alive in their local domain name databases. If a request is made for an address that is not in the local database, the device has to make a query to the Web. These queries are fascinating to get to understand but beyond the scope of this course. The paper I assigned you to read should give you a sense of it all in a meaningful way - it is the best thoughtful paper I have encountered on the subject in years (still very relevant since its first writing in 2002).

Let's look at a typical hierarchical domain naming process for an organization. You can access a list of technical people in the Computer Science Department at the University of Washington by typing http://www.cs.washington.edu/people/staff/#technical (which is called a Uniform Resource Locator, or URL, by the way) in a Web browser. How does the Web find that resource for you in a way that you know you can rely on its contents being authentic? First, note how quickly the page appeared in your Web browser and realize it was limited by the speed-of-light connection between where you sit and Seattle, Washington (twice the distance since it was a round-trip request). Then realize the following had to take place before the Web was willing to send you that page:
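Before walking through those steps, it can help to see the pieces of that URL pulled apart. This is a small Python sketch using urlsplit from the standard urllib.parse module. It only splits the text; it does not visit the page.

```python
from urllib.parse import urlsplit

url = "http://www.cs.washington.edu/people/staff/#technical"
parts = urlsplit(url)

print(parts.scheme)     # http
print(parts.hostname)   # www.cs.washington.edu
print(parts.path)       # /people/staff/
print(parts.fragment)   # technical
```

The hostname is the only piece the Domain Name System cares about. The path and the part after the # are left for the Web server and your browser to sort out.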

Your Web browser first looked at the request to see if you already had visited that page recently. If you had, the page could already have been on your computer and the browser could retrieve it again without having to go out on the Web again to retrieve it. That service is called the Web browser cache - it keeps recent documents so that the Web isn't burdened with having to retrieve them again (yes, more for the Web's benefit than your own, but it can work out nicely for you as well). A potential downside to this caching mechanism is that the page may have changed since you downloaded it last and your browser gives you the old one without checking to see if a new one is out there (that would defeat the purpose of not burdening the Web). You can always request a refresh or reload of a page to force the Web to go get it fresh for you again. Then again, if you are doing official research, you might need to keep the old page for your references. In that case, the cache is doing what you need it to do on your behalf.
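The caching behavior just described can be sketched in a few lines of Python. This is a deliberately simplified model and not how any particular browser is implemented: the class name PageCache and the stand-in fake_fetch function are assumptions for illustration, and real browsers also pay attention to expiry information sent by the server.

```python
class PageCache:
    """Toy browser cache: keep a copy of each page unless a reload is requested."""

    def __init__(self, fetch):
        self.fetch = fetch    # function that really goes out to the Web
        self.pages = {}       # url -> saved copy of the page

    def get(self, url, reload=False):
        if reload or url not in self.pages:
            self.pages[url] = self.fetch(url)   # only now is the Web bothered
        return self.pages[url]


# Stand-in for a real download so the sketch runs without a network.
trips_to_the_web = []

def fake_fetch(url):
    trips_to_the_web.append(url)
    return f"copy {len(trips_to_the_web)} of {url}"


cache = PageCache(fake_fetch)
url = "http://www.cs.washington.edu/people/staff/"
print(cache.get(url))                # first visit goes out to the Web
print(cache.get(url))                # second visit is served from the cache
print(cache.get(url, reload=True))   # reload forces a fresh copy
print(len(trips_to_the_web))         # 2
```

The second call returns the old copy without checking whether the page has changed, which is exactly the downside mentioned above.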

You probably had never been to this page before on your computer, right? So, the next thing your browser does is to see if you had been to any of the hierarchical addresses inherent in the domain name. If you had opened any resource at washington.edu, your browser knows how to talk with the domain controller there by IP address without bothering the Web to find out the address for you. Same goes for the cs.washington.edu which is even more specific to your needs. If you had been to cs.washington.edu anywhere in a URL request in the past, your browser would offer that more specific address. Otherwise, the washington.edu name server would provide the address of the cs.washington.edu for you. Now, the cs.washington.edu domain name pre-dates the Web! So, there is another level of naming that must be addressed for your request. The cs.washington.edu domain knows where the www.cs.washington.edu address is should you need to find it (your browser would already know it if you had requested any resource on that domain via a URL in the past). Now that you have found the address for the www.cs.washington.edu domain, the rest of your request (people/staff/#technical) can be processed by that domain controller. At this point, the addressing is mostly virtual outside the requirements of the Web architecture specification.
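The idea of trying the most specific name first and then falling back to broader ones can be sketched like this in Python. The function names and the table of already-known addresses are hypothetical, and 192.0.2.1 is a placeholder from a range reserved for documentation, not the real address of anything at the University of Washington.

```python
def name_hierarchy(hostname):
    """List a host name and its parent domains, most specific first."""
    labels = hostname.split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def closest_known(hostname, known_addresses):
    """Return the most specific (name, address) pair we already know, or None."""
    for candidate in name_hierarchy(hostname):
        if candidate in known_addresses:
            return candidate, known_addresses[candidate]
    return None


print(name_hierarchy("www.cs.washington.edu"))
# ['www.cs.washington.edu', 'cs.washington.edu', 'washington.edu', 'edu']

# Pretend we visited some washington.edu page earlier (placeholder address).
known = {"washington.edu": "192.0.2.1"}
print(closest_known("www.cs.washington.edu", known))
# ('washington.edu', '192.0.2.1')
```

With only washington.edu known, the search would start there and ask for cs.washington.edu next, rather than starting over from the top of the hierarchy.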

We need to look at virtual addresses to think about how our list of technical staff document would be delivered...

Virtual Addresses

A virtual address is a string of text that can be written for communications purposes but used by a device to go to that address by any means it wishes to use. We don't know if information about people is managed from a separate device from buildings at the University of Washington. And further down the hierarchy, we don't know if staff information is managed on a separate device from faculty or students. We do know that the technical staff directory is maintained on the same Web page as the teaching staff directory. We only know this by the fact the HTML standard reserves the cross-hatch or pound symbol (#) to refer to bookmarks on the same Web resource.

The best part of virtual addresses is that we don't have to care about how information is stored in order for us to access it! The UW computing staff can put all the information on the same Web server one day and then change it to separate servers the next (and back again the following day even). The key is that the domain controller associated with the official IP address knows how to find the information no matter how it is stored (the UW computing staff just needs to update the domain controller with the changes for it to keep on working fine for us).
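A toy Python sketch of that indirection follows. The class name VirtualDirectory, the server names server-a and server-b, and the storage paths are all made up for illustration; the point is only that the address a visitor uses never changes while the storage behind it does.

```python
class VirtualDirectory:
    """Toy lookup table from a virtual address to wherever the content lives today."""

    def __init__(self):
        self.locations = {}   # virtual path -> (server name, storage path)

    def publish(self, virtual_path, server, stored_at):
        # Publishing again under the same virtual path simply moves the content.
        self.locations[virtual_path] = (server, stored_at)

    def locate(self, virtual_path):
        return self.locations[virtual_path]


directory = VirtualDirectory()
directory.publish("/people/staff/", "server-a", "/disk1/staff.html")
print(directory.locate("/people/staff/"))   # ('server-a', '/disk1/staff.html')

# The computing staff moves the page; visitors keep using the same address.
directory.publish("/people/staff/", "server-b", "/archive/2/staff.html")
print(directory.locate("/people/staff/"))   # ('server-b', '/archive/2/staff.html')
```

Only the lookup table had to be updated, which mirrors the computing staff updating the domain controller after a change.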

Do you use a network at work or school whereby different shared resources are available through different network 'drives'? Perhaps you have an H: drive you can access from anywhere in an organization and still get to your documents? Well, those are great examples of virtual addresses. You type in a drive name or number and you get information off it no matter how it is stored or arranged.

Virtual addresses work all the way down to your core computer components. Computer memory works with its contents through virtual addresses. Each hard drive you use is managed via virtual addresses for each storable unit on the drive. The programs that manage virtual addresses are legendary and do wonderful things for us humans. And, yet, we barely have to worry about them or give them their due. Well, Web architecture makes virtual addressing pretty clean to understand versus other virtual addressing schemes. So, it is a great place for you to get your first exposure to the concept.

Ping

Ping is a great simple Internet tool (software application) you can use to connect to any device on the Internet that has an IP address and/or associated domain name. Ping sends what is called a ping message to that device. Devices respond in standard ways to ping messages. Here is an example of a ping session between my laptop in my kitchen and my old lab's server in Seattle:

bruce-campbells-computer$ ping www.hitl.washington.edu
PING www.hitl.washington.edu (128.208.63.17): 56 data bytes
64 bytes from 128.208.63.13: icmp_seq=0 ttl=51 time=105.247 ms
64 bytes from 128.208.63.13: icmp_seq=1 ttl=51 time=103.505 ms
64 bytes from 128.208.63.13: icmp_seq=2 ttl=51 time=111.709 ms
64 bytes from 128.208.63.13: icmp_seq=3 ttl=51 time=99.601 ms
64 bytes from 128.208.63.13: icmp_seq=4 ttl=51 time=99.466 ms
64 bytes from 128.208.63.13: icmp_seq=6 ttl=51 time=101.547 ms
64 bytes from 128.208.63.13: icmp_seq=7 ttl=51 time=108.899 ms
64 bytes from 128.208.63.13: icmp_seq=8 ttl=51 time=98.678 ms

I open up my ping software (very easy to do on all Windows, Mac, and Linux systems - Google the Web to see how to do it on your system if you don't know how to open a Command Prompt or Terminal Window from which you can just type ping). Then I type:

ping www.hitl.washington.edu

to request a ping from the www.hitl.washington.edu device. Ping requests are supposed to respond with the IP address of the machine responding to the ping message. In this case, the server does that as expected: ping www.hitl.washington.edu (128.208.63.17).
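If you would rather start ping from a program than from the keyboard, here is a hedged Python sketch. The function names ping_command and run_ping are my own. The one detail that differs between systems is the flag that limits how many pings are sent: -n on Windows and -c on Mac and Linux. Nothing is sent until you call run_ping yourself.

```python
import platform
import subprocess


def ping_command(host, count=4, system=None):
    """Build the ping command line for this (or a named) operating system."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def run_ping(host, count=4):
    """Run ping and return whatever text it printed (requires network access)."""
    result = subprocess.run(ping_command(host, count),
                            capture_output=True, text=True)
    return result.stdout


print(ping_command("www.hitl.washington.edu", count=8, system="Linux"))
# ['ping', '-c', '8', 'www.hitl.washington.edu']

# Uncomment to actually send the pings:
# print(run_ping("www.hitl.washington.edu", count=8))
```

Setting a count means the pinging stops by itself instead of running until you interrupt it.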

The ping message is very short (56 bytes) and gets sent consecutively over and over until I stop the pinging or a limit is set in the software. Each time, the device I have pinged responds with 64 bytes that I can interpret to see how fast the device is responding given the distance between us (if I know it). In the example above, each ping returns in about 100 milliseconds (a millisecond is a thousandth of a second, so that is about a tenth of a second). That tells me I can go back and forth between my computer and the remote computer ten times a second (more or less). For sophisticated applications (I like to make those and am getting better at it), it tells me a lot about how my networking services will work (video game programmers look at this data very closely when putting game servers up on the Web). Ping will also tell you when a ping message does not make it to the requested address. In this way, you can tell if a Web resource is alive from where you sit. I use ping to tell if a Web browser I am using has crashed unexpectedly or whether it just can't reach its requested destination for communications.
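To check the "ten times a second" estimate against the actual numbers, here is a short Python sketch that reads the reply lines from the session above. The function name summarize and the regular expression are my own, and they assume the reply format shown in that session; ping on other systems may word its output differently.

```python
import re

# Reply lines copied from the ping session shown earlier.
SAMPLE = """\
64 bytes from 128.208.63.13: icmp_seq=0 ttl=51 time=105.247 ms
64 bytes from 128.208.63.13: icmp_seq=1 ttl=51 time=103.505 ms
64 bytes from 128.208.63.13: icmp_seq=2 ttl=51 time=111.709 ms
64 bytes from 128.208.63.13: icmp_seq=3 ttl=51 time=99.601 ms
64 bytes from 128.208.63.13: icmp_seq=4 ttl=51 time=99.466 ms
64 bytes from 128.208.63.13: icmp_seq=6 ttl=51 time=101.547 ms
64 bytes from 128.208.63.13: icmp_seq=7 ttl=51 time=108.899 ms
64 bytes from 128.208.63.13: icmp_seq=8 ttl=51 time=98.678 ms
"""

REPLY = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def summarize(output):
    """Count replies, spot gaps in the sequence numbers, and average the times."""
    replies = [(int(seq), float(ms)) for seq, ms in REPLY.findall(output)]
    seqs = [seq for seq, _ in replies]
    times = [ms for _, ms in replies]
    missing = sorted(set(range(min(seqs), max(seqs) + 1)) - set(seqs))
    mean_ms = sum(times) / len(times)
    return {
        "replies": len(replies),
        "missing": missing,
        "mean_ms": mean_ms,
        "round_trips_per_second": 1000 / mean_ms,
    }


summary = summarize(SAMPLE)
print(summary["replies"], summary["missing"])   # 8 [5]
print(round(summary["mean_ms"], 1))             # 103.6
print(round(summary["round_trips_per_second"], 1))
```

The gap at sequence number 5 suggests that one reply never showed up in the listing, which is the kind of thing ping lets you notice.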

I built an application for the government of Taiwan and was requested to show it off to senior government officials as a demonstration. You better believe I did a month's worth of pinging between Taiwan and Seattle before agreeing on a time of day for the demonstration. Some times of day were so busy across the Pacific that 60% of my packets did not make it there in time! Other times of day I got great service reliably. I ended up demonstrating in the wee hours of the morning when most Americans slept. And, the demonstration did not fail for any networking reasons! Human language communications between Mandarin Chinese and American English were the cause of that!

Note that I provided you a wonderful Reverse DNS Lookup service to play with in the class syllabus. I am going to let you figure that one out by yourself to prove you have understood much of this document for yourself.

Traceroute

In computing, traceroute is a computer network diagnostic tool for displaying the route (path) and measuring transit delays of packets across an Internet Protocol (IP) network. The history of the route is recorded as the round-trip times of the packets received from each successive host (remote node) in the route (path); the sum of the mean times in each hop indicates the total time spent to establish the connection. Traceroute proceeds unless all (three) sent packets are lost more than twice; at that point the connection is considered lost and the route cannot be evaluated. Ping, on the other hand, only computes the final round-trip times from the destination point.

The traceroute command is available on a number of modern operating systems. On Apple Mac OS, it is available by opening 'Network Utilities' then selecting 'Traceroute' tab, as well as by typing the "traceroute" command in the terminal. On other Unix systems, such as FreeBSD or Linux, it is available as a traceroute(8) command in a terminal. On Microsoft Windows, it is named tracert. Windows NT-based operating systems also provide PathPing, with similar functionality. For Internet Protocol Version 6 (IPv6) the tool sometimes has the name traceroute6 or tracert6.

Here's an example of reporting on the path between a client and our RISD CE Link domain (celink.risd.edu). One of the students in class was kind enough to run traceroute from Madrid, Spain and Boston, USA:

    
FROM MADRID (SPAIN)
host-1:~ tibisay$ traceroute celink.risd.edu
 1     1 ms    <1 ms    <1 ms  192.168.1.1
 2    19 ms     8 ms    11 ms  10.203.128.1
 3    13 ms     7 ms    13 ms  10.127.45.165
 4    31 ms    14 ms    13 ms  mad-b2-link.telia.net [80.239.160.81]
 5    67 ms    36 ms    39 ms  prs-bb1-link.telia.net [80.91.254.68]
 6   141 ms   136 ms   136 ms  nyk-bb1-link.telia.net [80.91.253.122]
 7   145 ms   142 ms   141 ms  chi-bb1-link.telia.net [213.155.131.241]
 8   159 ms   161 ms   160 ms  chi-bb1-link.telia.net [62.115.139.186]
 9   162 ms   155 ms   153 ms  kanc-b1-link.telia.net [62.115.139.21]
10   187 ms   176 ms   174 ms  hurricane-ic-152554-kanc-b1.c.telia.net [213.248.73.210]
11   165 ms   161 ms   161 ms  arsalon-technologies.gigabitethernet1-9.core1.mci3.he.net [184.105.250.62]
12   166 ms   161 ms   160 ms  swc-lx-2n1.arsalon.net [204.13.103.5]
13   176 ms   160 ms   160 ms  rl01-gateway.arsalon.net [204.13.97.12]
14   161 ms   163 ms   158 ms  208-74-102-208.arsalon.net [208.74.102.208]

FROM BOSTON
host-1:~ tibisay$ traceroute celink.risd.edu
 1  192.168.0.1 (192.168.0.1)  2.633 ms  0.999 ms  0.864 ms
 2  96.120.64.173 (96.120.64.173)  23.360 ms  17.588 ms  14.890 ms
 3  te-0-1-0-3-sur01.westroxbury.ma.boston.comcast.net (68.87.152.89)  16.179 ms  14.567 ms  13.798 ms
 4  be-20-ar01.needham.ma.boston.comcast.net (68.85.106.21)  15.082 ms  17.847 ms  17.063 ms
 5  he-2-8-0-0-cr01.newyork.ny.ibone.comcast.net (68.86.93.185)  32.690 ms  21.193 ms  23.953 ms
 6  he-0-13-0-1-pe03.111eighthave.ny.ibone.comcast.net (68.86.85.190)  24.529 ms  100.698 ms  22.845 ms
 7  be7922.ccr21.jfk10.atlas.cogentco.com (154.54.13.161)  23.267 ms  26.822 ms  21.380 ms
 8  be2056.ccr21.jfk02.atlas.cogentco.com (154.54.44.217)  24.619 ms
    be2057.ccr22.jfk02.atlas.cogentco.com (154.54.80.177)  23.075 ms
    be2059.mpd22.jfk02.atlas.cogentco.com (154.54.1.221)  23.593 ms
 9  be2117.ccr42.ord01.atlas.cogentco.com (154.54.7.58)  42.703 ms  43.642 ms
    be2116.ccr41.ord01.atlas.cogentco.com (154.54.7.26)  43.651 ms
10  be2156.ccr21.mci01.atlas.cogentco.com (154.54.6.85)  72.091 ms  71.435 ms
    be2157.ccr22.mci01.atlas.cogentco.com (154.54.6.117)  71.653 ms
11  te2-1.mag01.mci01.atlas.cogentco.com (154.54.30.174)  502.761 ms  102.971 ms  210.640 ms
12  38.104.88.70 (38.104.88.70)  59.523 ms  53.808 ms  53.054 ms
13  rtr-lx-1n1.arsalon.net (208.89.116.1)  54.010 ms  63.319 ms  54.513 ms
14  swc-lx-1n1.arsalon.net (204.13.103.4)  55.104 ms  54.357 ms  53.959 ms
15  rl01-gateway.arsalon.net (204.13.97.12) 65.199 ms  53.986 ms  53.307 ms
16  208-74-102-208.arsalon.net (208.74.102.208) 56.053 ms  58.678 ms  52.095 ms

Of course, the traceroutes were performed on different dates, but the times matched up with her ping times from Madrid and other students' ping times from Boston at the same time. You can read more about traceroute on Wikipedia, but hopefully it's rather intuitive given what you explore on ping (it just shows you the intermediate telecommunications hops that are willing to provide time reports).
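If you want to compare hops without squinting at the listings, here is a hedged Python sketch that averages the times reported for each hop. The function name hop_means and the regular expressions are my own, and they assume the layout of the Boston listing above, where a hop can spill onto extra indented lines when more than one router answered. Only a few lines of that listing are copied in as a sample.

```python
import re

# A few lines copied from the Boston traceroute above.
SAMPLE = """\
 1  192.168.0.1 (192.168.0.1)  2.633 ms  0.999 ms  0.864 ms
 8  be2056.ccr21.jfk02.atlas.cogentco.com (154.54.44.217)  24.619 ms
    be2057.ccr22.jfk02.atlas.cogentco.com (154.54.80.177)  23.075 ms
    be2059.mpd22.jfk02.atlas.cogentco.com (154.54.1.221)  23.593 ms
16  208-74-102-208.arsalon.net (208.74.102.208) 56.053 ms  58.678 ms  52.095 ms
"""

HOP_NUMBER = re.compile(r"^\s*(\d+)\s")   # a line that starts a new hop
TIME_MS = re.compile(r"([\d.]+) ms")      # each reported round-trip time


def hop_means(output):
    """Return {hop number: mean round-trip time in ms} for a traceroute listing."""
    times = {}
    current = None
    for line in output.splitlines():
        start = HOP_NUMBER.match(line)
        if start:
            current = int(start.group(1))
        if current is None:
            continue   # skip anything before the first numbered hop
        # Continuation lines have no hop number, so they add to the current hop.
        times.setdefault(current, []).extend(float(t) for t in TIME_MS.findall(line))
    return {hop: sum(v) / len(v) for hop, v in times.items() if v}


for hop, mean in hop_means(SAMPLE).items():
    print(hop, round(mean, 1))
```

Running it over a full listing makes the slow hops stand out, such as hop 11 in the Boston trace where one reply took over 500 ms.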

Happy pinging! Think about what creative Web services you can build by connecting to any device in the world using these concepts. Don't let your head explode, but it is pretty amazing the power each of us wields thanks to others.