How URLs work

Where are we?

You learned that a URL is the address of a file. Could be a page, could be a photo, could be something else.

Let’s dig a little deeper. What do servers do when they get a URL request from a browser?

This lesson’s goals

By the end of this lesson, you should know:

  • When a browser asks for the contents of a URL, it usually gets back the contents of a file on a server.
  • How a URL is mapped to a server file.

Suppose that a browser shows this.

Figure 1. Link rendered in a browser

It comes from this HTML.

<a href="http://doomdogs.com/products/ball.html">Bouncing ball</a>

There are two parts to the link. First, there’s the text shown to the user. That’s “Bouncing ball.” Second, there’s a URL: http://doomdogs.com/products/ball.html. That’s where the browser will go if the link is clicked.
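
If you want to see those two parts pulled apart by a program, here is a minimal sketch in Python, using the standard library's html.parser module. The names LinkParser and extract_links are made up for this example, and real browsers do far more work than this.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects (href, text) pairs for each <a> element it sees."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> we are currently inside, if any
        self._text = []      # pieces of text seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def extract_links(html):
    """Return a list of (url, link text) pairs found in the HTML."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


links = extract_links(
    '<a href="http://doomdogs.com/products/ball.html">Bouncing ball</a>'
)
print(links)
```

Each pair holds the URL first and the text shown to the user second.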

GETting a page

A user clicks the “Bouncing ball” link. What does the browser do?

First, it creates an HTTP GET request. We looked at HTTP earlier, in the lesson on the services layer. The request looks something like this:

GET /products/ball.html HTTP/1.1

The message is sent to a server. Which server? The one given in the URL.

The URL of the desired page is http://doomdogs.com/products/ball.html. The first part – http – is the service-layer communication protocol to use. The second part – doomdogs.com – is the domain name of the server. So that’s where the message is sent: to the server at doomdogs.com.

The rest of the URL (/products/ball.html) tells the server what data to return to the browser. That part of the URL matches a file on the server. A server might have 50,000 files that it can send to a browser. /products/ball.html is the particular file it should send.
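
Here is a small Python sketch that splits the URL into those three parts, using urlsplit from the standard library. The function name split_url is just something I made up for the example.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (protocol, domain name, path) for a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


protocol, domain, path = split_url("http://doomdogs.com/products/ball.html")
print(protocol)  # the protocol to use
print(domain)    # the server the message is sent to
print(path)      # what the server should send back
```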

URL paths and file paths

Let’s look at this more closely.

As you know, the files on the computer you’re using right now – PC, Mac, whatever – are organized into a directory tree. Windows calls directories “folders,” but they’re the same thing. Webers usually talk about directories rather than folders.

Here’s part of the tree on my Windows hard drive.

Figure 2. Directory tree

The Windows file path to the file renata.jpg is:

C:\dogs\mydogs\renata.jpg

This means:

  • Start at the root of the C: drive.
  • Then go down into the dogs directory.
  • Then go down into the mydogs directory.
  • There you will find the file renata.jpg.

I can use this path in Windows programs. For instance, I could type the path into the Open File box of a Windows editor:

Figure 3. Windows file path

Other operating systems have different rules for file paths. A path on a Unix machine might look like this:

/dogs/mydogs/renata.jpg

Unix systems (Linux and its relatives included) don’t have drive letters like C: or D:. And they use a / rather than a \ to separate directory names.
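
To see the two styles side by side, here is a short Python sketch using the pathlib module. The "pure" path classes only look at the text of a path, so this runs on any operating system and never touches the disk.

```python
from pathlib import PurePosixPath, PureWindowsPath

windows_path = PureWindowsPath(r"C:\dogs\mydogs\renata.jpg")
unix_path = PurePosixPath("/dogs/mydogs/renata.jpg")

# Windows paths have a drive letter; Unix paths do not.
print(windows_path.drive)   # C:
print(repr(unix_path.drive))  # '' (an empty string)

# Below the root, both paths have the same pieces.
print(windows_path.parts)   # ('C:\\', 'dogs', 'mydogs', 'renata.jpg')
print(unix_path.parts)      # ('/', 'dogs', 'mydogs', 'renata.jpg')
```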

Another difference is that Windows file names are not case sensitive, while Unix file names are case sensitive. So, in Windows:

C:\awah\ajer\kyam.txt

and

C:\awah\ajer\Kyam.txt

refer to the same file. But in Unix:

/awah/ajer/kyam.txt

and

/awah/ajer/Kyam.txt

refer to different files.
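
Python's pathlib module follows the same rules when it compares path names, so a quick sketch can show the difference. It compares the names only; no files need to exist.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows rules: case is ignored when comparing names.
same_on_windows = (
    PureWindowsPath(r"C:\awah\ajer\kyam.txt")
    == PureWindowsPath(r"C:\awah\ajer\Kyam.txt")
)

# Unix rules: case matters.
same_on_unix = (
    PurePosixPath("/awah/ajer/kyam.txt")
    == PurePosixPath("/awah/ajer/Kyam.txt")
)

print(same_on_windows)  # True
print(same_on_unix)     # False
```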

Beginners often forget this. Suppose you create a site on a Windows PC. You type the file name DepressedDoom.html in some places, and depresseddoom.html in others. It works fine on your computer, because Windows thinks that DepressedDoom.html and depresseddoom.html are the same file.

You upload the site to a Unix server (most Web servers run Unix or Linux of some sort), and the site breaks. Unix thinks that DepressedDoom.html and depresseddoom.html are different files, so every link that uses the wrong case points to a file that doesn’t exist.

Here’s the rule we follow throughout CoreDogs:

Always use lowercase for file names.
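
If you want to check a site before uploading it, something like the following Python sketch could help. The function name find_mixed_case_names is made up for this example, it only checks file names (not directory names), and the demo builds a throwaway directory so nothing real is touched.

```python
import tempfile
from pathlib import Path


def find_mixed_case_names(site_root):
    """Return files under site_root whose names contain uppercase letters."""
    root = Path(site_root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.name != path.name.lower()
    )


# Demo: a tiny fake site with one badly named file.
with tempfile.TemporaryDirectory() as site_root:
    Path(site_root, "products").mkdir()
    Path(site_root, "products", "ball.html").write_text("ball")
    Path(site_root, "DepressedDoom.html").write_text("doom")
    found = find_mixed_case_names(site_root)

print(found)  # ['DepressedDoom.html']
```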

Mapping URL paths and file paths

OK. So we have URLs. They’re Web addresses for files. Like:

http://doomdogs.com/products/ball.html

And we have file paths, like:

C:\dogs\mydogs\renata.jpg

They look similar, don’t they? They both have slashes, things that look like directory names, file names, and extensions.

There’s a reason they look similar: they are different versions of the same thing!

A URL is really a file path, with some networkish stuff added.

At least, that’s true on simple sites. Some advanced tech works differently. Let’s ignore it for now.

How can a URL and a file path both refer to the same thing? By taking a directory on a Web server, and making it the root of a Web site.

An example

Here’s Jake, looking at a Web page on doomdogs.com.

Figure 1 (again). Link rendered in a browser

Why is Jake’s browser showing him a link? Because of this HTML:

<a href="http://doomdogs.com/products/ball.html">Bouncing ball</a>

This link says to show the text “Bouncing ball.” If Jake clicks on it, his Web browser will jump to the page http://doomdogs.com/products/ball.html.

Let’s look at what happens when Jake clicks.

Figure 4. Serving a file

Jake clicks on the link (1 in Figure 4). The browser looks at the code that created the link. The code is:

<a href="http://doomdogs.com/products/ball.html">Bouncing ball</a>

So the browser knows it should show the page http://doomdogs.com/products/ball.html.

The browser creates an HTTP message to doomdogs.com: GET /products/ball.html (2). The Internet sends the message to the Web server.
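
Here is a rough Python sketch of how that message could be put together from the URL. The function name build_get_request is made up, and the sketch ignores query strings. A real browser sends more header lines than this, but HTTP/1.1 does require the Host line, which carries the domain name.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return the text of a bare-bones HTTP/1.1 GET request for a URL."""
    parts = urlsplit(url)
    path = parts.path or "/"   # no path in the URL means "the root"
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.hostname}\r\n"
        "\r\n"                 # a blank line ends the request headers
    )


print(build_get_request("http://doomdogs.com/products/ball.html"))
```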

The server at doomdogs.com is a computer, with a hard disk spinning away. It runs the Unix operating system.

If you looked on the server’s disk drive, you would see a file with the path /sites/dd/products/ball.html.

Figure 5. Server files

This is the file with information on the bouncing ball.

Somehow, the Web server has to use the URL the browser sent to fetch the right file from its disk drive. It needs to convert the URL into a file path, then use the file path to read the file.

Figure 6. Translate URL into file path

Suppose the server is running the Apache Web server software (one of the most popular in the world). Apache has a configuration file, created by DoomDogs’ Web master. The file tells Apache how to run.

One of the settings in the file is DocumentRoot. The setting tells Apache where on the computer’s disk drive the files for the Web site are stored.

The Web master put all the files for the doomdogs.com Web site in the directory /sites/dd, then set DocumentRoot to /sites/dd, so that Apache would know where to get them.

The server takes its DocumentRoot setting (/sites/dd) and appends the URL path (/products/ball.html), giving /sites/dd/products/ball.html.

Figure 7. Computing the file path

Now Apache knows where to get the file the browser asked for.
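
The computation is simple enough to sketch in a few lines of Python. This is not Apache's actual code, just an illustration, and the function name url_path_to_file_path is made up. A real server also refuses paths that try to climb out of the root with "..", which this sketch leaves out.

```python
import posixpath


def url_path_to_file_path(document_root, url_path):
    """Append a URL path to the document root, Unix style."""
    # posixpath.join throws away document_root if the second argument
    # starts with "/", so strip the leading slash first.
    relative = url_path.lstrip("/")
    return posixpath.join(document_root, relative)


file_path = url_path_to_file_path("/sites/dd", "/products/ball.html")
print(file_path)  # /sites/dd/products/ball.html
```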

Here’s the process again.

Figure 4 (again). Serving a file

We’re at step 3. Apache has translated the URL to a file path on the server computer’s disk drive.

Apache reads the file (4), and sends its contents back to the browser (5). The browser renders the content for Jake (6). Recall that rendering is the process of making a display from code in a file.
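
Steps 3 and 4 can be sketched together in Python. This is a toy, not how Apache is written. The function name handle_request is made up, it does no security checks (a real server rejects paths containing ".."), and the demo uses a throwaway directory as the document root.

```python
import tempfile
from pathlib import Path


def handle_request(request_line, document_root):
    """Return the bytes of the requested file, or None if it can't be served."""
    method, url_path, _version = request_line.split()
    if method != "GET":
        return None
    # Step 3: translate the URL path into a file path under the root.
    file_path = Path(document_root) / url_path.lstrip("/")
    if not file_path.is_file():
        return None
    # Step 4: read the file. The caller would send these bytes back (step 5).
    return file_path.read_bytes()


# Demo: a pretend document root with one page in it.
with tempfile.TemporaryDirectory() as document_root:
    Path(document_root, "products").mkdir()
    Path(document_root, "products", "ball.html").write_bytes(
        b"<h1>Bouncing ball</h1>"
    )
    found_body = handle_request("GET /products/ball.html HTTP/1.1", document_root)
    missing_body = handle_request("GET /products/bone.html HTTP/1.1", document_root)

print(found_body)    # b'<h1>Bouncing ball</h1>'
print(missing_body)  # None
```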

Hooray! Jake is happy, and runs around and barks.

Renata

So for a URL like http://thing.com/what.html, thing.com is the server name, and /what.html is a file path?

Kieran

Yes, that’s a good way to think about it.

Renata

But when I want to Google something, I don’t type http://www.google.com/search.html. I type http://www.google.com. There’s no file path in that URL. What does the Google server do?

Kieran

Good question!

The server needs to get a file name from the path. If there isn’t a file name there, it adds one. That is, it uses a default file name.

Every server could use a different default, but in practice most use index.html. So, the URL http://doomdogs.com/ maps to http://doomdogs.com/index.html.

Renata

So when I make a home page, I should call the file index.html? So if someone just types the domain name, they’ll get the home page?

Kieran

Yes! That’s it! You are one smart puppy.

Servers actually have a list of default names, like index.html, index.php, default.htm, and so on. The server works down the list in order and uses the first name that matches a file in the directory.

For now, let’s assume it’s always index.html.
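
For the curious, the list lookup might be sketched like this in Python. The names DEFAULT_NAMES and resolve_default are made up for this example (in Apache, the list is set with the DirectoryIndex directive), and the demo uses throwaway directories.

```python
import tempfile
from pathlib import Path

# Hypothetical list of default names, tried in this order.
DEFAULT_NAMES = ["index.html", "index.php", "default.htm"]


def resolve_default(directory, default_names=DEFAULT_NAMES):
    """Return the first default file that exists in directory, or None."""
    for name in default_names:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


# Demo: no index.html here, so the next name on the list wins.
with tempfile.TemporaryDirectory() as directory:
    Path(directory, "default.htm").write_text("fallback page")
    Path(directory, "index.php").write_text("php home page")
    chosen = resolve_default(directory)
    chosen_name = chosen.name if chosen else None

# Demo: an empty directory has no default file at all.
with tempfile.TemporaryDirectory() as empty_directory:
    nothing = resolve_default(empty_directory)

print(chosen_name)  # index.php
print(nothing)      # None
```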

Summary

In this lesson, you learned:

  • That many Web page requests are actually requests for files.
  • How a URL is mapped to a server file.

What now?

Let’s put some of this knowledge into action. Time to buy a Web hosting account, and get real!

