The structure of a Web page

This lesson’s goals

By the end of this lesson, you should know:

  • The structure of a Web page.
  • What character sets are.
  • Indenting, which makes it easier to change a page.
  • Why it’s important to get the title right.

A template

Web pages are plain text files. Most have more-or-less the same structure. Here it is:

<!DOCTYPE HTML PUBLIC  "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>TITLE</title>
  </head>
  <body>
    BODY
  </body>
</html>

Figure 1. Standard page template

Have a look at this page in your browser.

The page is made up of tags. Tags use < and >. Most tags come in pairs, like <title></title>. The first part – <title> – opens the tag. The second part – </title> – closes the tag. You’ll see this pattern a lot. Open tag, close tag, open tag, close tag.

Exercise: Upload the template

Upload the code in Figure 1 to your server. I’ll lead you through it this time.

We’ll do the following:

  • Make a file on your computer, with the template code from Figure 1 in it.
  • Upload the file to your server.
  • Look at the result in your browser.

Making the file

On your computer, you’ll create a separate directory for each exercise. Start with a top-level directory. For example, within your My Documents directory, you might create a directory called coredogs.

Creating a directory

This is where all of your CoreDogs work will go. Within that, make a directory called clientcore (lowercase only!). Your work on ClientCore exercises will go here. Within clientcore, make a directory for each chapter. The directory for this chapter might be called web-page-with-text. Within that, make a directory for this exercise called upload-template.

You’ll end up with:

coredogs
  clientcore
    web-page-with-text
      upload-template

Try to keep things organized. It’s a little more effort at the beginning, but it will save you trouble later.

Start an editor, like Notepad++. Download and install Notepad++ if you haven’t already. It’s free.

As I’m writing this, the name of the file you want is npp.5.7.Installer.exe. Download that file, and run it. It will install Notepad++.

Remember, do not use Microsoft Word, or another word processor. Use a plain text editor. Word and other word processors add extra stuff to files, and that will mess up your work.

Run Notepad++, or another editor you choose. Notepad++ will open with an empty file.

Copy the code in Figure 1. There’s an easy way to do this that doesn’t get the line numbers. Rest the mouse anywhere in the figure. A toolbar appears in the upper right corner of the figure:

Figure toolbar

The second button will copy the code in the figure to the clipboard, without any of the line numbers. W00f!

Copying to the clipboard

Paste the code into Notepad++. Save the file into the upload-template directory on your computer. Call the file template.html, or some such. Remember to use only lowercase letters in the file name. This is because your Web server probably runs Unix or Linux, where file names are case-sensitive. The server thinks that template.html and Template.html are different files.

Upload your file

Start your FTP program, like WinSCP.

Connect to your Web server. We talked about this process earlier. Here’s a reminder.

When WinSCP starts, it will show you a Login dialog. Click New, and enter your connection information from the email you got from your Web hosting company. Like this:

Connecting to your server

You’ll see a split screen, like this:

FTP split screen

If you see just one window, with no split, you probably chose the wrong interface when installing WinSCP. To fix it, click Options | Preferences (that is, click Options on the menu bar, then Preferences – at the bottom of the menu). Choose Interface. Select the Commander interface and click OK. Restart WinSCP for the changes to take effect.

Remember that only files under your Web root will be accessible on the Web. On Hostgator, the Web root is usually called www or public_html. So that’s where your file should go.

Create a new directory on your server (under your Web root) for CoreDogs projects. You might call it coredogs. Within that, create separate directories for each chapter and project. Just as you did on your PC.

To create a directory, navigate to the parent, that is, the directory that will contain your new directory. So if I wanted to create coredogs under www with WinSCP, I’d double-click on www, and create the directory there.

In WinSCP, press F7 to create a directory, or use the button at the bottom of the window:

Create directory button

You’ll see a dialog that lets you type the name of the directory:

Create directory dialog

Click OK, and the new directory will appear. W00f! Double-click on the directory to open it. You can create more directories under that one.

As with the files on your own computer, I recommend creating a separate directory for each book (e.g., clientcore), and within that a directory for each chapter (e.g., web-page-with-text), and within that a directory for each exercise (e.g., upload-template). That would give you a path like /www/coredogs/clientcore/web-page-with-text/upload-template. This seems like a lot of work, but it’s better than accidentally erasing things you need.

Time to upload the file. Find the file you created (maybe you called it template.html) in the left window. Remember that the left window is the file system on your computer.

WinSCP has a drop-down that gives you quick access to your Desktop, documents, drives, and such:

Local drives and such

Use the drop-down and the directory tree below it to navigate to where you stored the file you created.

Local directory

Upload a file with good old drag-and-drop, from the left to the right:

FTP split screen

You should see your file on the server. W00f!

CC

Wow, that’s a lot to go through to upload a file.

Kieran

True. But after a while it gets to be second nature. You won’t even think about it.

CC

Can’t you just give me a script for everything? Tell me what to click, and then I’ll click it. I don’t need to actually know all this stuff about FTP. I’ll just follow the instructions.

Kieran

That would be nice, but it wouldn’t work. Trying to memorize each small step is too hard, and too inflexible.

Instead, remember the larger steps that each small step is a part of. Such as:

  • Large step - Connect to the server
    • Small step - Start WinSCP
    • Small step - Use the connect dialog
      • Tiny step - Type in the server name
      • Tiny step - Type in the user name
      • Tiny step - Type in the password
      • Tiny step - Select FTP
      • Tiny step - Click the Login button
  • Large step - Upload the file
    • Small step - Find the file to upload in the left window
    • ...

CC

But I only do the little steps. You work out the list of steps, and I’ll just follow them. Why is that a problem?

Kieran

Because things change.

Say that you switched from Hostgator to another company. Let’s call it Serverdile. They use SCP instead of FTP (SCP is a more secure way to transfer files than FTP). Someone who just memorized the small steps would be stuck. S/he would get to the “Select FTP” step, and would do the wrong thing. It wouldn’t work, and s/he wouldn’t know why.

Someone who knew what s/he was doing would know, “Oh, this affects the ‘Connect to server’ step. I’ll change that one thing.” Everything would be just fine.

Suppose you were hiring someone for a job. Who would you rather have as an employee?

  • Someone who needed a new set of instructions every time something changed.
  • Someone who could adapt to changes him- or herself, because s/he knows why the small steps are done.

The first person would need constant hand-holding. S/he would run back to you every time something changed, saying “These steps don’t work anymore. Please fix them.” Things change a lot in tech land. You would be forever fixing things for this employee.

The second person would change the instructions him- or herself. S/he wouldn’t interrupt you constantly.

Which person would be a better employee? Which one would you hire? Which one would you pay more?

If you want a job where someone tells you exactly what to do all the time, well, you might spend the rest of your life in low-paying jobs. Good luck with that.

Look at the file in your browser

So you’ve uploaded the file, from your PC to your server. Other people can now see the file on the Web.

We talked earlier about the relationship between server files and URLs. Remember that for a static site, a URL is a path to a file on a server’s hard disk. That is, a path from your server’s Web root (like /public_html/) to the file. So if the file was at /public_html/mydir/evil.html on the server, its URL would be http://siteofdoom.com/mydir/evil.html.

Open up a browser. Type in the URL of the file you just uploaded. For example, if your site was drewid.com, you might enter http://drewid.com/coredogs/clientcore/web-page-with-text/upload-template/template.html.

You should see something like this in your browser:

Template displayed

W00f! W00f! W00fy-w00f-w00f! With w00f sauce!

Later, if you forget the basics of creating and uploading a page, come back to this exercise.

Enter the URL of your page as your solution to this exercise.

Nesting

So you’ve uploaded the template. Let’s have another look at it.

<!DOCTYPE HTML PUBLIC  "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>TITLE</title>
  </head>
  <body>
    BODY
  </body>
</html>

Figure 1 (again). Standard page template

Tags are nested, that is, tags are inside other tags. It’s important to get the nesting right. Inner tags should be closed before outer tags.

Figure 2. Tag nesting
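
Here is a small sketch of the nesting rule, using tags from the template. The title text is made up for illustration. In the first version, <title> opens inside <head> and closes before </head> does. In the second, the closing tags are swapped, so the outer tag is closed while the inner one is still open. That breaks the rule.

```html
<!-- Correct: <title> is opened inside <head>, and closed before </head>. -->
<head>
  <title>Dogs of Doom</title>
</head>

<!-- Wrong: </head> closes the outer tag while <title> is still open. -->
<head>
  <title>Dogs of Doom</head>
</title>
```

A quick check: read the closing tags from the inside out. They should appear in the reverse order of the opening tags.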

What happens if you violate nesting? Well, maybe nothing, maybe something. Different browsers handle invalid HTML differently. Sometimes it will look fine in Internet Explorer, but not in Firefox. Or it might look OK on a Mac, but not a PC. It’s hard to tell without looking at all the combinations.

Webers strive for predictability. They want to create a page once, then have it work on every browser, on every operating system. That isn’t always feasible. But the more closely you follow the rules of HTML, the better off you’ll be.

Indenting

Use consistent indenting to make HTML easier to read, and errors in markup easier to spot. (“Markup” is just another name for text with HTML tags in it.)

Figure 3. Indenting

Browsers don’t care about indenting. Both pieces of code in Figure 3 would render identically in a browser. But the first one is easier to follow.
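
A comparison along these lines shows the idea. This is a sketch based on the template, not necessarily the exact code in Figure 3. Both versions contain the same tags in the same order, so a browser treats them the same way. In the first, each level of nesting is indented two spaces, so you can see at a glance that <title> is inside <head>.

```html
<!-- Indented: the nesting is easy to see. -->
<html>
  <head>
    <title>TITLE</title>
  </head>
  <body>
    BODY
  </body>
</html>

<!-- Not indented: same tags, but harder to check that each one is closed. -->
<html>
<head>
<title>TITLE</title>
</head>
<body>
BODY
</body>
</html>
```

Two spaces per level is just one choice. What matters is picking an amount and sticking with it.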

Renata

If it looks the same to the user, why would it matter? In the end, it’s the user experience that’s key.

CC

Can I answer that one?

Kieran

Sure.

CC

It’s change again. The one thing you can count on is change.

A page might be fine today. But tomorrow, a marketing type is going to want to change the text. And next week, it will be different again.

The easier the markup is to read, the easier it will be to change. And the fewer mistakes you’ll make.

Webers think not only about the results (that is, the HTML page), but about the work processes they use to create the results. They try to make the work processes fast and accurate. The indenting in Figure 3 would help.

DOCTYPE

Let’s look at Figure 1 again.

<!DOCTYPE HTML PUBLIC  "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>TITLE</title>
  </head>
  <body>
    BODY
  </body>
</html>

Figure 1 (again). Standard page template

The first line tells the browser what HTML standard we are using. The World Wide Web Consortium (W3C) is the organization that creates HTML standards. The current standard is HTML 4.01, though HTML 5 is waiting in the wings.

Line 1 says we’ll use the 4.01 standard, and we’ll be complying with it strictly. “strict” will give us the most predictable results across browsers.
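
For comparison, here is a sketch of what the first line looks like in HTML 5. This is not the template used in this lesson, which sticks with HTML 4.01 strict. It is shown only so you recognize the shorter form when you see it in other people’s pages. Everything after the first line is unchanged from Figure 1.

```html
<!DOCTYPE html>
<!-- The line above is the HTML 5 DOCTYPE: no version number, no DTD address. -->
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>TITLE</title>
  </head>
  <body>
    BODY
  </body>
</html>
```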

The <head> section

All of the HTML is between the <html> tags on lines 2 and 10. <html> is a matched pair, as are most tags.

The code inside the <html> tag has two parts: the <head> section (lines 3 – 6), and the <body> section (lines 7 – 9). The <head> section contains metadata, that is, data describing the page.

Character set

Line 4 tells the browser what character set the page will use. A character set is a list of all of the symbols that can be used for a document. For example, the ancient Egyptians wrote hieroglyphs. Wikipedia says this means “tongue”:

Figure 4. Tongue

If you listed all of the different hieroglyphs they used, you would have their character set.

At one time, computers could only store upper- and lowercase letters (A-Z and a-z), digits (0-9), and a few symbols (&, @, !, space, etc.). This was called the ASCII character set.

The ISO-8859-1 character set improved on ASCII, with characters like é and ß. Better, but still not great. What about Cyrillic and Chinese characters? Huh? Huh?

Today, character sets like UTF-8 include more than a hundred thousand symbols, covering most of the world’s writing systems.

The two most common character sets on Web pages are ISO-8859-1 and UTF-8. The former works for Western languages, and many Webers use it. UTF-8 is slowly taking over, however. Everything that is in ISO-8859-1 is in UTF-8. We’ll use UTF-8.

The character set definition (line 4 in Figure 1) is typical of the tags in the head section. It tells the browser about the page, but doesn’t tell the browser what to show on the page. Line 4 tells the browser that the page could contain Chinese, Cyrillic, Greek, and other characters. But it doesn’t tell the browser what characters it will show to the user.
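
One way to see the character set at work is a small test page. The sketch below is the template with a few non-ASCII characters in the body. The title text and the characters chosen are just examples. It assumes you save the file with UTF-8 encoding in your editor, so that the bytes in the file match what the <meta> tag promises. The <p> tag marks a paragraph.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <!-- Tells the browser to read the file as UTF-8. -->
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>Character set test</title>
  </head>
  <body>
    <!-- é and ß are also in ISO-8859-1. Ж (Cyrillic) and 犬 (Chinese) are not. -->
    <p>é ß Ж 犬</p>
  </body>
</html>
```

If the file is saved in one encoding but declared as another, characters like these may show up garbled in the browser.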

The <title>

Here’s Figure 1 again.

<!DOCTYPE HTML PUBLIC  "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>TITLE</title>
  </head>
  <body>
    BODY
  </body>
</html>

Figure 1 (again). Standard page template

Line 5 is the <title> tag. It doesn’t affect the main area of the page itself, but it does show up. But where? In line 5, the text for the title is TITLE. Have a look at the page template in Figure 1 in your browser. Where do you see TITLE?

Find it yet? I’ll wait.

Do do do do-do do do doo, do do do do doo, do-do-do-do-do do do do do-do do do doo, do, do-do do do, do, do, dum dum.

Yes, it’s in the top area of the browser’s window.

Figure 5. Page title

The title tells the user what the page is about.

The title appears in other places as well. First, if the user bookmarks the page, the title will show up in the bookmark list.

Second, the title will appear in search engine listings for the page. Suppose I want to know more about the musical episode of Buffy the Vampire Slayer, which, as everyone knows, is the best television ever.

I’m a Buffy fandog.

I just searched, and got this:

Buffy page

Figure 6. Buffy search

I went to the page, and saw this in the browser’s title area:

Figure 7. Browser title area

Then I looked at the page’s code. Here’s what it said:

<title>Once More, with Feeling (Buffy the Vampire Slayer) - Wikipedia, the free encyclopedia</title>

Figure 8. <title> tag

So the value in the <title> tag showed up in the browser title area, and the search results.

You may have seen search engines show “Untitled” for a page. This means that a Weber forgot a <title> tag.

Search engines also use the <title> tag to figure out what the page is about. If your title is “Dogs of Doom,” and a Googler searches for “doom dogs,” there’s a good chance Google will show your page. All because of the title.

The <title> tag is probably the most important tag in SEO, or search engine optimization. This is the art of getting your pages to rank high in search engine results.
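
As a sketch, here is a <head> section with a descriptive title filled in. The site name and wording are made up for illustration. The point is that the title says what the page is about, in words someone might type into a search engine.

```html
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <!-- Specific: names the topic of the page first, then the site. -->
  <title>Training tips for large dogs - Dogs of Doom</title>
</head>
```

Compare that with a title like “Home” or “Page 1,” which tells neither users nor search engines much.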

A broken page

While researching this lesson, I came across some interesting code. I saw the following in Google:

Figure 9. Google listing

Strange title for a page! No way to know what the page would be about, just from the title.

I looked at the code of the page, and saw this:

Figure 10. Broken title

The page starts off OK, except for the missing DOCTYPE tag. It opens a title tag at (1). But then there’s a DOCTYPE! Huh? A complete page is embedded in the title tag of a page!

The real title is at (2), but the browser can’t pick it up. The code is so messed up that the browser can’t figure it out. And neither could the Google search engine.

This shows you the importance of well-formed markup, that is, markup that follows the rules of HTML. The code in Figure 10 is not well-formed. It confuses browsers. And it won’t show up right in searches, so people are less likely to find the page. I just found it by accident.
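
To give a feel for this kind of mistake, here is a hypothetical sketch. It is not the actual code of the page I found, and the title text is made up. An outer page opens a title tag at (1), and then a complete second page, DOCTYPE and all, is pasted in before that title is closed. The real title is at (2).

```html
<html>
  <head>
    <!-- (1) A title tag opens here, and is not closed before the next page starts. -->
    <title>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <!-- (2) The real title, buried inside the outer title tag. -->
    <title>Dogs of Doom</title>
  </head>
  <body>
    BODY
  </body>
</html>
```

A browser or search engine reading this would likely treat everything from (1) up to the first </title> as the title text, DOCTYPE included. That would explain a strange title in a search listing.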

The body

Here’s Figure 1 again.

<!DOCTYPE HTML PUBLIC  "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>TITLE</title>
  </head>
  <body>
    BODY
  </body>
</html>

Figure 1 (again). Standard page template

The actual content of the page is in the <body> section, from lines 7 to 9. This is where the real action happens.

There isn’t much there at the moment. Let’s add some stuff in the next lesson.

Renata

Say I’m looking at a Web page. Can I check out the HTML behind it?

Kieran

Yes, you can. In Firefox, press Ctrl-U (Windows) or Cmd-U (Mac). Or in the menu, View | Page source:

View | Page source

Try it. Bring up the template page in your browser. Now look at the page source. Compare it with Figure 1.

Summary

You learned about:

  • The structure of a Web page.
  • What character sets are.
  • Indenting, which makes it easier to change a page.
  • Why it’s important to get the title right.

What now?

Let’s start adding some HTML tags to the body.

