html if you must

This document has been produced by SECC (March 98)

HTML (if you must)

Introduction: HTML is Really Easy

HTML (HyperText Markup Language) is really just ASCII text with extra codes thrown in to better identify the content of the document. These codes (also called "tags" or "elements") are enclosed in "less-than" and "greater-than" angle brackets (< and >). Some are used in pairs; others are used singly.

For example, to emphasise text, I would write "<em> text </em>". In this case, I enclosed the text in emphasis tags (<em> </em>), which are paired tags. Paired tags, also known as containers, have opening and closing tags, the closing tag having a "/" right after the "<". The "em" part is known as the name of the tag. Notice that there is no space between the "<" and the name in the opening tag, or between the "</" and the name in the closing tag.

To create a paragraph with emphasised text, you would write "<p> Text <em> Emphasised Text </em> More Text </p>". All of the paragraph is enclosed in paragraph tags (<p> </p>), while the emphasised text is enclosed in emphasis tags.

HTML text is stored in an ASCII file, usually with an extension of .html, or .htm on file systems that don't allow extensions of more than three characters (like DOS). Many word processors, such as WordPerfect or Microsoft Word, allow the HTML author to save a file in ASCII format, so you could use them to write HTML files.

It's important to know that HTML is not a WYSIWYG (what-you-see-is-what-you-get) format. Each web browser may display your document slightly differently, and most let the reader customise how documents are displayed.

What Every HTML file Should Contain

At the very least, all HTML documents should have a title and an address of the author. Other codes in this section are also found in good HTML documents.

Title

Every document must have a title. The title should say something about what the document contains or where it's from. For example, if it were a web page this document might be named "Really Quick Guide to Good HTML", so you would write <TITLE> Really Quick Guide to Good HTML </TITLE>. When choosing a title, avoid the obvious; words like "web", "WWW" and "page" are often not necessary.

Titles are not normally viewed as part of the document; headings fill that function. Instead, titles usually show up as part of the program's window title. Often, the level 1 heading and the title are the same, but they don't have to be.

Also, web browsers often save "hotlists" of documents listed by their title. Without a title, an HTML document is only known by its location.

Standard "Wrapper"

To tell the difference between HTML and other ASCII formats, a web browser looks for an .html extension and/or <HTML> at the beginning and </HTML> at the end. (This can vary from browser to browser.)

Good HTML documents also divide the document between a head and a body. The head contains the title of the document and will sometimes contain other information about the document as needed. The body contains the "meat" of the document: everything that the user sees. A document following these standards looks something like this:

<HTML>
<HEAD>
<TITLE>this document</TITLE>
</HEAD>
<BODY>
<H1>this document</H1>
<P> this is a good HTML document. </P>
</BODY>
</HTML>

In the example above, we are using elements of HTML that we haven't gone over yet, but we will go over these later in the document.

Address: Author at Bottom

Any good document has someone who is willing to admit that he/she wrote it and is responsible for its contents. Usually, this information is put at the bottom of the document body, with the e-mail address of the author. Addresses are shown with the use of the <ADDRESS> </ADDRESS> code. For example:

<ADDRESS>
Mark [email protected]
</ADDRESS>

If you really want to get fancy, a link to the author's home page could let the reader gain more information about the author or a mailto: link to his/her e-mail address might let the reader send mail directly to the author.
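
As a rough sketch of that fancier version, the address below wraps the author's name in a link to a home page and the e-mail address in a mailto: link. The name, the example.com address, and the home page URL are all made up for illustration; substitute your own.

<ADDRESS>
<A HREF="http://www.example.com/~jdoe/"> Jane Doe </A> <BR>
<A HREF="mailto:jdoe@example.com"> jdoe@example.com </A>
</ADDRESS>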

Paragraphs, Headings, and Horizontal Rules

One of the great things about HTML is how it lets you format text to make it more readable by adding just a few short codes. Still, since most web browsers allow resizable windows, text can flow like water to fill the window.

Web browsers ignore what is known as "white space." White space is the extra formatting used to make the source HTML document more readable in ASCII form. Such formatting includes tabs, spaces, and carriage returns.
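
To see this for yourself, try something like the two made-up paragraphs below. They are spaced very differently in the source, but since a browser treats any run of spaces, tabs, and carriage returns as a single space, most browsers should display them identically:

<P>White space    is
      mostly     ignored.</P>

<P>White space is mostly ignored.</P>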

Paragraphs and Line Breaks

To separate text into paragraphs, you can use the <p> tag at the beginning of the paragraph and an optional </p> at the end of the paragraph.

In most web browsers, an extra line is put between paragraphs. For example, above this text, there is a paragraph tag, which puts in the extra line. If you want to put in a line break without adding the extra line, use <br>. For another example, the following codes:

<P>
I've never seen a purple cow, <BR>
I hope I never see one,
<P> but I can tell you here and now, <BR>
I'd rather see than be one, </P>

will give the following text:

I've never seen a purple cow,
I hope I never see one,

but I can tell you here and now,
I'd rather see than be one,

Note that you do not need to add the <p> element on the same line as the beginning of the paragraph text.

Headings

Headings give sections of text names and tell the reader what is in the text to follow. In this way, they are much like the headlines of newspapers and magazines.

They are made by typing <Hn> before the text and </Hn> after the text, where n is the level of heading the author wants, from 1 to 6. H1 headings are the most important and are often the same as the document's title. H2 through H6 are the other headings, each less important than the one before it. Headings are usually displayed in bold and are separated from other text by a blank line. More important (or "higher ranking") headings are usually displayed in larger text.

For example:

<H1>A History of Spam</H1>
<P>some text
<H2>Part One: the Humble Beginnings</H2>
<P>more text
<H3>The Birth of Spam</H3>
<H3>The Heady Teenage Years</H3>
<H2>Part Two: the long climb to the top</H2>

Horizontal Rule

A horizontal rule is a line that spans the width of the page, regardless of what font is used or how the reader's window is sized. <HR> is the code for the horizontal line.
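
As a small sketch (the headings and text here are placeholders), a rule is often dropped between two sections of a document. Note that <HR> is used singly, with no closing tag:

<H2>Part One</H2>
<P> some text </P>
<HR>
<H2>Part Two</H2>
<P> more text </P>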

Anchors, Links, and Images

Probably the main reason HTML is so popular is that it provides the ability to link documents and objects. For example, I could link a word to its definition, an image to a description, or the name of a composer to the sound of his work. Although some of those examples are beyond the scope of this document, I will briefly go over the basics.

Anchors

Anchors mark text that either links to another location or can itself be linked to from elsewhere. To mark a piece of text as a place that can be linked to, you enclose the text in NAME anchor tags.

For example, to mark the title of this section, you would write
<A Name="Anchors"> Anchors, Links, and Images </A>.
This lets the browser know that piece of text has the name "Anchors".

To mark a piece of text as a link to another object, you enclose it in HREF anchor tags. For example, to link the words "SECC Home Page" to the SECC Home Page, you would write
<A HREF="http://www.sln.org.uk/qls/secc"> SECC Home Page </A>.

The "http://www.sln.org.uk/qls/secc" part is referred to as a URL (Uniform Resource Locator). URLs tell the browser what to do or where to go when that text is activated. There are several types of URLs, each corresponding to a method of locating a resource.

The example used above is an "http:" type and denotes an HTML document served via the HTTP protocol. Other examples are:

mailto: (for sending SMTP mail),
ftp: (for anonymous FTP addresses), and
news: (for NNTP services.)

To specify a named piece of text within a document, use the URL for the document, followed by a hash sign (#), followed by the name of the text within the document. URLs of files can also be relative. For example, if another file is in the same directory as the document you are viewing, you could write
<A HREF="file.html"> other file in directory </A>.
If it were in a sub-directory named "other", you could use the URL "other/file.html".
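
Putting the hash sign and relative URLs together might look like the sketch below. It assumes a file called "spam.html" in the same directory that contains a NAME anchor called "history"; both names are invented for the example. The second line links to an anchor of the same name in the document you are already viewing:

<A HREF="spam.html#history"> the history section of the Spam document </A>
<A HREF="#history"> the history section of this document </A>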

Be aware that anchors can be placed within paragraphs or headings, but paragraphs and headings cannot be placed within anchors. Also, anchors must be ended before ending the element in which they reside. So these two lines are okay:

<H1> <A name="top"> first heading </A> </H1>
<P> go to the <A HREF="#top"> top </A> </P>

But these two are wrong:

<A name="top"> <H1> first heading </H1> </A>
<P> go to the <A HREF="#top"> top </P> </A>

Images

Inline images are easy and can make a document more interesting. Note, however, that a document with many graphics, or with large ones, can load slowly, so use them sparingly (there is a reason why ancient man moved from hieroglyphics to letters). Browsers in use today can display graphics in the "GIF" or "JPEG" format.

The code for graphics is
<IMG SRC="name.gif" ALT="some text">.

In this example, name.gif is the image source: a file, "name.gif", that resides in the same directory as the document.

The "ALT" attribute is optional. It specifies text to be presented in case the reader is using a text-only Web browser or is browsing with "auto-load images" turned off. It is a hallmark of good HTML writing because it makes your pages convey more.

Remember

When you have finished typing your HTML document it is always a good idea to read it into a web browser to check it for errors. Almost all web browsers support reading a local file; this option is usually found under the "file" menu, and can be marked "open local file" or "open file".

If your document is meant for a wide audience, do not make assumptions about how it will be viewed. Your document could be read by readers using different browsers on Macintosh, Unix, M.S. Windows, Amiga, or DEC computers, to name a few. It may not be "read", but heard by a blind person using a speech synthesis system. Or a "robot" (an automated web-travelling program) may read your document and include the headings in its index of web documents.

Remember, HTML is not a WYSIWYG format, and a document may appear differently on different browsers. To ensure your document is making the type of statement you want, you should try to view it on as many browsers as is reasonable.

Advanced Paragraphs and Headings

New with HTML 3 is the horizontal alignment of the paragraph and heading elements. These are enabled with the introduction of the "ALIGN=alignment" attribute, where alignment can be left, center, right, or justify. (For clarification, some word processors call alignment "justification.") For example, to create a level 4 heading that was horizontally centered, the following text could be used:
<H4 ALIGN=center> Spam Vs. Jello </H4>

For a right justified paragraph, the following text could be used:

<P ALIGN=right> It is a little known fact that Spam has its origins in the Second World War. </P>

Also new with HTML 3 is the ID attribute, which names a section of text. Like the name attribute of the anchor element, this can specify a target for hypertext links. An ID value should be a single name with no spaces in it. For example, the following heading:

<H4 align=center ID="spam-vs-jello"> Spam Vs. Jello </H4>

in the "spam.html" document, could be referenced by the hypertext link:

<A href="spam.html#spam-vs-jello"> take a look at the Spam vs. Jello section</A>

The ID attribute is not yet widely supported, so authors should probably stick with <A name=" "> </A> for now.
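
A sketch of that safer approach follows: the heading text is wrapped in a NAME anchor (inside the heading, as the rules for anchors require), and the link points at that name. The anchor name "jello" is made up for the example.

<H4 ALIGN=center> <A NAME="jello"> Spam Vs. Jello </A> </H4>

<A HREF="spam.html#jello"> take a look at the Spam vs. Jello section </A>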

Advanced Images

As with headings and paragraphs, HTML 3 improves upon the earlier <IMG> (image) element, adding HEIGHT and WIDTH attributes, and improving upon the ALIGN attribute.

The ALIGN attribute defines how the image will be aligned with surrounding text. From HTML 2, the values TOP, MIDDLE, and BOTTOM designate how the graphic should be aligned with the text line in which the element appears. With TOP, the top of the image aligns with the tallest item of the text line; with MIDDLE, the center of the image aligns with the baseline of the text line; and with BOTTOM, the bottom of the image is aligned with the baseline of the text. HTML 3 brings in LEFT and RIGHT values for the ALIGN attribute. These cause the image to be displayed at the left or right margin, with text wrapping around the opposite side.

Also, one can add HEIGHT and WIDTH statements to <IMG> tags. While these don't actually speed up the downloading of the images, they do give the browser the image's display size before it is downloaded. This information allows the browser to format the text of the HTML document (allowing space for the graphics) before loading all of the image file. The benefit is that the user can start reading the text before all images have been displayed.

For example, the following piece of HTML code:
<IMG SRC="photo.gif" ALT="a photograph" ALIGN=RIGHT HEIGHT="225" WIDTH="256">

	tells the browser:
	that the graphic file "photo.gif" belongs in this document,
	that it has a height of 225 pixels and a width of 256 pixels,
	that the image should be aligned along the right margin and text should wrap along its left side.

Special Characters in HTML

Since the characters <, >, and & are used in HTML code, many browsers do not display them (even when they are not part of an HTML element) unless they are encoded as entity references or numeric character codes.

Entity references are made up of an ampersand ("&"), text characters that represent the name of the entity, and a terminating semicolon (";"). The less-than symbol, "<", is represented by "&lt;"; the greater-than symbol, ">", is represented by "&gt;"; and the ampersand ("&") is represented by "&amp;". The entity reference for the double-quotation mark (&quot;) is also useful for putting quotation marks in the ALT text of an image element.

Numeric character codes are made up of an ampersand, a number/hash symbol (#), a decimal number that corresponds to the character, and a terminating semicolon. The less-than symbol is represented by &#60;. Entity references are preferred over numeric character codes, but not all characters have corresponding entity references.

International text is also possible in HTML. For example, to get "Ñ" one could use &Ntilde; and for "ñ" one could use &ntilde;. Notice that capitalization is important in getting the right glyph; &Ntilde; and &ntilde; yield different results. The glyphs follow the ISO (International Organization for Standardization) 8859 standard for character sets.

Certain non-text characters are available on most, but not all, web browsers. ¶ (&para;), ® (&reg;), and © (&copy;) are examples. Some characters, such as ™ (&trade;, the trademark sign), haven't been completely standardized yet.
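
As a short sketch of entity references in use, the first line below should display the angle brackets and the ampersand literally instead of treating them as code, and the second puts quotation marks inside ALT text. The sentence and the file name "sign.gif" are invented for the example.

<P> The &lt;EM&gt; element marks emphasis &amp; the &lt;STRONG&gt; element marks strong emphasis. </P>
<IMG SRC="sign.gif" ALT="A sign reading &quot;Keep Out&quot;">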

Lists: Ordered, Unordered, and Definition Lists

HTML lists are an easy way of presenting brief bits of information. Three types of lists are ordered lists, unordered lists, and definition lists.

Ordered lists and unordered lists are very similar, both in markup and presentation. In both types, each item is preceded by the List Item (<LI>) HTML element. The major difference is that ordered lists (<OL></OL>) are numbered, whereas unordered lists (<UL></UL>) are bulleted.

The following bit of HTML source:

<OL>
<LI>First Item
<LI>Second Item
<LI>Third Item
</OL>

gives you the following ordered list:

1. First Item
2. Second Item
3. Third Item

Lists can also be nested, with one list being "inside" the other. For example, this code:

<UL>
<LI> Vegetables
<LI> Fruits
<UL>
<LI> Oranges
<LI>Apples
<LI> Bananas
</UL>
<LI> Nuts
</UL>

gives you the following two unordered lists, the second being nested inside the first:

	Vegetables
	Fruits
		Oranges
		Apples
		Bananas
	Nuts

Definition lists (<DL> </DL>) are a bit different from ordered and unordered lists, because they contain two types of items, the term being defined (<DT>) and the definition of the term (<DD>). For example, the following HTML code:

<DL>
<DT> HTTP
<DD> Hypertext Transfer Protocol = An efficient, stateless protocol used on
the World Wide Web.
<DT>HTML
<DD>Hypertext Markup Language = the format in which web pages are
written.
<DT>InfoStructure
<DD>A group of documents linked together on one or more information servers, usually providing information concerning a certain subject or idea.
</DL>

Preformatted Text

Normally, HTML text flows; that is, when you resize a window, the line breaks change so that the text fits between the left and right margins of the window. Also, multiple spaces are ignored in HTML. Sometimes, however, you do not want this flow; you want to keep the line breaks and spaces as you wrote them. This is where preformatted text comes into play.

All spaces, line breaks, etc, are rendered literally within the <PRE></PRE> tags. This makes it useful for special formatting, or for simple tables for browsers that don't understand the <TABLE> tag. For example the following HTML code:

<PRE>

        Went Up Hill?   Went down hill safely?

Jack    Yes             No
Jill    Yes             No (ran)

</PRE>

gives you the following:

        Went Up Hill?   Went down hill safely?

Jack    Yes             No
Jill    Yes             No (ran)

Block Quotes

The Block Quote element (<BLOCKQUOTE> </BLOCKQUOTE>) is used for designating several lines of quoted material. It is shown by most browsers as indented text. In HTML 3, this element was shortened from <BLOCKQUOTE> to <BQ>, but not all browsers recognise <BQ> at the time of this writing.

The following makes use of the <BLOCKQUOTE> element:

<BLOCKQUOTE>
Periodically, Office of Personnel Management announces supervisory and
management development classes for GS-11-13 and GS/GM 13-15. Tuition and travel/per diem are funded by the Employee's department. If interested, check Tackboard under Employee Development or call Lit x2950
</BLOCKQUOTE>

This code gives the following:

Periodically, Office of Personnel Management announces supervisory and management development classes for GS-11-13 and GS/GM 13-15. Tuition and travel/per diem are funded by the Employee's department. If interested, check Tackboard under Employee Development or call Lit x2950

Character highlighting - Emphasized text, Information Formatting

HTML has many ways of highlighting text. Some designate what type of information the text represents; others are simply presentation tags, which may look pleasant but do not explicitly convey any meaning. Logical emphasis (informational elements) is recommended over typographic emphasis (presentation elements).

It should be stressed that some web browsers conform to only part of the HTML specification, and that some of those elements that are from HTML 3 are not widely supported.

Information Elements

The most basic types of logical emphasis are "strong" (<STRONG> </STRONG>) and "emphasis" (<EM> </EM>.)

These are used for information that is very important or otherwise needs to be stressed. Text in strong tags is of the type that would be spoken loudly. Text in emphasis tags would be spoken slowly. For example, the following HTML code:

<P>Write web pages using <STRONG> standard </STRONG> HTML. <EM> Nonstandard HTML does not show up in all web browsers, so you lose part of your audience by using elements only recognised by certain browsers. </EM> </P>

gives the following:

Write web pages using standard HTML. Nonstandard HTML does not show up in all web browsers, so you lose part of your audience by using elements only recognised by certain browsers.

Other informational elements exist. The cite element (<CITE> </CITE>) is used to mark citations. The KBD (<KBD> </KBD>) element indicates characters that the user is supposed to type from a keyboard. Variables are signified by the VAR (<VAR> </VAR>) element. For example, the following HTML code:

According to the <CITE>Spamify User's Manual </CITE> you can spamify a file by typing <KBD> spamify </KBD> <VAR> filename </VAR> at the DOS prompt; where <VAR> filename</VAR> is the name of the file you wish to spamify.

gives the following:

According to the Spamify User's Manual you can spamify a file by typing spamify filename at the DOS prompt; where filename is the name of the file you wish to spamify.

Most of the examples of HTML code in this document were written within <CODE> </CODE> tags, which designate pieces of code.

Typographic Elements

Typographic elements have no defined meaning, but sometimes help to mark a piece of text as being special. They are also called "font style elements" because they correspond to ways text can look, not what the text means.

The most basic typographic elements are boldface (<B> </B>), italics (<I> </I>), teletype (<TT> </TT>), and underline (<U> </U>). Although boldface and italics often look identical to strong and emphasized, the information elements are preferred, because of their more precise meaning. Teletype specifies that the text should be displayed in a fixed-width typewriter-like font. Although the underline element is in the HTML 2.0 specification, it is not supported in some browsers.

New in HTML 3 are strikethrough (<S> </S>), big (<BIG> </BIG>), small (<SMALL> </SMALL>), subscript (<SUB> </SUB>), and superscript (<SUP> </SUP>). Strikethrough specifies that the text will have a horizontal line through its middle.

Big and small make the text larger and smaller than its normal size. Text enclosed in the subscript and superscript elements appears lower and higher, respectively, than normal text.
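
A quick sketch of several of these elements together is shown below. The sentences are made up, and since some of these elements come from HTML 3, not every browser should be expected to display them as intended:

<P> Water is H<SUB>2</SUB>O, and E = mc<SUP>2</SUP>. </P>
<P> The price is <S> 10 pounds </S> <B> 8 pounds </B>, typed as <TT> price=8 </TT>. </P>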

Tables

The Table element (<TABLE> </TABLE>) was perhaps one of the most-requested elements of HTML 3.0.

An HTML table contains rows (<TR> </TR>) of data, and an optional caption (<CAPTION> </CAPTION>).

Each row contains cells, which can be either header cells (<TH> </TH>) or normal data cells (<TD> </TD>), and can be ID'ed and aligned as are paragraphs. Captions can be ID'ed as well, but their alignment is relative to the table itself. An attribute of the table element is BORDER, which specifies that lines will be used to separate the cells of the table.

Not all browsers can read tables; if you believe that your audience will not be able to see your data, you may wish to use preformatted text instead of or in addition to a table.

Tables, however, are extremely useful for aligning sections of text. A double-column layout can be simulated by using tables without borders.
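
As a minimal sketch of the elements described in this section, here is the Jack and Jill data from the preformatted text example marked up as a bordered table. The caption wording is invented for the example:

<TABLE BORDER>
<CAPTION> Who went up the hill? </CAPTION>
<TR> <TH> Name </TH> <TH> Went up hill? </TH> <TH> Went down hill safely? </TH> </TR>
<TR> <TD> Jack </TD> <TD> Yes </TD> <TD> No </TD> </TR>
<TR> <TD> Jill </TD> <TD> Yes </TD> <TD> No (ran) </TD> </TR>
</TABLE>

Leaving out the BORDER attribute should give the same grid without the lines, which is the form used for simulating column layouts.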

Revised: 24 August 1998. (e-mail at [email protected])