Have you ever noticed the doctype of an HTML/XHTML document? For that matter, do you even use one? What is a doctype for, and what does it all mean?
These are questions I asked myself early on in the game of Web design, and I am glad I did, because the answers are important. Take a look at the source code of Web pages out on the Internet. I would venture to say that most pages do not include a doctype or character encoding information (you can read more about character encoding on UniNetNews).
I can almost guarantee that this information is left out because designers don't understand it. Doctype and character encoding declarations are very important topics, so I hope this article clears up the mystery and prompts you to start including them in your documents.
Also read "How good will your website look and how well will it function as the Web evolves?", which explains how the move towards Web standardization and XML has resulted in new requirements for Web documents.
The DTD (Document Type Definition) has become more important in the move to standards compliance and XML because it states the rules of the markup language. These rules give the details of each element: the order in which elements may appear, the attributes they can take, and other markup information, such as whether an element is block-level or inline.
Validation simplifies HTML/XHTML processing because it removes some of the work browsers currently need to do to render a page. As HTML moves into the XML environment, validation will become more important: XML parsers reject documents that are not well-formed, and many XML applications also expect valid documents.
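Full validation against a DTD needs a validating parser, which the Python standard library does not provide. As a rough first step, the sketch below only checks well-formedness, the minimum any XML parser demands. The function name is_well_formed and the two sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: bytes) -> bool:
    """Return True if the markup parses as XML, False otherwise.

    This checks well-formedness only; it does not validate against a DTD.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Every element is closed, including the empty <br /> element.
good = (b'<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
        b'<body><p>Hi<br /></p></body></html>')

# HTML-style <br> without a closing slash: tolerated by browsers, rejected by XML.
bad = (b'<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
       b'<body><p>Hi<br></p></body></html>')

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

A page that fails this check cannot be valid XHTML, so it is a cheap test to run before sending a page to a full validator.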
Below are the XHTML 1.0 doctype declarations, followed by the XHTML 1.1 and XHTML Basic doctypes. This article only touches on the differences between these doctypes, but it will break down and explain each component of the declaration.
Here is how you would write the three XHTML 1.0 doctype declarations, each with a brief description of why you would pick it over the others:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
XHTML Strict - Use this when you want a document that is pure structural markup, free of any tags associated with style (see the UniNetNews article Why XML? For the Web Designer for more information on the makeup of a Web document: structure, content and style). Use the Strict DTD along with the W3C's Cascading Style Sheet language to get the font, color, and alignment effects you need.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
XHTML Transitional - Most people writing Web pages for the general public will want to use the Transitional DTD. It works best when your audience includes older browsers that don't fully support Cascading Style Sheets. Transitional documents can use Cascading Style Sheets, but authors can still include presentational markup within the document for people viewing the pages with browsers that don't understand style sheets: for example, BODY with bgcolor, or text and link color attributes.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
XHTML Frameset - The Frameset DTD is used when your page is designed using "frames". Frames allow you to split the browser window into separate sections, each of which displays a different page. Each page should validate as either Strict or Transitional, and the pages are brought together by the Frameset document. Frames are considered a deprecated technology.
Use the Frameset DTD only in the frameset document - not the documents called by the frameset.
Called documents are either strict or transitional. Below is an example of the Frameset document that calls the documents used in the frames.
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Frameset Page</title>
</head>
<frameset rows="130,*">
<frame src="top.htm" name="top" scrolling="no" frameborder="0" />
<frame src="main.htm" name="main" scrolling="yes" frameborder="0" />
</frameset>
</html>
So, in the example above, the Frameset document is calling top.htm and main.htm.
Top.htm and main.htm would have a DTD of either Strict or Transitional, NOT Frameset.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
XHTML 1.1 - This document type is essentially a reformulation of XHTML 1.0 Strict using XHTML Modules. While XHTML 1.1 looks very similar to XHTML 1.0 Strict, it is designed to serve as the basis for future extended XHTML Family document types. Its modular design makes it easier to add other XHTML modules as needed and to create and include, in one document, other markup languages.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
XHTML Basic - This document type was developed for user agents that do not support the full set of XHTML modules, such as mobile phones, PDAs, pagers and set-top boxes.
XHTML Basic uses the core modules from XHTML Modularization along with the images, forms, basic tables, and object modules. XHTML Basic can be extended with additional modules from XHTML Modularization or by writing your own modules. XHTML Basic is targeted at wireless, mobile applications.
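The quoted public identifier is the part of each declaration above that tells the document types apart. As an illustration only, the following Python sketch pulls that identifier out of a page and maps it to a readable name. The function name classify_doctype and the lookup table are assumptions made for this example, and the regular expression is a simplification that only handles declarations written in the PUBLIC form shown in this article.

```python
import re
from typing import Optional

# Public identifiers copied from the declarations shown in this article.
KNOWN_DOCTYPES = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "XHTML 1.0 Frameset",
    "-//W3C//DTD XHTML 1.1//EN": "XHTML 1.1",
    "-//W3C//DTD XHTML Basic 1.0//EN": "XHTML Basic 1.0",
}

# Matches: <!DOCTYPE html PUBLIC "public id" "system id">, across line breaks.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"\s+"([^"]+)"\s*>')

def classify_doctype(document: str) -> Optional[str]:
    """Return a readable doctype name, or None if it is missing or unrecognized."""
    match = DOCTYPE_RE.search(document)
    if match is None:
        return None
    return KNOWN_DOCTYPES.get(match.group(1))

page = '''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
</html>'''

print(classify_doctype(page))             # XHTML 1.0 Strict
print(classify_doctype("<html></html>"))  # None
```

A script like this could be run over a site to spot pages that have no doctype at all.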
The XML 1.0 specification requires the parts of the prolog, namely the XML declaration and the document type declaration, to appear in a fixed order:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
You do NOT need to read this section to understand DOCTYPE declarations. It was added for those of you who are interested in reading the XML specification.
prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
This is not an easy syntax to read, especially when you are used to reading DTDs, whose grammar looks enough like the grammar of the XML specification to be confusing. The notation used here is EBNF (Extended Backus-Naur Form), not DTD grammar; it works at a much lower level (BNF is often used to define the syntax of a programming language for a compiler). In DTDs, element order is expressed with commas, but in the EBNF used to describe XML 1.0 itself, items must appear in the order they are written. Where items are separated by ORs (the "|" symbol), they are alternatives and no ordering is implied, as in the production for Misc:
Misc ::= Comment | PI | S
[1] document ::= prolog element Misc*
If you click on the "prolog" link in the section of the W3C's XML 1.0 specification that describes well-formed XML documents, you'll be taken to a later section of that document that reads:
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
. . . which essentially says: the prolog consists of an optional XML declaration, optionally followed by "Misc", optionally followed by a DOCTYPE declaration, and maybe some more "Misc". (Misc is defined as a comment, a processing instruction, or just plain whitespace.)
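One way to get a feel for production [22] is to notice that its operators (? for optional, * for zero or more, parentheses for grouping) mean the same thing in a regular expression. The sketch below is an illustration, not a parser: it assumes the document has already been reduced to one letter per construct (X for XMLDecl, M for Misc, D for doctypedecl), and the name is_valid_prolog is invented for this example.

```python
import re

# prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
# One letter per construct: X = XMLDecl, M = Misc, D = doctypedecl.
PROLOG = re.compile(r"X?M*(DM*)?")

def is_valid_prolog(tokens: str) -> bool:
    """Return True if the token sequence fits the prolog production."""
    return PROLOG.fullmatch(tokens) is not None

print(is_valid_prolog("XMD"))  # True: declaration, Misc, DOCTYPE
print(is_valid_prolog("D"))    # True: the XML declaration is optional
print(is_valid_prolog(""))     # True: every part of the prolog is optional
print(is_valid_prolog("MX"))   # False: nothing may precede the XML declaration
print(is_valid_prolog("XDD"))  # False: at most one DOCTYPE declaration
```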
All of this is really just saying, "In an XML document, the XML declaration -- if you use it -- MUST COME FIRST".
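The rule is easy to see in practice with any XML parser. The sketch below uses the parser from Python's standard library as an example; the sample document and the helper name parses are made up for illustration. The same page is accepted when the XML declaration is first and rejected when a blank line or a comment precedes it.

```python
import xml.etree.ElementTree as ET

# A small XHTML page with the XML declaration on the very first line.
DOC = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    b'  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    b'<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
    b'<head><title>Test</title></head><body><p>Hello</p></body></html>'
)

def parses(markup: bytes) -> bool:
    """Return True if the XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(parses(DOC))                     # True: declaration comes first
print(parses(b"\n" + DOC))             # False: blank line before the declaration
print(parses(b"<!-- note -->" + DOC))  # False: comment before the declaration
```

Comments and whitespace are allowed after the XML declaration, just not before it.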
Reading either the XML Recommendation or the EBNF production rules in it can be a bit on the ponderous side. Tim Bray's "Annotated XML" at http://www.xml.com/axml/testaxml.htm makes it a (slight) bit easier.
If you're into figuring out the XML specification, a great guy (Jelks) from the Yahoo! group XHTML list wrote "Just the EBNF": the EBNF without the prose. One example from that article shows that <empty/> and <empty /> are syntactically equivalent as far as XML is concerned.
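The equivalence of the two empty-element spellings can be checked with a parser. In this small sketch (the helper name canonical and the root wrapper element are made up for the example), both spellings are parsed with Python's standard library and written back out; the results are identical because the parser sees the same element either way.

```python
import xml.etree.ElementTree as ET

def canonical(markup: str) -> bytes:
    """Parse the markup and serialize it again, discarding spelling differences."""
    return ET.tostring(ET.fromstring(markup))

without_space = canonical("<root><empty/></root>")
with_space = canonical("<root><empty /></root>")

print(without_space == with_space)  # True
```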
To learn how to read EBNF, check out Lars Marius Garshol's "BNF and EBNF: What are they and how do they work?" (you only need to read up to, but not including, the section called "Parsing").
Someone from the XHTML list said "The XML Declaration is not a PI (processing instruction), it just looks like one."
You need to pick the DOCTYPE that is appropriate for you. You may even decide that HTML 4.01 will work better for you now than trying to train staff to code XHTML. The important thing is to make sure EVERY page validates! You will be far ahead of the game if you do.