Let's Unify the Net!

Standards News and Solutions for Web Designers

How to Read an XHTML Doctype Declaration

Author: Jan Hunt

Have you ever noticed the doctype of an HTML/XHTML document? For that matter, do you even use one? What is a doctype for, and what does it all mean?

These are questions that I asked myself early on in the game of Web design, and I am glad I did, because the answers are important. Take a look at the source code of Web pages out on the Internet. I would venture to say that most of them include neither a doctype nor character encoding information (you can read more about character encoding on UniNetNews).

I can almost guarantee you that this information is left out because designers don't understand it. Doctype and character encoding declarations are very important topics, so I hope this article can clear up the mystery and prompt you to start including the information in your documents.

Also, read "How good will your website look and how well will it function as the Web evolves?", which explains how the move towards Web standardization and XML has resulted in new requirements for Web documents.

Declare your Doctype and Validate your Documents

The DTD (Document Type Definition) named in the doctype has become more important in the move to standards compliance and XML because it states the rules of the markup language. These rules give the details of each element: the order elements must appear in, which attributes they can take, and other markup information, such as whether an element is block-level or inline.

Validation simplifies HTML/XHTML processing because:

  1. the browser does NOT have to be programmed with an innate understanding of all the tags (as it is now), and
  2. the browser will not perform any error correction (as it does now).

Validation will remove some of the work browsers currently need to do to render a page. As HTML moves into the XML environment, validation will become more important, because XML parsers reject documents that are not well-formed, and validating parsers also reject documents that break the rules of the DTD.
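To get a feel for parsing without error correction, here is a minimal sketch in Python. It rests on two assumptions: Python's standard library does not validate against a DTD, so the sketch checks only well-formedness (the weaker condition every XML parser enforces), and the function name first_parse_error is made up for this illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def first_parse_error(markup: str) -> Optional[Tuple[int, int]]:
    """Return (line, column) of the first well-formedness error, or None.

    Hypothetical helper: it checks well-formedness only, not validity
    against a DTD.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return err.position
    return None


# HTML habit: <br> is never closed, so an XML parser gives up.
sloppy = "<p>First line<br>Second line</p>"
# XHTML style: the empty element is closed with " />".
tidy = "<p>First line<br />Second line</p>"

print(first_parse_error(sloppy))  # a (line, column) tuple
print(first_parse_error(tidy))    # None
```

A browser in HTML mode would quietly repair the first fragment. An XML parser stops at the first error instead, which is the behavior described in the list above.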

Doctype Declarations

Below are the XHTML 1.0 doctype declarations, followed by the XHTML 1.1 and XHTML Basic doctypes. This article only touches on the differences between these doctypes, but it will break down and explain each component of the declaration.

XHTML 1.0 Doctype Declarations

Here is how you would write the three different doctype declarations, each with a brief description of why you would pick it over the others:

XHTML 1.0 Strict


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

XHTML Strict - Use this when you want a document that is pure structural mark-up, free of any tags associated with style (see the UniNetNews article Why XML? For the Web Designer for more information on the makeup of a web document - structure, content and style). Use the strict DTD along with the W3C's Cascading Style Sheet language to get the font, color, and alignment effects you need.

XHTML 1.0 Transitional


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

XHTML Transitional - Most people writing web pages for the general public will want to use the Transitional DTD. This DTD works best with browsers, older or current, that don't fully support Cascading Style Sheets. Transitional documents can use Cascading Style Sheets, but you can still include presentational elements and attributes in the document for people viewing your pages with older browsers that don't understand style sheets. Examples are bgcolor on the body element, or the text and link color attributes.

XHTML 1.0 Frameset


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

XHTML Frameset - The frameset DTD is used when your page is designed using "frames". Frames allow you to split the browser window into separate sections and each section is a different page. Each page should validate as either strict or transitional and is brought together with the Frameset document. Frames are considered a deprecated technology.

Frameset Document and the Frameset DTD

Use the Frameset DTD only in the frameset document - not the documents called by the frameset.

Called documents are either strict or transitional. Below is an example of the Frameset document that calls the documents used in the frames.


<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
 <title>Frameset Page</title>
</head>

<frameset rows="130,*">
  <frame src="top.htm" name="top" scrolling="no" frameborder="0" />
  <frame src="main.htm" name="main" scrolling="yes" frameborder="0" />
</frameset>
</html>

So, in the example above, the Frameset document is calling top.htm and main.htm.

Top.htm and main.htm would have a DTD of either Strict or Transitional, NOT Frameset.

XHTML 1.1 Doctype Declaration

XHTML 1.1


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.1//EN"
     "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

XHTML 1.1 - This document type is essentially a reformulation of XHTML 1.0 Strict using XHTML Modules. While XHTML 1.1 looks very similar to XHTML 1.0 Strict, it is designed to serve as the basis for future extended XHTML Family document types. Its modular design makes it easier to add other XHTML modules as needed and to create and include, in one document, other markup languages.

XHTML Basic Doctype Declaration

XHTML Basic


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
     "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

XHTML Basic - This document type was developed for user agents that do not support the full set of XHTML modules, such as mobile phones, PDAs, pagers and set-top boxes.

XHTML Basic uses the core modules from XHTML Modularization along with the images, forms, basic tables, and object modules. XHTML Basic can be extended with additional modules from XHTML Modularization or by writing your own modules. XHTML Basic is targeted at wireless, mobile applications.
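If you generate pages from a script, it can help to keep the five doctypes listed above in one lookup table rather than retyping them. The following is only a sketch: the keys ("strict", "transitional" and so on) and the function name doctype_declaration are invented for this example, while the identifiers themselves are copied from the declarations above.

```python
# Public identifier and DTD location for each doctype shown in this article.
# The dictionary keys are made-up short names, not anything defined by the W3C.
DOCTYPES = {
    "strict": ("-//W3C//DTD XHTML 1.0 Strict//EN",
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"),
    "transitional": ("-//W3C//DTD XHTML 1.0 Transitional//EN",
                     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"),
    "frameset": ("-//W3C//DTD XHTML 1.0 Frameset//EN",
                 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"),
    "1.1": ("-//W3C//DTD XHTML 1.1//EN",
            "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"),
    "basic": ("-//W3C//DTD XHTML Basic 1.0//EN",
              "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd"),
}


def doctype_declaration(flavor: str) -> str:
    """Build the one-line DOCTYPE declaration for a named flavor."""
    public_id, system_id = DOCTYPES[flavor]
    return f'<!DOCTYPE html PUBLIC "{public_id}" "{system_id}">'


print(doctype_declaration("transitional"))
```

Keeping the strings in one place makes it harder to end up with a mistyped DTD address in one template and a correct one in another.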

Breaking It Down

The XML 1.0 specification requires the XML declaration, the document type declaration and the root element to appear in a fixed order:

  1. XHTML documents should begin with an XML declaration which specifies the version of XML being used and the character encoding of the document.
    <?xml version="1.0" encoding="UTF-8"?>

    The XML declaration is optional, but recommended. It is often left off because of problems in some browsers, and with PHP or ASP, where its <? ?> notation resembles the script delimiters <?php ?> and <% %>. You can get around the PHP/ASP problem by having the server-side script print the XML declaration instead of typing it directly into the page.

    If you do not include the XML declaration then make sure to declare your character encoding in the meta tag as such:
    <meta http-equiv="Content-type" content="text/html; charset=UTF-8" />.
    You are safe to use the XML declaration unless your site still has older browsers accessing it. As browsers become more "XML aware" (and the most current ones are) we will not have this problem.

  2. The document type declaration, which names the DTD, must appear before the first element in the document.
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

  3. The first element of an XHTML document is the html root element, which carries the XML namespace.
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

Line-By-Line Description


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

  1. If the document is XML (which XHTML is), you would have the XML version declaration and character encoding information first.

  2. The next line starts with the keyword DOCTYPE, which opens the SGML-style document type declaration (remember, SGML is the basis of HTML). And, to make things confusing, even though XHTML requires tags to be in lowercase, DOCTYPE is the exception to the rule. It is always uppercase.

  3. Next is html, which says that you can expect the tag "html" as the opening tag of the document,

  4. PUBLIC says that the DTD document in our example is publicly available,

  5. Believe it or not, the dash is really a minus sign (versus a plus sign), which means the organization following it (the W3C) is NOT registered with ISO (the International Organization for Standardization).

  6. Following the dash is the type which in our example is a DTD,

  7. then the label which says XHTML 1.0 Strict,

  8. the EN means the language is English (the language of the markup not the content. HTML is always EN).

  9. The next line is the URI or location of the DTD. So, in our example the DTD is on the W3C site. If you type in that address in your browser you will see that it does take you to the DTD.

  10. Next is the opening html tag (which in XHTML is always lowercase). The html tag has an xmlns attribute, which declares an XML namespace.

  11. XHTML 1.0 will always use this namespace:
    <html xmlns="http://www.w3.org/1999/xhtml">
    Namespaces are important when developers mix more than one markup language. In XHTML, they are used to distinguish the XHTML elements and attributes from other markup languages' elements and attributes with the same name (for example, the title element in XHTML from a title element that is meant to be the title of a book). There is more to using namespace declarations than is mentioned in this article.

    So the root element html has an attribute of xmlns which has a quoted value of the URL ("http://www.w3.org/1999/xhtml").
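The quoted string after PUBLIC can be taken apart mechanically, following items 5 through 8 above. This sketch assumes the identifier has the four "//"-separated fields seen in the XHTML doctypes and starts with "-" or "+"; identifiers in other forms are rejected rather than guessed at. The class and function names are hypothetical.

```python
from typing import NamedTuple


class PublicIdentifier(NamedTuple):
    registered: bool   # "+" means a registered owner, "-" means not registered
    owner: str         # the organization, e.g. W3C
    text_class: str    # the type, e.g. DTD
    description: str   # the label, e.g. XHTML 1.0 Strict
    language: str      # the language of the markup, e.g. EN


def parse_public_identifier(fpi: str) -> PublicIdentifier:
    """Split a public identifier such as "-//W3C//DTD XHTML 1.0 Strict//EN"."""
    parts = fpi.split("//")
    if len(parts) != 4 or parts[0] not in ("+", "-"):
        raise ValueError(f"unexpected public identifier: {fpi!r}")
    registration, owner, text, language = parts
    # The first word is the type; everything after it is the label.
    text_class, _, description = text.partition(" ")
    return PublicIdentifier(registration == "+", owner, text_class,
                            description, language)


parsed = parse_public_identifier("-//W3C//DTD XHTML 1.0 Strict//EN")
print(parsed.owner, parsed.text_class, parsed.description, parsed.language)
```

For the Strict doctype, the fields come out as not registered, W3C, DTD, XHTML 1.0 Strict and EN, matching the line-by-line description.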

Reading the XML standards - the Prolog

You do NOT need to read this section to understand DOCTYPE declarations. It was added for those of you who are interested in reading the XML specification.


prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?

Not an easy syntax to read, especially when you are used to reading DTDs, whose grammar looks enough like the XML BNF grammar to be confusing. BNF grammar is not DTD grammar; it works at a much lower level (BNF is often used to create a compiler for a given programming language). In DTDs element order is expressed with commas, but in the EBNF used to describe XML 1.0 itself, item order is simply the order in which the items are written. If there are ORs (the "|" symbol) between the items, then there is no ordering, as in the production for Misc:


Misc ::= Comment | PI | S


[1] document ::= prolog element Misc*

If you click on the "prolog" link in this production in the W3C's XML 1.0 specification, in the section that describes well-formed XML documents, you'll be taken to a later section of that document that reads -


[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?

. . . which essentially says: "The prolog consists of an optional XML declaration, optionally followed by 'Misc', optionally followed by a DOCTYPE declaration -- and maybe some more 'Misc' (Misc is defined as being a comment, a processing instruction, or just plain whitespace)."

All of this is really just saying, "In an XML document, the XML declaration -- if you use it -- MUST COME FIRST".
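You can watch a parser enforce this rule. The sketch below feeds three small documents to the XML parser in Python's standard library. The sample documents and the helper name parses_cleanly are assumptions made for the illustration, and the parser checks well-formedness only; it does not fetch or apply the DTD.

```python
import xml.etree.ElementTree as ET

XML_DECL = b'<?xml version="1.0" encoding="UTF-8"?>'
COMMENT = b'<!-- a comment, which counts as Misc -->'
DOCTYPE = (b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
           b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')
ROOT = b'<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"></html>'


def parses_cleanly(document: bytes) -> bool:
    """True if the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# XMLDecl? Misc* (doctypedecl Misc*)? followed by the root element.
declaration_first = b"\n".join([XML_DECL, COMMENT, DOCTYPE, COMMENT, ROOT])
# The XML declaration is optional, so leaving it out is fine.
no_declaration = b"\n".join([COMMENT, DOCTYPE, ROOT])
# A comment in front of the XML declaration breaks the rule.
declaration_late = b"\n".join([COMMENT, XML_DECL, DOCTYPE, ROOT])

print(parses_cleanly(declaration_first))  # True
print(parses_cleanly(no_declaration))     # True
print(parses_cleanly(declaration_late))   # False
```

Even a single space or blank line in front of the XML declaration is enough to make the parser reject the document.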

Reading either the XML Recommendation or the EBNF production rules in it can be a bit on the ponderous side. Tim Bray's "Annotated XML" at http://www.xml.com/axml/testaxml.htm makes it a (slight) bit easier.

If you're into figuring out the XML specification, a great guy (Jelks) from the Yahoo! group XHTML list wrote "Just the EBNF" - EBNF without the prose. One example from that article tells you that <empty/> and <empty /> are syntactically equivalent as far as XML is concerned.

To learn how to read EBNF, check out Lars Marius Garshol's "BNF and EBNF: What are they and how do they work?" (you only need to read up to, but not including, the section called "Parsing").

Someone from the XHTML list said "The XML Declaration is not a PI (processing instruction), it just looks like one."

In Summary:

You need to pick which DOCTYPE is appropriate for you. You may even decide that HTML 4.01 is going to work better for you now than trying to train staff to code XHTML. The important thing is to make sure EVERY page validates! You will be far ahead of the game if you do.


