
Let's Unify the Net!

Standards News and Solutions for Web Designers

XHTML Versions Demystified

Author: Jan Hunt

To understand how XHTML was developed and how it evolved through its different versions, we need to start with HTML, in particular HTML 4.01, the last version of HTML before XHTML. Both HTML 4.01 and XHTML 1.0 come in three variants (Strict, Transitional and Frameset), and each variant has its own DTD (Document Type Definition).

The DTD you choose for your document affects how easy it will be, down the road, to update your pages from HTML to XHTML, or from one version of XHTML to a higher one such as XHTML 1.1 or XHTML 2.0.

The evolution of markup also requires Web designers to understand that a typical document is made up of three layers:

  1. structure (titles, headings, paragraphs, etc.),
  2. content (the data that is being marked up),
  3. and style (display information, presentation, such as font, centered, etc.)

For a more detailed discussion on the interaction of these layers, read the UniNetNews article Why XML? For the Web Designer.

Other related articles on UniNetNews:

  1. How to Read an XHTML Doctype Declaration
  2. How Good Will Your Website Look and How Well Will It Function as the Web Evolves? (discusses coding to standards, the DTD and validation)

HTML 4.01

HTML 4.01 Strict: The Strict DTD does not support deprecated (phased out) elements and attributes. In most cases these are presentational elements and attributes whose job is being moved into style sheets. Of the three HTML 4.01 versions, pages that validate to the Strict DTD will be the easiest to move to XHTML.

HTML 4.01 Transitional: The transitional DTD (also called "loose") does support the deprecated (phased out) elements and attributes. This DTD is popular because a designer does not need to know style sheets, resulting in a document that is a mixture of style and structural tags and of course the content that is marked up by these tags. Because XHTML is moving away from style tags to style sheets, pages that validate to this DTD will be more difficult to update to XHTML.

HTML 4.01 Frameset: The Frameset DTD is used when your page is designed using "frames". Frames allow you to split the browser window into separate sections, each displaying a different page. Each of those pages should validate as either Strict or Transitional, and the Frameset document brings them together. Frames are considered a deprecated (phased out) technology, although XHTML 1.0 still has a Frameset DTD.

XHTML 1.0

This markup language is the first step in the move from HTML to XML. It became a W3C Recommendation on January 26, 2000. XHTML 1.0 requires a document to be marked up using the element sets described in the XHTML DTDs. A "strictly conforming" XHTML document must also meet criteria that many Web designers have not had to use before: it must contain a DOCTYPE declaration, its root element must be html, and that root element must declare the XHTML namespace (xmlns="http://www.w3.org/1999/xhtml").
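As a rough illustration of those three criteria, here is a minimal XHTML 1.0 Strict page and a small Python check, written with the standard xml.etree.ElementTree module. The function name conformance_problems is hypothetical, and this is only a sketch: it confirms that a DOCTYPE is present, that the markup is well-formed XML and that the root element is html in the XHTML namespace, but it does not validate the page against the DTD. You still need a validator for that.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A minimal XHTML 1.0 Strict page: DOCTYPE, html root element, XHTML namespace.
MINIMAL_XHTML10_STRICT = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>A minimal XHTML 1.0 Strict page</title>
  </head>
  <body>
    <p>Hello, XHTML.</p>
  </body>
</html>
"""


def conformance_problems(source: str) -> List[str]:
    """Report a missing DOCTYPE, non-well-formed markup or a wrong root element.

    Hypothetical helper: this does NOT validate against the DTD.
    """
    problems = []
    # XHTML requires the lowercase root name "html" in the DOCTYPE.
    if not re.search(r"<!DOCTYPE\s+html\b", source):
        problems.append("missing DOCTYPE declaration")
    try:
        root = ET.fromstring(source)  # raises ParseError if not well-formed XML
    except ET.ParseError as err:
        problems.append(f"not well-formed XML: {err}")
        return problems
    # ElementTree writes namespaced tags as "{namespace-uri}localname".
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append(f"root element is {root.tag!r}, expected html in the XHTML namespace")
    return problems
```

Running conformance_problems(MINIMAL_XHTML10_STRICT) returns an empty list. Typical HTML habits, such as uppercase tags, a missing namespace or unclosed elements, each show up as a reported problem.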

XHTML 1.0 Strict: The XHTML Strict DTD is based on the HTML 4.01 Strict DTD, so it also does not support deprecated (phased out) elements and attributes; their presentational work is done with style sheets instead. Of the three XHTML 1.0 versions, pages that validate to the Strict DTD will be the easiest to move to XHTML 1.1.

XHTML 1.0 Transitional: The XHTML transitional DTD is based on the HTML 4.01 transitional DTD and so it does support the deprecated elements and attributes. This DTD is popular because a designer does not need to know style sheets, resulting in a document that is a mixture of style and structural tags and of course the content that is marked up by these tags. Because XHTML is moving away from style tags to style sheets, pages that validate to this DTD will be more difficult to update to XHTML 1.1. These pages will display most consistently from browser to browser (because style sheet support differs from browser to browser) and in older browsers (which don't support style sheets).

XHTML 1.0 Frameset: The XHTML Frameset DTD is based on the HTML 4.01 Frameset DTD and is used when your page is designed using "frames". Frames allow you to split the browser window into separate sections, each displaying a different page. Each of those pages should validate as either Strict or Transitional, and the Frameset document brings them together. Frames are considered a deprecated technology, and everything I have read says they will not be supported in XHTML 1.1 and up (but there is a frames module listed among the abstract modules in the W3C XHTML Modularization specification, so go figure).
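Each of the six DTDs described above is selected with a DOCTYPE declaration at the top of the page. The Python sketch below records the public and system identifiers published in the HTML 4.01 and XHTML 1.0 Recommendations, builds the matching declaration, and reports which version and variant a page declares. The function names are hypothetical, and reading the public identifier only tells you what a page claims to be, not whether it actually validates.

```python
import re
from typing import Optional, Tuple

# (language, variant) -> (public identifier, system identifier), as published by the W3C.
DOCTYPES = {
    ("HTML 4.01", "Strict"): ("-//W3C//DTD HTML 4.01//EN",
                              "http://www.w3.org/TR/html4/strict.dtd"),
    ("HTML 4.01", "Transitional"): ("-//W3C//DTD HTML 4.01 Transitional//EN",
                                    "http://www.w3.org/TR/html4/loose.dtd"),
    ("HTML 4.01", "Frameset"): ("-//W3C//DTD HTML 4.01 Frameset//EN",
                                "http://www.w3.org/TR/html4/frameset.dtd"),
    ("XHTML 1.0", "Strict"): ("-//W3C//DTD XHTML 1.0 Strict//EN",
                              "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"),
    ("XHTML 1.0", "Transitional"): ("-//W3C//DTD XHTML 1.0 Transitional//EN",
                                    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"),
    ("XHTML 1.0", "Frameset"): ("-//W3C//DTD XHTML 1.0 Frameset//EN",
                                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"),
}


def doctype_declaration(language: str, variant: str) -> str:
    """Build the DOCTYPE line for one of the six DTDs (hypothetical helper)."""
    public_id, system_id = DOCTYPES[(language, variant)]
    # XML is case-sensitive, so XHTML names the root element in lowercase.
    root = "html" if language.startswith("XHTML") else "HTML"
    return f'<!DOCTYPE {root} PUBLIC "{public_id}" "{system_id}">'


def identify_doctype(page: str) -> Optional[Tuple[str, str]]:
    """Return (language, variant) named by a page's DOCTYPE, or None if unknown."""
    match = re.search(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"', page, re.IGNORECASE)
    if match is None:
        return None
    for key, (public_id, _) in DOCTYPES.items():
        if public_id == match.group(1):
            return key
    return None
```

Note that the HTML 4.01 Transitional DTD lives at loose.dtd, which is where the "loose" nickname comes from. Also note how each XHTML 1.0 identifier parallels its HTML 4.01 counterpart.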

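For an XHTML document to validate, it must include a DOCTYPE declaration. Two other practices are also recommended: declaring the document's character encoding, and identifying the document language with the xml:lang and lang attributes. The XHTML 1.0 Recommendation requires an XML declaration stating the encoding whenever that encoding is something other than UTF-8 or UTF-16 and no encoding is supplied by a higher-level protocol such as HTTP. You might want to read the UniNetNews article How Good Will Your Website Look and How Well Will It Function as the Web Evolves?.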
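To show where these declarations live in a page, here is a small Python sketch with two hypothetical helpers. One reads the encoding from the XML declaration, which must appear at the very start of the file. The other reads xml:lang and lang from the root element. It does not look at any encoding sent in the HTTP Content-Type header, which can also supply the encoding.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Matches an XML declaration at the very start of the document and captures its encoding.
ENCODING_DECL = re.compile(r"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def declared_encoding(source: str) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None if there is none.

    With no declaration (and nothing from HTTP), XML parsers assume UTF-8 or UTF-16.
    """
    match = ENCODING_DECL.match(source)
    return match.group(1) if match else None


def declared_languages(source: str) -> Tuple[Optional[str], Optional[str]]:
    """Return the (xml:lang, lang) values on the root element, None where missing."""
    root = ET.fromstring(source)
    return root.get(XML_LANG), root.get("lang")
```

For a page that starts with <?xml version="1.0" encoding="UTF-8"?> and has <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">, the helpers return "UTF-8" and ("en", "en").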

Modularization of XHTML

"XHTML Modularization defines a collection of abstract modules that can be grouped together and used as the basis for future document type definitions."

A module has two parts: the abstract definition, which is a human-readable definition, and the module implementation, which is the DTD fragment describing the elements, attributes and content model.

The modularization of XHTML breaks down "XHTML 1.0" into sets or modules of related elements and attributes. The deprecated or "transitional" elements and attributes of HTML 4 (font, u, s, strike, basefont, center, dir, isindex, menu) have been removed entirely or relegated to the "legacy" module and the presentational elements (b, big, hr, i, small, sub, sup, tt) are separated from structural elements.
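Before moving a Transitional page to Strict or to XHTML 1.1, it helps to know where it still uses the legacy elements listed above. The Python sketch below is a rough pre-check built on xml.etree.ElementTree. It assumes the page is already well-formed XHTML, and the function name is hypothetical. It only looks at elements. Deprecated attributes such as align or bgcolor would need a separate check.

```python
import xml.etree.ElementTree as ET
from typing import List

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Deprecated HTML 4 elements removed from XHTML modules or relegated to the legacy module.
LEGACY_ELEMENTS = {"font", "u", "s", "strike", "basefont", "center", "dir", "isindex", "menu"}


def find_legacy_elements(source: str) -> List[str]:
    """List legacy element names in document order (hypothetical helper)."""
    root = ET.fromstring(source)
    found = []
    for element in root.iter():  # walks every element, depth-first, in document order
        local_name = element.tag.replace(XHTML_NS, "", 1)  # drop the namespace prefix
        if local_name in LEGACY_ELEMENTS:
            found.append(local_name)
    return found
```

An empty result is a good sign, but the page still needs a validator run against the Strict DTD.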

With XHTML Modularization it is possible to create markup languages that are purely structural, using stylesheets for document style and presentation. Currently, these languages are most appropriate for use in generic XML applications, at least until browsers catch up with the industry and fully support stylesheets.

In order to ensure that XHTML documents will function the same, regardless of the XHTML DTD used, there are a few core modules that are required in any XHTML document based on modularization. These core modules are the:

  1. structure module,
  2. text module,
  3. hypertext module,
  4. and lists module.
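The core-module rule amounts to a simple set check. In this Python sketch, a document type is modeled as a named set of module names, and any combination that lacks one of the four core modules is rejected. DocumentType and build_document_type are hypothetical names for illustration only. Real document types are assembled from DTD fragments in a driver DTD, not from Python sets.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

# The four core modules (the specification names the last one the "List Module").
CORE_MODULES = frozenset({"structure", "text", "hypertext", "list"})


@dataclass(frozen=True)
class DocumentType:
    """Hypothetical record of the modules a document type is built from."""
    name: str
    modules: FrozenSet[str]


def build_document_type(name: str, modules: Iterable[str]) -> DocumentType:
    """Combine modules into a document type, insisting on the core modules."""
    chosen = frozenset(modules)
    missing = CORE_MODULES - chosen
    if missing:
        raise ValueError(f"{name} lacks required core module(s): {', '.join(sorted(missing))}")
    return DocumentType(name, chosen)
```

For example, build_document_type("structural-only", CORE_MODULES | {"tables", "style sheet"}) succeeds. A set containing only structure, text and hypertext raises an error naming the missing list module.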

Other available modules are the:

  1. applet module,
  2. presentation module,
  3. edit module,
  4. bi-directional text module,
  5. basic forms module,
  6. forms module,
  7. basic tables module,
  8. tables module,
  9. image module,
  10. client-side image map module,
  11. server-side image map module,
  12. object module,
  13. frames module,
  14. target module,
  15. iframe module,
  16. intrinsic events module,
  17. metainformation module,
  18. scripting module,
  19. style sheet module,
  20. style attribute module,
  21. link module,
  22. base module,
  23. name identification module,
  24. and legacy module.

Content Model

The content model defines which elements or data are allowed within an element. For example, the first element in the abstract Structure module is the body element. Its minimal content model is defined as (Heading | Block | List)*. This means that body can contain elements from the Heading content set (h1 | h2 | h3 | h4 | h5 | h6), block-level elements from the Block content set (address | blockquote | div | p | pre), and lists from the List content set (dl | ol | ul). The list items dt, dd and li sit inside those lists rather than directly inside body. The * means these elements may appear zero or more times, in any order, within body.
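To make (Heading | Block | List)* concrete, the Python sketch below checks the direct children of body against exactly those three minimal content sets. Because the model lists only elements, bare text directly inside body is also reported. The function name is hypothetical. The check applies only the minimal model, so it would flag elements such as table or form, which richer document types like XHTML 1.1 add to body's content model.

```python
import xml.etree.ElementTree as ET
from typing import List

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# The minimal content sets named in body's content model (Heading | Block | List)*.
HEADING = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK = {"address", "blockquote", "div", "p", "pre"}
LIST = {"dl", "ol", "ul"}
BODY_ALLOWED = HEADING | BLOCK | LIST


def body_violations(source: str) -> List[str]:
    """Report children of body that the minimal content model does not allow."""
    root = ET.fromstring(source)
    body = root.find(f"{XHTML_NS}body")  # body is a direct child of html
    if body is None:
        return ["no body element"]
    problems = []
    if body.text and body.text.strip():
        problems.append("bare text directly inside body")
    for child in body:  # "*": any number of allowed children, in any order
        name = child.tag.replace(XHTML_NS, "", 1)
        if name not in BODY_ALLOWED:
            problems.append(f"<{name}> is not allowed directly inside body")
        if child.tail and child.tail.strip():
            problems.append(f"bare text after <{name}>")
    return problems
```

A body holding an h1, a p and a ul passes. A stray li placed directly in body is reported, because li belongs inside ol or ul.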


XHTML 1.1

XHTML 1.1 became a W3C recommendation on May 31, 2001 and is based on the XHTML 1.0 strict DTD with just a couple of changes. You can read about the differences between XHTML 1.1 and XHTML 1.0 strict on the W3C website.

The W3C describes XHTML 1.1 as a "forward-looking markup language built using modules defined in XHTML Modularization. The purpose of XHTML 1.1 is to serve as the basis for future extended XHTML 'family' document types, and to provide a consistent, forward-looking document type cleanly separated from the deprecated, legacy functionality of XHTML 1.0 (and from HTML 4). XHTML 1.1 is essentially a reformulation of XHTML 1.0 Strict using XHTML Modules, plus ruby."

This version of XHTML is considered to be "module-based", which simply means that it is assembled from numerous modules, each implemented as its own DTD fragment. This way, you can create a document type using only the modules (DTD fragments) you want to use AND you can create your own modules, thus "extending" the language.

Because this document type is a reformulation of XHTML 1.0 Strict using XHTML modules, some elements and attributes available in other XHTML document types are NOT available in this one. Examples are frames, the target attribute and the legacy presentational elements such as font and center. XHTML 1.1 is built from these modules:

  1. structure module,
  2. text module,
  3. hypertext module,
  4. list module,
  5. object module,
  6. presentation module,
  7. edit module,
  8. bidirectional text module,
  9. forms module,
  10. table module,
  11. image module,
  12. client-side image map module,
  13. server-side image map module,
  14. intrinsic events module,
  15. metainformation module,
  16. scripting module,
  17. stylesheet module,
  18. style attribute module,
  19. link module,
  20. base module,
  21. and ruby module.
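Comparing the XHTML 1.1 list with the full set of abstract modules in XHTML Modularization shows what the reformulation leaves out. In this Python sketch the module names are normalized to lowercase, so "table" and "tables" become one name. Ruby is counted separately because it comes from the Ruby Annotation specification rather than from Modularization itself. The variable names are illustrative only.

```python
# Abstract modules defined in XHTML Modularization (names normalized to lowercase).
MODULARIZATION_MODULES = {
    "structure", "text", "hypertext", "list", "applet", "presentation", "edit",
    "bidirectional text", "basic forms", "forms", "basic tables", "tables",
    "image", "client-side image map", "server-side image map", "object",
    "frames", "target", "iframe", "intrinsic events", "metainformation",
    "scripting", "style sheet", "style attribute", "link", "base",
    "name identification", "legacy",
}

# The modules XHTML 1.1 is built from, as listed above.
XHTML11_MODULES = {
    "structure", "text", "hypertext", "list", "object", "presentation", "edit",
    "bidirectional text", "forms", "tables", "image", "client-side image map",
    "server-side image map", "intrinsic events", "metainformation", "scripting",
    "style sheet", "style attribute", "link", "base", "ruby",
}

# Modules available in Modularization but not used by XHTML 1.1.
left_out = sorted(MODULARIZATION_MODULES - XHTML11_MODULES)

# Modules XHTML 1.1 adds from outside Modularization.
added = XHTML11_MODULES - MODULARIZATION_MODULES
```

The left_out list contains applet, basic forms, basic tables, frames, iframe, legacy, name identification and target. The basic forms and basic tables modules drop out because XHTML 1.1 uses the full forms and tables modules instead. The only addition is ruby.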

The W3C has a great page that lists the elements in each module and these elements link to the definitive abstract definitions found in the W3C's "Modularization of XHTML".

XHTML 2.0

At the time of writing there is no public draft available for XHTML 2.0, but the W3C calls it "a next generation markup language". The W3C site says "The functionality of XHTML 2.0 is expected to remain similar to (or a superset of) that of XHTML 1.1, however, the markup language may be altered semantically and syntactically to conform to the requirements of related XML standards such as XML Linking and XML Schema."

"The objective of these changes is to ensure that XHTML 2.0 can be readily supported by XML browsers that have no arcane knowledge of XHTML semantics such as linking, image maps, forms, etc. The development of XHTML 2.0 will likely require the development of new XHTML modules or revisions to existing XHTML modules."

XHTML Basic

This document type was developed for user agents that do not support the full set of XHTML modules, such as mobile phones, PDAs, pagers and set-top boxes.

XHTML Basic uses the core modules from XHTML Modularization along with the image, basic forms, basic tables, object, metainformation, link and base modules. XHTML Basic can be extended with additional modules from XHTML Modularization or with modules you write yourself.

The XHTML Basic specification became a W3C Recommendation on December 19, 2000.

In Summary:

XHTML is the "next generation of HTML" that combines the best of HTML with the best of XML and allows designers to leave behind the constraints of HTML 4 and move into the future of XML.

An XHTML 1.0 document validates as XHTML 1.0 only against one of the three XHTML 1.0 DTDs. A common misconception is that you can extend XHTML 1.0 by writing your own DTD, but if you do, the result is no longer XHTML 1.0. With XHTML 1.1, you can create your own elements and attributes and mix them into the master driver DTD along with any of the W3C's XHTML modules that you care to support.

The W3C lists some of the expected benefits of XHTML modularization as: "reduced authoring costs, an improved match to database & workflow applications, a modular solution to the increasingly disparate capabilities of browsers, and the ability to cleanly integrate HTML with other XML applications."

In other words, XHTML Modularization will allow a variety of devices and applications to access the same information and will allow different XML languages to be combined in one document.
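The idea of combining XML languages in one document rests on XML namespaces, which keep each vocabulary's elements distinct. The Python sketch below embeds a small SVG drawing in an XHTML page and counts the elements that belong to each namespace. SVG is simply an example of "another XML language" chosen here, and the helper names are hypothetical. The sketch does not validate the combined document. Validation would need a driver DTD that includes a module for the extra language.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"  # example second language, chosen for illustration

MIXED_DOCUMENT = f"""<html xmlns="{XHTML_NS}">
  <head><title>Mixing XML languages</title></head>
  <body>
    <p>A circle drawn with SVG markup:</p>
    <svg xmlns="{SVG_NS}" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>"""


def namespace_of(tag: str) -> str:
    """Return the namespace URI from an ElementTree tag such as '{uri}name'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def count_elements_by_namespace(source: str) -> Counter:
    """Count how many elements of a document belong to each namespace."""
    root = ET.fromstring(source)
    return Counter(namespace_of(element.tag) for element in root.iter())
```

For MIXED_DOCUMENT, the count shows five XHTML elements (html, head, title, body, p) and two SVG elements (svg, circle). A parser therefore keeps the two languages apart even though they share one file.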


