The Joy of Strict XHTML

I’ve recently discovered something else the Mozilla Firefox browser can do that Microsoft’s Internet Explorer can’t: Firefox can accept documents served with the “application/xhtml+xml” content type.

“Who gives a shit?” you might be thinking to yourself. Wait, I’ll explain. This might actually change your life someday.

For years, people have been writing web pages using the dated and somewhat arbitrary HTML 4 specification. If you don’t know what HTML looks like, take a look at the source code on any web page (by going to the “View” menu and selecting “Page Source” in Firefox or “View Source” in IE).

The problem is that during the web browser wars of the ’90s, Microsoft and Netscape both decided that they wanted their browsers to be as forgiving as possible. You could be a sloppy or amateur coder, make all kinds of errors in your HTML, and the browser would silently compensate for you. For instance, the proper way to create a bulleted list is by using this code:

<ul>
<li>apples</li>
<li>oranges</li>
<li>bananas</li>
</ul>

But you could just as easily get away with typing this instead:

<UL>
<Li>apples
<li>oranges<lI>
<li>bananas
</ul></Ul>

Now this sort of permissiveness worked really well when the Web was new and getting people to buy the entire concept was the name of the game. You didn’t need to be a programming geek to get your chicken tortilla soup recipe before the masses; all you needed was a half-hour tutorial in HTML and you were on your way.

But we’ve entered a new phase in the development of the Internet. Web 2.0 has arrived, to use the popular catchphrase. And though you’ll hear a lot about how social networking and sharing apps are what Web 2.0 is all about, the truth is that Web 2.0 is about machines talking to machines. I write this blog entry in WordPress blogging software, which talks to the MySQL database holding all the information; talks to Ping-o-Matic and tells it to alert various search engines; and talks to your feed-reading client and tells it that I’ve written a new entry. Ta-da! Machines talking to machines.
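To give you a taste of what that machine-to-machine chatter actually looks like, here’s a simplified sketch of the kind of RSS item a blog’s feed hands to your feed-reading client (the exact tags vary by feed format, and the URL is just a placeholder):

<item>
  <title>The Joy of Strict XHTML</title>
  <link>http://example.com/joy-of-strict-xhtml</link>
  <description>Firefox can accept documents served as application/xhtml+xml.</description>
</item>

Notice that it’s just XML: rigid, predictable, and trivial for a program to parse. That’s the whole point.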

So what’s the problem with sloppy HTML? Machines can’t understand it. Which means that Google (to take one example) has to use elaborate parsing algorithms in order to turn your website into something it can understand. And while it’s all very well and good for a mega-corporation like Google to build these fuzzy linguistic interpreters, the next guy who wants to market a simple web service in his basement doesn’t have that kind of luxury.

It also means that different browsers interpret your web pages differently, and therefore display things differently. If you use Internet Explorer and it encounters a page without a proper document type declaration, the browser drops into “Quirks mode” (I swear I’m not making that up) and tries to guess what the hell you’re trying to do. Many web programmers simply code for whatever looks good in Internet Explorer 6 on Windows, “wrong” or “broken” as it may be, and to hell with all the Safari, Firefox, Mozilla, Netscape, Opera, Konqueror, Lynx and Flock users.

Enter XHTML.

XHTML is basically the HTML language, cleaned up. It’s HTML after six weeks of boot camp under a hard-ass drill sergeant. You have strict rules, and those rules must be obeyed. Take the bulleted list code above. In proper XHTML, you cannot capitalize any of the tags. You must close each tag, so that for every opening <ul> there is a closing </ul>. And you can’t leave an <li> tag floating around on its own outside of a <ul>.
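To make that concrete, here’s roughly the smallest complete page that passes XHTML 1.0 Strict muster, with our bulleted list tucked inside (the title and content are mine; the skeleton is what the spec demands):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
  <title>Fruit</title>
</head>
<body>
  <ul>
    <li>apples</li>
    <li>oranges</li>
    <li>bananas</li>
  </ul>
</body>
</html>

Every tag is lowercase, every tag is closed, and everything lives where it’s supposed to.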

You can debate the merits of sending unruly teenagers to military school all you want, but for web pages there’s no debate. Strictly followed XHTML makes things easier for the machines that read your code. If we all followed the rules to the letter, Google would have a much easier time categorizing websites and it would save us all a lot of time.

Now here’s the problem: most web browsers process XHTML like normal HTML. They applaud your good manners and give you a gold star for coding correctly, but they’ll still silently compensate when you make a mistake, just like they always have.

Until Mozilla Firefox. All you need to do to turn Firefox into an A-1 hard-ass drill sergeant is to (1) assign your web page the XHTML Strict DOCTYPE and (2) have your web server send the page to the browser as application/xhtml+xml instead of text/html. (It’s basically just two lines of code; there’s a sketch below.) Once you do this, Firefox will stop the display of your website cold if you’ve made any coding errors. Missed a closing </p> tag? Forgotten to escape an ampersand? Accidentally capitalized a tag? Tough shit. Your page does not display, and you see a yellow XML Parsing Error message instead.
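For the curious, here’s roughly what those two lines look like. The first is the Strict DOCTYPE, which goes at the very top of your page:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

The second tells the server to send the right content type. On Apache, one way to do it is a single directive in your configuration, assuming your pages end in .xhtml (other servers and setups have their own equivalents):

AddType application/xhtml+xml .xhtml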

(It’s important to understand that Firefox will only do this on a page-by-page basis, when that page and web server tell it to. If you don’t give Firefox these instructions, it will “fail gracefully” and display sloppily coded pages just like any other browser.)

This just might change the world.

Here’s how it might work: (1) Web programmers start migrating towards Strict XHTML. (2) Web services begin to handle properly coded websites better than sloppily coded ones. (3) Web programmers flock to Strict XHTML in droves so their sites aren’t penalized. (4) The creators of these web services eventually decide that incorrectly coded pages are too much of a hassle and stop processing them altogether. (5) The overhead for creating a useful web service drops drastically. (6) Useful web services multiply exponentially. (7) You can search Google for “Northern Virginia Mexican restaurants,” and Google will no longer suggest that the “Pamela Andersen Britney Spears Katie Holmes Nude Sex Tits!!!” website might be the one you’re looking for.

And the world will be a better place.