Serving XHTML with the correct mime type using PHP

A lot of websites nowadays make sure that they validate to XHTML 1.0 or 1.1 standards. At least, they think they do. The sad fact is that, unless the correct mime type is used, no browser will actually process your carefully put together XHTML as XHTML. Instead, it’s treated as tag soup, just as if you’d used any other doctype to identify your page.

Of course, for most websites there’s no real reason to use XHTML in preference to HTML 4.01 - most people (myself included) really just use it to “look cool, and be uber-geeky”. Unless you need the extensible nature of XHTML there’s no real overwhelming reason to use it.

However, if you’re going to say you’re using XHTML, then please make sure you use it properly. This is where most people (and, to be fair, browsers) fall down. Until a couple of months ago, it had never even crossed my mind that I might want to send out a different mime type with my XHTML documents.

The aim of this article is to show you how to use PHP to control the mime type given to your users’ web browsers for your XHTML pages. There are other ways to perform the same task, with httpd.conf or .htaccess files, but this is how I control mime types on this website.

What is the correct mime type for XHTML?

According to the W3C’s note on XHTML media types:

  • All XHTML variants should be sent out with the “application/xhtml+xml” mime type.
  • HTML 4 should be sent with the “text/html” mime type and must not be sent with the “application/xhtml+xml” mime type.
  • HTML compatible XHTML 1.0 may be sent with the “text/html” mime type (note that this applies to 1.0 only, not 1.1 or anything which comes after).

HTTP_ACCEPT

So, it would seem that the thing to do would be to simply change the mime type whenever you write an XHTML page, wouldn’t it? If only life were that simple. Unfortunately, the world’s most dominant browser, Internet Explorer, does not understand the “application/xhtml+xml” mime type, so we’ll need to write a little bit of code to cope. Thankfully, there’s a nice easy way to do this, since browsers are meant to tell web servers which mime types they know about via the “Accept” header which they send with every request, and which PHP makes available as $_SERVER["HTTP_ACCEPT"]. So, if a browser says that it can understand “application/xhtml+xml” then we can be sure that it will cope, whereas if it doesn’t mention it, then we should play safe and send the page as “text/html” instead.

Here’s how we do that bit in PHP:

if(isset($_SERVER["HTTP_ACCEPT"]) &&
   stristr($_SERVER["HTTP_ACCEPT"],"application/xhtml+xml")) { }
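
To make that check a little more concrete, here is a small sketch which wraps it in a function so that it can be tried against a couple of header values. The function name accepts_xhtml and both sample strings are my own illustrations (they are modelled on the sort of thing browsers send, not copied from any particular one), and none of this is part of the final script.

```php
<?php
# Hypothetical helper (not part of the final script): returns true if the
# given Accept header value mentions "application/xhtml+xml" anywhere.
function accepts_xhtml($accept) {
   # stristr() returns false when the needle is not found
   return stristr($accept, "application/xhtml+xml") !== false;
}

# Illustrative header values only
$gecko_accept = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,*/*;q=0.5";
$ie_accept = "image/gif, image/jpeg, */*";

var_dump(accepts_xhtml($gecko_accept));   # bool(true)
var_dump(accepts_xhtml($ie_accept));      # bool(false)
```

In a real page you would pass in $_SERVER["HTTP_ACCEPT"], falling back to an empty string when the browser sent no such header.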

Q Values

Now, we need to check the “Q values” which the browser gives. A “Q value” is a number between 0 and 1 which the browser can attach to each mime type it says it can deal with - “1” means the browser really likes that type, “0” means it won’t accept it at all, and a type listed with no Q value counts as “1”. So, even if the browser says that it can deal with “application/xhtml+xml”, if it says it prefers “text/html” then that is how we should send the document. This is simply done with a couple of regexes.

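$mime = "text/html";
if(preg_match("/application\/xhtml\+xml\s*;\s*q=([01](?:\.\d+)?)/i",
              $_SERVER["HTTP_ACCEPT"], $matches)) {
   $xhtml_q = (float) $matches[1];
   # if no Q value is found for "text/html", assume the default of 1
   $html_q = 1.0;
   if(preg_match("/text\/html\s*;\s*q=([01](?:\.\d+)?)/i",
                 $_SERVER["HTTP_ACCEPT"], $matches)) {
      $html_q = (float) $matches[1];
   }
   # a Q value of 0 means "do not send me this type"
   if($xhtml_q > 0 && $xhtml_q >= $html_q) {
      $mime = "application/xhtml+xml";
   }
} else {
   $mime = "application/xhtml+xml";
}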

We first check whether there is a Q value for the “application/xhtml+xml” mime type, and if there is, we compare it to the Q value for the “text/html” mime type. If the Q value for “application/xhtml+xml” is greater than or equal to the one for “text/html”, or “application/xhtml+xml” has no Q value given, then the mime type to be used will be “application/xhtml+xml”.
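
If you would rather see what Q value a browser gives to one particular mime type, something along the following lines may help while debugging. It is only a sketch: the q_value function and the sample header are my own inventions, it splits the header on commas instead of using the regexes above, and it makes no attempt to handle wildcards such as “*/*”.

```php
<?php
# Hypothetical helper: returns the Q value the Accept header gives to $type,
# 1.0 if the type is listed without one, or null if it is not listed at all.
function q_value($accept, $type) {
   foreach (explode(",", $accept) as $entry) {
      # each entry looks like "type/subtype" or "type/subtype;q=0.9"
      $parts = explode(";", $entry);
      if (strcasecmp(trim($parts[0]), $type) != 0) {
         continue;
      }
      # look through any parameters for one of the form q=...
      foreach (array_slice($parts, 1) as $param) {
         if (preg_match('/^\s*q\s*=\s*([01](?:\.\d+)?)\s*$/i', $param, $m)) {
            return (float) $m[1];
         }
      }
      return 1.0;
   }
   return null;
}

# Illustrative header value only
$accept = "text/html;q=0.9, application/xhtml+xml;q=0.8, */*;q=0.1";
var_dump(q_value($accept, "text/html"));              # float(0.9)
var_dump(q_value($accept, "application/xhtml+xml"));  # float(0.8)
var_dump(q_value($accept, "image/png"));              # NULL
```

With this sample header the browser prefers “text/html”, so the page should be sent as HTML.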

A Special Case for the W3C

What we have so far will deal with all “real” browsers. However, there is one final case which we have to be aware of - the W3C web page validation service. Unfortunately, when their HTML validator requests a webpage, it does not say that it can handle “application/xhtml+xml”. This is a bit of a problem, since if you’re going to the trouble of writing XHTML, you’re definitely going to want to be making sure that it validates! The simple special case which we have to write is as follows:

if (stristr($_SERVER["HTTP_USER_AGENT"],"W3C_Validator")) {
   $mime = "application/xhtml+xml";
}

This ensures that the W3C will always validate the XHTML version of your webpage.

Serving the Headers and writing the Prolog

Now that we know that we’ve chosen a mime type appropriate for the web browser that’s being used, we can serve the headers for the webpage. The headers are what tell the web browser how a file should be interpreted, and can include things such as the mime type, status codes, and server information. We will be changing the “Content-Type” and “Vary” headers. PHP gives us a nice simple method for sending headers - the header function.

header("Content-Type: $mime;charset=$charset");
header("Vary: Accept");

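The “Vary” header is used to indicate that the page sent by the webserver will vary, depending on one or more of the request headers originally sent by the web browser. In our case, the page sent back is determined by the contents of the “Accept” request header (the value PHP exposes as $_SERVER["HTTP_ACCEPT"]), so that is the header we name. This lets any cache sitting between you and your users know not to hand the XHTML version to a browser which only asked for HTML.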
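
If you want to see exactly what will be sent before you send it, one option is to build the header lines first and only then pass them to header. This is just a sketch: build_headers is a made-up name, and the mime type and charset passed in are example values.

```php
<?php
# Hypothetical helper: builds the two header lines without sending them,
# so that they can be inspected (or logged) before calling header().
function build_headers($mime, $charset) {
   return array(
      "Content-Type: $mime;charset=$charset",
      "Vary: Accept",
   );
}

$lines = build_headers("application/xhtml+xml", "iso-8859-1");
foreach ($lines as $line) {
   # header() only works before any page output has been sent
   if (!headers_sent()) {
      header($line);
   }
}
print_r($lines);
```

The headers_sent check is there because calling header after output has started gives a “headers have already been sent” warning.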

Once we’ve sent the correct headers, all that remains for us to do with XHTML compatible web browsers is to write the correct doctype and html tags. We decide which we should use with a simple “if” statement.

if($mime == "application/xhtml+xml") {
   $prolog_type = "<?xml version='1.0' encoding='$charset' ?>
      <!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.1//EN' 
      'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd'>
      <html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en'>";
} else {
   $prolog_type = "<!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01//EN' 
      'http://www.w3.org/TR/html4/strict.dtd'>
      <html lang='en'>";
}
print $prolog_type;

Keeping things nice for the HTML Browsers

There is one catch, though: according to the W3C specifications, things like <br /> are not valid HTML. For the majority of real world browsers this is not an issue, as they will quite happily parse these closed single tags as if they were just normal HTML. However, there are a few implementations (most notably the standard Java routines) which, if you give them closed single tags, will interpret the first half (<br, say) correctly, and then display the /> as text. So, to ensure that everything works as expected for them, if we send the web page out with the “text/html” mime type then we must also change closed single tags into normal single tags.

Again, PHP gives us a nice easy function which does what we want - ob_start. We’ll be using ob_start to pipe output through a function which simply changes any instances of “ />” (note the leading space) to “>”.

function fix_code($buffer) {
   return (str_replace(" />", ">", $buffer));
}

This function is called by adding the line ob_start("fix_code"); to the prolog creation code for the “text/html” mime type. This technique could easily be expanded to do more complicated things.
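
Here is a small standalone sketch of the buffering in action, in case you want to try it outside of a real page. The two print lines are just sample markup of my own.

```php
<?php
function fix_code($buffer) {
   return (str_replace(" />", ">", $buffer));
}

# everything printed after ob_start() is collected in a buffer and passed
# through fix_code() when the buffer is flushed
ob_start("fix_code");
print "<p>line one<br />line two</p>\n";
print "<img src='photo.png' alt='' />\n";
ob_end_flush();
```

This prints the paragraph with a plain <br> and the image tag ending in a plain >. Note that only tags written with a space before the slash are changed, so <br/> would be left alone.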

Putting it all Together

Putting everything together, we get the following script. Just copy and paste, and save in an accessible place on your webserver:

<?php
$charset = "iso-8859-1";
$mime = "text/html";

function fix_code($buffer) {
   return (str_replace(" />", ">", $buffer));
}

# not every client sends these headers, so fall back to empty strings
$accept = isset($_SERVER["HTTP_ACCEPT"]) ? $_SERVER["HTTP_ACCEPT"] : "";
$agent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";

if(stristr($accept,"application/xhtml+xml")) {
   # if there's a Q value for "application/xhtml+xml" then also 
   # retrieve the Q value for "text/html"
   if(preg_match("/application\/xhtml\+xml\s*;\s*q=([01](?:\.\d+)?)/i",
                 $accept, $matches)) {
      $xhtml_q = (float) $matches[1];
      # if no Q value is found for "text/html", assume the default of 1
      $html_q = 1.0;
      if(preg_match("/text\/html\s*;\s*q=([01](?:\.\d+)?)/i",
                    $accept, $matches)) {
         $html_q = (float) $matches[1];
      }
      # if the Q value for XHTML is above zero and greater than or equal
      # to that for HTML then use the "application/xhtml+xml" mimetype
      if($xhtml_q > 0 && $xhtml_q >= $html_q) {
         $mime = "application/xhtml+xml";
      }
   # if there was no Q value, then just use the 
   # "application/xhtml+xml" mimetype
   } else {
      $mime = "application/xhtml+xml";
   }
}

# special check for the W3C_Validator
if (stristr($agent,"W3C_Validator")) {
   $mime = "application/xhtml+xml";
}

# set the prolog_type according to the mime type which was determined
if($mime == "application/xhtml+xml") {
   $prolog_type = "<?xml version='1.0' encoding='$charset' ?>
      <!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.1//EN' 
      'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd'>
      <html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en'>";
} else {
   ob_start("fix_code");
   $prolog_type = "<!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01//EN' 
      'http://www.w3.org/TR/html4/strict.dtd'>
      <html lang='en'>";
}

# finally, output the mime type and prolog type
header("Content-Type: $mime;charset=$charset");
header("Vary: Accept");
print $prolog_type;
?>

And there you have it. All you need to do to use the correct mime type in your PHP created XHTML pages is replace everything up to and including your <html> tag with a simple PHP include. So, your pages will now look like this:

<?php include "/path/to/mimetype.php" ?>
   <head>
      ...
   </head>
   <body>
      ...
   </body>
</html>

And that’s all there is to it.

References

Keystone Websites: Serving up XHTML with the correct mime type - the original script which inspired this article.

Sending XHTML as text/html Considered Harmful - The best text on the subject. It’s been around since 2002, but it’s still being updated regularly.

The Road to XHTML 2.0: MIME Types - Talks about the issues inherent in serving the correct mime type with your documents.


Comments

  1. by Simon Jessey on April 17, 2004 03:18 PM

    Nice reworking of my original article. I like the fact that you have gone into a little more detail about the regular expression than I did. Thank you for crediting me :)

  2. by Neil Crosby on April 17, 2004 04:37 PM

    That’s okay Simon - glad you’re okay with it. :o)

  3. by Jordan Bedwell on August 9, 2004 04:55 AM

    File makes invalid document and makes parsing incorrect, please help me fix this!

    I don’t know what’s going on but this is the error message I get.

     in the document and it causes XML to trip out wildly and it won’t parse my document :(

    I’m using Mozilla Firefox

    Server Environment (Dev Server)

    PHP5 Apache 2

  4. by Neil Crosby on August 9, 2004 10:41 AM

    The only thing that I can think of is that you have some white space in your webpage before you call mimetype.php, but that should only result in a “headers have already been sent” error.

    If you could email me the php file which is calling mimetype.php and your copy of mimetype.php, then I’ll have a look and see if I can work out what’s going wrong.

  5. by Jordan Bedwell on August 9, 2004 06:04 PM

    It was fixed :). It was an error in the document itself, well, more like the encoding.

    For those of you using PHP and UTF formatting, beware. PHP doesn’t support including your BOM signature as far as I know. If you get strange characters before your doctype header, do the following in DW MX 2K4 to resolve the issue.

    Open your .php .html .phtml .php3 or other PHP document and do the following.

    1.) Make sure you are using a Unicode Format BOM Signature in your document (UTF-8 and up)

    2.) Go to Modify > Page Properties > Title/Encoding

    3.) Look for your page’s encoding, and below that is Include BOM Signature

      • If checked) Uncheck it and save the document; this should resolve the issue in Mozilla and IE
      • If it isn’t checked) Check it, save it, close it, open it, then uncheck it and save it again; this should resolve the issue (this is if you saved the file in Notepad or another type of editing program)

    NOTE: It has been a myth among programmers who use Notepad that it’s all DW’s fault, when in fact it’s not. They have said that DW does not transfer PHP files in the right mode; this is incorrect. DW has full support for PHP 4 and does recognize it as a data file that does not need to be transferred via binary. So please, before you blame one of the world’s most popular editors for Mac and Windows, do some research. Blaming is not the way of life, it’s all about seeing if that’s really the problem ;).

    This will also be posted on http://www.jordonbedwell.com in my blog so that you can get the full instructions with pictures.

  6. by Alan on November 11, 2004 04:26 PM

    I’ve now created a variant that sends IE (and other browsers that don’t support application/xhtml+xml) XHTML 1.0 instead.

    It can be downloaded from:

    http://www.college.gameplan.org.uk/wsg/mimetype.txt

  7. by Al on May 12, 2005 12:49 AM

    Hi,

    When I use this script, it works btw, Firefox 1.0.3 reports the document as xhtml+xml but all javascript stops working ??? Any ideas ???

  8. by Neil Crosby on May 12, 2005 01:29 AM

    Without seeing your code, I’d guess that you’re probably using document.write . If you are, then you should be using the DOM Core methods to output your code instead. If not, what exactly isn’t working with your javascript?

  9. by Al on May 13, 2005 08:55 PM

    Doh! The only parts of my Javascript which appear to be broken are the document.write()’s. Forgot they were removed from the XHTML spec. Guess I should have written the code before Midnight ;) Thanks for your help.

  10. by Neil Crosby on May 14, 2005 01:20 AM

    No problem - I should really have mentioned this in the main body of the article. Thanks for bringing it up!

  11. by Anonymous on May 26, 2005 03:19 PM

    I have a similar problem, but it’s with my Flickr badge. You can read the full explanation of the problem on my beta page: http://zepfanman.com/beta

  12. by Anonymous on June 3, 2005 07:25 AM

    If you go to my beta page now, you can see the solution to the Flickr badge problem I was having.

  13. by Anonymous on September 26, 2005 11:49 AM

    I have a problem with displaying an image in a tag like this:

    <img src="bild.php?id=7" />

    The image is in a database, so I only can show it through a php-script. When I use the mime-type “application/xhtml+xml” the image is not displayed in Firefox. When I use the mime-type “text/html” (but still Doctype XHTML1.1) it is displayed. What is the problem with this, and how can I solve it?

    Thanks in Advance

  14. by Neil Crosby on September 26, 2005 04:29 PM

    Can you email me at neil.crosby at this domain so that we can try and puzzle this out together please?

  15. by Anonymous on September 28, 2005 04:56 PM

    I am sorry. I thank you very much for your suggested help and time. I tried to make a very short example with only the image-tag in it. And what should I say: It works… So I first have to figure out where the problem is in my original script. Thank you anyway. And if I still have problems, I will email you.

  16. by grimboy on December 2, 2005 11:53 PM

    Wow, that’s going to put a massive load on the servers for next to no reason.

  17. by Scott Kimler on February 20, 2006 09:55 PM

    Neil,

    I was using your script, happily, until my host made some PHP configuration changes and then it started tossing out some index errors. I’ve since revisited the issue and can recommend a couple things …

    1. Test to see if the HTTP_ACCEPT is set.

      if((isset($_SERVER["HTTP_ACCEPT"])) && (stristr($_SERVER["HTTP_ACCEPT"],"application/xhtml+xml")))

    2. If you’re going to the trouble of serving XHTMLv1.1, then serve XHTMLv1.0(Strict), with a MIME type of text/html, instead of HTML4.01(Strict). It’s perfectly kosher and a better fall-back to XHTMLv1.1 (imo).

    Thanks for the article (I especially like the test for the validator, as it allows me to validate against the correct standard, despite IE’s ignorance to the application/xhtml+xml MIME type) ;-)

    Cheers,

    -stk

  18. by Scott on February 20, 2006 10:02 PM

    PS … I forgot to mention that by sending the XHTMLv1.0(Strict) to IE, instead of HTMLv4.01(Strict), you also obviate the need to replace the self-closing tag (i.e., “/>”)

    This means one doesn’t need the ob_start and fix_code() function.

    -stk

  19. by Dan Jacobson on February 24, 2006 11:01 PM

    Such a complex solution. One day something goes wrong and… no pages served at all!

    Compare Google, they don’t even send DOCTYPES.

  20. by Vix on March 12, 2006 10:27 AM

    indeed a good stuff!

  21. by Odgitfa on May 19, 2006 03:32 AM

    So what are the rights on this wonderful piece of code? May I use it in an open source project I’m writing?

  22. by Neil Crosby on May 19, 2006 07:42 AM

    Odgitfa: Use it wherever you like. All I ask is that you say where you got it from wherever you do use it.

  23. by F. Hovey on November 21, 2006 04:16 AM

    Your code will fail to properly handle some browsers, e.g., Opera 9.02. For example, Opera sends:

    “Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1”

    You can see it prefers html over app/xhtml+xml.

    To fix this, you need to change the reg expression in both if statements. The following will work and will catch app/xhtml+xml no matter where it is in the string and also catches cases where the semicolon and q are sometimes separated by a space “; q=0.1”.

      <?php
      $mime = "text/html";
      $charset = "utf-8";
      # special check for the W3C_Validator
      if ((isset($_SERVER["HTTP_USER_AGENT"])) && (stristr($_SERVER["HTTP_USER_AGENT"],"W3C_Validator"))) {
         $mime = "application/xhtml+xml";
      } elseif ((isset($_SERVER["HTTP_ACCEPT"])) && (stristr($_SERVER["HTTP_ACCEPT"],"application/xhtml+xml")))  {
           if(preg_match("/application\/xhtml\+xml[a-z0-9,\s\/\*\-\+]*;[\s]?q=([0-1]{0,1}\.\d{0,4})/i",$_SERVER["HTTP_ACCEPT"],$matches)) {
              $xhtml_q = $matches[1];
              if(preg_match("/text\/html[a-z0-9,\s\/\*\-\+]*;[\s]?q=([0-1]{0,1}\.\d{0,4})/i",$_SERVER["HTTP_ACCEPT"],$matches)) {
                 $html_q = $matches[1];
                 if((float)$xhtml_q >= (float)$html_q)
                    $mime = "application/xhtml+xml";
              }
           }
           else
              $mime = "application/xhtml+xml"; 
      }
      if ($mime=="application/xhtml+xml") header('Content-Type: '.$mime."; charset=$charset");
      ?>
    


about wwm

workingwith.me.uk is a resource for web developers created by Neil Crosby, a web developer who lives and works in London, England. More about the site.

Neil Crosby now blogs at The Code Train and also runs NeilCrosby.com, The Ten Word Review and Everything is Rubbish.