Arguing Semantics: HTML

Continuing on, sort of, from my previous post about semantics, I have come to a topic I deal with a lot: semantic HTML. This is not a new idea by any means, but it is something that many, many people seem not to care about at all. Even people who claim to care about Search Engine Optimisation (SEO), which semantic code improves considerably, often end up making very non-semantic pages.

The first step is, of course, to separate content and presentation as much as possible. Anything that governs how your page looks, and how it should be presented to various user agents (e.g. web browsers), belongs in CSS. Splitting the presentation from the actual content (your HTML) in this way usually improves the semantic meaning of the content, along with its logical structure, because meaning and structure are all you need to concern yourself with when writing the markup. Tables should never, ever be used for the layout of content!
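As a rough sketch of what that separation looks like, here is a two-column page where the columns come from CSS rather than from a layout table. The ids "wrapper", "content" and "sidebar" are made-up names for illustration, and flexbox is only one of several CSS techniques that would do the job.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Layout without tables</title>
  <style>
    /* All of the presentation lives here, not in the markup */
    #wrapper { display: flex; gap: 1em; }
    #content { flex: 3; } /* main column takes roughly three quarters */
    #sidebar { flex: 1; } /* sidebar takes roughly one quarter */
  </style>
</head>
<body>
  <!-- The markup only describes structure: a main column and a sidebar -->
  <div id="wrapper">
    <div id="content">
      <h1>Page heading</h1>
      <p>The main article text goes here.</p>
    </div>
    <div id="sidebar">
      <h2>Related links</h2>
      <ul>
        <li><a href="#">A related post</a></li>
      </ul>
    </div>
  </div>
</body>
</html>
```

Swapping the columns around, or stacking them on a small screen, then only means changing the CSS; the HTML keeps saying the same thing.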

Next we have the correct use of HTML tags. I think the post The Definitive Guide to Semantic Web Markup for Blogs gives a great introduction to this concept and shows how it's useful for SEO purposes. Basically, the idea is to use the right tag for what you're trying to convey. Writing the main heading for the page? That's an <h1> tag [level 1 heading]. Creating a list of items in no particular order? <ul> [unordered list]. Oh, those are meant to convey a logical ranking (an order)? <ol> [ordered list]. Want to emphasise some text? <em> (not <i>!)
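A small sketch of those choices side by side; the list contents are placeholders, and the comments say what each tag is meant to convey.

```html
<!-- The main heading for the page -->
<h1>My Favourite Books</h1>

<!-- Items in no particular order -->
<ul>
  <li>A book about HTML</li>
  <li>A book about CSS</li>
</ul>

<!-- Items whose order matters (a ranking) -->
<ol>
  <li>Best book</li>
  <li>Second-best book</li>
</ol>

<!-- Text that should be emphasised -->
<p>You <em>really</em> should read the first one.</p>
```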

This last point needs a bit of explanation. First of all, the italic tag shouldn't be used because it only conveys a change in how the font is rendered, not any inherent meaning of the text itself. Furthermore, in languages other than English, a different font rendering may be used to convey the same meaning, and we all want to internationalise our HTML, right? A similar argument applies to the difference between the <b> (bold) and <strong> tags.
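To make the contrast concrete, here is the same pair of sentences marked up both ways. A browser may well render each pair identically by default, but only the first pair says what the formatting means; the sentences themselves are placeholders.

```html
<!-- Semantic: emphasis and strong importance, rendering left to the browser/CSS -->
<p>Please <em>do not</em> use tables for layout.</p>
<p><strong>Warning:</strong> back up your files first.</p>

<!-- Presentational: only says "italic" and "bold", nothing about meaning -->
<p>Please <i>do not</i> use tables for layout.</p>
<p><b>Warning:</b> back up your files first.</p>
```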

Interestingly, the JavaScript editor I'm using right now in WordPress inserts an <em> tag when I press the "i" button. Odd as that is, I don't mind, because that's usually what I want. It is not always what I want, though, as the two tags are not semantically the same. Sometimes I'd like to use italics to set off a piece of text in a way that doesn't imply emphasis. The only example I can really think of right now is bibliographies. Depending on the format you're using, various parts (such as a book's title) will be italicised. This really isn't intended to emphasise that information; it is simply a visual convention. Whilst using a <span> for this and styling it later with CSS might be more appropriate, I think it's too much overhead, so I just go with a simple <i> tag and leave it at that. A better way might be to do something like this:

<span class="author">Smith, John</span>, <span class="title">Amazing Book</span>. <span class="publisher">Smith Inc.</span>.

And so on. Again, I think this is too much overhead, but if you were using a content management system or something else to generate the bibliography for you automatically, it might be OK.
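For completeness, the "style it later with CSS" half of that approach might look something like the sketch below. The class names come from the bibliography markup above; only the title gets a rule, and the author and publisher classes are left as unstyled hooks.

```html
<!-- In the <head>: the presentation rule for titles -->
<style>
  .title { font-style: italic; }
</style>

<!-- In the <body>: the bibliography entry, wrapped in a paragraph -->
<p>
  <span class="author">Smith, John</span>,
  <span class="title">Amazing Book</span>.
  <span class="publisher">Smith Inc.</span>
</p>
```

If a different citation format later wants titles underlined or in bold, only the CSS rule changes.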

One thing that HTML "standardistas" often fall victim to is "div-itis". <div> tags are all the rage right now. Together with <span> tags, they are seeing heavy use because they carry almost no semantic meaning, which makes them convenient hooks for CSS styling. However, they do still carry some meaning. A <div> implies a block of content; indeed, web browsers render it as display: block by default. It is usually not correct to wrap "naked text" in one. In the example above I used <span> because that implies a span of text. <div> should be used for structuring a page, whereas <span> should be used for marking up text. Note that placing text directly inside a <div> is still valid HTML, HTML5 included; the problem is one of meaning rather than validity.
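Here is a rough illustration of the difference. The ids and class names (header, post, author, text) are invented for the example. The second half is the div-itis version to avoid: a <div> wraps naked text, and another <div> is forced around an inline run of text, which browsers will render on a line of its own because of that default display: block.

```html
<!-- Good: divs group sections of the page, a span marks up a run of text -->
<div id="header">
  <h1>My Blog</h1>
</div>
<div id="post">
  <p>Posted by <span class="author">John Smith</span> on Monday.</p>
</div>

<!-- Div-itis (avoid): naked text in a div, and a div around an inline run of text -->
<div class="post">
  <div class="text">Posted by <div class="author">John Smith</div> on Monday.</div>
</div>
```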

I mentioned not using tables for layout before. Tables are still perfectly valid when you have tabular data. When used correctly I think tables are great, but HTML authors should keep in mind that there are a few more tags to go along with tables than just <table>, <tr> (table row) and <td> (table data, i.e. a cell). If you are using headings, they should be placed like this:

<thead><tr><th>Column 1 Heading</th><th>Column 2 Heading</th></tr></thead>

Note the use of <th> rather than <td>. Following that with a <tbody> tag semantically separates the table body from the header. I used this structure for my post CPU Shootout: Intel vs. AMD, so you can see what it looks like when styled nicely. For completeness, there's also a <tfoot> tag for table footers, and you can view the other table tags on the W3C's site.
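Putting those pieces together, a complete table might look something like this sketch; the cell contents are placeholders. It places <tfoot> after <tbody>, which matches the current HTML specification. HTML 4.01 required <tfoot> to come before <tbody>, so it is worth checking which one you are targeting.

```html
<table>
  <!-- Header row: <th> cells inside a <tr> inside <thead> -->
  <thead>
    <tr>
      <th>Column 1 Heading</th>
      <th>Column 2 Heading</th>
    </tr>
  </thead>
  <!-- The actual data -->
  <tbody>
    <tr>
      <td>Row 1, cell 1</td>
      <td>Row 1, cell 2</td>
    </tr>
    <tr>
      <td>Row 2, cell 1</td>
      <td>Row 2, cell 2</td>
    </tr>
  </tbody>
  <!-- Footer row, e.g. for totals or notes -->
  <tfoot>
    <tr>
      <td>Footer, cell 1</td>
      <td>Footer, cell 2</td>
    </tr>
  </tfoot>
</table>
```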

I'm sure there's a wealth of other topics to talk about here, but this should be enough to get you started. I still fall victim to non-semantic markup myself, but I try to learn how to make my work better whenever I can. This site should be a testament to how my knowledge and understanding of HTML continue to evolve!