Sunday, February 26, 2006

Site Design Update

I updated the design of my web site recently. The update was long overdue. In mid-January 2006 I created this blog on Blogger.com, and I chose a Theme they provide named Sand Dollar. I really liked the look of that Theme, and so I decided to use that as the basis for my whole site.

Now that the update is done—for now—I decided some of you might be interested in what I did, and why.

I changed the look of the pages, and I changed the structure. More importantly, though, I separated the presentation details from the content. Many people call that process "separating style and content," and while that sounds fairly accurate to me, some experts disagree. I'll leave that discussion for another day, or maybe never! Anyway, my stylesheet(s) now control almost everything about the way the pages look. The HTML elements in the content describe what the content is, and not how it looks.

This approach, and the explicit structure of the pages, allows me to use styles to better advantage, and the HTML content is simpler than it was before.

Page Layout

I chose a relatively simple page layout because I wanted my pages to work on Internet Explorer 5 and 6. Those browsers have a lot of CSS-related bugs and a complex layout would mean dancing around those bugs one way or another. Many page authors use CSS hacks to get around the IE bugs, but I wanted to avoid that if possible. I don't have the time or the equipment to test my pages on a wide variety of browsers and platforms, and keeping the page layout simple reduces the amount of testing necessary.

All the pages use the same basic HTML shell. The shell includes containers for the page header, the main content, and a sidebar, roughly as shown below:

header
main content
sidebar

Using those containers, and CSS selectors, I can control the style of HTML elements within a particular section. For example, lists in the sidebar are styled differently than lists that appear in the main content.
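Something along these lines would do it. This is only a sketch: the ids main and sidebar and the property values are illustrative assumptions, not copied from the site's stylesheet.

```css
/* Hypothetical container ids: #main and #sidebar are assumed names. */

/* Lists in the main content keep their bullets and a left indent. */
#main ul {
  margin: 0 0 1em 2em;
  list-style-type: disc;
}

/* Lists in the sidebar drop the bullets and the indent. */
#sidebar ul {
  margin: 0 0 1em 0;
  list-style-type: none;
}
```

Because both rules use a descendant selector, the same <UL> markup picks up a different look depending on which container it sits in.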

The site structure, and the associated CSS stylesheet, should make my site more convenient for visitors. For example, if you print a page, the sidebar is omitted, and the main content will expand to fill the printable area of the paper.
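One way to get that print behavior is a print-only block in the stylesheet. The rules below are a sketch rather than the site's actual rules, and the ids main and sidebar are assumed names.

```css
/* Hypothetical ids; these rules apply only when the page is printed. */
@media print {
  /* Leave the sidebar off the printed page. */
  #sidebar {
    display: none;
  }

  /* Let the main content use the full printable width. */
  #main {
    float: none;
    width: auto;
    max-width: none;
    margin: 0;
  }
}
```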

Simpler HTML

If you look at the page source, you won't see a lot of HTML clutter, such as <BR>, <FONT> and <SPAN> elements everywhere. (I'm talking about pages over on my site, not here in this blog. The blog HTML is mostly controlled by Blogger.com.) On most pages, the main content has a section heading or two using <Hn> elements, and paragraphs using <P> elements, and that's about it. Other pages use ordered and unordered lists (<OL>, <UL>).

One enabler for writing simpler HTML was my decision to just give up trying to make pages look good in old browsers. In old browsers, like Netscape 4, you'll see the content but it will be as plain as vanilla ice cream. Most new browsers have a "disable styles" command, and you can use it to see what I mean. (In Firefox, you can use the View > Page Style > No Style command to disable styles.) The pages are still legible, but all formatting is omitted. I don't want to prevent users of old browsers from using my site, but I am not going to bother trying to make the site look the same for them as it does for people using a newer browser. I'd rather encourage visitors to upgrade their browsers!

HTML 4.01 Strict

As I updated the pages, I did a fair amount of HTML cleanup. I chose to use the HTML 4.01 Strict doctype, so that means my pages don't use <FONT> elements or any other elements or attributes that are valid in HTML 4.01 Transitional but not valid in HTML 4.01 Strict. I also stopped using <B> and <I> elements. I chose HTML 4.01 Strict because I think it's better to separate presentation from content. In addition, using the HTML 4.01 Strict doctype helps ensure consistent results across browsers. The pages won't be formatted exactly the same on all browsers or platforms, but they will look very, very similar.

All the pages that use the new style have been validated with the W3C Markup Validation Service. I use the excellent Web Developer extension for Firefox to make that process quick and easy. While viewing a page in Firefox, I can press Ctrl-Shift-A to validate the current page.

I used to think that giving up the <B> and <I> elements forced an unnecessarily severe separation of presentation and content, but I gradually changed my mind about that. During the past year or so I have been converting or eliminating those elements (and deprecated elements) in my HTML pages, and I discovered that the process encouraged me to make better decisions about what level of emphasis was appropriate for a particular section of text. In some cases, I cleaned up HTML that was just lazy on my part. For example, I had some section headings that were implemented like this:

<P><B>Heading Text Here</B></P>

If the text truly is a section heading, then it should use one of the <Hn> elements:

<H2>Heading Text Here</H2>

Let the Browser Do Its Job

I removed "go to top" links and links that open new pages as I edited the HTML. I think people should use the controls their browsers provide rather than have web pages include links or other widgets that duplicate the browser functions. For example, if you want to go to the top of the page, from anywhere on the page, most browsers have a key combination you can use. In Firefox, you can press the Home key or Ctrl-Home keys on your keyboard. (One moves the focus as it scrolls to the top of the page, and the other doesn't. I forget which because where the focus is doesn't usually matter to me and I don't go to top that often.) Similarly, if you want to open a link in a new tab, or in a new window, you can right-click and select a command to do that from the context menu.

My mouse has a scroll wheel that is also a button, and in Firefox, clicking a link with that middle button opens a link in a new tab. Very convenient.

Anyway, I am now leaving browser functions to the browser.

Controlling Line Widths

The only tricky thing about my new site layout is the width of the main content. If you have a high-resolution monitor and your browser window is maximized, you should see that lines of text in the main content section wrap to a new line even though there is more space available on the right-hand side of the window. That is deliberate. Long lines of text are hard to read, and so I set a maximum width that keeps the line width reasonable. It's relatively easy to force the text to wrap at a particular width, but it's a little harder to make it wrap at that width in a wide window and at the window's edge when the window is narrower than the maximum width.

Actually, it is easy in good browsers, because CSS2 includes min-width and max-width properties to constrain the width of boxes to a range:

min-width: 20em;
max-width: 35em;

In the example above, the widths are specified in em units. An em is a typographical measure that is defined relative to the current font. In CSS, one em equals the font size of the element; the name comes from traditional typography, where the unit was associated with the width of the letter M. 35em is recommended as the maximum line length by various experts.
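As a rough worked example, assume the element's font size is 16 pixels (a common browser default, but not guaranteed). The limits above then resolve as the comments show. The selector #main is a hypothetical name used only for this sketch.

```css
/* Assumes a computed font size of 16px on #main (hypothetical id). */
#main {
  min-width: 20em;  /* 20 x 16px = 320px */
  max-width: 35em;  /* 35 x 16px = 560px */
}
```

If a visitor increases the text size, the limits grow with it, so the number of characters per line stays about the same.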

Internet Explorer 5 and 6 do not support the min-width and max-width properties, and that's where things get difficult. I could have left it at that; site visitors using good browsers would see text wrapped at a reasonable width, and Internet Explorer users would see text that wrapped at the right edge of the window. I chose not to do that because my old site design used nested tables to constrain the width of paragraphs and I didn't want to take a step backwards.

After a lot of searching and some experimentation, I found a compromise. I implemented a simple version of a technique first described by Svend Tofte in his article "max-width in Internet Explorer". Basically, Internet Explorer supports (non-standard) expressions in CSS properties, and it is possible to implement a reasonable substitute for the max-width property like this:

min-width: 20em;
max-width: 35em;
/* IE5/6 version of max-width */
width:expression(document.body.clientWidth>700?"500px":"auto");

The expression tests the width of the body element. If the width of the body element is greater than 700, the width is constrained to "500px" (500 pixels). If the width is less than or equal to 700, the width is set to "auto", which means that text will wrap at the right edge of the window. It's not exactly what I want, because I'd prefer to wrap the text based on the size of the current font, not on the size of the window, but it's the best I could do in IE.

Svend described a more elaborate version of the technique that converts pixels to ems, but I decided to use a simpler version.

Compliant browsers ignore the width declaration because expression() is not a valid CSS value, and so they use the min-width and max-width properties to determine the width.

Global White Space Reset

I used one other technique which is worth mentioning. It's called Global White Space Reset and it was originally described by Andrew Krespanis. Essentially, a simple CSS rule is used to set the margin and padding properties to 0 for all elements:

* {
  margin: 0;
  padding: 0;
}

The rule above cancels padding and margins for all HTML elements, which means that you subsequently have to define margin and padding properties for every HTML element you use that needs non-zero values. While that sounds like a lot of work, it's actually easier than accepting the default values that vary by browser because writing CSS that selectively resets margins and padding requires knowing how every browser, and every browser version, on every platform, sets margins and padding values. If that sounds impossible, that's only because it is. The technique has some pitfalls, so read Andrew's description for more details and some examples before you use it on your site.
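For illustration, a stylesheet that starts with the reset might then restore spacing only for the elements it actually uses. The values below are placeholders chosen for this sketch, not recommendations taken from Andrew's article.

```css
/* Reset first, then add back only the spacing the pages need. */
* {
  margin: 0;
  padding: 0;
}

/* Illustrative values; adjust to taste. */
h1, h2, h3 {
  margin: 1em 0 0.5em 0;
}

p {
  margin: 0 0 1em 0;
}

ul, ol {
  margin: 0 0 1em 2em;  /* the left margin leaves room for list markers */
}
```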

Content Management

I should also mention that I build my home page, and the sub-sites for Second Site and TMG Utility, using a homegrown content management system that mixes content files with page templates to produce finished pages. If I change a page template, or one or more of the content files, I can regenerate the site very quickly. I have about 400 pages on my site, and it would be very painful to change the site design without a content management tool. If you have a small site, you probably don't need one. Some people use commercial content management tools, or HTML editors that have some content management features. I haven't found one I like. They all have some fatal flaw, like bad HTML output, or quirky WYSIWYG editors, or worse, and so I use my own program.

Over and Out

I hope you found this article useful. If you are considering an update of your site, some of what I did will apply to your project. In particular, separating presentation details from content, keeping the HTML simple, and making better use of styles are goals every web publisher should consider during a site overhaul.

Saturday, February 18, 2006

IE7 Wants a PC All Its Own

A user reported that a page produced by Second Site didn't look right in a beta version of IE7. Given that the page looked right in Firefox and in IE6, and given that IE7 is supposed to be more compliant with standards, one would think the page would look right in IE7.

The first thing to mention is that IE7 is still in beta, and so an IE7 bug may be responsible for the problem. On the other hand, pages produced by Second Site use pretty vanilla HTML and CSS, and it seems likely that an IE7 bug that affected Second Site would affect a lot of sites and there would be a general hue and cry about it. There isn't, so that means Second Site might be doing something wrong.

The second thing to mention is that some changes in IE7 are going to break existing sites that used work-arounds to make IE6 render pages properly. There's lots of discussion of this on the web, including an article, "CSS Hacks and IE7", on Position Is Everything, a very useful site that discusses CSS bugs in modern browsers and other CSS-related topics.

It turns out that Second Site avoids using those CSS tricks and so the problem isn't one of those.

At this point, I don't know what the problem is, and that brings us to the reason for this blog entry. I am going to have to install a version of IE7 and see the problem firsthand to solve it. That should be easy, but it isn't...

Microsoft says that IE7 cannot coexist on the same PC with IE6.

Here's my reaction to that bit of news.

  1. It wasn't a surprise. The same was true of previous versions of IE. It's been a PITA for a while.
  2. I need a working copy of IE6 on the PC I use to develop Second Site. Whenever I make any change to Second Site, I test the change using Firefox 1.5 and IE6. Those two browsers are used by the lion's share of the browsing audience and I need both of them on my main development PC.
  3. I don't want to risk the stability of any other PCs I use for other tasks by installing IE7.
  4. My development PC currently has the following non-Microsoft browsers installed and operating properly:
    • Firefox 1.5.0.1
    • Firefox 1.0.7
    • Mozilla 1.7.7
    • Opera 8.5
    • Opera 7.54
    • Netscape 8.1
    • Netscape 7.2
    • Netscape 7.1
    • Netscape 6.2
    • Netscape Communicator 4.76

    Why is it that all those other browsers can peacefully coexist, but IE6 and IE7 won't? It's a rhetorical question. I don't really want to know the answer; I want Microsoft to fix the problem.

I can't find the blog link now, but some developers have argued that other developers ought to stop whining about this problem and buy a PC to use for IE7 testing. I don't know about you, but I don't have money burning a hole in my pocket that I am just dying to spend for a PC dedicated to IE7, and I don't have an extra PC lying around. When Bill Gates sends me a check for $1000, that's when I'll consider buying a PC for IE7 testing.

Meanwhile, I found some unofficial instructions for installing IE7 without uninstalling IE6. If I install IE7 at all, I'll probably do it that way.

Sorry about this rant going on so long, but there's one last thing. The web is a democracy, and you get a vote. In fact, you get several. You vote by the sites you visit, the ads you click, and the tools you use. If you are using Internet Explorer, you are essentially voting for Microsoft to keep operating the way they have been for a few years now. That's a bad use of your voting power. I urge you to download and use Firefox. Vote for the challenger, so that the incumbent will know you demand a better browser.

Friday, February 03, 2006

XML

I mentioned XML in a recent message sent to the Second Site News mailing list, and that generated some questions from subscribers, including "What is it?" and "Why would I use it?" It is beyond the scope of a blog entry to explain XML in great detail, but I can provide you with a brief description and some reasons why XML might become more important to you in the future.

XML stands for "eXtensible Markup Language". XML is a text processing standard, a set of rules and other criteria that define how applications store and process data stored in text files. XML is very useful for exchanging data between applications because the standard is fairly simple, and very precise, and that reduces the cost of implementing the importing and exporting components of a program.

XML is eXtensible because users can define the data elements they need for their application. Data can be exchanged between applications when multiple developers agree on a set of data elements.

Here's an example of some text data, and the same data in XML. I've chosen some data elements that represent the typical fields in a standard U.S. address.

Text:

John Doe, 15 Main Street, Anytown, MA 99999

XML:

<name>John Doe</name>
<address>
    <street-address>15 Main Street</street-address>
    <city>Anytown</city>
    <state>MA</state>
    <zipcode>99999</zipcode>
</address>

If you know HTML, you may notice that HTML looks similar to XML. In fact, there is a version of HTML that is XML. It's called XHTML. HTML and XML are similar because they are both based on an older text standard, SGML.

There are many uses for text documents, and XML is now the format of choice for many of those uses. For example, many applications write user preferences to XML files.

XML is a popular standard, and so there are many tools available for developers and for end-users. One tool is XSLT, a tool for converting XML documents to some other format. Second Site uses XSLT to convert TMG data to web pages. Second Site converts the TMG data to XML, and then passes the XML to an XSLT stylesheet, which converts the data to HTML. By leveraging XML and XSLT, I saved time and effort implementing Second Site, and the users benefit, too. The transformation facility in Second Site is more powerful and flexible than it would have been if I had to develop it all myself, and users can learn about XML and XSLT on the web and in books. Anything they learn about XML or XSLT from customizing Second Site will transfer to similar projects with other tools that use XML and XSLT.

You don't have to know anything about XML in order to benefit from it. Software developers understand the value of using XML to exchange data, among other things, and so they are building XML capabilities into their applications. As time goes on, you'll see more and more applications using XML.

XML is not ideal for all text applications. One downside to XML is that it is verbose. There are 43 characters in the text-only version of the address data above. There are just under 150 in the XML version if you don't count the indentation and line breaks. (They aren't necessary in XML; I added them to make the structure clearer.) The overhead varies dramatically based on how much structure the data requires.

There has been a lot of talk in genealogical circles about XML replacing GEDCOM. As of February 2006, there is little or no support for XML in any of the widely used applications. That's unfortunate, as GEDCOM needs to be replaced.

I am adding an XML Export feature to TMG Utility so that users can export a subset of their TMG project data to XML. I am including some stylesheets so that data can be converted to web pages. The XSLT Stylesheets I provide will also serve as examples for people who want to write their own... more on that later.

In the initial release, the XML Export facility will support these data items:

  • DNA Log
  • Exhibit Log
  • Master Source List

If you have ever wanted to create a custom report for one of the data items above, but you were unable to get the results you want using one of TMG's built-in reports, then you might want to consider using TMG Utility's XML Export feature. XSLT Stylesheets are not for the technically naive, but they are easier than writing a database application to produce a report. If you can't write an XSLT Stylesheet to produce the report you want, maybe a cousin, friend, or power-user can.