Setup

Inside the MSI file format, again.

Far and away the most popular blog entry I have written to date is the entry where I discussed some of the inner details of the MSI file format. I find this slightly amusing since there is so little actionable information in that blog entry. I consider the blog entries about Windows Installer Components and the Component Rules far more useful to people working with setup.

In any case, that blog entry about peeking under the hood of the MSI file also got the attention of a couple of the Windows Installer developers. The first developer was very concerned about me discussing the undocumented file format. His primary concern was that more information on the file format would encourage people to try to access MSI files in strange and unusual ways that would in turn create application compatibility issues for the Windows Installer team. You would not believe the kind of hacks that exist in the Windows Installer right now that try to work around bad application installations. For a small taste, check out Raymond Chen’s blog where he discusses a few interesting application compatibility issues (not related to the Windows Installer, but they should give you an idea). Ultimately, if you are working on something that depends on knowledge about undocumented features or file formats to succeed (and no, my blog does not count as documentation) then please stop right now.

The other Windows Installer developer, Carolyn, reminded me of a couple of details that I had forgotten since leaving the team. I thought that errata would make for a decent follow-up to what turned out to be an overly popular blog entry. Besides, just like last time, my girlfriend is once again working on her homework so I thought I’d try to be productive too.

[Note: If you haven’t read my previous blog entry you should do that now.]

First, my comment about the way structured storage files became the base format for MSI files was admittedly a bit flippant. As Carolyn reminded me, the Windows Installer was originally going to ship (in 1998 or so) on three different architectures: Intel x86, Alpha axp64, and Macintosh PowerPC. Since structured storage files were a native part of the Windows 95 and NT operating systems and the Mac Office team had already ported over support for them (to interoperate with the Microsoft Office file formats) it made a lot of sense to start with structured storage files. At least somebody had already done the cross-platform testing.

Okay, so that explains why structured storage files were chosen for the base file format, but why use a relational database format in the first place? On this point, my memory was better. Relational databases were just the “in” thing at the time. Picking a relational database file format in the mid-1990s would be kinda’ like picking XML as your file format today. I have to wonder if, in five years’ time, anybody will be questioning why the heck so many developers picked a verbose, text-based file format for so many of their applications.

Two more things of note, then I’m going to bed. I noted that the stream names in structured storage files can only be about 63 characters long. This is technically incorrect. As per the spec, structured storage file stream and sub-storage names can only be 31 characters long (32, actually, but the last must be the null terminator so I don’t count that one). The Windows Installer actually compresses the stream names, which roughly doubles the space available to 62 or 63 characters (plus one extra for the null terminator, but again, who’s counting?). That compression is why, if you open an MSI file with dfview.exe, you’ll see a bunch of gobbledygook names. That compression algorithm is not documented so I can’t comment on it. Note, even if I could comment on the algorithm, the Windows Installer team is free to change the way they compress those names at any time since nothing about the details of the MSI file format is documented.
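
To make the 31-versus-32 arithmetic concrete, here is a minimal Python sketch of the plain structured storage limit. It assumes names are counted in 16-bit UTF-16 units, and the function names are made up for illustration. It says nothing at all about how the Windows Installer compresses its own stream names, which remains undocumented.

```python
# Sketch only: the plain structured storage name limit, not MSI's compression.
# Assumption: a name is stored as UTF-16 code units plus a null terminator.
MAX_NAME_UNITS = 32  # total 16-bit units available, terminator included


def utf16_units(name: str) -> int:
    """Count the UTF-16 code units needed for name, without a terminator."""
    return len(name.encode("utf-16-le")) // 2


def fits_in_storage_name(name: str) -> bool:
    """True if name plus its null terminator fits in the name field."""
    return utf16_units(name) + 1 <= MAX_NAME_UNITS


if __name__ == "__main__":
    # A 31-character name fits; a 32-character name leaves no room
    # for the terminator.
    for candidate in ("Binary", "x" * 31, "x" * 32):
        print(len(candidate), fits_in_storage_name(candidate))
```

Counting code units rather than Python characters matters for anything outside the Basic Multilingual Plane, since those characters take two units each.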

Finally, Carolyn pointed out that while the original version of msival.exe has been deprecated and pulled from the MSI SDK, string pool validation lives on. While I hope you never need it, you can verify the string pool in your MSI file by finding msiinfo.exe and passing the “-b” and “-d” options. For those of you building MSI files, the “-d” option is particularly useful when trying to validate that you finally got the codepage correct when building your MSI file.
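
If you want to run that check from a build script, something like the following Python sketch would do. The two options come straight from the paragraph above. Everything else is an assumption on my part: that msiinfo.exe is on your PATH, that the database path goes before the options, and the helper names themselves.

```python
import subprocess
from typing import List


def string_pool_check_command(msi_path: str, tool: str = "msiinfo.exe") -> List[str]:
    """Build the command line for a string pool check.

    Assumption: the database path comes first, followed by the options.
    """
    return [tool, msi_path, "-b", "-d"]


def run_string_pool_check(msi_path: str, tool: str = "msiinfo.exe") -> int:
    """Run the check and return the tool's exit code.

    Raises FileNotFoundError if the tool cannot be found.
    """
    completed = subprocess.run(string_pool_check_command(msi_path, tool), check=False)
    return completed.returncode
```

Calling run_string_pool_check("product.msi") (a hypothetical file name) lets the tool write to the console as usual, so whatever it reports about the string pool shows up in your build log.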

So there we go. A bit more information and a whole lot of validation for my previous blog entry about the inside of an MSI file. I’m sure that Jenny has given up studying and gone to sleep by now so it is time for me to follow.

Until next time.