Tuesday, June 27, 2006

Goodbye WinFS, We Hardly Knew Ye

One of the uber-cool features of Vista was supposed to be WinFS, the filesystem that was actually a database. It's not a new concept - Oracle wanted to create its own OS based on the idea (but has since stuck with Linux, which is now crawling with Oracle contributions), and IBM has been doing it with the AS/400 since horses were brought to the Americas. Lotus Notes keeps most of its file info in database tables, formatted as largely unconstrained recordsets. But those implementations were more server-centric, used to accelerate huge tables of data. WinFS was supposed to bring that order to the average consumer - and finally catalog the bazillions of MP3 files we all have.

It's a good idea when you think about it... and something that Google Desktop is already capitalizing on. Now that hard drives have reached astronomical sizes, it's getting harder and harder to find that Christmas card list you built two years ago. Google Desktop extracts a basic summary out of each file (the metadata out of the data) and keeps it in a big honkin' index. That way, when you search for a file, it can do an extremely fast lookup in the index instead of sequentially opening every individual file on the drive.
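To make the "index instead of scanning" idea concrete, here is a toy inverted index. It is a minimal sketch under stated assumptions, not how Google Desktop actually works: the "metadata" is just words from the file name plus a few hypothetical tags, and `extract_metadata`, `build_index` and `search` are illustrative names. The point is that a query becomes a couple of set lookups rather than a walk over every file on the drive.

```python
from collections import defaultdict


def extract_metadata(path, tags):
    """Pull searchable terms out of a file's name and tags (stand-in for real metadata extraction)."""
    stem = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    words = stem.replace("_", " ").replace("-", " ").lower().split()
    return set(words) | {t.lower() for t in tags}


def build_index(files):
    """Map each term to the set of paths that mention it (an inverted index)."""
    index = defaultdict(set)
    for path, tags in files.items():
        for term in extract_metadata(path, tags):
            index[term].add(path)
    return index


def search(index, query):
    """Return paths matching every query term via set intersection, with no disk scan."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical files and tags, purely for illustration.
files = {
    "/home/me/docs/christmas_card_list.txt": ["family", "2004"],
    "/home/me/music/grandma_neutrino_lecture.mp3": ["family", "lecture"],
    "/home/me/docs/tax_return.pdf": ["finance"],
}
index = build_index(files)
print(search(index, "family christmas"))  # {'/home/me/docs/christmas_card_list.txt'}
```

The index has to be kept up to date as files change, which is the trade-off a desktop search tool accepts in exchange for fast lookups.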

WinFS takes it one step further... all "files" exist purely as records in a database schema. This can be pretty liberating when you stop to think about it: you no longer care what directory a file is in. You don't even need directories anymore. Instead, you can organize files and information any way you want as "views," or collections of records.

For example, say you have a lecture your grandmother gave on neutrino collisions, saved as an MP3. In the old filesystem model you might have a folder for MP3s, a folder for family collections and a folder for general relativity. Which one do you put it in? With WinFS you don't care: you store the file once, then you can have a view called "MP3s" that shows all your MP3 files, another view called "Family Collections" that shows files with family members in the ID3 genre tag, and another view called "General Relativity" that sorts by the file's topic. The file sits in one physical place on disk, but it's listed under three separate views.
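The "store once, list in many views" idea maps directly onto ordinary SQL views. Below is a small sketch using Python's built-in `sqlite3` module. The `items` table, its columns and the view names are hypothetical and are not WinFS's actual schema; each "folder" is just a saved query over the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per file: each file is stored exactly once.
    CREATE TABLE items (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        format TEXT NOT NULL,
        genre  TEXT,
        topic  TEXT
    );

    -- Each "folder" is a saved query over the same rows.
    CREATE VIEW mp3s AS
        SELECT name FROM items WHERE format = 'mp3';
    CREATE VIEW family_collections AS
        SELECT name FROM items WHERE genre = 'Family';
    CREATE VIEW general_relativity AS
        SELECT name FROM items WHERE topic = 'General Relativity';
""")

conn.executemany(
    "INSERT INTO items (name, format, genre, topic) VALUES (?, ?, ?, ?)",
    [
        ("grandma_neutrino_lecture.mp3", "mp3", "Family", "General Relativity"),
        ("road_trip_mix.mp3", "mp3", "Rock", None),
        ("einstein_notes.pdf", "pdf", None, "General Relativity"),
    ],
)


def list_view(view):
    # View names come from the fixed tuple below, so formatting them into SQL is safe here.
    return [row[0] for row in conn.execute(f"SELECT name FROM {view} ORDER BY name")]


for view in ("mp3s", "family_collections", "general_relativity"):
    print(view, list_view(view))
```

The lecture appears in all three views, but `items` holds only one row for it, so updating its tags updates every "folder" at once.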

It's easy to see how this applies to multimedia files and office documents... your music can have a ton of different virtual "folders" by album, title, artist and genre, but each file still lives in only one place. Under the old file system model, you'd have to copy the file over and over again.

There's one problem, though: this method requires a lot of computing beef. You're basically running a massive, 750 GB SQL Server 2005 database on a consumer OS. Not only does this eat memory and CPU cycles like candy, it also creates a whole mess of file integrity issues. BeOS tried to do this before, but there's a slew of problems with running a filesystem-as-a-database-in-userspace. First off... if you lose power in the middle of a write, how do you recover? Was the data lost in the filesystem, in the database, or while being committed to disk? What about deadlocks? If 5000 I/O requests hit the filesystem at once, do you lock the whole database and handle each request sequentially? Do you do some sort of record-level locking? How do you prevent deadlocks in a huge queue of I/O, especially when writing out temporary files or performing huge non-sequential reads?
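One textbook answer to the record-level deadlock question is lock ordering: every request acquires its record locks in the same global order (here, sorted record ID), so no two requests can each hold a lock the other is waiting for. The sketch below is a hypothetical illustration, not how WinFS or SQL Server handled it; `RecordLockManager` and `worker` are made-up names, and many database engines instead detect deadlocks and roll back a victim.

```python
import threading


class RecordLockManager:
    """Hypothetical per-record locks; callers always acquire them in sorted ID order."""

    def __init__(self):
        self._locks = {}
        self._table_guard = threading.Lock()  # protects the dict of per-record locks

    def _lock_for(self, record_id):
        with self._table_guard:
            return self._locks.setdefault(record_id, threading.Lock())

    def acquire(self, record_ids):
        # One global order means a circular wait (the cause of deadlock) cannot form.
        ordered = sorted(set(record_ids))
        for rid in ordered:
            self._lock_for(rid).acquire()
        return ordered

    def release(self, ordered):
        for rid in reversed(ordered):
            self._lock_for(rid).release()


def worker(manager, ids, rounds, counter):
    for _ in range(rounds):
        held = manager.acquire(ids)
        try:
            counter[0] += 1  # stand-in for the actual I/O on those records
        finally:
            manager.release(held)


manager = RecordLockManager()
counter = [0]
# Two requests naming the same records in opposite orders: without sorting,
# this is the classic recipe for a deadlock.
threads = [
    threading.Thread(target=worker, args=(manager, [1, 2], 10_000, counter)),
    threading.Thread(target=worker, args=(manager, [2, 1], 10_000, counter)),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=30)
print(all(not t.is_alive() for t in threads), counter[0])  # True 20000
```

Ordering only works when a request knows all its records up front; a filesystem request that discovers what it needs mid-operation (say, while spilling to a temporary file) can't always do that, which is part of why the problem is hard.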

As far as gaming goes, WinFS would likely be a nightmare. Not only would resources become scarce, but large sequential reads and small file lookups would suddenly become a drain. No longer can you just jump to an .ogg and start streaming it... every open-file request basically has to become an SQL statement that fetches the file as a record and only then returns a file handle. Your quick-search filesystem has just become an unnecessary bottleneck.
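The extra hop is easy to see side by side. This sketch is not how WinFS actually resolved file opens; it only illustrates the indirection described above, assuming a hypothetical `items` catalog table that maps logical names to on-disk paths. Both functions return the same bytes, but the catalog version must run a query before it can hand back a handle.

```python
import os
import sqlite3
import tempfile

# Hypothetical setup: a directory of game assets plus a catalog database
# that maps logical names to where the bytes live.
asset_dir = tempfile.mkdtemp()
ogg_path = os.path.join(asset_dir, "level1_theme.ogg")
with open(ogg_path, "wb") as f:
    f.write(b"OggS" + bytes(60))  # placeholder bytes, not a real Ogg stream

catalog = sqlite3.connect(":memory:")
catalog.execute("CREATE TABLE items (name TEXT PRIMARY KEY, path TEXT NOT NULL)")
catalog.execute("INSERT INTO items VALUES (?, ?)", ("level1_theme.ogg", ogg_path))


def open_direct(path):
    """Classic model: the path goes straight to the OS."""
    return open(path, "rb")


def open_via_catalog(conn, name):
    """Database model: query for the record first, then open a handle."""
    row = conn.execute("SELECT path FROM items WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise FileNotFoundError(name)
    return open(row[0], "rb")


with open_direct(ogg_path) as a, open_via_catalog(catalog, "level1_theme.ogg") as b:
    direct_bytes, catalog_bytes = a.read(), b.read()
print(direct_bytes == catalog_bytes)  # True
```

A game that opens hundreds of small assets per level pays that query cost on every single open, on top of competing with the database engine for memory and CPU.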

In the end, you have to start building a journalled file system on top of your journalled file system. Maybe you create one partition that's database-centric and another that's node (or allocation table)-centric. Then Occam's razor comes by and stabs you repeatedly, harshing on you for building some weird golem of code.
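What each of those journalling layers does is essentially write-ahead logging: record the intent durably, record a commit marker, and only then treat the change as real. After a power loss, recovery replays only committed entries. The sketch below is a toy under stated assumptions: `JournaledStore` is a hypothetical key/value store that journals logical writes to a JSON-lines file, whereas real file systems journal disk blocks or metadata records.

```python
import json
import os
import tempfile


class JournaledStore:
    """Toy key/value 'filesystem' that logs each write to a journal before applying it."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}

    def _append(self, entry):
        with open(self.journal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make sure the record reaches disk before moving on

    def write(self, txid, key, value, crash_before_commit=False):
        # 1. Log the intent.
        self._append({"tx": txid, "op": "set", "key": key, "value": value})
        if crash_before_commit:
            return  # simulated power loss: the commit marker is never written
        # 2. Log the commit, 3. then apply the change.
        self._append({"tx": txid, "op": "commit"})
        self.data[key] = value

    @classmethod
    def recover(cls, journal_path):
        """Rebuild state by replaying only transactions that reached their commit marker."""
        store = cls(journal_path)
        pending = {}
        if not os.path.exists(journal_path):
            return store
        with open(journal_path, encoding="utf-8") as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from a crash mid-append; ignore it
                if entry["op"] == "set":
                    pending[entry["tx"]] = (entry["key"], entry["value"])
                elif entry["op"] == "commit" and entry["tx"] in pending:
                    key, value = pending.pop(entry["tx"])
                    store.data[key] = value
        return store


journal = os.path.join(tempfile.mkdtemp(), "journal.log")
store = JournaledStore(journal)
store.write(1, "christmas_card_list.txt", "v1")
store.write(2, "christmas_card_list.txt", "v2", crash_before_commit=True)

recovered = JournaledStore.recover(journal)
print(recovered.data)  # {'christmas_card_list.txt': 'v1'}
```

Stack a database with its own log on top of a file system with its own journal and every write is logged twice, which is the golem in question.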

You know what can ultimately get you 90% of the way with 10% of the complexity? Symbolic links, a good journalling file system (like EXT3 or ReiserFS) and a good desktop search tool. Which is where we're at today.
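Here is the symbolic-link half of that recipe applied to the grandmother's-lecture example: one real copy of the file, plus ordinary directories that each hold a link to it. This is a sketch assuming a POSIX-style system where `os.symlink` works without special privileges (Windows may require elevated rights or Developer Mode); the directory names are hypothetical.

```python
import os
import tempfile

root = tempfile.mkdtemp()
store_dir = os.path.join(root, "store")
os.makedirs(store_dir)

# The one real copy of the file.
real_file = os.path.join(store_dir, "grandma_neutrino_lecture.mp3")
with open(real_file, "wb") as f:
    f.write(b"ID3" + bytes(32))  # placeholder bytes standing in for audio

# Each "view" is an ordinary directory holding a symbolic link to that copy.
views = ("MP3s", "Family Collections", "General Relativity")
links = []
for view in views:
    view_dir = os.path.join(root, view)
    os.makedirs(view_dir)
    link = os.path.join(view_dir, os.path.basename(real_file))
    os.symlink(real_file, link)
    links.append(link)

same_target = all(os.path.realpath(p) == os.path.realpath(real_file) for p in links)
print(same_target)  # True
```

Unlike a database view, these links are maintained by hand (or by a script), and moving or deleting the real file leaves them dangling. That's the 10% of complexity you give up; the desktop search tool covers the "query" side.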

It seems as if Microsoft has finally realized that a filesystem based on SQL Server was just unwieldy. They first scaled back WinFS's scope by bolting it onto NTFS, producing exactly the kind of weird file system amalgam described above. Now it seems they've dropped WinFS altogether, perhaps realizing that the BeOS developers had a point after all.

We should probably all be happy about that. It's taken Linux devs long enough to reverse-engineer NTFS, after all.

2 comments:

  1. RE: "We should probably all be happy about that. It's taken Linux devs log enough to reverse-engineer NTFS, after all."

    Actually, NTFS didn't take long at all to reverse engineer. The problem with NTFS under Linux is the SAME PROBLEM under NT itself! I've been a core NT 3.1 sysadmin since BEFORE its original release -- and a Linux one since the same time (1993).

    You can NEVER SAFELY WRITE to an NTFS filesystem that was NOT created by the same NT installation (let alone from Linux). This is because entries in the NTFS filesystem are tied to the Security ID (SID) values in the Security Accounts Manager (SAM). The SAM is stored in the system's registry.

    For domains, the first "domain controller" (DC, be it an old NT 3.1 PDC/BDC or the newer ADS 2003 peer-DC model) is the computer that has its SAM taken out of its registry and made "network-wide," while its registry SAM is _never_ used. This is so NTFS filesystems on servers in the domain will have ALL their SIDs synchronized.

    Unfortunately, that still doesn't fix the issue, because non-DCs still have SID information from their local registry SAM that they could write to the disk.

    There is NOTHING WORSE than a system modifying an NTFS filesystem without being able to read ALL the SAM information. That typically results in CORRUPTION! Especially pre-NT 5.0 (pre-2000). Windows 2000 at least offers the LDM (Logical Disk Manager disk label aka partition table aka "Dynamic Disc"), which stores extra SID info in special areas. But it still does NOT solve the problem, because you can STILL SCREW UP your Access Control Entries (ACEs) on the NTFS filesystem -- but at least you won't have filesystem-wide corruption.

    But it wasn't until NT 5.1/5.2 (XP/2003) that Microsoft began PREVENTING different NT installs from reading each other's NTFS filesystems UNLESS you are using Dynamic Discs. If you install two different Windows XP systems on two different discs using legacy BIOS/DOS disk labels ("Basic Disc"), you will NOT be able to read each other's NTFS filesystems. Why? Because XP is PREVENTING you from accidentally writing to it and screwing up the other NTFS filesystem created by the other XP install.

    And because NT doesn't offer a "safe, read-only" mount like Linux does, it just PREVENTS you from even seeing the filesystem (you can only see that the volume "exists" in Computer/Disk Manager).

    BACK TO ORIGINAL POINT:

    CairoFS, which was designed for NT 4.0, was SUPPOSED to "fix" this SAM-SID issue with NTFS. It became vaporware.

    WinFS which was supposed to go into NT 6.0 "Longhorn" was SUPPOSED to "fix" this SAM-SID issue with NTFS. It is now vaporware.

    Until Microsoft fixes this CORE DESIGN FLAW with NTFS, people will keep trashing NTFS filesystems -- or in the case of "Dynamic Discs," trashing meta-data (although the NTFS filesystem itself won't be destroyed). It seems clear that Microsoft considers the problem "mitigated" with LDM ("Dynamic Disc") and will probably make it the required default with NT 6.0 Vista.

    NOTE: "Resizing" and "Moving" NTFS partitions does NOT modify NTFS filesystem meta-data (and crucial SID info), so it can be done SAFELY with any OS. It's only when you modify NTFS filesystem meta-data (like rewriting a file's attributes when writing any file) that you can have the ENTIRE NTFS filesystem corrupted -- not just the file or directory.

  2. Very interesting about the SAM-SID issue... it explains (posthumously for some of my partitions, unfortunately) a lot of things I've seen with NTFS installs.

    The Captive and Paragon NTFS implementations seem to use WinXP's kernel libraries (i.e., they load DLLs from an XP install) to do file creation and deletion. Is MS kernel code currently the only thing that understands the SID magic in filesystem entries?
