Thursday, December 22, 2005

Implementation Changes

I've hit some snags during implementation. I've made some design changes to accommodate them. I'm going to document them as I go to keep Ken informed, and to have a place to go back to in case I ever wonder "WTF did I do that?"

Removed Owner, Group and Permissions
I removed file ownership and access control for two reasons. First, they would require either some ThreadContext object to track the actor or an "actor" parameter on every method call to check permissions. Either would pollute the interface right now. They may be added back later.
Second, Java has a nice ACL library that we may want to use instead. My mental stack is too full to figure it out right now.
Address is a first class citizen
I promoted the "Address" abstraction to be a first class citizen. It's used by Contact, Meeting, and Photo, so I think it should be 1st class. I am making it immutable for now.
Created our own MimeType
I was planning to use the activation framework's MimeType object, but it isn't immutable. I don't want the mime type that is describing a file to be changeable. It doesn't make sense for an instance of Music to have a Document mime type. I wrote a quick little immutable mime type.
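As a rough illustration of the idea (not the actual SpoonFS class), an immutable mime type can be little more than two final fields with no setters. The name SimpleMimeType and its parse helper are hypothetical, and parameters such as "; charset=..." are deliberately ignored in this sketch.

```java
import java.util.Locale;

/** Hypothetical immutable mime type: both fields are final and there are no setters. */
final class SimpleMimeType {
    private final String primaryType; // e.g. "audio"
    private final String subType;     // e.g. "mpeg"

    SimpleMimeType(String primaryType, String subType) {
        if (primaryType == null || subType == null
                || primaryType.isEmpty() || subType.isEmpty()) {
            throw new IllegalArgumentException("primary type and subtype are required");
        }
        // Type and subtype names are case-insensitive, so normalise once here.
        this.primaryType = primaryType.toLowerCase(Locale.ROOT);
        this.subType = subType.toLowerCase(Locale.ROOT);
    }

    /** Parses "type/subtype"; mime parameters are not handled in this sketch. */
    static SimpleMimeType parse(String text) {
        int slash = text.indexOf('/');
        if (slash <= 0 || slash == text.length() - 1) {
            throw new IllegalArgumentException("expected type/subtype: " + text);
        }
        return new SimpleMimeType(text.substring(0, slash), text.substring(slash + 1));
    }

    String getPrimaryType() { return primaryType; }
    String getSubType() { return subType; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof SimpleMimeType)) return false;
        SimpleMimeType other = (SimpleMimeType) o;
        return primaryType.equals(other.primaryType) && subType.equals(other.subType);
    }

    @Override public int hashCode() { return 31 * primaryType.hashCode() + subType.hashCode(); }

    @Override public String toString() { return primaryType + "/" + subType; }
}
```

Because nothing can change after construction, a Music item that is handed an audio type cannot later end up describing a Document.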

Tuesday, December 20, 2005

Persisted Object Hierarchy #2

[Diagram: class hierarchy containing Dynamic Association, Static Association, Association, Document, Meeting, Contact, Photo, Image, Music, Audio and Item]

Replaces Persisted Object Hierarchy.

Meta-Data

Dynamic Association Meta-Data

Dynamic Association

extends Association extends Item.

query [a persisted query]

Static Association Meta-Data

Static Association

extends Association extends Item.

Read/Write
relation Collection <> (promoted to r/w)
representative Item

Association Meta-Data

Association

extends Item.

Read Only
relation Collection <>

Document Meta-Data

Document

extends Item.

Defined by Data
words int
sentences int
paragraphs int
pages int ???
Read/Write
status { WORK_IN_PROGRESS, DRAFT, FINAL, ABANDONED }
contributors Collection <>
keywords Collection <>
abstract String

Monday, December 19, 2005

Meeting Meta-Data

Meeting

extends Item.

Meeting is based on the vEvent object in the iCalendar format as described in RFC 2445 (HTML).

Reference implementation: iCal4j.

summary String
description String
agenda Document (added)
minutes Document (added)
material Collection <> (added)
comments Collection <>
start Date
end Date
location Address
status { TENTATIVE, CONFIRMED, CANCELLED }
organizer Contact
attendees Collection <>
lastMeeting Meeting (added)
nextMeeting Meeting (added)
Attendee
attendee Contact
role { CHAIR, REQ-PARTICIPANT, OPT-PARTICIPANT, NON-PARTICIPANT } ROLE
status { NEEDS-ACTION, ACCEPTED, DECLINED, TENTATIVE, DELEGATED } PARTSTAT
delegatedTo Attendee DELEGATED-TO
delegatedFrom Attendee DELEGATED-FROM

Sunday, December 11, 2005

Contact Meta-Data

Contact

extends Item

Contact is based on the vCard specification. vCard is formalized in RFC 2425 (HTML) and RFC 2426 (HTML). See vCard as RDF/XML and hCard for more info.

Reference implementation: vCard4j.

Read/Write
name Name N
nicknames List <> NICKNAME
photo Photo PHOTO
birthday Date BDAY
addresses Collection <> ADR
telephone Collection <> TEL
email Collection <> (TODO: should there be TYPEs for this?) EMAIL
url URL URL
title String TITLE
role String ROLE
agent Contact AGENT
organization String ORG
unit List <> ORG
Name
family String
given String
additional List <>
prefix List <>
suffix List <>
Address
type[] { dom, intl, postal, parcel, home, work, pref }
poBox String
extended String
street String
city (locality) String
state (region) String
postalCode String
country ISOCode
Telephone
type[] { home, work, msg, voice, fax, cell, video, pager, bbs, modem, car, isdn, pcs, pref }
number String (vCard recommends canonical form)

Friday, December 9, 2005

Photo Meta-Data

Photo's meta-data is based on the EXIF specification and the IPTC spec.

EXIF

See the EXIF v2.2 spec and sample data.

IPTC

IPTC IIM v4.1 spec.

Reference implementations

Photo

extends Image extends Item

Read Only (exif)
Camera Camera
Exposure Time float
F Number float
Original Date
Digitized Date
Shutter Speed float
Aperture float
Exposure Bias float
Max Aperture float
Subject Distance float
Metering Mode { unknown, Average, CenterWeightedAverage, Spot, MultiSpot, Pattern, Partial, other }
Flash Flash
FocalLength float
Read/Write (iptc)
Object Name String
Keywords Collection <>
By Line Collection <>
Country (primary location) String TODO: use ISO countries?
Province (state) String
City String
Headline String
Caption String
Read/Write (custom)
Subjects Collection <>
Camera
Make String
Model String
Flash
Fired { Fired, NotFired }
Returned { NoStrobeReturnDetectionFunction, StrobeReturnLightNotDetected, StrobeReturnLightDetected }
Mode { unknown, CompulsoryFiring, CompulsorySuppression, Auto }
Function { Present, NoFlash }
RedEye { RedEyeReduction, NoRedEyeReduction }

TODO: should these "float" values be "rationals"?
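For context on this TODO: EXIF stores values such as exposure time and F number as numerator/denominator pairs, so a float loses the exact fraction. A minimal sketch of what a rational type could look like follows; the class name Rational and its methods are hypothetical, not part of SpoonFS.

```java
/** Hypothetical exact fraction, kept in lowest terms with a positive denominator. */
final class Rational {
    private final long numerator;
    private final long denominator;

    Rational(long numerator, long denominator) {
        if (denominator == 0) throw new IllegalArgumentException("zero denominator");
        long g = gcd(Math.abs(numerator), Math.abs(denominator));
        long sign = denominator < 0 ? -1 : 1; // move any sign onto the numerator
        this.numerator = sign * numerator / g;
        this.denominator = sign * denominator / g;
    }

    // Euclid's algorithm; gcd(0, d) is d, so a zero numerator reduces to 0/1.
    private static long gcd(long a, long b) { return b == 0 ? a : gcd(b, a % b); }

    long getNumerator() { return numerator; }
    long getDenominator() { return denominator; }

    /** Lossy conversion for callers that only need an approximate value. */
    float floatValue() { return (float) numerator / denominator; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Rational)) return false;
        Rational r = (Rational) o;
        return numerator == r.numerator && denominator == r.denominator;
    }

    @Override public int hashCode() { return Long.hashCode(numerator) * 31 + Long.hashCode(denominator); }

    @Override public String toString() { return numerator + "/" + denominator; }
}
```

With this, an exposure time read as 10/2500 would display as 1/250 rather than 0.004, which is closer to how photographers write it.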

Sunday, November 27, 2005

Image Meta-Data

TODO: Image is currently geared towards raster images, not vector images.

Image

extends Item

Read Only
width int
height int
xResolution float
yResolution float
resolutionUnit { inches, centimeters }

Music Meta-Data

The meta-data for the Music class is a subset of the ID3v2.3.0 specification. As a proof of concept, only the most significant fields (the ones displayed by Winamp) were kept.

We are using javamusictag to handle these.

Music

extends Audio extends Item

Read/Write
Track int NULLABLE TRCK
Total Track int NULLABLE TRCK
Title String NULLABLE TIT2
Album String NULLABLE TALB
Artist String NULLABLE TPE1
Year int NULLABLE TYER
Genre String NULLABLE TCON
Comments String NULLABLE COMM
Composer String NULLABLE TCOM
Original Artist String NULLABLE TOPE
Copyright String NULLABLE TCOP
URL URL NULLABLE WXXX
Encoded by String NULLABLE TENC
Overridden
Length int TLEN

Audio Meta-Data

Audio

extends Item

Read Only
Length int Length of audio in milliseconds

Saturday, November 19, 2005

Item Meta-Data

These are the schematized attributes that we feel belong to every Item.

Item

Read Only
UID long
fileSize long
created Date
modified Date
accessed Date
Read/Write
owner User
group Group
perms Permissions
type MimeType javax.activation.MimeType
name String NULLABLE
stream File NULLABLE
related Collection
User Definable
attributes Map

Persisted Meta-Data

There are four types of meta-data per Item in SpoonFS.

System Data

Read-only data which is maintained by the file system:
  • Date
    • created
    • modified
    • accessed
  • File size
  • Owner
  • Group
  • Permissions
    • For owner
    • For group
    • For all
  • UID

Persisted Object Schema's Data

Predefined data corresponding to a known type in the file store.

User Attributes

Free-formed user-defined key-value pairs.

Associations

Dynamic
A saved query.
Static
A predefined disjunction of items.

Persisted Object Hierarchy

The decision was made to persist known file types rather than allow ad-hoc meta-data schemas. We decided that it was better to have a group of known schemas, so that users know what meta-data is available for those types. This helps prevent users from creating version hell with competing schema types for these well-known objects.

Replaced by Persisted Object Hierarchy #2.

Wednesday, November 16, 2005

Representative Elements

It should be possible to have a concept of representative elements in a static Association (i.e., an Association without a dynamic predicate). This would allow for a bunch of photos selected of a bike tour to have just one photo be the representative element for the association. One way to implement this would be to have each Photo have a Representative field, which points to the actual representative photo. Of course, then photos can be part of only one representable set. Perhaps the representative should be stored with the Association. So you ask an Item for an Association that it's part of, and then ask the Association for the representative element. If there is no representative element, just show all elements.

Again, representative elements should only be valid for static associations. If it were a dynamic Association and an element E were picked, if E's state changed so that it was no longer part of the Association, what would that mean? We would have an Association whose representative element was not part of the Association. Maybe this should be allowed, but it seems kind of dumb (e.g., why have a 10 day old e-mail be representative of the Association "All e-mails less than 7 days old"?)

As an alternative, perhaps we should meet in the middle and only allow static members of Associations to be representative. That is, when the person wants to make the member of an Association the representative, its inclusion in the Association had to be satisfied by a static part of the predicate (e.g., a particular UID) and not a dynamic part (e.g., age <>
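The option of storing the representative with the Association, rather than on each Photo, could be sketched roughly as follows. The class and method names (Item, StaticAssociation, itemsToShow) are assumptions made for illustration, and Item is reduced to a bare UID.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical stand-in for a SpoonFS Item; only the UID is modelled. */
class Item {
    final long uid;
    Item(long uid) { this.uid = uid; }
}

/** Static association that owns its (optional) representative element. */
class StaticAssociation extends Item {
    private final Set<Item> members = new LinkedHashSet<>();
    private Item representative; // null means "no representative chosen"

    StaticAssociation(long uid) { super(uid); }

    void add(Item item) { members.add(item); }

    void remove(Item item) {
        members.remove(item);
        // The representative must always be a member, so it goes with the member.
        if (item == representative) representative = null;
    }

    void setRepresentative(Item item) {
        if (!members.contains(item)) {
            throw new IllegalArgumentException("representative must be a member");
        }
        representative = item;
    }

    /** The representative alone, or every member if none has been chosen. */
    Collection<Item> itemsToShow() {
        if (representative != null) return Collections.singleton(representative);
        return Collections.unmodifiableSet(members);
    }
}
```

Keeping the field on the Association means a photo can belong to several associations and be the representative of some but not others.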

Anonymous Associations

Since Associations are Items and the Name field is optional for Items, it is possible to have Anonymous Associations. Such a creature is useful if, for example, you just want to say that a group of things are related. E.g., I just select a bunch of photos from my bike tour and say that they are related. Sure, I *could* say that they have the name "bike tour", but why? I should just be able to associate them and not have to worry about naming immediately, especially since I can get an Item's Associations from the Item itself.

Attributes and Tags

Services like Flickr offer Tags. Attributes are more powerful than Tags. Attributes are key-value pairs. If you want tags, you can make an attribute without a value. Also, Attributes are type-safe. If you specify a Date, you get a date object. If you specify the type as Contact, you get a Contact.

Association Implementation

It is possible to implement Associations with Attributes. But don't do this.

Typesafe Attributes and Relationships

Since attributes are typesafe, full relationships can be made. For example, it is possible to have Person Items listed as attributes of a Photo Item (e.g., if the person is in the photo). This allows for very powerful queries to be developed (e.g., show me all photos of Candice taken last year).
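One way to picture typesafe attributes, tags as value-less attributes, and the "photos of Candice" query is the sketch below. Everything in it (AttributeKey, the set/get/tag methods, the "subject" and "taken" keys, the linear scan) is a hypothetical illustration; a real implementation would use the filesystem's index rather than a loop.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical typed key: the Class token records the type of the attribute's value. */
final class AttributeKey<T> {
    final String name;
    final Class<T> type;
    AttributeKey(String name, Class<T> type) { this.name = name; this.type = type; }
}

/** Hypothetical Item reduced to its user attributes. */
class Item {
    private final Map<String, Object> attributes = new HashMap<>();

    <T> void set(AttributeKey<T> key, T value) { attributes.put(key.name, key.type.cast(value)); }

    /** Returns the value with the key's type, or null if the attribute is absent. */
    <T> T get(AttributeKey<T> key) { return key.type.cast(attributes.get(key.name)); }

    /** A tag is simply an attribute without a value. */
    void tag(String name) { attributes.put(name, null); }

    boolean has(String name) { return attributes.containsKey(name); }
}

class PhotoQueries {
    static final AttributeKey<Item> SUBJECT = new AttributeKey<>("subject", Item.class);
    static final AttributeKey<LocalDate> TAKEN = new AttributeKey<>("taken", LocalDate.class);

    /** All photos whose subject is the given person and that were taken in the given year. */
    static List<Item> photosOf(Item person, int year, Collection<Item> photos) {
        List<Item> result = new ArrayList<>();
        for (Item photo : photos) {
            LocalDate taken = photo.get(TAKEN);
            if (photo.get(SUBJECT) == person && taken != null && taken.getYear() == year) {
                result.add(photo);
            }
        }
        return result;
    }
}
```

With keys declared this way, a call such as photo.set(PhotoQueries.TAKEN, "yesterday") is rejected by the compiler, which is the kind of safety a plain tag cannot offer.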

Tuesday, November 15, 2005

Associations

Definition

As discussed before, a traditional filesystem has Files and Directories whereas SpoonFS will have Items and Relationships. This post describes Associations and their usage in the filesystem. Formally, an Association is just an unordered set of Items. In this respect, an Association is similar to a Directory in a traditional file system.

Simple Example

However, the power of Associations extends much further than Directory capabilities. First of all, it is possible for Items (for now, just think of Items and Files as being the same) to be in more than one Association. For example, imagine that Alice has recently taken several photos from her vacation in France. One thing that she could do is create a new Association named Vacation and put all of the recent photos in this Association. Furthermore, assume that Alice visited France because of a fascination with the Eiffel Tower. Alice has several photos from other people of the Eiffel Tower and these are all under an "Eiffel Tower" Association. Now that Alice has taken her own photos, she can also put these under the "Eiffel Tower" Association. Note that this would have been very difficult to do in a hierarchical file system; a picture of Alice in front of the Eiffel Tower would need to be in both an "Eiffel Tower" directory and a "Vacation" directory. A hacky solution is available on platforms supporting hard links, but FAT32 does not support them, and NTFS, which does, gives ordinary users no convenient way to create them.

Associations are Items too

As a parenthetical comment, it was mentioned that Items should be thought of as just being Files. We now extend the concept of a SpoonFS Item to include Associations as well. That is, an Item is either a File, a Spoon, or an Association. Since Associations are sets of Items, it is possible to have Associations contain other Associations. Such a use for sub-Associations could be useful in organizing system files. For example, consider a "School" Association consisting of all of the work done for school (e.g., essays, assignments, OS projects, etc.). Now consider that there is an Association named "CS508" which corresponds to any files having to do with CS508 (e.g., PDF documents corresponding to the papers read). "CS508" could be placed as an Item in the "School" Association (along with other Associations pertaining to courses). Using Associations in this way is mirroring the functionality of a hierarchical file system. In fact, with this example, we wanted exactly a subset type of relationship, which is what hierarchical file systems provide. This example is useful to demonstrate that Associations are more powerful than just hierarchies: they can implement hierarchies (i.e., sub-Associations), but can also do much more (e.g., look at the previous example involving vacation photos).
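The two examples so far (one photo in two Associations, and an Association nested inside another) can be modelled in a few lines. This is only a sketch under the assumption that an Association is an in-memory set; the class names and the associationsOf helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical Item: the name is optional in SpoonFS and may be null. */
class Item {
    final String name;
    Item(String name) { this.name = name; }
}

/** An Association is itself an Item, so it can be a member of another Association. */
class Association extends Item {
    private final Set<Item> members = new LinkedHashSet<>();
    Association(String name) { super(name); }
    void add(Item item) { members.add(item); }
    boolean contains(Item item) { return members.contains(item); }
}

class AssociationDemo {
    /** Returns every association in the list that directly contains the item. */
    static List<Association> associationsOf(Item item, List<Association> all) {
        List<Association> result = new ArrayList<>();
        for (Association a : all) {
            if (a.contains(item)) result.add(a);
        }
        return result;
    }

    public static void main(String[] args) {
        // One photo, two associations: no copy or hard link needed.
        Item photo = new Item("alice-at-tower.jpg");
        Association vacation = new Association("Vacation");
        Association eiffel = new Association("Eiffel Tower");
        vacation.add(photo);
        eiffel.add(photo);

        // Hierarchy as a special case: CS508 is a member of School.
        Association school = new Association("School");
        Association cs508 = new Association("CS508");
        school.add(cs508);

        List<Association> all = Arrays.asList(vacation, eiffel, school, cs508);
        System.out.println(associationsOf(photo, all).size()); // prints 2
    }
}
```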

Dynamic Associations

As defined, Associations are sets of Items. The examples provided so far have offered sets that have a static predicate function. That is, the Association is told that it contains "these files", where "these files" are some set selected by the user. However, the predicate function for an Association can also be dynamic, allowing Associations to grow dynamically as new files are added. As a requirement, we allow Associations to query over the Attributes of an Item. For example, consider an Association entitled "Favourite Songs" in which the predicate function is "All MP3s whose "rating" attribute is 5 or more". An example from business could be "All Contacts that I have sent an E-mail to in the past five days". In this respect, Associations act somewhat like stored queries or views in a database. There is already a similar concept built into modern e-mail clients called virtual folders (such clients include Novell Evolution, Mozilla Thunderbird and Microsoft Outlook). Of course, virtual folders are only defined for mail messages in these applications. The great thing about refactoring this type of powerful behaviour into the filesystem is that all SpoonFS aware applications can benefit from it.
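A dynamic Association can be thought of as an Item that stores a predicate and re-evaluates it on demand. The sketch below uses the "Favourite Songs" example; the class names, the untyped attribute map and the in-memory store are all simplifying assumptions, not SpoonFS code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical Item reduced to an untyped attribute map. */
class Item {
    final Map<String, Object> attributes = new HashMap<>();
}

/** An Association whose membership is defined by a stored query instead of a fixed set. */
class DynamicAssociation extends Item {
    private final Predicate<Item> query;

    DynamicAssociation(Predicate<Item> query) { this.query = query; }

    /** Re-evaluates the stored query against the store on every call. */
    List<Item> members(Collection<Item> store) {
        List<Item> result = new ArrayList<>();
        for (Item item : store) {
            if (query.test(item)) result.add(item);
        }
        return result;
    }
}

class FavouriteSongs {
    /** Matches items whose "rating" attribute is an Integer of 5 or more. */
    static final Predicate<Item> RATED_FIVE_OR_MORE = item -> {
        Object rating = item.attributes.get("rating");
        return rating instanceof Integer && (Integer) rating >= 5;
    };
}
```

Because membership is computed from the attributes at query time, raising a song's rating to 5 makes it appear in the Association without anyone editing the Association itself.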

Tuesday, October 25, 2005

System Calls for Attributes

The title is a link to Dominic Giampaolo's book Practical File System Design with the Be File System. The attribute related part is in Chapter 5. From the book:

BFS stores the list of attributes associated with a file in an attribute directory (the attributes field of the bfs.inode structure). The directory is not part of the normal directory hierarchy but rather "hangs" on the side of the file. The named entries of the attribute directory point to the corresponding attribute value.
So each file has a hidden attributes directory and the attributes themselves are files in that directory. The file name is the attribute name and the file contents are the attribute's value. This allows the inode data structure to be reused. As an optimization, unused portions of inodes are used for small attributes (to limit the disk head from having to open other directories to get attribute information).

A program can perform the following system calls on attributes:

  • Open attribute directory
  • Read attribute directory
  • Rewind attribute directory
  • Close attribute directory
  • Stat attribute
  • Remove attribute
  • Read attribute
  • Write attribute
An example system call looks like:
ssize_t fs_read_attr(int fd, const char *attribute, uint32 type, off_t pos, void *buf, size_t count);
The file descriptor indicates which file to operate on, the attribute name indicates which attribute to do the I/O to, the type indicates the type of data being written (integer, double, string, etc.), and the position specifies the offset into the attribute to do the I/O at.

The exact system calls for the above are

  • DIR *fs_open_attr_dir(char *path);
  • struct dirent *fs_read_attr_dir(DIR *dirp);
  • int fs_rewind_attr_dir(DIR *dirp);
  • int fs_close_attr_dir(DIR *dirp);
  • int fs_stat_attr(int fd, char *name, struct attr_info *info);
  • int fs_remove_attr(int fd, char *name);
  • ssize_t fs_read_attr(int fd, char *name, uint32 type, off_t pos, void *buffer, size_t count);
  • ssize_t fs_write_attr(int fd, char *name, uint32 type, off_t pos, void *buffer, size_t count);
Note the API style for the last four calls: both the file descriptor of the file that the attribute is associated with and the name of the attribute are required. Making attributes into full-fledged file descriptors would have made removing files considerably more complex, so attributes are not treated as file descriptors in their own right.

Attributes

There was once a great operating system called BeOS which sported an excellent filesystem called BFS (the Be File System). In addition to being efficient, journalled, large (64-bit) and multithreaded, the most compelling feature of the operating system (i.e., the feature not found in most conventional operating systems) was attributes. Attributes are metadata associated with files on a BFS volume.

For example, MP3 files contain ID3 data identifying fields such as Artist and Title. JPEG files can contain IPTC data containing information such as the date of a photograph, the photographer, the shutter speed, the aperture, etc. Using attributes, these fields would be refactored into the filesystem, not stored in the file itself. That is, an MP3 file would not have an ID3 tag, but would instead be attributed with Artist and Title attributes.

Another example of attribute usage is storing the preferred application to view a file. For example, in Windows, I may designate that JPEG files should open in the Gimp. However, when I download my vacation photos (which are JPEGs) from my camera, I'd like the default double click action to be to open them in the Picture Viewer (so I can get an easy slide show). I do not recall if Windows currently has the ability to selectively open some files of a type in one viewer and other files of the same type in a different editor. If it does indeed have this feature, that information would have to be stored in the registry (which brings a lot of woes including loss of information on OS reinstall). Using attributes, each file could have a PreferredApp attribute that specifies which application to use. This attribute could be an OS specified attribute (so that "power users" do not accidentally delete this attribute).

SpoonFS will support arbitrary attributes on files. There will be a concept of system attributes (e.g., Date, Permission) and user attributes (Width, Height for JPEGs, Author for MP3 files). As well, attributes will be type safe. For example, a Date attribute will expect a date to be entered according to the format provided by the OS locale.

Attributes present a very powerful paradigm; they are a first step toward turning a filesystem into a database. Attributes are indexed and queryable, so it is possible to quickly find files. As well, attributes allow the creation of metadata-only files. For example, consider the concept of a Contact in an e-mail program or IM application. In a traditional filesystem, a Contact may be implemented as a small text file that has fields listed one per line (e.g., Name, E-mail address, Birthday, etc.). In an attribute-based filesystem, each field could be stored as an attribute on a zero byte file. In actuality, no inode needs to be allocated for the file itself; only metadata is stored. The name SpoonFS comes from this observation that no actual data file exists. Quoting from the Matrix:

There is no spoon

Similarly, there is no file (only metadata).

Saturday, October 22, 2005

SpoonFS Motivation

This is the first post to get some ideas regarding SpoonFS down. Traditionally, filesystems have been based on the notion of a hierarchy. There are two primitives in these hierarchical systems:
  • files (which are an abstraction for a sequence of bytes)
  • directories (which contain a collection of files)
Directories are called 'folders' in some simpler OSes. Files are named objects and the names are unique within a directory (i.e., you cannot have two files with the same name in one directory). Over time, various applications have required the ability to build up databases of files that are queryable. E-mail clients, photo software and word processors all end up building their own databases so that users may query their data orthogonally. However, this repetition by all applications points out a deficiency in the design of current file systems. Application developers are building databases of files on top of hierarchical file systems; this common functionality should be refactored into the filesystem, yielding a filesystem that is itself a database of files. The hierarchy notion will be disbanded. The SpoonFS project will provide an implementation of this database filesystem. Features will include typesafe attributes, anonymous and named relations of files, and fast query time.