[tahoe-dev] how to access files: FUSE, CLI, Dropbox-like-hack, etc.

Greg Troxel gdt at ir.bbn.com
Tue May 22 23:00:37 UTC 2012


"Zooko Wilcox-O'Hearn" <zooko at zooko.com> writes:

>> But I think the reason I find it odd is that tahoe-lafs *is* a
>> filesystem.  Now, if you mean: "it's a filesystem, but think about
>> accessing a filesystem on another machine over scp; the current
>> software interface that people use feels more like that than mounting
>> a disk onto /mnt" then I see what you mean (and indeed it's true).
>
> You're right. I don't *really* mean "Tahoe the Least-Authority File
> System is not a File System". What I really mean is "Tahoe-LAFS is not
> best used through the traditional POSIX semantics, and therefore it is
> not best used through your operating system's Virtual File System
> layer".

The examples you give (equivalent to scp) are not really that different
from open/read-all/close and open-for-write/write-all/close.  For a lot
of usage, especially backup-type, I don't think there is actually all
that much mismatch.

For more general use, there may be more mismatch, and I guess that
raises the issue of whether immutable files are what people really
need.  It should be relatively easy (for someone to do in their Copious
Spare Time, as my professors liked to say) to instrument an OS kernel
and figure out how often files are modified without being mostly
rewritten.  I suspect that outside of database files, and log files for
append, it isn't super common.
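
As a rough illustration of the measurement suggested above, here is a
minimal sketch that classifies write sessions from a trace.  The trace
format (a final file size plus a list of (offset, length) writes per
open-for-write session), the function names, and the 90% threshold are
all assumptions made up for this example; they do not come from any real
kernel tracing tool.

```python
def covered_bytes(writes):
    """Return the number of distinct bytes touched by (offset, length) writes."""
    total = 0
    end_of_last = 0
    for offset, length in sorted(writes):
        # Skip any part of this write that overlaps bytes already counted.
        start = max(offset, end_of_last)
        end = offset + length
        if end > start:
            total += end - start
            end_of_last = end
    return total


def classify_sessions(sessions, threshold=0.9):
    """Count sessions that rewrote most of the file versus partial updates.

    sessions is an iterable of (final_size, writes) pairs (hypothetical
    trace format).  A session counts as "mostly_rewritten" when its writes
    cover at least `threshold` of the file's final size.
    """
    counts = {"mostly_rewritten": 0, "partial": 0}
    for final_size, writes in sessions:
        # An empty file after the session is treated as fully rewritten.
        fraction = covered_bytes(writes) / final_size if final_size else 1.0
        key = "mostly_rewritten" if fraction >= threshold else "partial"
        counts[key] += 1
    return counts
```

In this sketch a log-style append (a small write at the end of a large
file) lands in the "partial" bucket, while a save-the-whole-document
session lands in "mostly_rewritten".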

There are two issues I see between posix semantics and tahoe:

  something tahoe backup does about writing a file and then linking it in
  that inverts the usual order.  (I sort of understand this, but not
  well enough.)

  writing immutable files doesn't really make sense.  But, it makes
  sense to talk about a posix-style interface where if you don't do
  certain things it's still efficient.  Or where opening an immutable
  file for write is an error, and you have to write a new file and
  rename, which is kind of what you should be doing anyway.
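
The write-a-new-file-and-rename pattern from the second point can be
sketched on an ordinary local filesystem as follows.  This is only an
illustration using the Python standard library; the helper name
replace_file is hypothetical, and nothing here is Tahoe-specific.

```python
import os
import tempfile


def replace_file(path, data):
    """Write data to a new temporary file, then rename it over path.

    The existing file is never opened for writing; readers see either
    the old contents or the new contents, never a partial write.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A program that always updates files this way never modifies an existing
file in place, which is the behavior that fits immutable files.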

Then, there's the missing feature to choose immutable vs mutable.  I see
that not as posix being fundamentally broken, but as a clue that the
posix interface needs to be made richer in a way which will accommodate a
new class of filesystems.

>> tahoe certainly has slightly unusual semantics compared to POSIX
>> (necessary because of how it works; that's not meant to be a
>> complaint), but in many ways it's not so far off.  On top of that
>> there is a culture of access via a command-line program or web
>> gateway rather than OS filesystem integration (as is normal for
>> pretty much every other filesystem), but I think that's both a
>> current cultural artifact and a reflection that the fuse support
>> isn't complete/etc.
>
> Hm, well I think the FUSE support is already nearly as good as it can
> be, and that's not very good.

There are two classes of FUSE issues.  One is simply usability - the
lack of 'tahoe mount tahoe:foo/bar ~/tahoe' with no key setup, and
that's mostly what I'm referring to.  The other class is the immutable
and ordering issues, and I agree that's harder.

> I think the semantic mismatches -- Tahoe-LAFS immutables vs. POSIX
> everything-is-mutable for starters -- mean that the FUSE layer, no
> matter how well-engineered and complete, can't provide the full
> functionality and efficiency that the Tahoe-LAFS layer provides.

That is assuming that the posix interface remains fixed.  I think we
should consider it as something that could accommodate a well-thought-out
extension for filesystems like tahoe (not specifically tahoe, because
that's not likely to get it right).

> Of course, you can always paper over any limitations of functionality
> by adding caching! But that necessarily adds latency and, worst of
> all, interesting new failure modes. I've always resisted the
> suggestion to add caching into Tahoe-LAFS itself (#316) because I
> don't want to expose users (and Tahoe-LAFS developers) to those added
> failure modes and because I think caching (and prefetching) would be
> better done by a separate layer.

I basically agree.  It's not just caching/prefetching, but write-behind
caching and disconnected operation (e.g. coda).

> For example, you could imagine a separate box on your network -- a
> Network Attached Storage device or "NAS" -- which serves files to you
> over tried and true protocols like, uh, SMB, NFS, Gluster, or whatever
> the kids are using nowadays, and which also runs a Tahoe-LAFS client
> to backup or sync those files with the remote grid. The NAS can be
> seen from one perspective as nothing but a huge, very smart cache for
> Tahoe-LAFS.

It's a cache for tahoe if there are other places where the files
appear.  If backups are done to tahoe only, then that's not so different
from running incremental dumps to s3.

> For another example, Dropbox can be seen as a great hack to use your
> local disk and your operating system's builtin filesystem as a huge,
> very smart cache for the Dropbox remote sync protocol.
>
> So to reiterate:
>
> 1. I think the FUSE layer that we already have is pretty good, for a
> FUSE layer. It is well-engineered and reliable, and doesn't do
> anything a lot less efficiently than it could.

I was referring to the how-to-start-it glue, which I think is a big
usability issue (I know, ENOPATCH).

> 2. I think any possible FUSE layer would introduce inefficiencies and
> hide functionalities (like immutable files, capability access control,
> easy link-based sharing, ...).

Yes, this is where the extensions are needed.  But the other side of the
coin is that VFS access enables lots of programs that otherwise need to
be taught about tahoe specifically - the cross product of programs and
filesystems doesn't scale.  Which has basically led me to non-use so
far, mostly due to not enough time.

> 3. I think a good trend is to stop trying to stretch the POSIX
> semantics across the wide-area network, but rather to use POSIX
> semantics from the app to a local filesystem which basically acts as a
> highly intelligent cache, and then use newer and more
> Internet-friendly semantics from there across the wide-area network.
> I'll bet there is a lot of value to be created by following Dropbox's
> lead and adding more of that kind of functionality to Tahoe-LAFS.

Yes, that does sound interesting.

But, with the cache/semantics-adaptation layer, a key question is
whether the user's files as laid out in the vfs layer/cache match the
files as stored in tahoe.  In other words, is the cache an optimization
for what's in tahoe, or is tahoe a backend for a new distributed
filesystem?

I find bup very interesting because it has a write-new-files-only model,
at least for large files.  It doesn't address encryption or redundant
storage, just bookkeeping of source bits and (sub-file-level)
deduplication.  So bup+tahoe seems like a good answer, and someday I'll
get around to trying it.
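
To make the write-new-files-only idea concrete, here is a minimal sketch
of a content-addressed chunk store with sub-file deduplication.  It is
not bup's actual format or algorithm: the fixed CHUNK_SIZE, the dict
used as a store, and the function names are assumptions for
illustration, and real tools typically choose chunk boundaries from the
content itself rather than at fixed offsets.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for this sketch


def store_file(store, data):
    """Add data to store (a dict of digest -> chunk); return its recipe.

    Chunks are only ever added, never modified, so every stored object
    is effectively immutable.  A chunk that is already present is not
    stored again.
    """
    recipe = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return recipe


def load_file(store, recipe):
    """Reassemble a file from its list of chunk digests."""
    return b"".join(store[digest] for digest in recipe)
```

Two versions of a file that share a chunk store that chunk once, and a
store that only ever receives new objects is a natural match for an
immutable-file backend.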