[tahoe-dev] added Medium-Sized Distributed Mutable Files to GSoCIdeas

Zooko O'Whielacronx zookog at gmail.com
Sun Mar 28 19:36:13 PDT 2010


Thanks to Kevan Carstensen's help we added Medium-Sized Distributed
Mutable Files to the GSoCIdeas list:

http://tahoe-lafs.org/trac/tahoe-lafs/wiki/GSoCIdeas

I'm very interested in MDMF nowadays because my current employer,
http://simplegeo.com, uses the Cassandra distributed key-value store
[1]. I am paying attention to how we use Cassandra and thinking to
myself "What would it take for Tahoe-LAFS to support this sort of use
case?". I think MDMF is a step on the road to that.

Appended below is Kevan's write-up of the MDMF GSoC Idea.

Regards,

Zooko

[1] http://cassandra.apache.org

Medium-Sized Distributed Mutable Files (MDMF)

Mutable files in Tahoe-LAFS have some significant limitations and
performance issues, as discussed in docs/performance.txt. Users who
aren't aware of these limitations are surprised when they find out
that mutable files can't scale to large sizes without using
unacceptable levels of memory, and that reading one byte of the file
costs as much as reading the entire file.

Fixing this issue essentially means fixing ticket #393. That is,

    * Developing mutable files that are segmented on upload, as with
      immutable files. Part of this would involve making sure that the
      way we currently ensure the integrity of the parts of mutable
      files stored on servers is adequate for your new design, and
      altering it if it isn't.
    * Implementing efficient reading and writing of arbitrary spans of
      those mutable files.
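
As a rough illustration of why segmentation helps with reads of
arbitrary spans, here is a minimal sketch of mapping a byte span onto
the segments that cover it. The function names and the 128 KiB segment
size are assumptions made for this example only; they are not taken
from the Tahoe-LAFS code or from any MDMF design.

```python
# Hypothetical helpers, not part of Tahoe-LAFS.
SEGMENT_SIZE = 128 * 1024  # illustrative value only


def segments_for_span(offset, length, segment_size=SEGMENT_SIZE):
    """Return (first, last) indices of the segments covering a byte span."""
    if offset < 0 or length <= 0 or segment_size <= 0:
        raise ValueError("offset must be >= 0; length and segment_size > 0")
    first = offset // segment_size
    last = (offset + length - 1) // segment_size
    return first, last


def bytes_fetched(offset, length, file_size, segment_size=SEGMENT_SIZE):
    """Bytes a reader must fetch if it can only fetch whole segments."""
    first, last = segments_for_span(offset, length, segment_size)
    start = first * segment_size
    # The final segment of a file may be shorter than segment_size.
    end = min((last + 1) * segment_size, file_size)
    return end - start


if __name__ == "__main__":
    ten_mib = 10 * 1024 * 1024
    # Reading one byte touches one segment, not the whole 10 MiB file.
    print(segments_for_span(5, 1))
    print(bytes_fetched(5, 1, ten_mib))
```

With an unsegmented file the same one-byte read would cost the full
file size, which is the behaviour described above.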

This would make Tahoe-LAFS less surprising to users, and allow mutable
files to be used in more ways than they currently are. If successful
enough, this might allow Tahoe-LAFS to support range queries or "graph
database"-style access, in the style of the "NoSQL" projects.

To learn more about this issue, you should first read
docs/performance.txt, so you're familiar with the performance problems
with mutable files as currently implemented. You should also look at
the file encoding specification, to understand how immutable files are
segmented (since you'll be doing something similar with this project).
The mutable file specification may be informative as well. The mutable
file upload and download code is in the mutable/ directory of the
source tree, and, for comparison, the immutable file upload and
download code is in the immutable/ directory.
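
To make the integrity question concrete, here is a small sketch of
checking each segment independently, so that a reader can verify one
segment without downloading the others. It uses a flat list of SHA-256
digests purely as an assumption for illustration; the scheme
Tahoe-LAFS actually uses is the one described in the file encoding and
mutable file specifications, and all names below are hypothetical.

```python
import hashlib


def split_segments(data, segment_size):
    """Cut data into fixed-size segments; the last one may be shorter."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def segment_hashes(segments):
    """One digest per segment (illustrative; not the Tahoe-LAFS format)."""
    return [hashlib.sha256(seg).digest() for seg in segments]


def read_verified(segments, hashes, index):
    """Return segment `index` only if it matches its recorded digest."""
    seg = segments[index]
    if hashlib.sha256(seg).digest() != hashes[index]:
        raise ValueError("segment %d failed its integrity check" % index)
    return seg


if __name__ == "__main__":
    data = bytes(range(256)) * 4           # 1024 bytes of sample data
    segs = split_segments(data, 300)       # segments of 300, 300, 300, 124
    hashes = segment_hashes(segs)
    assert read_verified(segs, hashes, 3) == data[900:]
    segs[1] = b"x" * 300                   # simulate a corrupted segment
    try:
        read_verified(segs, hashes, 1)
    except ValueError as e:
        print(e)
```

In a mutable file the digests themselves change on every write, so a
real design also has to say how the updated digests are authenticated;
this sketch does not address that.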

