#345 closed defect (fixed)

document write coordination

Reported by: zooko
Owned by: warner
Priority: major
Milestone: 1.1.0
Component: unknown
Version: 0.8.0
Keywords:
Cc: booker
Launchpad Bug:

Description

We've been trying to figure out how to deal with this issue on the tahoe-dev list. Most recently, we thought that uncoordinated writes would happen infrequently, and that even when they did occur, permanent data loss would be very rare, so it would be okay to rely on that probability of safety for now, rather than pursue alternatives that would negate a planned allmydata.com product feature (shared writeable folders) or require more development work (a lock server).

However, just now Brian and I realized that our current plan for "update a directory" would have a 50% chance of one of the two colliding updates silently disappearing, where a collision happens any time two directory-update operations (each of which can take between about 4 and 40 seconds) overlap.

This is therefore quite a likely occurrence, and requires something to be done.
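To make the failure mode concrete, here is a toy model (plain Python, not Tahoe code) of the lost-update race: two writers each download the same directory contents, apply different edits, and re-upload, and whichever upload lands second silently discards the other's edit.

    # Toy model of two uncoordinated read-modify-write updates to a
    # shared directory; "download" and "upload" are just dict copies.
    directory = {"existing_file": "cap-0"}   # current directory contents

    snap_1 = dict(directory)                 # writer 1 downloads
    snap_2 = dict(directory)                 # writer 2 downloads (overlap!)
    snap_1["from_writer_1"] = "cap-1"        # writer 1 edits its copy
    snap_2["from_writer_2"] = "cap-2"        # writer 2 edits its copy
    directory = snap_1                       # writer 1 re-uploads
    directory = snap_2                       # writer 2 re-uploads, clobbering
    assert "from_writer_1" not in directory  # writer 1's edit silently lost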

My meta-engineering belief at this point is:

The fact that we didn't realize this until just now shows that we don't really understand how Tahoe behaves under uncoordinated writes, and so for the imminent Tahoe (LAUGFS) v0.9.0 release we should continue to require The Prime Directive of Uncoordinated Writes: "Don't Do That", and for the Allmydata.com 3.0 release we should implement some method or other of Not Doing That. (Such as a simple lock server.)

Here's some recent history:

We think there is a problem:

http://allmydata.org/pipermail/tahoe-dev/2008-March/000433.html
http://allmydata.org/pipermail/tahoe-dev/2008-March/000436.html

We think we have a sufficiently safe solution:

http://allmydata.org/pipermail/tahoe-dev/2008-March/000438.html

and just now, on the phone, Brian and I agreed that the solution that we thought we had is not sufficiently safe.

Change History (12)

comment:1 Changed at 2008-03-12T22:17:42Z by zooko

I've heard a rumor that Peter doesn't want to try to fix this issue for Allmydata.com 3.0. But for concreteness, here is an extension to the wapi that gives a local process which uses the wapi (such as the SMB layer) access to a central lock server:

POST /acquire_lock?lockserver=$LOCK_SERVER_FURL&lockname=$LOCK_NAME

This blocks until it can acquire the named lock from the given lock server. It will take no more than 200 seconds, because if the lock is not released by its current owner within 200 seconds then it will be broken -- taken away from them and given to you. So typically this call doesn't fail, but of course you should still check the response code -- if the lock server that you specified is unreachable, for instance, or if there is some other problem, then this call can fail.

Once this call returns successfully, then you have the lock, but be warned that if you don't release it within 200 seconds, someone else may break it and start writing to the resource without telling you.

In order to avoid race conditions (where you think that your 200 seconds aren't quite up yet so you can keep using the resource, but someone else thinks they just expired and that they can start using the resource), you should release and re-acquire the lock every 100 seconds. Of course, you must stop writing to the shared resource before you release the lock, and must not resume writing until after you have reacquired it.

(Updating a Tahoe mutable file or directory by downloading it, changing it, and re-uploading it typically takes between 4 and 40 seconds total, so this should not be an onerous requirement.)

Suggestion: if you want to get a lock

POST /release_lock?lockserver=$LOCK_SERVER_FURL&lockname=$LOCK_NAME

Call this when you are finished writing to the shared resource.

comment:2 Changed at 2008-03-12T22:19:36Z by zooko

There was an editing mistake in comment:1. Here it is again:

POST /acquire_lock?lockserver=$LOCK_SERVER_FURL&lockname=$LOCK_NAME

This blocks until it can acquire the named lock from the given lock server. It will take no more than 200 seconds, because if the lock is not released by its current owner within 200 seconds then it will be broken -- taken away from them and given to you. So typically this call doesn't fail, but of course you should still check the response code -- if the lock server that you specified is unreachable, for instance, or if there is some other problem, then this call can fail.

Once this call returns successfully, then you have the lock, but be warned that if you don't release it within 200 seconds, someone else may break it and start writing to the resource without telling you.

In order to avoid race conditions (where you think that your 200 seconds aren't quite up yet so you can keep using the resource, but someone else thinks they just expired and that they can start using the resource), you should release and re-acquire the lock every 100 seconds. Of course, you must stop writing to the shared resource before you release the lock, and must not resume writing until after you have reacquired it.

(Updating a Tahoe mutable file or directory by downloading it, changing it, and re-uploading it typically takes between 4 and 40 seconds total, so this should not be an onerous requirement.)

Suggestion: if you want to get a lock on a Tahoe mutable file or directory, then take the hash of the capability and use the resulting string as the lock name.

POST /release_lock?lockserver=$LOCK_SERVER_FURL&lockname=$LOCK_NAME

Call this when you are finished writing to the shared resource.
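To make the proposal concrete, here is a minimal client-side sketch of how a program might drive these two endpoints. Everything in it is hypothetical: the endpoints are only this ticket's proposal (not part of the actual wapi), and the node URL, lock-server FURL, and function names are placeholders made up for illustration.

    # Minimal sketch of a client driving the proposed lock endpoints.
    # Everything here is hypothetical: the endpoints are this ticket's
    # proposal (not the real wapi), and the URLs are placeholders.
    import hashlib
    import urllib.parse
    import urllib.request

    NODE_URL = "http://127.0.0.1:3456"         # hypothetical wapi gateway
    LOCK_SERVER = "pb://example-lock-server"   # hypothetical lock-server FURL

    def lock_name_for(writecap):
        # Per the suggestion above: hash the capability and use the
        # digest as the lock name, so all writers of one file or
        # directory contend for the same lock.
        return hashlib.sha256(writecap.encode("ascii")).hexdigest()

    def _post(endpoint, lockname):
        query = urllib.parse.urlencode(
            {"lockserver": LOCK_SERVER, "lockname": lockname})
        req = urllib.request.Request(
            "%s/%s?%s" % (NODE_URL, endpoint, query), method="POST")
        urllib.request.urlopen(req)  # raises on a non-2xx response code

    def update_with_lock(writecap, apply_update):
        # One locked download-modify-reupload cycle.  A cycle takes
        # roughly 4-40 seconds, so it fits easily inside the 200-second
        # lease; anything longer would have to stop writing, release,
        # and re-acquire every 100 seconds as described above.
        lockname = lock_name_for(writecap)
        _post("acquire_lock", lockname)    # blocks until granted
        try:
            apply_update(writecap)         # the protected read-modify-write
        finally:
            _post("release_lock", lockname)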

comment:3 Changed at 2008-03-12T23:31:20Z by zooko

  • Milestone changed from 0.9.0 (Allmydata 3.0 final) to 0.10.0

Okay, we're pushing this ticket out of v0.9.0. If you are a user of allmydata.org "Tahoe" (LAUGFS) v0.9.0 and you write to a mutable file or directory at the same time as another process does, then one or the other of your versions will be overwritten. In addition, there is a tiny chance that, if you have many uncoordinated writing processes and/or few or unreliable servers, and your clients all crash or get disconnected at the wrong moment, all versions will be lost.

comment:4 Changed at 2008-03-24T00:48:36Z by zooko

  • Milestone changed from 1.1.0 to 1.0.0

I think we should update the documentation to explain how uncoordinated writes are considered sort of okay with certain caveats in Tahoe 1.0.

comment:5 Changed at 2008-03-24T22:44:43Z by zooko

  • Owner changed from nobody to zooko
  • Status changed from new to assigned

"Don't Do That Very Much!"

comment:6 Changed at 2008-03-25T18:58:17Z by zooko

  • Owner changed from zooko to warner
  • Status changed from assigned to new
  • Summary changed from write coordination to document write coordination

I'm changing this ticket to cover the task of updating the documentation so that programmers who use Tahoe know what the issues are with regard to write coordination.

comment:7 Changed at 2008-03-25T19:26:17Z by zooko

  • Milestone changed from 1.0.0 to 1.0.1
  • Owner changed from warner to robk

comment:8 Changed at 2008-03-25T19:26:32Z by zooko

  • Cc booker added

comment:9 Changed at 2008-05-05T21:08:36Z by zooko

  • Milestone changed from 1.0.1 to 1.1.0

Milestone 1.0.1 deleted

comment:10 Changed at 2008-05-29T22:34:31Z by zooko

  • Owner changed from robk to somebody

comment:11 Changed at 2008-05-29T22:35:11Z by warner

  • Owner changed from somebody to warner
  • Status changed from new to assigned

comment:12 Changed at 2008-06-03T06:05:30Z by warner

  • Resolution set to fixed
  • Status changed from assigned to closed

Done, in 01469433ef2732df. The last section of source:docs/webapi.txt now explains the issue and the programmer's obligations (which aren't nearly as bad, now that we have internal serialization and the weakref Node cache). It also points to the larger explanation in mutable.txt.
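For the curious, here is a rough sketch of the general idea behind "internal serialization and the weakref Node cache". It is not Tahoe's actual implementation (which is built on Twisted, not threads), and all names in it are hypothetical.

    import threading
    import weakref

    class MutableNode:
        """Hypothetical stand-in for a cached mutable-file node."""
        def __init__(self, cap):
            self.cap = cap
            self._lock = threading.Lock()  # serializes writes via this node

        def modify(self, mutator):
            # Every writer in this process that reaches the file through
            # the shared node is serialized here, so in-process writes
            # cannot collide with each other.
            with self._lock:
                mutator(self)

    _node_cache = weakref.WeakValueDictionary()
    _cache_lock = threading.Lock()

    def get_node(cap):
        # Hand out one live node per capability, so all in-process
        # writers share one lock; the cache entry disappears
        # automatically once nobody holds a reference to the node.
        with _cache_lock:
            node = _node_cache.get(cap)
            if node is None:
                node = MutableNode(cap)
                _node_cache[cap] = node
            return node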
