Ticket #393: 393status40.dpatch

File 393status40.dpatch, 626.6 KB (added by kevan at 2011-03-07T08:43:10Z)

correct poor choice of internal offset representation, add tests, clarify some calculations

1Mon Aug  9 16:32:44 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
2  * interfaces.py: Add #993 interfaces
3
4Mon Aug  9 16:35:35 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
5  * frontends/sftpd.py: Modify the sftp frontend to work with the MDMF changes
6
7Mon Aug  9 17:06:19 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
8  * immutable/filenode.py: Make the immutable file node implement the same interfaces as the mutable one
9
10Mon Aug  9 17:06:33 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
11  * immutable/literal.py: implement the same interfaces as other filenodes
12
13Fri Aug 13 16:49:57 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
14  * scripts: tell 'tahoe put' about MDMF
15
16Sat Aug 14 01:10:12 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
17  * web: Alter the webapi to get along with and take advantage of the MDMF changes
18 
19  The main benefit that the webapi gets from MDMF, at least initially, is
20  the ability to do a streaming download of an MDMF mutable file. It also
21  exposes a way (through the PUT verb) to append to or otherwise modify
22  (in-place) an MDMF mutable file.
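
  To make the append path concrete, here is an illustrative sketch (not part
  of the patch bundle itself); the gateway address, the default port 3456,
  and the write cap are assumptions:

    import httplib

    def append_via_webapi(writecap, data, offset, host="127.0.0.1", port=3456):
        # PUT /uri/<cap>?offset=N writes 'data' starting at byte N of the
        # mutable file; passing the current size as the offset appends.
        path = "/uri/%s?offset=%d" % (writecap, offset)
        conn = httplib.HTTPConnection(host, port)
        conn.request("PUT", path, data)
        resp = conn.getresponse()
        body = resp.read()
        conn.close()
        return resp.status, body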
23
24Sat Aug 14 15:57:11 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
25  * client.py: learn how to create different kinds of mutable files
26
27Wed Aug 18 17:32:16 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
28  * mutable/checker.py and mutable/repair.py: Modify checker and repairer to work with MDMF
29 
30  The checker and repairer required minimal changes to work with the MDMF
31  modifications made elsewhere. The checker duplicated a lot of the code
32  that was already in the downloader, so I modified the downloader
33  slightly to expose this functionality to the checker and removed the
34  duplicated code. The repairer only required a minor change to deal with
35  data representation.
36
37Wed Aug 18 17:32:31 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
38  * mutable/filenode.py: add versions and partial-file updates to the mutable file node
39 
40  One of the goals of MDMF as a GSoC project is to lay the groundwork for
41  LDMF, a format that will allow Tahoe-LAFS to deal with and encourage
42  multiple versions of a single cap on the grid. In line with this, there
43  is now a distinction between an overriding mutable file (which can be
44  thought of as corresponding to the cap/unique identifier for that mutable
45  file) and versions of the mutable file (which we can download, update,
46  and so on). All download, upload, and modification operations end up
47  happening on a particular version of a mutable file, but there are
48  shortcut methods on the object representing the overriding mutable file
49  that perform these operations on the best version of the mutable file
50  (which is what code should be doing until we have LDMF and better
51  support for other paradigms).
52 
53  Another goal of MDMF was to take advantage of segmentation to give
54  callers more efficient partial file updates or appends. This patch
55  implements methods that do that, too.
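
  To make the two layers concrete, a minimal sketch (not part of the patch
  bundle itself) of how calling code is expected to use them; the filenode is
  assumed to be an existing MDMF node, and error handling is omitted:

    from twisted.internet import defer

    @defer.inlineCallbacks
    def inspect(filenode):
        # Shortcut on the overriding mutable file: act on the best version.
        contents = yield filenode.download_best_version()

        # Or fetch an explicit IMutableFileVersion and work with it directly.
        version = yield filenode.get_best_mutable_version()
        defer.returnValue((version.get_size(), len(contents)))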
56 
57
58Wed Aug 18 17:33:42 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
59  * mutable/publish.py: Modify the publish process to support MDMF
60 
61  The inner workings of the publishing process needed to be reworked to a
62  large extent to cope with segmented mutable files, and to cope with
63  partial-file updates of mutable files. This patch does that. It also
64  introduces wrappers for uploadable data, allowing the use of
65  filehandle-like objects as data sources, in addition to strings. This
66  reduces memory usage when dealing with large files through the
67  webapi, and clarifies update code there.
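
  As an illustrative sketch (not part of the patch bundle itself) of the
  filehandle-style uploadable in use; the open() call and the mutable node
  are assumptions:

    from allmydata.mutable.publish import MutableFileHandle

    def overwrite_from_disk(mutable_node, path):
        # Wrapping the filehandle lets the publish process read the data in
        # pieces instead of requiring one big in-memory string.
        f = open(path, "rb")
        return mutable_node.overwrite(MutableFileHandle(f))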
68
69Wed Aug 18 17:35:09 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
70  * nodemaker.py: Make nodemaker expose a way to create MDMF files
71
72Sat Aug 14 15:56:44 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
73  * docs: update docs to mention MDMF
74
75Wed Aug 18 17:33:04 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
76  * mutable/layout.py and interfaces.py: add MDMF writer and reader
77 
78  The MDMF writer is responsible for keeping state as plaintext is
79  gradually processed into share data by the upload process. When the
80  upload finishes, it will write all of its share data to a remote server,
81  reporting its status back to the publisher.
82 
83  The MDMF reader is responsible for abstracting an MDMF file as it sits
84  on the grid from the downloader; specifically, by receiving and
85  responding to requests for arbitrary data within the MDMF file.
86 
87  The interfaces.py file has also been modified to contain an interface
88  for the writer.
89
90Wed Aug 18 17:34:09 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
91  * mutable/retrieve.py: Modify the retrieval process to support MDMF
92 
93  The logic behind a mutable file download had to be adapted to work with
94  segmented mutable files; this patch performs those adaptations. It also
95  exposes some decoding and decrypting functionality to make partial-file
96  updates a little easier, and supports efficient random-access downloads
97  of parts of an MDMF file.
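
  A rough sketch (not part of the patch bundle itself) of a random-access
  read through the new interface; it assumes the MemoryConsumer helper from
  allmydata.util.consumer, which simply collects the chunks written to it:

    from allmydata.util.consumer import MemoryConsumer

    def read_range(filenode, offset, size):
        # Download only [offset, offset+size) of the best readable version.
        d = filenode.get_best_readable_version()
        d.addCallback(lambda version:
                      version.read(MemoryConsumer(), offset, size))
        d.addCallback(lambda consumer: "".join(consumer.chunks))
        return d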
98
99Wed Aug 18 17:34:39 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
100  * mutable/servermap.py: Alter the servermap updater to work with MDMF files
101 
102  These modifications mostly serve to have the
103  servermap updater use the unified MDMF + SDMF read interface whenever
104  possible -- this reduces the complexity of the code, making it easier to
105  read and maintain. To do this, I needed to modify the process of
106  updating the servermap a little bit.
107 
108  To support partial-file updates, I also modified the servermap updater
109  to fetch the block hash trees and certain segments of files while it
110  performed a servermap update (this can be done without adding any new
111  roundtrips because of batch-read functionality that the read proxy has).
112 
113
114Wed Aug 18 17:35:31 PDT 2010  Kevan Carstensen <kevan@isnotajoke.com>
115  * tests:
116 
117      - A lot of existing tests relied on aspects of the mutable file
118        implementation that were changed. This patch updates those tests
119        to work with the changes.
120      - This patch also adds tests for new features.
121
122Sun Feb 20 15:02:01 PST 2011  "Brian Warner <warner@lothar.com>"
123  * resolve conflicts between 393-MDMF patches and trunk as of 1.8.2
124
125Sun Feb 20 17:46:59 PST 2011  "Brian Warner <warner@lothar.com>"
126  * mutable/filenode.py: fix create_mutable_file('string')
127
128Sun Feb 20 21:56:00 PST 2011  "Brian Warner <warner@lothar.com>"
129  * resolve more conflicts with current trunk
130
131Sun Feb 20 22:10:04 PST 2011  "Brian Warner <warner@lothar.com>"
132  * update MDMF code with StorageFarmBroker changes
133
134Fri Feb 25 17:04:33 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
135  * mutable/filenode: Clean up servermap handling in MutableFileVersion
136 
137  We want to update the servermap before attempting to modify a file,
138  which we now do. This introduced code duplication, which was addressed
139  by refactoring the servermap update into its own method, and then
140  eliminating duplicate servermap updates throughout the
141  MutableFileVersion.
142
143Sun Feb 27 15:16:43 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
144  * web: Use the string "replace" to trigger whole-file replacement when processing an offset parameter.
145
146Sun Feb 27 16:34:26 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
147  * docs/configuration.rst: fix more conflicts between #393 and trunk
148
149Sun Feb 27 17:06:37 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
150  * mutable/layout: remove references to the salt hash tree.
151
152Sun Feb 27 18:10:56 PST 2011  warner@lothar.com
153  * test_mutable.py: add test to exercise fencepost bug
154
155Mon Feb 28 00:33:27 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
156  * mutable/publish: account for offsets on segment boundaries.
157
158Mon Feb 28 19:08:07 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
159  * tahoe-put: raise UsageError when given a nonsensical mutable type, move option validation code to the option parser.
160
161Fri Mar  4 17:08:58 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
162  * web: use None instead of False in the case of no offset, use object identity comparison to check whether or not an offset was specified.
163
164Mon Mar  7 00:17:13 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
165  * mutable/filenode: remove incorrect comments about segment boundaries
166
167Mon Mar  7 00:22:29 PST 2011  Kevan Carstensen <kevan@isnotajoke.com>
168  * mutable: use integer division where appropriate
169
170New patches:
171
172[interfaces.py: Add #993 interfaces
173Kevan Carstensen <kevan@isnotajoke.com>**20100809233244
174 Ignore-this: b58621ac5cc86f1b4b4149f9e6c6a1ce
175] {
176hunk ./src/allmydata/interfaces.py 499
177 class MustNotBeUnknownRWError(CapConstraintError):
178     """Cannot add an unknown child cap specified in a rw_uri field."""
179 
180+
181+class IReadable(Interface):
182+    """I represent a readable object -- either an immutable file, or a
183+    specific version of a mutable file.
184+    """
185+
186+    def is_readonly():
187+        """Return True if this reference provides only read access to the given
188+        file or directory (i.e. you cannot modify it), or False if it is read-write. Note
189+        that even if this reference is read-only, someone else may hold a
190+        read-write reference to it.
191+
192+        For an IReadable returned by get_best_readable_version(), this will
193+        always return True, but for instances of subinterfaces such as
194+        IMutableFileVersion, it may return False."""
195+
196+    def is_mutable():
197+        """Return True if this file or directory is mutable (by *somebody*,
198+        not necessarily you), False if it is immutable. Note that a file
199+        might be mutable overall, but your reference to it might be
200+        read-only. On the other hand, all references to an immutable file
201+        will be read-only; there are no read-write references to an immutable
202+        file."""
203+
204+    def get_storage_index():
205+        """Return the storage index of the file."""
206+
207+    def get_size():
208+        """Return the length (in bytes) of this readable object."""
209+
210+    def download_to_data():
211+        """Download all of the file contents. I return a Deferred that fires
212+        with the contents as a byte string."""
213+
214+    def read(consumer, offset=0, size=None):
215+        """Download a portion (possibly all) of the file's contents, making
216+        them available to the given IConsumer. Return a Deferred that fires
217+        (with the consumer) when the consumer is unregistered (either because
218+        the last byte has been given to it, or because the consumer threw an
219+        exception during write(), possibly because it no longer wants to
220+        receive data). The portion downloaded will start at 'offset' and
221+        contain 'size' bytes (or the remainder of the file if size==None).
222+
223+        The consumer will be used in non-streaming mode: an IPullProducer
224+        will be attached to it.
225+
226+        The consumer will not receive data right away: several network trips
227+        must occur first. The order of events will be::
228+
229+         consumer.registerProducer(p, streaming)
230+          (if streaming == False)::
231+           consumer does p.resumeProducing()
232+            consumer.write(data)
233+           consumer does p.resumeProducing()
234+            consumer.write(data).. (repeat until all data is written)
235+         consumer.unregisterProducer()
236+         deferred.callback(consumer)
237+
238+        If a download error occurs, or an exception is raised by
239+        consumer.registerProducer() or consumer.write(), I will call
240+        consumer.unregisterProducer() and then deliver the exception via
241+        deferred.errback(). To cancel the download, the consumer should call
242+        p.stopProducing(), which will result in an exception being delivered
243+        via deferred.errback().
244+
245+        See src/allmydata/util/consumer.py for an example of a simple
246+        download-to-memory consumer.
247+        """
248+
249+
250+class IWritable(Interface):
251+    """
252+    I define methods that callers can use to update SDMF and MDMF
253+    mutable files on a Tahoe-LAFS grid.
254+    """
255+    # XXX: For the moment, we have only this. It is possible that we
256+    #      want to move overwrite() and modify() in here too.
257+    def update(data, offset):
258+        """
259+        I write the data from my data argument to the MDMF file,
260+        starting at offset. I continue writing data until my data
261+        argument is exhausted, appending data to the file as necessary.
262+        """
263+        # assert IMutableUploadable.providedBy(data)
264+        # to append data: offset=node.get_size_of_best_version()
265+        # do we want to support compacting MDMF?
266+        # for an MDMF file, this can be done with O(data.get_size())
267+        # memory. For an SDMF file, any modification takes
268+        # O(node.get_size_of_best_version()).
269+
270+
271+class IMutableFileVersion(IReadable):
272+    """I provide access to a particular version of a mutable file. The
273+    access is read/write if I was obtained from a filenode derived from
274+    a write cap, or read-only if the filenode was derived from a read cap.
275+    """
276+
277+    def get_sequence_number():
278+        """Return the sequence number of this version."""
279+
280+    def get_servermap():
281+        """Return the IMutableFileServerMap instance that was used to create
282+        this object.
283+        """
284+
285+    def get_writekey():
286+        """Return this filenode's writekey, or None if the node does not have
287+        write-capability. This may be used to assist with data structures
288+        that need to make certain data available only to writers, such as the
289+        read-write child caps in dirnodes. The recommended process is to have
290+        reader-visible data be submitted to the filenode in the clear (where
291+        it will be encrypted by the filenode using the readkey), but encrypt
292+        writer-visible data using this writekey.
293+        """
294+
295+    # TODO: Can this be overwrite instead of replace?
296+    def replace(new_contents):
297+        """Replace the contents of the mutable file, provided that no other
298+        node has published (or is attempting to publish, concurrently) a
299+        newer version of the file than this one.
300+
301+        I will avoid modifying any share that is different than the version
302+        given by get_sequence_number(). However, if another node is writing
303+        to the file at the same time as me, I may manage to update some shares
304+        while they update others. If I see any evidence of this, I will signal
305+        UncoordinatedWriteError, and the file will be left in an inconsistent
306+        state (possibly the version you provided, possibly the old version,
307+        possibly somebody else's version, and possibly a mix of shares from
308+        all of these).
309+
310+        The recommended response to UncoordinatedWriteError is to either
311+        return it to the caller (since they failed to coordinate their
312+        writes), or to attempt some sort of recovery. It may be sufficient to
313+        wait a random interval (with exponential backoff) and repeat your
314+        operation. If I do not signal UncoordinatedWriteError, then I was
315+        able to write the new version without incident.
316+
317+        I return a Deferred that fires (with a PublishStatus object) when the
318+        update has completed.
319+        """
320+
321+    def modify(modifier_cb):
322+        """Modify the contents of the file, by downloading this version,
323+        applying the modifier function (or bound method), then uploading
324+        the new version. This will succeed as long as no other node
325+        publishes a version between the download and the upload.
326+        I return a Deferred that fires (with a PublishStatus object) when
327+        the update is complete.
328+
329+        The modifier callable will be given three arguments: a string (with
330+        the old contents), a 'first_time' boolean, and a servermap. As with
331+        download_to_data(), the old contents will be from this version,
332+        but the modifier can use the servermap to make other decisions
333+        (such as refusing to apply the delta if there are multiple parallel
334+        versions, or if there is evidence of a newer unrecoverable version).
335+        'first_time' will be True the first time the modifier is called,
336+        and False on any subsequent calls.
337+
338+        The callable should return a string with the new contents. The
339+        callable must be prepared to be called multiple times, and must
340+        examine the input string to see if the change that it wants to make
341+        is already present in the old version. If it does not need to make
342+        any changes, it can either return None, or return its input string.
343+
344+        If the modifier raises an exception, it will be returned in the
345+        errback.
346+        """
347+
348+
349 # The hierarchy looks like this:
350 #  IFilesystemNode
351 #   IFileNode
352hunk ./src/allmydata/interfaces.py 758
353     def raise_error():
354         """Raise any error associated with this node."""
355 
356+    # XXX: These may not be appropriate outside the context of an IReadable.
357     def get_size():
358         """Return the length (in bytes) of the data this node represents. For
359         directory nodes, I return the size of the backing store. I return
360hunk ./src/allmydata/interfaces.py 775
361 class IFileNode(IFilesystemNode):
362     """I am a node which represents a file: a sequence of bytes. I am not a
363     container, like IDirectoryNode."""
364+    def get_best_readable_version():
365+        """Return a Deferred that fires with an IReadable for the 'best'
366+        available version of the file. The IReadable provides only read
367+        access, even if this filenode was derived from a write cap.
368 
369hunk ./src/allmydata/interfaces.py 780
370-class IImmutableFileNode(IFileNode):
371-    def read(consumer, offset=0, size=None):
372-        """Download a portion (possibly all) of the file's contents, making
373-        them available to the given IConsumer. Return a Deferred that fires
374-        (with the consumer) when the consumer is unregistered (either because
375-        the last byte has been given to it, or because the consumer threw an
376-        exception during write(), possibly because it no longer wants to
377-        receive data). The portion downloaded will start at 'offset' and
378-        contain 'size' bytes (or the remainder of the file if size==None).
379-
380-        The consumer will be used in non-streaming mode: an IPullProducer
381-        will be attached to it.
382+        For an immutable file, there is only one version. For a mutable
383+        file, the 'best' version is the recoverable version with the
384+        highest sequence number. If no uncoordinated writes have occurred,
385+        and if enough shares are available, then this will be the most
386+        recent version that has been uploaded. If no version is recoverable,
387+        the Deferred will errback with an UnrecoverableFileError.
388+        """
389 
390hunk ./src/allmydata/interfaces.py 788
391-        The consumer will not receive data right away: several network trips
392-        must occur first. The order of events will be::
393+    def download_best_version():
394+        """Download the contents of the version that would be returned
395+        by get_best_readable_version(). This is equivalent to calling
396+        download_to_data() on the IReadable given by that method.
397 
398hunk ./src/allmydata/interfaces.py 793
399-         consumer.registerProducer(p, streaming)
400-          (if streaming == False)::
401-           consumer does p.resumeProducing()
402-            consumer.write(data)
403-           consumer does p.resumeProducing()
404-            consumer.write(data).. (repeat until all data is written)
405-         consumer.unregisterProducer()
406-         deferred.callback(consumer)
407+        I return a Deferred that fires with a byte string when the file
408+        has been fully downloaded. To support streaming download, use
409+        the 'read' method of IReadable. If no version is recoverable,
410+        the Deferred will errback with an UnrecoverableFileError.
411+        """
412 
413hunk ./src/allmydata/interfaces.py 799
414-        If a download error occurs, or an exception is raised by
415-        consumer.registerProducer() or consumer.write(), I will call
416-        consumer.unregisterProducer() and then deliver the exception via
417-        deferred.errback(). To cancel the download, the consumer should call
418-        p.stopProducing(), which will result in an exception being delivered
419-        via deferred.errback().
420+    def get_size_of_best_version():
421+        """Find the size of the version that would be returned by
422+        get_best_readable_version().
423 
424hunk ./src/allmydata/interfaces.py 803
425-        See src/allmydata/util/consumer.py for an example of a simple
426-        download-to-memory consumer.
427+        I return a Deferred that fires with an integer. If no version
428+        is recoverable, the Deferred will errback with an
429+        UnrecoverableFileError.
430         """
431 
432hunk ./src/allmydata/interfaces.py 808
433+
434+class IImmutableFileNode(IFileNode, IReadable):
435+    """I am a node representing an immutable file. Immutable files have
436+    only one version."""
437+
438+
439 class IMutableFileNode(IFileNode):
440     """I provide access to a 'mutable file', which retains its identity
441     regardless of what contents are put in it.
442hunk ./src/allmydata/interfaces.py 873
443     only be retrieved and updated all-at-once, as a single big string. Future
444     versions of our mutable files will remove this restriction.
445     """
446-
447-    def download_best_version():
448-        """Download the 'best' available version of the file, meaning one of
449-        the recoverable versions with the highest sequence number. If no
450+    def get_best_mutable_version():
451+        """Return a Deferred that fires with an IMutableFileVersion for
452+        the 'best' available version of the file. The best version is
453+        the recoverable version with the highest sequence number. If no
454         uncoordinated writes have occurred, and if enough shares are
455hunk ./src/allmydata/interfaces.py 878
456-        available, then this will be the most recent version that has been
457-        uploaded.
458+        available, then this will be the most recent version that has
459+        been uploaded.
460 
461hunk ./src/allmydata/interfaces.py 881
462-        I update an internal servermap with MODE_READ, determine which
463-        version of the file is indicated by
464-        servermap.best_recoverable_version(), and return a Deferred that
465-        fires with its contents. If no version is recoverable, the Deferred
466-        will errback with UnrecoverableFileError.
467-        """
468-
469-    def get_size_of_best_version():
470-        """Find the size of the version that would be downloaded with
471-        download_best_version(), without actually downloading the whole file.
472-
473-        I return a Deferred that fires with an integer.
474+        If no version is recoverable, the Deferred will errback with an
475+        UnrecoverableFileError.
476         """
477 
478     def overwrite(new_contents):
479hunk ./src/allmydata/interfaces.py 921
480         errback.
481         """
482 
483-
484     def get_servermap(mode):
485         """Return a Deferred that fires with an IMutableFileServerMap
486         instance, updated using the given mode.
487hunk ./src/allmydata/interfaces.py 974
488         writer-visible data using this writekey.
489         """
490 
491+    def set_version(version):
492+        """Tahoe-LAFS supports SDMF and MDMF mutable files. By default,
493+        we upload in SDMF for reasons of compatibility. If you want to
494+        change this, set_version will let you do that.
495+
496+        To say that this file should be uploaded in SDMF, pass in a 0. To
497+        say that the file should be uploaded as MDMF, pass in a 1.
498+        """
499+
500+    def get_version():
501+        """Returns the mutable file protocol version."""
502+
503 class NotEnoughSharesError(Exception):
504     """Download was unable to get enough shares"""
505 
506hunk ./src/allmydata/interfaces.py 1822
507         """The upload is finished, and whatever filehandle was in use may be
508         closed."""
509 
510+
511+class IMutableUploadable(Interface):
512+    """
513+    I represent content that is due to be uploaded to a mutable filecap.
514+    """
515+    # This is somewhat simpler than the IUploadable interface above
516+    # because mutable files do not need to be concerned with possibly
517+    # generating a CHK, nor with per-file keys. It is a subset of the
518+    # methods in IUploadable, though, so we could just as well implement
519+    # the mutable uploadables as IUploadables that don't happen to use
520+    # those methods (with the understanding that the unused methods will
521+    # never be called on such objects)
522+    def get_size():
523+        """
524+        Returns a Deferred that fires with the size of the content held
525+        by the uploadable.
526+        """
527+
528+    def read(length):
529+        """
530+        Returns a list of strings which, when concatenated, are the next
531+        length bytes of the file, or fewer if there are fewer bytes
532+        between the current location and the end of the file.
533+        """
534+
535+    def close():
536+        """
537+        The process that used the Uploadable is finished using it, so
538+        the uploadable may be closed.
539+        """
540+
541 class IUploadResults(Interface):
542     """I am returned by upload() methods. I contain a number of public
543     attributes which can be read to determine the results of the upload. Some
544}
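
The IWritable comments above suggest that an append is just an update() whose
offset is the current size of the best version. A rough sketch of that (not
part of the patch bundle itself), assuming a MutableFileHandle around a
StringIO is an acceptable IMutableUploadable:

    from StringIO import StringIO
    from allmydata.mutable.publish import MutableFileHandle

    def append_to_node(mutable_node, data):
        # Find the current size, then write the new data at that offset.
        d = mutable_node.get_size_of_best_version()
        def _got_size(size):
            d2 = mutable_node.get_best_mutable_version()
            d2.addCallback(lambda version:
                           version.update(MutableFileHandle(StringIO(data)), size))
            return d2
        d.addCallback(_got_size)
        return d
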
545[frontends/sftpd.py: Modify the sftp frontend to work with the MDMF changes
546Kevan Carstensen <kevan@isnotajoke.com>**20100809233535
547 Ignore-this: 2d25e2cfcd0d7bbcbba660c7e1da12f
548] {
549hunk ./src/allmydata/frontends/sftpd.py 33
550 from allmydata.interfaces import IFileNode, IDirectoryNode, ExistingChildError, \
551      NoSuchChildError, ChildOfWrongTypeError
552 from allmydata.mutable.common import NotWriteableError
553+from allmydata.mutable.publish import MutableFileHandle
554 from allmydata.immutable.upload import FileHandle
555 from allmydata.dirnode import update_metadata
556 from allmydata.util.fileutil import EncryptedTemporaryFile
557hunk ./src/allmydata/frontends/sftpd.py 667
558         else:
559             assert IFileNode.providedBy(filenode), filenode
560 
561-            if filenode.is_mutable():
562-                self.async.addCallback(lambda ign: filenode.download_best_version())
563-                def _downloaded(data):
564-                    self.consumer = OverwriteableFileConsumer(len(data), tempfile_maker)
565-                    self.consumer.write(data)
566-                    self.consumer.finish()
567-                    return None
568-                self.async.addCallback(_downloaded)
569-            else:
570-                download_size = filenode.get_size()
571-                assert download_size is not None, "download_size is None"
572+            self.async.addCallback(lambda ignored: filenode.get_best_readable_version())
573+
574+            def _read(version):
575+                if noisy: self.log("_read", level=NOISY)
576+                download_size = version.get_size()
577+                assert download_size is not None
578+
579                 self.consumer = OverwriteableFileConsumer(download_size, tempfile_maker)
580hunk ./src/allmydata/frontends/sftpd.py 675
581-                def _read(ign):
582-                    if noisy: self.log("_read immutable", level=NOISY)
583-                    filenode.read(self.consumer, 0, None)
584-                self.async.addCallback(_read)
585+
586+                version.read(self.consumer, 0, None)
587+            self.async.addCallback(_read)
588 
589         eventually(self.async.callback, None)
590 
591hunk ./src/allmydata/frontends/sftpd.py 821
592                     assert parent and childname, (parent, childname, self.metadata)
593                     d2.addCallback(lambda ign: parent.set_metadata_for(childname, self.metadata))
594 
595-                d2.addCallback(lambda ign: self.consumer.get_current_size())
596-                d2.addCallback(lambda size: self.consumer.read(0, size))
597-                d2.addCallback(lambda new_contents: self.filenode.overwrite(new_contents))
598+                d2.addCallback(lambda ign: self.filenode.overwrite(MutableFileHandle(self.consumer.get_file())))
599             else:
600                 def _add_file(ign):
601                     self.log("_add_file childname=%r" % (childname,), level=OPERATIONAL)
602}
603[immutable/filenode.py: Make the immutable file node implement the same interfaces as the mutable one
604Kevan Carstensen <kevan@isnotajoke.com>**20100810000619
605 Ignore-this: 93e536c0f8efb705310f13ff64621527
606] {
607hunk ./src/allmydata/immutable/filenode.py 8
608 now = time.time
609 from zope.interface import implements, Interface
610 from twisted.internet import defer
611-from twisted.internet.interfaces import IConsumer
612 
613hunk ./src/allmydata/immutable/filenode.py 9
614-from allmydata.interfaces import IImmutableFileNode, IUploadResults
615 from allmydata import uri
616hunk ./src/allmydata/immutable/filenode.py 10
617+from twisted.internet.interfaces import IConsumer
618+from twisted.protocols import basic
619+from foolscap.api import eventually
620+from allmydata.interfaces import IImmutableFileNode, ICheckable, \
621+     IDownloadTarget, IUploadResults
622+from allmydata.util import dictutil, log, base32, consumer
623+from allmydata.immutable.checker import Checker
624 from allmydata.check_results import CheckResults, CheckAndRepairResults
625 from allmydata.util.dictutil import DictOfSets
626 from pycryptopp.cipher.aes import AES
627hunk ./src/allmydata/immutable/filenode.py 296
628         return self._cnode.check_and_repair(monitor, verify, add_lease)
629     def check(self, monitor, verify=False, add_lease=False):
630         return self._cnode.check(monitor, verify, add_lease)
631+
632+    def get_best_readable_version(self):
633+        """
634+        Return an IReadable of the best version of this file. Since
635+        immutable files can have only one version, we just return the
636+        current filenode.
637+        """
638+        return defer.succeed(self)
639+
640+
641+    def download_best_version(self):
642+        """
643+        Download the best version of this file, returning its contents
644+        as a bytestring. Since there is only one version of an immutable
645+        file, we download and return the contents of this file.
646+        """
647+        d = consumer.download_to_data(self)
648+        return d
649+
650+    # for an immutable file, download_to_data (specified in IReadable)
651+    # is the same as download_best_version (specified in IFileNode). For
652+    # mutable files, the difference is more meaningful, since they can
653+    # have multiple versions.
654+    download_to_data = download_best_version
655+
656+
657+    # get_size() (IReadable), get_current_size() (IFilesystemNode), and
658+    # get_size_of_best_version(IFileNode) are all the same for immutable
659+    # files.
660+    get_size_of_best_version = get_current_size
661}
662[immutable/literal.py: implement the same interfaces as other filenodes
663Kevan Carstensen <kevan@isnotajoke.com>**20100810000633
664 Ignore-this: b50dd5df2d34ecd6477b8499a27aef13
665] hunk ./src/allmydata/immutable/literal.py 106
666         d.addCallback(lambda lastSent: consumer)
667         return d
668 
669+    # IReadable, IFileNode, IFilesystemNode
670+    def get_best_readable_version(self):
671+        return defer.succeed(self)
672+
673+
674+    def download_best_version(self):
675+        return defer.succeed(self.u.data)
676+
677+
678+    download_to_data = download_best_version
679+    get_size_of_best_version = get_current_size
680+
681[scripts: tell 'tahoe put' about MDMF
682Kevan Carstensen <kevan@isnotajoke.com>**20100813234957
683 Ignore-this: c106b3384fc676bd3c0fb466d2a52b1b
684] {
685hunk ./src/allmydata/scripts/cli.py 160
686     optFlags = [
687         ("mutable", "m", "Create a mutable file instead of an immutable one."),
688         ]
689+    optParameters = [
690+        ("mutable-type", None, False, "Create a mutable file in the given format. Valid formats are 'sdmf' for SDMF and 'mdmf' for MDMF"),
691+        ]
692 
693     def parseArgs(self, arg1=None, arg2=None):
694         # see Examples below
695hunk ./src/allmydata/scripts/tahoe_put.py 21
696     from_file = options.from_file
697     to_file = options.to_file
698     mutable = options['mutable']
699+    mutable_type = False
700+
701+    if mutable:
702+        mutable_type = options['mutable-type']
703     if options['quiet']:
704         verbosity = 0
705     else:
706hunk ./src/allmydata/scripts/tahoe_put.py 33
707     stdout = options.stdout
708     stderr = options.stderr
709 
710+    if mutable_type and mutable_type not in ('sdmf', 'mdmf'):
711+        # Don't try to pass unsupported types to the webapi
712+        print >>stderr, "error: %s is an invalid format" % mutable_type
713+        return 1
714+
715     if nodeurl[-1] != "/":
716         nodeurl += "/"
717     if to_file:
718hunk ./src/allmydata/scripts/tahoe_put.py 76
719         url = nodeurl + "uri"
720     if mutable:
721         url += "?mutable=true"
722+    if mutable_type:
723+        assert mutable
724+        url += "&mutable-type=%s" % mutable_type
725+
726     if from_file:
727         infileobj = open(os.path.expanduser(from_file), "rb")
728     else:
729}
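
With the new option, an MDMF file could be created from the command line with
an invocation along the lines of "tahoe put --mutable --mutable-type=mdmf
local.txt" (shown here for illustration, not taken from the patch); omitting
--mutable-type keeps today's SDMF behaviour, and any other value is rejected
before a request is made to the webapi.
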
730[web: Alter the webapi to get along with and take advantage of the MDMF changes
731Kevan Carstensen <kevan@isnotajoke.com>**20100814081012
732 Ignore-this: 96c2ed4e4a9f450fb84db5d711d10bd6
733 
734 The main benefit that the webapi gets from MDMF, at least initially, is
735 the ability to do a streaming download of an MDMF mutable file. It also
736 exposes a way (through the PUT verb) to append to or otherwise modify
737 (in-place) an MDMF mutable file.
738] {
739hunk ./src/allmydata/web/common.py 12
740 from allmydata.interfaces import ExistingChildError, NoSuchChildError, \
741      FileTooLargeError, NotEnoughSharesError, NoSharesError, \
742      EmptyPathnameComponentError, MustBeDeepImmutableError, \
743-     MustBeReadonlyError, MustNotBeUnknownRWError
744+     MustBeReadonlyError, MustNotBeUnknownRWError, SDMF_VERSION, MDMF_VERSION
745 from allmydata.mutable.common import UnrecoverableFileError
746 from allmydata.util import abbreviate
747 from allmydata.util.encodingutil import to_str, quote_output
748hunk ./src/allmydata/web/common.py 35
749     else:
750         return boolean_of_arg(replace)
751 
752+
753+def parse_mutable_type_arg(arg):
754+    if not arg:
755+        return None # interpreted by the caller as "let the nodemaker decide"
756+
757+    arg = arg.lower()
758+    assert arg in ("mdmf", "sdmf")
759+
760+    if arg == "mdmf":
761+        return MDMF_VERSION
762+
763+    return SDMF_VERSION
764+
765+
766+def parse_offset_arg(offset):
767+    # XXX: This will raise a ValueError when invoked on something that
768+    # is not an integer. Is that okay? Or do we want a better error
769+    # message? Since this call is going to be used by programmers and
770+    # their tools rather than users (through the wui), it is not
771+    # inconsistent to return that, I guess.
772+    offset = int(offset)
773+    return offset
774+
775+
776 def get_root(ctx_or_req):
777     req = IRequest(ctx_or_req)
778     # the addSlash=True gives us one extra (empty) segment
779hunk ./src/allmydata/web/directory.py 19
780 from allmydata.uri import from_string_dirnode
781 from allmydata.interfaces import IDirectoryNode, IFileNode, IFilesystemNode, \
782      IImmutableFileNode, IMutableFileNode, ExistingChildError, \
783-     NoSuchChildError, EmptyPathnameComponentError
784+     NoSuchChildError, EmptyPathnameComponentError, SDMF_VERSION, MDMF_VERSION
785 from allmydata.monitor import Monitor, OperationCancelledError
786 from allmydata import dirnode
787 from allmydata.web.common import text_plain, WebError, \
788hunk ./src/allmydata/web/directory.py 153
789         if not t:
790             # render the directory as HTML, using the docFactory and Nevow's
791             # whole templating thing.
792-            return DirectoryAsHTML(self.node)
793+            return DirectoryAsHTML(self.node,
794+                                   self.client.mutable_file_default)
795 
796         if t == "json":
797             return DirectoryJSONMetadata(ctx, self.node)
798hunk ./src/allmydata/web/directory.py 556
799     docFactory = getxmlfile("directory.xhtml")
800     addSlash = True
801 
802-    def __init__(self, node):
803+    def __init__(self, node, default_mutable_format):
804         rend.Page.__init__(self)
805         self.node = node
806 
807hunk ./src/allmydata/web/directory.py 560
808+        assert default_mutable_format in (MDMF_VERSION, SDMF_VERSION)
809+        self.default_mutable_format = default_mutable_format
810+
811     def beforeRender(self, ctx):
812         # attempt to get the dirnode's children, stashing them (or the
813         # failure that results) for later use
814hunk ./src/allmydata/web/directory.py 780
815             ]]
816         forms.append(T.div(class_="freeform-form")[mkdir])
817 
818+        # Build input elements for mutable file type. We do this outside
819+        # of the list so we can check the appropriate format, based on
820+        # the default configured in the client (which reflects the
821+        # default configured in tahoe.cfg)
822+        if self.default_mutable_format == MDMF_VERSION:
823+            mdmf_input = T.input(type='radio', name='mutable-type',
824+                                 id='mutable-type-mdmf', value='mdmf',
825+                                 checked='checked')
826+        else:
827+            mdmf_input = T.input(type='radio', name='mutable-type',
828+                                 id='mutable-type-mdmf', value='mdmf')
829+
830+        if self.default_mutable_format == SDMF_VERSION:
831+            sdmf_input = T.input(type='radio', name='mutable-type',
832+                                 id='mutable-type-sdmf', value='sdmf',
833+                                 checked="checked")
834+        else:
835+            sdmf_input = T.input(type='radio', name='mutable-type',
836+                                 id='mutable-type-sdmf', value='sdmf')
837+
838         upload = T.form(action=".", method="post",
839                         enctype="multipart/form-data")[
840             T.fieldset[
841hunk ./src/allmydata/web/directory.py 812
842             T.input(type="submit", value="Upload"),
843             " Mutable?:",
844             T.input(type="checkbox", name="mutable"),
845+            sdmf_input, T.label(for_="mutable-type-sdmf")["SDMF"],
846+            mdmf_input,
847+            T.label(for_="mutable-type-mdmf")["MDMF (experimental)"],
848             ]]
849         forms.append(T.div(class_="freeform-form")[upload])
850 
851hunk ./src/allmydata/web/directory.py 850
852                 kiddata = ("filenode", {'size': childnode.get_size(),
853                                         'mutable': childnode.is_mutable(),
854                                         })
855+                if childnode.is_mutable() and \
856+                    childnode.get_version() is not None:
857+                    mutable_type = childnode.get_version()
858+                    assert mutable_type in (SDMF_VERSION, MDMF_VERSION)
859+
860+                    if mutable_type == MDMF_VERSION:
861+                        mutable_type = "mdmf"
862+                    else:
863+                        mutable_type = "sdmf"
864+                    kiddata[1]['mutable-type'] = mutable_type
865+
866             elif IDirectoryNode.providedBy(childnode):
867                 kiddata = ("dirnode", {'mutable': childnode.is_mutable()})
868             else:
869hunk ./src/allmydata/web/filenode.py 9
870 from nevow import url, rend
871 from nevow.inevow import IRequest
872 
873-from allmydata.interfaces import ExistingChildError
874+from allmydata.interfaces import ExistingChildError, SDMF_VERSION, MDMF_VERSION
875 from allmydata.monitor import Monitor
876 from allmydata.immutable.upload import FileHandle
877hunk ./src/allmydata/web/filenode.py 12
878+from allmydata.mutable.publish import MutableFileHandle
879+from allmydata.mutable.common import MODE_READ
880 from allmydata.util import log, base32
881 
882 from allmydata.web.common import text_plain, WebError, RenderMixin, \
883hunk ./src/allmydata/web/filenode.py 18
884      boolean_of_arg, get_arg, should_create_intermediate_directories, \
885-     MyExceptionHandler, parse_replace_arg
886+     MyExceptionHandler, parse_replace_arg, parse_offset_arg, \
887+     parse_mutable_type_arg
888 from allmydata.web.check_results import CheckResults, \
889      CheckAndRepairResults, LiteralCheckResults
890 from allmydata.web.info import MoreInfo
891hunk ./src/allmydata/web/filenode.py 29
892         # a new file is being uploaded in our place.
893         mutable = boolean_of_arg(get_arg(req, "mutable", "false"))
894         if mutable:
895-            req.content.seek(0)
896-            data = req.content.read()
897-            d = client.create_mutable_file(data)
898+            mutable_type = parse_mutable_type_arg(get_arg(req,
899+                                                          "mutable-type",
900+                                                          None))
901+            data = MutableFileHandle(req.content)
902+            d = client.create_mutable_file(data, version=mutable_type)
903             def _uploaded(newnode):
904                 d2 = self.parentnode.set_node(self.name, newnode,
905                                               overwrite=replace)
906hunk ./src/allmydata/web/filenode.py 66
907         d.addCallback(lambda res: childnode.get_uri())
908         return d
909 
910-    def _read_data_from_formpost(self, req):
911-        # SDMF: files are small, and we can only upload data, so we read
912-        # the whole file into memory before uploading.
913-        contents = req.fields["file"]
914-        contents.file.seek(0)
915-        data = contents.file.read()
916-        return data
917 
918     def replace_me_with_a_formpost(self, req, client, replace):
919         # create a new file, maybe mutable, maybe immutable
920hunk ./src/allmydata/web/filenode.py 71
921         mutable = boolean_of_arg(get_arg(req, "mutable", "false"))
922 
923+        # create an immutable file
924+        contents = req.fields["file"]
925         if mutable:
926hunk ./src/allmydata/web/filenode.py 74
927-            data = self._read_data_from_formpost(req)
928-            d = client.create_mutable_file(data)
929+            mutable_type = parse_mutable_type_arg(get_arg(req, "mutable-type",
930+                                                          None))
931+            uploadable = MutableFileHandle(contents.file)
932+            d = client.create_mutable_file(uploadable, version=mutable_type)
933             def _uploaded(newnode):
934                 d2 = self.parentnode.set_node(self.name, newnode,
935                                               overwrite=replace)
936hunk ./src/allmydata/web/filenode.py 85
937                 return d2
938             d.addCallback(_uploaded)
939             return d
940-        # create an immutable file
941-        contents = req.fields["file"]
942+
943         uploadable = FileHandle(contents.file, convergence=client.convergence)
944         d = self.parentnode.add_file(self.name, uploadable, overwrite=replace)
945         d.addCallback(lambda newnode: newnode.get_uri())
946hunk ./src/allmydata/web/filenode.py 91
947         return d
948 
949+
950 class PlaceHolderNodeHandler(RenderMixin, rend.Page, ReplaceMeMixin):
951     def __init__(self, client, parentnode, name):
952         rend.Page.__init__(self)
953hunk ./src/allmydata/web/filenode.py 174
954             # properly. So we assume that at least the browser will agree
955             # with itself, and echo back the same bytes that we were given.
956             filename = get_arg(req, "filename", self.name) or "unknown"
957-            if self.node.is_mutable():
958-                # some day: d = self.node.get_best_version()
959-                d = makeMutableDownloadable(self.node)
960-            else:
961-                d = defer.succeed(self.node)
962+            d = self.node.get_best_readable_version()
963             d.addCallback(lambda dn: FileDownloader(dn, filename))
964             return d
965         if t == "json":
966hunk ./src/allmydata/web/filenode.py 178
967-            if self.parentnode and self.name:
968-                d = self.parentnode.get_metadata_for(self.name)
969+            # We do this to make sure that fields like size and
970+            # mutable-type (which depend on the file on the grid and not
971+            # just on the cap) are filled in. The latter gets used in
972+            # tests, in particular.
973+            #
974+            # TODO: Make it so that the servermap knows how to update in
975+            # a mode specifically designed to fill in these fields, and
976+            # then update it in that mode.
977+            if self.node.is_mutable():
978+                d = self.node.get_servermap(MODE_READ)
979             else:
980                 d = defer.succeed(None)
981hunk ./src/allmydata/web/filenode.py 190
982+            if self.parentnode and self.name:
983+                d.addCallback(lambda ignored:
984+                    self.parentnode.get_metadata_for(self.name))
985+            else:
986+                d.addCallback(lambda ignored: None)
987             d.addCallback(lambda md: FileJSONMetadata(ctx, self.node, md))
988             return d
989         if t == "info":
990hunk ./src/allmydata/web/filenode.py 211
991         if t:
992             raise WebError("GET file: bad t=%s" % t)
993         filename = get_arg(req, "filename", self.name) or "unknown"
994-        if self.node.is_mutable():
995-            # some day: d = self.node.get_best_version()
996-            d = makeMutableDownloadable(self.node)
997-        else:
998-            d = defer.succeed(self.node)
999+        d = self.node.get_best_readable_version()
1000         d.addCallback(lambda dn: FileDownloader(dn, filename))
1001         return d
1002 
1003hunk ./src/allmydata/web/filenode.py 219
1004         req = IRequest(ctx)
1005         t = get_arg(req, "t", "").strip()
1006         replace = parse_replace_arg(get_arg(req, "replace", "true"))
1007+        offset = parse_offset_arg(get_arg(req, "offset", -1))
1008 
1009         if not t:
1010hunk ./src/allmydata/web/filenode.py 222
1011-            if self.node.is_mutable():
1012+            if self.node.is_mutable() and offset >= 0:
1013+                return self.update_my_contents(req, offset)
1014+
1015+            elif self.node.is_mutable():
1016                 return self.replace_my_contents(req)
1017             if not replace:
1018                 # this is the early trap: if someone else modifies the
1019hunk ./src/allmydata/web/filenode.py 232
1020                 # directory while we're uploading, the add_file(overwrite=)
1021                 # call in replace_me_with_a_child will do the late trap.
1022                 raise ExistingChildError()
1023+            if offset >= 0:
1024+                raise WebError("PUT to a file: append operation invoked "
1025+                               "on an immutable cap")
1026+
1027+
1028             assert self.parentnode and self.name
1029             return self.replace_me_with_a_child(req, self.client, replace)
1030         if t == "uri":
1031hunk ./src/allmydata/web/filenode.py 299
1032 
1033     def replace_my_contents(self, req):
1034         req.content.seek(0)
1035-        new_contents = req.content.read()
1036+        new_contents = MutableFileHandle(req.content)
1037         d = self.node.overwrite(new_contents)
1038         d.addCallback(lambda res: self.node.get_uri())
1039         return d
1040hunk ./src/allmydata/web/filenode.py 304
1041 
1042+
1043+    def update_my_contents(self, req, offset):
1044+        req.content.seek(0)
1045+        added_contents = MutableFileHandle(req.content)
1046+
1047+        d = self.node.get_best_mutable_version()
1048+        d.addCallback(lambda mv:
1049+            mv.update(added_contents, offset))
1050+        d.addCallback(lambda ignored:
1051+            self.node.get_uri())
1052+        return d
1053+
1054+
1055     def replace_my_contents_with_a_formpost(self, req):
1056         # we have a mutable file. Get the data from the formpost, and replace
1057         # the mutable file's contents with it.
1058hunk ./src/allmydata/web/filenode.py 320
1059-        new_contents = self._read_data_from_formpost(req)
1060+        new_contents = req.fields['file']
1061+        new_contents = MutableFileHandle(new_contents.file)
1062+
1063         d = self.node.overwrite(new_contents)
1064         d.addCallback(lambda res: self.node.get_uri())
1065         return d
1066hunk ./src/allmydata/web/filenode.py 327
1067 
1068-class MutableDownloadable:
1069-    #implements(IDownloadable)
1070-    def __init__(self, size, node):
1071-        self.size = size
1072-        self.node = node
1073-    def get_size(self):
1074-        return self.size
1075-    def is_mutable(self):
1076-        return True
1077-    def read(self, consumer, offset=0, size=None):
1078-        d = self.node.download_best_version()
1079-        d.addCallback(self._got_data, consumer, offset, size)
1080-        return d
1081-    def _got_data(self, contents, consumer, offset, size):
1082-        start = offset
1083-        if size is not None:
1084-            end = offset+size
1085-        else:
1086-            end = self.size
1087-        # SDMF: we can write the whole file in one big chunk
1088-        consumer.write(contents[start:end])
1089-        return consumer
1090-
1091-def makeMutableDownloadable(n):
1092-    d = defer.maybeDeferred(n.get_size_of_best_version)
1093-    d.addCallback(MutableDownloadable, n)
1094-    return d
1095 
1096 class FileDownloader(rend.Page):
1097     # since we override the rendering process (to let the tahoe Downloader
1098hunk ./src/allmydata/web/filenode.py 509
1099     data[1]['mutable'] = filenode.is_mutable()
1100     if edge_metadata is not None:
1101         data[1]['metadata'] = edge_metadata
1102+
1103+    if filenode.is_mutable() and filenode.get_version() is not None:
1104+        mutable_type = filenode.get_version()
1105+        assert mutable_type in (MDMF_VERSION, SDMF_VERSION)
1106+        if mutable_type == MDMF_VERSION:
1107+            mutable_type = "mdmf"
1108+        else:
1109+            mutable_type = "sdmf"
1110+        data[1]['mutable-type'] = mutable_type
1111+
1112     return text_plain(simplejson.dumps(data, indent=1) + "\n", ctx)
1113 
1114 def FileURI(ctx, filenode):
1115hunk ./src/allmydata/web/root.py 15
1116 from allmydata import get_package_versions_string
1117 from allmydata import provisioning
1118 from allmydata.util import idlib, log
1119-from allmydata.interfaces import IFileNode
1120+from allmydata.interfaces import IFileNode, MDMF_VERSION, SDMF_VERSION
1121 from allmydata.web import filenode, directory, unlinked, status, operations
1122 from allmydata.web import reliability, storage
1123 from allmydata.web.common import abbreviate_size, getxmlfile, WebError, \
1124hunk ./src/allmydata/web/root.py 19
1125-     get_arg, RenderMixin, boolean_of_arg
1126+     get_arg, RenderMixin, boolean_of_arg, parse_mutable_type_arg
1127 
1128 
1129 class URIHandler(RenderMixin, rend.Page):
1130hunk ./src/allmydata/web/root.py 50
1131         if t == "":
1132             mutable = boolean_of_arg(get_arg(req, "mutable", "false").strip())
1133             if mutable:
1134-                return unlinked.PUTUnlinkedSSK(req, self.client)
1135+                version = parse_mutable_type_arg(get_arg(req, "mutable-type",
1136+                                                 None))
1137+                return unlinked.PUTUnlinkedSSK(req, self.client, version)
1138             else:
1139                 return unlinked.PUTUnlinkedCHK(req, self.client)
1140         if t == "mkdir":
1141hunk ./src/allmydata/web/root.py 70
1142         if t in ("", "upload"):
1143             mutable = bool(get_arg(req, "mutable", "").strip())
1144             if mutable:
1145-                return unlinked.POSTUnlinkedSSK(req, self.client)
1146+                version = parse_mutable_type_arg(get_arg(req, "mutable-type",
1147+                                                         None))
1148+                return unlinked.POSTUnlinkedSSK(req, self.client, version)
1149             else:
1150                 return unlinked.POSTUnlinkedCHK(req, self.client)
1151         if t == "mkdir":
1152hunk ./src/allmydata/web/root.py 324
1153 
1154     def render_upload_form(self, ctx, data):
1155         # this is a form where users can upload unlinked files
1156+        #
1157+        # for mutable files, users can choose the format by selecting
1158+        # MDMF or SDMF from a radio button. They can also configure a
1159+        # default format in tahoe.cfg, which they rightly expect us to
1160+        # obey. we convey to them that we are obeying their choice by
1161+        # ensuring that the one that they've chosen is selected in the
1162+        # interface.
1163+        if self.client.mutable_file_default == MDMF_VERSION:
1164+            mdmf_input = T.input(type='radio', name='mutable-type',
1165+                                 value='mdmf', id='mutable-type-mdmf',
1166+                                 checked='checked')
1167+        else:
1168+            mdmf_input = T.input(type='radio', name='mutable-type',
1169+                                 value='mdmf', id='mutable-type-mdmf')
1170+
1171+        if self.client.mutable_file_default == SDMF_VERSION:
1172+            sdmf_input = T.input(type='radio', name='mutable-type',
1173+                                 value='sdmf', id='mutable-type-sdmf',
1174+                                 checked='checked')
1175+        else:
1176+            sdmf_input = T.input(type='radio', name='mutable-type',
1177+                                 value='sdmf', id='mutable-type-sdmf')
1178+
1179+
1180         form = T.form(action="uri", method="post",
1181                       enctype="multipart/form-data")[
1182             T.fieldset[
1183hunk ./src/allmydata/web/root.py 356
1184                   T.input(type="file", name="file", class_="freeform-input-file")],
1185             T.input(type="hidden", name="t", value="upload"),
1186             T.div[T.input(type="checkbox", name="mutable"), T.label(for_="mutable")["Create mutable file"],
1187+                  sdmf_input, T.label(for_="mutable-type-sdmf")["SDMF"],
1188+                  mdmf_input,
1189+                  T.label(for_='mutable-type-mdmf')['MDMF (experimental)'],
1190                   " ", T.input(type="submit", value="Upload!")],
1191             ]]
1192         return T.div[form]
1193hunk ./src/allmydata/web/unlinked.py 7
1194 from twisted.internet import defer
1195 from nevow import rend, url, tags as T
1196 from allmydata.immutable.upload import FileHandle
1197+from allmydata.mutable.publish import MutableFileHandle
1198 from allmydata.web.common import getxmlfile, get_arg, boolean_of_arg, \
1199      convert_children_json, WebError
1200 from allmydata.web import status
1201hunk ./src/allmydata/web/unlinked.py 20
1202     # that fires with the URI of the new file
1203     return d
1204 
1205-def PUTUnlinkedSSK(req, client):
1206+def PUTUnlinkedSSK(req, client, version):
1207     # SDMF: files are small, and we can only upload data
1208     req.content.seek(0)
1209hunk ./src/allmydata/web/unlinked.py 23
1210-    data = req.content.read()
1211-    d = client.create_mutable_file(data)
1212+    data = MutableFileHandle(req.content)
1213+    d = client.create_mutable_file(data, version=version)
1214     d.addCallback(lambda n: n.get_uri())
1215     return d
1216 
1217hunk ./src/allmydata/web/unlinked.py 83
1218                       ["/uri/" + res.uri])
1219         return d
1220 
1221-def POSTUnlinkedSSK(req, client):
1222+def POSTUnlinkedSSK(req, client, version):
1223     # "POST /uri", to create an unlinked file.
1224     # SDMF: files are small, and we can only upload data
1225hunk ./src/allmydata/web/unlinked.py 86
1226-    contents = req.fields["file"]
1227-    contents.file.seek(0)
1228-    data = contents.file.read()
1229-    d = client.create_mutable_file(data)
1230+    contents = req.fields["file"].file
1231+    data = MutableFileHandle(contents)
1232+    d = client.create_mutable_file(data, version=version)
1233     d.addCallback(lambda n: n.get_uri())
1234     return d
1235 
1236}
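
A minimal sketch of how a webapi client could ask for the MDMF format when creating an unlinked mutable file, using the mutable=true and mutable-type query arguments handled by the hunks above. The node address and port below are assumptions, not taken from the patch.

import httplib

# Assumption: a local Tahoe-LAFS node whose webapi listens on port 3456.
conn = httplib.HTTPConnection("127.0.0.1", 3456)
# "mutable=true" requests a mutable file; "mutable-type" chooses mdmf or sdmf.
conn.request("PUT", "/uri?mutable=true&mutable-type=mdmf", "initial contents")
response = conn.getresponse()
print response.status, response.read()  # on success, the body is the new cap
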
1237[client.py: learn how to create different kinds of mutable files
1238Kevan Carstensen <kevan@isnotajoke.com>**20100814225711
1239 Ignore-this: 61ff665bc050cba5f58bf2ed779d692b
1240] {
1241hunk ./src/allmydata/client.py 25
1242 from allmydata.util.time_format import parse_duration, parse_date
1243 from allmydata.stats import StatsProvider
1244 from allmydata.history import History
1245-from allmydata.interfaces import IStatsProducer, RIStubClient
1246+from allmydata.interfaces import IStatsProducer, RIStubClient, \
1247+                                 SDMF_VERSION, MDMF_VERSION
1248 from allmydata.nodemaker import NodeMaker
1249 
1250 
1251hunk ./src/allmydata/client.py 357
1252                                    self.terminator,
1253                                    self.get_encoding_parameters(),
1254                                    self._key_generator)
1255+        default = self.get_config("client", "mutable.format", default="sdmf")
1256+        if default == "mdmf":
1257+            self.mutable_file_default = MDMF_VERSION
1258+        else:
1259+            self.mutable_file_default = SDMF_VERSION
1260 
1261     def get_history(self):
1262         return self.history
1263hunk ./src/allmydata/client.py 500
1264     def create_immutable_dirnode(self, children, convergence=None):
1265         return self.nodemaker.create_immutable_directory(children, convergence)
1266 
1267-    def create_mutable_file(self, contents=None, keysize=None):
1268-        return self.nodemaker.create_mutable_file(contents, keysize)
1269+    def create_mutable_file(self, contents=None, keysize=None, version=None):
1270+        if not version:
1271+            version = self.mutable_file_default
1272+        return self.nodemaker.create_mutable_file(contents, keysize,
1273+                                                  version=version)
1274 
1275     def upload(self, uploadable):
1276         uploader = self.getServiceNamed("uploader")
1277}
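
The hunk above reads mutable.format from the [client] section of tahoe.cfg, mapping "mdmf" to MDMF_VERSION and anything else to SDMF_VERSION, and create_mutable_file() falls back to that default when no version is given. A hypothetical sketch of a per-call override, where client stands in for a running allmydata.client.Client instance:

from allmydata.interfaces import MDMF_VERSION
from allmydata.mutable.publish import MutableData

# Ask for an MDMF file explicitly, regardless of the tahoe.cfg default;
# omitting version= would use client.mutable_file_default instead.
d = client.create_mutable_file(MutableData("hello world"), version=MDMF_VERSION)
d.addCallback(lambda node: node.get_uri())
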
1278[mutable/checker.py and mutable/repair.py: Modify checker and repairer to work with MDMF
1279Kevan Carstensen <kevan@isnotajoke.com>**20100819003216
1280 Ignore-this: d3bd3260742be8964877f0a53543b01b
1281 
1282 The checker and repairer required minimal changes to work with the MDMF
1283 modifications made elsewhere. The checker duplicated a lot of the code
1284 that was already in the downloader, so I modified the downloader
1285 slightly to expose this functionality to the checker and removed the
1286 duplicated code. The repairer only required a minor change to deal with
1287 data representation.
1288] {
1289hunk ./src/allmydata/mutable/checker.py 2
1290 
1291-from twisted.internet import defer
1292-from twisted.python import failure
1293-from allmydata import hashtree
1294 from allmydata.uri import from_string
1295hunk ./src/allmydata/mutable/checker.py 3
1296-from allmydata.util import hashutil, base32, idlib, log
1297+from allmydata.util import base32, idlib, log
1298 from allmydata.check_results import CheckAndRepairResults, CheckResults
1299 
1300 from allmydata.mutable.common import MODE_CHECK, CorruptShareError
1301hunk ./src/allmydata/mutable/checker.py 8
1302 from allmydata.mutable.servermap import ServerMap, ServermapUpdater
1303-from allmydata.mutable.layout import unpack_share, SIGNED_PREFIX_LENGTH
1304+from allmydata.mutable.retrieve import Retrieve # for verifying
1305 
1306 class MutableChecker:
1307 
1308hunk ./src/allmydata/mutable/checker.py 25
1309 
1310     def check(self, verify=False, add_lease=False):
1311         servermap = ServerMap()
1312+        # Updating the servermap in MODE_CHECK will stand a good chance
1313+        # of finding all of the shares, and getting a good idea of
1314+        # recoverability, etc, without verifying.
1315         u = ServermapUpdater(self._node, self._storage_broker, self._monitor,
1316                              servermap, MODE_CHECK, add_lease=add_lease)
1317         if self._history:
1318hunk ./src/allmydata/mutable/checker.py 51
1319         if num_recoverable:
1320             self.best_version = servermap.best_recoverable_version()
1321 
1322+        # The file is unhealthy and needs to be repaired if:
1323+        # - There are unrecoverable versions.
1324         if servermap.unrecoverable_versions():
1325             self.need_repair = True
1326hunk ./src/allmydata/mutable/checker.py 55
1327+        # - There isn't a recoverable version.
1328         if num_recoverable != 1:
1329             self.need_repair = True
1330hunk ./src/allmydata/mutable/checker.py 58
1331+        # - The best recoverable version is missing some shares.
1332         if self.best_version:
1333             available_shares = servermap.shares_available()
1334             (num_distinct_shares, k, N) = available_shares[self.best_version]
1335hunk ./src/allmydata/mutable/checker.py 69
1336 
1337     def _verify_all_shares(self, servermap):
1338         # read every byte of each share
1339+        #
1340+        # This logic is going to be very nearly the same as the
1341+        # downloader. I bet we could pass the downloader a flag that
1342+        # makes it do this, and piggyback onto that instead of
1343+        # duplicating a bunch of code.
1344+        #
1345+        # Like:
1346+        #  r = Retrieve(blah, blah, blah, verify=True)
1347+        #  d = r.download()
1348+        #  (wait, wait, wait, d.callback)
1349+        # 
1350+        #  Then, when it has finished, we can check the servermap (which
1351+        #  we provided to Retrieve) to figure out which shares are bad,
1352+        #  since the Retrieve process will have updated the servermap as
1353+        #  it went along.
1354+        #
1355+        #  By passing the verify=True flag to the constructor, we are
1356+        #  telling the downloader a few things.
1357+        #
1358+        #  1. It needs to download all N shares, not just K shares.
1359+        #  2. It doesn't need to decrypt or decode the shares, only
1360+        #     verify them.
1361         if not self.best_version:
1362             return
1363hunk ./src/allmydata/mutable/checker.py 93
1364-        versionmap = servermap.make_versionmap()
1365-        shares = versionmap[self.best_version]
1366-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
1367-         offsets_tuple) = self.best_version
1368-        offsets = dict(offsets_tuple)
1369-        readv = [ (0, offsets["EOF"]) ]
1370-        dl = []
1371-        for (shnum, peerid, timestamp) in shares:
1372-            ss = servermap.connections[peerid]
1373-            d = self._do_read(ss, peerid, self._storage_index, [shnum], readv)
1374-            d.addCallback(self._got_answer, peerid, servermap)
1375-            dl.append(d)
1376-        return defer.DeferredList(dl, fireOnOneErrback=True, consumeErrors=True)
1377 
1378hunk ./src/allmydata/mutable/checker.py 94
1379-    def _do_read(self, ss, peerid, storage_index, shnums, readv):
1380-        # isolate the callRemote to a separate method, so tests can subclass
1381-        # Publish and override it
1382-        d = ss.callRemote("slot_readv", storage_index, shnums, readv)
1383+        r = Retrieve(self._node, servermap, self.best_version, verify=True)
1384+        d = r.download()
1385+        d.addCallback(self._process_bad_shares)
1386         return d
1387 
1388hunk ./src/allmydata/mutable/checker.py 99
1389-    def _got_answer(self, datavs, peerid, servermap):
1390-        for shnum,datav in datavs.items():
1391-            data = datav[0]
1392-            try:
1393-                self._got_results_one_share(shnum, peerid, data)
1394-            except CorruptShareError:
1395-                f = failure.Failure()
1396-                self.need_repair = True
1397-                self.bad_shares.append( (peerid, shnum, f) )
1398-                prefix = data[:SIGNED_PREFIX_LENGTH]
1399-                servermap.mark_bad_share(peerid, shnum, prefix)
1400-                ss = servermap.connections[peerid]
1401-                self.notify_server_corruption(ss, shnum, str(f.value))
1402-
1403-    def check_prefix(self, peerid, shnum, data):
1404-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
1405-         offsets_tuple) = self.best_version
1406-        got_prefix = data[:SIGNED_PREFIX_LENGTH]
1407-        if got_prefix != prefix:
1408-            raise CorruptShareError(peerid, shnum,
1409-                                    "prefix mismatch: share changed while we were reading it")
1410-
1411-    def _got_results_one_share(self, shnum, peerid, data):
1412-        self.check_prefix(peerid, shnum, data)
1413-
1414-        # the [seqnum:signature] pieces are validated by _compare_prefix,
1415-        # which checks their signature against the pubkey known to be
1416-        # associated with this file.
1417 
1418hunk ./src/allmydata/mutable/checker.py 100
1419-        (seqnum, root_hash, IV, k, N, segsize, datalen, pubkey, signature,
1420-         share_hash_chain, block_hash_tree, share_data,
1421-         enc_privkey) = unpack_share(data)
1422-
1423-        # validate [share_hash_chain,block_hash_tree,share_data]
1424-
1425-        leaves = [hashutil.block_hash(share_data)]
1426-        t = hashtree.HashTree(leaves)
1427-        if list(t) != block_hash_tree:
1428-            raise CorruptShareError(peerid, shnum, "block hash tree failure")
1429-        share_hash_leaf = t[0]
1430-        t2 = hashtree.IncompleteHashTree(N)
1431-        # root_hash was checked by the signature
1432-        t2.set_hashes({0: root_hash})
1433-        try:
1434-            t2.set_hashes(hashes=share_hash_chain,
1435-                          leaves={shnum: share_hash_leaf})
1436-        except (hashtree.BadHashError, hashtree.NotEnoughHashesError,
1437-                IndexError), e:
1438-            msg = "corrupt hashes: %s" % (e,)
1439-            raise CorruptShareError(peerid, shnum, msg)
1440-
1441-        # validate enc_privkey: only possible if we have a write-cap
1442-        if not self._node.is_readonly():
1443-            alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)
1444-            alleged_writekey = hashutil.ssk_writekey_hash(alleged_privkey_s)
1445-            if alleged_writekey != self._node.get_writekey():
1446-                raise CorruptShareError(peerid, shnum, "invalid privkey")
1447+    def _process_bad_shares(self, bad_shares):
1448+        if bad_shares:
1449+            self.need_repair = True
1450+        self.bad_shares = bad_shares
1451 
1452hunk ./src/allmydata/mutable/checker.py 105
1453-    def notify_server_corruption(self, ss, shnum, reason):
1454-        ss.callRemoteOnly("advise_corrupt_share",
1455-                          "mutable", self._storage_index, shnum, reason)
1456 
1457     def _count_shares(self, smap, version):
1458         available_shares = smap.shares_available()
1459hunk ./src/allmydata/mutable/repairer.py 5
1460 from zope.interface import implements
1461 from twisted.internet import defer
1462 from allmydata.interfaces import IRepairResults, ICheckResults
1463+from allmydata.mutable.publish import MutableData
1464 
1465 class RepairResults:
1466     implements(IRepairResults)
1467hunk ./src/allmydata/mutable/repairer.py 108
1468             raise RepairRequiresWritecapError("Sorry, repair currently requires a writecap, to set the write-enabler properly.")
1469 
1470         d = self.node.download_version(smap, best_version, fetch_privkey=True)
1471+        d.addCallback(lambda data:
1472+            MutableData(data))
1473         d.addCallback(self.node.upload, smap)
1474         d.addCallback(self.get_results, smap)
1475         return d
1476}
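
A hypothetical sketch of the verification flow that the checker comment above describes: the checker now hands the work to Retrieve with verify=True and treats whatever comes back as the list of bad shares. node, servermap and best_version are assumed to exist already.

from allmydata.mutable.retrieve import Retrieve

# verify=True asks the retriever to fetch all N shares and verify them
# instead of decrypting/decoding; the Deferred fires with the bad shares.
r = Retrieve(node, servermap, best_version, verify=True)
d = r.download()
def _process_bad_shares(bad_shares):
    # a non-empty list means the file is unhealthy and needs repair
    return len(bad_shares) > 0
d.addCallback(_process_bad_shares)
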
1477[mutable/filenode.py: add versions and partial-file updates to the mutable file node
1478Kevan Carstensen <kevan@isnotajoke.com>**20100819003231
1479 Ignore-this: b7b5434201fdb9b48f902d7ab25ef45c
1480 
1481 One of the goals of MDMF as a GSoC project is to lay the groundwork for
1482 LDMF, a format that will allow Tahoe-LAFS to deal with and encourage
1483 multiple versions of a single cap on the grid. In line with this, there
1484 is now a distinction between an overriding mutable file (which can be
1485 thought to correspond to the cap/unique identifier for that mutable
1486 file) and versions of the mutable file (which we can download, update,
1487 and so on). All download, upload, and modification operations end up
1488 happening on a particular version of a mutable file, but there are
1489 shortcut methods on the object representing the overriding mutable file
1490 that perform these operations on the best version of the mutable file
1491 (which is what code should be doing until we have LDMF and better
1492 support for other paradigms).
1493 
1494 Another goal of MDMF was to take advantage of segmentation to give
1495 callers more efficient partial file updates or appends. This patch
1496 implements methods that do that, too.
1497 
1498] {
1499hunk ./src/allmydata/mutable/filenode.py 7
1500 from zope.interface import implements
1501 from twisted.internet import defer, reactor
1502 from foolscap.api import eventually
1503-from allmydata.interfaces import IMutableFileNode, \
1504-     ICheckable, ICheckResults, NotEnoughSharesError
1505-from allmydata.util import hashutil, log
1506+from allmydata.interfaces import IMutableFileNode, ICheckable, ICheckResults, \
1507+     NotEnoughSharesError, MDMF_VERSION, SDMF_VERSION, IMutableUploadable, \
1508+     IMutableFileVersion, IWritable
1509+from allmydata.util import hashutil, log, consumer, deferredutil, mathutil
1510 from allmydata.util.assertutil import precondition
1511 from allmydata.uri import WriteableSSKFileURI, ReadonlySSKFileURI
1512 from allmydata.monitor import Monitor
1513hunk ./src/allmydata/mutable/filenode.py 16
1514 from pycryptopp.cipher.aes import AES
1515 
1516-from allmydata.mutable.publish import Publish
1517+from allmydata.mutable.publish import Publish, MutableData,\
1518+                                      DEFAULT_MAX_SEGMENT_SIZE, \
1519+                                      TransformingUploadable
1520 from allmydata.mutable.common import MODE_READ, MODE_WRITE, UnrecoverableFileError, \
1521      ResponseCache, UncoordinatedWriteError
1522 from allmydata.mutable.servermap import ServerMap, ServermapUpdater
1523hunk ./src/allmydata/mutable/filenode.py 70
1524         self._sharemap = {} # known shares, shnum-to-[nodeids]
1525         self._cache = ResponseCache()
1526         self._most_recent_size = None
1527+        # filled in after __init__ if we're being created for the first time;
1528+        # filled in by the servermap updater before publishing, otherwise.
1529+        # set to this default value in case neither of those things happen,
1530+        # or in case the servermap can't find any shares to tell us what
1531+        # to publish as.
1532+        # TODO: Set this back to None, and find out why the tests fail
1533+        #       with it set to None.
1534+        self._protocol_version = None
1535 
1536         # all users of this MutableFileNode go through the serializer. This
1537         # takes advantage of the fact that Deferreds discard the callbacks
1538hunk ./src/allmydata/mutable/filenode.py 134
1539         return self._upload(initial_contents, None)
1540 
1541     def _get_initial_contents(self, contents):
1542-        if isinstance(contents, str):
1543-            return contents
1544         if contents is None:
1545hunk ./src/allmydata/mutable/filenode.py 135
1546-            return ""
1547+            return MutableData("")
1548+
1549+        if IMutableUploadable.providedBy(contents):
1550+            return contents
1551+
1552         assert callable(contents), "%s should be callable, not %s" % \
1553                (contents, type(contents))
1554         return contents(self)
1555hunk ./src/allmydata/mutable/filenode.py 209
1556 
1557     def get_size(self):
1558         return self._most_recent_size
1559+
1560     def get_current_size(self):
1561         d = self.get_size_of_best_version()
1562         d.addCallback(self._stash_size)
1563hunk ./src/allmydata/mutable/filenode.py 214
1564         return d
1565+
1566     def _stash_size(self, size):
1567         self._most_recent_size = size
1568         return size
1569hunk ./src/allmydata/mutable/filenode.py 273
1570             return cmp(self.__class__, them.__class__)
1571         return cmp(self._uri, them._uri)
1572 
1573-    def _do_serialized(self, cb, *args, **kwargs):
1574-        # note: to avoid deadlock, this callable is *not* allowed to invoke
1575-        # other serialized methods within this (or any other)
1576-        # MutableFileNode. The callable should be a bound method of this same
1577-        # MFN instance.
1578-        d = defer.Deferred()
1579-        self._serializer.addCallback(lambda ignore: cb(*args, **kwargs))
1580-        # we need to put off d.callback until this Deferred is finished being
1581-        # processed. Otherwise the caller's subsequent activities (like,
1582-        # doing other things with this node) can cause reentrancy problems in
1583-        # the Deferred code itself
1584-        self._serializer.addBoth(lambda res: eventually(d.callback, res))
1585-        # add a log.err just in case something really weird happens, because
1586-        # self._serializer stays around forever, therefore we won't see the
1587-        # usual Unhandled Error in Deferred that would give us a hint.
1588-        self._serializer.addErrback(log.err)
1589-        return d
1590 
1591     #################################
1592     # ICheckable
1593hunk ./src/allmydata/mutable/filenode.py 298
1594 
1595 
1596     #################################
1597-    # IMutableFileNode
1598+    # IFileNode
1599+
1600+    def get_best_readable_version(self):
1601+        """
1602+        I return a Deferred that fires with a MutableFileVersion
1603+        representing the best readable version of the file that I
1604+        represent
1605+        """
1606+        return self.get_readable_version()
1607+
1608+
1609+    def get_readable_version(self, servermap=None, version=None):
1610+        """
1611+        I return a Deferred that fires with a MutableFileVersion for my
1612+        version argument, if there is a recoverable file of that version
1613+        on the grid. If there is no recoverable version, I fire with an
1614+        UnrecoverableFileError.
1615+
1616+        If a servermap is provided, I look in there for the requested
1617+        version. If no servermap is provided, I create and update a new
1618+        one.
1619+
1620+        If no version is provided, then I return a MutableFileVersion
1621+        representing the best recoverable version of the file.
1622+        """
1623+        d = self._get_version_from_servermap(MODE_READ, servermap, version)
1624+        def _build_version((servermap, their_version)):
1625+            assert their_version in servermap.recoverable_versions()
1626+            assert their_version in servermap.make_versionmap()
1627+
1628+            mfv = MutableFileVersion(self,
1629+                                     servermap,
1630+                                     their_version,
1631+                                     self._storage_index,
1632+                                     self._storage_broker,
1633+                                     self._readkey,
1634+                                     history=self._history)
1635+            assert mfv.is_readonly()
1636+            # our caller can use this to download the contents of the
1637+            # mutable file.
1638+            return mfv
1639+        return d.addCallback(_build_version)
1640+
1641+
1642+    def _get_version_from_servermap(self,
1643+                                    mode,
1644+                                    servermap=None,
1645+                                    version=None):
1646+        """
1647+        I return a Deferred that fires with (servermap, version).
1648+
1649+        This function performs validation and a servermap update. If it
1650+        returns (servermap, version), the caller can assume that:
1651+            - servermap was last updated in mode.
1652+            - version is recoverable, and corresponds to the servermap.
1653+
1654+        If version and servermap are provided to me, I will validate
1655+        that version exists in the servermap, and that the servermap was
1656+        updated correctly.
1657+
1658+        If version is not provided, but servermap is, I will validate
1659+        the servermap and return the best recoverable version that I can
1660+        find in the servermap.
1661+
1662+        If the version is provided but the servermap isn't, I will
1663+        obtain a servermap that has been updated in the correct mode and
1664+        validate that version is found and recoverable.
1665+
1666+        If neither servermap nor version are provided, I will obtain a
1667+        servermap updated in the correct mode, and return the best
1668+        recoverable version that I can find in there.
1669+        """
1670+        # XXX: wording ^^^^
1671+        if servermap and servermap.last_update_mode == mode:
1672+            d = defer.succeed(servermap)
1673+        else:
1674+            d = self._get_servermap(mode)
1675+
1676+        def _get_version(servermap, v):
1677+            if v and v not in servermap.recoverable_versions():
1678+                v = None
1679+            elif not v:
1680+                v = servermap.best_recoverable_version()
1681+            if not v:
1682+                raise UnrecoverableFileError("no recoverable versions")
1683+
1684+            return (servermap, v)
1685+        return d.addCallback(_get_version, version)
1686+
1687 
1688     def download_best_version(self):
1689hunk ./src/allmydata/mutable/filenode.py 389
1690+        """
1691+        I return a Deferred that fires with the contents of the best
1692+        version of this mutable file.
1693+        """
1694         return self._do_serialized(self._download_best_version)
1695hunk ./src/allmydata/mutable/filenode.py 394
1696+
1697+
1698     def _download_best_version(self):
1699hunk ./src/allmydata/mutable/filenode.py 397
1700-        servermap = ServerMap()
1701-        d = self._try_once_to_download_best_version(servermap, MODE_READ)
1702-        def _maybe_retry(f):
1703-            f.trap(NotEnoughSharesError)
1704-            # the download is worth retrying once. Make sure to use the
1705-            # old servermap, since it is what remembers the bad shares,
1706-            # but use MODE_WRITE to make it look for even more shares.
1707-            # TODO: consider allowing this to retry multiple times.. this
1708-            # approach will let us tolerate about 8 bad shares, I think.
1709-            return self._try_once_to_download_best_version(servermap,
1710-                                                           MODE_WRITE)
1711+        """
1712+        I am the serialized sibling of download_best_version.
1713+        """
1714+        d = self.get_best_readable_version()
1715+        d.addCallback(self._record_size)
1716+        d.addCallback(lambda version: version.download_to_data())
1717+
1718+        # It is possible that the download will fail because there
1719+        # aren't enough shares to be had. If so, we will try again after
1720+        # updating the servermap in MODE_WRITE, which may find more
1721+        # shares than updating in MODE_READ, as we just did. We can do
1722+        # this by getting the best mutable version and downloading from
1723+        # that -- the best mutable version will be a MutableFileVersion
1724+        # with a servermap that was last updated in MODE_WRITE, as we
1725+        # want. If this fails, then we give up.
1726+        def _maybe_retry(failure):
1727+            failure.trap(NotEnoughSharesError)
1728+
1729+            d = self.get_best_mutable_version()
1730+            d.addCallback(self._record_size)
1731+            d.addCallback(lambda version: version.download_to_data())
1732+            return d
1733+
1734         d.addErrback(_maybe_retry)
1735         return d
1736hunk ./src/allmydata/mutable/filenode.py 422
1737-    def _try_once_to_download_best_version(self, servermap, mode):
1738-        d = self._update_servermap(servermap, mode)
1739-        d.addCallback(self._once_updated_download_best_version, servermap)
1740-        return d
1741-    def _once_updated_download_best_version(self, ignored, servermap):
1742-        goal = servermap.best_recoverable_version()
1743-        if not goal:
1744-            raise UnrecoverableFileError("no recoverable versions")
1745-        return self._try_once_to_download_version(servermap, goal)
1746+
1747+
1748+    def _record_size(self, mfv):
1749+        """
1750+        I record the size of a mutable file version.
1751+        """
1752+        self._most_recent_size = mfv.get_size()
1753+        return mfv
1754+
1755 
1756     def get_size_of_best_version(self):
1757hunk ./src/allmydata/mutable/filenode.py 433
1758-        d = self.get_servermap(MODE_READ)
1759-        def _got_servermap(smap):
1760-            ver = smap.best_recoverable_version()
1761-            if not ver:
1762-                raise UnrecoverableFileError("no recoverable version")
1763-            return smap.size_of_version(ver)
1764-        d.addCallback(_got_servermap)
1765-        return d
1766+        """
1767+        I return the size of the best version of this mutable file.
1768 
1769hunk ./src/allmydata/mutable/filenode.py 436
1770+        This is equivalent to calling get_size() on the result of
1771+        get_best_readable_version().
1772+        """
1773+        d = self.get_best_readable_version()
1774+        return d.addCallback(lambda mfv: mfv.get_size())
1775+
1776+
1777+    #################################
1778+    # IMutableFileNode
1779+
1780+    def get_best_mutable_version(self, servermap=None):
1781+        """
1782+        I return a Deferred that fires with a MutableFileVersion
1783+        representing the best readable version of the file that I
1784+        represent. I am like get_best_readable_version, except that I
1785+        will try to make a writable version if I can.
1786+        """
1787+        return self.get_mutable_version(servermap=servermap)
1788+
1789+
1790+    def get_mutable_version(self, servermap=None, version=None):
1791+        """
1792+        I return a version of this mutable file. I return a Deferred
1793+        that fires with a MutableFileVersion
1794+
1795+        If version is provided, the Deferred will fire with a
1796+        MutableFileVersion initialized with that version. Otherwise, it
1797+        will fire with the best version that I can recover.
1798+
1799+        If servermap is provided, I will use that to find versions
1800+        instead of performing my own servermap update.
1801+        """
1802+        if self.is_readonly():
1803+            return self.get_readable_version(servermap=servermap,
1804+                                             version=version)
1805+
1806+        # get_mutable_version => write intent, so we require that the
1807+        # servermap is updated in MODE_WRITE
1808+        d = self._get_version_from_servermap(MODE_WRITE, servermap, version)
1809+        def _build_version((servermap, smap_version)):
1810+            # these should have been set by the servermap update.
1811+            assert self._secret_holder
1812+            assert self._writekey
1813+
1814+            mfv = MutableFileVersion(self,
1815+                                     servermap,
1816+                                     smap_version,
1817+                                     self._storage_index,
1818+                                     self._storage_broker,
1819+                                     self._readkey,
1820+                                     self._writekey,
1821+                                     self._secret_holder,
1822+                                     history=self._history)
1823+            assert not mfv.is_readonly()
1824+            return mfv
1825+
1826+        return d.addCallback(_build_version)
1827+
1828+
1829+    # XXX: I'm uncomfortable with the difference between upload and
1830+    #      overwrite, which, FWICT, is basically that you don't have to
1831+    #      do a servermap update before you overwrite. We split them up
1832+    #      that way anyway, so I guess there's no real difficulty in
1833+    #      offering both ways to callers, but it also makes the
1834+    #      public-facing API cluttered, and makes it hard to discern the
1835+    #      right way of doing things.
1836+
1837+    # In general, we leave it to callers to ensure that they aren't
1838+    # going to cause UncoordinatedWriteErrors when working with
1839+    # MutableFileVersions. We know that the next three operations
1840+    # (upload, overwrite, and modify) will all operate on the same
1841+    # version, so we say that only one of them can be going on at once,
1842+    # and serialize them to ensure that that actually happens, since as
1843+    # the caller in this situation it is our job to do that.
1844     def overwrite(self, new_contents):
1845hunk ./src/allmydata/mutable/filenode.py 511
1846+        """
1847+        I overwrite the contents of the best recoverable version of this
1848+        mutable file with new_contents. This is equivalent to calling
1849+        overwrite on the result of get_best_mutable_version with
1850+        new_contents as an argument. I return a Deferred that eventually
1851+        fires with the results of my replacement process.
1852+        """
1853         return self._do_serialized(self._overwrite, new_contents)
1854hunk ./src/allmydata/mutable/filenode.py 519
1855+
1856+
1857     def _overwrite(self, new_contents):
1858hunk ./src/allmydata/mutable/filenode.py 522
1859+        """
1860+        I am the serialized sibling of overwrite.
1861+        """
1862+        d = self.get_best_mutable_version()
1863+        d.addCallback(lambda mfv: mfv.overwrite(new_contents))
1864+        d.addCallback(self._did_upload, new_contents.get_size())
1865+        return d
1866+
1867+
1868+
1869+    def upload(self, new_contents, servermap):
1870+        """
1871+        I overwrite the contents of the best recoverable version of this
1872+        mutable file with new_contents, using servermap instead of
1873+        creating/updating our own servermap. I return a Deferred that
1874+        fires with the results of my upload.
1875+        """
1876+        return self._do_serialized(self._upload, new_contents, servermap)
1877+
1878+
1879+    def modify(self, modifier, backoffer=None):
1880+        """
1881+        I modify the contents of the best recoverable version of this
1882+        mutable file with the modifier. This is equivalent to calling
1883+        modify on the result of get_best_mutable_version. I return a
1884+        Deferred that eventually fires with an UploadResults instance
1885+        describing this process.
1886+        """
1887+        return self._do_serialized(self._modify, modifier, backoffer)
1888+
1889+
1890+    def _modify(self, modifier, backoffer):
1891+        """
1892+        I am the serialized sibling of modify.
1893+        """
1894+        d = self.get_best_mutable_version()
1895+        d.addCallback(lambda mfv: mfv.modify(modifier, backoffer))
1896+        return d
1897+
1898+
1899+    def download_version(self, servermap, version, fetch_privkey=False):
1900+        """
1901+        Download the specified version of this mutable file. I return a
1902+        Deferred that fires with the contents of the specified version
1903+        as a bytestring, or errbacks if the file is not recoverable.
1904+        """
1905+        d = self.get_readable_version(servermap, version)
1906+        return d.addCallback(lambda mfv: mfv.download_to_data(fetch_privkey))
1907+
1908+
1909+    def get_servermap(self, mode):
1910+        """
1911+        I return a servermap that has been updated in mode.
1912+
1913+        mode should be one of MODE_READ, MODE_WRITE, MODE_CHECK or
1914+        MODE_ANYTHING. See servermap.py for more on what these mean.
1915+        """
1916+        return self._do_serialized(self._get_servermap, mode)
1917+
1918+
1919+    def _get_servermap(self, mode):
1920+        """
1921+        I am a serialized twin to get_servermap.
1922+        """
1923         servermap = ServerMap()
1924hunk ./src/allmydata/mutable/filenode.py 587
1925-        d = self._update_servermap(servermap, mode=MODE_WRITE)
1926-        d.addCallback(lambda ignored: self._upload(new_contents, servermap))
1927+        d = self._update_servermap(servermap, mode)
1928+        # The servermap will tell us about the most recent size of the
1929+        # file, so we may as well set that so that callers might get
1930+        # more data about us.
1931+        if not self._most_recent_size:
1932+            d.addCallback(self._get_size_from_servermap)
1933+        return d
1934+
1935+
1936+    def _get_size_from_servermap(self, servermap):
1937+        """
1938+        I extract the size of the best version of this file and record
1939+        it in self._most_recent_size. I return the servermap that I was
1940+        given.
1941+        """
1942+        if servermap.recoverable_versions():
1943+            v = servermap.best_recoverable_version()
1944+            size = v[4] # verinfo[4] == size
1945+            self._most_recent_size = size
1946+        return servermap
1947+
1948+
1949+    def _update_servermap(self, servermap, mode):
1950+        u = ServermapUpdater(self, self._storage_broker, Monitor(), servermap,
1951+                             mode)
1952+        if self._history:
1953+            self._history.notify_mapupdate(u.get_status())
1954+        return u.update()
1955+
1956+
1957+    def set_version(self, version):
1958+        # I can be set in two ways:
1959+        #  1. When the node is created.
1960+        #  2. (for an existing share) when the Servermap is updated
1961+        #     before I am read.
1962+        assert version in (MDMF_VERSION, SDMF_VERSION)
1963+        self._protocol_version = version
1964+
1965+
1966+    def get_version(self):
1967+        return self._protocol_version
1968+
1969+
1970+    def _do_serialized(self, cb, *args, **kwargs):
1971+        # note: to avoid deadlock, this callable is *not* allowed to invoke
1972+        # other serialized methods within this (or any other)
1973+        # MutableFileNode. The callable should be a bound method of this same
1974+        # MFN instance.
1975+        d = defer.Deferred()
1976+        self._serializer.addCallback(lambda ignore: cb(*args, **kwargs))
1977+        # we need to put off d.callback until this Deferred is finished being
1978+        # processed. Otherwise the caller's subsequent activities (like,
1979+        # doing other things with this node) can cause reentrancy problems in
1980+        # the Deferred code itself
1981+        self._serializer.addBoth(lambda res: eventually(d.callback, res))
1982+        # add a log.err just in case something really weird happens, because
1983+        # self._serializer stays around forever, therefore we won't see the
1984+        # usual Unhandled Error in Deferred that would give us a hint.
1985+        self._serializer.addErrback(log.err)
1986         return d
1987 
1988 
1989hunk ./src/allmydata/mutable/filenode.py 649
1990+    def _upload(self, new_contents, servermap):
1991+        """
1992+        A MutableFileNode still has to have some way of getting
1993+        published initially, which is what I am here for. After that,
1994+        all publishing, updating, modifying and so on happens through
1995+        MutableFileVersions.
1996+        """
1997+        assert self._pubkey, "update_servermap must be called before publish"
1998+
1999+        p = Publish(self, self._storage_broker, servermap)
2000+        if self._history:
2001+            self._history.notify_publish(p.get_status(),
2002+                                         new_contents.get_size())
2003+        d = p.publish(new_contents)
2004+        d.addCallback(self._did_upload, new_contents.get_size())
2005+        return d
2006+
2007+
2008+    def _did_upload(self, res, size):
2009+        self._most_recent_size = size
2010+        return res
2011+
2012+
2013+class MutableFileVersion:
2014+    """
2015+    I represent a specific version (most likely the best version) of a
2016+    mutable file.
2017+
2018+    Since I implement IReadable, instances which hold a
2019+    reference to an instance of me are guaranteed the ability (absent
2020+    connection difficulties or unrecoverable versions) to read the file
2021+    that I represent. Depending on whether I was initialized with a
2022+    write capability or not, I may also provide callers the ability to
2023+    overwrite or modify the contents of the mutable file that I
2024+    reference.
2025+    """
2026+    implements(IMutableFileVersion, IWritable)
2027+
2028+    def __init__(self,
2029+                 node,
2030+                 servermap,
2031+                 version,
2032+                 storage_index,
2033+                 storage_broker,
2034+                 readcap,
2035+                 writekey=None,
2036+                 write_secrets=None,
2037+                 history=None):
2038+
2039+        self._node = node
2040+        self._servermap = servermap
2041+        self._version = version
2042+        self._storage_index = storage_index
2043+        self._write_secrets = write_secrets
2044+        self._history = history
2045+        self._storage_broker = storage_broker
2046+
2047+        #assert isinstance(readcap, IURI)
2048+        self._readcap = readcap
2049+
2050+        self._writekey = writekey
2051+        self._serializer = defer.succeed(None)
2052+
2053+
2054+    def get_sequence_number(self):
2055+        """
2056+        Get the sequence number of the mutable version that I represent.
2057+        """
2058+        return self._version[0] # verinfo[0] == the sequence number
2059+
2060+
2061+    # TODO: Terminology?
2062+    def get_writekey(self):
2063+        """
2064+        I return a writekey or None if I don't have a writekey.
2065+        """
2066+        return self._writekey
2067+
2068+
2069+    def overwrite(self, new_contents):
2070+        """
2071+        I overwrite the contents of this mutable file version with the
2072+        data in new_contents.
2073+        """
2074+        assert not self.is_readonly()
2075+
2076+        return self._do_serialized(self._overwrite, new_contents)
2077+
2078+
2079+    def _overwrite(self, new_contents):
2080+        assert IMutableUploadable.providedBy(new_contents)
2081+        assert self._servermap.last_update_mode == MODE_WRITE
2082+
2083+        return self._upload(new_contents)
2084+
2085+
2086     def modify(self, modifier, backoffer=None):
2087         """I use a modifier callback to apply a change to the mutable file.
2088         I implement the following pseudocode::
2089hunk ./src/allmydata/mutable/filenode.py 785
2090         backoffer should not invoke any methods on this MutableFileNode
2091         instance, and it needs to be highly conscious of deadlock issues.
2092         """
2093+        assert not self.is_readonly()
2094+
2095         return self._do_serialized(self._modify, modifier, backoffer)
2096hunk ./src/allmydata/mutable/filenode.py 788
2097+
2098+
2099     def _modify(self, modifier, backoffer):
2100hunk ./src/allmydata/mutable/filenode.py 791
2101-        servermap = ServerMap()
2102         if backoffer is None:
2103             backoffer = BackoffAgent().delay
2104hunk ./src/allmydata/mutable/filenode.py 793
2105-        return self._modify_and_retry(servermap, modifier, backoffer, True)
2106-    def _modify_and_retry(self, servermap, modifier, backoffer, first_time):
2107-        d = self._modify_once(servermap, modifier, first_time)
2108+        return self._modify_and_retry(modifier, backoffer, True)
2109+
2110+
2111+    def _modify_and_retry(self, modifier, backoffer, first_time):
2112+        """
2113+        I try to apply modifier to the contents of this version of the
2114+        mutable file. If I succeed, I return an UploadResults instance
2115+        describing my success. If I fail, I try again after waiting for
2116+        a little bit.
2117+        """
2118+        log.msg("doing modify")
2119+        d = self._modify_once(modifier, first_time)
2120         def _retry(f):
2121             f.trap(UncoordinatedWriteError)
2122             d2 = defer.maybeDeferred(backoffer, self, f)
2123hunk ./src/allmydata/mutable/filenode.py 809
2124             d2.addCallback(lambda ignored:
2125-                           self._modify_and_retry(servermap, modifier,
2126+                           self._modify_and_retry(modifier,
2127                                                   backoffer, False))
2128             return d2
2129         d.addErrback(_retry)
2130hunk ./src/allmydata/mutable/filenode.py 814
2131         return d
2132-    def _modify_once(self, servermap, modifier, first_time):
2133-        d = self._update_servermap(servermap, MODE_WRITE)
2134-        d.addCallback(self._once_updated_download_best_version, servermap)
2135+
2136+
2137+    def _modify_once(self, modifier, first_time):
2138+        """
2139+        I attempt to apply a modifier to the contents of the mutable
2140+        file.
2141+        """
2142+        # XXX: This is wrong -- we could get more servers if we updated
2143+        # in MODE_ANYTHING and possibly MODE_CHECK. Probably we want to
2144+        # assert that the last update wasn't MODE_READ
2145+        assert self._servermap.last_update_mode == MODE_WRITE
2146+
2147+        # download_to_data is serialized, so we have to call this to
2148+        # avoid deadlock.
2149+        d = self._try_to_download_data()
2150         def _apply(old_contents):
2151hunk ./src/allmydata/mutable/filenode.py 830
2152-            new_contents = modifier(old_contents, servermap, first_time)
2153+            new_contents = modifier(old_contents, self._servermap, first_time)
2154+            precondition((isinstance(new_contents, str) or
2155+                          new_contents is None),
2156+                         "Modifier function must return a string "
2157+                         "or None")
2158+
2159             if new_contents is None or new_contents == old_contents:
2160hunk ./src/allmydata/mutable/filenode.py 837
2161+                log.msg("no changes")
2162                 # no changes need to be made
2163                 if first_time:
2164                     return
2165hunk ./src/allmydata/mutable/filenode.py 845
2166                 # recovery when it observes UCWE, we need to do a second
2167                 # publish. See #551 for details. We'll basically loop until
2168                 # we managed an uncontested publish.
2169-                new_contents = old_contents
2170-            precondition(isinstance(new_contents, str),
2171-                         "Modifier function must return a string or None")
2172-            return self._upload(new_contents, servermap)
2173+                old_uploadable = MutableData(old_contents)
2174+                new_contents = old_uploadable
2175+            else:
2176+                new_contents = MutableData(new_contents)
2177+
2178+            return self._upload(new_contents)
2179         d.addCallback(_apply)
2180         return d
2181 
2182hunk ./src/allmydata/mutable/filenode.py 854
2183-    def get_servermap(self, mode):
2184-        return self._do_serialized(self._get_servermap, mode)
2185-    def _get_servermap(self, mode):
2186-        servermap = ServerMap()
2187-        return self._update_servermap(servermap, mode)
2188-    def _update_servermap(self, servermap, mode):
2189-        u = ServermapUpdater(self, self._storage_broker, Monitor(), servermap,
2190-                             mode)
2191-        if self._history:
2192-            self._history.notify_mapupdate(u.get_status())
2193-        return u.update()
2194 
2195hunk ./src/allmydata/mutable/filenode.py 855
2196-    def download_version(self, servermap, version, fetch_privkey=False):
2197-        return self._do_serialized(self._try_once_to_download_version,
2198-                                   servermap, version, fetch_privkey)
2199-    def _try_once_to_download_version(self, servermap, version,
2200-                                      fetch_privkey=False):
2201-        r = Retrieve(self, servermap, version, fetch_privkey)
2202+    def is_readonly(self):
2203+        """
2204+        I return True if this MutableFileVersion provides no write
2205+        access to the file that it encapsulates, and False if it
2206+        provides the ability to modify the file.
2207+        """
2208+        return self._writekey is None
2209+
2210+
2211+    def is_mutable(self):
2212+        """
2213+        I return True, since mutable files are always mutable by
2214+        somebody.
2215+        """
2216+        return True
2217+
2218+
2219+    def get_storage_index(self):
2220+        """
2221+        I return the storage index of the reference that I encapsulate.
2222+        """
2223+        return self._storage_index
2224+
2225+
2226+    def get_size(self):
2227+        """
2228+        I return the length, in bytes, of this readable object.
2229+        """
2230+        return self._servermap.size_of_version(self._version)
2231+
2232+
2233+    def download_to_data(self, fetch_privkey=False):
2234+        """
2235+        I return a Deferred that fires with the contents of this
2236+        readable object as a byte string.
2237+
2238+        """
2239+        c = consumer.MemoryConsumer()
2240+        d = self.read(c, fetch_privkey=fetch_privkey)
2241+        d.addCallback(lambda mc: "".join(mc.chunks))
2242+        return d
2243+
2244+
2245+    def _try_to_download_data(self):
2246+        """
2247+        I am an unserialized cousin of download_to_data; I am called
2248+        from the children of modify() to download the data associated
2249+        with this mutable version.
2250+        """
2251+        c = consumer.MemoryConsumer()
2252+        # modify will almost certainly write, so we need the privkey.
2253+        d = self._read(c, fetch_privkey=True)
2254+        d.addCallback(lambda mc: "".join(mc.chunks))
2255+        return d
2256+
2257+
2258+    def read(self, consumer, offset=0, size=None, fetch_privkey=False):
2259+        """
2260+        I read a portion (possibly all) of the mutable file that I
2261+        reference into consumer.
2262+        """
2263+        return self._do_serialized(self._read, consumer, offset, size,
2264+                                   fetch_privkey)
2265+
2266+
2267+    def _read(self, consumer, offset=0, size=None, fetch_privkey=False):
2268+        """
2269+        I am the serialized companion of read.
2270+        """
2271+        r = Retrieve(self._node, self._servermap, self._version, fetch_privkey)
2272         if self._history:
2273             self._history.notify_retrieve(r.get_status())
2274hunk ./src/allmydata/mutable/filenode.py 927
2275-        d = r.download()
2276-        d.addCallback(self._downloaded_version)
2277+        d = r.download(consumer, offset, size)
2278         return d
2279hunk ./src/allmydata/mutable/filenode.py 929
2280-    def _downloaded_version(self, data):
2281-        self._most_recent_size = len(data)
2282-        return data
2283 
2284hunk ./src/allmydata/mutable/filenode.py 930
2285-    def upload(self, new_contents, servermap):
2286-        return self._do_serialized(self._upload, new_contents, servermap)
2287-    def _upload(self, new_contents, servermap):
2288-        assert self._pubkey, "update_servermap must be called before publish"
2289-        p = Publish(self, self._storage_broker, servermap)
2290+
2291+    def _do_serialized(self, cb, *args, **kwargs):
2292+        # note: to avoid deadlock, this callable is *not* allowed to invoke
2293+        # other serialized methods within this (or any other)
2294+        # MutableFileNode. The callable should be a bound method of this same
2295+        # MFN instance.
2296+        d = defer.Deferred()
2297+        self._serializer.addCallback(lambda ignore: cb(*args, **kwargs))
2298+        # we need to put off d.callback until this Deferred is finished being
2299+        # processed. Otherwise the caller's subsequent activities (like,
2300+        # doing other things with this node) can cause reentrancy problems in
2301+        # the Deferred code itself
2302+        self._serializer.addBoth(lambda res: eventually(d.callback, res))
2303+        # add a log.err just in case something really weird happens, because
2304+        # self._serializer stays around forever, therefore we won't see the
2305+        # usual Unhandled Error in Deferred that would give us a hint.
2306+        self._serializer.addErrback(log.err)
2307+        return d
2308+
2309+
2310+    def _upload(self, new_contents):
2311+        #assert self._pubkey, "update_servermap must be called before publish"
2312+        p = Publish(self._node, self._storage_broker, self._servermap)
2313         if self._history:
2314hunk ./src/allmydata/mutable/filenode.py 954
2315-            self._history.notify_publish(p.get_status(), len(new_contents))
2316+            self._history.notify_publish(p.get_status(),
2317+                                         new_contents.get_size())
2318         d = p.publish(new_contents)
2319hunk ./src/allmydata/mutable/filenode.py 957
2320-        d.addCallback(self._did_upload, len(new_contents))
2321+        d.addCallback(self._did_upload, new_contents.get_size())
2322         return d
2323hunk ./src/allmydata/mutable/filenode.py 959
2324+
2325+
2326     def _did_upload(self, res, size):
2327         self._most_recent_size = size
2328         return res
2329hunk ./src/allmydata/mutable/filenode.py 964
2330+
2331+    def update(self, data, offset):
2332+        """
2333+        Do an update of this mutable file version by inserting data at
2334+        offset within the file. If offset is the EOF, this is an append
2335+        operation. I return a Deferred that fires with the results of
2336+        the update operation when it has completed.
2337+
2338+        In cases where update does not append any data, or where it does
2339+        not append so many blocks that the block count crosses a
2340+        power-of-two boundary, this operation will use roughly
2341+        O(data.get_size()) memory/bandwidth/CPU to perform the update.
2342+        Otherwise, it must download, re-encode, and upload the entire
2343+        file again, which will use O(filesize) resources.
2344+        """
2345+        return self._do_serialized(self._update, data, offset)
2346+
2347+
2348+    def _update(self, data, offset):
2349+        """
2350+        I update the mutable file version represented by this particular
2351+        IMutableVersion by inserting the data in data at the offset
2352+        offset. I return a Deferred that fires when this has been
2353+        completed.
2354+        """
2355+        # We have two cases here:
2356+        # 1. The new data will add few enough segments so that it does
2357+        #    not cross into the next power-of-two boundary.
2358+        # 2. It doesn't.
2359+        #
2360+        # In the former case, we can modify the file in place. In the
2361+        # latter case, we need to re-encode the file.
2362+        new_size = data.get_size() + offset
2363+        old_size = self.get_size()
2364+        segment_size = self._version[3]
2365+        num_old_segments = mathutil.div_ceil(old_size,
2366+                                             segment_size)
2367+        num_new_segments = mathutil.div_ceil(new_size,
2368+                                             segment_size)
2369+        log.msg("got %d old segments, %d new segments" % \
2370+                        (num_old_segments, num_new_segments))
2371+
2372+        # We also do a whole file re-encode if the file is an SDMF file.
2373+        if self._version[2]: # version[2] == SDMF salt, which MDMF lacks
2374+            log.msg("doing re-encode instead of in-place update")
2375+            return self._do_modify_update(data, offset)
2376+
2377+        log.msg("updating in place")
2378+        d = self._do_update_update(data, offset)
2379+        d.addCallback(self._decode_and_decrypt_segments, data, offset)
2380+        d.addCallback(self._build_uploadable_and_finish, data, offset)
2381+        return d
2382+
2383+
2384+    def _do_modify_update(self, data, offset):
2385+        """
2386+        I perform a file update by modifying the contents of the file
2387+        after downloading it, then reuploading it. I am less efficient
2388+        than _do_update_update, but am necessary for certain updates.
2389+        """
2390+        def m(old, servermap, first_time):
2391+            start = offset
2392+            rest = offset + data.get_size()
2393+            new = old[:start]
2394+            new += "".join(data.read(data.get_size()))
2395+            new += old[rest:]
2396+            return new
2397+        return self._modify(m, None)
2398+
2399+
2400+    def _do_update_update(self, data, offset):
2401+        """
2402+        I start the Servermap update that gets us the data we need to
2403+        continue the update process. I return a Deferred that fires when
2404+        the servermap update is done.
2405+        """
2406+        assert IMutableUploadable.providedBy(data)
2407+        assert self.is_mutable()
2408+        # offset == self.get_size() is valid and means that we are
2409+        # appending data to the file.
2410+        assert offset <= self.get_size()
2411+
2412+        # We'll need the segment that the data starts in, regardless of
2413+        # what we'll do later.
2414+        start_segment = mathutil.div_ceil(offset, DEFAULT_MAX_SEGMENT_SIZE)
2415+        start_segment -= 1
2416+
2417+        # We only need the end segment if the data we append does not go
2418+        # beyond the current end-of-file.
2419+        end_segment = start_segment
2420+        if offset + data.get_size() < self.get_size():
2421+            end_data = offset + data.get_size()
2422+            end_segment = mathutil.div_ceil(end_data, DEFAULT_MAX_SEGMENT_SIZE)
2423+            end_segment -= 1
2424+        self._start_segment = start_segment
2425+        self._end_segment = end_segment
2426+
2427+        # Now ask for the servermap to be updated in MODE_WRITE with
2428+        # this update range.
2429+        u = ServermapUpdater(self._node, self._storage_broker, Monitor(),
2430+                             self._servermap,
2431+                             mode=MODE_WRITE,
2432+                             update_range=(start_segment, end_segment))
2433+        return u.update()
2434+
2435+
2436+    def _decode_and_decrypt_segments(self, ignored, data, offset):
2437+        """
2438+        After the servermap update, I take the encrypted and encoded
2439+        data that the servermap fetched while doing its update and
2440+        transform it into decoded-and-decrypted plaintext that can be
2441+        used by the new uploadable. I return a Deferred that fires with
2442+        the segments.
2443+        """
2444+        r = Retrieve(self._node, self._servermap, self._version)
2445+        # decode: takes in our blocks and salts from the servermap,
2446+        # returns a Deferred that fires with the corresponding plaintext
2447+        # segments. Does not download -- simply takes advantage of
2448+        # existing infrastructure within the Retrieve class to avoid
2449+        # duplicating code.
2450+        sm = self._servermap
2451+        # XXX: If the methods in the servermap don't work as
2452+        # abstractions, you should rewrite them instead of going around
2453+        # them.
2454+        update_data = sm.update_data
2455+        start_segments = {} # shnum -> start segment
2456+        end_segments = {} # shnum -> end segment
2457+        blockhashes = {} # shnum -> blockhash tree
2458+        for (shnum, data) in update_data.iteritems():
2459+            data = [d[1] for d in data if d[0] == self._version]
2460+
2461+            # Every entry in our list should now be the update data for
2462+            # share shnum of a particular version of the mutable file, so
2463+            # all of the entries should be identical.
2464+            datum = data[0]
2465+            assert filter(lambda x: x != datum, data) == []
2466+
2467+            blockhashes[shnum] = datum[0]
2468+            start_segments[shnum] = datum[1]
2469+            end_segments[shnum] = datum[2]
2470+
2471+        d1 = r.decode(start_segments, self._start_segment)
2472+        d2 = r.decode(end_segments, self._end_segment)
2473+        d3 = defer.succeed(blockhashes)
2474+        return deferredutil.gatherResults([d1, d2, d3])
2475+
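
The filtering loop above can be read as a small helper: keep only the update data recorded for the version being updated, and insist that all surviving entries agree. A standalone restatement (hypothetical helper name, not part of the patch):

    def pick_version_data(entries, wanted_verinfo):
        # entries is a list of (verinfo, datum) pairs for one share number,
        # where datum is (blockhashes, start-segment data, end-segment data).
        data = [datum for (verinfo, datum) in entries if verinfo == wanted_verinfo]
        assert data and all(d == data[0] for d in data)
        return data[0]
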
2476+
2477+    def _build_uploadable_and_finish(self, segments_and_bht, data, offset):
2478+        """
2479+        After the process has the plaintext segments, I build the
2480+        TransformingUploadable that the publisher will eventually
2481+        re-upload to the grid. I then invoke the publisher with that
2482+        uploadable, and return a Deferred that fires when the publish
2483+        operation has completed without issue.
2484+        """
2485+        u = TransformingUploadable(data, offset,
2486+                                   self._version[3],
2487+                                   segments_and_bht[0],
2488+                                   segments_and_bht[1])
2489+        p = Publish(self._node, self._storage_broker, self._servermap)
2490+        return p.update(u, offset, segments_and_bht[2], self._version)
2491}
2492[mutable/publish.py: Modify the publish process to support MDMF
2493Kevan Carstensen <kevan@isnotajoke.com>**20100819003342
2494 Ignore-this: 2bb379974927e2e20cff75bae8302d1d
2495 
2496 The inner workings of the publishing process needed to be reworked to a
2497 large extent to cope with segmented mutable files, and to cope with
2498 partial-file updates of mutable files. This patch does that. It also
2499 introduces wrappers for uploadable data, allowing the use of
2500 filehandle-like objects as data sources, in addition to strings. This
2501 reduces memory usage when dealing with large files through the
2502 webapi, and clarifies the update code there.
2503] {
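
The uploadable wrappers mentioned above give the publisher a small file-like interface: get_size() plus read(length), where read returns a list of strings that the caller joins (see the "".join(...) calls in the hunks below). A minimal illustration of that shape (hypothetical class name, not the patch's own wrapper):

    class StringUploadable:
        """Toy stand-in for an IMutableUploadable-style data source."""
        def __init__(self, s):
            self._data = s
            self._pos = 0

        def get_size(self):
            return len(self._data)

        def read(self, length):
            # Return a list of strings, as the publisher expects.
            chunk = self._data[self._pos:self._pos + length]
            self._pos += len(chunk)
            return [chunk]
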
2504hunk ./src/allmydata/mutable/publish.py 3
2505 
2506 
2507-import os, struct, time
2508+import os, time
2509+from StringIO import StringIO
2510 from itertools import count
2511 from zope.interface import implements
2512 from twisted.internet import defer
2513hunk ./src/allmydata/mutable/publish.py 9
2514 from twisted.python import failure
2515-from allmydata.interfaces import IPublishStatus
2516+from allmydata.interfaces import IPublishStatus, SDMF_VERSION, MDMF_VERSION, \
2517+                                 IMutableUploadable
2518 from allmydata.util import base32, hashutil, mathutil, idlib, log
2519 from allmydata.util.dictutil import DictOfSets
2520 from allmydata import hashtree, codec
2521hunk ./src/allmydata/mutable/publish.py 21
2522 from allmydata.mutable.common import MODE_WRITE, MODE_CHECK, \
2523      UncoordinatedWriteError, NotEnoughServersError
2524 from allmydata.mutable.servermap import ServerMap
2525-from allmydata.mutable.layout import pack_prefix, pack_share, unpack_header, pack_checkstring, \
2526-     unpack_checkstring, SIGNED_PREFIX
2527+from allmydata.mutable.layout import unpack_checkstring, MDMFSlotWriteProxy, \
2528+                                     SDMFSlotWriteProxy
2529+
2530+KiB = 1024
2531+DEFAULT_MAX_SEGMENT_SIZE = 128 * KiB
2532+PUSHING_BLOCKS_STATE = 0
2533+PUSHING_EVERYTHING_ELSE_STATE = 1
2534+DONE_STATE = 2
2535 
2536 class PublishStatus:
2537     implements(IPublishStatus)
2538hunk ./src/allmydata/mutable/publish.py 118
2539         self._status.set_helper(False)
2540         self._status.set_progress(0.0)
2541         self._status.set_active(True)
2542+        self._version = self._node.get_version()
2543+        assert self._version in (SDMF_VERSION, MDMF_VERSION)
2544+
2545 
2546     def get_status(self):
2547         return self._status
2548hunk ./src/allmydata/mutable/publish.py 132
2549             kwargs["facility"] = "tahoe.mutable.publish"
2550         return log.msg(*args, **kwargs)
2551 
2552+
2553+    def update(self, data, offset, blockhashes, version):
2554+        """
2555+        I replace the contents of this file with the contents of data,
2556+        starting at offset. I return a Deferred that fires with None
2557+        when the replacement has been completed, or with an error if
2558+        something went wrong during the process.
2559+
2560+        Note that this process will not upload new shares. If the file
2561+        being updated is in need of repair, callers will have to repair
2562+        it on their own.
2563+        """
2564+        # How this works:
2565+        # 1. Make peer assignments. We'll assign each share that we know
2566+        # about on the grid to the peer that currently holds that
2567+        # share, and will not place any new shares.
2568+        # 2. Set up encoding parameters. Most of these will stay the same
2569+        # -- datalength will change, as will some of the offsets.
2570+        # 3. Upload the new segments.
2571+        # 4. Be done.
2572+        assert IMutableUploadable.providedBy(data)
2573+
2574+        self.data = data
2575+
2576+        # XXX: Use the MutableFileVersion instead.
2577+        self.datalength = self._node.get_size()
2578+        if data.get_size() > self.datalength:
2579+            self.datalength = data.get_size()
2580+
2581+        self.log("starting update")
2582+        self.log("adding new data of length %d at offset %d" % \
2583+                    (data.get_size(), offset))
2584+        self.log("new data length is %d" % self.datalength)
2585+        self._status.set_size(self.datalength)
2586+        self._status.set_status("Started")
2587+        self._started = time.time()
2588+
2589+        self.done_deferred = defer.Deferred()
2590+
2591+        self._writekey = self._node.get_writekey()
2592+        assert self._writekey, "need write capability to publish"
2593+
2594+        # first, which servers will we publish to? We require that the
2595+        # servermap was updated in MODE_WRITE, so we can depend upon the
2596+        # peerlist computed by that process instead of computing our own.
2597+        assert self._servermap
2598+        assert self._servermap.last_update_mode in (MODE_WRITE, MODE_CHECK)
2599+        # we will push a version that is one larger than anything present
2600+        # in the grid, according to the servermap.
2601+        self._new_seqnum = self._servermap.highest_seqnum() + 1
2602+        self._status.set_servermap(self._servermap)
2603+
2604+        self.log(format="new seqnum will be %(seqnum)d",
2605+                 seqnum=self._new_seqnum, level=log.NOISY)
2606+
2607+        # We're updating an existing file, so all of the following
2608+        # should be available.
2609+        self.readkey = self._node.get_readkey()
2610+        self.required_shares = self._node.get_required_shares()
2611+        assert self.required_shares is not None
2612+        self.total_shares = self._node.get_total_shares()
2613+        assert self.total_shares is not None
2614+        self._status.set_encoding(self.required_shares, self.total_shares)
2615+
2616+        self._pubkey = self._node.get_pubkey()
2617+        assert self._pubkey
2618+        self._privkey = self._node.get_privkey()
2619+        assert self._privkey
2620+        self._encprivkey = self._node.get_encprivkey()
2621+
2622+        sb = self._storage_broker
2623+        full_peerlist = sb.get_servers_for_index(self._storage_index)
2624+        self.full_peerlist = full_peerlist # for use later, immutable
2625+        self.bad_peers = set() # peerids who have errbacked/refused requests
2626+
2627+        # This will set self.segment_size, self.num_segments, and
2628+        # self.fec. TODO: Does it know how to do the offset? Probably
2629+        # not. So do that part next.
2630+        self.setup_encoding_parameters(offset=offset)
2631+
2632+        # if we experience any surprises (writes which were rejected because
2633+        # our test vector did not match, or shares which we didn't expect to
2634+        # see), we set this flag and report an UncoordinatedWriteError at the
2635+        # end of the publish process.
2636+        self.surprised = False
2637+
2638+        # we keep track of three tables. The first is our goal: which share
2639+        # we want to see on which servers. This is initially populated by the
2640+        # existing servermap.
2641+        self.goal = set() # pairs of (peerid, shnum) tuples
2642+
2643+        # the second table is our list of outstanding queries: those which
2644+        # are in flight and may or may not be delivered, accepted, or
2645+        # acknowledged. Items are added to this table when the request is
2646+        # sent, and removed when the response returns (or errbacks).
2647+        self.outstanding = set() # (peerid, shnum) tuples
2648+
2649+        # the third is a table of successes: shares which have actually been
2650+        # placed. These are populated when responses come back with success.
2651+        # When self.placed == self.goal, we're done.
2652+        self.placed = set() # (peerid, shnum) tuples
2653+
2654+        # we also keep a mapping from peerid to RemoteReference. Each time we
2655+        # pull a connection out of the full peerlist, we add it to this for
2656+        # use later.
2657+        self.connections = {}
2658+
2659+        self.bad_share_checkstrings = {}
2660+
2661+        # This is set at the last step of the publishing process.
2662+        self.versioninfo = ""
2663+
2664+        # we use the servermap to populate the initial goal: this way we will
2665+        # try to update each existing share in place. Since we're
2666+        # updating, we ignore damaged and missing shares -- callers must
2667+        # run a repair operation to recreate these.
2668+        for (peerid, shnum) in self._servermap.servermap:
2669+            self.goal.add( (peerid, shnum) )
2670+            self.connections[peerid] = self._servermap.connections[peerid]
2671+        self.writers = {}
2672+
2673+        # SDMF files are updated via a whole-file re-encode, so we are MDMF here.
2674+        self._version = MDMF_VERSION
2675+        writer_class = MDMFSlotWriteProxy
2676+
2677+        # For each (peerid, shnum) in self.goal, we make a
2678+        # write proxy for that peer. We'll use this to write
2679+        # shares to the peer.
2680+        for key in self.goal:
2681+            peerid, shnum = key
2682+            write_enabler = self._node.get_write_enabler(peerid)
2683+            renew_secret = self._node.get_renewal_secret(peerid)
2684+            cancel_secret = self._node.get_cancel_secret(peerid)
2685+            secrets = (write_enabler, renew_secret, cancel_secret)
2686+
2687+            self.writers[shnum] =  writer_class(shnum,
2688+                                                self.connections[peerid],
2689+                                                self._storage_index,
2690+                                                secrets,
2691+                                                self._new_seqnum,
2692+                                                self.required_shares,
2693+                                                self.total_shares,
2694+                                                self.segment_size,
2695+                                                self.datalength)
2696+            self.writers[shnum].peerid = peerid
2697+            assert (peerid, shnum) in self._servermap.servermap
2698+            old_versionid, old_timestamp = self._servermap.servermap[key]
2699+            (old_seqnum, old_root_hash, old_salt, old_segsize,
2700+             old_datalength, old_k, old_N, old_prefix,
2701+             old_offsets_tuple) = old_versionid
2702+            self.writers[shnum].set_checkstring(old_seqnum,
2703+                                                old_root_hash,
2704+                                                old_salt)
2705+
2706+        # Our remote shares will not have a complete checkstring until
2707+        # after we are done writing share data and have started to write
2708+        # blocks. In the meantime, we need to know what to look for when
2709+        # writing, so that we can detect UncoordinatedWriteErrors.
2710+        self._checkstring = self.writers.values()[0].get_checkstring()
2711+
2712+        # Now, we start pushing shares.
2713+        self._status.timings["setup"] = time.time() - self._started
2714+        # First, we encrypt, encode, and publish the shares that we need
2715+        # to encrypt, encode, and publish.
2716+
2717+        # Our update process fetched these for us. We need to update
2718+        # them in place as publishing happens.
2719+        self.blockhashes = {} # shnum -> [blockhashes]
2720+        for (i, bht) in blockhashes.iteritems():
2721+            # We need to extract the leaves from our old hash tree.
2722+            old_segcount = mathutil.div_ceil(version[4],
2723+                                             version[3])
2724+            h = hashtree.IncompleteHashTree(old_segcount)
2725+            bht = dict(enumerate(bht))
2726+            h.set_hashes(bht)
2727+            leaves = h[h.get_leaf_index(0):]
2728+            for j in xrange(self.num_segments - len(leaves)):
2729+                leaves.append(None)
2730+
2731+            assert len(leaves) >= self.num_segments
2732+            self.blockhashes[i] = leaves
2733+            # This list will now be the leaves that were set during the
2734+            # initial upload + enough empty hashes to make it a
2735+            # power-of-two. If we exceed a power of two boundary, we
2736+            # should be encoding the file over again, and should not be
2737+            # here. So, we have
2738+            #assert len(self.blockhashes[i]) == \
2739+            #    hashtree.roundup_pow2(self.num_segments), \
2740+            #        len(self.blockhashes[i])
2741+            # XXX: Except this doesn't work. Figure out why.
2742+
2743+        # These are filled in later, after we've modified the block hash
2744+        # tree suitably.
2745+        self.sharehash_leaves = None # eventually [sharehashes]
2746+        self.sharehashes = {} # shnum -> [sharehash leaves necessary to
2747+                              # validate the share]
2748+
2749+        self.log("Starting push")
2750+
2751+        self._state = PUSHING_BLOCKS_STATE
2752+        self._push()
2753+
2754+        return self.done_deferred
2755+
2756+
2757     def publish(self, newdata):
2758         """Publish the filenode's current contents.  Returns a Deferred that
2759         fires (with None) when the publish has done as much work as it's ever
2760hunk ./src/allmydata/mutable/publish.py 344
2761         simultaneous write.
2762         """
2763 
2764-        # 1: generate shares (SDMF: files are small, so we can do it in RAM)
2765-        # 2: perform peer selection, get candidate servers
2766-        #  2a: send queries to n+epsilon servers, to determine current shares
2767-        #  2b: based upon responses, create target map
2768-        # 3: send slot_testv_and_readv_and_writev messages
2769-        # 4: as responses return, update share-dispatch table
2770-        # 4a: may need to run recovery algorithm
2771-        # 5: when enough responses are back, we're done
2772+        # 0. Setup encoding parameters, encoder, and other such things.
2773+        # 1. Encrypt, encode, and publish segments.
2774+        assert IMutableUploadable.providedBy(newdata)
2775 
2776hunk ./src/allmydata/mutable/publish.py 348
2777-        self.log("starting publish, datalen is %s" % len(newdata))
2778-        self._status.set_size(len(newdata))
2779+        self.data = newdata
2780+        self.datalength = newdata.get_size()
2781+        #if self.datalength >= DEFAULT_MAX_SEGMENT_SIZE:
2782+        #    self._version = MDMF_VERSION
2783+        #else:
2784+        #    self._version = SDMF_VERSION
2785+
2786+        self.log("starting publish, datalen is %s" % self.datalength)
2787+        self._status.set_size(self.datalength)
2788         self._status.set_status("Started")
2789         self._started = time.time()
2790 
2791hunk ./src/allmydata/mutable/publish.py 405
2792         self.full_peerlist = full_peerlist # for use later, immutable
2793         self.bad_peers = set() # peerids who have errbacked/refused requests
2794 
2795-        self.newdata = newdata
2796-        self.salt = os.urandom(16)
2797-
2798+        # This will set self.segment_size, self.num_segments, and
2799+        # self.fec.
2800         self.setup_encoding_parameters()
2801 
2802         # if we experience any surprises (writes which were rejected because
2803hunk ./src/allmydata/mutable/publish.py 415
2804         # end of the publish process.
2805         self.surprised = False
2806 
2807-        # as a failsafe, refuse to iterate through self.loop more than a
2808-        # thousand times.
2809-        self.looplimit = 1000
2810-
2811         # we keep track of three tables. The first is our goal: which share
2812         # we want to see on which servers. This is initially populated by the
2813         # existing servermap.
2814hunk ./src/allmydata/mutable/publish.py 438
2815 
2816         self.bad_share_checkstrings = {}
2817 
2818+        # This is set at the last step of the publishing process.
2819+        self.versioninfo = ""
2820+
2821         # we use the servermap to populate the initial goal: this way we will
2822         # try to update each existing share in place.
2823         for (peerid, shnum) in self._servermap.servermap:
2824hunk ./src/allmydata/mutable/publish.py 454
2825             self.bad_share_checkstrings[key] = old_checkstring
2826             self.connections[peerid] = self._servermap.connections[peerid]
2827 
2828-        # create the shares. We'll discard these as they are delivered. SDMF:
2829-        # we're allowed to hold everything in memory.
2830+        # TODO: Make this part do peer selection.
2831+        self.update_goal()
2832+        self.writers = {}
2833+        if self._version == MDMF_VERSION:
2834+            writer_class = MDMFSlotWriteProxy
2835+        else:
2836+            writer_class = SDMFSlotWriteProxy
2837 
2838hunk ./src/allmydata/mutable/publish.py 462
2839+        # For each (peerid, shnum) in self.goal, we make a
2840+        # write proxy for that peer. We'll use this to write
2841+        # shares to the peer.
2842+        for key in self.goal:
2843+            peerid, shnum = key
2844+            write_enabler = self._node.get_write_enabler(peerid)
2845+            renew_secret = self._node.get_renewal_secret(peerid)
2846+            cancel_secret = self._node.get_cancel_secret(peerid)
2847+            secrets = (write_enabler, renew_secret, cancel_secret)
2848+
2849+            self.writers[shnum] =  writer_class(shnum,
2850+                                                self.connections[peerid],
2851+                                                self._storage_index,
2852+                                                secrets,
2853+                                                self._new_seqnum,
2854+                                                self.required_shares,
2855+                                                self.total_shares,
2856+                                                self.segment_size,
2857+                                                self.datalength)
2858+            self.writers[shnum].peerid = peerid
2859+            if (peerid, shnum) in self._servermap.servermap:
2860+                old_versionid, old_timestamp = self._servermap.servermap[key]
2861+                (old_seqnum, old_root_hash, old_salt, old_segsize,
2862+                 old_datalength, old_k, old_N, old_prefix,
2863+                 old_offsets_tuple) = old_versionid
2864+                self.writers[shnum].set_checkstring(old_seqnum,
2865+                                                    old_root_hash,
2866+                                                    old_salt)
2867+            elif (peerid, shnum) in self.bad_share_checkstrings:
2868+                old_checkstring = self.bad_share_checkstrings[(peerid, shnum)]
2869+                self.writers[shnum].set_checkstring(old_checkstring)
2870+
2871+        # Our remote shares will not have a complete checkstring until
2872+        # after we are done writing share data and have started to write
2873+        # blocks. In the meantime, we need to know what to look for when
2874+        # writing, so that we can detect UncoordinatedWriteErrors.
2875+        self._checkstring = self.writers.values()[0].get_checkstring()
2876+
2877+        # Now, we start pushing shares.
2878         self._status.timings["setup"] = time.time() - self._started
2879hunk ./src/allmydata/mutable/publish.py 502
2880-        d = self._encrypt_and_encode()
2881-        d.addCallback(self._generate_shares)
2882-        def _start_pushing(res):
2883-            self._started_pushing = time.time()
2884-            return res
2885-        d.addCallback(_start_pushing)
2886-        d.addCallback(self.loop) # trigger delivery
2887-        d.addErrback(self._fatal_error)
2888+        # First, we encrypt, encode, and publish the shares that we need
2889+        # to encrypt, encode, and publish.
2890+
2891+        # This will eventually hold the block hash chain for each share
2892+        # that we publish. We define it this way so that empty publishes
2893+        # will still have something to write to the remote slot.
2894+        self.blockhashes = dict([(i, []) for i in xrange(self.total_shares)])
2895+        for i in xrange(self.total_shares):
2896+            blocks = self.blockhashes[i]
2897+            for j in xrange(self.num_segments):
2898+                blocks.append(None)
2899+        self.sharehash_leaves = None # eventually [sharehashes]
2900+        self.sharehashes = {} # shnum -> [sharehash leaves necessary to
2901+                              # validate the share]
2902+
2903+        self.log("Starting push")
2904+
2905+        self._state = PUSHING_BLOCKS_STATE
2906+        self._push()
2907 
2908         return self.done_deferred
2909 
2910hunk ./src/allmydata/mutable/publish.py 524
2911-    def setup_encoding_parameters(self):
2912-        segment_size = len(self.newdata)
2913+
2914+    def _update_status(self):
2915+        self._status.set_status("Sending Shares: %d placed out of %d, "
2916+                                "%d messages outstanding" %
2917+                                (len(self.placed),
2918+                                 len(self.goal),
2919+                                 len(self.outstanding)))
2920+        self._status.set_progress(1.0 * len(self.placed) / len(self.goal))
2921+
2922+
2923+    def setup_encoding_parameters(self, offset=0):
2924+        if self._version == MDMF_VERSION:
2925+            segment_size = DEFAULT_MAX_SEGMENT_SIZE # 128 KiB by default
2926+        else:
2927+            segment_size = self.datalength # SDMF is only one segment
2928         # this must be a multiple of self.required_shares
2929         segment_size = mathutil.next_multiple(segment_size,
2930                                               self.required_shares)
2931hunk ./src/allmydata/mutable/publish.py 543
2932         self.segment_size = segment_size
2933+
2934+        # Calculate the starting segment for the upload.
2935         if segment_size:
2936hunk ./src/allmydata/mutable/publish.py 546
2937-            self.num_segments = mathutil.div_ceil(len(self.newdata),
2938+            self.num_segments = mathutil.div_ceil(self.datalength,
2939                                                   segment_size)
2940hunk ./src/allmydata/mutable/publish.py 548
2941+            self.starting_segment = mathutil.div_ceil(offset,
2942+                                                      segment_size)
2943+            self.starting_segment -= 1
2944+            if offset == 0:
2945+                self.starting_segment = 0
2946+
2947         else:
2948             self.num_segments = 0
2949hunk ./src/allmydata/mutable/publish.py 556
2950-        assert self.num_segments in [0, 1,] # SDMF restrictions
2951+            self.starting_segment = 0
2952+
2953+
2954+        self.log("building encoding parameters for file")
2955+        self.log("got segsize %d" % self.segment_size)
2956+        self.log("got %d segments" % self.num_segments)
2957+
2958+        if self._version == SDMF_VERSION:
2959+            assert self.num_segments in (0, 1) # SDMF
2960+        # calculate the tail segment size.
2961+
2962+        if segment_size and self.datalength:
2963+            self.tail_segment_size = self.datalength % segment_size
2964+            self.log("got tail segment size %d" % self.tail_segment_size)
2965+        else:
2966+            self.tail_segment_size = 0
2967+
2968+        if self.tail_segment_size == 0 and segment_size:
2969+            # The tail segment is the same size as the other segments.
2970+            self.tail_segment_size = segment_size
2971+
2972+        # Make FEC encoders
2973+        fec = codec.CRSEncoder()
2974+        fec.set_params(self.segment_size,
2975+                       self.required_shares, self.total_shares)
2976+        self.piece_size = fec.get_block_size()
2977+        self.fec = fec
2978+
2979+        if self.tail_segment_size == self.segment_size:
2980+            self.tail_fec = self.fec
2981+        else:
2982+            tail_fec = codec.CRSEncoder()
2983+            tail_fec.set_params(self.tail_segment_size,
2984+                                self.required_shares,
2985+                                self.total_shares)
2986+            self.tail_fec = tail_fec
2987+
2988+        self._current_segment = self.starting_segment
2989+        self.end_segment = self.num_segments - 1
2990+        # Now figure out where the last segment should be.
2991+        if self.data.get_size() != self.datalength:
2992+            end = self.data.get_size()
2993+            self.end_segment = mathutil.div_ceil(end,
2994+                                                 segment_size)
2995+            self.end_segment -= 1
2996+        self.log("got start segment %d" % self.starting_segment)
2997+        self.log("got end segment %d" % self.end_segment)
2998+
2999+
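
To make the tail-segment arithmetic above concrete, a standalone restatement (assumes datalength > 0; the method above additionally handles the empty-file case and rounds segment_size up to a multiple of required_shares):

    def segmentation(datalength, segment_size):
        # All segments are segment_size bytes except possibly the last
        # ("tail") segment; a remainder of zero means the tail is full-size.
        num_segments = (datalength + segment_size - 1) // segment_size
        tail_size = datalength % segment_size
        if tail_size == 0:
            tail_size = segment_size
        return (num_segments, tail_size)

    assert segmentation(300000, 131072) == (3, 37856)
    assert segmentation(262144, 131072) == (2, 131072)
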
3000+    def _push(self, ignored=None):
3001+        """
3002+        I manage state transitions. In particular, I check that we still
3003+        have enough working writers to complete the upload
3004+        successfully.
3005+        """
3006+        # Can we still successfully publish this file?
3007+        # TODO: Keep track of outstanding queries before aborting the
3008+        #       process.
3009+        if len(self.writers) <= self.required_shares or self.surprised:
3010+            return self._failure()
3011+
3012+        # Figure out what we need to do next. Each of these needs to
3013+        # return a deferred so that we don't block execution when this
3014+        # is first called in the upload method.
3015+        if self._state == PUSHING_BLOCKS_STATE:
3016+            return self.push_segment(self._current_segment)
3017+
3018+        elif self._state == PUSHING_EVERYTHING_ELSE_STATE:
3019+            return self.push_everything_else()
3020+
3021+        # If we make it to this point, we were successful in placing the
3022+        # file.
3023+        return self._done(None)
3024+
3025+
3026+    def push_segment(self, segnum):
3027+        if self.num_segments == 0 and self._version == SDMF_VERSION:
3028+            self._add_dummy_salts()
3029 
3030hunk ./src/allmydata/mutable/publish.py 635
3031-    def _fatal_error(self, f):
3032-        self.log("error during loop", failure=f, level=log.UNUSUAL)
3033-        self._done(f)
3034+        if segnum > self.end_segment:
3035+            # We don't have any more segments to push.
3036+            self._state = PUSHING_EVERYTHING_ELSE_STATE
3037+            return self._push()
3038+
3039+        d = self._encode_segment(segnum)
3040+        d.addCallback(self._push_segment, segnum)
3041+        def _increment_segnum(ign):
3042+            self._current_segment += 1
3043+        # XXX: I don't think we need to do addBoth here -- any errBacks
3044+        # should be handled within push_segment.
3045+        d.addBoth(_increment_segnum)
3046+        d.addBoth(self._turn_barrier)
3047+        d.addBoth(self._push)
3048+
3049+
3050+    def _turn_barrier(self, result):
3051+        """
3052+        I help the publish process avoid the recursion limit issues
3053+        described in #237.
3054+        """
3055+        return fireEventually(result)
3056+
3057+
3058+    def _add_dummy_salts(self):
3059+        """
3060+        SDMF files need a salt even if they're empty, or the signature
3061+        won't make sense. This method adds a dummy salt to each of our
3062+        SDMF writers so that they can write the signature later.
3063+        """
3064+        salt = os.urandom(16)
3065+        assert self._version == SDMF_VERSION
3066+
3067+        for writer in self.writers.itervalues():
3068+            writer.put_salt(salt)
3069+
3070+
3071+    def _encode_segment(self, segnum):
3072+        """
3073+        I encrypt and encode the segment segnum.
3074+        """
3075+        started = time.time()
3076+
3077+        if segnum + 1 == self.num_segments:
3078+            segsize = self.tail_segment_size
3079+        else:
3080+            segsize = self.segment_size
3081+
3082+
3083+        self.log("Pushing segment %d of %d" % (segnum + 1, self.num_segments))
3084+        data = self.data.read(segsize)
3085+        # XXX: This is dumb. Why return a list?
3086+        data = "".join(data)
3087+
3088+        assert len(data) == segsize, len(data)
3089+
3090+        salt = os.urandom(16)
3091+
3092+        key = hashutil.ssk_readkey_data_hash(salt, self.readkey)
3093+        self._status.set_status("Encrypting")
3094+        enc = AES(key)
3095+        crypttext = enc.process(data)
3096+        assert len(crypttext) == len(data)
3097+
3098+        now = time.time()
3099+        self._status.timings["encrypt"] = now - started
3100+        started = now
3101+
3102+        # now apply FEC
3103+        if segnum + 1 == self.num_segments:
3104+            fec = self.tail_fec
3105+        else:
3106+            fec = self.fec
3107+
3108+        self._status.set_status("Encoding")
3109+        crypttext_pieces = [None] * self.required_shares
3110+        piece_size = fec.get_block_size()
3111+        for i in range(len(crypttext_pieces)):
3112+            offset = i * piece_size
3113+            piece = crypttext[offset:offset+piece_size]
3114+            piece = piece + "\x00"*(piece_size - len(piece)) # padding
3115+            crypttext_pieces[i] = piece
3116+            assert len(piece) == piece_size
3117+        d = fec.encode(crypttext_pieces)
3118+        def _done_encoding(res):
3119+            elapsed = time.time() - started
3120+            self._status.timings["encode"] = elapsed
3121+            return (res, salt)
3122+        d.addCallback(_done_encoding)
3123+        return d
3124+
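
Before FEC, each segment's ciphertext is split into required_shares pieces of fec.get_block_size() bytes, zero-padding any piece that comes up short. A standalone restatement of just that split-and-pad step (illustrative; byte strings are Python 2 str, as in the surrounding code):

    def split_and_pad(crypttext, num_pieces, piece_size):
        # Mirrors the loop in _encode_segment above: fixed-size pieces,
        # NUL-padded where the segment data runs short.
        pieces = []
        for i in range(num_pieces):
            piece = crypttext[i * piece_size:(i + 1) * piece_size]
            pieces.append(piece + "\x00" * (piece_size - len(piece)))
        return pieces

    assert split_and_pad("abcdefg", 3, 3) == ["abc", "def", "g\x00\x00"]
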
3125+
3126+    def _push_segment(self, encoded_and_salt, segnum):
3127+        """
3128+        I push (data, salt) as segment number segnum.
3129+        """
3130+        results, salt = encoded_and_salt
3131+        shares, shareids = results
3132+        self._status.set_status("Pushing segment")
3133+        for i in xrange(len(shares)):
3134+            sharedata = shares[i]
3135+            shareid = shareids[i]
3136+            if self._version == MDMF_VERSION:
3137+                hashed = salt + sharedata
3138+            else:
3139+                hashed = sharedata
3140+            block_hash = hashutil.block_hash(hashed)
3141+            self.blockhashes[shareid][segnum] = block_hash
3142+            # find the writer for this share
3143+            writer = self.writers[shareid]
3144+            writer.put_block(sharedata, segnum, salt)
3145+
3146+
3147+    def push_everything_else(self):
3148+        """
3149+        I put everything else associated with a share.
3150+        """
3151+        self._pack_started = time.time()
3152+        self.push_encprivkey()
3153+        self.push_blockhashes()
3154+        self.push_sharehashes()
3155+        self.push_toplevel_hashes_and_signature()
3156+        d = self.finish_publishing()
3157+        def _change_state(ignored):
3158+            self._state = DONE_STATE
3159+        d.addCallback(_change_state)
3160+        d.addCallback(self._push)
3161+        return d
3162+
3163+
3164+    def push_encprivkey(self):
3165+        encprivkey = self._encprivkey
3166+        self._status.set_status("Pushing encrypted private key")
3167+        for writer in self.writers.itervalues():
3168+            writer.put_encprivkey(encprivkey)
3169+
3170+
3171+    def push_blockhashes(self):
3172+        self.sharehash_leaves = [None] * len(self.blockhashes)
3173+        self._status.set_status("Building and pushing block hash tree")
3174+        for shnum, blockhashes in self.blockhashes.iteritems():
3175+            t = hashtree.HashTree(blockhashes)
3176+            self.blockhashes[shnum] = list(t)
3177+            # set the leaf for future use.
3178+            self.sharehash_leaves[shnum] = t[0]
3179+
3180+            writer = self.writers[shnum]
3181+            writer.put_blockhashes(self.blockhashes[shnum])
3182+
3183+
3184+    def push_sharehashes(self):
3185+        self._status.set_status("Building and pushing share hash chain")
3186+        share_hash_tree = hashtree.HashTree(self.sharehash_leaves)
3187+        for shnum in xrange(len(self.sharehash_leaves)):
3188+            needed_indices = share_hash_tree.needed_hashes(shnum)
3189+            self.sharehashes[shnum] = dict( [ (i, share_hash_tree[i])
3190+                                             for i in needed_indices] )
3191+            writer = self.writers[shnum]
3192+            writer.put_sharehashes(self.sharehashes[shnum])
3193+        self.root_hash = share_hash_tree[0]
3194+
3195+
3196+    def push_toplevel_hashes_and_signature(self):
3197+        # We need to do three things here:
3198+        #   - Push the root hash and salt hash
3199+        #   - Get the checkstring of the resulting layout; sign that.
3200+        #   - Push the signature
3201+        self._status.set_status("Pushing root hashes and signature")
3202+        for shnum in xrange(self.total_shares):
3203+            writer = self.writers[shnum]
3204+            writer.put_root_hash(self.root_hash)
3205+        self._update_checkstring()
3206+        self._make_and_place_signature()
3207+
3208+
3209+    def _update_checkstring(self):
3210+        """
3211+        After putting the root hash, MDMF files will have the
3212+        checkstring written to the storage server. This means that we
3213+        can update our copy of the checkstring so we can detect
3214+        uncoordinated writes. SDMF files will have the same checkstring,
3215+        so we need not do anything.
3216+        """
3217+        self._checkstring = self.writers.values()[0].get_checkstring()
3218+
3219+
3220+    def _make_and_place_signature(self):
3221+        """
3222+        I create and place the signature.
3223+        """
3224+        started = time.time()
3225+        self._status.set_status("Signing prefix")
3226+        signable = self.writers[0].get_signable()
3227+        self.signature = self._privkey.sign(signable)
3228+
3229+        for (shnum, writer) in self.writers.iteritems():
3230+            writer.put_signature(self.signature)
3231+        self._status.timings['sign'] = time.time() - started
3232+
3233+
3234+    def finish_publishing(self):
3235+        # We're almost done -- we just need to put the verification key
3236+        # and the offsets
3237+        started = time.time()
3238+        self._status.set_status("Pushing shares")
3239+        self._started_pushing = started
3240+        ds = []
3241+        verification_key = self._pubkey.serialize()
3242+
3243+
3244+        # TODO: Bad, since we remove from this same dict. We need to
3245+        # make a copy, or just use a non-iterated value.
3246+        for (shnum, writer) in self.writers.iteritems():
3247+            writer.put_verification_key(verification_key)
3248+            d = writer.finish_publishing()
3249+            # Add the (peerid, shnum) tuple to our list of outstanding
3250+            # queries. This gets used by _loop if some of our queries
3251+            # fail to place shares.
3252+            self.outstanding.add((writer.peerid, writer.shnum))
3253+            d.addCallback(self._got_write_answer, writer, started)
3254+            d.addErrback(self._connection_problem, writer)
3255+            ds.append(d)
3256+        self._record_verinfo()
3257+        self._status.timings['pack'] = time.time() - started
3258+        return defer.DeferredList(ds)
3259+
3260+
3261+    def _record_verinfo(self):
3262+        self.versioninfo = self.writers.values()[0].get_verinfo()
3263+
3264+
3265+    def _connection_problem(self, f, writer):
3266+        """
3267+        We ran into a connection problem while working with writer, and
3268+        need to deal with that.
3269+        """
3270+        self.log("found problem: %s" % str(f))
3271+        self._last_failure = f
3272+        del(self.writers[writer.shnum])
3273 
3274hunk ./src/allmydata/mutable/publish.py 875
3275-    def _update_status(self):
3276-        self._status.set_status("Sending Shares: %d placed out of %d, "
3277-                                "%d messages outstanding" %
3278-                                (len(self.placed),
3279-                                 len(self.goal),
3280-                                 len(self.outstanding)))
3281-        self._status.set_progress(1.0 * len(self.placed) / len(self.goal))
3282 
3283hunk ./src/allmydata/mutable/publish.py 876
3284-    def loop(self, ignored=None):
3285-        self.log("entering loop", level=log.NOISY)
3286-        if not self._running:
3287-            return
3288-
3289-        self.looplimit -= 1
3290-        if self.looplimit <= 0:
3291-            raise LoopLimitExceededError("loop limit exceeded")
3292-
3293-        if self.surprised:
3294-            # don't send out any new shares, just wait for the outstanding
3295-            # ones to be retired.
3296-            self.log("currently surprised, so don't send any new shares",
3297-                     level=log.NOISY)
3298-        else:
3299-            self.update_goal()
3300-            # how far are we from our goal?
3301-            needed = self.goal - self.placed - self.outstanding
3302-            self._update_status()
3303-
3304-            if needed:
3305-                # we need to send out new shares
3306-                self.log(format="need to send %(needed)d new shares",
3307-                         needed=len(needed), level=log.NOISY)
3308-                self._send_shares(needed)
3309-                return
3310-
3311-        if self.outstanding:
3312-            # queries are still pending, keep waiting
3313-            self.log(format="%(outstanding)d queries still outstanding",
3314-                     outstanding=len(self.outstanding),
3315-                     level=log.NOISY)
3316-            return
3317-
3318-        # no queries outstanding, no placements needed: we're done
3319-        self.log("no queries outstanding, no placements needed: done",
3320-                 level=log.OPERATIONAL)
3321-        now = time.time()
3322-        elapsed = now - self._started_pushing
3323-        self._status.timings["push"] = elapsed
3324-        return self._done(None)
3325-
3326     def log_goal(self, goal, message=""):
3327         logmsg = [message]
3328         for (shnum, peerid) in sorted([(s,p) for (p,s) in goal]):
3329hunk ./src/allmydata/mutable/publish.py 957
3330             self.log_goal(self.goal, "after update: ")
3331 
3332 
3333+    def _got_write_answer(self, answer, writer, started):
3334+        if not answer:
3335+            # SDMF writers only pretend to write when readers set their
3336+            # blocks, salts, and so on -- they actually just write once,
3337+            # at the end of the upload process. In fake writes, they
3338+            # return defer.succeed(None). If we see that, we shouldn't
3339+            # bother checking it.
3340+            return
3341 
3342hunk ./src/allmydata/mutable/publish.py 966
3343-    def _encrypt_and_encode(self):
3344-        # this returns a Deferred that fires with a list of (sharedata,
3345-        # sharenum) tuples. TODO: cache the ciphertext, only produce the
3346-        # shares that we care about.
3347-        self.log("_encrypt_and_encode")
3348-
3349-        self._status.set_status("Encrypting")
3350-        started = time.time()
3351-
3352-        key = hashutil.ssk_readkey_data_hash(self.salt, self.readkey)
3353-        enc = AES(key)
3354-        crypttext = enc.process(self.newdata)
3355-        assert len(crypttext) == len(self.newdata)
3356+        peerid = writer.peerid
3357+        lp = self.log("_got_write_answer from %s, share %d" %
3358+                      (idlib.shortnodeid_b2a(peerid), writer.shnum))
3359 
3360         now = time.time()
3361hunk ./src/allmydata/mutable/publish.py 971
3362-        self._status.timings["encrypt"] = now - started
3363-        started = now
3364-
3365-        # now apply FEC
3366-
3367-        self._status.set_status("Encoding")
3368-        fec = codec.CRSEncoder()
3369-        fec.set_params(self.segment_size,
3370-                       self.required_shares, self.total_shares)
3371-        piece_size = fec.get_block_size()
3372-        crypttext_pieces = [None] * self.required_shares
3373-        for i in range(len(crypttext_pieces)):
3374-            offset = i * piece_size
3375-            piece = crypttext[offset:offset+piece_size]
3376-            piece = piece + "\x00"*(piece_size - len(piece)) # padding
3377-            crypttext_pieces[i] = piece
3378-            assert len(piece) == piece_size
3379-
3380-        d = fec.encode(crypttext_pieces)
3381-        def _done_encoding(res):
3382-            elapsed = time.time() - started
3383-            self._status.timings["encode"] = elapsed
3384-            return res
3385-        d.addCallback(_done_encoding)
3386-        return d
3387-
3388-    def _generate_shares(self, shares_and_shareids):
3389-        # this sets self.shares and self.root_hash
3390-        self.log("_generate_shares")
3391-        self._status.set_status("Generating Shares")
3392-        started = time.time()
3393-
3394-        # we should know these by now
3395-        privkey = self._privkey
3396-        encprivkey = self._encprivkey
3397-        pubkey = self._pubkey
3398-
3399-        (shares, share_ids) = shares_and_shareids
3400-
3401-        assert len(shares) == len(share_ids)
3402-        assert len(shares) == self.total_shares
3403-        all_shares = {}
3404-        block_hash_trees = {}
3405-        share_hash_leaves = [None] * len(shares)
3406-        for i in range(len(shares)):
3407-            share_data = shares[i]
3408-            shnum = share_ids[i]
3409-            all_shares[shnum] = share_data
3410-
3411-            # build the block hash tree. SDMF has only one leaf.
3412-            leaves = [hashutil.block_hash(share_data)]
3413-            t = hashtree.HashTree(leaves)
3414-            block_hash_trees[shnum] = list(t)
3415-            share_hash_leaves[shnum] = t[0]
3416-        for leaf in share_hash_leaves:
3417-            assert leaf is not None
3418-        share_hash_tree = hashtree.HashTree(share_hash_leaves)
3419-        share_hash_chain = {}
3420-        for shnum in range(self.total_shares):
3421-            needed_hashes = share_hash_tree.needed_hashes(shnum)
3422-            share_hash_chain[shnum] = dict( [ (i, share_hash_tree[i])
3423-                                              for i in needed_hashes ] )
3424-        root_hash = share_hash_tree[0]
3425-        assert len(root_hash) == 32
3426-        self.log("my new root_hash is %s" % base32.b2a(root_hash))
3427-        self._new_version_info = (self._new_seqnum, root_hash, self.salt)
3428-
3429-        prefix = pack_prefix(self._new_seqnum, root_hash, self.salt,
3430-                             self.required_shares, self.total_shares,
3431-                             self.segment_size, len(self.newdata))
3432-
3433-        # now pack the beginning of the share. All shares are the same up
3434-        # to the signature, then they have divergent share hash chains,
3435-        # then completely different block hash trees + salt + share data,
3436-        # then they all share the same encprivkey at the end. The sizes
3437-        # of everything are the same for all shares.
3438-
3439-        sign_started = time.time()
3440-        signature = privkey.sign(prefix)
3441-        self._status.timings["sign"] = time.time() - sign_started
3442-
3443-        verification_key = pubkey.serialize()
3444-
3445-        final_shares = {}
3446-        for shnum in range(self.total_shares):
3447-            final_share = pack_share(prefix,
3448-                                     verification_key,
3449-                                     signature,
3450-                                     share_hash_chain[shnum],
3451-                                     block_hash_trees[shnum],
3452-                                     all_shares[shnum],
3453-                                     encprivkey)
3454-            final_shares[shnum] = final_share
3455-        elapsed = time.time() - started
3456-        self._status.timings["pack"] = elapsed
3457-        self.shares = final_shares
3458-        self.root_hash = root_hash
3459-
3460-        # we also need to build up the version identifier for what we're
3461-        # pushing. Extract the offsets from one of our shares.
3462-        assert final_shares
3463-        offsets = unpack_header(final_shares.values()[0])[-1]
3464-        offsets_tuple = tuple( [(key,value) for key,value in offsets.items()] )
3465-        verinfo = (self._new_seqnum, root_hash, self.salt,
3466-                   self.segment_size, len(self.newdata),
3467-                   self.required_shares, self.total_shares,
3468-                   prefix, offsets_tuple)
3469-        self.versioninfo = verinfo
3470-
3471-
3472-
3473-    def _send_shares(self, needed):
3474-        self.log("_send_shares")
3475-
3476-        # we're finally ready to send out our shares. If we encounter any
3477-        # surprises here, it's because somebody else is writing at the same
3478-        # time. (Note: in the future, when we remove the _query_peers() step
3479-        # and instead speculate about [or remember] which shares are where,
3480-        # surprises here are *not* indications of UncoordinatedWriteError,
3481-        # and we'll need to respond to them more gracefully.)
3482-
3483-        # needed is a set of (peerid, shnum) tuples. The first thing we do is
3484-        # organize it by peerid.
3485-
3486-        peermap = DictOfSets()
3487-        for (peerid, shnum) in needed:
3488-            peermap.add(peerid, shnum)
3489-
3490-        # the next thing is to build up a bunch of test vectors. The
3491-        # semantics of Publish are that we perform the operation if the world
3492-        # hasn't changed since the ServerMap was constructed (more or less).
3493-        # For every share we're trying to place, we create a test vector that
3494-        # tests to see if the server*share still corresponds to the
3495-        # map.
3496-
3497-        all_tw_vectors = {} # maps peerid to tw_vectors
3498-        sm = self._servermap.servermap
3499-
3500-        for key in needed:
3501-            (peerid, shnum) = key
3502-
3503-            if key in sm:
3504-                # an old version of that share already exists on the
3505-                # server, according to our servermap. We will create a
3506-                # request that attempts to replace it.
3507-                old_versionid, old_timestamp = sm[key]
3508-                (old_seqnum, old_root_hash, old_salt, old_segsize,
3509-                 old_datalength, old_k, old_N, old_prefix,
3510-                 old_offsets_tuple) = old_versionid
3511-                old_checkstring = pack_checkstring(old_seqnum,
3512-                                                   old_root_hash,
3513-                                                   old_salt)
3514-                testv = (0, len(old_checkstring), "eq", old_checkstring)
3515-
3516-            elif key in self.bad_share_checkstrings:
3517-                old_checkstring = self.bad_share_checkstrings[key]
3518-                testv = (0, len(old_checkstring), "eq", old_checkstring)
3519-
3520-            else:
3521-                # add a testv that requires the share not exist
3522-
3523-                # Unfortunately, foolscap-0.2.5 has a bug in the way inbound
3524-                # constraints are handled. If the same object is referenced
3525-                # multiple times inside the arguments, foolscap emits a
3526-                # 'reference' token instead of a distinct copy of the
3527-                # argument. The bug is that these 'reference' tokens are not
3528-                # accepted by the inbound constraint code. To work around
3529-                # this, we need to prevent python from interning the
3530-                # (constant) tuple, by creating a new copy of this vector
3531-                # each time.
3532-
3533-                # This bug is fixed in foolscap-0.2.6, and even though this
3534-                # version of Tahoe requires foolscap-0.3.1 or newer, we are
3535-                # supposed to be able to interoperate with older versions of
3536-                # Tahoe which are allowed to use older versions of foolscap,
3537-                # including foolscap-0.2.5 . In addition, I've seen other
3538-                # foolscap problems triggered by 'reference' tokens (see #541
3539-                # for details). So we must keep this workaround in place.
3540-
3541-                #testv = (0, 1, 'eq', "")
3542-                testv = tuple([0, 1, 'eq', ""])
3543-
3544-            testvs = [testv]
3545-            # the write vector is simply the share
3546-            writev = [(0, self.shares[shnum])]
3547-
3548-            if peerid not in all_tw_vectors:
3549-                all_tw_vectors[peerid] = {}
3550-                # maps shnum to (testvs, writevs, new_length)
3551-            assert shnum not in all_tw_vectors[peerid]
3552-
3553-            all_tw_vectors[peerid][shnum] = (testvs, writev, None)
3554-
3555-        # we read the checkstring back from each share, however we only use
3556-        # it to detect whether there was a new share that we didn't know
3557-        # about. The success or failure of the write will tell us whether
3558-        # there was a collision or not. If there is a collision, the first
3559-        # thing we'll do is update the servermap, which will find out what
3560-        # happened. We could conceivably reduce a roundtrip by using the
3561-        # readv checkstring to populate the servermap, but really we'd have
3562-        # to read enough data to validate the signatures too, so it wouldn't
3563-        # be an overall win.
3564-        read_vector = [(0, struct.calcsize(SIGNED_PREFIX))]
3565-
3566-        # ok, send the messages!
3567-        self.log("sending %d shares" % len(all_tw_vectors), level=log.NOISY)
3568-        started = time.time()
3569-        for (peerid, tw_vectors) in all_tw_vectors.items():
3570-
3571-            write_enabler = self._node.get_write_enabler(peerid)
3572-            renew_secret = self._node.get_renewal_secret(peerid)
3573-            cancel_secret = self._node.get_cancel_secret(peerid)
3574-            secrets = (write_enabler, renew_secret, cancel_secret)
3575-            shnums = tw_vectors.keys()
3576-
3577-            for shnum in shnums:
3578-                self.outstanding.add( (peerid, shnum) )
3579+        elapsed = now - started
3580 
3581hunk ./src/allmydata/mutable/publish.py 973
3582-            d = self._do_testreadwrite(peerid, secrets,
3583-                                       tw_vectors, read_vector)
3584-            d.addCallbacks(self._got_write_answer, self._got_write_error,
3585-                           callbackArgs=(peerid, shnums, started),
3586-                           errbackArgs=(peerid, shnums, started))
3587-            # tolerate immediate errback, like with DeadReferenceError
3588-            d.addBoth(fireEventually)
3589-            d.addCallback(self.loop)
3590-            d.addErrback(self._fatal_error)
3591+        self._status.add_per_server_time(peerid, elapsed)
3592 
3593hunk ./src/allmydata/mutable/publish.py 975
3594-        self._update_status()
3595-        self.log("%d shares sent" % len(all_tw_vectors), level=log.NOISY)
3596+        wrote, read_data = answer
3597 
3598hunk ./src/allmydata/mutable/publish.py 977
3599-    def _do_testreadwrite(self, peerid, secrets,
3600-                          tw_vectors, read_vector):
3601-        storage_index = self._storage_index
3602-        ss = self.connections[peerid]
3603+        surprise_shares = set(read_data.keys()) - set([writer.shnum])
3604 
3605hunk ./src/allmydata/mutable/publish.py 979
3606-        #print "SS[%s] is %s" % (idlib.shortnodeid_b2a(peerid), ss), ss.tracker.interfaceName
3607-        d = ss.callRemote("slot_testv_and_readv_and_writev",
3608-                          storage_index,
3609-                          secrets,
3610-                          tw_vectors,
3611-                          read_vector)
3612-        return d
3613+        # We need to remove from surprise_shares any shares that we are
3614+        # knowingly also writing to that peer from other writers.
3615 
3616hunk ./src/allmydata/mutable/publish.py 982
3617-    def _got_write_answer(self, answer, peerid, shnums, started):
3618-        lp = self.log("_got_write_answer from %s" %
3619-                      idlib.shortnodeid_b2a(peerid))
3620-        for shnum in shnums:
3621-            self.outstanding.discard( (peerid, shnum) )
3622+        # TODO: Precompute this.
3623+        known_shnums = [x.shnum for x in self.writers.values()
3624+                        if x.peerid == peerid]
3625+        surprise_shares -= set(known_shnums)
3626+        self.log("found the following surprise shares: %s" %
3627+                 str(surprise_shares))
3628 
3629hunk ./src/allmydata/mutable/publish.py 989
3630-        now = time.time()
3631-        elapsed = now - started
3632-        self._status.add_per_server_time(peerid, elapsed)
3633-
3634-        wrote, read_data = answer
3635-
3636-        surprise_shares = set(read_data.keys()) - set(shnums)
3637+        # Now surprise shares contains all of the shares that we did not
3638+        # expect to be there.
3639 
3640         surprised = False
3641         for shnum in surprise_shares:
3642hunk ./src/allmydata/mutable/publish.py 996
3643             # read_data is a dict mapping shnum to checkstring (SIGNED_PREFIX)
3644             checkstring = read_data[shnum][0]
3645-            their_version_info = unpack_checkstring(checkstring)
3646-            if their_version_info == self._new_version_info:
3647+            # What we want to do here is to see if their (seqnum,
3648+            # roothash, salt) is the same as our (seqnum, roothash,
3649+            # salt), or the equivalent for MDMF. The best way to do this
3650+            # is to store a packed representation of our checkstring
3651+            # somewhere, then not bother unpacking the other
3652+            # checkstring.
3653+            if checkstring == self._checkstring:
3654                 # they have the right share, somehow
3655 
3656                 if (peerid,shnum) in self.goal:
3657hunk ./src/allmydata/mutable/publish.py 1081
3658             self.log("our testv failed, so the write did not happen",
3659                      parent=lp, level=log.WEIRD, umid="8sc26g")
3660             self.surprised = True
3661-            self.bad_peers.add(peerid) # don't ask them again
3662+            self.bad_peers.add(writer) # don't ask them again
3663             # use the checkstring to add information to the log message
3664             for (shnum,readv) in read_data.items():
3665                 checkstring = readv[0]
3666hunk ./src/allmydata/mutable/publish.py 1103
3667                 # if expected_version==None, then we didn't expect to see a
3668                 # share on that peer, and the 'surprise_shares' clause above
3669                 # will have logged it.
3670-            # self.loop() will take care of finding new homes
3671             return
3672 
3673hunk ./src/allmydata/mutable/publish.py 1105
3674-        for shnum in shnums:
3675-            self.placed.add( (peerid, shnum) )
3676-            # and update the servermap
3677-            self._servermap.add_new_share(peerid, shnum,
3678+        # and update the servermap
3679+        # self.versioninfo is set during the last phase of publishing.
3680+        # If we get there, we know that responses correspond to placed
3681+        # shares, and can safely execute these statements.
3682+        if self.versioninfo:
3683+            self.log("wrote successfully: adding new share to servermap")
3684+            self._servermap.add_new_share(peerid, writer.shnum,
3685                                           self.versioninfo, started)
3686hunk ./src/allmydata/mutable/publish.py 1113
3687-
3688-        # self.loop() will take care of checking to see if we're done
3689+            self.placed.add( (peerid, writer.shnum) )
3690+        self._update_status()
3691+        # the next method in the deferred chain will check to see if
3692+        # we're done and successful.
3693         return
3694 
3695hunk ./src/allmydata/mutable/publish.py 1119
3696-    def _got_write_error(self, f, peerid, shnums, started):
3697-        for shnum in shnums:
3698-            self.outstanding.discard( (peerid, shnum) )
3699-        self.bad_peers.add(peerid)
3700-        if self._first_write_error is None:
3701-            self._first_write_error = f
3702-        self.log(format="error while writing shares %(shnums)s to peerid %(peerid)s",
3703-                 shnums=list(shnums), peerid=idlib.shortnodeid_b2a(peerid),
3704-                 failure=f,
3705-                 level=log.UNUSUAL)
3706-        # self.loop() will take care of checking to see if we're done
3707-        return
3708-
3709 
3710     def _done(self, res):
3711         if not self._running:
3712hunk ./src/allmydata/mutable/publish.py 1126
3713         self._running = False
3714         now = time.time()
3715         self._status.timings["total"] = now - self._started
3716+
3717+        elapsed = now - self._started_pushing
3718+        self._status.timings['push'] = elapsed
3719+
3720         self._status.set_active(False)
3721hunk ./src/allmydata/mutable/publish.py 1131
3722-        if isinstance(res, failure.Failure):
3723-            self.log("Publish done, with failure", failure=res,
3724-                     level=log.WEIRD, umid="nRsR9Q")
3725-            self._status.set_status("Failed")
3726-        elif self.surprised:
3727-            self.log("Publish done, UncoordinatedWriteError", level=log.UNUSUAL)
3728-            self._status.set_status("UncoordinatedWriteError")
3729-            # deliver a failure
3730-            res = failure.Failure(UncoordinatedWriteError())
3731-            # TODO: recovery
3732-        else:
3733-            self.log("Publish done, success")
3734-            self._status.set_status("Finished")
3735-            self._status.set_progress(1.0)
3736+        self.log("Publish done, success")
3737+        self._status.set_status("Finished")
3738+        self._status.set_progress(1.0)
3739         eventually(self.done_deferred.callback, res)
3740 
3741hunk ./src/allmydata/mutable/publish.py 1136
3742+    def _failure(self):
3743+
3744+        if not self.surprised:
3745+            # We ran out of servers
3746+            self.log("Publish ran out of good servers, "
3747+                     "last failure was: %s" % str(self._last_failure))
3748+            e = NotEnoughServersError("Ran out of non-bad servers, "
3749+                                      "last failure was %s" %
3750+                                      str(self._last_failure))
3751+        else:
3752+            # We ran into shares that we didn't recognize, which means
3753+            # that we need to return an UncoordinatedWriteError.
3754+            self.log("Publish failed with UncoordinatedWriteError")
3755+            e = UncoordinatedWriteError()
3756+        f = failure.Failure(e)
3757+        eventually(self.done_deferred.callback, f)
3758+
3759+
3760+class MutableFileHandle:
3761+    """
3762+    I am a mutable uploadable built around a filehandle-like object,
3763+    usually either a StringIO instance or a handle to an actual file.
3764+    """
3765+    implements(IMutableUploadable)
3766+
3767+    def __init__(self, filehandle):
3768+        # The filehandle is defined as a generally file-like object that
3769+        # has these two methods. We don't care beyond that.
3770+        assert hasattr(filehandle, "read")
3771+        assert hasattr(filehandle, "close")
3772+
3773+        self._filehandle = filehandle
3774+        # We must start reading at the beginning of the file, or we risk
3775+        # encountering errors when the data read does not match the size
3776+        # reported to the uploader.
3777+        self._filehandle.seek(0)
3778+
3779+        # We have not yet read anything, so our position is 0.
3780+        self._marker = 0
3781+
3782+
3783+    def get_size(self):
3784+        """
3785+        I return the amount of data in my filehandle.
3786+        """
3787+        if not hasattr(self, "_size"):
3788+            old_position = self._filehandle.tell()
3789+            # Seek to the end of the file by seeking 0 bytes from the
3790+            # file's end
3791+            self._filehandle.seek(0, 2) # 2 == os.SEEK_END in 2.5+
3792+            self._size = self._filehandle.tell()
3793+            # Restore the previous position, in case this was called
3794+            # after a read.
3795+            self._filehandle.seek(old_position)
3796+            assert self._filehandle.tell() == old_position
3797+
3798+        assert hasattr(self, "_size")
3799+        return self._size
3800+
3801+
3802+    def pos(self):
3803+        """
3804+        I return the position of my read marker -- i.e., how much data I
3805+        have already read and returned to callers.
3806+        """
3807+        return self._marker
3808+
3809+
3810+    def read(self, length):
3811+        """
3812+        I return some data (up to length bytes) from my filehandle.
3813+
3814+        In most cases, I return length bytes, but sometimes I won't --
3815+        for example, if I am asked to read beyond the end of a file, or
3816+        an error occurs.
3817+        """
3818+        results = self._filehandle.read(length)
3819+        self._marker += len(results)
3820+        return [results]
3821+
3822+
3823+    def close(self):
3824+        """
3825+        I close the underlying filehandle. Any further operations on the
3826+        filehandle fail at this point.
3827+        """
3828+        self._filehandle.close()
3829+
3830+
3831+class MutableData(MutableFileHandle):
3832+    """
3833+    I am a mutable uploadable built around a string, which I then cast
3834+    into a StringIO and treat as a filehandle.
3835+    """
3836+
3837+    def __init__(self, s):
3838+        # Take a string and return a file-like uploadable.
3839+        assert isinstance(s, str)
3840+
3841+        MutableFileHandle.__init__(self, StringIO(s))
3842+
3843+
3844+class TransformingUploadable:
3845+    """
3846+    I am an IMutableUploadable that wraps another IMutableUploadable,
3847+    and some segments that are already on the grid. When I am called to
3848+    read, I handle merging of boundary segments.
3849+    """
3850+    implements(IMutableUploadable)
3851+
3852+
3853+    def __init__(self, data, offset, segment_size, start, end):
3854+        assert IMutableUploadable.providedBy(data)
3855+
3856+        self._newdata = data
3857+        self._offset = offset
3858+        self._segment_size = segment_size
3859+        self._start = start
3860+        self._end = end
3861+
3862+        self._read_marker = 0
3863+
3864+        self._first_segment_offset = offset % segment_size
3865+
3866+        num = self.log("TransformingUploadable: starting", parent=None)
3867+        self._log_number = num
3868+        self.log("got fso: %d" % self._first_segment_offset)
3869+        self.log("got offset: %d" % self._offset)
3870+
3871+
3872+    def log(self, *args, **kwargs):
3873+        if 'parent' not in kwargs:
3874+            kwargs['parent'] = self._log_number
3875+        if "facility" not in kwargs:
3876+            kwargs["facility"] = "tahoe.mutable.transforminguploadable"
3877+        return log.msg(*args, **kwargs)
3878+
3879+
3880+    def get_size(self):
3881+        return self._offset + self._newdata.get_size()
3882+
3883+
3884+    def read(self, length):
3885+        # We can get data from 3 sources here.
3886+        #   1. The first of the segments provided to us.
3887+        #   2. The data that we're replacing things with.
3888+        #   3. The last of the segments provided to us.
3889+
3890+        # Are we still returning data from source 1 (the old start segment)?
3891+        self.log("reading %d bytes" % length)
3892+
3893+        old_start_data = ""
3894+        old_data_length = self._first_segment_offset - self._read_marker
3895+        if old_data_length > 0:
3896+            if old_data_length > length:
3897+                old_data_length = length
3898+            self.log("returning %d bytes of old start data" % old_data_length)
3899+
3900+            old_data_end = old_data_length + self._read_marker
3901+            old_start_data = self._start[self._read_marker:old_data_end]
3902+            length -= old_data_length
3903+        else:
3904+            # Clamp to zero so the calculations below stay correct.
3905+            old_data_length = 0
3906+
3907+        # Is there enough new data to satisfy this read? If not, we need
3908+        # to pad the end of the data with data from our last segment.
3909+        old_end_length = length - \
3910+            (self._newdata.get_size() - self._newdata.pos())
3911+        old_end_data = ""
3912+        if old_end_length > 0:
3913+            self.log("reading %d bytes of old end data" % old_end_length)
3914+
3915+            # TODO: We're not explicitly checking for tail segment size
3916+            # here. Is that a problem?
3917+            old_data_offset = (length - old_end_length + \
3918+                               old_data_length) % self._segment_size
3919+            self.log("reading at offset %d" % old_data_offset)
3920+            old_end = old_data_offset + old_end_length
3921+            old_end_data = self._end[old_data_offset:old_end]
3922+            length -= old_end_length
3923+            assert length == self._newdata.get_size() - self._newdata.pos()
3924+
3925+        self.log("reading %d bytes of new data" % length)
3926+        new_data = self._newdata.read(length)
3927+        new_data = "".join(new_data)
3928+
3929+        self._read_marker += len(old_start_data + new_data + old_end_data)
3930+
3931+        return old_start_data + new_data + old_end_data
3932 
3933hunk ./src/allmydata/mutable/publish.py 1327
3934+    def close(self):
3935+        pass
3936}
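The uploadable wrappers added above are easiest to understand with a short usage sketch. The following is illustrative only: the string contents, the temporary file path, and the 16-byte read size are invented for the example, and it simply exercises the get_size()/pos()/read() semantics documented in the docstrings (note that read() returns a list of strings, so callers join the result):

    from allmydata.mutable.publish import MutableData, MutableFileHandle

    # Wrap a literal string; the read marker starts at 0.
    data = MutableData("new contents for the mutable file")
    print data.get_size()            # 33 bytes in total
    first = "".join(data.read(16))   # read() returns a list of strings
    print data.pos()                 # 16 -- the read marker advanced

    # Any object with read(), seek(), and close() works the same way.
    fh = MutableFileHandle(open("/tmp/example.txt", "rb"))
    everything = "".join(fh.read(fh.get_size()))
    fh.close()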
3937[nodemaker.py: Make nodemaker expose a way to create MDMF files
3938Kevan Carstensen <kevan@isnotajoke.com>**20100819003509
3939 Ignore-this: a6701746d6b992fc07bc0556a2b4a61d
3940] {
3941hunk ./src/allmydata/nodemaker.py 3
3942 import weakref
3943 from zope.interface import implements
3944-from allmydata.interfaces import INodeMaker
3945+from allmydata.util.assertutil import precondition
3946+from allmydata.interfaces import INodeMaker, SDMF_VERSION
3947 from allmydata.immutable.literal import LiteralFileNode
3948 from allmydata.immutable.filenode import ImmutableFileNode, CiphertextFileNode
3949 from allmydata.immutable.upload import Data
3950hunk ./src/allmydata/nodemaker.py 9
3951 from allmydata.mutable.filenode import MutableFileNode
3952+from allmydata.mutable.publish import MutableData
3953 from allmydata.dirnode import DirectoryNode, pack_children
3954 from allmydata.unknown import UnknownNode
3955 from allmydata import uri
3956hunk ./src/allmydata/nodemaker.py 92
3957             return self._create_dirnode(filenode)
3958         return None
3959 
3960-    def create_mutable_file(self, contents=None, keysize=None):
3961+    def create_mutable_file(self, contents=None, keysize=None,
3962+                            version=SDMF_VERSION):
3963         n = MutableFileNode(self.storage_broker, self.secret_holder,
3964                             self.default_encoding_parameters, self.history)
3965hunk ./src/allmydata/nodemaker.py 96
3966+        n.set_version(version)
3967         d = self.key_generator.generate(keysize)
3968         d.addCallback(n.create_with_keys, contents)
3969         d.addCallback(lambda res: n)
3970hunk ./src/allmydata/nodemaker.py 103
3971         return d
3972 
3973     def create_new_mutable_directory(self, initial_children={}):
3974+        # mutable directories will always be SDMF for now, to help
3975+        # compatibility with older clients.
3976+        version = SDMF_VERSION
3977+        # initial_children must have metadata (i.e. {} instead of None)
3978+        for (name, (node, metadata)) in initial_children.iteritems():
3979+            precondition(isinstance(metadata, dict),
3980+                         "create_new_mutable_directory requires metadata to be a dict, not None", metadata)
3981+            node.raise_error()
3982         d = self.create_mutable_file(lambda n:
3983hunk ./src/allmydata/nodemaker.py 112
3984-                                     pack_children(initial_children, n.get_writekey()))
3985+                                     MutableData(pack_children(initial_children,
3986+                                                    n.get_writekey())),
3987+                                     version=version)
3988         d.addCallback(self._create_dirnode)
3989         return d
3990 
3991}
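A rough sketch of how a caller might use the new version= argument. It assumes an already-constructed NodeMaker (called nodemaker here), supplies contents as a MutableData uploadable in the same way the directory-creation path above does, and uses the MDMF_VERSION constant added to interfaces.py later in this bundle:

    from allmydata.interfaces import MDMF_VERSION
    from allmydata.mutable.publish import MutableData

    # Default behaviour: a new SDMF mutable file, as before.
    d = nodemaker.create_mutable_file(MutableData("hello world"))

    # Explicitly request the new MDMF format.
    d2 = nodemaker.create_mutable_file(MutableData("hello world"),
                                       version=MDMF_VERSION)

    # Mutable directories ignore the format argument and stay SDMF for now.
    d3 = nodemaker.create_new_mutable_directory()

Each call returns a Deferred that fires with the new filenode (or dirnode).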
3992[docs: update docs to mention MDMF
3993Kevan Carstensen <kevan@isnotajoke.com>**20100814225644
3994 Ignore-this: 1c3caa3cd44831007dcfbef297814308
3995] {
3996merger 0.0 (
3997hunk ./docs/configuration.rst 324
3998+Frontend Configuration
3999+======================
4000+
4001+The Tahoe client process can run a variety of frontend file-access protocols.
4002+You will use these to create and retrieve files from the virtual filesystem.
4003+Configuration details for each are documented in the following
4004+protocol-specific guides:
4005+
4006+HTTP
4007+
4008+    Tahoe runs a webserver by default on port 3456. This interface provides a
4009+    human-oriented "WUI", with pages to create, modify, and browse
4010+    directories and files, as well as a number of pages to check on the
4011+    status of your Tahoe node. It also provides a machine-oriented "WAPI",
4012+    with a REST-ful HTTP interface that can be used by other programs
4013+    (including the CLI tools). Please see `<frontends/webapi.rst>`_ for full
4014+    details, and the ``web.port`` and ``web.static`` config variables above.
4015+    The `<frontends/download-status.rst>`_ document also describes a few WUI
4016+    status pages.
4017+
4018+CLI
4019+
4020+    The main "bin/tahoe" executable includes subcommands for manipulating the
4021+    filesystem, uploading/downloading files, and creating/running Tahoe
4022+    nodes. See `<frontends/CLI.rst>`_ for details.
4023+
4024+FTP, SFTP
4025+
4026+    Tahoe can also run both FTP and SFTP servers, and map a username/password
4027+    pair to a top-level Tahoe directory. See `<frontends/FTP-and-SFTP.rst>`_
4028+    for instructions on configuring these services, and the ``[ftpd]`` and
4029+    ``[sftpd]`` sections of ``tahoe.cfg``.
4030+
4031merger 0.0 (
4032replace ./docs/configuration.rst [A-Za-z_0-9\-\.] Tahoe Tahoe-LAFS
4033merger 0.0 (
4034hunk ./docs/configuration.rst 384
4035-shares.needed = (int, optional) aka "k", default 3
4036-shares.total = (int, optional) aka "N", N >= k, default 10
4037-shares.happy = (int, optional) 1 <= happy <= N, default 7
4038-
4039- These three values set the default encoding parameters. Each time a new file
4040- is uploaded, erasure-coding is used to break the ciphertext into separate
4041- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
4042- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
4043- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
4044- Setting k to 1 is equivalent to simple replication (uploading N copies of
4045- the file).
4046-
4047- These values control the tradeoff between storage overhead, performance, and
4048- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
4049- backend storage space (the actual value will be a bit more, because of other
4050- forms of overhead). Up to N-k shares can be lost before the file becomes
4051- unrecoverable, so assuming there are at least N servers, up to N-k servers
4052- can be offline without losing the file. So large N/k ratios are more
4053- reliable, and small N/k ratios use less disk space. Clearly, k must never be
4054- smaller than N.
4055-
4056- Large values of N will slow down upload operations slightly, since more
4057- servers must be involved, and will slightly increase storage overhead due to
4058- the hash trees that are created. Large values of k will cause downloads to
4059- be marginally slower, because more servers must be involved. N cannot be
4060- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe
4061- uses.
4062-
4063- shares.happy allows you control over the distribution of your immutable file.
4064- For a successful upload, shares are guaranteed to be initially placed on
4065- at least 'shares.happy' distinct servers, the correct functioning of any
4066- k of which is sufficient to guarantee the availability of the uploaded file.
4067- This value should not be larger than the number of servers on your grid.
4068-
4069- A value of shares.happy <= k is allowed, but does not provide any redundancy
4070- if some servers fail or lose shares.
4071-
4072- (Mutable files use a different share placement algorithm that does not
4073-  consider this parameter.)
4074-
4075-
4076-== Storage Server Configuration ==
4077-
4078-[storage]
4079-enabled = (boolean, optional)
4080-
4081- If this is True, the node will run a storage server, offering space to other
4082- clients. If it is False, the node will not run a storage server, meaning
4083- that no shares will be stored on this node. Use False this for clients who
4084- do not wish to provide storage service. The default value is True.
4085-
4086-readonly = (boolean, optional)
4087-
4088- If True, the node will run a storage server but will not accept any shares,
4089- making it effectively read-only. Use this for storage servers which are
4090- being decommissioned: the storage/ directory could be mounted read-only,
4091- while shares are moved to other servers. Note that this currently only
4092- affects immutable shares. Mutable shares (used for directories) will be
4093- written and modified anyway. See ticket #390 for the current status of this
4094- bug. The default value is False.
4095-
4096-reserved_space = (str, optional)
4097-
4098- If provided, this value defines how much disk space is reserved: the storage
4099- server will not accept any share which causes the amount of free disk space
4100- to drop below this value. (The free space is measured by a call to statvfs(2)
4101- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
4102- user account under which the storage server runs.)
4103-
4104- This string contains a number, with an optional case-insensitive scale
4105- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
4106- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
4107- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
4108-
4109-expire.enabled =
4110-expire.mode =
4111-expire.override_lease_duration =
4112-expire.cutoff_date =
4113-expire.immutable =
4114-expire.mutable =
4115-
4116- These settings control garbage-collection, in which the server will delete
4117- shares that no longer have an up-to-date lease on them. Please see the
4118- neighboring "garbage-collection.txt" document for full details.
4119-
4120-
4121-== Running A Helper ==
4122+Running A Helper
4123+================
4124hunk ./docs/configuration.rst 424
4125+mutable.format = sdmf or mdmf
4126+
4127+ This value tells Tahoe-LAFS what the default mutable file format should
4128+ be. If mutable.format=sdmf, then newly created mutable files will be in
4129+ the old SDMF format. This is desirable for clients that operate on
4130+ grids where some peers run older versions of Tahoe-LAFS, as these older
4131+ versions cannot read the new MDMF mutable file format. If
4132+ mutable.format = mdmf, then newly created mutable files will use the
4133+ new MDMF format, which supports efficient in-place modification and
4134+ streaming downloads. You can override this value using a special
4135+ mutable-type parameter in the webapi. If you do not specify a value
4136+ here, Tahoe-LAFS will use SDMF for all newly-created mutable files.
4137+
4138+ Note that this parameter only applies to mutable files. Mutable
4139+ directories, which are stored as mutable files, are not controlled by
4140+ this parameter and will always use SDMF. We may revisit this decision
4141+ in future versions of Tahoe-LAFS.
4142)
4143)
4144)
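As a concrete illustration of the option described above (a sketch only -- the section placement is an assumption here; it is shown under [client], alongside the other client-side defaults):

    [client]
    # newly created mutable files will use the MDMF format; omit this line
    # (or set it to sdmf) to keep the old, widely compatible format
    mutable.format = mdmf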
4145hunk ./docs/frontends/webapi.rst 363
4146  writeable mutable file, that file's contents will be overwritten in-place. If
4147  it is a read-cap for a mutable file, an error will occur. If it is an
4148  immutable file, the old file will be discarded, and a new one will be put in
4149- its place.
4150+ its place. If the target file is a writable mutable file, you may also
4151+ specify an "offset" parameter -- a byte offset that determines where in
4152+ the mutable file the data from the HTTP request body is placed. This
4153+ operation is relatively efficient for MDMF mutable files, and is
4154+ relatively inefficient (but still supported) for SDMF mutable files.
4155 
4156  When creating a new file, if "mutable=true" is in the query arguments, the
4157  operation will create a mutable file instead of an immutable one.
4158hunk ./docs/frontends/webapi.rst 388
4159 
4160  If "mutable=true" is in the query arguments, the operation will create a
4161  mutable file, and return its write-cap in the HTTP respose. The default is
4162- to create an immutable file, returning the read-cap as a response.
4163+ to create an immutable file, returning the read-cap as a response. If
4164+ you create a mutable file, you can also use the "mutable-type" query
4165+ parameter. If "mutable-type=sdmf", then the mutable file will be created
4166+ in the old SDMF mutable file format. This is desirable for files that
4167+ need to be read by old clients. If "mutable-type=mdmf", then the file
4168+ will be created in the new MDMF mutable file format. MDMF mutable files
4169+ can be downloaded more efficiently, and modified in-place efficiently,
4170+ but are not compatible with older versions of Tahoe-LAFS. If no
4171+ "mutable-type" argument is given, the file is created in whatever
4172+ format was configured in tahoe.cfg.
4173 
4174 Creating A New Directory
4175 ------------------------
4176hunk ./docs/frontends/webapi.rst 1082
4177  If a "mutable=true" argument is provided, the operation will create a
4178  mutable file, and the response body will contain the write-cap instead of
4179  the upload results page. The default is to create an immutable file,
4180- returning the upload results page as a response.
4181+ returning the upload results page as a response. If you create a
4182+ mutable file, you may choose to specify the format of that mutable file
4183+ with the "mutable-type" parameter. If "mutable-type=mdmf", then the
4184+ file will be created as an MDMF mutable file. If "mutable-type=sdmf",
4185+ then the file will be created as an SDMF mutable file. If no value is
4186+ specified, the file will be created in whatever format is specified in
4187+ tahoe.cfg.
4188 
4189 
4190 ``POST /uri/$DIRCAP/[SUBDIRS../]?t=upload``
4191}
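To make the new query parameters concrete, here is a hedged sketch of driving them from Python 2's httplib. It assumes a node whose webapi listens on the default port 3456 mentioned above, and a placeholder write-cap that would come from an earlier upload:

    import httplib, urllib

    host = "127.0.0.1:3456"        # default webapi port
    writecap = "URI:..."           # placeholder; substitute a real write-cap

    # Create a new MDMF mutable file from the request body.
    conn = httplib.HTTPConnection(host)
    conn.request("PUT", "/uri?mutable=true&mutable-type=mdmf", "initial contents")
    print conn.getresponse().read()    # the new file's write-cap

    # Modify an existing mutable file in place, starting at byte offset 10.
    conn = httplib.HTTPConnection(host)
    conn.request("PUT", "/uri/%s?offset=10" % urllib.quote(writecap), "patched!")
    print conn.getresponse().status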
4192[mutable/layout.py and interfaces.py: add MDMF writer and reader
4193Kevan Carstensen <kevan@isnotajoke.com>**20100819003304
4194 Ignore-this: 44400fec923987b62830da2ed5075fb4
4195 
4196 The MDMF writer is responsible for keeping state as plaintext is
4197 gradually processed into share data by the upload process. When the
4198 upload finishes, it will write all of its share data to a remote server,
4199 reporting its status back to the publisher.
4200 
4201 The MDMF reader is responsible for abstracting an MDMF file as it sits
4202 on the grid from the downloader; specifically, by receiving and
4203 responding to requests for arbitrary data within the MDMF file.
4204 
4205 The interfaces.py file has also been modified to contain an interface
4206 for the writer.
4207] {
4208hunk ./src/allmydata/interfaces.py 7
4209      ChoiceOf, IntegerConstraint, Any, RemoteInterface, Referenceable
4210 
4211 HASH_SIZE=32
4212+SALT_SIZE=16
4213+
4214+SDMF_VERSION=0
4215+MDMF_VERSION=1
4216 
4217 Hash = StringConstraint(maxLength=HASH_SIZE,
4218                         minLength=HASH_SIZE)# binary format 32-byte SHA256 hash
4219hunk ./src/allmydata/interfaces.py 424
4220         """
4221 
4222 
4223+class IMutableSlotWriter(Interface):
4224+    """
4225+    The interface for a writer around a mutable slot on a remote server.
4226+    """
4227+    def set_checkstring(checkstring, *args):
4228+        """
4229+        Set the checkstring that I will pass to the remote server when
4230+        writing.
4231+
4232+            @param checkstring: A packed checkstring to use.
4233+
4234+        Note that implementations can differ in which semantics they
4235+        wish to support for set_checkstring -- they can, for example,
4236+        build the checkstring themselves from its constituents, or
4237+        some other thing.
4238+        """
4239+
4240+    def get_checkstring():
4241+        """
4242+        Get the checkstring that I think currently exists on the remote
4243+        server.
4244+        """
4245+
4246+    def put_block(data, segnum, salt):
4247+        """
4248+        Add a block and salt to the share.
4249+        """
4250+
4251+    def put_encprivkey(encprivkey):
4252+        """
4253+        Add the encrypted private key to the share.
4254+        """
4255+
4256+    def put_blockhashes(blockhashes=list):
4257+        """
4258+        Add the block hash tree to the share.
4259+        """
4260+
4261+    def put_sharehashes(sharehashes=dict):
4262+        """
4263+        Add the share hash chain to the share.
4264+        """
4265+
4266+    def get_signable():
4267+        """
4268+        Return the part of the share that needs to be signed.
4269+        """
4270+
4271+    def put_signature(signature):
4272+        """
4273+        Add the signature to the share.
4274+        """
4275+
4276+    def put_verification_key(verification_key):
4277+        """
4278+        Add the verification key to the share.
4279+        """
4280+
4281+    def finish_publishing():
4282+        """
4283+        Do anything necessary to finish writing the share to a remote
4284+        server. I require that no further publishing needs to take place
4285+        after this method has been called.
4286+        """
4287+
4288+
4289 class IURI(Interface):
4290     def init_from_string(uri):
4291         """Accept a string (as created by my to_string() method) and populate
4292hunk ./src/allmydata/mutable/layout.py 4
4293 
4294 import struct
4295 from allmydata.mutable.common import NeedMoreDataError, UnknownVersionError
4296+from allmydata.interfaces import HASH_SIZE, SALT_SIZE, SDMF_VERSION, \
4297+                                 MDMF_VERSION, IMutableSlotWriter
4298+from allmydata.util import mathutil, observer
4299+from twisted.python import failure
4300+from twisted.internet import defer
4301+from zope.interface import implements
4302+
4303+
4304+# These strings describe the format of the packed structs they help process
4305+# Here's what they mean:
4306+#
4307+#  PREFIX:
4308+#    >: Big-endian byte order; the most significant byte is first (leftmost).
4309+#    B: The version information; an 8 bit version identifier. Stored as
4310+#       an unsigned char. This is currently 00 00 00 00; our modifications
4311+#       will turn it into 00 00 00 01.
4312+#    Q: The sequence number; this is sort of like a revision history for
4313+#       mutable files; they start at 1 and increase as they are changed after
4314+#       being uploaded. Stored as an unsigned long long, which is 8 bytes in
4315+#       length.
4316+#  32s: The root hash of the share hash tree. We use sha-256d, so we use 32
4317+#       characters = 32 bytes to store the value.
4318+#  16s: The salt for the readkey. This is a 16-byte random value, stored as
4319+#       16 characters.
4320+#
4321+#  SIGNED_PREFIX additions, things that are covered by the signature:
4322+#    B: The "k" encoding parameter. We store this as an 8-bit character,
4323+#       which is convenient because our erasure coding scheme cannot
4324+#       encode if you ask for more than 255 pieces.
4325+#    B: The "N" encoding parameter. Stored as an 8-bit character for the
4326+#       same reasons as above.
4327+#    Q: The segment size of the uploaded file. This will essentially be the
4328+#       length of the file in SDMF. An unsigned long long, so we can store
4329+#       files of quite large size.
4330+#    Q: The data length of the uploaded file. Modulo padding, this will be
4331+#       the same as the segment size field. Like the segment size field, it
4332+#       is an unsigned long long and can be quite large.
4333+#
4334+#   HEADER additions:
4335+#     L: The offset of the signature of this. An unsigned long.
4336+#     L: The offset of the share hash chain. An unsigned long.
4337+#     L: The offset of the block hash tree. An unsigned long.
4338+#     L: The offset of the share data. An unsigned long.
4339+#     Q: The offset of the encrypted private key. An unsigned long long, to
4340+#        account for the possibility of a lot of share data.
4341+#     Q: The offset of the EOF. An unsigned long long, to account for the
4342+#        possibility of a lot of share data.
4343+#
4344+#  After all of these, we have the following:
4345+#    - The verification key: Occupies the space between the end of the header
4346+#      and the start of the signature (i.e.: data[HEADER_LENGTH:o['signature']].
4347+#    - The signature, which goes from the signature offset to the share hash
4348+#      chain offset.
4349+#    - The share hash chain, which goes from the share hash chain offset to
4350+#      the block hash tree offset.
4351+#    - The share data, which goes from the share data offset to the encrypted
4352+#      private key offset.
4353+#    - The encrypted private key offset, which goes until the end of the file.
4354+#
4355+#  The block hash tree in this encoding covers only one segment per share, so
4356+#  the offset of the share data will be 32 bytes more than the offset of the
4357+#  block hash tree. Given this, we may need to check to see how many bytes a
4358+#  reasonably sized block hash tree will take up.
4359 
4360 PREFIX = ">BQ32s16s" # each version has a different prefix
4361 SIGNED_PREFIX = ">BQ32s16s BBQQ" # this is covered by the signature
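Because the prefix formats above drive all of the checkstring handling, a short round-trip check may help; this is illustrative only and uses nothing beyond the struct strings and unpack_checkstring defined in this module:

    import struct

    print struct.calcsize(PREFIX)         # 57  = 1 + 8 + 32 + 16
    print struct.calcsize(SIGNED_PREFIX)  # 75  = 57 + 1 + 1 + 8 + 8
    print struct.calcsize(HEADER)         # 107 = 75 + 4*4 + 2*8

    # A checkstring is just a packed PREFIX for a particular version.
    checkstring = struct.pack(PREFIX, 0, 3, "\x00" * 32, "\x11" * 16)
    seqnum, root_hash, IV = unpack_checkstring(checkstring)
    assert seqnum == 3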
4362hunk ./src/allmydata/mutable/layout.py 73
4363 SIGNED_PREFIX_LENGTH = struct.calcsize(SIGNED_PREFIX)
4364 HEADER = ">BQ32s16s BBQQ LLLLQQ" # includes offsets
4365 HEADER_LENGTH = struct.calcsize(HEADER)
4366+OFFSETS = ">LLLLQQ"
4367+OFFSETS_LENGTH = struct.calcsize(OFFSETS)
4368 
4369hunk ./src/allmydata/mutable/layout.py 76
4370+# These are still used for some tests.
4371 def unpack_header(data):
4372     o = {}
4373     (version,
4374hunk ./src/allmydata/mutable/layout.py 92
4375      o['EOF']) = struct.unpack(HEADER, data[:HEADER_LENGTH])
4376     return (version, seqnum, root_hash, IV, k, N, segsize, datalen, o)
4377 
4378-def unpack_prefix_and_signature(data):
4379-    assert len(data) >= HEADER_LENGTH, len(data)
4380-    prefix = data[:SIGNED_PREFIX_LENGTH]
4381-
4382-    (version,
4383-     seqnum,
4384-     root_hash,
4385-     IV,
4386-     k, N, segsize, datalen,
4387-     o) = unpack_header(data)
4388-
4389-    if version != 0:
4390-        raise UnknownVersionError("got mutable share version %d, but I only understand version 0" % version)
4391-
4392-    if len(data) < o['share_hash_chain']:
4393-        raise NeedMoreDataError(o['share_hash_chain'],
4394-                                o['enc_privkey'], o['EOF']-o['enc_privkey'])
4395-
4396-    pubkey_s = data[HEADER_LENGTH:o['signature']]
4397-    signature = data[o['signature']:o['share_hash_chain']]
4398-
4399-    return (seqnum, root_hash, IV, k, N, segsize, datalen,
4400-            pubkey_s, signature, prefix)
4401-
4402 def unpack_share(data):
4403     assert len(data) >= HEADER_LENGTH
4404     o = {}
4405hunk ./src/allmydata/mutable/layout.py 139
4406             pubkey, signature, share_hash_chain, block_hash_tree,
4407             share_data, enc_privkey)
4408 
4409-def unpack_share_data(verinfo, hash_and_data):
4410-    (seqnum, root_hash, IV, segsize, datalength, k, N, prefix, o_t) = verinfo
4411-
4412-    # hash_and_data starts with the share_hash_chain, so figure out what the
4413-    # offsets really are
4414-    o = dict(o_t)
4415-    o_share_hash_chain = 0
4416-    o_block_hash_tree = o['block_hash_tree'] - o['share_hash_chain']
4417-    o_share_data = o['share_data'] - o['share_hash_chain']
4418-    o_enc_privkey = o['enc_privkey'] - o['share_hash_chain']
4419-
4420-    share_hash_chain_s = hash_and_data[o_share_hash_chain:o_block_hash_tree]
4421-    share_hash_format = ">H32s"
4422-    hsize = struct.calcsize(share_hash_format)
4423-    assert len(share_hash_chain_s) % hsize == 0, len(share_hash_chain_s)
4424-    share_hash_chain = []
4425-    for i in range(0, len(share_hash_chain_s), hsize):
4426-        chunk = share_hash_chain_s[i:i+hsize]
4427-        (hid, h) = struct.unpack(share_hash_format, chunk)
4428-        share_hash_chain.append( (hid, h) )
4429-    share_hash_chain = dict(share_hash_chain)
4430-    block_hash_tree_s = hash_and_data[o_block_hash_tree:o_share_data]
4431-    assert len(block_hash_tree_s) % 32 == 0, len(block_hash_tree_s)
4432-    block_hash_tree = []
4433-    for i in range(0, len(block_hash_tree_s), 32):
4434-        block_hash_tree.append(block_hash_tree_s[i:i+32])
4435-
4436-    share_data = hash_and_data[o_share_data:o_enc_privkey]
4437-
4438-    return (share_hash_chain, block_hash_tree, share_data)
4439-
4440-
4441-def pack_checkstring(seqnum, root_hash, IV):
4442-    return struct.pack(PREFIX,
4443-                       0, # version,
4444-                       seqnum,
4445-                       root_hash,
4446-                       IV)
4447-
4448 def unpack_checkstring(checkstring):
4449     cs_len = struct.calcsize(PREFIX)
4450     version, seqnum, root_hash, IV = struct.unpack(PREFIX, checkstring[:cs_len])
4451hunk ./src/allmydata/mutable/layout.py 146
4452         raise UnknownVersionError("got mutable share version %d, but I only understand version 0" % version)
4453     return (seqnum, root_hash, IV)
4454 
4455-def pack_prefix(seqnum, root_hash, IV,
4456-                required_shares, total_shares,
4457-                segment_size, data_length):
4458-    prefix = struct.pack(SIGNED_PREFIX,
4459-                         0, # version,
4460-                         seqnum,
4461-                         root_hash,
4462-                         IV,
4463-
4464-                         required_shares,
4465-                         total_shares,
4466-                         segment_size,
4467-                         data_length,
4468-                         )
4469-    return prefix
4470 
4471 def pack_offsets(verification_key_length, signature_length,
4472                  share_hash_chain_length, block_hash_tree_length,
4473hunk ./src/allmydata/mutable/layout.py 192
4474                            encprivkey])
4475     return final_share
4476 
4477+def pack_prefix(seqnum, root_hash, IV,
4478+                required_shares, total_shares,
4479+                segment_size, data_length):
4480+    prefix = struct.pack(SIGNED_PREFIX,
4481+                         0, # version,
4482+                         seqnum,
4483+                         root_hash,
4484+                         IV,
4485+                         required_shares,
4486+                         total_shares,
4487+                         segment_size,
4488+                         data_length,
4489+                         )
4490+    return prefix
4491+
4492+
4493+class SDMFSlotWriteProxy:
4494+    implements(IMutableSlotWriter)
4495+    """
4496+    I represent a remote write slot for an SDMF mutable file. I build a
4497+    share in memory, and then write it in one piece to the remote
4498+    server. This mimics how SDMF shares were built before MDMF (and the
4499+    new MDMF uploader), but provides that functionality in a way that
4500+    allows the MDMF uploader to be built without much special-casing for
4501+    file format, which makes the uploader code more readable.
4502+    """
4503+    def __init__(self,
4504+                 shnum,
4505+                 rref, # a remote reference to a storage server
4506+                 storage_index,
4507+                 secrets, # (write_enabler, renew_secret, cancel_secret)
4508+                 seqnum, # the sequence number of the mutable file
4509+                 required_shares,
4510+                 total_shares,
4511+                 segment_size,
4512+                 data_length): # the length of the original file
4513+        self.shnum = shnum
4514+        self._rref = rref
4515+        self._storage_index = storage_index
4516+        self._secrets = secrets
4517+        self._seqnum = seqnum
4518+        self._required_shares = required_shares
4519+        self._total_shares = total_shares
4520+        self._segment_size = segment_size
4521+        self._data_length = data_length
4522+
4523+        # This is an SDMF file, so it should have only one segment, so,
4524+        # modulo padding of the data length, the segment size and the
4525+        # data length should be the same.
4526+        expected_segment_size = mathutil.next_multiple(data_length,
4527+                                                       self._required_shares)
4528+        assert expected_segment_size == segment_size
4529+
4530+        self._block_size = self._segment_size / self._required_shares
4531+
4532+        # This is meant to mimic how SDMF files were built before MDMF
4533+        # entered the picture: we generate each share in its entirety,
4534+        # then push it off to the storage server in one write. When
4535+        # callers call set_*, they are just populating this dict.
4536+        # finish_publishing will stitch these pieces together into a
4537+        # coherent share, and then write the coherent share to the
4538+        # storage server.
4539+        self._share_pieces = {}
4540+
4541+        # This tells the write logic what checkstring to use when
4542+        # writing remote shares.
4543+        self._testvs = []
4544+
4545+        self._readvs = [(0, struct.calcsize(PREFIX))]
4546+
4547+
4548+    def set_checkstring(self, checkstring_or_seqnum,
4549+                              root_hash=None,
4550+                              salt=None):
4551+        """
4552+        Set the checkstring that I will pass to the remote server when
4553+        writing.
4554+
4555+            @param checkstring_or_seqnum: A packed checkstring to use,
4556+                   or a sequence number. I will treat this as a checkstr
4557+
4558+        Note that implementations can differ in which semantics they
4559+        wish to support for set_checkstring -- they can, for example,
4560+        build the checkstring themselves from its constituents, or
4561+        some other thing.
4562+        """
4563+        if root_hash and salt:
4564+            checkstring = struct.pack(PREFIX,
4565+                                      0,
4566+                                      checkstring_or_seqnum,
4567+                                      root_hash,
4568+                                      salt)
4569+        else:
4570+            checkstring = checkstring_or_seqnum
4571+        self._testvs = [(0, len(checkstring), "eq", checkstring)]
4572+
4573+
4574+    def get_checkstring(self):
4575+        """
4576+        Get the checkstring that I think currently exists on the remote
4577+        server.
4578+        """
4579+        if self._testvs:
4580+            return self._testvs[0][3]
4581+        return ""
4582+
4583+
4584+    def put_block(self, data, segnum, salt):
4585+        """
4586+        Add a block and salt to the share.
4587+        """
4588+        # SDMF files have only one segment
4589+        assert segnum == 0
4590+        assert len(data) == self._block_size
4591+        assert len(salt) == SALT_SIZE
4592+
4593+        self._share_pieces['sharedata'] = data
4594+        self._share_pieces['salt'] = salt
4595+
4596+        # TODO: Figure out something intelligent to return.
4597+        return defer.succeed(None)
4598+
4599+
4600+    def put_encprivkey(self, encprivkey):
4601+        """
4602+        Add the encrypted private key to the share.
4603+        """
4604+        self._share_pieces['encprivkey'] = encprivkey
4605+
4606+        return defer.succeed(None)
4607+
4608+
4609+    def put_blockhashes(self, blockhashes):
4610+        """
4611+        Add the block hash tree to the share.
4612+        """
4613+        assert isinstance(blockhashes, list)
4614+        for h in blockhashes:
4615+            assert len(h) == HASH_SIZE
4616+
4617+        # serialize the blockhashes, then set them.
4618+        blockhashes_s = "".join(blockhashes)
4619+        self._share_pieces['block_hash_tree'] = blockhashes_s
4620+
4621+        return defer.succeed(None)
4622+
4623+
4624+    def put_sharehashes(self, sharehashes):
4625+        """
4626+        Add the share hash chain to the share.
4627+        """
4628+        assert isinstance(sharehashes, dict)
4629+        for h in sharehashes.itervalues():
4630+            assert len(h) == HASH_SIZE
4631+
4632+        # serialize the sharehashes, then set them.
4633+        sharehashes_s = "".join([struct.pack(">H32s", i, sharehashes[i])
4634+                                 for i in sorted(sharehashes.keys())])
4635+        self._share_pieces['share_hash_chain'] = sharehashes_s
4636+
4637+        return defer.succeed(None)
4638+
4639+
4640+    def put_root_hash(self, root_hash):
4641+        """
4642+        Add the root hash to the share.
4643+        """
4644+        assert len(root_hash) == HASH_SIZE
4645+
4646+        self._share_pieces['root_hash'] = root_hash
4647+
4648+        return defer.succeed(None)
4649+
4650+
4651+    def put_salt(self, salt):
4652+        """
4653+        Add a salt to an empty SDMF file.
4654+        """
4655+        assert len(salt) == SALT_SIZE
4656+
4657+        self._share_pieces['salt'] = salt
4658+        self._share_pieces['sharedata'] = ""
4659+
4660+
4661+    def get_signable(self):
4662+        """
4663+        Return the part of the share that needs to be signed.
4664+
4665+        SDMF writers need to sign the packed representation of the
4666+        first eight fields of the remote share, that is:
4667+            - version number (0)
4668+            - sequence number
4669+            - root of the share hash tree
4670+            - salt
4671+            - k
4672+            - n
4673+            - segsize
4674+            - datalen
4675+
4676+        This method is responsible for returning that to callers.
4677+        """
4678+        return struct.pack(SIGNED_PREFIX,
4679+                           0,
4680+                           self._seqnum,
4681+                           self._share_pieces['root_hash'],
4682+                           self._share_pieces['salt'],
4683+                           self._required_shares,
4684+                           self._total_shares,
4685+                           self._segment_size,
4686+                           self._data_length)
4687+
4688+
4689+    def put_signature(self, signature):
4690+        """
4691+        Add the signature to the share.
4692+        """
4693+        self._share_pieces['signature'] = signature
4694+
4695+        return defer.succeed(None)
4696+
4697+
4698+    def put_verification_key(self, verification_key):
4699+        """
4700+        Add the verification key to the share.
4701+        """
4702+        self._share_pieces['verification_key'] = verification_key
4703+
4704+        return defer.succeed(None)
4705+
4706+
4707+    def get_verinfo(self):
4708+        """
4709+        I return my verinfo tuple. This is used by the ServermapUpdater
4710+        to keep track of versions of mutable files.
4711+
4712+        The verinfo tuple for MDMF files contains:
4713+            - seqnum
4714+            - root hash
4715+            - a blank (nothing)
4716+            - segsize
4717+            - datalen
4718+            - k
4719+            - n
4720+            - prefix (the thing that you sign)
4721+            - a tuple of offsets
4722+
4723+        We include the nonce in MDMF to simplify processing of version
4724+        information tuples.
4725+
4726+        The verinfo tuple for SDMF files is the same, but contains a
4727+        16-byte IV instead of a hash of salts.
4728+        """
4729+        return (self._seqnum,
4730+                self._share_pieces['root_hash'],
4731+                self._share_pieces['salt'],
4732+                self._segment_size,
4733+                self._data_length,
4734+                self._required_shares,
4735+                self._total_shares,
4736+                self.get_signable(),
4737+                self._get_offsets_tuple())
4738+
4739+    def _get_offsets_dict(self):
4740+        post_offset = HEADER_LENGTH
4741+        offsets = {}
4742+
4743+        verification_key_length = len(self._share_pieces['verification_key'])
4744+        o1 = offsets['signature'] = post_offset + verification_key_length
4745+
4746+        signature_length = len(self._share_pieces['signature'])
4747+        o2 = offsets['share_hash_chain'] = o1 + signature_length
4748+
4749+        share_hash_chain_length = len(self._share_pieces['share_hash_chain'])
4750+        o3 = offsets['block_hash_tree'] = o2 + share_hash_chain_length
4751+
4752+        block_hash_tree_length = len(self._share_pieces['block_hash_tree'])
4753+        o4 = offsets['share_data'] = o3 + block_hash_tree_length
4754+
4755+        share_data_length = len(self._share_pieces['sharedata'])
4756+        o5 = offsets['enc_privkey'] = o4 + share_data_length
4757+
4758+        encprivkey_length = len(self._share_pieces['encprivkey'])
4759+        offsets['EOF'] = o5 + encprivkey_length
4760+        return offsets
4761+
4762+
4763+    def _get_offsets_tuple(self):
4764+        offsets = self._get_offsets_dict()
4765+        return tuple([(key, value) for key, value in offsets.items()])
4766+
4767+
4768+    def _pack_offsets(self):
4769+        offsets = self._get_offsets_dict()
4770+        return struct.pack(">LLLLQQ",
4771+                           offsets['signature'],
4772+                           offsets['share_hash_chain'],
4773+                           offsets['block_hash_tree'],
4774+                           offsets['share_data'],
4775+                           offsets['enc_privkey'],
4776+                           offsets['EOF'])
4777+
4778+
4779+    def finish_publishing(self):
4780+        """
4781+        Do anything necessary to finish writing the share to a remote
4782+        server. I require that no further publishing needs to take place
4783+        after this method has been called.
4784+        """
4785+        for k in ["sharedata", "encprivkey", "signature", "verification_key",
4786+                  "share_hash_chain", "block_hash_tree"]:
4787+            assert k in self._share_pieces
4788+        # This is the only method that actually writes something to the
4789+        # remote server.
4790+        # First, we need to pack the share into data that we can write
4791+        # to the remote server in one write.
4792+        offsets = self._pack_offsets()
4793+        prefix = self.get_signable()
4794+        final_share = "".join([prefix,
4795+                               offsets,
4796+                               self._share_pieces['verification_key'],
4797+                               self._share_pieces['signature'],
4798+                               self._share_pieces['share_hash_chain'],
4799+                               self._share_pieces['block_hash_tree'],
4800+                               self._share_pieces['sharedata'],
4801+                               self._share_pieces['encprivkey']])
4802+
4803+        # Our only data vector is going to be writing the final share,
4804+        # in its entirety.
4805+        datavs = [(0, final_share)]
4806+
4807+        if not self._testvs:
4808+            # Our caller has not provided us with another checkstring
4809+            # yet, so we assume that we are writing a new share, and set
4810+            # a test vector that will allow a new share to be written.
4811+            self._testvs = []
4812+            self._testvs.append(tuple([0, 1, "eq", ""]))
4813+
4814+        tw_vectors = {}
4815+        tw_vectors[self.shnum] = (self._testvs, datavs, None)
4816+        return self._rref.callRemote("slot_testv_and_readv_and_writev",
4817+                                     self._storage_index,
4818+                                     self._secrets,
4819+                                     tw_vectors,
4820+                                     # TODO is it useful to read something?
4821+                                     self._readvs)
4822+
4823+
4824+MDMFHEADER = ">BQ32sBBQQ QQQQQQ"
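Putting the pieces above together, the expected driver sequence for one of these write proxies looks roughly like this. It is a hedged sketch: each put_* call actually returns a Deferred, and the writer, share data, hashes, keys, and the sign() helper are stand-ins for values that the publish code supplies:

    # writer is an SDMFSlotWriteProxy (or any IMutableSlotWriter)
    writer.set_checkstring(expected_checkstring)  # or (seqnum, root_hash, salt)

    writer.put_block(share_data, 0, salt)         # SDMF: one segment, segnum 0
    writer.put_encprivkey(encrypted_private_key)
    writer.put_blockhashes(block_hash_tree)       # list of 32-byte hashes
    writer.put_sharehashes(share_hash_chain)      # dict: shnum -> 32-byte hash
    writer.put_root_hash(root_hash)

    signature = sign(writer.get_signable())       # sign() supplied by the caller
    writer.put_signature(signature)
    writer.put_verification_key(verification_key)

    d = writer.finish_publishing()                # the single remote write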
4825+MDMFHEADERWITHOUTOFFSETS = ">BQ32sBBQQ"
4826+MDMFHEADERSIZE = struct.calcsize(MDMFHEADER)
4827+MDMFHEADERWITHOUTOFFSETSSIZE = struct.calcsize(MDMFHEADERWITHOUTOFFSETS)
4828+MDMFCHECKSTRING = ">BQ32s"
4829+MDMFSIGNABLEHEADER = ">BQ32sBBQQ"
4830+MDMFOFFSETS = ">QQQQQQ"
4831+MDMFOFFSETS_LENGTH = struct.calcsize(MDMFOFFSETS)
4832+
4833+class MDMFSlotWriteProxy:
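For reference, these struct strings imply a fixed header of 107 bytes, with the six-entry offsets table starting at byte 59; that matches the offset column in the layout comment below. A quick, purely illustrative check:

    import struct
    assert struct.calcsize(MDMFHEADERWITHOUTOFFSETS) == 59
    assert struct.calcsize(MDMFOFFSETS) == 48
    assert MDMFHEADERSIZE == 59 + 48   # 107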
4834+    implements(IMutableSlotWriter)
4835+
4836+    """
4837+    I represent a remote write slot for an MDMF mutable file.
4838+
4839+    I abstract away from my caller the details of block and salt
4840+    management, and the implementation of the on-disk format for MDMF
4841+    shares.
4842+    """
4843+    # Expected layout, MDMF:
4844+    # offset:     size:       name:
4845+    #-- signed part --
4846+    # 0           1           version number (01)
4847+    # 1           8           sequence number
4848+    # 9           32          share tree root hash
4849+    # 41          1           The "k" encoding parameter
4850+    # 42          1           The "N" encoding parameter
4851+    # 43          8           The segment size of the uploaded file
4852+    # 51          8           The data length of the original plaintext
4853+    #-- end signed part --
4854+    # 59          8           The offset of the encrypted private key
4855+    # 67          8           The offset of the block hash tree
4856+    # 75          8           The offset of the share hash chain
4857+    # 83          8           The offset of the signature
4858+    # 91          8           The offset of the verification key
4859+    # 99          8           The offset of the EOF
4860+    #
4861+    # followed by salts and share data, the encrypted private key, the
4862+    # block hash tree, the salt hash tree, the share hash chain, a
4863+    # signature over the first eight fields, and a verification key.
4864+    #
4865+    # The checkstring is the first three fields -- the version number,
4866+    # sequence number, root hash and root salt hash. This is consistent
4867+    # in meaning to what we have with SDMF files, except now instead of
4868+    # using the literal salt, we use a value derived from all of the
4869+    # salts -- the share hash root.
4870+    #
4871+    # The salt is stored before the block for each segment. The block
4872+    # hash tree is computed over the combination of block and salt for
4873+    # each segment. In this way, we get integrity checking for both
4874+    # block and salt with the current block hash tree arrangement.
4875+    #
4876+    # The ordering of the offsets is different to reflect the dependencies
4877+    # that we'll run into with an MDMF file. The expected write flow is
4878+    # something like this:
4879+    #
4880+    #   0: Initialize with the sequence number, encoding parameters and
4881+    #      data length. From this, we can deduce the number of segments,
4882+    #      and where they should go.. We can also figure out where the
4883+    #      encrypted private key should go, because we can figure out how
4884+    #      big the share data will be.
4885+    #
4886+    #   1: Encrypt, encode, and upload the file in chunks. Do something
4887+    #      like
4888+    #
4889+    #       put_block(data, segnum, salt)
4890+    #
4891+    #      to write a block and a salt to the disk. We can do both of
4892+    #      these operations now because we have enough of the offsets to
4893+    #      know where to put them.
4894+    #
4895+    #   2: Put the encrypted private key. Use:
4896+    #
4897+    #        put_encprivkey(encprivkey)
4898+    #
4899+    #      Now that we know the length of the private key, we can fill
4900+    #      in the offset for the block hash tree.
4901+    #
4902+    #   3: We're now in a position to upload the block hash tree for
4903+    #      a share. Put that using something like:
4904+    #       
4905+    #        put_blockhashes(block_hash_tree)
4906+    #
4907+    #      Note that block_hash_tree is a list of hashes -- we'll take
4908+    #      care of the details of serializing that appropriately. When
4909+    #      we get the block hash tree, we are also in a position to
4910+    #      calculate the offset for the share hash chain, and fill that
4911+    #      into the offsets table.
4912+    #
4913+    #   4: At the same time, we're in a position to upload the salt hash
4914+    #      tree. This is a Merkle tree over all of the salts. We use a
4915+    #      Merkle tree so that we can validate each block,salt pair as
4916+    #      we download them later. We do this using
4917+    #
4918+    #        put_salthashes(salt_hash_tree)
4919+    #
4920+    #      When you do this, I automatically put the root of the tree
4921+    #      (the hash at index 0 of the list) in its appropriate slot in
4922+    #      the signed prefix of the share.
4923+    #
4924+    #   5: We're now in a position to upload the share hash chain for
4925+    #      a share. Do that with something like:
4926+    #     
4927+    #        put_sharehashes(share_hash_chain)
4928+    #
4929+    #      share_hash_chain should be a dictionary mapping shnums to
4930+    #      32-byte hashes -- the wrapper handles serialization.
4931+    #      We'll know where to put the signature at this point, also.
4932+    #      The root of this tree will be put explicitly in the next
4933+    #      step.
4934+    #
4935+    #      TODO: Why? Why not just include it in the tree here?
4936+    #
4937+    #   6: Before putting the signature, we must first put the
4938+    #      root_hash. Do this with:
4939+    #
4940+    #        put_root_hash(root_hash).
4941+    #     
4942+    #      In terms of knowing where to put this value, it was always
4943+    #      possible to place it, but it makes sense semantically to
4944+    #      place it after the share hash tree, so that's why you do it
4945+    #      in this order.
4946+    #
4947+    #   7: With the root hash put, we can now sign the header. Use:
4948+    #
4949+    #        get_signable()
4950+    #
4951+    #      to get the part of the header that you want to sign, and use:
4952+    #       
4953+    #        put_signature(signature)
4954+    #
4955+    #      to write your signature to the remote server.
4956+    #
4957+    #   8: Add the verification key, and finish. Do:
4958+    #
4959+    #        put_verification_key(key)
4960+    #
4961+    #      and
4962+    #
4963+    #        finish_publish()
4964+    #
4965+    # Checkstring management:
4966+    #
4967+    # To write to a mutable slot, we have to provide test vectors to ensure
4968+    # that we are writing to the same data that we think we are. These
4969+    # vectors allow us to detect uncoordinated writes; that is, writes
4970+    # where both we and some other shareholder are writing to the
4971+    # mutable slot, and to report those back to the parts of the program
4972+    # doing the writing.
4973+    #
4974+    # With SDMF, this was easy -- all of the share data was written in
4975+    # one go, so it was easy to detect uncoordinated writes, and we only
4976+    # had to do it once. With MDMF, not all of the file is written at
4977+    # once.
4978+    #
4979+    # If a share is new, we write out as much of the header as we can
4980+    # before writing out anything else. This gives other writers a
4981+    # canary that they can use to detect uncoordinated writes, and, if
4982+    # they do the same thing, gives us the same canary. We then update
4983+    # the share. We won't be able to write out two fields of the header
4984+    # -- the share hash tree root and the salt hash -- until we finish
4985+    # writing out the share. We only require the writer to provide the
4986+    # initial checkstring, and keep track of what it should be after
4987+    # updates ourselves.
4988+    #
4989+    # If we haven't written anything yet, then on the first write (which
4990+    # will probably be a block + salt of a share), we'll also write out
4991+    # the header. On subsequent passes, we'll expect to see the header.
4992+    # This changes in two places:
4993+    #
4994+    #   - When we write out the salt hash
4995+    #   - When we write out the root of the share hash tree
4996+    #
4997+    # since these values will change the header. It is possible that we
4998+    # can just make those be written in one operation to minimize
4999+    # disruption.
5000+    def __init__(self,
5001+                 shnum,
5002+                 rref, # a remote reference to a storage server
5003+                 storage_index,
5004+                 secrets, # (write_enabler, renew_secret, cancel_secret)
5005+                 seqnum, # the sequence number of the mutable file
5006+                 required_shares,
5007+                 total_shares,
5008+                 segment_size,
5009+                 data_length): # the length of the original file
5010+        self.shnum = shnum
5011+        self._rref = rref
5012+        self._storage_index = storage_index
5013+        self._seqnum = seqnum
5014+        self._required_shares = required_shares
5015+        assert self.shnum >= 0 and self.shnum < total_shares
5016+        self._total_shares = total_shares
5017+        # We build up the offset table as we write things. It is the
5018+        # last thing we write to the remote server.
5019+        self._offsets = {}
5020+        self._testvs = []
5021+        # This is a list of write vectors that will be sent to our
5022+        # remote server once we are directed to write things there.
5023+        self._writevs = []
5024+        self._secrets = secrets
5025+        # The segment size needs to be a multiple of the k parameter --
5026+        # any padding should have been carried out by the publisher
5027+        # already.
5028+        assert segment_size % required_shares == 0
5029+        self._segment_size = segment_size
5030+        self._data_length = data_length
5031+
5032+        # These are set later -- we define them here so that we can
5033+        # check for their existence easily
5034+
5035+        # This is the root of the share hash tree -- the Merkle tree
5036+        # over the roots of the block hash trees computed for shares in
5037+        # this upload.
5038+        self._root_hash = None
5039+
5040+        # We haven't yet written anything to the remote bucket. By
5041+        # setting this, we tell the _write method as much. The write
5042+        # method will then know that it also needs to add a write vector
5043+        # for the checkstring (or what we have of it) to the first write
5044+        # request. We'll then record that value for future use.  If
5045+        # we're expecting something to be there already, we need to call
5046+        # set_checkstring before we write anything to tell the first
5047+        # write about that.
5048+        self._written = False
5049+
5050+        # When writing data to the storage servers, we get a read vector
5051+        # for free. We'll read the checkstring, which will help us
5052+        # figure out what's gone wrong if a write fails.
5053+        self._readv = [(0, struct.calcsize(MDMFCHECKSTRING))]
5054+
5055+        # We calculate the number of segments because it tells us
5056+        # where the salt part of the file ends/share segment begins,
5057+        # and also because it provides a useful amount of bounds checking.
5058+        self._num_segments = mathutil.div_ceil(self._data_length,
5059+                                               self._segment_size)
5060+        self._block_size = self._segment_size / self._required_shares
5061+        # We also calculate the share size, to help us with block
5062+        # constraints later.
5063+        tail_size = self._data_length % self._segment_size
5064+        if not tail_size:
5065+            self._tail_block_size = self._block_size
5066+        else:
5067+            self._tail_block_size = mathutil.next_multiple(tail_size,
5068+                                                           self._required_shares)
5069+            self._tail_block_size /= self._required_shares
5070+
5071+        # We already know where the sharedata starts; right after the end
5072+        # of the header (which is defined as the signable part + the offsets)
5073+        # We can also calculate where the encrypted private key begins
5074+        # from what we now know.
5075+        self._actual_block_size = self._block_size + SALT_SIZE
5076+        data_size = self._actual_block_size * (self._num_segments - 1)
5077+        data_size += self._tail_block_size
5078+        data_size += SALT_SIZE
5079+        self._offsets['enc_privkey'] = MDMFHEADERSIZE
5080+        self._offsets['enc_privkey'] += data_size
5081+        # We'll wait for the rest. Callers can now call my "put_block" and
5082+        # "set_checkstring" methods.
5083+
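To make the write flow described in the long comment above concrete, here is a minimal, hypothetical usage sketch of this proxy's queueing API. The helper name publish_one_share and its arguments are invented for illustration; only the proxy methods it calls are defined in this class, and the real publisher drives them asynchronously via Deferreds rather than strictly in sequence.

    # Illustrative sketch only. Assumes `writer` is an MDMFSlotWriteProxy and
    # `sign` is a callable that signs bytes with the mutable file's signing key.
    def publish_one_share(writer, blocks_and_salts, encprivkey, blockhashes,
                          sharehashes, root_hash, sign, verification_key):
        # queue each (block, salt) pair at its precomputed offset
        for segnum, (block, salt) in enumerate(blocks_and_salts):
            writer.put_block(block, segnum, salt)
        # the private key fixes the block hash tree offset
        writer.put_encprivkey(encprivkey)
        # the block hash tree fixes the share hash chain offset
        writer.put_blockhashes(blockhashes)
        # the share hash chain fixes the signature offset
        writer.put_sharehashes(sharehashes)
        # the root hash rewrites the checkstring portion of the header
        writer.put_root_hash(root_hash)
        # sign the header prefix; this fixes the verification key offset
        writer.put_signature(sign(writer.get_signable()))
        # the verification key fixes the EOF offset
        writer.put_verification_key(verification_key)
        # write the offsets table and flush every queued write vector in a
        # single test-and-set call to the storage server; returns a Deferred
        return writer.finish_publishing()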
5084+
5085+    def set_checkstring(self,
5086+                        seqnum_or_checkstring,
5087+                        root_hash=None,
5088+                        salt=None):
5089+        """
5090+        Set the checkstring for this share.
5091+
5092+        This can be invoked in one of two ways.
5093+
5094+        With one argument, I assume that you are giving me a literal
5095+        checkstring -- e.g., the output of get_checkstring. I will then
5096+        set that checkstring as it is. This form is used by unit tests.
5097+
5098+        With two arguments, I assume that you are giving me a sequence
5099+        number and root hash to make a checkstring from. In that case, I
5100+        will build a checkstring and set it for you. This form is used
5101+        by the publisher.
5102+
5103+        By default, I assume that I am writing new shares to the grid.
5104+        If you don't explicitly set your own checkstring, I will use
5105+        one that requires that the remote share not exist. You will want
5106+        to use this method if you are updating a share in-place;
5107+        otherwise, writes will fail.
5108+        """
5109+        # You're allowed to overwrite checkstrings with this method;
5110+        # I assume that users know what they are doing when they call
5111+        # it.
5112+        if root_hash:
5113+            checkstring = struct.pack(MDMFCHECKSTRING,
5114+                                      1,
5115+                                      seqnum_or_checkstring,
5116+                                      root_hash)
5117+        else:
5118+            checkstring = seqnum_or_checkstring
5119+
5120+        if checkstring == "":
5121+            # We special-case the empty string: it means "the share must
5122+            # not exist yet", but a zero-length test vector can't express
5123+            # that, so we leave self._testvs empty and let _write supply
5124+            # the default (0, 1, "eq", "") test vector instead.
5125+            self._testvs = []
5126+        else:
5127+            self._testvs = []
5128+            self._testvs.append((0, len(checkstring), "eq", checkstring))
5129+
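For concreteness, a small hedged example of the two calling forms; the seqnum and root-hash values are invented, and the ">BQ32s" format is taken from the test helper later in this patch that builds the same checkstring by hand.

    import struct

    MDMFCHECKSTRING = ">BQ32s"   # assumed: version (B), seqnum (Q), root hash (32s)
    root_hash = "\x00" * 32      # invented value
    checkstring = struct.pack(MDMFCHECKSTRING, 1, 5, root_hash)

    # Equivalent ways to arm the writer's test vector before the first write:
    #   writer.set_checkstring(checkstring)     # literal form, used by tests
    #   writer.set_checkstring(5, root_hash)    # (seqnum, root_hash) form
    # Either way, the next _write call sends the test vector
    #   (0, len(checkstring), "eq", checkstring)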
5130+
5131+    def __repr__(self):
5132+        return "MDMFSlotWriteProxy for share %d" % self.shnum
5133+
5134+
5135+    def get_checkstring(self):
5136+        """
5137+        I return a representation of what the checkstring for this
5138+        share on the server will look like.
5139+
5140+        I am mostly used for tests.
5141+        """
5142+        if self._root_hash:
5143+            roothash = self._root_hash
5144+        else:
5145+            roothash = "\x00" * 32
5146+        return struct.pack(MDMFCHECKSTRING,
5147+                           1,
5148+                           self._seqnum,
5149+                           roothash)
5150+
5151+
5152+    def put_block(self, data, segnum, salt):
5153+        """
5154+        I queue a write vector for the data, salt, and segment number
5155+        provided to me. I return None, as I do not actually cause
5156+        anything to be written yet.
5157+        """
5158+        if segnum >= self._num_segments:
5159+            raise LayoutInvalid("I won't overwrite the private key")
5160+        if len(salt) != SALT_SIZE:
5161+            raise LayoutInvalid("I was given a salt of size %d, but "
5162+                                "I wanted a salt of size %d" % (len(salt), SALT_SIZE))
5163+        if segnum + 1 == self._num_segments:
5164+            if len(data) != self._tail_block_size:
5165+                raise LayoutInvalid("I was given the wrong size block to write")
5166+        elif len(data) != self._block_size:
5167+            raise LayoutInvalid("I was given the wrong size block to write")
5168+
5169+        # We want to write at MDMFHEADERSIZE + segnum * (block_size + SALT_SIZE).
5170+
5171+        offset = MDMFHEADERSIZE + (self._actual_block_size * segnum)
5172+        data = salt + data
5173+
5174+        self._writevs.append(tuple([offset, data]))
5175+
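As a worked example of the arithmetic above (hypothetical numbers, chosen to match the MDMFProxies test fixtures further down in this patch):

    # Sketch only: k = 3 required shares, 6-byte segments, 16-byte salts,
    # and a 33-byte file, as in the tail-segment test share below.
    k, segment_size, SALT_SIZE, data_length = 3, 6, 16, 33
    block_size = segment_size // k                  # 2 bytes of share data
    actual_block_size = block_size + SALT_SIZE      # 18 bytes stored per segment
    num_segments = -(-data_length // segment_size)  # 6 segments (div_ceil)
    tail_size = data_length % segment_size          # 3 bytes left over
    tail_block_size = -(-tail_size // k)            # 1-byte tail block per share
    # put_block(data, segnum=2, salt) is therefore queued at
    # MDMFHEADERSIZE + actual_block_size * 2.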
5176+
5177+    def put_encprivkey(self, encprivkey):
5178+        """
5179+        I queue a write vector for the encrypted private key provided to
5180+        me.
5181+        """
5182+        assert self._offsets
5183+        assert self._offsets['enc_privkey']
5184+        # You shouldn't re-write the encprivkey after the block hash
5185+        # tree is written, since that could cause the private key to run
5186+        # into the block hash tree. Before it writes the block hash
5187+        # tree, the block hash tree writing method records the offset of
5188+        # the share hash chain. So that's a good indicator of whether or
5189+        # not the block hash tree has been written.
5190+        if "share_hash_chain" in self._offsets:
5191+            raise LayoutInvalid("You must write this before the block hash tree")
5192+
5193+        self._offsets['block_hash_tree'] = self._offsets['enc_privkey'] + \
5194+            len(encprivkey)
5195+        self._writevs.append(tuple([self._offsets['enc_privkey'], encprivkey]))
5196+
5197+
5198+    def put_blockhashes(self, blockhashes):
5199+        """
5200+        I queue a write vector to put the block hash tree in blockhashes
5201+        onto the remote server.
5202+
5203+        The encrypted private key must be queued before the block hash
5204+        tree, since we need to know how large it is to know where the
5205+        block hash tree should go. The block hash tree must be put
5206+        before the share hash chain, since its size determines the
5207+        offset of the share hash chain.
5208+        """
5209+        assert self._offsets
5210+        assert isinstance(blockhashes, list)
5211+        if "block_hash_tree" not in self._offsets:
5212+            raise LayoutInvalid("You must put the encrypted private key "
5213+                                "before you put the block hash tree")
5214+        # If written, the share hash chain causes the signature offset
5215+        # to be defined.
5216+        if "signature" in self._offsets:
5217+            raise LayoutInvalid("You must put the block hash tree before "
5218+                                "you put the share hash chain")
5219+        blockhashes_s = "".join(blockhashes)
5220+        self._offsets['share_hash_chain'] = self._offsets['block_hash_tree'] + len(blockhashes_s)
5221+
5222+        self._writevs.append(tuple([self._offsets['block_hash_tree'],
5223+                                  blockhashes_s]))
5224+
5225+
5226+    def put_sharehashes(self, sharehashes):
5227+        """
5228+        I queue a write vector to put the share hash chain in my
5229+        argument onto the remote server.
5230+
5231+        The block hash tree must be queued before the share hash chain,
5232+        since we need to know where the block hash tree ends before we
5233+        can know where the share hash chain starts. The share hash chain
5234+        must be put before the signature, since the length of the packed
5235+        share hash chain determines the offset of the signature. Also,
5236+        semantically, you must know the share hashes before you can
5237+        compute the root hash that gets signed.
5238+        """
5239+        assert isinstance(sharehashes, dict)
5240+        if "share_hash_chain" not in self._offsets:
5241+            raise LayoutInvalid("You need to put the block hash tree before "
5242+                                "you can put the share hash chain")
5243+        # The signature comes after the share hash chain. If the
5244+        # signature has already been written, we must not write another
5245+        # share hash chain. The signature writes the verification key
5246+        # offset when it gets sent to the remote server, so we look for
5247+        # that.
5248+        if "verification_key" in self._offsets:
5249+            raise LayoutInvalid("You must write the share hash chain "
5250+                                "before you write the signature")
5251+        sharehashes_s = "".join([struct.pack(">H32s", i, sharehashes[i])
5252+                                  for i in sorted(sharehashes.keys())])
5253+        self._offsets['signature'] = self._offsets['share_hash_chain'] + len(sharehashes_s)
5254+        self._writevs.append(tuple([self._offsets['share_hash_chain'],
5255+                            sharehashes_s]))
5256+
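Each chain entry is packed as a 2-byte share number followed by a 32-byte hash, which is why the read proxy below slices the chain into 34-byte pieces; a tiny sketch with made-up hashes:

    import struct

    share_hash_chain = {0: "\xaa" * 32, 3: "\xbb" * 32}   # invented values
    packed = "".join(struct.pack(">H32s", shnum, share_hash_chain[shnum])
                     for shnum in sorted(share_hash_chain))
    assert len(packed) == 34 * len(share_hash_chain)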
5257+
5258+    def put_root_hash(self, roothash):
5259+        """
5260+        Put the root hash (the root of the share hash tree) in the
5261+        remote slot.
5262+        """
5263+        # It does not make sense to be able to put the root
5264+        # hash without first putting the share hashes, since you need
5265+        # the share hashes to generate the root hash.
5266+        #
5267+        # Signature is defined by the routine that places the share hash
5268+        # chain, so it's a good thing to look for in finding out whether
5269+        # or not the share hash chain exists on the remote server.
5270+        if "signature" not in self._offsets:
5271+            raise LayoutInvalid("You need to put the share hash chain "
5272+                                "before you can put the root share hash")
5273+        if len(roothash) != HASH_SIZE:
5274+            raise LayoutInvalid("hashes and salts must be exactly %d bytes"
5275+                                 % HASH_SIZE)
5276+        self._root_hash = roothash
5277+        # To write this value, we update the checkstring on
5278+        # the remote server, which includes it.
5279+        checkstring = self.get_checkstring()
5280+        self._writevs.append(tuple([0, checkstring]))
5281+        # This write, if successful, changes the checkstring, so we need
5282+        # to update our internal checkstring to be consistent with the
5283+        # one on the server.
5284+
5285+
5286+    def get_signable(self):
5287+        """
5288+        Get the first seven fields of the mutable file; the parts that
5289+        are signed.
5290+        """
5291+        if not self._root_hash:
5292+            raise LayoutInvalid("You need to set the root hash "
5293+                                "before getting something to "
5294+                                "sign")
5295+        return struct.pack(MDMFSIGNABLEHEADER,
5296+                           1,
5297+                           self._seqnum,
5298+                           self._root_hash,
5299+                           self._required_shares,
5300+                           self._total_shares,
5301+                           self._segment_size,
5302+                           self._data_length)
5303+
5304+
5305+    def put_signature(self, signature):
5306+        """
5307+        I queue a write vector for the signature of the MDMF share.
5308+
5309+        I require that the root hash and share hash chain have been put
5310+        to the grid before I will write the signature to the grid.
5311+        """
5312+        if "signature" not in self._offsets:
5313+            raise LayoutInvalid("You must put the share hash chain "
5314+        # It does not make sense to put a signature without first
5315+        # putting the root hash and the salt hash (since otherwise
5316+        # the signature would be incomplete), so we don't allow that.
5317+                       "before putting the signature")
5318+        if not self._root_hash:
5319+            raise LayoutInvalid("You must complete the signed prefix "
5320+                                "before computing a signature")
5321+        # If we put the signature after we put the verification key, we
5322+        # could end up running into the verification key, and will
5323+        # probably screw up the offsets as well. So we don't allow that.
5324+        # The method that writes the verification key defines the EOF
5325+        # offset before writing the verification key, so look for that.
5326+        if "EOF" in self._offsets:
5327+            raise LayoutInvalid("You must write the signature before the verification key")
5328+
5329+        self._offsets['verification_key'] = self._offsets['signature'] + len(signature)
5330+        self._writevs.append(tuple([self._offsets['signature'], signature]))
5331+
5332+
5333+    def put_verification_key(self, verification_key):
5334+        """
5335+        I queue a write vector for the verification key.
5336+
5337+        I require that the signature have been written to the storage
5338+        server before I allow the verification key to be written to the
5339+        remote server.
5340+        """
5341+        if "verification_key" not in self._offsets:
5342+            raise LayoutInvalid("You must put the signature before you "
5343+                                "can put the verification key")
5344+        self._offsets['EOF'] = self._offsets['verification_key'] + len(verification_key)
5345+        self._writevs.append(tuple([self._offsets['verification_key'],
5346+                            verification_key]))
5347+
5348+
5349+    def _get_offsets_tuple(self):
5350+        return tuple([(key, value) for key, value in self._offsets.items()])
5351+
5352+
5353+    def get_verinfo(self):
5354+        return (self._seqnum,
5355+                self._root_hash,
5356+                self._required_shares,
5357+                self._total_shares,
5358+                self._segment_size,
5359+                self._data_length,
5360+                self.get_signable(),
5361+                self._get_offsets_tuple())
5362+
5363+
5364+    def finish_publishing(self):
5365+        """
5366+        I add a write vector for the offsets table, and then cause all
5367+        of the write vectors that I've dealt with so far to be published
5368+        to the remote server, ending the write process.
5369+        """
5370+        if "EOF" not in self._offsets:
5371+            raise LayoutInvalid("You must put the verification key before "
5372+                                "you can publish the offsets")
5373+        offsets_offset = struct.calcsize(MDMFHEADERWITHOUTOFFSETS)
5374+        offsets = struct.pack(MDMFOFFSETS,
5375+                              self._offsets['enc_privkey'],
5376+                              self._offsets['block_hash_tree'],
5377+                              self._offsets['share_hash_chain'],
5378+                              self._offsets['signature'],
5379+                              self._offsets['verification_key'],
5380+                              self._offsets['EOF'])
5381+        self._writevs.append(tuple([offsets_offset, offsets]))
5382+        encoding_parameters_offset = struct.calcsize(MDMFCHECKSTRING)
5383+        params = struct.pack(">BBQQ",
5384+                             self._required_shares,
5385+                             self._total_shares,
5386+                             self._segment_size,
5387+                             self._data_length)
5388+        self._writevs.append(tuple([encoding_parameters_offset, params]))
5389+        return self._write(self._writevs)
5390+
5391+
5392+    def _write(self, datavs, on_failure=None, on_success=None):
5393+        """I write the data vectors in datavs to the remote slot."""
5394+        tw_vectors = {}
5395+        if not self._testvs:
5396+            self._testvs = []
5397+            self._testvs.append(tuple([0, 1, "eq", ""]))
5398+        if not self._written:
5399+            # Write a new checkstring to the share when we write it, so
5400+            # that we have something to check later.
5401+            new_checkstring = self.get_checkstring()
5402+            datavs.append((0, new_checkstring))
5403+            def _first_write():
5404+                self._written = True
5405+                self._testvs = [(0, len(new_checkstring), "eq", new_checkstring)]
5406+            on_success = _first_write
5407+        tw_vectors[self.shnum] = (self._testvs, datavs, None)
5408+        d = self._rref.callRemote("slot_testv_and_readv_and_writev",
5409+                                  self._storage_index,
5410+                                  self._secrets,
5411+                                  tw_vectors,
5412+                                  self._readv)
5413+        def _result(results):
5414+            if isinstance(results, failure.Failure) or not results[0]:
5415+                # Do nothing; the write was unsuccessful.
5416+                if on_failure: on_failure()
5417+            else:
5418+                if on_success: on_success()
5419+            return results
5420+        d.addCallback(_result)
5421+        return d
5422+
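The remote call above uses the storage server's test-and-set API; the following hedged sketch (all values invented) shows the shape of the arguments this class builds for it:

    import struct

    MDMFCHECKSTRING = ">BQ32s"                         # assumed, as above
    checkstring = struct.pack(MDMFCHECKSTRING, 1, 5, "\x00" * 32)
    queued_writes = [(1234, "salt-plus-block-bytes")]  # (offset, data) pairs
    test_vectors = [(0, len(checkstring), "eq", checkstring)]
    # ...or [(0, 1, "eq", "")] to insist that the share does not exist yet
    tw_vectors = {0: (test_vectors, queued_writes, None)}  # shnum -> (tests,
                                                           #  writes, new_length)
    read_vector = [(0, struct.calcsize(MDMFCHECKSTRING))]  # piggybacked read
    # d = rref.callRemote("slot_testv_and_readv_and_writev",
    #                     storage_index, secrets, tw_vectors, read_vector)
    # The first element of the result is False if any test vector failed, in
    # which case the write vectors are not applied.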
5423+
5424+class MDMFSlotReadProxy:
5425+    """
5426+    I read from a mutable slot filled with data written in the MDMF data
5427+    format (which is described above).
5428+
5429+    I can be initialized with some amount of data, which I will use (if
5430+    it is valid) to eliminate some of the need to fetch it from servers.
5431+    """
5432+    def __init__(self,
5433+                 rref,
5434+                 storage_index,
5435+                 shnum,
5436+                 data=""):
5437+        # Start the initialization process.
5438+        self._rref = rref
5439+        self._storage_index = storage_index
5440+        self.shnum = shnum
5441+
5442+        # Before doing anything, the reader is probably going to want to
5443+        # verify that the signature is correct. To do that, they'll need
5444+        # the verification key, and the signature. To get those, we'll
5445+        # need the offset table. So fetch the offset table on the
5446+        # assumption that that will be the first thing that a reader is
5447+        # going to do.
5448+
5449+        # The fact that these encoding parameters are None tells us
5450+        # that we haven't yet fetched them from the remote share, so we
5451+        # should. We could just not set them, but the checks will be
5452+        # easier to read if we don't have to use hasattr.
5453+        self._version_number = None
5454+        self._sequence_number = None
5455+        self._root_hash = None
5456+        # Filled in if we're dealing with an SDMF file. Unused
5457+        # otherwise.
5458+        self._salt = None
5459+        self._required_shares = None
5460+        self._total_shares = None
5461+        self._segment_size = None
5462+        self._data_length = None
5463+        self._offsets = None
5464+
5465+        # If the user has chosen to initialize us with some data, we'll
5466+        # try to satisfy subsequent data requests with that data before
5467+        # asking the storage server for it.
5468+        self._data = data
5469+        # The filenode's cache hands us None when there isn't any cached
5470+        # data, but the way we index the cached data requires a string,
5471+        # so convert None to "".
5472+        if self._data is None:
5473+            self._data = ""
5474+
5475+        self._queue_observers = observer.ObserverList()
5476+        self._queue_errbacks = observer.ObserverList()
5477+        self._readvs = []
5478+
5479+
5480+    def _maybe_fetch_offsets_and_header(self, force_remote=False):
5481+        """
5482+        I fetch the offset table and the header from the remote slot if
5483+        I don't already have them. If I do have them, I do nothing and
5484+        return an empty Deferred.
5485+        """
5486+        if self._offsets:
5487+            return defer.succeed(None)
5488+        # At this point, we may be either SDMF or MDMF. Fetching 107
5489+        # bytes is enough to get the header and offsets for both SDMF
5490+        # and MDMF -- the signed prefix plus offset table is 107 bytes
5491+        # in either layout -- and is probably less expensive than the
5492+        # cost of a second roundtrip.
5493+        readvs = [(0, 107)]
5494+        d = self._read(readvs, force_remote)
5495+        d.addCallback(self._process_encoding_parameters)
5496+        d.addCallback(self._process_offsets)
5497+        return d
5498+
5499+
5500+    def _process_encoding_parameters(self, encoding_parameters):
5501+        assert self.shnum in encoding_parameters
5502+        encoding_parameters = encoding_parameters[self.shnum][0]
5503+        # The first byte is the version number. It will tell us what
5504+        # to do next.
5505+        (verno,) = struct.unpack(">B", encoding_parameters[:1])
5506+        if verno == MDMF_VERSION:
5507+            read_size = MDMFHEADERWITHOUTOFFSETSSIZE
5508+            (verno,
5509+             seqnum,
5510+             root_hash,
5511+             k,
5512+             n,
5513+             segsize,
5514+             datalen) = struct.unpack(MDMFHEADERWITHOUTOFFSETS,
5515+                                      encoding_parameters[:read_size])
5516+            if segsize == 0 and datalen == 0:
5517+                # Empty file, no segments.
5518+                self._num_segments = 0
5519+            else:
5520+                self._num_segments = mathutil.div_ceil(datalen, segsize)
5521+
5522+        elif verno == SDMF_VERSION:
5523+            read_size = SIGNED_PREFIX_LENGTH
5524+            (verno,
5525+             seqnum,
5526+             root_hash,
5527+             salt,
5528+             k,
5529+             n,
5530+             segsize,
5531+             datalen) = struct.unpack(">BQ32s16s BBQQ",
5532+                                encoding_parameters[:SIGNED_PREFIX_LENGTH])
5533+            self._salt = salt
5534+            if segsize == 0 and datalen == 0:
5535+                # empty file
5536+                self._num_segments = 0
5537+            else:
5538+                # non-empty SDMF files have one segment.
5539+                self._num_segments = 1
5540+        else:
5541+            raise UnknownVersionError("You asked me to read mutable file "
5542+                                      "version %d, but I only understand "
5543+                                      "%d and %d" % (verno, SDMF_VERSION,
5544+                                                     MDMF_VERSION))
5545+
5546+        self._version_number = verno
5547+        self._sequence_number = seqnum
5548+        self._root_hash = root_hash
5549+        self._required_shares = k
5550+        self._total_shares = n
5551+        self._segment_size = segsize
5552+        self._data_length = datalen
5553+
5554+        self._block_size = self._segment_size / self._required_shares
5555+        # We can upload empty files, and need to account for this fact
5556+        # so as to avoid zero-division and zero-modulo errors.
5557+        if datalen > 0:
5558+            tail_size = self._data_length % self._segment_size
5559+        else:
5560+            tail_size = 0
5561+        if not tail_size:
5562+            self._tail_block_size = self._block_size
5563+        else:
5564+            self._tail_block_size = mathutil.next_multiple(tail_size,
5565+                                                    self._required_shares)
5566+            self._tail_block_size /= self._required_shares
5567+
5568+        return encoding_parameters
5569+
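To make the sizes behind the 107-byte header read concrete, here is the editor's arithmetic, derived from the struct formats used in this method and from the offset table at the top of the file (the ">QQQQQQ" offsets format is inferred from the six 8-byte offset fields packed by finish_publishing):

    import struct

    # MDMF header without the offset table: version, seqnum, root hash, k, n,
    # segment size, data length -- 59 bytes.
    assert struct.calcsize(">BQ32sBBQQ") == 59
    # ...followed by six 8-byte offsets (enc_privkey, block hash tree, share
    # hash chain, signature, verification key, EOF), ending at byte 107.
    assert struct.calcsize(">BQ32sBBQQ") + struct.calcsize(">QQQQQQ") == 107
    # SDMF's signed prefix plus its offset table also comes to 107 bytes.
    assert struct.calcsize(">BQ32s16sBBQQ") + struct.calcsize(">LLLLQQ") == 107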
5570+
5571+    def _process_offsets(self, offsets):
5572+        if self._version_number == 0:
5573+            read_size = OFFSETS_LENGTH
5574+            read_offset = SIGNED_PREFIX_LENGTH
5575+            end = read_size + read_offset
5576+            (signature,
5577+             share_hash_chain,
5578+             block_hash_tree,
5579+             share_data,
5580+             enc_privkey,
5581+             EOF) = struct.unpack(">LLLLQQ",
5582+                                  offsets[read_offset:end])
5583+            self._offsets = {}
5584+            self._offsets['signature'] = signature
5585+            self._offsets['share_data'] = share_data
5586+            self._offsets['block_hash_tree'] = block_hash_tree
5587+            self._offsets['share_hash_chain'] = share_hash_chain
5588+            self._offsets['enc_privkey'] = enc_privkey
5589+            self._offsets['EOF'] = EOF
5590+
5591+        elif self._version_number == 1:
5592+            read_offset = MDMFHEADERWITHOUTOFFSETSSIZE
5593+            read_length = MDMFOFFSETS_LENGTH
5594+            end = read_offset + read_length
5595+            (encprivkey,
5596+             blockhashes,
5597+             sharehashes,
5598+             signature,
5599+             verification_key,
5600+             eof) = struct.unpack(MDMFOFFSETS,
5601+                                  offsets[read_offset:end])
5602+            self._offsets = {}
5603+            self._offsets['enc_privkey'] = encprivkey
5604+            self._offsets['block_hash_tree'] = blockhashes
5605+            self._offsets['share_hash_chain'] = sharehashes
5606+            self._offsets['signature'] = signature
5607+            self._offsets['verification_key'] = verification_key
5608+            self._offsets['EOF'] = eof
5609+
5610+
5611+    def get_block_and_salt(self, segnum, queue=False):
5612+        """
5613+        I return (block, salt), where block is the block data and
5614+        salt is the salt used to encrypt that segment.
5615+        """
5616+        d = self._maybe_fetch_offsets_and_header()
5617+        def _then(ignored):
5618+            if self._version_number == 1:
5619+                base_share_offset = MDMFHEADERSIZE
5620+            else:
5621+                base_share_offset = self._offsets['share_data']
5622+
5623+            if segnum + 1 > self._num_segments:
5624+                raise LayoutInvalid("Not a valid segment number")
5625+
5626+            if self._version_number == 0:
5627+                share_offset = base_share_offset + self._block_size * segnum
5628+            else:
5629+                share_offset = base_share_offset + (self._block_size + \
5630+                                                    SALT_SIZE) * segnum
5631+            if segnum + 1 == self._num_segments:
5632+                data = self._tail_block_size
5633+            else:
5634+                data = self._block_size
5635+
5636+            if self._version_number == 1:
5637+                data += SALT_SIZE
5638+
5639+            readvs = [(share_offset, data)]
5640+            return readvs
5641+        d.addCallback(_then)
5642+        d.addCallback(lambda readvs:
5643+            self._read(readvs, queue=queue))
5644+        def _process_results(results):
5645+            assert self.shnum in results
5646+            if self._version_number == 0:
5647+                # We only read the share data, but we know the salt from
5648+                # when we fetched the header
5649+                data = results[self.shnum]
5650+                if not data:
5651+                    data = ""
5652+                else:
5653+                    assert len(data) == 1
5654+                    data = data[0]
5655+                salt = self._salt
5656+            else:
5657+                data = results[self.shnum]
5658+                if not data:
5659+                    salt = data = ""
5660+                else:
5661+                    salt_and_data = results[self.shnum][0]
5662+                    salt = salt_and_data[:SALT_SIZE]
5663+                    data = salt_and_data[SALT_SIZE:]
5664+            return data, salt
5665+        d.addCallback(_process_results)
5666+        return d
5667+
5668+
5669+    def get_blockhashes(self, needed=None, queue=False, force_remote=False):
5670+        """
5671+        I return the block hash tree
5672+
5673+        I take an optional argument, needed, which is a set of indices
5674+        correspond to hashes that I should fetch. If this argument is
5675+        missing, I will fetch the entire block hash tree; otherwise, I
5676+        may attempt to fetch fewer hashes, based on what needed says
5677+        that I should do. Note that I may fetch as many hashes as I
5678+        want, so long as the set of hashes that I do fetch is a superset
5679+        of the ones that I am asked for, so callers should be prepared
5680+        to tolerate additional hashes.
5681+        """
5682+        # TODO: Return only the parts of the block hash tree necessary
5683+        # to validate the blocknum provided?
5684+        # This is a good idea, but it is hard to implement correctly. It
5685+        # is bad to fetch any one block hash more than once, so we
5686+        # probably just want to fetch the whole thing at once and then
5687+        # serve it.
5688+        if needed == set([]):
5689+            return defer.succeed([])
5690+        d = self._maybe_fetch_offsets_and_header()
5691+        def _then(ignored):
5692+            blockhashes_offset = self._offsets['block_hash_tree']
5693+            if self._version_number == 1:
5694+                blockhashes_length = self._offsets['share_hash_chain'] - blockhashes_offset
5695+            else:
5696+                blockhashes_length = self._offsets['share_data'] - blockhashes_offset
5697+            readvs = [(blockhashes_offset, blockhashes_length)]
5698+            return readvs
5699+        d.addCallback(_then)
5700+        d.addCallback(lambda readvs:
5701+            self._read(readvs, queue=queue, force_remote=force_remote))
5702+        def _build_block_hash_tree(results):
5703+            assert self.shnum in results
5704+
5705+            rawhashes = results[self.shnum][0]
5706+            results = [rawhashes[i:i+HASH_SIZE]
5707+                       for i in range(0, len(rawhashes), HASH_SIZE)]
5708+            return results
5709+        d.addCallback(_build_block_hash_tree)
5710+        return d
5711+
5712+
5713+    def get_sharehashes(self, needed=None, queue=False, force_remote=False):
5714+        """
5715+        I return the part of the share hash chain placed to validate
5716+        this share.
5717+
5718+        I take an optional argument, needed. Needed is a set of indices
5719+        that correspond to the hashes that I should fetch. If needed is
5720+        not present, I will fetch and return the entire share hash
5721+        chain. Otherwise, I may fetch and return any part of the share
5722+        hash chain that is a superset of the part that I am asked to
5723+        fetch. Callers should be prepared to deal with more hashes than
5724+        they've asked for.
5725+        """
5726+        if needed == set([]):
5727+            return defer.succeed([])
5728+        d = self._maybe_fetch_offsets_and_header()
5729+
5730+        def _make_readvs(ignored):
5731+            sharehashes_offset = self._offsets['share_hash_chain']
5732+            if self._version_number == 0:
5733+                sharehashes_length = self._offsets['block_hash_tree'] - sharehashes_offset
5734+            else:
5735+                sharehashes_length = self._offsets['signature'] - sharehashes_offset
5736+            readvs = [(sharehashes_offset, sharehashes_length)]
5737+            return readvs
5738+        d.addCallback(_make_readvs)
5739+        d.addCallback(lambda readvs:
5740+            self._read(readvs, queue=queue, force_remote=force_remote))
5741+        def _build_share_hash_chain(results):
5742+            assert self.shnum in results
5743+
5744+            sharehashes = results[self.shnum][0]
5745+            results = [sharehashes[i:i+(HASH_SIZE + 2)]
5746+                       for i in range(0, len(sharehashes), HASH_SIZE + 2)]
5747+            results = dict([struct.unpack(">H32s", data)
5748+                            for data in results])
5749+            return results
5750+        d.addCallback(_build_share_hash_chain)
5751+        return d
5752+
5753+
5754+    def get_encprivkey(self, queue=False):
5755+        """
5756+        I return the encrypted private key.
5757+        """
5758+        d = self._maybe_fetch_offsets_and_header()
5759+
5760+        def _make_readvs(ignored):
5761+            privkey_offset = self._offsets['enc_privkey']
5762+            if self._version_number == 0:
5763+                privkey_length = self._offsets['EOF'] - privkey_offset
5764+            else:
5765+                privkey_length = self._offsets['block_hash_tree'] - privkey_offset
5766+            readvs = [(privkey_offset, privkey_length)]
5767+            return readvs
5768+        d.addCallback(_make_readvs)
5769+        d.addCallback(lambda readvs:
5770+            self._read(readvs, queue=queue))
5771+        def _process_results(results):
5772+            assert self.shnum in results
5773+            privkey = results[self.shnum][0]
5774+            return privkey
5775+        d.addCallback(_process_results)
5776+        return d
5777+
5778+
5779+    def get_signature(self, queue=False):
5780+        """
5781+        I return the signature of my share.
5782+        """
5783+        d = self._maybe_fetch_offsets_and_header()
5784+
5785+        def _make_readvs(ignored):
5786+            signature_offset = self._offsets['signature']
5787+            if self._version_number == 1:
5788+                signature_length = self._offsets['verification_key'] - signature_offset
5789+            else:
5790+                signature_length = self._offsets['share_hash_chain'] - signature_offset
5791+            readvs = [(signature_offset, signature_length)]
5792+            return readvs
5793+        d.addCallback(_make_readvs)
5794+        d.addCallback(lambda readvs:
5795+            self._read(readvs, queue=queue))
5796+        def _process_results(results):
5797+            assert self.shnum in results
5798+            signature = results[self.shnum][0]
5799+            return signature
5800+        d.addCallback(_process_results)
5801+        return d
5802+
5803+
5804+    def get_verification_key(self, queue=False):
5805+        """
5806+        I return the verification key.
5807+        """
5808+        d = self._maybe_fetch_offsets_and_header()
5809+
5810+        def _make_readvs(ignored):
5811+            if self._version_number == 1:
5812+                vk_offset = self._offsets['verification_key']
5813+                vk_length = self._offsets['EOF'] - vk_offset
5814+            else:
5815+                vk_offset = struct.calcsize(">BQ32s16sBBQQLLLLQQ")
5816+                vk_length = self._offsets['signature'] - vk_offset
5817+            readvs = [(vk_offset, vk_length)]
5818+            return readvs
5819+        d.addCallback(_make_readvs)
5820+        d.addCallback(lambda readvs:
5821+            self._read(readvs, queue=queue))
5822+        def _process_results(results):
5823+            assert self.shnum in results
5824+            verification_key = results[self.shnum][0]
5825+            return verification_key
5826+        d.addCallback(_process_results)
5827+        return d
5828+
5829+
5830+    def get_encoding_parameters(self):
5831+        """
5832+        I return (k, n, segsize, datalen)
5833+        """
5834+        d = self._maybe_fetch_offsets_and_header()
5835+        d.addCallback(lambda ignored:
5836+            (self._required_shares,
5837+             self._total_shares,
5838+             self._segment_size,
5839+             self._data_length))
5840+        return d
5841+
5842+
5843+    def get_seqnum(self):
5844+        """
5845+        I return the sequence number for this share.
5846+        """
5847+        d = self._maybe_fetch_offsets_and_header()
5848+        d.addCallback(lambda ignored:
5849+            self._sequence_number)
5850+        return d
5851+
5852+
5853+    def get_root_hash(self):
5854+        """
5855+        I return the root of the block hash tree
5856+        """
5857+        d = self._maybe_fetch_offsets_and_header()
5858+        d.addCallback(lambda ignored: self._root_hash)
5859+        return d
5860+
5861+
5862+    def get_checkstring(self):
5863+        """
5864+        I return the packed representation of the following:
5865+
5866+            - version number
5867+            - sequence number
5868+            - root hash
5869+            - salt (SDMF shares only; MDMF checkstrings omit it)
5870+
5871+        which my users use as a checkstring to detect other writers.
5872+        """
5873+        d = self._maybe_fetch_offsets_and_header()
5874+        def _build_checkstring(ignored):
5875+            if self._salt:
5876+                checkstring = struct.pack(PREFIX,
5877+                                          self._version_number,
5878+                                          self._sequence_number,
5879+                                          self._root_hash,
5880+                                          self._salt)
5881+            else:
5882+                checkstring = struct.pack(MDMFCHECKSTRING,
5883+                                          self._version_number,
5884+                                          self._sequence_number,
5885+                                          self._root_hash)
5886+
5887+            return checkstring
5888+        d.addCallback(_build_checkstring)
5889+        return d
5890+
5891+
5892+    def get_prefix(self, force_remote):
5893+        d = self._maybe_fetch_offsets_and_header(force_remote)
5894+        d.addCallback(lambda ignored:
5895+            self._build_prefix())
5896+        return d
5897+
5898+
5899+    def _build_prefix(self):
5900+        # The prefix is another name for the part of the remote share
5901+        # that gets signed. It consists of everything up to and
5902+        # including the datalength, packed by struct.
5903+        if self._version_number == SDMF_VERSION:
5904+            return struct.pack(SIGNED_PREFIX,
5905+                           self._version_number,
5906+                           self._sequence_number,
5907+                           self._root_hash,
5908+                           self._salt,
5909+                           self._required_shares,
5910+                           self._total_shares,
5911+                           self._segment_size,
5912+                           self._data_length)
5913+
5914+        else:
5915+            return struct.pack(MDMFSIGNABLEHEADER,
5916+                           self._version_number,
5917+                           self._sequence_number,
5918+                           self._root_hash,
5919+                           self._required_shares,
5920+                           self._total_shares,
5921+                           self._segment_size,
5922+                           self._data_length)
5923+
5924+
5925+    def _get_offsets_tuple(self):
5926+        # The offsets tuple is another component of the version
5927+        # information tuple. It is basically our offsets dictionary,
5928+        # itemized and in a tuple.
5929+        return self._offsets.copy()
5930+
5931+
5932+    def get_verinfo(self):
5933+        """
5934+        I return my verinfo tuple. This is used by the ServermapUpdater
5935+        to keep track of versions of mutable files.
5936+
5937+        The verinfo tuple for MDMF files contains:
5938+            - seqnum
5939+            - root hash
5940+            - a blank (None, where SDMF's salt/IV would go)
5941+            - segsize
5942+            - datalen
5943+            - k
5944+            - n
5945+            - prefix (the thing that you sign)
5946+            - a tuple of offsets
5947+
5948+        We include the blank entry in MDMF verinfo tuples so that they
5949+        have the same shape as SDMF's, simplifying their processing.
5950+
5951+        The verinfo tuple for SDMF files is the same, but contains the
5952+        file's 16-byte IV/salt in place of the blank.
5953+        """
5954+        d = self._maybe_fetch_offsets_and_header()
5955+        def _build_verinfo(ignored):
5956+            if self._version_number == SDMF_VERSION:
5957+                salt_to_use = self._salt
5958+            else:
5959+                salt_to_use = None
5960+            return (self._sequence_number,
5961+                    self._root_hash,
5962+                    salt_to_use,
5963+                    self._segment_size,
5964+                    self._data_length,
5965+                    self._required_shares,
5966+                    self._total_shares,
5967+                    self._build_prefix(),
5968+                    self._get_offsets_tuple())
5969+        d.addCallback(_build_verinfo)
5970+        return d
5971+
5972+
5973+    def flush(self):
5974+        """
5975+        I flush my queue of read vectors.
5976+        """
5977+        d = self._read(self._readvs)
5978+        def _then(results):
5979+            self._readvs = []
5980+            if isinstance(results, failure.Failure):
5981+                self._queue_errbacks.notify(results)
5982+            else:
5983+                self._queue_observers.notify(results)
5984+            self._queue_observers = observer.ObserverList()
5985+            self._queue_errbacks = observer.ObserverList()
5986+        d.addBoth(_then)
5987+
5988+
5989+    def _read(self, readvs, force_remote=False, queue=False):
5990+        unsatisfiable = filter(lambda x: x[0] + x[1] > len(self._data), readvs)
5991+        # TODO: It's entirely possible to tweak this so that it just
5992+        # fulfills the requests that it can, and not demand that all
5993+        # requests are satisfiable before running it.
5994+        if not unsatisfiable and not force_remote:
5995+            results = [self._data[offset:offset+length]
5996+                       for (offset, length) in readvs]
5997+            results = {self.shnum: results}
5998+            return defer.succeed(results)
5999+        else:
6000+            if queue:
6001+                start = len(self._readvs)
6002+                self._readvs += readvs
6003+                end = len(self._readvs)
6004+                def _get_results(results, start, end):
6005+                    if self.shnum not in results:
6006+                        return {self.shnum: [""]}
6007+                    return {self.shnum: results[self.shnum][start:end]}
6008+                d = defer.Deferred()
6009+                d.addCallback(_get_results, start, end)
6010+                self._queue_observers.subscribe(d.callback)
6011+                self._queue_errbacks.subscribe(d.errback)
6012+                return d
6013+            return self._rref.callRemote("slot_readv",
6014+                                         self._storage_index,
6015+                                         [self.shnum],
6016+                                         readvs)
6017+
6018+
6019+    def is_sdmf(self):
6020+        """I tell my caller whether or not my remote file is SDMF or MDMF
6021+        """
6022+        d = self._maybe_fetch_offsets_and_header()
6023+        d.addCallback(lambda ignored:
6024+            self._version_number == 0)
6025+        return d
6026+
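A brief, hypothetical usage sketch of the read proxy; the names rref, storage_index, and cached_header are invented and error handling is omitted. It shows the batched-read path that the docstrings above describe:

    from twisted.internet.defer import gatherResults

    # Sketch only: share 0 of `storage_index` is assumed to hold an MDMF share,
    # and `cached_header` is whatever bytes the filenode already had cached.
    reader = MDMFSlotReadProxy(rref, storage_index, 0, data=cached_header)
    d = reader.get_verinfo()   # fetches the header and offsets on first use
    def _fetch_pieces(verinfo):
        # queue=True batches these reads; flush() sends them as one slot_readv
        d1 = reader.get_block_and_salt(0, queue=True)
        d2 = reader.get_blockhashes(queue=True)
        d3 = reader.get_signature(queue=True)
        reader.flush()
        return gatherResults([d1, d2, d3])
    d.addCallback(_fetch_pieces)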
6027+
6028+class LayoutInvalid(Exception):
6029+    """
6030+    This isn't a valid MDMF mutable file
6031+    """
6032merger 0.0 (
6033hunk ./src/allmydata/test/test_storage.py 3
6034-from allmydata.util import log
6035-
6036merger 0.0 (
6037hunk ./src/allmydata/test/test_storage.py 3
6038-import time, os.path, stat, re, simplejson, struct
6039+from allmydata.util import log
6040+
6041+import mock
6042hunk ./src/allmydata/test/test_storage.py 3
6043-import time, os.path, stat, re, simplejson, struct
6044+import time, os.path, stat, re, simplejson, struct, shutil
6045)
6046)
6047hunk ./src/allmydata/test/test_storage.py 23
6048 from allmydata.storage.expirer import LeaseCheckingCrawler
6049 from allmydata.immutable.layout import WriteBucketProxy, WriteBucketProxy_v2, \
6050      ReadBucketProxy
6051-from allmydata.interfaces import BadWriteEnablerError
6052-from allmydata.test.common import LoggingServiceParent
6053+from allmydata.mutable.layout import MDMFSlotWriteProxy, MDMFSlotReadProxy, \
6054+                                     LayoutInvalid, MDMFSIGNABLEHEADER, \
6055+                                     SIGNED_PREFIX, MDMFHEADER, \
6056+                                     MDMFOFFSETS, SDMFSlotWriteProxy
6057+from allmydata.interfaces import BadWriteEnablerError, MDMF_VERSION, \
6058+                                 SDMF_VERSION
6059+from allmydata.test.common import LoggingServiceParent, ShouldFailMixin
6060 from allmydata.test.common_web import WebRenderingMixin
6061 from allmydata.web.storage import StorageStatus, remove_prefix
6062 
6063hunk ./src/allmydata/test/test_storage.py 107
6064 
6065 class RemoteBucket:
6066 
6067+    def __init__(self):
6068+        self.read_count = 0
6069+        self.write_count = 0
6070+
6071     def callRemote(self, methname, *args, **kwargs):
6072         def _call():
6073             meth = getattr(self.target, "remote_" + methname)
6074hunk ./src/allmydata/test/test_storage.py 115
6075             return meth(*args, **kwargs)
6076+
6077+        if methname == "slot_readv":
6078+            self.read_count += 1
6079+        if "writev" in methname:
6080+            self.write_count += 1
6081+
6082         return defer.maybeDeferred(_call)
6083 
6084hunk ./src/allmydata/test/test_storage.py 123
6085+
6086 class BucketProxy(unittest.TestCase):
6087     def make_bucket(self, name, size):
6088         basedir = os.path.join("storage", "BucketProxy", name)
6089hunk ./src/allmydata/test/test_storage.py 1306
6090         self.failUnless(os.path.exists(prefixdir), prefixdir)
6091         self.failIf(os.path.exists(bucketdir), bucketdir)
6092 
6093+
6094+class MDMFProxies(unittest.TestCase, ShouldFailMixin):
6095+    def setUp(self):
6096+        self.sparent = LoggingServiceParent()
6097+        self._lease_secret = itertools.count()
6098+        self.ss = self.create("MDMFProxies storage test server")
6099+        self.rref = RemoteBucket()
6100+        self.rref.target = self.ss
6101+        self.secrets = (self.write_enabler("we_secret"),
6102+                        self.renew_secret("renew_secret"),
6103+                        self.cancel_secret("cancel_secret"))
6104+        self.segment = "aaaaaa"
6105+        self.block = "aa"
6106+        self.salt = "a" * 16
6107+        self.block_hash = "a" * 32
6108+        self.block_hash_tree = [self.block_hash for i in xrange(6)]
6109+        self.share_hash = self.block_hash
6110+        self.share_hash_chain = dict([(i, self.share_hash) for i in xrange(6)])
6111+        self.signature = "foobarbaz"
6112+        self.verification_key = "vvvvvv"
6113+        self.encprivkey = "private"
6114+        self.root_hash = self.block_hash
6115+        self.salt_hash = self.root_hash
6116+        self.salt_hash_tree = [self.salt_hash for i in xrange(6)]
6117+        self.block_hash_tree_s = self.serialize_blockhashes(self.block_hash_tree)
6118+        self.share_hash_chain_s = self.serialize_sharehashes(self.share_hash_chain)
6119+        # blockhashes and salt hashes are serialized in the same way,
6120+        # only we lop off the first element and store that in the
6121+        # header.
6122+        self.salt_hash_tree_s = self.serialize_blockhashes(self.salt_hash_tree[1:])
6123+
6124+
6125+    def tearDown(self):
6126+        self.sparent.stopService()
6127+        shutil.rmtree(self.workdir("MDMFProxies storage test server"))
6128+
6129+
6130+    def write_enabler(self, we_tag):
6131+        return hashutil.tagged_hash("we_blah", we_tag)
6132+
6133+
6134+    def renew_secret(self, tag):
6135+        return hashutil.tagged_hash("renew_blah", str(tag))
6136+
6137+
6138+    def cancel_secret(self, tag):
6139+        return hashutil.tagged_hash("cancel_blah", str(tag))
6140+
6141+
6142+    def workdir(self, name):
6143+        basedir = os.path.join("storage", "MutableServer", name)
6144+        return basedir
6145+
6146+
6147+    def create(self, name):
6148+        workdir = self.workdir(name)
6149+        ss = StorageServer(workdir, "\x00" * 20)
6150+        ss.setServiceParent(self.sparent)
6151+        return ss
6152+
6153+
6154+    def build_test_mdmf_share(self, tail_segment=False, empty=False):
6155+        # Start with the checkstring
6156+        data = struct.pack(">BQ32s",
6157+                           1,
6158+                           0,
6159+                           self.root_hash)
6160+        self.checkstring = data
6161+        # Next, the encoding parameters
6162+        if tail_segment:
6163+            data += struct.pack(">BBQQ",
6164+                                3,
6165+                                10,
6166+                                6,
6167+                                33)
6168+        elif empty:
6169+            data += struct.pack(">BBQQ",
6170+                                3,
6171+                                10,
6172+                                0,
6173+                                0)
6174+        else:
6175+            data += struct.pack(">BBQQ",
6176+                                3,
6177+                                10,
6178+                                6,
6179+                                36)
6180+        # Now we'll build the offsets.
6181+        sharedata = ""
6182+        if not tail_segment and not empty:
6183+            for i in xrange(6):
6184+                sharedata += self.salt + self.block
6185+        elif tail_segment:
6186+            for i in xrange(5):
6187+                sharedata += self.salt + self.block
6188+            sharedata += self.salt + "a"
6189+
6190+        # The encrypted private key comes after the shares + salts
6191+        offset_size = struct.calcsize(MDMFOFFSETS)
6192+        encrypted_private_key_offset = len(data) + offset_size + len(sharedata)
6193+        # The blockhashes come after the private key
6194+        blockhashes_offset = encrypted_private_key_offset + len(self.encprivkey)
6195+        # The sharehashes come after the block hashes
6196+        sharehashes_offset = blockhashes_offset + len(self.block_hash_tree_s)
6197+        # The signature comes after the share hash chain
6198+        signature_offset = sharehashes_offset + len(self.share_hash_chain_s)
6199+        # The verification key comes after the signature
6200+        verification_offset = signature_offset + len(self.signature)
6201+        # The EOF comes after the verification key
6202+        eof_offset = verification_offset + len(self.verification_key)
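+        # As a rough sanity check (assuming MDMFOFFSETS packs six 8-byte
+        # offsets): the signable prefix above is 41 + 18 = 59 bytes, the
+        # offset table is 48 bytes, and the default share data is
+        # 6 * (16 + 2) = 108 bytes, so the encrypted private key starts at
+        # byte 59 + 48 + 108 = 215.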
6203+        data += struct.pack(MDMFOFFSETS,
6204+                            encrypted_private_key_offset,
6205+                            blockhashes_offset,
6206+                            sharehashes_offset,
6207+                            signature_offset,
6208+                            verification_offset,
6209+                            eof_offset)
6210+        self.offsets = {}
6211+        self.offsets['enc_privkey'] = encrypted_private_key_offset
6212+        self.offsets['block_hash_tree'] = blockhashes_offset
6213+        self.offsets['share_hash_chain'] = sharehashes_offset
6214+        self.offsets['signature'] = signature_offset
6215+        self.offsets['verification_key'] = verification_offset
6216+        self.offsets['EOF'] = eof_offset
6217+        # Next, we'll add in the salts and share data,
6218+        data += sharedata
6219+        # the private key,
6220+        data += self.encprivkey
6221+        # the block hash tree,
6222+        data += self.block_hash_tree_s
6223+        # the share hash chain,
6224+        data += self.share_hash_chain_s
6225+        # the signature,
6226+        data += self.signature
6227+        # and the verification key
6228+        data += self.verification_key
6229+        return data
6230+
6231+
6232+    def write_test_share_to_server(self,
6233+                                   storage_index,
6234+                                   tail_segment=False,
6235+                                   empty=False):
6236+        """
6237+        I write some test data to self.ss for the read tests to read.
6238+
6239+        If tail_segment=True, then I will write a share that has a
6240+        smaller tail segment than other segments.
6241+        """
6242+        write = self.ss.remote_slot_testv_and_readv_and_writev
6243+        data = self.build_test_mdmf_share(tail_segment, empty)
6244+        # Finally, we write the whole thing to the storage server in one
6245+        # pass.
6246+        testvs = [(0, 1, "eq", "")]
6247+        tws = {}
6248+        tws[0] = (testvs, [(0, data)], None)
6249+        readv = [(0, 1)]
6250+        results = write(storage_index, self.secrets, tws, readv)
6251+        self.failUnless(results[0])
6252+
6253+
6254+    def build_test_sdmf_share(self, empty=False):
6255+        if empty:
6256+            sharedata = ""
6257+        else:
6258+            sharedata = self.segment * 6
6259+        self.sharedata = sharedata
6260+        blocksize = len(sharedata) / 3
6261+        block = sharedata[:blocksize]
6262+        self.blockdata = block
6263+        prefix = struct.pack(">BQ32s16s BBQQ",
6264+                             0, # version,
6265+                             0,
6266+                             self.root_hash,
6267+                             self.salt,
6268+                             3,
6269+                             10,
6270+                             len(sharedata),
6271+                             len(sharedata),
6272+                            )
6273+        post_offset = struct.calcsize(">BQ32s16sBBQQLLLLQQ")
6274+        signature_offset = post_offset + len(self.verification_key)
6275+        sharehashes_offset = signature_offset + len(self.signature)
6276+        blockhashes_offset = sharehashes_offset + len(self.share_hash_chain_s)
6277+        sharedata_offset = blockhashes_offset + len(self.block_hash_tree_s)
6278+        encprivkey_offset = sharedata_offset + len(block)
6279+        eof_offset = encprivkey_offset + len(self.encprivkey)
6280+        offsets = struct.pack(">LLLLQQ",
6281+                              signature_offset,
6282+                              sharehashes_offset,
6283+                              blockhashes_offset,
6284+                              sharedata_offset,
6285+                              encprivkey_offset,
6286+                              eof_offset)
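+        # Note that the SDMF on-disk order below (verification key, signature,
+        # share hash chain, block hash tree, share data, encrypted private
+        # key) differs from the MDMF order, and the first four offsets are
+        # 32-bit fields rather than 64-bit ones.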
6287+        final_share = "".join([prefix,
6288+                           offsets,
6289+                           self.verification_key,
6290+                           self.signature,
6291+                           self.share_hash_chain_s,
6292+                           self.block_hash_tree_s,
6293+                           block,
6294+                           self.encprivkey])
6295+        self.offsets = {}
6296+        self.offsets['signature'] = signature_offset
6297+        self.offsets['share_hash_chain'] = sharehashes_offset
6298+        self.offsets['block_hash_tree'] = blockhashes_offset
6299+        self.offsets['share_data'] = sharedata_offset
6300+        self.offsets['enc_privkey'] = encprivkey_offset
6301+        self.offsets['EOF'] = eof_offset
6302+        return final_share
6303+
6304+
6305+    def write_sdmf_share_to_server(self,
6306+                                   storage_index,
6307+                                   empty=False):
6308+        # Some tests need SDMF shares to verify that we can still
6309+        # read them. This method writes one that resembles, but is not identical to, a real SDMF share.
6310+        assert self.rref
6311+        write = self.ss.remote_slot_testv_and_readv_and_writev
6312+        share = self.build_test_sdmf_share(empty)
6313+        testvs = [(0, 1, "eq", "")]
6314+        tws = {}
6315+        tws[0] = (testvs, [(0, share)], None)
6316+        readv = []
6317+        results = write(storage_index, self.secrets, tws, readv)
6318+        self.failUnless(results[0])
6319+
6320+
6321+    def test_read(self):
6322+        self.write_test_share_to_server("si1")
6323+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6324+        # Check that every method equals what we expect it to.
6325+        d = defer.succeed(None)
6326+        def _check_block_and_salt((block, salt)):
6327+            self.failUnlessEqual(block, self.block)
6328+            self.failUnlessEqual(salt, self.salt)
6329+
6330+        for i in xrange(6):
6331+            d.addCallback(lambda ignored, i=i:
6332+                mr.get_block_and_salt(i))
6333+            d.addCallback(_check_block_and_salt)
6334+
6335+        d.addCallback(lambda ignored:
6336+            mr.get_encprivkey())
6337+        d.addCallback(lambda encprivkey:
6338+            self.failUnlessEqual(self.encprivkey, encprivkey))
6339+
6340+        d.addCallback(lambda ignored:
6341+            mr.get_blockhashes())
6342+        d.addCallback(lambda blockhashes:
6343+            self.failUnlessEqual(self.block_hash_tree, blockhashes))
6344+
6345+        d.addCallback(lambda ignored:
6346+            mr.get_sharehashes())
6347+        d.addCallback(lambda sharehashes:
6348+            self.failUnlessEqual(self.share_hash_chain, sharehashes))
6349+
6350+        d.addCallback(lambda ignored:
6351+            mr.get_signature())
6352+        d.addCallback(lambda signature:
6353+            self.failUnlessEqual(signature, self.signature))
6354+
6355+        d.addCallback(lambda ignored:
6356+            mr.get_verification_key())
6357+        d.addCallback(lambda verification_key:
6358+            self.failUnlessEqual(verification_key, self.verification_key))
6359+
6360+        d.addCallback(lambda ignored:
6361+            mr.get_seqnum())
6362+        d.addCallback(lambda seqnum:
6363+            self.failUnlessEqual(seqnum, 0))
6364+
6365+        d.addCallback(lambda ignored:
6366+            mr.get_root_hash())
6367+        d.addCallback(lambda root_hash:
6368+            self.failUnlessEqual(self.root_hash, root_hash))
6369+
6370+        d.addCallback(lambda ignored:
6371+            mr.get_seqnum())
6372+        d.addCallback(lambda seqnum:
6373+            self.failUnlessEqual(0, seqnum))
6374+
6375+        d.addCallback(lambda ignored:
6376+            mr.get_encoding_parameters())
6377+        def _check_encoding_parameters((k, n, segsize, datalen)):
6378+            self.failUnlessEqual(k, 3)
6379+            self.failUnlessEqual(n, 10)
6380+            self.failUnlessEqual(segsize, 6)
6381+            self.failUnlessEqual(datalen, 36)
6382+        d.addCallback(_check_encoding_parameters)
6383+
6384+        d.addCallback(lambda ignored:
6385+            mr.get_checkstring())
6386+        d.addCallback(lambda checkstring:
6387+            self.failUnlessEqual(checkstring, self.checkstring))
6388+        return d
6389+
6390+
6391+    def test_read_with_different_tail_segment_size(self):
6392+        self.write_test_share_to_server("si1", tail_segment=True)
6393+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6394+        d = mr.get_block_and_salt(5)
6395+        def _check_tail_segment(results):
6396+            block, salt = results
6397+            self.failUnlessEqual(len(block), 1)
6398+            self.failUnlessEqual(block, "a")
6399+        d.addCallback(_check_tail_segment)
6400+        return d
6401+
6402+
6403+    def test_get_block_with_invalid_segnum(self):
6404+        self.write_test_share_to_server("si1")
6405+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6406+        d = defer.succeed(None)
6407+        d.addCallback(lambda ignored:
6408+            self.shouldFail(LayoutInvalid, "test invalid segnum",
6409+                            None,
6410+                            mr.get_block_and_salt, 7))
6411+        return d
6412+
6413+
6414+    def test_get_encoding_parameters_first(self):
6415+        self.write_test_share_to_server("si1")
6416+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6417+        d = mr.get_encoding_parameters()
6418+        def _check_encoding_parameters((k, n, segment_size, datalen)):
6419+            self.failUnlessEqual(k, 3)
6420+            self.failUnlessEqual(n, 10)
6421+            self.failUnlessEqual(segment_size, 6)
6422+            self.failUnlessEqual(datalen, 36)
6423+        d.addCallback(_check_encoding_parameters)
6424+        return d
6425+
6426+
6427+    def test_get_seqnum_first(self):
6428+        self.write_test_share_to_server("si1")
6429+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6430+        d = mr.get_seqnum()
6431+        d.addCallback(lambda seqnum:
6432+            self.failUnlessEqual(seqnum, 0))
6433+        return d
6434+
6435+
6436+    def test_get_root_hash_first(self):
6437+        self.write_test_share_to_server("si1")
6438+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6439+        d = mr.get_root_hash()
6440+        d.addCallback(lambda root_hash:
6441+            self.failUnlessEqual(root_hash, self.root_hash))
6442+        return d
6443+
6444+
6445+    def test_get_checkstring_first(self):
6446+        self.write_test_share_to_server("si1")
6447+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
6448+        d = mr.get_checkstring()
6449+        d.addCallback(lambda checkstring:
6450+            self.failUnlessEqual(checkstring, self.checkstring))
6451+        return d
6452+
6453+
6454+    def test_write_read_vectors(self):
6455+        # When we write, the storage server returns a read vector along
6456+        # with the result of the write. If a write fails because the test
6457+        # vectors did not match, this read vector can help us to
6458+        # diagnose the problem. This test ensures that the read vector
6459+        # works appropriately.
6460+        mw = self._make_new_mw("si1", 0)
6461+
6462+        for i in xrange(6):
6463+            mw.put_block(self.block, i, self.salt)
6464+        mw.put_encprivkey(self.encprivkey)
6465+        mw.put_blockhashes(self.block_hash_tree)
6466+        mw.put_sharehashes(self.share_hash_chain)
6467+        mw.put_root_hash(self.root_hash)
6468+        mw.put_signature(self.signature)
6469+        mw.put_verification_key(self.verification_key)
6470+        d = mw.finish_publishing()
6471+        def _then(results):
6472+            self.failUnlessEqual(len(results), 2)
6473+            result, readv = results
6474+            self.failUnless(result)
6475+            self.failIf(readv)
6476+            self.old_checkstring = mw.get_checkstring()
6477+            mw.set_checkstring("")
6478+        d.addCallback(_then)
6479+        d.addCallback(lambda ignored:
6480+            mw.finish_publishing())
6481+        def _then_again(results):
6482+            self.failUnlessEqual(len(results), 2)
6483+            result, readvs = results
6484+            self.failIf(result)
6485+            self.failUnlessIn(0, readvs)
6486+            readv = readvs[0][0]
6487+            self.failUnlessEqual(readv, self.old_checkstring)
6488+        d.addCallback(_then_again)
6489+        # The checkstring remains the same for the rest of the process.
6490+        return d
6491+
6492+
6493+    def test_blockhashes_after_share_hash_chain(self):
6494+        mw = self._make_new_mw("si1", 0)
6495+        d = defer.succeed(None)
6496+        # Put everything up to and including the share hash chain
6497+        for i in xrange(6):
6498+            d.addCallback(lambda ignored, i=i:
6499+                mw.put_block(self.block, i, self.salt))
6500+        d.addCallback(lambda ignored:
6501+            mw.put_encprivkey(self.encprivkey))
6502+        d.addCallback(lambda ignored:
6503+            mw.put_blockhashes(self.block_hash_tree))
6504+        d.addCallback(lambda ignored:
6505+            mw.put_sharehashes(self.share_hash_chain))
6506+
6507+        # Now try to put the block hash tree again.
6508+        d.addCallback(lambda ignored:
6509+            self.shouldFail(LayoutInvalid, "test repeat blockhashes",
6510+                            None,
6511+                            mw.put_blockhashes, self.block_hash_tree))
6512+        return d
6513+
6514+
6515+    def test_encprivkey_after_blockhashes(self):
6516+        mw = self._make_new_mw("si1", 0)
6517+        d = defer.succeed(None)
6518+        # Put everything up to and including the block hash tree
6519+        for i in xrange(6):
6520+            d.addCallback(lambda ignored, i=i:
6521+                mw.put_block(self.block, i, self.salt))
6522+        d.addCallback(lambda ignored:
6523+            mw.put_encprivkey(self.encprivkey))
6524+        d.addCallback(lambda ignored:
6525+            mw.put_blockhashes(self.block_hash_tree))
6526+        d.addCallback(lambda ignored:
6527+            self.shouldFail(LayoutInvalid, "out of order private key",
6528+                            None,
6529+                            mw.put_encprivkey, self.encprivkey))
6530+        return d
6531+
6532+
6533+    def test_share_hash_chain_after_signature(self):
6534+        mw = self._make_new_mw("si1", 0)
6535+        d = defer.succeed(None)
6536+        # Put everything up to and including the signature
6537+        for i in xrange(6):
6538+            d.addCallback(lambda ignored, i=i:
6539+                mw.put_block(self.block, i, self.salt))
6540+        d.addCallback(lambda ignored:
6541+            mw.put_encprivkey(self.encprivkey))
6542+        d.addCallback(lambda ignored:
6543+            mw.put_blockhashes(self.block_hash_tree))
6544+        d.addCallback(lambda ignored:
6545+            mw.put_sharehashes(self.share_hash_chain))
6546+        d.addCallback(lambda ignored:
6547+            mw.put_root_hash(self.root_hash))
6548+        d.addCallback(lambda ignored:
6549+            mw.put_signature(self.signature))
6550+        # Now try to put the share hash chain again. This should fail
6551+        d.addCallback(lambda ignored:
6552+            self.shouldFail(LayoutInvalid, "out of order share hash chain",
6553+                            None,
6554+                            mw.put_sharehashes, self.share_hash_chain))
6555+        return d
6556+
6557+
6558+    def test_signature_after_verification_key(self):
6559+        mw = self._make_new_mw("si1", 0)
6560+        d = defer.succeed(None)
6561+        # Put everything up to and including the verification key.
6562+        for i in xrange(6):
6563+            d.addCallback(lambda ignored, i=i:
6564+                mw.put_block(self.block, i, self.salt))
6565+        d.addCallback(lambda ignored:
6566+            mw.put_encprivkey(self.encprivkey))
6567+        d.addCallback(lambda ignored:
6568+            mw.put_blockhashes(self.block_hash_tree))
6569+        d.addCallback(lambda ignored:
6570+            mw.put_sharehashes(self.share_hash_chain))
6571+        d.addCallback(lambda ignored:
6572+            mw.put_root_hash(self.root_hash))
6573+        d.addCallback(lambda ignored:
6574+            mw.put_signature(self.signature))
6575+        d.addCallback(lambda ignored:
6576+            mw.put_verification_key(self.verification_key))
6577+        # Now try to put the signature again. This should fail
6578+        d.addCallback(lambda ignored:
6579+            self.shouldFail(LayoutInvalid, "signature after verification",
6580+                            None,
6581+                            mw.put_signature, self.signature))
6582+        return d
6583+
6584+
6585+    def test_uncoordinated_write(self):
6586+        # Make two mutable writers, both pointing to the same storage
6587+        # server, both at the same storage index, and try writing to the
6588+        # same share.
6589+        mw1 = self._make_new_mw("si1", 0)
6590+        mw2 = self._make_new_mw("si1", 0)
6591+
6592+        def _check_success(results):
6593+            result, readvs = results
6594+            self.failUnless(result)
6595+
6596+        def _check_failure(results):
6597+            result, readvs = results
6598+            self.failIf(result)
6599+
6600+        def _write_share(mw):
6601+            for i in xrange(6):
6602+                mw.put_block(self.block, i, self.salt)
6603+            mw.put_encprivkey(self.encprivkey)
6604+            mw.put_blockhashes(self.block_hash_tree)
6605+            mw.put_sharehashes(self.share_hash_chain)
6606+            mw.put_root_hash(self.root_hash)
6607+            mw.put_signature(self.signature)
6608+            mw.put_verification_key(self.verification_key)
6609+            return mw.finish_publishing()
6610+        d = _write_share(mw1)
6611+        d.addCallback(_check_success)
6612+        d.addCallback(lambda ignored:
6613+            _write_share(mw2))
6614+        d.addCallback(_check_failure)
6615+        return d
6616+
6617+
6618+    def test_invalid_salt_size(self):
6619+        # Salts need to be 16 bytes in size. Writes that attempt to
6620+        # write more or less than this should be rejected.
6621+        mw = self._make_new_mw("si1", 0)
6622+        invalid_salt = "a" * 17 # 17 bytes
6623+        another_invalid_salt = "b" * 15 # 15 bytes
6624+        d = defer.succeed(None)
6625+        d.addCallback(lambda ignored:
6626+            self.shouldFail(LayoutInvalid, "salt too big",
6627+                            None,
6628+                            mw.put_block, self.block, 0, invalid_salt))
6629+        d.addCallback(lambda ignored:
6630+            self.shouldFail(LayoutInvalid, "salt too small",
6631+                            None,
6632+                            mw.put_block, self.block, 0,
6633+                            another_invalid_salt))
6634+        return d
6635+
6636+
6637+    def test_write_test_vectors(self):
6638+        # If we give the write proxy a bogus test vector at
6639+        # any point during the process, it should fail to write when we
6640+        # tell it to write.
6641+        def _check_failure(results):
6642+            self.failUnlessEqual(len(results), 2)
6643+            res, d = results
6644+            self.failIf(res)
6645+
6646+        def _check_success(results):
6647+            self.failUnlessEqual(len(results), 2)
6648+            res, d = results
6649+            self.failUnless(res)
6650+
6651+        mw = self._make_new_mw("si1", 0)
6652+        mw.set_checkstring("this is a lie")
6653+        for i in xrange(6):
6654+            mw.put_block(self.block, i, self.salt)
6655+        mw.put_encprivkey(self.encprivkey)
6656+        mw.put_blockhashes(self.block_hash_tree)
6657+        mw.put_sharehashes(self.share_hash_chain)
6658+        mw.put_root_hash(self.root_hash)
6659+        mw.put_signature(self.signature)
6660+        mw.put_verification_key(self.verification_key)
6661+        d = mw.finish_publishing()
6662+        d.addCallback(_check_failure)
6663+        d.addCallback(lambda ignored:
6664+            mw.set_checkstring(""))
6665+        d.addCallback(lambda ignored:
6666+            mw.finish_publishing())
6667+        d.addCallback(_check_success)
6668+        return d
6669+
6670+
6671+    def serialize_blockhashes(self, blockhashes):
6672+        return "".join(blockhashes)
6673+
6674+
6675+    def serialize_sharehashes(self, sharehashes):
6676+        ret = "".join([struct.pack(">H32s", i, sharehashes[i])
6677+                        for i in sorted(sharehashes.keys())])
6678+        return ret
6679+
6680+
6681+    def test_write(self):
6682+        # This translates to a file with 6 6-byte segments, and with 2-byte
6683+        # blocks.
6684+        mw = self._make_new_mw("si1", 0)
6685+        # Test writing some blocks.
6686+        read = self.ss.remote_slot_readv
6687+        expected_sharedata_offset = struct.calcsize(MDMFHEADER)
6688+        written_block_size = 2 + len(self.salt)
6689+        written_block = self.block + self.salt
6690+        for i in xrange(6):
6691+            mw.put_block(self.block, i, self.salt)
6692+
6693+        mw.put_encprivkey(self.encprivkey)
6694+        mw.put_blockhashes(self.block_hash_tree)
6695+        mw.put_sharehashes(self.share_hash_chain)
6696+        mw.put_root_hash(self.root_hash)
6697+        mw.put_signature(self.signature)
6698+        mw.put_verification_key(self.verification_key)
6699+        d = mw.finish_publishing()
6700+        def _check_publish(results):
6701+            self.failUnlessEqual(len(results), 2)
6702+            result, ign = results
6703+            self.failUnless(result, "publish failed")
6704+            for i in xrange(6):
6705+                self.failUnlessEqual(read("si1", [0], [(expected_sharedata_offset + (i * written_block_size), written_block_size)]),
6706+                                {0: [written_block]})
6707+
6708+            expected_private_key_offset = expected_sharedata_offset + \
6709+                                      len(written_block) * 6
6710+            self.failUnlessEqual(len(self.encprivkey), 7)
6711+            self.failUnlessEqual(read("si1", [0], [(expected_private_key_offset, 7)]),
6712+                                 {0: [self.encprivkey]})
6713+
6714+            expected_block_hash_offset = expected_private_key_offset + len(self.encprivkey)
6715+            self.failUnlessEqual(len(self.block_hash_tree_s), 32 * 6)
6716+            self.failUnlessEqual(read("si1", [0], [(expected_block_hash_offset, 32 * 6)]),
6717+                                 {0: [self.block_hash_tree_s]})
6718+
6719+            expected_share_hash_offset = expected_block_hash_offset + len(self.block_hash_tree_s)
6720+            self.failUnlessEqual(read("si1", [0],[(expected_share_hash_offset, (32 + 2) * 6)]),
6721+                                 {0: [self.share_hash_chain_s]})
6722+
6723+            self.failUnlessEqual(read("si1", [0], [(9, 32)]),
6724+                                 {0: [self.root_hash]})
6725+            expected_signature_offset = expected_share_hash_offset + len(self.share_hash_chain_s)
6726+            self.failUnlessEqual(len(self.signature), 9)
6727+            self.failUnlessEqual(read("si1", [0], [(expected_signature_offset, 9)]),
6728+                                 {0: [self.signature]})
6729+
6730+            expected_verification_key_offset = expected_signature_offset + len(self.signature)
6731+            self.failUnlessEqual(len(self.verification_key), 6)
6732+            self.failUnlessEqual(read("si1", [0], [(expected_verification_key_offset, 6)]),
6733+                                 {0: [self.verification_key]})
6734+
6735+            signable = mw.get_signable()
6736+            verno, seq, roothash, k, n, segsize, datalen = \
6737+                                            struct.unpack(">BQ32sBBQQ",
6738+                                                          signable)
6739+            self.failUnlessEqual(verno, 1)
6740+            self.failUnlessEqual(seq, 0)
6741+            self.failUnlessEqual(roothash, self.root_hash)
6742+            self.failUnlessEqual(k, 3)
6743+            self.failUnlessEqual(n, 10)
6744+            self.failUnlessEqual(segsize, 6)
6745+            self.failUnlessEqual(datalen, 36)
6746+            expected_eof_offset = expected_verification_key_offset + len(self.verification_key)
6747+
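+            # For reference, the header verified by the reads below is laid
+            # out as: byte 0 holds the version number, bytes 1-8 the sequence
+            # number, bytes 9-40 the root hash, byte 41 is k, byte 42 is N,
+            # bytes 43-50 the segment size, bytes 51-58 the data length, and
+            # bytes 59-106 the six 8-byte offsets (107 header bytes in all).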
6748+            # Check the version number to make sure that it is correct.
6749+            expected_version_number = struct.pack(">B", 1)
6750+            self.failUnlessEqual(read("si1", [0], [(0, 1)]),
6751+                                 {0: [expected_version_number]})
6752+            # Check the sequence number to make sure that it is correct
6753+            expected_sequence_number = struct.pack(">Q", 0)
6754+            self.failUnlessEqual(read("si1", [0], [(1, 8)]),
6755+                                 {0: [expected_sequence_number]})
6756+            # Check that the encoding parameters (k, N, segment size, data
6757+            # length) are what they should be. These are 3, 10, 6, and 36.
6758+            expected_k = struct.pack(">B", 3)
6759+            self.failUnlessEqual(read("si1", [0], [(41, 1)]),
6760+                                 {0: [expected_k]})
6761+            expected_n = struct.pack(">B", 10)
6762+            self.failUnlessEqual(read("si1", [0], [(42, 1)]),
6763+                                 {0: [expected_n]})
6764+            expected_segment_size = struct.pack(">Q", 6)
6765+            self.failUnlessEqual(read("si1", [0], [(43, 8)]),
6766+                                 {0: [expected_segment_size]})
6767+            expected_data_length = struct.pack(">Q", 36)
6768+            self.failUnlessEqual(read("si1", [0], [(51, 8)]),
6769+                                 {0: [expected_data_length]})
6770+            expected_offset = struct.pack(">Q", expected_private_key_offset)
6771+            self.failUnlessEqual(read("si1", [0], [(59, 8)]),
6772+                                 {0: [expected_offset]})
6773+            expected_offset = struct.pack(">Q", expected_block_hash_offset)
6774+            self.failUnlessEqual(read("si1", [0], [(67, 8)]),
6775+                                 {0: [expected_offset]})
6776+            expected_offset = struct.pack(">Q", expected_share_hash_offset)
6777+            self.failUnlessEqual(read("si1", [0], [(75, 8)]),
6778+                                 {0: [expected_offset]})
6779+            expected_offset = struct.pack(">Q", expected_signature_offset)
6780+            self.failUnlessEqual(read("si1", [0], [(83, 8)]),
6781+                                 {0: [expected_offset]})
6782+            expected_offset = struct.pack(">Q", expected_verification_key_offset)
6783+            self.failUnlessEqual(read("si1", [0], [(91, 8)]),
6784+                                 {0: [expected_offset]})
6785+            expected_offset = struct.pack(">Q", expected_eof_offset)
6786+            self.failUnlessEqual(read("si1", [0], [(99, 8)]),
6787+                                 {0: [expected_offset]})
6788+        d.addCallback(_check_publish)
6789+        return d
6790+
6791+    def _make_new_mw(self, si, share, datalength=36):
6792+        # This is a file of size 36 bytes. Since it has a segment
6793+        # size of 6, we know that it has 6 byte segments, which will
6794+        # be split into blocks of 2 bytes because our FEC k
6795+        # parameter is 3.
6796+        mw = MDMFSlotWriteProxy(share, self.rref, si, self.secrets, 0, 3, 10,
6797+                                6, datalength)
6798+        return mw
6799+
6800+
6801+    def test_write_rejected_with_too_many_blocks(self):
6802+        mw = self._make_new_mw("si0", 0)
6803+
6804+        # Try writing too many blocks. We should not be able to write
6805+        # more than 6 blocks into each share.
6807+        d = defer.succeed(None)
6808+        for i in xrange(6):
6809+            d.addCallback(lambda ignored, i=i:
6810+                mw.put_block(self.block, i, self.salt))
6811+        d.addCallback(lambda ignored:
6812+            self.shouldFail(LayoutInvalid, "too many blocks",
6813+                            None,
6814+                            mw.put_block, self.block, 7, self.salt))
6815+        return d
6816+
6817+
6818+    def test_write_rejected_with_invalid_salt(self):
6819+        # Try writing an invalid salt. Salts are 16 bytes -- any more or
6820+        # less should cause an error.
6821+        mw = self._make_new_mw("si1", 0)
6822+        bad_salt = "a" * 17 # 17 bytes
6823+        d = defer.succeed(None)
6824+        d.addCallback(lambda ignored:
6825+            self.shouldFail(LayoutInvalid, "test_invalid_salt",
6826+                            None, mw.put_block, self.block, 7, bad_salt))
6827+        return d
6828+
6829+
6830+    def test_write_rejected_with_invalid_root_hash(self):
6831+        # Try writing an invalid root hash. This should be SHA256d, and
6832+        # 32 bytes long as a result.
6833+        mw = self._make_new_mw("si2", 0)
6834+        # 17 bytes != 32 bytes
6835+        invalid_root_hash = "a" * 17
6836+        d = defer.succeed(None)
6837+        # Before this test can work, we need to put some blocks + salts,
6838+        # a block hash tree, and a share hash chain. Otherwise, we'll see
6839+        # failures that match what we are looking for, but are caused by
6840+        # the constraints imposed on operation ordering.
6841+        for i in xrange(6):
6842+            d.addCallback(lambda ignored, i=i:
6843+                mw.put_block(self.block, i, self.salt))
6844+        d.addCallback(lambda ignored:
6845+            mw.put_encprivkey(self.encprivkey))
6846+        d.addCallback(lambda ignored:
6847+            mw.put_blockhashes(self.block_hash_tree))
6848+        d.addCallback(lambda ignored:
6849+            mw.put_sharehashes(self.share_hash_chain))
6850+        d.addCallback(lambda ignored:
6851+            self.shouldFail(LayoutInvalid, "invalid root hash",
6852+                            None, mw.put_root_hash, invalid_root_hash))
6853+        return d
6854+
6855+
6856+    def test_write_rejected_with_invalid_blocksize(self):
6857+        # The blocksize implied by the writer that we get from
6858+        # _make_new_mw is 2 bytes -- any more or any less than this
6859+        # should be cause for failure, unless it is the tail segment, in
6860+        # which case it may not be a failure.
6861+        invalid_block = "a"
6862+        mw = self._make_new_mw("si3", 0, 33) # implies a tail segment with
6863+                                             # one byte blocks
6864+        # 1 byte != 2 bytes
6865+        d = defer.succeed(None)
6866+        d.addCallback(lambda ignored, invalid_block=invalid_block:
6867+            self.shouldFail(LayoutInvalid, "test blocksize too small",
6868+                            None, mw.put_block, invalid_block, 0,
6869+                            self.salt))
6870+        invalid_block = invalid_block * 3
6871+        # 3 bytes != 2 bytes
6872+        d.addCallback(lambda ignored:
6873+            self.shouldFail(LayoutInvalid, "test blocksize too large",
6874+                            None,
6875+                            mw.put_block, invalid_block, 0, self.salt))
6876+        for i in xrange(5):
6877+            d.addCallback(lambda ignored, i=i:
6878+                mw.put_block(self.block, i, self.salt))
6879+        # Try to put an invalid tail segment
6880+        d.addCallback(lambda ignored:
6881+            self.shouldFail(LayoutInvalid, "test invalid tail segment",
6882+                            None,
6883+                            mw.put_block, self.block, 5, self.salt))
6884+        valid_block = "a"
6885+        d.addCallback(lambda ignored:
6886+            mw.put_block(valid_block, 5, self.salt))
6887+        return d
6888+
6889+
6890+    def test_write_enforces_order_constraints(self):
6891+        # We require that the MDMFSlotWriteProxy be interacted with in a
6892+        # specific way.
6893+        # That way is:
6894+        # 0: __init__
6895+        # 1: write blocks and salts
6896+        # 2: Write the encrypted private key
6897+        # 3: Write the block hashes
6898+        # 4: Write the share hashes
6899+        # 5: Write the root hash
6900+        # 6: Write the signature and verification key
6901+        # 7: Write the file.
6902+        #
6903+        # Some of these can be performed out-of-order, and some can't.
6904+        # The dependencies that I want to test here are:
6905+        #  - Private key before block hashes
6906+        #  - share hashes and block hashes before root hash
6907+        #  - root hash before signature
6908+        #  - signature before verification key
6909+        mw0 = self._make_new_mw("si0", 0)
6910+        # Write some shares
6911+        d = defer.succeed(None)
6912+        for i in xrange(6):
6913+            d.addCallback(lambda ignored, i=i:
6914+                mw0.put_block(self.block, i, self.salt))
6915+        # Try to write the block hashes before writing the encrypted
6916+        # private key
6917+        d.addCallback(lambda ignored:
6918+            self.shouldFail(LayoutInvalid, "block hashes before key",
6919+                            None, mw0.put_blockhashes,
6920+                            self.block_hash_tree))
6921+
6922+        # Write the private key.
6923+        d.addCallback(lambda ignored:
6924+            mw0.put_encprivkey(self.encprivkey))
6925+
6926+
6927+        # Try to write the share hash chain without writing the block
6928+        # hash tree
6929+        d.addCallback(lambda ignored:
6930+            self.shouldFail(LayoutInvalid, "share hash chain before "
6931+                                           "block hash tree",
6932+                            None,
6933+                            mw0.put_sharehashes, self.share_hash_chain))
6934+
6935+        # Try to write the root hash without writing either the
6936+        # block hashes or the share hashes
6937+        d.addCallback(lambda ignored:
6938+            self.shouldFail(LayoutInvalid, "root hash before share hashes",
6939+                            None,
6940+                            mw0.put_root_hash, self.root_hash))
6941+
6942+        # Now write the block hashes and try again
6943+        d.addCallback(lambda ignored:
6944+            mw0.put_blockhashes(self.block_hash_tree))
6945+
6946+        d.addCallback(lambda ignored:
6947+            self.shouldFail(LayoutInvalid, "root hash before share hashes",
6948+                            None, mw0.put_root_hash, self.root_hash))
6949+
6950+        # We haven't yet put the root hash on the share, so we shouldn't
6951+        # be able to sign it.
6952+        d.addCallback(lambda ignored:
6953+            self.shouldFail(LayoutInvalid, "signature before root hash",
6954+                            None, mw0.put_signature, self.signature))
6955+
6956+        d.addCallback(lambda ignored:
6957+            self.failUnlessRaises(LayoutInvalid, mw0.get_signable))
6958+
6959+        # ...and, since that fails, we also shouldn't be able to put the
6960+        # verification key.
6961+        d.addCallback(lambda ignored:
6962+            self.shouldFail(LayoutInvalid, "key before signature",
6963+                            None, mw0.put_verification_key,
6964+                            self.verification_key))
6965+
6966+        # Now write the share hashes.
6967+        d.addCallback(lambda ignored:
6968+            mw0.put_sharehashes(self.share_hash_chain))
6969+        # We should be able to write the root hash now too
6970+        d.addCallback(lambda ignored:
6971+            mw0.put_root_hash(self.root_hash))
6972+
6973+        # We should still be unable to put the verification key
6974+        d.addCallback(lambda ignored:
6975+            self.shouldFail(LayoutInvalid, "key before signature",
6976+                            None, mw0.put_verification_key,
6977+                            self.verification_key))
6978+
6979+        d.addCallback(lambda ignored:
6980+            mw0.put_signature(self.signature))
6981+
6982+        # We shouldn't be able to write the offsets to the remote server
6983+        # until the offset table is finished; IOW, until we have written
6984+        # the verification key.
6985+        d.addCallback(lambda ignored:
6986+            self.shouldFail(LayoutInvalid, "offsets before verification key",
6987+                            None,
6988+                            mw0.finish_publishing))
6989+
6990+        d.addCallback(lambda ignored:
6991+            mw0.put_verification_key(self.verification_key))
6992+        return d
6993+
6994+
6995+    def test_end_to_end(self):
6996+        mw = self._make_new_mw("si1", 0)
6997+        # Write a share using the mutable writer, and make sure that the
6998+        # reader knows how to read everything back to us.
6999+        d = defer.succeed(None)
7000+        for i in xrange(6):
7001+            d.addCallback(lambda ignored, i=i:
7002+                mw.put_block(self.block, i, self.salt))
7003+        d.addCallback(lambda ignored:
7004+            mw.put_encprivkey(self.encprivkey))
7005+        d.addCallback(lambda ignored:
7006+            mw.put_blockhashes(self.block_hash_tree))
7007+        d.addCallback(lambda ignored:
7008+            mw.put_sharehashes(self.share_hash_chain))
7009+        d.addCallback(lambda ignored:
7010+            mw.put_root_hash(self.root_hash))
7011+        d.addCallback(lambda ignored:
7012+            mw.put_signature(self.signature))
7013+        d.addCallback(lambda ignored:
7014+            mw.put_verification_key(self.verification_key))
7015+        d.addCallback(lambda ignored:
7016+            mw.finish_publishing())
7017+
7018+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7019+        def _check_block_and_salt((block, salt)):
7020+            self.failUnlessEqual(block, self.block)
7021+            self.failUnlessEqual(salt, self.salt)
7022+
7023+        for i in xrange(6):
7024+            d.addCallback(lambda ignored, i=i:
7025+                mr.get_block_and_salt(i))
7026+            d.addCallback(_check_block_and_salt)
7027+
7028+        d.addCallback(lambda ignored:
7029+            mr.get_encprivkey())
7030+        d.addCallback(lambda encprivkey:
7031+            self.failUnlessEqual(self.encprivkey, encprivkey))
7032+
7033+        d.addCallback(lambda ignored:
7034+            mr.get_blockhashes())
7035+        d.addCallback(lambda blockhashes:
7036+            self.failUnlessEqual(self.block_hash_tree, blockhashes))
7037+
7038+        d.addCallback(lambda ignored:
7039+            mr.get_sharehashes())
7040+        d.addCallback(lambda sharehashes:
7041+            self.failUnlessEqual(self.share_hash_chain, sharehashes))
7042+
7043+        d.addCallback(lambda ignored:
7044+            mr.get_signature())
7045+        d.addCallback(lambda signature:
7046+            self.failUnlessEqual(signature, self.signature))
7047+
7048+        d.addCallback(lambda ignored:
7049+            mr.get_verification_key())
7050+        d.addCallback(lambda verification_key:
7051+            self.failUnlessEqual(verification_key, self.verification_key))
7052+
7053+        d.addCallback(lambda ignored:
7054+            mr.get_seqnum())
7055+        d.addCallback(lambda seqnum:
7056+            self.failUnlessEqual(seqnum, 0))
7057+
7058+        d.addCallback(lambda ignored:
7059+            mr.get_root_hash())
7060+        d.addCallback(lambda root_hash:
7061+            self.failUnlessEqual(self.root_hash, root_hash))
7062+
7063+        d.addCallback(lambda ignored:
7064+            mr.get_encoding_parameters())
7065+        def _check_encoding_parameters((k, n, segsize, datalen)):
7066+            self.failUnlessEqual(k, 3)
7067+            self.failUnlessEqual(n, 10)
7068+            self.failUnlessEqual(segsize, 6)
7069+            self.failUnlessEqual(datalen, 36)
7070+        d.addCallback(_check_encoding_parameters)
7071+
7072+        d.addCallback(lambda ignored:
7073+            mr.get_checkstring())
7074+        d.addCallback(lambda checkstring:
7075+            self.failUnlessEqual(checkstring, mw.get_checkstring()))
7076+        return d
7077+
7078+
7079+    def test_is_sdmf(self):
7080+        # The MDMFSlotReadProxy should also know how to read SDMF files,
7081+        # since it will encounter them on the grid. Callers use the
7082+        # is_sdmf method to test this.
7083+        self.write_sdmf_share_to_server("si1")
7084+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7085+        d = mr.is_sdmf()
7086+        d.addCallback(lambda issdmf:
7087+            self.failUnless(issdmf))
7088+        return d
7089+
7090+
7091+    def test_reads_sdmf(self):
7092+        # The slot read proxy should, naturally, know how to tell us
7093+        # about data in the SDMF format
7094+        self.write_sdmf_share_to_server("si1")
7095+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7096+        d = defer.succeed(None)
7097+        d.addCallback(lambda ignored:
7098+            mr.is_sdmf())
7099+        d.addCallback(lambda issdmf:
7100+            self.failUnless(issdmf))
7101+
7102+        # What do we need to read?
7103+        #  - The sharedata
7104+        #  - The salt
7105+        d.addCallback(lambda ignored:
7106+            mr.get_block_and_salt(0))
7107+        def _check_block_and_salt(results):
7108+            block, salt = results
7109+            # Our original file is 36 bytes long, so each share is 12
7110+            # bytes in size. The share is composed entirely of the
7111+            # letter a. self.block contains two of them, so 6 * self.block
7112+            # is what we are looking for.
7113+            self.failUnlessEqual(block, self.block * 6)
7114+            self.failUnlessEqual(salt, self.salt)
7115+        d.addCallback(_check_block_and_salt)
7116+
7117+        #  - The blockhashes
7118+        d.addCallback(lambda ignored:
7119+            mr.get_blockhashes())
7120+        d.addCallback(lambda blockhashes:
7121+            self.failUnlessEqual(self.block_hash_tree,
7122+                                 blockhashes,
7123+                                 blockhashes))
7124+        #  - The sharehashes
7125+        d.addCallback(lambda ignored:
7126+            mr.get_sharehashes())
7127+        d.addCallback(lambda sharehashes:
7128+            self.failUnlessEqual(self.share_hash_chain,
7129+                                 sharehashes))
7130+        #  - The keys
7131+        d.addCallback(lambda ignored:
7132+            mr.get_encprivkey())
7133+        d.addCallback(lambda encprivkey:
7134+            self.failUnlessEqual(encprivkey, self.encprivkey, encprivkey))
7135+        d.addCallback(lambda ignored:
7136+            mr.get_verification_key())
7137+        d.addCallback(lambda verification_key:
7138+            self.failUnlessEqual(verification_key,
7139+                                 self.verification_key,
7140+                                 verification_key))
7141+        #  - The signature
7142+        d.addCallback(lambda ignored:
7143+            mr.get_signature())
7144+        d.addCallback(lambda signature:
7145+            self.failUnlessEqual(signature, self.signature, signature))
7146+
7147+        #  - The sequence number
7148+        d.addCallback(lambda ignored:
7149+            mr.get_seqnum())
7150+        d.addCallback(lambda seqnum:
7151+            self.failUnlessEqual(seqnum, 0, seqnum))
7152+
7153+        #  - The root hash
7154+        d.addCallback(lambda ignored:
7155+            mr.get_root_hash())
7156+        d.addCallback(lambda root_hash:
7157+            self.failUnlessEqual(root_hash, self.root_hash, root_hash))
7158+        return d
7159+
7160+
7161+    def test_only_reads_one_segment_sdmf(self):
7162+        # SDMF shares have only one segment, so it doesn't make sense to
7163+        # read more segments than that. The reader should know this and
7164+        # complain if we try to do that.
7165+        self.write_sdmf_share_to_server("si1")
7166+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7167+        d = defer.succeed(None)
7168+        d.addCallback(lambda ignored:
7169+            mr.is_sdmf())
7170+        d.addCallback(lambda issdmf:
7171+            self.failUnless(issdmf))
7172+        d.addCallback(lambda ignored:
7173+            self.shouldFail(LayoutInvalid, "test bad segment",
7174+                            None,
7175+                            mr.get_block_and_salt, 1))
7176+        return d
7177+
7178+
7179+    def test_read_with_prefetched_mdmf_data(self):
7180+        # The MDMFSlotReadProxy will prefill certain fields if you pass
7181+        # it data that you have already fetched. This is useful for
7182+        # cases like the Servermap, which prefetches ~2kb of data while
7183+        # finding out which shares are on the remote peer so that it
7184+        # doesn't waste round trips.
7185+        mdmf_data = self.build_test_mdmf_share()
7186+        self.write_test_share_to_server("si1")
7187+        def _make_mr(ignored, length):
7188+            mr = MDMFSlotReadProxy(self.rref, "si1", 0, mdmf_data[:length])
7189+            return mr
7190+
7191+        d = defer.succeed(None)
7192+        # This should be enough to fill in both the encoding parameters
7193+        # and the table of offsets, which will complete the version
7194+        # information tuple.
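+        # (107 bytes is the full MDMF header: the 59-byte signable prefix
+        # plus the 48-byte table of six 8-byte offsets.)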
7195+        d.addCallback(_make_mr, 107)
7196+        d.addCallback(lambda mr:
7197+            mr.get_verinfo())
7198+        def _check_verinfo(verinfo):
7199+            self.failUnless(verinfo)
7200+            self.failUnlessEqual(len(verinfo), 9)
7201+            (seqnum,
7202+             root_hash,
7203+             salt_hash,
7204+             segsize,
7205+             datalen,
7206+             k,
7207+             n,
7208+             prefix,
7209+             offsets) = verinfo
7210+            self.failUnlessEqual(seqnum, 0)
7211+            self.failUnlessEqual(root_hash, self.root_hash)
7212+            self.failUnlessEqual(segsize, 6)
7213+            self.failUnlessEqual(datalen, 36)
7214+            self.failUnlessEqual(k, 3)
7215+            self.failUnlessEqual(n, 10)
7216+            expected_prefix = struct.pack(MDMFSIGNABLEHEADER,
7217+                                          1,
7218+                                          seqnum,
7219+                                          root_hash,
7220+                                          k,
7221+                                          n,
7222+                                          segsize,
7223+                                          datalen)
7224+            self.failUnlessEqual(expected_prefix, prefix)
7225+            self.failUnlessEqual(self.rref.read_count, 0)
7226+        d.addCallback(_check_verinfo)
7227+        # This is not enough data to read a block and its salt, so the
7228+        # wrapper should read what it needs from the remote server.
7229+        d.addCallback(_make_mr, 107)
7230+        d.addCallback(lambda mr:
7231+            mr.get_block_and_salt(0))
7232+        def _check_block_and_salt((block, salt)):
7233+            self.failUnlessEqual(block, self.block)
7234+            self.failUnlessEqual(salt, self.salt)
7235+            self.failUnlessEqual(self.rref.read_count, 1)
7236+        # This should be enough data to read one block.
7237+        d.addCallback(_make_mr, 249)
7238+        d.addCallback(lambda mr:
7239+            mr.get_block_and_salt(0))
7240+        d.addCallback(_check_block_and_salt)
7241+        return d
7242+
7243+
7244+    def test_read_with_prefetched_sdmf_data(self):
7245+        sdmf_data = self.build_test_sdmf_share()
7246+        self.write_sdmf_share_to_server("si1")
7247+        def _make_mr(ignored, length):
7248+            mr = MDMFSlotReadProxy(self.rref, "si1", 0, sdmf_data[:length])
7249+            return mr
7250+
7251+        d = defer.succeed(None)
7252+        # This should be enough to get us the encoding parameters,
7253+        # offset table, and everything else we need to build a verinfo
7254+        # string.
7255+        d.addCallback(_make_mr, 107)
7256+        d.addCallback(lambda mr:
7257+            mr.get_verinfo())
7258+        def _check_verinfo(verinfo):
7259+            self.failUnless(verinfo)
7260+            self.failUnlessEqual(len(verinfo), 9)
7261+            (seqnum,
7262+             root_hash,
7263+             salt,
7264+             segsize,
7265+             datalen,
7266+             k,
7267+             n,
7268+             prefix,
7269+             offsets) = verinfo
7270+            self.failUnlessEqual(seqnum, 0)
7271+            self.failUnlessEqual(root_hash, self.root_hash)
7272+            self.failUnlessEqual(salt, self.salt)
7273+            self.failUnlessEqual(segsize, 36)
7274+            self.failUnlessEqual(datalen, 36)
7275+            self.failUnlessEqual(k, 3)
7276+            self.failUnlessEqual(n, 10)
7277+            expected_prefix = struct.pack(SIGNED_PREFIX,
7278+                                          0,
7279+                                          seqnum,
7280+                                          root_hash,
7281+                                          salt,
7282+                                          k,
7283+                                          n,
7284+                                          segsize,
7285+                                          datalen)
7286+            self.failUnlessEqual(expected_prefix, prefix)
7287+            self.failUnlessEqual(self.rref.read_count, 0)
7288+        d.addCallback(_check_verinfo)
7289+        # This shouldn't be enough to read any share data.
7290+        d.addCallback(_make_mr, 107)
7291+        d.addCallback(lambda mr:
7292+            mr.get_block_and_salt(0))
7293+        def _check_block_and_salt((block, salt)):
7294+            self.failUnlessEqual(block, self.block * 6)
7295+            self.failUnlessEqual(salt, self.salt)
7296+            # TODO: Fix the read routine so that it reads only the data
7297+            #       that it has cached if it can't read all of it.
7298+            self.failUnlessEqual(self.rref.read_count, 2)
7299+
7300+        # This should be enough to read share data.
7301+        d.addCallback(_make_mr, self.offsets['share_data'])
7302+        d.addCallback(lambda mr:
7303+            mr.get_block_and_salt(0))
7304+        d.addCallback(_check_block_and_salt)
7305+        return d
7306+
7307+
7308+    def test_read_with_empty_mdmf_file(self):
7309+        # Some tests upload a file with no contents to test things
7310+        # unrelated to the actual handling of the content of the file.
7311+        # The reader should behave intelligently in these cases.
7312+        self.write_test_share_to_server("si1", empty=True)
7313+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7314+        # We should be able to get the encoding parameters, and they
7315+        # should be correct.
7316+        d = defer.succeed(None)
7317+        d.addCallback(lambda ignored:
7318+            mr.get_encoding_parameters())
7319+        def _check_encoding_parameters(params):
7320+            self.failUnlessEqual(len(params), 4)
7321+            k, n, segsize, datalen = params
7322+            self.failUnlessEqual(k, 3)
7323+            self.failUnlessEqual(n, 10)
7324+            self.failUnlessEqual(segsize, 0)
7325+            self.failUnlessEqual(datalen, 0)
7326+        d.addCallback(_check_encoding_parameters)
7327+
7328+        # We should not be able to fetch a block, since there are no
7329+        # blocks to fetch
7330+        d.addCallback(lambda ignored:
7331+            self.shouldFail(LayoutInvalid, "get block on empty file",
7332+                            None,
7333+                            mr.get_block_and_salt, 0))
7334+        return d
7335+
7336+
7337+    def test_read_with_empty_sdmf_file(self):
7338+        self.write_sdmf_share_to_server("si1", empty=True)
7339+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7340+        # We should be able to get the encoding parameters, and they
7341+        # should be correct
7342+        d = defer.succeed(None)
7343+        d.addCallback(lambda ignored:
7344+            mr.get_encoding_parameters())
7345+        def _check_encoding_parameters(params):
7346+            self.failUnlessEqual(len(params), 4)
7347+            k, n, segsize, datalen = params
7348+            self.failUnlessEqual(k, 3)
7349+            self.failUnlessEqual(n, 10)
7350+            self.failUnlessEqual(segsize, 0)
7351+            self.failUnlessEqual(datalen, 0)
7352+        d.addCallback(_check_encoding_parameters)
7353+
7354+        # It does not make sense to get a block in this format, so we
7355+        # should not be able to.
7356+        d.addCallback(lambda ignored:
7357+            self.shouldFail(LayoutInvalid, "get block on an empty file",
7358+                            None,
7359+                            mr.get_block_and_salt, 0))
7360+        return d
7361+
7362+
7363+    def test_verinfo_with_sdmf_file(self):
7364+        self.write_sdmf_share_to_server("si1")
7365+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7366+        # We should be able to get the version information.
7367+        d = defer.succeed(None)
7368+        d.addCallback(lambda ignored:
7369+            mr.get_verinfo())
7370+        def _check_verinfo(verinfo):
7371+            self.failUnless(verinfo)
7372+            self.failUnlessEqual(len(verinfo), 9)
7373+            (seqnum,
7374+             root_hash,
7375+             salt,
7376+             segsize,
7377+             datalen,
7378+             k,
7379+             n,
7380+             prefix,
7381+             offsets) = verinfo
7382+            self.failUnlessEqual(seqnum, 0)
7383+            self.failUnlessEqual(root_hash, self.root_hash)
7384+            self.failUnlessEqual(salt, self.salt)
7385+            self.failUnlessEqual(segsize, 36)
7386+            self.failUnlessEqual(datalen, 36)
7387+            self.failUnlessEqual(k, 3)
7388+            self.failUnlessEqual(n, 10)
7389+            expected_prefix = struct.pack(">BQ32s16s BBQQ",
7390+                                          0,
7391+                                          seqnum,
7392+                                          root_hash,
7393+                                          salt,
7394+                                          k,
7395+                                          n,
7396+                                          segsize,
7397+                                          datalen)
7398+            self.failUnlessEqual(prefix, expected_prefix)
7399+            self.failUnlessEqual(offsets, self.offsets)
7400+        d.addCallback(_check_verinfo)
7401+        return d
7402+
7403+
7404+    def test_verinfo_with_mdmf_file(self):
7405+        self.write_test_share_to_server("si1")
7406+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7407+        d = defer.succeed(None)
7408+        d.addCallback(lambda ignored:
7409+            mr.get_verinfo())
7410+        def _check_verinfo(verinfo):
7411+            self.failUnless(verinfo)
7412+            self.failUnlessEqual(len(verinfo), 9)
7413+            (seqnum,
7414+             root_hash,
7415+             IV,
7416+             segsize,
7417+             datalen,
7418+             k,
7419+             n,
7420+             prefix,
7421+             offsets) = verinfo
7422+            self.failUnlessEqual(seqnum, 0)
7423+            self.failUnlessEqual(root_hash, self.root_hash)
7424+            self.failIf(IV)
7425+            self.failUnlessEqual(segsize, 6)
7426+            self.failUnlessEqual(datalen, 36)
7427+            self.failUnlessEqual(k, 3)
7428+            self.failUnlessEqual(n, 10)
7429+            expected_prefix = struct.pack(">BQ32s BBQQ",
7430+                                          1,
7431+                                          seqnum,
7432+                                          root_hash,
7433+                                          k,
7434+                                          n,
7435+                                          segsize,
7436+                                          datalen)
7437+            self.failUnlessEqual(prefix, expected_prefix)
7438+            self.failUnlessEqual(offsets, self.offsets)
7439+        d.addCallback(_check_verinfo)
7440+        return d
7441+
7442+
7443+    def test_reader_queue(self):
7444+        self.write_test_share_to_server('si1')
7445+        mr = MDMFSlotReadProxy(self.rref, "si1", 0)
7446+        d1 = mr.get_block_and_salt(0, queue=True)
7447+        d2 = mr.get_blockhashes(queue=True)
7448+        d3 = mr.get_sharehashes(queue=True)
7449+        d4 = mr.get_signature(queue=True)
7450+        d5 = mr.get_verification_key(queue=True)
7451+        dl = defer.DeferredList([d1, d2, d3, d4, d5])
7452+        mr.flush()
7453+        def _print(results):
7454+            self.failUnlessEqual(len(results), 5)
7455+            # We have one read for version information and offsets, and
7456+            # one for everything else.
7457+            self.failUnlessEqual(self.rref.read_count, 2)
7458+            block, salt = results[0][1] # results[0] is a (success, value)
7459+                                        # pair; the boolean says whether
7460+                                        # or not the operation worked.
7461+            self.failUnlessEqual(self.block, block)
7462+            self.failUnlessEqual(self.salt, salt)
7463+
7464+            blockhashes = results[1][1]
7465+            self.failUnlessEqual(self.block_hash_tree, blockhashes)
7466+
7467+            sharehashes = results[2][1]
7468+            self.failUnlessEqual(self.share_hash_chain, sharehashes)
7469+
7470+            signature = results[3][1]
7471+            self.failUnlessEqual(self.signature, signature)
7472+
7473+            verification_key = results[4][1]
7474+            self.failUnlessEqual(self.verification_key, verification_key)
7475+        dl.addCallback(_print)
7476+        return dl
7477+
7478+
7479+    def test_sdmf_writer(self):
7480+        # Go through the motions of writing an SDMF share to the storage
7481+        # server. Then read the storage server to see that the share got
7482+        # written in the way that we think it should have.
7483+
7484+        # We do this first so that the necessary instance variables get
7485+        # set the way we want them for the tests below.
7486+        data = self.build_test_sdmf_share()
7487+        sdmfr = SDMFSlotWriteProxy(0,
7488+                                   self.rref,
7489+                                   "si1",
7490+                                   self.secrets,
7491+                                   0, 3, 10, 36, 36)
7492+        # Put the block and salt.
7493+        sdmfr.put_block(self.blockdata, 0, self.salt)
7494+
7495+        # Put the encprivkey
7496+        sdmfr.put_encprivkey(self.encprivkey)
7497+
7498+        # Put the block and share hash chains
7499+        sdmfr.put_blockhashes(self.block_hash_tree)
7500+        sdmfr.put_sharehashes(self.share_hash_chain)
7501+        sdmfr.put_root_hash(self.root_hash)
7502+
7503+        # Put the signature
7504+        sdmfr.put_signature(self.signature)
7505+
7506+        # Put the verification key
7507+        sdmfr.put_verification_key(self.verification_key)
7508+
7509+        # Now check to make sure that nothing has been written yet.
7510+        self.failUnlessEqual(self.rref.write_count, 0)
7511+
7512+        # Now finish publishing
7513+        d = sdmfr.finish_publishing()
7514+        def _then(ignored):
7515+            self.failUnlessEqual(self.rref.write_count, 1)
7516+            read = self.ss.remote_slot_readv
7517+            self.failUnlessEqual(read("si1", [0], [(0, len(data))]),
7518+                                 {0: [data]})
7519+        d.addCallback(_then)
7520+        return d
7521+
7522+
7523+    def test_sdmf_writer_preexisting_share(self):
7524+        data = self.build_test_sdmf_share()
7525+        self.write_sdmf_share_to_server("si1")
7526+
7527+        # Now there is a share on the storage server. To successfully
7528+        # write, we need to set the checkstring correctly. When we
7529+        # don't, no write should occur.
7530+        sdmfw = SDMFSlotWriteProxy(0,
7531+                                   self.rref,
7532+                                   "si1",
7533+                                   self.secrets,
7534+                                   1, 3, 10, 36, 36)
7535+        sdmfw.put_block(self.blockdata, 0, self.salt)
7536+
7537+        # Put the encprivkey
7538+        sdmfw.put_encprivkey(self.encprivkey)
7539+
7540+        # Put the block and share hash chains
7541+        sdmfw.put_blockhashes(self.block_hash_tree)
7542+        sdmfw.put_sharehashes(self.share_hash_chain)
7543+
7544+        # Put the root hash
7545+        sdmfw.put_root_hash(self.root_hash)
7546+
7547+        # Put the signature
7548+        sdmfw.put_signature(self.signature)
7549+
7550+        # Put the verification key
7551+        sdmfw.put_verification_key(self.verification_key)
7552+
7553+        # We shouldn't have a checkstring yet
7554+        self.failUnlessEqual(sdmfw.get_checkstring(), "")
7555+
7556+        d = sdmfw.finish_publishing()
7557+        def _then(results):
7558+            self.failIf(results[0])
7559+            # this is the correct checkstring
7560+            self._expected_checkstring = results[1][0][0]
7561+            return self._expected_checkstring
7562+
7563+        d.addCallback(_then)
7564+        d.addCallback(sdmfw.set_checkstring)
7565+        d.addCallback(lambda ignored:
7566+            sdmfw.get_checkstring())
7567+        d.addCallback(lambda checkstring:
7568+            self.failUnlessEqual(checkstring, self._expected_checkstring))
7569+        d.addCallback(lambda ignored:
7570+            sdmfw.finish_publishing())
7571+        def _then_again(results):
7572+            self.failUnless(results[0])
7573+            read = self.ss.remote_slot_readv
7574+            self.failUnlessEqual(read("si1", [0], [(1, 8)]),
7575+                                 {0: [struct.pack(">Q", 1)]})
7576+            self.failUnlessEqual(read("si1", [0], [(9, len(data) - 9)]),
7577+                                 {0: [data[9:]]})
7578+        d.addCallback(_then_again)
7579+        return d
7580+
7581+
7582 class Stats(unittest.TestCase):
7583 
7584     def setUp(self):
7585}
7586[mutable/retrieve.py: Modify the retrieval process to support MDMF
7587Kevan Carstensen <kevan@isnotajoke.com>**20100819003409
7588 Ignore-this: c03f4e41aaa0366a9bf44847f2caf9db
7589 
7590 The logic behind a mutable file download had to be adapted to work with
7591 segmented mutable files; this patch performs those adaptations. It also
7592 exposes some decoding and decrypting functionality to make partial-file
7593 updates a little easier, and supports efficient random-access downloads
7594 of parts of an MDMF file.
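 
 A minimal usage sketch, assuming a caller that already holds the
 filenode, servermap, and verinfo for the version it wants to read (the
 ChunkCollector class and read_range helper are hypothetical, shown only
 to illustrate the download(consumer, offset, size) signature added by
 this patch):
 
   from zope.interface import implements
   from twisted.internet.interfaces import IConsumer
   from allmydata.mutable.retrieve import Retrieve
 
   class ChunkCollector:
       # trivial IConsumer that accumulates whatever plaintext the
       # Retrieve producer writes to it
       implements(IConsumer)
       def __init__(self):
           self.chunks = []
       def registerProducer(self, producer, streaming):
           self.producer = producer
       def unregisterProducer(self):
           self.producer = None
       def write(self, data):
           self.chunks.append(data)
 
   def read_range(filenode, servermap, verinfo, offset, size):
       # fetch only the requested byte range, not the whole file
       c = ChunkCollector()
       r = Retrieve(filenode, servermap, verinfo)
       d = r.download(consumer=c, offset=offset, size=size)
       d.addCallback(lambda ignored: "".join(c.chunks))
       return d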
7595] {
7596hunk ./src/allmydata/mutable/retrieve.py 2
7597 
7598-import struct, time
7599+import time
7600 from itertools import count
7601 from zope.interface import implements
7602 from twisted.internet import defer
7603merger 0.0 (
7604hunk ./src/allmydata/mutable/retrieve.py 10
7605+from allmydata.util.dictutil import DictOfSets
7606hunk ./src/allmydata/mutable/retrieve.py 7
7607-from foolscap.api import DeadReferenceError, eventually, fireEventually
7608-from allmydata.interfaces import IRetrieveStatus, NotEnoughSharesError
7609-from allmydata.util import hashutil, idlib, log
7610+from twisted.internet.interfaces import IPushProducer, IConsumer
7611+from foolscap.api import eventually, fireEventually
7612+from allmydata.interfaces import IRetrieveStatus, NotEnoughSharesError, \
7613+                                 MDMF_VERSION, SDMF_VERSION
7614+from allmydata.util import hashutil, log, mathutil
7615)
7616hunk ./src/allmydata/mutable/retrieve.py 16
7617 from pycryptopp.publickey import rsa
7618 
7619 from allmydata.mutable.common import CorruptShareError, UncoordinatedWriteError
7620-from allmydata.mutable.layout import SIGNED_PREFIX, unpack_share_data
7621+from allmydata.mutable.layout import MDMFSlotReadProxy
7622 
7623 class RetrieveStatus:
7624     implements(IRetrieveStatus)
7625hunk ./src/allmydata/mutable/retrieve.py 83
7626     # times, and each will have a separate response chain. However the
7627     # Retrieve object will remain tied to a specific version of the file, and
7628     # will use a single ServerMap instance.
7629+    implements(IPushProducer)
7630 
7631hunk ./src/allmydata/mutable/retrieve.py 85
7632-    def __init__(self, filenode, servermap, verinfo, fetch_privkey=False):
7633+    def __init__(self, filenode, servermap, verinfo, fetch_privkey=False,
7634+                 verify=False):
7635         self._node = filenode
7636         assert self._node.get_pubkey()
7637         self._storage_index = filenode.get_storage_index()
7638hunk ./src/allmydata/mutable/retrieve.py 104
7639         self.verinfo = verinfo
7640         # during repair, we may be called upon to grab the private key, since
7641         # it wasn't picked up during a verify=False checker run, and we'll
7642-        # need it for repair to generate the a new version.
7643-        self._need_privkey = fetch_privkey
7644-        if self._node.get_privkey():
7645+        # need it for repair to generate a new version.
7646+        self._need_privkey = fetch_privkey or verify
7647+        if self._node.get_privkey() and not verify:
7648             self._need_privkey = False
7649 
7650hunk ./src/allmydata/mutable/retrieve.py 109
7651+        if self._need_privkey:
7652+            # TODO: Evaluate the need for this. We'll use it if we want
7653+            # to limit how many queries are on the wire for the privkey
7654+            # at once.
7655+            self._privkey_query_markers = [] # one Marker for each time we've
7656+                                             # tried to get the privkey.
7657+
7658+        # verify means that we are using the downloader logic to verify all
7659+        # of our shares. This tells the downloader a few things.
7660+        #
7661+        # 1. We need to download all of the shares.
7662+        # 2. We don't need to decode or decrypt the shares, since our
7663+        #    caller doesn't care about the plaintext, only the
7664+        #    information about which shares are or are not valid.
7665+        # 3. When we are validating readers, we need to validate the
7666+        #    signature on the prefix. Do we? We already do this in the
7667+        #    servermap update?
7668+        self._verify = False
7669+        if verify:
7670+            self._verify = True
7671+
7672         self._status = RetrieveStatus()
7673         self._status.set_storage_index(self._storage_index)
7674         self._status.set_helper(False)
7675hunk ./src/allmydata/mutable/retrieve.py 139
7676          offsets_tuple) = self.verinfo
7677         self._status.set_size(datalength)
7678         self._status.set_encoding(k, N)
7679+        self.readers = {}
7680+        self._paused = False
7681+        self._pause_deferred = None
7682+        self._offset = None
7683+        self._read_length = None
7684+        self.log("got seqnum %d" % self.verinfo[0])
7685+
7686 
7687     def get_status(self):
7688         return self._status
7689hunk ./src/allmydata/mutable/retrieve.py 157
7690             kwargs["facility"] = "tahoe.mutable.retrieve"
7691         return log.msg(*args, **kwargs)
7692 
7693-    def download(self):
7694+
7695+    ###################
7696+    # IPushProducer
7697+
7698+    def pauseProducing(self):
7699+        """
7700+        I am called by my download target if we have produced too much
7701+        data for it to handle. I make the downloader stop producing new
7702+        data until my resumeProducing method is called.
7703+        """
7704+        if self._paused:
7705+            return
7706+
7707+        # fired when the download is unpaused.
7708+        self._old_status = self._status.get_status()
7709+        self._status.set_status("Paused")
7710+
7711+        self._pause_deferred = defer.Deferred()
7712+        self._paused = True
7713+
7714+
7715+    def resumeProducing(self):
7716+        """
7717+        I am called by my download target once it is ready to begin
7718+        receiving data again.
7719+        """
7720+        if not self._paused:
7721+            return
7722+
7723+        self._paused = False
7724+        p = self._pause_deferred
7725+        self._pause_deferred = None
7726+        self._status.set_status(self._old_status)
7727+
7728+        eventually(p.callback, None)
7729+
7730+
7731+    def _check_for_paused(self, res):
7732+        """
7733+        I am called just before a write to the consumer. I return a
7734+        Deferred that eventually fires with the data that is to be
7735+        written to the consumer. If the download has not been paused,
7736+        the Deferred fires immediately. Otherwise, the Deferred fires
7737+        when the downloader is unpaused.
7738+        """
7739+        if self._paused:
7740+            d = defer.Deferred()
7741+            self._pause_deferred.addCallback(lambda ignored: d.callback(res))
7742+            return d
7743+        return defer.succeed(res)
7744+
7745+
7746+    def download(self, consumer=None, offset=0, size=None):
7747+        assert IConsumer.providedBy(consumer) or self._verify
7748+
7749+        if consumer:
7750+            self._consumer = consumer
7751+            # we provide IPushProducer, so streaming=True, per
7752+            # IConsumer.
7753+            self._consumer.registerProducer(self, streaming=True)
7754+
7755         self._done_deferred = defer.Deferred()
7756         self._started = time.time()
7757         self._status.set_status("Retrieving Shares")
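The pause/unpause machinery added above follows the standard Twisted push-producer pattern: download() registers the Retrieve object with the consumer as a streaming producer, a consumer that has buffered enough calls pauseProducing(), _check_for_paused() then parks each subsequent segment write on the pause Deferred, and resumeProducing() releases them. A minimal sketch, assuming a Retrieve instance r whose download() has registered it with a consumer:

    r.pauseProducing()     # later segment writes wait on r._pause_deferred
    # ... the consumer drains its buffers ...
    r.resumeProducing()    # fires the parked Deferred(s); the pending
                           # segments are then written to the consumer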
7758hunk ./src/allmydata/mutable/retrieve.py 222
7759 
7760+        self._offset = offset
7761+        self._read_length = size
7762+
7763         # first, which servers can we use?
7764         versionmap = self.servermap.make_versionmap()
7765         shares = versionmap[self.verinfo]
7766hunk ./src/allmydata/mutable/retrieve.py 232
7767         self.remaining_sharemap = DictOfSets()
7768         for (shnum, peerid, timestamp) in shares:
7769             self.remaining_sharemap.add(shnum, peerid)
7770+            # If the servermap update fetched anything, it fetched at least 1
7771+            # KiB, so we ask for that much.
7772+            # TODO: Change the cache methods to allow us to fetch all of the
7773+            # data that they have, then change this method to do that.
7774+            any_cache, timestamp = self._node._read_from_cache(self.verinfo,
7775+                                                               shnum,
7776+                                                               0,
7777+                                                               1000)
7778+            ss = self.servermap.connections[peerid]
7779+            reader = MDMFSlotReadProxy(ss,
7780+                                       self._storage_index,
7781+                                       shnum,
7782+                                       any_cache)
7783+            reader.peerid = peerid
7784+            self.readers[shnum] = reader
7785+
7786 
7787         self.shares = {} # maps shnum to validated blocks
7788hunk ./src/allmydata/mutable/retrieve.py 250
7789+        self._active_readers = [] # list of active readers for this dl.
7790+        self._validated_readers = set() # set of readers that we have
7791+                                        # validated the prefix of
7792+        self._block_hash_trees = {} # shnum => hashtree
7793 
7794         # how many shares do we need?
7795hunk ./src/allmydata/mutable/retrieve.py 256
7796-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
7797+        (seqnum,
7798+         root_hash,
7799+         IV,
7800+         segsize,
7801+         datalength,
7802+         k,
7803+         N,
7804+         prefix,
7805          offsets_tuple) = self.verinfo
7806hunk ./src/allmydata/mutable/retrieve.py 265
7807-        assert len(self.remaining_sharemap) >= k
7808-        # we start with the lowest shnums we have available, since FEC is
7809-        # faster if we're using "primary shares"
7810-        self.active_shnums = set(sorted(self.remaining_sharemap.keys())[:k])
7811-        for shnum in self.active_shnums:
7812-            # we use an arbitrary peer who has the share. If shares are
7813-            # doubled up (more than one share per peer), we could make this
7814-            # run faster by spreading the load among multiple peers. But the
7815-            # algorithm to do that is more complicated than I want to write
7816-            # right now, and a well-provisioned grid shouldn't have multiple
7817-            # shares per peer.
7818-            peerid = list(self.remaining_sharemap[shnum])[0]
7819-            self.get_data(shnum, peerid)
7820 
7821hunk ./src/allmydata/mutable/retrieve.py 266
7822-        # control flow beyond this point: state machine. Receiving responses
7823-        # from queries is the input. We might send out more queries, or we
7824-        # might produce a result.
7825 
7826hunk ./src/allmydata/mutable/retrieve.py 267
7827+        # We need one share hash tree for the entire file; its leaves
7828+        # are the roots of the block hash trees for the shares that
7829+        # comprise it, and its root is in the verinfo.
7830+        self.share_hash_tree = hashtree.IncompleteHashTree(N)
7831+        self.share_hash_tree.set_hashes({0: root_hash})
7832+
7833+        # This will set up both the segment decoder and the tail segment
7834+        # decoder, as well as a variety of other instance variables that
7835+        # the download process will use.
7836+        self._setup_encoding_parameters()
7837+        assert len(self.remaining_sharemap) >= k
7838+
7839+        self.log("starting download")
7840+        self._paused = False
7841+        self._started_fetching = time.time()
7842+
7843+        self._add_active_peers()
7844+        # The download process beyond this is a state machine.
7845+        # _add_active_peers will select the peers that we want to use
7846+        # for the download, and then attempt to start downloading. After
7847+        # each segment, it will check for doneness, reacting to broken
7848+        # peers and corrupt shares as necessary. If it runs out of good
7849+        # peers before downloading all of the segments, _done_deferred
7850+        # will errback.  Otherwise, it will eventually callback with the
7851+        # contents of the mutable file.
7852         return self._done_deferred
7853 
7854hunk ./src/allmydata/mutable/retrieve.py 294
7855-    def get_data(self, shnum, peerid):
7856-        self.log(format="sending sh#%(shnum)d request to [%(peerid)s]",
7857-                 shnum=shnum,
7858-                 peerid=idlib.shortnodeid_b2a(peerid),
7859-                 level=log.NOISY)
7860-        ss = self.servermap.connections[peerid]
7861-        started = time.time()
7862-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
7863+
7864+    def decode(self, blocks_and_salts, segnum):
7865+        """
7866+        I am a helper method that the mutable file update process uses
7867+        as a shortcut to decode and decrypt the segments that it needs
7868+        to fetch in order to perform a file update. I take in a
7869+        collection of blocks and salts, and pick some of those to make a
7870+        segment with. I return the plaintext associated with that
7871+        segment.
7872+        """
7873+        # shnum => block hash tree. Unused, but _setup_encoding_parameters will
7874+        # want to set this.
7875+        # XXX: Make it so that it won't set this if we're just decoding.
7876+        self._block_hash_trees = {}
7877+        self._setup_encoding_parameters()
7878+        # This is the form expected by _decode_blocks.
7879+        blocks_and_salts = blocks_and_salts.items()
7880+        blocks_and_salts = [(True, [d]) for d in blocks_and_salts]
7881+
7882+        d = self._decode_blocks(blocks_and_salts, segnum)
7883+        d.addCallback(self._decrypt_segment)
7884+        return d
7885+
7886+
7887+    def _setup_encoding_parameters(self):
7888+        """
7889+        I set up the encoding parameters, including k, n, the number
7890+        of segments associated with this file, and the segment decoder.
7891+        """
7892+        (seqnum,
7893+         root_hash,
7894+         IV,
7895+         segsize,
7896+         datalength,
7897+         k,
7898+         n,
7899+         known_prefix,
7900          offsets_tuple) = self.verinfo
7901hunk ./src/allmydata/mutable/retrieve.py 332
7902-        offsets = dict(offsets_tuple)
7903+        self._required_shares = k
7904+        self._total_shares = n
7905+        self._segment_size = segsize
7906+        self._data_length = datalength
7907 
7908hunk ./src/allmydata/mutable/retrieve.py 337
7909-        # we read the checkstring, to make sure that the data we grab is from
7910-        # the right version.
7911-        readv = [ (0, struct.calcsize(SIGNED_PREFIX)) ]
7912+        if not IV:
7913+            self._version = MDMF_VERSION
7914+        else:
7915+            self._version = SDMF_VERSION
7916 
7917hunk ./src/allmydata/mutable/retrieve.py 342
7918-        # We also read the data, and the hashes necessary to validate them
7919-        # (share_hash_chain, block_hash_tree, share_data). We don't read the
7920-        # signature or the pubkey, since that was handled during the
7921-        # servermap phase, and we'll be comparing the share hash chain
7922-        # against the roothash that was validated back then.
7923+        if datalength and segsize:
7924+            self._num_segments = mathutil.div_ceil(datalength, segsize)
7925+            self._tail_data_size = datalength % segsize
7926+        else:
7927+            self._num_segments = 0
7928+            self._tail_data_size = 0
7929 
7930hunk ./src/allmydata/mutable/retrieve.py 349
7931-        readv.append( (offsets['share_hash_chain'],
7932-                       offsets['enc_privkey'] - offsets['share_hash_chain'] ) )
7933+        self._segment_decoder = codec.CRSDecoder()
7934+        self._segment_decoder.set_params(segsize, k, n)
7935 
7936hunk ./src/allmydata/mutable/retrieve.py 352
7937-        # if we need the private key (for repair), we also fetch that
7938-        if self._need_privkey:
7939-            readv.append( (offsets['enc_privkey'],
7940-                           offsets['EOF'] - offsets['enc_privkey']) )
7941+        if not self._tail_data_size:
7942+            self._tail_data_size = segsize
7943+
7944+        self._tail_segment_size = mathutil.next_multiple(self._tail_data_size,
7945+                                                         self._required_shares)
7946+        if self._tail_segment_size == self._segment_size:
7947+            self._tail_decoder = self._segment_decoder
7948+        else:
7949+            self._tail_decoder = codec.CRSDecoder()
7950+            self._tail_decoder.set_params(self._tail_segment_size,
7951+                                          self._required_shares,
7952+                                          self._total_shares)
7953 
7954hunk ./src/allmydata/mutable/retrieve.py 365
7955-        m = Marker()
7956-        self._outstanding_queries[m] = (peerid, shnum, started)
7957+        self.log("got encoding parameters: "
7958+                 "k: %d "
7959+                 "n: %d "
7960+                 "%d segments of %d bytes each (%d byte tail segment)" % \
7961+                 (k, n, self._num_segments, self._segment_size,
7962+                  self._tail_segment_size))
7963 
7964         # ask the cache first
7965         got_from_cache = False
7966merger 0.0 (
7967hunk ./src/allmydata/mutable/retrieve.py 376
7968-            (data, timestamp) = self._node._read_from_cache(self.verinfo, shnum,
7969-                                                            offset, length)
7970+            data = self._node._read_from_cache(self.verinfo, shnum, offset, length)
7971hunk ./src/allmydata/mutable/retrieve.py 372
7972-        # ask the cache first
7973-        got_from_cache = False
7974-        datavs = []
7975-        for (offset, length) in readv:
7976-            (data, timestamp) = self._node._read_from_cache(self.verinfo, shnum,
7977-                                                            offset, length)
7978-            if data is not None:
7979-                datavs.append(data)
7980-        if len(datavs) == len(readv):
7981-            self.log("got data from cache")
7982-            got_from_cache = True
7983-            d = fireEventually({shnum: datavs})
7984-            # datavs is a dict mapping shnum to a pair of strings
7985+        for i in xrange(self._total_shares):
7986+            # So we don't have to do this later.
7987+            self._block_hash_trees[i] = hashtree.IncompleteHashTree(self._num_segments)
7988+
7989+        # Our last task is to tell the downloader where to start and
7990+        # where to stop. We use three parameters for that:
7991+        #   - self._start_segment: the segment that we need to start
7992+        #     downloading from.
7993+        #   - self._current_segment: the next segment that we need to
7994+        #     download.
7995+        #   - self._last_segment: The last segment that we were asked to
7996+        #     download.
7997+        #
7998+        #  We say that the download is complete when
7999+        #  self._current_segment > self._last_segment. We use
8000+        #  self._start_segment and self._last_segment to know when to
8001+        #  strip things off of segments, and how much to strip.
8002+        if self._offset:
8003+            self.log("got offset: %d" % self._offset)
8004+            # our start segment is the first segment containing the
8005+            # offset we were given.
8006+            start = mathutil.div_ceil(self._offset,
8007+                                      self._segment_size)
8008+            # this gets us the first segment after self._offset. Then
8009+            # our start segment is the one before it.
8010+            start -= 1
8011+
8012+            assert start < self._num_segments
8013+            self._start_segment = start
8014+            self.log("got start segment: %d" % self._start_segment)
8015)
8016hunk ./src/allmydata/mutable/retrieve.py 386
8017             d = fireEventually({shnum: datavs})
8018             # datavs is a dict mapping shnum to a pair of strings
8019         else:
8020-            d = self._do_read(ss, peerid, self._storage_index, [shnum], readv)
8021-        self.remaining_sharemap.discard(shnum, peerid)
8022+            self._start_segment = 0
8023 
8024hunk ./src/allmydata/mutable/retrieve.py 388
8025-        d.addCallback(self._got_results, m, peerid, started, got_from_cache)
8026-        d.addErrback(self._query_failed, m, peerid)
8027-        # errors that aren't handled by _query_failed (and errors caused by
8028-        # _query_failed) get logged, but we still want to check for doneness.
8029-        def _oops(f):
8030-            self.log(format="problem in _query_failed for sh#%(shnum)d to %(peerid)s",
8031-                     shnum=shnum,
8032-                     peerid=idlib.shortnodeid_b2a(peerid),
8033-                     failure=f,
8034-                     level=log.WEIRD, umid="W0xnQA")
8035-        d.addErrback(_oops)
8036-        d.addBoth(self._check_for_done)
8037-        # any error during _check_for_done means the download fails. If the
8038-        # download is successful, _check_for_done will fire _done by itself.
8039-        d.addErrback(self._done)
8040-        d.addErrback(log.err)
8041-        return d # purely for testing convenience
8042 
8043hunk ./src/allmydata/mutable/retrieve.py 389
8044-    def _do_read(self, ss, peerid, storage_index, shnums, readv):
8045-        # isolate the callRemote to a separate method, so tests can subclass
8046-        # Publish and override it
8047-        d = ss.callRemote("slot_readv", storage_index, shnums, readv)
8048-        return d
8049+        if self._read_length:
8050+            # our end segment is the last segment containing part of the
8051+            # data that we were asked to read.
8052+            self.log("got read length %d" % self._read_length)
8053+            end_data = self._offset + self._read_length
8054+            end = mathutil.div_ceil(end_data,
8055+                                    self._segment_size)
8056+            end -= 1
8057+            assert end < self._num_segments
8058+            self._last_segment = end
8059+            self.log("got end segment: %d" % self._last_segment)
8060+        else:
8061+            self._last_segment = self._num_segments - 1
8062 
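To make the segment-selection arithmetic above concrete: with the hypothetical values segsize=6 (the same segment size as the MDMF test share in the test_storage changes above), offset=10, and read_length=15, the calculation works out as follows:

    from allmydata.util import mathutil

    segsize = 6
    offset, read_length = 10, 15    # hypothetical partial read

    start = mathutil.div_ceil(offset, segsize) - 1                # == 1
    end = mathutil.div_ceil(offset + read_length, segsize) - 1    # == 4

so segments 1 through 4 (file bytes 6..29) are fetched, and the extra bytes at either end are stripped off later by _set_segment (see the worked numbers after that hunk).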
8063hunk ./src/allmydata/mutable/retrieve.py 403
8064-    def remove_peer(self, peerid):
8065-        for shnum in list(self.remaining_sharemap.keys()):
8066-            self.remaining_sharemap.discard(shnum, peerid)
8067+        self._current_segment = self._start_segment
8068 
8069hunk ./src/allmydata/mutable/retrieve.py 405
8070-    def _got_results(self, datavs, marker, peerid, started, got_from_cache):
8071-        now = time.time()
8072-        elapsed = now - started
8073-        if not got_from_cache:
8074-            self._status.add_fetch_timing(peerid, elapsed)
8075-        self.log(format="got results (%(shares)d shares) from [%(peerid)s]",
8076-                 shares=len(datavs),
8077-                 peerid=idlib.shortnodeid_b2a(peerid),
8078-                 level=log.NOISY)
8079-        self._outstanding_queries.pop(marker, None)
8080-        if not self._running:
8081-            return
8082+    def _add_active_peers(self):
8083+        """
8084+        I populate self._active_readers with enough active readers to
8085+        retrieve the contents of this mutable file. I am called before
8086+        downloading starts, and (eventually) after each validation
8087+        error, connection error, or other problem in the download.
8088+        """
8089+        # TODO: It would be cool to investigate other heuristics for
8090+        # reader selection. For instance, the cost (in time the user
8091+        # spends waiting for their file) of selecting a really slow peer
8092+        # that happens to have a primary share is probably more than
8093+        # selecting a really fast peer that doesn't have a primary
8094+        # share. Maybe the servermap could be extended to provide this
8095+        # information; it could keep track of latency information while
8096+        # it gathers more important data, and then this routine could
8097+        # use that to select active readers.
8098+        #
8099+        # (these and other questions would be easier to answer with a
8100+        #  robust, configurable tahoe-lafs simulator, which modeled node
8101+        #  failures, differences in node speed, and other characteristics
8102+        #  that we expect storage servers to have.  You could have
8103+        #  presets for really stable grids (like allmydata.com),
8104+        #  friendnets, make it easy to configure your own settings, and
8105+        #  then simulate the effect of big changes on these use cases
8106+        #  instead of just reasoning about what the effect might be. Out
8107+        #  of scope for MDMF, though.)
8108 
8109hunk ./src/allmydata/mutable/retrieve.py 432
8110-        # note that we only ask for a single share per query, so we only
8111-        # expect a single share back. On the other hand, we use the extra
8112-        # shares if we get them.. seems better than an assert().
8113+        # We need at least self._required_shares readers to download a
8114+        # segment.
8115+        if self._verify:
8116+            needed = self._total_shares
8117+        else:
8118+            needed = self._required_shares - len(self._active_readers)
8119+        # XXX: Why don't format= log messages work here?
8120+        self.log("adding %d peers to the active peers list" % needed)
8121 
8122hunk ./src/allmydata/mutable/retrieve.py 441
8123-        for shnum,datav in datavs.items():
8124-            (prefix, hash_and_data) = datav[:2]
8125-            try:
8126-                self._got_results_one_share(shnum, peerid,
8127-                                            prefix, hash_and_data)
8128-            except CorruptShareError, e:
8129-                # log it and give the other shares a chance to be processed
8130-                f = failure.Failure()
8131-                self.log(format="bad share: %(f_value)s",
8132-                         f_value=str(f.value), failure=f,
8133-                         level=log.WEIRD, umid="7fzWZw")
8134-                self.notify_server_corruption(peerid, shnum, str(e))
8135-                self.remove_peer(peerid)
8136-                self.servermap.mark_bad_share(peerid, shnum, prefix)
8137-                self._bad_shares.add( (peerid, shnum) )
8138-                self._status.problems[peerid] = f
8139-                self._last_failure = f
8140-                pass
8141-            if self._need_privkey and len(datav) > 2:
8142-                lp = None
8143-                self._try_to_validate_privkey(datav[2], peerid, shnum, lp)
8144-        # all done!
8145+        # We favor lower numbered shares, since FEC is faster with
8146+        # primary shares than with other shares, and lower-numbered
8147+        # shares are more likely to be primary than higher numbered
8148+        # shares.
8149+        active_shnums = set(sorted(self.remaining_sharemap.keys()))
8150+        # We shouldn't consider adding shares that we already have; this
8151+        # will cause problems later.
8152+        active_shnums -= set([reader.shnum for reader in self._active_readers])
8153+        active_shnums = list(active_shnums)[:needed]
8154+        if len(active_shnums) < needed and not self._verify:
8155+            # We don't have enough readers to retrieve the file; fail.
8156+            return self._failed()
8157 
8158hunk ./src/allmydata/mutable/retrieve.py 454
8159-    def notify_server_corruption(self, peerid, shnum, reason):
8160-        ss = self.servermap.connections[peerid]
8161-        ss.callRemoteOnly("advise_corrupt_share",
8162-                          "mutable", self._storage_index, shnum, reason)
8163+        for shnum in active_shnums:
8164+            self._active_readers.append(self.readers[shnum])
8165+            self.log("added reader for share %d" % shnum)
8166+        assert len(self._active_readers) >= self._required_shares
8167+        # Conceptually, this is part of the _add_active_peers step. It
8168+        # validates the prefixes of newly added readers to make sure
8169+        # that they match what we are expecting for self.verinfo. If
8170+        # validation is successful, _validate_active_prefixes will call
8171+        # _download_current_segment for us. If validation is
8172+        # unsuccessful, then _validate_active_prefixes will remove the peer and
8173+        # call _add_active_peers again, where we will attempt to rectify
8174+        # the problem by choosing another peer.
8175+        return self._validate_active_prefixes()
8176 
8177hunk ./src/allmydata/mutable/retrieve.py 468
8178-    def _got_results_one_share(self, shnum, peerid,
8179-                               got_prefix, got_hash_and_data):
8180-        self.log("_got_results: got shnum #%d from peerid %s"
8181-                 % (shnum, idlib.shortnodeid_b2a(peerid)))
8182-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
8183-         offsets_tuple) = self.verinfo
8184-        assert len(got_prefix) == len(prefix), (len(got_prefix), len(prefix))
8185-        if got_prefix != prefix:
8186-            msg = "someone wrote to the data since we read the servermap: prefix changed"
8187-            raise UncoordinatedWriteError(msg)
8188-        (share_hash_chain, block_hash_tree,
8189-         share_data) = unpack_share_data(self.verinfo, got_hash_and_data)
8190 
8191hunk ./src/allmydata/mutable/retrieve.py 469
8192-        assert isinstance(share_data, str)
8193-        # build the block hash tree. SDMF has only one leaf.
8194-        leaves = [hashutil.block_hash(share_data)]
8195-        t = hashtree.HashTree(leaves)
8196-        if list(t) != block_hash_tree:
8197-            raise CorruptShareError(peerid, shnum, "block hash tree failure")
8198-        share_hash_leaf = t[0]
8199-        t2 = hashtree.IncompleteHashTree(N)
8200-        # root_hash was checked by the signature
8201-        t2.set_hashes({0: root_hash})
8202-        try:
8203-            t2.set_hashes(hashes=share_hash_chain,
8204-                          leaves={shnum: share_hash_leaf})
8205-        except (hashtree.BadHashError, hashtree.NotEnoughHashesError,
8206-                IndexError), e:
8207-            msg = "corrupt hashes: %s" % (e,)
8208-            raise CorruptShareError(peerid, shnum, msg)
8209-        self.log(" data valid! len=%d" % len(share_data))
8210-        # each query comes down to this: placing validated share data into
8211-        # self.shares
8212-        self.shares[shnum] = share_data
8213+    def _validate_active_prefixes(self):
8214+        """
8215+        I check to make sure that the prefixes on the peers that I am
8216+        currently reading from match the prefix that we want to see, as
8217+        said in self.verinfo.
8218 
8219hunk ./src/allmydata/mutable/retrieve.py 475
8220-    def _try_to_validate_privkey(self, enc_privkey, peerid, shnum, lp):
8221+        If I find that all of the active peers have acceptable prefixes,
8222+        I pass control to _download_current_segment, which will use
8223+        those peers to do cool things. If I find that some of the active
8224+        peers have unacceptable prefixes, I will remove them from active
8225+        peers (and from further consideration) and call
8226+        _add_active_peers to attempt to rectify the situation. I keep
8227+        track of which peers I have already validated so that I don't
8228+        need to do so again.
8229+        """
8230+        assert self._active_readers, "No more active readers"
8231 
8232hunk ./src/allmydata/mutable/retrieve.py 486
8233-        alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)
8234-        alleged_writekey = hashutil.ssk_writekey_hash(alleged_privkey_s)
8235-        if alleged_writekey != self._node.get_writekey():
8236-            self.log("invalid privkey from %s shnum %d" %
8237-                     (idlib.nodeid_b2a(peerid)[:8], shnum),
8238-                     parent=lp, level=log.WEIRD, umid="YIw4tA")
8239-            return
8240+        ds = []
8241+        new_readers = set(self._active_readers) - self._validated_readers
8242+        self.log('validating %d newly-added active readers' % len(new_readers))
8243 
8244hunk ./src/allmydata/mutable/retrieve.py 490
8245-        # it's good
8246-        self.log("got valid privkey from shnum %d on peerid %s" %
8247-                 (shnum, idlib.shortnodeid_b2a(peerid)),
8248-                 parent=lp)
8249-        privkey = rsa.create_signing_key_from_string(alleged_privkey_s)
8250-        self._node._populate_encprivkey(enc_privkey)
8251-        self._node._populate_privkey(privkey)
8252-        self._need_privkey = False
8253+        for reader in new_readers:
8254+            # We force a remote read here -- otherwise, we are relying
8255+            # on cached data that we already verified as valid, and we
8256+            # won't detect an uncoordinated write that has occurred
8257+            # since the last servermap update.
8258+            d = reader.get_prefix(force_remote=True)
8259+            d.addCallback(self._try_to_validate_prefix, reader)
8260+            ds.append(d)
8261+        dl = defer.DeferredList(ds, consumeErrors=True)
8262+        def _check_results(results):
8263+            # Each result in results will be of the form (success, msg).
8264+            # We don't care about msg, but success will tell us whether
8265+            # or not the checkstring validated. If it didn't, we need to
8266+            # remove the offending (peer,share) from our active readers,
8267+            # and ensure that active readers is again populated.
8268+            bad_readers = []
8269+            for i, result in enumerate(results):
8270+                if not result[0]:
8271+                    reader = self._active_readers[i]
8272+                    f = result[1]
8273+                    assert isinstance(f, failure.Failure)
8274 
8275hunk ./src/allmydata/mutable/retrieve.py 512
8276-    def _query_failed(self, f, marker, peerid):
8277-        self.log(format="query to [%(peerid)s] failed",
8278-                 peerid=idlib.shortnodeid_b2a(peerid),
8279-                 level=log.NOISY)
8280-        self._status.problems[peerid] = f
8281-        self._outstanding_queries.pop(marker, None)
8282-        if not self._running:
8283-            return
8284-        self._last_failure = f
8285-        self.remove_peer(peerid)
8286-        level = log.WEIRD
8287-        if f.check(DeadReferenceError):
8288-            level = log.UNUSUAL
8289-        self.log(format="error during query: %(f_value)s",
8290-                 f_value=str(f.value), failure=f, level=level, umid="gOJB5g")
8291+                    self.log("The reader %s failed to "
8292+                             "properly validate: %s" % \
8293+                             (reader, str(f.value)))
8294+                    bad_readers.append((reader, f))
8295+                else:
8296+                    reader = self._active_readers[i]
8297+                    self.log("the reader %s checks out, so we'll use it" % \
8298+                             reader)
8299+                    self._validated_readers.add(reader)
8300+                    # Each time we validate a reader, we check to see if
8301+                    # we need the private key. If we do, we politely ask
8302+                    # for it and then continue computing. If we find
8303+                    # that we haven't gotten it at the end of
8304+                    # segment decoding, then we'll take more drastic
8305+                    # measures.
8306+                    if self._need_privkey and not self._node.is_readonly():
8307+                        d = reader.get_encprivkey()
8308+                        d.addCallback(self._try_to_validate_privkey, reader)
8309+            if bad_readers:
8310+                # We do them all at once, or else we screw up list indexing.
8311+                for (reader, f) in bad_readers:
8312+                    self._mark_bad_share(reader, f)
8313+                if self._verify:
8314+                    if len(self._active_readers) >= self._required_shares:
8315+                        return self._download_current_segment()
8316+                    else:
8317+                        return self._failed()
8318+                else:
8319+                    return self._add_active_peers()
8320+            else:
8321+                return self._download_current_segment()
8322+            # The next step will assert that it has enough active
8323+            # readers to fetch shares; we just need to remove it.
8324+        dl.addCallback(_check_results)
8325+        return dl
8326 
8327hunk ./src/allmydata/mutable/retrieve.py 548
8328-    def _check_for_done(self, res):
8329-        # exit paths:
8330-        #  return : keep waiting, no new queries
8331-        #  return self._send_more_queries(outstanding) : send some more queries
8332-        #  fire self._done(plaintext) : download successful
8333-        #  raise exception : download fails
8334 
8335hunk ./src/allmydata/mutable/retrieve.py 549
8336-        self.log(format="_check_for_done: running=%(running)s, decoding=%(decoding)s",
8337-                 running=self._running, decoding=self._decoding,
8338-                 level=log.NOISY)
8339-        if not self._running:
8340-            return
8341-        if self._decoding:
8342-            return
8343-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
8344+    def _try_to_validate_prefix(self, prefix, reader):
8345+        """
8346+        I check that the prefix returned by a candidate server for
8347+        retrieval matches the prefix that the servermap knows about
8348+        (and, hence, the prefix that was validated earlier). If it does,
8349+        I return True, which means that I approve of the use of the
8350+        candidate server for segment retrieval. If it doesn't, I return
8351+        False, which means that another server must be chosen.
8352+        """
8353+        (seqnum,
8354+         root_hash,
8355+         IV,
8356+         segsize,
8357+         datalength,
8358+         k,
8359+         N,
8360+         known_prefix,
8361          offsets_tuple) = self.verinfo
8362hunk ./src/allmydata/mutable/retrieve.py 567
8363+        if known_prefix != prefix:
8364+            self.log("prefix from share %d doesn't match" % reader.shnum)
8365+            raise UncoordinatedWriteError("Mismatched prefix -- this could "
8366+                                          "indicate an uncoordinated write")
8367+        # Otherwise, we're okay -- no issues.
8368 
8369hunk ./src/allmydata/mutable/retrieve.py 573
8370-        if len(self.shares) < k:
8371-            # we don't have enough shares yet
8372-            return self._maybe_send_more_queries(k)
8373-        if self._need_privkey:
8374-            # we got k shares, but none of them had a valid privkey. TODO:
8375-            # look further. Adding code to do this is a bit complicated, and
8376-            # I want to avoid that complication, and this should be pretty
8377-            # rare (k shares with bitflips in the enc_privkey but not in the
8378-            # data blocks). If we actually do get here, the subsequent repair
8379-            # will fail for lack of a privkey.
8380-            self.log("got k shares but still need_privkey, bummer",
8381-                     level=log.WEIRD, umid="MdRHPA")
8382 
8383hunk ./src/allmydata/mutable/retrieve.py 574
8384-        # we have enough to finish. All the shares have had their hashes
8385-        # checked, so if something fails at this point, we don't know how
8386-        # to fix it, so the download will fail.
8387+    def _remove_reader(self, reader):
8388+        """
8389+        At various points, we will wish to remove a peer from
8390+        consideration and/or use. These include, but are not necessarily
8391+        limited to:
8392 
8393hunk ./src/allmydata/mutable/retrieve.py 580
8394-        self._decoding = True # avoid reentrancy
8395-        self._status.set_status("decoding")
8396-        now = time.time()
8397-        elapsed = now - self._started
8398-        self._status.timings["fetch"] = elapsed
8399+            - A connection error.
8400+            - A mismatched prefix (that is, a prefix that does not match
8401+              our conception of the version information string).
8402+            - A failing block hash, salt hash, or share hash, which can
8403+              indicate disk failure/bit flips, or network trouble.
8404 
8405hunk ./src/allmydata/mutable/retrieve.py 586
8406-        d = defer.maybeDeferred(self._decode)
8407-        d.addCallback(self._decrypt, IV, self._node.get_readkey())
8408-        d.addBoth(self._done)
8409-        return d # purely for test convenience
8410+        This method will do that. I will make sure that the
8411+        (shnum,reader) combination represented by my reader argument is
8412+        not used for anything else during this download. I will not
8413+        advise the reader of any corruption, something that my callers
8414+        may wish to do on their own.
8415+        """
8416+        # TODO: When you're done writing this, see if this is ever
8417+        # actually used for something that _mark_bad_share isn't. I have
8418+        # a feeling that they will be used for very similar things, and
8419+        # that having them both here is just going to be an epic amount
8420+        # of code duplication.
8421+        #
8422+        # (well, okay, not epic, but meaningful)
8423+        self.log("removing reader %s" % reader)
8424+        # Remove the reader from _active_readers
8425+        self._active_readers.remove(reader)
8426+        # TODO: self.readers.remove(reader)?
8427+        for shnum in list(self.remaining_sharemap.keys()):
8428+            self.remaining_sharemap.discard(shnum, reader.peerid)
8429 
8430hunk ./src/allmydata/mutable/retrieve.py 606
8431-    def _maybe_send_more_queries(self, k):
8432-        # we don't have enough shares yet. Should we send out more queries?
8433-        # There are some number of queries outstanding, each for a single
8434-        # share. If we can generate 'needed_shares' additional queries, we do
8435-        # so. If we can't, then we know this file is a goner, and we raise
8436-        # NotEnoughSharesError.
8437-        self.log(format=("_maybe_send_more_queries, have=%(have)d, k=%(k)d, "
8438-                         "outstanding=%(outstanding)d"),
8439-                 have=len(self.shares), k=k,
8440-                 outstanding=len(self._outstanding_queries),
8441-                 level=log.NOISY)
8442 
8443hunk ./src/allmydata/mutable/retrieve.py 607
8444-        remaining_shares = k - len(self.shares)
8445-        needed = remaining_shares - len(self._outstanding_queries)
8446-        if not needed:
8447-            # we have enough queries in flight already
8448+    def _mark_bad_share(self, reader, f):
8449+        """
8450+        I mark the (peerid, shnum) encapsulated by my reader argument as
8451+        a bad share, which means that it will not be used anywhere else.
8452 
8453hunk ./src/allmydata/mutable/retrieve.py 612
8454-            # TODO: but if they've been in flight for a long time, and we
8455-            # have reason to believe that new queries might respond faster
8456-            # (i.e. we've seen other queries come back faster, then consider
8457-            # sending out new queries. This could help with peers which have
8458-            # silently gone away since the servermap was updated, for which
8459-            # we're still waiting for the 15-minute TCP disconnect to happen.
8460-            self.log("enough queries are in flight, no more are needed",
8461-                     level=log.NOISY)
8462-            return
8463+        There are several reasons to want to mark something as a bad
8464+        share. These include:
8465+
8466+            - A connection error to the peer.
8467+            - A mismatched prefix (that is, a prefix that does not match
8468+              our local conception of the version information string).
8469+            - A failing block hash, salt hash, share hash, or other
8470+              integrity check.
8471 
8472hunk ./src/allmydata/mutable/retrieve.py 621
8473-        outstanding_shnums = set([shnum
8474-                                  for (peerid, shnum, started)
8475-                                  in self._outstanding_queries.values()])
8476-        # prefer low-numbered shares, they are more likely to be primary
8477-        available_shnums = sorted(self.remaining_sharemap.keys())
8478-        for shnum in available_shnums:
8479-            if shnum in outstanding_shnums:
8480-                # skip ones that are already in transit
8481-                continue
8482-            if shnum not in self.remaining_sharemap:
8483-                # no servers for that shnum. note that DictOfSets removes
8484-                # empty sets from the dict for us.
8485-                continue
8486-            peerid = list(self.remaining_sharemap[shnum])[0]
8487-            # get_data will remove that peerid from the sharemap, and add the
8488-            # query to self._outstanding_queries
8489-            self._status.set_status("Retrieving More Shares")
8490-            self.get_data(shnum, peerid)
8491-            needed -= 1
8492-            if not needed:
8493+        This method will ensure that readers that we wish to mark bad
8494+        (for these reasons or other reasons) are not used for the rest
8495+        of the download. Additionally, it will attempt to tell the
8496+        remote peer (with no guarantee of success) that its share is
8497+        corrupt.
8498+        """
8499+        self.log("marking share %d on server %s as bad" % \
8500+                 (reader.shnum, reader))
8501+        prefix = self.verinfo[-2]
8502+        self.servermap.mark_bad_share(reader.peerid,
8503+                                      reader.shnum,
8504+                                      prefix)
8505+        self._remove_reader(reader)
8506+        self._bad_shares.add((reader.peerid, reader.shnum, f))
8507+        self._status.problems[reader.peerid] = f
8508+        self._last_failure = f
8509+        self.notify_server_corruption(reader.peerid, reader.shnum,
8510+                                      str(f.value))
8511+
8512+
8513+    def _download_current_segment(self):
8514+        """
8515+        I download, validate, decode, decrypt, and assemble the segment
8516+        that this Retrieve is currently responsible for downloading.
8517+        """
8518+        assert len(self._active_readers) >= self._required_shares
8519+        if self._current_segment <= self._last_segment:
8520+            d = self._process_segment(self._current_segment)
8521+        else:
8522+            d = defer.succeed(None)
8523+        d.addBoth(self._turn_barrier)
8524+        d.addCallback(self._check_for_done)
8525+        return d
8526+
8527+
8528+    def _turn_barrier(self, result):
8529+        """
8530+        I help the download process avoid the recursion limit issues
8531+        discussed in #237.
8532+        """
8533+        return fireEventually(result)
8534+
8535+
8536+    def _process_segment(self, segnum):
8537+        """
8538+        I download, validate, decode, and decrypt one segment of the
8539+        file that this Retrieve is retrieving. This means coordinating
8540+        the process of getting k blocks of that file, validating them,
8541+        assembling them into one segment with the decoder, and then
8542+        decrypting them.
8543+        """
8544+        self.log("processing segment %d" % segnum)
8545+
8546+        # TODO: The old code uses a marker. Should this code do that
8547+        # too? What did the Marker do?
8548+        assert len(self._active_readers) >= self._required_shares
8549+
8550+        # We need to ask each of our active readers for its block and
8551+        # salt. We will then validate those. If validation is
8552+        # successful, we will assemble the results into plaintext.
8553+        ds = []
8554+        for reader in self._active_readers:
8555+            started = time.time()
8556+            d = reader.get_block_and_salt(segnum, queue=True)
8557+            d2 = self._get_needed_hashes(reader, segnum)
8558+            dl = defer.DeferredList([d, d2], consumeErrors=True)
8559+            dl.addCallback(self._validate_block, segnum, reader, started)
8560+            dl.addErrback(self._validation_or_decoding_failed, [reader])
8561+            ds.append(dl)
8562+            reader.flush()
8563+        dl = defer.DeferredList(ds)
8564+        if self._verify:
8565+            dl.addCallback(lambda ignored: "")
8566+            dl.addCallback(self._set_segment)
8567+        else:
8568+            dl.addCallback(self._maybe_decode_and_decrypt_segment, segnum)
8569+        return dl
8570+
8571+
8572+    def _maybe_decode_and_decrypt_segment(self, blocks_and_salts, segnum):
8573+        """
8574+        I take the results of fetching and validating the blocks from a
8575+        callback chain in another method. If the results tell me that
8576+        validation and fetching succeeded without
8577+        incident, I will proceed with decoding and decryption.
8578+        Otherwise, I will do nothing.
8579+        """
8580+        self.log("trying to decode and decrypt segment %d" % segnum)
8581+        failures = False
8582+        for block_and_salt in blocks_and_salts:
8583+            if not block_and_salt[0] or block_and_salt[1] is None:
8584+                self.log("some validation operations failed; not proceeding")
8585+                failures = True
8586                 break
8587hunk ./src/allmydata/mutable/retrieve.py 715
8588+        if not failures:
8589+            self.log("everything looks ok, building segment %d" % segnum)
8590+            d = self._decode_blocks(blocks_and_salts, segnum)
8591+            d.addCallback(self._decrypt_segment)
8592+            d.addErrback(self._validation_or_decoding_failed,
8593+                         self._active_readers)
8594+            # check to see whether we've been paused before writing
8595+            # anything.
8596+            d.addCallback(self._check_for_paused)
8597+            d.addCallback(self._set_segment)
8598+            return d
8599+        else:
8600+            return defer.succeed(None)
8601+
8602+
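For reference, the blocks_and_salts argument above is a plain DeferredList
result: one (success, value) pair per active reader, where a successful
_validate_block yields a one-entry dict and a failure that the errback
already handled yields None. A sketch of the shape, with placeholder values:

    blocks_and_salts = [
        (True, {0: ("<block 0 data>", "<salt 0>")}),  # share 0 validated
        (True, None),                                 # this reader failed; the
                                                      # errback swallowed it
        (True, {4: ("<block 4 data>", "<salt 4>")}),  # share 4 validated
    ]
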
8603+    def _set_segment(self, segment):
8604+        """
8605+        Given a plaintext segment, I register that segment with the
8606+        target that is handling the file download.
8607+        """
8608+        self.log("got plaintext for segment %d" % self._current_segment)
8609+        if self._current_segment == self._start_segment:
8610+            # We're on the first segment. It's possible that we want
8611+            # only some part of the end of this segment, and that we
8612+            # just downloaded the whole thing to get that part. If so,
8613+            # we need to account for that and give the reader just the
8614+            # data that they want.
8615+            n = self._offset % self._segment_size
8616+            self.log("stripping %d bytes off of the first segment" % n)
8617+            self.log("original segment length: %d" % len(segment))
8618+            segment = segment[n:]
8619+            self.log("new segment length: %d" % len(segment))
8620+
8621+        if self._current_segment == self._last_segment and self._read_length is not None:
8622+            # We're on the last segment. It's possible that we only want
8623+            # part of the beginning of this segment, and that we
8624+            # downloaded the whole thing anyway. Make sure to give the
8625+            # caller only the portion of the segment that they want to
8626+            # receive.
8627+            extra = self._read_length
8628+            if self._start_segment != self._last_segment:
8629+                extra -= self._segment_size - \
8630+                            (self._offset % self._segment_size)
8631+            extra %= self._segment_size
8632+            self.log("original segment length: %d" % len(segment))
8633+            segment = segment[:extra]
8634+            self.log("new segment length: %d" % len(segment))
8635+            self.log("only taking %d bytes of the last segment" % extra)
8636+
8637+        if not self._verify:
8638+            self._consumer.write(segment)
8639+        else:
8640+            # we don't care about the plaintext if we are doing a verify.
8641+            segment = None
8642+        self._current_segment += 1
8643 
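A worked example of the trimming arithmetic in _set_segment, with made-up
numbers (segments of 1000 bytes, a read of 800 bytes starting at offset 2500):

    segment_size = 1000
    offset       = 2500     # caller's starting offset into the file
    read_length  = 800      # caller wants this many bytes

    # first segment (segnum 2): strip everything before the offset
    stripped = offset % segment_size                  # 500 bytes dropped, 500 kept

    # last segment (segnum 3): keep only what remains of the read
    extra = read_length
    extra -= segment_size - (offset % segment_size)   # start != last, so subtract
    extra %= segment_size                             # -> 300 bytes kept

    assert (segment_size - stripped) + extra == read_length   # 500 + 300 == 800
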
8644hunk ./src/allmydata/mutable/retrieve.py 771
8645-        # at this point, we have as many outstanding queries as we can. If
8646-        # needed!=0 then we might not have enough to recover the file.
8647-        if needed:
8648-            format = ("ran out of peers: "
8649-                      "have %(have)d shares (k=%(k)d), "
8650-                      "%(outstanding)d queries in flight, "
8651-                      "need %(need)d more, "
8652-                      "found %(bad)d bad shares")
8653-            args = {"have": len(self.shares),
8654-                    "k": k,
8655-                    "outstanding": len(self._outstanding_queries),
8656-                    "need": needed,
8657-                    "bad": len(self._bad_shares),
8658-                    }
8659-            self.log(format=format,
8660-                     level=log.WEIRD, umid="ezTfjw", **args)
8661-            err = NotEnoughSharesError("%s, last failure: %s" %
8662-                                      (format % args, self._last_failure))
8663-            if self._bad_shares:
8664-                self.log("We found some bad shares this pass. You should "
8665-                         "update the servermap and try again to check "
8666-                         "more peers",
8667-                         level=log.WEIRD, umid="EFkOlA")
8668-                err.servermap = self.servermap
8669-            raise err
8670 
8671hunk ./src/allmydata/mutable/retrieve.py 772
8672+    def _validation_or_decoding_failed(self, f, readers):
8673+        """
8674+        I am called when a block or a salt fails to correctly validate, or when
8675+        the decryption or decoding operation fails for some reason.  I react to
8676+        this failure by notifying the remote server of corruption, and then
8677+        removing the remote peer from further activity.
8678+        """
8679+        assert isinstance(readers, list)
8680+        bad_shnums = [reader.shnum for reader in readers]
8681+
8682+        self.log("validation or decoding failed on share(s) %s, peer(s) %s, "
8683+                 "segment %d: %s" % \
8684+                 (bad_shnums, readers, self._current_segment, str(f)))
8685+        for reader in readers:
8686+            self._mark_bad_share(reader, f)
8687         return
8688 
8689hunk ./src/allmydata/mutable/retrieve.py 789
8690-    def _decode(self):
8691-        started = time.time()
8692-        (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
8693-         offsets_tuple) = self.verinfo
8694 
8695hunk ./src/allmydata/mutable/retrieve.py 790
8696-        # shares_dict is a dict mapping shnum to share data, but the codec
8697-        # wants two lists.
8698-        shareids = []; shares = []
8699-        for shareid, share in self.shares.items():
8700+    def _validate_block(self, results, segnum, reader, started):
8701+        """
8702+        I validate a block from one share on a remote server.
8703+        """
8704+        # Grab the part of the block hash tree that is necessary to
8705+        # validate this block, then generate the block hash root.
8706+        self.log("validating share %d for segment %d" % (reader.shnum,
8707+                                                             segnum))
8708+        self._status.add_fetch_timing(reader.peerid, started)
8709+        self._status.set_status("Validating blocks for segment %d" % segnum)
8710+        # Did we fail to fetch either of the things that we were
8711+        # supposed to? Fail if so.
8712+        if not results[0][0] or not results[1][0]:
8713+            # handled by the errback handler.
8714+
8715+            # These all get batched into one query, so the resulting
8716+            # failure should be the same for all of them, so we can just
8717+            # use the first failure that we find.
8718+            failures = [r[1] for r in results if not r[0]]
8719+            assert isinstance(failures[0], failure.Failure)
8720+            f = failures[0]
8721+            raise CorruptShareError(reader.peerid,
8722+                                    reader.shnum,
8723+                                    "Connection error: %s" % str(f))
8724+
8725+        block_and_salt, block_and_sharehashes = results
8726+        block, salt = block_and_salt[1]
8727+        blockhashes, sharehashes = block_and_sharehashes[1]
8728+
8729+        blockhashes = dict(enumerate(blockhashes[1]))
8730+        self.log("the reader gave me the following blockhashes: %s" % \
8731+                 blockhashes.keys())
8732+        self.log("the reader gave me the following sharehashes: %s" % \
8733+                 sharehashes[1].keys())
8734+        bht = self._block_hash_trees[reader.shnum]
8735+
8736+        if bht.needed_hashes(segnum, include_leaf=True):
8737+            try:
8738+                bht.set_hashes(blockhashes)
8739+            except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \
8740+                    IndexError), e:
8741+                raise CorruptShareError(reader.peerid,
8742+                                        reader.shnum,
8743+                                        "block hash tree failure: %s" % e)
8744+
8745+        if self._version == MDMF_VERSION:
8746+            blockhash = hashutil.block_hash(salt + block)
8747+        else:
8748+            blockhash = hashutil.block_hash(block)
8749+        # If this works without an error, then validation is
8750+        # successful.
8751+        try:
8752+            bht.set_hashes(leaves={segnum: blockhash})
8753+        except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \
8754+                IndexError), e:
8755+            raise CorruptShareError(reader.peerid,
8756+                                    reader.shnum,
8757+                                    "block hash tree failure: %s" % e)
8758+
8759+        # Reaching this point means that we know that this segment
8760+        # is correct. Now we need to check to see whether the share
8761+        # hash chain is also correct.
8762+        # SDMF wrote share hash chains that didn't contain the
8763+        # leaves, which would be produced from the block hash tree.
8764+        # So we need to validate the block hash tree first. If
8765+        # successful, then bht[0] will contain the root for the
8766+        # shnum, which will be a leaf in the share hash tree, which
8767+        # will allow us to validate the rest of the tree.
8768+        if self.share_hash_tree.needed_hashes(reader.shnum,
8769+                                              include_leaf=True) or \
8770+                                              self._verify:
8771+            try:
8772+                self.share_hash_tree.set_hashes(hashes=sharehashes[1],
8773+                                            leaves={reader.shnum: bht[0]})
8774+            except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \
8775+                    IndexError), e:
8776+                raise CorruptShareError(reader.peerid,
8777+                                        reader.shnum,
8778+                                        "corrupt hashes: %s" % e)
8779+
8780+        self.log('share %d is valid for segment %d' % (reader.shnum,
8781+                                                       segnum))
8782+        return {reader.shnum: (block, salt)}
8783+
8784+
8785+    def _get_needed_hashes(self, reader, segnum):
8786+        """
8787+        I get the hashes needed to validate segnum from the reader, then return
8788+        to my caller when this is done.
8789+        """
8790+        bht = self._block_hash_trees[reader.shnum]
8791+        needed = bht.needed_hashes(segnum, include_leaf=True)
8792+        # The root of the block hash tree is also a leaf in the share
8793+        # hash tree. So we don't need to fetch it from the remote
8794+        # server. In the case of files with one segment, this means that
8795+        # we won't fetch any block hash tree from the remote server,
8796+        # since the hash of each share of the file is the entire block
8797+        # hash tree, and is a leaf in the share hash tree. This is fine,
8798+        # since any share corruption will be detected in the share hash
8799+        # tree.
8800+        #needed.discard(0)
8801+        self.log("getting blockhashes for segment %d, share %d: %s" % \
8802+                 (segnum, reader.shnum, str(needed)))
8803+        d1 = reader.get_blockhashes(needed, queue=True, force_remote=True)
8804+        if self.share_hash_tree.needed_hashes(reader.shnum):
8805+            need = self.share_hash_tree.needed_hashes(reader.shnum)
8806+            self.log("also need sharehashes for share %d: %s" % (reader.shnum,
8807+                                                                 str(need)))
8808+            d2 = reader.get_sharehashes(need, queue=True, force_remote=True)
8809+        else:
8810+            d2 = defer.succeed({}) # the logic in the next method
8811+                                   # expects a dict
8812+        dl = defer.DeferredList([d1, d2], consumeErrors=True)
8813+        return dl
8814+
8815+
8816+    def _decode_blocks(self, blocks_and_salts, segnum):
8817+        """
8818+        I take a list of k blocks and salts, and decode that into a
8819+        single encrypted segment.
8820+        """
8821+        d = {}
8822+        # We want to merge our dictionaries to the form
8823+        # {shnum: blocks_and_salts}
8824+        #
8825+        # The dictionaries come out of _validate_block in that form, so
8826+        # we just need to merge them.
8827+        for block_and_salt in blocks_and_salts:
8828+            d.update(block_and_salt[1])
8829+
8830+        # All of these blocks should have the same salt; in SDMF, it is
8831+        # the file-wide IV, while in MDMF it is the per-segment salt. In
8832+        # either case, we just need to get one of them and use it.
8833+        #
8834+        # d.items()[0] is like (shnum, (block, salt))
8835+        # d.items()[0][1] is like (block, salt)
8836+        # d.items()[0][1][1] is the salt.
8837+        salt = d.items()[0][1][1]
8838+        # Next, extract just the blocks from the dict. We'll use the
8839+        # salt in the next step.
8840+        share_and_shareids = [(k, v[0]) for k, v in d.items()]
8841+        d2 = dict(share_and_shareids)
8842+        shareids = []
8843+        shares = []
8844+        for shareid, share in d2.items():
8845             shareids.append(shareid)
8846             shares.append(share)
8847 
8848hunk ./src/allmydata/mutable/retrieve.py 938
8849-        assert len(shareids) >= k, len(shareids)
8850+        self._status.set_status("Decoding")
8851+        started = time.time()
8852+        assert len(shareids) >= self._required_shares, len(shareids)
8853         # zfec really doesn't want extra shares
8854hunk ./src/allmydata/mutable/retrieve.py 942
8855-        shareids = shareids[:k]
8856-        shares = shares[:k]
8857-
8858-        fec = codec.CRSDecoder()
8859-        fec.set_params(segsize, k, N)
8860-
8861-        self.log("params %s, we have %d shares" % ((segsize, k, N), len(shares)))
8862-        self.log("about to decode, shareids=%s" % (shareids,))
8863-        d = defer.maybeDeferred(fec.decode, shares, shareids)
8864-        def _done(buffers):
8865-            self._status.timings["decode"] = time.time() - started
8866-            self.log(" decode done, %d buffers" % len(buffers))
8867+        shareids = shareids[:self._required_shares]
8868+        shares = shares[:self._required_shares]
8869+        self.log("decoding segment %d" % segnum)
8870+        if segnum == self._num_segments - 1:
8871+            d = defer.maybeDeferred(self._tail_decoder.decode, shares, shareids)
8872+        else:
8873+            d = defer.maybeDeferred(self._segment_decoder.decode, shares, shareids)
8874+        def _process(buffers):
8875             segment = "".join(buffers)
8876hunk ./src/allmydata/mutable/retrieve.py 951
8877+            self.log(format="decoded segment %(segnum)s of %(numsegs)s",
8878+                     segnum=segnum,
8879+                     numsegs=self._num_segments,
8880+                     level=log.NOISY)
8881             self.log(" joined length %d, datalength %d" %
8882hunk ./src/allmydata/mutable/retrieve.py 956
8883-                     (len(segment), datalength))
8884-            segment = segment[:datalength]
8885+                     (len(segment), self._data_length))
8886+            if segnum == self._num_segments - 1:
8887+                size_to_use = self._tail_data_size
8888+            else:
8889+                size_to_use = self._segment_size
8890+            segment = segment[:size_to_use]
8891             self.log(" segment len=%d" % len(segment))
8892hunk ./src/allmydata/mutable/retrieve.py 963
8893-            return segment
8894-        def _err(f):
8895-            self.log(" decode failed: %s" % f)
8896-            return f
8897-        d.addCallback(_done)
8898-        d.addErrback(_err)
8899+            self._status.timings.setdefault("decode", 0)
8900+            self._status.timings['decode'] += time.time() - started
8901+            return segment, salt
8902+        d.addCallback(_process)
8903         return d
8904 
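The _segment_decoder and _tail_decoder used above are configured elsewhere; a
plausible sketch of that setup, following the removed single-decoder code (the
helper name and the tail-size arithmetic are assumptions, not part of this
hunk):

    from allmydata import codec
    from allmydata.util import mathutil

    def _setup_decoders(self):
        # one decoder sized for full segments
        self._segment_decoder = codec.CRSDecoder()
        self._segment_decoder.set_params(self._segment_size,
                                         self._required_shares,
                                         self._total_shares)
        # the tail segment is usually shorter; pad it to a multiple of k so
        # that zfec sees consistent block sizes, mirroring the publish side
        tail_size = self._data_length % self._segment_size or self._segment_size
        tail_size = mathutil.next_multiple(tail_size, self._required_shares)
        self._tail_decoder = codec.CRSDecoder()
        self._tail_decoder.set_params(tail_size,
                                      self._required_shares,
                                      self._total_shares)
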
8905hunk ./src/allmydata/mutable/retrieve.py 969
8906-    def _decrypt(self, crypttext, IV, readkey):
8907+
8908+    def _decrypt_segment(self, segment_and_salt):
8909+        """
8910+        I take a single segment and its salt, and decrypt it. I return
8911+        the plaintext of the segment that is in my argument.
8912+        """
8913+        segment, salt = segment_and_salt
8914         self._status.set_status("decrypting")
8915hunk ./src/allmydata/mutable/retrieve.py 977
8916+        self.log("decrypting segment %d" % self._current_segment)
8917         started = time.time()
8918hunk ./src/allmydata/mutable/retrieve.py 979
8919-        key = hashutil.ssk_readkey_data_hash(IV, readkey)
8920+        key = hashutil.ssk_readkey_data_hash(salt, self._node.get_readkey())
8921         decryptor = AES(key)
8922hunk ./src/allmydata/mutable/retrieve.py 981
8923-        plaintext = decryptor.process(crypttext)
8924-        self._status.timings["decrypt"] = time.time() - started
8925+        plaintext = decryptor.process(segment)
8926+        self._status.timings.setdefault("decrypt", 0)
8927+        self._status.timings['decrypt'] += time.time() - started
8928         return plaintext
8929 
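The decryption itself is a single AES pass keyed off the readkey and the salt
(the file-wide IV for SDMF, the per-segment salt for MDMF). A minimal
stand-alone illustration, with made-up key material:

    from pycryptopp.cipher.aes import AES
    from allmydata.util import hashutil

    readkey = "\x01" * 16                 # illustrative 16-byte readkey
    salt    = "\x02" * 16                 # per-segment salt or file-wide IV
    key = hashutil.ssk_readkey_data_hash(salt, readkey)

    # AES-CTR is symmetric: the same process() call encrypts and decrypts
    ciphertext = AES(key).process("some segment plaintext")
    assert AES(key).process(ciphertext) == "some segment plaintext"
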
8930hunk ./src/allmydata/mutable/retrieve.py 986
8931-    def _done(self, res):
8932-        if not self._running:
8933+
8934+    def notify_server_corruption(self, peerid, shnum, reason):
8935+        ss = self.servermap.connections[peerid]
8936+        ss.callRemoteOnly("advise_corrupt_share",
8937+                          "mutable", self._storage_index, shnum, reason)
8938+
8939+
8940+    def _try_to_validate_privkey(self, enc_privkey, reader):
8941+        alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)
8942+        alleged_writekey = hashutil.ssk_writekey_hash(alleged_privkey_s)
8943+        if alleged_writekey != self._node.get_writekey():
8944+            self.log("invalid privkey from %s shnum %d" %
8945+                     (reader, reader.shnum),
8946+                     level=log.WEIRD, umid="YIw4tA")
8947+            if self._verify:
8948+                self.servermap.mark_bad_share(reader.peerid, reader.shnum,
8949+                                              self.verinfo[-2])
8950+                e = CorruptShareError(reader.peerid,
8951+                                      reader.shnum,
8952+                                      "invalid privkey")
8953+                f = failure.Failure(e)
8954+                self._bad_shares.add((reader.peerid, reader.shnum, f))
8955             return
8956hunk ./src/allmydata/mutable/retrieve.py 1009
8957+
8958+        # it's good
8959+        self.log("got valid privkey from shnum %d on reader %s" %
8960+                 (reader.shnum, reader))
8961+        privkey = rsa.create_signing_key_from_string(alleged_privkey_s)
8962+        self._node._populate_encprivkey(enc_privkey)
8963+        self._node._populate_privkey(privkey)
8964+        self._need_privkey = False
8965+
8966+
8967+    def _check_for_done(self, res):
8968+        """
8969+        I check to see if this Retrieve object has successfully finished
8970+        its work.
8971+
8972+        I can exit in the following ways:
8973+            - If there are no more segments to download, then I exit by
8974+              causing self._done_deferred to fire with the plaintext
8975+              content requested by the caller.
8976+            - If there are still segments to be downloaded, and there
8977+              are enough active readers (readers which have not broken
8978+              and have not given us corrupt data) to continue
8979+              downloading, I send control back to
8980+              _download_current_segment.
8981+            - If there are still segments to be downloaded but there are
8982+              not enough active peers to download them, I ask
8983+              _add_active_peers to add more peers. If it is successful,
8984+              it will call _download_current_segment. If there are not
8985+              enough peers to retrieve the file, then that will cause
8986+              _done_deferred to errback.
8987+        """
8988+        self.log("checking for doneness")
8989+        if self._current_segment > self._last_segment:
8990+            # No more segments to download, we're done.
8991+            self.log("got plaintext, done")
8992+            return self._done()
8993+
8994+        if len(self._active_readers) >= self._required_shares:
8995+            # More segments to download, but we have enough good peers
8996+            # in self._active_readers that we can do that without issue,
8997+            # so go nab the next segment.
8998+            self.log("not done yet: on segment %d of %d" % \
8999+                     (self._current_segment + 1, self._num_segments))
9000+            return self._download_current_segment()
9001+
9002+        self.log("not done yet: on segment %d of %d, need to add peers" % \
9003+                 (self._current_segment + 1, self._num_segments))
9004+        return self._add_active_peers()
9005+
9006+
9007+    def _done(self):
9008+        """
9009+        I am called by _check_for_done when the download process has
9010+        finished successfully. After making some useful logging
9011+        statements, I return the decrypted contents to the owner of this
9012+        Retrieve object through self._done_deferred.
9013+        """
9014         self._running = False
9015         self._status.set_active(False)
9016hunk ./src/allmydata/mutable/retrieve.py 1068
9017-        self._status.timings["total"] = time.time() - self._started
9018-        # res is either the new contents, or a Failure
9019-        if isinstance(res, failure.Failure):
9020-            self.log("Retrieve done, with failure", failure=res,
9021-                     level=log.UNUSUAL)
9022-            self._status.set_status("Failed")
9023+        now = time.time()
9024+        self._status.timings['total'] = now - self._started
9025+        self._status.timings['fetch'] = now - self._started_fetching
9026+
9027+        if self._verify:
9028+            ret = list(self._bad_shares)
9029+            self.log("done verifying, found %d bad shares" % len(ret))
9030         else:
9031hunk ./src/allmydata/mutable/retrieve.py 1076
9032-            self.log("Retrieve done, success!")
9033-            self._status.set_status("Finished")
9034-            self._status.set_progress(1.0)
9035-            # remember the encoding parameters, use them again next time
9036-            (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
9037-             offsets_tuple) = self.verinfo
9038-            self._node._populate_required_shares(k)
9039-            self._node._populate_total_shares(N)
9040-        eventually(self._done_deferred.callback, res)
9041+            # TODO: upload status here?
9042+            ret = self._consumer
9043+            self._consumer.unregisterProducer()
9044+        eventually(self._done_deferred.callback, ret)
9045+
9046 
9047hunk ./src/allmydata/mutable/retrieve.py 1082
9048+    def _failed(self):
9049+        """
9050+        I am called by _add_active_peers when there are not enough
9051+        active peers left to complete the download. After making some
9052+        useful logging statements, I return an exception to that effect
9053+        to the caller of this Retrieve object through
9054+        self._done_deferred.
9055+        """
9056+        self._running = False
9057+        self._status.set_active(False)
9058+        now = time.time()
9059+        self._status.timings['total'] = now - self._started
9060+        self._status.timings['fetch'] = now - self._started_fetching
9061+
9062+        if self._verify:
9063+            ret = list(self._bad_shares)
9064+        else:
9065+            format = ("ran out of peers: "
9066+                      "have %(have)d of %(total)d segments; "
9067+                      "found %(bad)d bad shares; "
9068+                      "encoding %(k)d-of-%(n)d")
9069+            args = {"have": self._current_segment,
9070+                    "total": self._num_segments,
9071+                    "need": self._last_segment,
9072+                    "k": self._required_shares,
9073+                    "n": self._total_shares,
9074+                    "bad": len(self._bad_shares)}
9075+            e = NotEnoughSharesError("%s, last failure: %s" % \
9076+                                     (format % args, str(self._last_failure)))
9077+            f = failure.Failure(e)
9078+            ret = f
9079+        eventually(self._done_deferred.callback, ret)
9080}
9081[mutable/servermap.py: Alter the servermap updater to work with MDMF files
9082Kevan Carstensen <kevan@isnotajoke.com>**20100819003439
9083 Ignore-this: 7e408303194834bd59a2f27efab3bdb
9084 
9085 These modifications were basically all to the end of having the
9086 servermap updater use the unified MDMF + SDMF read interface whenever
9087 possible -- this reduces the complexity of the code, making it easier to
9088 read and maintain. To do this, I needed to modify the process of
9089 updating the servermap a little bit.
9090 
9091 To support partial-file updates, I also modified the servermap updater
9092 to fetch the block hash trees and certain segments of files while it
9093 performed a servermap update (this can be done without adding any new
9094 roundtrips because of batch-read functionality that the read proxy has).
9095 
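 A rough sketch of how the partial-update path is expected to drive this
 machinery (the filenode, storage_broker, monitor, segment-range, shnum and
 verinfo values are assumed here; the real callers live in the filenode and
 publish patches):
 
     from allmydata.mutable.servermap import ServerMap, ServermapUpdater
     from allmydata.mutable.common import MODE_WRITE
 
     # MODE_WRITE plus an update_range asks the updater to also prefetch, for
     # each share, the block hash tree and the first and last segments of the
     # range, batched into the queries it already makes.
     u = ServermapUpdater(filenode, storage_broker, monitor, ServerMap(),
                          mode=MODE_WRITE,
                          update_range=(start_segment, end_segment))
     d = u.update()
 
     def _begin_update(servermap):
         # the update code later pulls the prefetched pieces back out
         blockhashes, first, last = \
             servermap.get_update_data_for_share_and_verinfo(shnum, verinfo)
     d.addCallback(_begin_update)
 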
9096] {
9097hunk ./src/allmydata/mutable/servermap.py 2
9098 
9099-import sys, time
9100+import sys, time, struct
9101 from zope.interface import implements
9102 from itertools import count
9103 from twisted.internet import defer
9104merger 0.0 (
9105hunk ./src/allmydata/mutable/servermap.py 9
9106+from allmydata.util.dictutil import DictOfSets
9107hunk ./src/allmydata/mutable/servermap.py 7
9108-from foolscap.api import DeadReferenceError, RemoteException, eventually
9109-from allmydata.util import base32, hashutil, idlib, log
9110+from foolscap.api import DeadReferenceError, RemoteException, eventually, \
9111+                         fireEventually
9112+from allmydata.util import base32, hashutil, idlib, log, deferredutil
9113)
9114merger 0.0 (
9115hunk ./src/allmydata/mutable/servermap.py 14
9116-     DictOfSets, CorruptShareError, NeedMoreDataError
9117+     CorruptShareError, NeedMoreDataError
9118hunk ./src/allmydata/mutable/servermap.py 14
9119-     DictOfSets, CorruptShareError, NeedMoreDataError
9120-from allmydata.mutable.layout import unpack_prefix_and_signature, unpack_header, unpack_share, \
9121-     SIGNED_PREFIX_LENGTH
9122+     DictOfSets, CorruptShareError
9123+from allmydata.mutable.layout import SIGNED_PREFIX_LENGTH, MDMFSlotReadProxy
9124)
9125hunk ./src/allmydata/mutable/servermap.py 123
9126         self.bad_shares = {} # maps (peerid,shnum) to old checkstring
9127         self.last_update_mode = None
9128         self.last_update_time = 0
9129+        self.update_data = {} # shnum => [(verinfo, data), ...]
9130 
9131     def copy(self):
9132         s = ServerMap()
9133hunk ./src/allmydata/mutable/servermap.py 254
9134         """Return a set of versionids, one for each version that is currently
9135         recoverable."""
9136         versionmap = self.make_versionmap()
9137-
9138         recoverable_versions = set()
9139         for (verinfo, shares) in versionmap.items():
9140             (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
9141hunk ./src/allmydata/mutable/servermap.py 339
9142         return False
9143 
9144 
9145+    def get_update_data_for_share_and_verinfo(self, shnum, verinfo):
9146+        """
9147+        I return the update data for the given shnum and verinfo.
9148+        """
9149+        update_data = self.update_data[shnum]
9150+        update_datum = [i[1] for i in update_data if i[0] == verinfo][0]
9151+        return update_datum
9152+
9153+
9154+    def set_update_data_for_share_and_verinfo(self, shnum, verinfo, data):
9155+        """
9156+        I record the update data for the given shnum and verinfo.
9157+        """
9158+        self.update_data.setdefault(shnum, []).append((verinfo, data))
9159+
9160+
9161 class ServermapUpdater:
9162     def __init__(self, filenode, storage_broker, monitor, servermap,
9163hunk ./src/allmydata/mutable/servermap.py 357
9164-                 mode=MODE_READ, add_lease=False):
9165+                 mode=MODE_READ, add_lease=False, update_range=None):
9166         """I update a servermap, locating a sufficient number of useful
9167         shares and remembering where they are located.
9168 
9169hunk ./src/allmydata/mutable/servermap.py 382
9170         self._servers_responded = set()
9171 
9172         # how much data should we read?
9173+        # SDMF:
9174         #  * if we only need the checkstring, then [0:75]
9175         #  * if we need to validate the checkstring sig, then [543ish:799ish]
9176         #  * if we need the verification key, then [107:436ish]
9177merger 0.0 (
9178hunk ./src/allmydata/mutable/servermap.py 392
9179-        # read 2000 bytes, which also happens to read enough actual data to
9180-        # pre-fetch a 9-entry dirnode.
9181+        # read 4000 bytes, which also happens to read enough actual data to
9182+        # pre-fetch an 18-entry dirnode.
9183hunk ./src/allmydata/mutable/servermap.py 390
9184-        # A future version of the SMDF slot format should consider using
9185-        # fixed-size slots so we can retrieve less data. For now, we'll just
9186-        # read 2000 bytes, which also happens to read enough actual data to
9187-        # pre-fetch a 9-entry dirnode.
9188+        # MDMF:
9189+        #  * Checkstring? [0:72]
9190+        #  * If we want to validate the checkstring, then [0:72], [143:?] --
9191+        #    the offset table will tell us for sure.
9192+        #  * If we need the verification key, we have to consult the offset
9193+        #    table as well.
9194+        # At this point, we don't know which we are. Our filenode can
9195+        # tell us, but it might be lying -- in some cases, we're
9196+        # responsible for telling it which kind of file it is.
9197)
9198hunk ./src/allmydata/mutable/servermap.py 399
9199             # we use unpack_prefix_and_signature, so we need 1k
9200             self._read_size = 1000
9201         self._need_privkey = False
9202+
9203         if mode == MODE_WRITE and not self._node.get_privkey():
9204             self._need_privkey = True
9205         # check+repair: repair requires the privkey, so if we didn't happen
9206hunk ./src/allmydata/mutable/servermap.py 406
9207         # to ask for it during the check, we'll have problems doing the
9208         # publish.
9209 
9210+        self.fetch_update_data = False
9211+        if mode == MODE_WRITE and update_range:
9212+            # We're updating the servermap in preparation for an
9213+            # in-place file update, so we need to fetch some additional
9214+            # data from each share that we find.
9215+            assert len(update_range) == 2
9216+
9217+            self.start_segment = update_range[0]
9218+            self.end_segment = update_range[1]
9219+            self.fetch_update_data = True
9220+
9221         prefix = si_b2a(self._storage_index)[:5]
9222         self._log_number = log.msg(format="SharemapUpdater(%(si)s): starting (%(mode)s)",
9223                                    si=prefix, mode=mode)
9224merger 0.0 (
9225hunk ./src/allmydata/mutable/servermap.py 455
9226-        full_peerlist = sb.get_servers_for_index(self._storage_index)
9227+        full_peerlist = [(s.get_serverid(), s.get_rref())
9228+                         for s in sb.get_servers_for_psi(self._storage_index)]
9229hunk ./src/allmydata/mutable/servermap.py 455
9230+        # All of the peers, permuted by the storage index, as usual.
9231)
9232hunk ./src/allmydata/mutable/servermap.py 461
9233         self._good_peers = set() # peers who had some shares
9234         self._empty_peers = set() # peers who don't have any shares
9235         self._bad_peers = set() # peers to whom our queries failed
9236+        self._readers = {} # peerid -> dict mapping shnum to reader, filled in
9237+                           # after responses come in.
9238 
9239         k = self._node.get_required_shares()
9240hunk ./src/allmydata/mutable/servermap.py 465
9241+        # For what cases can these conditions work?
9242         if k is None:
9243             # make a guess
9244             k = 3
9245hunk ./src/allmydata/mutable/servermap.py 478
9246         self.num_peers_to_query = k + self.EPSILON
9247 
9248         if self.mode == MODE_CHECK:
9249+            # We want to query all of the peers.
9250             initial_peers_to_query = dict(full_peerlist)
9251             must_query = set(initial_peers_to_query.keys())
9252             self.extra_peers = []
9253hunk ./src/allmydata/mutable/servermap.py 486
9254             # we're planning to replace all the shares, so we want a good
9255             # chance of finding them all. We will keep searching until we've
9256             # seen epsilon that don't have a share.
9257+            # We don't query all of the peers because that could take a while.
9258             self.num_peers_to_query = N + self.EPSILON
9259             initial_peers_to_query, must_query = self._build_initial_querylist()
9260             self.required_num_empty_peers = self.EPSILON
9261hunk ./src/allmydata/mutable/servermap.py 496
9262             # might also avoid the round trip required to read the encrypted
9263             # private key.
9264 
9265-        else:
9266+        else: # MODE_READ, MODE_ANYTHING
9267+            # 2k peers is good enough.
9268             initial_peers_to_query, must_query = self._build_initial_querylist()
9269 
9270         # this is a set of peers that we are required to get responses from:
9271hunk ./src/allmydata/mutable/servermap.py 512
9272         # before we can consider ourselves finished, and self.extra_peers
9273         # contains the overflow (peers that we should tap if we don't get
9274         # enough responses)
9275+        # I guess that self._must_query is a subset of
9276+        # initial_peers_to_query?
9277+        assert set(must_query).issubset(set(initial_peers_to_query))
9278 
9279         self._send_initial_requests(initial_peers_to_query)
9280         self._status.timings["initial_queries"] = time.time() - self._started
9281hunk ./src/allmydata/mutable/servermap.py 571
9282         # errors that aren't handled by _query_failed (and errors caused by
9283         # _query_failed) get logged, but we still want to check for doneness.
9284         d.addErrback(log.err)
9285-        d.addBoth(self._check_for_done)
9286         d.addErrback(self._fatal_error)
9287hunk ./src/allmydata/mutable/servermap.py 572
9288+        d.addCallback(self._check_for_done)
9289         return d
9290 
9291     def _do_read(self, ss, peerid, storage_index, shnums, readv):
9292hunk ./src/allmydata/mutable/servermap.py 591
9293         d = ss.callRemote("slot_readv", storage_index, shnums, readv)
9294         return d
9295 
9296+
9297+    def _got_corrupt_share(self, e, shnum, peerid, data, lp):
9298+        """
9299+        I am called when a remote server returns a corrupt share in
9300+        response to one of our queries. By corrupt, I mean a share
9301+        without a valid signature. I then record the failure, notify the
9302+        server of the corruption, and record the share as bad.
9303+        """
9304+        f = failure.Failure(e)
9305+        self.log(format="bad share: %(f_value)s", f_value=str(f),
9306+                 failure=f, parent=lp, level=log.WEIRD, umid="h5llHg")
9307+        # Notify the server that its share is corrupt.
9308+        self.notify_server_corruption(peerid, shnum, str(e))
9309+        # By flagging this as a bad peer, we won't count any of
9310+        # the other shares on that peer as valid, though if we
9311+        # happen to find a valid version string amongst those
9312+        # shares, we'll keep track of it so that we don't need
9313+        # to validate the signature on those again.
9314+        self._bad_peers.add(peerid)
9315+        self._last_failure = f
9316+        # XXX: Use the reader for this?
9317+        checkstring = data[:SIGNED_PREFIX_LENGTH]
9318+        self._servermap.mark_bad_share(peerid, shnum, checkstring)
9319+        self._servermap.problems.append(f)
9320+
9321+
9322+    def _cache_good_sharedata(self, verinfo, shnum, now, data):
9323+        """
9324+        If one of my queries returns successfully (which means that we
9325+        were able to validate the signature), I
9326+        cache the data that we initially fetched from the storage
9327+        server. This will help reduce the number of roundtrips that need
9328+        to occur when the file is downloaded, or when the file is
9329+        updated.
9330+        """
9331+        if verinfo:
9332+            self._node._add_to_cache(verinfo, shnum, 0, data, now)
9333+
9334+
9335     def _got_results(self, datavs, peerid, readsize, stuff, started):
9336         lp = self.log(format="got result from [%(peerid)s], %(numshares)d shares",
9337                       peerid=idlib.shortnodeid_b2a(peerid),
9338hunk ./src/allmydata/mutable/servermap.py 633
9339-                      numshares=len(datavs),
9340-                      level=log.NOISY)
9341+                      numshares=len(datavs))
9342         now = time.time()
9343         elapsed = now - started
9344hunk ./src/allmydata/mutable/servermap.py 636
9345-        self._queries_outstanding.discard(peerid)
9346-        self._servermap.reachable_peers.add(peerid)
9347-        self._must_query.discard(peerid)
9348-        self._queries_completed += 1
9349+        def _done_processing(ignored=None):
9350+            self._queries_outstanding.discard(peerid)
9351+            self._servermap.reachable_peers.add(peerid)
9352+            self._must_query.discard(peerid)
9353+            self._queries_completed += 1
9354         if not self._running:
9355hunk ./src/allmydata/mutable/servermap.py 642
9356-            self.log("but we're not running, so we'll ignore it", parent=lp,
9357-                     level=log.NOISY)
9358+            self.log("but we're not running, so we'll ignore it", parent=lp)
9359+            _done_processing()
9360             self._status.add_per_server_time(peerid, "late", started, elapsed)
9361             return
9362         self._status.add_per_server_time(peerid, "query", started, elapsed)
9363hunk ./src/allmydata/mutable/servermap.py 653
9364         else:
9365             self._empty_peers.add(peerid)
9366 
9367-        last_verinfo = None
9368-        last_shnum = None
9369+        ss, storage_index = stuff
9370+        ds = []
9371+
9372         for shnum,datav in datavs.items():
9373             data = datav[0]
9374             try:
9375merger 0.0 (
9376hunk ./src/allmydata/mutable/servermap.py 662
9377-                self._node._add_to_cache(verinfo, shnum, 0, data, now)
9378+                self._node._add_to_cache(verinfo, shnum, 0, data)
9379hunk ./src/allmydata/mutable/servermap.py 658
9380-            try:
9381-                verinfo = self._got_results_one_share(shnum, data, peerid, lp)
9382-                last_verinfo = verinfo
9383-                last_shnum = shnum
9384-                self._node._add_to_cache(verinfo, shnum, 0, data, now)
9385-            except CorruptShareError, e:
9386-                # log it and give the other shares a chance to be processed
9387-                f = failure.Failure()
9388-                self.log(format="bad share: %(f_value)s", f_value=str(f.value),
9389-                         failure=f, parent=lp, level=log.WEIRD, umid="h5llHg")
9390-                self.notify_server_corruption(peerid, shnum, str(e))
9391-                self._bad_peers.add(peerid)
9392-                self._last_failure = f
9393-                checkstring = data[:SIGNED_PREFIX_LENGTH]
9394-                self._servermap.mark_bad_share(peerid, shnum, checkstring)
9395-                self._servermap.problems.append(f)
9396-                pass
9397+            reader = MDMFSlotReadProxy(ss,
9398+                                       storage_index,
9399+                                       shnum,
9400+                                       data)
9401+            self._readers.setdefault(peerid, dict())[shnum] = reader
9402+            # our goal, with each response, is to validate the version
9403+            # information and share data as best we can at this point --
9404+            # we do this by validating the signature. To do this, we
9405+            # need to do the following:
9406+            #   - If we don't already have the public key, fetch the
9407+            #     public key. We use this to validate the signature.
9408+            if not self._node.get_pubkey():
9409+                # fetch and set the public key.
9410+                d = reader.get_verification_key(queue=True)
9411+                d.addCallback(lambda results, shnum=shnum, peerid=peerid:
9412+                    self._try_to_set_pubkey(results, peerid, shnum, lp))
9413+                # XXX: Make self._pubkey_query_failed?
9414+                d.addErrback(lambda error, shnum=shnum, peerid=peerid:
9415+                    self._got_corrupt_share(error, shnum, peerid, data, lp))
9416+            else:
9417+                # we already have the public key.
9418+                d = defer.succeed(None)
9419)
9420hunk ./src/allmydata/mutable/servermap.py 676
9421                 self._servermap.problems.append(f)
9422                 pass
9423 
9424-        self._status.timings["cumulative_verify"] += (time.time() - now)
9425+            # Neither of these two branches return anything of
9426+            # consequence, so the first entry in our deferredlist will
9427+            # be None.
9428 
9429hunk ./src/allmydata/mutable/servermap.py 680
9430-        if self._need_privkey and last_verinfo:
9431-            # send them a request for the privkey. We send one request per
9432-            # server.
9433-            lp2 = self.log("sending privkey request",
9434-                           parent=lp, level=log.NOISY)
9435-            (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
9436-             offsets_tuple) = last_verinfo
9437-            o = dict(offsets_tuple)
9438+            # - Next, we need the version information. We almost
9439+            #   certainly got this by reading the first thousand or so
9440+            #   bytes of the share on the storage server, so we
9441+            #   shouldn't need to fetch anything at this step.
9442+            d2 = reader.get_verinfo()
9443+            d2.addErrback(lambda error, shnum=shnum, peerid=peerid:
9444+                self._got_corrupt_share(error, shnum, peerid, data, lp))
9445+            # - Next, we need the signature. For an SDMF share, it is
9446+            #   likely that we fetched this when doing our initial fetch
9447+            #   to get the version information. In MDMF, this lives at
9448+            #   the end of the share, so unless the file is quite small,
9449+            #   we'll need to do a remote fetch to get it.
9450+            d3 = reader.get_signature(queue=True)
9451+            d3.addErrback(lambda error, shnum=shnum, peerid=peerid:
9452+                self._got_corrupt_share(error, shnum, peerid, data, lp))
9453+            #  Once we have all three of these responses, we can move on
9454+            #  to validating the signature
9455 
9456hunk ./src/allmydata/mutable/servermap.py 698
9457-            self._queries_outstanding.add(peerid)
9458-            readv = [ (o['enc_privkey'], (o['EOF'] - o['enc_privkey'])) ]
9459-            ss = self._servermap.connections[peerid]
9460-            privkey_started = time.time()
9461-            d = self._do_read(ss, peerid, self._storage_index,
9462-                              [last_shnum], readv)
9463-            d.addCallback(self._got_privkey_results, peerid, last_shnum,
9464-                          privkey_started, lp2)
9465-            d.addErrback(self._privkey_query_failed, peerid, last_shnum, lp2)
9466-            d.addErrback(log.err)
9467-            d.addCallback(self._check_for_done)
9468-            d.addErrback(self._fatal_error)
9469+            # Does the node already have a privkey? If not, we'll try to
9470+            # fetch it here.
9471+            if self._need_privkey:
9472+                d4 = reader.get_encprivkey(queue=True)
9473+                d4.addCallback(lambda results, shnum=shnum, peerid=peerid:
9474+                    self._try_to_validate_privkey(results, peerid, shnum, lp))
9475+                d4.addErrback(lambda error, shnum=shnum, peerid=peerid:
9476+                    self._privkey_query_failed(error, peerid, shnum, lp))
9477+            else:
9478+                d4 = defer.succeed(None)
9479+
9480+
9481+            if self.fetch_update_data:
9482+                # fetch the block hash tree and first + last segment, as
9483+                # configured earlier.
9484+                # Then record them in the servermap so that later update
9485+                # operations can use them.
9486+                update_ds = [] # don't clobber the outer 'ds' accumulator
9487+                # XXX: We do this above, too. Is there a good way to
9488+                # make the two routines share the value without
9489+                # introducing more roundtrips?
9490+                update_ds.append(reader.get_verinfo())
9491+                update_ds.append(reader.get_blockhashes(queue=True))
9492+                update_ds.append(reader.get_block_and_salt(self.start_segment,
9493+                                                           queue=True))
9494+                update_ds.append(reader.get_block_and_salt(self.end_segment,
9495+                                                           queue=True))
9496+                d5 = deferredutil.gatherResults(update_ds)
9497+                d5.addCallback(self._got_update_results_one_share, shnum)
9498+            else:
9499+                d5 = defer.succeed(None)
9500 
9501hunk ./src/allmydata/mutable/servermap.py 730
9502+            dl = defer.DeferredList([d, d2, d3, d4, d5])
9503+            dl.addBoth(self._turn_barrier)
9504+            reader.flush()
9505+            dl.addCallback(lambda results, shnum=shnum, peerid=peerid:
9506+                self._got_signature_one_share(results, shnum, peerid, lp))
9507+            dl.addErrback(lambda error, shnum=shnum, data=data:
9508+               self._got_corrupt_share(error, shnum, peerid, data, lp))
9509+            dl.addCallback(lambda verinfo, shnum=shnum, peerid=peerid, data=data:
9510+                self._cache_good_sharedata(verinfo, shnum, now, data))
9511+            ds.append(dl)
9512+        # dl is a deferred list that will fire when all of the shares
9513+        # that we found on this peer are done processing. When dl fires,
9514+        # we know that processing is done, so we can mark this query as
9515+        # no longer outstanding (via _done_processing, above).
9516+        dl = defer.DeferredList(ds, fireOnOneErrback=True)
9517+        # Are we done? Done means that there are no more queries to
9518+        # send, that there are no outstanding queries, and that we
9519+        # haven't received any queries that are still processing. If we
9520+        # are done, self._check_for_done will cause the done deferred
9521+        # that we returned to our caller to fire, which tells them that
9522+        # they have a complete servermap, and that we won't be touching
9523+        # the servermap anymore.
9524+        dl.addCallback(_done_processing)
9525+        dl.addCallback(self._check_for_done)
9526+        dl.addErrback(self._fatal_error)
9527         # all done!
9528         self.log("_got_results done", parent=lp, level=log.NOISY)
9529hunk ./src/allmydata/mutable/servermap.py 757
9530+        return dl
9531+
9532+
9533+    def _turn_barrier(self, result):
9534+        """
9535+        I help the servermap updater avoid the recursion limit issues
9536+        discussed in #237.
9537+        """
9538+        return fireEventually(result)
9539+
9540+
9541+    def _try_to_set_pubkey(self, pubkey_s, peerid, shnum, lp):
9542+        if self._node.get_pubkey():
9543+            return # don't go through this again if we don't have to
9544+        fingerprint = hashutil.ssk_pubkey_fingerprint_hash(pubkey_s)
9545+        assert len(fingerprint) == 32
9546+        if fingerprint != self._node.get_fingerprint():
9547+            raise CorruptShareError(peerid, shnum,
9548+                                "pubkey doesn't match fingerprint")
9549+        self._node._populate_pubkey(self._deserialize_pubkey(pubkey_s))
9550+        assert self._node.get_pubkey()
9551+
9552 
9553     def notify_server_corruption(self, peerid, shnum, reason):
9554         ss = self._servermap.connections[peerid]
9555hunk ./src/allmydata/mutable/servermap.py 785
9556         ss.callRemoteOnly("advise_corrupt_share",
9557                           "mutable", self._storage_index, shnum, reason)
9558 
9559-    def _got_results_one_share(self, shnum, data, peerid, lp):
9560+
9561+    def _got_signature_one_share(self, results, shnum, peerid, lp):
9562+        # It is our job to give versioninfo to our caller. We need to
9563+        # raise CorruptShareError if the share is corrupt for any
9564+        # reason, something that our caller will handle.
9565         self.log(format="_got_results: got shnum #%(shnum)d from peerid %(peerid)s",
9566                  shnum=shnum,
9567                  peerid=idlib.shortnodeid_b2a(peerid),
9568hunk ./src/allmydata/mutable/servermap.py 795
9569                  level=log.NOISY,
9570                  parent=lp)
9571+        if not self._running:
9572+            # We can't process the results, since we can't touch the
9573+            # servermap anymore.
9574+            self.log("but we're not running anymore.")
9575+            return None
9576 
9577hunk ./src/allmydata/mutable/servermap.py 801
9578-        # this might raise NeedMoreDataError, if the pubkey and signature
9579-        # live at some weird offset. That shouldn't happen, so I'm going to
9580-        # treat it as a bad share.
9581-        (seqnum, root_hash, IV, k, N, segsize, datalength,
9582-         pubkey_s, signature, prefix) = unpack_prefix_and_signature(data)
9583-
9584-        if not self._node.get_pubkey():
9585-            fingerprint = hashutil.ssk_pubkey_fingerprint_hash(pubkey_s)
9586-            assert len(fingerprint) == 32
9587-            if fingerprint != self._node.get_fingerprint():
9588-                raise CorruptShareError(peerid, shnum,
9589-                                        "pubkey doesn't match fingerprint")
9590-            self._node._populate_pubkey(self._deserialize_pubkey(pubkey_s))
9591-
9592-        if self._need_privkey:
9593-            self._try_to_extract_privkey(data, peerid, shnum, lp)
9594-
9595-        (ig_version, ig_seqnum, ig_root_hash, ig_IV, ig_k, ig_N,
9596-         ig_segsize, ig_datalen, offsets) = unpack_header(data)
9597+        _, verinfo, signature, __, ___ = results
9598+        (seqnum,
9599+         root_hash,
9600+         saltish,
9601+         segsize,
9602+         datalen,
9603+         k,
9604+         n,
9605+         prefix,
9606+         offsets) = verinfo[1]
9607         offsets_tuple = tuple( [(key,value) for key,value in offsets.items()] )
9608 
9609hunk ./src/allmydata/mutable/servermap.py 813
9610-        verinfo = (seqnum, root_hash, IV, segsize, datalength, k, N, prefix,
9611+        # XXX: This should be done for us in the method, so
9612+        # presumably you can go in there and fix it.
9613+        verinfo = (seqnum,
9614+                   root_hash,
9615+                   saltish,
9616+                   segsize,
9617+                   datalen,
9618+                   k,
9619+                   n,
9620+                   prefix,
9621                    offsets_tuple)
9622hunk ./src/allmydata/mutable/servermap.py 824
9623+        # This tuple uniquely identifies a share on the grid; we use it
9624+        # to keep track of the ones that we've already seen.
9625 
9626         if verinfo not in self._valid_versions:
9627hunk ./src/allmydata/mutable/servermap.py 828
9628-            # it's a new pair. Verify the signature.
9629-            valid = self._node.get_pubkey().verify(prefix, signature)
9630+            # This is a new version tuple, and we need to validate it
9631+            # against the public key before keeping track of it.
9632+            assert self._node.get_pubkey()
9633+            valid = self._node.get_pubkey().verify(prefix, signature[1])
9634             if not valid:
9635hunk ./src/allmydata/mutable/servermap.py 833
9636-                raise CorruptShareError(peerid, shnum, "signature is invalid")
9637+                raise CorruptShareError(peerid, shnum,
9638+                                        "signature is invalid")
9639 
9640hunk ./src/allmydata/mutable/servermap.py 836
9641-            # ok, it's a valid verinfo. Add it to the list of validated
9642-            # versions.
9643-            self.log(" found valid version %d-%s from %s-sh%d: %d-%d/%d/%d"
9644-                     % (seqnum, base32.b2a(root_hash)[:4],
9645-                        idlib.shortnodeid_b2a(peerid), shnum,
9646-                        k, N, segsize, datalength),
9647-                     parent=lp)
9648-            self._valid_versions.add(verinfo)
9649-        # We now know that this is a valid candidate verinfo.
9650+        # ok, it's a valid verinfo. Add it to the list of validated
9651+        # versions.
9652+        self.log(" found valid version %d-%s from %s-sh%d: %d-%d/%d/%d"
9653+                 % (seqnum, base32.b2a(root_hash)[:4],
9654+                    idlib.shortnodeid_b2a(peerid), shnum,
9655+                    k, n, segsize, datalen),
9656+                    parent=lp)
9657+        self._valid_versions.add(verinfo)
9658+        # We now know that this is a valid candidate verinfo. Whether or
9659+        # not this instance of it is valid is a matter for the next
9660+        # statement; at this point, we just know that if we see this
9661+        # version info again, that its signature checks out and that
9662+        # we're okay to skip the signature-checking step.
9663 
9664hunk ./src/allmydata/mutable/servermap.py 850
9665+        # (peerid, shnum) are bound in the method invocation.
9666         if (peerid, shnum) in self._servermap.bad_shares:
9667             # we've been told that the rest of the data in this share is
9668             # unusable, so don't add it to the servermap.
9669hunk ./src/allmydata/mutable/servermap.py 863
9670         self._servermap.add_new_share(peerid, shnum, verinfo, timestamp)
9671         # and the versionmap
9672         self.versionmap.add(verinfo, (shnum, peerid, timestamp))
9673+
9674+        # It's our job to set the protocol version of our parent
9675+        # filenode if it isn't already set.
9676+        if not self._node.get_version():
9677+            # The first byte of the prefix is the version.
9678+            v = struct.unpack(">B", prefix[:1])[0]
9679+            self.log("got version %d" % v)
9680+            self._node.set_version(v)
9681+
9682         return verinfo
9683 
9684hunk ./src/allmydata/mutable/servermap.py 874
9685-    def _deserialize_pubkey(self, pubkey_s):
9686-        verifier = rsa.create_verifying_key_from_string(pubkey_s)
9687-        return verifier
9688 
9689hunk ./src/allmydata/mutable/servermap.py 875
9690-    def _try_to_extract_privkey(self, data, peerid, shnum, lp):
9691-        try:
9692-            r = unpack_share(data)
9693-        except NeedMoreDataError, e:
9694-            # this share won't help us. oh well.
9695-            offset = e.encprivkey_offset
9696-            length = e.encprivkey_length
9697-            self.log("shnum %d on peerid %s: share was too short (%dB) "
9698-                     "to get the encprivkey; [%d:%d] ought to hold it" %
9699-                     (shnum, idlib.shortnodeid_b2a(peerid), len(data),
9700-                      offset, offset+length),
9701-                     parent=lp)
9702-            # NOTE: if uncoordinated writes are taking place, someone might
9703-            # change the share (and most probably move the encprivkey) before
9704-            # we get a chance to do one of these reads and fetch it. This
9705-            # will cause us to see a NotEnoughSharesError(unable to fetch
9706-            # privkey) instead of an UncoordinatedWriteError . This is a
9707-            # nuisance, but it will go away when we move to DSA-based mutable
9708-            # files (since the privkey will be small enough to fit in the
9709-            # write cap).
9710+    def _got_update_results_one_share(self, results, share):
9711+        """
9712+        I record the update results for a single share in the servermap.
9713+        """
9714+        assert len(results) == 4
9715+        verinfo, blockhashes, start, end = results
9716+        (seqnum,
9717+         root_hash,
9718+         saltish,
9719+         segsize,
9720+         datalen,
9721+         k,
9722+         n,
9723+         prefix,
9724+         offsets) = verinfo
9725+        offsets_tuple = tuple( [(key,value) for key,value in offsets.items()] )
9726 
9727hunk ./src/allmydata/mutable/servermap.py 892
9728-            return
9729+        # XXX: This should be done for us in the method, so
9730+        # presumably you can go in there and fix it.
9731+        verinfo = (seqnum,
9732+                   root_hash,
9733+                   saltish,
9734+                   segsize,
9735+                   datalen,
9736+                   k,
9737+                   n,
9738+                   prefix,
9739+                   offsets_tuple)
9740 
9741hunk ./src/allmydata/mutable/servermap.py 904
9742-        (seqnum, root_hash, IV, k, N, segsize, datalen,
9743-         pubkey, signature, share_hash_chain, block_hash_tree,
9744-         share_data, enc_privkey) = r
9745+        update_data = (blockhashes, start, end)
9746+        self._servermap.set_update_data_for_share_and_verinfo(share,
9747+                                                              verinfo,
9748+                                                              update_data)
9749 
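Why the offsets dict gets flattened into a tuple above: verinfo is used as a set member and dict key elsewhere in the servermap (e.g. _valid_versions and the versionmap), and a dict nested inside it would make it unhashable. A quick illustration with made-up offset values:

    offsets = {"signature": 9, "share_hash_chain": 265}
    offsets_tuple = tuple([(key, value) for key, value in offsets.items()])
    hash(offsets_tuple)    # works
    # hash(offsets)        # would raise TypeError: unhashable type: 'dict'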
9750hunk ./src/allmydata/mutable/servermap.py 909
9751-        return self._try_to_validate_privkey(enc_privkey, peerid, shnum, lp)
9752+
9753+    def _deserialize_pubkey(self, pubkey_s):
9754+        verifier = rsa.create_verifying_key_from_string(pubkey_s)
9755+        return verifier
9756 
9757hunk ./src/allmydata/mutable/servermap.py 914
9758-    def _try_to_validate_privkey(self, enc_privkey, peerid, shnum, lp):
9759 
9760hunk ./src/allmydata/mutable/servermap.py 915
9761+    def _try_to_validate_privkey(self, enc_privkey, peerid, shnum, lp):
9762+        """
9763+        Given an encrypted private key from a remote server, I derive its
9764+        writekey and validate it against the writekey stored in my node.
9765+        If it is valid, then I set the privkey and encprivkey properties of the node.
9766+        """
9767         alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)
9768         alleged_writekey = hashutil.ssk_writekey_hash(alleged_privkey_s)
9769         if alleged_writekey != self._node.get_writekey():
9770hunk ./src/allmydata/mutable/servermap.py 993
9771         self._queries_completed += 1
9772         self._last_failure = f
9773 
9774-    def _got_privkey_results(self, datavs, peerid, shnum, started, lp):
9775-        now = time.time()
9776-        elapsed = now - started
9777-        self._status.add_per_server_time(peerid, "privkey", started, elapsed)
9778-        self._queries_outstanding.discard(peerid)
9779-        if not self._need_privkey:
9780-            return
9781-        if shnum not in datavs:
9782-            self.log("privkey wasn't there when we asked it",
9783-                     level=log.WEIRD, umid="VA9uDQ")
9784-            return
9785-        datav = datavs[shnum]
9786-        enc_privkey = datav[0]
9787-        self._try_to_validate_privkey(enc_privkey, peerid, shnum, lp)
9788 
9789     def _privkey_query_failed(self, f, peerid, shnum, lp):
9790         self._queries_outstanding.discard(peerid)
9791hunk ./src/allmydata/mutable/servermap.py 1007
9792         self._servermap.problems.append(f)
9793         self._last_failure = f
9794 
9795+
9796     def _check_for_done(self, res):
9797         # exit paths:
9798         #  return self._send_more_queries(outstanding) : send some more queries
9799hunk ./src/allmydata/mutable/servermap.py 1013
9800         #  return self._done() : all done
9801         #  return : keep waiting, no new queries
9802-
9803         lp = self.log(format=("_check_for_done, mode is '%(mode)s', "
9804                               "%(outstanding)d queries outstanding, "
9805                               "%(extra)d extra peers available, "
9806hunk ./src/allmydata/mutable/servermap.py 1204
9807 
9808     def _done(self):
9809         if not self._running:
9810+            self.log("not running; we're already done")
9811             return
9812         self._running = False
9813         now = time.time()
9814hunk ./src/allmydata/mutable/servermap.py 1219
9815         self._servermap.last_update_time = self._started
9816         # the servermap will not be touched after this
9817         self.log("servermap: %s" % self._servermap.summarize_versions())
9818+
9819         eventually(self._done_deferred.callback, self._servermap)
9820 
9821     def _fatal_error(self, f):
9822}
9823[tests:
9824Kevan Carstensen <kevan@isnotajoke.com>**20100819003531
9825 Ignore-this: 314e8bbcce532ea4d5d2cecc9f31cca0
9826 
9827     - A lot of existing tests relied on aspects of the mutable file
9828       implementation that were changed. This patch updates those tests
9829       to work with the changes.
9830     - This patch also adds tests for new features.
9831] {
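Most of the mechanical churn in the hunks below follows one pattern: bare strings handed to the mutable-file APIs are now wrapped in MutableData (an IMutableUploadable), and create_mutable_file() accepts an optional version= argument. A rough sketch of that calling convention, with `nm` standing in for the nodemaker these tests construct:

    from allmydata.mutable.publish import MutableData
    from allmydata.interfaces import MDMF_VERSION

    d = nm.create_mutable_file(MutableData("contents 1"), version=MDMF_VERSION)
    d.addCallback(lambda n: n.overwrite(MutableData("contents 2")))
    d.addCallback(lambda ign: nm.create_mutable_file(MutableData("contents 3")))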
9832hunk ./src/allmydata/test/common.py 11
9833 from foolscap.api import flushEventualQueue, fireEventually
9834 from allmydata import uri, dirnode, client
9835 from allmydata.introducer.server import IntroducerNode
9836-from allmydata.interfaces import IMutableFileNode, IImmutableFileNode, \
9837-     FileTooLargeError, NotEnoughSharesError, ICheckable
9838+from allmydata.interfaces import IMutableFileNode, IImmutableFileNode,\
9839+                                 NotEnoughSharesError, ICheckable, \
9840+                                 IMutableUploadable, SDMF_VERSION, \
9841+                                 MDMF_VERSION
9842 from allmydata.check_results import CheckResults, CheckAndRepairResults, \
9843      DeepCheckResults, DeepCheckAndRepairResults
9844 from allmydata.mutable.common import CorruptShareError
9845hunk ./src/allmydata/test/common.py 19
9846 from allmydata.mutable.layout import unpack_header
9847+from allmydata.mutable.publish import MutableData
9848 from allmydata.storage.server import storage_index_to_dir
9849 from allmydata.storage.mutable import MutableShareFile
9850 from allmydata.util import hashutil, log, fileutil, pollmixin
9851hunk ./src/allmydata/test/common.py 153
9852         consumer.write(data[start:end])
9853         return consumer
9854 
9855+
9856+    def get_best_readable_version(self):
9857+        return defer.succeed(self)
9858+
9859+
9860+    download_best_version = download_to_data
9861+
9862+
9863+    def download_to_data(self):
9864+        return download_to_data(self)
9865+
9866+
9867+    def get_size_of_best_version(self):
9868+        return defer.succeed(self.get_size())
9869+
9870+
9871 def make_chk_file_cap(size):
9872     return uri.CHKFileURI(key=os.urandom(16),
9873                           uri_extension_hash=os.urandom(32),
9874hunk ./src/allmydata/test/common.py 193
9875     MUTABLE_SIZELIMIT = 10000
9876     all_contents = {}
9877     bad_shares = {}
9878+    file_types = {} # storage index => MDMF_VERSION or SDMF_VERSION
9879 
9880     def __init__(self, storage_broker, secret_holder,
9881                  default_encoding_parameters, history):
9882hunk ./src/allmydata/test/common.py 200
9883         self.init_from_cap(make_mutable_file_cap())
9884     def create(self, contents, key_generator=None, keysize=None):
9885         initial_contents = self._get_initial_contents(contents)
9886-        if len(initial_contents) > self.MUTABLE_SIZELIMIT:
9887-            raise FileTooLargeError("SDMF is limited to one segment, and "
9888-                                    "%d > %d" % (len(initial_contents),
9889-                                                 self.MUTABLE_SIZELIMIT))
9890-        self.all_contents[self.storage_index] = initial_contents
9891+        data = initial_contents.read(initial_contents.get_size())
9892+        data = "".join(data)
9893+        self.all_contents[self.storage_index] = data
9894         return defer.succeed(self)
9895     def _get_initial_contents(self, contents):
9896hunk ./src/allmydata/test/common.py 205
9897-        if isinstance(contents, str):
9898-            return contents
9899         if contents is None:
9900hunk ./src/allmydata/test/common.py 206
9901-            return ""
9902+            return MutableData("")
9903+
9904+        if IMutableUploadable.providedBy(contents):
9905+            return contents
9906+
9907         assert callable(contents), "%s should be callable, not %s" % \
9908                (contents, type(contents))
9909         return contents(self)
9910hunk ./src/allmydata/test/common.py 258
9911     def get_storage_index(self):
9912         return self.storage_index
9913 
9914+    def get_servermap(self, mode):
9915+        return defer.succeed(None)
9916+
9917+    def set_version(self, version):
9918+        assert version in (SDMF_VERSION, MDMF_VERSION)
9919+        self.file_types[self.storage_index] = version
9920+
9921+    def get_version(self):
9922+        assert self.storage_index in self.file_types
9923+        return self.file_types[self.storage_index]
9924+
9925     def check(self, monitor, verify=False, add_lease=False):
9926         r = CheckResults(self.my_uri, self.storage_index)
9927         is_bad = self.bad_shares.get(self.storage_index, None)
9928hunk ./src/allmydata/test/common.py 327
9929         return d
9930 
9931     def download_best_version(self):
9932+        return defer.maybeDeferred(self._download_best_version)
9933+
9934+
9935+    def _download_best_version(self, ignored=None):
9936         if isinstance(self.my_uri, uri.LiteralFileURI):
9937hunk ./src/allmydata/test/common.py 332
9938-            return defer.succeed(self.my_uri.data)
9939+            return self.my_uri.data
9940         if self.storage_index not in self.all_contents:
9941hunk ./src/allmydata/test/common.py 334
9942-            return defer.fail(NotEnoughSharesError(None, 0, 3))
9943-        return defer.succeed(self.all_contents[self.storage_index])
9944+            raise NotEnoughSharesError(None, 0, 3)
9945+        return self.all_contents[self.storage_index]
9946+
9947 
9948     def overwrite(self, new_contents):
9949hunk ./src/allmydata/test/common.py 339
9950-        if len(new_contents) > self.MUTABLE_SIZELIMIT:
9951-            raise FileTooLargeError("SDMF is limited to one segment, and "
9952-                                    "%d > %d" % (len(new_contents),
9953-                                                 self.MUTABLE_SIZELIMIT))
9954         assert not self.is_readonly()
9955hunk ./src/allmydata/test/common.py 340
9956-        self.all_contents[self.storage_index] = new_contents
9957+        new_data = new_contents.read(new_contents.get_size())
9958+        new_data = "".join(new_data)
9959+        self.all_contents[self.storage_index] = new_data
9960         return defer.succeed(None)
9961     def modify(self, modifier):
9962         # this does not implement FileTooLargeError, but the real one does
9963hunk ./src/allmydata/test/common.py 350
9964     def _modify(self, modifier):
9965         assert not self.is_readonly()
9966         old_contents = self.all_contents[self.storage_index]
9967-        self.all_contents[self.storage_index] = modifier(old_contents, None, True)
9968+        new_data = modifier(old_contents, None, True)
9969+        self.all_contents[self.storage_index] = new_data
9970         return None
9971 
9972hunk ./src/allmydata/test/common.py 354
9973+    # As actually implemented, MutableFileNode and MutableFileVersion
9974+    # are distinct. However, nothing in the webapi uses (yet) that
9975+    # distinction -- it just uses the unified download interface
9976+    # provided by get_best_readable_version and read. When we start
9977+    # doing cooler things like LDMF, we will want to revise this code to
9978+    # be less simplistic.
9979+    def get_best_readable_version(self):
9980+        return defer.succeed(self)
9981+
9982+
9983+    def get_best_mutable_version(self):
9984+        return defer.succeed(self)
9985+
9986+    # Ditto for this, which is an implementation of IWritable.
9987+    # XXX: Declare that this class implements IWritable.
9988+    def update(self, data, offset):
9989+        assert not self.is_readonly()
9990+        def modifier(old, servermap, first_time):
9991+            new = old[:offset] + "".join(data.read(data.get_size()))
9992+            new += old[len(new):]
9993+            return new
9994+        return self.modify(modifier)
9995+
9996+
9997+    def read(self, consumer, offset=0, size=None):
9998+        data = self._download_best_version()
9999+        if size is None:
10000+            size = len(data) - offset
10001+        consumer.write(data[offset:offset+size])
10002+        return defer.succeed(consumer)
10003+
10004+
10005 def make_mutable_file_cap():
10006     return uri.WriteableSSKFileURI(writekey=os.urandom(16),
10007                                    fingerprint=os.urandom(32))
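For reference, the overwrite-in-place splice that the fake update() above models works out like this (a worked example, not test code):

    # Writing "XYZ" at offset 2 of "abcdef" replaces bytes 2-4 and keeps the tail.
    old, offset, payload = "abcdef", 2, "XYZ"
    new = old[:offset] + payload
    new += old[len(new):]
    assert new == "abXYZf"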
10008hunk ./src/allmydata/test/test_checker.py 11
10009 from allmydata.test.no_network import GridTestMixin
10010 from allmydata.immutable.upload import Data
10011 from allmydata.test.common_web import WebRenderingMixin
10012+from allmydata.mutable.publish import MutableData
10013 
10014 class FakeClient:
10015     def get_storage_broker(self):
10016hunk ./src/allmydata/test/test_checker.py 291
10017         def _stash_immutable(ur):
10018             self.imm = c0.create_node_from_uri(ur.uri)
10019         d.addCallback(_stash_immutable)
10020-        d.addCallback(lambda ign: c0.create_mutable_file("contents"))
10021+        d.addCallback(lambda ign:
10022+            c0.create_mutable_file(MutableData("contents")))
10023         def _stash_mutable(node):
10024             self.mut = node
10025         d.addCallback(_stash_mutable)
10026hunk ./src/allmydata/test/test_cli.py 13
10027 from allmydata.util import fileutil, hashutil, base32
10028 from allmydata import uri
10029 from allmydata.immutable import upload
10030+from allmydata.mutable.publish import MutableData
10031 from allmydata.dirnode import normalize
10032 
10033 # Test that the scripts can be imported.
10034hunk ./src/allmydata/test/test_cli.py 662
10035 
10036         d = self.do_cli("create-alias", etudes_arg)
10037         def _check_create_unicode((rc, out, err)):
10038-            self.failUnlessReallyEqual(rc, 0)
10039+            #self.failUnlessReallyEqual(rc, 0)
10040             self.failUnlessReallyEqual(err, "")
10041             self.failUnlessIn("Alias %s created" % quote_output(u"\u00E9tudes"), out)
10042 
10043hunk ./src/allmydata/test/test_cli.py 967
10044         d.addCallback(lambda (rc,out,err): self.failUnlessReallyEqual(out, DATA2))
10045         return d
10046 
10047+    def test_mutable_type(self):
10048+        self.basedir = "cli/Put/mutable_type"
10049+        self.set_up_grid()
10050+        data = "data" * 100000
10051+        fn1 = os.path.join(self.basedir, "data")
10052+        fileutil.write(fn1, data)
10053+        d = self.do_cli("create-alias", "tahoe")
10054+        d.addCallback(lambda ignored:
10055+            self.do_cli("put", "--mutable", "--mutable-type=mdmf",
10056+                        fn1, "tahoe:uploaded.txt"))
10057+        d.addCallback(lambda ignored:
10058+            self.do_cli("ls", "--json", "tahoe:uploaded.txt"))
10059+        d.addCallback(lambda (rc, json, err): self.failUnlessIn("mdmf", json))
10060+        d.addCallback(lambda ignored:
10061+            self.do_cli("put", "--mutable", "--mutable-type=sdmf",
10062+                        fn1, "tahoe:uploaded2.txt"))
10063+        d.addCallback(lambda ignored:
10064+            self.do_cli("ls", "--json", "tahoe:uploaded2.txt"))
10065+        d.addCallback(lambda (rc, json, err):
10066+            self.failUnlessIn("sdmf", json))
10067+        return d
10068+
10069+    def test_mutable_type_unlinked(self):
10070+        self.basedir = "cli/Put/mutable_type_unlinked"
10071+        self.set_up_grid()
10072+        data = "data" * 100000
10073+        fn1 = os.path.join(self.basedir, "data")
10074+        fileutil.write(fn1, data)
10075+        d = self.do_cli("put", "--mutable", "--mutable-type=mdmf", fn1)
10076+        d.addCallback(lambda (rc, cap, err):
10077+            self.do_cli("ls", "--json", cap))
10078+        d.addCallback(lambda (rc, json, err): self.failUnlessIn("mdmf", json))
10079+        d.addCallback(lambda ignored:
10080+            self.do_cli("put", "--mutable", "--mutable-type=sdmf", fn1))
10081+        d.addCallback(lambda (rc, cap, err):
10082+            self.do_cli("ls", "--json", cap))
10083+        d.addCallback(lambda (rc, json, err):
10084+            self.failUnlessIn("sdmf", json))
10085+        return d
10086+
10087+    def test_mutable_type_invalid_format(self):
10088+        self.basedir = "cli/Put/mutable_type_invalid_format"
10089+        self.set_up_grid()
10090+        data = "data" * 100000
10091+        fn1 = os.path.join(self.basedir, "data")
10092+        fileutil.write(fn1, data)
10093+        d = self.do_cli("put", "--mutable", "--mutable-type=ldmf", fn1)
10094+        def _check_failure((rc, out, err)):
10095+            self.failIfEqual(rc, 0)
10096+            self.failUnlessIn("invalid", err)
10097+        d.addCallback(_check_failure)
10098+        return d
10099+
10100     def test_put_with_nonexistent_alias(self):
10101         # when invoked with an alias that doesn't exist, 'tahoe put'
10102         # should output a useful error message, not a stack trace
10103hunk ./src/allmydata/test/test_cli.py 2136
10104         self.set_up_grid()
10105         c0 = self.g.clients[0]
10106         DATA = "data" * 100
10107-        d = c0.create_mutable_file(DATA)
10108+        DATA_uploadable = MutableData(DATA)
10109+        d = c0.create_mutable_file(DATA_uploadable)
10110         def _stash_uri(n):
10111             self.uri = n.get_uri()
10112         d.addCallback(_stash_uri)
10113hunk ./src/allmydata/test/test_cli.py 2238
10114                                            upload.Data("literal",
10115                                                         convergence="")))
10116         d.addCallback(_stash_uri, "small")
10117-        d.addCallback(lambda ign: c0.create_mutable_file(DATA+"1"))
10118+        d.addCallback(lambda ign:
10119+            c0.create_mutable_file(MutableData(DATA+"1")))
10120         d.addCallback(lambda fn: self.rootnode.set_node(u"mutable", fn))
10121         d.addCallback(_stash_uri, "mutable")
10122 
10123hunk ./src/allmydata/test/test_cli.py 2257
10124         # root/small
10125         # root/mutable
10126 
10127+        # We haven't broken anything yet, so this should all be healthy.
10128         d.addCallback(lambda ign: self.do_cli("deep-check", "--verbose",
10129                                               self.rooturi))
10130         def _check2((rc, out, err)):
10131hunk ./src/allmydata/test/test_cli.py 2272
10132                             in lines, out)
10133         d.addCallback(_check2)
10134 
10135+        # Similarly, all of these results should be as we expect them to
10136+        # be for a healthy file layout.
10137         d.addCallback(lambda ign: self.do_cli("stats", self.rooturi))
10138         def _check_stats((rc, out, err)):
10139             self.failUnlessReallyEqual(err, "")
10140hunk ./src/allmydata/test/test_cli.py 2289
10141             self.failUnlessIn(" 317-1000 : 1    (1000 B, 1000 B)", lines)
10142         d.addCallback(_check_stats)
10143 
10144+        # Now we break things.
10145         def _clobber_shares(ignored):
10146             shares = self.find_uri_shares(self.uris[u"g\u00F6\u00F6d"])
10147             self.failUnlessReallyEqual(len(shares), 10)
10148hunk ./src/allmydata/test/test_cli.py 2314
10149 
10150         d.addCallback(lambda ign:
10151                       self.do_cli("deep-check", "--verbose", self.rooturi))
10152+        # This should reveal the missing share, but not the corrupt
10153+        # share, since we didn't tell the deep check operation to also
10154+        # verify.
10155         def _check3((rc, out, err)):
10156             self.failUnlessReallyEqual(err, "")
10157             self.failUnlessReallyEqual(rc, 0)
10158hunk ./src/allmydata/test/test_cli.py 2365
10159                                   "--verbose", "--verify", "--repair",
10160                                   self.rooturi))
10161         def _check6((rc, out, err)):
10162+            # We've just repaired the directory. There is no reason for
10163+            # that repair to be unsuccessful.
10164             self.failUnlessReallyEqual(err, "")
10165             self.failUnlessReallyEqual(rc, 0)
10166             lines = out.splitlines()
10167hunk ./src/allmydata/test/test_deepcheck.py 9
10168 from twisted.internet import threads # CLI tests use deferToThread
10169 from allmydata.immutable import upload
10170 from allmydata.mutable.common import UnrecoverableFileError
10171+from allmydata.mutable.publish import MutableData
10172 from allmydata.util import idlib
10173 from allmydata.util import base32
10174 from allmydata.scripts import runner
10175hunk ./src/allmydata/test/test_deepcheck.py 38
10176         self.basedir = "deepcheck/MutableChecker/good"
10177         self.set_up_grid()
10178         CONTENTS = "a little bit of data"
10179-        d = self.g.clients[0].create_mutable_file(CONTENTS)
10180+        CONTENTS_uploadable = MutableData(CONTENTS)
10181+        d = self.g.clients[0].create_mutable_file(CONTENTS_uploadable)
10182         def _created(node):
10183             self.node = node
10184             self.fileurl = "uri/" + urllib.quote(node.get_uri())
10185hunk ./src/allmydata/test/test_deepcheck.py 61
10186         self.basedir = "deepcheck/MutableChecker/corrupt"
10187         self.set_up_grid()
10188         CONTENTS = "a little bit of data"
10189-        d = self.g.clients[0].create_mutable_file(CONTENTS)
10190+        CONTENTS_uploadable = MutableData(CONTENTS)
10191+        d = self.g.clients[0].create_mutable_file(CONTENTS_uploadable)
10192         def _stash_and_corrupt(node):
10193             self.node = node
10194             self.fileurl = "uri/" + urllib.quote(node.get_uri())
10195hunk ./src/allmydata/test/test_deepcheck.py 99
10196         self.basedir = "deepcheck/MutableChecker/delete_share"
10197         self.set_up_grid()
10198         CONTENTS = "a little bit of data"
10199-        d = self.g.clients[0].create_mutable_file(CONTENTS)
10200+        CONTENTS_uploadable = MutableData(CONTENTS)
10201+        d = self.g.clients[0].create_mutable_file(CONTENTS_uploadable)
10202         def _stash_and_delete(node):
10203             self.node = node
10204             self.fileurl = "uri/" + urllib.quote(node.get_uri())
10205hunk ./src/allmydata/test/test_deepcheck.py 223
10206             self.root = n
10207             self.root_uri = n.get_uri()
10208         d.addCallback(_created_root)
10209-        d.addCallback(lambda ign: c0.create_mutable_file("mutable file contents"))
10210+        d.addCallback(lambda ign:
10211+            c0.create_mutable_file(MutableData("mutable file contents")))
10212         d.addCallback(lambda n: self.root.set_node(u"mutable", n))
10213         def _created_mutable(n):
10214             self.mutable = n
10215hunk ./src/allmydata/test/test_deepcheck.py 965
10216     def create_mangled(self, ignored, name):
10217         nodetype, mangletype = name.split("-", 1)
10218         if nodetype == "mutable":
10219-            d = self.g.clients[0].create_mutable_file("mutable file contents")
10220+            mutable_uploadable = MutableData("mutable file contents")
10221+            d = self.g.clients[0].create_mutable_file(mutable_uploadable)
10222             d.addCallback(lambda n: self.root.set_node(unicode(name), n))
10223         elif nodetype == "large":
10224             large = upload.Data("Lots of data\n" * 1000 + name + "\n", None)
10225hunk ./src/allmydata/test/test_dirnode.py 1304
10226     implements(IMutableFileNode)
10227     counter = 0
10228     def __init__(self, initial_contents=""):
10229-        self.data = self._get_initial_contents(initial_contents)
10230+        data = self._get_initial_contents(initial_contents)
10231+        self.data = data.read(data.get_size())
10232+        self.data = "".join(self.data)
10233+
10234         counter = FakeMutableFile.counter
10235         FakeMutableFile.counter += 1
10236         writekey = hashutil.ssk_writekey_hash(str(counter))
10237hunk ./src/allmydata/test/test_dirnode.py 1354
10238         pass
10239 
10240     def modify(self, modifier):
10241-        self.data = modifier(self.data, None, True)
10242+        data = modifier(self.data, None, True)
10243+        self.data = data
10244         return defer.succeed(None)
10245 
10246 class FakeNodeMaker(NodeMaker):
10247hunk ./src/allmydata/test/test_dirnode.py 1359
10248-    def create_mutable_file(self, contents="", keysize=None):
10249+    def create_mutable_file(self, contents="", keysize=None, version=None):
10250         return defer.succeed(FakeMutableFile(contents))
10251 
10252 class FakeClient2(Client):
10253hunk ./src/allmydata/test/test_filenode.py 98
10254         def _check_segment(res):
10255             self.failUnlessEqual(res, DATA[1:1+5])
10256         d.addCallback(_check_segment)
10257+        d.addCallback(lambda ignored: fn1.get_best_readable_version())
10258+        d.addCallback(lambda fn2: self.failUnlessEqual(fn1, fn2))
10259+        d.addCallback(lambda ignored:
10260+            fn1.get_size_of_best_version())
10261+        d.addCallback(lambda size:
10262+            self.failUnlessEqual(size, len(DATA)))
10263+        d.addCallback(lambda ignored:
10264+            fn1.download_to_data())
10265+        d.addCallback(lambda data:
10266+            self.failUnlessEqual(data, DATA))
10267+        d.addCallback(lambda ignored:
10268+            fn1.download_best_version())
10269+        d.addCallback(lambda data:
10270+            self.failUnlessEqual(data, DATA))
10271 
10272         return d
10273 
10274hunk ./src/allmydata/test/test_hung_server.py 10
10275 from allmydata.util.consumer import download_to_data
10276 from allmydata.immutable import upload
10277 from allmydata.mutable.common import UnrecoverableFileError
10278+from allmydata.mutable.publish import MutableData
10279 from allmydata.storage.common import storage_index_to_dir
10280 from allmydata.test.no_network import GridTestMixin
10281 from allmydata.test.common import ShouldFailMixin
10282hunk ./src/allmydata/test/test_hung_server.py 110
10283         self.servers = self.servers[5:] + self.servers[:5]
10284 
10285         if mutable:
10286-            d = nm.create_mutable_file(mutable_plaintext)
10287+            uploadable = MutableData(mutable_plaintext)
10288+            d = nm.create_mutable_file(uploadable)
10289             def _uploaded_mutable(node):
10290                 self.uri = node.get_uri()
10291                 self.shares = self.find_uri_shares(self.uri)
10292hunk ./src/allmydata/test/test_immutable.py 263
10293         d.addCallback(_after_attempt)
10294         return d
10295 
10296+    def test_download_to_data(self):
10297+        d = self.n.download_to_data()
10298+        d.addCallback(lambda data:
10299+            self.failUnlessEqual(data, common.TEST_DATA))
10300+        return d
10301 
10302hunk ./src/allmydata/test/test_immutable.py 269
10303+
10304+    def test_download_best_version(self):
10305+        d = self.n.download_best_version()
10306+        d.addCallback(lambda data:
10307+            self.failUnlessEqual(data, common.TEST_DATA))
10308+        return d
10309+
10310+
10311+    def test_get_best_readable_version(self):
10312+        d = self.n.get_best_readable_version()
10313+        d.addCallback(lambda n2:
10314+            self.failUnlessEqual(n2, self.n))
10315+        return d
10316+
10317+    def test_get_size_of_best_version(self):
10318+        d = self.n.get_size_of_best_version()
10319+        d.addCallback(lambda size:
10320+            self.failUnlessEqual(size, len(common.TEST_DATA)))
10321+        return d
10322+
10323+
10324 # XXX extend these tests to show bad behavior of various kinds from servers:
10325 # raising exception from each remove_foo() method, for example
10326 
10327hunk ./src/allmydata/test/test_mutable.py 2
10328 
10329-import struct
10330+import os
10331 from cStringIO import StringIO
10332 from twisted.trial import unittest
10333 from twisted.internet import defer, reactor
10334hunk ./src/allmydata/test/test_mutable.py 8
10335 from allmydata import uri, client
10336 from allmydata.nodemaker import NodeMaker
10337-from allmydata.util import base32
10338+from allmydata.util import base32, consumer
10339 from allmydata.util.hashutil import tagged_hash, ssk_writekey_hash, \
10340      ssk_pubkey_fingerprint_hash
10341hunk ./src/allmydata/test/test_mutable.py 11
10342+from allmydata.util.deferredutil import gatherResults
10343 from allmydata.interfaces import IRepairResults, ICheckAndRepairResults, \
10344hunk ./src/allmydata/test/test_mutable.py 13
10345-     NotEnoughSharesError
10346+     NotEnoughSharesError, SDMF_VERSION, MDMF_VERSION
10347 from allmydata.monitor import Monitor
10348 from allmydata.test.common import ShouldFailMixin
10349 from allmydata.test.no_network import GridTestMixin
10350hunk ./src/allmydata/test/test_mutable.py 27
10351      NeedMoreDataError, UnrecoverableFileError, UncoordinatedWriteError, \
10352      NotEnoughServersError, CorruptShareError
10353 from allmydata.mutable.retrieve import Retrieve
10354-from allmydata.mutable.publish import Publish
10355+from allmydata.mutable.publish import Publish, MutableFileHandle, \
10356+                                      MutableData, \
10357+                                      DEFAULT_MAX_SEGMENT_SIZE
10358 from allmydata.mutable.servermap import ServerMap, ServermapUpdater
10359hunk ./src/allmydata/test/test_mutable.py 31
10360-from allmydata.mutable.layout import unpack_header, unpack_share
10361+from allmydata.mutable.layout import unpack_header, MDMFSlotReadProxy
10362 from allmydata.mutable.repairer import MustForceRepairError
10363 
10364 import allmydata.test.common_util as testutil
10365hunk ./src/allmydata/test/test_mutable.py 100
10366         self.storage = storage
10367         self.queries = 0
10368     def callRemote(self, methname, *args, **kwargs):
10369+        self.queries += 1
10370         def _call():
10371             meth = getattr(self, methname)
10372             return meth(*args, **kwargs)
10373hunk ./src/allmydata/test/test_mutable.py 107
10374         d = fireEventually()
10375         d.addCallback(lambda res: _call())
10376         return d
10377+
10378     def callRemoteOnly(self, methname, *args, **kwargs):
10379hunk ./src/allmydata/test/test_mutable.py 109
10380+        self.queries += 1
10381         d = self.callRemote(methname, *args, **kwargs)
10382         d.addBoth(lambda ignore: None)
10383         pass
10384hunk ./src/allmydata/test/test_mutable.py 157
10385             chr(ord(original[byte_offset]) ^ 0x01) +
10386             original[byte_offset+1:])
10387 
10388+def add_two(original, byte_offset):
10389+    # It isn't enough to simply flip the low bit of the version number,
10390+    # because 1 is a valid version number, so we flip the 0x02 bit instead.
10391+    return (original[:byte_offset] +
10392+            chr(ord(original[byte_offset]) ^ 0x02) +
10393+            original[byte_offset+1:])
10394+
10395 def corrupt(res, s, offset, shnums_to_corrupt=None, offset_offset=0):
10396     # if shnums_to_corrupt is None, corrupt all shares. Otherwise it is a
10397     # list of shnums to corrupt.
10398hunk ./src/allmydata/test/test_mutable.py 167
10399+    ds = []
10400     for peerid in s._peers:
10401         shares = s._peers[peerid]
10402         for shnum in shares:
10403hunk ./src/allmydata/test/test_mutable.py 175
10404                 and shnum not in shnums_to_corrupt):
10405                 continue
10406             data = shares[shnum]
10407-            (version,
10408-             seqnum,
10409-             root_hash,
10410-             IV,
10411-             k, N, segsize, datalen,
10412-             o) = unpack_header(data)
10413-            if isinstance(offset, tuple):
10414-                offset1, offset2 = offset
10415-            else:
10416-                offset1 = offset
10417-                offset2 = 0
10418-            if offset1 == "pubkey":
10419-                real_offset = 107
10420-            elif offset1 in o:
10421-                real_offset = o[offset1]
10422-            else:
10423-                real_offset = offset1
10424-            real_offset = int(real_offset) + offset2 + offset_offset
10425-            assert isinstance(real_offset, int), offset
10426-            shares[shnum] = flip_bit(data, real_offset)
10427-    return res
10428+            # We're feeding the reader all of the share data, so it
10429+            # won't need to use the rref that we didn't provide, nor the
10430+            # storage index that we didn't provide. We do this because
10431+            # the reader will work for both MDMF and SDMF.
10432+            reader = MDMFSlotReadProxy(None, None, shnum, data)
10433+            # We need to get the offsets for the next part.
10434+            d = reader.get_verinfo()
10435+            def _do_corruption(verinfo, data, shnum):
10436+                (seqnum,
10437+                 root_hash,
10438+                 IV,
10439+                 segsize,
10440+                 datalen,
10441+                 k, n, prefix, o) = verinfo
10442+                if isinstance(offset, tuple):
10443+                    offset1, offset2 = offset
10444+                else:
10445+                    offset1 = offset
10446+                    offset2 = 0
10447+                if offset1 == "pubkey" and IV:
10448+                    real_offset = 107
10449+                elif offset1 == "share_data" and not IV:
10450+                    real_offset = 107
10451+                elif offset1 in o:
10452+                    real_offset = o[offset1]
10453+                else:
10454+                    real_offset = offset1
10455+                real_offset = int(real_offset) + offset2 + offset_offset
10456+                assert isinstance(real_offset, int), offset
10457+                if offset1 == 0: # verbyte
10458+                    f = add_two
10459+                else:
10460+                    f = flip_bit
10461+                shares[shnum] = f(data, real_offset)
10462+            d.addCallback(_do_corruption, data, shnum)
10463+            ds.append(d)
10464+    dl = defer.DeferredList(ds)
10465+    dl.addCallback(lambda ignored: res)
10466+    return dl
10467 
10468 def make_storagebroker(s=None, num_peers=10):
10469     if not s:
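A quick sanity check of the verbyte special case in corrupt() above: flipping the low bit alone can turn one valid version number into the other, which is why offset 0 uses add_two rather than flip_bit.

    assert 0 ^ 0x01 == 1                     # version 0 becomes version 1: still valid
    assert (0 ^ 0x02, 1 ^ 0x02) == (2, 3)    # both invalid, so the corruption is detected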
10470hunk ./src/allmydata/test/test_mutable.py 256
10471             self.failUnlessEqual(len(shnums), 1)
10472         d.addCallback(_created)
10473         return d
10474+    test_create.timeout = 15
10475+
10476+
10477+    def test_create_mdmf(self):
10478+        d = self.nodemaker.create_mutable_file(version=MDMF_VERSION)
10479+        def _created(n):
10480+            self.failUnless(isinstance(n, MutableFileNode))
10481+            self.failUnlessEqual(n.get_storage_index(), n._storage_index)
10482+            sb = self.nodemaker.storage_broker
10483+            peer0 = sorted(sb.get_all_serverids())[0]
10484+            shnums = self._storage._peers[peer0].keys()
10485+            self.failUnlessEqual(len(shnums), 1)
10486+        d.addCallback(_created)
10487+        return d
10488+
10489 
10490     def test_serialize(self):
10491         n = MutableFileNode(None, None, {"k": 3, "n": 10}, None)
10492hunk ./src/allmydata/test/test_mutable.py 301
10493             d.addCallback(lambda smap: smap.dump(StringIO()))
10494             d.addCallback(lambda sio:
10495                           self.failUnless("3-of-10" in sio.getvalue()))
10496-            d.addCallback(lambda res: n.overwrite("contents 1"))
10497+            d.addCallback(lambda res: n.overwrite(MutableData("contents 1")))
10498             d.addCallback(lambda res: self.failUnlessIdentical(res, None))
10499             d.addCallback(lambda res: n.download_best_version())
10500             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 1"))
10501hunk ./src/allmydata/test/test_mutable.py 308
10502             d.addCallback(lambda res: n.get_size_of_best_version())
10503             d.addCallback(lambda size:
10504                           self.failUnlessEqual(size, len("contents 1")))
10505-            d.addCallback(lambda res: n.overwrite("contents 2"))
10506+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
10507             d.addCallback(lambda res: n.download_best_version())
10508             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 2"))
10509             d.addCallback(lambda res: n.get_servermap(MODE_WRITE))
10510hunk ./src/allmydata/test/test_mutable.py 312
10511-            d.addCallback(lambda smap: n.upload("contents 3", smap))
10512+            d.addCallback(lambda smap: n.upload(MutableData("contents 3"), smap))
10513             d.addCallback(lambda res: n.download_best_version())
10514             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 3"))
10515             d.addCallback(lambda res: n.get_servermap(MODE_ANYTHING))
10516hunk ./src/allmydata/test/test_mutable.py 324
10517             # mapupdate-to-retrieve data caching (i.e. make the shares larger
10518             # than the default readsize, which is 2000 bytes). A 15kB file
10519             # will have 5kB shares.
10520-            d.addCallback(lambda res: n.overwrite("large size file" * 1000))
10521+            d.addCallback(lambda res: n.overwrite(MutableData("large size file" * 1000)))
10522             d.addCallback(lambda res: n.download_best_version())
10523             d.addCallback(lambda res:
10524                           self.failUnlessEqual(res, "large size file" * 1000))
10525hunk ./src/allmydata/test/test_mutable.py 332
10526         d.addCallback(_created)
10527         return d
10528 
10529+
10530+    def test_upload_and_download_mdmf(self):
10531+        d = self.nodemaker.create_mutable_file(version=MDMF_VERSION)
10532+        def _created(n):
10533+            d = defer.succeed(None)
10534+            d.addCallback(lambda ignored:
10535+                n.get_servermap(MODE_READ))
10536+            def _then(servermap):
10537+                dumped = servermap.dump(StringIO())
10538+                self.failUnlessIn("3-of-10", dumped.getvalue())
10539+            d.addCallback(_then)
10540+            # Now overwrite the contents with some new contents. We want
10541+            # to make them big enough to force the file to be uploaded
10542+            # in more than one segment.
10543+            big_contents = "contents1" * 100000 # about 900 KiB
10544+            big_contents_uploadable = MutableData(big_contents)
10545+            d.addCallback(lambda ignored:
10546+                n.overwrite(big_contents_uploadable))
10547+            d.addCallback(lambda ignored:
10548+                n.download_best_version())
10549+            d.addCallback(lambda data:
10550+                self.failUnlessEqual(data, big_contents))
10551+            # Overwrite the contents again with some new contents. As
10552+            # before, they need to be big enough to force multiple
10553+            # segments, so that we make the downloader deal with
10554+            # multiple segments.
10555+            bigger_contents = "contents2" * 1000000 # about 9MiB
10556+            bigger_contents_uploadable = MutableData(bigger_contents)
10557+            d.addCallback(lambda ignored:
10558+                n.overwrite(bigger_contents_uploadable))
10559+            d.addCallback(lambda ignored:
10560+                n.download_best_version())
10561+            d.addCallback(lambda data:
10562+                self.failUnlessEqual(data, bigger_contents))
10563+            return d
10564+        d.addCallback(_created)
10565+        return d
10566+
10567+
10568+    def test_mdmf_write_count(self):
10569+        # Publishing an MDMF file should only cause one write for each
10570+        # share that is to be published. Otherwise, we introduce
10571+        # undesirable semantics that are a regression from SDMF
10572+        upload = MutableData("MDMF" * 100000) # about 400 KiB
10573+        d = self.nodemaker.create_mutable_file(upload,
10574+                                               version=MDMF_VERSION)
10575+        def _check_server_write_counts(ignored):
10576+            sb = self.nodemaker.storage_broker
10577+            peers = sb.test_servers.values()
10578+            for peer in peers:
10579+                self.failUnlessEqual(peer.queries, 1)
10580+        d.addCallback(_check_server_write_counts)
10581+        return d
10582+
10583+
10584     def test_create_with_initial_contents(self):
10585hunk ./src/allmydata/test/test_mutable.py 388
10586-        d = self.nodemaker.create_mutable_file("contents 1")
10587+        upload1 = MutableData("contents 1")
10588+        d = self.nodemaker.create_mutable_file(upload1)
10589         def _created(n):
10590             d = n.download_best_version()
10591             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 1"))
10592hunk ./src/allmydata/test/test_mutable.py 393
10593-            d.addCallback(lambda res: n.overwrite("contents 2"))
10594+            upload2 = MutableData("contents 2")
10595+            d.addCallback(lambda res: n.overwrite(upload2))
10596             d.addCallback(lambda res: n.download_best_version())
10597             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 2"))
10598             return d
10599hunk ./src/allmydata/test/test_mutable.py 400
10600         d.addCallback(_created)
10601         return d
10602+    test_create_with_initial_contents.timeout = 15
10603+
10604+
10605+    def test_create_mdmf_with_initial_contents(self):
10606+        initial_contents = "foobarbaz" * 131072 # about 1.1 MiB
10607+        initial_contents_uploadable = MutableData(initial_contents)
10608+        d = self.nodemaker.create_mutable_file(initial_contents_uploadable,
10609+                                               version=MDMF_VERSION)
10610+        def _created(n):
10611+            d = n.download_best_version()
10612+            d.addCallback(lambda data:
10613+                self.failUnlessEqual(data, initial_contents))
10614+            uploadable2 = MutableData(initial_contents + "foobarbaz")
10615+            d.addCallback(lambda ignored:
10616+                n.overwrite(uploadable2))
10617+            d.addCallback(lambda ignored:
10618+                n.download_best_version())
10619+            d.addCallback(lambda data:
10620+                self.failUnlessEqual(data, initial_contents +
10621+                                           "foobarbaz"))
10622+            return d
10623+        d.addCallback(_created)
10624+        return d
10625+    test_create_mdmf_with_initial_contents.timeout = 20
10626+
10627 
10628     def test_response_cache_memory_leak(self):
10629         d = self.nodemaker.create_mutable_file("contents")
10630hunk ./src/allmydata/test/test_mutable.py 451
10631             key = n.get_writekey()
10632             self.failUnless(isinstance(key, str), key)
10633             self.failUnlessEqual(len(key), 16) # AES key size
10634-            return data
10635+            return MutableData(data)
10636         d = self.nodemaker.create_mutable_file(_make_contents)
10637         def _created(n):
10638             return n.download_best_version()
10639hunk ./src/allmydata/test/test_mutable.py 459
10640         d.addCallback(lambda data2: self.failUnlessEqual(data2, data))
10641         return d
10642 
10643+
10644+    def test_create_mdmf_with_initial_contents_function(self):
10645+        data = "initial contents" * 100000
10646+        def _make_contents(n):
10647+            self.failUnless(isinstance(n, MutableFileNode))
10648+            key = n.get_writekey()
10649+            self.failUnless(isinstance(key, str), key)
10650+            self.failUnlessEqual(len(key), 16)
10651+            return MutableData(data)
10652+        d = self.nodemaker.create_mutable_file(_make_contents,
10653+                                               version=MDMF_VERSION)
10654+        d.addCallback(lambda n:
10655+            n.download_best_version())
10656+        d.addCallback(lambda data2:
10657+            self.failUnlessEqual(data2, data))
10658+        return d
10659+
10660+
10661     def test_create_with_too_large_contents(self):
10662         BIG = "a" * (self.OLD_MAX_SEGMENT_SIZE + 1)
10663hunk ./src/allmydata/test/test_mutable.py 479
10664-        d = self.nodemaker.create_mutable_file(BIG)
10665+        BIG_uploadable = MutableData(BIG)
10666+        d = self.nodemaker.create_mutable_file(BIG_uploadable)
10667         def _created(n):
10668hunk ./src/allmydata/test/test_mutable.py 482
10669-            d = n.overwrite(BIG)
10670+            other_BIG_uploadable = MutableData(BIG)
10671+            d = n.overwrite(other_BIG_uploadable)
10672             return d
10673         d.addCallback(_created)
10674         return d
10675hunk ./src/allmydata/test/test_mutable.py 497
10676 
10677     def test_modify(self):
10678         def _modifier(old_contents, servermap, first_time):
10679-            return old_contents + "line2"
10680+            new_contents = old_contents + "line2"
10681+            return new_contents
10682         def _non_modifier(old_contents, servermap, first_time):
10683             return old_contents
10684         def _none_modifier(old_contents, servermap, first_time):
10685hunk ./src/allmydata/test/test_mutable.py 506
10686         def _error_modifier(old_contents, servermap, first_time):
10687             raise ValueError("oops")
10688         def _toobig_modifier(old_contents, servermap, first_time):
10689-            return "b" * (self.OLD_MAX_SEGMENT_SIZE+1)
10690+            new_content = "b" * (self.OLD_MAX_SEGMENT_SIZE + 1)
10691+            return new_content
10692         calls = []
10693         def _ucw_error_modifier(old_contents, servermap, first_time):
10694             # simulate an UncoordinatedWriteError once
10695hunk ./src/allmydata/test/test_mutable.py 514
10696             calls.append(1)
10697             if len(calls) <= 1:
10698                 raise UncoordinatedWriteError("simulated")
10699-            return old_contents + "line3"
10700+            new_contents = old_contents + "line3"
10701+            return new_contents
10702         def _ucw_error_non_modifier(old_contents, servermap, first_time):
10703             # simulate an UncoordinatedWriteError once, and don't actually
10704             # modify the contents on subsequent invocations
10705hunk ./src/allmydata/test/test_mutable.py 524
10706                 raise UncoordinatedWriteError("simulated")
10707             return old_contents
10708 
10709-        d = self.nodemaker.create_mutable_file("line1")
10710+        initial_contents = "line1"
10711+        d = self.nodemaker.create_mutable_file(MutableData(initial_contents))
10712         def _created(n):
10713             d = n.modify(_modifier)
10714             d.addCallback(lambda res: n.download_best_version())
10715hunk ./src/allmydata/test/test_mutable.py 582
10716             return d
10717         d.addCallback(_created)
10718         return d
10719+    test_modify.timeout = 15
10720+
10721 
10722     def test_modify_backoffer(self):
10723         def _modifier(old_contents, servermap, first_time):
10724hunk ./src/allmydata/test/test_mutable.py 609
10725         giveuper._delay = 0.1
10726         giveuper.factor = 1
10727 
10728-        d = self.nodemaker.create_mutable_file("line1")
10729+        d = self.nodemaker.create_mutable_file(MutableData("line1"))
10730         def _created(n):
10731             d = n.modify(_modifier)
10732             d.addCallback(lambda res: n.download_best_version())
10733hunk ./src/allmydata/test/test_mutable.py 659
10734             d.addCallback(lambda smap: smap.dump(StringIO()))
10735             d.addCallback(lambda sio:
10736                           self.failUnless("3-of-10" in sio.getvalue()))
10737-            d.addCallback(lambda res: n.overwrite("contents 1"))
10738+            d.addCallback(lambda res: n.overwrite(MutableData("contents 1")))
10739             d.addCallback(lambda res: self.failUnlessIdentical(res, None))
10740             d.addCallback(lambda res: n.download_best_version())
10741             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 1"))
10742hunk ./src/allmydata/test/test_mutable.py 663
10743-            d.addCallback(lambda res: n.overwrite("contents 2"))
10744+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
10745             d.addCallback(lambda res: n.download_best_version())
10746             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 2"))
10747             d.addCallback(lambda res: n.get_servermap(MODE_WRITE))
10748hunk ./src/allmydata/test/test_mutable.py 667
10749-            d.addCallback(lambda smap: n.upload("contents 3", smap))
10750+            d.addCallback(lambda smap: n.upload(MutableData("contents 3"), smap))
10751             d.addCallback(lambda res: n.download_best_version())
10752             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 3"))
10753             d.addCallback(lambda res: n.get_servermap(MODE_ANYTHING))
10754hunk ./src/allmydata/test/test_mutable.py 680
10755         return d
10756 
10757 
10758-class MakeShares(unittest.TestCase):
10759-    def test_encrypt(self):
10760-        nm = make_nodemaker()
10761-        CONTENTS = "some initial contents"
10762-        d = nm.create_mutable_file(CONTENTS)
10763-        def _created(fn):
10764-            p = Publish(fn, nm.storage_broker, None)
10765-            p.salt = "SALT" * 4
10766-            p.readkey = "\x00" * 16
10767-            p.newdata = CONTENTS
10768-            p.required_shares = 3
10769-            p.total_shares = 10
10770-            p.setup_encoding_parameters()
10771-            return p._encrypt_and_encode()
10772+    def test_size_after_servermap_update(self):
10773+        # a mutable file node should have something to say about how big
10774+        # it is after a servermap update is performed, since this tells
10775+        # us how large the best version of that mutable file is.
10776+        d = self.nodemaker.create_mutable_file()
10777+        def _created(n):
10778+            self.n = n
10779+            return n.get_servermap(MODE_READ)
10780+        d.addCallback(_created)
10781+        d.addCallback(lambda ignored:
10782+            self.failUnlessEqual(self.n.get_size(), 0))
10783+        d.addCallback(lambda ignored:
10784+            self.n.overwrite(MutableData("foobarbaz")))
10785+        d.addCallback(lambda ignored:
10786+            self.failUnlessEqual(self.n.get_size(), 9))
10787+        d.addCallback(lambda ignored:
10788+            self.nodemaker.create_mutable_file(MutableData("foobarbaz")))
10789+        d.addCallback(_created)
10790+        d.addCallback(lambda ignored:
10791+            self.failUnlessEqual(self.n.get_size(), 9))
10792+        return d
10793+
10794+
10795+class PublishMixin:
10796+    def publish_one(self):
10797+        # publish a file and create shares, which can then be manipulated
10798+        # later.
10799+        self.CONTENTS = "New contents go here" * 1000
10800+        self.uploadable = MutableData(self.CONTENTS)
10801+        self._storage = FakeStorage()
10802+        self._nodemaker = make_nodemaker(self._storage)
10803+        self._storage_broker = self._nodemaker.storage_broker
10804+        d = self._nodemaker.create_mutable_file(self.uploadable)
10805+        def _created(node):
10806+            self._fn = node
10807+            self._fn2 = self._nodemaker.create_from_cap(node.get_uri())
10808         d.addCallback(_created)
10809hunk ./src/allmydata/test/test_mutable.py 717
10810-        def _done(shares_and_shareids):
10811-            (shares, share_ids) = shares_and_shareids
10812-            self.failUnlessEqual(len(shares), 10)
10813-            for sh in shares:
10814-                self.failUnless(isinstance(sh, str))
10815-                self.failUnlessEqual(len(sh), 7)
10816-            self.failUnlessEqual(len(share_ids), 10)
10817-        d.addCallback(_done)
10818         return d
10819 
10820hunk ./src/allmydata/test/test_mutable.py 719
10821-    def test_generate(self):
10822-        nm = make_nodemaker()
10823-        CONTENTS = "some initial contents"
10824-        d = nm.create_mutable_file(CONTENTS)
10825-        def _created(fn):
10826-            self._fn = fn
10827-            p = Publish(fn, nm.storage_broker, None)
10828-            self._p = p
10829-            p.newdata = CONTENTS
10830-            p.required_shares = 3
10831-            p.total_shares = 10
10832-            p.setup_encoding_parameters()
10833-            p._new_seqnum = 3
10834-            p.salt = "SALT" * 4
10835-            # make some fake shares
10836-            shares_and_ids = ( ["%07d" % i for i in range(10)], range(10) )
10837-            p._privkey = fn.get_privkey()
10838-            p._encprivkey = fn.get_encprivkey()
10839-            p._pubkey = fn.get_pubkey()
10840-            return p._generate_shares(shares_and_ids)
10841+    def publish_mdmf(self):
10842+        # like publish_one, except that the result is guaranteed to be
10843+        # an MDMF file.
10844+        # self.CONTENTS should have more than one segment.
10845+        self.CONTENTS = "This is an MDMF file" * 100000
10846+        self.uploadable = MutableData(self.CONTENTS)
10847+        self._storage = FakeStorage()
10848+        self._nodemaker = make_nodemaker(self._storage)
10849+        self._storage_broker = self._nodemaker.storage_broker
10850+        d = self._nodemaker.create_mutable_file(self.uploadable, version=MDMF_VERSION)
10851+        def _created(node):
10852+            self._fn = node
10853+            self._fn2 = self._nodemaker.create_from_cap(node.get_uri())
10854         d.addCallback(_created)
10855hunk ./src/allmydata/test/test_mutable.py 733
10856-        def _generated(res):
10857-            p = self._p
10858-            final_shares = p.shares
10859-            root_hash = p.root_hash
10860-            self.failUnlessEqual(len(root_hash), 32)
10861-            self.failUnless(isinstance(final_shares, dict))
10862-            self.failUnlessEqual(len(final_shares), 10)
10863-            self.failUnlessEqual(sorted(final_shares.keys()), range(10))
10864-            for i,sh in final_shares.items():
10865-                self.failUnless(isinstance(sh, str))
10866-                # feed the share through the unpacker as a sanity-check
10867-                pieces = unpack_share(sh)
10868-                (u_seqnum, u_root_hash, IV, k, N, segsize, datalen,
10869-                 pubkey, signature, share_hash_chain, block_hash_tree,
10870-                 share_data, enc_privkey) = pieces
10871-                self.failUnlessEqual(u_seqnum, 3)
10872-                self.failUnlessEqual(u_root_hash, root_hash)
10873-                self.failUnlessEqual(k, 3)
10874-                self.failUnlessEqual(N, 10)
10875-                self.failUnlessEqual(segsize, 21)
10876-                self.failUnlessEqual(datalen, len(CONTENTS))
10877-                self.failUnlessEqual(pubkey, p._pubkey.serialize())
10878-                sig_material = struct.pack(">BQ32s16s BBQQ",
10879-                                           0, p._new_seqnum, root_hash, IV,
10880-                                           k, N, segsize, datalen)
10881-                self.failUnless(p._pubkey.verify(sig_material, signature))
10882-                #self.failUnlessEqual(signature, p._privkey.sign(sig_material))
10883-                self.failUnless(isinstance(share_hash_chain, dict))
10884-                self.failUnlessEqual(len(share_hash_chain), 4) # ln2(10)++
10885-                for shnum,share_hash in share_hash_chain.items():
10886-                    self.failUnless(isinstance(shnum, int))
10887-                    self.failUnless(isinstance(share_hash, str))
10888-                    self.failUnlessEqual(len(share_hash), 32)
10889-                self.failUnless(isinstance(block_hash_tree, list))
10890-                self.failUnlessEqual(len(block_hash_tree), 1) # very small tree
10891-                self.failUnlessEqual(IV, "SALT"*4)
10892-                self.failUnlessEqual(len(share_data), len("%07d" % 1))
10893-                self.failUnlessEqual(enc_privkey, self._fn.get_encprivkey())
10894-        d.addCallback(_generated)
10895         return d
10896 
10897hunk ./src/allmydata/test/test_mutable.py 735
10898-    # TODO: when we publish to 20 peers, we should get one share per peer on 10
10899-    # when we publish to 3 peers, we should get either 3 or 4 shares per peer
10900-    # when we publish to zero peers, we should get a NotEnoughSharesError
10901 
10902hunk ./src/allmydata/test/test_mutable.py 736
10903-class PublishMixin:
10904-    def publish_one(self):
10905-        # publish a file and create shares, which can then be manipulated
10906-        # later.
10907-        self.CONTENTS = "New contents go here" * 1000
10908+    def publish_sdmf(self):
10909+        # like publish_one, except that the result is guaranteed to be
10910+        # an SDMF file
10911+        self.CONTENTS = "This is an SDMF file" * 1000
10912+        self.uploadable = MutableData(self.CONTENTS)
10913         self._storage = FakeStorage()
10914         self._nodemaker = make_nodemaker(self._storage)
10915         self._storage_broker = self._nodemaker.storage_broker
10916hunk ./src/allmydata/test/test_mutable.py 744
10917-        d = self._nodemaker.create_mutable_file(self.CONTENTS)
10918+        d = self._nodemaker.create_mutable_file(self.uploadable, version=SDMF_VERSION)
10919         def _created(node):
10920             self._fn = node
10921             self._fn2 = self._nodemaker.create_from_cap(node.get_uri())
10922hunk ./src/allmydata/test/test_mutable.py 751
10923         d.addCallback(_created)
10924         return d
10925 
10926-    def publish_multiple(self):
10927+
10928+    def publish_multiple(self, version=0):
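+        # version=0 is SDMF (the default); pass version=MDMF_VERSION to
+        # build the multiple versions as MDMF files instead.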
10929         self.CONTENTS = ["Contents 0",
10930                          "Contents 1",
10931                          "Contents 2",
10932hunk ./src/allmydata/test/test_mutable.py 758
10933                          "Contents 3a",
10934                          "Contents 3b"]
10935+        self.uploadables = [MutableData(d) for d in self.CONTENTS]
10936         self._copied_shares = {}
10937         self._storage = FakeStorage()
10938         self._nodemaker = make_nodemaker(self._storage)
10939hunk ./src/allmydata/test/test_mutable.py 762
10940-        d = self._nodemaker.create_mutable_file(self.CONTENTS[0]) # seqnum=1
10941+        d = self._nodemaker.create_mutable_file(self.uploadables[0], version=version) # seqnum=1
10942         def _created(node):
10943             self._fn = node
10944             # now create multiple versions of the same file, and accumulate
10945hunk ./src/allmydata/test/test_mutable.py 769
10946             # their shares, so we can mix and match them later.
10947             d = defer.succeed(None)
10948             d.addCallback(self._copy_shares, 0)
10949-            d.addCallback(lambda res: node.overwrite(self.CONTENTS[1])) #s2
10950+            d.addCallback(lambda res: node.overwrite(self.uploadables[1])) #s2
10951             d.addCallback(self._copy_shares, 1)
10952hunk ./src/allmydata/test/test_mutable.py 771
10953-            d.addCallback(lambda res: node.overwrite(self.CONTENTS[2])) #s3
10954+            d.addCallback(lambda res: node.overwrite(self.uploadables[2])) #s3
10955             d.addCallback(self._copy_shares, 2)
10956hunk ./src/allmydata/test/test_mutable.py 773
10957-            d.addCallback(lambda res: node.overwrite(self.CONTENTS[3])) #s4a
10958+            d.addCallback(lambda res: node.overwrite(self.uploadables[3])) #s4a
10959             d.addCallback(self._copy_shares, 3)
10960             # now we replace all the shares with version s3, and upload a new
10961             # version to get s4b.
10962hunk ./src/allmydata/test/test_mutable.py 779
10963             rollback = dict([(i,2) for i in range(10)])
10964             d.addCallback(lambda res: self._set_versions(rollback))
10965-            d.addCallback(lambda res: node.overwrite(self.CONTENTS[4])) #s4b
10966+            d.addCallback(lambda res: node.overwrite(self.uploadables[4])) #s4b
10967             d.addCallback(self._copy_shares, 4)
10968             # we leave the storage in state 4
10969             return d
10970hunk ./src/allmydata/test/test_mutable.py 786
10971         d.addCallback(_created)
10972         return d
10973 
10974+
10975     def _copy_shares(self, ignored, index):
10976         shares = self._storage._peers
10977         # we need a deep copy
10978hunk ./src/allmydata/test/test_mutable.py 810
10979                     shares[peerid][shnum] = oldshares[index][peerid][shnum]
10980 
10981 
10982+
10983+
10984 class Servermap(unittest.TestCase, PublishMixin):
10985     def setUp(self):
10986         return self.publish_one()
10987hunk ./src/allmydata/test/test_mutable.py 816
10988 
10989-    def make_servermap(self, mode=MODE_CHECK, fn=None, sb=None):
10990+    def make_servermap(self, mode=MODE_CHECK, fn=None, sb=None,
10991+                       update_range=None):
10992         if fn is None:
10993             fn = self._fn
10994         if sb is None:
10995hunk ./src/allmydata/test/test_mutable.py 823
10996             sb = self._storage_broker
10997         smu = ServermapUpdater(fn, sb, Monitor(),
10998-                               ServerMap(), mode)
10999+                               ServerMap(), mode, update_range=update_range)
11000         d = smu.update()
11001         return d
11002 
11003hunk ./src/allmydata/test/test_mutable.py 889
11004         # create a new file, which is large enough to knock the privkey out
11005         # of the early part of the file
11006         LARGE = "These are Larger contents" * 200 # about 5KB
11007-        d.addCallback(lambda res: self._nodemaker.create_mutable_file(LARGE))
11008+        LARGE_uploadable = MutableData(LARGE)
11009+        d.addCallback(lambda res: self._nodemaker.create_mutable_file(LARGE_uploadable))
11010         def _created(large_fn):
11011             large_fn2 = self._nodemaker.create_from_cap(large_fn.get_uri())
11012             return self.make_servermap(MODE_WRITE, large_fn2)
11013hunk ./src/allmydata/test/test_mutable.py 898
11014         d.addCallback(lambda sm: self.failUnlessOneRecoverable(sm, 10))
11015         return d
11016 
11017+
11018     def test_mark_bad(self):
11019         d = defer.succeed(None)
11020         ms = self.make_servermap
11021hunk ./src/allmydata/test/test_mutable.py 944
11022         self._storage._peers = {} # delete all shares
11023         ms = self.make_servermap
11024         d = defer.succeed(None)
11025-
11026+#
11027         d.addCallback(lambda res: ms(mode=MODE_CHECK))
11028         d.addCallback(lambda sm: self.failUnlessNoneRecoverable(sm))
11029 
11030hunk ./src/allmydata/test/test_mutable.py 996
11031         return d
11032 
11033 
11034+    def test_servermapupdater_finds_mdmf_files(self):
11035+        # setUp already published an MDMF file for us. We just need to
11036+        # make sure that when we run the ServermapUpdater, the file is
11037+        # reported to have one recoverable version.
11038+        d = defer.succeed(None)
11039+        d.addCallback(lambda ignored:
11040+            self.publish_mdmf())
11041+        d.addCallback(lambda ignored:
11042+            self.make_servermap(mode=MODE_CHECK))
11043+        # Calling make_servermap also updates the servermap in the mode
11044+        # that we specify, so we just need to see what it says.
11045+        def _check_servermap(sm):
11046+            self.failUnlessEqual(len(sm.recoverable_versions()), 1)
11047+        d.addCallback(_check_servermap)
11048+        return d
11049+
11050+
11051+    def test_fetch_update(self):
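+        # A MODE_WRITE servermap update with an update_range should also
+        # fetch the data needed to modify that range in place, and record
+        # it in the servermap for every share it finds.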
11052+        d = defer.succeed(None)
11053+        d.addCallback(lambda ignored:
11054+            self.publish_mdmf())
11055+        d.addCallback(lambda ignored:
11056+            self.make_servermap(mode=MODE_WRITE, update_range=(1, 2)))
11057+        def _check_servermap(sm):
11058+            # 10 shares
11059+            self.failUnlessEqual(len(sm.update_data), 10)
11060+            # one version
11061+            for data in sm.update_data.itervalues():
11062+                self.failUnlessEqual(len(data), 1)
11063+        d.addCallback(_check_servermap)
11064+        return d
11065+
11066+
11067+    def test_servermapupdater_finds_sdmf_files(self):
11068+        d = defer.succeed(None)
11069+        d.addCallback(lambda ignored:
11070+            self.publish_sdmf())
11071+        d.addCallback(lambda ignored:
11072+            self.make_servermap(mode=MODE_CHECK))
11073+        d.addCallback(lambda servermap:
11074+            self.failUnlessEqual(len(servermap.recoverable_versions()), 1))
11075+        return d
11076+
11077 
11078 class Roundtrip(unittest.TestCase, testutil.ShouldFailMixin, PublishMixin):
11079     def setUp(self):
11080hunk ./src/allmydata/test/test_mutable.py 1079
11081         if version is None:
11082             version = servermap.best_recoverable_version()
11083         r = Retrieve(self._fn, servermap, version)
11084-        return r.download()
11085+        c = consumer.MemoryConsumer()
11086+        d = r.download(consumer=c)
11087+        d.addCallback(lambda mc: "".join(mc.chunks))
11088+        return d
11089+
11090 
11091     def test_basic(self):
11092         d = self.make_servermap()
11093hunk ./src/allmydata/test/test_mutable.py 1160
11094         return d
11095     test_no_servers_download.timeout = 15
11096 
11097+
11098     def _test_corrupt_all(self, offset, substring,
11099hunk ./src/allmydata/test/test_mutable.py 1162
11100-                          should_succeed=False, corrupt_early=True,
11101-                          failure_checker=None):
11102+                          should_succeed=False,
11103+                          corrupt_early=True,
11104+                          failure_checker=None,
11105+                          fetch_privkey=False):
11106         d = defer.succeed(None)
11107         if corrupt_early:
11108             d.addCallback(corrupt, self._storage, offset)
11109hunk ./src/allmydata/test/test_mutable.py 1182
11110                     self.failUnlessIn(substring, "".join(allproblems))
11111                 return servermap
11112             if should_succeed:
11113-                d1 = self._fn.download_version(servermap, ver)
11114+                d1 = self._fn.download_version(servermap, ver,
11115+                                               fetch_privkey)
11116                 d1.addCallback(lambda new_contents:
11117                                self.failUnlessEqual(new_contents, self.CONTENTS))
11118             else:
11119hunk ./src/allmydata/test/test_mutable.py 1190
11120                 d1 = self.shouldFail(NotEnoughSharesError,
11121                                      "_corrupt_all(offset=%s)" % (offset,),
11122                                      substring,
11123-                                     self._fn.download_version, servermap, ver)
11124+                                     self._fn.download_version, servermap,
11125+                                                                ver,
11126+                                                                fetch_privkey)
11127             if failure_checker:
11128                 d1.addCallback(failure_checker)
11129             d1.addCallback(lambda res: servermap)
11130hunk ./src/allmydata/test/test_mutable.py 1201
11131         return d
11132 
11133     def test_corrupt_all_verbyte(self):
11134-        # when the version byte is not 0, we hit an UnknownVersionError error
11135-        # in unpack_share().
11136+        # when the version byte is not 0 or 1, we hit an
11137+        # UnknownVersionError in unpack_share().
11138         d = self._test_corrupt_all(0, "UnknownVersionError")
11139         def _check_servermap(servermap):
11140             # and the dump should mention the problems
11141hunk ./src/allmydata/test/test_mutable.py 1208
11142             s = StringIO()
11143             dump = servermap.dump(s).getvalue()
11144-            self.failUnless("10 PROBLEMS" in dump, dump)
11145+            self.failUnless("30 PROBLEMS" in dump, dump)
11146         d.addCallback(_check_servermap)
11147         return d
11148 
11149hunk ./src/allmydata/test/test_mutable.py 1278
11150         return self._test_corrupt_all("enc_privkey", None, should_succeed=True)
11151 
11152 
11153+    def test_corrupt_all_encprivkey_late(self):
11154+        # this should work for the same reason as above, but we corrupt
11155+        # after the servermap update to exercise the error handling
11156+        # code.
11157+        # We need to remove the privkey from the node, or the retrieve
11158+        # process won't know to update it.
11159+        self._fn._privkey = None
11160+        return self._test_corrupt_all("enc_privkey",
11161+                                      None, # this shouldn't fail
11162+                                      should_succeed=True,
11163+                                      corrupt_early=False,
11164+                                      fetch_privkey=True)
11165+
11166+
11167     def test_corrupt_all_seqnum_late(self):
11168         # corrupting the seqnum between mapupdate and retrieve should result
11169         # in NotEnoughSharesError, since each share will look invalid
11170hunk ./src/allmydata/test/test_mutable.py 1298
11171         def _check(res):
11172             f = res[0]
11173             self.failUnless(f.check(NotEnoughSharesError))
11174-            self.failUnless("someone wrote to the data since we read the servermap" in str(f))
11175+            self.failUnless("uncoordinated write" in str(f))
11176         return self._test_corrupt_all(1, "ran out of peers",
11177                                       corrupt_early=False,
11178                                       failure_checker=_check)
11179hunk ./src/allmydata/test/test_mutable.py 1342
11180                             in str(servermap.problems[0]))
11181             ver = servermap.best_recoverable_version()
11182             r = Retrieve(self._fn, servermap, ver)
11183-            return r.download()
11184+            c = consumer.MemoryConsumer()
11185+            return r.download(c)
11186         d.addCallback(_do_retrieve)
11187hunk ./src/allmydata/test/test_mutable.py 1345
11188+        d.addCallback(lambda mc: "".join(mc.chunks))
11189         d.addCallback(lambda new_contents:
11190                       self.failUnlessEqual(new_contents, self.CONTENTS))
11191         return d
11192hunk ./src/allmydata/test/test_mutable.py 1350
11193 
11194-    def test_corrupt_some(self):
11195-        # corrupt the data of first five shares (so the servermap thinks
11196-        # they're good but retrieve marks them as bad), so that the
11197-        # MODE_READ set of 6 will be insufficient, forcing node.download to
11198-        # retry with more servers.
11199-        corrupt(None, self._storage, "share_data", range(5))
11200-        d = self.make_servermap()
11201+
11202+    def _test_corrupt_some(self, offset, mdmf=False):
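+        # Corrupt the given field in the first five shares, then confirm
+        # that the file is still downloadable: retrieve should mark the
+        # bad shares and fall back to the remaining good ones.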
11203+        if mdmf:
11204+            d = self.publish_mdmf()
11205+        else:
11206+            d = defer.succeed(None)
11207+        d.addCallback(lambda ignored:
11208+            corrupt(None, self._storage, offset, range(5)))
11209+        d.addCallback(lambda ignored:
11210+            self.make_servermap())
11211         def _do_retrieve(servermap):
11212             ver = servermap.best_recoverable_version()
11213             self.failUnless(ver)
11214hunk ./src/allmydata/test/test_mutable.py 1366
11215             return self._fn.download_best_version()
11216         d.addCallback(_do_retrieve)
11217         d.addCallback(lambda new_contents:
11218-                      self.failUnlessEqual(new_contents, self.CONTENTS))
11219+            self.failUnlessEqual(new_contents, self.CONTENTS))
11220         return d
11221 
11222hunk ./src/allmydata/test/test_mutable.py 1369
11223+
11224+    def test_corrupt_some(self):
11225+        # corrupt the data of first five shares (so the servermap thinks
11226+        # they're good but retrieve marks them as bad), so that the
11227+        # MODE_READ set of 6 will be insufficient, forcing node.download to
11228+        # retry with more servers.
11229+        return self._test_corrupt_some("share_data")
11230+
11231+
11232     def test_download_fails(self):
11233hunk ./src/allmydata/test/test_mutable.py 1379
11234-        corrupt(None, self._storage, "signature")
11235-        d = self.shouldFail(UnrecoverableFileError, "test_download_anyway",
11236+        d = corrupt(None, self._storage, "signature")
11237+        d.addCallback(lambda ignored:
11238+            self.shouldFail(UnrecoverableFileError, "test_download_anyway",
11239                             "no recoverable versions",
11240hunk ./src/allmydata/test/test_mutable.py 1383
11241-                            self._fn.download_best_version)
11242+                            self._fn.download_best_version))
11243         return d
11244 
11245 
11246hunk ./src/allmydata/test/test_mutable.py 1387
11247+
11248+    def test_corrupt_mdmf_block_hash_tree(self):
11249+        d = self.publish_mdmf()
11250+        d.addCallback(lambda ignored:
11251+            self._test_corrupt_all(("block_hash_tree", 12 * 32),
11252+                                   "block hash tree failure",
11253+                                   corrupt_early=True,
11254+                                   should_succeed=False))
11255+        return d
11256+
11257+
11258+    def test_corrupt_mdmf_block_hash_tree_late(self):
11259+        d = self.publish_mdmf()
11260+        d.addCallback(lambda ignored:
11261+            self._test_corrupt_all(("block_hash_tree", 12 * 32),
11262+                                   "block hash tree failure",
11263+                                   corrupt_early=False,
11264+                                   should_succeed=False))
11265+        return d
11266+
11267+
11268+    def test_corrupt_mdmf_share_data(self):
11269+        d = self.publish_mdmf()
11270+        d.addCallback(lambda ignored:
11271+            # TODO: Find out what the block size is and corrupt a
11272+            # specific block, rather than just guessing.
11273+            self._test_corrupt_all(("share_data", 12 * 40),
11274+                                    "block hash tree failure",
11275+                                    corrupt_early=True,
11276+                                    should_succeed=False))
11277+        return d
11278+
11279+
11280+    def test_corrupt_some_mdmf(self):
11281+        return self._test_corrupt_some(("share_data", 12 * 40),
11282+                                       mdmf=True)
11283+
11284+
11285 class CheckerMixin:
11286     def check_good(self, r, where):
11287         self.failUnless(r.is_healthy(), where)
11288hunk ./src/allmydata/test/test_mutable.py 1455
11289         d.addCallback(self.check_good, "test_check_good")
11290         return d
11291 
11292+    def test_check_mdmf_good(self):
11293+        d = self.publish_mdmf()
11294+        d.addCallback(lambda ignored:
11295+            self._fn.check(Monitor()))
11296+        d.addCallback(self.check_good, "test_check_mdmf_good")
11297+        return d
11298+
11299     def test_check_no_shares(self):
11300         for shares in self._storage._peers.values():
11301             shares.clear()
11302hunk ./src/allmydata/test/test_mutable.py 1469
11303         d.addCallback(self.check_bad, "test_check_no_shares")
11304         return d
11305 
11306+    def test_check_mdmf_no_shares(self):
11307+        d = self.publish_mdmf()
11308+        def _then(ignored):
11309+            for share in self._storage._peers.values():
11310+                share.clear()
11311+        d.addCallback(_then)
11312+        d.addCallback(lambda ignored:
11313+            self._fn.check(Monitor()))
11314+        d.addCallback(self.check_bad, "test_check_mdmf_no_shares")
11315+        return d
11316+
11317     def test_check_not_enough_shares(self):
11318         for shares in self._storage._peers.values():
11319             for shnum in shares.keys():
11320hunk ./src/allmydata/test/test_mutable.py 1489
11321         d.addCallback(self.check_bad, "test_check_not_enough_shares")
11322         return d
11323 
11324+    def test_check_mdmf_not_enough_shares(self):
11325+        d = self.publish_mdmf()
11326+        def _then(ignored):
11327+            for shares in self._storage._peers.values():
11328+                for shnum in shares.keys():
11329+                    if shnum > 0:
11330+                        del shares[shnum]
11331+        d.addCallback(_then)
11332+        d.addCallback(lambda ignored:
11333+            self._fn.check(Monitor()))
11334+        d.addCallback(self.check_bad, "test_check_mdmf_not_enough_shares")
11335+        return d
11336+
11337+
11338     def test_check_all_bad_sig(self):
11339hunk ./src/allmydata/test/test_mutable.py 1504
11340-        corrupt(None, self._storage, 1) # bad sig
11341-        d = self._fn.check(Monitor())
11342+        d = corrupt(None, self._storage, 1) # bad sig
11343+        d.addCallback(lambda ignored:
11344+            self._fn.check(Monitor()))
11345         d.addCallback(self.check_bad, "test_check_all_bad_sig")
11346         return d
11347 
11348hunk ./src/allmydata/test/test_mutable.py 1510
11349+    def test_check_mdmf_all_bad_sig(self):
11350+        d = self.publish_mdmf()
11351+        d.addCallback(lambda ignored:
11352+            corrupt(None, self._storage, 1))
11353+        d.addCallback(lambda ignored:
11354+            self._fn.check(Monitor()))
11355+        d.addCallback(self.check_bad, "test_check_mdmf_all_bad_sig")
11356+        return d
11357+
11358     def test_check_all_bad_blocks(self):
11359hunk ./src/allmydata/test/test_mutable.py 1520
11360-        corrupt(None, self._storage, "share_data", [9]) # bad blocks
11361+        d = corrupt(None, self._storage, "share_data", [9]) # bad blocks
11362         # the Checker won't notice this.. it doesn't look at actual data
11363hunk ./src/allmydata/test/test_mutable.py 1522
11364-        d = self._fn.check(Monitor())
11365+        d.addCallback(lambda ignored:
11366+            self._fn.check(Monitor()))
11367         d.addCallback(self.check_good, "test_check_all_bad_blocks")
11368         return d
11369 
11370hunk ./src/allmydata/test/test_mutable.py 1527
11371+
11372+    def test_check_mdmf_all_bad_blocks(self):
11373+        d = self.publish_mdmf()
11374+        d.addCallback(lambda ignored:
11375+            corrupt(None, self._storage, "share_data"))
11376+        d.addCallback(lambda ignored:
11377+            self._fn.check(Monitor()))
11378+        d.addCallback(self.check_good, "test_check_mdmf_all_bad_blocks")
11379+        return d
11380+
11381     def test_verify_good(self):
11382         d = self._fn.check(Monitor(), verify=True)
11383         d.addCallback(self.check_good, "test_verify_good")
11384hunk ./src/allmydata/test/test_mutable.py 1541
11385         return d
11386+    test_verify_good.timeout = 15
11387 
11388     def test_verify_all_bad_sig(self):
11389hunk ./src/allmydata/test/test_mutable.py 1544
11390-        corrupt(None, self._storage, 1) # bad sig
11391-        d = self._fn.check(Monitor(), verify=True)
11392+        d = corrupt(None, self._storage, 1) # bad sig
11393+        d.addCallback(lambda ignored:
11394+            self._fn.check(Monitor(), verify=True))
11395         d.addCallback(self.check_bad, "test_verify_all_bad_sig")
11396         return d
11397 
11398hunk ./src/allmydata/test/test_mutable.py 1551
11399     def test_verify_one_bad_sig(self):
11400-        corrupt(None, self._storage, 1, [9]) # bad sig
11401-        d = self._fn.check(Monitor(), verify=True)
11402+        d = corrupt(None, self._storage, 1, [9]) # bad sig
11403+        d.addCallback(lambda ignored:
11404+            self._fn.check(Monitor(), verify=True))
11405         d.addCallback(self.check_bad, "test_verify_one_bad_sig")
11406         return d
11407 
11408hunk ./src/allmydata/test/test_mutable.py 1558
11409     def test_verify_one_bad_block(self):
11410-        corrupt(None, self._storage, "share_data", [9]) # bad blocks
11411+        d = corrupt(None, self._storage, "share_data", [9]) # bad blocks
11412         # the Verifier *will* notice this, since it examines every byte
11413hunk ./src/allmydata/test/test_mutable.py 1560
11414-        d = self._fn.check(Monitor(), verify=True)
11415+        d.addCallback(lambda ignored:
11416+            self._fn.check(Monitor(), verify=True))
11417         d.addCallback(self.check_bad, "test_verify_one_bad_block")
11418         d.addCallback(self.check_expected_failure,
11419                       CorruptShareError, "block hash tree failure",
11420hunk ./src/allmydata/test/test_mutable.py 1569
11421         return d
11422 
11423     def test_verify_one_bad_sharehash(self):
11424-        corrupt(None, self._storage, "share_hash_chain", [9], 5)
11425-        d = self._fn.check(Monitor(), verify=True)
11426+        d = corrupt(None, self._storage, "share_hash_chain", [9], 5)
11427+        d.addCallback(lambda ignored:
11428+            self._fn.check(Monitor(), verify=True))
11429         d.addCallback(self.check_bad, "test_verify_one_bad_sharehash")
11430         d.addCallback(self.check_expected_failure,
11431                       CorruptShareError, "corrupt hashes",
11432hunk ./src/allmydata/test/test_mutable.py 1579
11433         return d
11434 
11435     def test_verify_one_bad_encprivkey(self):
11436-        corrupt(None, self._storage, "enc_privkey", [9]) # bad privkey
11437-        d = self._fn.check(Monitor(), verify=True)
11438+        d = corrupt(None, self._storage, "enc_privkey", [9]) # bad privkey
11439+        d.addCallback(lambda ignored:
11440+            self._fn.check(Monitor(), verify=True))
11441         d.addCallback(self.check_bad, "test_verify_one_bad_encprivkey")
11442         d.addCallback(self.check_expected_failure,
11443                       CorruptShareError, "invalid privkey",
11444hunk ./src/allmydata/test/test_mutable.py 1589
11445         return d
11446 
11447     def test_verify_one_bad_encprivkey_uncheckable(self):
11448-        corrupt(None, self._storage, "enc_privkey", [9]) # bad privkey
11449+        d = corrupt(None, self._storage, "enc_privkey", [9]) # bad privkey
11450         readonly_fn = self._fn.get_readonly()
11451         # a read-only node has no way to validate the privkey
11452hunk ./src/allmydata/test/test_mutable.py 1592
11453-        d = readonly_fn.check(Monitor(), verify=True)
11454+        d.addCallback(lambda ignored:
11455+            readonly_fn.check(Monitor(), verify=True))
11456         d.addCallback(self.check_good,
11457                       "test_verify_one_bad_encprivkey_uncheckable")
11458         return d
11459hunk ./src/allmydata/test/test_mutable.py 1598
11460 
11461+
11462+    def test_verify_mdmf_good(self):
11463+        d = self.publish_mdmf()
11464+        d.addCallback(lambda ignored:
11465+            self._fn.check(Monitor(), verify=True))
11466+        d.addCallback(self.check_good, "test_verify_mdmf_good")
11467+        return d
11468+
11469+
11470+    def test_verify_mdmf_one_bad_block(self):
11471+        d = self.publish_mdmf()
11472+        d.addCallback(lambda ignored:
11473+            corrupt(None, self._storage, "share_data", [1]))
11474+        d.addCallback(lambda ignored:
11475+            self._fn.check(Monitor(), verify=True))
11476+        # We should find one bad block here
11477+        d.addCallback(self.check_bad, "test_verify_mdmf_one_bad_block")
11478+        d.addCallback(self.check_expected_failure,
11479+                      CorruptShareError, "block hash tree failure",
11480+                      "test_verify_mdmf_one_bad_block")
11481+        return d
11482+
11483+
11484+    def test_verify_mdmf_bad_encprivkey(self):
11485+        d = self.publish_mdmf()
11486+        d.addCallback(lambda ignored:
11487+            corrupt(None, self._storage, "enc_privkey", [1]))
11488+        d.addCallback(lambda ignored:
11489+            self._fn.check(Monitor(), verify=True))
11490+        d.addCallback(self.check_bad, "test_verify_mdmf_bad_encprivkey")
11491+        d.addCallback(self.check_expected_failure,
11492+                      CorruptShareError, "privkey",
11493+                      "test_verify_mdmf_bad_encprivkey")
11494+        return d
11495+
11496+
11497+    def test_verify_mdmf_bad_sig(self):
11498+        d = self.publish_mdmf()
11499+        d.addCallback(lambda ignored:
11500+            corrupt(None, self._storage, 1, [1]))
11501+        d.addCallback(lambda ignored:
11502+            self._fn.check(Monitor(), verify=True))
11503+        d.addCallback(self.check_bad, "test_verify_mdmf_bad_sig")
11504+        return d
11505+
11506+
11507+    def test_verify_mdmf_bad_encprivkey_uncheckable(self):
11508+        d = self.publish_mdmf()
11509+        d.addCallback(lambda ignored:
11510+            corrupt(None, self._storage, "enc_privkey", [1]))
11511+        d.addCallback(lambda ignored:
11512+            self._fn.get_readonly())
11513+        d.addCallback(lambda fn:
11514+            fn.check(Monitor(), verify=True))
11515+        d.addCallback(self.check_good,
11516+                      "test_verify_mdmf_bad_encprivkey_uncheckable")
11517+        return d
11518+
11519+
11520 class Repair(unittest.TestCase, PublishMixin, ShouldFailMixin):
11521 
11522     def get_shares(self, s):
11523hunk ./src/allmydata/test/test_mutable.py 1722
11524         current_shares = self.old_shares[-1]
11525         self.failUnlessEqual(old_shares, current_shares)
11526 
11527+
11528     def test_unrepairable_0shares(self):
11529         d = self.publish_one()
11530         def _delete_all_shares(ign):
11531hunk ./src/allmydata/test/test_mutable.py 1737
11532         d.addCallback(_check)
11533         return d
11534 
11535+    def test_mdmf_unrepairable_0shares(self):
11536+        d = self.publish_mdmf()
11537+        def _delete_all_shares(ign):
11538+            shares = self._storage._peers
11539+            for peerid in shares:
11540+                shares[peerid] = {}
11541+        d.addCallback(_delete_all_shares)
11542+        d.addCallback(lambda ign: self._fn.check(Monitor()))
11543+        d.addCallback(lambda check_results: self._fn.repair(check_results))
11544+        d.addCallback(lambda crr: self.failIf(crr.get_successful()))
11545+        return d
11546+
11547+
11548     def test_unrepairable_1share(self):
11549         d = self.publish_one()
11550         def _delete_all_shares(ign):
11551hunk ./src/allmydata/test/test_mutable.py 1766
11552         d.addCallback(_check)
11553         return d
11554 
11555+    def test_mdmf_unrepairable_1share(self):
11556+        d = self.publish_mdmf()
11557+        def _delete_all_shares(ign):
11558+            shares = self._storage._peers
11559+            for peerid in shares:
11560+                for shnum in list(shares[peerid]):
11561+                    if shnum > 0:
11562+                        del shares[peerid][shnum]
11563+        d.addCallback(_delete_all_shares)
11564+        d.addCallback(lambda ign: self._fn.check(Monitor()))
11565+        d.addCallback(lambda check_results: self._fn.repair(check_results))
11566+        def _check(crr):
11567+            self.failUnlessEqual(crr.get_successful(), False)
11568+        d.addCallback(_check)
11569+        return d
11570+
11571+    def test_repairable_5shares(self):
11572+        d = self.publish_mdmf()
11573+        def _delete_some_shares(ign):
11574+            shares = self._storage._peers
11575+            for peerid in shares:
11576+                for shnum in list(shares[peerid]):
11577+                    if shnum > 4:
11578+                        del shares[peerid][shnum]
11579+        d.addCallback(_delete_some_shares)
11580+        d.addCallback(lambda ign: self._fn.check(Monitor()))
11581+        d.addCallback(lambda check_results: self._fn.repair(check_results))
11582+        def _check(crr):
11583+            self.failUnlessEqual(crr.get_successful(), True)
11584+        d.addCallback(_check)
11585+        return d
11586+
11587+    def test_mdmf_repairable_5shares(self):
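+        # Delete enough shares to leave the file unhealthy but still
+        # recoverable (more than k shares survive); the repair should
+        # then succeed.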
11588+        d = self.publish_mdmf()
11589+        def _delete_some_shares(ign):
11590+            shares = self._storage._peers
11591+            for peerid in shares:
11592+                for shnum in list(shares[peerid]):
11593+                    if shnum > 5:
11594+                        del shares[peerid][shnum]
11595+        d.addCallback(_delete_some_shares)
11596+        d.addCallback(lambda ign: self._fn.check(Monitor()))
11597+        def _check(cr):
11598+            self.failIf(cr.is_healthy())
11599+            self.failUnless(cr.is_recoverable())
11600+            return cr
11601+        d.addCallback(_check)
11602+        d.addCallback(lambda check_results: self._fn.repair(check_results))
11603+        def _check1(crr):
11604+            self.failUnlessEqual(crr.get_successful(), True)
11605+        d.addCallback(_check1)
11606+        return d
11607+
11608+
11609     def test_merge(self):
11610         self.old_shares = []
11611         d = self.publish_multiple()
11612hunk ./src/allmydata/test/test_mutable.py 1934
11613 class MultipleEncodings(unittest.TestCase):
11614     def setUp(self):
11615         self.CONTENTS = "New contents go here"
11616+        self.uploadable = MutableData(self.CONTENTS)
11617         self._storage = FakeStorage()
11618         self._nodemaker = make_nodemaker(self._storage, num_peers=20)
11619         self._storage_broker = self._nodemaker.storage_broker
11620hunk ./src/allmydata/test/test_mutable.py 1938
11621-        d = self._nodemaker.create_mutable_file(self.CONTENTS)
11622+        d = self._nodemaker.create_mutable_file(self.uploadable)
11623         def _created(node):
11624             self._fn = node
11625         d.addCallback(_created)
11626hunk ./src/allmydata/test/test_mutable.py 1944
11627         return d
11628 
11629-    def _encode(self, k, n, data):
11630+    def _encode(self, k, n, data, version=SDMF_VERSION):
11631         # encode 'data' into a peerid->shares dict.
11632 
11633         fn = self._fn
11634hunk ./src/allmydata/test/test_mutable.py 1960
11635         # and set the encoding parameters to something completely different
11636         fn2._required_shares = k
11637         fn2._total_shares = n
11638+        # Normally a servermap update would occur before a publish.
11639+        # Here, it doesn't, so we have to do it ourselves.
11640+        fn2.set_version(version)
11641 
11642         s = self._storage
11643         s._peers = {} # clear existing storage
11644hunk ./src/allmydata/test/test_mutable.py 1967
11645         p2 = Publish(fn2, self._storage_broker, None)
11646-        d = p2.publish(data)
11647+        uploadable = MutableData(data)
11648+        d = p2.publish(uploadable)
11649         def _published(res):
11650             shares = s._peers
11651             s._peers = {}
11652hunk ./src/allmydata/test/test_mutable.py 2235
11653         self.basedir = "mutable/Problems/test_publish_surprise"
11654         self.set_up_grid()
11655         nm = self.g.clients[0].nodemaker
11656-        d = nm.create_mutable_file("contents 1")
11657+        d = nm.create_mutable_file(MutableData("contents 1"))
11658         def _created(n):
11659             d = defer.succeed(None)
11660             d.addCallback(lambda res: n.get_servermap(MODE_WRITE))
11661hunk ./src/allmydata/test/test_mutable.py 2245
11662             d.addCallback(_got_smap1)
11663             # then modify the file, leaving the old map untouched
11664             d.addCallback(lambda res: log.msg("starting winning write"))
11665-            d.addCallback(lambda res: n.overwrite("contents 2"))
11666+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
11667             # now attempt to modify the file with the old servermap. This
11668             # will look just like an uncoordinated write, in which every
11669             # single share got updated between our mapupdate and our publish
11670hunk ./src/allmydata/test/test_mutable.py 2254
11671                           self.shouldFail(UncoordinatedWriteError,
11672                                           "test_publish_surprise", None,
11673                                           n.upload,
11674-                                          "contents 2a", self.old_map))
11675+                                          MutableData("contents 2a"), self.old_map))
11676             return d
11677         d.addCallback(_created)
11678         return d
11679hunk ./src/allmydata/test/test_mutable.py 2263
11680         self.basedir = "mutable/Problems/test_retrieve_surprise"
11681         self.set_up_grid()
11682         nm = self.g.clients[0].nodemaker
11683-        d = nm.create_mutable_file("contents 1")
11684+        d = nm.create_mutable_file(MutableData("contents 1"))
11685         def _created(n):
11686             d = defer.succeed(None)
11687             d.addCallback(lambda res: n.get_servermap(MODE_READ))
11688hunk ./src/allmydata/test/test_mutable.py 2273
11689             d.addCallback(_got_smap1)
11690             # then modify the file, leaving the old map untouched
11691             d.addCallback(lambda res: log.msg("starting winning write"))
11692-            d.addCallback(lambda res: n.overwrite("contents 2"))
11693+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
11694             # now attempt to retrieve the old version with the old servermap.
11695             # This will look like someone has changed the file since we
11696             # updated the servermap.
11697hunk ./src/allmydata/test/test_mutable.py 2282
11698             d.addCallback(lambda res:
11699                           self.shouldFail(NotEnoughSharesError,
11700                                           "test_retrieve_surprise",
11701-                                          "ran out of peers: have 0 shares (k=3)",
11702+                                          "ran out of peers: have 0 of 1",
11703                                           n.download_version,
11704                                           self.old_map,
11705                                           self.old_map.best_recoverable_version(),
11706hunk ./src/allmydata/test/test_mutable.py 2291
11707         d.addCallback(_created)
11708         return d
11709 
11710+
11711     def test_unexpected_shares(self):
11712         # upload the file, take a servermap, shut down one of the servers,
11713         # upload it again (causing shares to appear on a new server), then
11714hunk ./src/allmydata/test/test_mutable.py 2301
11715         self.basedir = "mutable/Problems/test_unexpected_shares"
11716         self.set_up_grid()
11717         nm = self.g.clients[0].nodemaker
11718-        d = nm.create_mutable_file("contents 1")
11719+        d = nm.create_mutable_file(MutableData("contents 1"))
11720         def _created(n):
11721             d = defer.succeed(None)
11722             d.addCallback(lambda res: n.get_servermap(MODE_WRITE))
11723hunk ./src/allmydata/test/test_mutable.py 2313
11724                 self.g.remove_server(peer0)
11725                 # then modify the file, leaving the old map untouched
11726                 log.msg("starting winning write")
11727-                return n.overwrite("contents 2")
11728+                return n.overwrite(MutableData("contents 2"))
11729             d.addCallback(_got_smap1)
11730             # now attempt to modify the file with the old servermap. This
11731             # will look just like an uncoordinated write, in which every
11732hunk ./src/allmydata/test/test_mutable.py 2323
11733                           self.shouldFail(UncoordinatedWriteError,
11734                                           "test_surprise", None,
11735                                           n.upload,
11736-                                          "contents 2a", self.old_map))
11737+                                          MutableData("contents 2a"), self.old_map))
11738             return d
11739         d.addCallback(_created)
11740         return d
11741hunk ./src/allmydata/test/test_mutable.py 2327
11742+    test_unexpected_shares.timeout = 15
11743 
11744     def test_bad_server(self):
11745         # Break one server, then create the file: the initial publish should
11746hunk ./src/allmydata/test/test_mutable.py 2361
11747         d.addCallback(_break_peer0)
11748         # now "create" the file, using the pre-established key, and let the
11749         # initial publish finally happen
11750-        d.addCallback(lambda res: nm.create_mutable_file("contents 1"))
11751+        d.addCallback(lambda res: nm.create_mutable_file(MutableData("contents 1")))
11752         # that ought to work
11753         def _got_node(n):
11754             d = n.download_best_version()
11755hunk ./src/allmydata/test/test_mutable.py 2370
11756             def _break_peer1(res):
11757                 self.g.break_server(self.server1.get_serverid())
11758             d.addCallback(_break_peer1)
11759-            d.addCallback(lambda res: n.overwrite("contents 2"))
11760+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
11761             # that ought to work too
11762             d.addCallback(lambda res: n.download_best_version())
11763             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 2"))
11764hunk ./src/allmydata/test/test_mutable.py 2402
11765         peerids = [s.get_serverid() for s in sb.get_connected_servers()]
11766         self.g.break_server(peerids[0])
11767 
11768-        d = nm.create_mutable_file("contents 1")
11769+        d = nm.create_mutable_file(MutableData("contents 1"))
11770         def _created(n):
11771             d = n.download_best_version()
11772             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 1"))
11773hunk ./src/allmydata/test/test_mutable.py 2410
11774             def _break_second_server(res):
11775                 self.g.break_server(peerids[1])
11776             d.addCallback(_break_second_server)
11777-            d.addCallback(lambda res: n.overwrite("contents 2"))
11778+            d.addCallback(lambda res: n.overwrite(MutableData("contents 2")))
11779             # that ought to work too
11780             d.addCallback(lambda res: n.download_best_version())
11781             d.addCallback(lambda res: self.failUnlessEqual(res, "contents 2"))
11782hunk ./src/allmydata/test/test_mutable.py 2429
11783         d = self.shouldFail(NotEnoughServersError,
11784                             "test_publish_all_servers_bad",
11785                             "Ran out of non-bad servers",
11786-                            nm.create_mutable_file, "contents")
11787+                            nm.create_mutable_file, MutableData("contents"))
11788         return d
11789 
11790     def test_publish_no_servers(self):
11791hunk ./src/allmydata/test/test_mutable.py 2441
11792         d = self.shouldFail(NotEnoughServersError,
11793                             "test_publish_no_servers",
11794                             "Ran out of non-bad servers",
11795-                            nm.create_mutable_file, "contents")
11796+                            nm.create_mutable_file, MutableData("contents"))
11797         return d
11798     test_publish_no_servers.timeout = 30
11799 
11800hunk ./src/allmydata/test/test_mutable.py 2459
11801         # we need some contents that are large enough to push the privkey out
11802         # of the early part of the file
11803         LARGE = "These are Larger contents" * 2000 # about 50KB
11804-        d = nm.create_mutable_file(LARGE)
11805+        LARGE_uploadable = MutableData(LARGE)
11806+        d = nm.create_mutable_file(LARGE_uploadable)
11807         def _created(n):
11808             self.uri = n.get_uri()
11809             self.n2 = nm.create_from_cap(self.uri)
11810hunk ./src/allmydata/test/test_mutable.py 2495
11811         self.basedir = "mutable/Problems/test_privkey_query_missing"
11812         self.set_up_grid(num_servers=20)
11813         nm = self.g.clients[0].nodemaker
11814-        LARGE = "These are Larger contents" * 2000 # about 50KB
11815+        LARGE = "These are Larger contents" * 2000 # about 50KiB
11816+        LARGE_uploadable = MutableData(LARGE)
11817         nm._node_cache = DevNullDictionary() # disable the nodecache
11818 
11819hunk ./src/allmydata/test/test_mutable.py 2499
11820-        d = nm.create_mutable_file(LARGE)
11821+        d = nm.create_mutable_file(LARGE_uploadable)
11822         def _created(n):
11823             self.uri = n.get_uri()
11824             self.n2 = nm.create_from_cap(self.uri)
11825hunk ./src/allmydata/test/test_mutable.py 2509
11826         d.addCallback(_created)
11827         d.addCallback(lambda res: self.n2.get_servermap(MODE_WRITE))
11828         return d
11829+
11830+
11831+    def test_block_and_hash_query_error(self):
11832+        # This tests for what happens when a query to a remote server
11833+        # fails in either the hash validation step or the block getting
11834+        # step (because of batching, this is the same actual query).
11835+        # We need to have the storage server persist up until the point
11836+        # that its prefix is validated, then suddenly die. This
11837+        # exercises some exception handling code in Retrieve.
11838+        self.basedir = "mutable/Problems/test_block_and_hash_query_error"
11839+        self.set_up_grid(num_servers=20)
11840+        nm = self.g.clients[0].nodemaker
11841+        CONTENTS = "contents" * 2000
11842+        CONTENTS_uploadable = MutableData(CONTENTS)
11843+        d = nm.create_mutable_file(CONTENTS_uploadable)
11844+        def _created(node):
11845+            self._node = node
11846+        d.addCallback(_created)
11847+        d.addCallback(lambda ignored:
11848+            self._node.get_servermap(MODE_READ))
11849+        def _then(servermap):
11850+            # we have our servermap. Now we set up the servers like the
11851+            # tests above -- the first one that gets a read call should
11852+            # start throwing errors, but only after returning its prefix
11853+            # for validation. Since we'll download without fetching the
11854+            # private key, the next query to the remote server will be
11855+            # for either a block and salt or for hashes, either of which
11856+            # will exercise the error handling code.
11857+            killer = FirstServerGetsKilled()
11858+            for (serverid, ss) in nm.storage_broker.get_all_servers():
11859+                ss.post_call_notifier = killer.notify
11860+            ver = servermap.best_recoverable_version()
11861+            assert ver
11862+            return self._node.download_version(servermap, ver)
11863+        d.addCallback(_then)
11864+        d.addCallback(lambda data:
11865+            self.failUnlessEqual(data, CONTENTS))
11866+        return d
11867+
11868+
11869+class FileHandle(unittest.TestCase):
11870+    def setUp(self):
11871+        self.test_data = "Test Data" * 50000
11872+        self.sio = StringIO(self.test_data)
11873+        self.uploadable = MutableFileHandle(self.sio)
11874+
11875+
11876+    def test_filehandle_read(self):
11877+        self.basedir = "mutable/FileHandle/test_filehandle_read"
11878+        chunk_size = 10
11879+        for i in xrange(0, len(self.test_data), chunk_size):
11880+            data = self.uploadable.read(chunk_size)
11881+            data = "".join(data)
11882+            start = i
11883+            end = i + chunk_size
11884+            self.failUnlessEqual(data, self.test_data[start:end])
11885+
11886+
11887+    def test_filehandle_get_size(self):
11888+        self.basedir = "mutable/FileHandle/test_filehandle_get_size"
11889+        actual_size = len(self.test_data)
11890+        size = self.uploadable.get_size()
11891+        self.failUnlessEqual(size, actual_size)
11892+
11893+
11894+    def test_filehandle_get_size_out_of_order(self):
11895+        # We should be able to call get_size whenever we want without
11896+        # disturbing the location of the seek pointer.
11897+        chunk_size = 100
11898+        data = self.uploadable.read(chunk_size)
11899+        self.failUnlessEqual("".join(data), self.test_data[:chunk_size])
11900+
11901+        # Now get the size.
11902+        size = self.uploadable.get_size()
11903+        self.failUnlessEqual(size, len(self.test_data))
11904+
11905+        # Now get more data. We should be right where we left off.
11906+        more_data = self.uploadable.read(chunk_size)
11907+        start = chunk_size
11908+        end = chunk_size * 2
11909+        self.failUnlessEqual("".join(more_data), self.test_data[start:end])
11910+
11911+
11912+    def test_filehandle_file(self):
11913+        # Make sure that the MutableFileHandle works on a file as well
11914+        # as a StringIO object, since in some cases it will be asked to
11915+        # deal with files.
11916+        self.basedir = self.mktemp()
11917+        # mktemp() only returns a pathname; we have to create the directory ourselves.
11918+        os.mkdir(self.basedir)
11919+        f_path = os.path.join(self.basedir, "test_file")
11920+        f = open(f_path, "w")
11921+        f.write(self.test_data)
11922+        f.close()
11923+        f = open(f_path, "r")
11924+
11925+        uploadable = MutableFileHandle(f)
11926+
11927+        data = uploadable.read(len(self.test_data))
11928+        self.failUnlessEqual("".join(data), self.test_data)
11929+        size = uploadable.get_size()
11930+        self.failUnlessEqual(size, len(self.test_data))
11931+
11932+
11933+    def test_close(self):
11934+        # Make sure that the MutableFileHandle closes its handle when
11935+        # told to do so.
11936+        self.uploadable.close()
11937+        self.failUnless(self.sio.closed)
11938+
11939+
11940+class DataHandle(unittest.TestCase):
11941+    def setUp(self):
11942+        self.test_data = "Test Data" * 50000
11943+        self.uploadable = MutableData(self.test_data)
11944+
11945+
11946+    def test_datahandle_read(self):
11947+        chunk_size = 10
11948+        for i in xrange(0, len(self.test_data), chunk_size):
11949+            data = self.uploadable.read(chunk_size)
11950+            data = "".join(data)
11951+            start = i
11952+            end = i + chunk_size
11953+            self.failUnlessEqual(data, self.test_data[start:end])
11954+
11955+
11956+    def test_datahandle_get_size(self):
11957+        actual_size = len(self.test_data)
11958+        size = self.uploadable.get_size()
11959+        self.failUnlessEqual(size, actual_size)
11960+
11961+
11962+    def test_datahandle_get_size_out_of_order(self):
11963+        # We should be able to call get_size whenever we want without
11964+        # disturbing the location of the seek pointer.
11965+        chunk_size = 100
11966+        data = self.uploadable.read(chunk_size)
11967+        self.failUnlessEqual("".join(data), self.test_data[:chunk_size])
11968+
11969+        # Now get the size.
11970+        size = self.uploadable.get_size()
11971+        self.failUnlessEqual(size, len(self.test_data))
11972+
11973+        # Now get more data. We should be right where we left off.
11974+        more_data = self.uploadable.read(chunk_size)
11975+        start = chunk_size
11976+        end = chunk_size * 2
11977+        self.failUnlessEqual("".join(more_data), self.test_data[start:end])
11978+
11979+
11980+class Version(GridTestMixin, unittest.TestCase, testutil.ShouldFailMixin, \
11981+              PublishMixin):
11982+    def setUp(self):
11983+        GridTestMixin.setUp(self)
11984+        self.basedir = self.mktemp()
11985+        self.set_up_grid()
11986+        self.c = self.g.clients[0]
11987+        self.nm = self.c.nodemaker
11988+        self.data = "test data" * 100000 # about 900 KiB; MDMF
11989+        self.small_data = "test data" * 10 # about 90 B; SDMF
11990+        return self.do_upload()
11991+
11992+
11993+    def do_upload(self):
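+        # Upload one multi-segment MDMF file and one small SDMF file so
+        # the tests below can exercise both mutable file formats.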
11994+        d1 = self.nm.create_mutable_file(MutableData(self.data),
11995+                                         version=MDMF_VERSION)
11996+        d2 = self.nm.create_mutable_file(MutableData(self.small_data))
11997+        dl = gatherResults([d1, d2])
11998+        def _then((n1, n2)):
11999+            assert isinstance(n1, MutableFileNode)
12000+            assert isinstance(n2, MutableFileNode)
12001+
12002+            self.mdmf_node = n1
12003+            self.sdmf_node = n2
12004+        dl.addCallback(_then)
12005+        return dl
12006+
12007+
12008+    def test_get_readonly_mutable_version(self):
12009+        # Attempting to get a mutable version of a mutable file from a
12010+        # filenode initialized with a readcap should return a readonly
12011+        # version of that same node.
12012+        ro = self.mdmf_node.get_readonly()
12013+        d = ro.get_best_mutable_version()
12014+        d.addCallback(lambda version:
12015+            self.failUnless(version.is_readonly()))
12016+        d.addCallback(lambda ignored:
12017+            self.sdmf_node.get_readonly())
12018+        d.addCallback(lambda version:
12019+            self.failUnless(version.is_readonly()))
12020+        return d
12021+
12022+
12023+    def test_get_sequence_number(self):
12024+        d = self.mdmf_node.get_best_readable_version()
12025+        d.addCallback(lambda bv:
12026+            self.failUnlessEqual(bv.get_sequence_number(), 1))
12027+        d.addCallback(lambda ignored:
12028+            self.sdmf_node.get_best_readable_version())
12029+        d.addCallback(lambda bv:
12030+            self.failUnlessEqual(bv.get_sequence_number(), 1))
12031+        # Now update. The sequence number in both cases should then
12032+        # be 2.
12033+        def _do_update(ignored):
12034+            new_data = MutableData("foo bar baz" * 100000)
12035+            new_small_data = MutableData("foo bar baz" * 10)
12036+            d1 = self.mdmf_node.overwrite(new_data)
12037+            d2 = self.sdmf_node.overwrite(new_small_data)
12038+            dl = gatherResults([d1, d2])
12039+            return dl
12040+        d.addCallback(_do_update)
12041+        d.addCallback(lambda ignored:
12042+            self.mdmf_node.get_best_readable_version())
12043+        d.addCallback(lambda bv:
12044+            self.failUnlessEqual(bv.get_sequence_number(), 2))
12045+        d.addCallback(lambda ignored:
12046+            self.sdmf_node.get_best_readable_version())
12047+        d.addCallback(lambda bv:
12048+            self.failUnlessEqual(bv.get_sequence_number(), 2))
12049+        return d
12050+
12051+
12052+    def test_get_writekey(self):
12053+        d = self.mdmf_node.get_best_mutable_version()
12054+        d.addCallback(lambda bv:
12055+            self.failUnlessEqual(bv.get_writekey(),
12056+                                 self.mdmf_node.get_writekey()))
12057+        d.addCallback(lambda ignored:
12058+            self.sdmf_node.get_best_mutable_version())
12059+        d.addCallback(lambda bv:
12060+            self.failUnlessEqual(bv.get_writekey(),
12061+                                 self.sdmf_node.get_writekey()))
12062+        return d
12063+
12064+
12065+    def test_get_storage_index(self):
12066+        d = self.mdmf_node.get_best_mutable_version()
12067+        d.addCallback(lambda bv:
12068+            self.failUnlessEqual(bv.get_storage_index(),
12069+                                 self.mdmf_node.get_storage_index()))
12070+        d.addCallback(lambda ignored:
12071+            self.sdmf_node.get_best_mutable_version())
12072+        d.addCallback(lambda bv:
12073+            self.failUnlessEqual(bv.get_storage_index(),
12074+                                 self.sdmf_node.get_storage_index()))
12075+        return d
12076+
12077+
12078+    def test_get_readonly_version(self):
12079+        d = self.mdmf_node.get_best_readable_version()
12080+        d.addCallback(lambda bv:
12081+            self.failUnless(bv.is_readonly()))
12082+        d.addCallback(lambda ignored:
12083+            self.sdmf_node.get_best_readable_version())
12084+        d.addCallback(lambda bv:
12085+            self.failUnless(bv.is_readonly()))
12086+        return d
12087+
12088+
12089+    def test_get_mutable_version(self):
12090+        d = self.mdmf_node.get_best_mutable_version()
12091+        d.addCallback(lambda bv:
12092+            self.failIf(bv.is_readonly()))
12093+        d.addCallback(lambda ignored:
12094+            self.sdmf_node.get_best_mutable_version())
12095+        d.addCallback(lambda bv:
12096+            self.failIf(bv.is_readonly()))
12097+        return d
12098+
12099+
12100+    def test_toplevel_overwrite(self):
12101+        new_data = MutableData("foo bar baz" * 100000)
12102+        new_small_data = MutableData("foo bar baz" * 10)
12103+        d = self.mdmf_node.overwrite(new_data)
12104+        d.addCallback(lambda ignored:
12105+            self.mdmf_node.download_best_version())
12106+        d.addCallback(lambda data:
12107+            self.failUnlessEqual(data, "foo bar baz" * 100000))
12108+        d.addCallback(lambda ignored:
12109+            self.sdmf_node.overwrite(new_small_data))
12110+        d.addCallback(lambda ignored:
12111+            self.sdmf_node.download_best_version())
12112+        d.addCallback(lambda data:
12113+            self.failUnlessEqual(data, "foo bar baz" * 10))
12114+        return d
12115+
12116+
12117+    def test_toplevel_modify(self):
12118+        def modifier(old_contents, servermap, first_time):
12119+            return old_contents + "modified"
12120+        d = self.mdmf_node.modify(modifier)
12121+        d.addCallback(lambda ignored:
12122+            self.mdmf_node.download_best_version())
12123+        d.addCallback(lambda data:
12124+            self.failUnlessIn("modified", data))
12125+        d.addCallback(lambda ignored:
12126+            self.sdmf_node.modify(modifier))
12127+        d.addCallback(lambda ignored:
12128+            self.sdmf_node.download_best_version())
12129+        d.addCallback(lambda data:
12130+            self.failUnlessIn("modified", data))
12131+        return d
12132+
12133+
12134+    def test_version_modify(self):
12135+        # TODO: When we can publish multiple versions, alter this test
12136+        # to modify a version other than the best usable version, then
12137+        # check that the best recoverable version is the one we modified.
12138+        def modifier(old_contents, servermap, first_time):
12139+            return old_contents + "modified"
12140+        d = self.mdmf_node.modify(modifier)
12141+        d.addCallback(lambda ignored:
12142+            self.mdmf_node.download_best_version())
12143+        d.addCallback(lambda data:
12144+            self.failUnlessIn("modified", data))
12145+        d.addCallback(lambda ignored:
12146+            self.sdmf_node.modify(modifier))
12147+        d.addCallback(lambda ignored:
12148+            self.sdmf_node.download_best_version())
12149+        d.addCallback(lambda data:
12150+            self.failUnlessIn("modified", data))
12151+        return d
12152+
12153+
12154+    def test_download_version(self):
12155+        d = self.publish_multiple()
12156+        # We want to have two recoverable versions on the grid.
12157+        d.addCallback(lambda res:
12158+                      self._set_versions({0:0,2:0,4:0,6:0,8:0,
12159+                                          1:1,3:1,5:1,7:1,9:1}))
12160+        # Now try to download each version. We should get the plaintext
12161+        # associated with that version.
12162+        d.addCallback(lambda ignored:
12163+            self._fn.get_servermap(mode=MODE_READ))
12164+        def _got_servermap(smap):
12165+            versions = smap.recoverable_versions()
12166+            assert len(versions) == 2
12167+
12168+            self.servermap = smap
12169+            self.version1, self.version2 = versions
12170+            assert self.version1 != self.version2
12171+
12172+            self.version1_seqnum = self.version1[0]
12173+            self.version2_seqnum = self.version2[0]
12174+            self.version1_index = self.version1_seqnum - 1
12175+            self.version2_index = self.version2_seqnum - 1
12176+
12177+        d.addCallback(_got_servermap)
12178+        d.addCallback(lambda ignored:
12179+            self._fn.download_version(self.servermap, self.version1))
12180+        d.addCallback(lambda results:
12181+            self.failUnlessEqual(self.CONTENTS[self.version1_index],
12182+                                 results))
12183+        d.addCallback(lambda ignored:
12184+            self._fn.download_version(self.servermap, self.version2))
12185+        d.addCallback(lambda results:
12186+            self.failUnlessEqual(self.CONTENTS[self.version2_index],
12187+                                 results))
12188+        return d
12189+
12190+
12191+    def test_download_nonexistent_version(self):
12192+        d = self.mdmf_node.get_servermap(mode=MODE_WRITE)
12193+        def _set_servermap(servermap):
12194+            self.servermap = servermap
12195+        d.addCallback(_set_servermap)
12196+        d.addCallback(lambda ignored:
12197+           self.shouldFail(UnrecoverableFileError, "nonexistent version",
12198+                           None,
12199+                           self.mdmf_node.download_version, self.servermap,
12200+                           "not a version"))
12201+        return d
12202+
12203+
12204+    def test_partial_read(self):
12205+        # read only a few bytes at a time, and see that the results are
12206+        # what we expect.
12207+        d = self.mdmf_node.get_best_readable_version()
12208+        def _read_data(version):
12209+            c = consumer.MemoryConsumer()
12210+            d2 = defer.succeed(None)
12211+            for i in xrange(0, len(self.data), 10000):
12212+                d2.addCallback(lambda ignored, i=i: version.read(c, i, 10000))
12213+            d2.addCallback(lambda ignored:
12214+                self.failUnlessEqual(self.data, "".join(c.chunks)))
12215+            return d2
12216+        d.addCallback(_read_data)
12217+        return d
12218+
12219+
12220+    def test_read(self):
12221+        d = self.mdmf_node.get_best_readable_version()
12222+        def _read_data(version):
12223+            c = consumer.MemoryConsumer()
12224+            d2 = defer.succeed(None)
12225+            d2.addCallback(lambda ignored: version.read(c))
12226+            d2.addCallback(lambda ignored:
12227+                self.failUnlessEqual("".join(c.chunks), self.data))
12228+            return d2
12229+        d.addCallback(_read_data)
12230+        return d
12231+
12232+
12233+    def test_download_best_version(self):
12234+        d = self.mdmf_node.download_best_version()
12235+        d.addCallback(lambda data:
12236+            self.failUnlessEqual(data, self.data))
12237+        d.addCallback(lambda ignored:
12238+            self.sdmf_node.download_best_version())
12239+        d.addCallback(lambda data:
12240+            self.failUnlessEqual(data, self.small_data))
12241+        return d
12242+
12243+
12244+class Update(GridTestMixin, unittest.TestCase, testutil.ShouldFailMixin):
12245+    def setUp(self):
12246+        GridTestMixin.setUp(self)
12247+        self.basedir = self.mktemp()
12248+        self.set_up_grid()
12249+        self.c = self.g.clients[0]
12250+        self.nm = self.c.nodemaker
12251+        self.data = "test data" * 100000 # about 900 KiB; MDMF
12252+        self.small_data = "test data" * 10 # about 90 B; SDMF
12253+        return self.do_upload()
12254+
12255+
12256+    def do_upload(self):
12257+        d1 = self.nm.create_mutable_file(MutableData(self.data),
12258+                                         version=MDMF_VERSION)
12259+        d2 = self.nm.create_mutable_file(MutableData(self.small_data))
12260+        dl = gatherResults([d1, d2])
12261+        def _then((n1, n2)):
12262+            assert isinstance(n1, MutableFileNode)
12263+            assert isinstance(n2, MutableFileNode)
12264+
12265+            self.mdmf_node = n1
12266+            self.sdmf_node = n2
12267+        dl.addCallback(_then)
12268+        return dl
12269+
12270+
12271+    def test_append(self):
12272+        # We should be able to append data to the end of a mutable
12273+        # file and get what we expect.
12274+        new_data = self.data + "appended"
12275+        d = self.mdmf_node.get_best_mutable_version()
12276+        d.addCallback(lambda mv:
12277+            mv.update(MutableData("appended"), len(self.data)))
12278+        d.addCallback(lambda ignored:
12279+            self.mdmf_node.download_best_version())
12280+        d.addCallback(lambda results:
12281+            self.failUnlessEqual(results, new_data))
12282+        return d
12283+    test_append.timeout = 15
12284+
12285+
12286+    def test_replace(self):
12287+        # We should be able to replace data in the middle of a mutable
12288+        # file and get what we expect back.
12289+        new_data = self.data[:100]
12290+        new_data += "appended"
12291+        new_data += self.data[108:]
12292+        d = self.mdmf_node.get_best_mutable_version()
12293+        d.addCallback(lambda mv:
12294+            mv.update(MutableData("appended"), 100))
12295+        d.addCallback(lambda ignored:
12296+            self.mdmf_node.download_best_version())
12297+        d.addCallback(lambda results:
12298+            self.failUnlessEqual(results, new_data))
12299+        return d
12300+
12301+
12302+    def test_replace_and_extend(self):
12303+        # We should be able to replace data in the middle of a mutable
12304+        # file and extend that mutable file and get what we expect.
12305+        new_data = self.data[:100]
12306+        new_data += "modified " * 100000
12307+        d = self.mdmf_node.get_best_mutable_version()
12308+        d.addCallback(lambda mv:
12309+            mv.update(MutableData("modified " * 100000), 100))
12310+        d.addCallback(lambda ignored:
12311+            self.mdmf_node.download_best_version())
12312+        d.addCallback(lambda results:
12313+            self.failUnlessEqual(results, new_data))
12314+        return d
12315+
12316+
12317+    def test_append_power_of_two(self):
12318+        # If we attempt to extend a mutable file so that its segment
12319+        # count crosses a power-of-two boundary, the update operation
12320+        # should know how to reencode the file.
12321+
12322+        # Note that the data populating self.mdmf_node is about 900 KiB
12323+        # long -- that is 7 segments at the default segment size. So we
12324+        # need to add 2 segments worth of data to push it over a
12325+        # power-of-two boundary.
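        # As a rough check of that claim (assuming DEFAULT_MAX_SEGMENT_SIZE
        # is 128 KiB, i.e. 131072 bytes): len(self.data) is 9 * 100000 =
        # 900000 bytes, which needs ceil(900000 / 131072) = 7 segments.
        # Appending two full segments gives 900000 + 2 * 131072 = 1162144
        # bytes, or ceil(1162144 / 131072) = 9 segments, which crosses the
        # 8-segment (2**3) boundary.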
12326+        segment = "a" * DEFAULT_MAX_SEGMENT_SIZE
12327+        new_data = self.data + (segment * 2)
12328+        d = self.mdmf_node.get_best_mutable_version()
12329+        d.addCallback(lambda mv:
12330+            mv.update(MutableData(segment * 2), len(self.data)))
12331+        d.addCallback(lambda ignored:
12332+            self.mdmf_node.download_best_version())
12333+        d.addCallback(lambda results:
12334+            self.failUnlessEqual(results, new_data))
12335+        return d
12336+    test_append_power_of_two.timeout = 15
12337+
12338+
12339+    def test_update_sdmf(self):
12340+        # Running update on a single-segment file should still work.
12341+        new_data = self.small_data + "appended"
12342+        d = self.sdmf_node.get_best_mutable_version()
12343+        d.addCallback(lambda mv:
12344+            mv.update(MutableData("appended"), len(self.small_data)))
12345+        d.addCallback(lambda ignored:
12346+            self.sdmf_node.download_best_version())
12347+        d.addCallback(lambda results:
12348+            self.failUnlessEqual(results, new_data))
12349+        return d
12350+
12351+    def test_replace_in_last_segment(self):
12352+        # The wrapper should know how to handle the tail segment
12353+        # appropriately.
12354+        replace_offset = len(self.data) - 100
12355+        new_data = self.data[:replace_offset] + "replaced"
12356+        rest_offset = replace_offset + len("replaced")
12357+        new_data += self.data[rest_offset:]
12358+        d = self.mdmf_node.get_best_mutable_version()
12359+        d.addCallback(lambda mv:
12360+            mv.update(MutableData("replaced"), replace_offset))
12361+        d.addCallback(lambda ignored:
12362+            self.mdmf_node.download_best_version())
12363+        d.addCallback(lambda results:
12364+            self.failUnlessEqual(results, new_data))
12365+        return d
12366+
12367+
12368+    def test_multiple_segment_replace(self):
12369+        replace_offset = 2 * DEFAULT_MAX_SEGMENT_SIZE
12370+        new_data = self.data[:replace_offset]
12371+        new_segment = "a" * DEFAULT_MAX_SEGMENT_SIZE
12372+        new_data += 2 * new_segment
12373+        new_data += "replaced"
12374+        rest_offset = len(new_data)
12375+        new_data += self.data[rest_offset:]
12376+        d = self.mdmf_node.get_best_mutable_version()
12377+        d.addCallback(lambda mv:
12378+            mv.update(MutableData((2 * new_segment) + "replaced"),
12379+                      replace_offset))
12380+        d.addCallback(lambda ignored:
12381+            self.mdmf_node.download_best_version())
12382+        d.addCallback(lambda results:
12383+            self.failUnlessEqual(results, new_data))
12384+        return d
12385hunk ./src/allmydata/test/test_sftp.py 32
12386 
12387 from allmydata.util.consumer import download_to_data
12388 from allmydata.immutable import upload
12389+from allmydata.mutable import publish
12390 from allmydata.test.no_network import GridTestMixin
12391 from allmydata.test.common import ShouldFailMixin
12392 from allmydata.test.common_util import ReallyEqualMixin
12393hunk ./src/allmydata/test/test_sftp.py 84
12394         return d
12395 
12396     def _set_up_tree(self):
12397-        d = self.client.create_mutable_file("mutable file contents")
12398+        u = publish.MutableData("mutable file contents")
12399+        d = self.client.create_mutable_file(u)
12400         d.addCallback(lambda node: self.root.set_node(u"mutable", node))
12401         def _created_mutable(n):
12402             self.mutable = n
12403hunk ./src/allmydata/test/test_sftp.py 1334
12404         d.addCallback(lambda ign: self.failUnlessEqual(sftpd.all_heisenfiles, {}))
12405         d.addCallback(lambda ign: self.failUnlessEqual(self.handler._heisenfiles, {}))
12406         return d
12407+    test_makeDirectory.timeout = 15
12408 
12409     def test_execCommand_and_openShell(self):
12410         class FakeProtocol:
12411hunk ./src/allmydata/test/test_storage.py 27
12412                                      LayoutInvalid, MDMFSIGNABLEHEADER, \
12413                                      SIGNED_PREFIX, MDMFHEADER, \
12414                                      MDMFOFFSETS, SDMFSlotWriteProxy
12415-from allmydata.interfaces import BadWriteEnablerError, MDMF_VERSION, \
12416-                                 SDMF_VERSION
12417+from allmydata.interfaces import BadWriteEnablerError
12418 from allmydata.test.common import LoggingServiceParent, ShouldFailMixin
12419 from allmydata.test.common_web import WebRenderingMixin
12420 from allmydata.web.storage import StorageStatus, remove_prefix
12421hunk ./src/allmydata/test/test_system.py 26
12422 from allmydata.monitor import Monitor
12423 from allmydata.mutable.common import NotWriteableError
12424 from allmydata.mutable import layout as mutable_layout
12425+from allmydata.mutable.publish import MutableData
12426 from foolscap.api import DeadReferenceError
12427 from twisted.python.failure import Failure
12428 from twisted.web.client import getPage
12429hunk ./src/allmydata/test/test_system.py 467
12430     def test_mutable(self):
12431         self.basedir = "system/SystemTest/test_mutable"
12432         DATA = "initial contents go here."  # 25 bytes % 3 != 0
12433+        DATA_uploadable = MutableData(DATA)
12434         NEWDATA = "new contents yay"
12435hunk ./src/allmydata/test/test_system.py 469
12436+        NEWDATA_uploadable = MutableData(NEWDATA)
12437         NEWERDATA = "this is getting old"
12438hunk ./src/allmydata/test/test_system.py 471
12439+        NEWERDATA_uploadable = MutableData(NEWERDATA)
12440 
12441         d = self.set_up_nodes(use_key_generator=True)
12442 
12443hunk ./src/allmydata/test/test_system.py 478
12444         def _create_mutable(res):
12445             c = self.clients[0]
12446             log.msg("starting create_mutable_file")
12447-            d1 = c.create_mutable_file(DATA)
12448+            d1 = c.create_mutable_file(DATA_uploadable)
12449             def _done(res):
12450                 log.msg("DONE: %s" % (res,))
12451                 self._mutable_node_1 = res
12452hunk ./src/allmydata/test/test_system.py 565
12453             self.failUnlessEqual(res, DATA)
12454             # replace the data
12455             log.msg("starting replace1")
12456-            d1 = newnode.overwrite(NEWDATA)
12457+            d1 = newnode.overwrite(NEWDATA_uploadable)
12458             d1.addCallback(lambda res: newnode.download_best_version())
12459             return d1
12460         d.addCallback(_check_download_3)
12461hunk ./src/allmydata/test/test_system.py 579
12462             newnode2 = self.clients[3].create_node_from_uri(uri)
12463             self._newnode3 = self.clients[3].create_node_from_uri(uri)
12464             log.msg("starting replace2")
12465-            d1 = newnode1.overwrite(NEWERDATA)
12466+            d1 = newnode1.overwrite(NEWERDATA_uploadable)
12467             d1.addCallback(lambda res: newnode2.download_best_version())
12468             return d1
12469         d.addCallback(_check_download_4)
12470hunk ./src/allmydata/test/test_system.py 649
12471         def _check_empty_file(res):
12472             # make sure we can create empty files, this usually screws up the
12473             # segsize math
12474-            d1 = self.clients[2].create_mutable_file("")
12475+            d1 = self.clients[2].create_mutable_file(MutableData(""))
12476             d1.addCallback(lambda newnode: newnode.download_best_version())
12477             d1.addCallback(lambda res: self.failUnlessEqual("", res))
12478             return d1
12479hunk ./src/allmydata/test/test_system.py 680
12480                                  self.key_generator_svc.key_generator.pool_size + size_delta)
12481 
12482         d.addCallback(check_kg_poolsize, 0)
12483-        d.addCallback(lambda junk: self.clients[3].create_mutable_file('hello, world'))
12484+        d.addCallback(lambda junk:
12485+            self.clients[3].create_mutable_file(MutableData('hello, world')))
12486         d.addCallback(check_kg_poolsize, -1)
12487         d.addCallback(lambda junk: self.clients[3].create_dirnode())
12488         d.addCallback(check_kg_poolsize, -2)
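The recurring change in the hunks above is that mutable-file APIs now take an
uploadable wrapper instead of a bare string. A minimal sketch of the new
calling convention (here `client` is a placeholder for an already-configured
Client instance, which this patch does not provide):

    from allmydata.mutable.publish import MutableData

    # Create a mutable file from wrapped contents, then overwrite it with
    # new wrapped contents once the node is available. `client` is a
    # placeholder, not something defined by this patch.
    d = client.create_mutable_file(MutableData("initial contents"))
    d.addCallback(lambda node: node.overwrite(MutableData("new contents")))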
12489hunk ./src/allmydata/test/test_web.py 28
12490 from allmydata.util.encodingutil import to_str
12491 from allmydata.test.common import FakeCHKFileNode, FakeMutableFileNode, \
12492      create_chk_filenode, WebErrorMixin, ShouldFailMixin, make_mutable_file_uri
12493-from allmydata.interfaces import IMutableFileNode
12494+from allmydata.interfaces import IMutableFileNode, SDMF_VERSION, MDMF_VERSION
12495 from allmydata.mutable import servermap, publish, retrieve
12496 import allmydata.test.common_util as testutil
12497 from allmydata.test.no_network import GridTestMixin
12498hunk ./src/allmydata/test/test_web.py 57
12499         return FakeCHKFileNode(cap)
12500     def _create_mutable(self, cap):
12501         return FakeMutableFileNode(None, None, None, None).init_from_cap(cap)
12502-    def create_mutable_file(self, contents="", keysize=None):
12503+    def create_mutable_file(self, contents="", keysize=None,
12504+                            version=SDMF_VERSION):
12505         n = FakeMutableFileNode(None, None, None, None)
12506hunk ./src/allmydata/test/test_web.py 60
12507+        n.set_version(version)
12508         return n.create(contents)
12509 
12510 class FakeUploader(service.Service):
12511hunk ./src/allmydata/test/test_web.py 157
12512         self.nodemaker = FakeNodeMaker(None, self._secret_holder, None,
12513                                        self.uploader, None,
12514                                        None, None)
12515+        self.mutable_file_default = SDMF_VERSION
12516 
12517     def startService(self):
12518         return service.MultiService.startService(self)
12519hunk ./src/allmydata/test/test_web.py 762
12520                              self.PUT, base + "/@@name=/blah.txt", "")
12521         return d
12522 
12523+
12524     def test_GET_DIRURL_named_bad(self):
12525         base = "/file/%s" % urllib.quote(self._foo_uri)
12526         d = self.shouldFail2(error.Error, "test_PUT_DIRURL_named_bad",
12527hunk ./src/allmydata/test/test_web.py 878
12528                                                       self.NEWFILE_CONTENTS))
12529         return d
12530 
12531+    def test_PUT_NEWFILEURL_unlinked_mdmf(self):
12532+        # this should get us a few segments of an MDMF mutable file,
12533+        # which we can then test for.
12534+        contents = self.NEWFILE_CONTENTS * 300000
12535+        d = self.PUT("/uri?mutable=true&mutable-type=mdmf",
12536+                     contents)
12537+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12538+        d.addCallback(lambda json: self.failUnlessIn("mdmf", json))
12539+        return d
12540+
12541+    def test_PUT_NEWFILEURL_unlinked_sdmf(self):
12542+        contents = self.NEWFILE_CONTENTS * 300000
12543+        d = self.PUT("/uri?mutable=true&mutable-type=sdmf",
12544+                     contents)
12545+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12546+        d.addCallback(lambda json: self.failUnlessIn("sdmf", json))
12547+        return d
12548+
12549     def test_PUT_NEWFILEURL_range_bad(self):
12550         headers = {"content-range": "bytes 1-10/%d" % len(self.NEWFILE_CONTENTS)}
12551         target = self.public_url + "/foo/new.txt"
12552hunk ./src/allmydata/test/test_web.py 928
12553         return d
12554 
12555     def test_PUT_NEWFILEURL_mutable_toobig(self):
12556-        d = self.shouldFail2(error.Error, "test_PUT_NEWFILEURL_mutable_toobig",
12557-                             "413 Request Entity Too Large",
12558-                             "SDMF is limited to one segment, and 10001 > 10000",
12559-                             self.PUT,
12560-                             self.public_url + "/foo/new.txt?mutable=true",
12561-                             "b" * (self.s.MUTABLE_SIZELIMIT+1))
12562+        # It is okay to upload large mutable files, so we should be able
12563+        # to do that.
12564+        d = self.PUT(self.public_url + "/foo/new.txt?mutable=true",
12565+                     "b" * (self.s.MUTABLE_SIZELIMIT + 1))
12566         return d
12567 
12568     def test_PUT_NEWFILEURL_replace(self):
12569hunk ./src/allmydata/test/test_web.py 1026
12570         d.addCallback(_check1)
12571         return d
12572 
12573+    def test_GET_FILEURL_json_mutable_type(self):
12574+        # The JSON should include mutable-type, which says whether the
12575+        # file is SDMF or MDMF
12576+        d = self.PUT("/uri?mutable=true&mutable-type=mdmf",
12577+                     self.NEWFILE_CONTENTS * 300000)
12578+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12579+        def _got_json(json, version):
12580+            data = simplejson.loads(json)
12581+            assert "filenode" == data[0]
12582+            data = data[1]
12583+            assert isinstance(data, dict)
12584+
12585+            self.failUnlessIn("mutable-type", data)
12586+            self.failUnlessEqual(data['mutable-type'], version)
12587+
12588+        d.addCallback(_got_json, "mdmf")
12589+        # Now make an SDMF file and check that it is reported correctly.
12590+        d.addCallback(lambda ignored:
12591+            self.PUT("/uri?mutable=true&mutable-type=sdmf",
12592+                      self.NEWFILE_CONTENTS * 300000))
12593+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12594+        d.addCallback(_got_json, "sdmf")
12595+        return d
12596+
12597     def test_GET_FILEURL_json_missing(self):
12598         d = self.GET(self.public_url + "/foo/missing?json")
12599         d.addBoth(self.should404, "test_GET_FILEURL_json_missing")
12600hunk ./src/allmydata/test/test_web.py 1088
12601         d.addBoth(self.should404, "test_GET_FILEURL_uri_missing")
12602         return d
12603 
12604-    def test_GET_DIRECTORY_html_banner(self):
12605+    def test_GET_DIRECTORY_html(self):
12606         d = self.GET(self.public_url + "/foo", followRedirect=True)
12607         def _check(res):
12608             self.failUnlessIn('<div class="toolbar-item"><a href="../../..">Return to Welcome page</a></div>',res)
12609hunk ./src/allmydata/test/test_web.py 1092
12610+            self.failUnlessIn("mutable-type-mdmf", res)
12611+            self.failUnlessIn("mutable-type-sdmf", res)
12612         d.addCallback(_check)
12613         return d
12614 
12615hunk ./src/allmydata/test/test_web.py 1097
12616+    def test_GET_root_html(self):
12617+        # make sure that we have the option to upload an unlinked
12618+        # mutable file in SDMF and MDMF formats.
12619+        d = self.GET("/")
12620+        def _got_html(html):
12621+            # These are radio buttons that allow the user to toggle
12622+            # whether a particular mutable file is MDMF or SDMF.
12623+            self.failUnlessIn("mutable-type-mdmf", html)
12624+            self.failUnlessIn("mutable-type-sdmf", html)
12625+        d.addCallback(_got_html)
12626+        return d
12627+
12628+    def test_mutable_type_defaults(self):
12629+        # The checked="checked" attribute of the inputs corresponding to
12630+        # the mutable-type parameter should change as expected with the
12631+        # value configured in tahoe.cfg.
12632+        #
12633+        # By default, the value configured with the client is
12634+        # SDMF_VERSION, so that should be checked.
12635+        assert self.s.mutable_file_default == SDMF_VERSION
12636+
12637+        d = self.GET("/")
12638+        def _got_html(html, value):
12639+            i = 'input checked="checked" type="radio" id="mutable-type-%s"'
12640+            self.failUnlessIn(i % value, html)
12641+        d.addCallback(_got_html, "sdmf")
12642+        d.addCallback(lambda ignored:
12643+            self.GET(self.public_url + "/foo", followRedirect=True))
12644+        d.addCallback(_got_html, "sdmf")
12645+        # Now switch the configuration value to MDMF. The MDMF radio
12646+        # buttons should now be checked on these pages.
12647+        def _swap_values(ignored):
12648+            self.s.mutable_file_default = MDMF_VERSION
12649+        d.addCallback(_swap_values)
12650+        d.addCallback(lambda ignored: self.GET("/"))
12651+        d.addCallback(_got_html, "mdmf")
12652+        d.addCallback(lambda ignored:
12653+            self.GET(self.public_url + "/foo", followRedirect=True))
12654+        d.addCallback(_got_html, "mdmf")
12655+        return d
12656+
12657     def test_GET_DIRURL(self):
12658         # the addSlash means we get a redirect here
12659         # from /uri/$URI/foo/ , we need ../../../ to get back to the root
12660hunk ./src/allmydata/test/test_web.py 1227
12661         d.addCallback(self.failUnlessIsFooJSON)
12662         return d
12663 
12664+    def test_GET_DIRURL_json_mutable_type(self):
12665+        d = self.PUT(self.public_url + \
12666+                     "/foo/sdmf.txt?mutable=true&mutable-type=sdmf",
12667+                     self.NEWFILE_CONTENTS * 300000)
12668+        d.addCallback(lambda ignored:
12669+            self.PUT(self.public_url + \
12670+                     "/foo/mdmf.txt?mutable=true&mutable-type=mdmf",
12671+                     self.NEWFILE_CONTENTS * 300000))
12672+        # Now we have an MDMF and SDMF file in the directory. If we GET
12673+        # its JSON, we should see their encodings.
12674+        d.addCallback(lambda ignored:
12675+            self.GET(self.public_url + "/foo?t=json"))
12676+        def _got_json(json):
12677+            data = simplejson.loads(json)
12678+            assert data[0] == "dirnode"
12679+
12680+            data = data[1]
12681+            kids = data['children']
12682+
12683+            mdmf_data = kids['mdmf.txt'][1]
12684+            self.failUnlessIn("mutable-type", mdmf_data)
12685+            self.failUnlessEqual(mdmf_data['mutable-type'], "mdmf")
12686+
12687+            sdmf_data = kids['sdmf.txt'][1]
12688+            self.failUnlessIn("mutable-type", sdmf_data)
12689+            self.failUnlessEqual(sdmf_data['mutable-type'], "sdmf")
12690+        d.addCallback(_got_json)
12691+        return d
12692+
12693 
12694     def test_POST_DIRURL_manifest_no_ophandle(self):
12695         d = self.shouldFail2(error.Error,
12696hunk ./src/allmydata/test/test_web.py 1810
12697         return d
12698 
12699     def test_POST_upload_no_link_mutable_toobig(self):
12700-        d = self.shouldFail2(error.Error,
12701-                             "test_POST_upload_no_link_mutable_toobig",
12702-                             "413 Request Entity Too Large",
12703-                             "SDMF is limited to one segment, and 10001 > 10000",
12704-                             self.POST,
12705-                             "/uri", t="upload", mutable="true",
12706-                             file=("new.txt",
12707-                                   "b" * (self.s.MUTABLE_SIZELIMIT+1)) )
12708+        # The SDMF size limit is no longer in place, so we should be
12709+        # able to upload mutable files that are as large as we want them
12710+        # to be.
12711+        d = self.POST("/uri", t="upload", mutable="true",
12712+                      file=("new.txt", "b" * (self.s.MUTABLE_SIZELIMIT + 1)))
12713         return d
12714 
12715hunk ./src/allmydata/test/test_web.py 1817
12716+
12717+    def test_POST_upload_mutable_type_unlinked(self):
12718+        d = self.POST("/uri?t=upload&mutable=true&mutable-type=sdmf",
12719+                      file=("sdmf.txt", self.NEWFILE_CONTENTS * 300000))
12720+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12721+        def _got_json(json, version):
12722+            data = simplejson.loads(json)
12723+            data = data[1]
12724+
12725+            self.failUnlessIn("mutable-type", data)
12726+            self.failUnlessEqual(data['mutable-type'], version)
12727+        d.addCallback(_got_json, "sdmf")
12728+        d.addCallback(lambda ignored:
12729+            self.POST("/uri?t=upload&mutable=true&mutable-type=mdmf",
12730+                      file=('mdmf.txt', self.NEWFILE_CONTENTS * 300000)))
12731+        d.addCallback(lambda filecap: self.GET("/uri/%s?t=json" % filecap))
12732+        d.addCallback(_got_json, "mdmf")
12733+        return d
12734+
12735+    def test_POST_upload_mutable_type(self):
12736+        d = self.POST(self.public_url + \
12737+                      "/foo?t=upload&mutable=true&mutable-type=sdmf",
12738+                      file=("sdmf.txt", self.NEWFILE_CONTENTS * 300000))
12739+        fn = self._foo_node
12740+        def _got_cap(filecap, filename):
12741+            filenameu = unicode(filename)
12742+            self.failUnlessURIMatchesRWChild(filecap, fn, filenameu)
12743+            return self.GET(self.public_url + "/foo/%s?t=json" % filename)
12744+        d.addCallback(_got_cap, "sdmf.txt")
12745+        def _got_json(json, version):
12746+            data = simplejson.loads(json)
12747+            data = data[1]
12748+
12749+            self.failUnlessIn("mutable-type", data)
12750+            self.failUnlessEqual(data['mutable-type'], version)
12751+        d.addCallback(_got_json, "sdmf")
12752+        d.addCallback(lambda ignored:
12753+            self.POST(self.public_url + \
12754+                      "/foo?t=upload&mutable=true&mutable-type=mdmf",
12755+                      file=("mdmf.txt", self.NEWFILE_CONTENTS * 300000)))
12756+        d.addCallback(_got_cap, "mdmf.txt")
12757+        d.addCallback(_got_json, "mdmf")
12758+        return d
12759+
12760     def test_POST_upload_mutable(self):
12761         # this creates a mutable file
12762         d = self.POST(self.public_url + "/foo", t="upload", mutable="true",
12763hunk ./src/allmydata/test/test_web.py 1985
12764             self.failUnlessReallyEqual(headers["content-type"], ["text/plain"])
12765         d.addCallback(_got_headers)
12766 
12767-        # make sure that size errors are displayed correctly for overwrite
12768-        d.addCallback(lambda res:
12769-                      self.shouldFail2(error.Error,
12770-                                       "test_POST_upload_mutable-toobig",
12771-                                       "413 Request Entity Too Large",
12772-                                       "SDMF is limited to one segment, and 10001 > 10000",
12773-                                       self.POST,
12774-                                       self.public_url + "/foo", t="upload",
12775-                                       mutable="true",
12776-                                       file=("new.txt",
12777-                                             "b" * (self.s.MUTABLE_SIZELIMIT+1)),
12778-                                       ))
12779-
12780+        # make sure that outdated size limits aren't enforced anymore.
12781+        d.addCallback(lambda ignored:
12782+            self.POST(self.public_url + "/foo", t="upload",
12783+                      mutable="true",
12784+                      file=("new.txt",
12785+                            "b" * (self.s.MUTABLE_SIZELIMIT+1))))
12786         d.addErrback(self.dump_error)
12787         return d
12788 
12789hunk ./src/allmydata/test/test_web.py 1995
12790     def test_POST_upload_mutable_toobig(self):
12791-        d = self.shouldFail2(error.Error,
12792-                             "test_POST_upload_mutable_toobig",
12793-                             "413 Request Entity Too Large",
12794-                             "SDMF is limited to one segment, and 10001 > 10000",
12795-                             self.POST,
12796-                             self.public_url + "/foo",
12797-                             t="upload", mutable="true",
12798-                             file=("new.txt",
12799-                                   "b" * (self.s.MUTABLE_SIZELIMIT+1)) )
12800+        # SDMF had a size limit that was removed a while ago. MDMF has
12801+        # never had a size limit. Test to make sure that we do not
12802+        # encounter errors when trying to upload large mutable files,
12803+        # since there should be no coded prohibitions regarding large
12804+        # mutable files.
12805+        d = self.POST(self.public_url + "/foo",
12806+                      t="upload", mutable="true",
12807+                      file=("new.txt", "b" * (self.s.MUTABLE_SIZELIMIT + 1)))
12808         return d
12809 
12810     def dump_error(self, f):
12811hunk ./src/allmydata/test/test_web.py 3005
12812                                                       contents))
12813         return d
12814 
12815+    def test_PUT_NEWFILEURL_mdmf(self):
12816+        new_contents = self.NEWFILE_CONTENTS * 300000
12817+        d = self.PUT(self.public_url + \
12818+                     "/foo/mdmf.txt?mutable=true&mutable-type=mdmf",
12819+                     new_contents)
12820+        d.addCallback(lambda ignored:
12821+            self.GET(self.public_url + "/foo/mdmf.txt?t=json"))
12822+        def _got_json(json):
12823+            data = simplejson.loads(json)
12824+            data = data[1]
12825+            self.failUnlessIn("mutable-type", data)
12826+            self.failUnlessEqual(data['mutable-type'], "mdmf")
12827+        d.addCallback(_got_json)
12828+        return d
12829+
12830+    def test_PUT_NEWFILEURL_sdmf(self):
12831+        new_contents = self.NEWFILE_CONTENTS * 300000
12832+        d = self.PUT(self.public_url + \
12833+                     "/foo/sdmf.txt?mutable=true&mutable-type=sdmf",
12834+                     new_contents)
12835+        d.addCallback(lambda ignored:
12836+            self.GET(self.public_url + "/foo/sdmf.txt?t=json"))
12837+        def _got_json(json):
12838+            data = simplejson.loads(json)
12839+            data = data[1]
12840+            self.failUnlessIn("mutable-type", data)
12841+            self.failUnlessEqual(data['mutable-type'], "sdmf")
12842+        d.addCallback(_got_json)
12843+        return d
12844+
12845     def test_PUT_NEWFILEURL_uri_replace(self):
12846         contents, n, new_uri = self.makefile(8)
12847         d = self.PUT(self.public_url + "/foo/bar.txt?t=uri", new_uri)
12848hunk ./src/allmydata/test/test_web.py 3156
12849         d.addCallback(_done)
12850         return d
12851 
12852+
12853+    def test_PUT_update_at_offset(self):
12854+        file_contents = "test file" * 100000 # about 900 KiB
12855+        d = self.PUT("/uri?mutable=true", file_contents)
12856+        def _then(filecap):
12857+            self.filecap = filecap
12858+            new_data = file_contents[:100]
12859+            new = "replaced and so on"
12860+            new_data += new
12861+            new_data += file_contents[len(new_data):]
12862+            assert len(new_data) == len(file_contents)
12863+            self.new_data = new_data
12864+        d.addCallback(_then)
12865+        d.addCallback(lambda ignored:
12866+            self.PUT("/uri/%s?replace=True&offset=100" % self.filecap,
12867+                     "replaced and so on"))
12868+        def _get_data(filecap):
12869+            n = self.s.create_node_from_uri(filecap)
12870+            return n.download_best_version()
12871+        d.addCallback(_get_data)
12872+        d.addCallback(lambda results:
12873+            self.failUnlessEqual(results, self.new_data))
12874+        # Now try appending things to the file
12875+        d.addCallback(lambda ignored:
12876+            self.PUT("/uri/%s?offset=%d" % (self.filecap, len(self.new_data)),
12877+                     "puppies" * 100))
12878+        d.addCallback(_get_data)
12879+        d.addCallback(lambda results:
12880+            self.failUnlessEqual(results, self.new_data + ("puppies" * 100)))
12881+        return d
12882+
12883+
12884+    def test_PUT_update_at_offset_immutable(self):
12885+        file_contents = "Test file" * 100000
12886+        d = self.PUT("/uri", file_contents)
12887+        def _then(filecap):
12888+            self.filecap = filecap
12889+        d.addCallback(_then)
12890+        d.addCallback(lambda ignored:
12891+            self.shouldHTTPError("test immutable update",
12892+                                 400, "Bad Request",
12893+                                 "immutable",
12894+                                 self.PUT,
12895+                                 "/uri/%s?offset=50" % self.filecap,
12896+                                 "foo"))
12897+        return d
12898+
12899+
12900     def test_bad_method(self):
12901         url = self.webish_url + self.public_url + "/foo/bar.txt"
12902         d = self.shouldHTTPError("test_bad_method",
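The test_PUT_update_at_offset cases above drive the new offset parameter
through the test harness; the same in-place update is a plain HTTP PUT
against a running gateway. A minimal sketch, assuming a gateway on
127.0.0.1:3456 and an existing mutable writecap (both assumptions, not
supplied by this patch):

    import httplib, urllib

    def update_at_offset(filecap, offset, new_bytes,
                         host="127.0.0.1", port=3456):
        # Write new_bytes into the mutable file starting at offset; an
        # offset equal to the current length appends, as in the test above.
        conn = httplib.HTTPConnection(host, port)
        conn.request("PUT",
                     "/uri/%s?offset=%d" % (urllib.quote(filecap), offset),
                     new_bytes)
        resp = conn.getresponse()
        return resp.status, resp.read()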
12903hunk ./src/allmydata/test/test_web.py 3473
12904         def _stash_mutable_uri(n, which):
12905             self.uris[which] = n.get_uri()
12906             assert isinstance(self.uris[which], str)
12907-        d.addCallback(lambda ign: c0.create_mutable_file(DATA+"3"))
12908+        d.addCallback(lambda ign:
12909+            c0.create_mutable_file(publish.MutableData(DATA+"3")))
12910         d.addCallback(_stash_mutable_uri, "corrupt")
12911         d.addCallback(lambda ign:
12912                       c0.upload(upload.Data("literal", convergence="")))
12913hunk ./src/allmydata/test/test_web.py 3620
12914         def _stash_mutable_uri(n, which):
12915             self.uris[which] = n.get_uri()
12916             assert isinstance(self.uris[which], str)
12917-        d.addCallback(lambda ign: c0.create_mutable_file(DATA+"3"))
12918+        d.addCallback(lambda ign:
12919+            c0.create_mutable_file(publish.MutableData(DATA+"3")))
12920         d.addCallback(_stash_mutable_uri, "corrupt")
12921 
12922         def _compute_fileurls(ignored):
12923hunk ./src/allmydata/test/test_web.py 4283
12924         def _stash_mutable_uri(n, which):
12925             self.uris[which] = n.get_uri()
12926             assert isinstance(self.uris[which], str)
12927-        d.addCallback(lambda ign: c0.create_mutable_file(DATA+"2"))
12928+        d.addCallback(lambda ign:
12929+            c0.create_mutable_file(publish.MutableData(DATA+"2")))
12930         d.addCallback(_stash_mutable_uri, "mutable")
12931 
12932         def _compute_fileurls(ignored):
12933hunk ./src/allmydata/test/test_web.py 4383
12934                                                         convergence="")))
12935         d.addCallback(_stash_uri, "small")
12936 
12937-        d.addCallback(lambda ign: c0.create_mutable_file("mutable"))
12938+        d.addCallback(lambda ign:
12939+            c0.create_mutable_file(publish.MutableData("mutable")))
12940         d.addCallback(lambda fn: self.rootnode.set_node(u"mutable", fn))
12941         d.addCallback(_stash_uri, "mutable")
12942 
12943}
12944[resolve conflicts between 393-MDMF patches and trunk as of 1.8.2
12945"Brian Warner <warner@lothar.com>"**20110220230201
12946 Ignore-this: 9bbf5d26c994e8069202331dcb4cdd95
12947] {
12948merger 0.0 (
12949merger 0.0 (
12950merger 0.0 (
12951replace ./docs/configuration.rst [A-Za-z_0-9\-\.] Tahoe Tahoe-LAFS
12952merger 0.0 (
12953hunk ./docs/configuration.rst 384
12954-shares.needed = (int, optional) aka "k", default 3
12955-shares.total = (int, optional) aka "N", N >= k, default 10
12956-shares.happy = (int, optional) 1 <= happy <= N, default 7
12957-
12958- These three values set the default encoding parameters. Each time a new file
12959- is uploaded, erasure-coding is used to break the ciphertext into separate
12960- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
12961- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
12962- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
12963- Setting k to 1 is equivalent to simple replication (uploading N copies of
12964- the file).
12965-
12966- These values control the tradeoff between storage overhead, performance, and
12967- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
12968- backend storage space (the actual value will be a bit more, because of other
12969- forms of overhead). Up to N-k shares can be lost before the file becomes
12970- unrecoverable, so assuming there are at least N servers, up to N-k servers
12971- can be offline without losing the file. So large N/k ratios are more
12972- reliable, and small N/k ratios use less disk space. Clearly, k must never be
12973- smaller than N.
12974-
12975- Large values of N will slow down upload operations slightly, since more
12976- servers must be involved, and will slightly increase storage overhead due to
12977- the hash trees that are created. Large values of k will cause downloads to
12978- be marginally slower, because more servers must be involved. N cannot be
12979- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe
12980- uses.
12981-
12982- shares.happy allows you control over the distribution of your immutable file.
12983- For a successful upload, shares are guaranteed to be initially placed on
12984- at least 'shares.happy' distinct servers, the correct functioning of any
12985- k of which is sufficient to guarantee the availability of the uploaded file.
12986- This value should not be larger than the number of servers on your grid.
12987-
12988- A value of shares.happy <= k is allowed, but does not provide any redundancy
12989- if some servers fail or lose shares.
12990-
12991- (Mutable files use a different share placement algorithm that does not
12992-  consider this parameter.)
12993-
12994-
12995-== Storage Server Configuration ==
12996-
12997-[storage]
12998-enabled = (boolean, optional)
12999-
13000- If this is True, the node will run a storage server, offering space to other
13001- clients. If it is False, the node will not run a storage server, meaning
13002- that no shares will be stored on this node. Use False this for clients who
13003- do not wish to provide storage service. The default value is True.
13004-
13005-readonly = (boolean, optional)
13006-
13007- If True, the node will run a storage server but will not accept any shares,
13008- making it effectively read-only. Use this for storage servers which are
13009- being decommissioned: the storage/ directory could be mounted read-only,
13010- while shares are moved to other servers. Note that this currently only
13011- affects immutable shares. Mutable shares (used for directories) will be
13012- written and modified anyway. See ticket #390 for the current status of this
13013- bug. The default value is False.
13014-
13015-reserved_space = (str, optional)
13016-
13017- If provided, this value defines how much disk space is reserved: the storage
13018- server will not accept any share which causes the amount of free disk space
13019- to drop below this value. (The free space is measured by a call to statvfs(2)
13020- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
13021- user account under which the storage server runs.)
13022-
13023- This string contains a number, with an optional case-insensitive scale
13024- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
13025- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
13026- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
13027-
13028-expire.enabled =
13029-expire.mode =
13030-expire.override_lease_duration =
13031-expire.cutoff_date =
13032-expire.immutable =
13033-expire.mutable =
13034-
13035- These settings control garbage-collection, in which the server will delete
13036- shares that no longer have an up-to-date lease on them. Please see the
13037- neighboring "garbage-collection.txt" document for full details.
13038-
13039-
13040-== Running A Helper ==
13041+Running A Helper
13042+================
13043hunk ./docs/configuration.rst 424
13044+mutable.format = sdmf or mdmf
13045+
13046+ This value tells Tahoe-LAFS what the default mutable file format should
13047+ be. If mutable.format=sdmf, then newly created mutable files will be in
13048+ the old SDMF format. This is desirable for clients that operate on
13049+ grids where some peers run older versions of Tahoe-LAFS, as these older
13050+ versions cannot read the new MDMF mutable file format. If
13051+ mutable.format = mdmf, then newly created mutable files will use the
13052+ new MDMF format, which supports efficient in-place modification and
13053+ streaming downloads. You can override this value using a special
13054+ mutable-type parameter in the webapi. If you do not specify a value
13055+ here, Tahoe-LAFS will use SDMF for all newly-created mutable files.
13056+
13057+ Note that this parameter only applies to mutable files. Mutable
13058+ directories, which are stored as mutable files, are not controlled by
13059+ this parameter and will always use SDMF. We may revisit this decision
13060+ in future versions of Tahoe-LAFS.
13061)
13062)
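For illustration, a node that should create MDMF files by default would carry
something like the following in its tahoe.cfg; placing the option in the
[client] section is an assumption based on where the other client-side
defaults live, not something this patch states:

  [client]
  mutable.format = mdmf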
13063hunk ./docs/configuration.rst 324
13064+Frontend Configuration
13065+======================
13066+
13067+The Tahoe client process can run a variety of frontend file-access protocols.
13068+You will use these to create and retrieve files from the virtual filesystem.
13069+Configuration details for each are documented in the following
13070+protocol-specific guides:
13071+
13072+HTTP
13073+
13074+    Tahoe runs a webserver by default on port 3456. This interface provides a
13075+    human-oriented "WUI", with pages to create, modify, and browse
13076+    directories and files, as well as a number of pages to check on the
13077+    status of your Tahoe node. It also provides a machine-oriented "WAPI",
13078+    with a REST-ful HTTP interface that can be used by other programs
13079+    (including the CLI tools). Please see `<frontends/webapi.rst>`_ for full
13080+    details, and the ``web.port`` and ``web.static`` config variables above.
13081+    The `<frontends/download-status.rst>`_ document also describes a few WUI
13082+    status pages.
13083+
13084+CLI
13085+
13086+    The main "bin/tahoe" executable includes subcommands for manipulating the
13087+    filesystem, uploading/downloading files, and creating/running Tahoe
13088+    nodes. See `<frontends/CLI.rst>`_ for details.
13089+
13090+FTP, SFTP
13091+
13092+    Tahoe can also run both FTP and SFTP servers, and map a username/password
13093+    pair to a top-level Tahoe directory. See `<frontends/FTP-and-SFTP.rst>`_
13094+    for instructions on configuring these services, and the ``[ftpd]`` and
13095+    ``[sftpd]`` sections of ``tahoe.cfg``.
13096+
13097)
13098hunk ./docs/configuration.rst 324
13099+``mutable.format = sdmf or mdmf``
13100+
13101+    This value tells Tahoe what the default mutable file format should
13102+    be. If ``mutable.format=sdmf``, then newly created mutable files will be
13103+    in the old SDMF format. This is desirable for clients that operate on
13104+    grids where some peers run older versions of Tahoe, as these older
13105+    versions cannot read the new MDMF mutable file format. If
13106+    ``mutable.format`` is ``mdmf``, then newly created mutable files will use
13107+    the new MDMF format, which supports efficient in-place modification and
13108+    streaming downloads. You can override this value using a special
13109+    mutable-type parameter in the webapi. If you do not specify a value here,
13110+    Tahoe will use SDMF for all newly-created mutable files.
13111+
13112+    Note that this parameter only applies to mutable files. Mutable
13113+    directories, which are stored as mutable files, are not controlled by
13114+    this parameter and will always use SDMF. We may revisit this decision
13115+    in future versions of Tahoe-LAFS.
13116+
13117)
13118merger 0.0 (
13119merger 0.0 (
13120hunk ./docs/configuration.rst 324
13121+``mutable.format = sdmf or mdmf``
13122+
13123+    This value tells Tahoe what the default mutable file format should
13124+    be. If ``mutable.format=sdmf``, then newly created mutable files will be
13125+    in the old SDMF format. This is desirable for clients that operate on
13126+    grids where some peers run older versions of Tahoe, as these older
13127+    versions cannot read the new MDMF mutable file format. If
13128+    ``mutable.format`` is ``mdmf``, then newly created mutable files will use
13129+    the new MDMF format, which supports efficient in-place modification and
13130+    streaming downloads. You can override this value using a special
13131+    mutable-type parameter in the webapi. If you do not specify a value here,
13132+    Tahoe will use SDMF for all newly-created mutable files.
13133+
13134+    Note that this parameter only applies to mutable files. Mutable
13135+    directories, which are stored as mutable files, are not controlled by
13136+    this parameter and will always use SDMF. We may revisit this decision
13137+    in future versions of Tahoe-LAFS.
13138+
13139merger 0.0 (
13140merger 0.0 (
13141replace ./docs/configuration.rst [A-Za-z_0-9\-\.] Tahoe Tahoe-LAFS
13142merger 0.0 (
13143hunk ./docs/configuration.rst 384
13144-shares.needed = (int, optional) aka "k", default 3
13145-shares.total = (int, optional) aka "N", N >= k, default 10
13146-shares.happy = (int, optional) 1 <= happy <= N, default 7
13147-
13148- These three values set the default encoding parameters. Each time a new file
13149- is uploaded, erasure-coding is used to break the ciphertext into separate
13150- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
13151- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
13152- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
13153- Setting k to 1 is equivalent to simple replication (uploading N copies of
13154- the file).
13155-
13156- These values control the tradeoff between storage overhead, performance, and
13157- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
13158- backend storage space (the actual value will be a bit more, because of other
13159- forms of overhead). Up to N-k shares can be lost before the file becomes
13160- unrecoverable, so assuming there are at least N servers, up to N-k servers
13161- can be offline without losing the file. So large N/k ratios are more
13162- reliable, and small N/k ratios use less disk space. Clearly, k must never be
13163- smaller than N.
13164-
13165- Large values of N will slow down upload operations slightly, since more
13166- servers must be involved, and will slightly increase storage overhead due to
13167- the hash trees that are created. Large values of k will cause downloads to
13168- be marginally slower, because more servers must be involved. N cannot be
13169- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe
13170- uses.
13171-
13172- shares.happy allows you control over the distribution of your immutable file.
13173- For a successful upload, shares are guaranteed to be initially placed on
13174- at least 'shares.happy' distinct servers, the correct functioning of any
13175- k of which is sufficient to guarantee the availability of the uploaded file.
13176- This value should not be larger than the number of servers on your grid.
13177-
13178- A value of shares.happy <= k is allowed, but does not provide any redundancy
13179- if some servers fail or lose shares.
13180-
13181- (Mutable files use a different share placement algorithm that does not
13182-  consider this parameter.)
13183-
13184-
13185-== Storage Server Configuration ==
13186-
13187-[storage]
13188-enabled = (boolean, optional)
13189-
13190- If this is True, the node will run a storage server, offering space to other
13191- clients. If it is False, the node will not run a storage server, meaning
13192- that no shares will be stored on this node. Use False this for clients who
13193- do not wish to provide storage service. The default value is True.
13194-
13195-readonly = (boolean, optional)
13196-
13197- If True, the node will run a storage server but will not accept any shares,
13198- making it effectively read-only. Use this for storage servers which are
13199- being decommissioned: the storage/ directory could be mounted read-only,
13200- while shares are moved to other servers. Note that this currently only
13201- affects immutable shares. Mutable shares (used for directories) will be
13202- written and modified anyway. See ticket #390 for the current status of this
13203- bug. The default value is False.
13204-
13205-reserved_space = (str, optional)
13206-
13207- If provided, this value defines how much disk space is reserved: the storage
13208- server will not accept any share which causes the amount of free disk space
13209- to drop below this value. (The free space is measured by a call to statvfs(2)
13210- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
13211- user account under which the storage server runs.)
13212-
13213- This string contains a number, with an optional case-insensitive scale
13214- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
13215- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
13216- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
13217-
13218-expire.enabled =
13219-expire.mode =
13220-expire.override_lease_duration =
13221-expire.cutoff_date =
13222-expire.immutable =
13223-expire.mutable =
13224-
13225- These settings control garbage-collection, in which the server will delete
13226- shares that no longer have an up-to-date lease on them. Please see the
13227- neighboring "garbage-collection.txt" document for full details.
13228-
13229-
13230-== Running A Helper ==
13231+Running A Helper
13232+================
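
    (As an illustrative aside, not part of the patch: the encoding-parameter
    text removed above is relocated later in this bundle. The sketch below
    works through the arithmetic that text describes, for the default 3-of-10
    encoding; the figures are only the first-order approximation the text
    itself gives.)

        # Illustrative sketch, not part of the patch: first-order storage and
        # fault-tolerance arithmetic for the default 3-of-10 encoding.
        k, N = 3, 10                     # shares.needed, shares.total
        filesize = 1 * 1000 * 1000       # the 1MB example from the text above

        expansion = float(N) / k         # ~3.33x
        stored = filesize * expansion    # ~3.3MB of backend storage, plus overhead
        lost_ok = N - k                  # up to 7 shares (or servers) may be lost

        print "approximate stored bytes:", int(stored)
        print "shares that may be lost before the file is unrecoverable:", lost_ok
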
13233hunk ./docs/configuration.rst 424
13234+mutable.format = sdmf or mdmf
13235+
13236+ This value tells Tahoe-LAFS what the default mutable file format should
13237+ be. If mutable.format=sdmf, then newly created mutable files will be in
13238+ the old SDMF format. This is desirable for clients that operate on
13239+ grids where some peers run older versions of Tahoe-LAFS, as these older
13240+ versions cannot read the new MDMF mutable file format. If
13241+ mutable.format = mdmf, then newly created mutable files will use the
13242+ new MDMF format, which supports efficient in-place modification and
13243+ streaming downloads. You can override this value using a special
13244+ mutable-type parameter in the webapi. If you do not specify a value
13245+ here, Tahoe-LAFS will use SDMF for all newly-created mutable files.
13246+
13247+ Note that this parameter only applies to mutable files. Mutable
13248+ directories, which are stored as mutable files, are not controlled by
13249+ this parameter and will always use SDMF. We may revisit this decision
13250+ in future versions of Tahoe-LAFS.
13251)
13252)
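
    (As an illustrative aside, not part of the patch: the hunk above documents
    the new mutable.format knob. A minimal sketch of reading it with
    ConfigParser follows; the [client] section name and the helper function
    are assumptions for illustration only, since the patch wires the option up
    elsewhere.)

        # Illustrative sketch, not part of the patch. Assumes mutable.format
        # lives in the [client] section of tahoe.cfg; an unspecified value
        # means SDMF, per the documentation text above.
        from ConfigParser import SafeConfigParser

        def get_default_mutable_format(cfg_path):
            parser = SafeConfigParser()
            parser.read([cfg_path])
            if parser.has_option("client", "mutable.format"):
                value = parser.get("client", "mutable.format").strip().lower()
            else:
                value = "sdmf"
            if value not in ("sdmf", "mdmf"):
                raise ValueError("unknown mutable.format: %r" % (value,))
            return value
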
13253hunk ./docs/configuration.rst 324
13254+Frontend Configuration
13255+======================
13256+
13257+The Tahoe client process can run a variety of frontend file-access protocols.
13258+You will use these to create and retrieve files from the virtual filesystem.
13259+Configuration details for each are documented in the following
13260+protocol-specific guides:
13261+
13262+HTTP
13263+
13264+    Tahoe runs a webserver by default on port 3456. This interface provides a
13265+    human-oriented "WUI", with pages to create, modify, and browse
13266+    directories and files, as well as a number of pages to check on the
13267+    status of your Tahoe node. It also provides a machine-oriented "WAPI",
13268+    with a REST-ful HTTP interface that can be used by other programs
13269+    (including the CLI tools). Please see `<frontends/webapi.rst>`_ for full
13270+    details, and the ``web.port`` and ``web.static`` config variables above.
13271+    The `<frontends/download-status.rst>`_ document also describes a few WUI
13272+    status pages.
13273+
13274+CLI
13275+
13276+    The main "bin/tahoe" executable includes subcommands for manipulating the
13277+    filesystem, uploading/downloading files, and creating/running Tahoe
13278+    nodes. See `<frontends/CLI.rst>`_ for details.
13279+
13280+FTP, SFTP
13281+
13282+    Tahoe can also run both FTP and SFTP servers, and map a username/password
13283+    pair to a top-level Tahoe directory. See `<frontends/FTP-and-SFTP.rst>`_
13284+    for instructions on configuring these services, and the ``[ftpd]`` and
13285+    ``[sftpd]`` sections of ``tahoe.cfg``.
13286+
13287)
13288)
13289hunk ./docs/configuration.rst 402
13290-shares.needed = (int, optional) aka "k", default 3
13291-shares.total = (int, optional) aka "N", N >= k, default 10
13292-shares.happy = (int, optional) 1 <= happy <= N, default 7
13293-
13294- These three values set the default encoding parameters. Each time a new file
13295- is uploaded, erasure-coding is used to break the ciphertext into separate
13296- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
13297- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
13298- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
13299- Setting k to 1 is equivalent to simple replication (uploading N copies of
13300- the file).
13301-
13302- These values control the tradeoff between storage overhead, performance, and
13303- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
13304- backend storage space (the actual value will be a bit more, because of other
13305- forms of overhead). Up to N-k shares can be lost before the file becomes
13306- unrecoverable, so assuming there are at least N servers, up to N-k servers
13307- can be offline without losing the file. So large N/k ratios are more
13308- reliable, and small N/k ratios use less disk space. Clearly, k must never be
13309- smaller than N.
13310-
13311- Large values of N will slow down upload operations slightly, since more
13312- servers must be involved, and will slightly increase storage overhead due to
13313- the hash trees that are created. Large values of k will cause downloads to
13314- be marginally slower, because more servers must be involved. N cannot be
13315- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe
13316- uses.
13317-
13318- shares.happy allows you control over the distribution of your immutable file.
13319- For a successful upload, shares are guaranteed to be initially placed on
13320- at least 'shares.happy' distinct servers, the correct functioning of any
13321- k of which is sufficient to guarantee the availability of the uploaded file.
13322- This value should not be larger than the number of servers on your grid.
13323-
13324- A value of shares.happy <= k is allowed, but does not provide any redundancy
13325- if some servers fail or lose shares.
13326-
13327- (Mutable files use a different share placement algorithm that does not
13328-  consider this parameter.)
13329-
13330-
13331-== Storage Server Configuration ==
13332-
13333-[storage]
13334-enabled = (boolean, optional)
13335-
13336- If this is True, the node will run a storage server, offering space to other
13337- clients. If it is False, the node will not run a storage server, meaning
13338- that no shares will be stored on this node. Use False this for clients who
13339- do not wish to provide storage service. The default value is True.
13340-
13341-readonly = (boolean, optional)
13342-
13343- If True, the node will run a storage server but will not accept any shares,
13344- making it effectively read-only. Use this for storage servers which are
13345- being decommissioned: the storage/ directory could be mounted read-only,
13346- while shares are moved to other servers. Note that this currently only
13347- affects immutable shares. Mutable shares (used for directories) will be
13348- written and modified anyway. See ticket #390 for the current status of this
13349- bug. The default value is False.
13350-
13351-reserved_space = (str, optional)
13352-
13353- If provided, this value defines how much disk space is reserved: the storage
13354- server will not accept any share which causes the amount of free disk space
13355- to drop below this value. (The free space is measured by a call to statvfs(2)
13356- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
13357- user account under which the storage server runs.)
13358-
13359- This string contains a number, with an optional case-insensitive scale
13360- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
13361- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
13362- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
13363-
13364-expire.enabled =
13365-expire.mode =
13366-expire.override_lease_duration =
13367-expire.cutoff_date =
13368-expire.immutable =
13369-expire.mutable =
13370-
13371- These settings control garbage-collection, in which the server will delete
13372- shares that no longer have an up-to-date lease on them. Please see the
13373- neighboring "garbage-collection.txt" document for full details.
13374-
13375-
13376-== Running A Helper ==
13377+Running A Helper
13378+================
13379)
13380merger 0.0 (
13381merger 0.0 (
13382hunk ./docs/configuration.rst 402
13383-shares.needed = (int, optional) aka "k", default 3
13384-shares.total = (int, optional) aka "N", N >= k, default 10
13385-shares.happy = (int, optional) 1 <= happy <= N, default 7
13386-
13387- These three values set the default encoding parameters. Each time a new file
13388- is uploaded, erasure-coding is used to break the ciphertext into separate
13389- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
13390- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
13391- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
13392- Setting k to 1 is equivalent to simple replication (uploading N copies of
13393- the file).
13394-
13395- These values control the tradeoff between storage overhead, performance, and
13396- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
13397- backend storage space (the actual value will be a bit more, because of other
13398- forms of overhead). Up to N-k shares can be lost before the file becomes
13399- unrecoverable, so assuming there are at least N servers, up to N-k servers
13400- can be offline without losing the file. So large N/k ratios are more
13401- reliable, and small N/k ratios use less disk space. Clearly, k must never be
13402- smaller than N.
13403-
13404- Large values of N will slow down upload operations slightly, since more
13405- servers must be involved, and will slightly increase storage overhead due to
13406- the hash trees that are created. Large values of k will cause downloads to
13407- be marginally slower, because more servers must be involved. N cannot be
13408- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe
13409- uses.
13410-
13411- shares.happy allows you control over the distribution of your immutable file.
13412- For a successful upload, shares are guaranteed to be initially placed on
13413- at least 'shares.happy' distinct servers, the correct functioning of any
13414- k of which is sufficient to guarantee the availability of the uploaded file.
13415- This value should not be larger than the number of servers on your grid.
13416-
13417- A value of shares.happy <= k is allowed, but does not provide any redundancy
13418- if some servers fail or lose shares.
13419-
13420- (Mutable files use a different share placement algorithm that does not
13421-  consider this parameter.)
13422-
13423-
13424-== Storage Server Configuration ==
13425-
13426-[storage]
13427-enabled = (boolean, optional)
13428-
13429- If this is True, the node will run a storage server, offering space to other
13430- clients. If it is False, the node will not run a storage server, meaning
13431- that no shares will be stored on this node. Use False this for clients who
13432- do not wish to provide storage service. The default value is True.
13433-
13434-readonly = (boolean, optional)
13435-
13436- If True, the node will run a storage server but will not accept any shares,
13437- making it effectively read-only. Use this for storage servers which are
13438- being decommissioned: the storage/ directory could be mounted read-only,
13439- while shares are moved to other servers. Note that this currently only
13440- affects immutable shares. Mutable shares (used for directories) will be
13441- written and modified anyway. See ticket #390 for the current status of this
13442- bug. The default value is False.
13443-
13444-reserved_space = (str, optional)
13445-
13446- If provided, this value defines how much disk space is reserved: the storage
13447- server will not accept any share which causes the amount of free disk space
13448- to drop below this value. (The free space is measured by a call to statvfs(2)
13449- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
13450- user account under which the storage server runs.)
13451-
13452- This string contains a number, with an optional case-insensitive scale
13453- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
13454- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
13455- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
13456-
13457-expire.enabled =
13458-expire.mode =
13459-expire.override_lease_duration =
13460-expire.cutoff_date =
13461-expire.immutable =
13462-expire.mutable =
13463-
13464- These settings control garbage-collection, in which the server will delete
13465- shares that no longer have an up-to-date lease on them. Please see the
13466- neighboring "garbage-collection.txt" document for full details.
13467-
13468-
13469-== Running A Helper ==
13470+Running A Helper
13471+================
13472merger 0.0 (
13473hunk ./docs/configuration.rst 324
13474+``mutable.format = sdmf or mdmf``
13475+
13476+    This value tells Tahoe what the default mutable file format should
13477+    be. If ``mutable.format=sdmf``, then newly created mutable files will be
13478+    in the old SDMF format. This is desirable for clients that operate on
13479+    grids where some peers run older versions of Tahoe, as these older
13480+    versions cannot read the new MDMF mutable file format. If
13481+    ``mutable.format`` is ``mdmf``, then newly created mutable files will use
13482+    the new MDMF format, which supports efficient in-place modification and
13483+    streaming downloads. You can override this value using a special
13484+    mutable-type parameter in the webapi. If you do not specify a value here,
13485+    Tahoe will use SDMF for all newly-created mutable files.
13486+
13487+    Note that this parameter only applies to mutable files. Mutable
13488+    directories, which are stored as mutable files, are not controlled by
13489+    this parameter and will always use SDMF. We may revisit this decision
13490+    in future versions of Tahoe-LAFS.
13491+
13492merger 0.0 (
13493merger 0.0 (
13494replace ./docs/configuration.rst [A-Za-z_0-9\-\.] Tahoe Tahoe-LAFS
13495merger 0.0 (
13496hunk ./docs/configuration.rst 384
13497-shares.needed = (int, optional) aka "k", default 3
13498-shares.total = (int, optional) aka "N", N >= k, default 10
13499-shares.happy = (int, optional) 1 <= happy <= N, default 7
13500-
13501- These three values set the default encoding parameters. Each time a new file
13502- is uploaded, erasure-coding is used to break the ciphertext into separate
13503- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
13504- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
13505- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
13506- Setting k to 1 is equivalent to simple replication (uploading N copies of
13507- the file).
13508-
13509- These values control the tradeoff between storage overhead, performance, and
13510- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
13511- backend storage space (the actual value will be a bit more, because of other
13512- forms of overhead). Up to N-k shares can be lost before the file becomes
13513- unrecoverable, so assuming there are at least N servers, up to N-k servers
13514- can be offline without losing the file. So large N/k ratios are more
13515- reliable, and small N/k ratios use less disk space. Clearly, k must never be
13516- smaller than N.
13517-
13518- Large values of N will slow down upload operations slightly, since more
13519- servers must be involved, and will slightly increase storage overhead due to
13520- the hash trees that are created. Large values of k will cause downloads to
13521- be marginally slower, because more servers must be involved. N cannot be
13522- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe
13523- uses.
13524-
13525- shares.happy allows you control over the distribution of your immutable file.
13526- For a successful upload, shares are guaranteed to be initially placed on
13527- at least 'shares.happy' distinct servers, the correct functioning of any
13528- k of which is sufficient to guarantee the availability of the uploaded file.
13529- This value should not be larger than the number of servers on your grid.
13530-
13531- A value of shares.happy <= k is allowed, but does not provide any redundancy
13532- if some servers fail or lose shares.
13533-
13534- (Mutable files use a different share placement algorithm that does not
13535-  consider this parameter.)
13536-
13537-
13538-== Storage Server Configuration ==
13539-
13540-[storage]
13541-enabled = (boolean, optional)
13542-
13543- If this is True, the node will run a storage server, offering space to other
13544- clients. If it is False, the node will not run a storage server, meaning
13545- that no shares will be stored on this node. Use False this for clients who
13546- do not wish to provide storage service. The default value is True.
13547-
13548-readonly = (boolean, optional)
13549-
13550- If True, the node will run a storage server but will not accept any shares,
13551- making it effectively read-only. Use this for storage servers which are
13552- being decommissioned: the storage/ directory could be mounted read-only,
13553- while shares are moved to other servers. Note that this currently only
13554- affects immutable shares. Mutable shares (used for directories) will be
13555- written and modified anyway. See ticket #390 for the current status of this
13556- bug. The default value is False.
13557-
13558-reserved_space = (str, optional)
13559-
13560- If provided, this value defines how much disk space is reserved: the storage
13561- server will not accept any share which causes the amount of free disk space
13562- to drop below this value. (The free space is measured by a call to statvfs(2)
13563- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
13564- user account under which the storage server runs.)
13565-
13566- This string contains a number, with an optional case-insensitive scale
13567- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
13568- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
13569- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
13570-
13571-expire.enabled =
13572-expire.mode =
13573-expire.override_lease_duration =
13574-expire.cutoff_date =
13575-expire.immutable =
13576-expire.mutable =
13577-
13578- These settings control garbage-collection, in which the server will delete
13579- shares that no longer have an up-to-date lease on them. Please see the
13580- neighboring "garbage-collection.txt" document for full details.
13581-
13582-
13583-== Running A Helper ==
13584+Running A Helper
13585+================
13586hunk ./docs/configuration.rst 424
13587+mutable.format = sdmf or mdmf
13588+
13589+ This value tells Tahoe-LAFS what the default mutable file format should
13590+ be. If mutable.format=sdmf, then newly created mutable files will be in
13591+ the old SDMF format. This is desirable for clients that operate on
13592+ grids where some peers run older versions of Tahoe-LAFS, as these older
13593+ versions cannot read the new MDMF mutable file format. If
13594+ mutable.format = mdmf, then newly created mutable files will use the
13595+ new MDMF format, which supports efficient in-place modification and
13596+ streaming downloads. You can override this value using a special
13597+ mutable-type parameter in the webapi. If you do not specify a value
13598+ here, Tahoe-LAFS will use SDMF for all newly-created mutable files.
13599+
13600+ Note that this parameter only applies to mutable files. Mutable
13601+ directories, which are stored as mutable files, are not controlled by
13602+ this parameter and will always use SDMF. We may revisit this decision
13603+ in future versions of Tahoe-LAFS.
13604)
13605)
13606hunk ./docs/configuration.rst 324
13607+Frontend Configuration
13608+======================
13609+
13610+The Tahoe client process can run a variety of frontend file-access protocols.
13611+You will use these to create and retrieve files from the virtual filesystem.
13612+Configuration details for each are documented in the following
13613+protocol-specific guides:
13614+
13615+HTTP
13616+
13617+    Tahoe runs a webserver by default on port 3456. This interface provides a
13618+    human-oriented "WUI", with pages to create, modify, and browse
13619+    directories and files, as well as a number of pages to check on the
13620+    status of your Tahoe node. It also provides a machine-oriented "WAPI",
13621+    with a REST-ful HTTP interface that can be used by other programs
13622+    (including the CLI tools). Please see `<frontends/webapi.rst>`_ for full
13623+    details, and the ``web.port`` and ``web.static`` config variables above.
13624+    The `<frontends/download-status.rst>`_ document also describes a few WUI
13625+    status pages.
13626+
13627+CLI
13628+
13629+    The main "bin/tahoe" executable includes subcommands for manipulating the
13630+    filesystem, uploading/downloading files, and creating/running Tahoe
13631+    nodes. See `<frontends/CLI.rst>`_ for details.
13632+
13633+FTP, SFTP
13634+
13635+    Tahoe can also run both FTP and SFTP servers, and map a username/password
13636+    pair to a top-level Tahoe directory. See `<frontends/FTP-and-SFTP.rst>`_
13637+    for instructions on configuring these services, and the ``[ftpd]`` and
13638+    ``[sftpd]`` sections of ``tahoe.cfg``.
13639+
13640)
13641)
13642)
13643replace ./docs/configuration.rst [A-Za-z_0-9\-\.] Tahoe Tahoe-LAFS
13644)
13645hunk ./src/allmydata/mutable/retrieve.py 7
13646 from zope.interface import implements
13647 from twisted.internet import defer
13648 from twisted.python import failure
13649-from foolscap.api import DeadReferenceError, eventually, fireEventually
13650-from allmydata.interfaces import IRetrieveStatus, NotEnoughSharesError
13651-from allmydata.util import hashutil, idlib, log
13652+from twisted.internet.interfaces import IPushProducer, IConsumer
13653+from foolscap.api import eventually, fireEventually
13654+from allmydata.interfaces import IRetrieveStatus, NotEnoughSharesError, \
13655+                                 MDMF_VERSION, SDMF_VERSION
13656+from allmydata.util import hashutil, log, mathutil
13657+from allmydata.util.dictutil import DictOfSets
13658 from allmydata import hashtree, codec
13659 from allmydata.storage.server import si_b2a
13660 from pycryptopp.cipher.aes import AES
13661hunk ./src/allmydata/mutable/retrieve.py 239
13662             # KiB, so we ask for that much.
13663             # TODO: Change the cache methods to allow us to fetch all of the
13664             # data that they have, then change this method to do that.
13665-            any_cache, timestamp = self._node._read_from_cache(self.verinfo,
13666-                                                               shnum,
13667-                                                               0,
13668-                                                               1000)
13669+            any_cache = self._node._read_from_cache(self.verinfo, shnum,
13670+                                                    0, 1000)
13671             ss = self.servermap.connections[peerid]
13672             reader = MDMFSlotReadProxy(ss,
13673                                        self._storage_index,
13674hunk ./src/allmydata/mutable/retrieve.py 373
13675                  (k, n, self._num_segments, self._segment_size,
13676                   self._tail_segment_size))
13677 
13678-        # ask the cache first
13679-        got_from_cache = False
13680-        datavs = []
13681-        for (offset, length) in readv:
13682-            (data, timestamp) = self._node._read_from_cache(self.verinfo, shnum,
13683-                                                            offset, length)
13684-            if data is not None:
13685-                datavs.append(data)
13686-        if len(datavs) == len(readv):
13687-            self.log("got data from cache")
13688-            got_from_cache = True
13689-            d = fireEventually({shnum: datavs})
13690-            # datavs is a dict mapping shnum to a pair of strings
13691+        for i in xrange(self._total_shares):
13692+            # So we don't have to do this later.
13693+            self._block_hash_trees[i] = hashtree.IncompleteHashTree(self._num_segments)
13694+
13695+        # Our last task is to tell the downloader where to start and
13696+        # where to stop. We use three parameters for that:
13697+        #   - self._start_segment: the segment that we need to start
13698+        #     downloading from.
13699+        #   - self._current_segment: the next segment that we need to
13700+        #     download.
13701+        #   - self._last_segment: The last segment that we were asked to
13702+        #     download.
13703+        #
13704+        #  We say that the download is complete when
13705+        #  self._current_segment > self._last_segment. We use
13706+        #  self._start_segment and self._last_segment to know when to
13707+        #  strip things off of segments, and how much to strip.
13708+        if self._offset:
13709+            self.log("got offset: %d" % self._offset)
13710+            # our start segment is the first segment containing the
13711+            # offset we were given.
13712+            start = mathutil.div_ceil(self._offset,
13713+                                      self._segment_size)
13714+            # this gets us the first segment after self._offset. Then
13715+            # our start segment is the one before it.
13716+            start -= 1
13717+
13718+            assert start < self._num_segments
13719+            self._start_segment = start
13720+            self.log("got start segment: %d" % self._start_segment)
13721         else:
13722             self._start_segment = 0
13723 
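
    (As an illustrative aside, not part of the patch: the hunk above maps a
    byte offset to a starting segment with div_ceil(offset, segment_size) - 1.
    The sketch below, using a stand-in for allmydata.util.mathutil.div_ceil,
    shows that this agrees with plain floor division except when the offset
    lands exactly on a segment boundary; the fencepost test added at the end
    of this bundle probes the nearby offset 128*1024+1.)

        # Illustrative sketch, not part of the patch. div_ceil is a stand-in
        # for allmydata.util.mathutil.div_ceil.
        def div_ceil(n, d):
            return (n + d - 1) // d

        segment_size = 128 * 1024
        for offset in (1, 128*1024 - 1, 128*1024, 128*1024 + 1):
            patched = div_ceil(offset, segment_size) - 1  # what the hunk computes
            floored = offset // segment_size              # segment holding 'offset'
            print offset, patched, floored                # differ only at the boundary
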
13724hunk ./src/allmydata/mutable/servermap.py 7
13725 from itertools import count
13726 from twisted.internet import defer
13727 from twisted.python import failure
13728-from foolscap.api import DeadReferenceError, RemoteException, eventually
13729-from allmydata.util import base32, hashutil, idlib, log
13730+from foolscap.api import DeadReferenceError, RemoteException, eventually, \
13731+                         fireEventually
13732+from allmydata.util import base32, hashutil, idlib, log, deferredutil
13733+from allmydata.util.dictutil import DictOfSets
13734 from allmydata.storage.server import si_b2a
13735 from allmydata.interfaces import IServermapUpdaterStatus
13736 from pycryptopp.publickey import rsa
13737hunk ./src/allmydata/mutable/servermap.py 16
13738 
13739 from allmydata.mutable.common import MODE_CHECK, MODE_ANYTHING, MODE_WRITE, MODE_READ, \
13740-     DictOfSets, CorruptShareError, NeedMoreDataError
13741-from allmydata.mutable.layout import unpack_prefix_and_signature, unpack_header, unpack_share, \
13742-     SIGNED_PREFIX_LENGTH
13743+     CorruptShareError
13744+from allmydata.mutable.layout import SIGNED_PREFIX_LENGTH, MDMFSlotReadProxy
13745 
13746 class UpdateStatus:
13747     implements(IServermapUpdaterStatus)
13748hunk ./src/allmydata/mutable/servermap.py 391
13749         #  * if we need the encrypted private key, we want [-1216ish:]
13750         #   * but we can't read from negative offsets
13751         #   * the offset table tells us the 'ish', also the positive offset
13752-        # A future version of the SMDF slot format should consider using
13753-        # fixed-size slots so we can retrieve less data. For now, we'll just
13754-        # read 2000 bytes, which also happens to read enough actual data to
13755-        # pre-fetch a 9-entry dirnode.
13756+        # MDMF:
13757+        #  * Checkstring? [0:72]
13758+        #  * If we want to validate the checkstring, then [0:72], [143:?] --
13759+        #    the offset table will tell us for sure.
13760+        #  * If we need the verification key, we have to consult the offset
13761+        #    table as well.
13762+        # At this point, we don't know which we are. Our filenode can
13763+        # tell us, but it might be lying -- in some cases, we're
13764+        # responsible for telling it which kind of file it is.
13765         self._read_size = 4000
13766         if mode == MODE_CHECK:
13767             # we use unpack_prefix_and_signature, so we need 1k
13768hunk ./src/allmydata/mutable/servermap.py 633
13769         updated.
13770         """
13771         if verinfo:
13772-            self._node._add_to_cache(verinfo, shnum, 0, data, now)
13773+            self._node._add_to_cache(verinfo, shnum, 0, data)
13774 
13775 
13776     def _got_results(self, datavs, peerid, readsize, stuff, started):
13777hunk ./src/allmydata/mutable/servermap.py 664
13778 
13779         for shnum,datav in datavs.items():
13780             data = datav[0]
13781-            try:
13782-                verinfo = self._got_results_one_share(shnum, data, peerid, lp)
13783-                last_verinfo = verinfo
13784-                last_shnum = shnum
13785-                self._node._add_to_cache(verinfo, shnum, 0, data, now)
13786-            except CorruptShareError, e:
13787-                # log it and give the other shares a chance to be processed
13788-                f = failure.Failure()
13789-                self.log(format="bad share: %(f_value)s", f_value=str(f.value),
13790-                         failure=f, parent=lp, level=log.WEIRD, umid="h5llHg")
13791-                self.notify_server_corruption(peerid, shnum, str(e))
13792-                self._bad_peers.add(peerid)
13793-                self._last_failure = f
13794-                checkstring = data[:SIGNED_PREFIX_LENGTH]
13795-                self._servermap.mark_bad_share(peerid, shnum, checkstring)
13796-                self._servermap.problems.append(f)
13797-                pass
13798+            reader = MDMFSlotReadProxy(ss,
13799+                                       storage_index,
13800+                                       shnum,
13801+                                       data)
13802+            self._readers.setdefault(peerid, dict())[shnum] = reader
13803+            # our goal, with each response, is to validate the version
13804+            # information and share data as best we can at this point --
13805+            # we do this by validating the signature. To do this, we
13806+            # need to do the following:
13807+            #   - If we don't already have the public key, fetch the
13808+            #     public key. We use this to validate the signature.
13809+            if not self._node.get_pubkey():
13810+                # fetch and set the public key.
13811+                d = reader.get_verification_key(queue=True)
13812+                d.addCallback(lambda results, shnum=shnum, peerid=peerid:
13813+                    self._try_to_set_pubkey(results, peerid, shnum, lp))
13814+                # XXX: Make self._pubkey_query_failed?
13815+                d.addErrback(lambda error, shnum=shnum, peerid=peerid:
13816+                    self._got_corrupt_share(error, shnum, peerid, data, lp))
13817+            else:
13818+                # we already have the public key.
13819+                d = defer.succeed(None)
13820 
13821             # Neither of these two branches return anything of
13822             # consequence, so the first entry in our deferredlist will
13823hunk ./src/allmydata/test/test_storage.py 1
13824-import time, os.path, platform, stat, re, simplejson, struct
13825+import time, os.path, platform, stat, re, simplejson, struct, shutil
13826 
13827hunk ./src/allmydata/test/test_storage.py 3
13828-import time, os.path, stat, re, simplejson, struct
13829+import mock
13830 
13831 from twisted.trial import unittest
13832 
13833}
13834[mutable/filenode.py: fix create_mutable_file('string')
13835"Brian Warner <warner@lothar.com>"**20110221014659
13836 Ignore-this: dc6bdad761089f0199681eeb784f1001
13837] hunk ./src/allmydata/mutable/filenode.py 137
13838         if contents is None:
13839             return MutableData("")
13840 
13841+        if isinstance(contents, str):
13842+            return MutableData(contents)
13843+
13844         if IMutableUploadable.providedBy(contents):
13845             return contents
13846 
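
    (As an illustrative aside, not part of the patch: the hunk above lets
    create_mutable_file() accept a plain string by wrapping it in MutableData
    before the IMutableUploadable check. A self-contained sketch of that
    coercion order follows; the MutableData stub and the is_uploadable
    predicate are illustrative stand-ins.)

        # Illustrative sketch, not part of the patch: the coercion order the
        # hunk above establishes for the 'contents' argument.
        class MutableData:
            def __init__(self, data):
                self.data = data

        def coerce_contents(contents, is_uploadable):
            if contents is None:
                return MutableData("")
            if isinstance(contents, str):
                return MutableData(contents)
            if is_uploadable(contents):
                return contents
            raise TypeError("unhandled contents type: %r" % (type(contents),))
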
13847[resolve more conflicts with current trunk
13848"Brian Warner <warner@lothar.com>"**20110221055600
13849 Ignore-this: 77ad038a478dbf5d9b34f7a68159a3e0
13850] hunk ./src/allmydata/mutable/servermap.py 461
13851         self._queries_completed = 0
13852 
13853         sb = self._storage_broker
13854-        full_peerlist = sb.get_servers_for_index(self._storage_index)
13855+        # All of the peers, permuted by the storage index, as usual.
13856+        full_peerlist = [(s.get_serverid(), s.get_rref())
13857+                         for s in sb.get_servers_for_psi(self._storage_index)]
13858         self.full_peerlist = full_peerlist # for use later, immutable
13859         self.extra_peers = full_peerlist[:] # peers are removed as we use them
13860         self._good_peers = set() # peers who had some shares
13861[update MDMF code with StorageFarmBroker changes
13862"Brian Warner <warner@lothar.com>"**20110221061004
13863 Ignore-this: a693b201d31125b391cebe0412ddd027
13864] {
13865hunk ./src/allmydata/mutable/publish.py 203
13866         self._encprivkey = self._node.get_encprivkey()
13867 
13868         sb = self._storage_broker
13869-        full_peerlist = sb.get_servers_for_index(self._storage_index)
13870+        full_peerlist = [(s.get_serverid(), s.get_rref())
13871+                         for s in sb.get_servers_for_psi(self._storage_index)]
13872         self.full_peerlist = full_peerlist # for use later, immutable
13873         self.bad_peers = set() # peerids who have errbacked/refused requests
13874 
13875hunk ./src/allmydata/test/test_mutable.py 2538
13876             # for either a block and salt or for hashes, either of which
13877             # will exercise the error handling code.
13878             killer = FirstServerGetsKilled()
13879-            for (serverid, ss) in nm.storage_broker.get_all_servers():
13880-                ss.post_call_notifier = killer.notify
13881+            for s in nm.storage_broker.get_connected_servers():
13882+                s.get_rref().post_call_notifier = killer.notify
13883             ver = servermap.best_recoverable_version()
13884             assert ver
13885             return self._node.download_version(servermap, ver)
13886}
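
    (As an illustrative aside, not part of the patch: the hunks above adapt
    the MDMF code to the newer StorageFarmBroker API, which returns server
    objects instead of (peerid, rref) tuples. The sketch below shows the
    adaptation pattern; get_servers_for_psi, get_serverid and get_rref are the
    calls the hunks use, while the Fake* classes are illustrative stand-ins.)

        # Illustrative sketch, not part of the patch.
        class FakeServer:
            def __init__(self, serverid, rref):
                self._serverid, self._rref = serverid, rref
            def get_serverid(self):
                return self._serverid
            def get_rref(self):
                return self._rref

        class FakeBroker:
            def __init__(self, servers):
                self._servers = servers
            def get_servers_for_psi(self, storage_index):
                # the real broker permutes by storage index; order is
                # irrelevant for this sketch
                return list(self._servers)

        def build_peerlist(broker, storage_index):
            # same (peerid, rref) shape the publish and servermap code still expects
            return [(s.get_serverid(), s.get_rref())
                    for s in broker.get_servers_for_psi(storage_index)]
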
13887[mutable/filenode: Clean up servermap handling in MutableFileVersion
13888Kevan Carstensen <kevan@isnotajoke.com>**20110226010433
13889 Ignore-this: 2257c9f65502098789f5ea355b94f130
13890 
13891 We want to update the servermap before attempting to modify a file,
13892 which we now do. This introduced code duplication, which was addressed
13893 by refactoring the servermap update into its own method, and then
13894 eliminating duplicate servermap updates throughout the
13895 MutableFileVersion.
13896] {
13897hunk ./src/allmydata/mutable/filenode.py 19
13898 from allmydata.mutable.publish import Publish, MutableData,\
13899                                       DEFAULT_MAX_SEGMENT_SIZE, \
13900                                       TransformingUploadable
13901-from allmydata.mutable.common import MODE_READ, MODE_WRITE, UnrecoverableFileError, \
13902+from allmydata.mutable.common import MODE_READ, MODE_WRITE, MODE_CHECK, UnrecoverableFileError, \
13903      ResponseCache, UncoordinatedWriteError
13904 from allmydata.mutable.servermap import ServerMap, ServermapUpdater
13905 from allmydata.mutable.retrieve import Retrieve
13906hunk ./src/allmydata/mutable/filenode.py 807
13907         a little bit.
13908         """
13909         log.msg("doing modify")
13910-        d = self._modify_once(modifier, first_time)
13911+        if first_time:
13912+            d = self._update_servermap()
13913+        else:
13914+            # We ran into trouble; do MODE_CHECK so we're a little more
13915+            # careful on subsequent tries.
13916+            d = self._update_servermap(mode=MODE_CHECK)
13917+
13918+        d.addCallback(lambda ignored:
13919+            self._modify_once(modifier, first_time))
13920         def _retry(f):
13921             f.trap(UncoordinatedWriteError)
13922hunk ./src/allmydata/mutable/filenode.py 818
13923+            # Uh oh, it broke. We're allowed to trust the servermap for our
13924+            # first try, but after that we need to update it. It's
13925+            # possible that we've failed due to a race with another
13926+            # uploader, and if the race is to converge correctly, we
13927+            # need to know about that upload.
13928             d2 = defer.maybeDeferred(backoffer, self, f)
13929             d2.addCallback(lambda ignored:
13930                            self._modify_and_retry(modifier,
13931hunk ./src/allmydata/mutable/filenode.py 837
13932         I attempt to apply a modifier to the contents of the mutable
13933         file.
13934         """
13935-        # XXX: This is wrong -- we could get more servers if we updated
13936-        # in MODE_ANYTHING and possibly MODE_CHECK. Probably we want to
13937-        # assert that the last update wasn't MODE_READ
13938-        assert self._servermap.last_update_mode == MODE_WRITE
13939+        assert self._servermap.last_update_mode != MODE_READ
13940 
13941         # download_to_data is serialized, so we have to call this to
13942         # avoid deadlock.
13943hunk ./src/allmydata/mutable/filenode.py 1076
13944 
13945         # Now ask for the servermap to be updated in MODE_WRITE with
13946         # this update range.
13947-        u = ServermapUpdater(self._node, self._storage_broker, Monitor(),
13948-                             self._servermap,
13949-                             mode=MODE_WRITE,
13950-                             update_range=(start_segment, end_segment))
13951-        return u.update()
13952+        return self._update_servermap(update_range=(start_segment,
13953+                                                    end_segment))
13954 
13955 
13956     def _decode_and_decrypt_segments(self, ignored, data, offset):
13957hunk ./src/allmydata/mutable/filenode.py 1135
13958                                    segments_and_bht[1])
13959         p = Publish(self._node, self._storage_broker, self._servermap)
13960         return p.update(u, offset, segments_and_bht[2], self._version)
13961+
13962+
13963+    def _update_servermap(self, mode=MODE_WRITE, update_range=None):
13964+        """
13965+        I update the servermap. I return a Deferred that fires when the
13966+        servermap update is done.
13967+        """
13968+        if update_range:
13969+            u = ServermapUpdater(self._node, self._storage_broker, Monitor(),
13970+                                 self._servermap,
13971+                                 mode=mode,
13972+                                 update_range=update_range)
13973+        else:
13974+            u = ServermapUpdater(self._node, self._storage_broker, Monitor(),
13975+                                 self._servermap,
13976+                                 mode=mode)
13977+        return u.update()
13978}
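
    (As an illustrative aside, not part of the patch: the patch above
    refreshes the servermap before every modify attempt, using MODE_WRITE on
    the first try and the more careful MODE_CHECK on retries after an
    uncoordinated write. The sketch below gives the control-flow shape
    synchronously, without Deferreds; apart from the mode names and the
    exception, the identifiers are illustrative.)

        # Illustrative sketch, not part of the patch: the retry shape
        # introduced by the hunks above, written synchronously for clarity.
        MODE_WRITE, MODE_CHECK = "MODE_WRITE", "MODE_CHECK"

        class UncoordinatedWriteError(Exception):
            pass

        def modify_with_retries(version, modifier, max_tries=3):
            for attempt in range(max_tries):
                # trust a MODE_WRITE map the first time; re-check carefully after that
                mode = MODE_WRITE if attempt == 0 else MODE_CHECK
                version.update_servermap(mode=mode)
                try:
                    return version.modify_once(modifier, first_time=(attempt == 0))
                except UncoordinatedWriteError:
                    # another writer raced us; refresh the map and try again
                    continue
            raise UncoordinatedWriteError("gave up after %d attempts" % max_tries)
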
13979[web: Use the string "replace" to trigger whole-file replacement when processing an offset parameter.
13980Kevan Carstensen <kevan@isnotajoke.com>**20110227231643
13981 Ignore-this: 5bbf0b90d68efe20d4c531bb98a8321a
13982] {
13983hunk ./docs/frontends/webapi.rst 360
13984  To use the /uri/$FILECAP form, $FILECAP must be a write-cap for a mutable file.
13985 
13986  In the /uri/$DIRCAP/[SUBDIRS../]FILENAME form, if the target file is a
13987- writeable mutable file, that file's contents will be overwritten in-place. If
13988- it is a read-cap for a mutable file, an error will occur. If it is an
13989- immutable file, the old file will be discarded, and a new one will be put in
13990- its place. If the target file is a writable mutable file, you may also
13991- specify an "offset" parameter -- a byte offset that determines where in
13992- the mutable file the data from the HTTP request body is placed. This
13993- operation is relatively efficient for MDMF mutable files, and is
13994- relatively inefficient (but still supported) for SDMF mutable files.
13995+ writeable mutable file, that file's contents will be overwritten
13996+ in-place. If it is a read-cap for a mutable file, an error will occur.
13997+ If it is an immutable file, the old file will be discarded, and a new
13998+ one will be put in its place. If the target file is a writable mutable
13999+ file, you may also specify an "offset" parameter -- a byte offset that
14000+ determines where in the mutable file the data from the HTTP request
14001+ body is placed. This operation is relatively efficient for MDMF mutable
14002+ files, and is relatively inefficient (but still supported) for SDMF
14003+ mutable files. If no offset parameter is specified, then the entire
14004+ file is replaced with the data from the HTTP request body. For an
14005+ immutable file, the "offset" parameter is not valid.
14006 
14007  When creating a new file, if "mutable=true" is in the query arguments, the
14008  operation will create a mutable file instead of an immutable one.
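
    (As an illustrative aside, not part of the patch: the text above describes
    PUT with an "offset" query parameter for in-place updates of writable
    mutable files. The sketch below shows what such a request might look like
    from Python 2's httplib; the node URL and filecap are placeholders, and
    the project's own tests exercise this through their web test harness
    instead.)

        # Illustrative sketch, not part of the patch: PUT four bytes at byte
        # offset 131073 of a writable mutable file through the webapi.
        import httplib, urllib

        nodeurl = "127.0.0.1:3456"              # default webapi port
        filecap = "URI:MDMF:xxxx:yyyy"          # placeholder write-cap
        path = "/uri/%s?%s" % (urllib.quote(filecap, safe=""),
                               urllib.urlencode({"offset": 131073}))

        conn = httplib.HTTPConnection(nodeurl)
        conn.request("PUT", path, "NNNN")
        resp = conn.getresponse()
        print resp.status                        # expect 200 on success
        conn.close()
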
14009hunk ./src/allmydata/test/test_web.py 3187
14010             self.failUnlessEqual(results, self.new_data + ("puppies" * 100)))
14011         return d
14012 
14013+    def test_PUT_update_at_invalid_offset(self):
14014+        file_contents = "test file" * 100000 # about 900 KiB
14015+        d = self.PUT("/uri?mutable=true", file_contents)
14016+        def _then(filecap):
14017+            self.filecap = filecap
14018+        d.addCallback(_then)
14019+        # Negative offsets should cause an error.
14020+        d.addCallback(lambda ignored:
14021+            self.shouldHTTPError("test mutable invalid offset negative",
14022+                                 400, "Bad Request",
14023+                                 "Invalid offset",
14024+                                 self.PUT,
14025+                                 "/uri/%s?offset=-1" % self.filecap,
14026+                                 "foo"))
14027+        return d
14028 
14029     def test_PUT_update_at_offset_immutable(self):
14030         file_contents = "Test file" * 100000
14031hunk ./src/allmydata/web/common.py 55
14032     # message? Since this call is going to be used by programmers and
14033     # their tools rather than users (through the wui), it is not
14034     # inconsistent to return that, I guess.
14035-    offset = int(offset)
14036-    return offset
14037+    return int(offset)
14038 
14039 
14040 def get_root(ctx_or_req):
14041hunk ./src/allmydata/web/filenode.py 219
14042         req = IRequest(ctx)
14043         t = get_arg(req, "t", "").strip()
14044         replace = parse_replace_arg(get_arg(req, "replace", "true"))
14045-        offset = parse_offset_arg(get_arg(req, "offset", -1))
14046+        offset = parse_offset_arg(get_arg(req, "offset", False))
14047 
14048         if not t:
14049hunk ./src/allmydata/web/filenode.py 222
14050-            if self.node.is_mutable() and offset >= 0:
14051-                return self.update_my_contents(req, offset)
14052-
14053-            elif self.node.is_mutable():
14054-                return self.replace_my_contents(req)
14055             if not replace:
14056                 # this is the early trap: if someone else modifies the
14057                 # directory while we're uploading, the add_file(overwrite=)
14058hunk ./src/allmydata/web/filenode.py 227
14059                 # call in replace_me_with_a_child will do the late trap.
14060                 raise ExistingChildError()
14061-            if offset >= 0:
14062-                raise WebError("PUT to a file: append operation invoked "
14063-                               "on an immutable cap")
14064 
14065hunk ./src/allmydata/web/filenode.py 228
14066+            if self.node.is_mutable():
14067+                if offset == False:
14068+                    return self.replace_my_contents(req)
14069+
14070+                if offset >= 0:
14071+                    return self.update_my_contents(req, offset)
14072+
14073+                raise WebError("PUT to a mutable file: Invalid offset")
14074+
14075+            else:
14076+                if offset != False:
14077+                    raise WebError("PUT to a file: append operation invoked "
14078+                                   "on an immutable cap")
14079+
14080+                assert self.parentnode and self.name
14081+                return self.replace_me_with_a_child(req, self.client, replace)
14082 
14083hunk ./src/allmydata/web/filenode.py 245
14084-            assert self.parentnode and self.name
14085-            return self.replace_me_with_a_child(req, self.client, replace)
14086         if t == "uri":
14087             if not replace:
14088                 raise ExistingChildError()
14089}
14090[docs/configuration.rst: fix more conflicts between #393 and trunk
14091Kevan Carstensen <kevan@isnotajoke.com>**20110228003426
14092 Ignore-this: 7917effdeecab00d634a06f1df8fe2cf
14093] {
14094replace ./docs/configuration.rst [A-Za-z_0-9\-\.] Tahoe Tahoe-LAFS
14095hunk ./docs/configuration.rst 324
14096     (Mutable files use a different share placement algorithm that does not
14097     currently consider this parameter.)
14098 
14099+``mutable.format = sdmf or mdmf``
14100+
14101+    This value tells Tahoe-LAFS what the default mutable file format should
14102+    be. If ``mutable.format=sdmf``, then newly created mutable files will be
14103+    in the old SDMF format. This is desirable for clients that operate on
14104+    grids where some peers run older versions of Tahoe-LAFS, as these older
14105+    versions cannot read the new MDMF mutable file format. If
14106+    ``mutable.format`` is ``mdmf``, then newly created mutable files will use
14107+    the new MDMF format, which supports efficient in-place modification and
14108+    streaming downloads. You can override this value using a special
14109+    mutable-type parameter in the webapi. If you do not specify a value here,
14110+    Tahoe-LAFS will use SDMF for all newly-created mutable files.
14111+
14112+    Note that this parameter only applies to mutable files. Mutable
14113+    directories, which are stored as mutable files, are not controlled by
14114+    this parameter and will always use SDMF. We may revisit this decision
14115+    in future versions of Tahoe-LAFS.
14116+
14117+
14118+Frontend Configuration
14119+======================
14120+
14121+The Tahoe client process can run a variety of frontend file-access protocols.
14122+You will use these to create and retrieve files from the virtual filesystem.
14123+Configuration details for each are documented in the following
14124+protocol-specific guides:
14125+
14126+HTTP
14127+
14128+    Tahoe runs a webserver by default on port 3456. This interface provides a
14129+    human-oriented "WUI", with pages to create, modify, and browse
14130+    directories and files, as well as a number of pages to check on the
14131+    status of your Tahoe node. It also provides a machine-oriented "WAPI",
14132+    with a REST-ful HTTP interface that can be used by other programs
14133+    (including the CLI tools). Please see `<frontends/webapi.rst>`_ for full
14134+    details, and the ``web.port`` and ``web.static`` config variables above.
14135+    The `<frontends/download-status.rst>`_ document also describes a few WUI
14136+    status pages.
14137+
14138+CLI
14139+
14140+    The main "bin/tahoe" executable includes subcommands for manipulating the
14141+    filesystem, uploading/downloading files, and creating/running Tahoe
14142+    nodes. See `<frontends/CLI.rst>`_ for details.
14143+
14144+FTP, SFTP
14145+
14146+    Tahoe can also run both FTP and SFTP servers, and map a username/password
14147+    pair to a top-level Tahoe directory. See `<frontends/FTP-and-SFTP.rst>`_
14148+    for instructions on configuring these services, and the ``[ftpd]`` and
14149+    ``[sftpd]`` sections of ``tahoe.cfg``.
14150+
14151 
14152 Storage Server Configuration
14153 ============================
14154hunk ./docs/configuration.rst 436
14155     `<garbage-collection.rst>`_ for full details.
14156 
14157 
14158-shares.needed = (int, optional) aka "k", default 3
14159-shares.total = (int, optional) aka "N", N >= k, default 10
14160-shares.happy = (int, optional) 1 <= happy <= N, default 7
14161-
14162- These three values set the default encoding parameters. Each time a new file
14163- is uploaded, erasure-coding is used to break the ciphertext into separate
14164- pieces. There will be "N" (i.e. shares.total) pieces created, and the file
14165- will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved.
14166- The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10).
14167- Setting k to 1 is equivalent to simple replication (uploading N copies of
14168- the file).
14169-
14170- These values control the tradeoff between storage overhead, performance, and
14171- reliability. To a first approximation, a 1MB file will use (1MB*N/k) of
14172- backend storage space (the actual value will be a bit more, because of other
14173- forms of overhead). Up to N-k shares can be lost before the file becomes
14174- unrecoverable, so assuming there are at least N servers, up to N-k servers
14175- can be offline without losing the file. So large N/k ratios are more
14176- reliable, and small N/k ratios use less disk space. Clearly, k must never be
14177- smaller than N.
14178-
14179- Large values of N will slow down upload operations slightly, since more
14180- servers must be involved, and will slightly increase storage overhead due to
14181- the hash trees that are created. Large values of k will cause downloads to
14182- be marginally slower, because more servers must be involved. N cannot be
14183- larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe-LAFS
14184- uses.
14185-
14186- shares.happy allows you control over the distribution of your immutable file.
14187- For a successful upload, shares are guaranteed to be initially placed on
14188- at least 'shares.happy' distinct servers, the correct functioning of any
14189- k of which is sufficient to guarantee the availability of the uploaded file.
14190- This value should not be larger than the number of servers on your grid.
14191-
14192- A value of shares.happy <= k is allowed, but does not provide any redundancy
14193- if some servers fail or lose shares.
14194-
14195- (Mutable files use a different share placement algorithm that does not
14196-  consider this parameter.)
14197-
14198-
14199-== Storage Server Configuration ==
14200-
14201-[storage]
14202-enabled = (boolean, optional)
14203-
14204- If this is True, the node will run a storage server, offering space to other
14205- clients. If it is False, the node will not run a storage server, meaning
14206- that no shares will be stored on this node. Use False this for clients who
14207- do not wish to provide storage service. The default value is True.
14208-
14209-readonly = (boolean, optional)
14210-
14211- If True, the node will run a storage server but will not accept any shares,
14212- making it effectively read-only. Use this for storage servers which are
14213- being decommissioned: the storage/ directory could be mounted read-only,
14214- while shares are moved to other servers. Note that this currently only
14215- affects immutable shares. Mutable shares (used for directories) will be
14216- written and modified anyway. See ticket #390 for the current status of this
14217- bug. The default value is False.
14218-
14219-reserved_space = (str, optional)
14220-
14221- If provided, this value defines how much disk space is reserved: the storage
14222- server will not accept any share which causes the amount of free disk space
14223- to drop below this value. (The free space is measured by a call to statvfs(2)
14224- on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the
14225- user account under which the storage server runs.)
14226-
14227- This string contains a number, with an optional case-insensitive scale
14228- suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So
14229- "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same
14230- thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing.
14231-
14232-expire.enabled =
14233-expire.mode =
14234-expire.override_lease_duration =
14235-expire.cutoff_date =
14236-expire.immutable =
14237-expire.mutable =
14238-
14239- These settings control garbage-collection, in which the server will delete
14240- shares that no longer have an up-to-date lease on them. Please see the
14241- neighboring "garbage-collection.txt" document for full details.
14242-
14243-
14244-== Running A Helper ==
14245+Running A Helper
14246+================
14247 
14248 A "helper" is a regular client node that also offers the "upload helper"
14249 service.
14250}
14251[mutable/layout: remove references to the salt hash tree.
14252Kevan Carstensen <kevan@isnotajoke.com>**20110228010637
14253 Ignore-this: b3b2963ba4d0b42c78b6bba219d4deb5
14254] {
14255hunk ./src/allmydata/mutable/layout.py 577
14256     # 99          8           The offset of the EOF
14257     #
14258     # followed by salts and share data, the encrypted private key, the
14259-    # block hash tree, the salt hash tree, the share hash chain, a
14260-    # signature over the first eight fields, and a verification key.
14261+    # block hash tree, the share hash chain, a signature over the first
14262+    # eight fields, and a verification key.
14263     #
14264     # The checkstring is the first three fields -- the version number,
14265     # sequence number, root hash and root salt hash. This is consistent
14266hunk ./src/allmydata/mutable/layout.py 628
14267     #      calculate the offset for the share hash chain, and fill that
14268     #      into the offsets table.
14269     #
14270-    #   4: At the same time, we're in a position to upload the salt hash
14271-    #      tree. This is a Merkle tree over all of the salts. We use a
14272-    #      Merkle tree so that we can validate each block,salt pair as
14273-    #      we download them later. We do this using
14274-    #
14275-    #        put_salthashes(salt_hash_tree)
14276-    #
14277-    #      When you do this, I automatically put the root of the tree
14278-    #      (the hash at index 0 of the list) in its appropriate slot in
14279-    #      the signed prefix of the share.
14280-    #
14281-    #   5: We're now in a position to upload the share hash chain for
14282+    #   4: We're now in a position to upload the share hash chain for
14283     #      a share. Do that with something like:
14284     #     
14285     #        put_sharehashes(share_hash_chain)
14286hunk ./src/allmydata/mutable/layout.py 639
14287     #      The root of this tree will be put explicitly in the next
14288     #      step.
14289     #
14290-    #      TODO: Why? Why not just include it in the tree here?
14291-    #
14292-    #   6: Before putting the signature, we must first put the
14293+    #   5: Before putting the signature, we must first put the
14294     #      root_hash. Do this with:
14295     #
14296     #        put_root_hash(root_hash).
14297hunk ./src/allmydata/mutable/layout.py 872
14298             raise LayoutInvalid("I was given the wrong size block to write")
14299 
14300         # We want to write at len(MDMFHEADER) + segnum * block_size.
14301-
14302         offset = MDMFHEADERSIZE + (self._actual_block_size * segnum)
14303         data = salt + data
14304 
14305hunk ./src/allmydata/mutable/layout.py 889
14306         # tree is written, since that could cause the private key to run
14307         # into the block hash tree. Before it writes the block hash
14308         # tree, the block hash tree writing method writes the offset of
14309-        # the salt hash tree. So that's a good indicator of whether or
14310+        # the share hash chain. So that's a good indicator of whether or
14311         # not the block hash tree has been written.
14312         if "share_hash_chain" in self._offsets:
14313             raise LayoutInvalid("You must write this before the block hash tree")
14314hunk ./src/allmydata/mutable/layout.py 907
14315         The encrypted private key must be queued before the block hash
14316         tree, since we need to know how large it is to know where the
14317         block hash tree should go. The block hash tree must be put
14318-        before the salt hash tree, since its size determines the
14319+        before the share hash chain, since its size determines the
14320         offset of the share hash chain.
14321         """
14322         assert self._offsets
14323hunk ./src/allmydata/mutable/layout.py 932
14324         I queue a write vector to put the share hash chain in my
14325         argument onto the remote server.
14326 
14327-        The salt hash tree must be queued before the share hash chain,
14328-        since we need to know where the salt hash tree ends before we
14329+        The block hash tree must be queued before the share hash chain,
14330+        since we need to know where the block hash tree ends before we
14331         can know where the share hash chain starts. The share hash chain
14332         must be put before the signature, since the length of the packed
14333         share hash chain determines the offset of the signature. Also,
14334hunk ./src/allmydata/mutable/layout.py 937
14335-        semantically, you must know what the root of the salt hash tree
14336+        semantically, you must know what the root of the block hash tree
14337         is before you can generate a valid signature.
14338         """
14339         assert isinstance(sharehashes, dict)
14340hunk ./src/allmydata/mutable/layout.py 942
14341         if "share_hash_chain" not in self._offsets:
14342-            raise LayoutInvalid("You need to put the salt hash tree before "
14343+            raise LayoutInvalid("You need to put the block hash tree before "
14344                                 "you can put the share hash chain")
14345         # The signature comes after the share hash chain. If the
14346         # signature has already been written, we must not write another
14347}
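
The write-ordering rules spelled out in the corrected comments above (encrypted private key, then block hash tree, then share hash chain, then root hash and signature) all reduce to the writer filling in an offsets table as each field's size becomes known. A minimal, hypothetical sketch of that pattern -- not the MDMFSlotWriteProxy API itself -- follows:

    class OrderedShareWriter(object):
        """Toy illustration: each put_* records where the *next* field
        will start, so the next put_* can verify its prerequisite ran."""
        def __init__(self, base_offset):
            self._offsets = {}
            self._cursor = base_offset

        def put_blockhashes(self, blockhashes):
            data = "".join(blockhashes)
            self._cursor += len(data)
            # writing the block hash tree is what tells us where the
            # share hash chain starts
            self._offsets["share_hash_chain"] = self._cursor
            return data

        def put_sharehashes(self, sharehash_chain):
            if "share_hash_chain" not in self._offsets:
                raise ValueError("put the block hash tree before "
                                 "the share hash chain")
            data = "".join(sharehash_chain)
            self._cursor = self._offsets["share_hash_chain"] + len(data)
            self._offsets["signature"] = self._cursor
            return data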
14348[test_mutable.py: add test to exercise fencepost bug
14349warner@lothar.com**20110228021056
14350 Ignore-this: d2f9cf237ce6db42fb250c8ad71a4fc3
14351] {
14352hunk ./src/allmydata/test/test_mutable.py 2
14353 
14354-import os
14355+import os, re
14356 from cStringIO import StringIO
14357 from twisted.trial import unittest
14358 from twisted.internet import defer, reactor
14359hunk ./src/allmydata/test/test_mutable.py 2931
14360         self.set_up_grid()
14361         self.c = self.g.clients[0]
14362         self.nm = self.c.nodemaker
14363-        self.data = "test data" * 100000 # about 900 KiB; MDMF
14364+        self.data = "testdata " * 100000 # about 900 KiB; MDMF
14365         self.small_data = "test data" * 10 # about 90 B; SDMF
14366         return self.do_upload()
14367 
14368hunk ./src/allmydata/test/test_mutable.py 2981
14369             self.failUnlessEqual(results, new_data))
14370         return d
14371 
14372+    def test_replace_segstart1(self):
14373+        offset = 128*1024+1
14374+        new_data = "NNNN"
14375+        expected = self.data[:offset]+new_data+self.data[offset+4:]
14376+        d = self.mdmf_node.get_best_mutable_version()
14377+        d.addCallback(lambda mv:
14378+            mv.update(MutableData(new_data), offset))
14379+        d.addCallback(lambda ignored:
14380+            self.mdmf_node.download_best_version())
14381+        def _check(results):
14382+            if results != expected:
14383+                print
14384+                print "got: %s ... %s" % (results[:20], results[-20:])
14385+                print "exp: %s ... %s" % (expected[:20], expected[-20:])
14386+                self.fail("results != expected")
14387+        d.addCallback(_check)
14388+        return d
14389+
14390+    def _check_differences(self, got, expected):
14391+        # displaying arbitrary file corruption is tricky for a
14392+        # 1MB file of repeating data, so look for likely places
14393+        # with problems and display them separately
14394+        gotmods = [mo.span() for mo in re.finditer('([A-Z]+)', got)]
14395+        expmods = [mo.span() for mo in re.finditer('([A-Z]+)', expected)]
14396+        gotspans = ["%d:%d=%s" % (start,end,got[start:end])
14397+                    for (start,end) in gotmods]
14398+        expspans = ["%d:%d=%s" % (start,end,expected[start:end])
14399+                    for (start,end) in expmods]
14400+        #print "expecting: %s" % expspans
14401+
14402+        SEGSIZE = 128*1024
14403+        if got != expected:
14404+            print "differences:"
14405+            for segnum in range(len(expected)//SEGSIZE):
14406+                start = segnum * SEGSIZE
14407+                end = (segnum+1) * SEGSIZE
14408+                got_ends = "%s .. %s" % (got[start:start+20], got[end-20:end])
14409+                exp_ends = "%s .. %s" % (expected[start:start+20], expected[end-20:end])
14410+                if got_ends != exp_ends:
14411+                    print "expected[%d]: %s" % (start, exp_ends)
14412+                    print "got     [%d]: %s" % (start, got_ends)
14413+            if expspans != gotspans:
14414+                print "expected: %s" % expspans
14415+                print "got     : %s" % gotspans
14416+            open("EXPECTED","wb").write(expected)
14417+            open("GOT","wb").write(got)
14418+            print "wrote data to EXPECTED and GOT"
14419+            self.fail("didn't get expected data")
14420+
14421+
14422+    def test_replace_locations(self):
14423+        # exercise fencepost conditions
14424+        expected = self.data
14425+        SEGSIZE = 128*1024
14426+        suspects = range(SEGSIZE-3, SEGSIZE+1)+range(2*SEGSIZE-3, 2*SEGSIZE+1)
14427+        letters = iter("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
14428+        d = defer.succeed(None)
14429+        for offset in suspects:
14430+            new_data = letters.next()*2 # "AA", then "BB", etc
14431+            expected = expected[:offset]+new_data+expected[offset+2:]
14432+            d.addCallback(lambda ign:
14433+                          self.mdmf_node.get_best_mutable_version())
14434+            def _modify(mv, offset=offset, new_data=new_data):
14435+                # close over 'offset','new_data'
14436+                md = MutableData(new_data)
14437+                return mv.update(md, offset)
14438+            d.addCallback(_modify)
14439+            d.addCallback(lambda ignored:
14440+                          self.mdmf_node.download_best_version())
14441+            d.addCallback(self._check_differences, expected)
14442+        return d
14443+
14444 
14445     def test_replace_and_extend(self):
14446         # We should be able to replace data in the middle of a mutable
14447}
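
The "suspects" list in test_replace_locations straddles the first two 128 KiB segment boundaries. A small worked example, independent of the test harness, of which segment each such offset should fall into under floor division (the behaviour the publish.py fixes below converge on):

    SEGSIZE = 128*1024

    def segment_of(offset, segsize=SEGSIZE):
        # an offset exactly on a boundary belongs to the segment that
        # starts there, which is what floor division gives us
        return offset // segsize

    suspects = range(SEGSIZE-3, SEGSIZE+1) + range(2*SEGSIZE-3, 2*SEGSIZE+1)
    for offset in suspects:
        expected = 0 if offset < SEGSIZE else (1 if offset < 2*SEGSIZE else 2)
        assert segment_of(offset) == expected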
14448[mutable/publish: account for offsets on segment boundaries.
14449Kevan Carstensen <kevan@isnotajoke.com>**20110228083327
14450 Ignore-this: c8758a0580fcc15a22c2f8582d758a6b
14451] {
14452hunk ./src/allmydata/mutable/filenode.py 17
14453 from pycryptopp.cipher.aes import AES
14454 
14455 from allmydata.mutable.publish import Publish, MutableData,\
14456-                                      DEFAULT_MAX_SEGMENT_SIZE, \
14457                                       TransformingUploadable
14458 from allmydata.mutable.common import MODE_READ, MODE_WRITE, MODE_CHECK, UnrecoverableFileError, \
14459      ResponseCache, UncoordinatedWriteError
14460hunk ./src/allmydata/mutable/filenode.py 1058
14461         # appending data to the file.
14462         assert offset <= self.get_size()
14463 
14464+        segsize = self._version[3]
14465         # We'll need the segment that the data starts in, regardless of
14466         # what we'll do later.
14467hunk ./src/allmydata/mutable/filenode.py 1061
14468-        start_segment = mathutil.div_ceil(offset, DEFAULT_MAX_SEGMENT_SIZE)
14469+        start_segment = mathutil.div_ceil(offset, segsize)
14470         start_segment -= 1
14471 
14472         # We only need the end segment if the data we append does not go
14473hunk ./src/allmydata/mutable/filenode.py 1069
14474         end_segment = start_segment
14475         if offset + data.get_size() < self.get_size():
14476             end_data = offset + data.get_size()
14477-            end_segment = mathutil.div_ceil(end_data, DEFAULT_MAX_SEGMENT_SIZE)
14478+            end_segment = mathutil.div_ceil(end_data, segsize)
14479             end_segment -= 1
14480         self._start_segment = start_segment
14481         self._end_segment = end_segment
14482hunk ./src/allmydata/mutable/publish.py 551
14483                                                   segment_size)
14484             self.starting_segment = mathutil.div_ceil(offset,
14485                                                       segment_size)
14486-            self.starting_segment -= 1
14487+            if offset % segment_size != 0:
14488+                self.starting_segment -= 1
14489             if offset == 0:
14490                 self.starting_segment = 0
14491 
14492}
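
What the guard added above corrects: div_ceil(offset, segment_size) - 1 only names the right starting segment when offset is *not* a multiple of segment_size. A quick sanity check (div_ceil here is a local stand-in with the same meaning as allmydata.util.mathutil.div_ceil):

    def div_ceil(n, d):
        # stand-in for allmydata.util.mathutil.div_ceil
        return (n + d - 1) // d

    segsize = 128*1024
    for offset in (1, segsize-1, segsize, segsize+1, 2*segsize):
        start = div_ceil(offset, segsize)
        if offset % segsize != 0:   # the guard this patch adds
            start -= 1
        # equivalent to the plain floor division adopted in a later patch
        assert start == offset // segsize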
14493[tahoe-put: raise UsageError when given a nonsensical mutable type, move option validation code to the option parser.
14494Kevan Carstensen <kevan@isnotajoke.com>**20110301030807
14495 Ignore-this: 2dc19d8bd741842eff458ca553d0bf2a
14496] {
14497hunk ./src/allmydata/scripts/cli.py 179
14498         if self.from_file == u"-":
14499             self.from_file = None
14500 
14501+        if self['mutable-type'] and self['mutable-type'] not in ("sdmf", "mdmf"):
14502+            raise usage.UsageError("%s is an invalid format" % self['mutable-type'])
14503+
14504+
14505     def getSynopsis(self):
14506         return "Usage:  %s put LOCAL_FILE REMOTE_FILE" % (os.path.basename(sys.argv[0]),)
14507 
14508hunk ./src/allmydata/scripts/tahoe_put.py 33
14509     stdout = options.stdout
14510     stderr = options.stderr
14511 
14512-    if mutable_type and mutable_type not in ('sdmf', 'mdmf'):
14513-        # Don't try to pass unsupported types to the webapi
14514-        print >>stderr, "error: %s is an invalid format" % mutable_type
14515-        return 1
14516-
14517     if nodeurl[-1] != "/":
14518         nodeurl += "/"
14519     if to_file:
14520hunk ./src/allmydata/test/test_cli.py 1008
14521         return d
14522 
14523     def test_mutable_type_invalid_format(self):
14524-        self.basedir = "cli/Put/mutable_type_invalid_format"
14525-        self.set_up_grid()
14526-        data = "data" * 100000
14527-        fn1 = os.path.join(self.basedir, "data")
14528-        fileutil.write(fn1, data)
14529-        d = self.do_cli("put", "--mutable", "--mutable-type=ldmf", fn1)
14530-        def _check_failure((rc, out, err)):
14531-            self.failIfEqual(rc, 0)
14532-            self.failUnlessIn("invalid", err)
14533-        d.addCallback(_check_failure)
14534-        return d
14535+        o = cli.PutOptions()
14536+        self.failUnlessRaises(usage.UsageError,
14537+                              o.parseOptions,
14538+                              ["--mutable", "--mutable-type=ldmf"])
14539 
14540     def test_put_with_nonexistent_alias(self):
14541         # when invoked with an alias that doesn't exist, 'tahoe put'
14542}
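
The shape of the change: the value check moves from tahoe_put.py into the option parser, so an invalid --mutable-type surfaces as a UsageError while the options are being parsed rather than as an ad-hoc stderr message later. A standalone toy with twisted.python.usage showing the pattern (not the actual PutOptions class):

    from twisted.python import usage

    class ToyPutOptions(usage.Options):
        optParameters = [("mutable-type", None, None, "'sdmf' or 'mdmf'")]

        def postOptions(self):
            # runs at the end of parseOptions(); bad values become UsageError
            if self["mutable-type"] and self["mutable-type"] not in ("sdmf", "mdmf"):
                raise usage.UsageError("%s is an invalid format" % self["mutable-type"])

    # ToyPutOptions().parseOptions(["--mutable-type=ldmf"])  -> raises UsageError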
14543[web: use None instead of False in the case of no offset, use object identity comparison to check whether or not an offset was specified.
14544Kevan Carstensen <kevan@isnotajoke.com>**20110305010858
14545 Ignore-this: 14b7550ca95ce423c9b0b7f6f14ffd2f
14546] {
14547hunk ./src/allmydata/test/test_mutable.py 2981
14548             self.failUnlessEqual(results, new_data))
14549         return d
14550 
14551+    def test_replace_beginning(self):
14552+        # We should be able to replace data at the beginning of the file
14553+        # without truncating the file
14554+        B = "beginning"
14555+        new_data = B + self.data[len(B):]
14556+        d = self.mdmf_node.get_best_mutable_version()
14557+        d.addCallback(lambda mv: mv.update(MutableData(B), 0))
14558+        d.addCallback(lambda ignored: self.mdmf_node.download_best_version())
14559+        d.addCallback(lambda results: self.failUnlessEqual(results, new_data))
14560+        return d
14561+
14562     def test_replace_segstart1(self):
14563         offset = 128*1024+1
14564         new_data = "NNNN"
14565hunk ./src/allmydata/test/test_web.py 3185
14566         d.addCallback(_get_data)
14567         d.addCallback(lambda results:
14568             self.failUnlessEqual(results, self.new_data + ("puppies" * 100)))
14569+        # and try replacing the beginning of the file
14570+        d.addCallback(lambda ignored:
14571+            self.PUT("/uri/%s?offset=0" % self.filecap, "begin"))
14572+        d.addCallback(_get_data)
14573+        d.addCallback(lambda results:
14574+            self.failUnlessEqual(results, "begin"+self.new_data[len("begin"):]+("puppies"*100)))
14575         return d
14576 
14577     def test_PUT_update_at_invalid_offset(self):
14578hunk ./src/allmydata/web/common.py 55
14579     # message? Since this call is going to be used by programmers and
14580     # their tools rather than users (through the wui), it is not
14581     # inconsistent to return that, I guess.
14582-    return int(offset)
14583+    if offset is not None:
14584+        offset = int(offset)
14585+
14586+    return offset
14587 
14588 
14589 def get_root(ctx_or_req):
14590hunk ./src/allmydata/web/filenode.py 219
14591         req = IRequest(ctx)
14592         t = get_arg(req, "t", "").strip()
14593         replace = parse_replace_arg(get_arg(req, "replace", "true"))
14594-        offset = parse_offset_arg(get_arg(req, "offset", False))
14595+        offset = parse_offset_arg(get_arg(req, "offset", None))
14596 
14597         if not t:
14598             if not replace:
14599hunk ./src/allmydata/web/filenode.py 229
14600                 raise ExistingChildError()
14601 
14602             if self.node.is_mutable():
14603-                if offset == False:
14604+                if offset is None:
14605                     return self.replace_my_contents(req)
14606 
14607                 if offset >= 0:
14608hunk ./src/allmydata/web/filenode.py 238
14609                 raise WebError("PUT to a mutable file: Invalid offset")
14610 
14611             else:
14612-                if offset != False:
14613+                if offset is not None:
14614                     raise WebError("PUT to a file: append operation invoked "
14615                                    "on an immutable cap")
14616 
14617}
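
Why the False sentinel had to go: in Python, 0 == False is True, so an explicit ?offset=0 was indistinguishable from "no offset supplied" under the old equality check. Comparing against a None sentinel by identity avoids that collision:

    offset = 0                   # caller explicitly asked to write at byte 0
    assert offset == False       # this is why the old check misfired
    assert offset is not None    # the new check still sees the offset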
14618[mutable/filenode: remove incorrect comments about segment boundaries
14619Kevan Carstensen <kevan@isnotajoke.com>**20110307081713
14620 Ignore-this: 7008644c3d9588815000a86edbf9c568
14621] {
14622hunk ./src/allmydata/mutable/filenode.py 1001
14623         offset. I return a Deferred that fires when this has been
14624         completed.
14625         """
14626-        # We have two cases here:
14627-        # 1. The new data will add few enough segments so that it does
14628-        #    not cross into the next power-of-two boundary.
14629-        # 2. It doesn't.
14630-        #
14631-        # In the former case, we can modify the file in place. In the
14632-        # latter case, we need to re-encode the file.
14633         new_size = data.get_size() + offset
14634         old_size = self.get_size()
14635         segment_size = self._version[3]
14636hunk ./src/allmydata/mutable/filenode.py 1011
14637         log.msg("got %d old segments, %d new segments" % \
14638                         (num_old_segments, num_new_segments))
14639 
14640-        # We also do a whole file re-encode if the file is an SDMF file.
14641+        # We do a whole file re-encode if the file is an SDMF file.
14642         if self._version[2]: # version[2] == SDMF salt, which MDMF lacks
14643             log.msg("doing re-encode instead of in-place update")
14644             return self._do_modify_update(data, offset)
14645hunk ./src/allmydata/mutable/filenode.py 1016
14646 
14647+        # Otherwise, we can replace just the parts that are changing.
14648         log.msg("updating in place")
14649         d = self._do_update_update(data, offset)
14650         d.addCallback(self._decode_and_decrypt_segments, data, offset)
14651}
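
The branch that survives the comment cleanup: an SDMF file (which stores a per-file salt in version[2] and has a single segment) is updated by re-encoding the whole thing, while an MDMF file is patched in place. A stripped-down sketch of that dispatch, with hypothetical names rather than the filenode API:

    def toy_update(version, data, offset):
        # assumes data is non-empty
        sdmf_salt = version[2]       # present for SDMF, absent for MDMF
        segsize = version[3]
        if sdmf_salt:
            return "re-encode the whole file"
        first = offset // segsize
        last = (offset + len(data) - 1) // segsize
        return "rewrite segments %d..%d in place" % (first, last)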
14652[mutable: use integer division where appropriate
14653Kevan Carstensen <kevan@isnotajoke.com>**20110307082229
14654 Ignore-this: a8767e89d919c9f2a5d5fef3953d53f9
14655] {
14656hunk ./src/allmydata/mutable/filenode.py 1055
14657         segsize = self._version[3]
14658         # We'll need the segment that the data starts in, regardless of
14659         # what we'll do later.
14660-        start_segment = mathutil.div_ceil(offset, segsize)
14661-        start_segment -= 1
14662+        start_segment = offset // segsize
14663 
14664         # We only need the end segment if the data we append does not go
14665         # beyond the current end-of-file.
14666hunk ./src/allmydata/mutable/filenode.py 1062
14667         end_segment = start_segment
14668         if offset + data.get_size() < self.get_size():
14669             end_data = offset + data.get_size()
14670-            end_segment = mathutil.div_ceil(end_data, segsize)
14671-            end_segment -= 1
14672+            end_segment = end_data // segsize
14673+
14674         self._start_segment = start_segment
14675         self._end_segment = end_segment
14676 
14677hunk ./src/allmydata/mutable/publish.py 547
14678 
14679         # Calculate the starting segment for the upload.
14680         if segment_size:
14681+            # We use div_ceil instead of integer division here because
14682+            # it gives the correct number of segments.
14683+            # If datalength isn't an even multiple of segment_size,
14684+            # then datalength // segment_size counts only the complete
14685+            # segments and ignores the trailing partial one, so it
14686+            # undercounts by one. That's not what we want, because it
14687+            # ignores the extra data. div_ceil rounds up, giving us
14688+            # the right number of segments for the data that we're
14689+            # given.
14690             self.num_segments = mathutil.div_ceil(self.datalength,
14691                                                   segment_size)
14692hunk ./src/allmydata/mutable/publish.py 558
14693-            self.starting_segment = mathutil.div_ceil(offset,
14694-                                                      segment_size)
14695-            if offset % segment_size != 0:
14696-                self.starting_segment -= 1
14697-            if offset == 0:
14698-                self.starting_segment = 0
14699+
14700+            self.starting_segment = offset // segment_size
14701 
14702         else:
14703             self.num_segments = 0
14704hunk ./src/allmydata/mutable/publish.py 604
14705         self.end_segment = self.num_segments - 1
14706         # Now figure out where the last segment should be.
14707         if self.data.get_size() != self.datalength:
14708+            # We're updating a few segments in the middle of a mutable
14709+            # file, so we don't want to republish the whole thing.
14710+            # (we don't have enough data to do that even if we wanted
14711+            # to)
14712             end = self.data.get_size()
14713hunk ./src/allmydata/mutable/publish.py 609
14714-            self.end_segment = mathutil.div_ceil(end,
14715-                                                 segment_size)
14716-            self.end_segment -= 1
14717+            self.end_segment = end // segment_size
14718+            if end % segment_size == 0:
14719+                self.end_segment -= 1
14720+
14721         self.log("got start segment %d" % self.starting_segment)
14722         self.log("got end segment %d" % self.end_segment)
14723 
14724}
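
The end_segment arithmetic in the final hunk: end is one past the last byte being written, so the segment holding that last byte is (end - 1) // segment_size, which is exactly what floor division plus the boundary adjustment computes. A quick check under assumed sizes:

    segsize = 128*1024

    def end_segment(end, segsize=segsize):
        # mirrors the patched publish.py logic
        seg = end // segsize
        if end % segsize == 0:
            seg -= 1
        return seg

    for end in (1, segsize-1, segsize, segsize+1, 3*segsize):
        assert end_segment(end) == (end - 1) // segsize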
14725
14726Context:
14727
14728[docs/configuration.rst: add a "Frontend Configuration" section
14729Brian Warner <warner@lothar.com>**20110222014323
14730 Ignore-this: 657018aa501fe4f0efef9851628444ca
14731 
14732 this points to docs/frontends/*.rst, which were previously underlinked
14733] 
14734[web/filenode.py: avoid calling req.finish() on closed HTTP connections. Closes #1366
14735"Brian Warner <warner@lothar.com>"**20110221061544
14736 Ignore-this: 799d4de19933f2309b3c0c19a63bb888
14737] 
14738[Add unit tests for cross_check_pkg_resources_versus_import, and a regression test for ref #1355. This requires a little refactoring to make it testable.
14739david-sarah@jacaranda.org**20110221015817
14740 Ignore-this: 51d181698f8c20d3aca58b057e9c475a
14741] 
14742[allmydata/__init__.py: .name was used in place of the correct .__name__ when printing an exception. Also, robustify string formatting by using %r instead of %s in some places. fixes #1355.
14743david-sarah@jacaranda.org**20110221020125
14744 Ignore-this: b0744ed58f161bf188e037bad077fc48
14745] 
14746[Refactor StorageFarmBroker handling of servers
14747Brian Warner <warner@lothar.com>**20110221015804
14748 Ignore-this: 842144ed92f5717699b8f580eab32a51
14749 
14750 Pass around IServer instance instead of (peerid, rref) tuple. Replace
14751 "descriptor" with "server". Other replacements:
14752 
14753  get_all_servers -> get_connected_servers/get_known_servers
14754  get_servers_for_index -> get_servers_for_psi (now returns IServers)
14755 
14756 This change still needs to be pushed further down: lots of code is now
14757 getting the IServer and then distributing (peerid, rref) internally.
14758 Instead, it ought to distribute the IServer internally and delay
14759 extracting a serverid or rref until the last moment.
14760 
14761 no_network.py was updated to retain parallelism.
14762] 
14763[TAG allmydata-tahoe-1.8.2
14764warner@lothar.com**20110131020101] 
14765Patch bundle hash:
1476661d4041da237d46cc26d429eddf0e155a1aafad0