#833 closed defect (fixed)

reject mutable children when *reading* an immutable dirnode

Reported by: warner Owned by: davidsarah
Priority: critical Milestone: 1.6.0
Component: code-dirnodes Version: 1.5.0
Keywords: integrity forward-compatibility backward-compatibility confidentiality news-done Cc:
Launchpad Bug:

Description

In #607 (DIR2-CHK), Zooko suggested that in addition to refusing to allow mutable children when creating an immutable dirnode, the tahoe code should guard against buggy/noncompliant implementations and complain when it sees a mutable child in a pre-existing immutable dirnode. The code should log an error of some sort and ignore the child.

When I implemented #607, I didn't do this. It should be done. The trickiest part is testing it, and deciding where to send the error message.

Attachments (14)

docs-diff.txt (24.1 KB) - added by davidsarah at 2010-01-22T06:12:25Z.
Documentation patches
test-diff.txt (169.3 KB) - added by davidsarah at 2010-01-23T13:05:21Z.
Diff for tests
all-diff.txt (240.0 KB) - added by davidsarah at 2010-01-23T13:06:14Z.
Diff for everything (code + tests + docs)
all-with-json-fix-diff.txt (268.1 KB) - added by davidsarah at 2010-01-24T06:10:54Z.
Diff for everything including t=json fix (code + tests + docs)
new-all-diff.2.txt (197.3 KB) - added by davidsarah at 2010-01-25T12:24:18Z.
New diff for everything (code + tests + docs) -- review this version
mutable-in-immutable-darcspatch.txt (236.9 KB) - added by davidsarah at 2010-01-27T07:11:54Z.
Prevent mutable objects from being retrieved from an immutable directory, and associated forward-compatibility improvements
misc-tweaks-darcspatch.txt (59.3 KB) - added by davidsarah at 2010-01-27T07:12:21Z.
Miscellaneous documentation, test, and code formatting tweaks.
all-833-darcspatch.txt (276.2 KB) - added by davidsarah at 2010-01-27T23:23:25Z.
Prevent mutable objects from being retrieved from an immutable directory, and associated forward-compatibility improvements (supercedes previous patches)
changes-since-kevans-review-diff.txt (19.2 KB) - added by davidsarah at 2010-01-27T23:26:08Z.
Changes since Kevan's review
fix-inaccurate-comment-darcspatch.txt (43.1 KB) - added by davidsarah at 2010-01-28T20:31:29Z.
Fix inaccurate comment in test_mutant_dirnodes_are_omitted
add-mutable-to-json-and-imm-ro-to-listings-darcspatch.txt (51.6 KB) - added by davidsarah at 2010-01-29T03:25:19Z.
Add mutable field to t=json output for unknown nodes, when mutability is known. Includes patch for #931 on which it is dependent.
this-is-the-one-to-apply-darcspatch.txt (52.9 KB) - added by davidsarah at 2010-01-29T03:38:53Z.
Fix inaccurate comment in test_mutant_dirnodes_are_omitted. Show -IMM and -RO suffixes for types of immutable and read-only unknown nodes in directory listings. Add mutable field to t=json output for unknown nodes, when mutability is known. Fix example JSON in webapi.txt that cannot occur in practice.
all-including-python24-fixes-darcspatch.txt (56.4 KB) - added by davidsarah at 2010-01-29T04:02:44Z.
Same patches as above but including Python 2.4 fixes.
test-unknownnode-improvements-and-json-example-darcspatch.txt (47.5 KB) - added by davidsarah at 2010-01-30T07:03:26Z.
Improvements to tests for UnknownNode?, and minor fix to a JSON example in webapi.txt

Download all attachments as: .zip

Change History (82)

comment:1 Changed at 2009-12-13T02:19:47Z by davidsarah

  • Keywords integrity added

This is a security (integrity) issue, because if I have an immutable directory, I should be able to assume collision-resistance for the whole of the subtree beneath that directory.

comment:2 Changed at 2009-12-13T04:13:59Z by zooko

  • Milestone changed from undecided to 1.6.0

I agree and I think we should do this before releasing immutable directories in Tahoe-LAFS v1.6.

comment:3 Changed at 2009-12-15T18:00:01Z by zooko

  • Priority changed from major to critical

To clarify, I think this is a critical security issue because if you tahoe cp -r $IMM_DIR_NODE ./local and then give $IMM_DIR_NODE to your friend, and she also tahoe cp -r $IMM_DIR_NODE ./herlocal, then you can be assured that she has all the same stuff that you do, even if the original creator of the directory that you are copying tries to trick you so that you and your friend get different results. This is the "deep" analogue of #491 (URIs do not refer to unique files in Allmydata Tahoe).

comment:4 Changed at 2009-12-30T00:45:46Z by davidsarah

A consequence of fixing this would be that you can be almost sure that an immutable directory represents a directed acyclic graph, because it should be infeasible to create a cycle of hashes. This seems like a potentially useful property.

comment:5 follow-up: Changed at 2010-01-09T02:47:09Z by warner

  • Owner set to warner
  • Status changed from new to assigned

On the phone, Zooko and I discussed the following plan:

  • in the WUI (i.e. the HTML directory-listing pages), when rendering an immutable directory, any recognized-as-mutable objects therein should be presented just like unrecognized objects (i.e. caps from the future). In other words, the WUI for immdirs will only generate download/list hyperlinks for objects that are recognized as immutable. It will generate non-hyperlinked child names for the unknown/illegal objects. The "More Info" page will still list the mutable objects's read/writecaps.
  • do the same in the t=json webapi: objects that are not recognized as mutable will be returned with type="unknown" instead of type="filenode" or type="dirnode". This will cause the CLI "tahoe ls" command to display the object with a question mark instead of as a file or directory. The dictionary of properties on the unknown object will still include rw_uri and ro_uri.
  • the "tahoe cp" command, when faced with an unknown object, should print a stderr message and skip over the object. So doing a "cp -r" of a tahoe source directory that contains unknown objects will result in a smaller target directory which does not contain them.

This will raise a barrier in front of casual access to these "illegal" files, and any user with a conforming client can be confident that their tahoe cp -r $IMM_DIR_NODE ./herlocal command will output the same files that your command (using your conforming client) did. If they really try hard (by manually retrieving the caps and linking them into a new parent directory), they can copy these illegal mutable files, but Tahoe's standard tools won't do it for them. (i.e. we don't have to completely hide these objects, we just have to make sure that "tahoe cp" and "tahoe get" and the WUI won't give you their contents)

davidsarah: yeah, immutable directories should always (modulo broken hash functions) represent DAGs.. that should be part of the definition.

comment:6 in reply to: ↑ 5 Changed at 2010-01-09T06:35:19Z by davidsarah

Replying to warner:

On the phone, Zooko and I discussed the following plan:

  • in the WUI (i.e. the HTML directory-listing pages), when rendering an immutable directory, any recognized-as-mutable objects therein should be presented just like unrecognized objects (i.e. caps from the future). In other words, the WUI for immdirs will only generate download/list hyperlinks for objects that are recognized as immutable. It will generate non-hyperlinked child names for the unknown/illegal objects. The "More Info" page will still list the mutable objects's read/writecaps.
  • do the same in the t=json webapi: objects that are not recognized as mutable will be returned with type="unknown" instead of type="filenode" or type="dirnode". This will cause the CLI "tahoe ls" command to display the object with a question mark instead of as a file or directory. The dictionary of properties on the unknown object will still include rw_uri and ro_uri.
  • the "tahoe cp" command, when faced with an unknown object, should print a stderr message and skip over the object. So doing a "cp -r" of a tahoe source directory that contains unknown objects will result in a smaller target directory which does not contain them.

I'm a bit concerned that this approach would make it too easy to inadvertently write a webapi client that fails to enforce the restriction. Also, having to duplicate the check in all frontends seems like an indication that it is being checked at the wrong layer, i.e. that the webapi should be enforcing it instead.

Is there any reason why such entries need to be accessible at all? It doesn't seem different from any other incorrectly encoded directory entry, which might not be accessible.

This will raise a barrier in front of casual access to these "illegal" files, and any user with a conforming client can be confident that their tahoe cp -r $IMM_DIR_NODE ./herlocal command will output the same files that your command (using your conforming client) did. If they really try hard (by manually retrieving the caps and linking them into a new parent directory), they can copy these illegal mutable files, but Tahoe's standard tools won't do it for them. (i.e. we don't have to completely hide these objects, we just have to make sure that "tahoe cp" and "tahoe get" and the WUI won't give you their contents)

davidsarah: yeah, immutable directories should always (modulo broken hash functions) represent DAGs.. that should be part of the definition.

comment:7 follow-up: Changed at 2010-01-10T08:29:17Z by warner

I guess it comes down to how we deal with caps-from-the-future. We can't distinguish between mutable-caps-from-the-future and immutable-caps-from-the-future, since we can't figure out anything about caps-from-the-future other than the fact that we don't understand them. So if we want to promise users who are listing an immutable directory that they'll only see immutable caps, we'd have to filter out anything that we don't explicitly recognize as immutable, which means removing all caps-from-the-future.

This would mean that "tahoe ls" couldn't even acknowledge the existence of such caps. Older clients would be more likely to overwrite them if they can't see them at all. And it limits the ability of other tools (perhaps smarter or more up-to-date than the tahoe node they're using) to do something with those caps.

I dunno what's the best thing to do. I agree that it feels a bit leaky, but I'm also uncomfortable with censoring out the existence of caps-from-the-future.

comment:8 in reply to: ↑ 7 Changed at 2010-01-10T20:10:19Z by davidsarah

  • Keywords forward-compatibility added

Replying to warner:

I guess it comes down to how we deal with caps-from-the-future. We can't distinguish between mutable-caps-from-the-future and immutable-caps-from-the-future, since we can't figure out anything about caps-from-the-future other than the fact that we don't understand them. So if we want to promise users who are listing an immutable directory that they'll only see immutable caps, we'd have to filter out anything that we don't explicitly recognize as immutable, which means removing all caps-from-the-future.

Not necessarily. If the directory is immutable, we could just omit rw_uri.

comment:9 Changed at 2010-01-11T04:57:38Z by zooko

Hey that's a good idea! :-)

comment:10 Changed at 2010-01-11T18:47:03Z by warner

That *is* a good idea, and an explicit form of it should go into source:src/allmydata/web/directory.py#L801 (in DirectoryJSONMetadata._got), to stomp out the rw_uri return value if the directory being listed is readonly.

The directory unpacking function (source:src/allmydata/dirnode.py#L256, in DirectoryNode._unpack_contents) will only return a rwcap value if the dirnode has a writekey, which is only the case for mutable dirwritecaps (neither mutable dirreadcaps nor immutable directories have a writekey). But if the rocap value happens to really be a writecap (like if some non-conformant implementation placed one there), DirectoryNode.list would return a writeable filenode, and then the webapi directory listing would get a rw_uri value, and would return it to the client without a check to explicitly stomp it out.

(I hadn't previously considered the problem of writecaps stored in the rocap column.. for recognized caps, we can block them, but of course we can't really say anything about caps-from-the-future).

But I don't think that's enough. If our claim is that "each time a conformant Tahoe client copies out the contents of an immutable directory, they will be exactly the same", then only serving ro_uri isn't strong enough (ro_uri is always read-only, but not always immutable). If some non-conformant implementation puts a mutable readcap into an immutable directory's ro_uri column, then a conformant client which only reports ro_uri will still enable "tahoe cp" to copy from something mutable, and then users could see different contents on different copies.

(If we'd decided to provide an imm_uri in addition to ro_uri and rw_uri, then maybe we could have omitted everything but imm_uri, but we didn't, and besides I think it would have made the dirnode columns even more complex).

comment:11 Changed at 2010-01-11T19:31:03Z by zooko

You know, we really should include an imm_uri, shouldn't we? I guess the way we are currently doing it means "If you are listing a mutable dir, then the ro_uri is a mutable child, and if you are listing an immutable dir, then the ro_uri is an immutable child.". Is that right? But what if there is an immutable child in a mutable dir?

comment:12 Changed at 2010-01-11T19:36:56Z by zooko

So in addition to whatever plan we settle on for reliably excluding known-mutable children, is there any use in predicting what format caps from the future might use to indicate read-only, read-write, and immutable? In my Tahoe-LAFS plugin for TiddlyWiki? I made it recognize the current cap format and also:

// This is speculative: maybe in the future there will be a version of Tahoe where caps 
// start with these symbols, and if so then this JavaScript code will magically work with 
// that version of Tahoe.
TAHOE_FUTURE_IMMUTABLE_CAP_RE_STR = "i_" + ALPHANUMERIC_STRING;
TAHOE_FUTURE_READONLY_CAP_RE_STR = "r_" + ALPHANUMERIC_STRING;
TAHOE_FUTURE_WRITABLE_CAP_RE_STR = "W_" + ALPHANUMERIC_STRING;

http://allmydata.org/source/tiddly_on_tahoe/trunk/tahoe_tiddly/TahoePlugin.js

I'm not sure that does any good, in particular because we may end up using different indicators for cap-type, maybe in part because having a separator char there, while good for human eyeballs, is bad for selecting with double-click of a mouse...

Anyway, if you could think for a minute about forward-compatibility and make sure that such things are not of use in this case and state conclusively that Tahoe-LAFS v1.6 should treat all unrecognized caps (whether from the future, corrupted, or malicious) that it finds as the children of immutable directories as mutable and unreadable that would make me feel better. :-)

comment:13 Changed at 2010-01-11T20:12:00Z by warner

hm. I guess immutability is a property of the object, while readability/writeability is a property of the cap (think of the cap as a facet here). There's no such thing as an "immutable cap" to a mutable object.

So maybe instead of "imm_uri" we should put another property on the JSON stuff we return, an "immutable" boolean.

Of course, for caps-from-the-future, we'd leave this empty, because we don't know. Then we'd need to decide whether the webapi promises all([x.immutable for x in immdir.list()]) == True (in which case it would be obligated to strip out everything except known-immutable children), or whether it promises something smaller.

WRT to anticipated future caps, I can currently imagine append-only caps, write-with-storage-authority caps, verify/repair caps, destroy caps, revocation caps, and others. Very few of these cleanly break down into mutable/immutable, or read/write. So I don't see an overwhelming amount of value to trying to anticipate them too much, and therefore having current code attempt to deduce much of anything about caps-from-the-future.

So yes, I believe that 1.6 should treat all unrecognized caps as unknown, and any operation that promises to only return caps that are known to have some particular property should be obligated to either filter out unrecognized caps or mark them in some way such that clients can easily tell which are which.

I'm still undecided as to whether marking nodetype==unknown or removing them entirely is better.

comment:14 Changed at 2010-01-11T20:35:40Z by zooko

If I understand correctly an "mutable" flag is a good idea, and should perhaps be quickly added into v1.6 in the interests of forward-compatibility, but is orthogonal to this issue of how to exclude mutable children from the listing of a mutable directory.

What if we had a "bad_children" element in the "dirnode" JSON, adjacent to the current "children" element: docs/frontends/webapi.txt?rev=4112#L509. This would give the programmer all the information about the bad children while making it very unlikely that someone would accidentally include bad children in among the good children. :-)

Oh, and although I really like the name "bad_children", I guess it also includes good children from the future, so we ought to name it something like "unrecognized_children". How is a programmer to know (possibly just out of curiousity) why each child is in that set? Perhaps there is a flag on each child saying either "mutable=true" or "known_format=false" or something?

comment:15 Changed at 2010-01-11T21:01:55Z by zooko

Oh, what about having two separate keys next to "children" -- "bad_children" for mutable children of an immutable directory and "unrecognized_children" for children from the future or corrupted.

I guess it boils down to whether we want the implementation written by a lazy programmer to exclude the children entirely from the listing or to include them but not make them "live" (clickable, cd'able, cp'able, etc.) -- just make them "greyed-out".

If we want them excluded, then they shouldn't appear as keys under "children" at all. If we want them included but "greyed-out", then they should appear as keys under "children" but they should not have a "ro_uri" or "rw_uri" key.

So maybe this means that mutable children of an immutable directory should appear in a separate "bad_children" key or shouldn't appear at all, while unrecognized children (hopefully from the future) should appear in the "children" key while having no "ro_uri" or "rw_uri".

comment:16 Changed at 2010-01-11T22:47:42Z by zooko

Brian wrote:

So if we want to promise users who are listing an immutable directory that they'll only see immutable caps, we'd have to filter out anything that we don't explicitly recognize as immutable, which means removing all caps-from-the-future.

David-Sarah wrote:

Not necessarily. If the directory is immutable, we could just omit rw_uri.

I wrote:

Good idea!

No wait, that's a bad idea. A read-only cap to a mutable directory is not a read-cap to an immutable directory. I keep getting confused about this, and in the history of Tahoe-LAFS Brian and I have been confused about it more than once. I'm going to get a post-it note and affix it to my monitor, saying:

"READ-ONLY DOESN'T MEAN IMMUTABLE."

comment:17 Changed at 2010-01-13T19:59:48Z by zooko

Brian and I chatted about this on IRC yesterday. At the end I got sleepy and confused about what constraints we impose on the web gateway with regard to the JSON-formatted directory listings that it delivers to web clients. It is a question of forward-compatibility -- today's web clients may read JSON-formatted directory listings from web gateways from the future. This is a separate but related issue to today's web gateways (which are also Tahoe-LAFS storage clients) reading distributed directories that contain children inserted by other Tahoe-LAFS storage clients from the future. That latter issue is what most of our thinking on this ticket has been about so far.

I'm not sure what to think about that whole idea -- I haven't thought it through yet.

However, now in the light of morning one particular issue that seems clear to me is that the test that today's web gateway uses to determine whether a given child is a recognized or unrecognized child should be as specific as possible as long as it never marks a child from today as unrecognized. Brian tells me that uri.from_string("URI:CHK:<a>look at me I'm evil<a>") raises a BadURIError instead of what we would probably prefer which is to return an UnknownURI. I don't see why it does that from looking at the code. Here is a map to the relevant code:

The web gateway (which is a Tahoe-LAFS storage client) unpacks the directory contents with its internal _unpack_contents() method and passes the result to its caller from its list() method. Already looking at the code I want more documentation here. Could list() grow a docstring that tells the caller more about what sorts of things it might find in the caps of the children that it returns?

So the caller in this case takes the caps and passes each one to NodeMaker.create_from_cap() which returns UnknownNode in case both its writecap and readcap parameters are empty and then calls uri.from_string().

uri.from_string() looks to me like it is intended to, and does, return an UnknownURI instance when the cap is not any of the known formats, but Brian says otherwise.

So my proposals right now are:

  1. We add a docstring explaining what the caller can safely assume about the caps returned from DirectoryNode.list().
  2. We change uri.from_string() to return UnknownURI whenever the cap is clearly not a well-formed cap of a known type and add a unit test of this.
  3. The JSON listing returned by the wapi excludes mutable children of immutable directories entirely. The presence of one can only mean an attack or a bug. It includes children of unrecognized types, but not in the children node of the JSON object, instead in a sibling node named unrecognized_children. That way wapi-client code written by a lazy programmer who doesn't think about the concept of unrecognized children will be oblivious to their presence. This is perhaps suboptimal from a UI/UX standpoint, but it is safe. Code written by more conscientious programmer can read the unrecognized_children node and show those children as present but unusable ("greyed-out") for improved UX.

comment:18 Changed at 2010-01-13T20:11:02Z by warner

For reference, uri.from_string("URI:CHK:<a>look at me I'm evil</a>") starts by dispatching on a string prefix: we see the URI:CHK: prefix and stop considering anything else (including UnknownURI). Then the CHK-specific handler class applies a regexp to the look at me part and throws BadURIError.

The requirement is that any new format we introduce must not share one of our previously-claimed prefixes. The prefixes we've defined so far all look like /^URI:[A-Z\-]+:/, but I imagine we'll add tahoe:// or something in the future, and probably some binary formats that will also fit into this parse tree because they start with something non-ascii.

So there's plenty of room for new prefixes, but not room for new shapes on current prefixes.

comment:19 Changed at 2010-01-13T21:51:53Z by zooko

Oh, so this suggests a further refinement: child links that have caps that begin with one of our defined prefixes such as URI:CHK: but which are not well-formed according to uri.from_string() will not be treated as child links potentially-from-the-future but instead as bad child links, the same as child links which try to provide mutable access to you when they are in a mutable directory.

comment:20 Changed at 2010-01-16T07:43:05Z by davidsarah

Zooko, Brian and I discussed this on IRC, and initially came up with the following set of options. All the options have in common that when a child link of an immutable directory is recognized as being mutable, it should just be omitted. They differ on what should happen with unknown child links (i.e. caps from the future):

  • OMIT: just omit unknown child links
  • NONE: include a directory entry for an unknown child link, but omit rw_uri and ro_uri from its JSON representation
  • RO_URI: include a directory entry with only ro_uri
  • UNKNOWN_RO_URI: include a directory entry with a new unknown_ro_uri field that is used instead of ro_uri. (For mutable directories, unknown children would have both unknown_rw_uri and unknown_ro_uri fields.)
  • FORWARD_COMPAT_METADATA: standardize a way of recognizing immutable caps for "all" future versions (say, by the first character after lafs://), so that non-immutable caps can be excluded without having to understand their format.

There are also variations of each option according to whether the restriction is implemented in the webapi code or in DirectoryNode. If it is implemented in the webapi, then read-modify-write operations will preserve unknown caps that aren't directly affected by the operation. I think we decided that this is preferable because it loses data in fewer cases -- although it may introduce some inconsistencies between frontends, unless we manage to fix that for 1.6.

There was concensus that RO_URI option is unsafe (i.e. fails to address the issue in this bug), because it would be too easy for a ro_uri to be passed on to another client that understands it, but having lost track of the constraint that it is supposed to be immutable.

We had already excluded FORWARD_COMPAT_METADATA on the basis that it requires making decisions that we don't have time to consider carefully enough for 1.6. UNKNOWN_RO_URI has the advantage of not throwing away information, but we decided it was too complicated to implement and review in the time we have. OMIT for 1.6, and FORWARD_COMPAT_METADATA in 1.7, seemed like a good compromise.

comment:21 Changed at 2010-01-16T17:28:10Z by davidsarah

Overnight I thought of the following additional option, which has similar forward-compatibility advantages to FORWARD_COMPAT_METADATA but has fewer constraints on the future URL design:

  • PREFIXED_RO_URI: include a directory entry for an unknown child link of an immutable directory, but prefix ro_uri with "imm." (and omit rw_uri).

Note that this doesn't imply that the new format would have to use "imm." -- that prefix would only occur on URLs from the past. When we add support for the new format, the future client would see the "imm." prefix and check that the cap is immutable; if it is then it would work correctly.

(We might end up registering both "lafs:" and "imm.lafs:" in order to make this work in more cases, but I don't think that's a problem, and it's not strictly necessary. Anyway, there are already URI schemes containing '.'.)

In retrospect this is similar to something Zooko suggested on IRC ('mangle the cap by prepending "I'M NOT REALLY A CAP"'), except that "imm." just means that the future client must check that it is an immutable cap.

The effect of prepending "imm." with the current webapi is to display "GET unknown: can only do t=info, not t=", which is safe but ugly. We should fix that (it's ticket #884).

comment:22 follow-up: Changed at 2010-01-16T23:04:41Z by zooko

Yay! Something that I like about PREFIXED_RO_URI is that it could perhaps be used for writing out child links of unrecognized types into new tahoe dirs. This would be a safe, forward-compatible, good-user-experience solution to #839 (Copying directories containing caps from the future). It would also enable something that I would like: that you get the same behavior with respect to child links of unknown types and illegally mutant child links whether you are viewing a dir through the WUI or the the WAPI, copying a dir with "tahoe cp" or WUI/WAPI commands, or manipulating a dir (such as by deleting or renaming extant child links or adding a new child link).

comment:23 in reply to: ↑ 22 Changed at 2010-01-17T01:10:46Z by davidsarah

  • Keywords confidentiality added

Replying to zooko:

Yay! Something that I like about PREFIXED_RO_URI is that it could perhaps be used for writing out child links of unrecognized types into new tahoe dirs. This would be a safe, forward-compatible, good-user-experience solution to #839 (Copying directories containing caps from the future).

Oh, I hadn't thought of that. I suspect we won't end up doing it that way if we wait until 1.7 to fix #839, though, because at that point we'll probably know enough about what the new cap URLs are going to look like to distinguish mutable from immutable, and therefore we can refuse outright to copy future mutable caps. Copying them as imm.-prefixed URLs has the disadvantage that you're granting authority to read subsequent updates to the original mutable object (to someone who strips off the imm.), which a deep-copy operation normally wouldn't do. So this would not be completely safe.

It would also enable something that I would like: that you get the same behavior with respect to child links of unknown types and illegally mutant child links whether you are viewing a dir through the WUI or the the WAPI, copying a dir with "tahoe cp" or WUI/WAPI commands, or manipulating a dir (such as by deleting or renaming extant child links or adding a new child link).

Yes, that's a big advantage.

comment:24 Changed at 2010-01-17T02:46:07Z by davidsarah

  • Owner changed from warner to davidsarah
  • Status changed from assigned to new

comment:25 Changed at 2010-01-17T02:46:14Z by davidsarah

  • Status changed from new to assigned

comment:26 follow-up: Changed at 2010-01-17T15:12:02Z by davidsarah

Note: the intent of the patch I'm writing for this ticket is not to alter the current behaviour that a client cannot change or add a dir entry containing a cap that it does not understand. (It would not be able to attenuate the cap in case it is a write cap.) grep for CannotPackUnknownNodeError to see where this is enforced.

It would however be safe to allow a client that is rewriting the whole directory to preserve entries containing caps that it does not understand. I'm not sure yet whether I'll be able to do that for 1.6.

comment:27 in reply to: ↑ 26 ; follow-up: Changed at 2010-01-17T18:24:01Z by davidsarah

Replying to davidsarah:

Note: the intent of the patch I'm writing for this ticket is not to alter the current behaviour that a client cannot change or add a dir entry containing a cap that it does not understand. (It would not be able to attenuate the cap in case it is a write cap.)

I've changed my mind on this. In all but two of the webapi interfaces that involve creating or changing child dirnodes, the operation either creates the caps that it adds (in which case they are known), or is given them in the format {"rw_uri": ..., "ro_uri": ...}. So if the webapi client knows how to attenuate the cap (or already has it in that form) but the webapi server -- i.e. storage client -- doesn't, then that's OK.

The two (almost equivalent) interfaces that don't take a cap with separate "rw_uri" and "ro_uri" slots, are PUT /uri/$DIRCAP/SUBDIRS../CHILDNAME?t=uri and POST /uri/$DIRCAP/SUBDIRS../?t=uri&name=CHILDNAME&uri=CHILDCAP, i.e. the "hard link existing cap" operations. In that case we don't know how to attenuate the cap, so we cannot safely accept an unknown cap to be added.

However, I don't think there's any reason to restrict the other operations. We obtain no security benefit from doing so, because a malicious client (with the relevant writecap) can always write whatever it likes into a directory. The important thing is to make sure that clients who obtain caps from the directory are not confused.

So, I propose to remove the following check in source:src/allmydata/dirnode.py#L402 (and similarly at L427):

  if isinstance(child_node, UnknownNode):
      # don't be willing to pack unknown nodes: we might accidentally
      # put some write-authority into the rocap slot because we don't
      # know how to diminish the URI they gave us. We don't even know
      # if they gave us a readcap or a writecap.
      msg = "cannot pack unknown node as child %s" % str(name)
      raise CannotPackUnknownNodeError(msg)

(and remove CannotPackUnknownNodeError). Instead, the logic used by a webapi server when packing (encoding) and unpacking (decoding) a dirnode will be as follows:

If the directory is immutable,

  • always omit rw_uri slots (i.e. silently ignore whatever was there) when unpacking.
  • raise an error (fail the webapi operation with no side effects on the directory) if an rw_uri slot was present when packing.
  • if the packed ro_uri is unknown, strip any "ro." or "imm." prefix and then prepend "imm.".

If the directory is mutable,

  • pass through any rw_uri slots when packing and unpacking.
  • if the packed ro_uri is unknown and it does not already have a "ro." or "imm." prefix, then prepend "ro.".

An URI is "unknown" if {{{from_string}}} in uri.py returns an instance of UnknownURI.

So the effect is that the URI returned in the ro_uri slot will be unusable by correct clients if it does not meet the constraint that it should have met to be in that slot -- i.e. being either read-only or deep-immutable.

comment:28 follow-up: Changed at 2010-01-18T04:31:17Z by davidsarah

The main remaining issue I haven't quite decided on, is that if we're removing the restriction on adding unknown URIs to a directory, we need to consider what consistency properties should be guaranteed for reads of mutable directories. If you read a directory entry or entries in a single webapi operation, then the rw_uri and ro_uri slots should be consistent with each other -- i.e. if both are present then they point to the same object, and ro_uri is the read-only version of rw_uri (assuming you trust your gateway to ensure that). Actually I'm not sure whether the current code guarantees that, but I think it does.

If unknown caps (i.e. unknown to the webapi server) are allowed to be returned in directory reads, then the webapi can't ensure this property for such caps. I'm leaning toward just documenting that.

(Note that if we read the directory twice, then the two reads are not guaranteed to be consistent. This issue doesn't arise for immutable directories because only the ro_uri slots will be present.)

comment:29 in reply to: ↑ 27 ; follow-up: Changed at 2010-01-18T14:08:30Z by zooko

I'm not sure I understand everything about your current plan. However, I think I understand that the Tahoe-LAFS storage client has to enforce constraints on the semantics of a directory's child links when reading the directory, whether it is doing so in order to display the directory in the WUI, send a copy of the directory (in JSON form) back through the WAPI, or (???) to write a copy of the directory into a new directory in Tahoe-LAFS. (I guess it doesn't have to enforce security properties on read when the purpose of the read is to make a shallow copy of (a subset of) the child links into a different Tahoe-LAFS directory, but I feel like it "should" do so in order to be consistent and parallel with the other two targets that the information could be headed toward: the WUI and the WAPI.)

But the Tahoe-LAFS storage client doesn't need to enforce any security constraints when writing directories -- a malicious client could always write arbitrarily bad things into Tahoe-LAFS directories and no-one should rely on the assumption that the Tahoe-LAFS directory that they are using was written by a client that enforces some security constraints.

But the storage client should enforce constraints to help the programmer realize that they're trying to do something wrong, when it can tell that they are. That's why it will refuse to put a child link of a known type which is mutable into an immutable directory, but it will not refuse to put a child link of an unknown type into an immutable directory.

Okay, so the practice of prepending an imm. to a cap which it extracts from an immutable directory is best understood as a security constraint that the Tahoe-LAFS storage client enforces on child links of immutable directories when it reads them. It needs to inform any other (correct) Tahoe-LAFS clients that eventually receive that cap that the cap came from an immutable context. The prepending of imm. is a way to mark the context from which that cap came.

The reason we can't prepend imm. to every child link from an immutable directory and ro. to every link from a ro_uri slot is backwards-compatibility -- existing Tahoe-LAFS storage clients need to read directories and find caps of known types there and they won't recognize ro. or imm..

This suggests a forward-compatibility improvement: if you are a Tahoe-LAFS storage client ("gateway", "node") and I give you a cap and it has imm. on the front of it then you should check whether it is of a type that you recognize and you can tell that the type is immutable. If so, you can pop off the imm. and use the cap. Likewise with ro.. If Tahoe-LAFS v1.6 storage clients always perform that check, then this may free up future versions of the Tahoe-LAFS storage client to prepend imm. even to old-type child links (from the Tahoe-LAFS v1.6-era), which would be simpler and more consistent than prepending imm. only to child links whose type you don't recognize.

comment:30 in reply to: ↑ 28 ; follow-up: Changed at 2010-01-18T14:19:23Z by zooko

Replying to davidsarah:

The main remaining issue I haven't quite decided on, is that if we're removing the restriction on adding unknown URIs to a directory, we need to consider what consistency properties should be guaranteed for reads of mutable directories. If you read a directory entry or entries in a single webapi operation, then the rw_uri and ro_uri slots should be consistent with each other -- i.e. if both are present then they point to the same object, and ro_uri is the read-only version of rw_uri (assuming you trust your gateway to ensure that). Actually I'm not sure whether the current code guarantees that, but I think it does.

I'm not sure either. Let's see: src/allmydata/nodemaker.py@4106#L47. Yes! If the type is known then it takes the most powerful cap and constructs a Node with that, so it then performs its own diminish to get the ro-cap from the rw-cap.

(Note: this is an example of why off-line diminish can be important! Some designs require a round trip with a storage server before you can compute the ro-cap given the rw-cap. Oh but you could include a copy of the ro-cap with the rw-cap. Oh, but then the Tahoe-LAFS storage client couldn't be sure that the ro-cap pointed to the same file as the rw-cap. I hereby change my mind about off-line diminish -- I used to think it wasn't as important as having very small caps, but now I think it is very important.)

If unknown caps (i.e. unknown to the webapi server) are allowed to be returned in directory reads, then the webapi can't ensure this property for such caps. I'm leaning toward just documenting that.

Should the webapi server tag such caps with something like unk. so that the fact that they came from a context where this property couldn't be verified is not lost?

comment:31 in reply to: ↑ 29 ; follow-up: Changed at 2010-01-18T19:31:48Z by davidsarah

Replying to zooko:

I'm not sure I understand everything about your current plan. However, I think I understand that the Tahoe-LAFS storage client has to enforce constraints on the semantics of a directory's child links when reading the directory, whether it is doing so in order to display the directory in the WUI, send a copy of the directory (in JSON form) back through the WAPI, or (???) to write a copy of the directory into a new directory in Tahoe-LAFS.

The constraints are only on decoding future cap URIs. When a future version decodes an URI that starts with "ro." and is followed by the encoding of a future cap, it must check that the cap is read-only. When it decodes an URI that starts with "imm." and is followed by the encoding of a future cap, it must check that the cap is immutable. If those constraints are not met then decoding must fail (but it is still safe to handle the URI in undecoded form).

There are no constraints on current clients, or on handling URIs as opaque strings. (If an entity were to strip off a "ro." or "imm." prefix without checking the constraint, it could confuse itself, so it shouldn't do that.)

(I guess it doesn't have to enforce security properties on read when the purpose of the read is to make a shallow copy of (a subset of) the child links into a different Tahoe-LAFS directory,

Correct, because it's not decoding those child URIs.

but I feel like it "should" do so in order to be consistent and parallel with the other two targets that the information could be headed toward: the WUI and the WAPI.)

That would mean that a directory operation could have side-effects on child links that it isn't defined to alter. No component needs to check URIs that it isn't decoding, so this would be consistent between components.

But the Tahoe-LAFS storage client doesn't need to enforce any security constraints when writing directories -- a malicious client could always write arbitrarily bad things into Tahoe-LAFS directories and no-one should rely on the assumption that the Tahoe-LAFS directory that they are using was written by a client that enforces some security constraints.

Right.

But the storage client should enforce constraints to help the programmer realize that they're trying to do something wrong, when it can tell that they are. That's why it will refuse to put a child link of a known type which is mutable into an immutable directory, but it will not refuse to put a child link of an unknown type into an immutable directory.

Yes. (Currently it does refuse to put child links of unknown types into any directory, but this patch would change that.)

Okay, so the practice of prepending an imm. to a cap which it extracts from an immutable directory is best understood as a security constraint that the Tahoe-LAFS storage client enforces on child links of immutable directories when it reads them.

As I said, the constraint is only necessary on decoding. A future version of the storage client could redundantly check the constraint when it passes on an URI to a webapi client, and it should probably do so even if only to reduce the length of the URI (since it can then strip off the 3 or 4 character prefix).

It needs to inform any other (correct) Tahoe-LAFS clients that eventually receive that cap that the cap came from an immutable context. The prepending of imm. is a way to mark the context from which that cap came. The reason we can't prepend imm. to every child link from an immutable directory and ro. to every link from a ro_uri slot is backwards-compatibility -- existing Tahoe-LAFS storage clients need to read directories and find caps of known types there and they won't recognize ro. or imm..

Exactly.

This suggests a forward-compatibility improvement: if you are a Tahoe-LAFS storage client ("gateway", "node") and I give you a cap and it has imm. on the front of it then you should check whether it is of a type that you recognize and you can tell that the type is immutable. If so, you can pop off the imm. and use the cap. Likewise with ro.. If Tahoe-LAFS v1.6 storage clients always perform that check, then this may free up future versions of the Tahoe-LAFS storage client to prepend imm. even to old-type child links (from the Tahoe-LAFS v1.6-era), which would be simpler and more consistent than prepending imm. only to child links whose type you don't recognize.

OK, I've made this improvement in my patch.

comment:32 in reply to: ↑ 30 ; follow-ups: Changed at 2010-01-18T19:48:25Z by davidsarah

Replying to zooko:

Replying to davidsarah:

The main remaining issue I haven't quite decided on, is that if we're removing the restriction on adding unknown URIs to a directory, we need to consider what consistency properties should be guaranteed for reads of mutable directories. If you read a directory entry or entries in a single webapi operation, then the rw_uri and ro_uri slots should be consistent with each other -- i.e. if both are present then they point to the same object, and ro_uri is the read-only version of rw_uri (assuming you trust your gateway to ensure that). Actually I'm not sure whether the current code guarantees that, but I think it does.

I'm not sure either. Let's see: src/allmydata/nodemaker.py@4106#L47. Yes! If the type is known then it takes the most powerful cap and constructs a Node with that, so it then performs its own diminish to get the ro-cap from the rw-cap.

Except that if the most powerful cap is of unknown type, _create_from_cap (note the underscore) will return None, so create_from_cap will return UnknownNode(writecap, readcap), which doesn't ensure that the cap pair is consistent.

So, the proposed behaviour for 1.6 is identical to the current behaviour here: known cap pairs retrieved from the webapi are consistent, but unknown ones may not be.

(Note: this is an example of why off-line diminish can be important! Some designs require a round trip with a storage server before you can compute the ro-cap given the rw-cap. Oh but you could include a copy of the ro-cap with the rw-cap. Oh, but then the Tahoe-LAFS storage client couldn't be sure that the ro-cap pointed to the same file as the rw-cap. I hereby change my mind about off-line diminish -- I used to think it wasn't as important as having very small caps, but now I think it is very important.)

What's important here is off-line diminish from a writecap to a readcap. All the proposed designs support that. What might not be supported is off-line diminish from a short readcap to a verifycap. (Elk Point does not support that in cases where the readcap is too short to provide sufficient integrity.)

If unknown caps (i.e. unknown to the webapi server) are allowed to be returned in directory reads, then the webapi can't ensure this property for such caps. I'm leaning toward just documenting that.

Should the webapi server tag such caps with something like unk. so that the fact that they came from a context where this property couldn't be verified is not lost?

What would the webapi client do with that fact? imm. and ro. are associated with specific conformance requirements, but unk. wouldn't be -- it would just be a hint that the webapi client shouldn't assume something. It is easier to document that it shouldn't ever make that assumption. (Note that a webapi client could always decode the caps and check that they are consistent.)

comment:33 in reply to: ↑ 32 Changed at 2010-01-18T20:04:25Z by davidsarah

Replying to davidsarah:

Replying to zooko:

Replying to davidsarah:

If unknown caps (i.e. unknown to the webapi server) are allowed to be returned in directory reads, then the webapi can't ensure this property for such caps. I'm leaning toward just documenting that.

Should the webapi server tag such caps with something like unk. so that the fact that they came from a context where this property couldn't be verified is not lost?

What would the webapi client do with that fact? imm. and ro. are associated with specific conformance requirements, but unk. wouldn't be -- it would just be a hint that the webapi client shouldn't assume something. It is easier to document that it shouldn't ever make that assumption. (Note that a webapi client could always decode the caps and check that they are consistent.)

... although it should probably avoid decoding caps since that is the webapi server's responsibility. Which reminds me: we don't seem to have either a webapi operation or a CLI command to diminish a cap.

comment:34 Changed at 2010-01-18T20:06:40Z by davidsarah

  • Keywords backward-compatibility added

comment:35 in reply to: ↑ 32 Changed at 2010-01-18T20:15:38Z by zooko

Okay, so the meaning of imm. in English is "I found this cap in an immutable context and I couldn't verify whether it is an immutable cap." rather than "I found this cap in an immutable context.". Sounds okay to me.

Replying to davidsarah:

Should the webapi server tag such caps with something like unk. so that the fact that they came from a context where this property couldn't be verified is not lost?

What would the webapi client do with that fact? imm. and ro. are associated with specific conformance requirements, but unk. wouldn't be -- it would just be a hint that the webapi client shouldn't assume something. It is easier to document that it shouldn't ever make that assumption.

Okay, I'm convinced.

Great! Are we out of open issues for this ticket?

I chatted with Scott Kittermann, a Master of the Universe, and he said we still had an excellent chance of getting Tahoe-LAFS v1.6 included in Lucid if we upload it to them on Monday 25th instead of (as per our previous plan) Wednesday the 20th. So let's do it!! :-)

comment:36 in reply to: ↑ 31 Changed at 2010-01-18T20:26:05Z by davidsarah

Replying to davidsarah:

Replying to zooko:

(I guess it doesn't have to enforce security properties on read when the purpose of the read is to make a shallow copy of (a subset of) the child links into a different Tahoe-LAFS directory,

Correct, because it's not decoding those child URIs.

but I feel like it "should" do so in order to be consistent and parallel with the other two targets that the information could be headed toward: the WUI and the WAPI.)

That would mean that a directory operation could have side-effects on child links that it isn't defined to alter.

I misread your comment -- you said a different directory. However, there's no webapi operation that directly copies caps from one directory to another. Copying is implemented by getting the JSON representation of a directory, and using mkdir-with-children or mkdir-immutable or set-children to create or modify another directory (I assume; I haven't looked at the implementation of tahoe cp). So this is covered by the validation of known cap pairs that is done on directory reads.

Yes, I think we are out of open issues. Yay for Tahoe 1.6 in Lucid!

comment:37 Changed at 2010-01-19T01:53:23Z by warner

If we must use prefixes, let's use unknown. instead of unk. . We don't need these to be short, and abbreviated-to-the-point-of-incomprehensibility keywords drive me nuts.

I don't know if the same argument applies to imm. or ro. or whatever.

I haven't read the rest of the discussion thoroughly enough to have an intelligent opinion, but this prefix stuff makes me nervous. OTOH, the notion of attaching a "guard" to the cap seems to make sense. I just worry about last-minute changes to a two-year old API.

comment:38 Changed at 2010-01-19T03:25:41Z by davidsarah

I've put the prefixes in READONLY_PREFIX and IMMUTABLE_PREFIX constants, so this would be trivial to change. I have no strong opinion on whether to use "ro." and "imm." vs "readonly." and "immutable.".

I understand the reasons for nervousness, but we're getting quite a lot of forward-compatibility goodness from this set of changes, and we should have sufficient time to review it before the release.

Changed at 2010-01-22T06:12:25Z by davidsarah

Documentation patches

Changed at 2010-01-23T13:05:21Z by davidsarah

Diff for tests

Changed at 2010-01-23T13:06:14Z by davidsarah

Diff for everything (code + tests + docs)

comment:39 Changed at 2010-01-23T13:08:25Z by davidsarah

The changes are larger and less elegant than I'd hoped, and I failed to resist the temptation to do some refactoring of test_web.py. The changes to test_web.py that are most relevant, i.e. excluding the refactoring, are in these functions:

  _create_initial_children
  _create_immutable_children
  test_POST_NEWDIRURL_initial_children
  test_POST_NEWDIRURL_immutable
  test_POST_mkdir_immutable
  test_POST_mkdir_no_parentdir_initial_children
  test_POST_mkdir_no_parentdir_immutable
  test_unknown
  test_immutable_unknown
  test_deep_check
  test_mutant_dirnodes_are_omitted

The last of these is a new test that directly checks the main problem in this ticket; you may want to look at it first.

To review the code patches, I suggest looking at interfaces.py, unknown.py, uri.py, dirnode.py, and nodemaker.py first.

There are some commented-out print statements left, which I'll remove before preparing the final patch.

I couldn't figure out how to tell darcs to produce a diff that excludes certain directories, so all-diff.txt contains the test and doc diffs as well.

comment:40 Changed at 2010-01-23T13:09:00Z by davidsarah

  • Keywords review-needed added

comment:41 Changed at 2010-01-23T13:27:42Z by davidsarah

The calls to rw_uri.strip() and ro_uri.strip() in dirnode.py should be

rw_uri.strip(' ') and ro_uri.strip(' '),

since the Python docs don't clearly specify which characters are treated as whitespace by default. Actually that probably also applies to uses of .strip() elsewhere.

Changed at 2010-01-24T06:10:54Z by davidsarah

Diff for everything including t=json fix (code + tests + docs)

comment:42 Changed at 2010-01-24T06:14:20Z by davidsarah

The t=json option was not working for unknown nodes, so I've fixed that.

(Go to here and click on JSON to see the error message produced by the current code.)

comment:43 follow-up: Changed at 2010-01-25T00:35:13Z by kevan

I've read through this ticket, and I want to make sure that I understand what the proposed solution is before I start reviewing it.

The motivating problem (originally, at least) is that immutable directories were allowed to have mutable children. This goes against the expectation that the contents of an immutable directory are themselves immutable, as stated in comment:3. The problem them expanded to include dealing with how clients from the present deal with caps from the future.

The solution that I think ended up being the one that everyone agreed with was stated in comment:21. This was further elaborated on in comment:27. Paraphrasing:

  • When reading an immutable directory, caps that we can interpret, and that are known to be mutable should be omitted entirely. When writing an immutable directory, the existence of caps that are mutable should be cause for an error and a failure.
  • When reading an immutable directory, unknown caps in the rw_uri slot should be silently ignored, and unknown caps in the ro_uri slot should have an ro. or imm. prefix removed, and replaced with an imm. prefix. When writing an immutable directory, unknown caps in the ro_uri slot should be prefixed with imm. if they are not already, and the existence of unknown caps in the rw_uri slot should be cause for an error and a failure. (extrapolating from comment:20)
  • When reading a mutable directory, unknown caps in the rw_uri slot should be passed through as normal. When writing a mutable directory, unknown caps in the rw_uri slot should be passed through as normal, and unknown caps in the ro_uri slot should have any existing prefix removed and replaced with ro..
  • When presented with a cap prefixed with imm. or ro., webapi servers should see if it is a cap that they understand without the prefix. If it is, they should attempt to verify that it matches the prefix -- in other words, that it is immutable if prefixed with imm. (this is suggested in comment:29).

Have I misunderstood anything? Have I missed anything? If not, I'll accept this ticket and start reviewing it.

comment:44 in reply to: ↑ 43 Changed at 2010-01-25T12:19:07Z by davidsarah

Replying to kevan:

The solution that I think ended up being the one that everyone agreed with was stated in comment:21. This was

further elaborated on in comment:27. Paraphrasing:

  • When reading an immutable directory, caps that we can interpret, and that are known to be mutable should be

omitted entirely. When writing an immutable directory, the existence of caps that are mutable should be cause for

an error and a failure.

Yes. This should fail the operation with a MustBeDeepImmutableError?.

  • When reading an immutable directory, unknown caps in the rw_uri slot should be silently ignored,

In an immutable directory, the rwcapdata field that would contain the encrypted and mac'd rw_uri

should always be empty (zero-length), since the directory has no writekey. In the original patches I ignored this

field, but on reflection I think we should treat a non-empty rwcapdata in the same way as a netstring

decoding error, i.e. raise ValueError?. That's what the latest patch [new-all-diff.txt] does.

We also test that an immutable directory's underlying filenode has no .get_writekey() method, so that the

code to decrypt this field would fail anyway if it were mistakenly executed.

and unknown caps in the ro_uri slot should have an ro. or imm. prefix removed, and replaced

with an imm. prefix.

Yes.

When writing an immutable directory, unknown caps in the ro_uri slot should be prefixed with imm.

if they are not already,

No, this isn't necessary: the ro_uri slots of an immutable directory are implicitly immutable. The

imm. will be added when the slot is read out by a client that doesn't understand the cap format, but it is

not stored. In fact, ro. or imm. will be stripped if present (by strip_prefix_for_ro in

unknown.py).

(If the prefix were not stripped, then it would be more difficult to test what happens when .imm is absent

-- this would be a case that the attacker could provoke that would not occur in normal operation.)

and the existence of unknown caps in the rw_uri slot should be cause for an error and a failure.

(extrapolating from comment:20)

Yes. Actually the existence of *any* cap in the rw_uri slot of an immutable directory is a cause for

failure. However it is valid to provide a rw_uri in the JSON representation or in an URL parameter if it

can be diminished to a read cap (handled by existing code in dirnode.py:create_from_cap), or in the latest

version of the patch, if it already had an imm. prefix. In either of those cases, it will be moved to the

ro_uri slot.

  • When reading a mutable directory, unknown caps in the rw_uri slot should be passed through as normal.

When writing a mutable directory, unknown caps in the rw_uri slot should be passed through as normal, and

unknown caps in the ro_uri slot should have any existing prefix removed and replaced with ro..

Not quite. If there was an existing imm. prefix, it should stay. (This would happen for an unknown cap that

is copied from an immutable directory to a mutable one. It ensures that the constraint that the cap was supposed

to refer to an immutable subtree is not forgotten.)

Similarly to the immutable case, if an unknown cap is given in a rw_uri slot with no ro_uri and the

cap already has a ro. prefix, then it will be moved to the ro_uri slot. (This makes it possible to

link a ro. or imm.-prefixed unknown cap into an existing directory using the WUI, without needing a

JSON request.)

  • When presented with a cap prefixed with imm. or ro., webapi servers should see if it is a cap

that they understand without the prefix. If it is, they should attempt to verify that it matches the prefix -- in

other words, that it is immutable if prefixed with imm. (this is suggested in comment:29).

... and read-only if prefixed with ro.. Yes.

Have I misunderstood anything? Have I missed anything? If not, I'll accept this ticket and start reviewing it.

Please do, thanks.

comment:45 Changed at 2010-01-25T12:22:29Z by davidsarah

[fixed formatting]

Replying to kevan:

The solution that I think ended up being the one that everyone agreed with was stated in comment:21. This was further elaborated on in comment:27. Paraphrasing:

  • When reading an immutable directory, caps that we can interpret, and that are known to be mutable should be omitted entirely. When writing an immutable directory, the existence of caps that are mutable should be cause for an error and a failure.

Yes. This should fail the operation with a MustBeDeepImmutableError?.

  • When reading an immutable directory, unknown caps in the rw_uri slot should be silently ignored,

In an immutable directory, the rwcapdata field that would contain the encrypted and mac'd rw_uri should always be empty (zero-length), since the directory has no writekey. In the original patches I ignored this field, but on reflection I think we should treat a non-empty rwcapdata in the same way as a netstring decoding error, i.e. raise ValueError?. That's what the latest patch [new-all-diff.txt] does.

We also test that an immutable directory's underlying filenode has no .get_writekey() method, so that the code to decrypt this field would fail anyway if it were mistakenly executed.

and unknown caps in the ro_uri slot should have an ro. or imm. prefix removed, and replaced with an imm. prefix.

Yes.

When writing an immutable directory, unknown caps in the ro_uri slot should be prefixed with imm. if they are not already,

No, this isn't necessary: the ro_uri slots of an immutable directory are implicitly immutable. The imm. will be added when the slot is read out by a client that doesn't understand the cap format, but it is not stored. In fact, ro. or imm. will be stripped if present (by strip_prefix_for_ro in unknown.py).

(If the prefix were not stripped, then it would be more difficult to test what happens when .imm is absent -- this would be a case that the attacker could provoke that would not occur in normal operation.)

and the existence of unknown caps in the rw_uri slot should be cause for an error and a failure.

(extrapolating from comment:20)

Yes. Actually the existence of *any* cap in the rw_uri slot of an immutable directory is a cause for failure. However it is valid to provide a rw_uri in the JSON representation or in an URL parameter if it can be diminished to a read cap (handled by existing code in dirnode.py:create_from_cap), or in the latest version of the patch, if it already had an imm. prefix. In either of those cases, it will be moved to the ro_uri slot.

  • When reading a mutable directory, unknown caps in the rw_uri slot should be passed through as normal.

When writing a mutable directory, unknown caps in the rw_uri slot should be passed through as normal, and unknown caps in the ro_uri slot should have any existing prefix removed and replaced with ro..

Not quite. If there was an existing imm. prefix, it should stay. (This would happen for an unknown cap that is copied from an immutable directory to a mutable one. It ensures that the constraint that the cap was supposed to refer to an immutable subtree is not forgotten.)

Similarly to the immutable case, if an unknown cap is given in a rw_uri slot with no ro_uri and the cap already has a ro. prefix, then it will be moved to the ro_uri slot. (This makes it possible to link a ro. or imm.-prefixed unknown cap into an existing directory using the WUI, without needing a JSON request.)

  • When presented with a cap prefixed with imm. or ro., webapi servers should see if it is a cap that they understand without the prefix. If it is, they should attempt to verify that it matches the prefix -- in other words, that it is immutable if prefixed with imm. (this is suggested in comment:29).

... and read-only if prefixed with ro.. Yes.

Have I misunderstood anything? Have I missed anything? If not, I'll accept this ticket and start reviewing it.

Please do, thanks.

Changed at 2010-01-25T12:24:18Z by davidsarah

New diff for everything (code + tests + docs) -- review this version

comment:46 Changed at 2010-01-25T17:06:59Z by kevan

  • Owner changed from davidsarah to kevan
  • Status changed from assigned to new

comment:47 Changed at 2010-01-25T17:07:07Z by kevan

  • Status changed from new to assigned

comment:48 Changed at 2010-01-26T03:47:06Z by kevan

I've looked through the changes to behavior (with the exception of those to the webapi) and documentation, and have the following comments so far. I'll probably have more after I start working on the remaining parts, but posting what I have now might make this patchset more likely to make it into trunk by Thursday.

interfaces.py looks okay, but I was a bit confused when I read "...and then used" at the end of MustBeDeepImmutableError and MustBeReadOnlyError. This is possibly me being dense after work, but maybe you could clarify the context in which those are being used to trigger that error?

The docstring for strip_prefix_for_ro should be inside the function, not before it.

In the __init__ method of UnknownNode?, you say if x is not None: instead of if x. I'm assuming that that's to deal with empty string arguments/other arguments that evaluate to False in a comparison?

if given_ro_uri is not None:
            read_cap = uri.from_string(given_ro_uri, deep_immutable=deep_immutable, name=name)
            if isinstance(read_cap, uri.UnknownURI):
                self.error = read_cap.get_error()
                if self.error:
                    assert self.rw_uri is None and self.ro_uri is None
                    return

How does this know the difference between an UnknownURI that is an UnknownURI because we don't know what it is, and an UnknownURI that is unknown because we know what it is but it is prefixed with something that makes it nonsensical? From what I can understand by reading the code in UnknownURI, it is possible to get that result in either case. Apologies if this is a case of me being dense.

if deep_immutable:
            assert self.rw_uri is None
            # strengthen the constraint on ro_uri to ALLEGED_IMMUTABLE_PREFIX
            if given_ro_uri is not None:
                if given_ro_uri.startswith(ALLEGED_IMMUTABLE_PREFIX):
                    self.ro_uri = given_ro_uri
                elif given_ro_uri.startswith(ALLEGED_READONLY_PREFIX):
                    self.ro_uri = ALLEGED_IMMUTABLE_PREFIX + given_ro_uri[len(ALLEGED_READONLY_PREFIX):]
                else:
                    self.ro_uri = ALLEGED_IMMUTABLE_PREFIX + given_ro_uri

What is this assertion meant to do? I don't think that anything so far in the code would have assigned anything other than none to self.rw_uri. I don't suppose that it hurts to have it there, aside from maybe meaning that the calling code needs to be more complicated if it wants to handle all of the error conditions that UnknownNode? can enter. It also serves as self-documentation, I guess.

Any reason for implementing is_unknown in _BaseURI? Is it anything that might need to be added to the IURI interface? What about is_readonly, is_mutable, get_readonly, get_verify_cap, which were added to a number of URIs in uri.py. Should they be added to one of the URI interfaces?

The spacing on your changes to SSKVerifierURI is inconsistent with your changes in the other URI classes.

I didn't see any problems in nodemaker.py or dirnode.py.

If you're correcting things in webapi.txt, you might try replacing Tahoe with Tahoe-LAFS, which (I gather) is more correct. But maybe that's out of scope for this ticket.

I think it would be easier to read if you changed the from_string_ methods in uri.py to use explicit named keyword arguments instead of the kwargs form.

I have a block of free time tomorrow, so I'm hoping I will have the rest of the review done by then.

comment:49 Changed at 2010-01-26T06:24:18Z by davidsarah

interfaces.py looks okay, but I was a bit confused when I read "...and then used" at the end of MustBeDeepImmutableError? and MustBeReadOnlyError?. This is possibly me being dense after work, but maybe you could clarify the context in which those are being used to trigger that error?

An UnknownNode? may have one of these error objects in its .error property. They will only be thrown if node.raise_error() is called. This happens when we try to put the node into a directory (besides tests, the relevant calls to raise_error are in dirnode.py and nodemaker.py).

The docstring for strip_prefix_for_ro should be inside the function, not before it.

OK.

In the __init__ method of UnknownNode?, you say if x is not None: instead of if x. I'm assuming that that's to deal with empty string arguments/other arguments that evaluate to False in a comparison?

Yes. I'll recheck what happens to falsy URIs.

>  if given_ro_uri is not None:
>              read_cap = uri.from_string(given_ro_uri,
>  deep_immutable=deep_immutable, name=name)
>              if isinstance(read_cap, uri.UnknownURI):
>                  self.error = read_cap.get_error()
>                  if self.error:
>                      assert self.rw_uri is None and self.ro_uri is None
>                      return

How does this know the difference between an UnknownURI that is an UnknownURI because we don't know what it is, and an UnknownURI that is unknown because we know what it is but it is prefixed with something that makes it nonsensical?

In either case, isinstance(read_cap, uri.UnknownURI) will be true. But only in the latter case will read_cap.get_error() be truthy.

>  if deep_immutable:
>              assert self.rw_uri is None
...

What is this assertion meant to do? I don't think that anything so far in the code would have assigned anything other than none to self.rw_uri. I don't suppose that it hurts to have it there, aside from maybe meaning that the calling code needs to be more complicated if it wants to handle all of the error conditions that UnknownNode? can enter. It also serves as self-documentation, I guess.

It's intended as documentation. I normally use asserts only for documentation, not for validity checks.

Any reason for implementing is_unknown in _BaseURI? Is it anything that might need to be added to the IURI interface?

No, I'll remove that. (I originally had an is_unknown method on URIs but changed to using isinstance(..., UnknownURI).)

What about is_readonly, is_mutable, get_readonly, get_verify_cap, which were added to a number of URIs in uri.py. Should they be added to one of the URI interfaces?

The existing code was inconsistent; it had these methods for some URI classes and not others. They were already declared in IURI.

The spacing on your changes to SSKVerifierURI is inconsistent with your changes in the other URI classes.

Fixed.

I didn't see any problems in nodemaker.py or dirnode.py.

If you're correcting things in webapi.txt, you might try replacing Tahoe with Tahoe-LAFS, which (I gather) is more correct. But maybe that's out of scope for this ticket.

Out of scope (there are > 60 uses of "Tahoe" in webapi.txt!)

I think it would be easier to read if you changed the from_string_ methods in uri.py to use explicit named keyword arguments instead of the kwargs form.

Will do.

comment:50 Changed at 2010-01-27T05:58:35Z by kevan

web/common.py looks good.

web/dirnode.py looks good.

web/filenode.py looks good.

web/info.py looks good.

web/root.py looks good.

test/common.py looks good.

test/test_dirnode.py:

+        bad_future_node = UnknownNode(future_write_uri, None)
+        bad_kids1 = {u"one": (bad_future_node, {})}
         d.addCallback(lambda ign:
+                      self.shouldFail(MustNotBeUnknownRWError, "bad_kids1",
+                                      "cannot attach unknown",
                                       nm.create_new_mutable_directory,
                                       bad_kids1))

Could you add a comment here explaining why this should fail? If I understand correctly, it is because the cap you provide doesn't start with ro. or imm. and since it is unknown we don't know how to diminish it to a readonly cap?

(Few of the tests so far in test_dirnode.py have explanations, but I've been able to understand them without. Explanations aren't a bad idea, though)

I think that FakeMutableFile?? needs to implement raise_error.

test_filenode.py:

around line 41,

+        self.failUnlessEqual(fn1.is_unknown(), False)
+        self.failUnlessEqual(fn1.is_allowed_in_immutable_directory(), True)

could be re-written as

    self.failIf(fn1.is_unknown())
    self.failUnless(fn1.is_allowed_in_mutable_directory())

I think it reads better that way, anyway.

Similarly for:

+        self.failUnlessEqual(v.is_readonly(), True)
+        self.failUnlessEqual(v.is_mutable(), False)

just a little below that and for

+        self.failUnlessEqual(fn1.is_unknown(), False)
+        self.failUnlessEqual(fn1.is_allowed_in_immutable_directory(), True)

around line 64 and

+        self.failUnlessEqual(n.is_unknown(), False)
+        self.failUnlessEqual(n.is_allowed_in_immutable_directory(), False)

in the chunk around line 99 and

+        self.failUnlessEqual(nro.is_mutable(), True)
+        self.failUnlessEqual(nro.is_readonly(), True)
+        self.failUnlessEqual(nro.is_unknown(), False)
+        self.failUnlessEqual(nro.is_allowed_in_immutable_directory(), False)

in the chunk around line 127.

(test_filenode.py uses failUnlessEqual fairly solidly where it could use those, though, so what you're doing is at least consistent with what is already there)

test/test_system.py seems okay.

test/test_uri.py:

It would be nice to have a comment explaining why the invalid uris are invalid, though it's easy enough to figure out after a few seconds of looking at it. Just a thought.

test_web.py:

I'm assuming that existing tests will already fail if we attempt to add an unknown node using PUT /uri/$DIRCAP/SUBDIRS../CHILDNAME?t=uri and POST /uri/$DIRCAP/SUBDIRS../?t=uri&name=CHILDNAME&uri=CHILDCAP, since that was the existing behavior before your changes.

You've changed the _create_initial_children and _create_immutable_children to include unknown URIs, which threads tests for those nicely through the existing tests.

Other comments:

Can you add a test for the change you made in the chunk starting around line 230 of dirnode.py that references #925?

My checklist of things that I thought there needed to be tests for is appeased, anyway, and I didn't see any other problems, so I guess this is reviewed.

Changed at 2010-01-27T07:11:54Z by davidsarah

Prevent mutable objects from being retrieved from an immutable directory, and associated forward-compatibility improvements

Changed at 2010-01-27T07:12:21Z by davidsarah

Miscellaneous documentation, test, and code formatting tweaks.

Changed at 2010-01-27T23:23:25Z by davidsarah

Prevent mutable objects from being retrieved from an immutable directory, and associated forward-compatibility improvements (supercedes previous patches)

Changed at 2010-01-27T23:26:08Z by davidsarah

Changes since Kevan's review

comment:51 Changed at 2010-01-27T23:36:41Z by davidsarah

(A version of this reply was sent to the tahoe-dev list while the trac server was down, but note that there are some extra comments about #925 and from_string_* at the end.)

Replying to kevan:

test/test_dirnode.py:

+        bad_future_node = UnknownNode(future_write_uri, None)
+        bad_kids1 = {u"one": (bad_future_node, {})}
         d.addCallback(lambda ign:
+                      self.shouldFail(MustNotBeUnknownRWError, "bad_kids1",
+                                      "cannot attach unknown",
                                       nm.create_new_mutable_directory,
                                       bad_kids1))

Could you add a comment here explaining why this should fail? If I understand correctly, it is because the cap you provide doesn't start with ro. or imm. and since it is unknown we don't know how to diminish it to a readonly cap?

Yes, exactly. Comment added.

(Few of the tests so far in test_dirnode.py have explanations, but I've been able to understand them without. Explanations aren't a bad idea, though)

Well, they didn't start with explanations :-) This is a good point, but I think I'll need to address it after the 1.6 release.

I think that FakeMutableFile?? needs to implement raise_error.

Actually the tests never call it, but it should have this method for consistency. Fixed.

test_filenode.py:

around line 41,

+        self.failUnlessEqual(fn1.is_unknown(), False)
+        self.failUnlessEqual(fn1.is_allowed_in_immutable_directory(), True)

could be re-written as

    self.failIf(fn1.is_unknown())
    self.failUnless(fn1.is_allowed_in_mutable_directory())

Changed for all cases in test_filenode.py.

test/test_uri.py:

It would be nice to have a comment explaining why the invalid uris are invalid, though it's easy enough to figure out after a few seconds of looking at it. Just a thought.

You mean the ones that result in instances of !UnknownURI? (Technically there are no invalid uris any more, because uri.from_string will catch the !BadURIError and construct an !UnknownURI that remembers it. The !BadURIError might be reraised later, though.)

I think that if it's easy enough to work out, that's as it should be. The trouble with putting too many comments in is that reviewers trust the comments, when they shouldn't.

test_web.py:

I'm assuming that existing tests will already fail if we attempt to add an unknown node using PUT /uri/$DIRCAP/SUBDIRS../CHILDNAME?t=uri and POST /uri/$DIRCAP/SUBDIRS../?t=uri&name=CHILDNAME&uri=CHILDCAP, since that was the existing behavior before your changes.

Don't make that assumption! The patch removed some of the restrictions on UnknownNodes, so there's definitely a need to explicitly test this.

In fact the addition should succeed (for a mutable directory) when the URI is prefixed with "ro." or "imm." -- in those cases we don't need to be able to diminish it to a readcap, because it already is one. Tests for the six possible cases (test_{PUT_NEWFILEURL, POST_link}_uri_unknown_{bad, ro_good, imm_good) added.

You've changed the _create_initial_children and _create_immutable_children to include unknown URIs, which threads tests for those nicely through the existing tests.

Yes, that made it a lot easier to see what needed to be tested.

Other comments:

Can you add a test for the change you made in the chunk starting around line 230 of dirnode.py that references #925?

Now done (test_dirnode.py: test_spaces_are_stripped_on_the_way_out). As mentioned in #925, I changed it to only strip trailing spaces, which is sufficient to address the padding forward-compatibility issue.

Incidentally, I decided not to change from_string_* in uri.py to use explicit named keyword arguments, because I think the code is clearer and more stable under maintenance as it is. I did add a comment that they take the same keyword args as from_string.

comment:52 Changed at 2010-01-28T02:40:36Z by kevan

The new changes look good -- no complaints from me.

From grepping around the darcs patch, it seems like you've also remembered to remove the commented-out print statements -- if I'm mistaken, don't forget to do that. :)

The patches apply fine against a fresh checkout of trunk (as of about 5 minutes ago), and it is able to build without error. I need to leave for dinner now, but I'll confirm that all of the tests pass once I get back.

comment:53 Changed at 2010-01-28T06:38:16Z by kevan

...and they pass, so everything looks good here. I'm going to mark this as "reviewed" and change its owner back to davidsarah -- if someone disagrees with that, then feel free to change it back.

comment:54 Changed at 2010-01-28T06:38:39Z by kevan

  • Keywords reviewed added; review-needed removed
  • Owner changed from kevan to davidsarah
  • Status changed from assigned to new

comment:55 Changed at 2010-01-28T14:03:34Z by zooko

Okay, I'm ready to apply the patches, but which of the patches attached to this ticket are the one that should be applied?

comment:56 Changed at 2010-01-28T14:58:18Z by zooko

Oh I get it, the second-to-last one, which is marked as "supercedes previous patches" and which also seems to include the succeeding patch.

comment:57 Changed at 2010-01-28T15:02:44Z by zooko

  • Resolution set to fixed
  • Status changed from new to closed

comment:58 follow-up: Changed at 2010-01-28T20:25:54Z by davidsarah

Phew! After rereading the past discussion, here are a few clarifications about how 1.6 will behave:

  • as suggested by Zooko in comment:10, URIs such as "URI:CHK:<a>look at me I'm evil<a>" are now treated as unknown (the BadURIError raised by, in this case, CHKFileURI.init_from_string is caught by uri.from_string). However they are not allowed to be put into directories -- the error thrown by init_from_string will be remembered and re-raised if we try to do that.
  • the patch does not add an "immutable" field in the JSON representation. There's sufficient information to infer it: the dirnode is immutable iff the "ro_uri" field holds a known-immutable cap, or starts with "imm.", or is omitted. OTOH, that requires a client to understand at least the current cap formats, which is undesirable. I'll add a ticket for that.
  • there are no "bad_children" or "unrecognized_children" keys in the JSON -- we omit bad children, and unrecognized children go under the main "children" key.

Also, note that directory listings containing unknown caps were readable with Tahoe v1.5, but not v1.4.1 or earlier versions.

Changed at 2010-01-28T20:31:29Z by davidsarah

Fix inaccurate comment in test_mutant_dirnodes_are_omitted

comment:59 in reply to: ↑ 58 ; follow-up: Changed at 2010-01-29T00:08:25Z by davidsarah

Replying to davidsarah:

  • the patch does not add an "immutable" field in the JSON representation. There's sufficient information to infer it: the dirnode is immutable iff the "ro_uri" field holds a known-immutable cap, or starts with "imm.", or is omitted. OTOH, that requires a client to understand at least the current cap formats, which is undesirable. I'll add a ticket for that.

I was slightly confused: there is already a "mutable" field (added in 3ac4a734e507abbf), so we should not add "immutable". The issue is whether the output of the "t=json" operation on unknown nodes (which did not work prior to 1.6) should include "mutable", with a value set according to whether the ro_uri starts with "imm.". Currently there is no "mutable" field.

This isn't giving the webapi client any additional information that it couldn't work out for itself (and, contrary to the comment above, it does not need to understand the current cap formats, only the "imm." prefix). However it might help to simplify client code.

If a false value in the "mutable" field means "alleged immutable", then this change should be made; if it means "definitely immutable", it should not. We should decide this before the 1.6 release, otherwise clients would have to take into account the field not being present in any case.

comment:60 in reply to: ↑ 59 Changed at 2010-01-29T00:10:24Z by davidsarah

Replying to davidsarah:

I was slightly confused: there is already a "mutable" field (added in 3ac4a734e507abbf), ... Currently there is no "mutable" field.

My writing skills need work. I meant, currently there is no "mutable" field in the output of t=json for unknown nodes.

comment:61 Changed at 2010-01-29T03:06:09Z by davidsarah

Note that if (we only have the URI for the object itself, or the parent directory is mutable), and (the node doesn't have a rw_uri and its ro_uri doesn't have the "imm." prefix), then we don't know whether or not it is immutable. In this case we'd omit the "mutable" field.

Changed at 2010-01-29T03:25:19Z by davidsarah

Add mutable field to t=json output for unknown nodes, when mutability is known. Includes patch for #931 on which it is dependent.

comment:62 Changed at 2010-01-29T03:28:58Z by davidsarah

  • Keywords review-needed added; reviewed removed
  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopening for review of t=json changes.

Note that there is also a minor bugfix for DeepCheckStreamer? in directory.py: it was giving a type of "file" for unknown nodes. Now it gives a type of "unknown", which makes it consistent with ManifestStreamer?.

Changed at 2010-01-29T03:38:53Z by davidsarah

Fix inaccurate comment in test_mutant_dirnodes_are_omitted. Show -IMM and -RO suffixes for types of immutable and read-only unknown nodes in directory listings. Add mutable field to t=json output for unknown nodes, when mutability is known. Fix example JSON in webapi.txt that cannot occur in practice.

comment:63 Changed at 2010-01-29T04:01:58Z by davidsarah

Apparently Python 2.4 doesn't support the 'foo if test else bar' syntax. Patch for that coming up.

Changed at 2010-01-29T04:02:44Z by davidsarah

Same patches as above but including Python 2.4 fixes.

comment:64 Changed at 2010-01-29T13:18:13Z by zooko

  • Resolution set to fixed
  • Status changed from reopened to closed

Fixed in 3e35959e9b9130a4, f1fd703dedca8c45. Thank you!

comment:65 Changed at 2010-01-29T20:50:58Z by davidsarah

  • Keywords review-needed removed

Changed at 2010-01-30T07:03:26Z by davidsarah

Improvements to tests for UnknownNode?, and minor fix to a JSON example in webapi.txt

comment:66 Changed at 2010-01-30T07:08:44Z by zooko

Patches to improve tests and webapi doc for this issue: e2ee725d7d325b8d, ea3954372a06a36c

comment:67 Changed at 2010-02-02T05:58:37Z by davidsarah

  • Keywords news-done review-needed added

comment:68 Changed at 2010-02-02T05:58:54Z by davidsarah

  • Keywords review-needed removed
Note: See TracTickets for help on using tickets.