[tahoe-dev] "Elk Point" design for mutable, add-only, and immutable files

David-Sarah Hopwood david-sarah at jacaranda.org
Fri Sep 11 08:19:19 PDT 2009


I've designed a new version of the cap protocol I posted a few days
ago, that addresses Brian's comments about roadblock attacks.

As well as ordinary mutable files, the mutable variant can also
support a basic version of the add-only collection semantics discussed
in <http://allmydata.org/trac/tahoe/ticket/795>.
(The appendcap in that ticket is called the "write/add only cap" in the
diagrams linked to below, and the writecap in the ticket is the
"master cap". It would be easy to extend this further by using
additional signing keys.)

It's easier to refer to protocol designs if they have meaningful and
memorable names, so I've called this one Elk Point (a beauty spot on
the edge of Lake Tahoe; see Google Earth or Maps for pics).

Diagram for mutable files or add-only buckets:
<http://jacaranda.org/tahoe/mutable-addonly-elkpoint-2.png>
<http://jacaranda.org/tahoe/mutable-addonly-elkpoint-2.svg>

Diagram for immutable files:
<http://jacaranda.org/tahoe/immutable-elkpoint-2.png>
<http://jacaranda.org/tahoe/immutable-elkpoint-2.svg>

[To get a previous design that I was considering that used an extra
public/private signature key pair, remove the "-2" in these URLs. That
version was way overcomplicated, so I'm not going to describe it here.]


The Elk Point designs use the idea of extending the read cap with a
relatively short value (T, of length t bits) that is used to resist
roadblock attacks; I had arrived at that idea independently before
reading Zooko's note describing something very similar.

The design also uses a longer storage index to avoid accidental
collisions. (The maximum possible number of distinct storage indices
is determined by the length of a read cap, i.e. it is approximately
2^(n+t), provided it is not further constrained by the length of the
S part of the storage index.)
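
As a one-line illustration of that bound (a sketch only: it assumes S
is ell bits, so the indices reachable from read caps are capped by the
shorter of the R and S parts):

  def max_storage_indices(n, t, ell):
      # distinct S || T values reachable from n+t bit read caps,
      # capped by the shorter of R (n bits) and S (ell bits)
      return 2 ** (min(n, ell) + t)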

In terms of Brian's "C/Ir/I" analysis, the read caps have n bits
that are both "C" (confidentiality) and "Ir" (integrity usable only
by readcap holder), plus t bits that are "I" (integrity usable by
anyone). (Note, however, that it is quite tricky to ensure that
an attacker must treat all the I/Ir bits as a single combined hash
for the purpose of collision resistance, although I think Elk Point
does this correctly.)

The verify caps contain ell bits that duplicate the S part of the
storage index. This is necessary because we want requests by a
verifier to look the same as requests by a reader (otherwise, a
malicious server could easily provide correct data for verification
requests but not for read requests). They also contain m "I" bits.

The read/verify caps contain n "C" bits and m "I" bits.

A conservative choice of parameters might be

  immutable: n = 128, t = 128, ell = 160, m = 160.
             read caps 256 bits, verify caps 320 bits, read/verify 288 bits.
             2^128 confidentiality and collision resistance; roadblock
             attacks are effectively impossible.

  mutable:   n = 128, t = 50,  ell = 160, m = 160.
             read caps 178 bits, verify caps 320 bits, read/verify 288 bits.
             2^128 confidentiality; collision resistance not relevant.
             Resistance to roadblock attacks is sufficient provided
             strategy 2 described below is used.
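
To double-check the sizes above: a read cap is n+t bits, a verify cap
is ell+m bits, and a read/verify cap is n+m bits. As a trivial Python
sanity check:

  def cap_sizes(n, t, ell, m):
      # (read cap, verify cap, read/verify cap) sizes in bits
      return (n + t, ell + m, n + m)

  print(cap_sizes(128, 128, 160, 160))  # immutable: (256, 320, 288)
  print(cap_sizes(128, 50, 160, 160))   # mutable:   (178, 320, 288)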


Brian Warner wrote:
> David-Sarah Hopwood wrote:
> 
>>>  * verifycap cannot be offline-derived from readcap:
>>>    ...
>>>    One mitigation strategy would be to store both readcap and
>>>    verifycap in dirnodes, effectively caching the verifycap
>>>    computation.
>>
>> Given that the combined (readcap, H(v, k1_enc)) is as short as just
>> the readcap in any alternative scheme, this seems quite acceptable to
>> me.
> 
> Yeah, it's just that it'll require a new dirnode format,

I had the impression that the plan was to [*] adopt the new cap
formats at the same time as introducing the features of being able
to store arbitrary URIs in dirnodes, and representing Tahoe caps as
real URIs (http://allmydata.org/trac/tahoe/wiki/NewCapDesign).
Those features would presumably need a new dirnode format anyway.

[*] famous last words?

> and it means
> that the "add_child" webapi operation (which accepts a childname and
> childcap, and adds a new child to a directory) will have to first run
> out to the servers and figure out the verifycap.

Yes, if it's not given a read/verifycap for the child.

>>>  * but since storage-index != verifycap (i.e. H(UEBhash+k1enc)),
>>>    servers will be unable to completely validate their shares. They
>>>    can confirm that everything (including K1enc, thanks to
>>>    David-Sarah's suggestion) matches the verifycap, but they can't
>>>    tell that the verifycap matches the storage-index under which the
>>>    share is stored (i.e. they'd be unable to detect two swapped
>>>    sharefiles). This permits the "roadblock" attack and generally
>>>    misses our goals of allowing full server-side validation.
>>
>> That could be fixed by including the storage index in the verifycap,
>> i.e. (storage_index, H(v, k1_enc)).
> 
> Nope :(. You can give both the storage-index and the verifycap
> (H(UEBhash+k1enc)) to the server, but they've got no way to confirm
> that the two are related... they just have to take your word for it.

Adding the storage index to the input of the hash that gave V' in the
previous protocol was supposed to fix that, but it would not have
prevented roadblock attacks: if more than one record is submitted with
the same storage index, the server has no way to decide which is
correct without the read key.

In both variants of Elk Point, roadblock attacks are made more difficult
(or effectively impossible depending on the cap length/security tradeoff)
because the attacker must find a preimage for the t-bit value T, which is
a prefix of the hash that is used in the verify cap.
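
To make that concrete, here is a rough Python sketch of the server-side
check (illustrative only: the helper names are mine, and the real hash
input encoding is as defined in the diagrams, not literal concatenation):

  import hashlib

  def t_prefix(k1_enc, dhash, v, t):
      # First t bits of the hash over the (K1_enc, Dhash, V) triple.
      h = hashlib.sha256(k1_enc + dhash + v).digest()
      return int.from_bytes(h, 'big') >> (len(h) * 8 - t)

  def server_accepts(T, k1_enc, dhash, v, t):
      # Store a share under S || T only if the submitted triple hashes
      # to the T part of the storage index; planting a roadblock then
      # requires finding a t-bit preimage.
      return t_prefix(k1_enc, dhash, v, t) == T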

There are two possible strategies that can be used:

1. Make t large enough that an attacker is completely unable to find
   a preimage for T before all the shares are uploaded. If the attacker
   finds a preimage after the shares are uploaded, it is too late:
   the servers that have shares will have associated the storage index
   S || T with the triple (K1_enc, Dhash, V), which is also sufficient
   to verify the ciphertext in both the mutable and immutable cases.
   Therefore the attacker can be prevented from submitting any share
   that has the same storage index, but different values for any of the
   other fields.

2. Temporarily allow duplicate entries for the same storage index (as
   also suggested by Zooko). In this case the server accepts duplicate
   entries for a given S || T and will submit all of them on a
   request by a client. The client knows which record s is the
   correct one by checking consistency with R from its read cap
   (sketched in code just below):
   hash_n(s.KX_verify, s.Dhash, s.UEBhash, decrypt[R](s.K1_enc)) = R.
   To defeat this check an attacker would need to do approximately
   2^n work.
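
In Python-flavoured pseudocode, the client-side filtering in strategy 2
is just a transcription of that check (decrypt and hash_n stand in for
the primitives defined in the diagrams):

  def find_correct_record(records, R, hash_n, decrypt):
      # records: all candidates the servers returned for S || T.
      # Pick the one consistent with the n-bit read key R; forging a
      # record that passes this test costs an attacker about 2^n work.
      for s in records:
          k1 = decrypt(R, s.K1_enc)
          if hash_n(s.KX_verify, s.Dhash, s.UEBhash, k1) == R:
              return s
      return None  # every candidate was bogus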

These strategies can be combined, by having a server stop accepting
storage records with new (K1_enc, Dhash, V) triples for a given storage
index, at some specified timeout after it has accepted the first record
for that index (sufficiently long for the legitimate uploader to be able
to upload all shares). An attacker can't target a storage index before
its upload starts, because they don't know what the index will be. So,
this limits the time they have available to perform a DoS attack by
adding colliding records.
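
A sketch of that combined server policy (the window length and data
structures here are illustrative, not a proposal for specific values):

  import time

  UPLOAD_WINDOW = 3600  # illustrative: long enough for a full upload

  first_seen = {}  # storage index -> time of first accepted record
  triples = {}     # storage index -> list of (K1_enc, Dhash, V)

  def accept_record(storage_index, triple):
      now = time.time()
      if storage_index not in first_seen:
          first_seen[storage_index] = now
          triples[storage_index] = []
      known = triples[storage_index]
      if triple not in known:
          if now - first_seen[storage_index] > UPLOAD_WINDOW:
              return False  # window closed: no new triples accepted
          known.append(triple)
      return True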

In any case, note that in the variant of Elk Point for immutable files,
the extra t bits in a read cap also contribute to collision resistance,
i.e. you get (n+t)/2 bits of collision resistance (the optimum for an
n+t bit read cap) instead of n/2 bits.

> To visualize this, imagine that you've created share A, with SI_A and
> Vcap_A (=H(v,k1_enc)) , and you've also created share B with SI_B and
> Vcap_B. Now because you're evil, you send share A, SI_B, and Vcap_A to
> the server (i.e. you put share A in the place where share B ought to
> go). The server can confirm that share A matches Vcap_A. But because
> they don't get the readcap, they can't do the operation to show that
> SI_B doesn't match share A.

Actually, that particular attack would have failed even in my previous
protocol, because of the inclusion of the storage index in the hash:

  V'_A != hash_n(SI_B, K1_enc_A, (mutable ? K_verify_A : ciphertext_A)).

However, this doesn't really help, because the attacker would actually
send (SI_B, hash_n(SI_B, K1_enc_A, (mutable ? K_verify_A : ciphertext_A)))
as the verifycap.

In the Elk Point protocol, the attacker must find a (K1_enc, Dhash, V)
triple that collides with an existing T. As the notes next to the
diagrams say, this requires 2^(t-k-p) work to match any of 2^k T values
with success probability 2^-p. I think that Zooko's suggestion of
t = 20 bits is too short, BTW (even if multiple-target attacks are not
considered); a million hash computations is almost nothing these days.
If I've done the sums correctly, Crypto++ on the machine benchmarked at
<http://www.cryptopp.com/benchmarks.html> hashes 2^20 blocks of SHA-256
in just over half a second.
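
The underlying arithmetic, assuming a rate of roughly two million
(about 2^21) SHA-256 blocks per second on that benchmarked machine:

  def roadblock_seconds(t, k=0, p=0, rate=2.0e6):
      # Time for 2^(t-k-p) hash computations: matching any of 2^k
      # targets with success probability 2^-p.
      return 2.0 ** (t - k - p) / rate

  print(roadblock_seconds(20))          # ~0.5 seconds: far too easy
  print(roadblock_seconds(50) / 86400)  # ~6500 days on one 2009 core
  print(roadblock_seconds(128))         # effectively impossible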

> I still don't know how to solve this one for immutable files.

Elk Point solves it using essentially the same method for both;
the structure of the mutable and immutable variants is almost
identical. (For an alternative solution that uses public keys even
in the immutable file protocol, see
<http://jacaranda.org/tahoe/immutable-elkpoint.png>.)

> For mutable files, we want the verifycap and storage-index to both
> be the public key (or its hash).

In Elk Point the verify cap contains an m-bit hash derived from the
public key (KC_verify), and m can be chosen as long as needed to
provide full integrity (since it doesn't affect the read cap size).
The storage index contains a t-bit hash derived from the public key,
which is sufficient by the argument above.
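
For concreteness, a hedged sketch of the verify-cap derivation (the
actual hash function and input encoding are as defined in the mutable
diagram, not this code):

  import hashlib

  def mutable_v(kc_verify_bytes, m):
      # m-bit verify-cap value derived from the public key KC_verify.
      # The t-bit T in the storage index is then a hash prefix over
      # the (K1_enc, Dhash, V) triple, as sketched earlier, so it too
      # is (indirectly) derived from the public key.
      h = int.from_bytes(hashlib.sha256(kc_verify_bytes).digest(), 'big')
      return h >> (256 - m)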

>> dirnodes still only need to store (readcap, H(v, k1_enc)), since
>> the readcap can be hashed to get the storage index.
> 
> Yeah, unless/until we want traversalcaps, for which we need to store the
> verifycap in a place where mere traversalcap-holders can see it. That
> probably means storing some redundant information.

Yes, that's correct I think.

>>>  * we wouldn't be able to directly use our permuted-list Tahoe2
>>>    peer-selection protocol, since we won't know the storage-index
>>>    (and thus the permuted list) until after we've uploaded all the
>>>    shares.
>>
>> Zooko's protocol still works if r = H(k1, H(plaintext)), rather than r
>> = H(k1, H(ciphertext)). In that case you would only need to know the
>> hash of the plaintext, not the encoded ciphertext, to calculate the
>> storage-index. Does that help?
> 
> It would reduce the amount of work needed to compute the storage index,
> but I think it would also cease to provide integrity checking on the
> ciphertext. I think we really need r=H(k1,H(UEB)) to make the integrity
> properties work.

The immutable variant of Elk Point uses the UEBhash here, exactly as in
your modification to my earlier diagram.

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com