[tahoe-dev] how to encrypt and integrity-check with only one value (was: Re: two reasons not to use semi-private keys in our new cap design)

Thu Sep 10 09:13:49 PDT 2009

Dear David-Sarah and Brian:

Hello, I am slowly catching up on the burst of crypto cap creativity  
that you two posted over the last few days.

On Monday, 2009-09-07, at 1:48, Brian Warner wrote:

>  * we can't determine the storage-index until after we've encoded the
>    entire file (which generally means after we've uploaded it). So we
>    need a new uploader protocol that lets us upload to an as-yet-unnamed
>    slot, and then provide the slot's storage-index at the very end of
>    the process. This is more work, but it isn't a huge deal.

Remember that I really, really want this anyway, because this is  
necessary to have "one-pass" == "on-line" upload.  Imagine that you  
are a tiny embedded machine with little RAM and little or no disk.   
Your client opens an HTTP connection to you and starts uploading the  
plaintext of a huge file, expecting you to store it on a Tahoe-LAFS  
grid.  You need to (a) pick a random encryption key, (b) perform  
encryption, erasure-coding, and computation of the verification data,  
(c) send the resulting encrypted shares and verification data to  
storage servers.  You have to do all of this in an "on-line" way,  
i.e. you can't store a lot of intermediate data somewhere while  
waiting to see the end of the plaintext.  Then, (d) return the  
resulting read-cap to the client as quickly as possible after the  
client finishes sending you the plaintext.  This is ticket #320.
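
Steps (a) through (d) can be sketched roughly as follows. This is only an illustration under loud assumptions: the SHAKE-256 XOR pad is a toy stand-in for a real cipher, erasure-coding is left out entirely, "verification data" is reduced to a single running SHA-256 over the ciphertext, and the names toy_encrypt_chunk, one_pass_upload and send are hypothetical, not Tahoe-LAFS APIs. The point is only that nothing larger than one chunk is ever held in memory, and that the storage-index is not known until the last chunk has gone out.

```python
import hashlib
import os


def toy_encrypt_chunk(key: bytes, index: int, chunk: bytes) -> bytes:
    """XOR the chunk with a per-chunk pad (toy stand-in for a real cipher)."""
    pad = hashlib.shake_256(key + index.to_bytes(8, "big")).digest(len(chunk))
    return bytes(p ^ k for p, k in zip(chunk, pad))


def one_pass_upload(plaintext_chunks, send):
    """Consume plaintext chunk by chunk; never buffer the whole file."""
    key = os.urandom(16)                 # (a) pick a random encryption key
    verifier = hashlib.sha256()          # running verification data
    for index, chunk in enumerate(plaintext_chunks):
        ciphertext = toy_encrypt_chunk(key, index, chunk)   # (b) encrypt
        verifier.update(ciphertext)      # (b) update verification data
        send(ciphertext)                 # (c) push to storage immediately
    # Only now, after the last chunk, is the storage-index known.
    storage_index = verifier.hexdigest()
    return key, storage_index            # (d) the pieces of a read-cap


if __name__ == "__main__":
    sent = []
    key, storage_index = one_pass_upload([b"hello ", b"world"], sent.append)
    print(len(sent), "chunks sent; storage-index known only at the end:")
    print(storage_index)
```

Because the XOR pad is symmetric, calling toy_encrypt_chunk again with the same key and chunk index recovers the plaintext.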

>  * we wouldn't be able to directly use our permuted-list Tahoe2
>    peer-selection protocol, since we won't know the storage-index (and
>    thus the permuted list) until after we've uploaded all the shares. I
>    think we'd have to go with the "server-selection-index" idea: a much
>    shorter string (since it only needs to provide load-balancing, not
>    collision resistance), either randomly generated or derived from a
>    salted CHK hash (and thus computable before encoding/upload), used to
>    permute the peerlist. This string must be included in the readcap,
>    increasing its length, but we could probably get away with maybe 20
>    bits or so.

Argh!  You are right!  Another few bits needed in the readcap!  Boo  
hoo.  :-(
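
For concreteness, here is a minimal sketch of what a 20-bit server-selection-index could look like when it is randomly generated and then used to permute the peerlist. Sorting servers by SHA-256(ssi + server id) is my assumption for illustration, in the spirit of the permuted-list idea; the names new_ssi and permuted_peerlist are hypothetical and this is not the actual Tahoe2 code.

```python
import hashlib
import secrets

SSI_BITS = 20  # "maybe 20 bits or so": load-balancing, not collision resistance


def new_ssi() -> int:
    """Pick a random server-selection-index before any encoding happens."""
    return secrets.randbits(SSI_BITS)


def permuted_peerlist(ssi: int, server_ids: list[bytes]) -> list[bytes]:
    """Order servers by a hash of (ssi, server id); same ssi, same order."""
    ssi_bytes = ssi.to_bytes(3, "big")  # 20 bits fit in 3 bytes
    return sorted(
        server_ids,
        key=lambda sid: hashlib.sha256(ssi_bytes + sid).digest(),
    )


if __name__ == "__main__":
    servers = [b"server-%d" % i for i in range(5)]
    ssi = new_ssi()
    print(ssi, permuted_peerlist(ssi, servers))
```

The uploader can compute this ordering before it sees a single byte of plaintext, and a reader who has the ssi from the readcap computes the same ordering.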

> So, while I like the one-cryptovalue trick, I'm unsatisfied with both
> the lack of server-side validation and offline readcap-to-verifycap
> attenuation, and the separate SSI value makes me slightly nervous.

Re: server-side validation, what do you think of my proposal in [1]?   
It lets the server fully validate the verify-cap, and readers carry  
around just enough of the verify cap to give themselves a massive  
advantage (a million to one) over DoS'ers.

Re: offline diminishing readcap-to-verifycap, I liked your and David- 
Sarah's comments about storing the verifycap with the readcap  
sometimes.  In general, each kind of cap could have a base part --  
the minimal information which is necessary and sufficient to be a cap  
(assuming full access to servers) -- plus it could have an "extended"  
part -- pieces that you can always get from the servers if you have  
the base part, but you can save round-trips if you have the extended  
part.  For read-caps, the minimal part could be the crypto value, the  
server-selection-index (boo hoo) and a 20-bit prefix of the  
verifycap.  The extended part could be the full verify-cap and the  
k_enc.  Or maybe the extended part could be the full public key and  
the read key!

Then it would be up to the user of the cap to decide whether to use  
the smallest possible cap or to use the extended cap in order to save  
round-trips when dereferencing or diminishing it.
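
As a rough sketch of the base/extended split for read-caps, assuming the field names below (all hypothetical) and a verifycap that is simply a byte string: the base part carries the crypto value, the server-selection-index and a 20-bit verifycap prefix, and the extended part is optional. A 20-bit prefix means a randomly forged verifycap matches with probability 1 in 2**20 = 1,048,576, which is the "million to one" advantage mentioned above.

```python
from dataclasses import dataclass
from typing import Optional

PREFIX_BITS = 20


def prefix_of(verifycap: bytes) -> int:
    """Return the top 20 bits of the verifycap as an integer."""
    return int.from_bytes(verifycap[:3], "big") >> (24 - PREFIX_BITS)


@dataclass(frozen=True)
class ReadCap:
    # Base part: necessary and sufficient, given full access to servers.
    crypto_value: bytes
    ssi: int
    verifycap_prefix: int
    # Extended part: always fetchable from servers, but saves round-trips.
    full_verifycap: Optional[bytes] = None
    k_enc: Optional[bytes] = None

    def base(self) -> "ReadCap":
        """Drop the extended part to get the smallest possible cap."""
        return ReadCap(self.crypto_value, self.ssi, self.verifycap_prefix)

    def plausible(self, candidate_verifycap: bytes) -> bool:
        """Cheap check that a server-supplied verifycap matches our prefix."""
        return prefix_of(candidate_verifycap) == self.verifycap_prefix
```

A holder of the extended cap can diminish it to a verify-cap offline; a holder of only the base cap fetches the verifycap from a server and uses plausible() to reject most bogus answers before doing any real work.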

Re: separate SSI (server-selection-index) value, what makes you  
nervous about it?  Personally, I like the idea of separating the data  
(crypto) layer from the network (server-selection) layer.  Some grids  
might have a server-selection policy that you always query the  
servers in increasing order of network round trip time, regardless of  
which cap you are looking for.  Those grids wouldn't need a server- 
selection-index at all.  Others might accompany each of their caps  
with a description of which servers each share was last seen on.   
That would be in a sense a very large, optional SSI.  (Hm, and it  
would act a bit like a slow, persistent BitTorrent tracker.  :-))

Is the fact that people might eventually use such crazy server- 
selection policies (that we haven't yet vetted) one of the things  
that makes you nervous about separating out the SSI?  :-)

Regards,

Zooko

[1] http://allmydata.org/pipermail/tahoe-dev/2009-September/002829.html

tickets mentioned in this letter:

http://allmydata.org/trac/tahoe/ticket/320 # add streaming (on-line) upload to HTTP interface