#683 new defect

handle arbitrary URIs in directories

Reported by: kpreid Owned by:
Priority: major Milestone: undecided
Component: code-dirnodes Version: 1.3.0
Keywords: newcaps newurls revocation Cc: jeremy@…
Launchpad Bug:

Description

Tahoe has things it calls URIs which identify files. For example: URI:CHK:twpnflhnjeubo2tluuglxrbvdu:oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295

However, they are not URIs in the sense defined by RFC 3986. In particular, URIs have the syntax <scheme>:<scheme-specific-part>, where the possible values for <scheme> are registered with IANA:

http://www.iana.org/assignments/uri-schemes.html

Since Tahoe "URIs" do have the properties a URI should have, I believe the appropriate fix is to register a tahoe: URI scheme. As far as I know, the "URI:" prefix of a Tahoe URI is always the same, so it conveys no information and can be replaced with "tahoe:" at the cost of only two extra characters: tahoe:CHK:twpnflhnjeubo2tluuglxrbvdu:oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295
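Because the "URI:" prefix is constant, the proposed renaming is a purely mechanical, reversible prefix swap. The following is a minimal sketch under that assumption; the function names are hypothetical and not part of Tahoe.

```python
OLD_PREFIX = "URI:"
NEW_PREFIX = "tahoe:"  # proposed scheme; not a registered scheme at the time of this ticket


def to_tahoe_uri(cap: str) -> str:
    """Rewrite a Tahoe cap ("URI:...") into the proposed "tahoe:..." form."""
    if not cap.startswith(OLD_PREFIX):
        raise ValueError("not a Tahoe cap: %r" % cap)
    return NEW_PREFIX + cap[len(OLD_PREFIX):]


def from_tahoe_uri(uri: str) -> str:
    """Inverse of to_tahoe_uri: recover the original "URI:..." cap."""
    if not uri.startswith(NEW_PREFIX):
        raise ValueError("not a tahoe: URI: %r" % uri)
    return OLD_PREFIX + uri[len(NEW_PREFIX):]
```

Since the swap loses no information, older and newer spellings can be converted in either direction without consulting the grid.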

--- The remainder of this text is not a matter of correctness but additional functionality ---

Furthermore, so that these URIs are also URLs (readily usable for contacting the resource with no local context), I would recommend including in the syntax of the scheme-specific part a provision for an OPTIONAL location hint for the grid, i.e. some host that can be contacted by some protocol that can put the client in communication with appropriate storage servers. This is essentially the same provision as in CapTP URIs; borrowing their syntax, it would look like:

tahoe://example.net:1234,192.168.33.91:1234/CHK:twpnflhnjeubo2tluuglxrbvdu:oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295

That is, tahoe://<comma-separated list of hosts>/<current Tahoe-URI components>.
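The proposed hinted form can be taken apart with Python's standard urllib.parse.urlsplit: the authority component carries the comma-separated hints and the path carries the cap. This sketch assumes exactly the syntax shown above; parse_tahoe_url is a hypothetical helper, and it also accepts the hint-free tahoe:CHK:... form.

```python
from urllib.parse import urlsplit


def parse_tahoe_url(url):
    """Split a proposed tahoe: URL into (location hints, cap components).

    Hypothetical syntax from this ticket:
        tahoe://host1:port1,host2:port2/CHK:...
        tahoe:CHK:...            (no hints)
    """
    parts = urlsplit(url)
    if parts.scheme != "tahoe":  # urlsplit lowercases the scheme
        raise ValueError("not a tahoe: URL: %r" % url)
    # The netloc is everything between "//" and the next "/"; hints are optional.
    hints = [h for h in parts.netloc.split(",") if h]
    # With hints the path starts with "/"; without them it is the bare cap.
    cap = parts.path.lstrip("/")
    return hints, cap
```

Note that the hints are only advisory: the cap string alone is what identifies the file.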


Besides correctness in terminology, another advantage of having registered Tahoe URI syntax is that Tahoe files can participate as first-class entities in URI-based systems, and vice-versa.

For example, if Tahoe directories could store arbitrary URIs, of which Tahoe URIs were a special case, then they could include references not just to other things in the same Tahoe grid, but any other URLs-as-capabilities system, including other Tahoe grids or Waterken servers or ...whatever. You could use the data: URI scheme to store sufficiently small files directly in directories. (I vaguely recall that Tahoe might already have that capability.)
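To illustrate the data: idea, a small file can be stored inline as an RFC 2397 data: URI and decoded without any network access. (Tahoe's own LIT caps play a similar role within its cap format.) This is a minimal sketch; the helper names are hypothetical and the parser handles only the base64 and percent-encoded forms.

```python
import base64
from urllib.parse import unquote_to_bytes


def make_data_uri(content: bytes, mediatype: str = "application/octet-stream") -> str:
    """Encode small file contents as a base64 data: URI (RFC 2397)."""
    return "data:%s;base64,%s" % (mediatype, base64.b64encode(content).decode("ascii"))


def read_data_uri(uri: str) -> bytes:
    """Decode the payload of a data: URI; supports base64 and percent-encoding."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data: URI has no comma separator")
    if header.endswith(";base64"):
        return base64.b64decode(payload)
    return unquote_to_bytes(payload)


# A hypothetical directory mixing an inline small file with a Tahoe cap.
directory = {
    "hello.txt": make_data_uri(b"hello", "text/plain"),
    "big.iso": "tahoe:CHK:twpnflhnjeubo2tluuglxrbvdu:oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295",
}
```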

If there is a registered Tahoe scheme, then systems which work exclusively with URLs, but are extensible to handle additional URL schemes, can be extended to support Tahoe directly rather than having to go through a Tahoe web gateway. This lets them expose useful information (e.g. 'this is immutable'), perhaps download more efficiently, etc.

Change History (9)

comment:1 Changed at 2009-05-13T19:04:31Z by zooko

Because of Kevin Reid's post on tahoe-dev: http://allmydata.org/pipermail/tahoe-dev/2009-May/001770.html

I have realized that this ticket contains two issues: 1. making tahoe URIs be real, official URIs so that they fit into the way other code such as web browsers use URIs, and 2. extending Tahoe directories to hold arbitrary URIs and not just tahoe caps.

They are both interesting prospects to me, and certainly related, but we should probably split off a separate ticket, so people can understand them as features that could be separately implemented.

comment:2 Changed at 2009-05-13T19:29:59Z by kpreid

To be a little more explicit: in the message Zooko linked to, I allude to the idea that if Tahoe directories contained general URIs, then you could insert a directory entry which is a revocable membrane to a Tahoe directory. This directory entry is not itself a Tahoe-type URI, because Tahoe, being distributed, cannot support revocation (there is no relied-upon agent in the grid to remember to abort access).

(This wouldn't just require inserting URIs in tahoe directories, though; it would also require that clients are willing to switch between the crypto-and-DHT-based ('offline', in a sense) Tahoe protocols and a talk-to-one-server-which-proxies-for-you ('online') protocol. But storing URIs in directories at least lets clients *have the option of being so fancy*.)
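The revocation point can be seen in the classic caretaker (revocable forwarder) pattern: revocation works only because a live object sits between the holder and the target and can stop forwarding. A bare Tahoe cap has no such intermediary, so whoever holds it can use it indefinitely. This sketch uses hypothetical names and a plain dict as the target; it is not Tahoe code.

```python
class Caretaker:
    """Revocable forwarder: relays reads to a target until revoked.

    The caretaker must stay online for revocation to mean anything;
    that is the 'relied-upon agent' a distributed grid lacks.
    """

    def __init__(self, target):
        self._target = target

    def get(self, name):
        if self._target is None:
            raise PermissionError("access has been revoked")
        return self._target.get(name)

    def _revoke(self):
        # Drop the only reference the holder can reach the target through.
        self._target = None


def make_revocable(target):
    """Return (proxy, revoke); hand out the proxy, keep revoke private."""
    caretaker = Caretaker(target)
    return caretaker, caretaker._revoke
```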

comment:3 Changed at 2009-05-14T22:21:50Z by zooko

  • Summary changed from "So-called URIs aren't" to "handle arbitrary URIs in directories"

Okay, the part about putting Tahoe caps into real URIs is already ticketed: #432 (writing down filecaps: revise URI scheme).

I'm changing this ticket to be about the second part: handling arbitrary URIs inside Tahoe directories (such as using some sort of plugin system?).
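One possible shape for such a plugin system is a registry keyed by URI scheme, where a directory entry is dispatched to whichever handler claims its scheme and unclaimed entries are left alone. This is only a sketch of that idea; the registry, decorator, and handlers are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical registry: lowercase URI scheme -> handler function.
_HANDLERS = {}


def register_scheme(scheme):
    """Decorator registering a handler for one URI scheme."""
    def decorator(func):
        _HANDLERS[scheme.lower()] = func
        return func
    return decorator


def resolve_entry(uri):
    """Dispatch a directory entry to the plugin for its scheme.

    Returns None when no plugin claims the scheme, so the caller can keep
    the entry as an opaque string instead of failing.
    """
    handler = _HANDLERS.get(urlsplit(uri).scheme)  # urlsplit lowercases the scheme
    return handler(uri) if handler else None


@register_scheme("tahoe")
def open_tahoe(uri):
    # A real plugin would contact storage servers; this one only labels the entry.
    return ("tahoe", uri.split(":", 1)[1])


@register_scheme("https")
def open_https(uri):
    return ("web", uri)
```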

comment:4 Changed at 2009-07-03T01:00:04Z by warner

  • Component changed from unknown to code-dirnodes
  • Owner nobody deleted

comment:5 Changed at 2009-07-03T01:25:38Z by warner

ef1b6ae8e312af21 changes the way dirnodes are processed to tolerate unrecognized URIs. This should make tahoe-1.5 able to survive new formats that come from the future (i.e. if a 1.5 client tries to read or modify a directory which has new-format entries which were placed there by some 1.6-or-beyond version). It's at least a start.
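The general technique for tolerating unrecognized entries is to wrap them opaquely and write them back unchanged whenever the directory is modified, so a newer client's entries survive an edit by an older one. The sketch below illustrates that idea only. It does not reproduce the changeset: the names are hypothetical, the prefix list is an illustrative subset, and a plain dict stands in for the real dirnode encoding.

```python
# Illustrative subset of cap prefixes this hypothetical client understands.
KNOWN_PREFIXES = ("URI:CHK:", "URI:LIT:", "URI:DIR2:")


class UnknownEntry:
    """Opaque wrapper for a cap in a format this client does not recognize."""

    def __init__(self, raw):
        self.raw = raw


def parse_entry(raw):
    # A real client would build a file or directory node here; the sketch
    # keeps recognized caps as plain strings.
    if raw.startswith(KNOWN_PREFIXES):
        return raw
    return UnknownEntry(raw)


def serialize_entry(node):
    # Unknown entries are written back verbatim so newer clients can still use them.
    return node.raw if isinstance(node, UnknownEntry) else node


def rename_child(directory, old, new):
    """Rename one child, preserving entries this client cannot interpret."""
    nodes = {name: parse_entry(raw) for name, raw in directory.items()}
    nodes[new] = nodes.pop(old)
    return {name: serialize_entry(node) for name, node in nodes.items()}
```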

comment:6 Changed at 2009-10-28T03:33:50Z by davidsarah

  • Keywords newcaps added

Tagging issues relevant to new cap protocol design.

comment:7 Changed at 2010-01-17T19:42:44Z by davidsarah

  • Keywords newurls added

comment:8 Changed at 2010-03-06T01:11:45Z by jsgf

  • Cc jeremy@… added

Tahoe "URI"s are specific to a particular grid; without that piece of information you have no particular way of knowing how to access the referenced object. Including the host/IP information as a hint in a tahoe: URI is useful, but such hints are only hints; they can become invalid without the underlying object becoming invalid.

I think, therefore, a tahoe: URI must include some kind of unambiguous grid identifier so that it globally and uniquely identifies a particular object. Some kind of connection hint may also be useful, but that seems like a layering violation (since IPv4, or IP in general, is not the only possible transport for Tahoe).
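The distinction between identity and hints can be made concrete: two references naming the same grid and the same cap denote the same object even if their connection hints differ. This sketch assumes a hypothetical TahoeRef type and free-form grid-ID strings, and does not propose any particular URI syntax for the grid identifier.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass(frozen=True)
class TahoeRef:
    """Hypothetical reference: (grid_id, cap) is the identity; hints are advisory."""

    grid_id: str
    cap: str
    # compare=False excludes hints from __eq__ and from the generated __hash__,
    # so stale or differing hints never change which object is referenced.
    hints: Tuple[str, ...] = field(default=(), compare=False)
```

Under this model, a stale hint affects only how the object is reached, never what it is.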

I guess this is related to issue #403.

comment:9 Changed at 2012-09-10T19:54:26Z by zooko

  • Keywords revocation added