[tahoe-dev] suggested changes to the web api

Tue Aug 14 14:52:23 PDT 2007

Folks, especially Brian:

Here is a big rewrite of webapi, including some changes to the API  
itself.  Brian has already persuaded me that we should go ahead and  
release allmydata.org tahoe v0.5 (hopefully tomorrow) without making  
all of these changes.

However, please examine this proposal.  We might need to make some of  
the changes right away.

Changes that I made include (but might not be limited to):

  * The big change: unify discussion of name-based and uri-based  
commands, and expect the reader to understand both at once, and  
refactor the document to have six use-case-oriented sections.

  * Include a read-write uri (if available) in the metadata instead  
of by a separate API call.

  * Signal mutability by the presence of a RW URI instead of by a  
separate "mutable" bool.

  * Remove mention of other metadata like Content-Type.

  * s/dirnode/directory/

  * lots of editing...

  * Add "?overwrite=" param for PUT.

  * Remove the explanation about the need to escape slashes (ticket  
#102).  (This change to the doc might need to be undone, depending on  
the disposition of #102.)

Regards,

Zooko

tickets mentioned in this message:

http://allmydata.org/trac/tahoe/ticket/102

------- begin appended webapi.txt

This document has six sections:

1.  the basic API for how to programmatically control your tahoe node
2.  convenience methods
3.  safety and security issues
4.  features for controlling your tahoe node from a standard web browser
5.  debugging and testing features
6.  XML-RPC (coming soon)

1. the basic API for how to programmatically control your tahoe node

a. connecting to the tahoe node

Writing "8011" into $NODEDIR/webport causes the node to run a webserver on
port 8011. Writing "tcp:8011:interface=127.0.0.1" into $NODEDIR/webport does
the same but binds to the loopback interface, ensuring that only the programs
on the local host can connect. Using
"ssl:8011:privateKey=mykey.pem:certKey=cert.pem" would run an SSL server. See
twisted.application.strports for more details.
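
As a rough illustration of the paragraph above, the following Python sketch
writes a loopback-only webport file. The helper name write_webport and its
arguments are made up for this example; only the file name "webport" and the
two string formats come from the text above. Whether the node tolerates a
trailing newline is not specified here, so the sketch writes none.

```python
import os


def write_webport(nodedir, port, loopback_only=True):
    """Write a strports description into NODEDIR/webport and return it.

    Hypothetical helper for illustration; not part of tahoe itself.
    """
    if loopback_only:
        # Bind to the loopback interface so only local programs can connect.
        spec = "tcp:%d:interface=127.0.0.1" % port
    else:
        # A bare port number listens on all interfaces.
        spec = str(port)
    with open(os.path.join(nodedir, "webport"), "w") as f:
        f.write(spec)
    return spec
```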

If $NODEDIR/webpassword exists, it will be used (somehow) to require HTTP
Digest Authentication for all webserver connections.  XXX specify how

b. file names

The node provides some small number of "virtual drives". In the 0.5
release, this number is two: the first is the global shared vdrive, the
second is the private non-shared vdrive. We will call these "global" and
"private".

For the purpose of this document, let us assume that the vdrives currently
contain the following directories and files:

global/
global/Documents/
global/Documents/notes.txt

private/
private/Pictures/
private/Pictures/tractors.jpg
private/Pictures/family/
private/Pictures/family/bobby.jpg

Within the webserver, there is a tree of resources. The top-level "vdrive"
resource gives access to files and directories in all of the user's virtual
drives. For example, the URL that corresponds to notes.txt would be:

http://localhost:8011/vdrive/global/Documents/notes.txt

and the URL for tractors.jpg would be:

http://localhost:8011/vdrive/private/Pictures/tractors.jpg

In addition, each directory has a corresponding URL. The Pictures URL is:

http://localhost:8011/vdrive/private/Pictures

c. URIs

A separate top-level namespace ("uri/" instead of "vdrive/") is used to
access files and directories directly by URI, rather than by going through
the vdrive.

For example, this identifies a file or directory:

http://localhost:8011/uri/$URI

And this identifies a file or directory named "tractors.jpg" in a
subdirectory "Pictures" of the identified directory:

http://localhost:8011/uri/$URI/Pictures/tractors.jpg

In the following examples, "$URL" is a shorthand for a URL like the ones
above, either with "vdrive/" as the top level and a sequence of
slash-separated pathnames following, or with "uri/" as the top level,
followed by a URI, optionally followed by a sequence of slash-separated
pathnames.
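
The two URL shapes described above could be assembled along these lines. This
is only a sketch: the function names are invented for the example, the base
address is the one used throughout this document, and percent-encoding each
path segment is an assumption (this document does not specify how special
characters in names or URIs are to be escaped).

```python
from urllib.parse import quote

BASE = "http://localhost:8011"  # node address used in the examples above


def vdrive_url(vdrive, *path):
    """URL for a name-based path inside the "global" or "private" vdrive."""
    segments = ["vdrive", vdrive, *path]
    # Assumption: each segment is percent-encoded on its own.
    return BASE + "/" + "/".join(quote(s, safe="") for s in segments)


def uri_url(uri, *path):
    """URL for an object named by URI, optionally followed by child names."""
    segments = ["uri", uri, *path]
    return BASE + "/" + "/".join(quote(s, safe="") for s in segments)
```

For instance, vdrive_url("private", "Pictures", "tractors.jpg") yields the
tractors.jpg URL shown in section 1.b.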

Now, what can we do with these URLs? By varying the HTTP method
(GET/PUT/POST/DELETE) and by appending a type-indicating query argument, we
control what we want to do with the data and how it should be presented.

d. examining files or directories

   GET $URL?t=json

   This returns machine-parseable information about the indicated file or
   directory in the HTTP response body. This information contains a flag that
   indicates whether the thing is a file or a directory.

   If it is a file, then the information includes file size and URI, like
   this:

    [ 'filenode', { 'ro_uri': file_uri,
                    'size': bytes } ]

   If it is a directory, then it includes information about the children of
   this directory, as a mapping from child name to a set of metadata about the
   child (the same data that would appear in a corresponding GET?t=json of the
   child itself). Like this:

    [ 'dirnode', { 'rw_uri': read_write_uri,
                   'ro_uri': read_only_uri,
                   'children': children } ]

   In the above example, 'children' is a dictionary in which the keys are
   child names and the values depend upon whether the child is a file or a
   directory:

    'foo.txt': [ 'filenode', { 'ro_uri': uri, 'size': bytes } ]
    'subdir':  [ 'dirnode', { 'rw_uri': rwuri, 'ro_uri': rouri } ]

   Note that the value is the same as the JSON representation of the child
   object (except that directories do not recurse -- the "children" entry of
   the child is omitted).

   The rw_uri field will be present in the information about a directory
   if and only if you have read-write access to that directory.
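
   A client might interpret such a response roughly as follows. This is a
   sketch, not the node's own code: the function names are invented, and it
   assumes the response body is standard JSON (double-quoted strings) with
   the two-element [type, info] shape shown above.

```python
import json


def describe_node(body):
    """Parse a t=json response body into a (kind, info) pair."""
    kind, info = json.loads(body)
    if kind not in ("filenode", "dirnode"):
        raise ValueError("unexpected node type: %r" % (kind,))
    return kind, info


def is_writable_directory(body):
    """True if the body describes a directory that carries an rw_uri."""
    kind, info = describe_node(body)
    return kind == "dirnode" and "rw_uri" in info
```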

e. downloading a file

   GET $URL

   If the indicated object is a file, then this simply retrieves the contents
   of the file. The file's contents are provided in the body of the HTTP
   response.

   If the indicated object is a directory, then this returns an HTML page,
   intended to be used by humans, which contains HREF links to all files and
   directories reachable from this directory. These HREF links do not have a
   t= argument, meaning that a human who follows them will get pages also
   meant for a human. It also contains forms to upload new files, and to
   delete files and directories. These forms use POST methods to do their job.

   You can add the "save=true" argument, which adds a 'Content-Disposition:
   attachment' header to prompt most web browsers to save the file to disk
   rather than attempting to display it.

   A filename (from which a MIME type can be derived) can be specified using a
   'filename=' query argument. This is especially useful if the $URL does not
   end with the name of the file (because it instead ends with the identifier
   of the file). This filename is also the one used if the 'save=true'
   argument is set. For example:

    GET http://localhost:8011/uri/$TRACTORS_URI?filename=tractors.jpg
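
   The two optional download arguments could be appended to a $URL like
   this. The helper name download_url is hypothetical; only the "filename"
   and "save" argument names come from the text above.

```python
from urllib.parse import urlencode


def download_url(url, filename=None, save=False):
    """Append the optional filename= and save=true arguments to a $URL."""
    args = []
    if filename is not None:
        args.append(("filename", filename))
    if save:
        args.append(("save", "true"))
    # Leave the URL untouched when no optional argument was requested.
    return url + ("?" + urlencode(args) if args else "")
```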

f. uploading a file

   PUT http://localhost:8011/uri

   Upload a file, returning its URI as the HTTP response body. This does not
   make the file visible from the virtual drive -- to do that, see section
   1.h. below, or the convenience method in section 2.a.
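
   From Python, an unlinked upload might look like the sketch below. The
   function names are invented for this example, and treating the response
   body as ASCII text with optional surrounding whitespace is an assumption;
   the text above only says that the body carries the URI.

```python
import urllib.request


def build_upload_request(contents, base="http://localhost:8011"):
    """Build (but do not send) the PUT request for an unlinked upload."""
    return urllib.request.Request(base + "/uri", data=contents, method="PUT")


def upload(contents, base="http://localhost:8011"):
    """Send the upload to a running node and return the new file's URI."""
    with urllib.request.urlopen(build_upload_request(contents, base)) as resp:
        # Assumption: the body is the URI as ASCII text.
        return resp.read().decode("ascii").strip()
```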

g. creating a new directory

   PUT http://localhost:8011/uri?t=mkdir

   Create a new empty directory and return its URI as the HTTP response body.
   This does not make the newly created directory visible from the virtual
   drive, but you can use section 1.h. to attach it, or the convenience method
   in section 2.XXX.

h. attaching a file or directory as the child of an extant directory

   PUT $URL?t=uri

   This attaches a child (either a file or a directory) to the given
   directory. $URL is required to indicate a directory as the second-to-last
   element and the desired filename as the last element, for example:

    PUT http://localhost:8011/uri/$URI_OF_SOME_DIR/Pictures/tractors.jpg
    PUT http://localhost:8011/uri/$URI_OF_SOME_DIR/tractors.jpg
    PUT http://localhost:8011/vdrive/private/Pictures/tractors.jpg

   The URI of the child is provided in the body of the HTTP request.

   There is an optional "?overwrite=" param whose value can be "true", "t",
   "1", "false", "f", or "0" (case-insensitive), and which defaults to "true".
   If the indicated directory already contains the given child name, then if
   overwrite is true then the value of that name is changed to be the new URI.
   If overwrite is false then an error is returned. XXX specify the error

   This can be used to attach a shared directory (a directory that other
   people can read or write) to the vdrive. Intermediate directories, if any,
   are created on-demand.
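
   A sketch of the attach operation and of the overwrite values listed
   above. Both function names are hypothetical, parse_overwrite only mirrors
   the accepted spellings and is not the node's actual parser, and sending
   the child URI as ASCII bytes is an assumption.

```python
import urllib.request


def parse_overwrite(value):
    """Interpret an overwrite= value using the spellings listed above."""
    v = value.lower()
    if v in ("true", "t", "1"):
        return True
    if v in ("false", "f", "0"):
        return False
    raise ValueError("bad overwrite value: %r" % (value,))


def build_attach_request(url, child_uri, overwrite=True):
    """Build (but do not send) PUT $URL?t=uri with the child URI as body."""
    flag = "true" if overwrite else "false"
    full = "%s?t=uri&overwrite=%s" % (url, flag)
    return urllib.request.Request(
        full, data=child_uri.encode("ascii"), method="PUT")
```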

i. removing a name from a directory

   DELETE $URL

   This removes the given name from the given directory. $URL is required to
   indicate a directory as the second-to-last element and the name to remove
   from that directory as the last element, just as in section 1.h.

   Note that this does not actually delete the resource that the name points
   to from the tahoe grid -- it only removes this name in this directory. If
   there are other names in this directory or in other directories that point
   to the resource, then it will remain accessible through those paths. Even
   if all names pointing to this resource are removed from their parent
   directories, then if someone is in possession of the URI of this resource
   they can continue to access the resource through the URI. Only if a person
   is not in possession of the URI, and they do not have access to any
   directories which contain names pointing to this resource, are they
   prevented from accessing the resource.

2. convenience methods

a. uploading a file and attaching it to the vdrive

   PUT $URL

   Upload a file and link it into the vdrive at the location specified by
   $URL. The last item in the $URL must be a filename, and the second-to-last
   item must identify a directory.

   It will create intermediate directories as necessary. The file's contents
   are taken from the body of the HTTP request. For convenience, the HTTP
   response contains the URI that results from uploading the file, although
   the client is not obligated to do anything with the URI. According to the
   HTTP/1.1 specification (rfc2616), this should return a 200 (OK) code when
   modifying an existing file, and a 201 (Created) code when creating a new
   file.

   To use this, run:

    curl -T localfile http://localhost:8011/vdrive/global/newfile

3. safety and security issues -- names vs. URIs

The vdrive provides a mutable filesystem, but the ways that the filesystem
can change are limited. The only thing that can change is that the mapping
from child names to child objects that each directory contains can be changed
by adding a new child name pointing to an object, removing an existing child
name, or changing an existing child name to point to a different object.

Obviously if you query tahoe for information about the filesystem and then
act upon the filesystem (such as by getting a listing of the contents of a
directory and then adding a file to the directory), then the filesystem might
have been changed after you queried it and before you acted upon it.
However, if you use the URI instead of the pathname of an object when you act
upon the object, then the only change that can happen is that, if the object
is a directory, the set of child names it has might be different. If, on the
other hand, you act upon the object using its pathname, then a different
object might be in that place, which can result in more kinds of surprises.

For example, suppose you are writing code which recursively downloads the
contents of a directory. The first thing your code does is fetch the listing
of the contents of the directory. For each child that it fetched, if that
child is a file then it downloads the file, and if that child is a directory
then it recurses into that directory. Now, if the download and the recurse
actions are performed using the child's name, then the results might be
wrong, because for example a child name that pointed to a sub-directory when
you listed the directory might have been changed to point to a file (in which
case your attempt to recurse into it would result in an error and the file
would be skipped), or a child name that pointed to a file when you listed the
directory might now point to a sub-directory (in which case your attempt to
download the child would result in a file containing HTML text describing the
sub-directory!).

If your recursive algorithm uses the URI of the child instead of the name of
the child, then those kinds of mistakes just can't happen. Note that both the
child's name and the child's URI are included in the results of listing the
parent directory, so it isn't harder to use the URI for this purpose.

In general, use names if you want "whatever object (whether file or
directory) is found by following this name (or sequence of names) when my
request reaches the server". Use URIs if you want "this particular object".
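
The URI-based recursion recommended in this section could be sketched as
follows. The function walk_by_uri and its two callbacks are invented for
this example: fetch_json(uri) is assumed to return the body of
GET /uri/$URI?t=json and fetch_file(uri) the body of GET /uri/$URI, so the
sketch itself does no networking. It also assumes the [type, info] JSON
shape from section 1.d and reads each child through its ro_uri.

```python
import json


def walk_by_uri(dir_uri, fetch_json, fetch_file, prefix=""):
    """Yield (path, contents) for every file below the directory dir_uri.

    Children are fetched by the URI recorded in the parent's listing, never
    by name, so a child cannot turn from a file into a directory (or the
    reverse) between the listing and the fetch.
    """
    kind, info = json.loads(fetch_json(dir_uri))
    if kind != "dirnode":
        raise ValueError("not a directory: %r" % (dir_uri,))
    for name, (child_kind, child_info) in sorted(info["children"].items()):
        child_uri = child_info["ro_uri"]
        if child_kind == "filenode":
            yield prefix + name, fetch_file(child_uri)
        else:
            yield from walk_by_uri(
                child_uri, fetch_json, fetch_file, prefix + name + "/")
```

The set of children of a sub-directory can still change between listing the
parent and recursing, which is the one remaining kind of change described
above.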

4. features for controlling your tahoe node from a standard web browser

a. uri redirect

   GET http://localhost:8011/uri?uri=$URI

   This causes a redirect to /uri/$URI, and retains any additional query
   arguments (like filename= or save=). This is for the convenience of web
   forms which allow the user to paste in a URI (obtained through some
   out-of-band channel, like IM or email).

   Note that this form merely redirects to the specific file or directory
   indicated by the URI: unlike the GET /uri/$URI form, you cannot traverse to
   children by appending additional path segments to the URL.

b. web page offering rename

   GET $URL?t=rename-form&name=$CHILDNAME

   This provides a useful facility to browser-based user interfaces. It
   returns a page containing a form targeting the "POST $URL t=rename"
   functionality described below, with the provided $CHILDNAME present in the
   'from_name' field of that form. I.e. this presents a form offering to
   rename $CHILDNAME, requesting the new name, and submitting POST rename.

c. POST forms

   POST $URL
   t=upload
   name=childname  (optional)
   file=newfile

   This instructs the node to upload a file into the given directory. We need
   this because forms are the only way for a web browser to upload a file
   (browsers do not know how to do PUT or DELETE). The file's contents and the
   new child name will be included in the form's arguments. This can only be
   used to upload a single file at a time. To avoid confusion, name= is not
   allowed to contain a slash (a 400 Bad Request error will result).

   POST $URL
   t=mkdir
   name=childname

   This instructs the node to create a new empty directory. The name of the
   new child directory will be included in the form's arguments.

   POST $URL
   t=uri
   name=childname
   uri=newuri

   This instructs the node to attach a child that is referenced by URI (just
   like the PUT $URL?t=uri method). The name and URI of the new child
   will be included in the form's arguments.

   POST $URL
   t=delete
   name=childname

   This instructs the node to delete a file from the given directory. The name
   of the child to be deleted will be included in the form's arguments.

   POST $URL
   t=rename
   from_name=oldchildname
   to_name=newchildname

   This instructs the node to rename a child within the given directory. The
   child specified by 'from_name' is removed, and reattached as a child named
   for 'to_name'. This is unconditional and will replace any child already
   present under 'to_name', akin to 'mv -f' in unix parlance.
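
   Although these forms exist for browsers, a script can submit the same
   fields. The sketch below builds the rename form; the function name is
   invented, and sending the fields as application/x-www-form-urlencoded is
   an assumption, since this document does not say which form encodings the
   node accepts (a file upload via t=upload would in any case need a
   multipart body, which is not shown).

```python
import urllib.request
from urllib.parse import urlencode


def build_rename_request(dir_url, from_name, to_name):
    """Build (but do not send) the POST form that renames a child."""
    fields = {"t": "rename", "from_name": from_name, "to_name": to_name}
    body = urlencode(fields).encode("ascii")
    return urllib.request.Request(
        dir_url, data=body, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"})
```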

5. debugging and testing features

GET $URL?t=download&localfile=$LOCALPATH
GET $URL?t=download&localdir=$LOCALPATH

   The localfile= form instructs the node to download the given file and write
   it into the local filesystem at $LOCALPATH. The localdir= form instructs
   the node to recursively download everything from the given directory and
   below into the local filesystem. To avoid surprises, the localfile= form
   will signal an error if $URL actually refers to a directory, likewise if
   localdir= is used with a $URL that refers to a file.

   This request will only be accepted from an HTTP client connection
   originating at 127.0.0.1. This request is most useful when the client node
   and the HTTP client are operated by the same user. $LOCALPATH should be an
   absolute pathname.

   This form is only implemented for testing purposes, because of a trivially
   easy attack: any web server that the local browser visits could serve an
   IMG tag that causes the local node to modify the local filesystem.
   Therefore this form is only enabled if you create a file named
   'webport_allow_localfile' in the node's base directory.

PUT $NEWURL?t=upload&localfile=$LOCALPATH
PUT $NEWURL?t=upload&localdir=$LOCALPATH

   This uploads a file or directory from the node's local filesystem to the
   vdrive. As with "GET $URL?t=download&localfile=$LOCALPATH", this request
   will only be accepted from an HTTP connection originating from 127.0.0.1.

   The localfile= form expects that $LOCALPATH will point to a file on the
   node's local filesystem, and causes the node to upload that one file into
   the vdrive at the given location. Any parent directories will be created in
   the vdrive as necessary.

   The localdir= form expects that $LOCALPATH will point to a directory on the
   node's local filesystem, and it causes the node to perform a recursive
   upload of the directory into the vdrive at the given location, creating
   parent directories as necessary. When the operation is complete, the
   directory referenced by $NEWURL will contain all of the files and
   directories that were present in $LOCALPATH, so this is equivalent to the
   unix commands:

    mkdir -p $NEWURL; cp -r $LOCALPATH/* $NEWURL/

   Note that the "curl" utility can be used to provoke this sort of recursive
   upload, since the -T option will make it use an HTTP 'PUT':

    curl -T /dev/null 'http://localhost:8011/vdrive/global/newdir?t=upload&localdir=/home/user/directory-to-upload'

   This form is only implemented for testing purposes, because any attacker's
   web server that a local browser visits could serve an IMG tag that causes
   the local node to modify the local filesystem. Therefore this form is only
   enabled if you create a file named 'webport_allow_localfile' in the node's
   base directory.

GET $URL?t=manifest

   Return an HTML-formatted manifest of the given directory, for debugging.

6. XMLRPC (coming soon)

   http://localhost:8011/xmlrpc

   This resource provides an XMLRPC server on which all of the previous
   operations can be expressed as function calls taking a "pathname" argument.
   This is provided for applications that want to think of everything in terms
   of XMLRPC.

    listdir(vdrivename, path) -> dict of (childname -> (stuff))
    put(vdrivename, path, contents) -> URI
    get(vdrivename, path) -> contents
    mkdir(vdrivename, path) -> URI
    put_localfile(vdrivename, path, localfilename) -> URI
    get_localfile(vdrivename, path, localfilename)
    put_localdir(vdrivename, path, localdirname)   # recursive
    get_localdir(vdrivename, path, localdirname)   # recursive
    put_uri(vdrivename, path, URI)

    etc..
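
   Since this interface is still only planned, nothing here can be run
   against a node yet. Purely as an illustration of what a call matching the
   listdir signature above would look like on the wire, the sketch below
   marshals one with Python's standard xmlrpc.client module. The helper name
   marshal_listdir is invented, and the argument values are examples.

```python
import xmlrpc.client


def marshal_listdir(vdrivename, path):
    """Return the XML-RPC request document for listdir(vdrivename, path)."""
    return xmlrpc.client.dumps((vdrivename, path), methodname="listdir")


# Once the server side exists, the equivalent live call would presumably be:
#   proxy = xmlrpc.client.ServerProxy("http://localhost:8011/xmlrpc")
#   children = proxy.listdir("global", "Documents")
```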