#641 new defect

tahoe backup should be able to backup symlinks

Reported by: francois Owned by:
Priority: normal Milestone: undecided
Component: code-frontend-cli Version: 1.3.0
Keywords: tahoe-backup symlink reliability news-done Cc: alberto@…, jg71@…
Launchpad Bug:

Description (last modified by zooko)

Running tahoe backup on a directory containing a symbolic link currently doesn't work. It raises the following exception instead.

Traceback (most recent call last):
  File "/home/francois/dev/tahoe/support/bin/tahoe", line 8, in <module>
    load_entry_point('allmydata-tahoe==1.2.0-r3615', 'console_scripts', 'tahoe')()
  File "/home/francois/dev/tahoe/src/allmydata/scripts/runner.py", line 91, in run
    rc = runner(sys.argv[1:])
  File "/home/francois/dev/tahoe/src/allmydata/scripts/runner.py", line 78, in runner
    rc = cli.dispatch[command](so)
  File "/home/francois/dev/tahoe/src/allmydata/scripts/cli.py", line 359, in backup
    rc = tahoe_backup.backup(options)
  File "/home/francois/dev/tahoe/src/allmydata/scripts/tahoe_backup.py", line 353, in backup
    return bu.run()
  File "/home/francois/dev/tahoe/src/allmydata/scripts/tahoe_backup.py", line 198, in run
    new_backup_dircap = self.process(options.from_dir, latest_backup_dircap)
  File "/home/francois/dev/tahoe/src/allmydata/scripts/tahoe_backup.py", line 245, in process
    newchilddircap = self.process(childpath, oldchildcap)
  File "/home/francois/dev/tahoe/src/allmydata/scripts/tahoe_backup.py", line 245, in process
    newchilddircap = self.process(childpath, oldchildcap)
  File "/home/francois/dev/tahoe/src/allmydata/scripts/tahoe_backup.py", line 245, in process
    newchilddircap = self.process(childpath, oldchildcap)
  File "/home/francois/dev/tahoe/src/allmydata/scripts/tahoe_backup.py", line 251, in process
    raise RuntimeError("how do I back this up?" % childpath)
RuntimeError: how do I back this up?

Attachments (4)

bug-641.dpatch (18.7 KB) - added by francois at 2009-02-24T18:35:20Z.
small_symlink_test.patch (2.4 KB) - added by azazel at 2009-02-25T01:09:49Z.
half-fix-for-bug-_641_.dpatch (23.4 KB) - added by azazel at 2009-02-25T01:10:03Z.
641-symlink-depth-limit-1.darcs.patch (65.9 KB) - added by socrates1024 at 2011-11-30T21:12:04Z.


Change History (25)

comment:1 Changed at 2009-02-24T01:28:53Z by francois

Well, it's perhaps easier to discard them for now and simply display a warning message.

Changed at 2009-02-24T18:35:20Z by francois

comment:2 Changed at 2009-02-24T18:36:12Z by francois

Here's a patch which makes tahoe backup ignore symlinks.

comment:3 Changed at 2009-02-25T01:09:29Z by azazel

  • Cc alberto@… added

I've made a patch which, unlike yours, skips everything that isn't a file or a directory. This also works for files that are Unix sockets, devices, and so on. Please note that non-dangling links (those with real targets) get backed up with or without your patch; only dangling links are dangerous. I've attached to this ticket a patch file, 'small_symlink_test.patch', really a hack that alters your code to do a much simpler test without using any other function call or temp dir. If run under Linux, it demonstrates that links with real targets work, and that maybe your test code fails somewhere in being a really useful test? Now I'm too tired and I'll look at it in more detail tomorrow; maybe I'll end up with a franken-patch that glues together the best of the two. Have a look at my patches.
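For readers following along, here is a minimal sketch of the "skip everything that isn't a file or a directory" idea. It is not taken from either attached patch; the function name `partition_children` is hypothetical. It relies on `os.path.isfile` and `os.path.isdir` following symlinks, so links with real targets are kept while dangling links, sockets, devices and FIFOs are skipped.

```python
import os

def partition_children(dirpath):
    """Split the entries of dirpath into (keep, skipped).

    Hypothetical helper: an entry is kept if it resolves to a regular
    file or a directory. Because isfile/isdir follow symlinks, a symlink
    with a real target is kept; a dangling symlink or a special file
    (socket, device, FIFO) is skipped.
    """
    keep, skipped = [], []
    for name in sorted(os.listdir(dirpath)):
        child = os.path.join(dirpath, name)
        if os.path.isdir(child) or os.path.isfile(child):
            keep.append(child)
        else:
            skipped.append(child)
    return keep, skipped
```

A caller could print a warning for each path in `skipped` and carry on with the entries in `keep`.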

Changed at 2009-02-25T01:09:49Z by azazel

Changed at 2009-02-25T01:10:03Z by azazel

comment:4 Changed at 2009-02-25T02:20:36Z by swillden

While you're at it, you might want to consider also skipping directories which are on other devices. I think it's generally a bad idea to recurse into a network share unless it's been specifically requested. To do that, just look at the st_dev field from lstat. If it doesn't match the st_dev of the parent directory, skip it.

This one is somewhat debatable. For me, I'd rather have it skip network shares because my file server has terabytes of stuff on it and if the backup process goes in there it will never get to the rest of the stuff I want it to back up. Perhaps others have a different perspective.
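A rough sketch of the st_dev comparison described above, assuming a POSIX system. The function name `children_on_same_device` is hypothetical and does not exist in tahoe.

```python
import os

def children_on_same_device(parent_dir):
    """Yield children of parent_dir that live on the same device as it.

    Hypothetical helper: compares the st_dev field from lstat() of each
    child with that of the parent directory, so mount points (for
    example network shares) are left out.
    """
    parent_dev = os.lstat(parent_dir).st_dev
    for name in sorted(os.listdir(parent_dir)):
        child = os.path.join(parent_dir, name)
        if os.lstat(child).st_dev == parent_dev:
            yield child
```

Since lstat does not follow symlinks, a symlink pointing onto another device still reports the device of the link itself; whether that is the desired behaviour depends on how symlinks end up being handled.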

comment:5 Changed at 2009-04-09T00:09:50Z by zooko

What's the status of this patch? I've been running it in one of my local sandboxes for weeks now, and I just now obliterated those patches in order to test something closer to current trunk. It looks like none of the patches in this ticket has good unit tests yet.

comment:6 Changed at 2009-05-24T21:20:32Z by francois

What about mimicking rsync's behavior? It's probably much more intuitive for users to have a consistent default behavior while allowing special cases through additional CLI arguments.

By default, if no special argument is given, follow symlinks, cross filesystem boundaries, and don't save any special files (FIFOs, devices and sockets). In case of a dangling symlink, display a warning and continue.

Implement new CLI arguments to change this behavior:

 -x, --one-file-system   don’t cross filesystem boundaries
 --devices               preserve device files
 --specials              preserve special files
 -l, --links             copy symlinks as symlinks

Note that implementing the last three options requires a way to store the file type and associated parameters in metadata.
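To make the proposal concrete, here is a sketch of how the four flags could be declared. It uses Python's argparse only to stay self-contained; the tahoe CLI defines its options differently, and `build_parser` and the `backup-sketch` program name are hypothetical.

```python
import argparse

def build_parser():
    """Build a stand-alone parser with the rsync-style flags proposed above.

    Illustration only: this is not how the tahoe CLI declares options.
    """
    parser = argparse.ArgumentParser(prog="backup-sketch")
    parser.add_argument("-x", "--one-file-system", action="store_true",
                        help="don't cross filesystem boundaries")
    parser.add_argument("--devices", action="store_true",
                        help="preserve device files")
    parser.add_argument("--specials", action="store_true",
                        help="preserve special files")
    parser.add_argument("-l", "--links", action="store_true",
                        help="copy symlinks as symlinks")
    parser.add_argument("from_dir")
    parser.add_argument("to_dir")
    return parser

# With no flags given, every option is False, which corresponds to the
# proposed default: follow symlinks, cross filesystems, skip special files.
options = build_parser().parse_args(["-x", "src", "dst"])
```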

comment:7 Changed at 2009-12-06T14:13:55Z by francois

  • Keywords tahoe-backup added

comment:8 Changed at 2009-12-20T20:54:32Z by warner

I've started using 'tahoe backup' for serious personal use, so I'm starting to run into these sorts of problems. My first workaround was to hack my "tahoe backup" client to skip over symlinks.

I like the idea of matching rsync's options, except that we don't have a way to record non-files yet, so we can't actually implement --devices, --specials, or --links. Our current default behavior is to follow directory symlinks, but abort when we encounter a file symlink.

If our cap-string scheme were general enough, I'd say we should create a cap type that says "here is a filecap, treat its contents as the target of a symlink" (just like our dircaps say "here is a filecap, treat its contents as an encoded directory table"). But that's a deeper change.. still appropriate for this ticket, which after all says "tahoe backup should be able to backup symlinks", but represents more work than I want to do right now.

Right now, I just want to be able to use "tahoe backup" even though my home directory has a couple of symlinks in it. I'd be happy with an option to skip symlinks altogether (whether they point to files or directories), or to skip file-symlinks. And I'd be happy if we always skipped the special things like devices and sockets.. I don't have any of those in my home directory.. they're only in /tmp/ and /dev/ and places that I'm not yet trying to back up.

comment:9 Changed at 2009-12-21T00:30:04Z by davidsarah

  • Keywords symlink reliability added

#729 is an instance of the same problem.

comment:10 Changed at 2010-01-27T22:14:24Z by warner

For now (i.e. for 1.6.0), I'm going to have "tahoe backup" skip all symlinks, emitting the same "WARNING: cannot backup special file %s" message that you get with device files and named pipes.

comment:11 Changed at 2010-02-02T06:04:39Z by davidsarah

  • Keywords news-done added

comment:12 Changed at 2011-03-21T16:26:08Z by davidsarah

From the duplicate #1380 filed by gdt:

When running backup on a directory (which is in coda, which probably doesn't matter), I get

WARNING: cannot backup symlink '/blah/blah'

I consider symlinks important. I realize tahoe doesn't have them (maybe it should) but this points out that tahoe backup is not a satisfactory general solution. (I would argue that the backup program and the filesystem used to store the files comprising the backup database should be independent anyway.)

Changed at 2011-11-30T21:12:04Z by socrates1024

comment:13 Changed at 2011-11-30T21:15:30Z by socrates1024

I would also like for "tahoe backup" to handle symlinks. Most specifically, I like to symlink directories I want backed-up into my main "Dropbox" folder (the target of "tahoe backup" in my crontab).

After a few experiments with Dropbox, it seems that Dropbox 'follows' symlinks to a limited depth, but it doesn't 'preserve' the symlinks (i.e. it does not behave like rsync --links). There seem to be a handful of hazards with following symlinks: you can have infinite recursion if circular symlinks aren't detected, and even without recursion, symlinks can cause redundant data to be stored.

I'm attaching a patch just to show my approach so far, to enforce a symlink depth limit of 3 (for directories only). I'll look into making tests that show how this approach behaves. For my immediate personal needs, this is already a solution.
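For illustration, a depth limit on followed directory symlinks might look roughly like the sketch below. This is not the attached patch; `walk_with_depth_limit` and `MAX_SYMLINK_DEPTH` are hypothetical names, and the traversal only lists files rather than uploading anything.

```python
import os

MAX_SYMLINK_DEPTH = 3  # assumed limit, matching the depth mentioned above

def walk_with_depth_limit(path, symlink_depth=0):
    """Yield file paths under path, following directory symlinks
    at most MAX_SYMLINK_DEPTH levels deep.

    symlink_depth counts how many directory symlinks were followed to
    reach path. Plain directories do not increase it.
    """
    for name in sorted(os.listdir(path)):
        child = os.path.join(path, name)
        if os.path.islink(child) and os.path.isdir(child):
            if symlink_depth >= MAX_SYMLINK_DEPTH:
                continue  # too many symlinks followed; stop descending
            yield from walk_with_depth_limit(child, symlink_depth + 1)
        elif os.path.isdir(child):
            yield from walk_with_depth_limit(child, symlink_depth)
        elif os.path.isfile(child):
            yield child
```

With a circular symlink, this terminates but visits the same files once per followed level, which is the redundant-storage hazard mentioned above.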

comment:14 Changed at 2012-03-31T03:05:38Z by amiller

I rewrote my previous patch from 4 months ago (I forgot I ever posted it here) but nothing has changed in my approach.

I have now added a unit test that creates a directory with a symlink cycle and shows what happens. Cycles are only followed up to 3 levels deep. Other notable behavior is that multiple symlinks to the same file will be uploaded to Tahoe-LAFS multiple times as separate files.

https://github.com/amiller/tahoe-lafs/pull/1.patch

comment:15 Changed at 2012-03-31T03:07:37Z by davidsarah

  • Priority changed from minor to normal

comment:16 Changed at 2012-04-01T02:02:12Z by amiller

  • Keywords review-needed added

comment:17 Changed at 2012-04-12T00:57:53Z by amiller

  • Keywords design-review-needed added

comment:18 follow-up: Changed at 2012-05-15T17:13:35Z by zooko

Per this mailing list discussion, a better way to detect cycles than counting how many symlinks you've traversed is to examine the dev and inode of each thing and raise an exception about recursive symlinks if you encounter the same one a second time. That way we can handle an arbitrarily deep nest of symlinks.

Here's some code I wrote for a different tool that uses dev and inode to identify files:

https://tahoe-lafs.org/trac/dupfilefind/browser/trunk/dupfilefind/dff.py?annotate=blame
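A minimal sketch of the dev/inode idea, assuming a POSIX filesystem where (st_dev, st_ino) identifies a directory. It is not the code linked above; `walk_detecting_cycles` and `SymlinkCycleError` are hypothetical names. One design choice made here for illustration: only the directories on the current path from the root are tracked, so two unrelated symlinks to the same directory are not reported as a cycle.

```python
import os

class SymlinkCycleError(Exception):
    """Raised when a directory is reached again through its own descendants."""

def walk_detecting_cycles(path, ancestors=frozenset()):
    """Yield file paths under path, following symlinks.

    ancestors holds the (st_dev, st_ino) pairs of every directory on the
    way from the root to path. Meeting one of them again means a symlink
    points back up the tree.
    """
    st = os.stat(path)  # stat() follows symlinks, so we see the target
    key = (st.st_dev, st.st_ino)
    if key in ancestors:
        raise SymlinkCycleError("recursive symlink at %r" % (path,))
    ancestors = ancestors | {key}
    for name in sorted(os.listdir(path)):
        child = os.path.join(path, name)
        if os.path.isdir(child):
            yield from walk_detecting_cycles(child, ancestors)
        elif os.path.isfile(child):
            yield child
        # anything else (dangling symlink, socket, device) is ignored here
```

Because this is a generator, the exception surfaces while iterating rather than at call time. Unlike a fixed depth limit, nesting depth is bounded only by the real directory structure.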

Last edited at 2012-05-15T17:14:10Z by zooko

comment:19 in reply to: ↑ 18 Changed at 2012-10-04T20:42:42Z by jg71

  • Cc jg71@… added

Replying to zooko:

imho, it would be a good idea to keep backing up symlinks optional (well, /make/ it optional)

comment:20 Changed at 2013-07-12T16:54:42Z by zooko

  • Description modified (diff)
  • Keywords review-needed design-review-needed removed

I don't want the "limit it to K levels deep" approach, so I'm unsetting review-needed. Thank you for your contribution, amiller!

comment:21 Changed at 2013-12-06T16:38:30Z by amiller

I'm not sure of the status of this ticket... but I wanted to pass along my github commit, which includes tests and is currently rebased against master. https://github.com/amiller/tahoe-lafs/commit/3deafed1c790e076481032536260a29ba2007401
