[tahoe-dev] tree size increased?

Brian Warner warner-tahoe at allmydata.com
Wed Dec 26 12:58:58 PST 2007


Hey zooko, so, I noticed a series of patches you pushed recently that
uncompressed the tarballs in misc/dependencies/, for the stated goal of
allowing the overall tahoe tarball to compress better. Was that goal
accomplished? I compared a tree before those changes to one after it, and
noticed:

 before:
  darcs checkout is 4M + 22M of _darcs/  = 26MB total
  sdist.tar.gz is 2.37MB
  sdist.tar.bz2 is 2.33MB

 after:
  darcs checkout is 12M + 37M of _darcs/  = 49MB total
  sdist.tar.gz is 5.15MB
  sdist.tar.bz2 is 3.96MB
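
To put those numbers in proportion, here is a short Python sketch that turns them into growth factors. It is only arithmetic on the figures quoted in this message, with nothing re-measured. The dozen-tree total uses the "dozen tahoe trees on my laptop" count mentioned further down.

```python
# Sizes in MB, copied from the before/after measurements above (not re-measured).
before = {"checkout": 4 + 22, "sdist.tar.gz": 2.37, "sdist.tar.bz2": 2.33}
after = {"checkout": 12 + 37, "sdist.tar.gz": 5.15, "sdist.tar.bz2": 3.96}


def growth(before, after):
    """Return the after/before size ratio for every measured artifact."""
    return {name: after[name] / before[name] for name in before}


ratios = growth(before, after)
for name, ratio in ratios.items():
    print(f"{name}: {before[name]} MB -> {after[name]} MB ({ratio:.2f}x)")

# Total for a dozen full checkouts (working files plus _darcs/) on one machine.
TREES = 12
print(f"{TREES} trees: {TREES * before['checkout']} MB -> {TREES * after['checkout']} MB")
```

The checkout grows about 1.9x, the .tar.gz a little over 2x, and the .tar.bz2 about 1.7x. Twelve checkouts go from 312MB to 588MB.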

So a developer's tree is bigger, the history we're carrying around in all
trees is bigger, and the overall tahoe tarball is bigger. Some of the
increase can be attributed to the dramatic size increases of the dependent
libraries that you've updated to include their own copies of upstream
dependencies. In particular, I think we have at least four copies of the
setuptools .egg in a tahoe checkout now, and I see multiple copies of
setuptools_darcs, argparse, and pyutil.
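
One way to count the copies is to walk a checkout and group bundled archives and eggs by project name. The Python sketch below assumes the bundled files follow the usual name-version naming (e.g. setuptools_darcs-1.1.5.tar), so that the text before the first '-' identifies the project. find_repeated_distributions is a hypothetical helper, not something in the tahoe tree.

```python
import os
from collections import defaultdict

# Archive/egg suffixes to look for; unpacked .egg directories are matched too.
SUFFIXES = (".egg", ".tar", ".tar.gz", ".tar.bz2", ".zip")


def find_repeated_distributions(root, suffixes=SUFFIXES):
    """Map project name -> paths, for projects bundled more than once under root.

    Hypothetical helper: assumes names like 'setuptools_darcs-1.1.5.tar', so the
    text before the first '-' is taken as the project name. _darcs/ is skipped,
    so pristine copies are not counted twice.
    """
    copies = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune darcs metadata in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != "_darcs"]
        for name in filenames + dirnames:
            if name.endswith(suffixes):
                project = name.split("-", 1)[0]
                copies[project].append(os.path.join(dirpath, name))
    return {project: sorted(paths) for project, paths in copies.items() if len(paths) > 1}
```

Calling find_repeated_distributions("path/to/tahoe") would list, for example, every setuptools egg found outside _darcs/, even when the copies have different version numbers.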

I can imagine the reasons for most of these changes (improving the desert-island
behavior, making pycryptopp/zfec desert-island-buildable too, etc.),
but I worry that the tree size is starting to get unreasonable.
setuptools_darcs contains maybe, what, a hundred lines of code? (I see one
real .py file of 2440 bytes.) Yet it consumes 370kB in every tahoe tree, plus
a second 370kB in the _darcs pristine, plus the 326kB compressed patch that
created the setuptools_darcs-1.1.5.tar file. And each time setuptools_darcs
gets updated, the patch which removes one file and adds a new one will
contain an additional 2*326kB (I'm assuming all darcs patches are invertible,
so removing a file takes as large a patch as adding it).
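
The per-tree cost described above can be written out as a small formula. This Python sketch uses only the sizes quoted in this message, along with the same assumption that a removal patch is as large as the addition it inverts. tree_cost_kb is a hypothetical name, and real darcs patch sizes may differ.

```python
# Sizes in kB for setuptools_darcs-1.1.5.tar, as quoted above (not re-measured).
WORKING_COPY_KB = 370  # uncompressed .tar in the checkout
PRISTINE_KB = 370      # second copy in _darcs/pristine
ADD_PATCH_KB = 326     # compressed darcs patch that added the .tar


def tree_cost_kb(upgrades: int) -> int:
    """kB one tree spends on setuptools_darcs after `upgrades` version bumps.

    Assumes each bump swaps in a .tar of about the same size, and that the
    patch removing the old file is as large as the one that added it, so every
    bump appends 2 * ADD_PATCH_KB to the darcs history.
    """
    history_kb = ADD_PATCH_KB + upgrades * 2 * ADD_PATCH_KB
    return WORKING_COPY_KB + PRISTINE_KB + history_kb


for upgrades in range(3):
    print(f"after {upgrades} upgrade(s): {tree_cost_kb(upgrades)} kB")
```

With no upgrades, that already comes to about 1066kB per tree for a package whose real code is about 2.4kB. Under these assumptions, each upgrade adds another 652kB that stays in the history.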

I think you're optimizing the wrong thing here. I have a dozen tahoe trees
on my laptop, and now they consume over half a gigabyte (588MB versus the
previous 312MB), but I only ever download the .tar.gz maybe once a month, and
even for users downloading it once a day, 2.37MB is not worth reducing. And I
worry that the desert-island solution is worse than the problem (maybe we
should consider auto-creating a .tar.gz which contains the support tarballs
but not putting them in the SCM tree). And^3 I worry that tree size is one of
those insidiously-growing things that eventually makes you detest the state
that your tree has gotten into and wish for something simpler and cleaner,
and I don't want tahoe to be in that state yet.
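
The alternative floated above is to bundle the support tarballs only when building a release .tar.gz and keep them out of the SCM tree. The Python sketch below shows roughly how that could work, using the standard tarfile module. It assumes the dependency tarballs live in a separate directory outside the repository and are placed under misc/dependencies/ inside the archive. build_release_tarball and the "tahoe" prefix are hypothetical, not part of tahoe's actual setup.py.

```python
import tarfile
from pathlib import Path


def build_release_tarball(src_dir, deps_dir, out_path, prefix="tahoe"):
    """Write a .tar.gz of src_dir plus the tarballs found in deps_dir.

    Hypothetical helper: deps_dir is assumed to sit outside the darcs tree, so
    the support tarballs never enter version control; they only appear in the
    generated archive, under <prefix>/misc/dependencies/.
    """
    src_dir, deps_dir = Path(src_dir), Path(deps_dir)

    def skip_darcs(info):
        # Returning None makes tarfile drop the member (and, for a directory,
        # everything beneath it), so _darcs/ never reaches the archive.
        return None if "_darcs" in Path(info.name).parts else info

    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src_dir, arcname=prefix, filter=skip_darcs)
        for dep in sorted(deps_dir.iterdir()):
            if dep.is_file():
                tar.add(dep, arcname=f"{prefix}/misc/dependencies/{dep.name}")
    return out_path
```

In this arrangement the bundled tarballs exist only in the generated archive. Desert-island users who start from the release .tar.gz would still get them, while developer checkouts and the darcs history stay small.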

So, can you tell me more about your feelings about tree size and what we
should be optimizing for? I assume that I'm missing a few things here. What
kind of size optimization did you see in your tests that prompted you to make
the .tar.gz->.tar change?

Oh, and Merry Christmas! :)

 -Brian


