#640 new enhancement

Cli fs operations refactoring

Reported by: azazel Owned by:
Priority: major Milestone: undecided
Component: code-frontend-cli Version: 1.3.0
Keywords: cleanup Cc:
Launchpad Bug:

Description (last modified by davidsarah)

Each of the cli commands implements its own logic to do fs operations, both local-side and tahoe-side. I think that most of the command modules should be refactored to use common code. This is for many reasons among:

  • remove duplication;
  • code comprehensibility and normalization of behavior;
  • spread availability of improvements found in latter commands (backup) to other commands;
  • easier implementation of new features like non-ascii charset handling for filenames;
  • better handling of logging?

The steps in order to do that (that I can see):

  1. Recognize the needs of each command in term of fs manipulation;
  1. Look at the best implementation available and eventually improve it;
  1. maybe design a minimal layer that will help handling alias/path/cap based operations and other cases.

Other interesting points to discuss:

  • See whether a replacement of httplib with twisted can help in parallelizing and increasing the bandwidth usage.

Let's discuss it!

Change History (4)

comment:1 Changed at 2009-02-24T19:25:33Z by zooko

azazel: Those are good reasons! We already have reasonable test coverage for most cli commands, so after doing this refactoring it should be possible to look at the per-line code coverage output and see that the tests are thoroughly exercising the code in the new refactored version.

I don't have time to do this myself (see my blog for notes about what I'm doing), but I would happily review and accept patches that do this (once the buildbot is green).

comment:2 Changed at 2010-03-25T02:22:21Z by davidsarah

  • Description modified (diff)
  • Keywords cleanup added

See whether a replacement of httplib with twisted can help in parallelizing and increasing the bandwidth usage.

That should probably be a separate ticket.

Last edited at 2010-11-21T20:21:28Z by davidsarah (previous) (diff)

comment:3 follow-up: Changed at 2010-11-21T18:48:59Z by nejucomo

I read over the code to understand how to get from my shell into ./src/allmydata/scripts/* and it seems overly complex:

./bin/tahoe is a python script generated from ./bin/tahoe-script.template.

Q1: Why is this generated from a template? That seems unpythonic and unnecessary. When I diff the template and the generated script on my Mac, the only change is to replace "#!/bin/false" with "#!/usr/bin/python".

./bin/tahoe does a bunch of platform difference wrangling and eventually calls ./support/bin/tahoe.

Q2: Can any of the complexity in ./bin/tahoe be removed?

./support/bin/tahoe is very short and simply calls: load_entry_point('allmydata-tahoe==1.8.0', 'console_scripts', 'tahoe')()

Q3: Why does ./support/bin/tahoe exist? Why doesn't ./bin/tahoe call load_entry_point directly?

Q4: Why is the load_entry_point mechanism used? Is it necessary?

load_entry_point() resolves a mapping for "console_scripts" for tahoe to ./src/allmydata/scripts/runner.py's run() function. This mapping is configured in ./setup.py.

Q5: Why doesn't ./bin/tahoe call runner.py's run() directly?

Q6: Why are there two separate packaging processes necessary to make the commandline work? One being script template generation, the other being entry_point registration.

My perspective probably doesn't account for use cases, but I hope by asking these questions we might notice potential simplifications.

comment:4 in reply to: ↑ 3 Changed at 2010-11-21T20:19:15Z by davidsarah

Replying to nejucomo:

I read over the code to understand how to get from my shell into ./src/allmydata/scripts/* and it seems overly complex:

./bin/tahoe is a python script generated from ./bin/tahoe-script.template.

Q1: Why is this generated from a template? That seems unpythonic and unnecessary. When I diff the template and the generated script on my Mac, the only change is to replace "#!/bin/false" with "#!/usr/bin/python".

The intention is that the script runs with the same Python interpreter that ran setup.py build. A shebang line must specify an absolute path. It would be possible to use "#!/usr/bin/env python", but note that some of the built dependencies contain compiled code that is specific to a Python version, so upgrading the system Python (that is first on the PATH) would then cause an existing installation to fail.

./bin/tahoe does a bunch of platform difference wrangling and eventually calls ./support/bin/tahoe.

Q2: Can any of the complexity in ./bin/tahoe be removed?

Yes. Most of the complexity is just due to the fact that we have two scripts (this one and support/bin/tahoe). The reason why we need two scripts is that the one generated by setuptools doesn't set up the sys.path as we want, and setting the PYTHONPATH environment variable is the documented way to change the initial sys.path. The Windows-specific code is necessary in order to pass on Unicode arguments correctly to the second script, which would not be needed if we only had one script.

./support/bin/tahoe is very short and simply calls: load_entry_point('allmydata-tahoe==1.8.0', 'console_scripts', 'tahoe')()

Q3: Why does ./support/bin/tahoe exist? Why doesn't ./bin/tahoe call load_entry_point directly?

support/bin/tahoe is the entry script generated by setuptools. I presume that originally we considered the contents of that script to be an implementation detail of setuptools, and therefore didn't want to duplicate it.

Since then, we have ended up changing how setuptools generates scripts, and we would probably be better off just setting sys.path ourselves. That is what Brian's "unsuck branch" does, IIUC.

Q4: Why is the load_entry_point mechanism used? Is it necessary?

Sort of. Some mechanism to put the right copy of Tahoe and dependent libraries on the sys.path is necessary, and load_entry_point is the setuptools way of doing that.

load_entry_point() resolves a mapping for "console_scripts" for tahoe to ./src/allmydata/scripts/runner.py's run() function. This mapping is configured in ./setup.py.

Q5: Why doesn't ./bin/tahoe call runner.py's run() directly?

The sys.path would be wrong.

Q6: Why are there two separate packaging processes necessary to make the commandline work? One being script template generation, the other being entry_point registration.

A combination of overcomplication in the setuptools design, and our workarounds for setuptools bugs.

Last edited at 2010-11-21T20:19:46Z by davidsarah (previous) (diff)
Note: See TracTickets for help on using tickets.