#71 closed enhancement (fixed)

"client node probably started"

Reported by: zooko Owned by: davidsarah
Priority: major Milestone: 1.8.1
Component: code-nodeadmin Version: 0.7.0
Keywords: usability cli news-needed Cc:
Launchpad Bug:

Description

It would be nice if we could remove the "probably" from that message. How about doing a Foolscap "Hi there" with it? (That was Sam Stoller's suggestion.)

Attachments (1)

prototype.diff (10.5 KB) - added by warner at 2007-11-08T18:55:17Z.
prototype implementation


Change History (36)

comment:1 Changed at 2007-07-02T18:49:19Z by warner

The "probably" is there because the runner process has no clear way of knowing if the new process dies right away or continues running.

I've got some code in buildbot which watches the logfile and looks for the message that indicates startup has been successful; perhaps we could snarf it for this purpose.

What do you mean by a 'Foolscap "Hi there"' message?

-Brian
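The logfile-watching idea above can be sketched in a few lines of Python. This is a minimal sketch, not the buildbot code Brian refers to: it assumes the node appends to a file such as twistd.log, and the marker string passed in (for example "node started") is hypothetical; the real marker would be whatever the node actually logs once startup completes. It polls rather than using a filesystem-notification API, and it does not handle log rotation.

```python
import os
import time


def wait_for_log_marker(logfile, marker, timeout=10.0, poll=0.1):
    """Poll `logfile` until a newly appended line contains `marker`.

    Returns True if the marker shows up before `timeout` seconds pass,
    False otherwise. Reading starts at the current end of the file, so a
    marker left over from an earlier run is not mistaken for success.
    """
    needle = marker.encode("utf-8")
    deadline = time.monotonic() + timeout
    offset = os.path.getsize(logfile) if os.path.exists(logfile) else 0
    pending = b""  # partial last line carried over between polls
    while time.monotonic() < deadline:
        if os.path.exists(logfile):
            with open(logfile, "rb") as f:
                f.seek(offset)
                chunk = f.read()
            offset += len(chunk)
            pending += chunk
            # Only complete lines are checked; keep the unfinished tail.
            *lines, pending = pending.split(b"\n")
            if any(needle in line for line in lines):
                return True
        time.sleep(poll)
    return False
```

The runner would start the node, call `wait_for_log_marker("twistd.log", ...)`, and print an unhedged "started" only when it returns True.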

comment:2 Changed at 2007-07-02T19:21:15Z by warner

  • Milestone set to undecided

comment:3 Changed at 2007-07-14T02:10:09Z by zooko

If the runner process has some positive indication that the new process started up long enough to perform some action (such as writing a message to the log or connecting back to the runner process with Foolscap and saying "Hi there"), then the runner process should inform the user that the process has started, without the "probably".

So, yes, snarfing that code from buildbot would be fine with me.
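The alternative of having the new process connect back to the runner and say "Hi there" can be illustrated without Foolscap. The sketch below substitutes a plain TCP socket on localhost for a Foolscap connection, so it shows only the shape of the handshake, not Tahoe's or Foolscap's API. The function name, the HELLO greeting, and the convention of passing the port as the child's last argument are all assumptions made for illustration.

```python
import socket
import subprocess
import sys

HELLO = b"hi there"  # hypothetical greeting the child sends once it is up


def start_and_wait_for_hello(child_argv, timeout=10.0):
    """Start a child process and wait for it to connect back and greet us.

    The parent listens on an ephemeral localhost port and appends the port
    number to `child_argv`. Returns (process, ok), where ok is True only if
    the child connected and sent HELLO within `timeout` seconds.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        server.listen(1)
        server.settimeout(timeout)
        port = server.getsockname()[1]
        child = subprocess.Popen(child_argv + [str(port)])
        try:
            conn, _ = server.accept()
            with conn:
                conn.settimeout(timeout)
                data = b""
                # Read until we have the whole greeting or the child hangs up.
                while len(data) < len(HELLO):
                    chunk = conn.recv(64)
                    if not chunk:
                        break
                    data += chunk
        except socket.timeout:
            return child, False
        return child, data.startswith(HELLO)


if __name__ == "__main__":
    # Stand-in "node": connect to the port given as the last argument, say hello.
    child_code = (
        "import socket, sys\n"
        "s = socket.create_connection(('127.0.0.1', int(sys.argv[1])))\n"
        "s.sendall(b'hi there')\n"
        "s.close()\n"
    )
    proc, ok = start_and_wait_for_hello([sys.executable, "-c", child_code])
    proc.wait()
    print("node started" if ok else "node did not report in")
```

A connection plus greeting proves the child got far enough to run code; it says nothing about failures that happen after the greeting.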

comment:4 Changed at 2007-08-14T18:59:32Z by warner

  • Component changed from code to code-nodeadmin
  • Owner somebody deleted

comment:5 Changed at 2007-09-05T21:31:01Z by nejucomo

  • Owner set to nejucomo

I'm working on the foolscap approach. I believe it's possible to connect to the node and call get_version, so I'll use that if possible. (I've started by modifying the runner tests to start a node and fail if "probably started" appears in the output.)

comment:6 Changed at 2007-09-25T04:35:12Z by zooko

  • Milestone changed from undecided to 0.6.1
  • Version changed from 0.4.0 to 0.6.0

This would fit nicely into the theme of v0.6.1: documentation, packaging, user-friendliness, etc.

comment:7 Changed at 2007-10-11T03:17:37Z by warner

I'd advise the logfile-scanning approach. Benefits:

  • any exceptions or warnings which occur during startup are displayed to the admin who is starting the node, at exactly the time and place they need to see it
  • it displays exceptions even if foolscap fails to work (e.g. if pyopenssl isn't installed). Logfile writing is the only requirement.

Downsides:

  • it generally requires forking off a process, which is problematic under Windows. I think I have a good-enough solution for this in buildbot, but I think it involves limited functionality.
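The first benefit listed above, relaying startup warnings and exceptions to the admin, amounts to echoing log lines while watching for an outcome. A minimal sketch, assuming the lines come from tailing twistd.log and that a line containing Python's standard "Traceback (most recent call last):" header signals a failure; the success marker is hypothetical. Taking an `emit` callback (defaulting to `print`) keeps the function easy to test.

```python
TRACEBACK = "Traceback (most recent call last):"


def relay_startup_log(lines, success_marker, emit=print):
    """Show startup log lines to the admin until the outcome is known.

    Every line examined is passed to `emit`, so warnings and exceptions
    reach the person running `tahoe start` as they happen. Returns
    "started" once `success_marker` appears, otherwise "failed" if a
    Python traceback was seen, or "unknown" if the lines simply ran out.
    """
    saw_traceback = False
    for line in lines:
        line = line.rstrip("\n")
        emit(line)
        if success_marker in line:
            return "started"
        if TRACEBACK in line:
            # Keep relaying so the rest of the traceback is shown too.
            saw_traceback = True
    return "failed" if saw_traceback else "unknown"
```

With this design a non-fatal exception logged before the success marker is still displayed, but the result is "started".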

comment:8 Changed at 2007-10-13T22:37:00Z by zooko

  • Milestone changed from 0.6.1 to 0.7.0

Bumping to v0.7 milestone.

Nejucomo: if you aren't planning to fix this ticket, would you please take your name off the "assigned" field?

comment:9 Changed at 2007-10-31T07:45:38Z by warner

  • Owner changed from nejucomo to warner
  • Status changed from new to assigned

I built a prototype of this, watching twistd.log until the introducer has been contacted. I suspect it will have interactions with Windows, though (forking), and it probably breaks the 'start -m' (multiple nodes) functionality.

I plan to make it work better once I've gotten more progress down on #197.

Changed at 2007-11-08T18:55:17Z by warner

prototype implementation

comment:10 Changed at 2007-11-13T18:17:41Z by zooko

  • Milestone changed from 0.7.0 to 0.7.1
  • Version changed from 0.6.0 to 0.7.0

comment:11 Changed at 2008-01-23T02:46:49Z by zooko

  • Milestone changed from 0.7.1 to undecided

comment:12 Changed at 2008-02-15T18:39:55Z by zooko

I forget exactly how many people I have watched going through the Tahoe install and launch process. About half a dozen. Every single one has exclaimed at "Client node probably started.". I just watched another person do it, and they too exclaimed in exactly the same way, so let's say it's seven out of seven.

comment:13 Changed at 2009-01-13T05:03:39Z by guest

I will add another vote that "probably" is not a very reassuring word choice. While things seem to be working, I am still unclear about why I've been told that Tahoe has only "probably started".

comment:14 Changed at 2009-06-24T21:38:22Z by warner

Incidentally, I just learned that modern twistd can be run as a library. See http://divmod.org/trac/browser/trunk/Axiom/axiom/scripts/axiomatic.py for an example. This would make it easier to avoid the extra subprocess, and might make it easier to provide a more confident answer to this ticket.

In general, if we can instantiate the Client before the fork, then the parent process can be sure that:

  1. the child was able to load all the correct Tahoe code, and import all the dependencies
  2. the tahoe.cfg file was well-formed and none of its values caused immediate problems

To feel confident that the Client actually got started, we'll need to establish some form of communication between the "tahoe start" parent and the actual node process, whether that means tailing the logfile or connecting to the control.furl .
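Points 1 and 2 above are things the parent can check by itself before it forks. The sketch below is hypothetical: the module list passed in and the single "[node] section exists" check stand in for actually importing Tahoe and instantiating the Client, which is what the comment proposes. It uses Python's `configparser` only to illustrate the "well-formed tahoe.cfg" check; it is not Tahoe's own config loader.

```python
import configparser
import importlib


def preflight(cfg_text, required_modules):
    """Check, before forking, things the parent can verify by itself.

    Returns a list of human-readable problems (empty if all is well):
    1. every module in `required_modules` can be imported, and
    2. the tahoe.cfg text parses and has a [node] section.
    """
    problems = []
    for name in required_modules:
        try:
            importlib.import_module(name)
        except ImportError as e:
            problems.append("cannot import %s: %s" % (name, e))
    parser = configparser.ConfigParser()
    try:
        parser.read_string(cfg_text)
    except configparser.Error as e:
        problems.append("tahoe.cfg is not well-formed: %s" % e)
    else:
        # Illustrative semantic check only; a real one would validate values.
        if not parser.has_section("node"):
            problems.append("tahoe.cfg has no [node] section")
    return problems
```

If `preflight()` returns a non-empty list, `tahoe start` could print the problems and exit with a non-zero status without ever forking.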

comment:15 Changed at 2009-12-13T03:09:16Z by davidsarah

  • Keywords usability added

comment:16 Changed at 2010-02-02T03:15:29Z by davidsarah

  • Milestone changed from eventually to 1.7.0

comment:17 Changed at 2010-03-09T05:26:18Z by zooko

Jeremy Visser has packaged Tahoe-LAFS v1.6.1 for Ubuntu Lucid. He tried to test his package by following these instructions: http://allmydata.org/source/tahoe-lafs/trunk/docs/running.html but he got stuck and gave up on testing it (until I reminded him to try again). So I asked why he had given up:

<jayvee> I'm reading that, and not getting very far
<zooko> Why not?
<zooko> Sounds like I need to file a bug report on that page. :_)
<jayvee> oh, just the feedback I'm getting is not very descriptive
<jayvee> "introducer node probably started"
<jayvee> I'm basically expecting to see "tahoe successfully started, browse to
	 this_url to view contents"
<jayvee> but maybe I'm just a simpleton
<jayvee> 'tahoe run' blocks with no feedback. I presume that's intentional (no
	 news is good news), but a little disconcerting to someone who has
	 never used it before.

comment:18 Changed at 2010-03-09T05:26:48Z by zooko

<jayvee> I ran 'tahoe start .' and 'tahoe run', and yet nothing is listening
	 on port 3456.
<jayvee> the documentation (using.html) says that should be the case
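One quick way to check jayvee's observation from Python is to try a TCP connection to the port. A minimal sketch, assuming the web server is expected on 127.0.0.1:3456 as the documentation describes; a successful connection shows only that something is listening there, not that it is the Tahoe node.

```python
import socket


def is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `is_listening(3456)` would return False in the situation described above.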

comment:19 Changed at 2010-03-09T05:32:13Z by zooko

  • Keywords cli added

comment:20 Changed at 2010-03-09T06:07:45Z by zooko

<jayvee> zooko, as a point of comparison, this is what upstart gives me
<jayvee> $ sudo start mythtv-backend
<jayvee> mythtv-backend start/running, process 17060
<jayvee> much more satisfying. even printing the PID makes me much more
	 confident.

comment:21 Changed at 2010-03-09T19:39:17Z by davidsarah

  • Priority changed from minor to major

comment:22 Changed at 2010-03-21T18:27:48Z by zooko

It looks like http://twistedmatrix.com/trac/ticket/823 would solve this ticket with its --wait option.

See also #602 which is about "probably not started" not being sufficiently detailed and #529 which is about detecting problems on startup and failing loudly instead of quietly, and #371 which is about a common problem on startup.

comment:23 Changed at 2010-06-04T07:50:11Z by zooko

  • Milestone changed from 1.7.0 to soon

comment:24 Changed at 2010-07-22T05:24:53Z by davidsarah

We could just adapt the approach suggested in the twisted ticket (and implemented in this patch) rather than waiting for twisted to adopt it. That would also allow us to receive arbitrary messages from the child process and print them, addressing ticket:68#comment:53 for example.
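The general shape of letting the child report back to the parent, and printing whatever it sends, can be sketched with a pipe created before `os.fork()`. This is not the code from Twisted #823 or from any Tahoe-LAFS patch; `start_with_status_pipe`, the `READY` sentinel, and the line-per-message protocol are assumptions. It is POSIX-only, which is exactly the Windows limitation discussed elsewhere in this ticket.

```python
import os

READY = "READY"  # hypothetical sentinel meaning "the node is up"


def start_with_status_pipe(child_main):
    """Fork, let the child report progress over a pipe, relay it in the parent.

    `child_main(send)` runs in the child; it calls send(text) for each status
    message and send(READY) once the node is up. The parent returns
    (pid, messages, ready): ready is True only if READY arrived before the
    child closed the pipe (for example by crashing). POSIX-only.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        os.close(r)

        def send(text):
            os.write(w, (text + "\n").encode("utf-8"))

        status = 1
        try:
            child_main(send)
            status = 0
        except Exception as e:
            send("child failed: %r" % (e,))  # arbitrary messages reach the parent
        finally:
            os.close(w)
            os._exit(status)  # never fall back into the parent's code
    os.close(w)  # parent keeps only the read end; EOF means the child closed it
    messages, ready = [], False
    with os.fdopen(r, "r", encoding="utf-8") as pipe:
        for line in pipe:
            line = line.rstrip("\n")
            if line == READY:
                ready = True
                break
            messages.append(line)
    return pid, messages, ready
```

After a successful start the parent could print something like "node start/running, process %d" % pid, in the spirit of the upstart output quoted in comment:20. In a real daemon the child keeps running after sending READY, so it should stop writing to (or close) its end of the pipe at that point, since the parent has stopped reading.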

comment:25 Changed at 2010-07-23T18:21:50Z by zooko

It would be nice to contribute to Twisted. Either we do so directly, by contributing patches and code review for Twisted #823, waiting for it to be deployed, and then using it in Tahoe-LAFS; or at least we could work on a patch within Tahoe-LAFS, but be sure to carefully cross-link it with the relevant Twisted tickets and try to get a similar patch committed to Twisted.

comment:26 Changed at 2010-11-11T01:50:14Z by warner

I think this is resolved by ac3b26ecf29c08cb. Anyone want to confirm?

comment:27 Changed at 2010-11-20T06:32:28Z by zooko

  • Keywords news-needed added
  • Milestone changed from soon to 1.8.1
  • Resolution set to fixed
  • Status changed from assigned to closed

I ran tahoe start and it didn't print out any uncertainty-inducing messages:

Zooko-Ofsimplegeos-MacBook-Pro:~ pubvolgrid$ tahoe start
STARTING '/Users/pubvolgrid/.tahoe'

Hm, news-needed.

comment:28 follow-ups: Changed at 2010-11-20T06:53:35Z by zooko

  • Resolution fixed deleted
  • Status changed from closed to reopened

Hey does this mean that we can start running all these tests on cygwin and/or windows now:

test_runner.py

It looks like both of these conditions which force tests to be skipped are now irrelevant and all tests should be runnable, but I'm not sure.

comment:29 in reply to: ↑ 28 Changed at 2010-11-20T21:16:58Z by davidsarah

  • Owner changed from warner to davidsarah
  • Status changed from reopened to new

Replying to zooko:

Hey does this mean that we can start running all these tests on cygwin and/or windows now:

test_runner.py

Possibly, I will investigate that.

comment:30 Changed at 2010-11-20T21:17:10Z by davidsarah

  • Status changed from new to assigned

comment:31 in reply to: ↑ 28 ; follow-up: Changed at 2010-11-20T22:35:04Z by davidsarah

Replying to zooko:

Hey does this mean that we can start running all these tests on cygwin and/or windows now: test_runner.py

Apparently not.

The cygwin part of this is #908, and is due to a bug in twisted.internet.utils on cygwin that apparently causes it to hang. (I haven't tested it with recent cygwin, but it wouldn't have been affected by ac3b26ecf29c08cb.)

For native Windows, we currently skip the test_runner.RunNode tests because of #27 (twistd doesn't daemonize on windows). That is, tahoe start behaves like tahoe run on Windows, which is too different for the tests to work. It looks non-trivial to make them work without fixing either #27 or #1121 (test 'tahoe run').

comment:32 Changed at 2010-11-20T22:52:18Z by zooko

Thanks for investigating!

comment:33 Changed at 2010-11-20T23:16:59Z by zooko

  • Resolution set to fixed
  • Status changed from assigned to closed

comment:34 in reply to: ↑ 31 ; follow-up: Changed at 2010-11-21T02:07:11Z by davidsarah

Replying to davidsarah:

[...] That is, tahoe start behaves like tahoe run on Windows, [...]

More precisely: tahoe start now behaves like tahoe run. Prior to ac3b26ecf29c08cb, it used os.system to run twistd, which put the node in a different process to the tahoe command, although that process did not then daemonize. Since ac3b26ecf29c08cb, it runs the node in the same process as the tahoe command. Hmm, is that a regression?

(Here is the code for twistd on Windows, and here is the code for Unix.)

comment:35 in reply to: ↑ 34 Changed at 2010-11-21T05:45:56Z by zooko

Replying to davidsarah:

More precisely: tahoe start now behaves like tahoe run. Prior to ac3b26ecf29c08cb, it used os.system to run twistd, which put the node in a different process to the tahoe command, although that process did not then daemonize. Since ac3b26ecf29c08cb, it runs the node in the same process as the tahoe command. Hmm, is that a regression?

I don't think anybody benefited from or cared about the fact that it used to run it in a separate process. It just made it harder to kill it on Windows.
