#659 closed defect (invalid)

rusty dusty server fails on tests

Reported by: arch_o_median Owned by: arch_o_median
Priority: major Milestone: eventually
Component: code Version: 1.3.0
Keywords: rusty dusty server setup.py setup test build old minimal Cc:
Launchpad Bug:

Description

I'm installing on a:

Hardware: Pentium III; 64 MB RAM; 731 MHz. OS: Debian 5.0 Lenny, kernel 2.6.26-1-686.

The attached file includes:

1) dpkg_l_Log: the output of dpkg -l (included because I installed non-base packages, including python-devel, gcc, and darcs 2.1.2)

2) buildlog and testlog[_2]: the outputs of setup.py build and setup.py test, respectively

Each attachment has a brief descriptive header echoing the command line that invoked it.

Attachments (7)

old_sys_inst_setup_test_logs.tar.bz2 (14.3 KB) - added by arch_o_median at 2009-03-10T21:02:31Z.
build test logs
test.log (23.5 KB) - added by arch_o_median at 2009-03-11T02:20:39Z.
testimmutablelog (10.1 KB) - added by arch_o_median at 2009-03-11T02:20:59Z.
command_repeater.sh (651 bytes) - added by arch_o_median at 2009-03-14T03:05:24Z.
Script used to run test on a loop.
testlog (28.9 KB) - added by arch_o_median at 2009-03-19T22:28:42Z.
test_all_one (28.9 KB) - added by arch_o_median at 2009-03-19T22:31:14Z.
test_all_two (34.8 KB) - added by arch_o_median at 2009-03-19T22:31:22Z.

Download all attachments as: .zip

Change History (23)

Changed at 2009-03-10T21:02:31Z by arch_o_median

build test logs

comment:1 Changed at 2009-03-10T21:05:48Z by arch_o_median

  • Keywords rusty dusty server added
  • Summary changed from Install on Old Sytem setup.py test output to Minimal Resource Server: "setup.py build; setup.py test"

comment:2 Changed at 2009-03-10T22:40:46Z by zooko

  • Summary changed from Minimal Resource Server: "setup.py build; setup.py test" to tests fail on rusty dusty server

The test log includes failures, starting with this:

allmydata.test.test_immutable
  Test
    test_download ...                                                      [OK]
    test_download_abort_if_too_many_corrupted_shares ...                   [OK]
    test_download_abort_if_too_many_missing_shares ...                  [ERROR]
                                                                        [ERROR]
                                                                        [ERROR]
    test_download_from_only_3_remaining_shares ...                      [ERROR]
                                                                        [ERROR]
                                                                        [ERROR]
    test_download_from_only_3_shares_with_good_crypttext_hash ...       [ERROR]
                                                                        [ERROR]
    test_test_code ...                                                  [ERROR]
                                                                        [ERROR]

Would you please re-run just these tests (so that it won't take all day) with "python ./setup.py test -s allmydata.test.test_immutable", let it run to completion, and attach the output as well as the contents of _trial_temp/test.log?

comment:3 follow-up: Changed at 2009-03-10T22:41:00Z by zooko

  • Component changed from packaging to code
  • Priority changed from minor to major

comment:4 in reply to: ↑ 3 ; follow-up: Changed at 2009-03-11T02:20:00Z by arch_o_median

Replying to zooko:

Here they come!

Changed at 2009-03-11T02:20:39Z by arch_o_median

Changed at 2009-03-11T02:20:59Z by arch_o_median

comment:5 in reply to: ↑ 4 Changed at 2009-03-11T02:21:55Z by arch_o_median

Replying to arch_o_median:

Replying to zooko:

test.log is _trial_temp/test.log

comment:6 Changed at 2009-03-11T02:41:19Z by zooko

What? Tests passed this time! Darn! Nondeterministic computing!

Could you please try to figure out what makes the tests pass or fail? Perhaps the test_immutable part fails only when you run the full suite? Perhaps it has to do with whether the machine is loaded with other processes at the same time? Or maybe you have to run test_immutable several times in a row to see it fail?

comment:7 Changed at 2009-03-11T04:37:40Z by zooko

Handy dandy shell script:

/bin/true # to set $? to 0
while [ $? = 0 ] ; do
    python ./setup.py test -s allmydata.test.test_immutable
done

That will execute the test_immutable tests over and over, stopping if python exits with a non-zero exit code, which it will if any test fails.

comment:8 Changed at 2009-03-13T06:32:26Z by arch_o_median

OK, running overnight or until test failure... using my very first from-scratch BASH script of more than a couple of lines.

comment:9 Changed at 2009-03-13T22:31:34Z by arch_o_median

I'm running "python ./setup.py test" repeatedly. I plan to let it run for 72 hours.

comment:10 follow-up: Changed at 2009-03-13T22:52:20Z by zooko

Wait a minute! You mean that when you reran the exact same test which earlier produced "test.log" which you attached in http://allmydata.org/trac/tahoe/ticket/659#comment:1 , it didn't produce the same output? (The test_immutable and some other ones didn't fail?)

Changed at 2009-03-14T03:05:24Z by arch_o_median

Script used to run test on a loop.

comment:11 Changed at 2009-03-14T03:28:39Z by arch_o_median

I ran it using the attached script. Maybe it doesn't do what I think it does? I assume a failed test produces a non-zero exit code.

The script ran 95 times, before I killed it.

I have changed state on the system. In particular I darcs "got" the latest version of tahoe and put it in a directory parallel to the one containing the version obtained by walking through the install. I ran setup.py in the darcs version. I then rm -rf'd the darcs-version directory, and reacquired the "walk-through" 1.3.0 version, and built it. The tests I mention were run on this reinstalled version of 1.3.0, so if tahoe changed state outside of its install directory (I also rm -rf'd ~/.tahoe) then I've made things harder on myself.

I've also apt-get installed vim. I _may_ have apt-get installed other packages, but I don't remember for sure, and I haven't yet convinced dpkg to tell me what I installed most recently.

I notice that the [ERROR]s in the first and second logs here: http://allmydata.org/trac/tahoe/attachment/ticket/659/old_sys_inst_setup_test_logs.tar.bz2 are not in the same tests. The second log passes test_immutable but produces errors in test_repairer.

comment:12 Changed at 2009-03-14T15:36:47Z by zooko

Maybe it doesn't do what I think it does? I assume a failed test produces a non-zero exit code.

Did you capture the stdout/stderr so you can grep it and see if any tests failed?
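One way to do that capture (a sketch, not the attached command_repeater.sh; the run_N.log naming is an assumption) is to redirect each run's combined stdout/stderr to its own numbered log and stop at the first non-zero exit code, so the failing run's log can be grepped afterward:

```shell
#!/bin/sh
# repeat_until_failure CMD [ARGS...]: run CMD repeatedly, saving each
# run's combined stdout/stderr to run_N.log, and stop at the first
# non-zero exit code.
repeat_until_failure() {
    i=1
    while "$@" > "run_${i}.log" 2>&1 ; do
        i=$((i + 1))
    done
    echo "run ${i} exited non-zero; see run_${i}.log"
}

# e.g.: repeat_until_failure python ./setup.py test -s allmydata.test.test_immutable
```

After a failure, something like grep -n '\[ERROR\]' on the last log shows which tests went wrong.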

comment:13 Changed at 2009-03-14T15:38:27Z by zooko

if tahoe changed state outside of its install directory (I also rm -rf'd ~/.tahoe)

It shouldn't have changed anything outside of those two directories. In fact, just running the unit tests shouldn't have changed anything outside the install directory.

comment:14 Changed at 2009-03-14T15:48:04Z by zooko

I have changed state on the system. In particular I darcs "got" the latest version of tahoe and put it in a directory parallel to the one containing the version obtained by walking through the install. I ran setup.py in the darcs version. I then rm -rf'd the darcs-version directory, and reacquired the "walk-through" 1.3.0 version, and built it. The tests I mention were run on this reinstalled version of 1.3.0,

Which tests? Maybe you could identify which tests you mean using a URL to their log file or something.

I still don't understand why it failed one time but didn't fail another time. I think, but am not entirely certain, that you actually ran the same tests on the same source code with the same system both times, which implies that there is a non-determinism problem in the tests (probably a timing problem). But can you confirm that this is what you did? I really want a report of a controlled experiment in which the exact same test produced two different results.

By the way, I like the command_repeater.sh script!

Changed at 2009-03-19T22:28:42Z by arch_o_median

Changed at 2009-03-19T22:31:14Z by arch_o_median

Changed at 2009-03-19T22:31:22Z by arch_o_median

comment:15 in reply to: ↑ 10 Changed at 2009-03-19T22:42:43Z by arch_o_median

Replying to zooko:

Wait a minute! You mean that when you reran the exact same test which earlier produced "test.log" which you attached in http://allmydata.org/trac/tahoe/ticket/659#comment:1 , it didn't produce the same output? (The test_immutable and some other ones didn't fail?)

Not quite:

When I ran "python ./setup.py test" the first time I got this result: http://allmydata.org/trac/tahoe/attachment/ticket/659/test_all_one

When I ran "python ./setup.py test" the second time I got this result: http://allmydata.org/trac/tahoe/attachment/ticket/659/test_all_two

When I ran "python ./setup.py test -s allmydata.test.test_immutable" 96 times, I generated 0 [ERROR]s.

The [ERROR] sets generated by "setup.py test" are not the same.

Given the above observations I thought it would be interesting to look at patterns in the [ERROR] messages generated across a large number of "setup.py test" runs. I had resolved to do this when a new development arose: the rusty dusty server died.

That is, it generated some terrifying error message and became totally non-responsive. Unfortunately this occurred several days ago and I cannot remember the specifics of the error. I eventually powered the machine down. Today, when I pressed the power button to bring it back up, it started emitting a regular BEEEP BEEEP noise and gave no other signs of life (e.g. no activity on the monitor). I tried this twice.

My new plan is to run the above tests on a different old desktop, while debugging Rusty.
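The pattern analysis described above could be as simple as tallying which test names carry an [ERROR] status across the saved logs. A sketch, assuming trial-style logs like the attached test_all_one and test_all_two (the exact log layout is an assumption):

```shell
#!/bin/sh
# tally_errors LOG...: list each log's [ERROR] lines, then count,
# across all the logs, how many times each test name errored.
tally_errors() {
    for log in "$@"; do
        echo "== ${log} =="
        grep '\[ERROR\]' "$log"
    done
    echo "== totals =="
    # Keep only status lines that begin with a test name, print the
    # name, and count occurrences across all logs.
    grep -h '\[ERROR\]' "$@" \
        | awk '/^[[:space:]]*test_/ { print $1 }' \
        | sort | uniq -c | sort -rn
}

# e.g.: tally_errors test_all_one test_all_two
```

Tests that error in every run would float to the top of the tally; tests that error only sometimes would show partial counts, which is the signature of the non-determinism zooko is asking about.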

comment:16 Changed at 2009-03-19T23:10:04Z by zooko

  • Resolution set to invalid
  • Status changed from new to closed
  • Summary changed from tests fail on rusty dusty server to rusty dusty server fails on tests

Heh heh heh. Closing this ticket as "invalid". You should try running memtest86 and other such diagnostics.

Note: See TracTickets for help on using tickets.