#402 closed defect (fixed)

bug in Twisted, triggered by pyOpenSSL-0.7

Reported by: warner Owned by: zooko
Priority: critical Milestone: 1.3.0
Component: operational Version: 1.0.0
Keywords: Cc: exarkun
Launchpad Bug: 236190

Description

The symptom is that tahoe's test_system fails with "unclean reactor errors", complaining about several foolscap negotiation timers that are still running when the test finishes.

We tracked this down to a bug in twisted, inside some twisted code that is only enabled in the presence of pyopenssl-0.7 (which just landed in sid a few days ago). The previous pyopenssl-0.6 does not trigger the bug. This bug causes foolscap's unit tests to fail in the same way.

The twisted folks (exarkun in particular) are now aware of the problem and are able to reproduce it: http://twistedmatrix.com/trac/ticket/3218

The current workaround is to downgrade to pyopenssl-0.6 .

Change History (23)

comment:1 Changed at 2008-05-30T04:10:38Z by zooko

  • Summary changed from tahoe unit tests fail with latest debian sid to bug in Twisted, triggered by pyOpenSSL-0.7

I guess the current workaround should be to specify in our _auto_deps.py:

# v0.7 of pyOpenSSL triggers a bug in Twisted <= 8.1.0, which is the latest version of Twisted at this time: http://allmydata.org/trac/tahoe/ticket/402
setup_requires.append("pyOpenSSL >= 0.6, != 0.7")

I'll try that out on the buildbot tomorrow morning when I'm more wakeful and willing to spend time wrangling buildslaves...

comment:2 Changed at 2008-05-30T21:32:41Z by zooko

  • Milestone changed from undecided to 1.1.0

comment:3 Changed at 2008-05-30T21:32:47Z by zooko

  • Owner changed from warner to zooko
  • Status changed from new to assigned

comment:4 Changed at 2008-05-31T00:19:33Z by zooko

Ah, well I tried making Tahoe require pyOpenSSL >= 0.6, != 0.7, but the pyOpenSSL-0.6 tarball is not easy_install'able, as described here:

https://bugs.launchpad.net/pyopenssl/+bug/236190

If the pyOpenSSL maintainers fix the 0.6 tarball's permission bits as I submitted in that ticket, then this will start working.

comment:5 Changed at 2008-05-31T00:26:40Z by zooko

If this gets fixed so that pyOpenSSL can be easy_install'ed (provided that OpenSSL is installed), then this will reduce the need for #282 (more detailed and targeted docs about installing from source).

comment:6 Changed at 2008-06-04T01:08:34Z by zooko

I've requested that JP give me admin privs for the pyOpenSSL sf.net project so that I can upload a pyOpenSSL-0.6.tar.gz which works around this problem.

comment:7 Changed at 2008-06-04T01:12:24Z by zooko

Now, as far as we know the combination of Tahoe-1.1+Twisted-8.1+pyOpenSSL-0.7 doesn't lead to any bad behavior except for a vast number of unit tests failing due to failure to close connections. One workaround, if we can't get an easy_install'able pyOpenSSL-0.6.tar.gz, would be to code up an explicit "skip-test-like" behavior in our Makefile or perhaps in our test code to skip all these numerous failing tests if pyOpenSSL v0.7 is detected. I'm not sure exactly how that would be implemented. It also feels a little bit uncomfortable to deploy Tahoe-1.1+Twisted-8.1+pyOpenSSL-0.7 because I'm not entirely sure that the bug wouldn't lead to other problems for Tahoe users. (Doubtless pyOpenSSL-0.6 is more buggy than pyOpenSSL-0.7, but at least we have experience with it and there are no known anomalies which could be explained by bugs in pyOpenSSL-0.6.)
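
One way the "skip-test-like" idea could look is sketched below. This is only an illustration, not anything that was committed: the helper names are hypothetical, it assumes the installed pyOpenSSL exposes a `__version__` attribute, and it uses the standard library's `unittest.skipIf` rather than trial's own skip mechanism.

```python
import unittest


def pyopenssl_version():
    """Return the installed pyOpenSSL version string, or None if unavailable."""
    try:
        import OpenSSL
    except ImportError:
        return None
    # Assumption: the package exposes __version__; fall back to None if not.
    return getattr(OpenSSL, "__version__", None)


def triggers_twisted_bug(version):
    """True only for pyOpenSSL 0.7, the release this ticket reports as a trigger."""
    return version == "0.7"


# Hypothetical decorator: apply it to tests that wait for connections to close.
skip_if_pyopenssl_07 = unittest.skipIf(
    triggers_twisted_bug(pyopenssl_version()),
    "pyOpenSSL 0.7 triggers Twisted ticket 3218",
)
```

A test class or method decorated with `skip_if_pyopenssl_07` would then be reported as skipped, rather than as an ERROR, on affected installations.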

comment:8 Changed at 2008-06-04T16:16:43Z by zooko

  • Resolution set to fixed
  • Status changed from assigned to closed

Okay this whole issue is now foolscap's problem -- Tahoe doesn't actually require pyOpenSSL at all. Tahoe requires foolscap-with-secure-connections, and (currently) foolscap-with-secure-connections requires pyOpenSSL. So I'm closing this ticket as "fixed" and further work will be done in #438 (get foolscap to declare its dependency on pyOpenSSL) and http://foolscap.lothar.com/trac/ticket/66 (install requires pyOpenSSL (for secure mode)).

comment:9 Changed at 2008-06-04T16:17:57Z by zooko

  • Resolution fixed deleted
  • Status changed from closed to reopened

Oh wait, resolving this as "fixed" is a bit premature. Until there is a foolscap release that does this, and Tahoe specifies that it requires such a foolscap release, then this is still an open issue for Tahoe.

Also, there is a judgment call to be made as to what version of foolscap Tahoe should require.

comment:10 Changed at 2008-06-04T22:48:31Z by zooko

Okay, Brian is planning to release foolscap v0.2.8 which declares an "extra" dependency. If you specify that your project depends on foolscap "with the extra feature of secure connections", then foolscap will require pyOpenSSL.

He is not specifying that it requires a version of pyOpenSSL other than v0.7, which means that if Tahoe requires foolscap, and foolscap causes pyOpenSSL to be installed, and the version of pyOpenSSL that gets installed is version 0.7, then the Tahoe unit tests will all get ERRORs due to connections not being shut down properly.
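
For readers unfamiliar with setuptools "extras", the arrangement described above might look roughly like the following. The variable names are hypothetical, the extra name `secure_connections` is taken from a later comment on this ticket, and the `>= 0.2.8` bound is an assumption based on the planned release; the real setup.py files may differ.

```python
# foolscap side (sketch): an optional feature with its own requirement.
# This dict would be passed to setuptools.setup(extras_require=...).
FOOLSCAP_EXTRAS_REQUIRE = {
    "secure_connections": ["pyOpenSSL"],
}

# Tahoe side (sketch): request foolscap together with that extra.
# This list would be passed to setuptools.setup(install_requires=...).
TAHOE_INSTALL_REQUIRES = [
    "foolscap[secure_connections] >= 0.2.8",
]
```

Because the extra lists plain `pyOpenSSL` with no version constraint, an installer is free to pick 0.7, which is the situation the paragraph above describes.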

So now I want to figure out how to make those ERRORs not happen when people install Tahoe and run make test.

comment:11 Changed at 2008-06-05T23:03:57Z by warner

  • Milestone changed from 1.1.0 to 1.1.1

Apparently we don't fully understand the combination of versions that trigger this problem. On my debian/sid system, I see timeout/reactor-unclean failures of tahoe's test_system (and of several foolscap unit tests). This system has:

  • python-twisted-8.1.0-1
  • python-openssl-0.7-1
  • libssl0.9.8g-10.1

However, an Ubuntu/Hardy system we just set up does *not* fail tests, when using what we believe to be twisted-8.1.0, pyopenssl-0.7, and libssl0.9.8g-4ubuntusomething.

If this really only causes failures on sid (but hardy is ok), we're willing to push it out a release. We still want to get it fixed, but it will probably require the pyopenssl maintainers to fix twisted#3218.

sid users are still advised to hold python-openssl at 0.6-5 .

comment:12 Changed at 2008-06-06T12:36:55Z by exarkun

Are you sure it doesn't fail in that configuration? The problem includes a race condition dependent on timing of network operations and Python calls. It may just be that the race is biased towards going the wrong way frequently in one environment and the right way in the other.

comment:13 Changed at 2008-06-10T23:07:20Z by zooko

I just ran the Tahoe unit tests on Mac OS 10.4 on a PowerPC G4 867 MHz laptop, and this failure did not occur.

comment:14 Changed at 2008-07-25T19:39:13Z by zooko

  • Cc exarkun added
  • Priority changed from major to critical

This is important because currently there are two workarounds, each of which is unacceptable to one of the Tahoe developers:

workaround #1: leave "secure_connections" out of the requirements that Tahoe needs from foolscap, so that installations of Tahoe, which trigger installations of foolscap, do not trigger installations of pyOpenSSL. This works around the problem because if you happen to have pyOpenSSL already installed, but invisible to setuptools, and it is a version of pyOpenSSL that doesn't trigger this bug, then everything works including no bogus test failures. However, this is unacceptable to Zooko because if you do not already have the right version of pyOpenSSL installed then you will get a runtime exception and you'll have to manually install pyOpenSSL. It is unacceptable to Zooko to require users to manually install pyOpenSSL.

workaround #2: leave "secure_connections" in the requirements. Then you won't have to manually install anything, and if you happen to get a combination of Twisted and pyOpenSSL which does not trigger this bug, you won't get any bogus test failures. However, this is unacceptable to Brian, because if you get a combination of Twisted, pyOpenSSL, and development platform which triggers this bug then you'll get bogus test failures. People seeing bogus test failures is unacceptable to Brian (and his development platform -- sid -- is the one which incurs this failure).

Here is a work-around which is kind of ugly but at least it isn't unacceptable: write a tearDown() method to reach inside the reactor and clean off outstanding delayed calls and open sockets. Also we would have to change the Tahoe unit tests to not wait for connection cleanup before passing the tests.
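
A minimal sketch of that cleanup is shown below. The function name is hypothetical; it relies on the reactor offering Twisted's `getDelayedCalls()` and `removeAll()` methods, and it only unregisters sockets from the reactor without closing them.

```python
def scrub_reactor(reactor):
    """Cancel pending delayed calls and unregister all readers and writers.

    `reactor` is expected to provide getDelayedCalls() and removeAll(),
    as Twisted reactors do. Returns a pair:
    (number of calls cancelled, number of selectables removed).
    """
    cancelled = 0
    for call in reactor.getDelayedCalls():
        # Only calls that have not yet fired or been cancelled can be cancelled.
        if call.active():
            call.cancel()
            cancelled += 1
    # removeAll() returns the readers and writers it removed from the reactor.
    removed = reactor.removeAll()
    return cancelled, len(removed)
```

A test case's tearDown() could import `reactor` from `twisted.internet` and call `scrub_reactor(reactor)`. Note that this hides leftover timers and sockets rather than fixing whatever left them behind, which is why the comment above calls the approach ugly.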

A *good* solution to this would, of course, be to fix this bug in Twisted and/or pyOpenSSL. Maybe we could contribute some time to that. I vaguely recall that there is now a unit test for the problem...

comment:15 Changed at 2008-07-25T21:57:50Z by zooko

Looks like the Twisted folks have been making progress on this issue:

http://twistedmatrix.com/trac/ticket/3218

comment:16 Changed at 2008-07-25T22:07:11Z by warner

zooko says that the twisted folks say that this may only happen with the select reactor.. so another easy workaround is to use the pollreactor instead. I'll test this and report back.

comment:17 Changed at 2008-07-30T16:53:02Z by zooko

  • Resolution set to fixed
  • Status changed from reopened to closed

Okay this is fixed by 01e5ca68e2640274, 3eb5f221d7ed217b, 677f26f0f4f10d04, bd0fe3588b314711, 5a0e98d693fd2f3e. (Changes to the build system sometimes take multiple patches, because I use the buildbot to try out my changes on all of our platforms at once. If the buildbot "try this out but don't commit it to trunk" feature were working and I knew how to use it then I would do that instead.)

The fix is to set --reactor=poll on linux. (So this is in a sense a work-around instead of a fix, but on the other hand there's no reason for us to prefer the select reactor on linux, so this is fine.)
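
The platform check amounts to something like the following sketch. It is not the committed change (see the patches listed above for that); the function name is hypothetical, and the result is meant to be appended to a trial command line.

```python
import sys


def trial_reactor_args(platform=None):
    """Return extra trial arguments: the poll reactor on Linux, none elsewhere."""
    if platform is None:
        platform = sys.platform
    # sys.platform is "linux" on current Pythons and was "linux2" on older ones.
    if platform.startswith("linux"):
        return ["--reactor=poll"]
    return []
```

On other platforms the list is empty, so trial keeps its default reactor there.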

comment:18 Changed at 2008-08-11T18:13:10Z by zooko

Hooray -- the Twisted folks have fixed this issue:

http://twistedmatrix.com/trac/ticket/3218

comment:19 Changed at 2008-08-11T19:11:31Z by warner

Excellent. Now we just need to wait for them to make a release, and add advice in the README to avoid the combination of Twisted in (8.0.1 .. 8.1.0) and pyOpenSSL-0.7 .

incidentally: I've tested twisted-8.0.1 and 8.1.0 (against pyopenssl-0.7) and saw test failures. I don't know about 8.0.0 . I see some failures against twisted-2.5.0, and different (non-ssl-related) ones against twisted-2.4.0 . So the versionspace to avoid might be Twisted in (8.0.0 .. 8.1.0) and pyopenssl-0.7 .. don't know yet.
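
As an illustration only, the narrower range (the versions actually observed to fail) could be checked like this. The function names are hypothetical, the bounds simply encode the observations above, and treating releases after 8.1.0 as unaffected is an assumption. The failures seen against twisted-2.5.0 and the unknown status of 8.0.0 are deliberately left out.

```python
def parse_version(text):
    """Turn a dotted version such as "8.1.0" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_reported_failure_range(twisted_version, pyopenssl_version):
    """True for Twisted 8.0.1 through 8.1.0 combined with pyOpenSSL 0.7."""
    twisted = parse_version(twisted_version)
    pyopenssl = parse_version(pyopenssl_version)
    return (8, 0, 1) <= twisted <= (8, 1, 0) and pyopenssl == (0, 7)
```

If 8.0.0 turns out to be affected as well, only the lower bound would need to change.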

Or, we could just be satisfied with always using the pollreactor. But if we ever want to simplify the Makefile and remove that platform-detection / reactor-choosing code, we could force users to go with a post-8.1.0 release of twisted instead.

comment:20 Changed at 2008-08-11T20:14:33Z by zooko

I'm not aware of any reason to prefer a select reactor over a poll reactor if there is a poll reactor on your platform, so I'm satisfied with our current reactor chooser.

I think I'll suggest to the Twisted folks (via their issue tracker) that Twisted could make poll reactor the default reactor on platforms that support it.

comment:21 Changed at 2008-08-11T20:28:02Z by zooko

http://twistedmatrix.com/trac/ticket/2234 # Select default reactor based on platform and available libraries

comment:22 Changed at 2008-09-03T01:16:35Z by warner

  • Milestone changed from 1.3.1 to 1.3.0

comment:23 Changed at 2008-10-31T15:33:34Z by launchpad

  • Launchpad Bug set to 236190

Updating Launchpad bug reference
