#902 closed defect (fixed)

network failure => internal TypeError

Reported by: zooko
Owned by:
Priority: major
Milestone: 1.7.0
Component: code-peerselection
Version: 1.5.0
Keywords: reliability easy upload
Cc: francois@…
Launchpad Bug:

Description

I was uploading a file when the local telco monopoly decided to turn off my phone and DSL for a few minutes. My tahoe cp command then reported ValueError: too many values to unpack at this line of code:

    def set_shareholders(self, (used_peers, already_peers), encoder):

The traceback reports that this is line 753 of immutable/upload.py.
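For readers who don't have Python 2's tuple parameters paged in: that signature desugars into an ordinary tuple unpacking of the second argument, so anything that doesn't iterate into exactly two values raises exactly this error. A minimal reproduction, with made-up peer names standing in for what _loop() actually returned:

    # Stand-in for what _loop() handed to set_shareholders(): a set of
    # three peers instead of a (used_peers, already_peers) 2-tuple.
    result = set(['peerA', 'peerB', 'peerC'])
    try:
        used_peers, already_peers = result
    except ValueError as e:
        print(e)   # too many values to unpack
    # Note: a *two*-element set would unpack silently, in arbitrary order.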

The version is: allmydata-tahoe: 1.5.0-r4054, foolscap: 0.4.2-zsetupztime, pycryptopp: 0.5.15, zfec: 1.4.5, Twisted: 8.2.0, Nevow: 0.9.31-r15675, zope.interface: 3.1.0c1, python: 2.5.4, platform: Darwin-8.11.1-i386-32bit, sqlite: 3.1.3, simplejson: 2.0.9, argparse: 0.8.0, pyOpenSSL: 0.9, pyutil: 1.5.1, zbase32: 1.1.0, setuptools: 0.6c12dev, pysqlite: 2.3.2

Line 753 of immutable/upload.py at version 4054 is src/allmydata/immutable/upload.py@4054#L753. That method is called from only one place, line 720, which means that locate_all_shareholders() must have returned something that didn't unpack into exactly two values. Looking at locate_all_shareholders(), I see that it returns whatever Tahoe2PeerSelector.get_shareholders() returned. (This fact may not be obvious to you if you aren't familiar with Twisted Deferreds, but it should be obvious to you if you are.)
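A stripped-down model of that Deferred pass-through (simplified names and signatures, not the real upload.py code): a function that just returns another function's Deferred adds no callback of its own, so the eventual result flows through to the caller unchanged.

    from twisted.internet import defer

    def get_shareholders():
        # stand-in for Tahoe2PeerSelector.get_shareholders(); here the
        # Deferred fires with _loop()'s buggy return value, a bare set
        return defer.succeed(set(['peerA', 'peerB', 'peerC']))

    def locate_all_shareholders():
        # adds no callback of its own, so its caller sees exactly what
        # get_shareholders()'s Deferred fired with
        return get_shareholders()

    def set_shareholders(result):
        # mirrors the unpacking done by the real method's signature
        used_peers, already_peers = result   # ValueError on a bare set

    def show_failure(f):
        print(f.value)   # too many values to unpack

    d = locate_all_shareholders()
    d.addCallback(set_shareholders)
    d.addErrback(show_failure)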

It looks like get_shareholders() returns whatever is returned by Tahoe2PeerSelector._loop(). There are three (non-recursive) return statements in _loop(). The first one returns a 2-tuple, the second one returns the return value of Tahoe2PeerSelector._got_response(), and the third one returns self.use_peers. Wait a second, that can't be right -- self.use_peers is a set of servers, not a 2-tuple.
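The fix this analysis points toward is shape consistency: every exit from _loop() should hand back the same (used peers, already-placed shares) 2-tuple. A hypothetical sketch with assumed attribute names, not the actual patch (which landed later with Kevan's #778 work):

    def _loop(self):
        # Hypothetical sketch; attribute and method names are assumptions.
        if not (self.uncontacted_peers or self.contacted_peers or
                self.contacted_peers2):
            if self._placed_enough_to_be_happy():
                # was: return self.use_peers   -- a bare set, the bug
                return (self.use_peers, self.preexisting_shares)
        # ... otherwise keep querying; those paths return a 2-tuple or
        # re-enter _loop() via _got_response() ...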

When does _loop() reach this return statement? Reading the control flow, it does so when (1) there are no uncontacted servers, (2) there are no servers in the ask-again set, (3) there are no servers in the and-then-ask-yet-again set, and (4) we have already placed enough shares to be happy. So I guess this is a rare situation, in which we've placed enough shares to be happy at the same moment that all of our servers disappeared. (Also, the status message "Placed all shares" seems slightly wrong.) I guess the next step is to write a unit test of this situation. I assume there isn't one, since if there were it would fail. :-)
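A minimal sketch of the kind of regression test called for here, using a hypothetical stand-in selector (all names invented; the real test was added later with the #778 patches). It drives a selector whose query sets are all empty but whose happiness condition is met, and asserts that _loop()'s result still unpacks as a 2-tuple -- so against the buggy code it fails, as a regression test should:

    import unittest

    class FakeSelector(object):
        """Hypothetical stand-in for Tahoe2PeerSelector in the exact race
        described above: all three query sets are empty, yet enough shares
        are already placed to satisfy the happiness condition."""
        def __init__(self):
            self.use_peers = set(['peerA', 'peerB', 'peerC'])
            self.preexisting_shares = {}

        def _loop(self):
            return self.use_peers   # the buggy exit: a bare set

    class TestHappyWhileDisconnected(unittest.TestCase):
        def test_loop_result_unpacks_as_2_tuple(self):
            result = FakeSelector()._loop()
            # set_shareholders() performs exactly this unpacking; with the
            # bug present this raises ValueError: too many values to unpack
            used_peers, already_peers = result

    if __name__ == '__main__':
        unittest.main()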

But I'm going to stop here and work on other priorities (#778) now, so if anyone else wants to fix this then please go ahead!

Change History (4)

comment:1 Changed at 2010-02-15T20:25:54Z by davidsarah

  • Milestone changed from undecided to 1.6.1

comment:2 Changed at 2010-02-16T05:28:18Z by zooko

  • Milestone changed from 1.6.1 to 1.7.0

As far as I can tell, the bug only bites in a particular race condition: you achieve share-placement happiness at the same moment that you lose connections to all your storage servers. Not important enough to prioritize for v1.6.1, since there are other tickets we could focus on for that release.

comment:3 Changed at 2010-04-16T16:16:38Z by francois

  • Cc francois@… added

comment:4 Changed at 2010-05-16T06:19:45Z by zooko

  • Resolution set to fixed
  • Status changed from new to closed

Kevan's patches for #778 fixed this bug and also added a unit test that exercises this case.
