#548 new defect

mutable publish sends queries to servers that have already been asked

Reported by: warner
Owned by:
Priority: major
Milestone: soon
Component: code-mutable
Version: 1.2.0
Keywords: mutable availability upload ucwe performance
Cc:
Launchpad Bug:

Description

Another problem that appeared in #546 is in the mapupdate(MODE_WRITE) code, when it is run on a servermap that has already been updated once. This occurs when the mutable file's modify method is used and the first attempt fails because of an UncoordinatedWriteError. This triggers a retry, in which the servermap is updated again, the (new) current version is retrieved, the modifier function is applied again, and (if anything changed) a new publish is performed.

When this happens, the servermap is not empty: it already has a bunch of shares from either the previous mapupdate or from the publish write requests returning.

The mapupdate code starts by sending out N queries to the "must query" servers: those which we already know have a share of some sort, or which we've queried in the past. These come back, and we get a boundary map of "1111111111". To find the real edge we must send out more queries (hoping to get a map of 1111111111000).
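As a rough illustration of the boundary maps above, here is a minimal sketch of how an edge might be detected. It is not the actual mapupdate code: the function name find_boundary, the "?" marker for unanswered servers, and the epsilon default of 3 (inferred from the "000" tail and the "1000" pattern in the notes) are all assumptions.

```python
def find_boundary(states, epsilon=3):
    """Return the index of the last share-holding server that is followed
    by `epsilon` consecutive share-less answers, or None if no such run
    has been seen yet.

    `states` is a string in permuted server order (hypothetical encoding):
      "1" = server answered and holds a share
      "0" = server answered and holds no share
      "?" = server has not answered yet
    """
    last_found = None   # index of the most recent "1"
    empty_run = 0       # consecutive "0" answers since that "1"
    for i, state in enumerate(states):
        if state == "1":
            last_found = i
            empty_run = 0
        elif state == "0":
            # Empty answers only count once at least one share was seen.
            if last_found is not None:
                empty_run += 1
                if empty_run >= epsilon:
                    return last_found
        else:
            # Assumption: an unanswered server interrupts the run.
            empty_run = 0
    return None


# Ten shares and nothing after them: no edge yet, more queries needed.
print(find_boundary("1111111111"))
# Three empty answers after the last share: edge found at index 9.
print(find_boundary("1111111111000"))
```

Under these assumptions a map such as "001000" also reports an edge (at index 2) even if later servers hold shares, which resembles the false boundary recorded in the notes below.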

The bug is that the code sends out the next batch of queries to the same servers that it has already asked. It looks like the new queries are chosen without consulting the list of servers to which queries have already been sent. I think this is because those first queries were sent to the must_query list.
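One possible shape of a fix, sketched under assumptions: when picking the next batch, skip every server that has already been sent a query, including those reached through the must_query list. The function next_queries and its parameters are hypothetical and do not come from the Tahoe-LAFS source. The server names are the abbreviated ones from the notes below, and only the 12 names that appear there are used.

```python
def next_queries(permuted_servers, already_queried, batch_size):
    """Pick up to `batch_size` servers, in permuted order, that have not
    been sent a query yet (hypothetical helper, not the real API)."""
    asked = set(already_queried)
    batch = []
    for server in permuted_servers:
        if len(batch) >= batch_size:
            break
        if server in asked:
            # Already queried, e.g. via the must_query list: skip it.
            continue
        batch.append(server)
        asked.add(server)
    return batch


# Permuted order as recorded in the notes.
permuted = ["pfav", "mgq3", "cfb7", "kuzy", "6y7v", "ehnf",
            "5xry", "b3yc", "qau2", "6j2m", "7vi2", "bc5x"]
# The ten servers that answered the publish and were queried first.
must_query = ["cfb7", "5xry", "pfav", "6j2m", "mgq3",
              "kuzy", "6y7v", "ehnf", "b3yc", "qau2"]

print(next_queries(permuted, must_query, 5))
```

With this filter the second batch goes to 7vi2 and bc5x rather than repeating pfav, mgq3, cfb7, kuzy and 6y7v.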

my cryptic notes:

si=njpk4lit4ns3yj7xmgszheh62q
tahoe rm testgrid:recentdir/recent.cd5bb67746f0c3538175c768456c37f3   -> 3   [si=njpk], incident
 parent list [njpk4]
  mapupdate(MODE_READ)  e4515
  retrieve  seq2 e4593  sh0@cfb7, sh1@5xry, sh3@6j2m
 parent modify -> read, write
  mapupdate(MODE_WRITE) e4634  false-boundary
   tx: ehnf mgq3 kuzy cfb7 pfav 6y7v,  5xry b3yc qau2 6j2m 7vi2 bc5x
   rx: kuzy 6y7v mgq3 ehnf cfb7(sh0) pfav b3yc 5xry(sh1) 6j2m(sh3), 001000, boundary??
    pfav mgq3 cfb7(sh0) kuzy 6y7v ehnf 5xry(sh1) b3yc qau2(?) 6j2m(sh3) | 7vi2(?) bc5x(?)
    BROKEN: why did this count as a boundary?
    oh, [cfb7(sh0) kuzy 6y7v ehnf] = 1000
  retrieve seq2 e4720   sh0@cfb7, sh1@5xry, sh3@6j2m
  publish seq3 e4763
    sh0 to [cfb7f3lh], sh1 to [5xryfgeq], sh2 to [pfavfmv3], sh3 to [6j2mb464],
    sh4 to [mgq3xx3t], sh5 to [kuzya6zx], sh6 to [6y7vpksf], sh7 to [ehnfmjtc],
    sh8 to [b3yclx4f], sh9 to [qau2ui2a]
   qau2 has surprising sh2
  retry
   mapupdate(MODE_WRITE) e4832
    sends 10 queries to the publish answerers
    sends 5 queries, to servers already asked: pfav mgq3 cfb7 kuzy 6y7v (first 5 in permuted order)
    log ends

Change History (7)

comment:1 Changed at 2010-03-25T01:16:47Z by davidsarah

  • Keywords availability upload ucwe added
  • Milestone changed from undecided to 1.7.0

comment:2 Changed at 2010-05-27T22:06:10Z by zooko

  • Milestone changed from 1.7.0 to 1.8.0

It's really bothering me that mutable file upload and download behavior is so finicky, buggy, inefficient, hard to understand, different from immutable file upload and download behavior, etc. So I'm putting a bunch of tickets into the "1.8" Milestone. I am not, however, at this time, volunteering to work on these tickets, so it might be a mistake to put them into the 1.8 Milestone, but I really hope that someone else will volunteer or that I will decide to do it myself. :-)

comment:3 Changed at 2010-08-10T04:15:33Z by davidsarah

  • Keywords mutable added
  • Milestone changed from 1.8.0 to 1.9.0

comment:4 Changed at 2010-12-16T00:59:14Z by davidsarah

  • Keywords performance added

comment:5 Changed at 2011-07-16T21:01:29Z by davidsarah

  • Milestone changed from 1.9.0 to soon

comment:6 Changed at 2011-07-16T21:01:58Z by zooko

  • Milestone changed from soon to 1.9.0

This appears to be an efficiency improvement and not a correctness issue.

comment:7 Changed at 2011-07-16T23:43:01Z by davidsarah

  • Milestone changed from 1.9.0 to soon