#392 closed enhancement (fixed)

pipeline upload segments to make upload faster

Reported by: warner
Owned by: warner
Priority: major
Milestone: 1.5.0
Component: code-performance
Version: 1.0.0
Keywords: speed
Cc:
Launchpad Bug:

Description

In ticket #252 we decided to reduce the max segment size from 1MiB to 128KiB. But this caused in-colo upload speed to drop by at least 50%.

We should see if we can pipeline two segments for upload, to get back the extra round-trip times that we lost with having more segments.

It's also possible that some of the slowdown is just from the extra overhead of computing more hashes, but I suspect the round-trip turnaround time more than the hashing overhead.

We need to do something similar for download, since the segsize change drastically reduced download speed as well.

Attachments (1)

pipeline.diff (14.1 KB) - added by warner at 2009-04-15T19:23:12Z.
patch to add pipelining to immutable upload

Change History (6)

comment:1 Changed at 2008-05-14T18:09:52Z by warner

Oh, and I just thought of the right place to do this too: in the WriteBucketProxy. It should be allowed to keep a Nagle-like cache of write vectors, and send them out in a batch when the cache gets larger than some particular size (that will coalesce small writes into a single call, reducing the number of round trips). In addition, it should be allowed to have multiple calls outstanding if the total amount of data that it has sent (and therefore might be in the transport buffer) is below some amount, say 128KiB. If k=3, then that should allow three segments to be on the wire at once, mitigating the slowdown due to round trips. As long as the window size is at least bandwidth*RTT, this should keep the pipe full.
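A minimal sketch of the Nagle-like cache idea, in plain Python without Twisted or foolscap. The class name CoalescingWriter, the send_batch callback, and the thresholds are hypothetical and chosen only for illustration; none of them come from the Tahoe code.

```python
class CoalescingWriter:
    """Buffer small (offset, data) write vectors and send them in batches.

    Hypothetical illustration: `send_batch` stands in for one remote call
    that carries a whole list of write vectors.
    """

    def __init__(self, send_batch, batch_threshold=16 * 1024):
        self._send_batch = send_batch
        self._threshold = batch_threshold
        self._pending = []        # buffered (offset, data) pairs
        self._pending_bytes = 0

    def write(self, offset, data):
        self._pending.append((offset, data))
        self._pending_bytes += len(data)
        # Send once the cache grows past the threshold.
        if self._pending_bytes > self._threshold:
            self.flush()

    def flush(self):
        # Push out whatever is buffered as a single call.
        if self._pending:
            batch, self._pending = self._pending, []
            self._pending_bytes = 0
            self._send_batch(batch)


calls = []
writer = CoalescingWriter(calls.append, batch_threshold=100)
for i in range(7):
    writer.write(i * 40, b"x" * 40)   # seven 40-byte writes
writer.flush()                         # final flush for the leftover write
print(len(calls), [len(batch) for batch in calls])
```

With these made-up sizes the seven writes go out as three calls instead of seven. The second half of the idea (several calls outstanding at once) is a separate mechanism and is not shown here.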

comment:2 Changed at 2008-06-01T22:09:39Z by warner

#320 is related, since the storage-server protocol changes we talked about would make it easier to implement the pipelining.

Changed at 2009-04-15T19:23:12Z by warner

patch to add pipelining to immutable upload

comment:3 Changed at 2009-04-15T20:10:20Z by warner

So, using the attached patch, I added pipelined writes to the immutable upload operation. The Pipeline class allows up to 50KB in the pipe before it starts blocking the sender. Specifically, the calls to WriteBucketProxy._write return defer.succeed until there is more than 50KB of unacknowledged data in the pipe, after which they return regular Deferreds until some of those writes get retired. A terminal flush() call causes the Upload to wait for the pipeline to drain before it is considered complete.
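The behaviour described above can be sketched roughly as follows. This is not the code from pipeline.diff: it uses asyncio instead of Twisted Deferreds so that it runs with only the standard library, and the method names add and flush, the size argument, and fake_remote_write are assumptions made for the example.

```python
import asyncio


class Pipeline:
    """Let the sender run ahead until `capacity` bytes are unacknowledged."""

    def __init__(self, capacity=50_000):
        self._capacity = capacity
        self._unacked = 0                 # bytes sent but not yet acknowledged
        self._tasks = []                  # writes started since the last flush
        self._changed = asyncio.Condition()

    async def add(self, size, send):
        """Start send() immediately; wait only while over capacity."""
        self._unacked += size
        self._tasks.append(asyncio.create_task(self._run(size, send)))
        async with self._changed:
            await self._changed.wait_for(
                lambda: self._unacked <= self._capacity)

    async def _run(self, size, send):
        try:
            await send()
        finally:
            # The write is retired: release its share of the window.
            async with self._changed:
                self._unacked -= size
                self._changed.notify_all()

    async def flush(self):
        """Wait for every outstanding write; re-raises the first failure."""
        tasks, self._tasks = self._tasks, []
        await asyncio.gather(*tasks)


async def demo():
    pipe = Pipeline(capacity=50_000)
    acked = []

    async def fake_remote_write():
        await asyncio.sleep(0.01)         # stand-in for one round trip
        acked.append(True)

    for _ in range(7):                    # seven small writes, 1000 bytes each
        await pipe.add(1_000, fake_remote_write)
    queued_without_waiting = len(acked) == 0
    await pipe.flush()
    return queued_without_waiting, len(acked)


result = asyncio.run(demo())
print(result)
```

In the demo all seven writes are queued before any acknowledgement arrives, and flush() then waits for all of them. A write that pushes the unacknowledged total above the capacity would instead make add() wait until earlier writes are retired.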

A quick performance test (in the same environments that we do the buildbot performance tests on: my home DSL line and tahoecs2 in colo) showed a significant improvement in the DSL per-file overhead, but only about a 10% improvement in the overall upload rate (for both DSL and colo).

Basically, the 7 writes used to write a small file (header, segment 0, crypttext_hashtree, block_hashtree, share_hashtree, uri_extension, close) are all put on the wire together, so they take bandwidth plus 1 RTT instead of bandwidth plus 7 RTT. Saving 6 RTT appears to gain us about 1.8 seconds over my DSL line. (My ping time to the servers is about 11ms, but then there's kernel/python/twisted/foolscap/tahoe overhead on top of that.)
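As a rough check on the arithmetic, the snippet below compares the waiting time for 7 serialized writes against 7 pipelined writes, using only the figures quoted above (7 writes, 11ms ping, about 1.8 seconds saved). The function name round_trip_cost is made up for this example, and transfer time is ignored.

```python
def round_trip_cost(n_writes, rtt, pipelined):
    """Seconds spent waiting for acknowledgements, ignoring transfer time."""
    return rtt if pipelined else n_writes * rtt


N_WRITES = 7            # header, segment 0, three hash trees, uri_extension, close
PING = 0.011            # seconds, the ping time quoted in the comment
OBSERVED_SAVING = 1.8   # seconds, the per-file saving quoted in the comment

# Saving predicted if each round trip cost only the bare ping time.
saving_at_ping = (round_trip_cost(N_WRITES, PING, pipelined=False)
                  - round_trip_cost(N_WRITES, PING, pipelined=True))

# Per-round-trip cost implied by the observed saving of 6 round trips.
implied_turnaround = OBSERVED_SAVING / (N_WRITES - 1)

print(f"saving at bare ping time: {saving_at_ping * 1000:.0f} ms")
print(f"implied turnaround per write: {implied_turnaround * 1000:.0f} ms")
```

The bare ping time accounts for only about 66 ms of saving, while the observed 1.8 seconds implies roughly 300 ms per round trip, which is consistent with most of the turnaround being software overhead rather than network latency.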

For a larger file, pipelining might increase the utilization of the wire, particularly if you have a "long fat" pipe (high bandwidth but high latency). However, with 10 shares going out at the same time, the wire is probably pretty full already: the ratio of interest is (segsize*N/k/BW) / RTT. You send N blocks for a single segment at once, then you wait for all the replies to come back, then generate the next blocks. If the time it takes to send a single block is greater than the server's turnaround time, then N-1 responses will be received before the last block is finished sending, so you've only got one RTT of idle time (while you wait for the last server to respond). Pipelining will fill this last RTT, but my guess is that this isn't much of a help, and that something else is needed to explain the performance hit we saw in colo when we moved to smaller segments.
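The ratio above can be evaluated for the DSL case as a small sketch. The segment size (128KiB), k=3, N=10, and the 11ms ping come from this ticket; the bandwidth is an assumption, since the real upstream rate of the DSL line is not stated, so the measured upload rate of about 23 kBps is used as a stand-in. The function name wire_busy_ratio is made up for this example.

```python
def wire_busy_ratio(segsize, n, k, bandwidth, rtt):
    """(segsize * N / k / BW) / RTT: send time per segment over idle time."""
    send_time = segsize * n / k / bandwidth   # seconds to push N blocks
    return send_time / rtt


SEGSIZE = 128 * 1024      # bytes, the reduced max segment size
N, K = 10, 3              # shares produced and shares needed
ASSUMED_BW = 23_000       # bytes/second, assumption: the measured upload rate
RTT = 0.011               # seconds, the quoted ping time

ratio = wire_busy_ratio(SEGSIZE, N, K, ASSUMED_BW, RTT)
print(f"send time is about {ratio:.0f}x the RTT")
```

Under these assumptions the time spent sending one segment's blocks is three orders of magnitude larger than one RTT, so filling the single idle RTT per segment can only recover a small fraction of the time. With a much faster link the ratio shrinks and the idle RTT matters more.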

DSL no pipelining:

TIME (startup): 2.36461615562 up, 0.719145059586 down
TIME (1x 200B): 2.38471603394 up, 0.734190940857 down
TIME (10x 200B): 21.7909920216 up, 8.98366594315 down
TIME (1MB): 45.8974239826 up, 5.21775698662 down
TIME (10MB): 449.196600914 up, 34.1318571568 down
upload per-file time: 2.179s
upload speed (1MB): 22.87kBps
upload speed (10MB): 22.37kBps

DSL with pipelining:

TIME (startup): 0.437352895737 up, 0.185742139816 down
TIME (1x 200B): 0.493880987167 up, 0.202013969421 down
TIME (10x 200B): 5.15211510658 up, 2.04516386986 down
TIME (1MB): 43.141931057 up, 2.09753513336 down
TIME (10MB): 416.777194977 up, 19.6058299541 down
upload per-file time: 0.515s
upload speed (1MB): 23.46kBps
upload speed (10MB): 24.02kBps

The in-colo tests showed roughly the same improvement to upload speed, but very little change to the per-file time. The RTT there is shorter (ping time is about 120us), which might explain the difference. But I think the slowdown lies elsewhere. In colo, pipelining shaves about 30ms off each file, and increases the overall upload speed by about 10%.

colo no pipelining:

TIME (startup): 0.29696393013 up, 0.0784759521484 down
TIME (1x 200B): 0.285771131516 up, 0.0790619850159 down
TIME (10x 200B): 3.23165798187 up, 0.849181175232 down
TIME (100x 200B): 31.7827451229 up, 8.95765590668 down
TIME (1MB): 1.00738477707 up, 0.347244977951 down
TIME (10MB): 7.12743496895 up, 2.9827849865 down
TIME (100MB): 70.9683670998 up, 25.6454920769 down
upload per-file time: 0.318s
upload per-file times-avg-RTT: 83.833386
upload per-file times-total-RTT: 20.958347
upload speed (1MB): 1.45MBps
upload speed (10MB): 1.47MBps
upload speed (100MB): 1.42MBps

colo with pipelining:

TIME (startup): 0.262734889984 up, 0.0758249759674 down
TIME (1x 200B): 0.271718025208 up, 0.0812950134277 down
TIME (10x 200B): 2.80361104012 up, 0.838641881943 down
TIME (100x 200B): 28.4790999889 up, 9.36092710495 down
TIME (1MB): 0.853738069534 up, 0.337486028671 down
TIME (10MB): 6.6658270359 up, 2.67381596565 down
TIME (100MB): 64.6233050823 up, 26.5593090057 down
upload per-file time: 0.285s
upload per-file times-avg-RTT: 77.205647
upload per-file times-total-RTT: 19.301412
upload speed (1MB): 1.76MBps
upload speed (10MB): 1.57MBps
upload speed (100MB): 1.55MBps
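
The summary figures in the four result blocks above can be turned into relative changes with a few lines of Python. All numbers are copied from those blocks; the helper name percent_change is made up for this example.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# (no pipelining, with pipelining), copied from the result blocks
upload_speed = {
    "DSL 1MB (kBps)": (22.87, 23.46),
    "DSL 10MB (kBps)": (22.37, 24.02),
    "colo 1MB (MBps)": (1.45, 1.76),
    "colo 10MB (MBps)": (1.47, 1.57),
    "colo 100MB (MBps)": (1.42, 1.55),
}
per_file_time = {
    "DSL (s)": (2.179, 0.515),
    "colo (s)": (0.318, 0.285),
}

speed_gain = {label: percent_change(*pair)
              for label, pair in upload_speed.items()}
time_change = {label: percent_change(*pair)
               for label, pair in per_file_time.items()}

for label, pct in speed_gain.items():
    print(f"upload speed, {label}: {pct:+.1f}%")
for label, pct in time_change.items():
    print(f"per-file time, {label}: {pct:+.1f}%")
```

The per-file time drops by roughly three quarters on DSL but only by about a tenth in colo, while the upload-speed gains vary from a few percent to about 20% depending on the file size.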

I want to run some more tests before landing this patch, to make sure it's really doing what I thought it should be doing. I'd also like to improve the automated speed-test to do a simple TCP transfer to measure the available upstream bandwidth, so we can compare tahoe's upload speed against the actual wire.

comment:4 Changed at 2009-05-18T23:46:26Z by warner

  • Milestone changed from eventually to 1.5.0
  • Resolution set to fixed
  • Status changed from new to closed

I pushed this patch anyway. I think it'll help, just not as much as I was hoping for.

comment:5 Changed at 2017-01-11T00:35:09Z by Brian Warner <warner@…>

In 5e1d464/trunk:

Merge branch PR392

closes #392
closes ticket:2860
