#375 new enhancement

include "retry backoff limit" in introducer announcements?

Reported by: warner Owned by:
Priority: minor Milestone: undecided
Component: code-network Version: 1.0.0
Keywords: backoff Cc:
Launchpad Bug:

Description

The recent #374 fix (trigger thundering-herd reconnections) got Peter and me thinking: why are we using Foolscap's default one-hour backoff cap for storage nodes that are supposed to always be available? Should we use a shorter cap?

Exponential backoff is a nice technique that helps deal gracefully with overload situations. The default one-hour limit is a compromise between wanting to lower the traffic generated by clients retrying against systems that are down for a long time, and wanting to decrease the reconnection latency. I wouldn't want Foolscap's default cap to be much lower: at the extreme, one connection attempt per second from each client would be a lot of traffic. It feels like one connection attempt per hour per client won't hurt (too much), even if there are a lot of clients doing the connecting. My built-in assumption here is that there won't be *too* many clients connecting: 1000 clients would mean a connection attempt every 3.6 seconds, 10k clients means about 3Hz, 100k clients means about 30Hz, and 1M clients means about 300Hz. So I think I'm ok with one hour for up to 10k clients.
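
The numbers above can be reproduced with a minimal sketch. It assumes every client has already backed off all the way to the cap and that attempts are spread evenly across the interval, both of which are simplifications. The function names are hypothetical and are not part of Foolscap or Tahoe.

```python
# Rough steady-state load on one unreachable server, assuming each client
# makes exactly one connection attempt per backoff-cap interval and that
# the attempts are evenly spread in time (an idealisation).

ONE_HOUR = 3600.0  # the default cap discussed in this ticket, in seconds


def attempts_per_second(num_clients, cap_seconds=ONE_HOUR):
    """Aggregate connection attempts per second (Hz) seen by the server."""
    return num_clients / cap_seconds


def seconds_between_attempts(num_clients, cap_seconds=ONE_HOUR):
    """Average gap between successive attempts seen by the server."""
    return cap_seconds / num_clients


if __name__ == "__main__":
    for n in (1000, 10000, 100000, 1000000):
        print("%8d clients: %8.2f Hz, one attempt every %.4f s"
              % (n, attempts_per_second(n), seconds_between_attempts(n)))
```

The same functions show the cost of a shorter cap: with a 5-minute cap (cap_seconds=300), the per-server rate is 12 times the one-hour figure for the same number of clients.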

So we were kicking around the idea that the storage servers could publish a recommended maximum backoff time in their service announcements. If the server is supposed to be up most of the time (e.g. it is running in colo), then we have it publish a shorter backoff limit, maybe 5 or 10 minutes. Friendnet nodes that are more intermittent can stick to the default one hour.
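
To make the idea concrete, here is a rough sketch of how a client might pick its cap from an announcement. Everything in it is an assumption for illustration: the "max-retry-delay" key, the dict-shaped announcement, the 60-second floor, and the doubling factor are all hypothetical, and none of this is the existing introducer format or Foolscap's reconnection code. The client clamps the advertised value so that a misconfigured server cannot ask for very rapid retries or for a cap longer than the default.

```python
# Hypothetical sketch: choose a per-server backoff cap from an announcement.

DEFAULT_CAP = 3600.0   # one-hour default discussed in this ticket
MIN_CAP = 60.0         # assumed floor, so a server cannot request a flood


def effective_cap(announcement, default=DEFAULT_CAP, floor=MIN_CAP):
    """Return the backoff cap (seconds) to use for this server.

    `announcement` is assumed to be a dict; "max-retry-delay" is a
    made-up key holding the server's recommended cap in seconds.
    Missing or invalid values fall back to the default.
    """
    value = announcement.get("max-retry-delay")
    if value is None:
        return default
    try:
        cap = float(value)
    except (TypeError, ValueError):
        return default
    if cap <= 0:
        return default
    # Clamp into [floor, default].
    return max(floor, min(cap, default))


def backoff_delays(initial, factor, cap, attempts):
    """List the delays before each retry: exponential growth up to `cap`."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(min(delay, cap))
        delay *= factor
    return delays


if __name__ == "__main__":
    colo = {"max-retry-delay": 300}   # always-up server asks for 5 minutes
    friendnet = {}                    # intermittent node publishes nothing
    for name, ann in (("colo", colo), ("friendnet", friendnet)):
        cap = effective_cap(ann)
        print(name, cap, backoff_delays(1.0, 2.0, cap, 14))
```

Clamping on the client side is one possible design choice; it keeps the worst-case load bounded by something the client controls rather than by whatever a server chooses to advertise.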

Not sure if this is a good idea or not, but I wanted to capture it.

Change History (1)

comment:1 Changed at 2008-06-01T20:57:51Z by warner

  • Milestone changed from eventually to undecided