#68 closed enhancement (fixed)

implement distributed introduction, remove Introducer as a single point of failure

Reported by: lvo
Owned by: warner
Priority: major
Milestone: 1.12.0
Component: code-network
Version: 0.2.0
Keywords: scalability availability introduction gsoc docs anti-censorship test-needed research i2p-collab
Cc: zooko, killyourtv@…, mmoya@…, K1773R@…, clashthebunny@…, leif@…, skydrome@…, tahoe-lafs-trac@…, tahoe-lafs.org@…, zl29ah@…, vladimir@…

Description (last modified by zooko)

I am quite sure you are aware of the problem [of] an introducer [...] crash bringing everything down. I read your roadmap.txt but I didn't find anything specific to address this. May I suggest using introducers.furl [...] where multiple entries can be used and the information is updated to all introducers at the same time when a peer makes an update.

Also, upon adding a new introducer, there should be a way to discover all the info currently on the existing introducer. I think I am making this part sound more trivial than it is.

Thanks Lu

Attachments (26)

DualInroducerScenario1.jpeg (75.0 KB) - added by writefaruq at 2010-05-27T14:30:54Z.
Dual Introducer Scenario1 (26/5/10)
DualInroducerScenario1-Modified.png (97.2 KB) - added by writefaruq at 2010-05-27T14:31:28Z.
Dual Introducer Scenario 1 Modified (27/5/10)
client(can-subscribe-to-multi-introducer-backward-compat).dpatch (5.1 KB) - added by writefaruq at 2010-06-12T18:06:30Z.
Given a file "introducers" in the client basedir, each line containing a single introducer_furl, this patch can subscribe to all of them while keeping backward compatibility
connected_to_introducers.png (29.5 KB) - added by writefaruq at 2010-06-17T20:04:15Z.
Client's welcome page shows a list of connected introducers.
client(can-show-connected-introducers-in-welcome-page).dpatch (5.7 KB) - added by writefaruq at 2010-06-17T20:06:06Z.
Serving the connection status to multiple introducers, still backward compatible
root(can-show-connected-introducers-in-welcome-page).dpatch (1.0 KB) - added by writefaruq at 2010-06-17T20:07:39Z.
welcome(can-show-connected-introducers-in-welcome-page).dpatch (1.4 KB) - added by writefaruq at 2010-06-17T20:08:16Z.
test_multi_introducers.py (1.2 KB) - added by writefaruq at 2010-07-05T17:45:07Z.
Demo test file that checks if the number of introducer_clients is same as the number of introducers_furls found in "introducers" cfg file
test_root.py (1009 bytes) - added by writefaruq at 2010-07-07T22:54:30Z.
corrected test for checking the use of introducer_furl by root.py
enable_client_with_multi_introducer.dpatch (10.0 KB) - added by writefaruq at 2010-07-09T18:06:45Z.
Revised patch for client.py web/root.py web/welcome.xhtml
test-run-after-client_py-web-root_py-welcome_xhtml-patched.log (79.9 KB) - added by writefaruq at 2010-07-09T18:16:36Z.
Test results after applying the previous enable-client-* patch
multiple-introducers-changes-in-architecture-configuration-running.dpatch (13.2 KB) - added by writefaruq at 2010-07-16T18:28:34Z.
doc changes for multiple introducers
test_root.2.py (877 bytes) - added by writefaruq at 2010-07-19T11:52:52Z.
corrected test for checking the use of introducer_furls by root.py (multiple introducer version)
test_introducers_cfg.py (1.1 KB) - added by writefaruq at 2010-07-22T12:12:52Z.
Check if a new "introducers" cfg file can be created and tahoe.cfg's introducer_furl can be written in this file
test_multi_introducers.2.py (640 bytes) - added by writefaruq at 2010-07-24T14:29:26Z.
Check if the Client's number of introducer_clients equals the number of furls in the "introducers" file
test_introducers_cfg.2.py (1.0 KB) - added by writefaruq at 2010-07-31T17:09:38Z.
code refined by pyflakes
test_multi_introducers.3.py (544 bytes) - added by writefaruq at 2010-07-31T17:10:07Z.
code refined by pyflakes
test_root.3.py (850 bytes) - added by writefaruq at 2010-07-31T17:10:36Z.
code refined by pyflakes
test_multi_introducers.4.py (3.8 KB) - added by writefaruq at 2010-08-04T14:54:39Z.
Merged all tests
multiple-introducer-client-side-002.dpatch (5.3 KB) - added by writefaruq at 2010-08-04T15:02:48Z.
multi-introducers doc patch
multiple-introducer-client-side-001.dpatch (11.1 KB) - added by writefaruq at 2010-08-07T10:36:51Z.
Client side code changes combined together, fixed warn_flag error
multiple-introducer-client-side-001-x1.dpatch (2.0 KB) - added by writefaruq at 2010-08-07T20:00:21Z.
Fixed warn_flag error
multiple-introducer-client-side-001-x2.dpatch (4.9 KB) - added by writefaruq at 2010-08-09T07:26:32Z.
tweaks to pass the full-tests
ticket68-multi-introducer.tar.gz (1.2 MB) - added by writefaruq at 2010-10-24T05:43:06Z.
A snapshot of working repository
incident-2010-10-31-082948-tx5qoxy.flog.bz2 (7.7 KB) - added by Myckel at 2010-10-31T07:59:59Z.
First incident report (after shutdown, before making dir)
incident-2010-10-31-083037-4o3degq.flog.bz2 (8.4 KB) - added by Myckel at 2010-10-31T08:00:48Z.
2nd incident log (after mkdir)


Change History (151)

comment:1 Changed at 2007-06-29T18:32:46Z by zooko

  • Owner changed from somebody to zooko
  • Status changed from new to assigned

I would like to make a fully decentralized introduction scheme, such as the one I had in Mnet. Basically, every node would be an Introducer. This doesn't scale up in terms of number of nodes in the network unless we add some cleverness to it, but currently Tahoe networks are neither capable of scaling up to more than 100 nodes nor required to scale up to more than 100 nodes. (See UseCases.)

I will add to source:roadmap.txt about this issue.

comment:2 Changed at 2007-06-29T23:44:21Z by zooko

I updated source:roadmap.txt . I'm tempted to think we should go directly to connection management v4 and not stop at v3.

comment:3 Changed at 2007-07-01T03:34:46Z by zooko

Sam Stoller mentioned that it would be cool if peer nodes were discoverable through Bonjour. I agree!

comment:4 Changed at 2007-07-02T19:43:27Z by warner

  • Milestone set to release 1.0
  • Summary changed from One introducer = single point of failure to implement distributed introduction, remove Introducer as a single point of failure

This would be nice, but I think it's a lower priority than the connection management. I think we'll hit scaling problems earlier because of the number of connections held open by client nodes (windows boxes with minimal memory and python-vs-windows limitations) than because of the number of connections held open by a central Introducer (which will be running on a well-provisioned unix box, with plenty of memory and bandwidth, in a professionally-run colo facility). We can also introduce multiple central Introducers without too much effort, which would make them even more available.

Also note that an Introducer failure will prevent new clients from seeing the mesh, but will not prevent already-running clients from continuing to use each other, so such a failure is somewhat graceful.

I'm also thinking that relay is higher priority than distributed introduction.

Also, I'm thinking that we may want to provide for some sort of private mesh in the future, which will mean creating some sort of "membership badge" credentials, which would need to be checked at connection establishment (or lease-request) time, and we might want to at least lay out some requirements for that before building the distributed introduction scheme.

That said, if we choose to build a single global mesh, I very much like the gossip approach to learning about other nodes, and zeroconf/Bonjour would also be pretty slick (although I can only see it being useful when there is already a tahoe node on your local LAN), so I would like to see those implemented sooner or later.

-Brian

comment:5 Changed at 2007-08-14T18:54:45Z by warner

  • Component changed from code to code-network

comment:6 Changed at 2007-08-30T23:43:37Z by warner

Zooko and I hashed out a good scheme to do this while I was in Boulder this week. Here's the plan:

  • nodes start up in a network of size one
  • the node's UI offers a button labeled "Invite a Friend", which does some internal setup and emits a FURL to email/IM/paste to your friend
  • your friend's node's UI has a button labeled "Accept an Invitation" which accepts this FURL.
  • accepting an invitation causes the two networks (yours and your friend's) to be merged.
    • in a future release, they won't necessarily be merged, but for now the mesh is fully connected

With this approach, there is no single introducer. In addition, it enables the following interesting properties:

  • your node will remember who invited them to join. Each node will know the pet name path from themselves to every other node in the mesh.
  • the node will have a UI to show you how much storage is being used by whom, and options to impose individual quotas, or cut them off entirely

The vdrive server is still an outstanding question: until we get distributed dirnodes (#115), each dirnode will still be attached to a single host, which needs to be visible to anyone who's interested in reading the directory. So our first release that removes the Introducer will probably retain the vdrive server, and we'll have to figure out a reasonable UI that handles this.

comment:7 Changed at 2007-08-31T01:17:41Z by warner

In one of our current designs, the PersonalIntroducer held by each node on each other node (not necessarily reified as a distinct Foolscap-Referenceable object, but that would be an easy implementation) would have the following API:

class RIPersonalIntroducer(RemoteInterface):
    def get_storage_server():
        return RIPersonalStorageServer

    def tell_me_about_peers(i_already_know_about=ListOf(PeerIdentifier)):
        return ListOf(RIPersonalIntroducer)

    def please_meet(my_petname_for_them=str, them=PeerIdentifier):
        return RIPersonalIntroducer

and RIPersonalStorageServer would have the same API as the current RIStorageServer, with allocate_buckets, etc.

tell_me_about_peers would use the provided list to filter out all peers that the asker already knows about, and would then go to all of the remaining peers with a please_meet message to produce new RIPersonalIntroducer facets for the asker, then return a list of these facets. This is the place where Horton will go: until we get that, each node that does an introduction gets to take advantage of the facet that ought to be reserved for the asker (i.e. the introductee is vulnerable to the introducer). With Horton, the same attack exists, but the two nodes will see different identifiers for the MitM, so that if Bob ever comes to learn about Carol through a different path, he will perceive her as being different than the pseudoCarol that Alice gave him.

With a maximally transparent Horton built in to Foolscap, the tell_me_about_peers method just returns a list of Alice's existing RIPersonalIntroducer proxies, and Foolscap will do the Horton work to transform them into Bob-oriented proxies. Also, the please_meet method would move into the customized Stub class (where it would behave much the same way).
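The pre-Horton behaviour of tell_me_about_peers described above can be sketched in plain Python, without Foolscap. This is only an in-memory illustration: the Node class, the facet_for helper, the petnames dictionary and the petname string are hypothetical names invented for the sketch, and a "facet" is modelled as an object tied to one (owner, asker) pair.

```python
class Node:
    """A node in the mesh (hypothetical in-memory stand-in, no networking)."""

    def __init__(self, peer_id):
        self.peer_id = peer_id
        self.peers = {}      # peer_id -> Node, the nodes this node already knows
        self.petnames = {}   # peer_id -> petname under which a peer was introduced

    def facet_for(self, asker_id):
        # The facet of this node that is reserved for one particular asker.
        return PersonalIntroducer(self, asker_id)


class PersonalIntroducer:
    def __init__(self, owner, asker_id):
        self.owner = owner
        self.asker_id = asker_id

    def please_meet(self, my_petname_for_them, them):
        # Remember how the newcomer was introduced, then hand back a facet for it.
        self.owner.petnames[them] = my_petname_for_them
        return self.owner.facet_for(them)

    def tell_me_about_peers(self, i_already_know_about):
        # Filter out everything the asker already knows (and the obvious two).
        skip = set(i_already_know_about) | {self.asker_id, self.owner.peer_id}
        new_facets = []
        for peer_id, peer in sorted(self.owner.peers.items()):
            if peer_id in skip:
                continue
            # Go to each remaining peer with a please_meet message for the asker.
            introduction = peer.facet_for(self.owner.peer_id)
            petname = "friend of %s" % self.owner.peer_id
            new_facets.append(introduction.please_meet(petname, self.asker_id))
        return new_facets


alice, bob, carol, dave = (Node(n) for n in ("alice", "bob", "carol", "dave"))
alice.peers = {"bob": bob, "carol": carol, "dave": dave}

# Bob already knows Carol, so Alice only introduces him to Dave.
facets = alice.facet_for("bob").tell_me_about_peers(["carol"])
print([(f.owner.peer_id, f.asker_id) for f in facets])  # [('dave', 'bob')]
```

Note that in this sketch Alice sits in the middle of every introduction, which is exactly the position the comment says Horton is meant to make detectable.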

comment:8 Changed at 2007-11-13T18:34:52Z by zooko

This is an important feature, but I don't think we are going to get it done in the next six weeks, so I'm putting it in Milestone 1.0.

comment:9 Changed at 2007-12-18T00:08:15Z by zooko

  • Keywords scaling added

comment:10 Changed at 2007-12-18T00:09:21Z by zooko

  • Keywords scalability added

comment:11 Changed at 2008-01-22T20:33:57Z by zooko

Here is a simple scheme for decentralized introduction. It probably scales up at least as well as the rest of our current network architecture does — i.e. it is scalable enough for now.

First, implement #271 — "subscriber-only introducer client". DONE

Second, make announcement idempotent in the introducer — i.e. make it so that if a node announces themselves when they are already in the introducer's set of announced nodes, that the introducer ignores the announcement. (This makes sense, anyway, because the introducer doesn't need to inform any subscribers about the re-announcement since the clients will already have heard the earlier announcement from the introducer, and the only time a node would announce itself redundantly would be if that node were buggy.) DONE

Third, make "introducers" a class of publishable, subscribable thing like storage server and read-only storage server and upload helper (as per #271).

Fourth, make all publishers (introducer clients that send announcements) send their announcements to an evenly distributed subset of the introducers, namely the "Chord fingers": the introducer halfway around the circle from the publisher, plus the one a quarter of the way around the circle, etc.

Fifth, tell each introducer to subscribe to the introducer-announcements of a small set of other introducers — again choosing the Chord fingers — and whenever the introducer hears announcements from the introducers that it subscribes to, then it announces those announcements themselves, just as if it had just heard them from a client. (Of course, it still ignores any announcements of nodes which it has previously announced, as above.)

Now the load of handling introductions is evenly spread among all introducers, and there is no Single Point of Failure/Single Point of Load.

Each introducer receives log(N) redundant announcements of each new node, where N is the total number of introducers in the system.
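A minimal simulation of the fourth and fifth steps, under simplifying assumptions that are not in the ticket: introducers and the publisher are identified by integer positions on a ring whose size is a power of two, and the names chord_fingers, Introducer and RING_SIZE are made up for the sketch.

```python
def chord_fingers(position, ring_size):
    """Ring positions 1/2, 1/4, 1/8, ... of the way around from `position`."""
    fingers = []
    step = ring_size // 2
    while step >= 1:
        fingers.append((position + step) % ring_size)
        step //= 2
    return fingers


class Introducer:
    def __init__(self):
        self.announced = set()   # node ids this introducer has announced
        self.subscribers = []    # introducers subscribed to our announcements
        self.received = 0        # announcements received, duplicates included

    def announce(self, node_id):
        self.received += 1
        if node_id in self.announced:
            return               # idempotent: a re-announcement is ignored
        self.announced.add(node_id)
        for subscriber in self.subscribers:
            subscriber.announce(node_id)   # re-announce to our subscribers


RING_SIZE = 16
ring = [Introducer() for _ in range(RING_SIZE)]

# Each introducer subscribes to the introducers at its own Chord fingers.
for position, me in enumerate(ring):
    for finger in chord_fingers(position, RING_SIZE):
        ring[finger].subscribers.append(me)

# A publisher sitting at position 0 announces itself to its fingers only.
for finger in chord_fingers(0, RING_SIZE):
    ring[finger].announce("node-A")

print(chord_fingers(0, RING_SIZE))                  # [8, 4, 2, 1]
print(all("node-A" in i.announced for i in ring))   # True
print(sorted({i.received for i in ring}))           # [4, 5]
```

With 16 introducers every introducer hears the announcement log2(16) = 4 times from the introducers it subscribes to, and the four that are also the publisher's own fingers hear it once more directly.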

Last edited at 2012-11-29T17:30:03Z by zooko

comment:12 Changed at 2008-02-27T20:30:11Z by zooko

See #295 for how to add access control for the authority to act as a server and the authority to act as a client on top of distributed introduction.

comment:13 Changed at 2008-03-21T22:28:42Z by zooko

  • Milestone changed from 1.0 to undecided

comment:14 Changed at 2008-04-15T04:09:54Z by zooko

  • Owner changed from zooko to nobody
  • Status changed from assigned to new

comment:15 Changed at 2009-08-04T15:20:31Z by zooko

This review of Tahoe-LAFS on arstechnica.com reminds me that while issue #68 seems relatively non-urgent to me because I know how little the grid relies on the introducer and how easy it is to replicate introducers, it would be much better if we could simply say "the grid is fully decentralized" and then introductory articles like this one could optimize out a whole paragraph describing the introducer.

http://allmydata.org/pipermail/tahoe-dev/2009-August/002509.html

Also, fixing this ticket would be fun. Someone should do it. :-)

comment:16 Changed at 2009-12-13T02:52:14Z by davidsarah

  • Keywords availability added; scaling removed

comment:17 Changed at 2009-12-13T04:04:58Z by davidsarah

  • Keywords introducer added

comment:18 Changed at 2010-03-12T17:42:29Z by davidsarah

  • Keywords gsoc added

comment:19 Changed at 2010-03-12T23:45:41Z by davidsarah

  • Keywords introduction added; introducer removed

comment:20 Changed at 2010-03-16T16:50:37Z by zooko

To get started on this, see src/allmydata/introducer/client.py and src/allmydata/introducer/server.py. Each of those files is fairly small and you should be able to read through them both and understand the current implementation. See also src/allmydata/introducer/interfaces.py which defines the interfaces between the components.

Changed at 2010-05-27T14:30:54Z by writefaruq

Dual Introducer Scenario1 (26/5/10)

Changed at 2010-05-27T14:31:28Z by writefaruq

Dual Introducer Scenario 1 Modified (27/5/10)

comment:21 Changed at 2010-05-28T15:18:03Z by writefaruq

Snapshot A: Client1 and Client2 are connected to Introducers X and Y, but Client3 is connected only to Introducer Y.

Snapshot B: Introducer Y goes down. Client4 joins, configured to talk to Introducers X and Y. Client3 has no knowledge of Client4 and vice versa.

Snapshot C: Introducer X goes down and Introducer Y comes back up, so all clients come to know about each other.

The main target of this scenario is to enable clients to talk to multiple introducers.
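The three snapshots can be replayed with a small model. It assumes, purely for illustration, that every client connected to a live introducer learns about every other client connected to that same introducer, and that clients keep what they have learned; apply_snapshot and the dictionaries are hypothetical names.

```python
def apply_snapshot(knows, connections, live_introducers):
    """Clients sharing a live introducer learn about each other."""
    for introducer in live_introducers:
        members = {c for c, intros in connections.items() if introducer in intros}
        for client in members:
            knows.setdefault(client, set()).update(members - {client})
    return knows


connections = {"Client1": {"X", "Y"}, "Client2": {"X", "Y"}, "Client3": {"Y"}}
knows = {}

# Snapshot A: both introducers are up.
apply_snapshot(knows, connections, {"X", "Y"})

# Snapshot B: Y is down and Client4 joins, configured for X and Y.
connections["Client4"] = {"X", "Y"}
apply_snapshot(knows, connections, {"X"})
after_b = {client: set(peers) for client, peers in knows.items()}

# Snapshot C: X is down and Y is back up.
apply_snapshot(knows, connections, {"Y"})

print("Client4" in after_b["Client3"])   # False
print(sorted(knows["Client3"]))          # ['Client1', 'Client2', 'Client4']
```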

Changed at 2010-06-12T18:06:30Z by writefaruq

Given a file "introducers" in the client basedir, each line containing a single introducer_furl, this patch can subscribe to all of them while keeping backward compatibility

comment:22 Changed at 2010-06-12T18:13:09Z by writefaruq

Backward compatibility is maintained by:

  • tahoe.cfg can have "introducer_furl" in client section as before.
  • If the "introducers" configuration file is not found, it will work as before, i.e. with one introducer from tahoe.cfg

Note this patch does not update client's webui with all connected introducers.

comment:23 Changed at 2010-06-13T18:58:45Z by zooko

Faruq: glad to see this patch! Okay here are my comments.

+        # keep self.introducer_furl intact to break any reference to it       

What does this comment mean? Do you mean keep it in order not to break any reference to it?

+            for introducer_furl in  f.read().split('\n'):
+                if (introducer_furl == '') or (introducer_furl == '\n'):

It can't be equal to '\n' after a .split('\n'). Maybe change this to:

+            for introducer_furl in  f.read().split('\n'):
+                if not introducer_furl.strip():
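A standalone sketch of the suggested check, showing why the comparison with '\n' can never match and how strip() also drops whitespace-only lines. The parse_introducers name and the placeholder furls are hypothetical, not taken from the patch.

```python
def parse_introducers(text):
    """Return the non-blank lines of an "introducers" file as a list of furls."""
    return [line.strip() for line in text.split("\n") if line.strip()]


sample = (
    "pb://tubid1@example.invalid:1234/introducer\n"
    "\n"
    "   \n"
    "pb://tubid2@example.invalid:5678/introducer\n"
)

# split('\n') consumes the newlines, so no piece is ever equal to '\n'.
print("\n" in sample.split("\n"))      # False
print(len(parse_introducers(sample)))  # 2
```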

Now this code needs tests. Let's set this code aside, write a unit test which turns red, and then put this patch back into place and see if it turns the unit test green. So the unit test could, for example, populate the "introducers" file with two introducers, then instantiate the Client object (from src/allmydata/client.py), then invoke some method of that Client object which it will handle correctly only if it knows about both of the introducers.

Oh, I've got to go to lunch. I'll look at this more later!

comment:24 follow-up: Changed at 2010-06-15T01:48:46Z by writefaruq

Thanks for corrections. Regarding the reference, that's my intent, not to break any reference to it. If this code is fine, I'd like to add another patch that changes web/root.py and web/welcome.xhtml to show the connected introducers etc.

Changed at 2010-06-17T20:04:15Z by writefaruq

Client's welcome page shows a list of connected introducers.

Changed at 2010-06-17T20:06:06Z by writefaruq

Serving the connection status to multiple introducers, still backward compatible

comment:25 Changed at 2010-06-17T20:18:45Z by writefaruq

These patches (a single patch would probably be better) fetch the connection status of multiple introducers in a somewhat crude way. Tested by enabling and disabling introducers. These patches are also backward compatible, not breaking any reference to the old connected_to_introducer(), but new code should call connected_to_introducers(), which also supplies the status of the single introducer.

comment:26 Changed at 2010-06-22T01:11:30Z by zooko

Nice work! Next, please write a unit test of these patches. One unit test should verify that the client learns about a server when that server is announced to one introducer and also when that server is announced to the other introducer. The unit test should use a "mock IntroducerClient class" to test that code that your patch changed in src/allmydata/client.py@4193#L173. The idea is that the code in src/allmydata/client.py thinks that it is instantiating an instance of IntroducerClient, but actually the test code has set it up so that when the code under test instantiates IntroducerClient() then instead it gets an instance of the mock introducer client.

You can accomplish this using the Python mock library's mock.patch decorator. You can copy the way we use mock.patch in other places in our tests if you like to learn by code copying (I like to learn that way).

http://www.voidspace.org.uk/python/mock/

Changed at 2010-07-05T17:45:07Z by writefaruq

Demo test file that checks if the number of introducer_clients is same as the number of introducers_furls found in "introducers" cfg file

comment:27 Changed at 2010-07-05T19:25:36Z by zooko

Nice work! Now that there is a unit test we can start thinking about actually committing these patches to trunk.

This test would notice if the code under test failed to read the .tahoe/introducers config file correctly or failed to create an IntroducerClient for each one, right?

Now can you write a test (or extend the test you already wrote) to notice if the code under test failed to subscribe to all of the introducers that it knew about? For example, maybe the test would configure two introducers in the .tahoe/introducers file, mock.patch() the IntroducerClient class, then instantiate the src/allmydata/client.py Client class, then check that two mock IntroducerClients got created and that each of them had their .subscribe_to() method called.
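One way such a test might look, using the standard library's unittest.mock (the successor of the mock library linked above). Everything here is a stand-in: the introducer namespace, the toy IntroducerClient and Client classes and the "storage" service name are hypothetical and only mimic the shape of the real code so that the example runs on its own.

```python
import types
import unittest
from unittest import mock


class IntroducerClient:
    """Toy stand-in; the real class would open a network connection."""

    def __init__(self, furl):
        self.furl = furl

    def subscribe_to(self, service_name):
        raise RuntimeError("the test must never reach the real class")


# Stand-in for the module that the client code looks IntroducerClient up in.
introducer = types.SimpleNamespace(IntroducerClient=IntroducerClient)


class Client:
    """Toy client: one IntroducerClient per configured furl."""

    def __init__(self, introducer_furls):
        self.introducer_clients = []
        for furl in introducer_furls:
            ic = introducer.IntroducerClient(furl)
            ic.subscribe_to("storage")
            self.introducer_clients.append(ic)


class MultiIntroducerTest(unittest.TestCase):
    def test_subscribes_to_every_introducer(self):
        furls = ["pb://a@example.invalid:1/intro", "pb://b@example.invalid:2/intro"]
        created = []

        def make_mock(furl):
            # Give every instantiation its own mock so each can be checked.
            instance = mock.Mock()
            created.append(instance)
            return instance

        with mock.patch.object(introducer, "IntroducerClient",
                               side_effect=make_mock) as fake_class:
            Client(furls)

        # One mock IntroducerClient was created per configured furl ...
        self.assertEqual(fake_class.call_args_list, [mock.call(f) for f in furls])
        # ... and each of them had subscribe_to() called exactly once.
        for instance in created:
            instance.subscribe_to.assert_called_once_with("storage")
```

The class can be run with `python -m unittest`. Removing the subscribe_to call from the toy Client, or creating only one IntroducerClient, makes the test go red.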

After that, I can't think of any way that your patch to allmydata/client.py would have a bug which would not be caught by these tests. Can you?

comment:28 in reply to: ↑ 24 Changed at 2010-07-06T03:13:34Z by zooko

Replying to writefaruq:

Thanks for corrections. Regarding the reference, that's my intent, not to break any reference to it.

Instead of doing this, please search the codebase for any other reference to the self.introducer_furl attribute and change that code to reference the new self.introducer_furls attribute instead. Note also that any such code will have unit tests that will turn red if your patch which removes self.introducer_furl breaks that code, so run the unit tests after you have removed self.introducer_furl and after you have searched the codebase for other code that uses introducer_furl.

Likewise in an earlier comment you mentioned:

These patches are also backward compatible, not breaking any reference to the old connected_to_introducer(), but new code should call connected_to_introducers(), which also supplies the status of the single introducer.

This is not the sort of "backward compatibility" that we want. If you are adding a new feature in the code or changing a feature in the code then instead of leaving the old feature in place in the code in case anyone is calling it, we prefer to find all callers and update them.

On the other hand the things that you said about backward compatibility of the tahoe.cfg file is the sort of "backward compatibility" that we want. That has to do with users who might be using an older version of Tahoe-LAFS and then upgrade to a newer version which has your patch. We want the behavior of the new version to be some good behavior that they expected even if they do not make any change to their config files.

comment:29 follow-up: Changed at 2010-07-06T18:05:57Z by writefaruq

allmydata.client.Client.self.introducer_furl is called from allmydata.web.root.Root for fetching the list of introducer furls. But that can be replaced by new code that is tested by test_root.py. self.introducer_furl is also called from various testing modules, e.g. test/common.py (line 471). I'm not sure if they need to be patched at this moment.

Last edited at 2010-07-06T18:14:32Z by writefaruq

comment:30 in reply to: ↑ 29 Changed at 2010-07-06T19:23:57Z by zooko

Replying to writefaruq:

allmydata.client.Client.self.introducer_furl is called from allmydata.web.root.Root for fetching the list of introducer furls. But that can be replaced by new code that is tested by test_root.py. self.introducer_furl is also called from various testing modules, e.g. test/common.py (line 471). I'm not sure if they need to be patched at this moment.

As I mentioned on IRC, I want you to do "test-driven development" on this part. Step 1 is to remove the attribute introducer_furl from the allmydata.client.Client class. Step 2 is to run the complete (current) test suite and see which tests, if any, go red. Step 3 is to think about the places that you know of in the code that refer to the old, now-removed attribute, and think about whether the tests that are currently red are the right tests to exercise those places of the code. If they are not the right way to test that code (they test that code only "by accident", in some sense, or you think it is a bad way to test that code for some reason) then write a new test that tests that code. Now for the important point in "test-driven development": you are not allowed to fix the bad code which refers to the now-deleted introducer_furl attribute until you have a red test which you think is a good test for that code! Step 4: fix the code. :-)

Changed at 2010-07-07T22:54:30Z by writefaruq

corrected test for checking the use of introducer_furl by root.py

comment:31 Changed at 2010-07-08T01:46:38Z by zooko

  • Cc zooko added
  • Owner changed from nobody to writefaruq

comment:32 Changed at 2010-07-08T01:46:49Z by zooko

  • Milestone changed from eventually to 1.8.0

comment:33 Changed at 2010-07-08T01:49:20Z by zooko

Looks good! Except, heh heh heh. Isn't this test testing that data_introducer_furl() queries the client object's .introducer_furl method? Maybe you should now change the test to say that if the data_introducer_furl() method queries the client object's .introducer_furl method then it fails the test, but if it queries the client object's .introducer_furls attribute instead then it passes the test?

Then run it and confirm that it fails the test.

Then fix it!

:-)
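A sketch of one way to make the test fail when the old attribute is queried: give the test a stub client that has only the new introducer_furls attribute, so that any lingering read of introducer_furl raises AttributeError. The two functions below are hypothetical stand-ins for the old and new renderer code in root.py, not the actual implementation.

```python
import types


def old_data_introducer_furl(client):
    # Old behaviour: reads the attribute that is being removed.
    return client.introducer_furl


def new_data_introducer_furls(client):
    # New behaviour: reads the list of furls.
    return list(client.introducer_furls)


# The stub deliberately has no introducer_furl attribute.
stub_client = types.SimpleNamespace(
    introducer_furls=["pb://a@example.invalid:1/intro", "pb://b@example.invalid:2/intro"]
)

try:
    old_data_introducer_furl(stub_client)
    old_code_is_red = False
except AttributeError:
    old_code_is_red = True

print(old_code_is_red)                              # True
print(len(new_data_introducer_furls(stub_client)))  # 2
```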

Changed at 2010-07-09T18:06:45Z by writefaruq

Revised patch for client.py web/root.py web/welcome.xhtml

comment:34 Changed at 2010-07-09T18:12:08Z by zooko

I have some questions about how decentralized (gossip-based) introduction is supposed to work. Faruq (and everyone who cares about decentralized introduction!) please tell me if my assumptions are wrong.

Assumption 1: there will be a flat text file in your ".tahoe" base dir named "introducers" containing a list of introducer furls that the node will read at start-up.

Assumption 2: whenever the node learns about new introducers it will write the furl of that new introducer into the file.

Assumption 3: if there is no "introducers" file at startup then it will instead look into the .tahoe/tahoe.cfg file to find the "introducer.furl" entry (which is how introducer was configured up until Tahoe-LAFS v1.7.0), and if it finds it then it will write it into the ".tahoe/introducers" file and use it.

Assumption 4: if there is an "introducers" file at startup then it will *not* look into the .tahoe/tahoe.cfg file to find the "introducer.furl" entry, and any entry which is in there will be ignored.

Question 1: is this what you are trying to implement, Faruq?

Question 2: is this what people want to use in Tahoe-LAFS v1.8?

Regards,

Zooko

Changed at 2010-07-09T18:16:36Z by writefaruq

Test results after applying the previous enable-client-* patch

comment:35 Changed at 2010-07-09T18:25:26Z by writefaruq

Assumption 1 is implemented and tested.

Regarding assumption 2 and later part of 3:

"and if it finds it then it will write it into the ".tahoe/introducers" file and use it"

is not implemented.

Assumption 4 was not considered before.

comment:36 Changed at 2010-07-10T01:40:44Z by davidsarah

  • Keywords review-needed added

Review needed for GSoC mid-term evaluations.

comment:37 Changed at 2010-07-10T22:51:07Z by zooko

Faruq: hey we're making progress! Maybe we could even finish assumption 1, the latter part of 3, assumption 4, and Terrell's comment that it should warn if it is ignoring an old setting:

http://tahoe-lafs.org/pipermail/tahoe-dev/2010-July/004636.html

If we finished that set of behaviors, including tests (which I think you have already done a pretty good job of) and docs, then we could commit that to trunk and people could start using it even before we implement assumption 2. What do you think?

comment:38 follow-up: Changed at 2010-07-11T16:53:12Z by writefaruq

Combining assumption 1, 3-4 and Terrell's comment the following strategy can be coded into Client.

Step 1: Try to load "basedir/introducers"

Step 2A: If "basedir/introducers" found: a) load introducer furls from this file b) warn if there is any introducer_furl entry in tahoe.cfg

Step 2B: If no "basedir/introducers" found: a) create one "basedir/introducers" b) write introducer_furl entry from tahoe.cfg to this file.

If this is fine, I can proceed to implement this strategy.
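The strategy above, read literally, might be sketched as follows. The function name and the idea of passing the tahoe.cfg entry in as a plain argument (cfg_furl, or None when there is no entry) are assumptions made to keep the example self-contained; the furl is a placeholder.

```python
import os
import tempfile
import warnings


def load_introducer_furls(basedir, cfg_furl):
    """Return the introducer furls for a node whose base directory is `basedir`."""
    path = os.path.join(basedir, "introducers")  # Step 1
    if os.path.exists(path):
        # Step 2A: a) load furls from the file, b) warn about a tahoe.cfg entry.
        with open(path) as f:
            furls = [line.strip() for line in f if line.strip()]
        if cfg_furl:
            warnings.warn("the introducer furl in tahoe.cfg is ignored; "
                          "edit %s instead" % path)
        return furls
    # Step 2B: a) create the file, b) write the tahoe.cfg entry into it.
    furls = [cfg_furl] if cfg_furl else []
    with open(path, "w") as f:
        for furl in furls:
            f.write(furl + "\n")
    return furls


FURL = "pb://tubid@example.invalid:1234/introducer"
with tempfile.TemporaryDirectory() as basedir:
    first_run = load_introducer_furls(basedir, FURL)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        second_run = load_introducer_furls(basedir, FURL)

print(first_run == second_run)  # True
print(len(caught))              # 1
```

Run twice against the same base directory, this literal reading creates the file on the first run and emits the Step 2A warning on the second run, even though the user changed nothing.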

comment:39 in reply to: ↑ 38 ; follow-up: Changed at 2010-07-11T20:47:40Z by davidsarah

Replying to writefaruq:

Combining assumption 1, 3-4 and Terrell's comment the following strategy can be coded into Client.

Step 1: Try to load "basedir/introducers"

Step 2A: If "basedir/introducers" found: a) load introducer furls from this file b) warn if there is any introducer_furl entry in tahoe.cfg

Step 2B: If no "basedir/introducers" found: a) create one "basedir/introducers" b) write introducer_furl entry from tahoe.cfg to this file.

For an existing basedir, 2B b) would cause the introducer_furl to be written to basedir/introducers on the first run, and then 2A b) would cause a warning on subsequent runs. The warning seems unnecessary in this case, since there's no reason to believe the user was confused about the config settings; they were changed automatically.

comment:40 in reply to: ↑ 39 ; follow-up: Changed at 2010-07-12T04:10:32Z by zooko

Replying to davidsarah:

For an existing basedir, 2B b) would cause the introducer_furl to be written to basedir/introducers on the first run, and then 2A b) would cause a warning on subsequent runs. The warning seems unnecessary in this case, since there's no reason to believe the user was confused about the config settings; they were changed automatically.

That's a good point, but how could we do better? I don't think it is a good idea to automatically edit the tahoe.cfg file (to delete the old introducer.furl). Currently Tahoe-LAFS never edits that file -- it is for humans to edit only. I think it should still be a warning because we don't want the human to look into the tahoe.cfg file, see the introducer.furl there, and think that they have now seen the introducer config. We could suppress the warning in the case that tahoe.cfg's introducer.furl and the "introducers" file are the exact same thing (i.e. there is only one entry in "introducers" and it is this one).

Any other ideas?

comment:41 Changed at 2010-07-12T15:39:58Z by zooko

Faruq: your strategy in comment:38 sounds perfect to me. Apart from the open question about whether or how to indicate warnings to the user, the only other outstanding issue is that this change needs docs.

All of the following docs need to be updated to accept this into trunk:

I think you are close to getting this first working version completely implemented, doc'ed, tested, and ready for inclusion in trunk.

comment:42 in reply to: ↑ 40 Changed at 2010-07-12T16:47:33Z by davidsarah

Replying to zooko:

We could suppress the warning in the case that tahoe.cfg's introducer.furl and the "introducers" file are the exact same thing (i.e. there is only one entry in "introducers" and it is this one).

I think we should do this.

comment:43 Changed at 2010-07-12T16:48:12Z by davidsarah

  • Keywords docs added

comment:44 follow-up: Changed at 2010-07-12T22:28:55Z by writefaruq

I've drafted the following text. Please correct me!

For configuration.txt:

If a Tahoe grid has multiple introducers, each introducer's FURL must be placed in "BASEDIR/introducers" file. Each line of this file contains exactly one FURL entry. Any FURL entry found in tahoe.cfg will be copied to that file.

For architecture.txt:

By deploying multiple introducers in a Tahoe grid, the above SPoF challenge can be overcome. In that case, if one introducer fails, clients are still able to get announcements about new servers from the remaining introducers. This is our first step towards implementing fully distributed introduction. For future releases, we plan to enhance distributed introduction so that any server can tell a new client about all the others.

For running.html:

To use multiple introducers, write all introducers' FURLs in "BASEDIR/introducers" file, one FURL per line.

comment:45 in reply to: ↑ 44 Changed at 2010-07-13T05:49:03Z by zooko

Faruq:

Great! Please go ahead and take my suggestions below then write documentation patches like these and attach a darcs patch to this ticket for just the documentation patches.

The current plan is to finish the strategy from comment:38, except that for

  • Step 2A: If "basedir/introducers" found: a) load introducer furls from this file b) warn if there is any introducer_furl entry in tahoe.cfg

change it to:

  • Step 2A: If "basedir/introducers" found: a) load introducer furls from this file b) warn unless there is exactly one introducer furl from this file and it is the same as the introducer_furl entry in tahoe.cfg

(This is as described in my comment:40 and davidsarah's comment:42.)
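The revised Step 2A rule can be sketched as a small predicate. This is a minimal illustration in present-day Python, not the code from the patch: the name `should_warn` and its arguments are hypothetical, and it assumes (as in the original wording of step b) that no warning is needed when tahoe.cfg has no introducer.furl entry at all.

```python
def should_warn(file_furls, cfg_furl):
    """Return True if a warning is due under the revised Step 2A rule.

    file_furls: FURLs loaded from the "introducers" file (hypothetical name).
    cfg_furl:   the introducer.furl entry from tahoe.cfg, or None if absent.
    """
    if cfg_furl is None:
        # Assumption: nothing in tahoe.cfg means nothing to warn about.
        return False
    # Stay silent only when the file holds exactly one FURL and it is
    # the same one as in tahoe.cfg.
    return not (len(file_furls) == 1 and file_furls[0] == cfg_furl)
```

Keeping the rule in one pure function like this would make it easy to unit-test separately from the file handling.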

Also about your docs: consider that once your patches land in trunk then configuring the "introducers" file will be the preferred way to do it and the "introducer.furl" entry in tahoe.cfg will be supported only for backward-compatibility reasons and will not be recommended to new users. So the documentation should describe the "introducers" file as the way to configure it and mention the "introducer.furl" entry in tahoe.cfg only when explaining that such an entry, if it exists, will be automatically written into the "introducers" file.

Replying to writefaruq:

For configuration.txt:

If a Tahoe grid has multiple introducers, each introducer's FURL must be placed in "BASEDIR/introducers" file. Each line of this file contains exactly one FURL entry. Any FURL entry found in tahoe.cfg will be copied to that file.

Don't say "If" here, just say that this is the way to configure any introducers (regardless of if it is one or more). It is necessary to mention the automatic copying of the FURL entry from tahoe.cfg so that readers of configuration.txt will have a complete understanding and understand the backward-compatibility implications.

Also, please call it "Tahoe-LAFS" instead of "Tahoe" in docs. (For one thing, I don't want to have a name collision with http://sourceforge.net/projects/tahoe/ . For another thing, I think of "LAFS" as the protocol and the data formats and specification, and "Tahoe-LAFS" as the current Python implementation.)

For architecture.txt:

By deploying multiple introducers in a Tahoe grid, the above SPoF challenge can be overcome. In that case, if one introducer fails, clients are still able to get announcements about new servers from the remaining introducers. This is our first step towards implementing a fully distributed introduction. For future releases, we have plans to enhance our distributed introduction, allowing any server to tell a new client about all the others.

Nice!

For running.html:

To use multiple introducers, write all introducers' FURLs in "BASEDIR/introducers" file, one FURL per line.

Again, edit running.html so that the "BASEDIR/introducers" is the only method of configuring introducers. It is not necessary to mention the automatic copying of introducer.furl from tahoe.cfg in running.html.

Please for each patch that you submit write a descriptive patch name and description like these ones: 8ba536319689ec8e, 1de4d2c594ee64c8, d0706d27ea2624b5, 63b28d707b12202f, c18b934c6a8442f8, 7cadb49b88c03209, be6139dad72cdf49.

Okay, good work on this! I'm hoping that by the time I have to write a mid-term review for Google (which I guess I have to do by Friday), that I will be able to say that you've completed a working subset of your summer goal.

Please post the doc patch as a darcs patch and I will review it right away. Now what about test patches. You've already posted attachment:test_root.py and attachment:test_multi_introducers.py . Are those the complete set of tests for the "comment:38" strategy?

Oh no, looking at them I see that attachment:test_root.py is asking the code-under-test to look at the old .introducer_furl attribute. That is not right, it should instead be requiring the code-under-test to not look at the old .introducer_furl attribute and instead to look only at the .introducer_furls attribute.

I see that attachment:test_multi_introducers.py is requiring the code-under-test to have 1 introducer for the "introducer.furl" entry in tahoe.cfg plus however many are in the "introducers" file. But what "introducers" file is used for this test? When this test code runs it will be inside a temporary directory (named "_trial_temp") which will not already have any "introducers" file present.

Let's make the test code provide an introducers file to the code-under-test, something like this:

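from twisted.trial import unittest

from allmydata.client import Client
from allmydata.util.fileutil import write

# MULTI_INTRODUCERS_CFG is the name of the "introducers" file; it is assumed
# here to be importable from wherever the patch defines it.
INTRODUCER_FURLS = ['furl1', 'furl2']

class T(unittest.TestCase):
    def test(self):
        # provide an "introducers" file to the code-under-test
        write(MULTI_INTRODUCERS_CFG, '\n'.join(INTRODUCER_FURLS))
        # get a client and count its introducer_clients
        myclient = Client()
        ic_count = len(myclient.introducer_clients)
        self.failUnlessEqual(ic_count, 2)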

That test would be testing that the Client discovers the two furls in the "introducers" file. Then we also need the following tests of the "comment:38" strategy:

  1. A test that makes sure that if there is a different furl in tahoe.cfg than what is in the "introducers" file that this different furl does not get used by Client but that instead a warning message is printed to stderr. (By the way, here is some test code for a different project that I wrote recently to make sure that the code under test writes a certain thing to stderr: http://tahoe-lafs.org/trac/trialcoverage/browser/trunk/trialcoverage/test/test_import_all.py?rev=32#L47 )
  2. A test that makes sure the code under test is doing Step 2B: If no "basedir/introducers" found: a) create one "basedir/introducers" b) write introducer_furl entry from tahoe.cfg to this file. So this test would create a "basedir/tahoe.cfg" (with an introducer.furl entry in it) but not a "basedir/introducers" file, instantiate the Client object, and then check that it has an introducer client object for the furl entry from the tahoe.cfg file, and then check that a new "basedir/introducers" file has been created with that furl in it.
  3. A test similar to the current attachment:test_root.py to make sure that the code which generates the WUI pages queries the right attributes.
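As an illustration of test 2 in the list above (Step 2B), here is a self-contained sketch in present-day Python 3 using only the standard library. It does not use the real Client: `load_introducer_furls` is a hypothetical stand-in for the start-up logic, and the file name "introducers" is taken from this ticket.

```python
import configparser
import os
import tempfile
import unittest

INTRODUCERS_FILE = "introducers"  # file name as used in this ticket


def load_introducer_furls(basedir):
    """Hypothetical stand-in for the Client's start-up logic (Step 2B only)."""
    path = os.path.join(basedir, INTRODUCERS_FILE)
    if not os.path.exists(path):
        cfg = configparser.ConfigParser(interpolation=None)
        cfg.read(os.path.join(basedir, "tahoe.cfg"))
        furl = cfg.get("client", "introducer.furl", fallback=None)
        # Step 2B: create the file and copy the tahoe.cfg entry into it.
        with open(path, "w") as f:
            if furl:
                f.write(furl + "\n")
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


class Step2BTest(unittest.TestCase):
    def test_introducers_file_created_from_tahoe_cfg(self):
        with tempfile.TemporaryDirectory() as basedir:
            # Only tahoe.cfg exists; there is no "introducers" file yet.
            with open(os.path.join(basedir, "tahoe.cfg"), "w") as f:
                f.write("[client]\nintroducer.furl = furl1\n")
            furls = load_introducer_furls(basedir)
            # The FURL from tahoe.cfg is in use...
            self.assertEqual(furls, ["furl1"])
            # ...and has been written to a newly created "introducers" file.
            with open(os.path.join(basedir, INTRODUCERS_FILE)) as f:
                self.assertEqual(f.read(), "furl1\n")
```

A real test would instantiate the Client in the temporary directory instead of calling the stand-in, and then inspect its introducer clients.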

Changed at 2010-07-16T18:28:34Z by writefaruq

doc changes for multiple introducers

comment:46 Changed at 2010-07-16T18:33:44Z by writefaruq

I have kept the multiple-introducers config file name as before, but "introducers.cfg" could be an alternative. Another question: should this file be generated initially for the user, like tahoe.cfg?

Last edited at 2010-07-17T09:52:20Z by writefaruq (previous) (diff)

comment:47 follow-up: Changed at 2010-07-17T17:06:27Z by writefaruq

To implement the modified comment:38 strategy, I restructured the code in Client's init_introducer_clients like this:

        self.introducer_furls = []

        # Try to load the "BASEDIR/introducers" cfg file
        cfg = os.path.join(self.basedir, MULTI_INTRODUCERS_CFG)
        if os.path.exists(cfg):
            f = open(cfg, 'r')
            for introducer_furl in f.read().split('\n'):
                introducer_furl = introducer_furl.strip()
                if not introducer_furl:
                    continue
                self.introducer_furls.append(introducer_furl)
            f.close()

        # read furl from tahoe.cfg
        ifurl = self.get_config("client", "introducer.furl", None)
        if ifurl and ifurl not in self.introducer_furls:
            self.introducer_furls.append(ifurl)
            # append the tahoe.cfg furl to the "introducers" file
            f = open(cfg, 'a')
            f.write(ifurl + '\n')
            f.close()
            if len(self.introducer_furls) > 1:
                print "Warning! introducers config file modified."

But should the warning be sent somewhere else? Which one should be called: self.log() or log.msg()?

Last edited at 2010-07-19T11:49:42Z by writefaruq (previous) (diff)

Changed at 2010-07-19T11:52:52Z by writefaruq

corrected test for checking the use of introducer_furls by root.py (multiple introducer version)

comment:48 Changed at 2010-07-19T11:58:35Z by writefaruq

This test counts the number of furls loaded by the Client and checks that it equals the response to the query made in root.py. Tested with 0-2 introducers (in the cfg file) and found working.

comment:49 in reply to: ↑ 47 ; follow-up: Changed at 2010-07-20T06:15:46Z by zooko

Replying to writefaruq:

But should the warning be sent somewhere else? Which one should be called: self.log() or log.msg()?

You should use self.log() for logging (if the object in question subclasses from some class so that it has a self.log() method. In this case it does because Client's parent class Node defines a log() method.).

I wonder if there is a better way to communicate to the user than just logging a message. Not sure.

comment:50 Changed at 2010-07-20T06:18:25Z by zooko

I'm really not sure that I agree with Brian's comment in http://tahoe-lafs.org/pipermail/tahoe-dev/2010-July/004663.html . The way Brian proposed and Faruq agreed to do it means that there are "two ways to do it"--you can either edit your tahoe.cfg's introducer.furl or you can edit your introducer.furls file. Users who see one of them may assume that it is the only one and then be surprised when they get different behavior than they expected (due to the existence of the other one). I guess I'm too sleepy to go into detail right now, but I want Faruq to know that I looked at this ticket tonight. :-)

comment:52 in reply to: ↑ 49 ; follow-up: Changed at 2010-07-20T17:12:03Z by davidsarah

Replying to zooko:

Replying to writefaruq:

But should the warning be sent somewhere else? Which one should be called: self.log() or log.msg()?

You should use self.log() for logging (if the object in question subclasses from some class so that it has a self.log() method. In this case it does because Client's parent class Node defines a log() method.).

Yes. But for displaying a warning to the user, I would print >>sys.stderr. (For tests, sys.stderr can be captured; see the existing tests in source:src/allmydata/test/test_runner.py .)
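As a rough illustration of capturing sys.stderr in a test, here is a sketch using only the standard library of current Python 3 (the code in this ticket is Python 2, where `print >>sys.stderr` would be used instead). `warn_user` is a hypothetical helper, not a Tahoe-LAFS function, and this is not how test_runner.py itself does it.

```python
import contextlib
import io
import sys
import unittest


def warn_user(message):
    """Hypothetical helper: show a warning to the user on stderr."""
    print("Warning! " + message, file=sys.stderr)


class WarningTest(unittest.TestCase):
    def test_warning_goes_to_stderr(self):
        captured = io.StringIO()
        # Temporarily replace sys.stderr so the test can inspect the output.
        with contextlib.redirect_stderr(captured):
            warn_user("introducers config file modified.")
        self.assertIn("introducers config file modified.", captured.getvalue())
```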

comment:53 in reply to: ↑ 52 ; follow-up: Changed at 2010-07-21T16:46:47Z by zooko

Replying to davidsarah:

Yes. But for displaying a warning to the user, I would print >>sys.stderr. (For tests, sys.stderr can be captured; see the existing tests in source:src/allmydata/test/test_runner.py .)

That works for cli scripts, but for the Tahoe-LAFS node itself (unless it is launched with tahoe run or a possible future tahoe start --nodaemon), where would lines written to stderr go? I would hope that they would be logged, but it is possible they would be silently dropped.

Last edited at 2010-07-21T16:47:15Z by zooko (previous) (diff)

comment:54 in reply to: ↑ 53 Changed at 2010-07-22T04:45:51Z by davidsarah

Replying to zooko:

Replying to davidsarah:

Yes. But for displaying a warning to the user, I would print >>sys.stderr. (For tests, sys.stderr can be captured; see the existing tests in source:src/allmydata/test/test_runner.py .)

That works for cli scripts, but for the Tahoe-LAFS node itself (unless it is launched with tahoe run or a possible future tahoe start --nodaemon), where would lines written to stderr go?

Good point. But the config files are only read at startup, so perhaps tahoe start could read and parse them just in order to display any warnings, before launching the node.

(I realize this doesn't guarantee that the contents of the files haven't changed between when tahoe start reads them and when the node does, but that would be very unusual.)

Alternatively, a solution to #71 ("client node probably started") might allow the node to communicate messages to the runner process at startup.

Changed at 2010-07-22T12:12:52Z by writefaruq

Check if a new "introducers" cfg file can be created and tahoe.cfg's introducer_furl can be written in this file

comment:55 Changed at 2010-07-23T05:36:16Z by zooko

  • Keywords review-needed removed

Unsetting review-needed. This patch is not ready to be reviewed and then applied to trunk. However, it would probably be a good help and encouragement to Faruq if anyone would look at his code, docs, or comments and give him your thoughts. :-)

Changed at 2010-07-24T14:29:26Z by writefaruq

Check if Client's number of introducer_clients equals the number of furls in the "introducers" file

comment:56 Changed at 2010-07-27T05:59:03Z by zooko

attachment:test_multi_introducers.2.py looks like a good test of whether the allmydata.client.Client correctly reads all of the entries from the "introducers" config file. Please run pyflakes on it (you can just run python setup.py flakes) and fix any warnings that pyflakes reports.

comment:57 Changed at 2010-07-27T06:18:45Z by zooko

Re: attachment:test_introducers_cfg.py please add a docstring to the test_introducer_clients_count() method saying what this test is looking for in the behavior of the code under test. The comment that comes with the attachment on trac says:

Check if a new "introducers" cfg file can be created and tahoe.cfg's introducer_furl can be written in this file

But of course a file can be created! I guess from looking at the code and the name of test_introducer_clients_count() that it is intended to do something like this:

    def test_read_introducer_furl_from_tahoecfg(self):
        """ Ensure that the Client reads the introducer.furl config item from
        the tahoe.cfg file. """

The basedir variable is unnecessary—remove it and replace os.path.join(basedir, "tahoe.cfg") with just "tahoe.cfg". The line at the end that reads MULTI_INTRODUCERS_CFG doesn't do anything—remove it. Otherwise this looks like a good test.

Changed at 2010-07-31T17:09:38Z by writefaruq

code refined by pyflakes

Changed at 2010-07-31T17:10:07Z by writefaruq

code refined by pyflakes

Changed at 2010-07-31T17:10:36Z by writefaruq

code refined by pyflakes

comment:58 Changed at 2010-08-04T07:07:27Z by zooko

Faruq:

Please merge all the tests into one file named test_multi_introducer.py.

Here is a branch to hold your work:

http://tahoe-lafs.org/trac/tahoe-lafs/browser/ticket68-multi-introducer

Here is a view of the buildbot which shows the history of builds of your branch (only showing the Supported Builders):

http://tahoe-lafs.org/buildbot/waterfall?builder=hardy-amd64&builder=windows&builder=Kyle+OpenBSD-4.6+amd64&builder=Arthur+lenny+c7+32bit&builder=David+A.+OpenSolaris+i386&builder=Ruben+Fedora&builder=Eugen+lenny-amd64&builder=Zooko+zomp+Mac-amd64+10.6+py2.6&builder=tarballs&branch=ticket68-multi-introducer

Please attach your most recent patches to this ticket and I will apply them to that branch and then trigger the buildbot to run the tests on all of our buildslaves.

Changed at 2010-08-04T14:54:39Z by writefaruq

Merged all tests

Changed at 2010-08-04T15:02:48Z by writefaruq

multi-introducers doc patch

comment:59 Changed at 2010-08-04T15:57:52Z by writefaruq

The last three files: attachment:multiple-introducer-client-side-001.dpatch attachment:multiple-introducer-client-side-002.dpatch attachment:test_multi_introducers.4.py (patch sending failed for some unknown reason) should be applied/added to test repo.

comment:60 Changed at 2010-08-07T06:11:46Z by zooko

Okay I applied the two patches and I copied attachment:test_multi_introducers.4.py into src/allmydata/test/test_multi_introducers.py . Then I ran these tests with this command:

python setup.py test -s allmydata.test.test_multi_introducers

The output from that command ended with this message:

allmydata.test.test_multi_introducers
  TestClient
    test_introducer_count ...                                              [OK]
    test_read_introducer_furl_from_tahoecfg ...                            [OK]
    test_warning ... Warning! introducers config file modified.
                                                   [ERROR]
  TestRoot
    test_introducer_furls ...                                              [OK]

===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 279, in run
    testMethod()
  File "/Users/zooko/playground/tahoe-lafs/ticket68-multi-introducer/src/allmydata/test/test_multi_introducers.py", line 120, in test_warning
    self.failUnlessEqual(True, myclient.warn_flag)
exceptions.AttributeError: Client instance has no attribute 'warn_flag'

allmydata.test.test_multi_introducers.TestClient.test_warning
-------------------------------------------------------------------------------
Ran 4 tests in 0.317s

FAILED (errors=1, successes=3)

Have you tried this yourself? I would have expected you to get the same error.

Changed at 2010-08-07T10:36:51Z by writefaruq

Client side code changes combined together, fixed warn_flag error

comment:61 follow-up: Changed at 2010-08-07T10:41:38Z by writefaruq

This error can be avoided by undoing the last patch attachment:multiple-introducer-client-side-001.dpatch and applying the latest one. I've replaced it with the correct version now.

comment:62 Changed at 2010-08-07T18:04:04Z by zooko

Faruq: now that we've started storing your patches in this branch: ticket68-multi-introducer, there is no longer a good way to undo the old patches. So would you please provide a patch which gets added on top of the patches that are already in your branch? One way to do this would be to get a new repo from your branch, like this:

darcs get --lazy http://tahoe-lafs.org/source/tahoe-lafs/ticket68-multi-introducer

Then cd into the ticket68-multi-introducer repository and change the code there to make the tests pass. But do not use darcs unrecord, darcs obliterate, or darcs amend-record in that repository, because those commands work by removing patches from the repository, and we can't (or don't want to) remove patches from the repository http://tahoe-lafs.org/source/tahoe-lafs/ticket68-multi-introducer on the server.

comment:63 Changed at 2010-08-07T18:04:35Z by zooko

Okay, I merged trunk (which is currently 1.8.0rc1) into the ticket68-multi-introducer branch and ran a full build; here are the results. Then I applied your three patches from comment:59 and ran a full build again; here are the results.

comment:64 in reply to: ↑ 61 Changed at 2010-08-07T18:10:07Z by zooko

Replying to writefaruq:

This error can be avoided by undoing the last patch attachment:multiple-introducer-client-side-001.dpatch and applying the latest one. I've replaced it with the correct version now.

I can't undo the last patch attachment:multiple-introducer-client-side-001.dpatch because, as described in comment:62, we are going to maintain a history of all patches on ticket68-multi-introducer. For example, here is the history of such patches: http://tahoe-lafs.org/trac/tahoe-lafs/log/ticket68-multi-introducer/ and the one that you attached as the last attachment:multiple-introducer-client-side-001.dpatch I have now applied to that branch as [20100801142304-e2516-411e80c14e29287e8d9ce700e7b359e23fb45105].

Changed at 2010-08-07T20:00:21Z by writefaruq

Fixed warn_flag error

comment:65 Changed at 2010-08-08T04:54:52Z by zooko

Faruq: did you run the tests after you fixed the warn_flag error? If you did, what do you think of the results? If you did not, please run the tests and paste the results in here.

My note in comment:60 tells you how to run the tests.

comment:66 Changed at 2010-08-08T09:38:33Z by writefaruq

I've tested after applying this patch. The test result is here: http://pastebin.com/1Ac3b6Jk . A summary is given below.

allmydata.test.test_multi_introducers
  TestClient
    test_introducer_count ...                                              [OK]
    test_read_introducer_furl_from_tahoecfg ...                            [OK]
    test_warning ... Warning! introducers config file modified.
                                                      [OK]
  TestRoot
    test_introducer_furls ...                                              [OK]

-------------------------------------------------------------------------------
Ran 4 tests in 0.465s

PASSED (successes=4)

Last edited at 2010-08-08T09:42:46Z by writefaruq (previous) (diff)

comment:67 Changed at 2010-08-08T14:28:28Z by zooko

Okay, good, now also please run more of the other tests to see if your patches broke anything else.

python setup.py test -s allmydata.test

Changed at 2010-08-09T07:26:32Z by writefaruq

tweaks to pass the full-tests

comment:68 Changed at 2010-08-09T22:14:17Z by zooko

  • Milestone changed from 1.8.0 to 1.9.0

comment:69 Changed at 2010-08-12T05:13:25Z by zooko

Just for reference, here is a hyperlink that shows you the most recent results of building the ticket68-multi-introducer branch on all of our Supported Builders: buildbot link

comment:70 Changed at 2010-08-12T06:04:50Z by zooko

Faruq: I committed your latest patches and triggered the buildbot to test them. Use the buildbot link to see the results (I committed them just now, so look for the builds that started at 22:41:21 PDT on 2010-08-11).

You can see the patches that are on the branch here: http://tahoe-lafs.org/trac/tahoe-lafs/log/ticket68-multi-introducer/

The builds haven't finished yet so I don't know whether all the tests passed on all platforms, but I'm going to sleep now. :-)

comment:71 Changed at 2010-08-12T22:43:29Z by zooko

  • Keywords review-needed added

Okay as you can see from the buildbot link that shows Supported Builders testing this branch the tests pass on the buildbot. Adding the review-needed tag to this ticket.

comment:72 Changed at 2010-10-20T04:47:40Z by freestorm

I added a question about multiple introducers to the FAQ wiki page. So after closing this ticket, please edit the FAQ page.

comment:73 Changed at 2010-10-24T05:30:14Z by writefaruq

Full source available at

http://tahoe-lafs.org/source/tahoe-lafs/ticket68-multi-introducer/

The final GSoC code is here

http://code.google.com/p/google-summer-of-code-2010-tahoe-lafs/downloads/detail?name=MOFaruque_Sarker.tar.gz&can=2&q=#makechanges

Some hints to use it

  • Get a fresh copy of tahoe-lafs
  • Apply the above patch(es)
  • Check configuration.txt to find steps to configure

A second file "BASEDIR/introducers" configures introducers. It is necessary to write all FURL entries into this file. Each line in this file contains exactly one FURL entry. For backward compatibility reasons, any "introducer.furl" entry found in tahoe.cfg file will automatically be copied into this file. Keeping any FURL entry in tahoe.cfg file is not recommended for new users.

  • Edit BASEDIR/introducers and add FURLs for each introducer. Of course you need to run them before you get a FURL.
  • Play with them as you like.
Last edited at 2010-10-24T05:35:27Z by writefaruq (previous) (diff)

Changed at 2010-10-24T05:43:06Z by writefaruq

A snapshot of working repository

comment:74 Changed at 2010-10-30T09:58:53Z by Myckel

I've installed the snapshot on 2 systems. Started an introducer on both systems. Started a storage node on both systems with both furls. I can see the storage nodes appearing in the web interface of both introducers and the storage node web interface contains both introducers.

Shut down one system; the web interface still shows the off-line system as active for introducer and storage. Trying to create a new directory causes the node to contact the off-line system and stay busy with that (no time-out?). The request stays in the "active operations" list, even after stopping the request.

comment:75 follow-up: Changed at 2010-10-30T15:55:52Z by zooko

Hm. Myckel: Could you please reproduce this and then about 10 seconds after you shutdown one storage server, click the "Report an Incident" button on the welcome page. Then again when you attempt to mkdir, please click the "Report an Incident" button a few seconds after you've done so.

Each time you click "Report an Incident" it creates a file in the logs/incidents. Please attach those files to this ticket.

Faruq: we should write a unit test of this workflow: create two introducers, create a storage server pointing at both introducers, create a storage client pointing at both introducers, shut down one of the servers, then, um, then initiate an operation in the storage client, such as mkdir (which is what Myckel did manually) or any other operation that uses storage servers.

Changed at 2010-10-31T07:59:59Z by Myckel

First incident report (after shutdown, before making dir)

Changed at 2010-10-31T08:00:48Z by Myckel

2nd incident log (after mkdir)

comment:76 in reply to: ↑ 75 ; follow-up: Changed at 2010-10-31T08:05:11Z by Myckel

Replying to zooko:

Hm. Myckel: Could you please reproduce this and then about 10 seconds after you shutdown one storage server, click the "Report an Incident" button on the welcome page. Then again when you attempt to mkdir, please click the "Report an Incident" button a few seconds after you've done so.

Ok, files are attached. I hope they are useful, because after making the incident report I noticed that the storage server recovered. This might also not be related to the multiple introducer situation, because I've had it also happening when trying with volunteer grid (one storage node went off-line, I couldn't do anything any more, until restarting my storage node).

comment:77 follow-up: Changed at 2010-10-31T08:14:49Z by Myckel

I've restarted the storage node and introducer that I shutdown. Took a few minutes before the other storage node and introducer noticed the new storage node and introducer.

Is there some heartbeat or small time out in place?

comment:78 in reply to: ↑ 76 ; follow-up: Changed at 2010-10-31T13:06:14Z by zooko

Replying to Myckel:

Ok, files are attached. I hope they are useful, because after making the incident report I noticed that the storage server recovered. This might also not be related to the multiple introducer situation, because I've had it also happening when trying with volunteer grid (one storage node went off-line, I couldn't do anything any more, until restarting my storage node).

Wait, what? I'm confused. You created two introducers and two storage nodes, right? And then were you using one of the storage nodes to also be a gateway (== a storage client)? And then did you shut down the other one by running tahoe stop $BASEDIR on it?

comment:79 in reply to: ↑ 77 Changed at 2010-10-31T13:55:56Z by zooko

Replying to Myckel:

I've restarted the storage node and introducer that I shutdown. Took a few minutes before the other storage node and introducer noticed the new storage node and introducer.

Is there some heartbeat or small time out in place?

Yes, you retry to open a connection to each peer periodically, in an exponential back-off pattern (until you have backed off to trying only once per hour, at which point you keep trying at that rate indefinitely).

So if the peer was down for 5 minutes then it might take up to 5 minutes after it is brought back up before you reconnect to it.
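The retry pattern described here can be sketched as a generator of delays. This is only an illustration in present-day Python: the initial delay and growth factor are assumed values, not the ones Tahoe-LAFS or Foolscap actually use, and only the one-hour cap comes from the description above.

```python
import itertools


def backoff_delays(initial=1.0, factor=2.0, cap=3600.0):
    """Yield retry delays in seconds, growing geometrically up to `cap`.

    `initial` and `factor` are assumed values for illustration only.
    """
    delay = initial
    while True:
        yield min(delay, cap)
        # Once the cap is reached, the delay stays there indefinitely.
        delay = min(delay * factor, cap)


# Show the first few delays of the sequence.
print(list(itertools.islice(backoff_delays(), 5)))
```

With a pattern like this, a peer that has been unreachable for a while is retried only at long intervals, which would be consistent with a restarted node taking some minutes to be noticed.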

comment:80 in reply to: ↑ 78 Changed at 2010-10-31T19:57:59Z by Myckel

Replying to zooko:

Wait, what? I'm confused. You created two introducers and two storage nodes, right? And then were you using one of the storage nodes to also be a gateway (== a storage client)? And then did you shut down the other one by running tahoe stop $BASEDIR on it?

Guess I was not so clear. This is what I did: 2 computers:

Computer 1: Run both an introducer and storage client (access it through the web interface).

Computer 2: Run both an introducer and a storage client (can access it through the web interface, but don't bother with that).

Both introducers see both storage clients. Both storage clients say they are connected to the introducers. All fine so far.

Then I shut down system 2, so NO tahoe stop $BASEDIR (I could also pull the power or do a hard reset). Then on system 1 I try to make a dir through the web interface, and then everything stays busy while it tries to contact the storage node/client on system 2.

comment:81 Changed at 2010-12-16T00:46:42Z by davidsarah

  • Keywords anti-censorship added

comment:82 Changed at 2011-06-26T05:47:10Z by zooko

  • Keywords test-needed added; review-needed removed

Faruq: as per comment:75, we should add a test for this case. Removing the review-needed tag and adding the test-needed flag.

comment:83 Changed at 2011-07-18T15:40:17Z by zooko

There's been some discussion of this ticket on the mailing list here and here and in the Tahoe-LAFS Weekly News.

comment:84 Changed at 2011-07-19T01:04:02Z by zooko

Out of time for v1.9.0! But anyone who loves this, please jump in. There's no time like the present! Do some manual testing of Faruq's patch, write a new patch, write unit tests, etc. :-)

comment:85 Changed at 2011-07-27T18:17:15Z by zooko

  • Milestone changed from 1.9.0 to soon
  • Owner changed from writefaruq to zooko
  • Status changed from new to assigned

comment:86 Changed at 2011-11-27T15:03:03Z by killyourtv

  • Cc killyourtv@… added

I've been using this patch with the ones in #1007 and #1010 (and foolscap tickets 150 and 151) on I2P with v1.8.3 and so far there haven't been any issues with the functionality.

It seems, however, that comments aren't allowed in $TAHOENODE/introducers. At least # doesn't work as a comment character. Having the ability to add comments would be a very welcome addition.
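Comment support of the kind requested here could look roughly like the following sketch. It is not part of Faruq's patch: `parse_introducers` is a hypothetical name, and treating `#` as a comment marker only at the start of a line is an assumption made so that the FURL text itself is never cut short.

```python
def parse_introducers(text):
    """Return the FURLs in `text`, one per line, skipping blanks and comments."""
    furls = []
    for line in text.splitlines():
        line = line.strip()
        # Skip empty lines and whole-line comments.
        if not line or line.startswith("#"):
            continue
        furls.append(line)
    return furls


example = """\
# primary introducer
furl1

# backup introducer
furl2
"""
print(parse_introducers(example))
```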

comment:87 Changed at 2012-02-14T00:30:44Z by killyourtv

Just to give a heads up: Most of the 18 storage nodes on our smallish grid on I2P have been using the multiple introducer patch since late November and things are still working well for us.

Also one of our users made some modifications that add colors to the introducer list as can be seen at http://i.imgur.com/aPbaY.png. After I refactor the patch for the current git revision I'll add it to this ticket.

comment:88 Changed at 2012-02-17T16:33:43Z by zooko

killyourtv: cool! Thank you for the note. If I recall correctly, Faruq's patch didn't have a thorough unit test.

I've noticed several good contributions to Tahoe-LAFS that are blocked on not having unit tests. I think a lot of people know how to write Python code but aren't sure what we expect in terms of testing, or don't know how to use trial's features to test results that are deferred until a subsequent event. I've been thinking that having a "unit test tutorial" party could be fun, where everyone who has a patch for Tahoe-LAFS that needs tests comes to the IRC channel and we pick one and walk through how to write tests for it...

comment:89 Changed at 2012-02-17T17:14:09Z by zooko

For anyone who wants to contribute to this ticket, the patches are available through darcs from this repo https://tahoe-lafs.org/trac/tahoe-lafs/browser/ticket68-multi-introducer , i.e. darcs get --lazy https://tahoe-lafs.org/source/tahoe-lafs/ticket68-multi-introducer. killyourtv probably has them available in another form (unified diff?). It would be cool to port the darcs repo to be a git branch. If you do that, please add a comment to this ticket pointing to the git branch.

comment:90 follow-up: Changed at 2012-02-18T00:35:10Z by killyourtv

In case it's of use: https://github.com/kytvi2p/tahoe-lafs.

I made two branches, one for what I think should be close to 1.8.3 (it's not tagged) and one for 1.9.x (current git).

comment:91 Changed at 2012-04-20T10:41:02Z by killyourtv

The patchset has been refactored to apply on top of the current build at https://github.com/kytvi2p/tahoe-lafs/tree/68-multi

Although everything seems to work, the unit tests are (unfortunately) still broken.

comment:92 Changed at 2012-05-05T20:48:50Z by lebek

  • Owner changed from zooko to lebek
  • Status changed from assigned to new

I've been working on restructuring the new IntroducerClient so that we can implement multi-introducer grids without losing announcement deduplication logic in the client. My work so far is here: https://github.com/lebek/tahoe-lafs/compare/master...68-multi-introducer

Configuration works through a new [client]introducer.furls option, which takes multiple values (separated by whitespace or line breaks). If both [client]introducer.furl and [client]introducer.furls are set, the values from both are combined into one list.
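As a rough illustration of that configuration rule (not the code from the branch), here is a minimal Python sketch that reads both options from a tahoe.cfg-style text and combines them. The function name collect_introducer_furls, the ordering (single furl first) and the example FURLs are assumptions made for this example only.

```python
import configparser


def collect_introducer_furls(cfg_text):
    """Return every introducer FURL named in the [client] section.

    Hypothetical helper: reads introducer.furl (single value) and
    introducer.furls (whitespace- or newline-separated values) and
    returns them as one list, single value first.
    """
    # interpolation=None so that a stray "%" in a value is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    furls = []
    single = parser.get("client", "introducer.furl", fallback="").strip()
    if single:
        furls.append(single)

    # str.split() with no arguments splits on spaces, tabs and line breaks
    multiple = parser.get("client", "introducer.furls", fallback="")
    furls.extend(multiple.split())
    return furls


# Made-up FURLs; continuation lines of a multi-line value must be indented.
EXAMPLE_CFG = """\
[client]
introducer.furl = pb://aaaa@intro1.example:1234/introducer
introducer.furls = pb://bbbb@intro2.example:1234/introducer
    pb://cccc@intro3.example:1234/introducer
"""

if __name__ == "__main__":
    for furl in collect_introducer_furls(EXAMPLE_CFG):
        print(furl)
```

Whether duplicate FURLs should be dropped at this stage is left open here; the sketch simply appends.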

All introducer tests are passing at the moment, so theoretically this might work already. I'm still working on new tests specific to the multi-introducer setting. I also still need to make announcements idempotent in the introducer. Finally, I'll import Faruq's patches to the WUI and documentation; they shouldn't require much modification.

comment:93 Changed at 2012-05-14T12:29:05Z by lebek

  • Milestone changed from soon to 1.10.0

comment:94 Changed at 2012-12-06T20:13:34Z by zooko

  • Milestone changed from 1.10.0 to soon

This isn't ready for Tahoe-LAFS v1.10, but as recently discussed, we've decided we'd like to try integrating it into trunk ASAP! Lebek, or anyone else who wants to help, please see that mailing list discussion and reply on tahoe-dev, or this ticket, or join us at the next Weekly Dev Chat.

comment:95 Changed at 2012-12-06T21:34:31Z by davidsarah

  • Description modified (diff)

Removing obsolete reference to vdrive servers in the Description.

comment:96 Changed at 2013-01-04T21:16:35Z by zooko

  • Keywords research added

comment:97 Changed at 2013-03-03T14:54:31Z by mmoya

  • Cc mmoya@… added

comment:98 Changed at 2013-04-01T20:19:55Z by K1773R

  • Cc K1773R@… added

comment:99 Changed at 2013-04-11T07:13:32Z by ClashTheBunny

  • Cc clashthebunny@… added

comment:100 Changed at 2013-05-01T20:55:41Z by leif

  • Cc leif@… added

comment:101 Changed at 2013-05-07T19:49:15Z by zooko

  • Description modified (diff)

I'd like to get this into trunk ASAP! So it can get thoroughly tested out for Tahoe-LAFS v1.11. If I understand correctly, lebek's notes at comment:92 and our discussion from a weekly dev chat are telling us what next steps to take.

comment:102 Changed at 2013-07-17T16:55:20Z by zooko

#1402 was a duplicate, and there was a patch attached to it by socrates:

attachment:relay.py:ticket:1402

comment:103 Changed at 2013-08-10T15:16:58Z by skydrome

  • Cc skydrome@… added

comment:104 Changed at 2013-08-21T15:43:46Z by psi

  • Keywords i2p-collab added

comment:105 Changed at 2013-12-29T23:29:05Z by jmalcolm

  • Cc tahoe-lafs-trac@… added

comment:106 in reply to: ↑ 90 Changed at 2014-01-11T23:35:11Z by leif

Replying to killyourtv:

> In case it's of use: https://github.com/kytvi2p/tahoe-lafs.
>
> I made two branches, one for what I think should be close to 1.8.3 (it's not tagged) and one for 1.9.x (current git).

david415 and I began updating this patch to work with post-1.10 versions of tahoe: https://github.com/leif/tahoe-lafs/commits/ticket68 (tests do not pass yet, but it is connecting to multiple introducers).

Hopefully we'll have a cleaned up patch soon.

comment:107 Changed at 2014-11-24T01:50:20Z by leif

I'm cross-posting this comment to #68 and #467.

Here is a squashed commit of the multi-introducer and introducerless patches on top of the current master: https://github.com/leif/tahoe-lafs/compare/master...introless-multiintro-squashed

And here is a 3-way merge combining the history of both feature branches with master in such a way that git log and git blame can still find the original commits: https://github.com/leif/tahoe-lafs/compare/master...introless-multiintro-with-history (creating this was a git adventure; I ended up doing the 3-way merge using -s ours and then doing another squash merge followed by git commit --amend)

I'm going to write more tests before submitting a pull request with one of these. But, if anyone wants to review or test it now I'd appreciate it!

Last edited at 2016-01-15T19:39:01Z by daira (previous) (diff)

comment:108 Changed at 2015-02-09T11:09:34Z by lpirl

  • Cc tahoe-lafs.org@… added

comment:109 Changed at 2015-03-28T21:36:59Z by L29Ah

  • Cc zl29ah@… added

comment:110 Changed at 2016-01-07T06:52:06Z by leif

Here is the latest introducerless/multi-introducer patch: https://github.com/leif/tahoe-lafs/commit/1ae5aaecbb68f13019b6bc2ba4632bb4a5623aaa (that is a squash merge on top of two other commits which will hopefully land on master soon).

It should perhaps have some more tests, but testing/review/feedback would be welcomed.

comment:111 Changed at 2016-01-14T23:40:25Z by daira

I gave some feedback, although it's a huge diff and probably needs more eyes on it.

comment:112 Changed at 2016-01-15T19:37:57Z by daira

  • Milestone changed from soon to 1.10.3
  • Owner changed from lebek to leif

I'm optimistically putting this in the 1.10.3 milestone; it may well get booted out to 1.11.

comment:113 Changed at 2016-01-21T03:26:38Z by leif

Here is the new version, after addressing daira's comments: https://github.com/leif/tahoe-lafs/commit/8fc8cd9151d4dc4c041867bac98aefff6a105729

I *think* this is nearly ready to merge, so more review and/or testing would be appreciated.

The one thing remaining that I think needs to be done is to add some tests to test_web.

comment:114 Changed at 2016-01-31T21:13:56Z by leif

Here is the latest introless-multiintro branch (with full history) with a few more commits since the squashed commit in my previous comment.

I posted a comment about my next steps for this branch on ticket #467.

comment:115 Changed at 2016-03-01T17:21:08Z by daira

  • Milestone changed from 1.10.3 to 1.11.0

Out of time for 1.10.3.

comment:116 Changed at 2016-03-22T05:02:52Z by warner

  • Milestone changed from 1.11.0 to 1.12.0

Milestone renamed

comment:117 Changed at 2016-04-01T22:23:22Z by rvs

  • Cc vladimir@… added

comment:118 Changed at 2016-06-28T18:20:37Z by warner

  • Milestone changed from 1.12.0 to 1.13.0

moving most tickets from 1.12 to 1.13 so we can release 1.12 with magic-folders

comment:119 Changed at 2016-09-08T10:55:28Z by dawuud

i've got this dev branch where i added init_introducer_clients: https://github.com/david415/tahoe-lafs/tree/68.multi_intro.0

comment:120 Changed at 2016-09-09T13:36:32Z by dawuud

in the above dev branch i've gotten all the unit tests to pass... so i opened this pull-request here: https://github.com/tahoe-lafs/tahoe-lafs/pull/338

please review

Last edited at 2016-09-09T13:56:21Z by dawuud (previous) (diff)

comment:121 Changed at 2016-09-09T13:56:40Z by dawuud

  • Owner changed from leif to warner

comment:122 Changed at 2016-09-13T00:47:26Z by Brian Warner <warner@…>

In 3b24e7e/trunk:

Merge PR 338 from david415/68.multi_intro.0

This enables the use of multiple introducers, via
NODEDIR/private/introducers.yaml . Still needs docs.

refs ticket:68

comment:123 Changed at 2016-09-13T00:47:27Z by Brian Warner <warner@…>

In d802135/trunk:

test introducerless config

refs ticket:68

comment:124 Changed at 2016-09-13T00:47:27Z by Brian Warner <warner@…>

In 2e3ec41/trunk:

document multiintroducer/introducerless config

refs ticket:68

comment:125 Changed at 2016-09-13T00:50:32Z by warner

  • Milestone changed from 1.13.0 to 1.12.0
  • Resolution set to fixed
  • Status changed from new to closed

Ok, at long last, this ticket is done. We didn't implement the cool "gossip" approach, or the limited-flood thing, or the invitation thing. But nodes can now be configured with zero/1/many introducer FURLs (via a combination of tahoe.cfg introducer.furl= and the new NODEDIR/private/introducers.yaml), and servers will announce themselves to all introducers, and clients will merge announcements from all introducers.
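To illustrate what "clients will merge announcements from all introducers" can look like, here is a minimal Python sketch. It is not the code merged in PR 338: the Announcement fields, the (server_id, service_name) key and the "highest seqnum wins" rule are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    # Hypothetical, simplified announcement record
    server_id: str      # stable identifier of the announcing server
    service_name: str   # e.g. "storage"
    seqnum: int         # larger means newer
    furl: str           # where the service can be reached


def merge_announcements(per_introducer):
    """Merge announcements received from several introducers.

    per_introducer maps an introducer name to the list of announcements
    heard from it. The same server may be announced by every introducer,
    so entries are deduplicated on (server_id, service_name) and the
    one with the highest seqnum is kept.
    """
    merged = {}
    for announcements in per_introducer.values():
        for ann in announcements:
            key = (ann.server_id, ann.service_name)
            current = merged.get(key)
            if current is None or ann.seqnum > current.seqnum:
                merged[key] = ann
    return merged


if __name__ == "__main__":
    old = Announcement("server-a", "storage", 1, "pb://old@a.example:1/s")
    new = Announcement("server-a", "storage", 2, "pb://new@a.example:1/s")
    other = Announcement("server-b", "storage", 1, "pb://x@b.example:1/s")
    view = merge_announcements({"intro1": [old, other], "intro2": [new]})
    for ann in view.values():
        print(ann.server_id, ann.seqnum, ann.furl)
```

With this shape, losing one introducer only removes one source of announcements; anything also heard from another introducer stays in the merged view.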
