[tahoe-dev] What are the common variable names for erasure coding parameters?

zooko zooko at zooko.com
Fri Dec 21 08:47:19 PST 2007


Folks:

I just noticed, once again, the inconvenience of Brian (and the tahoe  
code base) using "k" for the number of primary shares and "n" for the  
total number of shares, where I (and the zfec library) use "k" for  
the number of primary shares and "m" for the total number of shares.

So I set out to learn which variable names are most common in the
larger world, to decide whether zfec (and I) should change to "k" and
"n" or Tahoe (and Brian) should change to "k" and "m".

Mojo Nation and Mnet and zfec use "k" for the number of primary  
shares and "m" for the number of total shares, so m-k is the number  
of check shares, and k/m is the "rate" and m/k is the "expansion  
factor".  (We probably inherited this from Doug Barnes who wrote the  
first implementation of erasure coding for Mojo Nation, which at that  
time was called "Information Dispersal" per Rabin, but I'm not sure  
where Doug got his variable names.)
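
To make those derived quantities concrete, here is a minimal Python
sketch under the "k"/"m" naming described above.  The function name
describe_code and the 3-of-10 example values are illustrative
assumptions, not part of zfec or Tahoe.

```python
from fractions import Fraction


def describe_code(k, m):
    """Derive the quantities named above from k (primary shares)
    and m (total shares).

    Hypothetical helper for illustration only; not part of the zfec API.
    """
    if not 0 < k <= m:
        raise ValueError("need 0 < k <= m")
    return {
        "check_shares": m - k,              # m-k
        "rate": Fraction(k, m),             # k/m
        "expansion_factor": Fraction(m, k), # m/k
    }


# Illustrative parameters: 3 primary shares out of 10 total.
params = describe_code(3, 10)
print(params["check_shares"])      # 7
print(params["rate"])              # 3/10
print(params["expansion_factor"])  # 10/3
```

Exact fractions are used so that the rate and the expansion factor
stay exact reciprocals of each other.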

Flud [1] uses "m" for the number of primary shares and "k" for the  
number of check shares, so m+k is the number of total shares.

Luigi Rizzo's influential paper [2] uses "k" for the number of  
primary shares and "n" for the number of total shares.

James Plank's tutorial [3] uses "n" for the number of primary shares  
and "m" for the number of check shares.

Wikipedia [4] uses "n" for the number of required shares (which for
us is always equal to the number of primary shares) and "r" for the
rate, so n/r is the number of total shares.

The top two hits on Google for "erasure coding" (excluding the
Wikipedia hit) are scientific papers by systems/p2p researchers:
"Erasure Coding vs. Replication: A Quantitative Comparison" -- Hakim  
Weatherspoon and John D. Kubiatowicz, and "High Availability in DHTs:  
Erasure Coding vs. Replication" -- Rodrigo Rodrigues and Barbara  
Liskov.  These two both use "m" for the number of primary shares and  
"n" for the total number of shares.
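
Since the same letters mean different things in each source, it can
help to translate every convention into one (primary, total) pair.
The following Python sketch does that for the conventions listed
above.  The function name to_primary_total and the convention labels
are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction


def to_primary_total(convention, **p):
    """Translate one naming convention into (primary, total) share counts.

    Hypothetical helper; the labels below are made up for this sketch.
    """
    if convention in ("zfec", "mnet", "mojo_nation"):
        # "k" primary shares, "m" total shares
        return p["k"], p["m"]
    if convention in ("tahoe", "rizzo"):
        # "k" primary shares, "n" total shares
        return p["k"], p["n"]
    if convention == "flud":
        # "m" primary shares, "k" check shares
        return p["m"], p["m"] + p["k"]
    if convention == "plank":
        # "n" primary shares, "m" check shares
        return p["n"], p["n"] + p["m"]
    if convention == "wikipedia":
        # "n" required shares, "r" rate, so the total is n/r
        total = Fraction(p["n"]) / Fraction(p["r"])
        if total.denominator != 1:
            raise ValueError("n/r is not a whole number of shares")
        return p["n"], int(total)
    if convention in ("weatherspoon_kubiatowicz", "rodrigues_liskov"):
        # "m" primary shares, "n" total shares
        return p["m"], p["n"]
    raise ValueError("unknown convention: %r" % (convention,))


# The same illustrative 3-of-10 code, written in each convention.
print(to_primary_total("zfec", k=3, m=10))
print(to_primary_total("tahoe", k=3, n=10))
print(to_primary_total("flud", m=3, k=7))
print(to_primary_total("plank", n=3, m=7))
print(to_primary_total("wikipedia", n=3, r=Fraction(3, 10)))
print(to_primary_total("rodrigues_liskov", m=3, n=10))
```

Every call above describes the same code, which is the point: only
the letters differ.  The rate is passed as a Fraction because a float
such as 0.3 would not divide out to a whole number of shares.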

Okay, there isn't really a consensus, but "n" is more popular than  
"m" for the total number of shares, so I will start using it and I  
might someday get around to changing the zfec API and docs and code  
from "m" to "n".

Regards,

Zooko

[1] http://www.flud.org/wiki/Story_of_a_File
[2] http://citeseer.ist.psu.edu/rizzo97effective.html
[3] http://citeseer.ist.psu.edu/41070.html
[4] http://en.wikipedia.org/wiki/Erasure_coding


