Changes between Initial Version and Version 2 of Ticket #1182


Ignore:
Timestamp:
2010-09-11T00:17:35Z (14 years ago)
Author:
davidsarah
Comment:

#1196 is a duplicate.

Legend:

Unmodified
Added
Removed
Modified
  • Ticket #1182

    • Property Keywords cleanup added
    • Property Summary changed from improve asymptotic complexity of Spans and DataSpans to clean up and improve asymptotic complexity of Spans and DataSpans
  • Ticket #1182 – Description

    initial v2  
    66
    77We changed the downloader so that fewer spans were retained in Spans and DataSpans objects, but this limits their potential for reuse, and we haven't checked whether this could still affect the downloader in some situations.
     8
     9Here is a copy of ticket:798#comment:18 :
     10
     11Reviewing [changeset:cbcb728e7ea0031d]:
     12
     13 * the {{{(start, length)}}} form of the {{{[Simple]Spans}}} constructor is not used outside test code (and the test code can be changed to pass a {{{[(start, length)]}}} array). Removing this would slightly simplify the constructor and avoid a possibly error-prone overloading.
     14
     15 * in the {{{Spans}}} class comment, ", frequently used to represent .newsrc contents" is out-of-context and not needed.
     16
     17 * in the {{{_check}}} method of {{{Spans}}}, if assertions are switched on then the {{{self._spans}}} array is re-sorted in order to check whether it is ordered. This is unnecessary: if you add an {{{assert length > 0, length}}} in the loop, then the loop will be checking a condition that is stronger than the array being ordered, given that the starts and lengths are numbers. (The sorting actually takes O(n) time rather than O(n log n) on each call to {{{_check}}}, because [http://bugs.python.org/file4451/timsort.txt Timsort] will detect the ordered input, but it's still unnecessary overhead.)
     18
     19 * the assert statements should include the variables they use in the exception message, e.g. {{{assert start > prev_end, (start, prev_end)}}}.
     20
     21 * "{{{overlap(s_start, s_length, start, length) or adjacent(s_start, s_length, start, length)}}}" is equivalent to "{{{overlap(s_start, s_length, start-1, length+2)}}}".
     22
     23 * in the only other use of {{{adjacent}}} (in {{{DataSpans.add}}}), only the {{{start0 < start1}}} case should be able to occur. Inline and simplify.
     24
     25 * in the loop over {{{enumerate(self._spans)}}}, you could exit early when {{{s_start > start+length}}}. At that point you know where to insert the {{{(start, length)}}} span without having to re-sort.
     26
     27 * a {{{Spans}}} object behaves somewhat like a set of the elements in all of its spans, but the {{{__contains__}}} and {{{__iter__}}} methods are not consistent with that (instead viewing it as a set of {{{(start, length)}}} pairs). I realize this may allow for more convenient use of {{{in}}} and iteration, but it should at least be documented.
     28
     29 * {{{_check}}} and {{{assert_invariants}}} do similar things; give them the same name.
     30
     31 * {{{DataSpans._dump}}} is poorly named.
     32
     33 * {{{DataSpans.assert_invariants}}} should check that none of the data strings are zero-length.
     34
     35 * is it intentional that {{{DataSpans.add}}} calls {{{self.assert_invariants()}}} but {{{remove}}} (and {{{pop}}}, although that's much simpler) don't?
     36
     37 * {{{if s_start <= start < s_end:}}} I find this Python construct too clever by half. Anyway, at this point {{{s_start <= start}}} is always true (otherwise we won't get past the first {{{continue}}}), so I would write this as
     38{{{
     39assert s_start <= start, (s_start, start)
     40if start < s_end:
     41}}}
     42
     43 * Perhaps rename {{{s_*}}} to {{{old_*}}}.
     44
     45 * {{{DataSpans.add}}}: if I understand correctly, case A also covers:
     46{{{
     47    OLD      OLD    OLD    OLD
     48NEW       NEW      NEWW   NEEWW
     49}}}
     50   This isn't immediately clear from the comment. Depicting it as
     51{{{
     52    OLD
     53NEW.....
     54}}}
     55   might help. Also, use uppercase consistently for the cases.
     56
     57 * {{{suffix_len}}} in case D has a different meaning to {{{suffix_len}}} in case E. Maybe rename it to {{{replace_len}}} for case D.
     58
     59Tests:
     60
     61 * The {{{Spans}}} class is tested by {{{ByteSpans}}}, but it just stores spans of integers, not necessarily byte offsets. I would suggest {{{s/ByteSpans/TestSpans/}}} and {{{s/StringSpans/TestDataSpans/}}}.
     62
     63 * I could be wrong, but I don't think the deterministic tests are covering all cases in {{{add}}} and {{{remove}}}. I'd prefer to see those tests have full coverage rather than relying on the randomized tests to make up the gap.