#24 closed defect (fixed)

SHA256 failure on NetBSD with multiple segments

Reported by: warner Owned by: bdew
Priority: critical Milestone:
Version: 0.5.1 Keywords: integrity
Cc: bdew, midnightmagic, weidai Launchpad Bug:

Description

In tahoe#738, a user reported
hash-validation failures on a certain sized file on NetBSD/i386. Working
backwards, we were able to isolate the problem to a pycryptopp hash failure,
as demonstrated in the following test:

expected = SHA256("a"*33 + "a"*95).hexdigest()
s = SHA256("a"*33)
s.update("a"*95)
got = s.hexdigest()
assert expected==got, (expected, got)

It would appear that two updates, in which the total number of bytes is a
multiple of 128, and in which the first write is an odd number less than 64,
will cause this bug.

I'll attach a file with a test case to add to the pycryptopp test suite that
should exercise this problem.

This was observed in an 0.5.14 tree, on a box described as "NetBSD fasty
4.99.7 NetBSD 4.99.7 ... i386". The sha.cpp file was compiled with a command
line of: cc -pthread -fno-strict-aliasing -DNDEBUG -O2 -I/usr/include -I/usr/pkg/include -fPIC -I. -I/usr/pkg/include/python2.5 -c cryptopp/sha.cpp -o build/temp.netbsd-4.99.7-i386-2.5/cryptopp/sha.o -w. (note that inline assembly was probably enabled). "cc" was cc (GCC) 4.1.2 20060628 prerelease (NetBSD nb2 20060711), and "as" was GNU assembler 2.16.1.

Attachments (5)

shatest.diff (1.0 KB) - added by warner at 2009-06-30T00:46:31Z.
test case to exercise the problem
lengths.txt (52.8 KB) - added by warner at 2009-06-30T00:47:59Z.
failing test output on NetBSD
mytest2.cpp (1.2 KB) - added by midnightmagic at 2009-07-03T19:44:59Z.
setup.py.diff (1.3 KB) - added by midnightmagic at 2009-07-03T22:07:49Z.
will ALMOST CERTAINLY not work for bdew's machine but I don't have access to python there.
sha.cpp.patch (1.2 KB) - added by weidai at 2009-07-04T17:51:39Z.
fix for sha.cpp

Download all attachments as: .zip

Change History (36)

Changed at 2009-06-30T00:46:31Z by warner

test case to exercise the problem

Changed at 2009-06-30T00:47:59Z by warner

failing test output on NetBSD

comment:1 Changed at 2009-06-30T02:32:10Z by zooko

Good work, guys!

I had a quick look at the current sha256module.cpp and I didn't see any obvious bugs. Do you? I also ran the new pycryptopp.test.test_sha256.SHA256.test_chunksize on my linux workstation yukyuk both normally and under valgrind and it passed. (Not surprising since the automatic tests on the yukyuk builder also pass.)

A-ha! I see that the new test_chunksize fails on Black Dew's Debian buildbot! This is exciting because something else that Black Dew's Debian buildbot and MM's NetBSD machine have in common is that they fail http://allmydata.org/trac/tahoe/ticket/737 . Hopefully there is a common cause of #24 (== http://allmydata.org/trac/tahoe/ticket/738 ).

The next step is probably to see if Crypto++'s own test suite exercises this case, and if it doesn't make it do so, and then run Crypto++'s own test suite on one of the machines where this fails.

Also, Black Dew could try running it under valgrind, like this:

valgrind --suppressions=/usr/lib/valgrind/python.supp python ./setup.py test -s pycryptopp.test.test_sha256.SHA256.test_chunksize

And, in parallel: set up a buildbot for pycryptopp on MM's NetBSD machine (in addition to the one he already has for Tahoe-LAFS).

comment:2 Changed at 2009-06-30T02:52:39Z by zooko

Oh, there is another thing to test -- what if you link to an external libcryptopp.so instead of building the embedded version of Crypto++? Maybe our build process introduces a bug into the resulting Crypto++ object code that isn't present in the upstream build or the Debian or NetBSD builds. I tried both embedded and external on my Linux workstation yukyuk with no difference in behavior. If Black Dew and Midnight Magic could try it on their machines where the tests fail that might be informative.

comment:3 Changed at 2009-06-30T03:25:25Z by zooko

  • Cc bdew midnightmagic added
  • Keywords integrity added
  • Owner set to bdew

adding Cc: bdew, midnightmagic so they will know that there is something they can do to help. Setting 'assigned to' bdew at random. Adding keyword 'integrity'.

comment:4 Changed at 2009-06-30T19:47:17Z by zooko

A-ha! Now Black Dew's buildslave got an internal compiler error in g++ while building Crypto++:

http://allmydata.org/buildbot/builders/BlackDew%20debian-unstable-i386/builds/25/steps/build/logs/stdio

This suggests to me that the machine has hardware problems.

Does MM's machine also have hardware problems? Have either of you used a self-check routine such as "memtest86"? Or compiling your linux kernel in a while [ 1 ] loop overnight?

comment:5 Changed at 2009-07-01T06:50:18Z by bdew

Huh?.. iv'e tried running it under valgrind and it shows an invalid instruction error:

==10192== valgrind: Unrecognised instruction at address 0x5c33c8e.
==10192== Your program just tried to execute an instruction that Valgrind
==10192== did not recognise.  There are two possible reasons for this.
==10192== 1. Your program has a bug and erroneously jumped to a non-code
==10192==    location.  If you are running Memcheck and you just saw a
==10192==    warning about a bad jump, it's probably your program's fault.
==10192== 2. The instruction is legitimate but Valgrind doesn't handle it,
==10192==    i.e. it's Valgrind's fault.  If you think this is the case or
==10192==    you are not sure, please let us know and we'll try to fix it.
==10192== Either way, Valgrind will now raise a SIGILL signal which will
==10192== probably kill your program.
==10192==
==10192== Process terminating with default action of signal 4 (SIGILL)
==10192==  Illegal opcode at address 0x5C33C8E
==10192==    at 0x5C33C8E: CryptoPP::SHA256::HashMultipleBlocks(unsigned int const*, unsigned int) (sha.cpp:434)
==10192==    by 0x5C048D6: CryptoPP::IteratedHashBase<unsigned int, CryptoPP::HashTransformation>::Update(unsigned char const*, unsigned int) (iterhash.cpp:55)
==10192==    by 0x5C3A94E: SHA256_update(SHA256*, _object*) (sha256module.cpp:49)
==10192==    by 0x80CE880: PyEval_EvalFrameEx (ceval.c:3600)
==10192==    by 0x80CF89C: PyEval_EvalFrameEx (ceval.c:3698)
==10192==    by 0x80D00C4: PyEval_EvalCodeEx (ceval.c:2875)
==10192==    by 0x8116C7D: function_call (funcobject.c:517)
==10192==    by 0x805D4B6: PyObject_Call (abstract.c:1861)
==10192==    by 0x80CD5B9: PyEval_EvalFrameEx (ceval.c:3892)
==10192==    by 0x80D00C4: PyEval_EvalCodeEx (ceval.c:2875)
==10192==    by 0x8116B90: function_call (funcobject.c:517)
==10192==    by 0x805D4B6: PyObject_Call (abstract.c:1861)
==10192==
==10192== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2190 from 9)
==10192== malloc/free: in use at exit: 9,068,451 bytes in 13,510 blocks.
==10192== malloc/free: 279,589 allocs, 266,079 frees, 66,658,102 bytes allocated.
==10192== For counts of detected errors, rerun with: -v
==10192== searching for pointers to 13,510 not-freed blocks.
==10192== checked 9,053,816 bytes.
==10192==
==10192== LEAK SUMMARY:
==10192==    definitely lost: 0 bytes in 0 blocks.
==10192==      possibly lost: 156,820 bytes in 425 blocks.
==10192==    still reachable: 8,911,631 bytes in 13,085 blocks.
==10192==         suppressed: 0 bytes in 0 blocks.

Without valgrind it runs and fails the tests... no crash.

Regarding possible hardware problems - I'll check more into it, never saw gcc crashes on it before, but this computer is pretty old and could develop some hardware problems. Also it doesn't have ECC on memory so it could just be some cosmic particle flying around deciding to flip some bit ;)

comment:6 Changed at 2009-07-01T07:27:46Z by bdew

I've missed the actual instruction from my previous post:

vex x86->IR: unhandled instruction bytes: 0x66 0xF 0x6F 0x1

Which maps to:

MOVDQA 66 0F 6F /r xmm1, xmm2/m128 Move aligned double quadword from xmm2/m128 to xmm1

Looks like valgrind doesnt like SSE2 stuff?

comment:7 Changed at 2009-07-01T07:56:17Z by bdew

Aha! Now i built it without SSE2 support (adding CRYPTOPP_DISABLE_SSE2) and the test passes (and valgrind too):

==26122== Memcheck, a memory error detector.
==26122== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.
==26122== Using LibVEX rev 1884, a library for dynamic binary translation.
==26122== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
==26122== Using valgrind-3.4.1-Debian, a dynamic binary instrumentation framework.
==26122== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
==26122== For more details, rerun with: -v
==26122==
running darcsver
running test
running egg_info
writing requirements to pycryptopp.egg-info/requires.txt
writing pycryptopp.egg-info/PKG-INFO
writing top-level names to pycryptopp.egg-info/top_level.txt
writing dependency_links to pycryptopp.egg-info/dependency_links.txt
writing manifest file 'pycryptopp.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-i686-2.5/pycryptopp/_pycryptopp.so -> pycryptopp
test_chunksize (pycryptopp.test.test_sha256.SHA256) ... ok

----------------------------------------------------------------------
Ran 1 test in 10.929s

OK
==26122==
==26122== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 3736 from 9)
==26122== malloc/free: in use at exit: 4,983,817 bytes in 906 blocks.
==26122== malloc/free: 325,282 allocs, 324,376 frees, 71,999,912 bytes allocated.
==26122== For counts of detected errors, rerun with: -v
==26122== searching for pointers to 906 not-freed blocks.
==26122== checked 5,115,828 bytes.
==26122==
==26122== LEAK SUMMARY:
==26122==    definitely lost: 0 bytes in 0 blocks.
==26122==      possibly lost: 66,612 bytes in 181 blocks.
==26122==    still reachable: 4,917,205 bytes in 725 blocks.
==26122==         suppressed: 0 bytes in 0 blocks.
==26122== Rerun with --leak-check=full to see details of leaked memory.

comment:8 Changed at 2009-07-01T08:43:12Z by bdew

And even more fun: Using an SSE2-less build of pycryptopp makes tahoe pass all the tests without crashes/hangs (http://allmydata.org/trac/tahoe/ticket/737):

Ran 599 tests in 395.987s

PASSED (skips=3, expectedFailures=2, unexpectedSuccesses=7, successes=587)

So i guess there is a bug in either Crypto++ SSE code or my CPU SSE implementation that somehow also makes python's time() return NaNs? or segfault.

FWIW this system CPU is:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : AMD Athlon(tm) XP 2600+
stepping        : 0
cpu MHz         : 1916.490
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow up

comment:9 Changed at 2009-07-01T09:10:58Z by bdew

Now it hit me - AMD AthlonXP doesn't even support SSE2, only the base SSE - why is MOVDQA even being used? and why doesn't it outright crash with an invalid opcode when not in valgrind?

Also for completeness tested with --disable-embedded-cryptopp - I'm getting the same erros from valgring/failed test/hanging tahoe test.

comment:10 Changed at 2009-07-01T12:43:14Z by zooko

So when you try with --disable-embedded-cryptopp, what libcryptopp.so are you linking against? A Debian-packaged one or one you built yourself?

comment:11 Changed at 2009-07-01T13:26:41Z by zooko

When a standard Crypto++ source tree is built (i.e., not the special embedded build in pycryptopp) then it produces a file named cryptest.exe (even on Unix). If you run ./cryptest.exe v then it will execute the Crypto++ self-tests. (If you run ./cryptest.exe b then it will execute the benchmarks.)

I don't know yet whether the self-tests exercise the same paths of hashing, as Brian described in the original issue report:

"two updates, in which the total number of bytes is a multiple of 128, and in which the first write is an odd number less than 64, will cause this bug."

comment:12 follow-up: Changed at 2009-07-01T13:57:43Z by zooko

Wei Dai added optimized assembly for SHA-256 for amd64 in the Crypto++ v5.6.0 release. Perhaps a bug was added in that change. Isn't MOVDQA an SSE instruction (not requiring SSE2)?

comment:13 in reply to: ↑ 12 Changed at 2009-07-01T15:49:47Z by bdew

According to http://en.wikipedia.org/wiki/X86_instruction_listings - MOVDQA belongs to SSE2.

However, according to http://mubench.sourceforge.net/results.html - "this cpu does not support SSE2, but appears to run some SSE2 opcodes that have corresponding MMX opcodes without error".

comment:14 Changed at 2009-07-01T23:53:29Z by bdew

I've downloaded and built crypto++ from source, all the tests succeed in cryptest.exe

comment:15 Changed at 2009-07-02T20:47:35Z by zooko

See also the note on http://allmydata.org/trac/tahoe/ticket/738#comment:12 -- Black Dew's buildslave incurred an internal compiler error on g++. I think that this indicates flaky hardware, but I suppose it could also be caused by a bug in the CPU which is being exercised by Crypto++?

comment:16 Changed at 2009-07-03T06:32:54Z by midnightmagic

It would be interesting if it were flakey hardware, since the same problem happens pretty uniformly across otherwise stable and independent machines that have similarities: they are x86, pretty old CPUs, etc. (Like, one is crazy rock-solid stable up for 600+ super busy days machinery, so solid I run a strata 2 ntp on it.)

I downloaded and built crypto++ manually, and running the tests as bdew did I found 100% success so long as I kept the /dev/random entropy pool fed. (Else it would fail because it would wait too long for random data to appear.)

I will attempt these other things that bdew seems so facile with. :)

comment:17 follow-up: Changed at 2009-07-03T19:44:30Z by midnightmagic

I have written a test program for crypto++ which demonstrates the same problem for a SHA256: the results differ based on whether the input is split up on an odd boundary or not. I have compiled my crypto++ library with and (I believe) without the optimized assembly routines.

I will add my program to this ticket.

Changed at 2009-07-03T19:44:59Z by midnightmagic

comment:18 in reply to: ↑ 17 Changed at 2009-07-03T19:51:28Z by midnightmagic

Replying to midnightmagic:

I have written a test program for crypto++ which demonstrates the same problem for a SHA256:
the results differ based on whether the input is split up on an odd boundary or not. I have
compiled my crypto++ library with and (I believe) without the optimized assembly routines.

I lied. Manually disabling the optimized assembly makes my sample program output the correct results. Here are the details:

g++ -DCRYPTOPP_DISABLE_X86ASM -DNDEBUG -g -O2 -DCRYPTOPP_DISABLE_SSSE3 -pipe -I. -c mytest2.cpp && g++ -o mytest2 mytest2.o -L. -lcryptopp && ./mytest2

will work and results in:

{{{ STRING: aaa:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa (etc)
SHA256: 6836CF13BAC400E9105071CD6AF47084DFACAD4E5E302C94BFED24E013AFB73E
STRING: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa (etc)
SHA256: 6836CF13BAC400E9105071CD6AF47084DFACAD4E5E302C94BFED24E013AFB73E }}}

Just build crypto++ and then plunk mytest2.cpp in the same directory as the libcryptopp.a was built in, and run the above.

g++ -DNDEBUG -g -O2 -DCRYPTOPP_DISABLE_SSSE3 -pipe -I. -c mytest2.cpp && g++ -o mytest2 mytest2.o -L. -lcryptopp && ./mytest2

will NOT work, and results in:

{{{ STRING: aaa:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa (etc)
SHA256: D159F4747C64249633829F8355DC84678A67CCCE92ABC8F770D33111AFB43E42
STRING: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa (etc)
SHA256: 6836CF13BAC400E9105071CD6AF47084DFACAD4E5E302C94BFED24E013AFB73E }}}

comment:19 Changed at 2009-07-03T20:15:12Z by zooko

Way to go, MM (and BD)! I will report this to the Crypto++ maintainers. In the meantime, we should probably think about disabling optimized assembly (at least on some platforms?) so that we can go ahead with the Tahoe-1.5 release. Do you have a platform were the optimized assembly *does* work so you can see how much different it makes in performance?

Thanks.

comment:20 Changed at 2009-07-03T20:22:32Z by zooko

I posted to Crypto++ list:

http://groups.google.com/group/cryptopp-users/browse_thread/thread/d3fb86b84a4793b7

I wouldn't be surprised if Wei Dai (author of Crypto++) can come up with a fix for this fast enough to go into Tahoe-LAFS v1.5.0. But in case he doesn't, Midnight Magic is writing a patch for pycryptopp's setup.py to turn off optimized assembly on some platforms.

comment:21 Changed at 2009-07-03T22:06:59Z by midnightmagic

I posted there too. Anyway, I've got a diff that allows pycryptopp to succeed with the chunktests Werner built. It's relatively specific, and it seems unfair. The older machinery is that which would most benefit from assembly optimizations. Anyway.. I'll attach the diff in just a moment.

Changed at 2009-07-03T22:07:49Z by midnightmagic

will ALMOST CERTAINLY not work for bdew's machine but I don't have access to python there.

comment:22 in reply to: ↑ description Changed at 2009-07-03T23:43:49Z by weidai

It looks like this might be specific to AMD XP, which I don't have access to. I can't reproduce it on my AMD 64 machine. Can someone give me a shell account on an AMD XP machine so I can debug this? If so, please use http://www.ibiblio.org/weidai/mailme.php to email the login info to me.

comment:23 follow-up: Changed at 2009-07-04T00:12:56Z by weidai

Never mind, I reproduced it on my non-SSE2 Celeron processor. I'm working on a fix.

comment:24 in reply to: ↑ 23 ; follow-up: Changed at 2009-07-04T01:25:12Z by midnightmagic

Replying to weidai:

Never mind, I reproduced it on my non-SSE2 Celeron processor. I'm working on a fix.

Perhaps not so coincidentally, a non-SSE2 Celeron is running on one of the other machines where this problem is also visible.

(Amazing how fast you got on top of this.. thanks eh! I appreciate you helping me breathe more life into what might otherwise be my computer graveyard..)

comment:25 in reply to: ↑ 24 Changed at 2009-07-04T03:05:58Z by weidai

I've attached a fix. Please try it out. I'll explain the bug on the Crypto++ list later if anyone is curious. I'll also add some additional testing code to the next release of Crypto++ to catch this type of bugs.

comment:26 Changed at 2009-07-04T07:37:41Z by bdew

I'm getting a compile error with this patch: (gcc 4.3.3-12, binutils 2.19.1)

gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I. -I/usr/include/python2.5 -c cryptopp/sha.cpp -o build/temp.linux-i686-2.5/cryptopp/sha.o -w
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
cryptopp/sha.cpp: Assembler messages:
cryptopp/sha.cpp:435: Error: ambiguous operand size for `dec'
cryptopp/sha.cpp:435: Error: ambiguous operand size for `dec'
cryptopp/sha.cpp:435: Error: ambiguous operand size for `dec'
error: command 'gcc' failed with exit status 1

comment:27 Changed at 2009-07-04T14:53:09Z by zooko

  • Cc weidai added

Adding Cc: weidai, so that he'll be sure to see Black Dew's report of a compiler (assembler) error.

Changed at 2009-07-04T17:51:39Z by weidai

fix for sha.cpp

comment:28 Changed at 2009-07-04T17:52:08Z by weidai

Should be fixed now. Please try again.

comment:29 Changed at 2009-07-05T06:51:01Z by bdew

Ok, the second patch works fine for me and fixes both this and http://allmydata.org/trac/tahoe/ticket/737

comment:30 follow-up: Changed at 2009-07-05T14:00:15Z by zooko

  • Resolution set to fixed
  • Status changed from new to closed

Fixed by [20090705135508-92b7f-c3c3ad54fdfc52fa22c1c2fbd70479ba27802952]. Thanks, MM, BD, Wei Dai!

comment:31 in reply to: ↑ 30 Changed at 2009-07-06T20:55:12Z by midnightmagic

Just wanted to chime in and state that trunk as of the patch committed, above, works perfectly on all machines of mine (NetBSD all, all Athlon XP, Celeron, Pentium III) and the problem pycryptopp test now passes perfectly:

test_chunksize (pycryptopp.test.test_sha256.SHA256) ... ok

The following machines now report success, and this apparently with full ASM optimizations:

quickie
NetBSD quickie 4.99.7 NetBSD 4.99.7 (quickie) #0: Tue Jan 2 14:47:23 PST 2007 root@shoggoth:/v/src/sys/arch/i386/compile/quickie i386
CPU: AMD Athlon XP 2500+ (686-class), 1837.62 MHz, id 0x6a0

warp
NetBSD warp 4.0_STABLE NetBSD 4.0_STABLE (warp) #0: Sat Jun 20 23:42:00 PDT 2009 root@shoggoth:/v/src/sys/arch/i386/compile/obj/warp i386
CPU: Intel Celeron (686-class), 1196.44 MHz, id 0x6b1

draco
NetBSD draco 3.99.23 NetBSD 3.99.23 (GENERIC) #0: Thu Jul 27 08:22:27 PDT 2006 root@shoggoth:/usr/netbsd-build/netbsd-3.99.23/i386/OBJ/usr/v/src/sys/arch/i386/compile/GENERIC i386
CPU: Intel Pentium III (686-class), 731.29 MHz, id 0x683

Note: See TracTickets for help on using tickets.