The Painless Guide to CRC isn't quite painless

* * * * *

The Painless Guide to CRC isn't quite painless

So there's this “A Painless Guide to CRC Error Detection Algorithms [1],”
which apparently tells you everything you wanted to know about CRCs, but were
afraid to ask (or didn't really want to ask). The concept itself seems to be
easy—a CRC is just the remainder of a particular type of division. The
numerator is the data (treated as one long number) and the demoninator the
“polynomial” of the CRC (even though it's a value, the specification for a
given CRC is a polynomial equation—go figure). The steps of the algorithm are
very simple:

1. Load the remainder with zero bits.
2. Augment the message by appending zero bits (equal to the size of the
remainder) to the end of it.
3. While (more message bits)
3. 1. Shift the remainder left by one bit, reading the next bit of the
augmented message into bit position 0 of the remainder.
3. 2. If a 1 bit popped out of the remainder during the previous step,
XOR the result with the polynomial

4. We now have the remainder.

The size of the remainder is the size of our CRC—16 for a 16-bit CRC, 32 for
a 32-bit CRC, etc. And from that description, the code pretty much follows:

> /**********************************************************
> * Straightforward CRC implementation
> * Section 8 of the Guide
> ***********************************************************/
>
> uint32_t crcsim(uint32_t crc,const uint8_t *p,size_t size)
> {
> size_t i;
> int c;
> int xor;
>
> while(size--)
> {
> c = *p++;
> for (i = 0 ; i < 8 ; i++)
> {
> xor = crc & 0x80000000uL; /* flag if we need to xor the polynomial */
> crc = crc << 1; /* shift the crc register */
> crc |= (c & 0x80) ? 1 : 0; /* and shift in the data bit */
>
> if (xor)
> crc ^= 0x04C11DB7uL;
>
> c = c << 1; /* shift data bit */
> }
> }
>
> return crc;
> }
>

Although, this is the rare case where it's easier to write it in assembly
that it is in C, since we have access to the carry bit when shifting, which
makes it easier to check:

> crc32a push ebp ; boiler plate code for any
> mov ebp,esp ; C callable code
> push ebx
> push esi
> push edi
>
> mov edx,[ebp + 8] ; load passed in CRC
> mov esi,[ebp + 12] ; ptr to data
> mov ecx,[ebp + 16] ; count of data
> mov edi,0x04C11DB7 ; CRC polynomial
>
> .main lodsb ; read next data byte
> mov bl,8 ; # of bits to go through
>
> .10 shl al,1 ; shift data bit into poly register
> rcl edx,1 ; shift high bit out of poly register
> jnc .15 ; if it wasn't set, skip
> xor edx,edi ; xoring the CRC polynomial
>
> .15 dec bl ; more bits?
> jnz .10 ; if so, keep going
> loop .main ; do next byte
>
> mov eax,edx ; return CRC
> pop edi ; save pushed registers
> pop esi
> pop ebx
> pop ebp
> ret
>

(Note: the core of the algorithm in assembly is four instructions---the C
compiler didn't do quite as good a job from the six lines of C comprising the
core of the algorithm---it's about six times the object code.)

What is not shown in the code above (either version) is the agumentation step
of adding additional 0-bits to the message—that's left up to the caller of
these routines.

Both of these routines give the same result. Other implementations I did
based upon the Guide also give the same results. And they're consistent with
the results of the reference code given in the Guide.

> /*******************************************************************
> * Table implementation
> * Section 9 of the Guide
> * Requires trailing zero bits.
> ********************************************************************/
>
> const uint32_t crctable[256] = { ... };
>
> uint32_t crc32z(uint32_t crc,const uint8_t *p,size_t size)
> {
> while(size--)
> crc = ((crc << 8) | *p++) ^ crctable[crc >> 24];
>
> crc = (crc << 8) ^ crctable[crc >> 24];
> crc = (crc << 8) ^ crctable[crc >> 24];
> crc = (crc << 8) ^ crctable[crc >> 24];
> crc = (crc << 8) ^ crctable[crc >> 24];
> return crc;
> }
>
> /*****************************************************************
> * Table implementation part deux---improved
> * Section 10 of the Guide
> * Does not need trailing zero bits.
> ******************************************************************/
>
> uint32_t crc32c(uint32_t crc,const uint8_t *p,size_t size)
> {
> while(size--)
> crc = (crc << 8) ^ crctable[ (crc >> 24) ^ *p++];
>
> return crc;
> }
>
> /*****************************************************************
> * Parameterized Model, from code given in the Guide
> * Section 15 of the Guide
> *
> * This is a reference implementation provided by the Guide to be
> * used for testing various CRC implementations.
> ******************************************************************/
>
> uint32_t crcmod(uint32_t crc,const uint8_t *p,size_t size)
> {
> cm_t ctx;
>
> ctx.cm_width = 32;
> ctx.cm_poly = 0x04C11DB7uL;
> ctx.cm_init = crc;
> ctx.cm_refin = false;
> ctx.cm_refot = false;
> ctx.cm_xorot = 0;
>
> cm_ini(&ctx);
> cm_blk(&ctx,(p_ubyte_)p,(ulong)size);
> return (uint32_t)cm_crc(&ctx);
> }
>

Table: CRC of “123456789” using different implementations
Implementation CRC result
------------------------------
crcsim() 89A1897F
crc32z() 89A1897F
crc32c() 89A1897F
crcmod() 89A1897F

So far so good. But that isn't the result from the standard CRC-32
implementation, which is used by Ethernet, ZIP, gzip, PNG and a few other
standards. No, CRC-32 uses what the Guide calls a “reflected” table mode,
which came about because of hardware CRC-implementations start with the least
significant bit of the byte; these algorithms start with the most significant
bit of the byte.

Okay, so the bits are fed in backwards. That can be compensated for. Also,
the standard CRC-32 algorithm mandates that the initial value of the
remainder is all one bits, not zero bits. Easy to fix. And that the final
remainder is to be exclusived-or'ed with all ones. Again, easy to do.

It all seems pretty straightforward. And while the Guide only goes over a
table inplementation of the “reflected” mode, it seems like it would be
straightforward (excuse the pun) to do reflected versions of all the
implementations done so far.

And since the zlib [2] library uses the CRC-32, we can link that in as a
baseline to compare results.

So, with that out of the way, the code:

> /**********************************************************
> * Straightforward CRC implementation, using reflected bytes
> * based on Section 9 of the Guide
> ***********************************************************/
>
> uint32_t crcsimr(uint32_t crc,const uint8_t *p,size_t size)
> {
> size_t i;
> int c;
> int xor;
>
> crc = ~crc;
> while(size--)
> {
> c = *p++;
> for (i = 0 ; i < 8 ; i++)
> {
> xor = crc & 0x80000000uL;
> crc = crc << 1;
> crc |= (c & 0x01) ? 1 : 0;
>
> if (xor)
> crc ^= 0x04C11DB7uL;
>
> c = c >> 1;
> }
> }
>
> return crc;
> }
>
> /*******************************************************************
> * Table implementation, using a reflected table
> * based on Section 9 of the Guide
> ********************************************************************/
>
> const uint32_t crctabler[256] = { ... };
>
> uint32_t crc32rz(uint32_t crc,const uint8_t *p,size_t size)
> {
> crc = ~crc;
>
> while(size--)
> crc = ((crc >> 8) | *p++) ^ crctabler[ crc & 0xFF ];
>
> crc = (crc >> 8) ^ crctabler[ crc & 0xFF ];
> crc = (crc >> 8) ^ crctabler[ crc & 0xFF ];
> crc = (crc >> 8) ^ crctabler[ crc & 0xFF ];
> crc = (crc >> 8) ^ crctabler[ crc & 0xFF ];
>
> return ~crc;
> }
>
> /*****************************************************************
> * Table implementation part deux---improved using reflected table
> * based on Section 10 of the Guide
> ******************************************************************/
>
> uint32_t crc32r(uint32_t crc,const uint8_t *p,size_t size)
> {
> crc = ~crc;
>
> while(size--)
> crc = (crc >> 8) ^ crctabler[ (crc & 0xFF) ^ *p++];
>
> return ~crc;
> }
>
> /*****************************************************************
> * Parameterized Model, from code given in the Guide
> * Section 15 of the Guide
> *
> * This is a reference implementation provided by the Guide to be
> * used for testing various CRC implementations.
> ******************************************************************/
>
> uint32_t crcmodr(uint32_t crc,const uint8_t *p,size_t size)
> {
> cm_t ctx;
>
> ctx.cm_width = 32;
> ctx.cm_poly = 0x04C11DB7uL;
> ctx.cm_init = ~crc;
> ctx.cm_refin = true;
> ctx.cm_refot = true;
> ctx.cm_xorot = ~0;
>
> cm_ini(&ctx);
> cm_blk(&ctx,(p_ubyte_)p,(ulong)size);
> return (uint32_t)cm_crc(&ctx);
> }
>

And the results:

Table: CRC of “123456789” using different implemenations, reflected
Implementation CRC result
------------------------------
crcsimr() AF296EBB
crc32rz() 717C74D2
crc32r() CBF43926
crcmodr() CBF43926
zlib.crc32() CBF43926

… um … that was rather unexpected.

I didn't think the code I wrote for reflected CRCs was that unreasonable
based upon the information in the Guide, but I guess I was wrong for some of
them.

Oh, and getting back to the non-reflected code—I didn't initialize the
results properly, nor did I exclusive-or the results. Hopefully, I'll get
CBF43926 when I do that.

Table: CRC of “123456789” using different implementations, non-reflected with proper initialization
Implementation CRC result
------------------------------
crcsim() C8C3A78F
crc32z() C8C3A78F
crc32c() FC891918
crcmod() FC891918

Okay, now I'm horribly confused. There appears to be some missing information
in “A Painless Guide to CRC Error Detection Algorithms.” The GNU Radio
implementation of CRC-32 [3] uses the non-reflective table implementation,
and when I called that, I got back FC891918, so it's consistent with at least
two of the CRC-32 non-reflected implementations. But I'm concerned that the
routines that require additional zero bits aren't the same in this case.
There has to be some subtle difference between the two in this case that I
don't see, and isn't mentioned in the Guide at all.

I did find yet another comprehensive implemenation of CRCs—Danjel McGougan's
universal_crc [4], and every version of the non-reflected CRC-32 it generated
(it generates either bit-oriented code, or several table-driven
implementations based on tradeoffs betweeen speed and memory usage) returned
FC891918 (even it's own bit-oriented version, which isn't the same as the one
described in the Guide).

Another thing I noticed by looking deeply into the abyss that is CRC, is that
my first implementation of CRC-32 is flawed [5]—I don't exclusive-or the
results with all ones at the end. I suspect that the code I based mine on
didn't bother with the exclusive-or when returning the CRC, but instead did
that elsewhere in the codebase. It's not a bug per se, but according to
_Numerical Recipes in C_ [6]:

> Second, one can add (XOR) any M-bit constant K to the CRC before it is
> transmitted … This has the advantage of detecting another kind of erorr
> that the CRC would otherwise not find: deletion of an initial 1 bit in the
> message with spurious insertion of a 1 bit at the end of the block.
>

The result is that there's a type of corruption that I won't catch. This code
was the basis for the CRC implementation in a few programs at work (oops) but
again, I don't think it's an outright show-stopping bug.

At some point, I may go through some of this on paper, one bit at a time, to
see what's going on math-wise with the reflected and non-reflected table
implementations with non-0 initial values.

[1] http://www.ross.net/crc/download/crc_v3.txt
[2] http://www.zlib.net/
[3] http://gnuradio.org/redmine/projects/gnuradio/repository/revisions/1cb52da49230c64c3719b4ab944ba1cf5a9abb92/entry/gr-digital/lib/digital_crc32.cc
[4] http://www.mcgougan.se/universal_crc/
[5] https://github.com/spc476/x-grey/blob/master/common/src/crc32.c
[6] http://www.amazon.com/exec/obidos/ASIN/0521431085/conmanlaborat-20

Email author at [email protected]