Notes on bitwise CRC32


CRC algorithms calculate remainder from division of a message polynomial by a generator polynomial over GF(2) field. There are many explanations how it works (see wikipedia for example), but there are also practical details omitted that you should care about if you want to obtain an implementation compatible with the standard table implementation.

The standard CRC32 implementation uses reversed bit ordering; in code it means right shifts instead of left shifts. The CRC32-compatible polynomial division implementation is

function PolyModRevers(const Dividend: ByteArray; G: LongWord): LongWord;
  I, L: Cardinal;
  P: PByte;
  B: Byte;
  Carry: Boolean;

  L:= Dividend.Len;
  Result:= 0;
  P:= Dividend.RawData;
  while L > 0 do begin
    B:= P^;
// push B through the Result register
    I:= 8;
      Carry:= Odd(Result);
      Result:= Result shr 1;
      if Odd(B) then
        Result:= Result or $80000000;
      B:= B shr 1;
      if Carry then Result:= Result xor G;
    until I = 0;

The function calculates remainder from division by a generator polynomial G. A generator is stored as 32-bit value; actually a generator for CRC32 algorithms is a 33-bit value, but the senior bit is always 1 and so can be omitted and used implicitly. Since the function uses reverse bit ordering in dividend polynomial, the generator polynomial should also be bit-reversed (for the standard CRC32 algorithm the bit-reversed generator is 0xEDB88320).

The dividend for a general CRC32 algorithm is obtained by appending a message with 4 zero bytes and optionally prefixing with some byte sequence. I did not find the prefix for standard CRC32 algorithm using Google, and calculated it by using brute force:

procedure Find;
  Dividend: ByteArray;
  N, R: LongWord;

  N:= 0;
    Dividend:= ByteArray.FromInt(N, SizeOf(N), True) + ByteArray.Allocate(4, 0);
    R:= PolyModRevers(Dividend, $EDB88320);
    if R = $FFFFFFFF then begin
      Writeln('N: ', Dividend.ToHex);
    if N and $ffffff = 0 then writeln(n);
  until N = 0;

The function takes time to obtain the result, so it outputs some values during the search; the final result is funny magic number $62F52692.

Putting it all together, the bitwise CRC32 code compatible with reference table implementation is:

function CalcBitCRC32(const Msg: ByteArray): LongWord;
  Dividend: ByteArray;
  Prefix: LongWord;

  Prefix:= $62F52692;
  Dividend:= ByteArray.FromInt(Prefix, SizeOf(Prefix), True) + Msg + ByteArray.Allocate(4, 0);
  Result:= not PolyModRevers(Dividend, $EDB88320);

BigInteger redux


TForge 0.72 is released.

What’s new:

  • Big integer arithmetic is included in TForge package; that is BigCardinal and BigInteger classes are now available after installing TForge package, without dll. Old dll implementation (Numerics 0.58) is still available but is not developed for now.
  • More demo projects added.
  • Improved FPC/Lazarus support; TCiphers package and demos are ported to FPC/Lazarus.
  • Several new methods are added to ByteArray class.

More Ciphers


TForge 0.71 is released.

What’s new:

  • TCipher.Salsa20.SetNonce bug fixed
  • added 3DES block cipher algorithm
  • added ChaCha20 stream cipher algorithm

Examples of creating new cipher instances:

  Cipher: TCipher;

  Cipher:= TCipher.TripleDES;   // 3DES
  Cipher:= TCipher.ChaCha20;    // 20-rounds ChaCha
  Cipher:= TCipher.ChaCha20(12) // 12-rounds ChaCha

ChaCha20 (aka ChaCha) is Salsa20 variant which is becoming popular after Google has selected ChaCha20 as a replacement for RC4 in communication protocols.

Salsa20 support explained


Salsa20 is a stream cipher; like any stream cipher Salsa20 encrypts/decrypts data by xor‘ing a plaintext/ciphertext with a pseudorandom keystream.

Salsa20 generates keystream by hashing 64-byte (512-bit) blocks; the block consists of 4 parts:

  • fixed “magic words” – 16 bytes
  • key – 32 bytes
  • nonce (message number) – 8 bytes
  • block number – 8 bytes

This simple structure allows to generate 64-byte blocks of keystream independently; to generate an arbitrary block of the keystream one need to set the block number in the cipher’s state and hash the 64-byte block.

TCipher class supports Salsa20 design by following methods:

function TCipher.SetIV(const AIV: ByteArray): TCipher;
function TCipher.SetNonce(Value: UInt64): TCipher;
function TCipher.Skip(Value: UInt64): TCipher;

TCipher class considers concatenated nonce and block number as initialization vector (IV). The TCipher.SetIV sets both the nonce and the block number in the cipher instance state. [Corrected – changed in ver 0.71]

The TCipher.SetIV  and TCipher.SetNonce methods do the same – set the nonce field in the the cipher instance state and clear the block number field; the difference is that TCipher.SetIV accepts the parameter as a byte array while TCipher.SetNonce as an integer.

The TCipher.SetIV method was introduced mainly for testing purposes.

The purpose of nonce is to serve as a unique message number; a nonce need not be kept secret, but it should never repeat during the lifetime of a secret key.

As an example suppose we want to encrypt several files by a single secret key; the right way to do it is to encrypt each file with a unique nonce value, like this:

procedure EncryptFiles(const Key: ByteArray; FileNames: array of string);
  Cipher: TCipher;
  Nonce: UInt64;
  I: Integer;

  Nonce:= 0;
  for I:= 0 to High(FileNames) do begin
    Cipher:= TCipher.Salsa20;
            .EncryptFile(FileNames[I], FileNames[I] + '.salsa20');

The try .. finally block ensures that the sensitive key data in the cipher instances’ states is destroyed.

The TCipher.Skip method increments the block number field in the cipher instance state; its purpose is to implement parallel encryption/decryption as was demonstrated in the previous post.

TCiphers – Multithreading Support.


TCiphers supports encryption/decryption in multiple threads for stream ciphers and block ciphers in CTR mode of operation.

The idea is simple – to encrypt N bytes of data in m threads you split the data into m parts, each part approximately of N/m bytes size, then create m TCipher instances and encrypt m parts in parallel. The TCiphers methods needed to implement this algorithm are:

  function TCipher.Copy: TCipher;
  function TCipher.Skip(Value: LongWord): TCipher; overload;
  function TCipher.Skip(Value: UInt64): TCipher; overload;
  property TCipher.BlockSize: Cardinal;

The TCipher.Copy duplicates an instance of TCipher, TCipher.Skip(Value) discards Value blocks of the keystream, and TCipher.BlockSize returns the cipher’s block size. These are all ingredients needed for multithreading.

The ThreadDemo project demonstrates how it works. The thread class implementing the encryption receives a cipher instance, an address of the portion of data to be encrypted and its size in the constructor and uses Windows event to signal the main thread when the operation is completed:

unit CipherThreads;


  Windows, Classes, tfCiphers;

  TCipherThread = class(TThread)
    FCipher: TCipher;
    FOrigin: Pointer;
    FSize: LongWord;
    FLast: Boolean;
    FEvent: THandle;
    procedure Execute; override;
    constructor Create(ACipher: TCipher; AOrigin: Pointer; ASize: LongWord;
                       ALast: Boolean; AEvent: THandle);


{ TCipherThread }

constructor TCipherThread.Create(ACipher: TCipher; AOrigin: Pointer;
  ASize: LongWord; ALast: Boolean; AEvent: THandle);
  FCipher:= ACipher;
  FOrigin:= AOrigin;
  FSize:= ASize;
  FLast:= ALast;
  FEvent:= AEvent;
  FreeOnTerminate:= True;
  inherited Create(False);

procedure TCipherThread.Execute;
  DataSize: LongWord;

  DataSize:= FSize;
  FCipher.KeyCrypt(FOrigin^, DataSize, FLast);


The main thread splits the data to be encrypted into 4 portions with the cipher’s block size granularity. Then we create an instance of TCipher (AES and Salsa20 algorithms are tested); the instance will finally be used to check the multi-thread encryption result by the single-thread decryption. Each thread uses its own copy of the cipher’s instance, created by using TCipher.Copy and TCipher.Skip methods. That’s all:

program ThreadDemo;


  CipherThreads in '..\Source\CipherThreads.pas';

procedure TestCipher(Cipher: TCipher);
  DATA_SIZE = 1024 * 1024;
  NThreads = 4;

  PData = ^TData;
  TData = array[0 .. DATA_SIZE - 1] of LongWord;

  Data: PData;

  I: Integer;
  DataSize: Cardinal;
  BlockSize: Cardinal;
  Origin: PByte;
  Chunk, Size: Cardinal;
  Events: array[0 .. NThreads - 1] of THandle;

  GetMem(Data, SizeOf(TData));
// fill the data with some known values
    for I:= 0 to DATA_SIZE - 1 do
      Data[I]:= I;
    BlockSize:= Cipher.BlockSize;
    Origin:= PByte(Data);
    Chunk:= SizeOf(TData) div (NThreads * BlockSize);

// encrypt the data using multiple threads
    for I:= 0 to NThreads - 1 do begin
      if I = NThreads - 1 then
        Size:= SizeOf(TData) - (NThreads - 1) * Chunk * BlockSize
        Size:= Chunk * BlockSize;
      Events[I]:= CreateEvent(nil, False, False, nil);
      TCipherThread.Create(Cipher.Copy.Skip(Chunk * LongWord(I)),
        Origin, Size, I = NThreads - 1, Events[I]);
      Inc(Origin, Chunk * BlockSize);

      WaitForMultipleObjects(NThreads, @Events, True, INFINITE);
      for I:= 0 to NThreads - 1 do

// check the result by decryption
    DataSize:= SizeOf(TData);
    Cipher.KeyCrypt(Data^, DataSize, True);
    for I:= 0 to DATA_SIZE - 1 do
      if Data[I] <> LongWord(I) then
        raise Exception.Create('!! Error -- decryption failed');


  HexKey = '2BD6459F82C5B300952C49104881FF48';
  HexIV  = '6B1E2FFFE8A114009D8FE22F6DB5F876';

    Writeln('=== Running AES Test ..');
                                     CTR_ENCRYPT or PADDING_NONE,

    Writeln('=== Running Salsa20 Test ..');
    Writeln('=== Done ! ===');
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);

This example should work with any stream cipher, but it improves performance only if the stream cipher is parallelazable. The unparallelazable RC4 cipher implements the TCipher.Skip(Value) method by generating Value bytes of a keystream and discarding them, so multitheading with RC4 is useless. On the other hand Salsa20 implementation discards the keystream blocks by incrementing directly the block number field in the cipher instance state, leveraging that the Salsa20 algorithm is parallelazable.



TForge 0.70 is released.

The release adds a new package TCiphers implementing private-key cryptography (block and stream ciphers). The current release supports:

  • Block Ciphers:
    • AES
    • DES
    • RC5
  • Modes of operation
    • ECB
    • CBC
    • CTR
  • Padding methods
    • No padding
    • Zero padding
    • ANSI X.923
    • PKCS
    • ISO 10126
    • ISO/IEC 7816-4
  • Stream Ciphers
    • RC4
    • Salsa20

The main idea behind writing the package is user-friendly Pascal Crypto API, suitable both for writing production code and for learning cryptography. Here is an example (the full source code can be found in the CipherDemo project) demonstrating how to encrypt and decrypt a file using AES cipher and CBC mode of operation in TForge:

procedure CBCFileDemo(const FileName: string;
                      const IV, Key: ByteArray);
  TCipher.AES.ExpandKey(Key, CBC_ENCRYPT, IV)
             .EncryptFile(FileName, FileName + '.aes');
  TCipher.AES.ExpandKey(Key, CBC_DECRYPT, IV)
             .DecryptFile(FileName + '.aes', FileName + '.bak');

The above example requires some explanation. For security reasons TForge don’t keep a copy of a plaintext key in a state of cipher instance if an algorithm supports key expansion (aka “key schedule”) procedure; instead it immediately expands a plaintext key passed as a parameter of ExpandKey method into a form internally used by the algorithm. A key expansion procedure may generally be affected by the encryption direction (encryption/decryption) and mode of operation, that is why TForge requires to set the encryption direction explicitly. If you expand key for decryption (ex using CBC_DECRYPT flag) and then call encryption method (ex EncryptFile), TForge raises exception.

The example is more suitable for crypto exercises. Production code usually follows the next template:

  • create a cipher instance
  • use the cipher instance for encryption or decryption
  • erase the key data from the instance

which can be implemented as follows:

  Cipher: TCipher;

  Cipher:= TCipher.AES.ExpandKey(Key, CBC_ENCRYPT, IV);
    Cipher.EncryptFile(FileName, FileName + '.aes');
//    Cipher.Free;

The instances of the TCipher type support Free method but there is no need to call Free explicitly because an instance is freed automatically when it goes out of scope.

If the padding method is not set explicitly TForge applies “default” padding method which depends on the mode of operation used. Ex for the CBC mode of operation the default padding method is “PKCS”, for the CTR mode of operation the default padding method is “no padding”; if you need say CTR with PKCS padding declare the padding method explicitly:

  Cipher:= TCipher.AES.ExpandKey(Key, CTR_ENCRYPT or PADDING_PKCS, IV);

TCipher type includes several helper methods to keep simple tasks simple; for example to encrypt/decrypt a single 64-bit block by DES cipher with a given key

procedure BlockDemo(const Block, Key: ByteArray);
  CipherText, PlainText: ByteArray;

  CipherText:= TCipher.DES.EncryptBlock(Block, Key);
  PlainText:= TCipher.DES.DecryptBlock(CipherText, Key);
  Assert(PlainText = Block);

the methods EncryptBlock and DecryptBlock keep the key expansion details under the hood.

Another handy pair of TCipher methods is designed to encrypt/decrypt arbitrary-sized data:

procedure CBCDemo(const PlainText, IV, Key: ByteArray);
  CipherText, Plaintext2: ByteArray;

  CipherText:= TCipher.AES.ExpandKey(Key, CBC_ENCRYPT, IV)
  PlainText2:= TCipher.AES.ExpandKey(Key, CBC_DECRYPT, IV)
  Assert(PlainText = PlainText2);

The core TCipher methods for encryption/decryption of arbitrary data are

    procedure TCipher.Encrypt(var Data; var DataSize: LongWord;
                      BufSize: LongWord; Last: Boolean);
    procedure TCipher.Decrypt(var Data; var DataSize: LongWord;
                      Last: Boolean);

The methods are designed for block ciphers but work with stream ciphers too. The Last argument defines whether the current portion of data is the last or not. For the last block the padding should be applied, and consequently the size of output data can differ from the size of input data. Generally the size of output data is larger for encryption and lesser for decryption; the difference can’t exceed the cipher’s block size. After the data is processed, the DataSize parameter contains the size of output data; the BufSize should be large enough to hold the output data, otherwise an exception is raised.

Stream ciphers expand a random key into a pseudorandom keystream and encrypt/decrypt by calculating xor of the keystream and the input data, no padding involved, so the size of output data is always equal to the size of input data. This operation is implemented by the method TCipher.KeyCrypt:

  procedure TCipher.KeyCrypt(var Data; DataSize: LongWord;
                             Last: Boolean);

which is the preferable method for encryption/decryption with a stream cipher.

The Last argument defines whether the current portion of data is last or not; if Last is false then the DataSize should be multiple of the cipher’s block size.

Legacy stream ciphers (RC4) have no block structure inside, so the “block size” for RC4 is equal to 1 (byte). On the other hand Salsa20 generates a keystream in 64-byte (that is 512-bit) blocks, sort of having a block cipher under the hood, so the notion of block size applies both to block and stream ciphers.

Any secure cipher can be used as a secure pseudorandom generator; the pseudorandom data can be generated by the method

  function TCipher.KeyStream(DataSize: LongWord): ByteArray;

The seed of the pseudorandom generator is a random key.

Block ciphers implement TCipher.KeyCrypt and TCipher.KeyStream methods by applying the CTR mode of operation without padding, which converts a block cipher into a stream cipher. Ex the right way to use the AES algorithm for generating pseudorandom integer values is to expand a random key for CTR mode without padding and call TCipher.KeyStream:

procedure RandDemo(const Key: ByteArray);
  Cipher: TCipher;
  I: Integer;
  Rand: LongWord;

  Cipher:= Cipher.AES.ExpandKey(Key, CTR_ENCRYPT);
  for I:= 0 to 9 do begin
    Rand:= LongWord(Cipher.KeyStream(SizeOf(Rand)));
    Writeln(I:3, ': ', Rand);

The RC5 block cipher algorithm is fully implemented as it was originally described, including 32- and 128-bit sized blocks. An instance of RC5 cipher can be created either as “standard” (8-byte block size, 12 rounds – just RC5) or “extended” (4-, 8- or 16-byte block size, 1..255 rounds – RC5(BlockSize, Rounds)):

procedure RC5BlockDemo(const Block, Key: ByteArray);
  CipherText, PlainText: ByteArray;

  CipherText:= TCipher.RC5(Block.Len, 20)
                      .EncryptBlock(Block, Key);
  PlainText:= TCipher.RC5(Block.Len, 20)
                     .DecryptBlock(CipherText, Key);
  Assert(PlainText = Block);

RC4 is a simple non-parallelizable stream cipher, widely used in today’s communication protocols but not recommended to use in new products because it is not considered secure now. Here is an example of file encryption/decryption using RC4:

procedure RC4FileDemo(const FileName: string; const Key: ByteArray);
             .EncryptFile(FileName, FileName + '.rc4');
             .DecryptFile(FileName + '.rc4', FileName + '.bak');

The example uses TCipher.Skip method to discard the first 1536 bytes of RC4 keystream [recommended for security reasons in RFC4345] prior to encryption/decryption. The parameter of TCipher.Skip method is the number of cipher blocks to be discarded; since RC4 has no block structure the “block” of RC4 is 1 byte.

Salsa20 is one of the eStream portfolio stream ciphers, modern, fast and parallelizable. The instances of Salsa20 can be created as “standard” (20-rounds) or “non-standard” (any even number of rounds from 2 to 254):

    class function TCipher.Salsa20: TCipher;
    class function TCipher.Salsa20(Rounds: LongWord): TCipher;

I will tell more about Salsa20 support in the next post dedicated to multithreading with TCiphers.