Integer division by constant

If the divisor is fixed, division can be replaced by multiplication by a reciprocal. The idea is simple: x / d = x * (1/d); since d is fixed, we can precompute 1/d and use the faster multiplication operation. The trick also works for integer division, with some additional complications. For example, the following function gives the same result as division by 5 for all 32-bit dividends:

function Div5(Dividend: LongWord): LongWord;
const
  Mult = $CCCCCCCD;
  PostSh = 2;

asm
        MOV     EDX,Mult
        MUL     EDX
        MOV     EAX,EDX
        SHR     EAX,PostSh
end;
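
The claim can be spot-checked outside of assembly. The following is a minimal Python sketch (Python is used only because its integers have arbitrary precision; the name div5 is illustrative) that emulates MUL followed by the shift. It checks boundary values and a stride through the 32-bit range, not every dividend.

```python
MULT = 0xCCCCCCCD   # same constant as in Div5
POST_SH = 2         # same postshift as in Div5

def div5(dividend: int) -> int:
    """Emulate Div5: high 32 bits of the 64-bit product, then a right shift."""
    assert 0 <= dividend < 2**32
    high = (dividend * MULT) >> 32      # EDX after MUL
    return high >> POST_SH              # SHR EAX,PostSh

# Spot check: boundary values plus a stride through the 32-bit range.
samples = [0, 1, 4, 5, 6, 2**31, 2**32 - 2, 2**32 - 1]
samples += range(0, 2**32, 65521)
assert all(div5(x) == x // 5 for x in samples)
```

The constant works because 5 * $CCCCCCCD = 2^34 + 1, so the product overshoots x * 2^34 / 5 by too little to change the integer part for any 32-bit x.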

It turns out that a 32-bit multiplication constant does not exist for every divisor; in that case it is possible to use a 33-bit constant with a hidden most significant 1 bit:

function Div5_2(Dividend: LongWord): LongWord;
const
  Mult = $9999999A;
  PostSh = 3;

asm
        MOV     ECX,EAX     // save dividend
        MOV     EDX,Mult
        MUL     EDX
        MOV     EAX,ECX     // restore dividend
        ADD     EAX,EDX
        RCR     EAX,1       // because addition could produce carry
        SHR     EAX,PostSh-1
end;
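
The 33-bit variant can be emulated the same way. This is a hedged Python sketch, not a translation the author provided; div5_2 is an illustrative name. The hidden top bit of the constant contributes exactly one extra copy of the dividend to the high half of the product, which is why the assembly adds the saved dividend and has to recover the carry with RCR.

```python
MULT = 0x9999999A   # low 32 bits of the 33-bit constant 0x19999999A
POST_SH = 3

def div5_2(dividend: int) -> int:
    """Emulate Div5_2 step by step with unbounded Python integers."""
    assert 0 <= dividend < 2**32
    high = (dividend * MULT) >> 32     # EDX after MUL
    total = dividend + high            # ADD EAX,EDX; may need 33 bits
    total >>= 1                        # RCR EAX,1 brings the carry back in
    return total >> (POST_SH - 1)      # SHR EAX,PostSh-1

def div5_wide(dividend: int) -> int:
    """Same result written as one multiplication by the full 33-bit constant."""
    return (dividend * 0x19999999A) >> 35

samples = [0, 1, 4, 5, 6, 2**31, 2**32 - 2, 2**32 - 1]
samples += range(0, 2**32, 65521)
assert all(div5_2(x) == div5_wide(x) == x // 5 for x in samples)
```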

Naively, the multiplication constant ($9999999A) and postshift (3) can be obtained using the following steps:

1. Write 1/5 in binary form: 1/5 = 0.0011 0011 0011 0011 0011 ..

2. Count the leading zeroes after the binary point and add 1; this gives the postshift

3. Write the leading significant bits:

1100 1100 1100 1100 1100 1100 1100 1100 1100 1100 ..

4. Round to 33 bits; the rounding is not necessarily based on the value of the 34th bit only (later we will use a different approach which is free from the rounding problem)

1100 1100 1100 1100 1100 1100 1100 1101 0

5. Remove the leading bit and write as hexadecimal to get the multiplication constant:

99 99 99 9A
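
The five steps above can be sketched with integer arithmetic. This Python sketch makes one assumption that the steps leave open: it always rounds the 33-bit value up. The function name naive_params is illustrative.

```python
def naive_params(divisor: int):
    """Derive (mult, postshift) from the binary expansion of 1/divisor.

    Assumption: step 4 is implemented as 'always round up'.
    """
    # Powers of two have a terminating expansion and are excluded here.
    assert divisor > 1 and divisor & (divisor - 1) != 0
    # Steps 1-2: 1/d starts with (bit length of d) - 1 zero bits.
    zeros = divisor.bit_length() - 1
    postshift = zeros + 1
    # Step 3: the first 33 significant bits of 1/d, as an integer.
    significant = (1 << (zeros + 33)) // divisor
    # Step 4: round up to the next 33-bit value.
    rounded = significant + 1
    assert rounded.bit_length() == 33
    # Step 5: drop the leading 1 bit to get the 32-bit constant.
    mult = rounded - (1 << 32)
    return mult, postshift

mult, postshift = naive_params(5)
print(f"Mult  = {mult:08X}")    # Mult  = 9999999A
print(f"Shift = {postshift}")   # Shift = 3
```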

The second version is more complicated, but the good news is that it works for all divisors. In general, it can be proven that for N-bit division there exists an (N+1)-bit multiplication constant; for the details see the key article, Division by Invariant Integers using Multiplication. The rest of the post is a review of the results obtained in that article. I will use 32-bit arithmetic, but the results can be generalized to any word size.

The multiplication constant and postshift can be found using the following procedure:

procedure GetParams(Divisor: LongWord);
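var
  L: LongWord;
  N: UInt64;        // 64-bit, because 2^L reaches 2^32 when Divisor >= 2^31
  Tmp: LongWord;
  M: UInt64;
  Mult: LongWord;

begin
  // powers of two are excluded: for them M does not fit into 32 bits,
  // and a plain shift does the job anyway
  Assert((Divisor > 1) and (Divisor and (Divisor - 1) <> 0));
  Tmp:= Divisor;
  L:= 0;
  repeat            // count number of significant bits in Divisor
    Tmp:= Tmp shr 1;
    Inc(L);
  until Tmp = 0;
  N:= UInt64(1) shl L;  // N = 2^L
  M:= $100000000;   // 2^32
  M:= (M * (N - Divisor)) div Divisor + 1;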
  Mult:= LongWord(M);
  Writeln('Mult  = ' + IntToHex(Mult, 8));
  Writeln('Shift = ' + IntToStr(L));
end;

The multiplication constant is not unique. If the constant obtained from the above procedure is odd, it makes sense to try the constant incremented by 1 (provided the incremented constant is correct too). Correctness can be checked by

function CheckMult(Divisor, Mult: LongWord): Boolean;
var
  L, N: LongWord;
  Tmp: LongWord;
  P32, MD: UInt64;
  Min, Max: UInt64;

begin
  P32:= $100000000;   // 2^32;
  // note: the 64-bit products below overflow for Divisor >= 2^31
  MD:= (P32 + Mult) * Divisor;
  L:= 0;
  Tmp:= Divisor;
  repeat            // count number of significant bits in Divisor
    Tmp:= Tmp shr 1;
    Inc(L);
  until Tmp = 0;
  N:= 1 shl L;
  Min:= P32 * N;
  Max:= Min + N;
  Result:= (Min <= MD) and (MD <= Max);
end;

The above function gives a sufficient condition for a correct multiplication constant. It is probably possible for a constant to fail the check and still be correct, as a full search over the dividend range could show; we will not consider this case.

Optimization 1. If the 33-bit multiplication constant is even, we can shift it right so that it fits into the 32-bit range, thus obtaining a shorter division algorithm; this also decrements the postshift.

As an example consider division by 641. The GetParams procedure returns

Mult = 98F603FF
Shift = 10

Since the multiplication constant is odd, we try the incremented constant 98F60400. CheckMult shows that it is correct too. The constant ends with 10 zero bits, which equals the postshift, so we can eliminate the postshift:

function Div641(Dividend: LongWord): LongWord;
const
  Mult = $663D81;  // = $198F60400 shr 10

asm
        MOV     EDX,Mult
        MUL     EDX
        MOV     EAX,EDX
end;
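
The 641 example can be replayed in a few lines. This is a Python sketch under the same assumptions as the Pascal code (names check_mult and div641 are illustrative): it ports the CheckMult condition, confirms that both the odd and the incremented constants pass, and spot-checks the reduced single-multiplication form.

```python
def check_mult(divisor: int, mult: int) -> bool:
    """Port of CheckMult: 2^(32+L) <= (2^32 + mult) * divisor <= 2^(32+L) + 2^L."""
    l = divisor.bit_length()
    low = 1 << (32 + l)
    high = low + (1 << l)
    return low <= ((1 << 32) + mult) * divisor <= high

def div641(dividend: int) -> int:
    """Emulate Div641: only the high half of one 32x32 multiplication."""
    assert 0 <= dividend < 2**32
    return (dividend * 0x663D81) >> 32

assert check_mult(641, 0x98F603FF)       # constant returned by GetParams
assert check_mult(641, 0x98F60400)       # incremented constant
assert 0x198F60400 >> 10 == 0x663D81     # 10 trailing zero bits removed
samples = [0, 640, 641, 642, 2**31, 2**32 - 1] + list(range(0, 2**32, 65521))
assert all(div641(x) == x // 641 for x in samples)
```

The reduced constant is exact for a simple reason: 641 * $663D81 = 2^32 + 1.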

Optimization 2. If the divisor is even, we can preshift both the dividend and the divisor right, thus decreasing the required precision and again obtaining a shorter division algorithm (with an extra dividend preshift step). For details see the article cited above.
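
As an illustration of the preshift idea, consider division by 14 = 2 * 7. The Python sketch below is an assumption-laden example, not code from the cited article: the constant $92492493 is derived here as the ceiling of 2^34 / 7. After the dividend is shifted right by one bit it has only 31 significant bits, and for such dividends a plain 32-bit constant for 7 is precise enough, so the add-with-carry sequence of the 33-bit version is not needed.

```python
MULT = 0x92492493                  # ceil(2**34 / 7), derived for this example
assert MULT == -(-(2**34) // 7)    # ceiling division check
assert MULT < 2**32                # fits into a 32-bit register

def div14(dividend: int) -> int:
    """Divide by 14 as: preshift by 1, multiply, keep the high bits."""
    assert 0 <= dividend < 2**32
    pre = dividend >> 1            # divide by 2; at most 31 bits remain
    return (pre * MULT) >> 34      # high 32 bits of the product, then SHR 2

samples = [0, 13, 14, 15, 27, 28, 2**31, 2**32 - 2, 2**32 - 1]
samples += range(0, 2**32, 65521)
assert all(div14(x) == x // 14 for x in samples)
```

The constant overshoots 2^34 / 7 by 5/7, and 5 * 2^31 is still below 2^34, which is why the error never reaches the next integer for a 31-bit preshifted dividend.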

Serial Number System Challenge

I stumbled across an interesting link that made me think about a solid serial number system based on strong cryptography. Cryptography discourages systems based on secret algorithms and relies on open algorithms and secret keys. So let us develop a serial number generation/verification system with the same usability as the one in the linked article, but without any secret algorithms.


First, our serial numbers should have the form

XXXX-XXXX-XXXX-XXXX-XXXX

where X is an uppercase English letter A..Z. To avoid confusing the user, let us exclude the letter O, which looks like zero; in the end we have 25 possible letters in 20 positions, that is BigInteger.Pow(25, 20) = $1D6329F1C35CA4BFABB9F561 combinations. Next, we want to work with full bytes; this reduces the possible serial keys to 11-byte numbers. We also want to use 2 bytes of the serial key as a key checksum; this leaves us with 9 bytes, so we have 9*8 = 72-bit serial keys. That should be strong enough against a full keyspace search attack on our system.
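
The size arithmetic can be reproduced directly. This short Python sketch only restates the numbers from the paragraph above; the variable names are illustrative.

```python
ALPHABET = "ABCDEFGHIJKLMNPQRSTUVWXYZ"     # A..Z without the letter O
assert len(ALPHABET) == 25

combinations = len(ALPHABET) ** 20          # 20 letter positions
print(hex(combinations))                    # 0x1d6329f1c35ca4bfabb9f561

# Every value below 2^(8*k) must be representable, so k whole bytes fit
# when 2^(8*k) <= combinations.
full_bytes = (combinations.bit_length() - 1) // 8
key_bytes = full_bytes - 2                  # 2 bytes are reserved for the checksum
key_bits = key_bytes * 8

assert full_bytes == 11 and key_bits == 72
```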


Suppose you are a micro-ISV expecting to sell up to 100 copies of your software; then you need to generate 100 72-bit keys and embed their hashes into the executable (if it turns out later that you need more copies, it is not a problem: just recompile your executable with more keys next time; in the same way you can revoke leaked keys by not including them in the next release).

To derive the 72-bit keys I use a 128-bit master key and the AES encryption algorithm as a pseudorandom function. Note that the 128-bit master key is actually the only secret in the system; everything else is calculated. It is worthwhile to generate the master key with a truly random process, for example by tossing a coin 128 times.

For hashing I use the SHA256 hash function. I also use the CRC16 algorithm to calculate the key checksums.


Verification is a 2-phase process. First, it converts the serial number entered by the user in 20-letter format into an 11-byte serial key, calculates the checksum of the first 9 bytes and compares it with the last 2 bytes (this catches a mistyped serial number). Second, it hashes the 9-byte key and checks that the hash exists in the key hash table.
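
The two phases can be sketched as follows. This is not the TForge demo code; it is a Python sketch with several labeled assumptions: the letters are read as a big-endian base-25 number, the checksum is stored big-endian after the key, and the CRC16 variant is the CCITT one provided by binascii.crc_hqx (the post does not say which CRC16 is used). All function names are hypothetical.

```python
import binascii
import hashlib

ALPHABET = "ABCDEFGHIJKLMNPQRSTUVWXYZ"      # A..Z without the letter O

def make_raw(key9: bytes) -> bytes:
    """Append a 2-byte checksum to a 9-byte key (assumed CRC16 variant)."""
    assert len(key9) == 9
    return key9 + binascii.crc_hqx(key9, 0).to_bytes(2, "big")

def encode_serial(raw: bytes) -> str:
    """11 bytes -> XXXX-XXXX-XXXX-XXXX-XXXX (assumed base-25 encoding)."""
    assert len(raw) == 11
    value = int.from_bytes(raw, "big")
    letters = []
    for _ in range(20):
        value, digit = divmod(value, 25)
        letters.append(ALPHABET[digit])
    text = "".join(reversed(letters))
    return "-".join(text[i:i + 4] for i in range(0, 20, 4))

def decode_serial(serial: str):
    """Inverse of encode_serial; returns None for malformed input."""
    letters = serial.replace("-", "").upper()
    if len(letters) != 20 or any(c not in ALPHABET for c in letters):
        return None
    value = 0
    for c in letters:
        value = value * 25 + ALPHABET.index(c)
    if value >= 1 << 88:                    # does not fit into 11 bytes
        return None
    return value.to_bytes(11, "big")

def verify(serial: str, key_hashes) -> bool:
    raw = decode_serial(serial)
    if raw is None:
        return False
    key, checksum = raw[:9], raw[9:]
    # Phase 1: the checksum catches typing mistakes.
    if binascii.crc_hqx(key, 0).to_bytes(2, "big") != checksum:
        return False
    # Phase 2: only hashes of issued keys are embedded in the executable.
    return hashlib.sha256(key).digest() in key_hashes
```

Note that phase 1 provides no security at all: anyone can compute a checksum. The security rests on phase 2, where guessing a valid key means hitting one of the embedded hashes in a 72-bit keyspace.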


And now the challenge. The latest TForge release (0.74) includes the full source code of a console application with the serial number system described above, in the Demos\Challenge subfolder. The key generation code is also included, though it is not used in the application and could be kept secret; the only thing I keep secret is the 128-bit master key used.

Build the application with Delphi or Lazarus/Free Pascal.

One of the valid serial numbers is:

AVVH-GJCX-YVWM-EHUE-YMRL

Try to find other valid serial number(s).

Notes on bitwise CRC32

CRC algorithms calculate the remainder from division of a message polynomial by a generator polynomial over the field GF(2). There are many explanations of how it works (see Wikipedia for example), but they omit practical details that you should care about if you want to obtain an implementation compatible with the standard table implementation.

The standard CRC32 implementation uses reversed bit ordering; in code this means right shifts instead of left shifts. A CRC32-compatible polynomial division implementation is

function PolyModRevers(const Dividend: ByteArray; G: LongWord): LongWord;
var
  I, L: Cardinal;
  P: PByte;
  B: Byte;
  Carry: Boolean;

begin
  L:= Dividend.Len;
  Result:= 0;
  P:= Dividend.RawData;
  while L > 0 do begin
    B:= P^;
    Inc(P);
// push B through the Result register
    I:= 8;
    repeat
      Carry:= Odd(Result);
      Result:= Result shr 1;
      if Odd(B) then
        Result:= Result or $80000000;
      B:= B shr 1;
      if Carry then Result:= Result xor G;
      Dec(I);
    until I = 0;
    Dec(L);
  end;
end;

The function calculates the remainder from division by a generator polynomial G. The generator is stored as a 32-bit value; a generator for CRC32 algorithms is actually a 33-bit value, but the most significant bit is always 1 and so can be omitted and used implicitly. Since the function uses reversed bit ordering in the dividend polynomial, the generator polynomial should also be bit-reversed (for the standard CRC32 algorithm the bit-reversed generator is 0xEDB88320).

The dividend for a general CRC32 algorithm is obtained by appending 4 zero bytes to the message and optionally prefixing it with some byte sequence. I did not find the prefix for the standard CRC32 algorithm using Google, so I calculated it by brute force:

procedure Find;
var
  Dividend: ByteArray;
  N, R: LongWord;

begin
  N:= 0;
  repeat
    Dividend:= ByteArray.FromInt(N, SizeOf(N), True) + ByteArray.Allocate(4, 0);
    R:= PolyModRevers(Dividend, $EDB88320);
    if R = $FFFFFFFF then begin
      Writeln('N: ', Dividend.ToHex);
      Exit;
    end;
    Inc(N);
    if N and $FFFFFF = 0 then Writeln(N);  // progress output
  until N = 0;
  Writeln('Failed');
end;

The procedure takes time to obtain the result, so it outputs some values during the search; the final result is the funny magic number $62F52692.

Putting it all together, the bitwise CRC32 code compatible with the reference table implementation is:

function CalcBitCRC32(const Msg: ByteArray): LongWord;
var
  Dividend: ByteArray;
  Prefix: LongWord;

begin
  Prefix:= $62F52692;
  Dividend:= ByteArray.FromInt(Prefix, SizeOf(Prefix), True) + Msg + ByteArray.Allocate(4, 0);
  Result:= not PolyModRevers(Dividend, $EDB88320);
end;
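
The result can be cross-checked against an independent implementation. The following Python sketch ports the two functions (the names are illustrative, and the prefix is assumed to be fed as the bytes 62 F5 26 92 in that order) and compares the output with zlib.crc32 from the Python standard library.

```python
import zlib

GENERATOR = 0xEDB88320                      # bit-reversed CRC32 generator
PREFIX = bytes.fromhex("62F52692")          # prefix found by the brute force search

def poly_mod_reversed(dividend: bytes, g: int) -> int:
    """Bitwise remainder of the dividend polynomial modulo g (reversed bit order)."""
    result = 0
    for b in dividend:
        for _ in range(8):                  # push the byte through the register
            carry = result & 1
            result >>= 1
            if b & 1:
                result |= 0x80000000
            b >>= 1
            if carry:
                result ^= g
    return result

def calc_bit_crc32(msg: bytes) -> int:
    dividend = PREFIX + msg + bytes(4)      # prefix, message, 4 zero bytes
    return poly_mod_reversed(dividend, GENERATOR) ^ 0xFFFFFFFF

# The prefix followed by 4 zero bytes leaves all ones in the register,
# which is the usual CRC32 initial value.
assert poly_mod_reversed(PREFIX + bytes(4), GENERATOR) == 0xFFFFFFFF

for message in (b"", b"a", b"123456789", bytes(range(256))):
    assert calc_bit_crc32(message) == zlib.crc32(message)
```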

BigInteger redux

TForge 0.72 is released.

What’s new:

  • Big integer arithmetic is included in the TForge package; that is, the BigCardinal and BigInteger classes are now available after installing the TForge package, without a dll. The old dll implementation (Numerics 0.58) is still available but is not being developed for now.
  • More demo projects added.
  • Improved FPC/Lazarus support; TCiphers package and demos are ported to FPC/Lazarus.
  • Several new methods are added to ByteArray class.

More Ciphers

TForge 0.71 is released.

What’s new:

  • TCipher.Salsa20.SetNonce bug fixed
  • added 3DES block cipher algorithm
  • added ChaCha20 stream cipher algorithm

Examples of creating new cipher instances:

var
  Cipher: TCipher;

begin
  Cipher:= TCipher.TripleDES;    // 3DES
  Cipher:= TCipher.ChaCha20;     // 20-round ChaCha
  Cipher:= TCipher.ChaCha20(12); // 12-round ChaCha
end;

ChaCha20 (aka ChaCha) is a Salsa20 variant which has been gaining popularity since Google selected ChaCha20 as a replacement for RC4 in communication protocols.

Salsa20 support explained

Salsa20 is a stream cipher; like any stream cipher, Salsa20 encrypts/decrypts data by XORing the plaintext/ciphertext with a pseudorandom keystream.

Salsa20 generates the keystream by hashing 64-byte (512-bit) blocks; each block consists of 4 parts:

  • fixed “magic words” – 16 bytes
  • key – 32 bytes
  • nonce (message number) – 8 bytes
  • block number – 8 bytes

This simple structure allows 64-byte blocks of keystream to be generated independently; to generate an arbitrary block of the keystream one needs to set the block number in the cipher’s state and hash the 64-byte block.

The TCipher class supports the Salsa20 design through the following methods:
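
The block structure can be made concrete with a small sketch. The code below is an independent Python rendering of the public Salsa20 specification, not TForge code; the function names are illustrative, and the nonce and block number are assumed to be stored as little-endian 64-bit integers.

```python
import struct

MASK = 0xFFFFFFFF
SIGMA = b"expand 32-byte k"      # the 16 bytes of fixed "magic words"

def rotl(v: int, n: int) -> int:
    return ((v << n) & MASK) | (v >> (32 - n))

def quarterround(y0, y1, y2, y3):
    y1 ^= rotl((y0 + y3) & MASK, 7)
    y2 ^= rotl((y1 + y0) & MASK, 9)
    y3 ^= rotl((y2 + y1) & MASK, 13)
    y0 ^= rotl((y3 + y2) & MASK, 18)
    return y0, y1, y2, y3

COLUMNS = [(0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11)]
ROWS = [(0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14)]

def salsa20_hash(block: bytes) -> bytes:
    """Hash one 64-byte block: 10 double rounds, then add the input words."""
    x = list(struct.unpack("<16I", block))
    w = x[:]
    for _ in range(10):
        for a, b, c, d in COLUMNS + ROWS:
            w[a], w[b], w[c], w[d] = quarterround(w[a], w[b], w[c], w[d])
    return struct.pack("<16I", *[(w[i] + x[i]) & MASK for i in range(16)])

def input_block(key: bytes, nonce: int, block_no: int) -> bytes:
    """Interleave magic words, key, nonce and block number (64 bytes total)."""
    assert len(key) == 32
    return (SIGMA[0:4] + key[0:16] + SIGMA[4:8]
            + struct.pack("<QQ", nonce, block_no)
            + SIGMA[8:12] + key[16:32] + SIGMA[12:16])

def keystream_block(key: bytes, nonce: int, block_no: int) -> bytes:
    """Any block of the keystream can be produced without the previous ones."""
    return salsa20_hash(input_block(key, nonce, block_no))
```

Because keystream_block depends only on the key, the nonce and the block number, block 1000 can be computed without touching blocks 0..999; this is what makes seeking and parallel processing possible.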

function TCipher.SetIV(const AIV: ByteArray): TCipher;
function TCipher.SetNonce(Value: UInt64): TCipher;
function TCipher.Skip(Value: UInt64): TCipher;

The TCipher class treats the concatenated nonce and block number as the initialization vector (IV). The TCipher.SetIV sets both the nonce and the block number in the cipher instance state. [Corrected – changed in ver 0.71]

The TCipher.SetIV and TCipher.SetNonce methods do the same thing: they set the nonce field in the cipher instance state and clear the block number field. The difference is that TCipher.SetIV accepts the parameter as a byte array, while TCipher.SetNonce accepts it as an integer.

The TCipher.SetIV method was introduced mainly for testing purposes.

The purpose of nonce is to serve as a unique message number; a nonce need not be kept secret, but it should never repeat during the lifetime of a secret key.

As an example, suppose we want to encrypt several files with a single secret key; the right way to do it is to encrypt each file with a unique nonce value, like this:

procedure EncryptFiles(const Key: ByteArray; FileNames: array of string);
var
  Cipher: TCipher;
  Nonce: UInt64;
  I: Integer;

begin
  Nonce:= 0;
  for I:= 0 to High(FileNames) do begin
    Cipher:= TCipher.Salsa20;
    Inc(Nonce);
    try
      Cipher.ExpandKey(Key)
            .SetNonce(Nonce)
            .EncryptFile(FileNames[I], FileNames[I] + '.salsa20');
    finally
      Cipher.Burn;
    end;
  end;
end;

The try .. finally block ensures that the sensitive key data in each cipher instance’s state is destroyed.

The TCipher.Skip method increments the block number field in the cipher instance state; its purpose is to implement parallel encryption/decryption as was demonstrated in the previous post.