High Performance Hash Library

TForge 0.66 released.

What’s new:

1. More hash algorithms added:

  • SHA224
  • SHA384
  • SHA512

2. The MD5 and SHA1 compression routines were rewritten in BASM for the CPUX86-WIN32 and CPUX64-WIN64 platforms and are now several times faster than the pure Pascal code (about 4 times with the 32-bit compiler (Delphi XE) and about 3 times with the 64-bit compiler (FPC 2.6.2) on my system).

3. A memory allocation bug in the HMAC implementation was fixed.

User-friendly Hash Library

TForge 0.65 released.

The release features a hash library with fluent coding support. While writing the library I was inspired by the usability of Python’s hashlib.

Current release supports:

  • Cryptographic hash algorithms:
    1. MD5
    2. SHA1
    3. SHA256
  • Non-cryptographic hash algorithms:
    1. CRC32
    2. Jenkins-One-At-Time
  • Hash-based MAC algorithm (HMAC)
  • Key derivation algorithms:
    1. PBKDF1
    2. PBKDF2

Let us consider a common problem: calculate MD5 and SHA1 digests of a file. The simplest way to do it is:

program HashFile;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, tfTypes, tfHashes;

procedure CalcHash(const FileName: string);
begin
  Writeln('MD5:  ', THash.MD5.UpdateFile(FileName).Digest.ToHex);
  Writeln('SHA1: ', THash.SHA1.UpdateFile(FileName).Digest.ToHex);
end;

begin
  try
    if ParamCount = 1 then begin
      CalcHash(ParamStr(1));
    end
    else
      Writeln('Usage: > HashFile filename');
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  Readln;
end.

The above application demonstrates the beauty of fluent coding. The code is compact and clear: you create an instance of a hash algorithm, feed the file data to it, generate the resulting digest and convert it to hexadecimal. The instance is freed automatically; there is no need to call the Free method explicitly.

But the above code is not optimal, since it reads the file twice. A better solution involves some more coding:

procedure CalcHash(const FileName: string);
const
  BufSize = 16 * 1024;

var
  MD5, SHA1: THash;
  Stream: TStream;
  Buffer: array[0 .. BufSize - 1] of Byte;
  N: Integer;

begin
  MD5:= THash.MD5;
  SHA1:= THash.SHA1;
  try
    Stream:= TFileStream.Create(FileName,
                                fmOpenRead or fmShareDenyWrite);
    try
      repeat
        N:= Stream.Read(Buffer, BufSize);
        if N <= 0 then Break
        else begin
          MD5.Update(Buffer, N);
          SHA1.Update(Buffer, N);
        end;
      until False;
    finally
      Stream.Free;
    end;
    Writeln('MD5:  ', MD5.Digest.ToHex);
    Writeln('SHA1: ', SHA1.Digest.ToHex);
  finally
    MD5.Burn;
    SHA1.Burn;
  end;
end;

The code also demonstrates the Burn method. It is not needed here and could safely be removed together with the corresponding try/finally block, but it can be useful in other cases: it destroys all sensitive data held in an instance. Calling Burn is optional, since it is called anyway when an instance is freed, but an explicit call gives you full control over when the sensitive data is erased.

The Free method does not free an instance; it only decrements the instance’s reference count. Since the compiler can create hidden references to the instance, the moment when the reference count drops to zero and the instance is actually freed is generally controlled by the compiler.
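The reference-counting behaviour described above can be sketched with plain RTL interfaces. This is not TForge code: TDemoInstance is a hypothetical stand-in for a hash instance, and the sketch only assumes that TForge instances are reference counted in the usual Delphi way, as the text suggests.

```pascal
program RefCountDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for a reference-counted hash instance
  TDemoInstance = class(TInterfacedObject)
  public
    destructor Destroy; override;
  end;

var
  Destroyed: Boolean = False;

destructor TDemoInstance.Destroy;
begin
  // A real instance would erase its sensitive data here
  Destroyed:= True;
  inherited Destroy;
end;

procedure Run;
var
  A, B: IInterface;
begin
  A:= TDemoInstance.Create;  // first reference
  B:= A;                     // second reference to the same instance
  A:= nil;                   // releases one reference; B still holds one
  Writeln('After A:= nil, destroyed: ', Destroyed);
  B:= nil;                   // releases the last visible reference
  Writeln('After B:= nil, destroyed: ', Destroyed);
end;

begin
  Run;
  // All locals and compiler temporaries of Run are released by now
  Writeln('After Run, destroyed: ', Destroyed);
end.
```

The last line always reports TRUE. The lines printed inside Run depend on whether the compiler keeps a hidden temporary reference alive, which is exactly why an explicit Burn call is the only way to control the moment of erasure.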


Non-cryptographic hash algorithms have one caveat: since they actually return an integer value, the bytes of the digest are reversed. The idiomatic way to get the correct result is to cast the digest to an integer type:

  Writeln('CRC32: ',
    IntToHex(LongWord(THash.CRC32.UpdateFile(FileName).Digest), 8));

or else you can reverse the digest’s bytes:

  Writeln('CRC32: ',
    THash.CRC32.UpdateFile(FileName).Digest.Reverse.ToHex);
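The byte reversal is easiest to see on a plain integer. The following is a minimal sketch using only the RTL, not TForge; it assumes the reversal comes from the little-endian memory layout of x86/x64 CPUs. MemoryBytesToHex is a hypothetical helper, and the sample value is the well-known CRC32 check value of the ASCII string '123456789'.

```pascal
program ByteOrderDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: hex dump of the bytes of Value
// in the order they are stored in memory.
function MemoryBytesToHex(Value: LongWord): string;
var
  P: PByte;
  I: Integer;
begin
  Result:= '';
  P:= @Value;
  for I:= 0 to SizeOf(Value) - 1 do begin
    Result:= Result + IntToHex(P^, 2);
    Inc(P);
  end;
end;

const
  Sample: LongWord = $CBF43926;  // CRC32 of '123456789'

begin
  // The integer value, most significant digit first
  Writeln('As integer:      ', IntToHex(Sample, 8));
  // The same value read byte by byte from memory
  Writeln('As memory bytes: ', MemoryBytesToHex(Sample));
end.
```

On a little-endian CPU the first line shows CBF43926 and the second shows 2639F4CB, which is the same kind of reversal that the integer cast or the Reverse call undoes.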

The HMAC algorithm generates a digest using a cryptographic hash algorithm and a secret key. Here is an example of calculating the SHA1-HMAC digest of a file:

procedure SHA1_HMAC_File(const FileName: string;
                         const Key: ByteArray);
begin
  Writeln('SHA1-HMAC: ',
    THMAC.SHA1.ExpandKey(Key).UpdateFile(FileName).Digest.ToHex);
end;

begin
..
  SHA1_HMAC_File(ParamStr(1),
    ByteArray.FromText('My Secret Key'));
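For reference, the standard HMAC construction (RFC 2104) that the example above relies on can be written as follows. Here H is the underlying hash (SHA1 in the example), B is its block size in bytes, K' is the key padded with zero bytes to length B (a key longer than B is first replaced by H(K)), ipad is the byte 0x36 repeated B times and opad is the byte 0x5C repeated B times. Presumably ExpandKey corresponds to the key preparation step, but that is an assumption, not something the library documentation confirms here.

```latex
\mathrm{HMAC}(K, m) =
  H\bigl( (K' \oplus \mathit{opad}) \,\|\,
          H\bigl( (K' \oplus \mathit{ipad}) \,\|\, m \bigr) \bigr)
```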

Key derivation algorithms generate keys from user passwords by applying hash algorithms. PBKDF1 applies a cryptographic hash algorithm directly, PBKDF2 uses HMAC. Here are usage examples:

procedure DeriveKeys(const Password, Salt: ByteArray);
begin
  Writeln('PBKDF1 Key: ',
    THash.SHA1.DeriveKey(Password, Salt,
                         10000,   // number of rounds
                         16       // key length in bytes
                         ).ToHex);
  Writeln('PBKDF2 Key: ',
    THMAC.SHA1.DeriveKey(Password, Salt,
                         10000,   // number of rounds
                         32       // key length in bytes
                         ).ToHex);
end;

begin
..
  DeriveKeys(ByteArray.FromText('User Password'),
             ByteArray.FromText('Salt'));
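As a reference for what the two calls compute, here are the standard definitions from PKCS #5 (RFC 8018), with password P, salt S, iteration count c and key length dkLen. H is the hash, PRF is the HMAC based on it, hLen is the digest size, and INT(i) is the block index as a 4-byte big-endian integer. Whether TForge follows these definitions in every detail is not stated in the text, so treat this as background rather than a description of the implementation.

```latex
% PBKDF1: iterate the hash directly (dkLen cannot exceed hLen)
T_1 = H(P \,\|\, S), \qquad T_j = H(T_{j-1}) \quad (j = 2, \dots, c), \qquad
DK = \text{first } dkLen \text{ bytes of } T_c

% PBKDF2: iterate the HMAC and XOR the intermediate results
U_1 = \mathrm{PRF}(P,\; S \,\|\, \mathrm{INT}(i)), \qquad
U_j = \mathrm{PRF}(P,\; U_{j-1}) \quad (j = 2, \dots, c)

T_i = U_1 \oplus U_2 \oplus \dots \oplus U_c, \qquad
DK = \text{first } dkLen \text{ bytes of } T_1 \,\|\, T_2 \,\|\, \dots \,\|\, T_l,
\quad l = \lceil dkLen / hLen \rceil
```

With SHA1 (hLen = 20) the 16-byte PBKDF1 key in the example fits into a single digest, while the 32-byte PBKDF2 key needs two blocks.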

Configuration and Installation

The release contains two runtime packages, TForge and THashes. Build them in that order: first TForge, then THashes.

For Delphi users:

The packages are in the Packages\DXE subfolder (Delphi XE only).

Before you can build the packages, make the folder Source\Include available via the project’s search path. To do so, open the “Project Options” dialog for each package, set “Build Configuration” to “Base”, and replace the path to TFL.inc with your own path (it depends on where you unpacked the downloaded archive):

[Screenshot: Include]

Now use the project manager and build the packages in “Debug” and “Release” configurations:

[Screenshot: build]

Optionally, to enable Ctrl-click navigation in the editor, add paths to the source code units to the Browsing path:

[Screenshot: Browse]

To make the packages available to an application, open the application’s “Project Options” dialog, set “Build Configuration” to “Base” and add the following path to the search path:

[Screenshot: appconfig]

For FPC/Lazarus users:

The packages are in the Packages\FPC subfolder.

I don’t know much about Lazarus packages. Here are the package configuration options I used:

[Screenshot: Laz2]

[Screenshot: _lazconfig]

Byte Array redux

I am continuing to experiment with the coding patterns used in my numerics dll project and am releasing TForge 0.60.

First of all, TForge is a modern cryptographic library for Delphi and Lazarus/FPC, implemented as a set of runtime packages; I am currently working on it. The current release contains only one package (also named TForge); it is the core package of the whole TForge project and is required by the other packages (to be released later).

The purpose of the release is to introduce a new type, ByteArray, an enhanced version of the standard RTL TBytes type. If you want to test ByteArray, you need to build the tforge package; the release contains tforge packages for Delphi XE and Lazarus.

The release includes the ByteArrayDemo console application, which demonstrates the functionality of the ByteArray type. For example, you can concatenate byte arrays:

var
  A1, A2: ByteArray;

begin
  A1:= ByteArray(1);
  A2:= TBytes.Create(2, 3, 4);
  Writeln('A1 + A2 = ', (A1 + A2).ToString); // 1, 2, 3, 4

perform bitwise boolean operations (xor is most useful):

var
  A1, A2: ByteArray;

begin
  A1:= ByteArray(1);
  A2:= TBytes.Create(2, 3, 4);
  Writeln('A1 xor A2 = ', (A1 xor A2).ToString); // 3 (= 1 xor 2, min array length used)

use fluent coding style (which appears to be very handy once one gets accustomed to it):

begin
  Writeln(ByteArray.FromText('ABCDEFGHIJ').Insert(3,
    ByteArray.FromText(' 123 ')).Reverse.ToText); // JIHGFED 321 CBA

and so on. It was fun to code ByteArray; I am using it more and more now as a TBytes replacement because of better usability.

Why PDWORD is not a pointer to DWORD

Once upon a time the Delphi team lead decided that Delphi should contain declarations of C++ types like unsigned long for the sake of better C++ compatibility. So he wrote

type
  CppULongInt = LongWord;

No, that is not good, thought the team lead. Delphi’s LongWord is a 32-bit type, while C++’s unsigned long is a platform-dependent type. So let us declare CppULongInt as a distinct type instead of an alias type:

type
  CppULongInt = type LongWord;

That is better. Now how about DWORD type? It is declared in Windows headers as

  typedef unsigned long       DWORD;

So for compatibility’s sake we will declare DWORD as

type
  DWORD = CppULongInt;

Great, thought the Delphi team lead. He called two juniors (let us call them DevA and DevB) and said:

- DevA, create a branch (BranchA) in the Delphi project, declare type DWORD = CppULongInt; and fix all bugs that may appear afterwards;
- DevB, create a branch (BranchB) in the Delphi project, declare type PDWORD = ^CppULongInt; and fix all bugs that may appear afterwards.

After you are both ready, we will merge the branches and declare the PDWORD type as it should be:

type
  PDWORD = ^DWORD;

In due time the happy DevB came to the Delphi team lead and said: “I did everything as you said, Sir! Now type PDWORD = ^CppULongInt;, and everything is fine!”

But the second guy, DevA, was not happy. He said: “Sir, we have tons of code like this:

procedure Foo(var Value: DWord);
begin
  Value:= 0;
end;

var
  D: Longword;

begin
  Foo(D);

If I declare

type
  DWORD = CppULongInt;

then it does not compile. What should I do?”

- “Nothing,” said the Delphi team lead. “BranchB will be the main trunk now.”

Disclaimer: the tale was written after reading this SO question

Why unit tests are not always good

Unit tests are good at detecting most bugs in your code, but not all of them. When you write standard unit tests for a class, you do the following:

  • Create a fresh class instance (e.g. in the SetUp method of the DUnit framework);
  • Run the code under test (usually a single call of a single method) on the instance;
  • Free the instance (e.g. in the TearDown method of the DUnit framework).

And this is how unit tests should be written; if your test detects a bug you immediately know the bug’s origin.

The problem with the above scenario is that it is ideal for hiding some hard-to-reproduce bugs, such as access violations (AV). To detect such a bug with good probability you need something different: probably multiple calls of a method on the same instance, or calls of different methods in the same test. This approach runs quite opposite to the idea of unit testing.
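To make the point concrete, here is a small sketch of a state bug that a fresh-instance test cannot see. Everything in it is hypothetical: TSummer is an invented class with a deliberately planted bug, and the two test functions are plain functions rather than DUnit test cases. A leaked total stands in for the memory corruption behind a real access violation.

```pascal
program StatefulBugDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical class under test
  TSummer = class
  private
    FTotal: Integer;
  public
    function Sum(const Values: array of Integer): Integer;
  end;

function TSummer.Sum(const Values: array of Integer): Integer;
var
  I: Integer;
begin
  // Deliberate bug: FTotal is never reset, so state leaks between calls
  for I:= Low(Values) to High(Values) do
    Inc(FTotal, Values[I]);
  Result:= FTotal;
end;

// Classic unit test shape: fresh instance, single call
function FreshInstanceTestPasses: Boolean;
var
  S: TSummer;
begin
  S:= TSummer.Create;
  try
    Result:= S.Sum([1, 2, 3]) = 6;
  finally
    S.Free;
  end;
end;

// Several calls on the same instance
function RepeatedCallTestPasses: Boolean;
var
  S: TSummer;
  I: Integer;
begin
  Result:= True;
  S:= TSummer.Create;
  try
    for I:= 1 to 3 do
      if S.Sum([1, 2, 3]) <> 6 then
        Result:= False;
  finally
    S.Free;
  end;
end;

begin
  Writeln('Fresh-instance test passed: ', FreshInstanceTestPasses);
  Writeln('Repeated-call test passed:  ', RepeatedCallTestPasses);
end.
```

The first test reports TRUE and the second FALSE: the bug only shows up once the instance carries state from an earlier call.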

Numerics 0.57 released (Hashtables, Bugfix)

1. The main purpose of the release is to implement hash tables (aka associative arrays) with keys of BigCardinal or BigInteger type; such hash tables are important for cryptographic applications. Hash tables in Delphi are implemented by the generic TDictionary<TKey,TValue> class. The release includes a new unit, GNumerics.pas, with two generic helper classes, TBigIntegerDictionary<TValue> and TBigCardinalDictionary<TValue>, that specialize the base generic dictionary class. For example, to work with a hash table having BigInteger keys and string values you need something like this:

uses GNumerics;

[..]
var
  HashTable: TBigIntegerDictionary<string>;
  A: BigInteger;

[..]
begin
// prepare the key first, so that nothing can raise
// between Create and try
  A:= BigInteger('1234567890987654321');
// create hash table instance
  HashTable:= TBigIntegerDictionary<string>.Create;
  try
// fill hash table with data
    HashTable.Add(A * A, 'some string value');
    [..]
  finally
// destroy hash table
    HashTable.Free;
  end;
end;

2. Some bugs fixed; in particular a nasty access violation bug in BigInteger.ModPow function was fixed.

3. Minor changes in BigCardinal/BigInteger methods.

Link to the download page

Update

Version 0.58 fixes a bug in the conversion from (U)Int64 to BigInteger/BigCardinal.

RawByteString type explained

With the introduction of Unicode support, Delphi also introduced the magic RawByteString type; the word ‘magic’ here means that you can’t implement a type with RawByteString functionality without hidden compiler support.

A common misunderstanding about the RawByteString type is that instances of RawByteString contain no encoding information and because of this the compiler can’t implement implicit string conversion. That is not true.

First of all, RawByteString is an AnsiString (1-byte character size). If you typecast a Unicode string to the RawByteString type, or a RawByteString string to a Unicode type, the compiler will always perform a string conversion; if the compiler has no information about the ANSI codepage of the RawByteString, it uses the system codepage for the conversion.

So the RawByteString magic is for AnsiStrings only.

When you declare a variable of AnsiString type you also declare a codepage; for example

type
  CyrString = type AnsiString(1251);

var
  S: CyrString;

That means the compiler has static codepage information; if you do not declare a codepage, the compiler assumes the system codepage. AnsiString instances also carry runtime codepage information (the codepage field in the string instance header), but the compiler normally does not check it and uses the static codepage information for string conversions.

The magic of RawByteString type is that the compiler has no static codepage information; it does not mean that a RawByteString instance has no runtime codepage information.

If you typecast an AnsiString instance to RawByteString type no string conversion happens.

The RawByteString type is meant only for typecasting ANSI strings, not for creating instances of the type. To understand how it works, let us first consider “an educated abuse” of RawByteString:

type
  string1251 = type AnsiString(1251);
  string1252 = type AnsiString(1252);

var
  S1: string1251;
  S2: string1252;

begin
// just initialize it with some data
  S1:= 'АБВГДЕЙКА';

// no string conversion here;
// the S1 string instance is copied 'as is',
//   with codepage information
  S2:= RawByteString(S1);

Now we have an instance (S2) of the string1252 type that contains data in ANSI 1251 encoding and has runtime codepage 1251. But since the compiler normally uses the static codepage information, subsequent use of the instance may produce strange results.
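The runtime codepage of a string instance can be inspected with the RTL function StringCodePage, which makes the effect of the typecast visible. The sketch below is an assumption-laden illustration: it needs a Unicode-era compiler (Delphi 2009 or later, or FPC 3.x), and it uses an ASCII literal to stay independent of the source file encoding.

```pascal
program CodePageDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  string1251 = type AnsiString(1251);
  string1252 = type AnsiString(1252);

var
  S1: string1251;
  S2: string1252;

begin
  S1:= 'ABC';

  // The typecast hides the static codepage of S1 from the compiler
  S2:= RawByteString(S1);

  // StringCodePage reads the codepage field of the string header
  Writeln('S1 runtime codepage: ', StringCodePage(S1));
  Writeln('S2 runtime codepage: ', StringCodePage(S2));
end.
```

According to the explanation above, both lines should report 1251 in Delphi even though S2 is declared with codepage 1252; other compilers may handle the assignment differently.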

Finally, an example of correct RawByteString usage. The following function counts the number of occurrences of the ‘?’ character in an ANSI string:

function CountQuestions(const S: RawByteString): Integer;
const
  Mark = $3F;

var
  I: Integer;

begin
  Result:= 0;
  for I:= 1 to Length(S) do begin
    if Byte(S[I]) = Mark
      then Inc(Result);
  end;
end;

The purpose of using RawByteString type for the function’s argument is to avoid unnecessary string conversion.