User-friendly Hash Library

TForge 0.65 released.

The release features a hash library with fluent coding support. While writing the library I was inspired by the usability of Python’s hashlib.

Current release supports:

  • Cryptographic hash algorithms:
    1. MD5
    2. SHA1
    3. SHA256
  • Non-cryptographic hash algorithms:
    1. CRC32
    2. Jenkins One-at-a-Time
  • Hash-based MAC algorithm (HMAC)
  • Key derivation algorithms:
    1. PBKDF1
    2. PBKDF2

Let us consider a common problem: calculate MD5 and SHA1 digests of a file. The simplest way to do it is:

program HashFile;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, tfTypes, tfHashes;

procedure CalcHash(const FileName: string);
begin
  Writeln('MD5:  ', THash.MD5.UpdateFile(FileName).Digest.ToHex);
  Writeln('SHA1: ', THash.SHA1.UpdateFile(FileName).Digest.ToHex);
end;

begin
  try
    if ParamCount = 1 then begin
      CalcHash(ParamStr(1));
    end
    else
      Writeln('Usage: > HashFile filename');
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  Readln;
end.

The above application demonstrates the beauty of fluent coding. The code is compact and clear: you create an instance of a hash algorithm, feed the file data to it, generate the resulting digest and convert it to hexadecimal. The instance is freed automatically; there is no need for an explicit call of the Free method.

But the above code is not optimal: it reads the file twice. A better solution involves some more coding:

procedure CalcHash(const FileName: string);
const
  BufSize = 16 * 1024;

var
  MD5, SHA1: THash;
  Stream: TStream;
  Buffer: array[0 .. BufSize - 1] of Byte;
  N: Integer;

begin
  MD5:= THash.MD5;
  SHA1:= THash.SHA1;
  try
    Stream:= TFileStream.Create(FileName,
                                fmOpenRead or fmShareDenyWrite);
    try
      repeat
        N:= Stream.Read(Buffer, BufSize);
        if N <= 0 then Break
        else begin
          MD5.Update(Buffer, N);
          SHA1.Update(Buffer, N);
        end;
      until False;
    finally
      Stream.Free;
    end;
    Writeln('MD5:  ', MD5.Digest.ToHex);
    Writeln('SHA1: ', SHA1.Digest.ToHex);
  finally
    MD5.Burn;
    SHA1.Burn;
  end;
end;

The code also demonstrates the use of the Burn method, which destroys all sensitive data in an instance. It is not needed here and could be safely removed together with the corresponding try/finally block, but it can be useful in other cases. The use of the Burn method is optional, since it is called anyway when an instance is freed, but an explicit call gives you full control over erasing the sensitive data.

The Free method does not destroy an instance; it only decrements the instance’s reference count. Since the compiler can create hidden references to the instance, the moment when the reference count reaches zero and the instance is actually destroyed is generally controlled by the compiler.
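
The reference-counting behaviour described above can be illustrated without TForge at all. The following is only a minimal sketch: TTracked, TWrapper and DestroyedCount are hypothetical names invented for the illustration, and the sketch assumes (it is not confirmed by the library source) that a hash instance behaves like a record wrapping an interface reference.

```pascal
program RefCountSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for a hash instance; not a TForge class
  TTracked = class(TInterfacedObject)
  public
    destructor Destroy; override;
  end;

  // Hypothetical record wrapper around an interface reference
  TWrapper = record
    FRef: IInterface;
    procedure Free;
  end;

var
  DestroyedCount: Integer = 0;

destructor TTracked.Destroy;
begin
  Inc(DestroyedCount);
  Writeln('instance destroyed');
  inherited;
end;

procedure TWrapper.Free;
begin
  FRef := nil;   // releases this reference only
end;

procedure Demo;
var
  A, B: TWrapper;
begin
  A.FRef := TTracked.Create;  // first reference
  B := A;                     // copying the record adds a reference
  A.Free;                     // the instance is still referenced by B
  Writeln('after A.Free');
  B.Free;                     // last explicit reference released
  Writeln('after B.Free');
end;

begin
  Demo;
  Writeln('destroyed: ', DestroyedCount);
end.
```

The instance is destroyed exactly once, and only after the last reference is gone; where exactly the 'instance destroyed' line appears relative to the other two depends on whether the compiler keeps hidden temporary references.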


The use of non-cryptographic hash algorithms has one caveat: they actually return an integer value, so the bytes of the digest are reversed relative to the conventional hexadecimal representation of that value. The idiomatic way to get the correct result is to cast the digest to an integer type:

  Writeln('CRC32: ',
    IntToHex(LongWord(THash.CRC32.UpdateFile(FileName).Digest), 8));

or else you can reverse the digest’s bytes:

  Writeln('CRC32: ',
    THash.CRC32.UpdateFile(FileName).Digest.Reverse.ToHex);
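
Why the bytes come out reversed can be seen with plain RTL code. The sketch below is independent of TForge; MemoryHex is a hypothetical helper written for the illustration, and the "reversed" result assumes a little-endian CPU such as x86.

```pascal
program ByteOrderSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Well-known CRC-32 check value of the ASCII string '123456789'
  Sample: LongWord = $CBF43926;

// Hex dump of the four bytes of Value in the order they lie in memory
function MemoryHex(Value: LongWord): string;
var
  P: PByte;
  I: Integer;
begin
  Result := '';
  P := PByte(@Value);
  for I := 0 to SizeOf(Value) - 1 do begin
    Result := Result + IntToHex(P^, 2);
    Inc(P);
  end;
end;

begin
  Writeln('as integer: ', IntToHex(Sample, 8));   // CBF43926
  Writeln('in memory:  ', MemoryHex(Sample));     // 2639F4CB on little-endian
end.
```

A digest is a sequence of bytes in memory order, so converting it to hex byte by byte gives the second line; casting to an integer or reversing the bytes gives the first.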

The HMAC algorithm generates a digest using a cryptographic hash algorithm and a secret key. Here is an example of calculating the SHA1-HMAC digest of a file:

procedure SHA1_HMAC_File(const FileName: string;
                         const Key: ByteArray);
begin
  Writeln('SHA1-HMAC: ',
    THMAC.SHA1.ExpandKey(Key).UpdateFile(FileName).Digest.ToHex);
end;

begin
..
  SHA1_HMAC_File(ParamStr(1),
    ByteArray.FromText('My Secret Key'));

Key derivation algorithms generate keys from user passwords by applying hash algorithms. PBKDF1 applies a cryptographic hash algorithm directly, PBKDF2 uses HMAC. Here are usage examples:

procedure DeriveKeys(const Password, Salt: ByteArray);
begin
  Writeln('PBKDF1 Key: ',
    THash.SHA1.DeriveKey(Password, Salt,
                         10000,   // number of rounds
                         16       // key length in bytes
                         ).ToHex);
  Writeln('PBKDF2 Key: ',
    THMAC.SHA1.DeriveKey(Password, Salt,
                         10000,   // number of rounds
                         32       // key length in bytes
                         ).ToHex);
end;

begin
..
  DeriveKeys(ByteArray.FromText('User Password'),
             ByteArray.FromText('Salt'));

Configuration and Installation

The release contains two runtime packages, TForge and THashes. You should build them in that order: first TForge, then THashes.

For Delphi users:

The packages are in Packages\DXE subfolder (Delphi XE only).

You should make the folder Source\Include available via the project’s search path before you can build the packages. To do so, open the “Project Options” dialog for each package, set “Build Configuration” to “Base” and replace the path to TFL.inc with your own path (it depends on where you unpacked the downloaded archive):

[Screenshot: Include]

Now use the project manager and build the packages in “Debug” and “Release” configurations:

[Screenshot: build]

Optionally, to enable Ctrl-click navigation in the editor, add paths to the source code units to the Browsing path:

[Screenshot: Browse]

To make the packages available to an application, open the application’s “Project Options” dialog, set “Build Configuration” to “Base” and add the following path to the search path:

[Screenshot: appconfig]

For FPC/Lazarus users:

The packages are in Packages\FPC subfolder.

I don’t know much about Lazarus packages. Here are the package configuration options I used:

[Screenshots: Laz2, _lazconfig]

Byte Array redux

I am continuing to experiment with the coding patterns used in my numerics dll project, and I am releasing TForge 0.60.

First of all, TForge is a modern cryptographic library for Delphi and Lazarus/FPC that I am currently working on; it is implemented as a set of runtime packages. The current release contains only one package (also named TForge); it is the core package of the whole TForge project and is required by the other packages (to be released later).

The purpose of the release is to introduce a new type, ByteArray, which is an enhanced version of the standard RTL TBytes type. If you want to test ByteArray you need to build the TForge package; the release contains TForge packages for Delphi XE and Lazarus.

The release includes ByteArrayDemo console application which demonstrates functionality of the ByteArray type. For example, you can concatenate byte arrays:

var
  A1, A2: ByteArray;

begin
  A1:= ByteArray(1);
  A2:= TBytes.Create(2, 3, 4);
  Writeln('A1 + A2 = ', (A1 + A2).ToString); // 1, 2, 3, 4

perform bitwise boolean operations (xor is most useful):

var
  A1, A2: ByteArray;

begin
  A1:= ByteArray(1);
  A2:= TBytes.Create(2, 3, 4);
  Writeln('A1 xor A2 = ', (A1 xor A2).ToString); // 3 (= 1 xor 2, min array length used)

use fluent coding style (which appears to be very handy once one gets accustomed to it):

begin
  Writeln(ByteArray.FromText('ABCDEFGHIJ').Insert(3,
    ByteArray.FromText(' 123 ')).Reverse.ToText); // JIHGFED 321 CBA

and so on. It was fun to code ByteArray; I am using it more and more now as a TBytes replacement because of better usability.

Why PDWORD is not a pointer to DWORD

Once upon a time the Delphi team lead decided that Delphi should contain declarations of C++ types like unsigned long for the sake of better C++ compatibility. So he wrote

type
  CppULongInt = LongWord;

No, that is not good, thought the team lead. Delphi’s LongWord is a 32-bit type, while the C++ unsigned long is a platform-dependent type. So let us declare CppULongInt as a distinct type instead of an alias type:

type
  CppULongInt = type LongWord;

That is better. Now how about DWORD type? It is declared in Windows headers as

  typedef unsigned long       DWORD;

So for compatibility’s sake we will declare DWORD as

type
  DWORD = CppULongInt;

Great, thought the Delphi team lead. He called two juniors (let us call them DevA and DevB) and said:

- DevA, create a branch (BranchA) in the Delphi project, declare type DWORD = CppULongInt; and fix all bugs that may appear after;
- DevB, create a branch (BranchB) in the Delphi project, declare type PDWORD = ^CppULongInt; and fix all bugs that may appear after.

After you both are ready we will merge the branches and declare the PDWORD type as it should be:

type
  PDWORD = ^DWORD;

In due time the happy DevB came to the Delphi team lead and said: “I did everything as you said, Sir! Now type PDWORD = ^CppULongInt;, and everything is fine!”

But the other guy, DevA, was not happy. He said: “Sir, we have tons of code like this:

procedure Foo(var Value: DWord);
begin
  Value:= 0;
end;

var
  D: Longword;

begin
  Foo(D);

If I declare

type
  DWORD = CppULongInt;

then it does not compile. What should I do?”

- “Nothing”, said the Delphi team lead. “BranchB will be the main trunk now”.

Disclaimer: the tale was written after reading this SO question
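
The difference between an alias and a distinct type that drives the tale can be reproduced in a few lines. This is only a sketch with hypothetical names (TAlias, TDistinct, SetAlias, SetDistinct); the error text in the comment is the one I would expect from the Delphi compiler, and other compilers may word it differently.

```pascal
program DistinctTypeSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TAlias    = LongWord;        // alias: the very same type as LongWord
  TDistinct = type LongWord;   // distinct type with the same representation

procedure SetAlias(var Value: TAlias);
begin
  Value := 1;
end;

procedure SetDistinct(var Value: TDistinct);
begin
  Value := 2;
end;

var
  L: LongWord;
  D: TDistinct;

begin
  SetAlias(L);         // compiles: TAlias and LongWord are identical
  // SetDistinct(L);   // rejected by Delphi: types of actual and formal
                       // var parameters must be identical
  D := L;              // plain assignment is still allowed
  SetDistinct(D);
  Writeln(L, ' ', D);  // 1 2
end.
```

Var parameters require identical types, which is why existing code that passes a LongWord variable to a procedure expecting var DWORD stops compiling once DWORD becomes a distinct type.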

Why unit tests are not always good

Unit tests are good at detecting most bugs in your code, but not all bugs. When you write standard unit tests for a class you do the following:

  • Create a fresh class instance (e.g. using the SetUp method in the DUnit framework);
  • Run the code under test (usually a single call of a single method) on the instance;
  • Free the instance (e.g. using the TearDown method in the DUnit framework).

And this is how unit tests should be written; if your test detects a bug you immediately know the bug’s origin.

The problem with the above scenario is that it is ideal for hiding some poorly reproducible bugs, such as access violation (AV) bugs. To detect such a bug with good probability you need something different, for example multiple calls of a method on the same instance, or calls of different methods in the same test, and this approach is quite the opposite of the idea of unit testing.
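
The point can be illustrated with a deliberately broken class. The sketch below does not use DUnit and does not produce an access violation; instead it uses a simpler, deterministic state bug (a stale cache) so that the effect is reproducible. TTotals and both test functions are hypothetical and exist only for this illustration.

```pascal
program StatefulBugSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical class with a deliberate bug:
  // the cached sum is not invalidated by Add
  TTotals = class
  private
    FSum, FCache: Integer;
    FCached: Boolean;
  public
    procedure Add(Value: Integer);
    function Sum: Integer;
  end;

procedure TTotals.Add(Value: Integer);
begin
  Inc(FSum, Value);    // bug: FCached should be reset here
end;

function TTotals.Sum: Integer;
begin
  if not FCached then begin
    FCache := FSum;
    FCached := True;
  end;
  Result := FCache;
end;

// Classic unit test: fresh instance, Sum is called once
function SingleCallTest: Boolean;
var
  T: TTotals;
begin
  T := TTotals.Create;
  try
    T.Add(2);
    Result := T.Sum = 2;
  finally
    T.Free;
  end;
end;

// Sequence test: Sum is called twice on the same instance
function SequenceTest: Boolean;
var
  T: TTotals;
begin
  T := TTotals.Create;
  try
    T.Add(2);
    T.Sum;             // fills the cache
    T.Add(3);
    Result := T.Sum = 5;
  finally
    T.Free;
  end;
end;

begin
  Writeln('single call: ', SingleCallTest);  // TRUE
  Writeln('sequence:    ', SequenceTest);    // FALSE, the bug shows up
end.
```

The isolated test passes because a fresh instance never reaches the broken state; only a sequence of calls on the same instance exposes it.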

Numerics 0.57 released (Hashtables, Bugfix)

1. The main purpose of the release is to implement hash tables (aka associative arrays) with keys of BigCardinal or BigInteger type; such hash tables are important for cryptographic applications. Hash tables in Delphi are implemented by the generic TDictionary<TKey,TValue> class. The release includes a new unit GNumerics.pas with two helper generic classes, TBigIntegerDictionary<TValue> and TBigCardinalDictionary<TValue>, specializing the base generic dictionary class. For example, to work with a hash table having BigInteger keys and string values you need something like this:

uses GNumerics;

[..]
var
  HashTable: TBigIntegerDictionary<string>;
  A: BigInteger;

[..]
begin
  A:= BigInteger('1234567890987654321');
// create hash table instance
  HashTable:= TBigIntegerDictionary<string>.Create;
  try
// fill hash table with data
    HashTable.Add(A * A, 'some string value');
    [..]
  finally
// destroy hash table
    HashTable.Free;
  end;
end;

2. Some bugs fixed; in particular a nasty access violation bug in BigInteger.ModPow function was fixed.

3. Minor changes in BigCardinal/BigInteger methods.

Link to the download page

Update

Version 0.58 fixes the conversion bug from (U)Int64 to BigInteger/BigCardinal.

RawByteString type explained

With the introduction of Unicode support Delphi also introduced the magic RawByteString type; the word ‘magic’ here means that you can’t implement a type with RawByteString functionality without hidden compiler support.

A common misunderstanding about the RawByteString type is that instances of RawByteString contain no encoding information and because of this the compiler can’t implement implicit string conversion. That is not true.

First of all, RawByteString is an AnsiString (1-byte character size). If you typecast a Unicode string to the RawByteString type, or a RawByteString string to a Unicode type, the compiler will always perform string conversion; if the compiler has no information about the ANSI codepage of the RawByteString, it uses the system codepage for conversion.

So the RawByteString magic is for AnsiStrings only.

When you declare a variable of AnsiString type you also declare a codepage; for example

type
  CyrString = type AnsiString(1251);

var
  S: CyrString;

That means the compiler has static codepage information; if you do not declare a codepage the compiler assumes the system codepage. AnsiString instances also have runtime codepage information (the codepage field in the string instance header), but the compiler normally does not check the runtime codepage information and uses the static codepage information for string conversions.

The magic of RawByteString type is that the compiler has no static codepage information; it does not mean that a RawByteString instance has no runtime codepage information.

If you typecast an AnsiString instance to RawByteString type no string conversion happens.

The RawByteString type is for ANSI strings’ typecasting only, not for creating instances of the type. To understand how it works let us first consider “an educated abuse” of RawByteString:

type
  string1251 = type AnsiString(1251);
  string1252 = type AnsiString(1252);

var
  S1: string1251;
  S2: string1252;

begin
// just initialize it with some data
  S1:= 'АБВГДЕЙКА';

// no string conversion here;
// the S1 string instance is copied 'as is',
//   with codepage information
  S2:= RawByteString(S1);

Now we have an instance (S2) of string1252 type containing data in ANSI 1251 encoding and runtime codepage 1251. But since the compiler normally uses static codepage information the subsequent use of the instance may produce strange results.
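
The runtime codepage of an instance can be inspected with the RTL function StringCodePage. The following sketch repeats the abuse above with an ASCII literal (to avoid source-file encoding issues) and prints what is stored in the string headers. It targets Delphi; I have not checked that FPC treats the RawByteString cast in the same way.

```pascal
program RuntimeCodePageSketch;

{$APPTYPE CONSOLE}

type
  string1251 = type AnsiString(1251);
  string1252 = type AnsiString(1252);

var
  S1: string1251;
  S2: string1252;

begin
  S1 := 'abc';

  // the cast hides the static codepage, so no conversion is requested
  S2 := RawByteString(S1);

  // StringCodePage reads the codepage field from the instance header
  Writeln('S1 runtime codepage: ', StringCodePage(S1));
  Writeln('S2 runtime codepage: ', StringCodePage(S2));
end.
```

If the description above holds, both lines report 1251, although S2 is statically declared with codepage 1252.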

Finally an example of correct RawByteString type usage. The following function counts the number of occurrences of ‘?’ character in an ANSI string:

function CountQuestions(const S: RawByteString): Integer;
const
  Mark = $3F;

var
  I: Integer;

begin
  Result:= 0;
  for I:= 1 to Length(S) do begin
    if Byte(S[I]) = Mark
      then Inc(Result);
  end;
end;

The purpose of using RawByteString type for the function’s argument is to avoid unnecessary string conversion.

Implementing generic interfaces in Delphi

Delphi supports generic interfaces; for example we can declare a generic interface

type
  IChecker<T> = interface
    function Check(const Instance: T): Boolean;
  end;

and use this generic interface as follows:

unit UseDemo;

interface

uses GenChecks;

type
  TDemo<T> = class
  private
    FChecker: IChecker<T>;
  public
    constructor Create(AChecker: IChecker<T>);
    procedure Check(AValue: T);
  end;

implementation

{ TDemo<T> }

procedure TDemo<T>.Check(AValue: T);
begin
  if FChecker.Check(AValue)
    then Writeln('Passed')
    else Writeln('Stopped')
end;

constructor TDemo<T>.Create(AChecker: IChecker<T>);
begin
  FChecker:= AChecker;
end;

end.

To implement the above generic interface IChecker we need a generic class; the straightforward solution is

type
  TChecker<T> = class(TInterfacedObject, IChecker<T>)
    function Check(const Instance: T): Boolean;
  end;

If the IChecker interface can be implemented like that, we need nothing else. The problem with the above implementation is that we are limited to what the generic constraints on the type T allow, and can’t use properties of the specific types, like Integer or string, that will finally be substituted for the type T.
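
To show what the straightforward solution can and cannot do, here is a minimal sketch of a fully generic checker. TEqualChecker is a hypothetical name; the sketch relies on the default equality comparer from Generics.Defaults because an unconstrained T does not even support the = operator.

```pascal
program GenericCheckerSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Generics.Defaults;

type
  IChecker<T> = interface
    function Check(const Instance: T): Boolean;
  end;

  // Hypothetical generic implementation: works for any T,
  // but can only do what is available for every T
  TEqualChecker<T> = class(TInterfacedObject, IChecker<T>)
  private
    FCheckValue: T;
  public
    constructor Create(const ACheckValue: T);
    function Check(const Instance: T): Boolean;
  end;

constructor TEqualChecker<T>.Create(const ACheckValue: T);
begin
  inherited Create;
  FCheckValue := ACheckValue;
end;

function TEqualChecker<T>.Check(const Instance: T): Boolean;
begin
  // "Instance = FCheckValue" does not compile for an unconstrained T
  Result := TEqualityComparer<T>.Default.Equals(Instance, FCheckValue);
end;

var
  IntChecker: IChecker<Integer>;
  StrChecker: IChecker<string>;

begin
  IntChecker := TEqualChecker<Integer>.Create(42);
  StrChecker := TEqualChecker<string>.Create('abc');
  Writeln(IntChecker.Check(42));     // TRUE
  Writeln(StrChecker.Check('abd'));  // FALSE
end.
```

Equality is about as far as this approach goes: a check that uses Length for strings or arithmetic for integers needs code written for the specific type.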

A more flexible solution is to introduce an abstract base type and derive the specific implementations from it. Here is a full code example:

program GenericEx1;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  GenChecks in 'GenChecks.pas',
  UseDemo in 'UseDemo.pas';

procedure TestInt;
var
  Demo: TDemo<Integer>;

begin
  Demo:= TDemo<Integer>.Create(TIntChecker.Create(42));
  try
    Demo.Check(0);
    Demo.Check(42);
  finally
    Demo.Free;   // release the TDemo instance
  end;
end;

procedure TestStr;
var
  Demo: TDemo<string>;

begin
  Demo:= TDemo<string>.Create(TStrChecker.Create('trololo'));
  try
    Demo.Check('ololo');
    Demo.Check('olololo');
  finally
    Demo.Free;   // release the TDemo instance
  end;
end;

begin
  TestInt;
  TestStr;
  ReadLn;
end.

unit GenChecks;

interface

type
  IChecker<T> = interface
    function Check(const Instance: T): Boolean;
  end;

type
  TCustomChecker<T> = class(TInterfacedObject, IChecker<T>)
  protected
    FCheckValue: T;
    function Check(const Instance: T): Boolean; virtual; abstract;
  public
    constructor Create(ACheckValue: T);
  end;

  TIntChecker = class(TCustomChecker<Integer>)
  protected
    function Check(const Instance: Integer): Boolean; override;
  end;

  TStrChecker = class(TCustomChecker<string>)
  protected
    function Check(const Instance: string): Boolean; override;
  end;

implementation

{ TCustomChecker<T> }

constructor TCustomChecker<T>.Create(ACheckValue: T);
begin
  FCheckValue:= ACheckValue;
end;

{ TIntChecker }

function TIntChecker.Check(const Instance: Integer): Boolean;
begin
  Result:= Instance = FCheckValue;
end;

{ TStrChecker }

function TStrChecker.Check(const Instance: string): Boolean;
begin
  Result:= Length(Instance) = Length(FCheckValue);
end;

end.

In the above example each IChecker interface reference refers to its own class instance; this is necessary because every instance contains a parameter (FCheckValue) set in the constructor. If an implementation does not require such a parameter, creating a new instance for every interface reference is an unnecessary overhead. A better solution is to use a singleton instance.

Here is a full code example for the integer type:

program GenericEx2;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  GenChecks in 'GenChecks.pas',
  UseDemo in 'UseDemo.pas';

procedure TestInt;
var
  Demo: TDemo<Integer>;

begin
  Demo:= TDemo<Integer>.Create(TIntChecker.Ordinal);
  try
    Demo.Check(0);
    Demo.Check(42);
  finally
    Demo.Free;   // release the TDemo instance
  end;
end;

begin
  TestInt;
  ReadLn;
end.

unit GenChecks;

interface

uses Generics.Defaults;

type
  IChecker<T> = interface
    function Check(const Instance: T): Boolean;
  end;

  TCustomChecker<T> = class(TSingletonImplementation, IChecker<T>)
  protected
    function Check(const Instance: T): Boolean; virtual; abstract;
  end;

  TIntChecker = class(TCustomChecker<Integer>)
  private
    class var
      FOrdinal: TCustomChecker<Integer>;
  public
    class function Ordinal: TIntChecker;
  end;

implementation

type
  TOrdinalIntChecker = class(TIntChecker)
  public
    function Check(const Instance: Integer): Boolean; override;
  end;

{ TOrdinalIntChecker }

function TOrdinalIntChecker.Check(const Instance: Integer): Boolean;
begin
  Result:= Instance = 42;
end;

{ TIntChecker }

class function TIntChecker.Ordinal: TIntChecker;
begin
  if FOrdinal = nil then
    FOrdinal := TOrdinalIntChecker.Create;
  Result := TIntChecker(FOrdinal);
end;

end.