TCiphers – Multithreading Support.


TCiphers supports encryption/decryption in multiple threads for stream ciphers and block ciphers in CTR mode of operation.

The idea is simple – to encrypt N bytes of data in m threads you split the data into m parts, each part approximately of N/m bytes size, then create m TCipher instances and encrypt m parts in parallel. The TCiphers methods needed to implement this algorithm are:

  function TCipher.Copy: TCipher;
  function TCipher.Skip(Value: LongWord): TCipher; overload;
  function TCipher.Skip(Value: UInt64): TCipher; overload;
  property TCipher.BlockSize: Cardinal;

The TCipher.Copy duplicates an instance of TCipher, TCipher.Skip(Value) discards Value blocks of the keystream, and TCipher.BlockSize returns the cipher’s block size. These are all ingredients needed for multithreading.

The ThreadDemo project demonstrates how it works. The thread class implementing the encryption receives a cipher instance, an address of the portion of data to be encrypted and its size in the constructor and uses Windows event to signal the main thread when the operation is completed:

unit CipherThreads;


  Windows, Classes, tfCiphers;

  TCipherThread = class(TThread)
    FCipher: TCipher;
    FOrigin: Pointer;
    FSize: LongWord;
    FLast: Boolean;
    FEvent: THandle;
    procedure Execute; override;
    constructor Create(ACipher: TCipher; AOrigin: Pointer; ASize: LongWord;
                       ALast: Boolean; AEvent: THandle);


{ TCipherThread }

constructor TCipherThread.Create(ACipher: TCipher; AOrigin: Pointer;
  ASize: LongWord; ALast: Boolean; AEvent: THandle);
  FCipher:= ACipher;
  FOrigin:= AOrigin;
  FSize:= ASize;
  FLast:= ALast;
  FEvent:= AEvent;
  FreeOnTerminate:= True;
  inherited Create(False);

procedure TCipherThread.Execute;
  DataSize: LongWord;

  DataSize:= FSize;
  FCipher.KeyCrypt(FOrigin^, DataSize, FLast);


The main thread splits the data to be encrypted into 4 portions with the cipher’s block size granularity. Then we create an instance of TCipher (AES and Salsa20 algorithms are tested); the instance will finally be used to check the multi-thread encryption result by the single-thread decryption. Each thread uses its own copy of the cipher’s instance, created by using TCipher.Copy and TCipher.Skip methods. That’s all:

program ThreadDemo;


  CipherThreads in '..\Source\CipherThreads.pas';

procedure TestCipher(Cipher: TCipher);
  DATA_SIZE = 1024 * 1024;
  NThreads = 4;

  PData = ^TData;
  TData = array[0 .. DATA_SIZE - 1] of LongWord;

  Data: PData;

  I: Integer;
  DataSize: Cardinal;
  BlockSize: Cardinal;
  Origin: PByte;
  Chunk, Size: Cardinal;
  Events: array[0 .. NThreads - 1] of THandle;

  GetMem(Data, SizeOf(TData));
// fill the data with some known values
    for I:= 0 to DATA_SIZE - 1 do
      Data[I]:= I;
    BlockSize:= Cipher.BlockSize;
    Origin:= PByte(Data);
    Chunk:= SizeOf(TData) div (NThreads * BlockSize);

// encrypt the data using multiple threads
    for I:= 0 to NThreads - 1 do begin
      if I = NThreads - 1 then
        Size:= SizeOf(TData) - (NThreads - 1) * Chunk * BlockSize
        Size:= Chunk * BlockSize;
      Events[I]:= CreateEvent(nil, False, False, nil);
      TCipherThread.Create(Cipher.Copy.Skip(Chunk * LongWord(I)),
        Origin, Size, I = NThreads - 1, Events[I]);
      Inc(Origin, Chunk * BlockSize);

      WaitForMultipleObjects(NThreads, @Events, True, INFINITE);
      for I:= 0 to NThreads - 1 do

// check the result by decryption
    DataSize:= SizeOf(TData);
    Cipher.KeyCrypt(Data^, DataSize, True);
    for I:= 0 to DATA_SIZE - 1 do
      if Data[I] <> LongWord(I) then
        raise Exception.Create('!! Error -- decryption failed');


  HexKey = '2BD6459F82C5B300952C49104881FF48';
  HexIV  = '6B1E2FFFE8A114009D8FE22F6DB5F876';

    Writeln('=== Running AES Test ..');
                                     CTR_ENCRYPT or PADDING_NONE,

    Writeln('=== Running Salsa20 Test ..');
    Writeln('=== Done ! ===');
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);

This example should work with any stream cipher, but it improves performance only if the stream cipher is parallelazable. The unparallelazable RC4 cipher implements the TCipher.Skip(Value) method by generating Value bytes of a keystream and discarding them, so multitheading with RC4 is useless. On the other hand Salsa20 implementation discards the keystream blocks by incrementing directly the block number field in the cipher instance state, leveraging that the Salsa20 algorithm is parallelazable.




TForge 0.70 is released.

The release adds a new package TCiphers implementing private-key cryptography (block and stream ciphers). The current release supports:

  • Block Ciphers:
    • AES
    • DES
    • RC5
  • Modes of operation
    • ECB
    • CBC
    • CTR
  • Padding methods
    • No padding
    • Zero padding
    • ANSI X.923
    • PKCS
    • ISO 10126
    • ISO/IEC 7816-4
  • Stream Ciphers
    • RC4
    • Salsa20

The main idea behind writing the package is user-friendly Pascal Crypto API, suitable both for writing production code and for learning cryptography. Here is an example (the full source code can be found in the CipherDemo project) demonstrating how to encrypt and decrypt a file using AES cipher and CBC mode of operation in TForge:

procedure CBCFileDemo(const FileName: string;
                      const IV, Key: ByteArray);
  TCipher.AES.ExpandKey(Key, CBC_ENCRYPT, IV)
             .EncryptFile(FileName, FileName + '.aes');
  TCipher.AES.ExpandKey(Key, CBC_DECRYPT, IV)
             .DecryptFile(FileName + '.aes', FileName + '.bak');

The above example requires some explanation. For security reasons TForge don’t keep a copy of a plaintext key in a state of cipher instance if an algorithm supports key expansion (aka “key schedule”) procedure; instead it immediately expands a plaintext key passed as a parameter of ExpandKey method into a form internally used by the algorithm. A key expansion procedure may generally be affected by the encryption direction (encryption/decryption) and mode of operation, that is why TForge requires to set the encryption direction explicitly. If you expand key for decryption (ex using CBC_DECRYPT flag) and then call encryption method (ex EncryptFile), TForge raises exception.

The example is more suitable for crypto exercises. Production code usually follows the next template:

  • create a cipher instance
  • use the cipher instance for encryption or decryption
  • erase the key data from the instance

which can be implemented as follows:

  Cipher: TCipher;

  Cipher:= TCipher.AES.ExpandKey(Key, CBC_ENCRYPT, IV);
    Cipher.EncryptFile(FileName, FileName + '.aes');
//    Cipher.Free;

The instances of the TCipher type support Free method but there is no need to call Free explicitly because an instance is freed automatically when it goes out of scope.

If the padding method is not set explicitly TForge applies “default” padding method which depends on the mode of operation used. Ex for the CBC mode of operation the default padding method is “PKCS”, for the CTR mode of operation the default padding method is “no padding”; if you need say CTR with PKCS padding declare the padding method explicitly:

  Cipher:= TCipher.AES.ExpandKey(Key, CTR_ENCRYPT or PADDING_PKCS, IV);

TCipher type includes several helper methods to keep simple tasks simple; for example to encrypt/decrypt a single 64-bit block by DES cipher with a given key

procedure BlockDemo(const Block, Key: ByteArray);
  CipherText, PlainText: ByteArray;

  CipherText:= TCipher.DES.EncryptBlock(Block, Key);
  PlainText:= TCipher.DES.DecryptBlock(CipherText, Key);
  Assert(PlainText = Block);

the methods EncryptBlock and DecryptBlock keep the key expansion details under the hood.

Another handy pair of TCipher methods is designed to encrypt/decrypt arbitrary-sized data:

procedure CBCDemo(const PlainText, IV, Key: ByteArray);
  CipherText, Plaintext2: ByteArray;

  CipherText:= TCipher.AES.ExpandKey(Key, CBC_ENCRYPT, IV)
  PlainText2:= TCipher.AES.ExpandKey(Key, CBC_DECRYPT, IV)
  Assert(PlainText = PlainText2);

The core TCipher methods for encryption/decryption of arbitrary data are

    procedure TCipher.Encrypt(var Data; var DataSize: LongWord;
                      BufSize: LongWord; Last: Boolean);
    procedure TCipher.Decrypt(var Data; var DataSize: LongWord;
                      Last: Boolean);

The methods are designed for block ciphers but work with stream ciphers too. The Last argument defines whether the current portion of data is the last or not. For the last block the padding should be applied, and consequently the size of output data can differ from the size of input data. Generally the size of output data is larger for encryption and lesser for decryption; the difference can’t exceed the cipher’s block size. After the data is processed, the DataSize parameter contains the size of output data; the BufSize should be large enough to hold the output data, otherwise an exception is raised.

Stream ciphers expand a random key into a pseudorandom keystream and encrypt/decrypt by calculating xor of the keystream and the input data, no padding involved, so the size of output data is always equal to the size of input data. This operation is implemented by the method TCipher.KeyCrypt:

  procedure TCipher.KeyCrypt(var Data; DataSize: LongWord;
                             Last: Boolean);

which is the preferable method for encryption/decryption with a stream cipher.

The Last argument defines whether the current portion of data is last or not; if Last is false then the DataSize should be multiple of the cipher’s block size.

Legacy stream ciphers (RC4) have no block structure inside, so the “block size” for RC4 is equal to 1 (byte). On the other hand Salsa20 generates a keystream in 64-byte (that is 512-bit) blocks, sort of having a block cipher under the hood, so the notion of block size applies both to block and stream ciphers.

Any secure cipher can be used as a secure pseudorandom generator; the pseudorandom data can be generated by the method

  function TCipher.KeyStream(DataSize: LongWord): ByteArray;

The seed of the pseudorandom generator is a random key.

Block ciphers implement TCipher.KeyCrypt and TCipher.KeyStream methods by applying the CTR mode of operation without padding, which converts a block cipher into a stream cipher. Ex the right way to use the AES algorithm for generating pseudorandom integer values is to expand a random key for CTR mode without padding and call TCipher.KeyStream:

procedure RandDemo(const Key: ByteArray);
  Cipher: TCipher;
  I: Integer;
  Rand: LongWord;

  Cipher:= Cipher.AES.ExpandKey(Key, CTR_ENCRYPT);
  for I:= 0 to 9 do begin
    Rand:= LongWord(Cipher.KeyStream(SizeOf(Rand)));
    Writeln(I:3, ': ', Rand);

The RC5 block cipher algorithm is fully implemented as it was originally described, including 32- and 128-bit sized blocks. An instance of RC5 cipher can be created either as “standard” (8-byte block size, 12 rounds – just RC5) or “extended” (4-, 8- or 16-byte block size, 1..255 rounds – RC5(BlockSize, Rounds)):

procedure RC5BlockDemo(const Block, Key: ByteArray);
  CipherText, PlainText: ByteArray;

  CipherText:= TCipher.RC5(Block.Len, 20)
                      .EncryptBlock(Block, Key);
  PlainText:= TCipher.RC5(Block.Len, 20)
                     .DecryptBlock(CipherText, Key);
  Assert(PlainText = Block);

RC4 is a simple non-parallelizable stream cipher, widely used in today’s communication protocols but not recommended to use in new products because it is not considered secure now. Here is an example of file encryption/decryption using RC4:

procedure RC4FileDemo(const FileName: string; const Key: ByteArray);
             .EncryptFile(FileName, FileName + '.rc4');
             .DecryptFile(FileName + '.rc4', FileName + '.bak');

The example uses TCipher.Skip method to discard the first 1536 bytes of RC4 keystream [recommended for security reasons in RFC4345] prior to encryption/decryption. The parameter of TCipher.Skip method is the number of cipher blocks to be discarded; since RC4 has no block structure the “block” of RC4 is 1 byte.

Salsa20 is one of the eStream portfolio stream ciphers, modern, fast and parallelizable. The instances of Salsa20 can be created as “standard” (20-rounds) or “non-standard” (any even number of rounds from 2 to 254):

    class function TCipher.Salsa20: TCipher;
    class function TCipher.Salsa20(Rounds: LongWord): TCipher;

I will tell more about Salsa20 support in the next post dedicated to multithreading with TCiphers.