TCiphers – Multithreading Support.

TCiphers supports encryption/decryption in multiple threads for stream ciphers and block ciphers in CTR mode of operation.

The idea is simple: to encrypt N bytes of data in m threads, split the data into m parts of approximately N/m bytes each, create m TCipher instances, and encrypt the m parts in parallel. The TCiphers methods needed to implement this algorithm are:

  function TCipher.Copy: TCipher;
  function TCipher.Skip(Value: LongWord): TCipher; overload;
  function TCipher.Skip(Value: UInt64): TCipher; overload;
  property TCipher.BlockSize: Cardinal;

TCipher.Copy duplicates an instance of TCipher, TCipher.Skip(Value) discards Value blocks of the keystream, and TCipher.BlockSize returns the cipher’s block size. These are all the ingredients needed for multithreading.
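
The partitioning arithmetic can be sketched on its own, without any cipher. The program below is only an illustration: PrintPlan is a hypothetical helper, not part of TCiphers, and it assumes the last thread takes whatever remains after the even split. It prints, for each thread, how many keystream blocks would be skipped and which byte range would be processed.

```pascal
program ChunkPlan;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper (not part of TCiphers): shows how TotalSize bytes
// would be divided among NThreads workers at block-size granularity.
procedure PrintPlan(TotalSize, BlockSize, NThreads: Cardinal);
var
  I, Chunk, Offset, Size: Cardinal;
begin
  // whole blocks per thread
  Chunk := TotalSize div (NThreads * BlockSize);
  for I := 0 to NThreads - 1 do begin
    Offset := I * Chunk * BlockSize;
    // the last thread also takes the remainder, which may end in a partial block
    if I = NThreads - 1 then
      Size := TotalSize - Offset
    else
      Size := Chunk * BlockSize;
    Writeln('thread ', I, ': skip ', I * Chunk, ' blocks, offset ', Offset,
            ', size ', Size);
  end;
end;

begin
  PrintPlan(1000, 16, 4);
end.
```

For 1000 bytes, 16-byte blocks and 4 threads, each of the first three threads gets 15 blocks (240 bytes) and the last one gets the remaining 280 bytes, so the four ranges cover the data exactly once.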

The ThreadDemo project demonstrates how it works. The thread class that implements the encryption receives in its constructor a cipher instance, the address of the portion of data to be encrypted, and its size; it uses a Windows event to signal the main thread when the operation is complete:

unit CipherThreads;

interface

uses
  Windows, Classes, tfCiphers;

type
  TCipherThread = class(TThread)
  private
    FCipher: TCipher;
    FOrigin: Pointer;
    FSize: LongWord;
    FLast: Boolean;
    FEvent: THandle;
  protected
    procedure Execute; override;
  public
    constructor Create(ACipher: TCipher; AOrigin: Pointer; ASize: LongWord;
                       ALast: Boolean; AEvent: THandle);
  end;

implementation

{ TCipherThread }

constructor TCipherThread.Create(ACipher: TCipher; AOrigin: Pointer;
  ASize: LongWord; ALast: Boolean; AEvent: THandle);
begin
  FCipher:= ACipher;
  FOrigin:= AOrigin;
  FSize:= ASize;
  FLast:= ALast;
  FEvent:= AEvent;
  FreeOnTerminate:= True;
  inherited Create(False);
end;

procedure TCipherThread.Execute;
var
  DataSize: LongWord;

begin
  DataSize:= FSize;
  FCipher.KeyCrypt(FOrigin^, DataSize, FLast);
  SetEvent(FEvent);
end;

end.

The TestCipher procedure receives a TCipher instance (AES in CTR mode and Salsa20 are tested) and splits the data to be encrypted into 4 portions with block-size granularity. Each thread works with its own copy of the cipher instance, created with TCipher.Copy and positioned with TCipher.Skip. The original instance is used at the end to verify the multithreaded encryption by decrypting the data in a single thread. That’s all:

program ThreadDemo;

{$APPTYPE CONSOLE}

uses
  Windows,
  SysUtils,
  tfTypes,
  tfBytes,
  tfCiphers,
  CipherThreads in '..\Source\CipherThreads.pas';

procedure TestCipher(Cipher: TCipher);
const
  DATA_SIZE = 1024 * 1024;
  NThreads = 4;

type
  PData = ^TData;
  TData = array[0 .. DATA_SIZE - 1] of LongWord;

var
  Data: PData;
  I: Integer;
  DataSize: Cardinal;
  BlockSize: Cardinal;
  Origin: PByte;
  Chunk, Size: Cardinal;
  Events: array[0 .. NThreads - 1] of THandle;

begin
  GetMem(Data, SizeOf(TData));
  try
// fill the data with some known values
    for I:= 0 to DATA_SIZE - 1 do
      Data[I]:= I;
    BlockSize:= Cipher.BlockSize;
    Origin:= PByte(Data);
// number of whole blocks handled by each thread
    Chunk:= SizeOf(TData) div (NThreads * BlockSize);

// encrypt the data using multiple threads
    for I:= 0 to NThreads - 1 do begin
// the last thread also takes whatever remains after the even split
      if I = NThreads - 1 then
        Size:= SizeOf(TData) - (NThreads - 1) * Chunk * BlockSize
      else
        Size:= Chunk * BlockSize;
      Events[I]:= CreateEvent(nil, False, False, nil);
// each thread gets its own cipher copy, advanced past the blocks
// that belong to the preceding threads
      TCipherThread.Create(Cipher.Copy.Skip(Chunk * LongWord(I)),
        Origin, Size, I = NThreads - 1, Events[I]);
      Inc(Origin, Chunk * BlockSize);
    end;

    try
      WaitForMultipleObjects(NThreads, @Events, True, INFINITE);
    finally
      for I:= 0 to NThreads - 1 do
        CloseHandle(Events[I]);
    end;

// check the result by decryption
    DataSize:= SizeOf(TData);
    Cipher.KeyCrypt(Data^, DataSize, True);
    for I:= 0 to DATA_SIZE - 1 do
      if Data[I] <> LongWord(I) then
        raise Exception.Create('!! Error -- decryption failed');

  finally
    FreeMem(Data);
  end;
end;

const
  HexKey = '2BD6459F82C5B300952C49104881FF48';
  HexIV  = '6B1E2FFFE8A114009D8FE22F6DB5F876';

begin
  try
    Writeln('=== Running AES Test ..');
    TestCipher(TCipher.AES.ExpandKey(ByteArray.ParseHex(HexKey),
                                     CTR_ENCRYPT or PADDING_NONE,
                                     ByteArray.ParseHex(HexIV))
               );

    Writeln('=== Running Salsa20 Test ..');
    TestCipher(TCipher.Salsa20
                      .SetNonce($123456789ABCDEF0)
                      .ExpandKey(ByteArray.ParseHex(HexKey))
               );
    Writeln('=== Done ! ===');
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  Readln;
end.

This example should work with any stream cipher, but it improves performance only if the stream cipher is parallelizable. The unparallelizable RC4 cipher implements the TCipher.Skip(Value) method by generating Value bytes of the keystream and discarding them, so multithreading with RC4 is useless. The Salsa20 implementation, on the other hand, discards keystream blocks by directly incrementing the block number field in the cipher instance state, taking advantage of the fact that the Salsa20 algorithm is parallelizable.
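
The difference between the two ways of skipping can be illustrated with a toy generator. This is a sketch only: TToyState, NextBlock, SkipByGenerating and SkipByCounter are invented for this example and are neither Salsa20, RC4, nor TCiphers code. The toy keystream block depends only on the key and a block counter, so skipping can be done either by generating and discarding blocks or by advancing the counter directly, and both must arrive at the same keystream position.

```pascal
program SkipDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}
{$Q-}{$R-}

type
  // Toy state, for illustration only: NOT a real cipher.
  TToyState = record
    Key: LongWord;
    Counter: LongWord;   // index of the next keystream block
  end;

// Toy "keystream block": a function of the key and the block counter only.
function NextBlock(var S: TToyState): LongWord;
begin
  Result := S.Key xor S.Counter xor (S.Counter shl 16);
  Inc(S.Counter);
end;

// Sequential skip: generate the blocks and throw them away; cost grows
// with the number of blocks skipped.
procedure SkipByGenerating(var S: TToyState; Blocks: LongWord);
var
  I, Dummy: LongWord;
begin
  for I := 1 to Blocks do
    Dummy := NextBlock(S);
end;

// Direct skip: advance the block counter; constant cost.
procedure SkipByCounter(var S: TToyState; Blocks: LongWord);
begin
  Inc(S.Counter, Blocks);
end;

var
  A, B: TToyState;

begin
  A.Key := $2BD6459F;
  A.Counter := 0;
  B := A;
  SkipByGenerating(A, 1000);
  SkipByCounter(B, 1000);
  if NextBlock(A) = NextBlock(B) then
    Writeln('both skips reach the same keystream position')
  else
    Writeln('mismatch');
end.
```

A cipher whose state evolves with every output byte has only the first option available, so each thread would have to repeat the work of all the threads before it.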

10 thoughts on “TCiphers – Multithreading Support.”

  1. I was trying this out but when I add the tfSalsa20 unit to my project I get the following errors:-

    FUNCTION:-
    class function TSalsa20.KeyBlock()
    LINE:-
    PLongWord(Data)[N]:= PLongWord(Data)[N] + Inst.FExpandedKey[N];
    XE8-ERROR:-
    E2016 Array type required
    W1024 Combining signed and unsigned types – widened both operands
    F2063 Could not compile used unit ‘tfSalsa20.pas’

    Please advise …

    • Interesting. I don’t have XE8; try this (only the function’s signature and one more line of code changed):

      class function TSalsa20.KeyBlock(Inst: PSalsa20; Data: TSalsa20.PBlock): TF_RESULT;
      var
        N: Cardinal;
      
      begin
        Move(Inst.FExpandedKey, Data^, SizeOf(TBlock));
        N:= Inst.FRounds;
        repeat
          DoubleRound(TSalsa20.PBlock(Data));
          Dec(N);
        until N = 0;
        repeat
          Data[N]:= Data[N] + Inst.FExpandedKey[N];
      //    PLongWord(Data)[N]:= PLongWord(Data)[N] + Inst.FExpandedKey[N];
          Inc(N);
        until N = 16;
        Inc(Inst.FExpandedKey[8]);
        if (Inst.FExpandedKey[8] = 0)
          then Inc(Inst.FExpandedKey[9]);
        Result:= TF_S_OK;
      end;
      
      • @JIM This line of code would compile, but without changing the function’s signature to

        class function TSalsa20.KeyBlock(Inst: PSalsa20; Data: TSalsa20.PBlock): TF_RESULT;

        the whole Salsa20 implementation would be wrong. See the code in my comment above; I tested it with my test vectors.
