On the record initialization/finalization in Delphi

If a Delphi record type contains managed fields, Delphi uses special initialization and finalization routines for the record instances. Unfortunately, these routines rely on some kind of RTTI and are slow; they could be made much faster.
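
To make that implicit work concrete, here is a minimal sketch (not the compiler's actual output) that performs the same steps by hand on a heap-allocated record, using the standard Initialize and Finalize intrinsics. The program name and the PIRec pointer type are made up for illustration; TIRec has the same layout as the record used in the benchmark.

```pascal
program ExplicitInitSketch;

{$APPTYPE CONSOLE}

type
  TIRec = record
    II: IInterface;
  end;
  PIRec = ^TIRec;             // hypothetical helper type

var
  P: PIRec;
  Obj: IInterface;

begin
  Obj:= TInterfacedObject.Create;
  GetMem(P, SizeOf(TIRec));   // raw memory, P^.II holds garbage here
  Initialize(P^);             // sets the managed field to nil
  try
    P^.II:= Obj;              // takes a reference to the object
  finally
    Finalize(P^);             // releases the managed field
    FreeMem(P);
  end;
  Writeln('done');
end.
```

For a local record variable the compiler inserts the equivalent initialization and finalization calls automatically, and it is those calls that the benchmark measures.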

I have written a simple benchmark to find out how much the built-in record initialization/finalization routines can slow down my BigInteger implementation:

program CustomBench;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TIRec = record
    II: IInterface;
  end;

  TPRec = record
    PP: Pointer;
  end;

procedure ITest(const AI: IInterface);
var
  IR: TIRec;

begin
  IR.II:= AI;
end;

procedure PTest(const AI: IInterface);
var
  PR: TPRec;

begin
  PR.PP:= nil;                  // custom initialization
  try
    IInterface(PR.PP):= AI;
  finally
    IInterface(PR.PP):= nil;      // custom finalization;
  end;
end;

procedure TestI(const AI: IInterface; Count: Integer);
begin
  while Count > 0 do begin
    ITest(AI);
    Dec(Count);
  end;
end;

procedure TestP(const AI: IInterface; Count: Integer);
begin
  while Count > 0 do begin
    PTest(AI);
    Dec(Count);
  end;
end;

procedure BenchMark(Count: Integer);
const
  MillisPerDay = 24 * 60 * 60 * 1000;

var
  II: IInterface;
  StartTime: TDateTime;
  ElapsedMillis1: Integer;
  ElapsedMillis2: Integer;

begin
  II:= TInterfacedObject.Create;
  StartTime:= Now;
  TestI(II, Count);
  ElapsedMillis1:= Round((Now - StartTime) * MillisPerDay);
  Writeln('Built-in initialization/finalization: ', ElapsedMillis1, ' ms.');
  StartTime:= Now;
  TestP(II, Count);
  ElapsedMillis2:= Round((Now - StartTime) * MillisPerDay);
  Writeln('Custom initialization/finalization: ', ElapsedMillis2, ' ms.');
  Writeln('Built-in slowdown: ',
    Round((ElapsedMillis1/ElapsedMillis2 - 1) * 100), '%');
  II:= nil;
end;

begin
  try
    BenchMark(10000000);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  ReadLn;
end.

The typical result I am getting on my laptop (Delphi XE, 32 bits) is:

[Image: screenshot of the benchmark output, alt text "Slow"]

This is not a showstopper: the actual slowdown in real code that does meaningful work will be significantly less. Still, the slowdown is not negligible.


10 thoughts on “On the record initialization/finalization in Delphi”

  1. This part of the RTL is not optimized.

    IMHO record initialization should not be done at runtime, using RTTI, but should be prepared at compile time, with proper code generation.

    A heuristic should let the compiler switch between two implementation strategies:
    – if the record contains a lot of non-reference-counted data and only a few reference-counted fields, generate code that initializes only the reference-counted fields;
    – if the record contains mainly reference-counted data (as in your benchmark), generate code that does a fillchar() of 0 on the whole record (as is done for class initialization).
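
    As a rough illustration of these two strategies, here is a hand-written sketch of what such generated code could be equivalent to. It is not what any Delphi compiler emits. The record layouts and procedure names are hypothetical, and Pointer fields stand in for the managed fields (the same trick as TPRec in the benchmark) so that the built-in initialization does not kick in.

    ```pascal
    program InitStrategySketch;

    {$APPTYPE CONSOLE}

    type
      // Hypothetical layout: mostly plain data, one managed slot.
      TMixedShadow = record
        Data: array[0..255] of Byte;
        Name: Pointer;            // stands in for a string field
      end;

      // Hypothetical layout: managed slots only.
      TManagedShadow = record
        A, B, C: Pointer;         // stand in for interface/string fields
      end;

    // Strategy 1: clear only the managed slot, leave the plain data untouched.
    procedure InitMixed(var R: TMixedShadow);
    begin
      R.Name:= nil;
    end;

    // Strategy 2: zero the whole record in one go.
    procedure InitManaged(var R: TManagedShadow);
    begin
      FillChar(R, SizeOf(R), 0);
    end;

    var
      M: TMixedShadow;
      G: TManagedShadow;

    begin
      InitMixed(M);
      InitManaged(G);
      Writeln(M.Name = nil, ' ', (G.A = nil) and (G.B = nil) and (G.C = nil));
    end.
    ```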

    You did only benchmark initialization.
    But finalization also uses the RTL and RTTI, so it can be pretty slow. And this finalization code is shared with the class release code.
    IMHO finalization should also not be done at runtime using RTTI, but at compile time, with code generation.

    For our enhanced RTL (only for Delphi 7 and 2007), we re-wrote the initialization/finalization part of the RTL with optimized asm, with some success. See http://blog.synopse.info/post/2010/01/18/Enhanced-System-Run-Time-for-Delphi-7-and-Delphi-2007 and http://synopse.info/files/SynopseRTLsources.zip
    But compile-time code generation should be much better.

    • I think I benchmarked both initialization and finalization, by replacing the RTTI-based built-in routines with custom initialization and finalization code that does not use RTTI.
      And yes, I think a smarter compiler could generate the same optimized code; no RTTI is needed for my simple record with only one field.

      • My mistake: you did also benchmark finalization, of course!
        I was focusing on initialization part. Reading too early in the morning, before my first liter of coffee, does not help. 🙂

        I’ve just added some optional asm patches to our SynCommons.pas unit, to speed up record initialization/finalization (and class finalization, BTW). Sadly, TMonitor.Destroy is private to System.pas, so TObject.CleanupInstance can’t be patched directly.
        See http://synopse.info/fossil/info/b5430fa5d0

      • I just tested with my asm optimized routines (for 50000000 iterations, with Delphi 7):

        STANDARD RTL:
        Built-in initialization/finalization: 2178 ms.
        Custom initialization/finalization: 1602 ms.
        Built-in slowdown: 36%

        SYNCOMMONS PATCHES:
        Built-in initialization/finalization: 1943 ms.
        Custom initialization/finalization: 1629 ms.
        Built-in slowdown: 19%

        Speed benefit will be even higher if you have composite records (i.e. more than one reference counted field).

  2. That is one problem, yes; another is the lack of a record constructor/destructor that would be called automatically. There is no other way to solve this than to hook the initialization and finalization of the record. Not pretty, but doable. I have done this for Cromis.AnyValue:

    http://www.cromis.net/blog/downloads/cromis-anyvalue

    It speeds things up significantly and also makes it possible to do some very neat things with records. But it is a kind of hack, and I have to revert to the old style under x64 because the Detours I use are not x64 safe.

  3. I think that PTest should use try/finally (as the ‘compiler magic’ does, because that is memory-leak safe) if you want to compare the same functionality:

    procedure PTest(AI: IInterface);
    var
      PR: TPRec;
    begin
      PR.PP:= nil;                  // custom initialization
      try
        IInterface(PR.PP):= AI;
      finally
        IInterface(PR.PP):= nil;      // custom finalization;
      end;
    end;
    

    It’s still slower (on my machine about 4-5%) because of InitializeRecord/FinalizeRecord when using ITest.

    If you are after performance, you can use ‘const interface’ parameters, so there won’t be AddRef/Release on each call (the same applies to other ref-counted types, like strings or dynamic arrays):

    procedure ITest(const AI: IInterface); is 85% faster than procedure ITest(AI: IInterface);
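
    The difference between a by-value and a const interface parameter can be observed with a small sketch like the one below. The program and procedure names are invented for illustration, and the global Obj reference is kept only so that the RefCount property of TInterfacedObject can be read inside the calls; the exact counts printed are left for the reader to check on their own compiler.

    ```pascal
    program ConstParamSketch;

    {$APPTYPE CONSOLE}

    var
      Obj: TInterfacedObject;   // kept only to read RefCount; Intf owns the object
      Intf: IInterface;

    // By-value parameter: the callee holds its own reference during the call.
    procedure ByValue(AI: IInterface);
    begin
      Writeln('by value: RefCount = ', Obj.RefCount);
    end;

    // Const parameter: no extra reference should be taken for the call.
    procedure ByConst(const AI: IInterface);
    begin
      Writeln('by const: RefCount = ', Obj.RefCount);
    end;

    begin
      Obj:= TInterfacedObject.Create;
      Intf:= Obj;               // the interface variable now keeps the object alive
      ByValue(Intf);
      ByConst(Intf);
      Intf:= nil;               // releases the object; Obj must not be used after this
    end.
    ```

    If the by-value call reports a higher count than the const call, that extra AddRef/Release pair is the per-call overhead that the const parameter avoids.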

  4. There is another solution for the task:
    type
      PTIRec = ^TIRec;

    procedure TestW(const AI: IInterface; Count: Integer);
    var
      TmpIR: PTIRec;
      IR: PTIRec;
      SaveCnt: Integer;
    begin
      SaveCnt := Count;
      GetMem(IR, SizeOf(TIRec) * Count);
      if IR <> nil then
      begin
        TmpIR := IR;
        Initialize(IR^, Count);          // initialize all records in one call
        while Count > 0 do
        begin
          TmpIR^.II := AI;
          Inc(TmpIR);
          Dec(Count);
        end;
        Finalize(IR^, SaveCnt);          // finalize all records in one call
        FreeMem(IR, SizeOf(TIRec) * SaveCnt);
      end;
    end;
