A String of Byte?

Dynamic arrays are implemented in Delphi as lifetime-managed reference types without “copy-on-write” support. What that means in practice can be shown by the following code:

program Project1;

{$APPTYPE CONSOLE}

type
  TIntArray = array of Integer;

var
  A, B: TIntArray;

begin
  A:= TIntArray.Create(1,2,3);
  B:= A;      // B = (1,2,3)
  A[0]:= 10;  // A = (10,2,3), B = (10,2,3)
  Writeln(B[0]);
  readln;
end.

A and B are references to the same array, so by changing a value through A we also change the value seen through B. That makes dynamic arrays different from long strings, which are implemented as lifetime-managed reference types with “copy-on-write” support:

program Project1;

{$APPTYPE CONSOLE}

var
  A, B: string;

begin
  A:= 'ABC';
  B:= A;       // B = 'ABC'
  A[1]:= 'Z';  // A = 'ZBC', B = 'ABC'
  Writeln(A:5, B:5);
  readln;
end.

To avoid the above side effect with dynamic arrays one should use the Copy function (B:= Copy(A);) instead of plain assignment (B:= A;). But sometimes the absence of “copy-on-write” support is more disappointing. Consider a dynamic array as a record field:

program Project2;

{$APPTYPE CONSOLE}

type
  TIntArray = array of Integer;
  TSomeRec = record
    FArr: TIntArray;
    FInt: Integer;
  end;

var
  A, B: TSomeRec;

begin
  A.FArr:= TIntArray.Create(1,2,3);
  A.FInt:= 5;
  B:= A;
  A.FArr[0]:= 11;  // B.FArr[0] = 11
  A.FInt:= 55;     // B.FInt = 5
  Writeln(B.FArr[0]:5, B.FInt:5);
  readln;
end.

One can’t use B:= Copy(A) here, because Copy does not accept a record; the array field has to be copied separately. In any case, one should be careful with dynamic array variables and with records that have dynamic array fields, and keep the possible side effects in mind.
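A minimal sketch of one way around the record case: assign the record, then detach the array field with Copy. The helper CloneRec is a hypothetical name introduced here for illustration; it is not part of the RTL.

```delphi
program Project3;

{$APPTYPE CONSOLE}

type
  TIntArray = array of Integer;
  TSomeRec = record
    FArr: TIntArray;
    FInt: Integer;
  end;

// Hypothetical helper (not in the original post): copies the record,
// then replaces the shared array reference with a private copy.
function CloneRec(const Src: TSomeRec): TSomeRec;
begin
  Result := Src;                  // copies FInt, shares FArr
  Result.FArr := Copy(Src.FArr);  // Result now has its own array
end;

var
  A, B: TSomeRec;

begin
  A.FArr := TIntArray.Create(1, 2, 3);
  A.FInt := 5;
  B := CloneRec(A);
  A.FArr[0] := 11;  // B.FArr[0] is still 1
  A.FInt := 55;     // B.FInt is still 5
  Writeln(B.FArr[0]:5, B.FInt:5);
  Readln;
end.
```

The drawback is that every record type with dynamic array fields needs such a helper, and nothing forces callers to use it instead of plain assignment.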

The shortcomings of dynamic arrays were also discussed in a CR article. But “copy-on-write” semantics is not a silver bullet: with dynamic arrays there are many situations where we don’t need it at all. With compatibility considerations taken into account, it seems better to leave dynamic arrays “as is”.

What about extending language syntax? Let us imagine the following construction:

type
  TByteString = string of Byte;

The “string of” types should be reference types. They should support “copy-on-write” semantics (like strings) and be zero-based (like dynamic arrays). There is no need for multidimensional “string of” types or for elements of complex types; plain scalar element types like Byte are enough.
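Until such a syntax exists, the semantics described above can be approximated with existing language features. The following is only a sketch, not the proposed compiler feature: the record TByteString, its From function and its accessor methods are names made up for this illustration. It relies on the documented behaviour that a dynamic array has a reference count of one after a call to SetLength, so a shared array is copied before the first write.

```delphi
program ByteStringSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;  // TBytes

type
  // Hypothetical wrapper: zero-based, copy-on-write byte sequence.
  TByteString = record
  private
    FData: TBytes;
    function GetItem(Index: Integer): Byte;
    procedure SetItem(Index: Integer; Value: Byte);
  public
    class function From(const Values: array of Byte): TByteString; static;
    property Items[Index: Integer]: Byte read GetItem write SetItem; default;
  end;

class function TByteString.From(const Values: array of Byte): TByteString;
var
  I: Integer;
begin
  SetLength(Result.FData, Length(Values));
  for I := 0 to High(Values) do
    Result.FData[I] := Values[I];
end;

function TByteString.GetItem(Index: Integer): Byte;
begin
  Result := FData[Index];
end;

procedure TByteString.SetItem(Index: Integer; Value: Byte);
begin
  // SetLength makes FData unique: if the array is shared,
  // it is copied here, before the write.
  SetLength(FData, Length(FData));
  FData[Index] := Value;
end;

var
  A, B: TByteString;

begin
  A := TByteString.From([1, 2, 3]);
  B := A;      // A and B share one array
  A[0] := 10;  // A detaches first, so B[0] is still 1
  Writeln(A[0]:5, B[0]:5);
  Readln;
end.
```

Unlike a real “string of Byte”, this wrapper only protects writes that go through SetItem, which is one reason compiler support would be preferable.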

And finally, about “string of Char”. The current Delphi string type implementation carries a heavy legacy burden. Delphi strings are 1-based arrays, which is not good. The recent enhancements (codepage support), though very interesting, do not seem to have much practical value.
A “string of Char” implementation should follow the KISS principle: just a lifetime-managed reference type supporting “copy-on-write” semantics like any other “string of” type, with minimal additional compiler support (e.g. assignment compatibility with traditional strings).

The Emperor’s New Clothes

I recently had a very disappointing discussion with Embarcadero employees on the EDN forums, on a subject already covered in the CR blog and the Deltics blog. I have no desire to give a link to the EDN thread: the thread is locked now, and it does not bring honour to its participants. Instead I give a link to the Wikipedia article about the story by Hans Christian Andersen. Borland/Embarcadero weavers always pretended and continue to pretend that they see something invisible to those unfit for their positions or incompetent. The result is inevitable: when the Emperor parades before his subjects in his new clothes, someone cries out the truth, “But he isn’t wearing anything at all!”.

Delphi interfaces on binary level

An interface reference in Delphi is a pointer to a pointer to an interface method table (IMT). That follows the COM specification and is a good starting point for understanding what Delphi interfaces are at the binary level. Delphi interfaces can be made 100% compatible with the COM specification, but that is not necessary; it is also possible to implement “light COM-like” interfaces.

Interface methods in Delphi are implemented as object methods, so let us have a quick look at Delphi objects at the binary level. Each Delphi 2009 object instance has 2 mandatory fields, 4 bytes each. The first field is a pointer to the class VMT; the last field (prefixed by ‘hf’, perhaps for “Hidden Field”, in System.pas) is used by the TMonitor advanced record, and currently I don’t know what this field is actually for. There are no other fields in a TObject instance, so the TObject instance size in Delphi 2009 is 8 bytes. We do not need these fields here, but we must keep in mind that the first 4 bytes and the last 4 bytes of any object instance are “reserved”.
Now let us consider what TInterfacedObject is on binary level. TInterfacedObject implements IInterface that is declared in System.pas as

type
  IInterface = interface
    ['{00000000-0000-0000-C000-000000000046}']
    function QueryInterface(const IID: TGUID; out Obj): HResult; stdcall;
    function _AddRef: Integer; stdcall;
    function _Release: Integer; stdcall;
  end;

The TInterfacedObject itself is declared as

  TInterfacedObject = class(TObject, IInterface)
  protected
    FRefCount: Integer;
    function QueryInterface(const IID: TGUID; out Obj): HResult; stdcall;
    function _AddRef: Integer; stdcall;
    function _Release: Integer; stdcall;
  public
    procedure AfterConstruction; override;
    procedure BeforeDestruction; override;
    class function NewInstance: TObject; override;
    property RefCount: Integer read FRefCount;
  end;

The TInterfacedObject instance size (in Delphi 2009) is 16 bytes, so we have 2 additional fields (4 bytes each). The first additional field is the FRefCount field; the second is more interesting for us: it is a pointer to the interface method table. If we create an interface reference, for example by calling

var
  II: IInterface;

begin  
  II:= TInterfacedObject.Create;
  ..

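then II points not at the start of the object instance but at its IMT pointer field. Following the description above, the instance layout is:

  offset  0: pointer to the class VMT
  offset  4: FRefCount
  offset  8: pointer to the IMT (the interface reference II points here)
  offset 12: hidden field used by TMonitor

The IMT consists of 3 entries: pointers to the QueryInterface, _AddRef and _Release implementations.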

Note that an interface reference is just a simple 4-byte pointer, while interface methods are object methods and require two pointers: a 4-byte pointer to the method’s code and a 4-byte pointer to the object instance. The code pointer can be found in the IMT, but what about the object instance pointer? After a closer look at the above picture we can guess that the compiler “knows” the offset of the IMT field and subtracts it from the interface reference value to obtain a pointer to the object instance. Let us check the guess:

procedure TForm1.Button1Click(Sender: TObject);
var
  II: IInterface;

begin
  II:= TInterfacedObject.Create;
  II._AddRef;
  II._Release;
end;

The above code just calls two interface methods – _AddRef and _Release. The compiler implements these calls as follows:

II._AddRef;
        mov eax,[ebp-$04]
        push eax
        mov eax,[eax]
        call dword ptr [eax+$04]
II._Release;
        mov eax,[ebp-$04]
        push eax
        mov eax,[eax]
        call dword ptr [eax+$08]

[ebp-$04] is the interface reference II. The compiler pushes it onto the stack (as required by the stdcall calling convention), reads the pointer to the IMT from the object field that the interface reference points to, adds an IMT offset ($04 for _AddRef, $08 for _Release) and calls the corresponding code. No offset is subtracted: the calls are implemented as if the pointer to the object instance were equal to the interface reference value, but we know they are different.
Let us go further and have a look at the code called by the

        call dword ptr [eax+$04]
        call dword ptr [eax+$08]

instructions:

        add dword ptr [esp+$04],-$08
        jmp TInterfacedObject._AddRef
        add dword ptr [esp+$04],-$08
        jmp TInterfacedObject._Release

Yes! That is where the compiler uses its knowledge of the IMT field offset in an object instance. Instead of calling the TInterfacedObject methods directly, the compiler calls proxy code that converts the interface reference value on the stack into a pointer to the object instance and jumps to the object’s method implementation. For optimization reasons the compiler adds -8 instead of subtracting 8 (the offset of the IMT field in a TInterfacedObject instance), which does not matter to us.
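The offset can also be observed directly from Pascal code. The sketch below assumes the 32-bit Delphi 2009 layout discussed above; the program name is arbitrary, and the pointer-to-Integer casts are only valid on a 32-bit target. With that layout one would expect it to print 8, 16 and 8, but other compiler versions may lay out the instance differently.

```delphi
program IntfOffset;

{$APPTYPE CONSOLE}

var
  Obj: TInterfacedObject;
  II: IInterface;

begin
  Obj := TInterfacedObject.Create;
  II := Obj;  // the interface reference now keeps the object alive

  // Instance sizes discussed in the text
  Writeln(TObject.InstanceSize);
  Writeln(TInterfacedObject.InstanceSize);

  // Distance between the interface reference value and the object
  // instance pointer, i.e. the offset of the IMT pointer field
  Writeln(Integer(Pointer(II)) - Integer(Pointer(Obj)));

  Readln;
end.
```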

Now all pieces of the puzzle are in place.
On the “client” side we have an interface reference, a plain 4-byte pointer. An interface reference is a pointer to a pointer to the IMT; the IMT is an array of pointers to the methods’ proxy code. When the compiler calls an interface method through an IMT entry, it passes the value of the interface reference as the additional “Self” argument of the method.
On the “server” side we have an object with methods that implement the interface methods. Object methods require a “Self” argument, a pointer to the object instance. But the object’s “Self” is not equal to the value of the interface reference.
Between the “client” and the “server” there is proxy code that converts the “client” interface reference value into the “server” object instance pointer and jumps to the object’s method implementation.

Now we understand how an object’s methods are called through an interface reference, and we can convert an interface reference into a method pointer manually. The following code is a modification of the code from Barry Kelly’s post about anonymous methods in Delphi:

procedure IntRefToMethPtr(const IntRef; var MethPtr; MethNo: Integer);
type
  TVtable = array[0..999] of Pointer;
  PVtable = ^TVtable;
  PPVtable = ^PVtable;
begin
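  // MethNo is the index in the IMT: QI=0, AddRef=1, Release=2, etc
  // Code: the IMT entry, i.e. the address of the method's proxy code
  TMethod(MethPtr).Code := PPVtable(IntRef)^^[MethNo];
  // Data: the interface reference itself; the proxy code adjusts it to Self
  TMethod(MethPtr).Data := Pointer(IntRef);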
end;

Let us test the above procedure:

type
  TIntFunc = function: Integer of object; stdcall;

procedure TForm1.Button2Click(Sender: TObject);
var
  II: IInterface;
  AddRefMeth: TIntFunc;
  Obj: TInterfacedObject;

begin
  Obj:= TInterfacedObject.Create;
  II:= Obj;
  IntRefToMethPtr(II, AddRefMeth, 1);
  ShowMessage(IntToStr(Obj.RefCount));
  AddRefMeth;
  ShowMessage(IntToStr(Obj.RefCount));
  II._Release;
  ShowMessage(IntToStr(Obj.RefCount));
end;

We can see that calling the AddRefMeth method pointer does the same as calling the _AddRef interface method: it increments the reference count.