A String of Byte?

Dynamic arrays are implemented in Delphi as lifetime-managed reference types without “copy-on-write” support. What that means in practice is shown by the following code:

program Project1;

{$APPTYPE CONSOLE}

type
  TIntArray = array of Integer;

var
  A, B: TIntArray;

begin
  A:= TIntArray.Create(1,2,3);
  B:= A;      // B = (1,2,3)
  A[0]:= 10;  // A = (10,2,3), B = (10,2,3)
  Writeln(B[0]);
  readln;
end.

A and B are references to the same array, so changing an element through A also changes the element seen through B. That makes dynamic arrays different from long strings, which are implemented as lifetime-managed reference types with “copy-on-write” support:

program Project1;

{$APPTYPE CONSOLE}

var
  A, B: string;

begin
  A:= 'ABC';
  B:= A;       // B = 'ABC'
  A[1]:= 'Z';  // A = 'ZBC', B = 'ABC'
  Writeln(A:5, B:5);
  readln;
end.

To avoid the above side effect with dynamic arrays, use the Copy function (B:= Copy(A);) instead of plain assignment (B:= A;). But sometimes the absence of “copy-on-write” support is more disappointing. Consider a dynamic array as a record field:

program Project2;

{$APPTYPE CONSOLE}

type
  TIntArray = array of Integer;
  TSomeRec = record
    FArr: TIntArray;
    FInt: Integer;
  end;

var
  A, B: TSomeRec;

begin
  A.FArr:= TIntArray.Create(1,2,3);
  A.FInt:= 5;
  B:= A;
  A.FArr[0]:= 11;  // B.FArr[0] = 11
  A.FInt:= 55;     // B.FInt = 5
  Writeln(B.FArr[0]:5, B.FInt:5);
  readln;
end.

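One can’t use B:= Copy(A) here, because Copy accepts strings and dynamic arrays, not records. Record assignment copies FInt by value but copies only the reference held in FArr, so both records end up sharing one array. In any case, be careful with dynamic array variables and with records that have dynamic array fields, and keep the possible side effects in mind.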
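One possible workaround, sketched below, is to assign the record as usual and then replace the shared array field with its own copy. The program name CopyRecordSketch is only illustrative; the types are the ones from Project2.

```pascal
program CopyRecordSketch;

{$APPTYPE CONSOLE}

type
  TIntArray = array of Integer;
  TSomeRec = record
    FArr: TIntArray;
    FInt: Integer;
  end;

var
  A, B: TSomeRec;

begin
  A.FArr:= TIntArray.Create(1,2,3);
  A.FInt:= 5;
  B:= A;                  // FInt is copied, FArr still refers to A's array
  B.FArr:= Copy(A.FArr);  // give B its own copy of the elements
  A.FArr[0]:= 11;         // B.FArr[0] = 1
  A.FInt:= 55;            // B.FInt = 5
  Writeln(B.FArr[0]:5, B.FInt:5);
  readln;
end.
```

The extra Copy has to be repeated for every dynamic array field and at every place where the record is assigned, which is exactly the kind of bookkeeping that “copy-on-write” would make unnecessary.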

The shortcomings of dynamic arrays were also discussed in CR’s article. But “copy-on-write” semantics is not a silver bullet: with dynamic arrays there are many situations where we don’t need it at all. With compatibility considerations taken into account, it seems better to leave dynamic arrays “as is”.

What about extending language syntax? Let us imagine the following construction:

type
  TByteString = string of Byte;

The “string of” types should be reference types. They should support “copy-on-write” semantics (like strings) and be zero-based (like dynamic arrays). There is no need for multidimensional “string of” types or for elements of complex type; plain scalar element types like Byte are enough.
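Since “string of Byte” is only a proposal, it cannot be demonstrated directly. As a rough approximation, the sketch below uses RawByteString (the type suggested in the comments) as a copy-on-write byte buffer. It assumes Delphi 2009 or later, where RawByteString is available; the program name is illustrative.

```pascal
program ByteStringSketch;

{$APPTYPE CONSOLE}

var
  A, B: RawByteString;

begin
  SetLength(A, 3);
  A[1]:= AnsiChar(1);   // elements are AnsiChar, not Byte, and 1-based
  A[2]:= AnsiChar(2);
  A[3]:= AnsiChar(3);
  B:= A;                // B shares A's data for now
  A[1]:= AnsiChar(10);  // the write gives A its own copy; B is unchanged
  Writeln(Ord(A[1]):5, Ord(B[1]):5);  // A[1] = 10, B[1] = 1
  readln;
end.
```

The AnsiChar casts and the 1-based indexing are the parts that a zero-based “string of Byte” would remove.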

And finally, about “string of Char”. The current Delphi string type implementation carries a heavy legacy burden. Delphi strings are 1-based arrays, which is not good. The recent enhancements (codepage support), though very interesting, do not seem to have much practical value.
A “string of Char” implementation should follow the KISS principle: just a lifetime-managed reference type supporting “copy-on-write” semantics like any other “string of” type, with minimal additional compiler support (e.g. assignment compatibility with traditional strings).

7 thoughts on “A String of Byte?”

  1. You’re right: you have to take care with dynamic array reference counting and not rely on behavior similar to the string type. Both are dynamic, but as you stated clearly in your post, dynamic arrays don’t have “copy on write”.

    So let’s use dynamic arrays for what they are: dynamic arrays of data. Dynamic array assignment (B := A) is handy only for method/function parameters. Otherwise use dynamic arrays like other arrays, making copies by hand when you need to modify the data.

    The main features of dynamic arrays are that their size is not fixed at compile time and that their memory is freed automatically when they go out of scope.

    If you need “string of byte” or “string of integer” or “string of whatever”, and want the “copy on write” feature, use RawByteString and the SetString() standard method, then pointers.

    I’m not sure this “string of” feature would be useful. It doesn’t sound necessary to me. You can do what you want with RawByteString and pointers. It works very well and it’s cross-compiler, even if you have to use pointers. And pointers are NOT evil, they are just speed and power; you just have to be careful with them, and know what you’re doing.

    I don’t agree when you write that “the recent enhancements (codepage support) does not seem to have much practical value”. The work is done, conversion is made, ansi strings are supported together with unicode strings…

    Perhaps a REAL compiler enhancement would be some kind of threadlocalvar statement, as I wrote in https://forums.codegear.com/thread.jspa?threadID=30826&tstart=90, which would avoid the LOCK instruction generated by the RTL for strings and dynamic arrays, and therefore would make Delphi much more multi-core friendly than it is now. This couldn’t be achieved with the current Delphi implementation. So here we could need a compiler enhancement.

  2. I don’t really like the name “string of …”. I think that maybe something like this should be achieved via attributes instead.

    [TypeProperties([CopyOnWrite, CopyByRef, …])]
    TIntArray = array of Integer;

  3. Serg — what I was trying to say in that post was simply that I don’t find the semantics of dynamic arrays very clear or obvious. Because of this, in some cases, I prefer using ReallocMem and typed pointers directly (despite the extra code required), since that makes it patently clear that only a pure reference type is being used. Admittedly, the ‘manual’ method can be much more error prone when you want to have more than one dimension however.

    “Delphi strings are 1-based arrays, that is not good. The recent enhancements (codepage support) though very interesting does not seem to have much practical value.”

    I don’t understand either point. So what if Delphi strings are 1-based? For sure, if there wasn’t the history, long strings would have been 0-based when introduced in D2. Nonetheless, it’s not like one string type in Delphi is 1-based and the other 0-based (i.e., all of AnsiString, WideString and UnicodeString are 1-based), so there’s consistency. (I know about PChar of course, but if you are manipulating strings at the PChar level, then by definition, you should know what you’re doing.)

    WRT codepage support, it wraps the WideCharToMultiByte/MultiByteToWideChar API. One relatively common ‘practical use’, I would have thought, was its making UTF8String a proper type, unlike before.

  4. That said, WRT your actual proposal — I like it myself, though I’m not sure the potential benefit would be thought big enough to actually implement it. Have you QC’ed it though?

    • @CR: No, I have not QC’ed it. I am not sure the potential benefits are enough either; I just see that the dynamic array implementation can be improved. The improvement can be implemented in many ways, e.g. as in Dan Bartlett’s comment, but generally the idea is to add more built-in reference types with copy-on-write support.
