TThread Facts

5

Delphi RTL TThread class has been an object of criticism since its introduction. Some criticism was deserved, some not. TThread implementation was gradually improving with every Delphi version. Here I am neither criticizing nor advocating the TThread class, just listing some details of TThread implementation in modern Delphi versions that Delphi programmer should know.

  1. TThread class does not allow setting a thread stack size – it is always equal to default value, usually 1 Mb. If you need a different value you can call BeginThread function directly, but that is a low level solution and not as handy as using TThread class;
  2. TThread constructor has the only parameter (CreateSuspended: Boolean). Irrespective of the parameter value the underlying Windows thread object is always created in suspended state:

        FCreateSuspended := CreateSuspended {...};
        FHandle := BeginThread(nil, 0, @ThreadProc, Pointer(Self), CREATE_SUSPENDED, FThreadID);
    

    A thread can start execution in AfterConstruction method (which is called after all inherited constructors):

      if not FCreateSuspended {...} then Resume;
    

    or later;

  3. Synchronization with the main thread does not work in console application. That means that you should not use Synchronize and Queue methods of TThreads class in console application. The same also applies to OnTerminate event which calls Synchronize method internally; you can override protected DoTerminate method instead;
  4. If you set FreeOnTerminate = True for a TThread instance, the instance is destroyed in the context of underlying background thread. Usually the destruction context does not matter, but in some cases it does. For example, create a TTimer instance in the constructor of TThread descendant, destroy timer in the destructor and see what happens (hint: an invisible window of TTimer instance is created in the main thread and destroyed in a different thread);
  5. TThread does not guarantee that Execute method will be called. I have bumped into this problem a little before.

Despite all issues we are better to use TThread class ‘as is’ in GUI applications. For my experiments I have written a simple alternative thread wrapper class that allows setting a thread stack size, always calls Execute method and does not contain synchronization methods; it is well suited for console applications.

Yet Another Word about FreeAndNil

6

The FreeAndNil discussion never stops. I decided to add my twopence. How about this:

{.$DEFINE FREEANDNIL}
{$IFDEF FREEANDNIL}
procedure FreeObj(var Obj); inline;
begin
  FreeAndNil(Obj);
end;
{$ELSE}
procedure FreeObj(Obj: TObject); inline;
begin
  Obj.Free;
end;
{$ENDIF}

With FreeObj procedure you can have both “defensive coding” for debugging and clean production code. It is also clear that FreeAndNil is used just for debugging, so future maintainers will not be mislead and distracted by FreeAndNil usage. Since FreeObj is inlined, it does not also have an influence on the executable code size.

Hierarchical database structures and Firebird PSQL

4

The idea of storing a hierarchical data in a relational database is very simple, but effective work with such a data requires some knowledge of server-side programming using procedural SQL (PSQL). So the hierarchical data structures are a good chance to get acquainted with PSQL. For demonstration purposes I created a simple database with single table TCLASS containing Delphi class hierarchy. Here is a SQL script that creates an empty database (I used Firebird version 2.1.4 and Firebird ISQL Tool to execute SQL scripts and single SQL statements):

SET SQL DIALECT 3;
SET NAMES WIN1251;
CREATE DATABASE 'localhost:c:\Projects\Hierarch\Test.FDB' user 'SYSDBA' password 'masterkey'
  DEFAULT CHARACTER SET UTF8;

COMMIT;

CREATE TABLE TCLASS (
  TCLASS_ID INTEGER NOT NULL,
  TCLASS_ANCESTOR_ID INTEGER,
  TCLASS_NAME VARCHAR(64) NOT NULL UNIQUE,
  CONSTRAINT PK_TCLASS
    PRIMARY KEY (TCLASS_ID)
  );

COMMIT;

CREATE GENERATOR GEN_TCLASS_ID;

SET TERM ^ ;

CREATE TRIGGER BI_TCLASS FOR TCLASS
ACTIVE BEFORE INSERT POSITION 0
AS
  BEGIN
    IF (NEW.TCLASS_ID IS NULL) THEN
      NEW.TCLASS_ID = GEN_ID(GEN_TCLASS_ID, 1);
  END^


CREATE PROCEDURE ADDCLASS(CLASS_NAME VARCHAR(64), ANCESTOR_NAME VARCHAR(64))
  RETURNS (CLASS_ID INTEGER)
AS
  DECLARE ANCESTOR_ID INTEGER;
  BEGIN
    CLASS_ID = 0;
    IF (ANCESTOR_NAME = '') THEN BEGIN
      CLASS_ID = GEN_ID(GEN_TCLASS_ID, 1);
      INSERT INTO TCLASS(TCLASS_ID, TCLASS_NAME)
        VALUES(:CLASS_ID, :CLASS_NAME);
    END
    ELSE BEGIN
      FOR
        SELECT TCLASS_ID
          FROM TCLASS
          WHERE TCLASS_NAME = :ANCESTOR_NAME
        INTO :ANCESTOR_ID
      DO BEGIN
        CLASS_ID = GEN_ID(GEN_TCLASS_ID, 1);
        INSERT INTO TCLASS(TCLASS_ID, TCLASS_ANCESTOR_ID, TCLASS_NAME)
          VALUES(:CLASS_ID, :ANCESTOR_ID, :CLASS_NAME);
      END
    END
  END^

CREATE PROCEDURE GETANCESTORS_EX(CURRENT_ID INTEGER, CURRENT_LEVEL INTEGER)
  RETURNS (CLASS_ID INTEGER, ANCESTOR_ID INTEGER, CLASS_NAME VARCHAR(64), CLASS_LEVEL INTEGER)
AS
  BEGIN
    FOR
      SELECT TCLASS_ID, TCLASS_ANCESTOR_ID, TCLASS_NAME
        FROM TCLASS
        WHERE TCLASS_ID = :CURRENT_ID
        INTO :CLASS_ID, :ANCESTOR_ID, :CLASS_NAME
    DO BEGIN
      CLASS_LEVEL = CURRENT_LEVEL - 1;
      SUSPEND;
      FOR
        SELECT CLASS_ID, ANCESTOR_ID, CLASS_NAME, CLASS_LEVEL
          FROM GETANCESTORS_EX(:ANCESTOR_ID, :CLASS_LEVEL)
          INTO :CLASS_ID, :ANCESTOR_ID, :CLASS_NAME, :CLASS_LEVEL
      DO BEGIN
        SUSPEND;
      END
    END
  END^

CREATE PROCEDURE GETANCESTORS(ID INTEGER)
  RETURNS (CLASS_ID INTEGER, CLASS_NAME VARCHAR(64), CLASS_LEVEL INTEGER)
AS
  BEGIN
    FOR
      SELECT TCLASS_ANCESTOR_ID
        FROM TCLASS
        WHERE TCLASS_ID = :ID
        INTO :CLASS_ID
    DO BEGIN
      CLASS_LEVEL = 1000;
      FOR
        SELECT CLASS_ID, CLASS_NAME, CLASS_LEVEL
          FROM GETANCESTORS_EX(:CLASS_ID, :CLASS_LEVEL)
          INTO :CLASS_ID, :CLASS_NAME, :CLASS_LEVEL
      DO BEGIN
        SUSPEND;
      END
    END
  END^

SET TERM ; ^

COMMIT;

TCLASS table contains TCLASS_ANCESTOR_ID field that links each record (Delphi class) with its ancestor record, so that TCLASS table can store a class hierarchy of unlimited depth:

CREATE TABLE TCLASS (
  TCLASS_ID INTEGER NOT NULL,
  TCLASS_ANCESTOR_ID INTEGER,
  TCLASS_NAME VARCHAR(64) NOT NULL UNIQUE,
  CONSTRAINT PK_TCLASS
    PRIMARY KEY (TCLASS_ID)
  );

Let us fill TCLASS table with data. To simplify the task I have written ADDCLASS procedure that inserts a single record into TCLASS table. The procedure has two parameters – a class name and ancestor class name. An ancestor class name is used to find an ancestor record ID.

CREATE PROCEDURE ADDCLASS(CLASS_NAME VARCHAR(64), ANCESTOR_NAME VARCHAR(64))
  RETURNS (CLASS_ID INTEGER)
AS
  DECLARE ANCESTOR_ID INTEGER;
  BEGIN
    CLASS_ID = 0;
    IF (ANCESTOR_NAME = '') THEN BEGIN
      CLASS_ID = GEN_ID(GEN_TCLASS_ID, 1);
      INSERT INTO TCLASS(TCLASS_ID, TCLASS_NAME)
        VALUES(:CLASS_ID, :CLASS_NAME);
    END
    ELSE BEGIN
      FOR
        SELECT TCLASS_ID
          FROM TCLASS
          WHERE TCLASS_NAME = :ANCESTOR_NAME
        INTO :ANCESTOR_ID
      DO BEGIN
        CLASS_ID = GEN_ID(GEN_TCLASS_ID, 1);
        INSERT INTO TCLASS(TCLASS_ID, TCLASS_ANCESTOR_ID, TCLASS_NAME)
          VALUES(:CLASS_ID, :ANCESTOR_ID, :CLASS_NAME);
      END
    END
  END^

If the second argument (ANCESTOR_NAME) is empty, the procedure assumes that the first argument (CLASS_NAME) is a name of the root of hierarchy (that should be TObject in Delphi); the procedure generates unique ID for a new record (TCLASS_ID field), leaves TCLASS_ANCESTOR_ID field empty (NULL) and executes INSERT SQL statement.
If the second argument is not empty, we have ‘FOR SELECT … DO …’ construction. The procedure executes SELECT statement, and executes a block of code after ‘DO’ keyword for every row in a dataset created by SELECT statement. In our case the dataset contains only one record (it can also be empty if ANCESTOR_NAME argument is wrong – in this case no new record is inserted and the returned value of CLASS_ID is 0).
Note that when a procedure parameters or local variables are used inside SQL statement (SELECT or INSERT) they should be preceded by ‘:’.
Now we can fill TCLASS table with some data using ADDCLASS procedure; here is a simple SQL script:

CONNECT 'localhost:c:\Projects\Hierarch\Test.FDB' user 'SYSDBA' password 'masterkey';

EXECUTE PROCEDURE ADDCLASS('TObject','');
EXECUTE PROCEDURE ADDCLASS('TList','TObject');
EXECUTE PROCEDURE ADDCLASS('TPersistent','TObject');
EXECUTE PROCEDURE ADDCLASS('TCollection','TPersistent');
EXECUTE PROCEDURE ADDCLASS('TStrings','TPersistent');
EXECUTE PROCEDURE ADDCLASS('TStringList','TStrings');
EXECUTE PROCEDURE ADDCLASS('TStream','TObject');
EXECUTE PROCEDURE ADDCLASS('THandleStream','TStream');
EXECUTE PROCEDURE ADDCLASS('TFileStream','THandleStream');
COMMIT;

After executing the script TCLASS table contains the following data:

SQL> SELECT * FROM TCLASS;
   TCLASS_ID  TCLASS_ANCESTOR_ID  TCLASS_NAME
================================================== 
           1             <null>   TObject 
           2                  1   TList 
           3                  1   TPersistent 
           4                  3   TCollection 
           5                  3   TStrings 
           6                  5   TStringList 
           7                  1   TStream 
           8                  7   THandleStream 
           9                  8   TFileStream 
SQL>

A more interesting problem is to find all ancestors of a given class. The database contains two procedures that solve the problem. The first is

CREATE PROCEDURE GETANCESTORS_EX(CURRENT_ID INTEGER, CURRENT_LEVEL INTEGER)
  RETURNS (CLASS_ID INTEGER, ANCESTOR_ID INTEGER, CLASS_NAME VARCHAR(64), CLASS_LEVEL INTEGER)
AS
  BEGIN
    FOR
      SELECT TCLASS_ID, TCLASS_ANCESTOR_ID, TCLASS_NAME
        FROM TCLASS
        WHERE TCLASS_ID = :CURRENT_ID
        INTO :CLASS_ID, :ANCESTOR_ID, :CLASS_NAME
    DO BEGIN
      CLASS_LEVEL = CURRENT_LEVEL - 1;
      SUSPEND;
      FOR
        SELECT CLASS_ID, ANCESTOR_ID, CLASS_NAME, CLASS_LEVEL
          FROM GETANCESTORS_EX(:ANCESTOR_ID, :CLASS_LEVEL)
          INTO :CLASS_ID, :ANCESTOR_ID, :CLASS_NAME, :CLASS_LEVEL
      DO BEGIN
        SUSPEND;
      END
    END
  END^

The first parameter (CURRENT_ID) is an ID of the direct ancestor of a given class. The second parameter (CURRENT_LEVEL) can be any integer value; it’s only purpose is an option to sort the resulting dataset by ancestors’ level.
The procedure is recursive. It has the same FOR SELECT … DO … logic as ADDCLASS procedure, but it uses SUSPEND statement in the block of code after DO keyword.
The usage of SUSPEND statement means that the procedure is written to be called by SELECT statement, as different from ADDCLASS procedure that is written to be called by EXECUTE PROCEDURE statement. The SUSPEND statement passes a row of data to the outer SELECT statement, forming a resulting dataset. Here is a usage example:

SQL> SELECT * FROM GETANCESTORS_EX(8, 99)
CON> ORDER BY CLASS_LEVEL;
    CLASS_ID  ANCESTOR_ID CLASS_NAME     CLASS_LEVEL 
====================================================
           1       <null> TObject        96 
           7            1 TStream        97 
           8            7 THandleStream  98 

SQL>

The first parameter value (8) is an ID of TFileStream direct ancestor (THandleStream), so the above SQL statement creates a dataset of TFileStream ancestors.
The GETANCESTORS procedure does the same work as GETANCESTORS_EX procedure, but is more user-friendly; GETANCESTORS has the only parameter that is a given class ID (not an ancestor ID as with GETANCESTORS_EX procedure).

CREATE PROCEDURE GETANCESTORS(ID INTEGER)
  RETURNS (CLASS_ID INTEGER, CLASS_NAME VARCHAR(64), CLASS_LEVEL INTEGER)
AS
  BEGIN
    FOR
      SELECT TCLASS_ANCESTOR_ID
        FROM TCLASS
        WHERE TCLASS_ID = :ID
        INTO :CLASS_ID
    DO BEGIN
      CLASS_LEVEL = 1000;
      FOR
        SELECT CLASS_ID, CLASS_NAME, CLASS_LEVEL
          FROM GETANCESTORS_EX(:CLASS_ID, :CLASS_LEVEL)
          INTO :CLASS_ID, :CLASS_NAME, :CLASS_LEVEL
      DO BEGIN
        SUSPEND;
      END
    END
  END^

The usage example (9 is an ID of TFileStream):

SQL> SELECT CLASS_ID INTEGER, CLASS_NAME
CON> FROM GETANCESTORS(9)
CON> ORDER BY CLASS_LEVEL;
   CLASS_ID CLASS_NAME
==========================
          1 TObject
          7 TStream
          8 THandleStream
SQL>

Why we need interfaces in Delphi.

25

Objects are normally accessed by an object reference. Interface reference is a different method to access an object’s functionality. A simple question – why do we need interface references at all, why can’t we use object references everywhere?
There are several reasons to use interface references instead of object references, but most important of them (at least historically) is accessing an object created in a different program module.
Let us consider a simple example – an object is created in .dll module and consumed in .exe module.
The TMathObject class implements Square and Cube functions on the FOperand field; we start with the following naive code:

unit MathUnit;

interface

type
  TMathObject = class
  private
    FOperand: Double;
  public
    function Square: Double;
    function Cube: Double;
    property Operand: Double read FOperand write FOperand;
  end;

implementation

function TMathObject.Square: Double;
begin
  Result:= Sqr(FOperand);
end;

function TMathObject.Cube: Double;
begin
  Result:= Sqr(FOperand) * FOperand;
end;

end.

We want to create and destroy TMathObject instances in dll module:

library MathDll;

uses
  MathUnit in 'MathUnit.pas';

function CreateObject: TMathObject;
begin
  Result:= TMathObject.Create;
end;

procedure FreeObject(Obj: TMathObject);
begin
  Obj.Free;
end;

exports
  CreateObject, FreeObject;

{$R *.res}

begin
end.

and use an instance of TMathObject in exe module:

program MathTest;

{$APPTYPE CONSOLE}

uses
  MathUnit in 'MathUnit.pas';

function CreateObject: TMathObject; external 'MathDll.dll';
procedure FreeObject(Obj: TMathObject); external 'MathDll.dll';

var
  MathObj: TMathObject;

begin
  MathObj:= CreateObject;
  MathObj.Operand:= 2;
  Writeln('Square = ', MathObj.Square:3:2, '; Cube = ', MathObj.Cube:3:2);
  FreeObject(MathObj);
  Write('Press ''Enter'' key ... ');
  Readln;
end.

If you compile the above example you can see it works, but TMathObject implementation (MathUnit.pas) is duplicated in both program modules (MathTest.exe and MathDll.dll), and that is not just a waste of program memory.
One of the main reasons to split a program into program modules is a possibility to modify the modules separately; for example to modify and deploy a different .dll version while keeping an .exe module intact. In the above example the implementation of TMathObject is a contract that both sides (exe and dll) should adhere, so the implementation of TMathObject can’t be changed in dll module only.
We need a different form of contract that does not include an object’s implementation. A possible solution is to introduce a base class containing virtual abstract methods only:

unit BaseMath;

interface

type
  TBaseMathObject = class
  protected
    function GetOperand: Double; virtual; abstract;
    procedure SetOperand(const Value: Double); virtual; abstract;
  public
    function Square: Double; virtual; abstract;
    function Cube: Double; virtual; abstract;
    property Operand: Double read GetOperand write SetOperand;
  end;

implementation

end.

Note that we can’t access FOperand field directly now because it is a part of TMathObject implementation that should be hidden in .dll module, so we introduce getter (GetOperand) and setter (SetOperand) virtual methods.
Now we inherit a class that implements virtual methods from TBaseMathObject.

unit MathUnit;

interface

uses BaseMath;

type
  TMathObject = class(TBaseMathObject)
  private
    FOperand: Double;
  protected
    function GetOperand: Double; override;
    procedure SetOperand(const Value: Double); override;
  public
    function Square: Double; override;
    function Cube: Double; override;
  end;

implementation

function TMathObject.GetOperand: Double;
begin
  Result:= FOperand;
end;

procedure TMathObject.SetOperand(const Value: Double);
begin
  FOperand:= Value;
end;

function TMathObject.Square: Double;
begin
  Result:= Sqr(FOperand);
end;

function TMathObject.Cube: Double;
begin
  Result:= Sqr(FOperand) * FOperand;
end;

end.

The library module source code now is

library MathDll;

uses
  BaseMath in 'BaseMath.pas',
  MathUnit in 'MathUnit.pas';

function CreateObject: TBaseMathObject;
begin
  Result:= TMathObject.Create;
end;

procedure FreeObject(Obj: TBaseMathObject);
begin
  Obj.Free;
end;

exports
  CreateObject, FreeObject;

{$R *.res}

begin
end.

The executable module source code is

program MathTest;

{$APPTYPE CONSOLE}

uses
  BaseMath in 'BaseMath.pas';

function CreateObject: TBaseMathObject; external 'MathDll.dll';
procedure FreeObject(Obj: TBaseMathObject); external 'MathDll.dll';

var
  MathObj: TBaseMathObject;

begin
  MathObj:= CreateObject;
  MathObj.Operand:= 2;
  Writeln('Square = ', MathObj.Square:3:2, '; Cube = ', MathObj.Cube:3:2);
  FreeObject(MathObj);
  Write('Press ''Enter'' key ... ');
  Readln;
end.

We can see that MathTest project does not contain MathUnit.pas unit, and is not dependent on TMathObject implementation; in fact MathTest project does not know that TMathObject class even exist. We can change TMathObject implementation in dll module as much as we want provided that we keep TBaseMathObject intact, inherit TMathObject from TBaseMathObject and override TBaseMathObject‘s virtual abstract methods.
We implemented a general concept of interface in the form of pure abstract class. Pure abstract classes are a way how interfaces are implemented in C++ . This approach has a limited value in Delphi because Delphi does not support multiple inheritance, and a Delphi class can have only one contract in the form of base abstract class. Another problem is a limited use of ‘is’ and ‘as’ operators for an object created in a different program module:

program IsTest;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  BaseMath in 'BaseMath.pas';

function CreateObject: TBaseMathObject; external 'MathDll.dll';
procedure FreeObject(Obj: TBaseMathObject); external 'MathDll.dll';

var
  MathObj: TBaseMathObject;

procedure TestObj(Obj: TObject);
begin
  try
    Assert(Obj is TBaseMathObject);  // fails
  except
    on E:Exception do
      Writeln(E.Classname, ': ', E.Message);
  end;
end;

procedure TestObj2(Obj: TBaseMathObject);
begin
  try
    Assert(Obj is TBaseMathObject);  // success
  except
    on E:Exception do
      Writeln(E.Classname, ': ', E.Message);
  end;
end;

begin
  MathObj:= CreateObject;
  TestObj(MathObj);
  TestObj2(MathObj);
  FreeObject(MathObj);
  Write('Press ''Enter'' key ... ');
  Readln;
end.

Here is a brief explanation why the first assertion fails while the second assertion succeeds. Delphi class can be identified by its name or by its VMT. Even if a class does not define any virtual method and VMT itself is empty a pointer to class VMT is still valid. For the TBaseMathObject class we have a single name and two VMT’s, one in .exe module and one in .dll module. That may cause strange behavior, as shown in the above example.
Starting from version 3 Delphi introduces a concept of interface that is different from a pure abstract class and solves the problems with object’s export by using interface references instead of object references:

unit BaseMath;

interface

type
  IBaseMath = interface
  ['{92E9AFF4-25B7-41BD-9EB6-557D12F98BE6}']
    function GetOperand: Double;
    procedure SetOperand(const Value: Double);
    function Square: Double;
    function Cube: Double;
    property Operand: Double read GetOperand write SetOperand;
  end;

implementation

end.

There is no need to inherit TMathObject class from a given base class now; we can inherit TMathObject class from any class we like. Since all Delphi interfaces are descendants of IUnknown (also nicknamed as IInterface in Delphi) we should also implement the methods of IUnknown interface in TMathObject class. Delphi provides a helper TInterfacedObject class that already implements the methods of IUnknown and can be used as TMathObject ancestor:

unit MathUnit;

interface

uses BaseMath;

type
  TMathObject = class(TInterfacedObject, IBaseMath)
  private
    FOperand: Double;
  protected
    function GetOperand: Double;
    procedure SetOperand(const Value: Double);
  public
    function Square: Double;
    function Cube: Double;
  end;

implementation

function TMathObject.GetOperand: Double;
begin
  Result:= FOperand;
end;

procedure TMathObject.SetOperand(const Value: Double);
begin
  FOperand:= Value;
end;

function TMathObject.Square: Double;
begin
  Result:= Sqr(FOperand);
end;

function TMathObject.Cube: Double;
begin
  Result:= Sqr(FOperand) * FOperand;
end;

end.

There is no need for FreeObject procedure now. The FreeObject procedure was introduced in the previous examples to enforce that a TMathObject instance is destroyed in the same program module where it was created (i.e. in .dll module). It is always a good rule of thumb that the one who creates an object is the one who destroys it. But now there is no need to enforce it – if we use interface references object instances are automatically destroyed in the same program module where they were created.

library MathDll;

uses
  BaseMath in 'BaseMath.pas',
  MathUnit in 'MathUnit.pas';

function CreateObject: IBaseMath;
begin
  Result:= TMathObject.Create;
end;

exports
  CreateObject;

{$R *.res}

begin
end.

In the next example a TMathObject object instance is destroyed by assigning nil value to MathObj interface reference. In most cases there is no need for doing it because an object is destroyed automatically when all interface references goes out of scope. In the following code the MathObj interface reference is a global variable and never goes out of scope, so assigning it to nil explicitly makes sense:

program MathTest;

{$APPTYPE CONSOLE}

uses
  BaseMath in 'BaseMath.pas';

function CreateObject: IBaseMath; external 'MathDll.dll';

var
  MathObj: IBaseMath;

begin
  MathObj:= CreateObject;
  MathObj.Operand:= 2;
  Writeln('Square = ', MathObj.Square:3:2, '; Cube = ', MathObj.Cube:3:2);
  MathObj:= nil;
  Write('Press ''Enter'' key ... ');
  Readln;
end.

Semaphore throttle

15

Suppose we have an algorithm that uses N parallel threads, and we have a system with M CPU cores, N > M. Running the algorithm on the system leads to performance loss because the threads are contend for available CPU cores and cause time-consuming thread context switching. A better approach is to execute threads sequentially on the available CPU cores, to avoid the unnecessary thread context switching. There is a simple technique called semaphore throttle which limits the number of active, contending threads.
It is interesting to estimate how efficient semaphore throttle is on real system. Or, in other words, how much is the performance loss caused by thread context switching.
As a test problem I have chosen the sum of arithmetic progression: S = 1 + 2 + 3 +… + Count * 64 . To find the sum I run 64 threads, each thread calculates a partial sum from 1 + I * Count to (I + 1) * Count, I = [0..63].
To obtain valid timings it is important that each thread is executed sufficiently long, so that the thread is preempted many times by other contending threads (when the scheduler’s time quantum ends). To increase the thread execution time I turned optimization off and chosen the Count value as much as possible for the resulting sum to fit into int64 range, but it appeared insufficient. Finally I decided to insert an additional loop into the thread function so that the thread execution time exceeded 1 second on my system:

unit SumThreads;

interface

uses
  Windows, Classes;

type
  TSumThread = class(TThread)
  private
    FSum: Int64;
    FBase: Integer;
    FCount: Integer;
    FSemaphore: THandle;
  protected
    procedure Execute; override;
  public
    constructor Create(ABase, ACount: Integer; ASemaphore: THandle);
    property Sum: Int64 read FSum;
  end;

implementation

{$O-}

constructor TSumThread.Create(ABase, ACount: Integer; ASemaphore: THandle);
begin
  FBase:= ABase;
  FCount:= ACount;
  FSemaphore:= ASemaphore;
  inherited Create(False);
end;

procedure TSumThread.Execute;
var
  Cnt, Value, J: Integer;
  S: Int64;

begin
  if FSemaphore <> 0 then
    WaitForSingleObject(FSemaphore, INFINITE);
  try
// to increase execution time the calculation is repeated
    J:= 20;
    repeat
      Value:= FBase;
      Cnt:= FCount;
      S:= 0;
      repeat
        S:= S + Value;
        Inc(Value);
        Dec(Cnt);
      until Cnt = 0;
      FSum:= S;
      Dec(J);
    until J = 0;
  finally
    if FSemaphore <> 0 then
      ReleaseSemaphore(FSemaphore, 1, nil);
  end;
end;

end.

To run the test I have written a simple console application that receives the number of concurrent threads as command line parameter, and outputs the number of concurrent threads and total execution time in seconds:

program throttle;

{$APPTYPE CONSOLE}

uses
  Windows,
  SysUtils,
  Classes,
  Diagnostics,
  SumThreads in 'SumThreads.pas';

const
  CHUNK = 30000000;

var
  Semaphore: THandle;
  Count, I: Integer;
  Sum: Int64;
  Threads: array[0..63] of TSumThread;
  Handles: array[0..63] of THandle;
  StopWatch: TStopWatch;

begin
  try
    if ParamCount <> 1 then
      raise Exception.Create('Number of concurrent threads not defined');
    Count:= StrToInt(ParamStr(1));
    if (Count < 0) or (Count > 64) then
      raise Exception.Create('Invalid number of concurrent threads');
    if Count <> 0 then begin
      Semaphore:= CreateSemaphore(nil, Count, Count, nil);
      Win32Check(Bool(Semaphore));
    end
    else
      Semaphore:= 0;
    try
      StopWatch:= TStopWatch.StartNew;
      for I:= 0 to 63 do begin
        Threads[I]:= TSumThread.Create(1 + I * CHUNK, CHUNK, Semaphore);
        Handles[I]:= Threads[I].Handle;
      end;
      if WaitForMultipleObjects(64, @Handles, True, INFINITE) = WAIT_FAILED
        then raise Exception.Create('WaitForMultipleObjects Failed');
      StopWatch.Stop;
      Sum:= 0;
      for I:= 0 to 63 do begin
        Sum:= Sum + Threads[I].Sum;
        Threads[I].Free;
      end;
    finally
      CloseHandle(Semaphore);
    end;
    Writeln(Count:5, ' -- ', StopWatch.Elapsed.TotalSeconds:3:2);
//    Writeln('Number of concurrent threads: ', Count);
//    Writeln('Time elapsed (seconds): ', StopWatch.Elapsed.TotalSeconds:3:2);
//    Writeln('Sum obtained: ', Sum);
//    Sum:= CHUNK * 64; // number of summands
//    Sum:= (Sum * (Sum + 1)) div 2;
//    Writeln('Sum expected: ', Sum);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

I ran the tests on my notebook with 64-bit Windows 7 SP1 and CPU Intel Core i3 M380 using a simple batch file throttle.bat:

@echo off
time /t
throttle 1
throttle 2
throttle 3
throttle 4
throttle 5
throttle 6
throttle 8
throttle 16
throttle 32
throttle 64
throttle 0
time /t

The resulting log is

13:17
    1 -- 109.17
    2 -- 55.44
    3 -- 60.14
    4 -- 67.56
    5 -- 67.85
    6 -- 67.42
    8 -- 67.39
   16 -- 67.52
   32 -- 67.56
   64 -- 67.59
    0 -- 67.55
13:30

It can be clearly seen that my CPU has 2 physical cores (the number of logical cores for Core i3 M380 is 4); the fastest execution time is achieved by allowing two threads to be executed concurrently. Running 3 concurrent threads is slower, running 4 concurrent threads is even more slow. But there is no further performance degradation.
If you think of the results you can guess that if N concurrent threads are executed on M CPU cores the performance should degrade from N = M to N = 2M because of the increasing number of thread context switching (I have not tested it since I have no appropriate systems). For N > 2M the number of thread context switching should not grow anymore – now we have at least 2 concurrent threads per each core, each thread is preempted when every scheduler’s timeslice ends, that is the ‘worst case’ situation and it is reached at N = 2M.
A little arithmetic shows that properly tuned semaphore throttle can give about 15-20% performance gain on my notebook. Different systems can show different results.

——————-

You can download tests (both source code and executable) here