CHSH game in detail


CHSH (John Clauser, Michael Horne, Abner Shimony, and Richard Holt) game is another view on the Bell’s inequalities, showing that under the hood our world is not classical. The game is played by Alice and Bob, and proceeds as follows:

  • a referee generates two independent uniformly chosen random bits (x and y, also called “questions”) and sends x bit to Alice and y bit to Bob
  • After receiving the questions, Alice and Bob send their answers back to the referee; an answer is one bit too (a for Alice and b for Bob)
  • Alice and Bob win if x \cdot y = a \oplus b, where \oplus stands for xor operation.

The game is cooperative – Alice and Bob does not play against each other, their goal is to get the highest winning probability together. Before the game starts, Alice and Bob can communicate and decide on the strategy they will use; after the game started, Alice and Bob are not allowed to communicate anymore; Alice doesn’t know what question the referee sent to Bob, and Bob doesn’t know what question the referee sent to Alice.

In the classical world, Alice and Bob have 16 possible deterministic strategies – both Alice and Bob have two possible answers for each of two possible questions, combining them we get 2^2 \cdot 2^2 = 16 possible strategies. Since the probability of x \cdot y = 0 is 75%, it can be easily seen that there exists 75%-winning strategy – Alice and Bob simply ignore the questions and answer identically – either by bit 0 both or by bit 1 both; it seems natural that this is the best strategy, and it is indeed so, as can be formally shown by checking all 16 possible strategies.

In the quantum world, Alice and Bob have additional degree of freedom – before the game starts, they share an entangled bipartite state, for example two-qubit EPR state:

|\Psi_{AB}\rangle = \frac{1}{\sqrt{2}}(|0\rangle_A\otimes|0\rangle_B + |1\rangle_A\otimes|1\rangle_B)

Now, Alice and Bob choose the measurement bases according to the questions they receive from the referee, and answer by measuring their qubits in the chosen bases.

Let us denote the basis vectors as \nu_i(\theta):

|\nu_0(\theta)\rangle = \cos \theta |0\rangle + \sin \theta |1\rangle
|\nu_1(\theta)\rangle = \sin \theta |0\rangle - \cos \theta |1\rangle

Alice uses \theta_{A0} on receiving the question 0, and \theta_{A1} on receiving the question 1; Bob uses \theta_{B0} on receiving the question 0, and \theta_{B1} on receiving the question 1.

If the referee sends questions x = 0, y = 0, Alice and Bob win if they answer identically a = 0, b = 0 or a = 1, b = 1. The correspondent probability of winning (given x = 0, y = 0) is:

P(win|x=0,y=0) =
|\langle_A\nu_0(\theta_{A0})|\otimes\langle_B\nu_0(\theta_{B0})|\Psi_{AB}\rangle|^2 + |\langle_A\nu_1(\theta_{A0})|\otimes\langle_B\nu_1(\theta_{B0})|\Psi_{AB}\rangle|^2

Substituting |\nu_i(\theta)\rangle we get
P(win|x=0,y=0) =
|(\cos\theta_{A0} \langle_A 0| + \sin\theta_{A0} \langle_A 1|) \otimes (\cos\theta_{B0} \langle_B 0| + \sin\theta_{B0} \langle_B 1|)|\Psi_{AB}\rangle|^2 +
|(\sin\theta_{A0} \langle_A 0| - \cos\theta_{A0} \langle_A 1|) \otimes (\sin\theta_{B0} \langle_B 0| - \cos\theta_{B0} \langle_B 1|)|\Psi_{AB}\rangle|^2

Substituting |\Psi_{AB}\rangle we get
P(win|x=0,y=0) =
\frac{1}{2}|\cos\theta_{A0} \cos\theta_{B0} + \sin\theta_{A0} \sin\theta_{B0}|^2 + \frac{1}{2}|\sin\theta_{A0} \sin\theta_{B0} + \cos\theta_{A0} \cos\theta_{B0}|^2

P(win|x=0,y=0) = \cos^2 (\theta_{A0} - \theta_{B0})

The cases x = 0, y = 1 and x = 1, y = 0 are similar to x = 0, y = 0: Alice and Bob win if they answer identically:

P(win|x=0,y=1) =
|\langle_A\nu_0(\theta_{A0})|\otimes\langle_B\nu_0(\theta_{B1})|\Psi_{AB}\rangle|^2 + |\langle_A\nu_1(\theta_{A0})|\otimes\langle_B\nu_1(\theta_{B1})|\Psi_{AB}\rangle|^2 =
|(\cos\theta_{A0} \langle_A 0| + \sin\theta_{A0} \langle_A 1|) \otimes (\cos\theta_{B1} \langle_B 0| + \sin\theta_{B1} \langle_B 1|)|\Psi_{AB}\rangle|^2 +
|(\sin\theta_{A0} \langle_A 0| - \cos\theta_{A0} \langle_A 1|) \otimes (\sin\theta_{B1} \langle_B 0| - \cos\theta_{B1} \langle_B 1|)|\Psi_{AB}\rangle|^2

P(win|x=0,y=1) = \cos^2 (\theta_{A0} - \theta_{B1})

P(win|x=1,y=0) =
|\langle_A\nu_0(\theta_{A1})|\otimes\langle_B\nu_0(\theta_{B0})|\Psi_{AB}\rangle|^2 + |\langle_A\nu_1(\theta_{A1})|\otimes\langle_B\nu_1(\theta_{B0})|\Psi_{AB}\rangle|^2=
|(\cos\theta_{A1} \langle_A 0| + \sin\theta_{A1} \langle_A 1|) \otimes (\cos\theta_{B0} \langle_B 0| + \sin\theta_{B0} \langle_B 1|)|\Psi_{AB}\rangle|^2 +
|(\sin\theta_{A1} \langle_A 0| - \cos\theta_{A1} \langle_A 1|) \otimes (\sin\theta_{B0} \langle_B 0| - \cos\theta_{B0} \langle_B 1|)|\Psi_{AB}\rangle|^2

P(win|x=1,y=0) =\cos^2 (\theta_{A1} - \theta_{B0})

In the last case x = 1, y = 1 Alice and Bob win if they answer differently, so:

P(win|x=1,y=1) =
|\langle_A\nu_0(\theta_{A1})|\otimes\langle_B\nu_1(\theta_{B1})|\Psi_{AB}\rangle|^2 + |\langle_A\nu_1(\theta_{A1})|\otimes\langle_B\nu_0(\theta_{B1})|\Psi_{AB}\rangle|^2=

|(\cos\theta_{A1} \langle_A 0| + \sin\theta_{A1} \langle_A 1|) \otimes (\sin\theta_{B1} \langle_B 0| - \cos\theta_{B1} \langle_B 1|)|\Psi_{AB}\rangle|^2 +
|(\sin\theta_{A1} \langle_A 0| - \cos\theta_{A1} \langle_A 1|) \otimes (\cos\theta_{B1} \langle_B 0| + \sin\theta_{B1} \langle_B 1|)|\Psi_{AB}\rangle|^2=

\frac{1}{2}|\cos\theta_{A1} \sin\theta_{B1} - \sin\theta_{A1} \cos\theta_{B1}|^2 + \frac{1}{2}|\sin\theta_{A1} \cos\theta_{B1} - \cos\theta_{A1} \sin\theta_{B1}|^2

P(win|x=1,y=1) = \sin^2 (\theta_{A1} - \theta_{B1})

Putting it all together:

P(win) = \frac{1}{4}(P(win|x=0,y=0) + P(win|x=0,y=1)+ P(win|x=1,y=0)+ P(win|x=1,y=1)) =
\frac{1}{4}(\cos^2 (\theta_{A0} - \theta_{B0})+\cos^2 (\theta_{A0} - \theta_{B1})+\cos^2 (\theta_{A1} - \theta_{B0})+\sin^2 (\theta_{A1} - \theta_{B1}))

if Alice and Bob choose \{\theta_{A0}=0, \theta_{A1}=\frac{\pi}{4}, \theta_{B0}=\frac{\pi}{8}, \theta_{B1}=-\frac{\pi}{8}\}, then

P(win) = \frac{3}{4} \cos^2 \frac{\pi}{8} + \frac{1}{4} \sin^2 \frac{3\pi}{8}= \cos^2 \frac{\pi}{8} = \frac{1+cos \frac{\pi}{4}}{2}= \frac{1}{2} + \frac{1}{2\sqrt{2}}\approx 0.85355

and we obtain the winning probability above 85%, which is significantly better than classical 75%.



TForge 0.76 is released.

The release introduces a new class TStreamCipher. The name TStreamCipher does not mean that the class supports stream cipher algorithms only, such as Salsa20 or RC4 – it supports all cipher algorithms currently implemented in TForge, like TCipher class does; block ciphers are converted into stream ciphers using CTR mode of operation.

The initial motivation to write TStreamCipher class was to generate “solid” byte-wise keystream. TCipher class generates block-wise keystream; to obtain solid byte-wise keystream you need to maintain additional state, and that’s what the TStreamCipher class is doing. Here is the code sample illustrating the above:

  Nonce = 42;

  Cipher: TCipher;
  StreamCipher: TStreamCipher;
  Key: ByteArray;
  KeyStream, KeyStream1, KeyStream2: ByteArray;

// create 128-bit AES key:
  Key:= ByteArray.AllocateRand(16);

// generate 16 bytes of keystream:
  KeyStream:= TCipher.AES.ExpandKey(Key, CTR_ENCRYPT, Nonce).KeyStream(16);

// generate 8 + 8 bytes of keystream:
  Cipher:= TCipher.AES.ExpandKey(Key, CTR_ENCRYPT, Nonce);
// Warning - KeyStream1:= Cipher.KeyStream(8) + Cipher.KeyStream(8) is wrong
//   because the compiler does not evaluate the summands in order
  KeyStream1:= Cipher.KeyStream(8);
  KeyStream1:= KeyStream1 + Cipher.KeyStream(8);
  Assert(KeyStream <> KeyStream1);

// generate 8 + 8 bytes of keystream using TStreamCipher instance:
  StreamCipher:= TStreamCipher.AES.ExpandKey(Key, Nonce);
  KeyStream2:= StreamCipher.KeyStream(8);
  KeyStream2:= KeyStream2 + StreamCipher.KeyStream(8);
  Assert(KeyStream = KeyStream2);

The Cipher instance generates 8 + 8 bytes of keystream by taking the first 8 bytes from the first block and the second 8 bytes from the second block (AES block size is 16 bytes), that is why the generated keystream is different.

The code also illustrates the use of Nonce. The purpose of nonce is to encrypt multiple messages (ex files) with the same secret key – each message should be encrypted with unique non-secret nonce. Modern cryptoalgorithms such as Salsa20 have built-in support for nonce, and TForge supported the feature since version 0.71; old block cipher algorithms such as AES allow to implement the same feature using IV (initialization vector), and this is the second thing that TStreamCipher class is doing; TStreamCipher API does not expose IV of the underlying cryptoalgorithm, it exposes Nonce only.

Nonces are now supported by both TStreamCipher and TCipher classes, as the above code sample shows; there is a limitation though: the block size of an algorithm should be 128 bits at least; that means you can’t use nonces with legacy 64-bit block ciphers such as DES or Triple DES.

TForge 0.76 includes new demo project FileEncryptDemo (Demos\Ciphers folder) which shows how to use TStreamCipher. The idea of the project is to implement secure encryption of multiple files with a single key (and multiple non-secret nonces) and transparent reading of the encrypted files.

Hensel’s lifting


Number theory is full of gems that turn into effective algorithms demanded in cryptography. Consider the congruence

x N \equiv 1 \mod R

where N is odd, and R > N is a power of 2.

Probably the most effective algorithm to solve it is based on the Hensel’s lemma; the corollary that applies for our congruence is: suppose we know the solution x_{k-1} for R=2^{k-1} :

x_{k-1}N \equiv 1 \mod 2^{k-1}

then the solution x_k for

x_k N \equiv 1 \mod 2^k

is either x_{k-1} or x_{k-1} + 2^{k-1}

Example: solve

x \cdot 21 \equiv 1 \mod 32

We start with the trivial

x_1 \cdot 21 \equiv 1 \mod 2

which can be nothing else but x_1 = 1; next,

x_2 \cdot 21 \equiv 1 \mod 4

which can be either x_2 = 1 or x_2 = 1 + 2 = 3; checking the first alternative gives 1\cdot 21\ and\ 3 = 1, so x_2=1; next,

x_3 \cdot 21 \equiv 1 \mod 8; checking x_3 = x_2 = 1 gives 1\cdot 21\ and\ 7 = 5, so x_3=x_2+4=5; next,

x_4 \cdot 21 \equiv 1 \mod 16; checking x_4 = x_3 = 5 gives 5\cdot 21\ and\ 15 = 9, so x_4=x_3+8=13; finally,

x_5 \cdot 21 \equiv 1 \mod 32; checking x_5 = x_4 = 13 gives 13\cdot 21\ and\ 31 = 17, so x_5=x_4+16=29.

The solution of the congruence is x = 29.

On the “out” parameter specifier in Delphi


I’ve posted before what the “out” parameter specifier is actually doing in a Delphi code. Now let us consider when the “out” keyword should be used instead of “var”.

There is only one case when the “out” keyword should be used. It is closely related to COM technology support, but it is not really limited to COM. The case is importing a function with a parameter of interface type which violates Delphi contract on reference counting. To illustrate the point consider the following DLL:

library DemoDll;


  PMyUnknown = ^TMyUnknown;
  TMyUnknown = record
    FVTable: PPointer;
    FRefCount: Integer;
    class function QueryIntf(Inst: Pointer; const IID: TGUID;
                             out Obj): HRESULT; stdcall; static;
    class function AddRef(Inst: PMyUnknown): Integer; stdcall; static;
    class function Release(Inst: PMyUnknown): Integer; stdcall; static;

  VTable: array[0..2] of Pointer = (

function ReleaseInstance(Inst: Pointer): Integer;
  TVTable = array[0..2] of Pointer;
  PVTable = ^TVTable;
  PPVTable = ^PVTable;

  TRelease = function(Inst: Pointer): Integer; stdcall;

  Result:= TRelease(PPVTable(Inst)^^[2])(Inst);

class function TMyUnknown.QueryIntf(Inst: Pointer; const IID: TGUID;
  out Obj): HRESULT;

class function TMyUnknown.Addref(Inst: PMyUnknown): Integer;
  Result:= Inst.FRefCount;

class function TMyUnknown.Release(Inst: PMyUnknown): Integer;
  Result:= Inst.FRefCount;
  if Result = 0 then

procedure GetIUnknown1(var Inst: PMyUnknown);
  P: PMyUnknown;

  P^.FVTable:= @VTable;
  P^.FRefCount:= 1;
  if Inst <> nil then ReleaseInstance(Inst);
  Inst:= P;

procedure GetIUnknown2(var Inst: PMyUnknown);
  Inst^.FVTable:= @VTable;
  Inst^.FRefCount:= 1;


{$R *.res}

  ReportMemoryLeaksOnShutdown:= True;

The DLL exports 2 procedures which create instances implementing IUnknown. The first procedure (GetIUnknown1) follows Delphi rules on reference counting, the second (GetIUnknown2) does not. To use the procedures in an executable we need an import unit:

unit DemoImport;


procedure GetIUnknown1(var I: IUnknown);
//procedure GetIUnknown2(var I: IUnknown); // memory leak
procedure GetIUnknown2(out I: IUnknown);
//                     ^^^!

procedure GetIUnknown1(var I: IUnknown); external 'DemoDLL' name 'GetIUnknown1';
//procedure GetIUnknown2(var I: IUnknown); external 'DemoDLL' name 'GetIUnknown2';
procedure GetIUnknown2(out I: IUnknown); external 'DemoDLL' name 'GetIUnknown2';


Note that the second procedure is imported with out parameter specifier; the result of replacing “out” by “var” is a memory leak which can be shown by the test application:

program DemoEXE;


  DemoImport in 'DemoImport.pas';

procedure Test;
  I: IUnknown;



Class Methods vs Instance Methods


C# implements BigInteger.Parse as a static class method, and there is obvious reason for it: you can call a static class methods in a variable’s declaration:

  BigInteger N = BigInteger.Parse(stringToParse);

Delphi/Pascal does not support the above syntax, so the equivalent code is

  N: BigInteger;
  N:= BigInteger.Parse(stringToParse);

But now it looks like implementing BigInteger.Parse as a static class method is nothing but additional typing; using an instance method looks better:

  N: BigInteger;

So, what is the right “Delphi way” of implementing BigInteger.Parse – as a static class method or as an instance method?

Integer division by constant


If divisor is fixed, the division operation can be replaced by multiplication by a reciprocal; the idea is very simple: x / d = x * (1/d); since d is fixed we can precompute 1/d and use faster multiplication operation. This trick works for integer division, with some additional complications. For example, the next function is identical to division by 5 for all 32-bit dividends:

function Div5(Dividend: LongWord): LongWord;
  Mult = $CCCCCCCD;
  PostSh = 2;

        MOV     EDX,Mult
        MUL     EDX
        MOV     EAX,EDX
        SHR     EAX,PostSh

It turns out that 32-bit multiplication constant does not exists for every divisor; in this case it is possible to use 33-bit constant with hidden senior 1 bit:

function Div5_2(Dividend: LongWord): LongWord;
  Mult = $9999999A;
  PostSh = 3;

        MOV     ECX,EAX     // save dividend
        MOV     EDX,Mult
        MUL     EDX
        MOV     EAX,ECX     // restore dividend
        ADD     EAX,EDX
        RCR     EAX,1       // because addition could produce carry
        SHR     EAX,PostSh-1

Naively, the multiplication constant ($9999999A) and postshift (3) can be obtained using the following steps:

1. Write 1/5 in binary form: 1/5 = 0.0011 0011 0011 0011 0011 ..

2. Count leading zeroes after decimal point and add 1; this gives the postshift

3. Write the leading significant bits:

1100 1100 1100 1100 1100 1100 1100 1100 1100 1100 ..

4. Round to 33 bits; rounding is not necessarily based on the value of 34th bit only (later we will use a different approach which is free from rounding problem)

1100 1100 1100 1100 1100 1100 1100 1100 1101 0

5. Remove the leading bit and write as hexadecimal to get the multiplication constant:

99 99 99 9A

The second version is more complicated, but good news is that it works for all divisors. Generally, it can be proven that for N-bit division there exists (N+1) bit multiplication constant – for the details see the key article Division by Invariant Integers using Multiplication. The rest of the post is a review of the results obtained in the article cited. I will use 32-bit arithmetic, but the results can be generalized for any bitness.

The multiplication constant and postshift can be found using the following procedure:

procedure GetParams(Divisor: LongWord);
  L, N: LongWord;
  Tmp: LongWord;
  M: UInt64;
  Mult: LongWord;

  Assert(Divisor > 1);
  Tmp:= Divisor;
  L:= 0;
  repeat            // count number of significant bits in Divisor
    Tmp:= Tmp shr 1;
  until Tmp = 0;
  N:= 1 shl L;      // N = 2^L
  M:= $100000000;   // 2^32
  M:= (M * (N - Divisor)) div Divisor + 1;
  Mult:= LongWord(M);
  Writeln('Mult  = ' + IntToHex(Mult, 8));
  Writeln('Shift = ' + IntToStr(L));

The multiplication constant is not unique. If the constant obtained from the above procedure is odd, it makes sense to increment it by 1 (provided the incremented constant is correct too). The correctness can be checked by

function CheckMult(Divisor, Mult: LongWord): Boolean;
  L, N: LongWord;
  Tmp: LongWord;
  P32, MD: UInt64;
  Min, Max: UInt64;

  P32:= $100000000;   // 2^32;
  MD:= (P32 + Mult) * Divisor;
  L:= 0;
  Tmp:= Divisor;
  repeat            // count number of significant bits in Divisor
    Tmp:= Tmp shr 1;
  until Tmp = 0;
  N:= 1 shl L;
  Min:= P32 * N;
  Max:= Min + N;
  Result:= (Min <= MD) and (MD <= Max);

The above function gives sufficient condition for correct multiplication constant; it is probably possible that some constant fails the check but still correct, as can be shown by the full search over the dividend range; we will not consider this case.

Optimization 1. If the multiplication constant is even, we can shift it right and fit into 32-bit range, thus obtaining shorter division algorithm; this also decrements the postshift.

As an example consider division by 641. The GetParams procedure returns

Mult = 98F603FF
Shift = 10

Since the multiplication constant is odd, we try incremented constant 98F60400. CheckMult shows that it is correct too; the constant ends with 10 zero bits, which is equal to postshift, so we eliminate the postshift:

function Div641(Dividend: LongWord): LongWord;
  Mult = $663D81;  // = $198F60400 shr 10

        MOV     EDX,Mult
        MUL     EDX
        MOV     EAX,EDX

Optimization 2. If the divisor is even, we can preshift both dividend and divisor right, thus decreasing the required precision and obtaining shorter division algorithm again (with the dividend preshift step). For details see the article cited above.