Dabbling in BASM

TForge 0.67 adds a BASM-optimized SHA-256 implementation (it also covers SHA-224, which is based on SHA-256). The library now contains BASM-optimized implementations of the MD5, SHA-1, SHA-224 and SHA-256 algorithms for the Win32 and Win64 platforms.

Here are some benchmarks: I measured the time, in milliseconds, required to hash 500 MB of data on my laptop (64-bit Windows 7, Intel Core i3 CPU). Both the Pascal and the BASM implementations use fully unrolled compression routines, and I used the default compiler settings.
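
The post does not show the timing harness itself, so the following is only a minimal sketch of how such a measurement could be set up. `UpdateChecksum` is a hypothetical stand-in for the hashing routine under test (it is not a TForge function), and the 1 MB buffer size is an assumption; only the 500 MB total and the millisecond unit come from the text above.

```pascal
program HashBench;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$Q-}{$R-}  // the stand-in checksum relies on wrap-around arithmetic

uses
  SysUtils, DateUtils;

const
  ChunkSize  = 1024 * 1024;  // assumed buffer size: 1 MB, reused on every pass
  ChunkCount = 500;          // 500 passes over the buffer = 500 MB in total

// Hypothetical stand-in for the routine being measured; replace it with
// the real hash update call when benchmarking an actual implementation.
procedure UpdateChecksum(var State: Cardinal; const Data: array of Byte);
var
  I: Integer;
begin
  for I := Low(Data) to High(Data) do
    State := State * 31 + Data[I];
end;

var
  Buffer: array of Byte;
  State: Cardinal;
  I: Integer;
  Start: TDateTime;
begin
  // Fill the buffer once so that every pass processes the same data
  SetLength(Buffer, ChunkSize);
  for I := 0 to High(Buffer) do
    Buffer[I] := Byte(I);

  State := 0;
  Start := Now;
  for I := 1 to ChunkCount do
    UpdateChecksum(State, Buffer);

  Writeln('Elapsed, ms: ', MilliSecondsBetween(Now, Start));
  // Print the result so the work cannot be optimized away
  Writeln('Checksum: ', State);
end.
```

`Now` has a coarse resolution on Windows (on the order of 10 to 15 ms), which is acceptable only because a 500 MB run takes much longer than that; a higher-resolution timer would be a better choice for short runs.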

32-bit compiler (Delphi XE):

[Benchmark chart: Delphi-pascal]

[Benchmark chart: Delphi-basm]

64-bit compiler (FPC 2.6.2):

[Benchmark chart: fpc64-pascal]

[Benchmark chart: fpc64-basm]

7 thoughts on “Dabbling in BASM”

  1. Nice!
    Are you sure unrolling is worth it?
    From my own experiments, modern CPUs tend to like tiny loops, and would do the unrolling themselves.
    Our own SHA hashing implementation was first unrolled, then we switched back to the rolled version, which is shorter, easier to maintain, and also a bit faster for small content.
    For instance, on small content (e.g. 50/100 bytes), our unrolled SHA256 version in SynCrypto is slightly faster than your unrolled version.
    On bigger content (e.g. 1 MB or 550 MB), the unrolled version is slightly faster.
    But your Win64 asm version is great.

    In all cases, the fastest would be hardware computation: our 6 €/month dedibox server – see https://www.online.net/en/dedicated-server/dedibox-scg2 – has a VIA NANO cpu which computes SHA-256 much faster than the highest Xeon cpu, in tuned asm. 🙂
    I just hope Intel will eventually include SHA hardware opcodes in addition to crc32+aes.

  2. Cash size matters for sure:) Though I guess you mean cache.

    But actually, guys, it’s the compiler’s job to produce optimized code, leaving the programmer with clean and easy-to-maintain code. Unfortunately, Delphi isn’t the best here.

  3. Pingback: News from the Delphi world 03.11 – 09.11 2014 | Delphi 2010 ru

    • You are right, thank you.

      The bug is in GetHMACAlgorithm function (tfHMAC.pas unit).

      The patch (with some comments about the bug’s origin) is:

      function GetHMACAlgorithm(var Inst: PHMACAlg; const HashAlg: IHashAlgorithm): TF_RESULT;
      var
        P: PHMACAlg;
        BlockSize: Integer;
      begin
        BlockSize:= HashAlg.GetBlockSize;
        // protection against hashing algorithms which should not be used in HMAC
        if BlockSize = 0 then begin
          Result:= TF_E_INVALIDARG;
          Exit;
        end;
        try
          New(P);
          P^.FVTable:= @HMACVTable;
          P^.FRefCount:= 1;
          // interface assignment - refcount is incremented by the compiler
          P^.FHash:= HashAlg;
          // the bug is commented out - no need to increment refcount manually
          // HashAlg._AddRef;
          P^.FKey:= nil;
          Result:= ByteVectorAlloc(P^.FKey, BlockSize);
          if Result = TF_S_OK then begin
            if Inst <> nil then THMACAlg.Release(Inst);
            Inst:= P;
          end;
        except
          Result:= TF_E_OUTOFMEMORY;
        end;
      end;
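
The reference counting behaviour behind this patch can be reproduced in isolation. The sketch below is not TForge code: `TDemoAlg` is a hypothetical record standing in for the HMAC instance, and `TInterfacedObject` stands in for the hash algorithm. It only illustrates that assigning to an interface-typed field already makes the compiler call `_AddRef`, so an additional manual call leaves the counter one too high and the object would never be freed.

```pascal
program RefCountDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical record with an interface field, similar in spirit
  // to the FHash field in the patch above
  TDemoAlg = record
    FHash: IInterface;
  end;

var
  Obj: TInterfacedObject;
  Source: IInterface;
  Alg: TDemoAlg;
begin
  Obj := TInterfacedObject.Create;
  Source := Obj;        // first interface reference to the object
  Writeln('after Source := Obj:       ', Obj.RefCount);

  Alg.FHash := Source;  // the compiler inserts the _AddRef call here
  Writeln('after Alg.FHash := Source: ', Obj.RefCount);

  Source._AddRef;       // the redundant manual increment (the bug)
  Writeln('after manual _AddRef:      ', Obj.RefCount);

  Source._Release;      // undo the manual increment so the object can be freed
  Alg.FHash := nil;     // the compiler inserts the matching _Release call
  Writeln('after cleanup:             ', Obj.RefCount);
end.
```

Without the `Source._Release` line the counter would still be above zero after both interface references are cleared, which is the kind of leak the commented-out `HashAlg._AddRef` caused.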
      
