Be careful with Ord function in Unicode Delphi versions

Here is a simple test:

program OrdTest;

{$APPTYPE CONSOLE}

uses
  SysUtils;

begin
  try
    Writeln(Ord('Я'), '  ', Ord(Char('Я')));   // 223,  1071
    Assert(Ord('Я') = Ord(Char('Я')));         // Fails
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  Readln;
end.

When the Ord function is evaluated with a hardcoded character literal as its argument, the compiler treats the literal as an ANSI character. In the example above, Ord('Я') returns 223 (the code of 'Я' in the Cyrillic codepage 1251) instead of 1071 (its UTF-16 code), as one would expect. As a result the assertion fails (tested on Delphi XE):
assertion failed
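
One way to sidestep the problem is to keep bare non-ASCII literals out of Ord altogether. The sketch below assumes a Unicode Delphi version (2009 or later) and writes the character as a UTF-16 code instead, so neither the source file encoding nor the compiler's codepage setting should matter. The names OrdWorkaround and YaTyped are made up for illustration and are not part of the original test.

```pascal
program OrdWorkaround;

{$APPTYPE CONSOLE}

const
  // Typed constant: the declared type is Char (WideChar in Unicode Delphi)
  // and the value is written as a UTF-16 code, not as a quoted literal.
  YaTyped: Char = #$042F;   // U+042F, Cyrillic 'Я'

begin
  Writeln(Ord(YaTyped));            // 1071
  Writeln(Ord(WideChar($042F)));    // 1071
  Readln;
end.
```

The explicit Char('Я') cast from the test above has the same effect, but it still depends on the compiler reading the source file with the right encoding.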

After reading the comments I tried another test with both Cyrillic ‘Я’ (=223 on 1251 codepage) and German ‘ß’ (=223 on 1252 codepage):

program OrdTest2;

{$APPTYPE CONSOLE}

uses
  SysUtils;

begin
  try
    Writeln(Ord('Я'), '  ', Ord(Char('Я')));
    Writeln(Ord('ß'), '  ', Ord(Char('ß')));
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  Readln;
end.

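If I set the compiler’s codepage to 1251, I get

[screenshot of the console output not preserved]

If I set the compiler’s codepage to 1252, I get

[screenshot of the console output not preserved]

With codepage 1252 both values for ‘ß’ are equal, because German ‘ß’ has the same code (223) both in the ANSI 1252 codepage and in UTF-16.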
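
The numbers can also be checked at run time, without relying on how the compiler reads a literal. This is a rough sketch, assuming TEncoding.GetEncoding from SysUtils (Delphi 2009 and later) and a Windows system with codepages 1251 and 1252 installed. AnsiCode is a hypothetical helper written for this illustration.

```pascal
program CodePageCompare;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns the first byte of Ch when it is encoded
// in the given ANSI codepage.
function AnsiCode(Ch: WideChar; CodePage: Integer): Byte;
var
  Enc: TEncoding;
  S: string;
  Bytes: TBytes;
begin
  S := Ch;
  Enc := TEncoding.GetEncoding(CodePage);  // the caller owns this instance
  try
    Bytes := Enc.GetBytes(S);
    Result := Bytes[0];
  finally
    Enc.Free;
  end;
end;

begin
  // UTF-16 code first, then the single-byte codepage value
  Writeln(Ord(WideChar($042F)), '  ', AnsiCode(WideChar($042F), 1251));  // 'Я': 1071  223
  Writeln(Ord(WideChar($00DF)), '  ', AnsiCode(WideChar($00DF), 1252));  // 'ß': 223  223
  Readln;
end.
```

The characters are given as numeric codes so that the result does not depend on the encoding of the source file.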


17 thoughts on “Be careful with Ord function in Unicode Delphi versions”

  1. Just to add, if Anglophones like me want to play along, we need to change the compiler’s codepage setting from 0 (i.e., the system default ‘Ansi’ code page) to 1251 – press Ctrl+. and start typing codepage to find the setting.

    • You can either change the compiler’s codepage as Chris proposed or try to replace the Cyrillic character by a German one – it would be interesting to know if the issue is present for your codepage (1252 I guess).

  2. Kinda makes sense that an embedded constant is taken from the codepage of the document (and/or the system default when not set). Guess it could be one of those weird gotchas across different systems as the default codepage changes.

    Surprised UTF16 isn’t the default codepage.

    I still wish they had left 8bit ansi the default. At least people would have to think a LITTLE about unicode, and then not get quite as surprised when they run into stuff that needs all 32 bits for unicode representation. (yup, that’s right, utf16 isn’t full unicode, you can still get encoding in your code points)

    • “I still wish they had left 8bit ansi the default.”

      That’s a ridiculous thing to say, given that the modern world (including the technologies Delphi sits on) uses Unicode. Or perhaps you would prefer parallel RTL functions like FPC? (FileExists/FileExistsUTF8 etc.)

      “yup, that’s right, utf16 isn’t full unicode”

      No, UTF16 *is* ‘full Unicode’. However, characters outside the Basic Multilingual Plane require two ‘WideChar’ values, which is what you (hopefully) meant.

      • Unlike Delphi where strings are all handled uniformly and consistently and we don’t have parallel RTL functions for simple things like uppercasing a string, where Uppercase() is to be used on ASCII strings and ANSIUppercase() is for ANSI and/or Unicode strings or you might use TCharacter.ToUpper() or a ToUpper() function in the Character unit, as long as your code doesn’t need to (potentially) compile on older compilers without a TCharacter class (or a Character unit) in the attendant RTL …

        Yes, you’re right… the Delphi approach is certainly The Gold Standard. 😉

        (And we all know what happened to The Gold Standard, right?)

      • The UTF8-suffixed functions are stopgaps from Lazarus, not related to FPC, until the Unicode support already in the compiler has been properly rolled out to the RTL and libraries. The code above is flawed since it assumes that “char” is two bytes wide, which is why it is not sane on FPC. Use widechar and set {$codepage } for the source encoding properly, and results for FPC will probably improve. Even with the current codebase.

  3. This is a result of just few Senior Engineers trying to do magic in a 6 months life cycle. FPC is doing less but better, with responsibility and better quality.

    This week they will announce Marco Cantu as Delphi Product Manager, another desperate strategy to make the Delphi community happy, bad for Embarcadero and bad for Marco Cantu, who won’t be the decision maker and will have to work with 5 senior engineers and 10 more junior ones.

    Btw, last week two great engineers left Embarcadero; you probably know Mark Edington (Senior Delphi Engineer) and Shaunak Mistry (InterBase Principal Engineer), both fantastic engineers. Btw, InterBase now has only one engineer.

    The situation is getting worse here: they don’t raise our salaries, they cut more and more people, the senior guys are leaving and there is nothing else to motivate the team.

    • “This is a result of just few Senior Engineers trying to do magic in a 6 months life cycle.”

      Hmm, looks more like an oversight, given it’s been that way since D2009 and the Latin-1 equivalent code doesn’t have the same behaviour. I agree it should really be fixed, however.

      “FPC is doing less but better, with responsibility and better quality.”

      FPC has its own issues – e.g., Serg’s code doesn’t even compile for me! Relatedly, I can’t be the only person who thinks defaulting to Windows Latin-1 on a Mac is taking backwards compatibility too far.

  4. BTW, for the Latin codepage 1252 (which covers German as well) there are no assertion failures. I tried it with a German “ß” (result 223 in both cases).

