How to uppecase [lowercase] strings in Delphi 2009+ ?

The answer is not so obvious as it was before.

In the time immemorial strings were ASCII strings, and since then we have UpperCase function. The function is  just replaces the lowercase ‘a’..’z’ characters by their uppercase ‘A’..’Z’ versions. Though nowadays the UpperCase is a Unicode function it still ‘uppercases’ only ASCII characters. That is good but that is not all uppercase functionality we need.

The early Delphi versions introduced ANSI strings. The uppercase conversion on ANSI strings includes local characters besides ASCII ‘a’..’z’ and consequntly is locale-dependent. The only locale supported by pre-unicode Delphi versions is system locale. The ANSIUpperCase function was introduced to uppercase ANSI strings using system locale. That is not so good as it may be, since sometimes we need a locale different from system locale, but at least is clear and unambiguous.

Now in Delphi2009 we can explicitly define the ANSI locale in Ansistring type, and that is great. The ANSI uppercase conversion is locale-sensitive, and now we can perform ANSI string uppercase conversions on any locale by just calling AnsiUpperCase function, right? The answer is – NO.

First, in Delphi 2009 the AnsiUpperCase is unicode function. I understand the [compatibility] reasons why the unicode function has Ansi prefix, but I prefer it would never happen.

Second, the real ANSI version of AnsiUpperCase function defined in Ansistrings unit is a wrapper for WinAPI CharUpperBuffA function, and still uses the current system locale and ignores the locale associated with the string type.

As a result it is better to avoid using AnsiUpperCase at all. The AnsiUpperCase function is just a backward compatibility issue.

Nowadays the defaul string type is Unicode string. The unicode string can include characters of all different locales at once, so the unicode strings are again locale-indendent as ASCII strings were, right? Well, it is almost right. But to be absolutely correct, the answer is still NO.

The only problem I know that makes Unicode uppercase conversion locale-dependent is the case of dotless i (ı, $0131) and dotted I (İ, $0130). In most languages the upper of i ($69) is I ($49), but in turkish locale i ($69) maps to İ ($0130). Similarly in turkish the lower of I ($49) is ı ($0131).

Now let us go back to Delphi. The new string format containes no locale information for unicode strings. We have 2 functions to make the unicode uppercase conversion – the already mentioned SysUtils.AnsiUpperCase introduced for backward compatibility and new ToUpper function defined in Character unit. The SysUtils.AnsiUpperCase is a wrapper for WinAPI CharUpperBuffW function and ignores locale-specific issues. The CharUpperBuffW works much better than its ANSI analog CharUpperBuffA, but still can’t help with “turkish case”. The Character.ToUpper function is a wrapper for LCMapString function and is locale-dependent, but the locale parameter is set to system locale. So if you need “turkish uppercase” on the system with different locale (or want to make the result of uppercase independent from system locale) you must write your own LCMapString wrapper.

BTW: the early Delphi 2009 releases containes side-effect bug in ToUpper and ToLower implementations. The bug was fixed in Update 3 (build 12.0.3420.21218).

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s