Why Unicode? - Adiscon

Article created 2001-03-12 by Rainer Gerhards.

Unicode is a standard to encode all of the world’s languages correctly on computers. In this article, Rainer Gerhards explains what Unicode is and why Adiscon bases all of its products on the Unicode standard.

What is Unicode?

Unicode is an international standard. Its goal is to resolve the ambiguities that traditionally arise when displaying complex scripts such as Japanese, Arabic or Chinese on computer systems. Besides solving many internationalization issues, Unicode-enabled programs can also run faster under Windows.

So what does Unicode do?

It’s really easy. Traditional character sets (like the ANSI character set) are based on 8-bit characters, each stored in a single byte. A single byte can represent up to 256 different values and thus up to 256 characters. This is enough to represent Western scripts such as those used for English, French or German. However, when it comes to more complex scripts such as Japanese or Korean, 256 different characters are simply insufficient.
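The 256-value limit can be illustrated with a minimal Python sketch. It assumes that "ANSI" means the Windows-1252 code page (Python codec name `cp1252`), as is common on Western Windows systems; the helper name `fits_in_single_byte_charset` is made up for this illustration.

```python
# A single byte has 8 bits, so it can take 2**8 distinct values.
BYTE_VALUES = 2 ** 8


def fits_in_single_byte_charset(text, charset="cp1252"):
    """Return True if every character of `text` exists in the 8-bit charset.

    `cp1252` is an assumption standing in for the "ANSI" character set.
    """
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        # At least one character has no slot among the 256 byte values.
        return False


western_ok = fits_in_single_byte_charset("Größe, café")
japanese_ok = fits_in_single_byte_charset("日本語")
```

The German and French sample encodes without trouble, while the Japanese sample is rejected because its characters have no place in the single-byte table.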

So users of these languages developed so-called double-byte character sets (DBCS). In a DBCS, each character is represented by either one or two bytes. The character encoding specifies how to interpret the byte values and whether a byte is a single character or just part of a larger group of bytes representing a multi-byte character.
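To make the byte interpretation concrete, here is a small sketch that uses Shift_JIS as one example of a DBCS. The lead-byte ranges (0x81 to 0x9F and 0xE0 to 0xFC) are those of Shift_JIS; the function names are invented for this illustration, and the parser deliberately ignores malformed input.

```python
def is_sjis_lead_byte(b):
    """True if byte value `b` starts a two-byte character in Shift_JIS."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_sjis(data):
    """Split Shift_JIS bytes into one byte string per character.

    Sketch only: assumes well-formed input and does no error handling.
    """
    chars = []
    i = 0
    while i < len(data):
        # Every byte must be inspected to find out how wide the character is.
        width = 2 if is_sjis_lead_byte(data[i]) else 1
        chars.append(data[i:i + width])
        i += width
    return chars


encoded = "Aあ".encode("shift_jis")
pieces = split_sjis(encoded)
```

The Latin letter occupies one byte and the hiragana character two, so the string has two characters but three bytes.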

Unfortunately, there are many different DBCS encodings for a given language. To make matters worse, different operating systems and different programming languages tend to use different DBCS encodings. Programming is also relatively complex, because strings must be parsed byte by byte to find the character boundaries.
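As an illustration of the encoding problem, the following sketch encodes the same Japanese text with two common encodings, Shift_JIS and EUC-JP, and then reads the Shift_JIS bytes as if they were EUC-JP. The variable names are chosen for this example only.

```python
text = "日本語"

# The same characters produce different byte sequences in each encoding.
sjis_bytes = text.encode("shift_jis")
eucjp_bytes = text.encode("euc_jp")

# A receiver that assumes the wrong encoding cannot recover the text.
# errors="replace" substitutes a marker for bytes that make no sense in EUC-JP.
misread = sjis_bytes.decode("euc_jp", errors="replace")

# Only the matching encoding gives the original text back.
roundtrip = sjis_bytes.decode("shift_jis")
```

Unless sender and receiver agree on the encoding out of band, the bytes alone do not say which interpretation is the right one.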

Unicode’s goal is to solve this issue by using more than one byte for each character. In a typical implementation, 2 bytes are used, which can represent 65,536 different values. This is enough to store most of the world’s characters. So with Unicode, characters from different scripts can be stored in one string. As all of these characters have a fixed width, programming complexity is greatly reduced.
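The arithmetic and the benefit of a fixed width can be sketched as follows. The example assumes the 2-byte implementation described above (Python codec `utf-16-le`) and only uses characters that fit into a single 16-bit value; `nth_char_utf16` is a hypothetical helper.

```python
# Two bytes are 16 bits, which gives 2**16 distinct values.
CODE_UNITS_16_BIT = 2 ** 16


def nth_char_utf16(data, n):
    """Return character number `n` (0-based) from 2-byte-per-character data.

    Assumes every character occupies exactly two bytes, so the position
    can be computed directly instead of parsing from the start.
    """
    return data[2 * n:2 * n + 2].decode("utf-16-le")


mixed = "Aあ日"  # Latin, hiragana and kanji in one string
data = mixed.encode("utf-16-le")
third = nth_char_utf16(data, 2)
```

Unlike the DBCS case, no byte needs to be inspected to find a character boundary: character `n` always starts at byte `2 * n`.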

Why do Unicode Applications run faster under Windows?

That reduced complexity of course provides better performance for applications. Complex character mappings and multi-byte character detection no longer need to be done, which improves performance.

Under the Windows NT-based operating systems, there is an additional, substantial performance benefit. Windows NT itself is internally based on Unicode only. So all operating system calls (APIs) expect characters encoded in Unicode, even on, for example, US English language versions of the operating system. There are also APIs available for the many applications that work with ANSI strings (with 8-bit characters). But these APIs are so-called “wrappers”: they wrap the Unicode version of the API. All the ANSI version does is convert the ANSI string to a Unicode string and then pass it to the Unicode version.

These translations involve not only the actual conversion but also the allocation and de-allocation of temporary buffers to hold the converted strings. It is easy to see that this takes additional time on every call.
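The wrapper pattern described above can be modeled in a few lines. This is a conceptual sketch only, not actual Windows code: `set_title_w` and `set_title_a` are hypothetical stand-ins for the Unicode and ANSI variants of an API, and `cp1252` is an assumed ANSI code page.

```python
def set_title_w(title):
    """Hypothetical Unicode ("wide") API variant: does the real work."""
    return f"title set to {title}"


def set_title_a(title_bytes, code_page="cp1252"):
    """Hypothetical ANSI API variant: a thin wrapper around the Unicode one.

    The decode step creates a temporary Unicode copy of the string on
    every call; this is the extra work a Unicode-only caller avoids.
    """
    converted = title_bytes.decode(code_page)  # temporary converted string
    return set_title_w(converted)


direct = set_title_w("Größe")
wrapped = set_title_a("Größe".encode("cp1252"))
```

Both calls end up in the same Unicode routine, but the ANSI path pays for an additional conversion and a temporary string each time.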

So Unicode-only applications can perform a lot better if run on Windows NT.

Internationalization

An additional big plus is much easier internationalization. Applications using Unicode internally are able to store and process all of the world’s characters. This removes many difficulties traditionally involved when internationalizing an application.

Of course, successful internationalization is much more than Unicode-enabling an application. It requires careful screen design (different languages need different amounts of space to display the same sentence), translation and cultural understanding.

Unicode, however, is a building block for successful internationalization and already solves many of the problems developers traditionally experience.

How about Adiscon Products?

As part of our internationalization strategy, we will base all of our products internally on Unicode. We also expect a notable performance gain from that step.

At the time of this writing, the EventReporter product (beginning with version 5.1) is already natively based on Unicode. Work on the WinSyslog 3.3 release is already in progress and includes native Unicode support. The other products will follow.

Our products running under Windows versions that are not based on Windows NT will also be based on Unicode internally. Unfortunately, these operating systems do not provide a Unicode API. So Adiscon products on those platforms will support ANSI characters externally.

Anything else in Stock?

Indeed, there is more good news. Unicode enables us to reliably store all of the world’s characters. But we honor the fact that the world is not yet Unicode-based. So supporting DBCS is extremely important when it comes to system interoperability. With everything stored in Unicode, we can easily convert to DBCS encodings when it comes to forwarding information. The EventReporter product, for example, can forward messages in JIS, SJIS and EUC-JP encodings. That capability will also be included in all products.
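Converting on output can be sketched like this. The code does not show how EventReporter works internally; it is a minimal illustration that assumes "JIS" refers to ISO-2022-JP and maps the three names to Python’s standard codecs. `FORWARD_ENCODINGS` and `encode_for_forwarding` are names invented for this example.

```python
# Assumed mapping from the encoding names in the text to Python codec names.
FORWARD_ENCODINGS = {
    "JIS": "iso2022_jp",   # assumption: "JIS" means ISO-2022-JP
    "SJIS": "shift_jis",
    "EUC-JP": "euc_jp",
}


def encode_for_forwarding(message, target):
    """Convert an internally Unicode message to the receiver's encoding."""
    return message.encode(FORWARD_ENCODINGS[target])


message = "日本語のメッセージ"
as_jis = encode_for_forwarding(message, "JIS")
as_sjis = encode_for_forwarding(message, "SJIS")
as_eucjp = encode_for_forwarding(message, "EUC-JP")
```

Because the message is held in Unicode, supporting another target encoding only means adding one more entry to the mapping.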

Want to know more on Unicode?

We hope this article gave you a first impression of the benefits of Unicode and how it enhances Adiscon products. If you would like to find more detailed information, please visit the Unicode Consortium. This is the body dedicated to the advancement and promotion of Unicode. Its web site offers a wealth of information and useful resources.

I hope that this article is helpful. If you have any questions or remarks, please do not hesitate to contact me at rgerhards@adiscon.com.
