How To Use ICU4C From COBOL

This page has moved to https://unicode-org.github.io/icu/userguide/usefrom/cobol.html

The content below is out of date.

Overview

This document describes how to use ICU functions within a COBOL program. It assumes that the programmer understands the concepts behind ICU and can identify which ICU APIs are appropriate for the task at hand. The programmer must also understand the meaning of the arguments passed to these APIs and of the returned value, if any. All of this is explained in the ICU documentation, but in C/C++ style. The objective of this document is to help adapt those explanations to COBOL syntax.

It must be understood that the packaging of ICU data and executable code into libraries is platform dependent. Consequently, the calling conventions between COBOL programs and the C/C++ functions in ICU may vary from platform to platform. In a lesser way, the C/C++ types of arguments and return values may have different equivalents in COBOL, depending on the platform and even the specific COBOL compiler used.

This document is supplemented with three sample programs that illustrate the use of ICU APIs for code page conversion, collation and normalization. The sample programs are described in the appendix at the end of this document.

ICU API invocation in COBOL

    1. Invocation of ICU APIs is done with the COBOL “CALL” statement.

    2. Variables, pointers and constants appearing in ICU *.H files (for C/C++) must be defined in the WORKING-STORAGE section for COBOL.

    3. Arguments to a C/C++ API translate into arguments to a COBOL CALL statement, passed by value or by reference as will be detailed below.

    4. For a C/C++ API with a non-void return value, the RETURNING clause will be used for the CALL statement.

    5. Character string arguments to C/C++ must be null-terminated. In COBOL, this means using the Z"xxx" format for literals, and adding X"00" at the end of the content of variables.

    6. Special consideration must be given when a pointer is the value returned by an API, since COBOL implements a more limited concept of pointers than C/C++. How to handle this case will be explained below.
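To make item 5 concrete, the following is a minimal C sketch of what a C function expects from a string argument. It assumes the COBOL item is a fixed-width, space-padded field (as a PIC X(n) item normally is); the helper name `field_to_cstring` is hypothetical and is not part of ICU.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy a fixed-width, space-padded field (the layout of a COBOL PIC X(n)
 * item) into out as a null-terminated C string, dropping trailing spaces.
 * Returns the string length, or (size_t)-1 if out is too small.
 * Hypothetical helper for illustration; not an ICU function.
 */
size_t field_to_cstring(const char *field, size_t field_len,
                        char *out, size_t out_cap)
{
    size_t len = field_len;

    /* Trailing spaces are padding, not content. */
    while (len > 0 && field[len - 1] == ' ') {
        len--;
    }
    if (len + 1 > out_cap) {
        return (size_t)-1;   /* no room for the terminating null */
    }
    memcpy(out, field, len);
    out[len] = '\0';         /* the byte a COBOL program writes as X"00" */
    return len;
}
```

Without the terminating byte, a C function would keep reading past the end of the field until it happened to find a zero byte in memory.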

COBOL and C/C++ Data Types

The following table (extracted from IBM VisualAge COBOL documentation) shows the correspondence between the data types available in COBOL and C/C++.

Parts of identifier names in COBOL are separated by '-', not by '_' as in C.

A number of C definitions specific to ICU (and many other compilers on POSIX platforms) that are not presented in the table above can also be translated into COBOL definitions.

Enumerations (first possibility)

C enumeration types do not translate well into COBOL. There are two possible ways to simulate them.

C example

    typedef enum {
        /** No decomposition/composition. @draft ICU 1.8 */
        UNORM_NONE = 1,
        /** Canonical decomposition. @draft ICU 1.8 */
        UNORM_NFD = 2,
        . . .
    } UNormalizationMode;

COBOL example

    WORKING-STORAGE section.
    *--------------- Ported from unorm.h ------------
    * enum UNormalizationMode {
    77 UNORM-NONE PIC S9(9) Binary value 1.
    77 UNORM-NFD  PIC S9(9) Binary value 2.
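The choice of PIC S9(9) BINARY rests on the C enumeration being passed as a 4-byte integer. The sketch below shows one way to check that assumption with the C compiler used to build the library. The enum is a local mirror of the two constants shown above, not the ICU header, and the names prefixed with `SKETCH_` and both function names are hypothetical.

```c
#include <stdint.h>

/* Local mirror of the two constants shown above; not the ICU header. */
typedef enum {
    SKETCH_UNORM_NONE = 1,
    SKETCH_UNORM_NFD  = 2
} SketchNormalizationMode;

/* Returns 1 when the enum occupies 4 bytes, the size of an int32_t. */
int enum_matches_cobol_fullword(void)
{
    return sizeof(SketchNormalizationMode) == sizeof(int32_t);
}

/* The integer a COBOL caller would pass BY VALUE for a given mode. */
int32_t mode_as_binary(SketchNormalizationMode mode)
{
    return (int32_t)mode;
}
```

The size of an enumeration is compiler dependent, so if the check returns 0 on a given platform the COBOL picture clause would need to be adjusted accordingly.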

Enumerations (second possibility)

C example

    /*==== utypes.h ========*/
    typedef enum UErrorCode {
        U_USING_FALLBACK_WARNING = -128, /* (not an error) */
        U_USING_DEFAULT_WARNING = -127, /* (not an error) */
        . . .
    } UErrorCode;

COBOL example

    *==== utypes.h ========
    01 UErrorCode PIC S9(9) Binary value 0.
    * A resource bundle lookup returned a fallback
    * (not an error)
       88 U-USING-FALLBACK-WARNING value -128.
    * (not an error)
       88 U-USING-DEFAULT-WARNING value -127.
    . . .

Call statement, calling by value or by reference

In general, arguments defined in C as pointers ('*') must be listed in the COBOL Call statement with the using by reference clause. Arguments which are not pointers must be passed with the using by value clause. The exception is an argument whose pointer value is already stored in a COBOL variable (for example, a value returned by an ICU API): such an argument must be passed by value. A typical case is the pointer to a Converter that is passed as an argument to the conversion APIs.

Conversion Declaration Examples

C (API definition in *.h file)

    /*--------------------- UCNV.H ---------------------------*/
    U_CAPI int32_t U_EXPORT2
    ucnv_toUChars(UConverter * cnv,
                  UChar * dest,
                  int32_t destCapacity,
                  const char * src,
                  int32_t srcLength,
                  UErrorCode * pErrorCode);

COBOL

    PROCEDURE DIVISION.
        Call API-Pointer using
            by value     Converter-toU-Pointer
            by reference Unicode-Input-Buffer
            by value     destCapacity
            by reference Input-Buffer
            by value     srcLength
            by reference UErrorCode
            Returning Text-Length.
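Seen from the C side, the by value / by reference split follows from what the callee does with each parameter. The function below is a hypothetical stand-in with the same parameter shape as ucnv_toUChars; it is not ICU code, it only widens each input byte to a 16-bit unit (a correct conversion for 7-bit ASCII only), and its failure code 1 is an arbitrary positive value chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in with the parameter shape of ucnv_toUChars.
 * Not ICU code: it simply widens each byte to one 16-bit unit.
 */
int32_t sketch_to_uchars(void *cnv,            /* opaque handle: BY VALUE     */
                         uint16_t *dest,       /* output buffer: BY REFERENCE */
                         int32_t destCapacity, /* plain integer: BY VALUE     */
                         const char *src,      /* input buffer: BY REFERENCE  */
                         int32_t srcLength,    /* plain integer: BY VALUE     */
                         int32_t *pErrorCode)  /* in/out status: BY REFERENCE */
{
    int32_t i;

    (void)cnv;                  /* a real converter would be consulted here */
    if (*pErrorCode > 0) {
        return 0;               /* a failure is already recorded: do nothing */
    }
    if (srcLength > destCapacity) {
        *pErrorCode = 1;        /* hypothetical positive failure code */
        return srcLength;       /* capacity that would have been needed */
    }
    for (i = 0; i < srcLength; i++) {
        dest[i] = (uint16_t)(unsigned char)src[i];
    }
    return srcLength;           /* the value RETURNING Text-Length receives */
}
```

The callee writes through `dest` and `pErrorCode`, so the caller must supply addresses of its own storage (by reference). The converter handle is itself already an address held in a COBOL variable, so its content is passed as is (by value).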

Call statement, Returning clause

Returned value is Pointer or Binary

C (API definition in *.h file)

    U_CAPI UConverter * U_EXPORT2
    ucnv_open(const char * converterName,
              UErrorCode * err);

COBOL

    WORKING-STORAGE section.
    01 Converter-Pointer PIC S9(9) BINARY.

    PROCEDURE DIVISION.
        Move Z"iso-8859-8" to converterNameSource.
        . . .
        Call API-Pointer using
            by reference converterNameSource
            by reference UErrorCode
            Returning Converter-Pointer.

Returned value is a Pointer to string

If the value returned in C is a string pointer ('char *'), the COBOL program must receive it in a pointer item and then map a string defined in the Linkage section onto that address.

C ( API definition in *.h file)

    U_CAPI const char * U_EXPORT2
    ucnv_getAvailableName(int32_t n);

COBOL

    DATA DIVISION.
    WORKING-STORAGE section.
    01 Converter-Name-Link-Pointer Usage is Pointer.

    LINKAGE section.
    01 Converter-Name-Link.
       03 Converter-Name-String pic X(80).

    PROCEDURE DIVISION using Converter-Name-Link.
        Call API-Pointer using by value Converters-Index
            Returning Converter-Name-Link-Pointer.
        SET Address of Converter-Name-Link
            to Converter-Name-Link-Pointer.
        . . .
        Move Converter-Name-String to Debug-Value.
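Note that the string returned by the API ends at its first null byte, while the Linkage item above maps a fixed 80 bytes onto the returned address; whatever follows the terminator is unrelated memory. The C sketch below shows the equivalent of extracting only the meaningful part into a space-padded field. The helper name `cstring_to_field` is hypothetical and not part of ICU.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy a null-terminated C string into a fixed-width field and pad the
 * rest with spaces, as a COBOL PIC X(n) item would hold it.
 * Returns the number of characters copied (at most field_len).
 * Hypothetical helper for illustration; not an ICU function.
 */
size_t cstring_to_field(const char *s, char *field, size_t field_len)
{
    size_t len = strlen(s);          /* stops at the terminating null */

    if (len > field_len) {
        len = field_len;             /* truncate names that do not fit */
    }
    memcpy(field, s, len);
    memset(field + len, ' ', field_len - len);
    return len;
}
```

A COBOL program can achieve the same effect by scanning Converter-Name-String for X"00" and using only the characters before it.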

How to invoke ICU APIs

Inter-language communication is often problematic. This is certainly the case when calling C/C++ functions from COBOL, because of the very different roots of the two languages. How to invoke the ICU APIs from a COBOL program is likely to depend on the operating system and even on the specific compilers in use. The section below deals with COBOL to C calls on a Windows platform. Similar sections should be added for other platforms.

Windows platforms

The following instructions were tested on a Windows 2000 platform, with the IBM VisualAge COBOL compiler and the Microsoft Visual C/C++ compiler.

For Windows, ICU APIs are normally packaged as DLLs (Dynamic Link Libraries). For technical reasons, COBOL calls to C/C++ functions need to be done via dynamic loading of the DLLs at execution time (load on call).

The COBOL program must be compiled with the following compiler options:

* options CBL PGMNAME(MIXED) CALLINT(SYSTEM) NODYNAM

In order to call an ICU API, two preparation steps are needed:

    1. Load in memory the DLL which contains the API

    2. Get the address of the API

For performance, it is better to perform these steps once before the first call and to save the returned values for future use (the sample programs get the address of APIs for each call, for the sake of logging; production programs should get the address once and reuse it as many times as needed).
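The "get the address once and reuse it" advice amounts to a small cache keyed by API name. The sketch below shows the idea in C under stated assumptions: `resolver_fn` only approximates the shape of GetProcAddress (handle and name in, address out), and every name in the block, including the counting `demo_resolver`, is hypothetical and stands in for the real Windows call.

```c
#include <stddef.h>

typedef void (*api_fn)(void);
/* Roughly the shape of GetProcAddress: (handle, name) -> address. */
typedef api_fn (*resolver_fn)(void *handle, const char *name);

typedef struct {
    const char *name;   /* null-terminated API name */
    api_fn      fn;     /* cached address, NULL until the first lookup */
} api_slot;

/* Resolve the address on first use only; later calls reuse the cache. */
api_fn api_lookup(api_slot *slot, void *handle, resolver_fn resolve)
{
    if (slot->fn == NULL) {
        slot->fn = resolve(handle, slot->name);
    }
    return slot->fn;    /* still NULL if the name was not found */
}

/* Stand-in resolver that counts how often it is asked for an address. */
static void demo_api(void) { }
int demo_resolver_calls = 0;

api_fn demo_resolver(void *handle, const char *name)
{
    (void)handle;
    (void)name;
    demo_resolver_calls++;
    return demo_api;
}
```

Calling `api_lookup` twice on the same slot invokes the resolver only once. In COBOL the equivalent is to keep each API address in its own PROCEDURE-POINTER item and to test it for NULL before asking for the address again.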

When no more APIs from a DLL are needed, the DLL should be unloaded in order to free the associated memory.

Load DLL Into Memory

This is done as follows:

    Call "LoadLibraryA" using by reference DLL-Name
        Returning DLL-Handle.
    IF DLL-Handle = ZEROS
        Perform error handling. . .

Return value: DLL Handle, defined as PIC S9(9) BINARY

Input Value: DLL Name (null-terminated string)

Errors may happen if the DLL name is not correct, or the string is not null-terminated, or the DLL file is not available (in the current directory or in a directory included in the PATH system variable).

Get API address

This is done as follows:

    Call "GetProcAddress" using by value DLL-Handle
                                by reference API-Name
        Returning API-Pointer.
    IF API-Pointer = NULL
        Perform error handling. . .

Return value: API address, defined as PROCEDURE-POINTER

Input Values: DLL Handle (returned by call to LoadLibraryA) and Procedure Name (null-terminated string)

Errors may happen if the API name is not correct (remember that API names are case-sensitive), or the string is not null-terminated, or the API is not included in the specified DLL. If the API pointer is not null, the API is called as follows, with the using and returning clauses written according to the arguments and return value of the API.

Call API-Pointer using . . . returning . . .

After calling an API, the returned error code should be checked when relevant. Code to check for error conditions is illustrated in the sample programs.
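As a reminder of what "checking the error code" means, ICU treats zero as "no error", negative values as warnings (such as the two -128 and -127 values listed earlier) and positive values as failures. The C sketch below restates that convention with two hypothetical helper functions; it does not use the ICU headers.

```c
#include <stdint.h>

/*
 * ICU convention for UErrorCode values: 0 is "no error", negative values
 * are warnings (the call still succeeded), positive values are failures.
 * Hypothetical helpers for illustration; not ICU functions.
 */
int icu_code_is_failure(int32_t code)
{
    return code > 0;
}

int icu_code_is_warning(int32_t code)
{
    return code < 0;
}
```

In a COBOL program the same test is a simple comparison of the UErrorCode item against zero after each call.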

Unload DLL from Memory

This is done as follows:

    Call "FreeLibrary" using by value DLL-Handle.

Return value: none

Input Value: DLL Handle (returned by call to LoadLibraryA)

Sample Programs

Three sample programs are supplied with this document. The sample programs were developed on and for a Windows 2000 platform. Some adaptations may be necessary for other platforms.

Before running the sample programs, you must perform the following steps:

    1. Install the version of ICU appropriate for your platform

    2. Build ICU libraries if needed (see the ICU Readme file)

    3. Make the libraries accessible (for instance on Windows systems, add the directory containing the libraries to the PATH system variable)

    4. Compile the sample programs with appropriate compiler options

    5. Copy the test files to a work directory

Each program is supplied with input test files and with a model log file. If the log file that you create by running a sample program is equivalent to the model log file, your setup is probably correct.

Each of the three sample programs focuses on one area of ICU functionality:

    1. Conversion

    2. Collation

    3. Normalization

Conversion sample program

    * The sample program includes the following steps:
    * - Display the names of the converters from a list of all
    *   converters contained in the alias file.
    * - Display the current default converter name.
    * - Set new default converter name.
    *
    * - Read a string from Input file "ICU_Conv_Input_8.txt"
    *   (File in UTF-8 Format)
    * - Convert this string from UTF-8 to code page iso-8859-8
    * - Write the result to output file "ICU_Conv_Output.txt"
    *
    * - Read a line from Input file "ICU_Conv_Input.txt"
    *   (File in ANSI Format, code page 862)
    * - Convert this string from code page ibm-862 to UTF-16
    * - Convert the resulting string from UTF-16 to code page windows-1255
    * - Write the result to output file "ICU_Conv_Output.txt"
    * - Write debugging information to Display and
    *   log file "ICU_Conv_Log.txt" (File in ANSI Format)
    * - Repeat for all lines in Input file
    **
    * The following ICU APIs are used:
    *   ucnv_countAvailable
    *   ucnv_getAvailableName
    *   ucnv_getDefaultName
    *   ucnv_setDefaultName
    *   ucnv_convert
    *   ucnv_open
    *   ucnv_toUChars
    *   ucnv_fromUChars
    *   ucnv_close

The ucnv_xxx APIs are documented in file "UCNV.H".

Collation sample program

    * The sample program includes the following steps:
    * - Read a string array from Input file "ICU_Coll_Input.txt"
    *   (file in ANSI format)
    * - Convert string array from code page into UTF-16 format
    * - Compare the string array into the canonical composed
    * - Perform bubble sort of string array, according
    *   to Unicode string equivalence comparisons
    * - Convert string array from Unicode into code page format
    * - Write the result to output file "ICU_Coll_Output.txt"
    *   (file in ANSI format)
    * - Write debugging information to Display and
    *   log file "ICU_Coll_Log.txt" (file in ANSI format)
    **
    * The following ICU APIs are used:
    *   ucol_open
    *   ucol_strcoll
    *   ucol_close
    *   ucnv_open
    *   ucnv_toUChars
    *   ucnv_fromUChars
    *   ucnv_close

The ucol_xxx APIs are documented in file "UCOL.H".

The ucnv_xxx APIs are documented in file "UCNV.H".

Normalization sample program

    * The sample includes the following steps:
    * - Read a string from input file "ICU_NORM_Input.txt"
    *   (file in ANSI format)
    * - Convert the string from code page into UTF-16 format
    * - Perform quick check on the string, to determine if the
    *   string is in NFD (Canonical decomposition)
    *   normalization format.
    * - Normalize the string into canonical composed form
    *   (FCD and decomposed)
    * - Perform quick check on the result string, to determine
    *   if the string is in NFD normalization form
    * - Convert the string from Unicode into the code page format
    * - Write the result to output file "ICU_NORM_Output.txt"
    *   (file in ANSI format)
    * - Write debugging information to Display and
    *   log file "ICU_NORM_Log.txt" (file in ANSI format)
    **
    * The following ICU APIs are used:
    *   ucnv_open
    *   ucnv_toUChars
    *   unorm_normalize
    *   unorm_quickCheck
    *   ucnv_fromUChars
    *   ucnv_close

The unorm_xxx APIs are documented in file "UNORM.H".

The ucnv_xxx APIs are documented in file "UCNV.H".