Selecting, Reformatting, and Manipulating Characters

Reference:

In character semantics mode, selection tests against a mask are automatically adjusted to work with characters rather than bytes. Formats assigned by reformatting a field in a request or by defining a temporary field are interpreted in terms of characters. Character functions interpret all lengths in terms of characters, as well.

Example: Defining a Virtual Field

Consider the following DEFINE in the Master File for the EMPLOYEE data source:

DEFINE FIRST_ABBREV/A5 WITH FIRST_NAME = EDIT(FIRST_NAME, '99999$$$$$');$

In character semantics mode, format A5 is interpreted as five characters (up to 15 bytes on ASCII platforms, up to 20 bytes on EBCDIC platforms), and the comparison is performed based on this number of bytes. In byte semantics mode, format A5 is interpreted as five bytes, and the comparison is performed based on five bytes. In either case, the correct characters are compared and extracted.
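A short Python sketch can show what the mask does in character semantics mode. `edit_extract` is a hypothetical, simplified stand-in for EDIT, not the WebFOCUS implementation. It handles only the `9` (keep) and `$` (drop) mask characters, and it counts positions in characters. The CJK sample name is an assumed value, not EMPLOYEE data.

```python
def edit_extract(value: str, mask: str) -> str:
    """Hypothetical, simplified EDIT extraction: '9' keeps, '$' drops.

    Positions are counted in characters (code points), mirroring
    character semantics mode. Other mask characters are not modeled.
    """
    kept = []
    for ch, m in zip(value, mask):
        if m == "9":
            kept.append(ch)
        elif m != "$":
            raise ValueError(f"unsupported mask character: {m!r}")
    return "".join(kept)


# FIRST_NAME is modeled as a blank-padded 10-character value.
for first_name in ["MARY".ljust(10), "山田太郎さん".ljust(10)]:
    abbrev = edit_extract(first_name, "99999$$$$$")
    print(repr(abbrev), len(abbrev), "characters,",
          len(abbrev.encode("utf-8")), "UTF-8 bytes")
```

Both abbreviations are five characters long. The second one occupies 15 bytes in UTF-8, which is the upper bound given above for ASCII platforms.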

Example: Reformatting a Field

Consider the following PRINT command:

PRINT FIELD1/A10

In character semantics mode, format A10 is interpreted as 10 characters (up to 30 bytes), meaning that up to 30 bytes must be retrieved when this field is referenced. In byte semantics mode, format A10 means that 10 bytes will be retrieved. In either case, the field displays as 10 characters that take up 10 spaces on the report output.
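The A10 reformat can be modeled in Python. This sketch assumes the value is truncated or blank-padded to 10 characters (not bytes) and then stored as UTF-8. `fit_to_width` is a hypothetical helper and is not a WebFOCUS function.

```python
def fit_to_width(value: str, width: int = 10) -> str:
    """Truncate or blank-pad to exactly `width` characters (not bytes)."""
    return value[:width].ljust(width)


for raw in ["ABC", "€" * 12]:
    field1 = fit_to_width(raw)
    print(repr(field1), len(field1), "characters,",
          len(field1.encode("utf-8")), "UTF-8 bytes")
```

Each result is 10 characters wide. Ten euro signs, however, need 30 bytes, which is why up to 30 bytes may have to be retrieved for FIELD1.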

Reference: Character Functions That Support Character Semantics

In character semantics mode, all character manipulation functions interpret lengths in terms of characters. The following functions operate on alphanumeric strings in character semantics mode when Unicode is configured:

  • String manipulation and extraction functions.

    GETTOK, OVRLAY, PARAG, REVERSE, SQUEEZ, STRIP, SUBSTR, SUBSTV, TRIM, TRIMV

  • Justification functions.

    CTRFLD, LJUST, RJUST

  • Length and position functions.

    ARGLEN, LENV, POSIT, POSITV

  • Format conversion functions.

    EDIT

  • Decoding, comparison, and editing functions.

    CHKFMT, EDIT, DECODE, SOUNDEX

  • String replacement functions.

    CTRAN, HEXBYT, BYTVAL (see notes below), STRREP

  • Case translation functions.

    LCWORD, LOCASE, LOCASV, UPCASE, UPCASV

Note: The HEXBYT, BYTVAL, and CTRAN functions have been extended to handle multibyte characters in Unicode configurations. These functions use or produce numeric values to represent characters. In Unicode configurations, they use or produce values in the range:

  • 0 to 255 for 1-byte characters
  • 256 to 65535 for 2-byte characters
  • 65536 to 16777215 for 3-byte characters
  • 16777216 to 4294967295 for 4-byte characters (primarily for EBCDIC)
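In effect, these ranges count how many bytes the numeric value needs. The Python check below assumes the value is simply the character's encoded bytes read as one unsigned big-endian integer. That assumption matches the pound sterling value used in this section (49827, UTF-8 C2A3).

```python
def byte_length_for_value(value: int) -> int:
    """Number of bytes a character value occupies, per the ranges above.

    0-255 -> 1, 256-65535 -> 2, 65536-16777215 -> 3,
    16777216-4294967295 -> 4.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"value out of range: {value}")
    # Bytes needed to hold the value; 0 still occupies one byte.
    return max(1, (value.bit_length() + 7) // 8)


for v in (65, 255, 256, 49827, 65536, 16777216, 4294967295):
    print(v, byte_length_for_value(v))
```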

To find the numeric value corresponding to a given character, find its hexadecimal code and convert it to decimal with a hex calculator, such as the Windows Calculator program. Make sure to use the character's UTF-8 or UTF-EBCDIC code, not its Unicode code point (which, for characters in the Basic Multilingual Plane, is the same as its UTF-16 value).
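This lookup can also be scripted instead of worked out on a calculator. The Python sketch below assumes an ASCII platform, so it uses UTF-8; Python's standard library has no UTF-EBCDIC codec. For each character, it prints the UTF-8 hex code, the decimal value, and, for contrast, the Unicode code point that should not be used.

```python
def utf8_value(ch: str) -> int:
    """Decimal value of a single character's UTF-8 bytes (big-endian)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return int.from_bytes(ch.encode("utf-8"), "big")


for ch in ("A", "£", "€"):
    utf8_hex = ch.encode("utf-8").hex().upper()
    print(ch, "UTF-8 hex:", utf8_hex,
          "decimal:", utf8_value(ch),
          "code point: U+%04X" % ord(ch))
```

For the euro sign, this gives E282AC and 14844588, the same result as `int("E282AC", 16)`. The code point U+20AC, by contrast, is 8364 in decimal, which is the wrong value to pass to HEXBYT or CTRAN.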

For example, assume you would like to create a variable of format A1 containing the euro sign. In UTF-8, the euro sign is E282AC in hexadecimal. Converting this to decimal gives 14844588. You can then use the HEXBYT function to convert this decimal value to the euro character. Thus, a DEFINE or COMPUTE to generate a euro symbol would be:

EUROSIGN/A1 = HEXBYT(14844588, 'A1');

For more information on the HEXBYT function, see HEXBYT: Converting a Decimal Integer to a Character.
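The HEXBYT direction turns the decimal value back into bytes and then decodes them. Here is a minimal Python sketch under the same UTF-8 assumption. `value_to_char` is a hypothetical name and is not part of WebFOCUS.

```python
def value_to_char(value: int) -> str:
    """Decode a decimal character value (UTF-8 bytes, big-endian) to text."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"value out of range: {value}")
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big").decode("utf-8")


print(value_to_char(14844588))  # euro sign
print(value_to_char(49827))     # pound sterling sign
```

If a value's bytes are not valid UTF-8, the sketch raises `UnicodeDecodeError`. This makes a mistyped decimal value easy to spot.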

If you are creating a FOCEXEC with a UTF-8 compliant editor, you can get the value of the euro sign (€) with the BYTVAL function instead of using the Windows Calculator:

EUROVAL/I8 = BYTVAL('€', 'I8');

For more information on the BYTVAL function, see BYTVAL: Translating a Character to Decimal.

You can pass this value directly to the CTRAN function. In the example below, the decimal value generated by the BYTVAL function is stored as EUROVAL, which is then referenced by the CTRAN function.

EUROVAL/I8 = BYTVAL('€', 'I8');
NEWFLD/A40 = CTRAN(40, OLDFLD, EUROVAL, 49827, 'A40');

The CTRAN function replaces all occurrences of a character in a string with another character, given the decimal values that represent the hexadecimal codes for the two characters. Traditionally, this technique was used to replace characters that were difficult to input directly. However, the decimal values of characters can be complicated to determine. Therefore, if you want to replace characters or character strings that you can input directly using a UTF-compliant text editor, it may be easier to use the STRREP string replacement function.

The following translates all of the euro signs (€) in a 40-character UTF-8 field to pound sterling signs (£ = 49827):

NEWFLD/A40 = CTRAN(40, OLDFLD, EUROVAL, 49827, 'A40');

For more information on the CTRAN function, see CTRAN: Translating One Character to Another.
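The CTRAN call in this example can be modeled in Python. The sketch assumes the same UTF-8 value scheme and assumes the result is blank-padded to the 40-character output length. `ctran_sketch` is hypothetical and only illustrates the semantics; it is not the WebFOCUS implementation. The sample OLDFLD text is invented.

```python
def value_to_char(value: int) -> str:
    """Decode a decimal character value (UTF-8 bytes, big-endian) to text."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big").decode("utf-8")


def ctran_sketch(char_len: int, text: str, from_value: int, to_value: int) -> str:
    """Replace every occurrence of one character with another.

    Lengths are counted in characters, as in character semantics mode.
    """
    src = value_to_char(from_value)
    dst = value_to_char(to_value)
    field = text[:char_len].ljust(char_len)
    return field.replace(src, dst)


EUROVAL = 14844588  # UTF-8 E282AC
OLDFLD = "Fee: €25, deposit: €100"
NEWFLD = ctran_sketch(40, OLDFLD, EUROVAL, 49827)
print(repr(NEWFLD.rstrip()))
```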

Alternatively, you could use the following STRREP function to perform the translation. This removes the step of determining the decimal values for each character, but requires you to be able to enter each character on your platform, in this case the euro character (€) and the pound sterling character (£).

NEWFLD/A40 = STRREP(40, OLDFLD, 1, '€', 1, '£', 40, NEWFLD);

For more information on the STRREP function, see STRREP: Replacing Character Strings.
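For comparison, here is a Python model of the STRREP call. This is also a hypothetical sketch, not WebFOCUS code. Each length argument is treated as a character count, and the result is assumed to be truncated or blank-padded to the output length. Because the sketch works on strings rather than single characters, it also handles multi-character search strings.

```python
def strrep_sketch(in_len: int, in_str: str,
                  search_len: int, search_str: str,
                  rep_len: int, rep_str: str,
                  out_len: int) -> str:
    """Replace all occurrences of search_str with rep_str (character lengths)."""
    source = in_str[:in_len]
    search = search_str[:search_len]
    replacement = rep_str[:rep_len]
    if not search:
        raise ValueError("search string must not be empty")
    result = source.replace(search, replacement)
    return result[:out_len].ljust(out_len)


OLDFLD = "Fee: €25, deposit: €100"
print(repr(strrep_sketch(40, OLDFLD, 1, "€", 1, "£", 40).rstrip()))
print(repr(strrep_sketch(40, "Price 25 EUR", 3, "EUR", 3, "GBP", 40).rstrip()))
```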

WebFOCUS
