Grouping Numeric Data Into Ranges

Topics:

How to:

When you sort a report using a numeric sort field, you can group the sort field values together and define the range of each group.

There are several ways of defining groups. You can define groups of:

The FOR phrase is usually used to produce matrix reports and is part of the Financial Modeling Language (FML). However, you can also use it to create columnar reports that group sort field values in unequal ranges.

The FOR phrase displays the sort value for each individual row. The ranges do not have to be contiguous, that is, you can define your ranges with gaps between them. The FOR phrase is described in more detail in Creating Financial Reports With Financial Modeling Language (FML).

Note: If there is not any data for a group, a row for the group still appears in the report.

Syntax: How to Define Groups of Equal Range

{BY|ACROSS} sortfield IN-GROUPS-OF value [TOP limit]

where:

sortfield

Is the name of the sort field. The sort field must be numeric: its format must be I (integer), F (floating-point number), D (decimal number), or P (packed number).

value

Is a positive integer that specifies the range by which sort field values are grouped.

limit

Is an optional number that defines the highest group label to be included in the report.

Example: Defining Groups of Equal Ranges

The following illustrates how to show which employees fall into which salary ranges, and to define the ranges by $5,000 increments.

TABLE FILE EMPLOYEE
PRINT LAST_NAME
BY CURR_SAL IN-GROUPS-OF 5000
END

The output is:

  CURR_SAL  LAST_NAME
  --------  ---------
 $5,000.00  SMITH
            GREENSPAN
$10,000.00  STEVENS
            SMITH
$15,000.00  JONES
            MCCOY
            MCKNIGHT
$20,000.00  ROMANS
            BLACKWOOD
$25,000.00  BANNING
            IRVING
            CROSS

Syntax: How to Define Equal Ranges

{BY|ACROSS} sortfield IN-RANGES-OF value [TOP limit]

where:

sortfield

Is the name of the sort field. The sort field must be numeric: its format must be I (Integer), F (floating-point), D (double-precision), or P (packed).

value

Is an integer greater than zero indicating the range by which sort field values are grouped.

limit

Is an optional number that defines the highest range label to be included in the report. The range is extended to include all data values higher than this value.

Note: IN-RANGES-OF generates an internal sort phrase that must be counted in the total number of sort phrases.

Example: Defining Equal Ranges

TABLE FILE EMPLOYEE
PRINT LAST_NAME
BY CURR_SAL IN-RANGES-OF 5000
END

The output is:

Syntax: How to Define Custom Groups of Data Values

FOR sortfield
begin1 TO end1 [OVER begin2 TO end2 ... ]

where:

sortfield

Is the name of the sort field.

begin

Is a value that identifies the beginning of a range.

end

Is a value that identifies the end of a range.

Example: Defining Custom Groups of Data Values

The following request displays employee salaries, but it groups them in an arbitrary way. Notice that the starting value of each range prints in the report.

TABLE FILE EMPLOYEE
PRINT LAST_NAME
FOR CURR_SAL
9000 TO 13500 OVER
14000 TO 19700 OVER
19800 TO 30000
END

The output is:

Grouping Numeric Data Into Tiles

How to:

Reference:

You can group numeric data into any number of tiles (percentiles, deciles, quartiles, etc.) in tabular reports. For example, you can group student test scores into deciles to determine which students are in the top ten percent of the class, or determine which sales representatives are in the top half of all sales representatives based on total sales.

Grouping is based on the values in the selected vertical (BY) field, and data is apportioned as equally as possible into the number of tile groups you specify.

The following occurs when you group data into tiles:

  • A new column, labeled TILE by default, is added to the report output and displays the tile number assigned to each instance of the tile field. You can change the column heading with an AS phrase. For details on the AS phrase, see Using Headings, Footings, Titles, and Labels.
  • Tiling is calculated within all of the higher-level sort fields in the request, and restarts whenever a sort field at a higher level than the tile field value changes.
  • Instances are counted using the tile field. If the request prints fields from lower level segments, there may be multiple report lines that correspond to one instance of the tile field.
  • Instances with the same tile field value are placed in the same tile. For example, consider the following data, which is to be apportioned into three tiles:

    1 5 5 5 8 9

    In this case, dividing the instances into groups containing an equal number of records produces the following:

    Group

    Data Values

    1

    1,5

    2

    5,5

    3

    8,9

    However, because all of the same data values must be in the same tile, the fives (5) that are in group 2 are moved to group 1. Group 2 remains empty. The final tiles are:

    Tile Number

    Data Values

    1

    1,5,5,5

    2

     

    3

    8,9

Syntax: How to Group Numeric Data Into Tiles

BY [ {HIGHEST|LOWEST} [k] ] tilefield [AS 'head1']
         IN-GROUPS-OF n TILES [TOP m] [AS 'head2']

where:

HIGHEST

Sorts the data in descending order so that the highest data values are placed in tile 1.

LOWEST

Sorts the data in ascending order so that the lowest data values are placed in tile 1. This is the default sort order.

k

Is a positive integer representing the number of tile groups to display in the report. For example, BY HIGHEST 2 displays the two non-empty tiles with the highest data values.

tilefield

Is the field whose values are used to assign the tile numbers.

head1

Is a heading for the column that displays the values of the tile sort field.

n

Is a positive integer not greater than 32,767, specifying the number of tiles to be used in grouping the data. For example, 100 tiles produces percentiles, while 10 tiles produces deciles.

m

Is a positive integer indicating the highest tile value to display in the report. For example, TOP 3 does not display any data row that is assigned a tile number greater than 3.

head2

Is a new heading for the column that displays the tile numbers.

Note:

  • The syntax accepts numbers that are not integers for k, n, and m. On z/OS, values with decimals are rounded to integers; on UNIX and Windows they are truncated. If the numbers supplied are negative or zero, an error message is generated.
  • Both k and m limit the number of rows displayed within each sort break in the report. If you specify both, the more restrictive value controls the display. If k and m are both greater than n (the number of tiles), n is used.

Example: Grouping Data Into Five Tiles

The following illustrates how to group data into five tiles.

TABLE FILE EMPLOYEE
PRINT LAST_NAME FIRST_NAME
BY DEPARTMENT
BY CURR_SAL IN-GROUPS-OF 5 TILES
END

The output is:

Note that the tiles are assigned within the higher-level sort field DEPARTMENT. The MIS category does not have any data assigned to tile 3. The PRODUCTION category has all five tiles.

Example: Displaying the First Three Tile Groups

In this example, the employees with the three lowest salaries are grouped into five tiles.

TABLE FILE EMPLOYEE
PRINT LAST_NAME FIRST_NAME
BY DEPARTMENT
BY LOWEST 3 CURR_SAL IN-GROUPS-OF 5 TILES
END

The output is:

Note that the request displays three tile groups in each category. Because no data was assigned to tile 3 in the MIS category, tiles 1, 2, and 4 display for that category.

Example: Displaying Tiles With a Value of Three or Less

In this example, the employees with the three lowest salaries are listed and grouped into five tiles, but only the tiles that are in the top 3 (tiles 1, 2, or 3) are displayed in the report. Also, the heading for the TILES field has been renamed (using the AS phrase) to DECILES.

TABLE FILE EMPLOYEE
PRINT LAST_NAME FIRST_NAME
BY DEPARTMENT
BY LOWEST 3 CURR_SAL IN-GROUPS-OF 5 TILES TOP 3 AS DECILES
END

The output is:

Because no data was assigned to tile 3 in the MIS category, only tiles 1 and 2 display for that category.

Reference: Usage Notes for Tiles

  • If a request retrieves data from segments that are descendants of the segment containing the tile field, multiple report rows may correspond to one instance of the tile field. These additional report rows do not affect the number of instances used to assign the tile values. However, if you retrieve fields from multiple segments and create a single-segment output file, this flat file will have multiple instances of the tile field, and this increased number of instances may affect the tile values assigned. Therefore, when you run the same request against the multi-level file and the single-segment file, different tile assignments may result.
  • Tiles are always calculated on a BY sort field in the request.
  • Only one tiles calculation is supported per request. However, the request can contain up to five (the maximum allowed) non-tile IN-GROUP-OF phrases in addition to the TILES phrase.
  • Comparisons for the purpose of assigning tile numbers use exact data values regardless of their display format. Therefore, if you display a floating-point value as D7, you may not be showing enough significant digits to indicate why values are placed in separate tiles.
  • The tile field can be a real field or a virtual field created with a DEFINE command or a DEFINE in the Master File. The COMPUTE command cannot be used to create a tile field.
  • Empty tiles do not display in the report output.
  • In requests with multiple sort fields, tiles are supported only at the lowest level and only with the BY LOWEST phrase.
  • Tiles are supported with output files. However, the field used to calculate the tiles propagates three fields to a HOLD file (the actual field value, the tile, and a ranking field) unless you set HOLDLIST to PRINTONLY.
  • Tiles are not supported with BY TOTAL, TABLEF, FML, and GRAPH.

WebFOCUS

Feedback