Configuring Hyperstage

Topics:

Hyperstage is a column-oriented, high performance analytic engine designed for analytic applications and data marts that need fast query response across large data volumes. Hyperstage was designed specifically for large volume data analytics applications with up to 50 Terabytes of data.

Hyperstage Overview

Topics:

Hyperstage uses a unique and patent-pending approach to compressing, storing, and processing data that allows it to be installed and run on commodity hardware with little or no DBA intervention. Hyperstage requires little tuning to support ad hoc or complex business analytic queries.

Hyperstage is a database engine utilizing the PostgreSQL database environment. As such, Hyperstage is fully compatible with all PostgreSQL-compliant Business Intelligence tools and utilizes the PostgreSQL administrative interface to reduce the learning curve for system administrators.

Hyperstage provides a versatile, highly-compressed database system optimized for analytic-type queries. The ratio of possible compression and the speed of data import and retrieval are optimized at the expense of some transactional features of the engine performance, like the frequent data updating.

Hyperstage executes complex or ad hoc queries across vast amounts of data with a low cost of ownership.

Hyperstage and PostgreSQL

Hyperstage combines the Hyperstage storage engine with PostgreSQL server implementation. Hyperstage consists of several layers. The upper layers are provided by the PostgreSQL server implementation, and the lower layers are provided by Hyperstage.

Hyperstage includes both its own optimizer and executor along with the storage engine. The PostgreSQL query engine can be used with Hyperstage. However, since the PostgreSQL storage engine interface is row oriented, it cannot take full advantage of the column orientation or the Knowledge Grid and hence query execution through this path is reduced. Queries will be directed to the Hyperstage optimizer whenever possible.

Hyperstage ships with the full PostgreSQL binaries required. PostgreSQL is used to store catalog information (as with other storage engines). You can use the PostgreSQL instance for other purposes, but joining PostgreSQL and Hyperstage tables may result in reduced performance as the PostgreSQL query engine will be used.

PostgreSQL provides:

  • Mature connectors, tools and resources.
  • Interconnectivity and certification with BI tools.
  • Management services and utilities.

Hyperstage provides:

  • Load function that compresses data.
  • Column-oriented storage engine.
  • Knowledge Grid metadata layer that contains information about the compressed data.
  • Optimizer/executor that uses the Knowledge Grid.

Configuring the Hyperstage Database

Topics:

The following section describes the configuration steps for Hyperstage.

Configuring Hyperstage

Topics:

The Hyperstage configuration file is called infobright.cnf and is located in the ib_data subdirectory within the Hyperstage Data installation directory (for example, C:\ibi\WebFOCUS_BUE82\srv\wfs\hs\ib_data). The configuration file is a text file containing the Hyperstage configuration parameters.

Each parameter is shown on a separate line.

If a parameter is not present in the configuration file or if the configuration file does not exist, the default values are used. Blank lines and comments (lines starting with #) are ignored.

Be sure to customize the following parameters to optimize performance. These tuning parameters are case sensitive and must be typed as shown in the following table.

Note: The values are commented out (preceded by #) in the infobright.cnf file, which causes them to default to the application minimum allowed values of 600 and 320 for ServerMainHeapSize and LoaderMainHeapSize, respectively.

Hyperstage Configuration Parameters

Parameter Name

Description

LicenseFile

Specifies the path or name of the newly required License file.

LogLevel

Controls how much information is written to logs. This is similar to the obsolete ControlMessages parameter.

LogRotateSize

Specifies how large the log file can be before it is rotated and archived.

LogRotateFiles

Specifies how many log archive files are kept.

KNFolder

Specifies the folder where Knowledge Grid is stored.

CacheFolder

Specifies the folder where temporary objects are stored.

ServerMainHeapSize

Specifies the size (in MB) of the main memory heap.

ThrottleLimit

Controls how many SELECT queries can run concurrently.

Configuration Tips and Examples

Important: You must properly configure your memory settings to ensure optimal performance.

The following table shows sample, recommended memory configurations for different systems.

System Memory

Server Main Heap Size

Loader Main Heap Size

64GB

48000

800

48GB

32000

800

32GB

24000

800

16GB

10000

800

8GB

4000

800

4GB

1300

400

2GB

600

320

In most cases, the loader does not benefit from larger memory settings. However, increasing the LoaderMainHeapSize can help when:

  • A table to be loaded has very long text values.

    or

  • The table has many columns (for example, 1000 columns).

You can use more memory at import if you are planning to execute several concurrent load tasks to different data tables. However, disk access may become a bottleneck.

ServerMainHeapSize should be as large as possible, but safely smaller than the amount of physical memory on the machine. If performance decreases because of memory swapping by the operating system, try to set lower heap sizes. We also recommend decreasing the heap size if many users are running queries in parallel.

Note: Hyperstage may use additional memory for heavy loads or queries. Also, other applications on your server will use memory for their processes. It is important that the total of ServerMainHeapSize is less than the total available physical memory. If the system needs to swap memory, performance will be severely impacted.

Using the Hyperstage Database Beyond WebFOCUS

Topics:

The following section describes how to work with the Hyperstage server.

Starting and Stopping the Hyperstage Server

The Hyperstage Server starts and stops automatically when starting and stopping the Business User Edition Reporting Server. The Manager user ID is required to start and stop any component of the Business User Edition.

To manually stop the Hyperstage Server, from the Workspace/Select Special Services and Listeners section of the Reporting Server Web Console, right-click the HYPER service and select Stop, as shown in the following image.

To manually start the Hyperstage server, from the Workspace/Select Special Services and Listeners section of the Reporting Server Web Console, right-click the HYPER service and select Start, as shown in the following image.

Quick Copy For Hyperstage Using Extended Bulk Load Utility

Note: Hyperstage only supports Quick Copy, and Custom Copy as ETL tools.

The Quick Copy tool allows for the copying of all data from a Source table into Hyperstage. The Bulk Load option should be selected in order for data to be loaded quickly. If the Bulk Load option is cleared, the data will take much longer to load.

The Custom Copy tool allows for the copying of selected columns, presorting data within selected columns, and filtering of columns from a Source table into Hyperstage.

To access the Quick Copy tool, right-click the name of the synonym corresponding to the table or data you wish to copy into Hyperstage, and select Quick Copy.

The following configuration setting options are available:

Load Option

New. Recreate the target table before loading the data.

Append Existing Data. Data is loaded to an existing table.

Target Adapter

The list of adapters currently configured on the Reporting Server.

Target Connection

The Hyperstage connection used for the load operation.

Target Synonym Application

The target application on the Reporting Server where the target synonym will be stored.

Target Synonym

The name of the target synonym defining the Target Table Name.

Target Table Name

The name of the Hyperstage table where the data will be loaded.

Bulk Load

When selected, data will be loaded using the Hyperstage Bulk Load functionality. Bulk Load is the recommended approach for loading data into Hyperstage.

When cleared, data will be loaded using Insert/Update. Insert/Update is not recommended and will perform extremely slow.

Managing Hyperstage Tables

Topics:

The following section describes how to work with the Hyperstage tables and lists the data types supported.

About the Hyperstage Database Files

Hyperstage tables are located in the ib_data subdirectory in your Hyperstage installation directory. Within the ib_data subdirectory, Hyperstage databases are stored in separate subdirectories.

Important: Do not manually copy a data table from one database to another by copying the database files. Internal table numbering errors and Knowledge Grid inconsistencies may occur. To copy a table, backup the entire database directory (see Hyperstage Backup and Recovery).

The following path and image shows the content of the ib_data directory, containing the Hyperstage databases webfocus and utf8test, as well as the BH_RSI_Repository directory, which holds the Knowledge Notes:

C:\ibi\WebFOCUS_BUE82\srv\wfs\hs\ib_data

About Supported Data Types

The following data types are supported in Hyperstage. Note that numeric data types ranges are 1 less than the PostgreSQL minimums and maximums.

Numeric Types

Data Type

Minimum

Maximum

BOOLEAN

Values are either 0 or 1.

 

SMALLINT

-32767

32767

INT (INTEGER)

-2147483647

2147483647

BIGINT

-9223372036854775807

-9223372036854775807

REAL

-3.402823466E+38

3.402823466E+38

DOUBLE PRECISION

-1.7976931348623157E+308

1.7976931348623157E+308

Numeric(M, D)

where:

0 < M <= 18 and 0 <= D <= M

-(1E+M - 1) / (1E+D)

(1E+M - 1) / (1E+D)

Date and Time Types

Data Type

Minimum

Maximum

Format

DATE

100-01-01

9999-12-31

YYYY-mm-dd

Time (without timezone)

00:00:00

24:00:00

HH:MM:SS

TIMESTAMP (without timezone)

100-01-01 00:00:00

9999-12-31 23:59:59

YYYY-mm-dd HH:MM:SS

TIME0053TAMP (with timezone)

1970-01-01 00:00:00 in UTC

2038-01-01 00:59:59 in UTC

 

Interval

-178000000 years

178000000 years

 

String Type

Data Type

Maximum Length

BYTEA (binary string)

0 < N <= 65536

CHAR(N)

Fixed-length. Maximum length depends on character set (encoding). 0 < N * B <= 65536 where B is the maximum number of bytes for a single character.

VARCHAR(N)

Maximum length depends on character set (encoding). 0 < N * B <= 65536, where B is the maximum number of bytes for a single character. For example, for UTF-8 it is 4 bytes, so the maximum number of characters that can be stored in a (VAR)CHAR column is 65536 / 4 = 16384

Creating and Dropping Tables

Use the standard PostgreSQL commands to create and drop tables in Hyperstage, the same as you would with a PostgreSQL table. For detailed syntax information, see the PostgreSQL 9.2 Documentation.

Important: Do not manually copy a data table from one database to another by copying the database files. Internal table numbering errors and Knowledge Grid inconsistencies may occur. To copy a table from one database to another, back up the entire database directory (see Hyperstage Backup and Recovery). You can rename the entire database by renaming the folder. However, you should not copy a database folder from one active instance to another, or within the same active instance.

To create a table, enter the following command:

psql> create table <table_name> (<column(s)>) with (ENGINE=INFOBRIGHT);
Note:
  • 'with (ENGINE=INFOBRIGHT)' syntax is necessary when creating tables manually, to specify that the table will be stored as part of the Hyperstage-specific Infobright engine. Without this syntax, the table will be created and stored as a regular PostgreSQL table.
  • When creating a table, as a matter of practice, you should always use the ENGINE= option to ensure that the correct database engine is used. Hyperstage is shipped with DEFAULT ENGINE=INFOBRIGHT, but this can be changed. The name of the engine can be specified explicitly at the end of the create table statement.

To drop a table, enter the following command:

psql> drop table table_name;

For information on supported and unsupported options when creating columns, see About Column Options.

Character Set Support

Topics:

The following section describes the character sets supported by Hyperstage.

Supported Character Sets

Hyperstage storage supports all ANSI and UTF-8 character sets. This means that Hyperstage can store and retrieve data encoded in 8-bit and multi-byte character sets.

Important: Queries that evaluate against UTF-8 character data columns will execute with less performance than an equivalent query against ASCII character data, due to ASCII support of Character Maps in the Knowledge Grid. UTF-8 specific Knowledge Grid extensions will be available in an upcoming release.

Collations and Comparisons

Hyperstage supports all custom UTF-8 collations supported by PostgreSQL:

  • utf8_bin
  • utf8_czech_ci
  • utf8_danish_ci
  • utf8_esperanto_ci
  • utf8_estonian_ci
  • utf8_general_ci (default)
  • utf8_hungarian_ci
  • utf8_icelandic_ci
  • utf8_latvian_ci
  • utf8_lithuanian_ci
  • utf8_persian_ci
  • utf8_polish_ci
  • utf8_roman_ci
  • utf8_romanian_ci
  • utf8_slovak_ci
  • utf8_slovenian_ci
  • utf8_spanish2_ci
  • utf8_spanish_ci
  • utf8_swedish_ci
  • utf8_turkish_ci
  • utf8_unicode_ci*

*utf8_unicode_ci properly handles both French and German collation, so specific collation types for these languages are not necessary.

For more information, see the PostgreSQL 9.2 Documentation.

The SQL standard does not define a default collation. Therefore, many DBMS engines have different default collations and produce different results. As a result, there are several differences between Hyperstage and other DBMS engines.

  • For Hyperstage, character data types are case-sensitive. For example, the condition 'toronto'='Toronto' is not true in Hyperstage. Similarly, the condition, LIKE 'Abc%' is not true for 'abcde'.
  • The Hyperstage sorting order is A…Z a…z (for example 'Zeta' < 'alfa'), which is the same sorting order as used by Oracle. The Hyperstage sorting order is different than the default PostgreSQL sorting order, which mixes lowercase and uppercase. The SQL Server order, which is aAbB…zZ; and the DB2 order, which is AaBb…Zz.
  • The Hyperstage sorting order affects ORDER BY results, GROUP BY results (which is the order of groups and their definitions (for example, 'aaa' and 'AAA' define different groups) and DISTINCT results. WHERE conditions may also be affected if you are expecting a different sorting order than the one used by Hyperstage.
  • To simulate Hyperstage collation in the PostgreSQL engine, set latin1_bin collation while creating a table (for more information, see the PostgreSQL 9.2 Documentation). Enter the following command:
    psql> create table … collate ascii_bin;

Padding

Hyperstage treats padding differently than other DBMS engines. Hyperstage assumes literal comparisons of text fields, including all whitespace characters. Therefore, a string containing two spaces is different than a string containing one space or an empty (0 length) string, which is also different than the NULL value.

The Hyperstage padding definition is compatible with the SQL standard. However, most DBMS systems have defined less restricted, customizable rules regarding text comparison. For example, 'abc ' = 'abc' may be true in some databases, but is not true in Hyperstage.

Note: In CHAR columns, trailing spaces are trimmed on LOAD, INSERT, and UPDATE, whereas in VARCHAR columns values are loaded with all spaces.

Hyperstage Backup and Recovery

Topics:

The following section provides instructions on how to backup and restore the Hyperstage databases.

Backup Procedure

Use the following procedures to back up Hyperstage.

  • To back up the Hyperstage databases, copy the ib_data and pg_data directories.
  • You can take advantage of incremental backups, since only some of the database files are updated when new data is imported. Be sure to do a full backup occasionally.

Important: Some files in the KNFolder are updated when queries (using JOIN) are run, so be sure to back up the KNFolder on a regular basis.

Restore Procedure

To restore the Hyperstage databases from a backup copy, do the following:

  1. Replace the ib_data and pg_data directories with the backup copy.
  2. Replace the KNFolder with the backup copy (if the KNFolder is not inside the data directory).

Important: Do not manually modify database files or move them from one database to another. This may lead to data corruption and unpredictable results.

Functions and Operators

Topics:

The following section lists the functions and operators supported by Hyperstage.

Hyperstage Optimizer Supported Functions and Operators

Comparison Functions and Operators

COALESCE

YES

Control Flow Functions

CASE

YES

COALESCE

TBD

NULLIF

YES

String Functions

BIT_LENGTH

YES

CONCAT

YES

LEFT

YES

LENGTH

YES

LOCATE

YES

LOWER

YES

LPAD

YES

LTRIM

YES

OCTET_LENGTH

YES

POSITION

YES

RIGHT

YES

RPAD

YES

RTRIM

YES

SUBSTR

YES

TRIM

YES

TRUNC

TBD

UPPER

YES

Numeric Functions

Modulo ( % )

YES

ABS

YES

ACOS

YES

ASIN

YES

ATAN2, ATAN

YES

ATAN

YES

CEIL

YES

COS

YES

COT

YES

DEGREES

YES

EXP

YES

FLOOR

YES

LN

YES

LOG

YES

MOD

YES

PI

YES

POWER

YES

RADIANS

YES

RANDOM

TBD

SIGN

YES

SIN

YES

SQRT

YES

TAN

YES

Date and Time Functions

CURRENT_DATE

YES

CURRENT_TIME

YES

DATE

YES

DAY

YES

DAYOFYEAR

YES

FROM_UNIXTIME

YES

HOUR

YES

MINUTE

YES

MONTH

YES

NOW

YES

QUARTER

YES

SECOND

YES

TIME

YES

YEAR

No

Text Search and Other Functions

CAST

YES

MD5

TBD

Group By Aggregate Functions

AVG

YES

BIT_OR

No

BIT_AND

No

COUNT(DISTINCT)

TBD

COUNT

YES

MIN

YES

MAX

YES

STD, STDDEV

YES

STDDEV_POP

YES

STDDEV_SAMP

YES

SUM

YES

VAR_POP

YES

VAR_SAMP

YES

VARIANCE

YES

Hyperstage Data Tools

Topics:

The following section describes the data tools used by Hyperstage.

Hyperstage Consistency Manager

Topics:

How to:

Hyperstage provides a tool to validate Hyperstage-specific metadata structures. The Hyperstage Consistency Manager is an external stand-alone application that can be run against a Hyperstage instance to verify and repair most Hyperstage data structures, including the Knowledge Grid and Data Packs.

If you are seeing unexpected behavior with Hyperstage, such as server crashes, it can help to run the Hyperstage Consistency Manager for information for support and to perform repairs.

Note: Currently, the Hyperstage database must be offline in order to run the Hyperstage Consistency Manager.

Hyperstage Consistency Manager Tests

The Hyperstage Consistency Manager runs tests, as described in the following table.

Test

Description

Delete mask consistency check

Checks that the delete mask headers contain the proper sum for the delete mask body. If any inconsistency is found between the header and body, the Hyperstage Consistency Manager returns the list of blocks of delete mask where inconsistencies were found.

Number of objects in columns equality

Compares the stored number of objects in each column file related to the table. If any inconsistency is found in the number of objects, the Hyperstage Consistency Manager returns the first two columns with different object numbers.

Comparison of maximal value in DIMENSION dictionary versus DPN

Executes only for DIMENSION columns. Compares the maximal key value stored in the DIMENSION column dictionary and in DPNs. If the values differ, the Hyperstage Consistency Manager writes them to the log.

Comparison of number of objects in first-column DPN versus delete mask

Compares the metadata stored in the headers of the delete mask and DPN file related to the number of objects. If any inconsistencies are found, the Hyperstage Consistency Manager returns both numbers. The Hyperstage Consistency Manager compares only the first column because there is an independent test comparing this value between columns. If the test does not find the proper delete mask file or the proper DPN file, the Hyperstage Consistency Manager reports corruption.

Knowledge Grid consistency for column

Checks if the histograms report the proper value of the fixed parameter. A basic test of the Knowledge Node, ensuring the file has a proper format and the type of Knowledge Node corresponds to the column.

Knowledge Grid format for column

Each Knowledge Node is stored in a separate file. This test validates that the header data of each file is in the proper format.

Test for overlapping Data Packs in data files

Checks if there are Data Packs in files that overlap each other. If this situation occurs, the Hyperstage Consistency Manager returns a list of pairs of Data Packs numbers that are overlapping.

Tests of table metadata consistency

Verifies if the table metadata is valid. Includes verification of files used to store items, such as table name, number of columns and their names, types, and constrains like NOT NULL. These are the files created on CREATE TABLE and modified only on ALTER TABLE.

Test of DPNs for non-binary collation

Verifies Data Packs specifically for non-binary collation types (for example: Latin1_swedish.ci). If errors exist, they can be repaired using the Hyperstage Consistency Manager - -repair option.

Syntax: How to Run the Hyperstage Consistency Manager

To view the run options, run Hyperstage Consistency Manager with the - -help flag:

Icm-pure --help

To run Hyperstage Consistency Manager, use the following command:

Icm-pure --datadir=/data_directory_path [parameters]

For example:

c:\ibi\srv77\home06Hyperstage\hs\bin>icm-pure.exe 
  --datadir=C:\HyperstagePG\ib_data --log-file=C:\temp\icm-pure.log

Note: Hyperstage Consistency Manager should be run by the 'postgres' user. It should not be executed by 'root' or any rebuilt knowledge nodes will be owned by root (and cannot be edited), which will result in issues when loading any subsequent data into the 'corrected' tables.

The following table describes the Hyperstage Consistency Manager parameters.

Parameter

Description

- -help

Displays help message and exit.

-V [ - -version ]

Displays version information and exit.

- -basedir

Absolute path to Hyperstage installation directory.

arg

 

- -datadir arg

Absolute path to directory. Mandatory.

- -database arg

Name of database chosen for data integrity testing. Optional. If specified, no other databases will be tested.

- -table arg

Name of table chosen for data integrity testing. Optional. If specified, no other tables will be tested.

- -log-file arg

Prints output to log file. Optional. If not specified, the logs will be printed to the console.

-F [ - -full-check ]

Runs full set of tests (may be time-intensive). Running Hyperstage Consistency Manager without the full-check option will result in a quicker test. However, the "Knowledge Grid consistency for column" test will not be run.

- -repair

Repairs found problems.

- -rebuild-kns

Rebuilds the Knowledge Grid. For more information, see About Rebuilding or Repairing Knowledge Nodes.

- -stop-on-error

Stops tests on first error and report.

- -cleanup

In case of an error in the Hyperstage Consistency Manager repair procedure, this option enables Hyperstage Consistency Manager to manually revert the datadir to its previous state. Running Hyperstage Consistency Manager with the - -cleanup option removes the old DPN files (containing incorrect DPNs) from the datadir and also makes the changes performed by Hyperstage Consistency Manager impossible to undo. If the - -cleanup option is not used, the old DPN files remain in the datadir.

About Rebuilding or Repairing Knowledge Nodes

Executing a rebuild of the Knowledge Nodes (using the - -rebuild-kns option) will run the following tests:

  • Test of table metadata consistency
  • Test of Knowledge Grid format for column
  • Test of Knowledge Grid consistency for column

The - -rebuild-kns option will fix any issues found for the first two tests ("Test of table metadata consistency" and "Test of Knowledge Grid format for column").

You can also use the - -repair option along with the - -full-check option to achieve the same results as - -rebuild-kns. Using either of these methods will rebuild any Knowledge Nodes that have been deleted.

About Cleanup Procedures

The Hyperstage Consistency Manager creates backup files when repairing problems related to "Test of DPNs for non-binary collation" (backup files are not created for any other tests). These backup files can be used to revert back to the original data if the Hyperstage Consistency Manager encounters an error during the repair procedure. To revert to the original data, copy or rename the TAXXXXXDPN.icm_bck files to the TAXXXXXDPN.ctb files (found in the ib_data directory).

WebFOCUS

Feedback