Topics: |
Hyperstage is a column-oriented, high performance analytic engine designed for analytic applications and data marts that need fast query response across large data volumes. Hyperstage was designed specifically for large volume data analytics applications with up to 50 Terabytes of data.
Topics: |
Hyperstage uses a unique and patent-pending approach to compressing, storing, and processing data that allows it to be installed and run on commodity hardware with little or no DBA intervention. Hyperstage requires little tuning to support ad hoc or complex business analytic queries.
Hyperstage is a database engine utilizing the PostgreSQL database environment. As such, Hyperstage is fully compatible with all PostgreSQL-compliant Business Intelligence tools and utilizes the PostgreSQL administrative interface to reduce the learning curve for system administrators.
Hyperstage provides a versatile, highly-compressed database system optimized for analytic-type queries. The ratio of possible compression and the speed of data import and retrieval are optimized at the expense of some transactional features of the engine performance, like the frequent data updating.
Hyperstage executes complex or ad hoc queries across vast amounts of data with a low cost of ownership.
Hyperstage combines the Hyperstage storage engine with PostgreSQL server implementation. Hyperstage consists of several layers. The upper layers are provided by the PostgreSQL server implementation, and the lower layers are provided by Hyperstage.
Hyperstage includes both its own optimizer and executor along with the storage engine. The PostgreSQL query engine can be used with Hyperstage. However, since the PostgreSQL storage engine interface is row oriented, it cannot take full advantage of the column orientation or the Knowledge Grid and hence query execution through this path is reduced. Queries will be directed to the Hyperstage optimizer whenever possible.
Hyperstage ships with the full PostgreSQL binaries required. PostgreSQL is used to store catalog information (as with other storage engines). You can use the PostgreSQL instance for other purposes, but joining PostgreSQL and Hyperstage tables may result in reduced performance as the PostgreSQL query engine will be used.
PostgreSQL provides:
Hyperstage provides:
Topics: |
The following section describes the configuration steps for Hyperstage.
Topics: |
The Hyperstage configuration file is called infobright.cnf and is located in the ib_data subdirectory within the Hyperstage Data installation directory (for example, C:\ibi\WebFOCUS_BUE82\srv\wfs\hs\ib_data). The configuration file is a text file containing the Hyperstage configuration parameters.
Each parameter is shown on a separate line.
If a parameter is not present in the configuration file or if the configuration file does not exist, the default values are used. Blank lines and comments (lines starting with #) are ignored.
Be sure to customize the following parameters to optimize performance. These tuning parameters are case sensitive and must be typed as shown in the following table.
Note: The values are commented out (preceded by #) in the infobright.cnf file, which causes them to default to the application minimum allowed values of 600 and 320 for ServerMainHeapSize and LoaderMainHeapSize, respectively.
Hyperstage Configuration Parameters
Parameter Name |
Description |
---|---|
LicenseFile |
Specifies the path or name of the newly required License file. |
LogLevel |
Controls how much information is written to logs. This is similar to the obsolete ControlMessages parameter. |
LogRotateSize |
Specifies how large the log file can be before it is rotated and archived. |
LogRotateFiles |
Specifies how many log archive files are kept. |
KNFolder |
Specifies the folder where Knowledge Grid is stored. |
CacheFolder |
Specifies the folder where temporary objects are stored. |
ServerMainHeapSize |
Specifies the size (in MB) of the main memory heap. |
ThrottleLimit |
Controls how many SELECT queries can run concurrently. |
Important: You must properly configure your memory settings to ensure optimal performance.
The following table shows sample, recommended memory configurations for different systems.
System Memory |
Server Main Heap Size |
Loader Main Heap Size |
---|---|---|
64GB |
48000 |
800 |
48GB |
32000 |
800 |
32GB |
24000 |
800 |
16GB |
10000 |
800 |
8GB |
4000 |
800 |
4GB |
1300 |
400 |
2GB |
600 |
320 |
In most cases, the loader does not benefit from larger memory settings. However, increasing the LoaderMainHeapSize can help when:
or
You can use more memory at import if you are planning to execute several concurrent load tasks to different data tables. However, disk access may become a bottleneck.
ServerMainHeapSize should be as large as possible, but safely smaller than the amount of physical memory on the machine. If performance decreases because of memory swapping by the operating system, try to set lower heap sizes. We also recommend decreasing the heap size if many users are running queries in parallel.
Note: Hyperstage may use additional memory for heavy loads or queries. Also, other applications on your server will use memory for their processes. It is important that the total of ServerMainHeapSize is less than the total available physical memory. If the system needs to swap memory, performance will be severely impacted.
Topics: |
The following section describes how to work with the Hyperstage server.
The Hyperstage Server starts and stops automatically when starting and stopping the Business User Edition Reporting Server. The Manager user ID is required to start and stop any component of the Business User Edition.
To manually stop the Hyperstage Server, from the Workspace/Select Special Services and Listeners section of the Reporting Server Web Console, right-click the HYPER service and select Stop, as shown in the following image.
To manually start the Hyperstage server, from the Workspace/Select Special Services and Listeners section of the Reporting Server Web Console, right-click the HYPER service and select Start, as shown in the following image.
Note: Hyperstage only supports Quick Copy, and Custom Copy as ETL tools.
The Quick Copy tool allows for the copying of all data from a Source table into Hyperstage. The Bulk Load option should be selected in order for data to be loaded quickly. If the Bulk Load option is cleared, the data will take much longer to load.
The Custom Copy tool allows for the copying of selected columns, presorting data within selected columns, and filtering of columns from a Source table into Hyperstage.
To access the Quick Copy tool, right-click the name of the synonym corresponding to the table or data you wish to copy into Hyperstage, and select Quick Copy.
The following configuration setting options are available:
New. Recreate the target table before loading the data.
Append Existing Data. Data is loaded to an existing table.
The list of adapters currently configured on the Reporting Server.
The Hyperstage connection used for the load operation.
The target application on the Reporting Server where the target synonym will be stored.
The name of the target synonym defining the Target Table Name.
The name of the Hyperstage table where the data will be loaded.
When selected, data will be loaded using the Hyperstage Bulk Load functionality. Bulk Load is the recommended approach for loading data into Hyperstage.
When cleared, data will be loaded using Insert/Update. Insert/Update is not recommended and will perform extremely slow.
Topics: |
The following section describes how to work with the Hyperstage tables and lists the data types supported.
Hyperstage tables are located in the ib_data subdirectory in your Hyperstage installation directory. Within the ib_data subdirectory, Hyperstage databases are stored in separate subdirectories.
Important: Do not manually copy a data table from one database to another by copying the database files. Internal table numbering errors and Knowledge Grid inconsistencies may occur. To copy a table, backup the entire database directory (see Hyperstage Backup and Recovery).
The following path and image shows the content of the ib_data directory, containing the Hyperstage databases webfocus and utf8test, as well as the BH_RSI_Repository directory, which holds the Knowledge Notes:
C:\ibi\WebFOCUS_BUE82\srv\wfs\hs\ib_data
The following data types are supported in Hyperstage. Note that numeric data types ranges are 1 less than the PostgreSQL minimums and maximums.
Numeric Types |
||
---|---|---|
Data Type |
Minimum |
Maximum |
BOOLEAN |
Values are either 0 or 1. |
|
SMALLINT |
-32767 |
32767 |
INT (INTEGER) |
-2147483647 |
2147483647 |
BIGINT |
-9223372036854775807 |
-9223372036854775807 |
REAL |
-3.402823466E+38 |
3.402823466E+38 |
DOUBLE PRECISION |
-1.7976931348623157E+308 |
1.7976931348623157E+308 |
Numeric(M, D) where: 0 < M <= 18 and 0 <= D <= M |
-(1E+M - 1) / (1E+D) |
(1E+M - 1) / (1E+D) |
Date and Time Types |
|||
---|---|---|---|
Data Type |
Minimum |
Maximum |
Format |
DATE |
100-01-01 |
9999-12-31 |
YYYY-mm-dd |
Time (without timezone) |
00:00:00 |
24:00:00 |
HH:MM:SS |
TIMESTAMP (without timezone) |
100-01-01 00:00:00 |
9999-12-31 23:59:59 |
YYYY-mm-dd HH:MM:SS |
TIME0053TAMP (with timezone) |
1970-01-01 00:00:00 in UTC |
2038-01-01 00:59:59 in UTC |
|
Interval |
-178000000 years |
178000000 years |
String Type |
|
---|---|
Data Type |
Maximum Length |
BYTEA (binary string) |
0 < N <= 65536 |
CHAR(N) |
Fixed-length. Maximum length depends on character set (encoding). 0 < N * B <= 65536 where B is the maximum number of bytes for a single character. |
VARCHAR(N) |
Maximum length depends on character set (encoding). 0 < N * B <= 65536, where B is the maximum number of bytes for a single character. For example, for UTF-8 it is 4 bytes, so the maximum number of characters that can be stored in a (VAR)CHAR column is 65536 / 4 = 16384 |
Use the standard PostgreSQL commands to create and drop tables in Hyperstage, the same as you would with a PostgreSQL table. For detailed syntax information, see the PostgreSQL 9.2 Documentation.
Important: Do not manually copy a data table from one database to another by copying the database files. Internal table numbering errors and Knowledge Grid inconsistencies may occur. To copy a table from one database to another, back up the entire database directory (see Hyperstage Backup and Recovery). You can rename the entire database by renaming the folder. However, you should not copy a database folder from one active instance to another, or within the same active instance.
To create a table, enter the following command:
psql> create table <table_name> (<column(s)>) with (ENGINE=INFOBRIGHT);
To drop a table, enter the following command:
psql> drop table table_name;
For information on supported and unsupported options when creating columns, see About Column Options.
Topics: |
The following section describes the character sets supported by Hyperstage.
Hyperstage storage supports all ANSI and UTF-8 character sets. This means that Hyperstage can store and retrieve data encoded in 8-bit and multi-byte character sets.
Important: Queries that evaluate against UTF-8 character data columns will execute with less performance than an equivalent query against ASCII character data, due to ASCII support of Character Maps in the Knowledge Grid. UTF-8 specific Knowledge Grid extensions will be available in an upcoming release.
Hyperstage supports all custom UTF-8 collations supported by PostgreSQL:
|
*utf8_unicode_ci properly handles both French and German collation, so specific collation types for these languages are not necessary.
For more information, see the PostgreSQL 9.2 Documentation.
The SQL standard does not define a default collation. Therefore, many DBMS engines have different default collations and produce different results. As a result, there are several differences between Hyperstage and other DBMS engines.
psql> create table … collate ascii_bin;
Hyperstage treats padding differently than other DBMS engines. Hyperstage assumes literal comparisons of text fields, including all whitespace characters. Therefore, a string containing two spaces is different than a string containing one space or an empty (0 length) string, which is also different than the NULL value.
The Hyperstage padding definition is compatible with the SQL standard. However, most DBMS systems have defined less restricted, customizable rules regarding text comparison. For example, 'abc ' = 'abc' may be true in some databases, but is not true in Hyperstage.
Note: In CHAR columns, trailing spaces are trimmed on LOAD, INSERT, and UPDATE, whereas in VARCHAR columns values are loaded with all spaces.
Topics: |
The following section provides instructions on how to backup and restore the Hyperstage databases.
Use the following procedures to back up Hyperstage.
Important: Some files in the KNFolder are updated when queries (using JOIN) are run, so be sure to back up the KNFolder on a regular basis.
To restore the Hyperstage databases from a backup copy, do the following:
Important: Do not manually modify database files or move them from one database to another. This may lead to data corruption and unpredictable results.
Topics: |
The following section lists the functions and operators supported by Hyperstage.
Topics: |
|
|
|
|
|
|
|
|
Topics: |
The following section describes the data tools used by Hyperstage.
Topics: |
How to: |
Hyperstage provides a tool to validate Hyperstage-specific metadata structures. The Hyperstage Consistency Manager is an external stand-alone application that can be run against a Hyperstage instance to verify and repair most Hyperstage data structures, including the Knowledge Grid and Data Packs.
If you are seeing unexpected behavior with Hyperstage, such as server crashes, it can help to run the Hyperstage Consistency Manager for information for support and to perform repairs.
Note: Currently, the Hyperstage database must be offline in order to run the Hyperstage Consistency Manager.
The Hyperstage Consistency Manager runs tests, as described in the following table.
Test |
Description |
---|---|
Delete mask consistency check |
Checks that the delete mask headers contain the proper sum for the delete mask body. If any inconsistency is found between the header and body, the Hyperstage Consistency Manager returns the list of blocks of delete mask where inconsistencies were found. |
Number of objects in columns equality |
Compares the stored number of objects in each column file related to the table. If any inconsistency is found in the number of objects, the Hyperstage Consistency Manager returns the first two columns with different object numbers. |
Comparison of maximal value in DIMENSION dictionary versus DPN |
Executes only for DIMENSION columns. Compares the maximal key value stored in the DIMENSION column dictionary and in DPNs. If the values differ, the Hyperstage Consistency Manager writes them to the log. |
Comparison of number of objects in first-column DPN versus delete mask |
Compares the metadata stored in the headers of the delete mask and DPN file related to the number of objects. If any inconsistencies are found, the Hyperstage Consistency Manager returns both numbers. The Hyperstage Consistency Manager compares only the first column because there is an independent test comparing this value between columns. If the test does not find the proper delete mask file or the proper DPN file, the Hyperstage Consistency Manager reports corruption. |
Knowledge Grid consistency for column |
Checks if the histograms report the proper value of the fixed parameter. A basic test of the Knowledge Node, ensuring the file has a proper format and the type of Knowledge Node corresponds to the column. |
Knowledge Grid format for column |
Each Knowledge Node is stored in a separate file. This test validates that the header data of each file is in the proper format. |
Test for overlapping Data Packs in data files |
Checks if there are Data Packs in files that overlap each other. If this situation occurs, the Hyperstage Consistency Manager returns a list of pairs of Data Packs numbers that are overlapping. |
Tests of table metadata consistency |
Verifies if the table metadata is valid. Includes verification of files used to store items, such as table name, number of columns and their names, types, and constrains like NOT NULL. These are the files created on CREATE TABLE and modified only on ALTER TABLE. |
Test of DPNs for non-binary collation |
Verifies Data Packs specifically for non-binary collation types (for example: Latin1_swedish.ci). If errors exist, they can be repaired using the Hyperstage Consistency Manager - -repair option. |
To view the run options, run Hyperstage Consistency Manager with the - -help flag:
Icm-pure --help
To run Hyperstage Consistency Manager, use the following command:
Icm-pure --datadir=/data_directory_path [parameters]
For example:
c:\ibi\srv77\home06Hyperstage\hs\bin>icm-pure.exe --datadir=C:\HyperstagePG\ib_data --log-file=C:\temp\icm-pure.log
Note: Hyperstage Consistency Manager should be run by the 'postgres' user. It should not be executed by 'root' or any rebuilt knowledge nodes will be owned by root (and cannot be edited), which will result in issues when loading any subsequent data into the 'corrected' tables.
The following table describes the Hyperstage Consistency Manager parameters.
Parameter |
Description |
---|---|
- -help |
Displays help message and exit. |
-V [ - -version ] |
Displays version information and exit. |
- -basedir |
Absolute path to Hyperstage installation directory. |
arg |
|
- -datadir arg |
Absolute path to directory. Mandatory. |
- -database arg |
Name of database chosen for data integrity testing. Optional. If specified, no other databases will be tested. |
- -table arg |
Name of table chosen for data integrity testing. Optional. If specified, no other tables will be tested. |
- -log-file arg |
Prints output to log file. Optional. If not specified, the logs will be printed to the console. |
-F [ - -full-check ] |
Runs full set of tests (may be time-intensive). Running Hyperstage Consistency Manager without the full-check option will result in a quicker test. However, the "Knowledge Grid consistency for column" test will not be run. |
- -repair |
Repairs found problems. |
- -rebuild-kns |
Rebuilds the Knowledge Grid. For more information, see About Rebuilding or Repairing Knowledge Nodes. |
- -stop-on-error |
Stops tests on first error and report. |
- -cleanup |
In case of an error in the Hyperstage Consistency Manager repair procedure, this option enables Hyperstage Consistency Manager to manually revert the datadir to its previous state. Running Hyperstage Consistency Manager with the - -cleanup option removes the old DPN files (containing incorrect DPNs) from the datadir and also makes the changes performed by Hyperstage Consistency Manager impossible to undo. If the - -cleanup option is not used, the old DPN files remain in the datadir. |
Executing a rebuild of the Knowledge Nodes (using the - -rebuild-kns option) will run the following tests:
The - -rebuild-kns option will fix any issues found for the first two tests ("Test of table metadata consistency" and "Test of Knowledge Grid format for column").
You can also use the - -repair option along with the - -full-check option to achieve the same results as - -rebuild-kns. Using either of these methods will rebuild any Knowledge Nodes that have been deleted.
The Hyperstage Consistency Manager creates backup files when repairing problems related to "Test of DPNs for non-binary collation" (backup files are not created for any other tests). These backup files can be used to revert back to the original data if the Hyperstage Consistency Manager encounters an error during the repair procedure. To revert to the original data, copy or rename the TAXXXXXDPN.icm_bck files to the TAXXXXXDPN.ctb files (found in the ib_data directory).
WebFOCUS | |
Feedback |