About Column Options

Topics:

Hyperstage supports NULL and NOT NULL specifications for columns.

LOOKUP Columns

Hyperstage provides an additional modifier for string data type columns, called a LOOKUP column. The LOOKUP column uses an integer substitution for values. You can declare a LOOKUP column on a CHAR or VARCHAR column to increase its compression and performance in queries. However, to use a LOOKUP column, the CHAR or VARCHAR column should meet the following criteria:

  • There is no fixed upper limit for unique values in the column (cardinality), but the cardinality of the column should be low. The total size of a dictionary, being the total length of all distinct values, will be loaded into RAM (for example, 1 million distinct values that are each 100-characters wide will permanently occupy 100 MB of RAM).
  • The column must contain a large number of duplicate values: the ratio of total number of records to distinct values should be greater than 10.

Typically, a LOOKUP column is useful for fields like state, gender, category, and similar fields where the number of instances is very high, but the number of unique values is very low. To determine the ratio of records to distinct values, determine the number of distinct values using SELECT COUNT (DISTINCT <COLUMN>) FROM… Then, compare this to the number of records using a SELECT COUNT(<COLUMN>) FROM…

Note: Using a LOOKUP on a column where there are more than 10,000 distinct values will result in greatly reduced load speeds.

To declare a column as a LOOKUP column, use the following example syntax when creating a table:

CREATE TABLE (a VARCHAR(200), b VARCHAR(200) LOOKUP=TRUE, c INTEGER) with (engine=infobright);

In this example, a LOOKUP attribute is associated with columns a and b, but column c is a standard integer column.

Note:

  • You can only declare a column as a LOOKUP modifier at the time the table is created. Modifying the column using ALTER TABLE to add or remove the LOOKUP modifier is not supported.
  • In certain releases of Hyperstage, LOOKUP columns were called DIMENSION columns. Columns previously defined as DIMENSION columns will now be associated with LOOKUP columns.

    For backwards compatibility, LOOKUP columns can now be defined by either using the new syntax lookup=true, or the previous syntax dimension=true.

  • Issuing a \d <table name> command will display whether the lookup modifier has been used for any column in the table.

Optimizing Columns for INSERTs

Hyperstage provides an additional modifier for columns to help optimize for INSERT operations, called a for_insert column. The for_insert modifier ensures that the most recent data pack is left uncompressed allowing for faster INSERTs in the case of a large number of single INSERTs with AUTOCOMMIT enabled, or small frequent LOADs (< 10000 rows each) with AUTOCOMMIT enabled.

If you are expecting a large number of individual INSERTs or small frequent LOADs with AUTOCOMMIT enabled, you should consider setting the for_insert modifier on character columns and large numeric columns (for example, 64-bit random identifiers or partNo). Small numeric columns (for example, color number or region ID) can be decompressed and recompressed with ease, and are unlikely to gain performance benefit from the for_insert modifier. For columns marked as lookup, the for_insert modifier may give very little benefit only. For smaller machines, you may wish to leave the for_insert modifier off in order to maximize compression for disk space.

Note: Currently, you can only set the for_insert modifier at the time of table creation. Modifying the column using ALTER TABLE to add or remove the for_insert modifier will be supported in a future release.

To declare a column as a for_insert column, add the comment 'for_insert' on the column. Enter the following command:

psql> create table …
(…
<<column name>> <<column type>> … comment 'for_insert'  …
…)
engine=infobright;

Issuing a SHOW CREATE TABLE command will display if the 'for_insert' modifier has been used for each column.

Unsupported Column Options

The following column options are ignored by Hyperstage:

  • Default values.
  • References to other tables.

Unsupported Indices Options

Hyperstage uses Knowledge Grid technology instead of standard indices and does not support explicit indices. The following elements of CREATE TABLE syntax related to indices are not allowed:

  • Keys
  • Indices
  • Unique columns
  • Auto-increment columns

WebFOCUS

Feedback