After the data targets have been added to the flow, there are two types of options that govern the behavior of the data flow when copying data into the targets.
Data flow properties include:
A commit value of 0 indicates that a commit is only issued when the data flow completes.
Note: A commit value of NONE can be used to indicate that DataMigrator should not issue any commit. This may be useful in cases where two or more flows should complete successfully before any data is committed to the target tables. A commit (or rollback) should be issued in a stored procedure after the data flows are complete. This option should be used with caution because if no commit or rollback is issued, the target table can become locked.
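The effect of the commit interval can be sketched as follows. This is a minimal Python sketch with a hypothetical `db` connection object, not DataMigrator code:

```python
def load_rows(db, rows, commit_every):
    """Sketch of the commit interval behavior. `db` is a hypothetical
    connection object with insert() and commit() methods.

    commit_every > 0     -- commit after every N rows (and any remainder)
    commit_every == 0    -- commit once, when the flow completes
    commit_every is None -- "NONE": never commit here; a stored procedure
                            must issue the COMMIT (or ROLLBACK) afterwards
    """
    inserted = 0
    for row in rows:
        db.insert(row)
        inserted += 1
        if commit_every and inserted % commit_every == 0:
            db.commit()
    # Final commit: always for 0, remainder rows only for a positive interval.
    if commit_every == 0 or (commit_every and inserted % commit_every):
        db.commit()
    return inserted
```

With NONE (`None` here), no commit is ever issued by the flow, which is why the target table can remain locked until a commit or rollback is issued elsewhere.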
Target-specific properties include:
Load options specify the way DataMigrator loads data into the targets. If you are moving data to more than one target, they will all use the same load type. The load type is displayed on all target objects and, if changed in one target, the change will be reflected in all the target objects.
Load options vary depending on the type of target. They can include:
If you select this option, you can set the behavior when duplicate records are found using the If the record exists drop-down menu. Insert/Update options can be set on a per-target basis.
If the record exists:
If the record does not exist:
Note: When loading relational data targets, the Reject, Update, and Delete options can adversely affect performance, because they determine the existence of a key value on an incoming record before performing the specified action. This is done by issuing a SELECT command against the target table, and then waiting for the response. If one of these actions is required, try to limit the volume of records in the incremental change. These actions will perform best if there are unique keys on the table.
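The existence check described in the note amounts to one SELECT per incoming record before the action is applied. A minimal Python sketch of that logic, using a hypothetical `target` table with `id` and `val` columns (not DataMigrator internals):

```python
def apply_incoming(cur, record, if_exists="UPDATE"):
    """Sketch of the per-record existence check. A SELECT is issued
    against the target for every incoming record, which is why the
    Reject, Update, and Delete options can be slow on large loads."""
    cur.execute("SELECT 1 FROM target WHERE id = ?", (record["id"],))
    exists = cur.fetchone() is not None
    if exists:
        if if_exists == "REJECT":
            return "rejected"
        if if_exists == "UPDATE":
            cur.execute("UPDATE target SET val = ? WHERE id = ?",
                        (record["val"], record["id"]))
            return "updated"
        if if_exists == "DELETE":
            cur.execute("DELETE FROM target WHERE id = ?", (record["id"],))
            return "deleted"
    # When the record does not exist, this sketch simply inserts it.
    cur.execute("INSERT INTO target (id, val) VALUES (?, ?)",
                (record["id"], record["val"]))
    return "inserted"
```

A unique key on `id` lets the database answer the SELECT with an index lookup, which is why these actions perform best when the table has unique keys.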
If you select Update the existing record or Delete the existing record from the If the record exists drop-down menu, you can also set behavior when the record does not exist using the If the record does not exist drop-down menu:
Note: This option is only available for Db2/UDB CLI, MySQL, MS SQL Server, Oracle, Informix, Sybase ASE, and Teradata 13. For other databases, the number of records in a block will default to 1 and cannot be changed. For details about loading relational targets, see How to Set Options for Relational Targets, and Target Properties Window for Relational Targets.
Insert records from memory:
The options available will depend on the RDBMS of your target type.
Note that rows rejected by a DBMS vendor bulk load program are not counted as errors. Users of the bulk loader should consult the vendor-specific documentation for details on how the bulk load program handles errors.
The options available will depend on the RDBMS of your target type. For more information on the options available for each adapter, see Extended Bulk Load Utility Options.
Note: Does not provide row counts (the number of records inserted or rejected) in the detail log or statistics.
Note: This setting is not available if the Optimize Load option is enabled in the Data Flow properties.
Note: This setting is not available if the Optimize Load option is enabled in the Data Flow properties.
Note: This setting is not available if the Optimize Load option is enabled in the Data Flow properties.
This option uses a database loader program with additional capabilities and is currently available for specific adapters. When extended bulk load is used for large data volumes, faster throughput is achieved by creating multiple load sessions (or threads). It reads from the source in parallel, a block at a time, where the block size is set by the Commit every rows parameter. It also creates multiple load sessions, if the database load utility supports that functionality, up to the number specified in the Maximum number of load sessions parameter.
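The parallel behavior described above can be sketched as follows, with Python threads standing in for load sessions (`read_block` and `load_block` are hypothetical callables, not DataMigrator APIs):

```python
from concurrent.futures import ThreadPoolExecutor

def bulk_load(read_block, load_block, max_sessions=10):
    """Sketch of extended bulk load: read blocks from the source and
    hand each block to one of up to `max_sessions` concurrent load
    sessions. `read_block` yields blocks of rows; `load_block` loads
    one block and returns the number of rows it loaded."""
    loaded = 0
    with ThreadPoolExecutor(max_workers=max_sessions) as pool:
        futures = [pool.submit(load_block, blk) for blk in read_block()]
        for f in futures:
            loaded += f.result()
    return loaded
```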
Note: The options available will depend on the RDBMS of your target type.
An example of the Extended Bulk Load Utility options for MS SQL Server can be seen in the following image:
The following option is available for every adapter that has an Extended Bulk Load Utility load type.
Number of rows in a block. Use at least the default value of 1,000,000.
The following parameters depend on the database:
The maximum number of concurrent load sessions logged on to the database. This parameter is applicable when the BULKLOAD parameter is set to ON. It can be set at the flow level and at the adapter level. The adapter-level setting applies to newly designed data flows, while the value set at the flow level takes precedence during execution.
The default value is 10. However, you should set the value carefully, considering current environment factors such as the number of CPU cores, available memory, the data access agent parameters on the server, network capabilities, and the number of users running requests.
The maximum number of restart attempts after a recoverable load session error.
The above three options are the only Extended Bulk Load options available for the following adapters:
The following sections describe the additional options that are available for each adapter.
Apache Hive, Cloudera Impala, and Salesforce.com targets only have the default Extended Bulk Load Utility option, as shown in the following image.
Note: For the best performance when loading Apache Hive or Cloudera Impala, it is recommended that the DataMigrator Server be located on a data node or Edge Node (with the Hadoop client installed) of your Hadoop cluster.
However, it is possible to use DataMigrator to load data into Hadoop even when the servers are not co-located. In this case, the DataMigrator Server creates a temporary file and uses FTP to copy it to the Hadoop node. This requires that an FTP server is available on the Hadoop cluster and, for use with Impala, an REXEC daemon.
Using Extended Bulk Load with several adapters requires that intermediate data files are transferred to the server where the database is running or a data node of a Hadoop cluster. These adapters include:
For these adapters, you must enable the Flat File adapter, create a connection to the server using FTP or SFTP, and configure Bulk Load to stage using FTP. From the DMC or Web Console, perform the following steps:
The Configure Bulk Load dialog box opens.
Note: For Hive and Drill, this must be a data node of your Hadoop cluster. For Impala, it must be a data node or an edge node. For Jethro, it must be the system where the Jethro server is running. For EXASol, it should not be the system where EXASol is running, but another system that EXASol can access using FTP.
The following message should appear: User has read access to the FTP server.
Note: Write access is also required, although it is not tested at this time. A future release will add a test for write access.
Greenplum targets have the following additional options, as shown in the following image.
Delimiter characters between fields in the intermediate flat file that is loaded into the database. The delimiter can be up to four characters and can be specified as the following:
Specifies the single character that is used for escape sequences, and for escaping data characters that might otherwise be interpreted as row or column delimiters.
Characters used to enclose each alphanumeric value in the intermediate flat file that is loaded into the database. The enclosure consists of up to four printable characters. The most common enclosure is one double quotation mark. Numeric digits, symbols used in numbers, such as a period (.), plus sign (+), or minus sign (-), and 0x09 (Horizontal Tab), 0x0A (Line Feed), or 0x0D (Carriage Return) cannot be used in the enclosure sequence. To specify a single quotation mark as the enclosure character, you must enter four consecutive single quotation marks.
Note: When a non-default value is used, the intermediate file is created in what Greenplum calls CSV format, instead of TEXT format.
These options are only available when BULKLOAD=ON and apply to all new flows using Greenplum as the target.
Note: ALIAS_CASE controls the case of column names when new tables are created. Greenplum users may want to select Enforce lower case.
Informix targets have the following additional options, as shown in the following image.
Delimiter characters between fields in the intermediate flat file that is loaded into the database. The delimiter can be up to four characters and can be specified as the following:
Jethro targets have the following additional options, as shown in the following image.
Delimiter characters between fields in the intermediate flat file that is loaded into the database. The delimiter can be up to four characters and can be specified as the following:
MS SQL Server ODBC targets have the following additional options, as shown in the following image.
The packet size in bytes.
PostgreSQL targets have the following additional options, as shown in the following image.
Delimiter characters between fields in the intermediate flat file that is loaded into the database. The delimiter can be up to four characters and can be specified as the following:
Specifies the single character that is used for escape sequences, and for escaping data characters that might otherwise be interpreted as row or column delimiters.
Characters used to enclose each alphanumeric value in the intermediate flat file that is loaded into the database. The enclosure consists of up to four printable characters. The most common enclosure is one double quotation mark. Numeric digits, symbols used in numbers, such as a period (.), plus sign (+), or minus sign (-), and 0x09 (Horizontal Tab), 0x0A (Line Feed), or 0x0D (Carriage Return) cannot be used in the enclosure sequence. To specify a single quotation mark as the enclosure character, you must enter four consecutive single quotation marks.
Note: When a non-default value is used, the intermediate file is created in what PostgreSQL calls CSV format, instead of TEXT format.
These options are only available when BULKLOAD=ON and apply to all new flows using PostgreSQL as the target.
Note: ALIAS_CASE controls the case of column names when new tables are created. PostgreSQL users may want to select Enforce lower case.
Sybase ASE and Sybase IQ targets have the following additional options, as shown in the following image.
Is the delimiter character(s) used between fields in the intermediate flat file that is loaded into the database.
The delimiter can be up to four characters and can be specified as:
TAB. A tab character. This is the default value.
a - A. A character string. For example, ~.
0xnn. A hex code, for example, 0x44 (a comma) or 0x0D0A (a return and a linefeed).
Is the delimiter character used between records in the intermediate flat file that is loaded into the database. The row delimiter can be specified in the same manner as the (field) delimiter, except that the comma character (,) is not permitted.
This parameter is available for Sybase IQ only.
Activates logging of information about integrity constraint violations and the types of violations. With this option, two types of log files can be written to temporary space in the configuration directory: the MESSAGE LOG file etlblk.msg and the ROW LOG file etlblk.log.
For Sybase IQ:
allow_read_client_file = on
allow_write_client_file = on
This can be confirmed by using the following Sybase stored procedure:
sp_iqcheckoptions
If these permissions are not set, the flow will fail and display the following error:
FOC1261 PHYSICAL INTERFACE NOT FOUND FOR SUFFIX SQLSYB. MODULE NAME : dbodbc12_r
1010 targets have the following additional options, as shown in the following image.
Delimiter characters between fields in the intermediate flat file that is loaded into the database. The delimiter can be up to four characters and can be specified as the following:
The Teradata Extended Bulk Load Utility uses TPT (Teradata Parallel Transporter) to load data.
Teradata targets have the following additional options, as shown in the following image.
The maximum number of rejected records that can be stored in one of the error tables before the job is terminated. If unspecified, the default is unlimited.
The Teradata Director Program Id.
The Account Id used to access the database.
Name of table to contain records that were rejected because of data conversion errors, constraint violations, or AMP configuration changes. This must be a new table name. The default is the database default.
Name of table to contain records that violated the unique primary index constraint. This must be a new table name. The default is the database default.
Db2 targets have the following options:
INSERT. Insert the rows only. This is the default.
REPLACE. Drops the existing destination table, and creates a new table.
The Bulk Insert API (FASTLOAD) can be in effect for Db2 targets with the load option Insert records from memory. The command that changes the settings for the Db2 FASTLOAD is:
ENGINE DB2 SET FASTLOAD [ON|OFF]
If effective, the default value for FASTLOAD is ON.
A SET command for Db2 to control whether or not the FASTLOAD operations are recoverable is:
SQL DB2 SET FASTLOAD_OPTION RECOVERABLE/NONRECOVERABLE
The default value is RECOVERABLE.
There are no additional bulk load options for MySQL.
Nucleus targets have the following options:
Starts the server in exclusive mode. No other database sessions may be established.
The server must already be started in multi-user mode and listening on the default port 1285.
Creates a new file.
Appends to existing log.ndl file.
Issues a commit.
Issues a rollback.
Number of bytes to skip before loading data.
Specifies the character to represent null.
Number of rows to skip before loading data.
Number of rows to process before stopping.
Disconnects connections.
Skips rows with unprintable characters.
NDL server password. Required in Single User (ndls) mode if a server password exists.
Red Brick targets have the following options:
Row number to begin copying data to the target.
Row number to end copying data to the target.
The combination of language and location.
The maximum number of discarded records allowed before loading stops. The default is the target RDBMS default.
File where discarded duplicate records are stored for possible reloading.
File where records discarded due to referential integrity violations are stored for possible reloading.
INSERT. Insert the rows only. If the table is not empty, the load operation ends. This is the default.
APPEND. Used to insert additional rows of data into an existing table. Each new row must have a primary-key value that does not already exist in the table. Otherwise, the record is discarded.
REPLACE. Replaces the entire contents of a table.
MODIFY. Used to insert additional rows or to update existing rows in a table. If the input row has the same primary-key value as an existing row, the new row replaces the existing row. Otherwise, it is added as a new row.
UPDATE. Updates existing rows in an existing table. Each new row must have a primary-key value that is already present in the table. Otherwise, the record is discarded.
MODIFY AGGREGATE. If the primary key of the input row matches an existing row in the table, the existing row is updated as defined for the specified aggregate operator. If the primary key of the input row does not match an existing row in the table, the row is inserted.
UPDATE AGGREGATE. If the primary key of the input row does not match the primary key of a row already in the table, the input row is discarded. If it does match an existing row, the existing row is updated as defined for the specified aggregate operator.
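The difference between MODIFY AGGREGATE and UPDATE AGGREGATE can be sketched as follows. This is a Python sketch with a dict standing in for the table and an ADD-style aggregate operator by default; it is not Red Brick code:

```python
def modify_aggregate(table, row, key, agg=lambda old, new: old + new):
    """MODIFY AGGREGATE sketch: if the input row's primary key matches
    an existing row, the existing value is updated with the aggregate
    operator; otherwise the row is inserted."""
    if row[key] in table:
        table[row[key]] = agg(table[row[key]], row["value"])
    else:
        table[row[key]] = row["value"]

def update_aggregate(table, row, key, agg=lambda old, new: old + new):
    """UPDATE AGGREGATE sketch: matching rows are aggregated, and
    non-matching input rows are discarded."""
    if row[key] in table:
        table[row[key]] = agg(table[row[key]], row["value"])
```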
OFF. Indexes are updated when each input row is inserted into the data file, which provides better performance when the data being loaded contains many duplicate rows. This is the default.
ON. Overrides the global optimize mode setting in the rbw.config file.
Teradata targets have the following options:
Note: Teradata has two bulk load programs that DataMigrator supports. FastLoad is used for new target tables, and MultiLoad is used for existing target tables. Not all options shown in the image above are available for new target tables.
Two environment variables, $FASTLOAD_EXE and $MLOAD_EXE, should be set to use bulk load.
$FASTLOAD_EXE. Specifies the location of the Teradata FastLoad utility.
$MLOAD_EXE. Specifies the location of the Teradata MultiLoad utility.
Two lines similar to the following should be added to the profile or server startup shell script (edastart):
export FASTLOAD_EXE=~teradata/fastload
export MLOAD_EXE=~teradata/mload
The maximum number of rejected records allowed before loading stops. The default is the target RDBMS default.
The Teradata Director Program Id.
The account Id used to access the database.
The maximum number of MultiLoad or FastLoad sessions logged on to the Teradata database.
Row number to begin copying data to the target.
Row number to end copying data to the target.
Name of the work table.
This table provides information about all errors that occur during the acquisition phase of your Update operator job, as well as some errors that occur during the application phase if the Teradata RDBMS cannot build a valid primary index.
This table provides information about uniqueness violations, field overflow on columns other than primary index fields, and constraint errors.
INSERT. Inserts the rows only. This is the default.
UPSERT. Does inserts for missing update rows.
Note: This option is available for existing target tables with keys.
The MultiLoad response to a down Access Module Processor (AMP) condition.
ALL. Pauses the MultiLoad job when an AMP is down.
NONE. Allows the MultiLoad job to start, restart, or continue as long as no more than one AMP is down in a cluster.
APPLY. Inhibits the MultiLoad job from entering or exiting the application phase when an AMP is down.
Note: This option is available for existing targets only.
Restarts the log table for the MultiLoad checkpoint information. This option is available for existing target tables only.
Enter an annotation for this target.
Setting the data flow record logging options allows you to write particular types of transactions to log files.
Although Record Logging options appear in every data target, they are set on a per flow basis. Changing the options in one target will reset them for all targets.
The Properties pane opens.
Record logging is optional and is not available when the data target is a formatted file.
For more information on record logging options for a data flow, see How to Set the Flow Record Logging Properties.
To access the Target Properties pane, right-click a target object and click Properties. The following image shows the Properties pane for an existing Relational target.
All targets have the following properties:
By default, shows the application name (if enabled) and the synonym name. Controls the name that is shown under the object in the flow.
Enter an annotation for this target.
Is the target type.
Is the target adapter.
Is the synonym name.
Note: Synonym names cannot be longer than eight characters.
The following sections describe the Target Options available for each different target.
The Target Options section of the Properties pane for relational targets contains the following additional options:
Is the connection for the data target. For a relational data target, this is a database server. For ODBC, this is a data source.
Is the name of the data target table.
Are the names of the key columns, in order. This option is only available for new targets.
Does not delete the records already in a data target.
Deletes all rows and creates a database log.
Deletes all rows from the table, but does not generate a database log. This is a faster option than using Delete all rows from table.
The Target Load Options include the following load options:
Note: Although Load Type appears in every data target, it is set on a per flow basis. Changing the type in one target will reset it for all targets.
Specifies the behavior of DataMigrator while loading data.
Insert/Update options can be set on a per target basis.
If you select this option, you can set a behavior when duplicate records are found from the If the record exists drop-down menu.
Note: If you select Include the record, the record is passed directly to the relational database, which determines whether to accept it or not. If inserting a record would result in a duplicate key, the RDBMS will reject it due to a Unique Index constraint violation and return an error. Processing continues even if such errors occur, up to the number of errors specified under Stop processing after __ DBMS errors in the General properties of the flow.
If you select Update the existing record or Delete the existing record from the If the record exists drop-down menu, you can also set behavior when the record does not exist using the If the record does not exist drop-down menu:
Note: The Reject, Update, and Delete options can adversely affect performance because they determine the existence of a key value on an incoming record before performing the specified action. This is done by issuing a SELECT command against the target table, then waiting for the response. If one of these actions is required, try to limit the volume of records in the incremental change. These actions will perform best if there are unique keys on the table.
When you specify this load type, you need to specify:
Note: The value for Commit every rows should be a multiple of the Block size, for example, 10000 and 2000.
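The relationship between the two values can be sketched as a simple check (hypothetical Python, not a DataMigrator API):

```python
def blocks_per_commit(commit_every, block_size):
    """The Commit every rows value should be a multiple of the Block
    size, so that each commit covers a whole number of blocks (for
    example, 10000 rows committed as five blocks of 2000)."""
    if commit_every % block_size != 0:
        raise ValueError("Commit every rows should be a multiple of Block size")
    return commit_every // block_size
```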
These options will depend on your target type.
Uses a database loader program with additional capabilities. For more information, see Extended Bulk Load Utility Options.
This option is only available for existing relational targets.
Enables you to load a data target with column values that change over time. There are two column types. Type I simply overwrites the value. Type II allows you to track activity before and after the change. Type II changes are handled using either an activation flag or a begin date/end date pair. When you update a Type II column, you do not actually change its value. Instead, you set the activation flag to inactive, or the end date to the current date. You then add a new record with the new value, with the activation flag active (the default) or the begin date set.
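The Type II handling described above (the activation-flag variant) can be sketched as follows. This is a Python sketch over a list of dict rows, not DataMigrator internals:

```python
def apply_scd2(history, key, new_value):
    """Sketch of a Type II change using an activation flag: the current
    row is retired (active set to False) and a new active row is
    appended, so prior values remain available for reporting."""
    for row in history:
        if row["key"] == key and row["active"]:
            if row["value"] == new_value:
                return history          # no change, nothing to do
            row["active"] = False       # retire the current version
    history.append({"key": key, "value": new_value, "active": True})
    return history
```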
Enables you to load a data target with only the records that have changed. This feature is an optional, add-on component to DataMigrator. When this option is selected, the Prior to Load option is eliminated.
This load type is a combination of the previous two load types. The Insert/Update/Delete indications in the CDC logs are not applied directly to the target. Instead, the SCD Types of the target are used and deleted records in the source are marked as inactive in the target.
The target Properties pane opens.
Note: Truncate table is not supported by Db2 on IBM i.
This option:
Note: The IUD Processing option is only available for existing relational targets.
The Target Options section of the Properties pane for FOCUS/FDS or XFOCUS targets contains the following additional options:
Is the name of the data file pointed to by the synonym.
On IBM z/OS, to create a file in HFS (hierarchical file system), enter the name. To create a dataset, enter the name as
//'qualif.tablename.FOCUS'
where:
qualif
Is a fully qualified location.
tablename
Should match the synonym name.
Is the number of key columns. This option is only available for new targets.
These options are only available for existing targets.
Does not drop (delete) the data target.
Drops and recreates the data target.
Specifies the method DataMigrator uses to load data.
Since DataMigrator uses Insert/Update to load FOCUS/FDS or XFOCUS targets, you can set a behavior when duplicate records are found from the If the record exists drop-down menu. These options can be set on a per target basis.
If you select Update the existing record or Delete the existing record from the If the record exists drop-down menu, you can also set behavior when the record does not exist using the If the record does not exist drop-down menu:
The Target Properties pane opens.
Include the record includes the key in the data target.
DataMigrator can create a new XML document with name/value pairs. For more information on any other structures, see Creating a Data Flow Using a Target Based on a Predefined XML Schema.
Note: In order to create an XML document, a Server Administrator must first configure an adapter for XML.
The Target Options section of the Properties pane for XML targets contains the following additional options:
Is the name of the XML document described by the synonym. DataMigrator will also create an XML Schema Definition with the same name as the data file and an extension of .xsd.
Is set to Insert/Update for XML documents. Note that no updates are currently performed.
Is the name of the top level element.
Is the name of the row level element.
Is the name of the column level element.
The following example uses the same table ibisamp/dminv as a source with all the actual columns, and then creates a new XML target called dminvx. The element names are entered as shown:
Records - Inventory
Record - Item
Column - Detail
The first three rows of the resulting XML document would look like the following example when viewed in a browser:
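A minimal sketch of what such a name/value document could look like, generated with Python's ElementTree (the column names and values shown in the test data are hypothetical, not taken from dminv):

```python
import xml.etree.ElementTree as ET

def rows_to_xml(rows, top="Inventory", row_tag="Item", col_tag="Detail"):
    """Build an XML document of name/value pairs: one row-level element
    per source row, one column-level element per column value."""
    root = ET.Element(top)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for name, value in row.items():
            col = ET.SubElement(item, col_tag, name=name)
            col.text = str(value)
    return ET.tostring(root, encoding="unicode")
```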
The Target Options section of the Properties pane for delimited flat file targets contains the following additional options:
Note: Delimited Flat File will only appear as a target type if it is configured on the server.
Select the connection (FTP or SFTP server) where the data should be written.
Is the name of the data file pointed to by the synonym.
On IBM z/OS, to create a file in HFS (hierarchical file system), enter the name. To create a dataset, enter the name as
//'qualif.tablename.DATA'
where:
qualif
Is a fully qualified location.
tablename
Should match the synonym name.
Specifies a code page for the target data.
Is the delimiter character(s) used between fields. The delimiter can be up to 30 characters. It can be specified as:
TAB. A tab character. This is the default.
a. A character string, for example ~.
0xnn. A hex code, for example, 0x44 (a comma), or 0x0D0A (a return and a linefeed). The hex code uses ASCII for Windows or UNIX systems and EBCDIC for IBM Mainframes.
Inserts column headings as the first row of data and surrounds the column names with the character specified in the Enclosure field. When set to Yes, the FIELDNAME is used for the header.
You can change this to use the TITLE as the header instead. To do so, expand the Adapters folder, right-click Delimited Flat File (CSV/TAB), and click Change Settings. From the HEADER drop-down menu, select Build header based on field title and click Save.
This character is used to surround the column headings when Header is set to Yes.
The Target Options section of the Properties pane for flat file targets contains the following additional options:
The connection configured for the selected adapter.
Is the name of the data file pointed to by the synonym.
On IBM z/OS, to create a file in HFS (hierarchical file system), enter the name. To create a dataset, enter the name as
//'qualif.tablename.DATA'
where:
qualif
Is a fully qualified location.
tablename
Should match the synonym name.
Specifies a code page for the target data.
This option is only available for existing targets.
Does not delete the data target.
Deletes and recreates the data target.
The Target Options section of the Properties pane for formatted file targets contains the following additional options:
The connection configured for the adapter. The default value is <local>.
Is the format of the target data.
Is the name of the data file pointed to by the synonym.
The Target Properties pane opens.
No changes does not delete the data target. New records are appended.
Delete File drops and recreates the data target.
In order to insert or update rows in a Salesforce object, you can use a custom field identified as an External ID field.
The Account (or other object) Fields page opens. The new field is shown with its API name, for example, Account_ID__c.
The synonyms created for Salesforce.com objects identify the ID field as the key. To use the External ID field as the key, you must edit the generated synonym and save a copy of it.
Using the alternative synonym you can now use the External ID field to identify rows to be inserted or updated as you would with a table in a relational database. Salesforce.com will automatically do an upsert; if a value of the External ID field matches a value already in the object being loaded, an update will occur. If not, it will do an insert.
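The upsert behavior can be sketched as follows, with a dict keyed by the External ID field standing in for the Salesforce object (a Python sketch, not the Salesforce API):

```python
def upsert_by_external_id(store, ext_id, fields):
    """Sketch of Salesforce upsert-by-External-ID semantics: if the
    External ID value already exists in the object, the row is updated;
    otherwise it is inserted. `store` is a hypothetical dict keyed by
    the External ID field (for example, Account_ID__c)."""
    if ext_id in store:
        store[ext_id].update(fields)
        return "updated"
    store[ext_id] = dict(fields)
    return "inserted"
```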
The internal ID field is used to delete rows from a Salesforce.com object. To delete rows using the External ID, create a data flow that performs a DB_LOOKUP. You can also create a data flow that joins a table with a list of External ID values to be deleted to the Salesforce.com object to obtain the values of the corresponding ID field.