Topics: |
Data Profiling provides data characteristics for the columns in a synonym. You can display the characteristics for all the columns in a synonym or segment, or for an individual column.
For alphanumeric columns, Data Profiling provides the segment, format, count of distinct values, total count, patterns count, maximum, minimum, and average length, minimum and maximum values, and number of nulls. Patterns count shows the number of patterns found in each alphanumeric column.
For numeric columns, Data Profiling provides the segment, format, count of distinct values, total count, maximum, minimum, and average values, and number of nulls.
Data Profiling for an individual column provides access to Statistics, Patterns, Values, and Outliers reports.
How to: |
Data Profiling provides information on all the columns in a synonym or segment. You can also drill down to the Values or Patterns reports for an individual column from a synonym or segment Data Profiling report.
Note: Data Profiling is also available from the navigation pane by right-clicking a synonym, pointing to Data Profiling, and then clicking Statistics.
To view the Data Profiling information for a synonym or segment:
The Synonym Editor opens to the Field View tab.
The Data Profiling information displays in the workspace. The last four columns are shown below the rest of the information for illustrative purposes only. The actual report runs across the workspace.
You may use the Report tab in the ribbon to view server messages, print the report, copy data as text, and export the report.
For pattern analysis, a 9 represents a digit, an A represents any uppercase letter, and an a represents any lowercase letter. All printable special characters are represented by themselves, and unprintable characters are represented by an X.
Note: Data Profiling is also available from the navigation pane by right-clicking a synonym.
Key Analysis provides a report that shows which columns in a data source can be used individually, or in combination, to uniquely identify a row. The columns identified in this report are candidates for key columns.
Note: Key Analysis is also available from the navigation pane by right-clicking a synonym, pointing to Data Profiling, and then clicking Key Analysis.
To view key analysis for a synonym or segment:
The Key Analysis dialog box opens.
The Key Analysis report for that synonym box opens.
The report shows:
The selected segment.
Name of the segment.
The format of each column.
The number of elements (columns) shown.
The number of rows.
The number of distinct rows.
The percentage of rows that are distinct. This value must be 100% for a combination of columns to be used as key.
The number of duplicate values.
The percentage of duplicate values. This value must be 0% for a combination of columns to be used as key.
By default, the report is sorted by the number of elements, so the first rows in the report show one element each. This enables you to determine if any single column could be used by itself as a key. The report then shows all combinations of two columns, three columns, and so on.
To see the values in the report, right-click on any row.
The duplicate rows option shows all duplicate values, which prevent the desired column combination from being used as a key.
How to: |
Data Profiling for an individual column provides access to nine reports:
For alphanumeric columns, the Statistics report provides the segment, format, count of distinct values, total count, patterns count, maximum, minimum, and average length, minimum and maximum values, and number of nulls.
For numeric columns, the Statistics report provides the segment, format, count of distinct values, total count, maximum, minimum, and average values, and number of nulls.
Optionally, you can click the pattern to drill down to the values for the rows containing those patterns.
For pattern analysis, a 9 represents a digit, an A represents any uppercase letter, and an a represents any lowercase letter. All printable special characters are represented by themselves, and unprintable characters are represented by an X.
The tooltip for each bar also shows the count of instances of the field and the percent of the total.
Note: Outliers produce a maximum of 10 highest and lowest distinct values, if they exist.
These reports are available by right-clicking a column in the Synonym Editor and pointing to Data Profiling.
To view a Data Profiling report for a single column:
The Synonym Editor opens to the Field View tab.
The Statistical Data Profiling information displays in the workspace, as shown in the following image.
The report shows:
The name of the segment in the synonym. For a cluster synonym, each segment can refer to a separate table; a segment can also refer to a group of columns.
The column name. If you click the name, another report opens for that column showing each unique value and a count of the number of times the value appears and the percentage
The FOCUS format of the column.
The number of values for the field.
The number of unique values for the column.
The percentage of values that are unique.
The number of unique patterns (combinations of letters and digits) that are found.
Only for numeric fields.
Only for character fields.
The number of values that are missing.
The percentage of values that are missing.
The number of values that are not unique.
The percentage of values that are not unique.
The list of columns displayed can be selected from the ribbon by clicking Customize Columns. Clear any that are not desired. The following optional columns can also be displayed:
The value where half the values are less and half the values are more.
The most commonly occurring value.
Optionally, you can click a column name or patterns count (for alphanumeric columns) to drill down to the Values or Patterns reports, respectively.
Note: Data profiling requires reading every row and column of the source table. Use caution with large data sources.
WebFOCUS | |
Feedback |