To produce a valid output sample, each sampling definition
requires information about the file to be sampled. In each sampling
routine, you can use the sample calculator to calculate each of the
following values, or you can enter your own:
- Population Size. The
number of records in the overall data set to be sampled. The population
size must be determined before a sampling routine runs.
Within the sampling function, the population size can be retrieved
at design time, generated live at run time, or defined as a fixed
value. Although the population size is usually the total number
of records in the file, there will be times when you want to sample
a subset of the file. To ensure a statistically valid sample, the
specified population size must match the number of records in the
file or subset that is actually being sampled.
- Confidence Level. The
probability, expressed as a percentage, that the selected sample
will represent the total population. Most guidelines establish a minimum
acceptable confidence level of 90%, 95%, or 98%.
- Margin of Error. The
amount of error, expressed as a percentage, that you can tolerate.
Lower margins of error require larger sample sizes.
- Response Distribution. Allows
you to correct for skewness in the sample (that is, when the sample
deviates from a normal distribution). Use the Response Distribution
percentage to account for skewness in the population.
- Seed. The
statistical sampling routines use a seed value for the pseudorandom
number generator. This generator produces a series of random numbers
from the entered seed. These random numbers are then used to determine
which records to include in the sample. The seed has no effect on
the number of records included in the sample; it only affects which
records are selected. A given seed always produces the same set of
random numbers, so if you want to replicate a sample, use the same
seed, population size, and record order. To generate a unique sample,
enter a new seed value each time.
- Sample Size. The
number of records that you want to store in the output file. The
number should be a positive integer that is less than the population
size.
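
The relationship between these values can be sketched in a few lines
of Python. This is only an illustration, not the product's own
calculator: it assumes the widely used Cochran formula with a finite
population correction, and the function names (sample_size,
select_records) and their parameters are hypothetical. Percentages
are passed as fractions (0.95 for 95%), and a response distribution
of 0.5 is assumed as the default because it yields the largest sample
size under this formula.

```python
import math
import random
from statistics import NormalDist


def sample_size(population_size, confidence_level=0.95,
                margin_of_error=0.05, response_distribution=0.5):
    """Estimate a sample size (assumed formula: Cochran with a
    finite population correction; not taken from the product)."""
    # z-score for a two-sided interval at the given confidence level
    z = NormalDist().inv_cdf((1 + confidence_level) / 2)
    p = response_distribution
    # Sample size for an infinitely large population
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Shrink the estimate for a finite population
    n = n0 / (1 + (n0 - 1) / population_size)
    return min(population_size, math.ceil(n))


def select_records(population_size, size, seed):
    """Pick `size` distinct record positions (0-based) using a
    generator initialized from `seed`."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(population_size), size))


n = sample_size(10_000)            # 95% confidence, 5% margin of error
first = select_records(10_000, n, seed=42)
again = select_records(10_000, n, seed=42)   # same seed, same records
other = select_records(10_000, n, seed=43)   # new seed, same count
```

In this sketch, lowering margin_of_error increases the returned
sample size, while changing the seed changes only which records are
chosen, not how many. Repeating a selection with the same seed and
population size returns the same record positions, which is why the
record order of the file must also stay the same to replicate a
sample.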