How to: |

The linear regression equation estimates values by assuming that the dependent variable (the new calculated values) and the independent variable (the sort field values) are related by a function that represents a straight line:

y = mx + b

where:

`y`- Is the dependent variable.
`x`- Is the independent variable.
`m`- Is the slope of the line.
`b`- Is the y-intercept.

FORECAST_LINEAR uses a technique called Ordinary Least Squares to calculate
values for *m* and *b* that minimize the sum of the squared
differences between the data and the resulting line.

The following formulas show how *m* and *b* are
calculated.
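For reference, the textbook closed-form Ordinary Least Squares solution for a straight-line fit is written out here in LaTeX. This is the standard form and is assumed, not confirmed, to match the product's own rendering of the formulas, which may arrange the terms differently. All sums run over the n data points.

$$
m = \frac{\sum xy - \dfrac{\sum x \, \sum y}{n}}{\sum x^{2} - \dfrac{\left(\sum x\right)^{2}}{n}}
\qquad
b = \frac{\sum y}{n} - m \, \frac{\sum x}{n}
$$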

where:

`n`- Is the number of data points.
`y`- Are the data values (the dependent variable).
`x`- Are the sort field values (the independent variable).

Trend values, as well as predicted values, are calculated using the regression line equation.
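
The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch of a textbook least-squares fit, not the FORECAST_LINEAR implementation. The function names `ols_line` and `forecast_line` are hypothetical, and the x values are assumed to be plain numbers.

```python
def ols_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = sum_xx - sum_x * sum_x / n
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = (sum_xy - sum_x * sum_y / n) / denom
    b = sum_y / n - m * sum_x / n
    return m, b


def forecast_line(xs, ys, interval, npredict):
    """Return (x, fitted y) pairs: trend values for the existing x values,
    followed by npredict predicted values spaced `interval` apart."""
    m, b = ols_line(xs, ys)
    future_xs = [xs[-1] + interval * k for k in range(1, npredict + 1)]
    return [(x, m * x + b) for x in list(xs) + future_xs]
```

Both the trend values (for existing x values) and the predicted values (for generated x values) come from the same line, which is why a single helper can produce both.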

FORECAST_LINEAR(display,infield,interval,npredict)

where:

`display`- Keyword. Specifies which values to display for rows of output that represent existing data. Valid values are:

- INPUT_FIELD. This displays the original field values for rows that represent existing data.
- MODEL_DATA. This displays the calculated values for rows that represent existing data.

Note: You can show both types of output for any field by creating two independent COMPUTE commands in the same request, each with a different display option.

`infield`- Is any numeric field. It can be the same field as the result field, or a different field. It cannot be a date-time field or a numeric field with date display options.
`interval`- Is the increment to add to each sort field value (after the last data point) to create the next value. This must be a positive integer. To sort in descending order, use the BY HIGHEST phrase. The result of adding this number to the sort field values is converted to the same format as the sort field. For date fields, the smallest component in the format determines how the number is interpreted. For example, if the format is YMD, MDY, or DMY, an interval value of 2 is interpreted as two days. If the format is YM, an interval value of 2 is interpreted as two months.

`npredict`- Is the number of predictions for FORECAST_LINEAR to calculate. It must be an integer greater than or equal to zero. Zero indicates that you do not want predictions, and is only supported with a non-recursive FORECAST.
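
The parameter rules above can be illustrated with a short Python sketch. It is an assumption-laden model of the documented behavior, not the product's code. The helper names (`add_interval`, `next_sort_values`, `existing_row_value`) are hypothetical, and a YM value is represented here as the first day of its month purely for convenience.

```python
from datetime import date, timedelta


def add_interval(d, interval, fmt):
    """Advance a date by `interval` units of the smallest component in `fmt`."""
    if fmt in ("YMD", "MDY", "DMY"):
        # The smallest component is the day.
        return d + timedelta(days=interval)
    if fmt == "YM":
        # The smallest component is the month.
        months = d.year * 12 + (d.month - 1) + interval
        return date(months // 12, months % 12 + 1, 1)
    raise ValueError(f"unsupported format in this sketch: {fmt}")


def next_sort_values(last, interval, npredict, fmt="YMD"):
    """Generate the sort field values for the predicted rows."""
    if not isinstance(interval, int) or interval <= 0:
        raise ValueError("interval must be a positive integer")
    if not isinstance(npredict, int) or npredict < 0:
        raise ValueError("npredict must be an integer >= 0")
    values = []
    current = last
    for _ in range(npredict):
        current = add_interval(current, interval, fmt)
        values.append(current)
    return values


def existing_row_value(display, input_value, model_value):
    """Pick the value shown on a row that represents existing data."""
    if display == "INPUT_FIELD":
        return input_value
    if display == "MODEL_DATA":
        return model_value
    raise ValueError("display must be INPUT_FIELD or MODEL_DATA")
```

For example, with a YMD sort field, `next_sort_values(date(1991, 6, 30), 1, 3)` yields the three following calendar days, while an `npredict` of zero yields no generated rows.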

The following request uses the VIDEOTRK data source to calculate a regression line of QUANTITY by TRANSDATE. The interval is one day, and three predicted values are calculated.

TABLE FILE VIDEOTRK
SUM QUANTITY
COMPUTE FORTOT=FORECAST_LINEAR(MODEL_DATA,QUANTITY,1,3);
BY TRANSDATE
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

The output is shown in the following image:

Note:

- Three predicted values of FORTOT are calculated. For values outside the range of the data, new TRANSDATE values are generated by adding the interval value (1) to the prior TRANSDATE value.
- There are no QUANTITY values for the generated FORTOT values.
- Each FORTOT value is computed using a regression line, calculated using all of the actual data values for QUANTITY. TRANSDATE is the independent variable (x) and QUANTITY is the dependent variable (y). The equation is used to calculate both the trend values and the predicted values for QUANTITY.

The following version of the request charts the data values and the regression line.

GRAPH FILE VIDEOTRK
SUM QUANTITY
COMPUTE FORTOT=FORECAST_LINEAR(MODEL_DATA,QUANTITY,1,3);
BY TRANSDATE
ON GRAPH PCHOLD FORMAT JSCHART
ON GRAPH SET LOOKGRAPH VLINE
END

The output is shown in the following image.
