KNN_REGRESS: K-Nearest Neighbors Regression

K-nearest neighbors regression predicts a target value for a data point in the space spanned by the predictor fields. The prediction is the average of the target values of the point's k nearest neighbors in that space, so the method requires a distance to be defined there.
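To make the averaging step concrete, here is a minimal Python sketch of k-nearest neighbors regression. It is not the KNN_REGRESS implementation. The function name knn_regress and its arguments are hypothetical, and the sketch fixes the distance to Euclidean (math.dist, Python 3.8+) for brevity.

```python
import math
from typing import Sequence, Tuple


def knn_regress(train_x: Sequence[Tuple[float, ...]], train_y: Sequence[float],
                query: Tuple[float, ...], k: int) -> float:
    """Predict a target for `query` as the mean target of its k nearest training points.

    Sketch only: Euclidean distance via math.dist; ties keep input order (sorted is stable).
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training points by their distance to the query point.
    ranked = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    # Average the target values of the k closest points.
    return sum(train_y[i] for i in ranked[:k]) / k


train_x = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [10.0, 20.0, 30.0, 100.0]
print(knn_regress(train_x, train_y, (2.1,), k=3))  # nearest are 2.0, 3.0, 1.0 -> 20.0
```

With k=3, the point at 10.0 is excluded, so the outlying target 100.0 does not affect the prediction for 2.1.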

Reference: Calculate a K-Nearest Neighbors Regression

KNN_REGRESS(options, neighbors, power,
        predictor_field1[, predictor_field2, ...], target_field)

where:

options

Is reserved for future use. The example below passes a blank string (' ') for this argument.

neighbors

Integer

Is the number of nearest neighbors (k) whose target values are averaged to produce the prediction.

power

Integer

Is the power (p) of the L^p distance between points x and y, defined as (|x1 - y1|^p + |x2 - y2|^p + ... + |xn - yn|^p)^(1/p).

  • power=1 calculates the distance as the sum of the absolute values of the differences between the coordinates (Manhattan distance).
  • power=2 calculates the distance as the square root of the sum of the squares of the differences between the coordinates (Euclidean distance).
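Both settings of power are special cases of the L^p distance. The following is a minimal Python sketch, assuming a hypothetical helper lp_distance that takes two coordinate sequences of equal length; it is an illustration of the formula, not the function's internal code.

```python
from typing import Sequence


def lp_distance(a: Sequence[float], b: Sequence[float], p: int) -> float:
    """L^p distance: (sum of |a_i - b_i| ** p) ** (1 / p)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


print(lp_distance((0, 0), (3, 4), 1))  # Manhattan: 3 + 4 = 7.0
print(lp_distance((0, 0), (3, 4), 2))  # Euclidean: sqrt(9 + 16) = 5.0
```

For the same pair of points, power=1 gives 7 and power=2 gives 5, so the choice of power can change which neighbors count as nearest.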

predictor_field1[, predictor_field2, ...]

Numeric

Are one or more predictor field names.

target_field

Numeric

Is the target field.

Example: Predicting Income Using KNN_REGRESS

The following request uses KNN_REGRESS to predict income from the 10 nearest neighbors under Euclidean distance (power=2), using age, education level, population range, and gender as predictors. The DEFINE FILE command creates the numeric virtual fields POP_CODE and GENDER_CODE, which encode the population range and gender as integers so that they can be used as predictors.
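The DECODE for POP_CODE in the request below maps each population-range label to an integer code, with ELSE 0 for unlisted ranges, and WHERE POP_CODE NE 0 then drops those rows. A hypothetical Python analogue of that encode-and-filter step follows; the dict-based rows and the names pop_code and keep_coded are illustrative assumptions, not WebFOCUS features.

```python
from typing import Dict, List

# Hypothetical analogue of the POP_CODE DECODE; labels copied from the example request.
POP_CODES = {
    "H: 100,001 - 250,000": 1,
    "I: 250,001 - 1,000,000": 2,
    "J: 1,000,001 - 10,000,000": 3,
    "K: 10,000,001 - 50,000,000": 4,
}


def pop_code(population_range: str) -> int:
    """Return the integer code for a population range; unlisted labels get 0, like ELSE 0."""
    return POP_CODES.get(population_range, 0)


def keep_coded(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only rows whose range was recognized, like WHERE POP_CODE NE 0."""
    return [row for row in rows if pop_code(row["CITY_POPULATION_RANGE"]) != 0]


print(pop_code("J: 1,000,001 - 10,000,000"))  # 3
```

Filtering out the 0 code keeps unrecognized ranges from being treated as a real position on the population axis when distances are computed.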

DEFINE FILE WF_RETAIL
POP_CODE/I2 =
  DECODE WF_RETAIL_GEOGRAPHY_CUSTOMER.CITY_POPULATION_RANGE (
    'H: 100,001 - 250,000'       1,
    'I: 250,001 - 1,000,000'     2,
    'J: 1,000,001 - 10,000,000'  3,
    'K: 10,000,001 - 50,000,000' 4,
    ELSE 0  );
GENDER_CODE/I2 =
  DECODE WF_RETAIL_CUSTOMER.GENDER (
    'M' 1, 'F' 0 );
END
TABLE FILE WF_RETAIL
PRINT
ID_CUSTOMER
EDUC_LEVEL_M
POP_CODE
GENDER_CODE
INCOME_M
COMPUTE PRED_INCOME/D12.2 = KNN_REGRESS(' ',10,2,
                                        AGE,
                                        EDUC_LEVEL_M,
                                        POP_CODE,
                                        GENDER_CODE,
                                        INCOME_M);
WHERE INCOME GT 12001.00
WHERE OUTPUTLIMIT EQ 12
WHERE POP_CODE NE 0
WHERE EDUC_LEVEL_M NE 0
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

Partial output is shown in the following image.
