K-nearest neighbors (KNN) classification assigns a class to a data point in the space spanned by the predictors. The point receives the class that is most common among its k nearest neighbors. The method requires a distance to be defined in this space.
KNN_CLASSIFY(options, neighbors, power, predictor_field1[, predictor_field2, ...], target_field)
where:
options
    Reserved for future use.
neighbors
    Integer
    Is the number of nearest neighbors to participate in the prediction.
power
    Integer
    Is the power (p) of the L^p-distance.
predictor_field1, predictor_field2, ...
    Numeric
    Are one or more predictor field names.
target_field
    Numeric
    Is the target field.
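To illustrate how the neighbors and power arguments interact, the following Python sketch implements a generic k-nearest neighbors vote with an L^p distance. It is not the WebFOCUS implementation: the names minkowski_distance and knn_classify, the toy data, and the tie-breaking rule (the label encountered first wins) are all assumptions made for illustration.

```python
from collections import Counter


def minkowski_distance(a, b, power):
    """L^p distance between two equal-length numeric vectors."""
    return sum(abs(x - y) ** power for x, y in zip(a, b)) ** (1.0 / power)


def knn_classify(rows, labels, query, neighbors, power):
    """Return the most common label among the `neighbors` rows closest to `query`."""
    # Rank the labeled rows by their distance to the query point.
    ranked = sorted(
        range(len(rows)),
        key=lambda i: minkowski_distance(rows[i], query, power),
    )
    nearest_labels = [labels[i] for i in ranked[:neighbors]]
    # Majority vote. On a tie, the label seen first among the nearest
    # neighbors wins (an assumption of this sketch, not documented behavior).
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: (age, income in thousands) with an education code.
rows = [(25, 30), (27, 32), (45, 80), (50, 85), (48, 90)]
labels = [1, 1, 2, 2, 2]

# power=2 gives Euclidean distance; power=1 would give Manhattan distance.
prediction = knn_classify(rows, labels, query=(47, 82), neighbors=3, power=2)
print(prediction)
```

Because the predictors enter the distance directly, a predictor with a large numeric range (such as income) tends to dominate the result unless the values are rescaled first. Whether the function rescales predictors internally is not stated here.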
The following request uses KNN_CLASSIFY to predict education level using the 20 nearest neighbors and Euclidean distance (power 2), with predictors age, income, population range, and gender. The DEFINE FILE command creates virtual fields with correct numeric formats for use in the function. The data set includes rows with missing target values.
DEFINE FILE WF_RETAIL
POP_CODE/I2 = DECODE WF_RETAIL_GEOGRAPHY_CUSTOMER.CITY_POPULATION_RANGE (
   'H: 100,001 - 250,000' 1,
   'I: 250,001 - 1,000,000' 2,
   'J: 1,000,001 - 10,000,000' 3,
   'K: 10,000,001 - 50,000,000' 4,
   ELSE 0 );
GENDER_CODE/I2 = DECODE WF_RETAIL_CUSTOMER.GENDER ( 'M' 1, 'F' 0 );
END
TABLE FILE WF_RETAIL
PRINT ID_CUSTOMER AGE INCOME DEGREE_M EDUC_LEVEL_M
COMPUTE ED_PRED/I2=KNN_CLASSIFY(' ',20,2,
AGE,
INCOME,
POP_CODE,
GENDER_CODE,
EDUC_LEVEL_M);
WHERE POP_CODE NE 0;
WHERE OUTPUTLIMIT IS 8;
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END
Partial output is shown in the following image.