RF_REGRESS: Random Forest Regression

RF_REGRESS creates a random forest, which is an ensemble of decision trees. Each decision tree produces an independent regression prediction, and the prediction of the forest is the average of the individual predictions.
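
The averaging step described above can be illustrated outside WebFOCUS with a minimal Python sketch. This is not the RF_REGRESS implementation, whose internals are not described here. It assumes the common random forest recipe of training each tree on a bootstrap sample, and it uses one-split trees ("stumps") on a single predictor to keep the code short. The function names (fit_stump, fit_forest, predict) and the age/income values are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def fit_stump(rows):
    """Fit a one-split regression tree on (x, y) pairs (illustrative only).

    Tries each midpoint between consecutive distinct x values and keeps the
    split with the lowest squared error. Each side predicts its mean y.
    """
    xs = sorted({x for x, _ in rows})
    overall = mean(y for _, y in rows)
    best = (float("inf"), None, overall, overall)
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    if threshold is None:
        # The sample held a single distinct x value, so no split is possible.
        return lambda x: overall
    return lambda x: left_mean if x <= threshold else right_mean


def fit_forest(rows, number_of_trees, seed=0):
    """Train each tree on its own bootstrap sample (assumed recipe)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(number_of_trees):
        sample = [rng.choice(rows) for _ in rows]  # draw with replacement
        trees.append(fit_stump(sample))
    return trees


def predict(trees, x):
    """The forest prediction is the average of the individual tree predictions."""
    return mean(tree(x) for tree in trees)


# Made-up (age, income) pairs, for illustration only.
data = [(25, 30000.0), (32, 42000.0), (41, 55000.0),
        (48, 61000.0), (56, 64000.0), (63, 58000.0)]

forest = fit_forest(data, number_of_trees=100)
per_tree = [tree(45) for tree in forest]
forest_prediction = predict(forest, 45)
print(f"{len(per_tree)} tree predictions, forest average = {forest_prediction:.2f}")
```

Because each tree sees a different sample, the individual predictions differ, and averaging them smooths out the variation of any single tree. Production random forests typically also grow deeper trees and consider a random subset of predictors at each split, which this sketch omits.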

Syntax: How to Calculate a Random Forest Regression

RF_REGRESS(options, number_of_trees, 
         predictor_field1[, predictor_field2, ...], target_field)

where:

options

Reserved for future use. The example in this topic passes a single blank (' ') for this argument.

number_of_trees

Integer

Is the number of decision trees in the forest.

predictor_field1[, predictor_field2, ...]

Numeric

Are one or more predictor field names.

target_field

Numeric

Is the target field.

Example: Predicting Income Using RF_REGRESS

The following procedure uses RF_REGRESS to predict income with a random forest of 100 decision trees. The predictors are age, education level, population range, and gender. Because predictor fields must be numeric, the DEFINE FILE command creates the virtual fields POP_CODE and GENDER_CODE, which use DECODE to convert the alphanumeric population range and gender values to integer codes.

DEFINE FILE WF_RETAIL
POP_CODE/I2 =      
  DECODE WF_RETAIL_GEOGRAPHY_CUSTOMER.CITY_POPULATION_RANGE ( 
    'H: 100,001 - 250,000'       1, 
    'I: 250,001 - 1,000,000'     2,
    'J: 1,000,001 - 10,000,000'  3, 
    'K: 10,000,001 - 50,000,000' 4,  
    ELSE 0  ); 
GENDER_CODE/I2 =  
  DECODE WF_RETAIL_CUSTOMER.GENDER ( 
    'M' 1, 'F' 0 ); 
END 
TABLE FILE WF_RETAIL 
PRINT                
ID_CUSTOMER 
EDUC_LEVEL_M
POP_CODE
GENDER_CODE 
INCOME_M   
COMPUTE PRED_INCOME/D12.2 = RF_REGRESS(' ',100,
                                       AGE,
                                       EDUC_LEVEL_M,
                                       POP_CODE,
                                       GENDER_CODE,
                                       INCOME_M);   
WHERE EDUC_LEVEL_M NE 0 
WHERE POP_CODE NE 0      
WHERE INCOME GT 12001.00 
WHERE OUTPUTLIMIT IS 12; 
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

Partial output is shown in the following image.
