Explanation of the Market Basket Model



What Is Market Basket Analysis?

Market Basket Analysis is a technique that measures the strength of association between products purchased together and identifies patterns of co-occurrence. A co-occurrence is when two or more things take place together.

Market Basket Analysis creates If-Then scenario rules, for example, if item A is purchased then item B is likely to be purchased. The rules are probabilistic in nature or, in other words, they are derived from the frequencies of co-occurrence in the observations. Frequency is the proportion of baskets that contain the items of interest. The rules can be used in pricing strategies, product placement, and various types of cross-selling strategies.

How Market Basket Analysis Works

To make it easier to understand, think of Market Basket Analysis in terms of shopping at a supermarket. Market Basket Analysis takes data at the transaction level, which lists all items bought by a customer in a single purchase. The technique determines which products were purchased together with which other products. These relationships are then used to build profiles containing If-Then rules of the items purchased.

The rules could be written as:

If {A} Then {B}

The If part of the rule (the {A} above) is known as the antecedent and the Then part of the rule (the {B} above) is known as the consequent. The antecedent is the condition and the consequent is the result. The association rule has three measures that express the degree of confidence in the rule: Support, Confidence, and Lift.

For example, you are in a supermarket to buy milk. Based on the analysis, are you more likely to buy apples or cheese in the same transaction than somebody who did not buy milk?

In the following table (table 1), there are nine baskets containing varying combinations of milk, cheese, apples, and bananas.

The next step is to determine the relationships and the rules. For explanation purposes, the following table shows some of the relationships. In total there are 22 rules for the nine baskets. The complete set of rules is shown in the explanation of the RStat output.

The first measure, called support, is the number of transactions that include all items in both the {A} and {B} parts of the rule, expressed as a percentage of the total number of transactions. It measures how frequently the collection of items occurs together across all transactions.

The support formula written out would look something like:

Support = (Number of transactions containing both A and B) / (Total number of transactions)

Interpreted as: Fraction of transactions that contain both A and B.
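
As an illustration of the support calculation, the following is a minimal Python sketch. It is not part of RStat. The function name support and the five baskets are hypothetical and do not reproduce the data in table 1.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


# Hypothetical baskets for illustration only (not the data from table 1).
baskets = [
    {"milk", "cheese"},
    {"milk", "cheese", "apples"},
    {"milk", "bananas"},
    {"cheese", "apples"},
    {"apples", "bananas"},
]

# Two of the five baskets contain both milk and cheese.
print(support(baskets, {"milk", "cheese"}))  # 0.4
```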

The second measure, called the confidence of the rule, is the ratio of the number of transactions that include all items in both {A} and {B} to the number of transactions that include all items in {A}.

The confidence formula written out would look something like:

Confidence = (Number of transactions containing both A and B) / (Number of transactions containing A)

Interpreted as: How often items in B appear in transactions that contain A.
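
The confidence calculation can be sketched in Python in the same way. This is an illustrative sketch only. The function names and the baskets are hypothetical, and returning 0.0 when no basket contains the antecedent is an assumption of the sketch, since the ratio is undefined in that case.

```python
def count_containing(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)


def confidence(baskets, antecedent, consequent):
    """Share of the baskets containing the antecedent that also contain the consequent."""
    with_a = count_containing(baskets, antecedent)
    if with_a == 0:
        return 0.0  # assumption: treat an unseen antecedent as zero confidence
    with_a_and_b = count_containing(baskets, set(antecedent) | set(consequent))
    return with_a_and_b / with_a


# Hypothetical baskets for illustration only (not the data from table 1).
baskets = [
    {"milk", "cheese"},
    {"milk", "cheese", "apples"},
    {"milk", "bananas"},
    {"cheese", "apples"},
    {"apples", "bananas"},
]

# Three baskets contain milk, and two of those also contain cheese.
print(confidence(baskets, {"milk"}, {"cheese"}))
```

Note that confidence is not symmetric: the rule If {cheese} Then {milk} is computed against the baskets that contain cheese, so it can have a different value.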

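The third measure, called the lift or lift ratio, is the ratio of confidence to expected confidence. Expected confidence is the frequency of B, that is, the proportion of all transactions that contain B, so lift is the confidence divided by the frequency of B. Lift tells us how much better a rule is at predicting the result than just assuming the result in the first place. A lift of 1 means that A and B occur together no more often than would be expected if they were independent; greater lift values indicate stronger associations.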

The lift formula written out would look something like:

Lift = Confidence / (Number of transactions containing B / Total number of transactions)

Interpreted as: How much our confidence has increased that B will be purchased given that A was purchased.
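
A minimal Python sketch of the lift calculation follows. It is illustrative only. The helper names and the baskets are hypothetical, and the sketch assumes that both the antecedent and the consequent occur in at least one basket.

```python
def fraction_containing(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def lift(baskets, antecedent, consequent):
    """Confidence of the rule divided by the overall frequency of the consequent."""
    both = fraction_containing(baskets, set(antecedent) | set(consequent))
    confidence = both / fraction_containing(baskets, antecedent)
    expected_confidence = fraction_containing(baskets, consequent)
    return confidence / expected_confidence


# Hypothetical baskets for illustration only (not the data from table 1).
baskets = [
    {"milk", "cheese"},
    {"milk", "cheese", "apples"},
    {"milk", "bananas"},
    {"cheese", "apples"},
    {"apples", "bananas"},
]

# Confidence is 2/3 and cheese appears in 3 of 5 baskets, so lift is (2/3) / (3/5).
print(round(lift(baskets, {"milk"}, {"cheese"}), 3))
```

A value above 1 suggests that the antecedent makes the consequent more likely than its overall frequency alone would predict; a value below 1 suggests the opposite.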

Practical Applications of Market Basket Analysis

Market Basket Analysis usually brings to mind shopping carts and supermarket shoppers, but it can be applied in many other areas. For most Internet users, a familiar example is the list of potentially interesting products on Amazon: Amazon informs customers that people who bought the item they are purchasing also reviewed or bought a list of other items. Applications of Market Basket Analysis in various industries are listed below:

Data Requirement

  1. Baskets
    • This column identifies the individual baskets.
    • Values can be categoric or numeric to identify the baskets.
  2. Products
    • This column has all the items that are included in each basket.
    • Values of items can be categoric or numeric.
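
To illustrate the two-column layout described above, the following Python sketch groups rows of (basket, product) pairs into baskets. It is a sketch under assumptions, not how RStat reads the data. The rows and the function name group_into_baskets are hypothetical.

```python
from collections import defaultdict


def group_into_baskets(rows):
    """Collect the products on each row under the basket identifier on that row."""
    baskets = defaultdict(set)
    for basket_id, product in rows:
        baskets[basket_id].add(product)
    return dict(baskets)


# Hypothetical rows: one row per (basket identifier, product) pair.
rows = [
    (1, "milk"),
    (1, "cheese"),
    (2, "milk"),
    (2, "apples"),
    (2, "cheese"),
    (3, "bananas"),
]

baskets = group_into_baskets(rows)
for basket_id, products in baskets.items():
    print(basket_id, sorted(products))
```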

For example, from the table 1 below:

Procedure: How to Create an Association Model Using Market Basket Analysis

In this example, we are going to create a model for Market Basket Analysis of purchases at a grocery store. We will use the Baskets data set, which contains observations on the purchases of particular items, such as milk, cheese, and apples.

  1. Define the Model Data.
    • Load the Baskets data set into RStat. For more information on loading data into RStat, see Getting Started With RStat.
    • Turn off sampling by unchecking the Partition check box.
    • For the Target Data Type, leave the Auto radio button selected.
    • Select BASKET as the Ident variable, which defines the basket.
    • Select PRODUCT as the Target variable, which defines the products in the basket.
    • Click Execute to run the Model Data.

    The Status bar confirms your data settings, as shown in the following image.

  2. Select the Associate tab, as shown in the following image.
    • Select the Baskets check box.
    • Leave the default values for Support and Confidence. These values are minimum thresholds: lowering them increases the number of rules that get created, and raising them reduces it.

      Note: Support is a numeric value for the minimal support of an item set (the default value is 0.1). Confidence is a numeric value for the minimal confidence of the rules or association hyperedges (the default value is 0.1).

    • Click Execute to run the Model Data.

    The model output appears. You may need to scroll to see the complete output, depending on the size of your window.
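
The effect of the Support and Confidence thresholds on the number of rules can be sketched in Python. This is a simplified illustration, not the algorithm that RStat runs: it only considers rules with one item on each side, and the nine baskets, the function name pair_rules, and the threshold values are hypothetical.

```python
from itertools import permutations


def pair_rules(baskets, min_support, min_confidence):
    """Return (A, B, support, confidence) for each rule If {A} Then {B} that meets both thresholds."""
    total = len(baskets)
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(items, 2):
        with_a = sum(1 for basket in baskets if a in basket)
        with_both = sum(1 for basket in baskets if a in basket and b in basket)
        support = with_both / total
        confidence = with_both / with_a if with_a else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical baskets for illustration only (not the data from table 1).
baskets = [
    {"milk", "cheese"},
    {"milk", "cheese", "apples"},
    {"milk", "cheese", "bananas"},
    {"milk", "cheese", "apples", "bananas"},
    {"milk", "cheese"},
    {"milk", "cheese", "apples"},
    {"cheese", "bananas"},
    {"apples", "bananas"},
    {"apples"},
]

loose = pair_rules(baskets, min_support=0.1, min_confidence=0.1)
strict = pair_rules(baskets, min_support=0.5, min_confidence=0.8)
print(len(loose), len(strict))  # the stricter thresholds keep fewer rules
```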

Reference: Output From the Market Basket Analysis

  • Summary of the Apriori Association Rules. This is the title of the output. Apriori is the best known algorithm for mining association rules. It works level by level: it first finds the individual items that meet the minimum support, then combines them into progressively larger item sets, discarding any item set that falls below the minimum support.
  • Number of Rules: 80. The number indicates how many rules are generated from the data with the parameters selected.
  • Summary of the Measures of Interestingness. This is a summary of the descriptive statistics of the distribution values for Support, Confidence, and Lift.
  • Summary of the execution of the apriori commands.

    This is a summary of the settings that come with the apriori algorithm. Except for Support and Confidence, which you can change in the GUI, the remaining settings are set to default values.

    • The mining parameters (parameter) change the characteristics of the mined item sets or rules (for example, the minimum support).
      parameter specification:
       confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
              .01    0.1    1 none FALSE            TRUE     0.1      1      5  rules FALSE
    • Control parameters (control) influence the performance of the algorithm (for example, enable or disable initial sorting of the items with respect to their frequency).
      algorithmic control:
       filter   tree   heap   memopt   load   sort   verbose
         0.1    TRUE   TRUE   FALSE    TRUE      2   TRUE

Note: For more information on the apriori algorithm parameters, see the R documentation for the arules package at: http://cran.r-project.org/web/packages/arules/arules.pdf.
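
The level-by-level search that Apriori performs can be sketched in Python as follows. This is a simplified teaching version that only finds the frequent item sets. It is not the implementation used by the arules package, and the baskets and the function name frequent_itemsets are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(baskets, min_support):
    """Return {item set: support} for every item set that meets min_support."""
    total = len(baskets)

    def support(itemset):
        return sum(1 for basket in baskets if itemset <= basket) / total

    # Level 1: single items that meet the minimum support.
    singles = (frozenset([item]) for item in set().union(*baskets))
    current = [s for s in singles if support(s) >= min_support]
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        known = set(current)
        # Join frequent item sets of this size into candidates one item larger.
        candidates = {a | b for a in current for b in current if len(a | b) == size + 1}
        # Prune candidates that have an infrequent subset, then check support.
        current = [
            c for c in candidates
            if all(frozenset(sub) in known for sub in combinations(c, size))
            and support(c) >= min_support
        ]
        size += 1
    return frequent


# Hypothetical baskets for illustration only (not the data from table 1).
baskets = [
    {"milk", "cheese"},
    {"milk", "cheese", "apples"},
    {"milk", "cheese", "bananas"},
    {"milk", "cheese", "apples", "bananas"},
    {"milk", "cheese"},
    {"milk", "cheese", "apples"},
    {"cheese", "bananas"},
    {"apples", "bananas"},
    {"apples"},
]

result = frequent_itemsets(baskets, min_support=0.3)
for itemset, value in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(value, 3))
```

The pruning step relies on the fact that an item set cannot meet the minimum support unless all of its subsets do, which keeps the number of candidates small.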

Procedure: How to Generate Rules

To generate rules, click the Show Rules button.

The output will be printed below the original information presented.

  • The LHS is the Antecedent ({A} from the example above).
  • The RHS is the Consequent ({B} from the example above).

Note: The rules are created based on the data. For rule 1:

  • Support indicates that 67% of customers purchased both milk and cheese.
  • Confidence indicates that 100% of the customers who bought milk also bought cheese.
  • Lift represents the 28% increase in expectation that someone will buy cheese, when we know that they bought milk. It is the ratio of the conditional probability of buying cheese given milk (the confidence) to the overall probability of buying cheese.
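
The figures for rule 1 can be checked with a short Python calculation. The basket counts used here are assumptions inferred from the stated percentages and the nine baskets mentioned earlier; they are not taken from the rule output itself.

```python
from fractions import Fraction

total_baskets = 9     # the text states that there are nine baskets
milk_and_cheese = 6   # assumed: 6 of 9 matches the 67% support
milk = 6              # assumed: 100% confidence means every milk basket has cheese
cheese = 7            # assumed: 7 of 9 gives a lift of about 1.28

support = Fraction(milk_and_cheese, total_baskets)
confidence = Fraction(milk_and_cheese, milk)
lift = confidence / Fraction(cheese, total_baskets)

print(f"support={float(support):.2f} confidence={float(confidence):.2f} lift={float(lift):.3f}")
```

Under these assumed counts, the lift is 9/7, or roughly 1.28 to 1.29 depending on rounding, which corresponds to the increase of about 28% described above.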

Procedure: How to Generate a Frequency Plot

To generate a frequency plot, click the Freq Plot button. The output appears in a new window. This frequency plot shows the percent of times each unique item occurs in all baskets.

The bar chart below shows the frequency of the individual items in the analysis.
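
The values behind such a frequency plot can be sketched in Python. This sketch is illustrative only and prints a text bar chart rather than the graphical plot produced by RStat. The baskets and the function name item_frequencies are hypothetical.

```python
from collections import Counter


def item_frequencies(baskets):
    """Fraction of baskets in which each item appears, most frequent first."""
    counts = Counter(item for basket in baskets for item in basket)
    total = len(baskets)
    return {item: count / total for item, count in counts.most_common()}


# Hypothetical baskets for illustration only (not the data from table 1).
baskets = [
    {"milk", "cheese"},
    {"milk", "cheese", "apples"},
    {"milk", "bananas"},
    {"cheese", "apples"},
    {"apples", "bananas"},
]

frequencies = item_frequencies(baskets)
for item, frequency in frequencies.items():
    # One '#' for every 5% of baskets that contain the item.
    print(f"{item:<8} {'#' * round(frequency * 20)} {frequency:.0%}")
```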

Using a Market Basket Analysis Routine for Scoring

In the example that is used in this chapter, the data set contains products that a customer in a grocery store might purchase (for example, milk, cheese, bananas, and apples). To run the Market Basket Analysis, the data set only needs to contain the basket and the product information. Once the Market Basket technique is run in RStat, a scoring routine can be exported, which would apply the output (rules with regard to the products and the confidence number) to the new data sets. This section provides procedures for the post-Market Basket Analysis execution process.

Procedure: How to Execute the Market Basket Analysis

To execute the Market Basket Analysis:

  1. Load the Baskets data set into RStat. For more information on loading data into RStat, see Getting Started With RStat.
  2. Disable sampling by clearing the Partition check box.
  3. For the Target Data Type, leave the Auto radio button selected.
  4. Select NUMBER as the Ident variable.
  5. Select X_IF_PRODUCT as the Target variable.
  6. Ignore all of the remaining variables.
  7. Click Execute.
  8. Click the Associate tab.
  9. Select the Baskets check box.
  10. Click Execute.

Procedure: How to Export the Market Basket Analysis Function

To export the Market Basket Analysis:

  1. Click Export.

    The Export C or PMML dialog box opens, as shown in the following image.

    There are two export types that can be selected:

    • Item. Exports rules that will determine the products in the new data set.
    • Confidence. Exports the confidence number for the products that are selected in the result.
  2. Select Item or Confidence as the export type, depending on your requirements.
  3. Create a scoring file to run the Market Basket Analysis rules against.

    This will show the format and file structure for the scoring data set. The Max Inputs for the scoring file should be the total items in the training data set minus 1.

    1. Create a test file, as shown in the following image.
    2. Using the Upload Data option in App Studio, create a Master File, as shown in the following image.

      Image of the Upload Wizard with the Market Basket data

      Note: The Upload Data option allows you to upload new data to an application, creating a new, unique Master File. Define or create a data source, right-click a folder from the relevant application folder, and then click Upload Data. The Reporting Server Console opens and you then select a file to upload. The Business View and Prepared result display. On the ribbon, click Load and Next, set your load options, and then click Proceed to Load to create the Master File in your repository. For more information on uploading data, see the Business Intelligence Portal manual.

    3. Deploy the C files.
    4. Test the C routines.

      The following image shows the output that is generated for the sample WebFOCUS report in a web browser.

    The above report output lists the items and the confidence value for each item to be selected. Values in the ITEM_2 and ITEM_3 columns are inputs. Values in the item and confidence columns are the results of the Market Basket Analysis routine. In other words, item is the product that the customer is most likely to buy after buying item 2 and item 3 together, according to the association rules generated from the historical data.

    In the first case, item 2 is empty, so the suggested item is Milk for people who only purchase Cheese. In the second case, Beer is not within the historical data used to generate the rules, so No match found is returned. This means that there is no product recommendation for people who purchase Beer. The result of the third case indicates that people who purchase Milk and Apples are likely to also purchase Cheese. This is followed by a confidence value that shows the likelihood of buying Cheese after purchasing Milk and Apples together.
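
The behavior described in the three cases above can be sketched in Python. This is not the exported C routine. The rule list, the confidence values, and the matching logic (an exact match on the purchased items, keeping the rule with the highest confidence) are assumptions made for illustration.

```python
# Hypothetical rules: (antecedent items, recommended item, confidence).
RULES = [
    (frozenset({"Cheese"}), "Milk", 0.86),
    (frozenset({"Milk"}), "Cheese", 1.0),
    (frozenset({"Milk", "Apples"}), "Cheese", 1.0),
]


def score(rules, purchased):
    """Return (item, confidence) for the best matching rule, or ("No match found", None)."""
    # Empty inputs are ignored, so ["", "Cheese"] means that only Cheese was purchased.
    basket = frozenset(item for item in purchased if item)
    matches = [(conf, item) for antecedent, item, conf in rules if antecedent == basket]
    if not matches:
        return ("No match found", None)
    conf, item = max(matches)
    return (item, conf)


print(score(RULES, ["", "Cheese"]))        # only Cheese purchased
print(score(RULES, ["Beer", ""]))          # Beer is not covered by any rule
print(score(RULES, ["Milk", "Apples"]))    # Milk and Apples purchased together
```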