Store Performance Prediction and Impact Models

 

Introduction

Deciding where to locate a retail store, and how large a store to build, is a major problem in modern retail. Simple metrics such as population density, average household income, or the locations of competitors do not guarantee that a new store will reach long-term profitability, or even profitability at all. Demographics combined with geophysical attributes such as road conditions can lead to vastly different outcomes even when the same type of store is placed in apparently similar neighborhoods. With razor-thin margins and the large investment required to open a store, there is an ever-increasing need for accurate prediction before any contracts are signed or any construction is started.

Typically, medium-sized retailers use systems such as SPECTRA as a starting point for examining geographic data. SPECTRA is very expensive, and licensing fees mean there is an ongoing cost associated with using it. SPECTRA is not a prediction tool; it is simply a collection of geographic data, such as the types of families around a given location, mean income, professional qualifications, and the like, arranged in concentric circles (1 mile, 3 miles, 5 miles) around the location of interest. It is up to the user of SPECTRA to draw any conclusions about how a store in that location would perform. This leads to inherent problems, as what a retailer experiences in one city can vary widely from what it experiences in another, so making decisions based on past geographic data from one location is inherently problematic.

Literature searches have turned up few, if any, approaches to this problem. That does not mean such approaches do not exist, but if they do, they are not being published, and it is easy to speculate why: tools and systems of this kind give a competitive advantage to the organizations that hold them.

This research therefore focuses on finding methods to address the problem. It seems reasonable to treat it as a non-linear classification problem, so AI and machine learning techniques are a natural starting point.

 

Approach

The primary purpose of this research is to predict sales for major Texas liquor stores using various metrics describing the area surrounding each store. The predictor variables are then scored for importance to explain which data best account for retail store success.

Data Sources

Delinquency Data

When retailers of alcohol are delinquent in payment to the wholesaler from which they purchase alcoholic beverages, the wholesaler is required to report a notice of delinquency to the Texas Alcoholic Beverage Commission (TABC). These records become public information available through the TABC website. This information was used to quantify how much product each retail store of interest was purchasing, serving as a proxy indicator of retail store success.

Property Values      

Property values may supplement average household income as an indicator of the economic status of a given area and are thus used as a potential predictor of retail store success. Property values are collected from the public records of the Appraisal District in each Texas county. These records contain the appraised value of each property in a given county, as well as the address, city, and ZIP code of each of those properties. Latitude and longitude coordinates for each property are derived using a C# application built around a geocoding API provided by Texas A&M University Geoservices.
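The original geocoding step was implemented as a C# application; the snippet below is a minimal Python sketch of the same idea. The endpoint URL, credential, parameter names, and response keys are placeholders and assumptions, not the documented TAMU Geoservices interface.

import requests

GEOCODE_URL = "https://example.invalid/geocode"  # replace with the TAMU Geoservices endpoint
API_KEY = "YOUR_API_KEY"                         # hypothetical credential

def geocode_address(street: str, city: str, zip_code: str) -> tuple[float, float]:
    """Return (latitude, longitude) for one appraisal-district property address."""
    params = {
        "apiKey": API_KEY,          # parameter names are assumptions
        "streetAddress": street,
        "city": city,
        "state": "TX",
        "zip": zip_code,
        "format": "json",
    }
    resp = requests.get(GEOCODE_URL, params=params, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    # Response parsing depends on the service's actual schema; the keys below
    # are illustrative only.
    return float(data["latitude"]), float(data["longitude"])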

Liquor Stores

To obtain the number of competing liquor stores surrounding each target location, data listing all liquor stores in Texas, along with their license or permit numbers, was pulled from the Texas Alcoholic Beverage Commission. Latitude and longitude were geocoded for every liquor store so that its location and distance relative to each target could be computed. Only stores within a set distance of the target are included in the feature set.

Sales Tax Permit Holders

In order to understand the importance of commercial density on retail success, a complete list of all sales tax permit holders was acquired from the Texas Comptroller’s office. Geocoordinates of every permit holder allow for the determination of the number of businesses surrounding each target location.

Mixed Beverage Gross Receipts

To include information about where consumers might purchase alcohol outside liquor stores, bar and restaurant sales of wine, beer, and liquor were obtained from the Mixed Beverage Gross Receipts report from the Texas Comptroller's office.

 

Data Preparation

Raw data was imported into Microsoft SQL Server 2016 and transformed into a form usable by the machine learning algorithms. Concentric circles were drawn around each retailer of interest so that each data type could be considered at a given distance (see below). Data points for each subtype were measured against the retailer's latitude and longitude using the Haversine formula (implemented as a SQL function), so that each data type could be divided into 1-, 3-, and 5-mile radii. In some experiments the concentric circles were further divided into quadrants to account for surrounding natural features that might exclude certain data. For instance, a lake in the southwest quadrant of a retailer would preclude any liquor stores, property values, or mixed beverage sales in that quadrant.

[Figure: quadrant diagram of the 1-, 3-, and 5-mile concentric radii around a target retailer]
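The distance and quadrant logic described above was written as a SQL Server function in the original work; the Python sketch below is an illustrative equivalent only (function names and ring choices mirror the text, everything else is an assumption).

from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * asin(sqrt(a))

def radius_band(distance_mi):
    """Assign a point to the 1-, 3-, or 5-mile ring, or None if beyond 5 miles."""
    for ring in (1, 3, 5):
        if distance_mi <= ring:
            return ring
    return None

def quadrant(store_lat, store_lon, lat, lon):
    """Label a point NE/NW/SE/SW relative to the target retailer."""
    ns = "N" if lat >= store_lat else "S"
    ew = "E" if lon >= store_lon else "W"
    return ns + ew

Counts and sums of each data type can then be aggregated per (ring, quadrant) bucket to form the predictors for a given retailer.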

Machine learning algorithms require the data to be in the form of an m × n matrix, with rows representing each record (target retailer) and columns representing each predictor. An example of such a feature set is shown below. The targets were divided into four equal classes based on total delinquency amount, such that each class represents one quartile (e.g., class 1 is the top 25%, class 4 the bottom 25%).

[Figure: sample feature set matrix]
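The quartile labeling can be sketched in a few lines of Python, assuming the aggregated features sit in a pandas DataFrame with one row per target retailer and a 'total_delinquency' column (the column name is an assumption for illustration).

import pandas as pd

def add_quartile_class(features: pd.DataFrame) -> pd.DataFrame:
    # qcut splits the records into four equal-sized groups; labels are ordered
    # so class 1 holds the largest delinquency totals and class 4 the smallest.
    features["class"] = pd.qcut(
        features["total_delinquency"], q=4, labels=[4, 3, 2, 1]
    ).astype(int)
    return features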

Model Selection

Choosing the best model for a supervised learning classification task requires some trial and error, but an understanding of the data set, as well as of the particular goal, gives insight into this decision. Here we take into account the primary purpose of our problem: given labeled records for each store, which features are most important in classifying and predicting store performance?

Rather than beginning with a complex neural network, we choose to start with a decision tree model, which requires very little data preparation. This non-parametric supervised learning method builds a model by learning simple if-then-else decision rules inferred from the data features; the deeper the tree, the more complex the decision rules and the better the fit. A benefit of using a white-box model such as a decision tree rather than a black-box model (e.g., an artificial neural network) is that its conditions can be expressed in Boolean logic and easily visualized. The higher a feature appears in the decision tree, the more important it is in delineating the classes.
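A minimal sketch of this baseline with scikit-learn, assuming X is the feature DataFrame and y the quartile labels produced above (all names are carried over from the earlier sketches, not from the original study):

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = list(X.columns)  # predictor column names (X assumed to be a DataFrame)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)

print("holdout accuracy:", tree.score(X_test, y_test))
# The if-then-else rules can be printed directly (the white-box advantage),
# and feature_importances_ ranks predictors by how much they separate classes.
print(export_text(tree, feature_names=feature_names))
for name, imp in sorted(zip(feature_names, tree.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")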

Decision tree models do have drawbacks, however: they can be relatively unstable and inaccurate when used in isolation. Small changes in the data can lead to large changes in the structure of the optimal tree. This can be monitored by observing the consistency of the feature importances: when the model is run several times, are the same features consistently reported as important, or do the most important features change at random, signifying that the model is unstable? If the latter is true, a single decision tree can be combined with other techniques, or folded into a random forest, in which an ensemble of trees determines, on average, which features are most important across many decision trees. These techniques also combat the inaccurate classification of a single decision tree, which is common when the number of predictors/features exceeds the number of targets/records.
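One way to sketch this stability check, continuing from the variables above: refit single trees on bootstrap resamples (a stand-in for "small changes in data") and see whether the top feature moves around, then average importances over a random forest.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

Xa, ya = np.asarray(X_train), np.asarray(y_train)

# Single trees on perturbed data: does the top-ranked feature change?
for seed in range(5):
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(Xa), len(Xa))  # bootstrap resample of the training set
    t = DecisionTreeClassifier(max_depth=4, random_state=0).fit(Xa[idx], ya[idx])
    top = feature_names[int(np.argmax(t.feature_importances_))]
    print(f"resample {seed}: most important feature = {top}")

# Random forest: importances averaged over many trees are more stable.
forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(X_train, y_train)
avg_importance = dict(zip(feature_names, forest.feature_importances_))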

 

Experiments

Experiment 1

Experiment 2 – Data Exploration