In Part 1, a pilot analysis assessed the performance of decision trees in classifying retail stores based on surrounding metrics, including property value, nearby beverage sales, and nearby liquor stores. Classification accuracy reached maximums of 93.9% and 46.3% for the training and testing sets, respectively. However, learning curves showed that the model suffered from high variance, though not necessarily high bias. Therefore, a series of additional models were run as an addendum to the decision tree pilot.
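The variance-vs-bias diagnosis above comes from comparing training and validation curves as the training set grows. A minimal sketch of that diagnostic, using scikit-learn's `learning_curve` on synthetic stand-in data (the real store feature matrix is not reproduced here), could look like:

```python
# Sketch of the learning-curve diagnostic: a large, persistent gap between
# training and cross-validation accuracy signals high variance (overfitting)
# rather than high bias. X and y below are synthetic stand-ins for the
# store feature set (few samples, many features).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import learning_curve
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=60, n_features=40, n_informative=5,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

sizes, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.2, 1.0, 5), cv=5)

# Mean train-test gap at each training size; values that stay large as the
# training set grows indicate high variance.
gap = train_scores.mean(axis=1) - test_scores.mean(axis=1)
print(gap)
```

An unrestricted decision tree typically fits the training folds perfectly, so the gap tracks how far the validation accuracy lags behind.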
Retail Only, Even Classes
The feature set used in the pilot model was not limited to retail-only stores; it also included locations that performed local wholesale distribution. The disparity in total delinquency amount (which is discretized to form the target classes) between retail-wholesale and retail-only stores may provide enough variance to aid the algorithm. From a practical perspective, however, it does not make sense to compare apples to oranges. While it would be a worthwhile experiment to run a classification model to predict retail-wholesale vs. retail-only, the two should not be combined into the same classes in a classify-by-price algorithm. Thus, for this model, 23 retail-wholesale samples were removed from the feature set, leaving 39 retail-only targets.
With so few samples, it is important to have equal class sizes so as not to train on unbalanced data. Using these 39 samples, two experiments were run that discretized samples by percentile rather than by an arbitrary delinquency amount: 1) quartiles (4 classes) and 2) terciles (3 classes).
- Quartiles (4 classes): decision tree accuracy reached 80% train, 25% test. Random forest reached 100% train, 12.5% test. Most important features: DT – sD10Q3, stdLiqD15, aD10Q4; RF – none.
- Terciles (3 classes): decision tree accuracy reached 95% train, 50% test. Random forest reached 100% train, 37.5% test. Most important features: DT – stdBevD15Q3, sD5Q1, aBevD15Q4; RF – none.
No conclusions can be drawn from this experiment due to the insufficient number of samples.
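The percentile-based discretization used in the two experiments above can be sketched with pandas' `qcut`, which bins a continuous variable into equal-frequency classes. The delinquency values below are hypothetical placeholders, not data from the study:

```python
# Sketch of percentile-based target discretization: qcut splits a continuous
# delinquency amount into equal-frequency bins (quartiles or terciles),
# yielding balanced class sizes by construction.
import pandas as pd

# Hypothetical total delinquency amounts for 12 stores.
delinquency = pd.Series([120, 450, 90, 700, 300, 220, 510, 60, 880, 340, 150, 610])

quartile_labels = pd.qcut(delinquency, q=4, labels=[0, 1, 2, 3])  # 4 classes
tercile_labels = pd.qcut(delinquency, q=3, labels=[0, 1, 2])      # 3 classes

print(quartile_labels.value_counts().tolist())  # each class holds n/4 samples
```

This contrasts with discretizing at an arbitrary dollar threshold, which can leave classes badly unbalanced.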
Reduced Number of Features
From a machine learning perspective, an algorithm must work harder and is prone to inaccuracy when the number of features far outweighs the number of samples, as is the case here. From a practical standpoint, the feature set used in this experiment could be reduced in order to draw logical and meaningful conclusions. The pilot study consisted of data that was organized not only by distance from the target, but also by cardinal location (quadrants). When looking at which variables are most important in classifying stores, an answer such as “average property value within 1 mile, but only in the northwest quadrant” is not as helpful as, say, “average property value within a 1-mile radius of the store.” So the model was run using a feature set that included only the sum, average, and standard deviation of each data type, rather than quadrant-specific features.
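The feature reduction described above amounts to dropping quadrant-specific columns and keeping the whole-radius aggregates. A minimal sketch, with column names that are hypothetical but patterned after those reported in the results (e.g. sD10Q3):

```python
# Sketch of the feature reduction: drop columns carrying a quadrant suffix
# (Q1-Q4) and keep only the whole-radius sum/average/std aggregates.
# Column names are hypothetical illustrations, not the study's actual schema.
import pandas as pd

df = pd.DataFrame(columns=[
    'sD10', 'aD10', 'stdD10',     # sum/avg/std within the full radius
    'sD10Q3', 'aD10Q4',           # quadrant-specific variants
    'stdLiqD15', 'stdBevD15Q3',
])

# Keep columns that do NOT end in a quadrant suffix Q1-Q4.
reduced = df.loc[:, ~df.columns.str.contains(r'Q[1-4]$')]
print(list(reduced.columns))
```

Filtering by a naming convention like this keeps the reduction reproducible as new distance bands are added, provided the quadrant suffix is used consistently.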
It can be inferred that any results seen so far are unstable and potentially inaccurate due to the insufficient sample size. Due diligence must be done to verify whether a sufficient sample size can be reached using only locations without a local distributor permit. If distributor locations must be included to obtain a usable sample set, a new approach must be taken.