A risk-free way to create business value from data

Reading time: 5 min

As a company providing data solutions, we often see confusion among our customers about how to apply a Machine Learning (ML) based solution to a business problem (or how to provide data-oriented support for an entire department or company). A basic feature of analytical initiatives is that they are often pilot projects whose success and added value are difficult to predict at the outset. Without a careful and methodical approach, such an initiative can easily fail and burn into the minds of business decision-makers as a negative experience and a total waste of money. Analytical projects require an approach whose phases are defined so that investment can grow gradually while the guaranteed return on investment (ROI) grows with it. In our experience, the best solution is a Proof of Concept-based approach, which can and should be used at any level of data maturity.

PoC, i.e. Proof of Concept

By PoC, we mean mini-projects that explore the feasibility of a (Machine Learning) solution and its business benefits quickly and cheaply.

Source: techopedia

Many times, a data project idea sounds very good in theory: it promises low costs and high benefits, and at a high level there is no sign of any obstacle. In reality, however, once development has begun, the project might fail for a variety of reasons (e.g., no data, poor data quality, or simply a business issue that cannot be modelled). As a manager, it's tempting to accept such good ideas with the promise of quick implementation, but it's much more uncomfortable to face the fact that the dreamed-of ML app still doesn't meet expectations, and that a small (or even larger) amount of money has been thrown out the window. The way to minimize such risks is the PoC-based approach: spend extra energy (and money) only on what is proven to be worth addressing.

The life cycle of ML-based solutions

The figure below illustrates how an idea is integrated into business processes.

As a first step, we need to be aware of what use cases and data-related business problems are available. If none are, it is worth stepping back and reconsidering the main pain points of the company's operations (e.g., reducing costs, stopping customer churn, increasing margins). Once these pain points are identified, testable ideas (hypotheses) can be defined, for example: churn risk can be predicted from customer data and past behavior patterns.

Regardless of the sector, there will be several business problems for which a data solution is conceivable, but there is never capacity to implement all the ideas. Ideas should therefore be prioritized, i.e., it should be decided which idea(s) are most worth starting a PoC project for. In our experience, there is no obstacle to running multiple PoCs in parallel (if you have the capacity, or, for example, to test multiple suppliers), but you should expect these threads to run independently. The big advantage is having two strings to your bow and thus a better chance of success; the obvious drawback is divided attention, so decide on parallelization based on the available capacity. It is also worth considering speed when prioritizing, i.e., defining PoC projects for those business cases where the possibilities and pitfalls of an analytical solution can be validated relatively quickly.

If the PoC is successful, i.e., the idea is validated and brings the expected benefits, it is worth moving on to implementation and incorporating the results into day-to-day operations. If the result is the opposite, stop and draw conclusions; don't waste more resources on the idea.

Principles of the methodology:
  1. Planning – Spend time selecting an idea for the PoC, specifically checking feasibility in terms of data availability and quality.
  2. Toolbox – No cloud data warehouse or sandbox is needed, but a sufficient amount of good-quality data really is.
  3. Open-mindedness – It is crucial that the organization be open to such a solution and not perceive it as an attack on expertise; experts may feel that "a machine will not tell me what to do".
  4. Expertise – When acquiring business knowledge, involve the experts and ask for their help rather than competing with them via an algorithm.
  5. Lessons learned – If a PoC is not successful, draw conclusions and use them in the next idea.
  6. Multiple attempts – If possible, try multiple problem types (e.g., prediction, optimization, anomaly detection).
  7. Tensions – Be prepared for the fact that a new solution can replace old processes, and you have to manage these changes.
  8. Simplicity – To maintain motivation and manage expectations, start with an easier idea that is expected to succeed, and deal with riskier ones later.

Arguments against PoC, difficulties

Although there are many arguments in favor of the PoC approach, some arguments against it are also worth highlighting:

  1. You can't just launch PoC projects forever. It is important that the data "project funnel" be strong and produce integrated data applications, not just endless experimentation.
  2. Scalability. It is not always clear how a successful PoC project can be scaled to an implemented solution. This aspect must be taken into account from the beginning.
  3. Dealing with excessive expectations. It is important that the experts and business decision-makers involved know what to expect. The result of a PoC is not a final solution and not a classic project deliverable; the financial benefits are typically realized in a follow-up project.
  4. Take the time. One advantage of PoCs is speed, but that doesn't mean you should rush. Depending on complexity, the lead time can range from 2 weeks to half a year; in our experience the typical lead time is 10 weeks.
  5. Buy-in. Without the commitment and motivation of the related business area, it is not possible to design and operate an ML solution successfully.
  6. Quantify benefits. It is often unclear how the potential benefits can be accurately demonstrated (e.g., savings due to an unforeseen event). Significant energy needs to be allocated to thorough and fair measurement and evaluation.

Szabolcs Biró – Head of Advanced Analytics
Ákos Matzon – Advisory Team Leader

How to price your convenience stores?

Reading time: 3 min

This case study describes the results of a recently completed project. Our partner has shops in hundreds of locations, where impulse products generate the major part of the revenue. One feature of these kinds of products is that their consumption patterns depend on the location of the business, so it is worth differentiating pricing by location.

The system of location-based rules has developed over years and decades, but experts estimated there was plenty of room for data-analysis-assisted validation and optimization.

Our main goal was to set the optimal price level at certain locations and reach maximum margin growth.

We developed a customized pricing methodology: instead of the classic pricing model, we built a product group + shop based model, whose results were then aggregated to the store level.

Detailed purchase data and other external data (for example, promotions and available information on store renovation periods) were also used for the analysis.

In the following, we describe the steps of the analysis.

1. Filter out distortions!

We saw a type of mineral water whose sales volume jumped thirty-five-fold in months with strong marketing activity, then fell back the next month until it was promoted again. One popular energy drink was offered at a bargain price every two months, so its quantity fluctuated tremendously. Most of the shops stayed open during renovation, but most product groups were not offered for sale, and the products that remained on sale showed that renovation reduces traffic. We filtered out such outlier values, which would otherwise have affected the model adversely, and ended up keeping 60% of the stores and 20% of the products.
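To make the filtering concrete, here is a minimal sketch that drops promotion-driven volume spikes such as the thirty-five-fold mineral-water months. The data, function name, and threshold are all hypothetical illustrations, not the project's actual rules:

```python
from statistics import median

def filter_promo_spikes(monthly_volumes, threshold=3.0):
    """Drop months whose sales volume exceeds `threshold` times the
    series median -- a crude proxy for promotion-driven spikes."""
    base = median(monthly_volumes)
    return [v for v in monthly_volumes if v <= threshold * base]

# A mineral-water-like series: normal months around 100, one 35x promo spike.
volumes = [95, 102, 100, 3500, 98, 101]
print(filter_promo_spikes(volumes))  # [95, 102, 100, 98, 101]
```

A real pipeline would also exclude renovation periods and flagged promotion weeks rather than relying on a single volume threshold.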

As a result, we could develop a reliable model that is free from interfering effects but still based on a representative sample.

2. Assign a clear price to product-store pairs!

The business didn't record the shelf price of the products in an easily accessible database. Many products were sold at several different prices, with lower prices available through a loyalty-card discount, multibuy discount, or coupon. We needed shelf prices because these influence customer decisions. Fortunately, shelf prices rarely changed within a day, so we chose the daily median price, which covered 97% of the quantity sold.
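The daily-median idea can be sketched in a few lines; the transactions below are made up for illustration:

```python
from statistics import median

def daily_shelf_price(transactions):
    """Estimate the shelf price for each day as the median of observed
    transaction prices, which damps loyalty-card and coupon discounts."""
    by_day = {}
    for day, price in transactions:
        by_day.setdefault(day, []).append(price)
    return {day: median(prices) for day, prices in by_day.items()}

# Hypothetical data: full price 199 HUF, one loyalty-card sale at 149 HUF.
tx = [("2021-06-01", 199), ("2021-06-01", 149), ("2021-06-01", 199),
      ("2021-06-02", 199), ("2021-06-02", 199)]
print(daily_shelf_price(tx))  # {'2021-06-01': 199, '2021-06-02': 199}
```

Because discounted sales are a minority of transactions, the median recovers the shelf price even without a price database.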

3. Adjust quantity with seasonal effects!

So that we could compare quantities sold at different prices at different times, we needed seasonal adjustment. Impulse products have strong seasonal (weather) effects, which affect their sales volume regardless of the selling price. A higher seasonality rate means higher traffic, a lower rate means weaker traffic. A typical example is the surge in the consumption of frozen products and drinks during the summer.


With the seasonality rate, we can normalize monthly sales to get a seasonally adjusted monthly sales volume.
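The adjustment itself is a division by the seasonality index; a sketch with made-up numbers (index 1.0 = average month):

```python
def deseasonalize(monthly_sales, seasonality_index):
    """Divide each month's sales by its seasonality index so that volumes
    sold in strong and weak months become comparable."""
    return [round(s / i, 1) for s, i in zip(monthly_sales, seasonality_index)]

# Hypothetical frozen-product series: winter index 0.5, summer index 1.5.
sales = [50, 150, 90]
index = [0.5, 1.5, 1.0]
print(deseasonalize(sales, index))  # [100.0, 100.0, 90.0]
```

After adjustment, the winter and summer months show the same underlying demand, so any remaining differences can be attributed to price.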

4. Determine price elasticity of demand at store + product group level!

The price elasticity of demand of a product shows the percentage change in demand caused by a 1% price change. We estimated it from past price and sold-quantity data. In this project we fitted a price elasticity curve to the price–quantity data points at the store + product group level.

The effects of outliers were reduced with a few tricks; for example, we used robust regression and calculated a confidence indicator to filter out poorly performing models.
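As an illustration of the elasticity fit, here is a sketch using plain ordinary least squares on a log-log model (the project used robust regression, which this sketch omits for brevity); the data points are synthetic:

```python
import math

def price_elasticity(prices, quantities):
    """Fit ln(q) = a + e * ln(p) by ordinary least squares and return the
    slope e, i.e. the price elasticity of demand."""
    x = [math.log(p) for p in prices]
    y = [math.log(q) for q in quantities]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Synthetic points generated from q = 1e6 * p**-2, i.e. elasticity -2.
prices = [100, 120, 150, 180]
quantities = [1e6 * p ** -2 for p in prices]
print(round(price_elasticity(prices, quantities), 2))  # -2.0
```

On real data the points are noisy, which is where robust regression and the confidence indicator earn their keep.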

5. Raise the results to store level!

The store + product group models were aggregated to store level by weighted averaging. By using this method, we got an indicator for all stores showing the price elasticity of the customers visiting that store.
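The weighted aggregation can be sketched as follows; the weights here are hypothetical revenue shares, not the project's actual weighting scheme:

```python
def store_elasticity(group_elasticities, weights):
    """Aggregate product-group elasticities into one store-level indicator
    by weighted average (weights could be, e.g., revenue shares)."""
    total = sum(weights)
    return sum(e * w for e, w in zip(group_elasticities, weights)) / total

# Hypothetical store: three product groups with revenue-share weights.
print(round(store_elasticity([-1.5, -2.0, -3.0], [0.5, 0.3, 0.2]), 2))  # -1.95
```

The resulting figure summarizes how price-sensitive the customers visiting that particular store are.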

6. Set optimal prices!

For each product, we can determine an optimal shop-level price relative to its last known price; setting this price maximizes the product of unit margin and quantity sold, so total margin can be raised. Based on the results, the optimum of the total selection showed a higher total margin than the current one. As is usual in retail, instead of using the model's raw prices, we rounded them to psychological price points: we could not set a price of, say, 66.5 HUF, so as a last step a rounding system moved such prices to psychological price points (e.g., 69 HUF).
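Under a constant-elasticity demand curve (an assumption of this sketch, not necessarily the project's exact model), the margin-maximizing price has a closed form, and psychological rounding is a one-liner; the cost and elasticity figures are made up:

```python
def optimal_price(unit_cost, elasticity):
    """Margin-maximizing price for demand q(p) = q0 * p**e with e < -1:
    setting d/dp [(p - c) * q(p)] = 0 gives p* = e / (1 + e) * c."""
    return elasticity / (1 + elasticity) * unit_cost

def to_psychological(price):
    """Round to the nearest price ending in 9 HUF (e.g. 66.5 -> 69)."""
    return int(round((price - 9) / 10)) * 10 + 9

# Hypothetical product: 35 HUF unit cost, store-level elasticity -2.
p = optimal_price(35, -2.0)   # 70.0 HUF
print(to_psychological(p))    # 69
```

The more elastic the store's customers, the closer the optimal price sits to cost; the rounding step then snaps the result to a shelf-friendly price point.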

As a result, our model estimated an annual 3–3.5% rise in margin for our customer, roughly equal to the profit growth from opening ten new shops.

The methodology works: we successfully optimized prices for location differences. There are also significant opportunities ahead, for example product-level pricing and developing the model into a production system. Industry trends show that, amid increasing competition, more and more retailers have the opportunity and the need to make data-driven optimization decisions as data quality and availability improve.

Marton Biro – Senior Data Scientist
Marton Zimmer – Managing Partner