How (& when) to find the right data consultant

This guide aims to help you determine whether you need a data consultant for your data-related business questions, and, if you already know you need one, to help you find the right consultant for your business needs.

Although this guide applies to almost any industry, we work specifically with retailers, so parts of it focus in particular on retail analytics questions.

With our guide, you can avoid the common mistakes and pitfalls of searching for a data consultant and leap over its challenges like you’re playing hopscotch.

Jump to the good part.

Have you seen those “How to” videos and articles in which you are searching for an answer to something and instead of just that: the answer…you have to scroll through the artistic ramblings of the author? 

Right, so this is not one of those articles.

Let me get straight to the point with a table of contents:

If you know exactly which of the above you are here for, go ahead and skip this part.

If, however, you feel you need more information on why it makes sense to work with a consultant or consultancy, read on. 

(Also: yes, “consultant” and “consultancy” are used interchangeably in this article, even though a freelance consultant and a consultancy will likely differ a great deal in both project cost and timeframe. The point of working with one, and how you decide to, is pretty much the same.)

Why does it make sense to hire a consultancy? 

The point of a data project, and of improving your operations with a value-driven approach, is, very roughly, to be more precise with less waste. With an unprecedented number of businesses and customers on the market, data analytics has become a crucial tool for businesses that want to stay relevant. 

However, the market is long past the days when basic data analytics meant a competitive advantage. Advanced analytics is unavoidable for long-term growth and competitiveness. Without it, you won’t be able to make accurate offers to your customers, which means their experience will be suboptimal and their loyalty will fade. In simpler terms: you won’t be able to sustain your customer growth, or even your current base, in the long run.

Most companies (certainly ones that read articles such as this one) will usually have some level of data analytics knowledge in-house. In our experience, this means a few BI personnel and a data analytics tool such as Tableau or Power BI. That is a great starting point; however, as mentioned above, basic data projects will not build a sustainable growth strategy. 

When it comes to advanced data projects tailored to a company’s specific requirements, no tool, however great, can be expected to plan and execute these use cases on its own.

That is why, when you want to build a long-lasting growth engine, involving experts in your data projects is a no-brainer. 

Right. But when is that? 

At what point does it make sense to involve data consultants? 

Involving data experts does not have to wait until you have gained some in-house experience in data analytics. The advantage of having an in-house data team is that you are likely already committed to, and on the path of, creating a data-driven organization.

On the other hand, as noted above, you will not be able to base sustainable growth on a small in-house team and one-size-fits-all tools. Involving data experts early on means you can avoid some costly failures (and not just in money) that might even break your enthusiasm. 

So whether you are already doing some in-house data analysis that you’d like to take to the next level, or you are at the baseline, merely recording some data without being sure what to do with it, talking to experts is a good idea.

To simplify the question of when:

  1. If you already have ongoing data collection in your business but you’re either: 
    1. Not sure how to capitalize on it → You roughly know what you would like to achieve (e.g. you want to target your customers more precisely, at the right time, with the right message, on the right channel; or you would like to predict your stocking and logistics needs more accurately so you can optimize your assortment range, etc.) 
    2. You have second thoughts about how you’d like to use the data → Instead of the short-term benefit of a basic use case, such as tweaking your online ads, you are looking for something more complex with a long-term benefit, like customer segmentation
  2. You don’t have an expert with the specific skills or demonstrated history necessary to run advanced data projects → This one is quite straightforward: if your data team is a small group of BI staff or relatively inexperienced data engineers/analysts, it makes sense to run a few workshops or a joint project with experts so that your team can carry on with expanded knowledge 
  3. You know that you should be able to get more out of your data → This point is somewhat of an extension of the 1st point in the list. If you are already collecting data but do little with it, and you know that your competition gains far more from data collection (i.e. they make better recommendations, target customers better, etc.), then you’ll certainly benefit from talking to experts
  4. You have had issues with optimal budget allocation… or, more simply, you are spending too much on too few customers. This could be the result of challenges with customer retention, campaign efficiency, or issues with your offerings → All of the above can be improved and fixed by getting better at using the data your customers provide
  5. You don’t know how to create a roadmap from collecting data to exploiting it → Even if you are already collecting data, you might not be able to tell whether you are collecting the right information, or how to synchronise data from different sources. It’s one thing to collect some data from your customers, but it requires specific expertise to know what to collect and how to turn it into gold.
  6. You are looking to base your decisions on actual data, i.e. you want to implement data-driven methods in your business operations → This relates to all of the above. If you feel you could improve your business processes by adopting a data-driven approach (you are right about that!), you should also seek the help of data consultants. 

Right, now you know when it is appropriate to work with a consultant, so let’s see:

Where to start searching for a data consultant 

For a quick detour, let’s jump back to the consultant vs. consultancy question mentioned before. When it comes to working with data experts, there are, very basically, four options along two roads: you can either hire a team of consultants or a freelance consultant, or “rent” a team or a freelancer. 

Both have their pros and cons. While a company (consultancy/team) will likely have a stable background (insurance policies, guarantees) and in-house talent (with a variety of skills), a freelancer or freelancing team will likely work on a somewhat smaller budget and could be more flexible with their availability. 

Now back to the question of where to find a great consultant.

When you are looking for a freelancer, it generally helps to base your search on social platforms, where you can fairly easily get feedback on a candidate. The most obvious choices are Upwork, one of the biggest freelancing platforms, or LinkedIn, where you can check out candidates’ previous projects and, if their profiles are well maintained, get feedback from their previous clients. You could also check forums (Reddit/Quora) or communities (Slack/Discord) to find viable candidates and feedback on their work. Besides the forums, you can also go by personal recommendations, as you are probably familiar with other retailers who may have faced similar challenges in the past. 

When it comes to a consultancy firm, you can probably find one more easily with a search. You will most likely be able to choose between major consultancies that work with a wide range of industries and smaller ones that specialize in certain areas. Not surprisingly, the major ones will likely have less industry focus and higher fees, with more people working on each case. It makes sense to check out firms that are active on forums and social platforms (mainly LinkedIn), and especially ones that generate useful content. 

You know, like a great article on how to find a great consultant.
Generally great consultancies will put in the effort to detail these steps.
Anyway on to our next point. 

What are some of the questions a great consultant should lead with? 

Since you are running a business, this will likely not be your first agreement with a professional. Still, it helps to discuss in broad terms what you should hear from a great consultant. 

  • They should definitely ask a lot of questions. Whenever a professional leads with how they will solve whatever problems you’ve got, that should raise a flag. It’s one thing to list the benefits of a service in a pitch, but when it comes to discussing details, a great consultancy should always ask loads of questions. It shows that they are actually interested in solving your problem and not just selling you their service. 

In the case of a data consultant, this means mapping your data inventory and at least a surface-level audit of your operations and current processes. It is also crucial for them to understand how their solution will fit into your system, from both a business and a technical perspective. 

  • Now, it’s great if a professional asks a lot of questions, but that only has value if they then set up a clear set of options: a roadmap with possible directions to take and the expected outcomes of each. 

In this case, the data experts should base the potential projects you can select from on your organization’s data maturity and the (likely roughly defined) goals you would like to achieve. 

  • Finally, once the experts have audited your business and drawn up some options to take, they should guide you in selecting your first/pilot project. 

The idea here is to draw up a project that will have an actual effect on your operation (in the form of revenue, resource optimization etc.), but more importantly, that will help you understand the process of a data project and will pave the way for a value driven transformation. 

By now (if we did our job right) you have a basic understanding of what to expect from a great consultant. But what if you are looking for someone who is not only a great consultant but also specifically skilled and experienced in retail analytics?

How do you know that a data analytics consultant is competent in retail analytics?

To create a sound data strategy for retail, a consultancy should understand some key aspects of the industry… even better if they have actual experience with retail companies. 

Some of the aspects a great retail consultancy should be aware of:

  • A general knowledge of customer analytics: what a customer journey looks like, what statuses can be attributed to a customer, and how a customer life cycle builds up. 
  • How operations and the buying process work specifically in retail: which areas are responsible for certain decision-making processes, and what the main channels are. 
  • What are the key bottlenecks in retail operations? What are some of the biggest challenges retailers have to overcome? 
  • General data knowledge in retail and the main pain points.
  • Extensive methodology know-how: how certain business needs can be met by data-based solutions.

They should also have a clear vision on what route to take based on your specific goals and needs.

Now then. You know when, where and how to find a consultant. You even have a grasp on what they should know… 

Let’s talk money.

How does a good consultancy price their services?

This, no surprise, can depend on a great many things. 

One thing is for sure though: a great consultancy will not provide a large chunk of work free of charge for you. 

Any experienced professional knows that providing a valuable, personalized service requires their expensive resources. So if someone does work for you free of charge, it is either a fragment of what you need or likely not very valuable. 

When it comes to pricing strategies, consultancies will often ask for a fixed price with a money-back guarantee if you are unhappy with the process…

Or they might ask for a commission. The latter only applies if the project directly affects revenue or the success of an ongoing project on your end. If the consultancy offers a commission-based arrangement, the end cost will likely be higher than the fixed-price option. 

Some consultancies may also be open to a daily fee with a transparent daily reporting agreement. However, since most of these data projects should run for at least a few months, this can be tricky: if someone decides the money you’ve been spending on the data project should be allocated to a different department, you can end up with an abruptly abandoned project that has already burned through money and may never be finished.

In any case, the key is that the consultant or consultancy communicates openly about their pricing, so you can find a mutually beneficial solution. 

What’s next 

I hope you now know enough about the “how-tos” of retail analytics consultants to be confident in your search. If you feel like you’d like to test your newfound knowledge on a great consultancy: let us know.



Retail analytics use cases

Within the span of a few years, the field of retail has gone through remarkable changes: data analytics has become an essential support line for retailers in the struggle of staying competitive in an ever-changing, fast-growing industry. 


How to predict customer churn and why you should do it?

Reading time: 5 min

When it comes to customer relationship management, every retailer faces the same three challenges: customer acquisition, expansion, and retention. Bringing new customers to the store is essential, but it is not enough to keep the business growing – that’s why it is just as important to find ways to create extra value that will keep customers happy and loyal. The first two, however, are quite ineffective without a robust customer retention strategy. In what follows, we will outline one of our previous projects that focused on churn prevention.

About our client

Our client owns 4000 drugstores across Europe, of which circa 200 are located in Hungary; these are the stores our project focused on exclusively. There are roughly 1 million customers in our client’s loyalty program, but because of GDPR, we were allowed to work with only 60% of them, which is still a large amount of data.

Identifying customers with a higher probability of leaving a business is an essential step in building an effective customer retention strategy. Realizing this, our client requested that we build a model that detects customers in the “danger zone”: customers who are likely to churn.
The location of this “danger zone” is very important: when we detect the signs of churn early, the costs of retention are lower, since customers who haven’t left just yet require less intense intervention. Early detection thus means smaller marketing costs, and thereby higher revenue.

Churn definition

The first step of our project was to define ‘churn’, as there was no commonly used definition available on our client’s side. This can be a very tricky task, as we can hardly ever tell for sure that a customer has churned – it all depends on the definition we use. We need to find a definition that will flag customers with an optimal false positive and false negative ratio: falsely flagging customers as ‘churned’ increases marketing costs unnecessarily, but not detecting customers who truly churned will result in losing them. Therefore, finding the best possible churn definition is crucial, and unfortunately, there is no general definition, as it highly depends on which type of retail store we are working with, and several other factors. During this phase, we worked closely together with our client, and we performed multiple analyses to help us decide on the optimal churn definition.
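An inactivity-window rule is one common way to operationalize such a definition. The sketch below is purely illustrative, not the client’s actual definition; the 90-day threshold and the function name are assumptions:

```python
from datetime import date, timedelta

# Hypothetical inactivity-based churn definition; the 90-day window is an
# assumption for illustration and would have to be tuned per retailer.
CHURN_WINDOW_DAYS = 90

def label_churned(last_purchase: date, as_of: date,
                  window_days: int = CHURN_WINDOW_DAYS) -> bool:
    """Flag a customer as churned if they have not purchased within the window."""
    return (as_of - last_purchase) > timedelta(days=window_days)

# No purchase for almost five months -> flagged as churned under this rule.
print(label_churned(date(2021, 1, 10), date(2021, 6, 1)))
```

Shrinking the window catches churners earlier (fewer false negatives) but flags more still-active customers (more false positives), which is exactly the balance the definition phase has to strike.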

Feature extraction

After a few iterations, we landed on a definition that both our client and our team found feasible, which led to the next step: finding features that contribute to the prediction of churn.

We collected about a hundred variables in total, including demographic variables; features that describe customer behaviour, such as campaign affinity and private label affinity; and we used dynamic variables to grasp potential change in behaviour as well.

Some of these had a greater effect on the prediction than others, of course. We found that the number of days since registration was most relevant to the target variable (customers who have been loyal for a long time are less likely to churn), but the number of coupons a customer usually redeems is also a strong factor, and age plays a significant role in churn as well.
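To make the feature-extraction step concrete, here is a hedged sketch of how a few of the variables mentioned above could be derived from transaction records with pandas. The tables and column names are invented for the example, not the client’s actual schema:

```python
import pandas as pd

# Toy transaction and registration tables (hypothetical schema).
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "date": pd.to_datetime(
        ["2021-01-05", "2021-03-01", "2021-02-10", "2021-02-20", "2021-03-15"]
    ),
    "coupon_used": [True, False, True, True, False],
})
registration = pd.DataFrame({
    "customer_id": [1, 2],
    "registered_on": pd.to_datetime(["2019-06-01", "2021-01-15"]),
})

as_of = pd.Timestamp("2021-04-01")

# Aggregate per customer, then join registration data.
features = (
    transactions.groupby("customer_id")
    .agg(coupons_redeemed=("coupon_used", "sum"), last_purchase=("date", "max"))
    .reset_index()
    .merge(registration, on="customer_id")
)
features["days_since_registration"] = (as_of - features["registered_on"]).dt.days
features["days_since_last_purchase"] = (as_of - features["last_purchase"]).dt.days

print(features[["customer_id", "days_since_registration", "coupons_redeemed"]])
```

Dynamic variables (changes in behaviour over time) would follow the same pattern, aggregating over rolling windows instead of the whole history.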


We had constructed a churn definition and extracted features that may contribute to the prediction, so there was only one thing left to do: build a model. To predict whether a customer will churn, we used logistic regression with lasso regularization to find the best model that keeps only the most significant variables. Our final model assigned a churn probability to each customer with relatively high accuracy: we used a very simple algorithm as a baseline, which our model exceeded by almost 25%.
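As a hedged sketch of this modelling step, the snippet below fits an L1-regularized (lasso) logistic regression on synthetic data. The L1 penalty drives uninformative coefficients toward exactly zero, so only the strongest predictors survive; all data, feature names, and coefficients here are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500

# Synthetic features loosely mirroring the ones described in the article.
days_since_registration = rng.integers(1, 2000, n)
coupons_redeemed = rng.integers(0, 20, n)
noise_feature = rng.normal(size=n)  # carries no signal on purpose

# Simulated ground truth: churn is more likely for recent registrants
# who redeem few coupons (coefficients are invented for the example).
logit = 1.5 - 0.002 * days_since_registration - 0.15 * coupons_redeemed
churned = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([days_since_registration, coupons_redeemed, noise_feature])

# liblinear supports the L1 (lasso) penalty; C controls regularization strength.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
model.fit(X, churned)

# predict_proba yields a churn probability per customer, as in the project.
print("coefficients:", model.coef_)
```

In practice the features would be scaled and the model validated on held-out data before comparing it against a baseline.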


After we handed over the list of customers in the danger zone, our client’s marketing team constructed a series of marketing campaigns, and our team was highly involved in this stage of the project as well. The campaigns were highly successful in terms of redemption rate. The real results of the project should be long-term, and it is too early to see those yet, but if our client continues the series of marketing campaigns, the ROI generated from our project should exceed 300% by the end of the year.

We should also emphasize that detecting customers who are likely to churn is only one side of the problem. It is also very important to create campaigns that keep customers engaged continuously: a few marketing actions scattered randomly throughout the year are not going to stop customers from leaving. Maintaining campaigns constantly can seem to take an awful lot of resources, but at the end of the day, investing in customer retention surely pays off.


Eszter Dudás – Data Scientist
Márton Biró – Senior Data Scientist

How to create business value from data without risking anything?

Reading time: 7 min

I’m probably not telling you anything new when I say data is already the oil of company development and will be even more so in the coming decades. Growing your business, increasing customer satisfaction, or spotting upcoming threats at a sustainable pace is impossible without data science techniques. The question rather is: what is the right approach for data projects? How can you check the feasibility of a solution and assess its business benefits quickly and cheaply, before implementing anything? 

The answer: A Proof of Concept (POC) based approach is the way to go! 

This article will not only help you understand what a data science POC is, but it will also introduce you to the exact steps you need to follow in order to fuel growth without risking anything! 

Risk-free & cost-effective – The right way to create business value from data

Data science is an essential part of any industry today, given the massive amounts of data being produced. Nevertheless, as a company providing successful data solutions in various industries, we often encounter confusion among our customers about how to apply a machine learning (ML) based solution to a business problem (or even how to give data-oriented support to an entire department or a company). 

It is important to know that analytical initiatives are often pilot projects, meaning their success and added value are difficult to predict at the beginning. Therefore, if we do not approach such an initiative carefully and with the proper methodology, the project can easily fail for a variety of reasons (e.g., no data, poor data quality, or simply a business issue that cannot be modelled). 

The result?

Business decision makers will think of the project as a negative experience and will deeply regret throwing money out the window. 

So then: is there a better way?  

Yes, indeed! 

The best way to approach these projects is to define phases with gradually rising investment, which in return means an increasingly assured ROI (Return on Investment). Based on our 20 years of experience, the best solution is a Proof of Concept based approach, which can and should be used at any level of data maturity.

What is a data science POC?

I am sure you know the meaning of “Proof of Concept” (just in case, though: mini-projects that quickly and cheaply explore the feasibility of a system, product, or service and its business benefits). But what does running a POC specifically for data science mean? 

In this case, a Proof of Concept helps you evaluate the value, relevance and feasibility of different data science solutions before they are implemented. A data science POC should always prove that a given system will provide widespread value to the company: that is, it’s capable of bringing a data-driven perspective to a range of the business’s strategic objectives. 

In terms of advantages, this methodology helps you with: 

  • Saving time and money by sustainably allocating the company’s resources only towards projects that will reliably create value.
  • Ensuring that the given solution meets specific needs or sets of predefined requirements.
  • Becoming a data-driven organization from the core. 
  • Receiving timely and valuable feedback from various stakeholders.
  • Giving your team the opportunity to get comfortable with change.

Plus, just some of the indirect advantages of a POC include:

  • A POC connects experts from different areas and helps them get a better understanding of processes, verifying or rejecting hypotheses.
  • Colleagues who participate in the project can get a better understanding of dashboard tools or predictive methodological processes.
  • A POC helps the team ask the right questions and transform them into “data language”.
  • It helps to change even strong opinions once they turn out to be false.

How to run a successful data science POC?

Let’s see the 5 essential elements of an efficient data science POC that can help you keep your project on track!

  1. Set up a team consisting of the right people

    I know, it sounds obvious, so who exactly do I mean?

    To run an efficient POC, you need to include people from all parts of the organization in your team. 

    • Data scientists and/or analysts will definitely be connected to the project. 
    • The IT team will also need to test the solution’s feasibility. 
    • Any business teams involved with or impacted by the results of the project should be included. 
    • End-users of the solution should be involved as well.

    However, be careful you don’t swing too far in the other direction and involve too many people, as this can slow down progress and efficiency. A few representatives from each team or group is generally sufficient.

  2. Define data-related business problems

    Now that your team is set up, the next and probably most important step is the definition of the business problems. 

    Ok, but how?

    Try to gather ideas together with your team for a variety of business issues. Once the list is complete, find the answers to the following questions: 

    • What does the current process look like?
    • Would the use of data in general or data science/machine learning techniques help with this business issue? If so, how?
    • Do you have the necessary data for a POC?
    • Where is the data stored? 
    • Are you willing to work on this use case with an external partner?
    • Will this use case help you make money, save money, or realize any other benefits? 

    The outcome of this step should be the definition of testable ideas (hypotheses). For example, the risk of churn can be determined by using customer data and past behavior patterns.

  3. Select business problems for the POC

    Regardless of the sector, there will be several business problems for which a data solution is conceivable. However, there is no capacity to develop a solution for all. Ideas should therefore be prioritized. Thus, your next step is to determine for which idea(s) it is the most worthwhile to start a POC project. 

    In our experience, there is no obstacle to running multiple POCs in parallel (if you have the capacity), but you should expect these threads to run independently. The big advantage is a better chance of success; the downside is that attention is divided. 

    If capacities are low, it is worth considering speed when prioritizing the ideas. For example, you could define POC projects for business cases where you can validate the possibilities and pitfalls of an analytical solution relatively quickly.

  4. Set up clear deliverables

    In order to restrict a POC to a reasonable timeframe, you have to set up clear deliverables. Without them, no one can be really sure what to consider done or what to consider a success.

    Ideally, the final deliverable is the implementation of a data project based on the selected use case. However, it’s also worth setting up deliverables/milestones along the way. For example, individual teams could set up checkpoints to evaluate their subset of the project.

  5. Know when to stop

    Has the idea been tried, and did it bring the expected benefits?

    Excellent! It’s time to move on to implementation and incorporate the results into your day-to-day operations. 

    However, if the result is the opposite, don’t waste more resources on it! Draw conclusions and use them in the next idea.

    The best solution is to make a GO / NO-GO decision about a third of the way into every POC project. 

    This way, the process of developing a data-driven solution can be simplified and risks can be reduced. Use the following decision tree from the use case idea to day-to-day implementation:


Data science projects are complex and it is always a question whether they are feasible or not. 

A Proof of Concept is an excellent way for businesses to evaluate the viability of an idea and sustainably allocate their resources towards the projects that will reliably create value.

By following the above steps and choosing the right partners for your project you can ensure a smooth and successful POC and allow your organization to move to implementation quickly while saving invaluable resources.

Interested in getting a great amount of valuable data insight, but not sure how to start? Book a 20-minute discussion with one of our experts to help you get started!

Have you got experience with data-based POCs? What are your best practices?


Szabolcs Biró – Head of Advanced Analytics
Ákos Matzon – Advisory Team Leader

A BI Developer intern’s life at Hiflylabs

Reading time: 5 min

Why did Norbert Polcz, member of the data warehousing team and a BI Developer intern, choose Hiflylabs? What data warehouse-related tasks make his work motivating? What career path does he see at the company?

Hiflylabs: Why did you decide to apply for this job at the company?

Norbert Polcz: I found Hiflylabs on a job portal and looked at the website. Based on what I read there, it was encouraging that the company deals with a relatively wide range of things, so there isn’t just one direction for me. I hadn’t worked in IT before Hifly, so it was also appealing that they did not expect a lot of practical knowledge. It wasn’t a completely conscious decision; I was lucky. In retrospect, I would say my expectations and abilities matched.

HL: What motivates you in your work?

NP: The truth is that the company itself maintains my initial motivation, as I can turn to anyone for help at any time. It was interesting that when I was hired, we had a conversation with the CEO, Tamás Fehér, who was very direct. I think this kind of attitude is rare; I remember he also said that I could find him at any time. That made a positive impression. A direct relationship with senior leaders and with my mentor keeps me motivated; I can turn to the latter at any time, which helps a lot in my professional development. It is also important to me that when I have ideas, we try to find opportunities to implement them together where possible, and that I can regularly learn new things and new tools. I think the main values of the company are trust, a family atmosphere, and a vision of wanting the best for everyone.

HL: What projects are you currently working on?

NP: I spend most of my time on data warehouse and data infrastructure development, but I have also made dashboards and PPTs for clients several times, so I have had the opportunity to present our completed solution myself. In addition to writing code, we will soon be generating documents that are currently “handwritten,” keeping up with rapidly changing technologies and tools. So I may have other responsibilities in a few months. I enjoy learning from all of this.

HL: What is a Hiflylabs job interview like?

NP: Compared to the interviews I’d been to before, I found that they were also curious about my personality here, as it was important that I would fit into a corporate culture that is “home” to different personalities. Most places expected some work experience, but at Hiflylabs I felt they would give me a chance and would be happy to invest in a person if the organization and the individual found each other and the aforementioned desire for knowledge was present. That was really appealing. For me, Hiflylabs is a vibrant and open organization.

HL: Were there any positive signs in the interview that made you feel like a collaboration was about to begin?

NP: We hit it off right away. I couldn’t describe what I was feeling, but there was something in the air. Despite how excited I was, I was only anxious at the beginning and calmed down very quickly, which resulted in a direct conversation. The leader of the data warehousing team and the HR colleague created a pleasant atmosphere. In the meantime, I got another job offer, so two days after the interview, which was a short waiting time, I called them and asked for a decision. In the interview, they were also curious whether I was happy to learn. They laid their cards on the table: they said they were planning for the long term, which is understandable, because if you hire someone who then studies for months, it only pays off if they spend a few years at the company.

HL: Who do you think can be more successful in your profession: a person who likes monotony or variety/change?

NP: I think that at Hiflylabs both paths could work, yet I would vote for the change-lover: if you want to work here, don’t expect a precise definition of the responsibilities you will be dealing with every day later on, as the company supports acquiring a wide range of knowledge and important skills. In my experience, it’s a huge benefit to be proactive, because it makes it easier to get new and colorful tasks that can later become part of your job. Moreover, I’ll go further: if they see your motivation, desire to learn, and diligence, you can gain trust that will greatly enrich the internship experience. I knew they had confidence in me almost immediately after hiring, as I visited the office for the first 2-3 months to deepen my knowledge.

HL: How did you experience being an intern through the challenges posed by the pandemic?

NP: I have just submitted my thesis for the Bachelor of Business Informatics program at BGE, and I have been working from home since March. This makes it easier to concentrate both on my university studies and on work, as I save 2 hours of commuting time a day from Gödöllő. On the second day of working from home, my team leader called to check whether I had everything I needed, such as a chair or a desk, and a monitor was delivered to my house very soon after. I never felt my job was in danger. I only had to get used to the communication: there were no face-to-face meetings and we could talk a little less often, so I collected my questions for two days at a time because I couldn’t ask everything at once. If this had happened sooner, I might have had a harder time catching up, but by then I had already done a development or two live and had gotten into the swing of it.

HL: How flexible is the company in terms of studies?

NP: The expectation is that, if I can, I should let them know in time, because that is how they, and we, can plan. I find that everything is negotiable. For example, if I need a week because of my exams, they won’t even bother me until I’m back at work.

HL: What useful skills have you acquired so far through work?

NP: My communication has improved a lot, because I have realized that the more accurately I define a problem, the less misunderstanding there will be on the clients’ side. I also constantly try to pay attention to who I am talking to about a given topic and in what detail. I experienced how real-time development works and acquired confident SQL and version control (git / GitHub) skills. I got to know data warehousing and other data concepts more deeply, and became so enthusiastic that I started exploring them in my free time.

HL: What career path can you see for yourself at the company?

NP: At the moment, I couldn’t say exactly what I want. I am open to data development, but I’m curious and would even like to participate in mobile development in the future. I feel I have the opportunity for both.

Norbert Polcz – BD Developer intern
Patrícia Hanis – Marketing assistant

Do you know when your medicine will arrive? (Case study)

Reading time: 5 min

One of the key elements of logistics solutions has always been adaptation to the given situation, but nowadays this is more important than ever.

The worldwide spread of the coronavirus has posed entirely new logistical challenges to the economy. Changing border closures, distances, transportation options and, in general, the sending and receiving of shipments have significantly transformed the lives of wholesalers and shipping companies.

This post is about our recently launched ETA application for one of our pharmaceutical wholesale partners.

One of the constant key issues in delivery processes is knowing the exact arrival time of a package or shipment. As an individual, this can be annoying: if the arrival time of a package is unknown or the delivery window is too wide, it can turn normal daily life upside down. In the best case, you just have to step out of an online meeting or reschedule other plans; in the worst case, you take time off or reorganize the delivery. In business, however, knowing the exact delivery time is a much more serious issue. Determining accurate arrival times not only has serious financial implications but also greatly affects the image and long-term competitiveness of the company.

Key features of Hiflylabs ETA application

The delivery forecasting application we developed is a real-time, adaptive, machine-learning-based forecasting system. The application essentially calculates arrival times with high accuracy, taking into account traffic data, driver and vehicle data as well as previous delivery times. The system also includes a mobile and web monitoring interface by which shipments can be tracked.

After a few months of operation, enough information is available to empirically measure the accuracy of the system. 57 percent of issued delivery time forecasts are accurate within 10 minutes and 93 percent are accurate within half an hour.
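Accuracy figures like these are easy to compute once predicted and actual arrival times are logged. The sketch below illustrates the idea; the function name and data layout are our assumptions, not the production code:

```python
from datetime import timedelta

def forecast_accuracy(forecasts, actuals, tolerance_minutes):
    """Share of forecasts whose absolute error is within the tolerance.

    forecasts, actuals -- parallel lists of datetime objects
    """
    within = sum(
        1
        for predicted, actual in zip(forecasts, actuals)
        if abs(predicted - actual) <= timedelta(minutes=tolerance_minutes)
    )
    return within / len(forecasts)
```

Running this over a few months of logs with tolerances of 10 and 30 minutes yields exactly the kind of percentages quoted above.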

Importance of delivery times in the pharmaceutical trade

It is important for a pharmacy to know in two respects when the medicine shipment will arrive. On the one hand, in case of a stock shortage, the customers of the pharmacy must be informed when the product will be available again, i.e. when they can come back for it. For a pharmacy customer, timing can be important, e.g. due to health issues or organizational reasons regarding daily life. On the other hand, the incoming shipment also has to be picked up by someone, i.e. the organization of the pharmacy process is also significantly influenced by the exact knowledge of the arrival times. If the forecast is more accurate, both pharmacies and pharmacy customers may be more satisfied.

The delivery company has a basic business interest in accurate forecasting, as this is one of the key qualification factors for its service. Besides, the exact knowledge of the delivery time is, of course, important information for the delivery company, as it can also help optimize operation.

Developer challenges and analyst decisions

Although the logistical problem described above seems common, in practice many custom solutions were worth building to provide a tailored and cost-effective system. Here are some of them.

I. “Tour sequence”

Although there is a well-established system for which pharmacies the wholesaler visits, in practice the order in which pharmacies are replenished, the so-called supplier “tour sequence”, can vary quite a lot. Orders received by the wholesaler are by no means steady. Medicine turnover normally fluctuates quite turbulently even at the national level, the fluctuations are even greater at the pharmacy level, and the coronavirus epidemic transformed traffic more than usual. Thus, orders for a given pharmacy can vary a lot from one day to the next, and a change in the size of an order also affects how many times that pharmacy is refilled in one day. In addition, the order in which a driver visits pharmacies often changes: if the set of pharmacies changes, the driver will also change the order if they deem it necessary, and traffic conditions may affect the tour order as well. It is even possible that two drivers follow the same route in a different order.

During the project, identifying the tour sequence proved to be a very complex forecasting task in itself. We first built a complex prediction procedure that returns the most probable order of the pharmacies on a tour. It was accurate enough for us to continue working with it, but it also became clear that the vast majority of erroneous predictions stemmed from a poorly predicted tour order. Finally, in order to find the optimal solution, a tour order was defined and regulated on the customer side; that is, our trading partner changed its own processes in favor of a better estimate. The refinement of internal regulation, together with the adaptive forecasting system, has thus made the forecasts considerably more accurate.

II. When, where is/was the car?

The second challenge is determining the arrival time itself. Drivers signal via their PDA (personal digital assistant) when a package has been loaded or dropped off. We originally used this as a basis, but backtesting revealed that the arrival times obtained this way are often unreliable: drivers frequently do not send a delivery notice during unloading, but afterward, possibly together with other unloadings. Therefore, we switched to another approach and used the GPS data of the cars to determine how long a vehicle had been at a pharmacy.

In addition to identifying arrival times, defining departure times was also unclear. To determine them, we could only start from the PDA signals, but based on historical GPS data it is possible to estimate how long after the PDA signal the car actually departs. Our customer also instructed the drivers on proper PDA use, so this data became more and more reliable over time.

III. How to make an estimate?

One of the most fundamental questions is how to predict arrival times. Until the moment the car departs, we do not know which pharmacies the driver will visit, so only then can we give a more accurate estimate. In the end, the application predicts three types of part-times:

  1. Start time: the time elapsed between the GSM signal indicating the start and the actual departure
  2. Travel time: the travel time between two pharmacies (largely based on traffic data from the “Here” navigation service)
  3. Waiting time: the time spent at each pharmacy

Our final delivery time forecasts are produced iteratively by chaining these part-times: we calculate when the car arrives at the first pharmacy, from there we count to the second, and so on, all the way to the last pharmacy. The advantage of this approach is that it gives a realistic picture of the real process; its disadvantage is that a badly predicted tour order amplifies the errors, and on longer tours the errors can accumulate toward the later stops.
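The iterative chaining can be sketched in a few lines of Python. This is an illustrative sketch only; the function and variable names are our assumptions, not the production system, and the start-time offset would be added to the departure timestamp in the same way:

```python
from datetime import timedelta

def predict_arrivals(departure, travel_times, waiting_times):
    """Chain predicted travel and waiting times into per-pharmacy ETAs.

    departure        -- datetime when the car actually leaves the depot
    travel_times[i]  -- predicted minutes from stop i-1 (or depot) to stop i
    waiting_times[i] -- predicted minutes spent unloading at stop i
    """
    etas = []
    clock = departure
    for travel, waiting in zip(travel_times, waiting_times):
        clock += timedelta(minutes=travel)   # arrive at the next pharmacy
        etas.append(clock)
        clock += timedelta(minutes=waiting)  # unload, then continue the tour
    return etas
```

Because each ETA builds on the previous one, an error in an early travel or waiting estimate shifts every later stop, which is exactly the accumulation effect described above.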

IV. When should the estimate be made?

A common feature of forecasts is that the later we provide the estimate, the more accurate but the less commercially relevant it is, because the information arrives late. During negotiations with the customer, two different estimation dates, i.e. two forecast types, were identified. For practical reasons, the “normal forecast” is generated at the departure of each delivery car, but it was also commercially reasonable to forecast earlier. A so-called “early forecast” was therefore introduced, which predicts arrival times for each pharmacy one week in advance, every day. This model is based on historical arrival times and gives half-hour and one-hour intervals as estimates. It is less accurate than the departure-time forecast but provides commercially useful early information.

Key experiences of development

Several years of multi-phase cooperation led to a successful solution, from which it is worth highlighting some important lessons:

  • The results of the project should be evaluated as early as possible, over several iteration cycles, at a higher level, thus ensuring the joint development of the client-side processes and the scope of the project.
  • The final product required the long-term work of data engineers, data scientists, and our application development colleagues. Several development fields met on the project, and each area had to understand the work of the others. Thanks to transparent operation, the teams were able to work and develop smoothly throughout the project.
  • It is possible to deliver a fully customer-specific logistics forecasting application cost-effectively, with an acceptable lead time, even in the case of a very complex set of problems.

Dániel Szokolics – Data Scientist

How does a Data Scientist think when facing a business problem?

Reading time: 4 min

A data scientist needs to understand the broader business context into which his or her analysis fits, and must be able to give the right data-based answer to a properly posed question. The business relevance of the results and a solid methodological background together ensure that a project can create business value in the long run.

Most of our analytical projects do not have a data scientist on the client’s side, so it is our responsibility to put our results into an understandable form. In such cases, the analyst has a particularly difficult task, as he/she also has to think like the business side. Ideally, the analyst and the client cooperate from the beginning of the project: the analyst understands the value the project can create, and the client recognizes what information the analyst needs.

Proper communication is so important that many times we dedicate an expert for it in the project. Eszter Somos wrote about this earlier on the blog (Analytics Translator – A geek on the one hand, and a brilliant communicator on the other, a therapist for business and IT). I approach the subject from a slightly different perspective, as a classic analyst.

For one who flies above

The main difference between business and IT people is that while the former think in more abstract goals, the latter require precise definitions for their work. In this respect, a data scientist is somewhere in between, but definitely far enough from the business side that communication between the two is a challenge. On the one hand, the analyst needs to understand the high-level business goal; on the other hand, he/she also needs the details.

Let’s take a simple example: a customer in charge of business development wants to predict the return on investment. For an analyst to get started, he/she needs to know exactly what return means, how the return on previous projects was calculated, and what information can be used to make a forecast. At the same time, we cannot expect the client to think of everything the analysis needs. It is therefore a good idea for the analyst to be aware of the business need as well: it may remind them of additional issues or pitfalls, and it lets them confidently make analytical decisions that, without that broader context, would require business assistance.

Tell the full truth, not only that which is

As a general rule, we strive to deliver results that are as valuable as possible. These can be simple statistics at first, or quite specific findings (hey, you produced quite a good sales result in 2018 in Borsod, do you know why?). As time goes on, we introduce more and more complex analytical tools to explore the patterns, and so we have an increasingly difficult time explaining them. In such cases, we strike a sensible balance between two traps: complete incomprehension on one side, and the unpleasant feeling of dishonesty caused by oversimplification on the other. The whole picture always comes with a lot of assumptions, additions, and caveats that all add to the truth, but it obviously doesn’t make sense to include all of it in a forty-five-minute presentation.

So while technical details need to be filtered out on the one hand, commercially important and useful information that seems like a methodological by-product needs to be added on the other. Thus, the long description of the cross-validation strategy is replaced by a diagram of the distribution of sales between countries, and we use the mean (absolute) error to measure the quality of the modeling, which a statistician would certainly not use on his own. Ideally, several important methodological decisions lie behind each point of a final presentation.

Analytical projects will always have two end results: one that we show to the client and displays the results we find useful, and another that supports the results and includes the necessary technical details. Together, the two results ensure that a project can create business value in the long run. If the results for the business are great but the technical details are not right, the decisions based on the analysis will not be optimal. If the methodological background is ok but lacks business insights, the results will not be put into practice.

Dániel Szokolics – Data Scientist

Analytics Translator – A geek on the one hand, and a brilliant communicator on the other, a therapist for business and IT

Reading time: 6 min

Who is the Analytics Translator? What is the perfect chemistry for this complex role? What skills and experience are needed to do this job effectively? Where is this role in the organizational structure?

Eszter Somos, Data Solution Advisor at Hiflylabs, answered in the interview.

Hiflylabs: What is the specific purpose of this role?

Eszter Somos: In fact, he/she is a mediator with expertise and business knowledge between people who speak “other languages” in two very different “worlds”. His/her goal is to maintain active communication between the two parties and to find out as soon as possible if something isn’t going to work because that way he/she can reduce frustration on both sides.

HL: What makes a good Analytics Translator?

ES: I see two typical paths that work well. One, which is also my way, is when someone comes from the technical side, did coding before, worked as a Data Scientist, and so fully understands what is possible, how long it takes to achieve a goal, and how exactly the implementation works. In this case, he/she knows the way of thinking of his/her technical peers and, in the meantime, learns more and more about the peculiarities of the business side. The other, I think more common, version is when business people take a Data Science course or delve into the subject in a self-taught way. This is not to say that they can code in practice, and they may not (yet) have the statistical knowledge which is otherwise needed to become a Data Scientist, but they can understand what can be achieved with data and what exactly it entails. By contrast, the common practice of placing a general project manager in this role is particularly risky and results in cumbersome operation. I have encountered it more than once, and I’ve only seen it work outstandingly once (the reason had to be the person’s special talent); for the majority, it formed an extra layer of confusion in the process.

HL: Approaching from the business side, how deep does your knowledge need to be to fulfil this position?

ES: It is not essential to be able to derive the problem mathematically or to code it, but it is important to know what data types, algorithms, technologies and model types there are, and which ones to use. For example, it is good to know what a neural network is for and when to offer this option, or what a regression is for and when it can produce good results.

HL: What does this “Translation” job include? To what level should the business problem be “translated” into data language?

ES: Perhaps not all the way down to the level of a database column, but I think you need to move the typical business questions (e.g. “How do our users behave?” or “We want to know our users better”) to a more specific level, e.g. “Could dividing the marketing target group based on which products users view and put in their carts be an output the business can use?”. That poses a clustering problem for the data team.

HL: What is the most important skill for an Analytics Translator?

ES: Design thinking for data science is becoming more and more fashionable; it is a buzzword, but it covers meaningful things, emphasizing, for example, the need for empathy. I think this is the key from a technical point of view, because the business side often feels uncertain, as it doesn’t understand several things: what exactly is happening, why the process seems lengthy, what it means for a result to be unreliable or probabilistic. From a data point of view it is natural that, due to the statistical nature of the work, not everything can be said in advance, or that a lot of data cleaning is required. From a business perspective, these things are not obvious, so you need to be empathetic about the business problem and focus on the solution.

HL: Do you also need sales skills in this role?

ES: Absolutely. He/she should help the client believe that the proposed solution will solve the client’s problem. But often this kind of support is also needed within the company. The Analytics Translator role is also about “expectation management”, as you need to know what each side can expect from the other.

HL: What are the specific tasks for an Analytics Translator?

ES: There are business problems that are only articulated at a basic level: for example, our costs have risen too much, we don’t have enough loyal customers and want more, we don’t want to make a process worse, or we simply want to work more efficiently. In comparison, analyst questions sit at a different, more specified level; a problem there is formulated as: from which variables can I predict how another variable will change, and what influences it the most? So a very typical task is to turn a business question into an analyst question. You need to clarify unclear issues, break down the influential factors further, find out what we can control and how we will measure it all, how the project can work, what the input variables can be, and what exists at the data level and is available.

HL: What are the goals of an organization with an Analytics Translator?

ES: Last year, Gartner released some very frightening statistics and predictions, such as that by 2022 only 20% of analytics projects will be commercially successful. I believe it should be an organizational goal to be transparent about what value a data solution actually drives. This is important because these are often things that are not clearly measurable and progress slowly. An organization with a couple of Analytics Translators can make its projects more likely to materialize and bring business value, and can minimize cases where the project is completed and those who worked on it are paid, but the result is useless to the business.

HL: Where is this role within the organization?

ES: Very few organizations have this role explicitly. He/she often sits close to the project managers, as, just like them, he/she is somewhat around the project and yet not inside it. In a larger organization, you definitely need to get very close to the business to understand what the problem is.

HL: In addition to improving organizational decision-making and business processes, what made this position exist?

ES: Companies have been involved in machine learning for a very long time, but there was a point when the majority not only noticed that they had a lot of data, but also that it was easily accessible and could produce ever-increasing results. After that, many companies set up a Data Science team, but it soon became apparent that it wasn’t delivering what was expected of it, and was in fact costing a lot, even though the promise was: “if you have that much data, the business will go better”. This gave rise to a great deal of frustration, for a variety of reasons. The Analytics Translator position emerged as a response; it is a bit like going to couples therapy and learning to communicate and plan together.

HL: What dynamics does this role provide within a company?

ES: It’s not that there won’t be problems, but they will surface much sooner. We learn early what pitfalls may await us and where we should not go any further. In my opinion, these would show up after a while anyway, so there is no need to be afraid of that either. It’s important to note, though, that with an Analytics Translator the process won’t be fast, and it may even be more frustrating at first when someone keeps asking questions, but this is necessary to shorten the process and leave all participants satisfied in the end. It saves a lot of energy in the long run.

HL: In a data environment, what is beyond the responsibility of an Analytics Translator?

ES: Areas where the problem is very well defined, such as IoT. The more technical the use case and the more precisely the problem is defined, the smaller the role of a Translator. For example, to stop a car at a stop sign or to build a controlling system for a company, you don’t really need that extra role, because everyone understands what to do.

HL: Do you think this role will evolve further?

ES: Sure, but I don’t know in what direction. If the data world keeps growing like this, more and more people will likely specialize in this task.

Eszter Somos – Data Solution Advisor
Patrícia Hanis – Marketing Assistant



How to make millions in profits in 15 days?

Reading time: 4 min

Business problem

Retail pricing is one of those fields that can be greatly supported by data analysis, with measurable, high returns on investment. In many cases it’s simply not possible to set different prices at the product level, due to missing data or technical limitations. In such situations, pricing may be differentiated by location or by time period.

As an example, there are many days and weeks in the summer whose sales are influenced by transit customers, when shops can expect lower price elasticity and react with appropriate price changes.

Our client is a multinational company in the CEE region with a partial retail focus, owning and operating a network of 1000 shops. For the convenience goods sold, Hiflylabs selected and defined the re-priced products and the affected store locations.

Let’s get to the data!

The basis of our solution is that price changes should affect neither all products nor all stores, as a general price increase can negatively impact sales.

Where to re-price?

The target stores were selected by checking which locations experience a growing ratio of transit customers in the summer season, which indirectly decreases price elasticity.

Shops located near touristic, waterfront and high-traffic areas were selected in our first simulations, which was in line with our assumptions; however, many unexpected shop locations also ended up being included in the project.
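The selection rule can be illustrated with a minimal sketch. The function name, the data layout and the growth threshold below are assumptions for illustration, not the actual model:

```python
def select_stores(summer_ratio, offseason_ratio, min_growth=0.10):
    """Pick stores whose share of transit customers grows enough in summer.

    Both arguments map store id -> ratio of transit customers (0..1).
    min_growth is an illustrative threshold in percentage points.
    """
    return [
        store
        for store, summer in summer_ratio.items()
        if summer - offseason_ratio.get(store, 0.0) >= min_growth
    ]
```

In practice the ratios would come from loyalty-card or transaction data, and the threshold would be tuned against the simulation results.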

What to re-price?

The included products were ones that experienced higher-than-average sales rates in the summer season, such as energy drinks, water or ice cream.

What to check our results against?

Using last year’s sales numbers, we came up with correction metrics (customer number growth, adjustments for inflation, etc.) that were used to clean the data of effects independent of pricing.
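As an illustration, such corrections are often applied multiplicatively to last year’s baseline. The multiplicative form and parameter names here are assumptions for the sketch; the actual metrics were project-specific:

```python
def corrected_baseline(last_year_sales, customer_growth, inflation):
    """Scale last year's sales so they are comparable with this year.

    customer_growth and inflation are fractional rates, e.g. 0.05 for 5%.
    The corrected figure removes effects that are independent of pricing.
    """
    return last_year_sales * (1 + customer_growth) * (1 + inflation)
```

The re-pricing effect is then measured against this corrected figure rather than the raw year-ago sales.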

Results were evaluated in 80 locations. Compared to base sales the gains are the following:

But what is the ROI?

Without any data analysis, the logical step would have been to apply seasonal pricing only in high-traffic areas, where the base for the achievable margin increase would have been 10-20 million HUF. Data analysis, however, allowed us to find stores where we could increase margins by 54% more than in high-traffic locations alone. This is the result of 15 days of work.

An interesting observation is that stores less visited by transit customers experienced losses during our test, which further proves the necessity of data analysis and data-driven decision-making in business environments.

In the following seasons, the same methodology can be applied with only 1 day of work, or the logic can be built into software and integrated into existing pricing systems.

Any questions?

No matter how successful data projects may be, our experience is that business stakeholders treat them with caution most of the time. This case study serves as an example that great results can be achieved even with relatively little invested time.

Author: Márton Biró – Data Scientist

How to optimize opening hours?

Reading time: 5 min
Easy question, tough answer

When it comes to leveraging a company’s data assets, the first use-case that pops into your head is probably not the optimization of opening hours. However, if we consider an enterprise which operates a number of offline shops, the potential of such a project becomes clear.

Setting the goals of the optimization is easy as pie: if a shop operates for too long, the operating costs are higher than optimal; if it operates for too short a time, additional revenue is left on the table. In the first case, the opening hours should be decreased; in the latter, increased.

Two concerns might call an optimization project like this into question:

  1. Is it indeed possible to “play” with the shop hours?
  2. Isn’t the shop manager already adaptive enough, based on business operation?

The answer to the former concern is the usual “it depends”. During our project, we dealt with shops whose working hours had both legal constraints (some shops’ hours were regulated by law) and business constraints (employees can only be employed for conventional hours per week). However, scenarios like the following can be accounted for:

  • 24/7 vs closing at night
  • 12 vs 16 hours of operation
  • opening 1 hour earlier/later
  • closing 1 hour earlier/later

A savvy manager might know what customer volume to expect in a given hour and take this into account when deciding on the opening hours. Nevertheless, can he accurately guess the additional revenue expected when operating longer? And does he make an optimal decision at the network level? By using machine learning, we can certainly predict more accurately and more objectively. Relying only on human intuition, we will almost surely not find an optimal solution at the company level, due to these two factors:

  • one does not know whether, by operating longer, a shop actually steals revenue from another of our shops (cannibalization)
  • one (e.g. a shop manager) might have personal incentives not to operate fewer hours or close a shop entirely

In the following, we present a method that can tackle this problem in a data-driven way.

Modelling customer decisions in a simulation framework

Simulations are extensively used for optimizing complex, real-world processes (e.g. traffic design). In our case, the output of each opening-hour combination, such as a financial metric (e.g. EBITDA), can be simulated. To make this efficient, we model the decisions of individual customers, so we can see the effects of changing the opening hours at the micro level. The two most important effects are:

  1. how many additional customers we attract by operating longer, and how many of them are taken away from another of our shops (cannibalization)
  2. how (and where) customers substitute when we close a shop

The animation above demonstrates how customers make decisions when we close two of our shops that are close to each other. We can see that by closing shop No. 2, we lose 5 customers (they choose the blue competitor), but we also save operating costs. Our simulation framework models such scenarios in more iterations and at a larger scale: it assigns an output (e.g. EBITDA) to every input (the opening-hour combination of our shops). Having that, we can easily derive the optimal set of opening hours for the company.

The strength of our solution lies in its ability to account for various factors that influence customer choice. In the animation, the customers chose solely based on distance: after their shop closes, they choose the one that still operates and is the closest (which may also be a competitor). Additional such factors are:

  • loyalty: customers of different loyalty have different willingness to travel
  • competition: the heterogeneity of the competition can be quantified; substitution with competitors whose price level is significantly higher or lower is less likely
  • travel time: it may be more important than distance alone
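The distance-only decision rule from the animation can be sketched as a toy model. The sketch below uses 1-D positions and invented names for brevity; the real framework also weighs loyalty, competition and travel time, as listed above:

```python
def nearest_open_shop(customer_pos, shops):
    """Each customer picks the closest shop that is still open.

    shops: list of (name, position, is_open, is_ours) tuples. Returns the
    chosen shop's name, or None if nothing is open. Distance is 1-D here
    for brevity; a real model would use travel time, not raw distance.
    """
    open_shops = [s for s in shops if s[2]]
    if not open_shops:
        return None
    return min(open_shops, key=lambda s: abs(s[1] - customer_pos))[0]

def our_customer_count(customers, shops):
    """Count customers who end up at one of our shops in this scenario."""
    ours = {name for name, _, is_open, is_ours in shops if is_ours}
    return sum(1 for c in customers if nearest_open_shop(c, shops) in ours)
```

Running this for every opening-hour combination, and pricing each retained or lost customer, is what lets the framework attach an output such as EBITDA to each scenario.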

Most companies operating a big network of shops base their opening-hour decisions on instinct and convention. By using sophisticated machine learning solutions, we can support these decisions. Such a project can be really beneficial, but it also entails challenges. The biggest challenge is, without doubt, the collection and processing of accurate data. If we can tackle this, we are on the right path to creating value with data.

Author: Gerold Csendes – Data Scientist