Do you know when your medicine will arrive? (Case study)

Reading time: 5 min

One of the key elements of logistics solutions has always been adaptation to the given situation, but nowadays this is more important than ever.

The worldwide spread of the coronavirus has posed entirely new logistical challenges to the economy. Border closures, distancing rules, changing transportation options and, more generally, new conditions for sending and receiving shipments have significantly transformed the lives of wholesalers and shipping companies.

This post is about our recently launched ETA application for one of our pharmaceutical wholesale partners.

One of the constant key issues in delivery processes is knowing the exact time of arrival of the package or shipment. For an individual, uncertainty here can be annoying: if the arrival time of a package is unknown, or the time window is too wide, it can turn a normal day upside down. In the best case, you just have to step out of an online meeting or reschedule other plans; in the worst case, you have to take time off or reorganize the delivery. In business, however, knowing the exact delivery time is a much more serious issue. Determining exact arrival times not only has serious financial implications but also greatly affects the image and long-term competitiveness of the company.

Key features of Hiflylabs ETA application

The delivery forecasting application we developed is a real-time, adaptive, machine-learning-based forecasting system. The application essentially calculates arrival times with high accuracy, taking into account traffic data, driver and vehicle data as well as previous delivery times. The system also includes a mobile and web monitoring interface by which shipments can be tracked.

After a few months of operation, enough information is available to empirically measure the accuracy of the system. 57 percent of issued delivery time forecasts are accurate within 10 minutes and 93 percent are accurate within half an hour.
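As an illustration of how such accuracy figures can be computed, here is a minimal sketch. It assumes the forecast errors are available as signed differences in minutes (forecast minus actual arrival); the function name and the sample values are made up for the example and are not the project's data.

```python
def share_within(errors_min, tolerance_min):
    """Fraction of forecasts whose absolute error is at most tolerance_min."""
    if not errors_min:
        raise ValueError("no forecasts to evaluate")
    hits = sum(1 for e in errors_min if abs(e) <= tolerance_min)
    return hits / len(errors_min)


# Made-up errors in minutes (forecast minus actual arrival), for illustration only.
sample_errors = [-4, 7, 12, -25, 3, 41, -9, 18]

print(share_within(sample_errors, 10))  # share accurate within 10 minutes
print(share_within(sample_errors, 30))  # share accurate within half an hour
```

Using the absolute error treats early and late arrivals the same way; if late arrivals hurt more, the two directions could be reported separately.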

Importance of delivery times in the pharmaceutical trade

A pharmacy needs to know when a medicine shipment will arrive for two reasons. First, in case of a stock shortage, the pharmacy’s customers must be told when the product will be available again, i.e. when they can come back for it. For a pharmacy customer, timing can matter, e.g. because of health issues or the organization of daily life. Second, someone has to receive the incoming shipment, so the organization of the pharmacy’s own processes also depends heavily on knowing the exact arrival times. If the forecast is more accurate, both pharmacies and their customers are likely to be more satisfied.

The delivery company has a basic business interest in accurate forecasting, as this is one of the key criteria by which its service is judged. Besides, precise knowledge of the delivery time is, of course, also important to the delivery company itself, as it can help optimize operations.

Developer challenges and analyst decisions

Although the logistical problem above seems common, in practice it took many custom decisions to deliver a tailored and cost-effective solution. Here are some of them.

I. “Tour sequence”

Although there is a well-established system of which pharmacies the wholesaler visits, in practice the order in which pharmacies are replenished, the so-called supplier “tour sequence”, can vary considerably. Orders received by the wholesaler are by no means steady. Medicine turnover fluctuates strongly even at the national level, the fluctuations are even greater at the pharmacy level, and the coronavirus epidemic has disrupted turnover more than usual. Thus, orders for a given pharmacy can vary a lot from one day to the next, and a change in the size of the order also affects how many times that pharmacy is replenished in one day. In addition, the order in which a driver visits pharmacies often changes: if the set of pharmacies to visit changes, the driver will also change the order if he or she deems it necessary, and traffic conditions may affect the tour order as well. It is also possible for two drivers to follow the same route in a different order.

During the project, identifying the tour sequence proved to be a very complex forecasting task in itself. We first built a complex prediction procedure that returns the most probable order for specific pharmacies. This was accurate enough to continue working with, but it also became clear that the vast majority of erroneous predictions stemmed from a poorly predicted tour order. Finally, to reach the best solution, the tour order was defined and regulated on the customer side; that is, our partner also changed its own processes to make better estimates possible. With the refined internal regulation, the adaptive forecasting system has been able to produce more accurate forecasts.

II. When, where is/was the car?

The second challenge is to define the arrival time itself. Drivers signal via their PDA (personal digital assistant) when a package has been loaded or dropped. We originally used this as a basis, but backtesting revealed that the arrival times thus obtained are often unreliable. Drivers often do not send a delivery notice during unloading, but afterward, possibly along with other unloadings. Therefore, we switched to another system, where we used the GPS data of the cars to determine how long the vehicle had been at a pharmacy.
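The GPS-based approach can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Ping` record, the 50-metre radius and the function names are hypothetical, and the real system may well use a different rule for deciding when a car counts as being "at" a pharmacy.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class Ping:
    """One GPS reading of a delivery car (hypothetical record layout)."""
    time: datetime
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    earth_radius_m = 6_371_000
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


def first_stop(pings, lat, lon, radius_m=50.0):
    """Return (arrival, departure) for the first contiguous run of pings
    within radius_m of the pharmacy, or None if the car never came close."""
    arrival = last_inside = None
    for p in sorted(pings, key=lambda p: p.time):
        if haversine_m(p.lat, p.lon, lat, lon) <= radius_m:
            if arrival is None:
                arrival = p.time      # first reading inside the radius
            last_inside = p.time      # latest reading still inside
        elif arrival is not None:
            break                     # the car has left the pharmacy
    return None if arrival is None else (arrival, last_inside)
```

The difference between the two returned timestamps is the time spent at the pharmacy. With this simple rule the arrival time can be late by at most one sampling interval of the GPS device.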

In addition to arrival times, the definition of departure times was also unclear. For departure times we could only start from the PDA signals, but based on earlier GPS data it is possible to estimate how long after the PDA signal the car is expected to depart. Our customer also instructed the drivers on the use of the PDA, so this data became more and more reliable over time.
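One simple way to turn this idea into a number is sketched below. It assumes that the historical gaps between the PDA signal and the GPS-observed departure are available in minutes, and it uses their median as the expected gap; the median is our choice for the illustration, not necessarily what the production model does.

```python
from datetime import datetime, timedelta
from statistics import median


def expected_departure(pda_signal, historical_gaps_min):
    """PDA signal time plus the median historical gap (in minutes)
    between the PDA signal and the departure seen in the GPS data."""
    if not historical_gaps_min:
        raise ValueError("no historical gaps available")
    return pda_signal + timedelta(minutes=median(historical_gaps_min))


# Made-up gaps for illustration: the car usually leaves a few minutes after the signal.
signal = datetime(2021, 3, 1, 8, 0)
print(expected_departure(signal, [2, 3, 10]))
```

The median is less sensitive than the mean to the occasional very late signal, which matters here because the PDA data was known to be unreliable at first.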

III. How to make an estimate?

One of the most fundamental questions is how we predict arrival times. Until the moment the car starts, we do not know which pharmacies the driver will visit, so only then can we give a more accurate estimate. In the end, we predict three different partial times in the application:

  1. start time: the time elapsed between the GSM data indicating the start and the actual start
  2. travel time: the travel time between two pharmacies (largely based on traffic data from the city navigation application “Here”)
  3. waiting time: the time spent at each pharmacy

Our final delivery time forecasts are produced iteratively by combining these partial times: we calculate when the car arrives at the first pharmacy, from there we calculate the arrival at the second, and so on, all the way to the last pharmacy. The advantage of this approach is that it gives a realistic picture of the real processes; its disadvantage is that a wrong tour order increases the errors, and on a longer tour the errors can accumulate at the later stops.
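The iterative combination described above can be sketched in a few lines. The sketch assumes the three components have already been predicted and are given in minutes; the function name, the tuple layout and the sample tour are hypothetical.

```python
from datetime import datetime, timedelta


def chain_etas(signal_time, start_delay_min, stops):
    """Chain the predicted components into an arrival time per pharmacy.

    stops is the tour in visiting order, as tuples of
    (pharmacy_name, travel_min_from_previous_point, waiting_min_at_pharmacy).
    """
    clock = signal_time + timedelta(minutes=start_delay_min)  # actual start
    etas = []
    for name, travel_min, waiting_min in stops:
        clock += timedelta(minutes=travel_min)   # drive to the pharmacy
        etas.append((name, clock))               # predicted arrival
        clock += timedelta(minutes=waiting_min)  # unloading before moving on
    return etas


# Made-up tour for illustration.
tour = [("A", 20, 10), ("B", 15, 8), ("C", 30, 5)]
for name, eta in chain_etas(datetime(2021, 3, 1, 8, 0), 5, tour):
    print(name, eta.strftime("%H:%M"))
```

Because each arrival time is built on the previous one, an error in any single component shifts every later arrival on the tour, which is exactly the accumulation effect mentioned above.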

IV. When should the estimate be made?

A common feature of forecasts is that the later we provide the estimate, the more accurate it is, but also the less commercially relevant, because the information becomes available late. During the negotiations with the customer, two different estimation times, i.e. two different forecast types, were identified. For practical reasons, the “normal forecast” is made at the departure of each delivery car, but it was also commercially reasonable to make earlier forecasts. A so-called “early forecast” has therefore been introduced, which every day forecasts arrival times for each pharmacy one week in advance. For this, the model is based on historical arrival times and gives half-hour and one-hour intervals as estimates. This is less accurate than the normal forecast but can provide a commercially useful result as early information.
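To make the idea of interval estimates from historical arrival times more concrete, here is a deliberately simple sketch. The rule it uses (a window centred on the median, widened from 30 to 60 minutes if too few past arrivals fall inside) and the 80 percent coverage threshold are assumptions for the illustration only; the actual early-forecast model is not described at this level of detail.

```python
from statistics import median


def early_window(arrivals_min, coverage=0.8):
    """Return (start, end) of an arrival window in minutes after midnight.

    A 30-minute window centred on the median historical arrival is used if
    it contains at least `coverage` of the past arrivals; otherwise the
    window is widened to 60 minutes.
    """
    if not arrivals_min:
        raise ValueError("no historical arrivals available")
    centre = median(arrivals_min)
    for width in (30, 60):
        start, end = centre - width / 2, centre + width / 2
        share = sum(start <= a <= end for a in arrivals_min) / len(arrivals_min)
        if share >= coverage:
            break
    return start, end


# Made-up histories: a punctual pharmacy and one with more scattered arrivals.
print(early_window([600, 605, 610, 612, 598]))
print(early_window([600, 640, 620, 610, 650]))
```

A pharmacy with stable past arrivals gets the narrower half-hour window, while one with scattered arrivals falls back to the one-hour window.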

Key experiences of development

Several years of multi-phase cooperation resulted in a successful solution. It is worth highlighting some important lessons:

  • The results of the project should be evaluated as early as possible, over several iteration cycles, at a higher level, thus ensuring the joint development of the client-side processes and the scope of the project.
  • The successful end product required the long-term work of data engineering experts, data scientists, and our application development colleagues. Several different development fields met on the project, and during development each area also had to understand the work of the others. Thanks to transparent operation, however, the areas were eventually able to work and develop smoothly throughout the project.
  • It is possible to deliver a fully customer-specific logistics forecasting application cost-effectively, with an acceptable lead time, even in the case of a very complex set of problems.

Dániel Szokolics – Data Scientist

How does a data scientist think when facing a business problem?

Reading time: 4 min

A data scientist needs to understand the broader business context into which his or her analysis fits and to be able to provide the right data-based answer to a proper question. The business relevance of his/her results and his/her solid methodological background together ensure that a project can create business value in the long run.

Most of our analytical projects do not have a data scientist on the client’s side, so it is our responsibility to put our results into an understandable form. In such cases, the analyst has a particularly difficult task, as he/she also has to think like the business side. Ideally, the analyst and the client begin to cooperate at the beginning of the project: the analyst understands the value the project can create, and the client recognizes what information the analyst needs.

Proper communication is so important that we often dedicate an expert to it on the project. Eszter Somos wrote about this earlier on the blog (Analytics Translator – A geek on the one hand, and a brilliant communicator on the other, a therapist for business and IT). I approach the subject from a slightly different perspective, as a classic analyst.

For one who flies above

The main difference between business and IT people is that while the former think in terms of more abstract goals, the latter require precise definitions for their work. In this respect, a data scientist is somewhere in between, but definitely far enough from the business side to make communication between the two a challenge. On the one hand, the analyst needs to understand the high-level business goal, but on the other hand, he/she also needs the details.

Let’s take a simple example: a customer in charge of business development wants to predict the return on investment. To get started, the analyst needs to know exactly what return means, how the return on previous projects was calculated, and what information can be used to make a forecast. At the same time, we cannot expect the client to think of everything that is needed for the analysis. Therefore, it is a good idea for the analyst to be aware of the business need as well: it may bring additional issues or pitfalls to mind, and it allows the analyst to make, with complete certainty, analytical decisions that would require business assistance without the broader context.

Tell the full truth, not only that which is

As a general rule, we strive to deliver results that are as valuable as possible. These can be simple statistics at first, or quite particular results (hey, you produced quite a good sales result in 2018 in Borsod, do you know why?). As time goes on, we introduce more and more complex analytical tools to explore the patterns, and the results become increasingly difficult to explain. In such cases, we have to strike a sensible balance between complete incomprehension on one side and the unpleasant feeling of dishonesty caused by oversimplification on the other. The whole picture always comes with a lot of assumptions, additions, and comments that all add to the truth, but obviously it doesn’t make sense to include the whole thing in a forty-five-minute presentation.

On the one hand, technical details need to be cut; on the other hand, commercially important and useful information that looks like a methodological by-product needs to be added. Thus, the long description of the cross-validation strategy is replaced by a diagram of the distribution of sales between countries. For the same reason, we use the mean (absolute) error to measure the quality of modeling, which a statistician would certainly not use on his or her own. Ideally, there are several important methodological decisions behind each question and each result of a final presentation.
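For readers who want to see what this measure looks like in practice, here is a minimal sketch of the mean absolute error next to the root mean squared error, which is one measure an analyst might reach for instead. The sample numbers are made up for the illustration.

```python
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average size of the error, in the same unit as the data."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared error; penalizes large misses more."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up sales figures and forecasts, for illustration only.
actual = [100, 120, 90, 110]
predicted = [110, 115, 70, 110]

print(mean_absolute_error(actual, predicted))
print(root_mean_squared_error(actual, predicted))
```

The mean absolute error can be read directly as "on average we are off by this much", which is why it works well in a business presentation even when the modeling itself relies on other measures.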

Analytical projects will always have two end results: one that we show to the client, displaying the results we find useful, and another that supports those results and includes the necessary technical details. Together, the two ensure that a project can create business value in the long run. If the results for the business are great but the technical details are not right, the decisions based on the analysis will not be optimal. If the methodological background is sound but lacks business insights, the results will not be put into practice.

Dániel Szokolics – Data Scientist