Anomaly Detection in Cold Storage Temperature Data
Cold storage applications are often subject to strict temperature requirements during operation. Continuously tracking the condition of fridges containing food, medicine, or other easily spoiled products can help avoid product loss by detecting the onset of failure. However, merely setting a fixed temperature threshold for alerts can result in many false alarms. During business hours, staff might open the doors to a fridge several times, causing a short but large spike in temperature. Also, industry-grade refrigeration equipment often features a regular defrost cycle, quickly raising the temperature at set intervals to promote de-icing.
In this application note, an alternative approach to triggering alarms is proposed, separating short-time oscillations in temperature from the more representative baseline with the aim of reducing false alarms. The following sections explain step-by-step how you can set this up for your own DT Studio project, further building upon the proposed method.
Figure 1: A significant change in temperature triggering an alarm.
The implementation is built around using the developer API to interact with a single DT Studio project containing all temperature sensors used in anomaly detection. If not already done, a project needs to be created and configured to enable API functionalities.
For authenticating the REST API against the DT Studio project, a Service Account key-id, secret, and email must be known, later to be used in the example code. If this concept is unfamiliar to you, read our guide on Creating a Service Account.
By default, all temperature sensors in the project are assumed independent from each other and will be processed as such. The number of sensors used does not have to be configured beforehand and is scaled automatically. The option to move a sensor between projects can be found when selecting a sensor in a DT Studio project, as shown in figure 2.
Figure 2: Detailed overview of sensors in the DT Studio project.
An example code repository is provided in this application note. It illustrates one way of detecting anomalies in temperature data and is meant to serve as a precursor for further development and implementation. It uses the REST API to interact with the DT Studio project.
All code has been written in and tested for Python 3. While not required, it is recommended to use a virtual environment to avoid conflicts. Required dependencies can be installed using pip and the provided requirements text file.
pip3 install -r requirements.txt
Using the Service Account details gathered in the project authentication step, edit the following lines in sensor_stream.py to authenticate the API with your DT Studio project.
USERNAME = "SERVICE_ACCOUNT_KEY" # this is the key
PASSWORD = "SERVICE_ACCOUNT_SECRET" # this is the secret
PROJECT_ID = "PROJECT_ID" # this is the project id
If the example code is correctly authenticated to the DT Studio project as described above, running the script sensor_stream.py will start streaming data from each temperature sensor in the project and continuously monitor the temperature.
For more advanced usage, such as visualizing the estimates, one or several flags can be provided upon execution.
usage: sensor_stream.py [-h] [--path] [--starttime] [--endtime] [--plot]
Desk Occupancy Estimation on Stream and Event History.
-h, --help show this help message and exit
--path Absolute path to local .csv file.
--starttime Event history UTC starttime [YYYY-MM-DDTHH:MM:SSZ].
--endtime Event history UTC endtime [YYYY-MM-DDTHH:MM:SSZ].
--plot Plot the estimated desk occupancy.
--debug Plot algorithm operation.
The arguments --starttime and --endtime should be of the format YYYY-MM-DDThh:mm:ssZ, where YYYY is the year, MM the month, and DD the day. Likewise, hh, mm, and ss are the hour, minutes, and seconds respectively. Notice the separator, T, and Z, which must be included. It should also be noted that the time is given in UTC. Local timezone corrections should, therefore, be made accordingly.
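The timestamp format described above can be checked and produced with the Python standard library. The following is a minimal sketch and not part of the example repository; the helper names parse_utc and local_to_utc_string are hypothetical, and the UTC+02:00 offset is only an illustrative assumption.

```python
from datetime import datetime, timedelta, timezone

# Format expected by --starttime and --endtime, including the T separator and trailing Z.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_utc(timestamp: str) -> datetime:
    """Parse a YYYY-MM-DDThh:mm:ssZ string into a timezone-aware UTC datetime."""
    # strptime raises ValueError if the T separator or the trailing Z is missing.
    return datetime.strptime(timestamp, TIME_FORMAT).replace(tzinfo=timezone.utc)


def local_to_utc_string(local_time: datetime) -> str:
    """Convert a timezone-aware local datetime into the UTC string format."""
    return local_time.astimezone(timezone.utc).strftime(TIME_FORMAT)


# Illustrative example: 09:30 local time in a UTC+02:00 timezone is 07:30 UTC.
local = datetime(2021, 3, 1, 9, 30, 0, tzinfo=timezone(timedelta(hours=2)))
start = local_to_utc_string(local)
print(start)  # 2021-03-01T07:30:00Z
```

The resulting string could then be passed on the command line as the value of --starttime or --endtime.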
By providing the --plot argument, a visualization, as shown in figure 3, will be generated. If historical data is included, an interactive plot will be produced after the historical data has been processed. By closing this plot, the stream will start, and a non-interactive plot will update for each sample that arrives.
Similar to --plot, the --debug argument will also generate a visualization. It shows an overview of thresholds and other values calculated by the algorithm. It is meant to be used mainly for debugging purposes. It does not work for streaming data by default.
Figure 3: 20 days of data for three sensors, visualized using the --plot argument.
With the aim of automatically monitoring temperature changes in cold storage applications with fewer false alarms, a simple yet effective approach has been proposed and implemented. By applying robust statistics to historical data, an upper and a lower envelope are continuously calculated as new samples arrive in the stream.
For new data to be considered an anomaly, the calculated envelopes have to be breached, though the duration for which the envelopes are exceeded is also considered before sending an alert. The temperature baseline is also extracted and isolated using a centered rolling median. Figure 4 shows an overview of the algorithm flow.
The implementation has been structured such that typical tuning parameters for the algorithm are located in the file ./config/parameters.py. As no single configuration can work for all data, users are encouraged to experiment with different combinations better suited for their own data.
Figure 4: Algorithm flow chart from a new event sample to a triggered warning.
Many of the false alarms can be removed by merely extracting the temperature baseline, $b$, from the raw temperature data $T$. This alone might be sufficient for many applications and would result in a straightforward implementation, but is expanded upon in this application note. Instead of using the rolling average, the centered rolling median has been chosen as it is much more robust against outliers and generally spiky data.
Only a single parameter, the window width $w$, has to be set for this operation. For each new temperature sample, the baseline value is calculated by taking the median of previous samples within length $w$, as shown in figure 5. Therefore, a larger $w$ will produce a smoother baseline, and vice versa. A larger $w$ does, however, also result in a longer introduced delay, given by $w/2$, which should be minimized. It is therefore recommended that $w$ is set no longer than the feature of the data one wishes to remove. As shown in figure 5, this is enough to completely remove said feature due to the median being used, while minimizing delay. When subtracting $b$ from $T$, the resulting differentiated temperature $d$ is therefore given by
$$d_i = T_i - b_i, \quad i = 1, \dots, N,$$
where $N$ is the total number of temperature samples.
It should be mentioned that the introduced delay $w/2$ could, in practice, be removed by using a right-aligned rolling median instead of the center-aligned one presented here, as the calculated baseline value would then be situated together with the latest temperature value. However, this would cause the baseline to lag, not really reducing the delay at all while introducing unnecessary artifacts in later steps when subtracted from the temperature data.
Figure 5: Extracting the temperature baseline through a rolling median.
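The baseline extraction can be sketched with the standard library alone. This is an illustrative sketch, not the repository's implementation: the function name centered_rolling_median, the choice to truncate the window at the ends of the series, and the sample readings are all assumptions.

```python
from statistics import median


def centered_rolling_median(values, width):
    """Centered rolling median; the window is truncated at both ends of the series."""
    half = width // 2
    baseline = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        baseline.append(median(values[lo:hi]))
    return baseline


# Hypothetical fridge readings in degrees Celsius with a two-sample door-opening spike.
temperature = [4.0, 4.1, 3.9, 4.0, 9.5, 9.0, 4.1, 4.0, 3.9, 4.0, 4.1]

# Baseline from a five-sample centered window.
baseline = centered_rolling_median(temperature, width=5)

# Differentiated temperature: raw temperature minus baseline, sample by sample.
difference = [t - b for t, b in zip(temperature, baseline)]

print(max(baseline))   # the spike does not reach the baseline
print(max(difference)) # but it remains visible in the differentiated temperature
```

In a streaming setting, the centered value for a given sample only becomes available once the later half of its window has arrived, which is the delay discussed above.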
In order to detect when the temperature behavior changes significantly, previous historical data of a fixed length is used to produce an envelope that spans the area of normal operation. Integral to the envelope calculation, the minimum and maximum value, together with the median absolute deviation (MAD), must be found. While similar to the standard deviation (STD), the MAD of samples $x_i$, given by
$$\mathrm{MAD} = \mathrm{median}\left(\left|x_i - \mathrm{median}(x)\right|\right),$$
is much more robust towards outliers than the STD with its squared terms.
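To illustrate the difference in robustness, the short sketch below compares the MAD and the STD for a set of made-up readings with and without a single outlier. The function name median_absolute_deviation and the sample values are assumptions for illustration only.

```python
from statistics import median, pstdev


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


# Hypothetical readings, once without and once with a single large outlier.
clean = [4.0, 4.1, 3.9, 4.0, 4.2]
spiky = clean + [12.0]

mad_clean = median_absolute_deviation(clean)
mad_spiky = median_absolute_deviation(spiky)
std_clean = pstdev(clean)
std_spiky = pstdev(spiky)

# The MAD barely moves when the outlier is added, while the STD grows considerably.
print(mad_clean, mad_spiky)
print(std_clean, std_spiky)
```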
Instead of evaluating the whole historical period at once, smaller windows of a fixed size can be assessed individually, finding the minimum, maximum, and MAD value of each in isolation. Thereafter, by taking the median of all found values, the result is a much better representation of the general data behavior over the period. The windows can also be overlapped to increase the number of resulting calculations. Figure 6 shows the minimum and maximum value found for each of nine such windows.
Figure 6: Minimum- and maximum value found for nine isolated temperature windows.
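The windowing step can be sketched as follows. This is a minimal illustration under assumptions: the function name windowed_statistics, the window and step sizes, and the sample values are hypothetical and not taken from the example repository.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of a sequence."""
    center = median(values)
    return median(abs(v - center) for v in values)


def windowed_statistics(values, window, step):
    """Median of the per-window minimum, maximum, and MAD values.

    A step smaller than the window size makes consecutive windows overlap.
    """
    if len(values) < window:
        raise ValueError("need at least one full window of data")
    mins, maxs, mads = [], [], []
    for start in range(0, len(values) - window + 1, step):
        chunk = values[start:start + window]
        mins.append(min(chunk))
        maxs.append(max(chunk))
        mads.append(mad(chunk))
    return median(mins), median(maxs), median(mads)


# Hypothetical differentiated temperature values containing one large spike.
values = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, 5.0, 0.1, -0.3, 0.2, 0.0, -0.1]

# Windows of four samples with 50 % overlap give five windows in total.
typical_min, typical_max, typical_mad = windowed_statistics(values, window=4, step=2)

print(max(values))   # 5.0, the global maximum is dominated by the spike
print(typical_max)   # 0.3, the median of the window maxima is not
```

Because the spike only falls inside two of the five windows, the median of the window maxima reflects the typical behavior rather than the single extreme value.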
After windowing the data and finding the maximum, minimum, and MAD value for each window separately, the upper and lower envelopes are calculated from the median of the respective values found across the windows, together with a modifier which controls envelope width. Regardless of any spurious changes in the temperature data, this envelope should behave rather consistently over time, providing an upper and lower threshold to evaluate outliers against. As the baseline is used to change the envelope level, only unnaturally large spikes in temperature should be caught.
Figure 7: Calculated envelope for one week of temperature data.
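One plausible way to combine these quantities into an envelope is sketched below. The exact combination used in the example repository is not reproduced here, so the formula is an assumption: the typical window maximum and minimum are offset by the baseline and widened by the modifier times the typical MAD. The function name envelope and all numbers are hypothetical.

```python
def envelope(baseline, typical_min, typical_max, typical_mad, modifier=1.0):
    """Return (lower, upper) envelope lists that follow the baseline level.

    Assumed form, for illustration only:
        upper = baseline + typical_max + modifier * typical_mad
        lower = baseline + typical_min - modifier * typical_mad
    """
    margin = modifier * typical_mad
    upper = [b + typical_max + margin for b in baseline]
    lower = [b + typical_min - margin for b in baseline]
    return lower, upper


# Hypothetical baseline and window statistics of the differentiated temperature.
baseline = [4.0, 4.1, 4.0]
lower, upper = envelope(
    baseline, typical_min=-0.2, typical_max=0.3, typical_mad=0.1, modifier=2.0
)

print(lower)
print(upper)
```

A larger modifier widens the envelope and makes the detection less sensitive, while a smaller one tightens it.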
What does and does not define an anomaly comes down to the specific use case in the end. No one configuration can work for all types of data. Still, if the implemented anomaly detection system is flexible, only a few parameters have to be changed to produce decent performance and few false alarms.
By using the baseline and calculated envelope in previous sections, this implementation proposes that three types of alarms can be used for sufficient granularity:
No Alarm: If a temperature value is within the calculated envelope, nothing of interest is happening and the sample can be promptly ignored.
Warning: If a temperature value exceeds the calculated envelope but returns within a short period relative to the median window length, the baseline is mostly unaffected and does not see a rise. Such a spike is often caused by opening fridge doors during a typical working day and can be ignored. One could, however, mark these short-time spikes with a warning label for posterity without triggering a full-fledged alert.
Alert: Should the temperature rise for a longer period, the baseline will do so too. This is useful information: for applications such as food storage, where a maximum storage temperature is often set by regulations, such a prolonged rise in temperature is unfortunate. Therefore, triggering an alarm when the baseline exceeds said maximum value would be a natural choice.
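The three levels above can be expressed as a small per-sample decision function. This is a simplified sketch rather than the repository's logic: the function name classify, the string labels, and the 8.0 degree limit are assumptions chosen for illustration.

```python
def classify(temperature, baseline, lower, upper, max_storage_temperature):
    """Classify one sample as 'no alarm', 'warning', or 'alert'."""
    # A baseline above the storage limit means the rise has been prolonged.
    if baseline > max_storage_temperature:
        return "alert"
    # A raw value outside the envelope is a short-lived excursion.
    if temperature < lower or temperature > upper:
        return "warning"
    return "no alarm"


# Hypothetical maximum storage temperature in degrees Celsius.
LIMIT = 8.0

normal = classify(4.2, baseline=4.0, lower=3.6, upper=4.5, max_storage_temperature=LIMIT)
door_opened = classify(9.5, baseline=4.1, lower=3.7, upper=4.6, max_storage_temperature=LIMIT)
failure = classify(9.0, baseline=8.6, lower=8.2, upper=9.1, max_storage_temperature=LIMIT)

print(normal, door_opened, failure)  # no alarm warning alert
```

Checking the baseline first means a prolonged rise is reported as an alert even when the raw value sits inside the envelope, since the envelope follows the baseline level.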
Several occurrences of all three aforementioned types of alarms can be seen in figure 8. The benefit of this method is that even though the temperature exceeds the maximum storage temperature several times, only two of the occurrences last for a prolonged time and trigger an alert. This would drastically reduce alarm fatigue, and the only parameter which has to be tuned is the median window length.
Figure 8: Two alarms triggered due to a prolonged rise in temperature.