Outlier Detection on Multiple Temperature Sensors

Introduction

When running large-scale services, continuously monitoring asset temperatures can provide essential information for smooth long-term operation. Whether the assets are large office spaces, machinery in a production line, or server racks in a data center, many sensors are often deployed at once. If one or more sensors report temperature values that deviate too far from the norm, preventative steps can be taken to avoid further degradation.

Due to their small size and long battery life, Disruptive Technologies (DT) Wireless Temperature Sensors are well suited for monitoring large numbers of assets in parallel. The sensors can be deployed in almost any environment and measure the temperature every 15 minutes, so the trend and behavior of the data can be monitored and possible outliers caught in real time.

In this application note, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is applied to a stream of data from 25 temperature sensors with the aim of catching outlier events. As shown in Figure 1, the data from most sensors are similar in both level and trend. Sudden spikes or level shifts caught by the algorithm are therefore considered outliers for which appropriate action can be taken.

Sensor Placement

If the aim is to highlight outlier behavior in the temperature originating from a specific device or environment, certain considerations should be taken when mounting the sensors. For instance, if room temperatures throughout a building are of interest, sensors should be placed away from external heating or cooling sources such as air-conditioning or direct sunlight. Otherwise, the algorithm might classify such external influence as an outlier, resulting in false alarms.

DT Studio Project Configuration

The implementation is built around using the DT Developer API to interact with a single DT Studio project containing all temperature sensors for which outlier detection is performed. If not already done, a project needs to be created and configured to enable the API functionality.

Project Authentication

For authenticating the developer API against a Service Account in your DT Studio project, three separate authentication details have to be located, later to be used in the example code. If you're unfamiliar with the concept, refer to our Introduction to Service Accounts.

Devices

The script will use all temperature devices in the target project. Note that DBSCAN works better the more devices you include, preferably 10 or more.

Example Code

An example code repository is provided in this application note. It illustrates one way of detecting outliers in multistream data and is meant to serve as a precursor for further development and implementation. It uses our Python API to interact with the DT Studio project.

Source Access

The example code source is publicly hosted on the official Disruptive Technologies GitHub repository under the MIT license. It can be found by following this link.

Environment Setup

The code has been written in and tested for Python 3.9+. Dependencies can be installed using pip and the provided requirements text file.

pip3 install -r requirements.txt

Using your authentication details, set the following environment variables.

export DT_SERVICE_ACCOUNT_KEY_ID='<YOUR_SERVICE_ACCOUNT_KEY_ID>'
export DT_SERVICE_ACCOUNT_SECRET='<YOUR_SERVICE_ACCOUNT_SECRET>'
export DT_SERVICE_ACCOUNT_EMAIL='<YOUR_SERVICE_ACCOUNT_EMAIL>'
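
As a minimal sketch of how a script might pick up these variables, the snippet below reads them and fails early if any are unset. The helper name load_credentials is hypothetical and not taken from the example repository; only the variable names come from the commands above.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = (
    "DT_SERVICE_ACCOUNT_KEY_ID",
    "DT_SERVICE_ACCOUNT_SECRET",
    "DT_SERVICE_ACCOUNT_EMAIL",
)


def load_credentials(env=None):
    """Return the three credentials as a dict, or raise if any is unset.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and can be replaced by a plain dict when testing.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing early with a list of the missing names tends to be easier to debug than an authentication error raised later by the API client.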

Usage

If the example code is correctly authenticated to the DT Studio project as described above, running the script main.py will start streaming data from each temperature sensor in the project. Outlier detection is then performed as new data arrive.

python3 main.py

Use the -h flag to print additional flags available.

Implementation Details

Classifying data for outlier detection is an ongoing research field that has seen many approaches over the years. Lately, machine learning techniques have been the new frontier in this area at the cost of complexity. In contrast, clustering techniques can be comparably simple while still providing good performance. In particular, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm has been found to provide good performance with relatively little parameter tweaking [1].

Preprocessing

Depending on the application, time-series data are often feature-engineered before being applied to a classification scheme. However, each sample in a time series of length $N$ can also be considered a feature in an $N$-dimensional space and be applied directly. During testing, this was found to result in much better performance than extracting mean, kurtosis, skew, and other typical time-series features for cluster input.
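
The idea of using raw samples as features can be sketched as follows. This is an illustration only, not the repository's implementation: the function name and input layout are hypothetical, and it assumes the samples of the different sensors are already roughly aligned in time.

```python
def build_feature_matrix(series_by_sensor, n):
    """Return (sensor_ids, rows) with one row of the n latest samples per sensor.

    series_by_sensor maps a sensor identifier to its samples in
    chronological order (hypothetical input layout). Sensors with fewer
    than n samples are skipped so that every row has the same length.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    sensor_ids = []
    rows = []
    for sensor_id, samples in sorted(series_by_sensor.items()):
        if len(samples) < n:
            continue
        sensor_ids.append(sensor_id)
        # Each of the n samples becomes one coordinate of the point.
        rows.append([float(value) for value in samples[-n:]])
    return sensor_ids, rows
```

Each row is then a single point in the N-dimensional space that the clustering step operates on.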

DBSCAN

Compared to the likes of k-means clustering, DBSCAN does not require prior knowledge about the number of clusters in the data. It is also unsupervised, simplifying its use in many applications. One feature that makes it particularly useful for outlier detection is its notion of noise in the data. If a point does not fit in any cluster, it is classified as noise instead of the closest match. Figure 4 shows the result of applying DBSCAN on some synthetic data with two features. This website provides excellent animated visualizations of the clustering procedure.

When grouping the features into clusters, DBSCAN uses a distance metric, here Euclidean distance, to determine if two or more points should be linked. For this, the two search parameters $ϵ$ and $p$ must be given, where $ϵ$ is the search radius and $p$ the minimum number of points that can define a cluster. When scanning the dataset, each $N$-dimensional point is classified as one of three possible categories. A core point is defined as one that neighbors at least $p$ other points within a distance of $ϵ$. A border point is one that can be reached by a core point, but does not fulfill the requirement itself, marking the edge of a cluster. If a point is not reached by any core point, it is defined as noise. Figure 5 shows an example of how points are classified to form a cluster.
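
The three categories can be illustrated with a brute-force sketch. It only labels point types and does not assign cluster identifiers, so it is not a full DBSCAN implementation. It follows the wording above, where a core point needs at least p other points within eps. Libraries such as scikit-learn count the point itself toward the minimum, so their parameter is offset by one relative to this sketch.

```python
import math


def classify_points(points, eps, p):
    """Label each point as 'core', 'border', or 'noise'.

    Assumption: a core point has at least p OTHER points within eps,
    matching the definition in the text above.
    """
    n = len(points)
    # neighbors[i] holds the indices of all other points within eps of point i.
    neighbors = [
        [j for j in range(n) if j != i and math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= p for nb in neighbors]
    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense enough itself, but within eps of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels
```

The pairwise distance computation is quadratic in the number of points, which is acceptable for a few dozen sensors but not for large datasets.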

Finding a balance between generalized behavior and performance is one of the challenges when choosing $ϵ$ and $p$. Here, if we assume that an outlier does not correlate with other potential outliers, setting $p=2$ should result in said outliers being classified as noise by DBSCAN, as there should be no other similar series. The radius $ϵ$, on the other hand, is dynamically recalculated on each call to compensate for changes in the data. After finding the average of all time series in a window, $ϵ$ is calculated as the median Euclidean distance from each series to that average.
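
A minimal sketch of the radius calculation described above is shown here. It assumes that "the average" means the element-wise mean across all series in the window, and the function name dynamic_eps is hypothetical; the repository may compute it differently.

```python
import math
from statistics import median


def dynamic_eps(rows):
    """Median Euclidean distance from each series to the element-wise average.

    rows is a list of equal-length series, one per sensor, covering the
    same window.
    """
    if not rows:
        raise ValueError("rows must contain at least one series")
    n_series = len(rows)
    # Element-wise mean across all series in the window.
    average = [sum(column) / n_series for column in zip(*rows)]
    distances = [math.dist(row, average) for row in rows]
    return median(distances)
```

Using the median rather than the mean of the distances keeps a single extreme series from inflating the radius, which would otherwise make that series harder to flag.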

Real-time Application

The script can be extended to work in real time by utilizing the disruptive.Stream module in our Python API. Below is a short visualization of how outlier classification can work in real time. The GIF is significantly sped up here.
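
One way to organize incoming samples for such a real-time extension is a fixed-length buffer per sensor, sketched below. The class is hypothetical and independent of the Python API; the code that receives events from the stream and calls add is not shown.

```python
from collections import defaultdict, deque


class SlidingWindows:
    """Keep the n most recent temperature samples per sensor."""

    def __init__(self, n):
        self.n = n
        # One bounded buffer per sensor, created on first use.
        self._buffers = defaultdict(lambda: deque(maxlen=n))

    def add(self, sensor_id, celsius):
        """Store one new sample; the oldest sample drops off when full."""
        self._buffers[sensor_id].append(float(celsius))

    def full_rows(self):
        """Return {sensor_id: samples} for sensors with a complete window."""
        return {
            sensor_id: list(buffer)
            for sensor_id, buffer in self._buffers.items()
            if len(buffer) == self.n
        }
```

After each new event, the complete windows can be passed to the clustering step, so the classification always reflects the latest N samples of every sensor.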

References
