analyse_mov_avg_simplicity

anomalearn.analysis.dataset_simplicity.analyse_mov_avg_simplicity(x, y, diff: int = 3, window_range: tuple[int, int] | slice | list[int] = (2, 300)) -> dict

Analyses whether the time series is moving average simple and computes its score.

A dataset is moving average simple if normal and anomalous points can be separated just by placing a constant over the moving average series. The moving average may use a different window length for each channel. The function gives a score of 1 when the dataset is moving average simple, and 0 when no anomaly can be found without producing false positives. Therefore, the higher the score, the higher the fraction of anomalies that can be found just by placing a constant on the moving average series. The score is the True Positive Rate (TPR) at a True Negative Rate (TNR) equal to 1.

The analysis tries to separate the normal and anomalous points just by placing a constant in a moving average space of the time series. That is, the time series is first projected into a moving average space with a window of length w, which is to be found, and then a constant that divides the points is searched for. If the time series is multivariate, the constant is a vector whose elements are the constants for the individual channels, and each channel may be projected with a different window w.
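The idea above can be sketched for a univariate series with plain NumPy. This is only an illustration under stated assumptions, not the library's implementation: the helper names (`moving_average`, `tpr_at_full_tnr`, `sketch_mov_avg_score`) are hypothetical, the moving average is assumed to be trailing with each averaged point taking the label of the last sample in its window, differencing is left out, and the separating constant is taken to be the range spanned by the normal points.

```python
import numpy as np


def moving_average(series: np.ndarray, window: int) -> np.ndarray:
    """Moving average over `window` samples (len(series) - window + 1 points)."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")


def tpr_at_full_tnr(values: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of anomalies lying strictly outside the range of normal points.

    Using the normal points' min and max as bounds guarantees that no normal
    point is flagged, i.e. the True Negative Rate stays at 1.
    """
    normal = values[labels == 0]
    anomalous = values[labels == 1]
    if anomalous.size == 0:
        return 0.0
    if normal.size == 0:
        return 1.0
    found = (anomalous > normal.max()) | (anomalous < normal.min())
    return float(found.mean())


def sketch_mov_avg_score(series, labels, windows):
    """Try each candidate window and keep the one with the highest score."""
    best_score, best_window = 0.0, None
    for w in windows:
        averaged = moving_average(series, w)
        # Assumption: an averaged point takes the label of the last sample
        # in its window (trailing alignment).
        aligned_labels = labels[w - 1:]
        score = tpr_at_full_tnr(averaged, aligned_labels)
        if score > best_score:
            best_score, best_window = score, w
    return best_score, best_window


# Normal points oscillate between 1 and -1; the anomaly is a flat 0.5 segment.
series = np.array([1.0, -1.0] * 10 + [0.5] * 4)
labels = np.array([0] * 20 + [1] * 4)

raw_score = tpr_at_full_tnr(series, labels)
score, window = sketch_mov_avg_score(series, labels, windows=[1, 2, 3])
```

In this toy series no constant separates the raw values, because the anomalous 0.5 lies inside the normal range. Once the oscillation is averaged out with a window of 2, the normal points collapse to 0 and every anomalous point falls outside that range.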

Parameters:
  • x (array-like of shape (n_samples, n_features)) – The time series to be analysed.

  • y (array-like of shape (n_samples,)) – The labels of the time series.

  • diff (int, default=3) – The maximum number of times the series may be differenced to find constant simplicity. If constant simplicity is found at a differencing order of i, higher orders are not checked. If 0 is passed, the series is never differenced.

  • window_range (tuple[int, int] or slice or list[int], default=(2, 300)) – Either the range in which the window is searched, a slice object describing the range and the step used to search windows, or a list of windows to try. Theoretically, all window values (between 0 and the length of the time series) should be tried to verify whether a dataset is moving average simple. This parameter limits the search to a specific interval for efficiency, and because searching beyond a certain window size may be of little use.
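The three accepted forms of window_range can be pictured as different ways of writing a list of candidate windows. The helper below is hypothetical and not part of anomalearn. It assumes that a tuple is treated like Python's half-open `range(start, stop)` and that a slice has explicit start and stop values. Whether the library includes the upper endpoint is not stated here.

```python
def candidate_windows(window_range):
    """Expand a window_range value into an explicit list of window lengths.

    Hypothetical helper: endpoint handling is an assumption (half-open).
    """
    if isinstance(window_range, slice):
        step = window_range.step or 1
        return list(range(window_range.start, window_range.stop, step))
    if isinstance(window_range, tuple):
        start, stop = window_range
        return list(range(start, stop))
    # Otherwise it is already a list of windows to try.
    return list(window_range)


from_tuple = candidate_windows((2, 6))
from_slice = candidate_windows(slice(2, 10, 3))
from_list = candidate_windows([3, 7, 21])
```

A slice with a large step, or a short explicit list, keeps the search cheap when long windows are worth probing but an exhaustive scan is not.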

Returns:

analysis_result – Dictionary with the results of the analysis. It has five keys: mov_avg_score, upper_bound, lower_bound, window and diff_order. The first is the moving average score, which lies between 0 and 1. The second is the upper bound found for each channel to separate anomalies from normal points. The third is the lower bound for each channel. The fourth is the best window found, that is, the one that yields the score. The last is the order of differencing applied to the time series before searching for the moving average score. If any of the bounds is None, there is no separation that keeps the TNR at 1. When the bounds are not None, a point is labelled anomalous if any feature of the moving average series at that point is greater than its upper bound or lower than its lower bound.

Return type:

dict
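The labelling rule described for the returned bounds can be sketched as follows. The function `label_from_bounds` is a hypothetical helper, and the moving average matrix and bound values are made up for illustration. It assumes the moving average series has already been computed with shape (n_points, n_features) and that the bounds hold one value per channel.

```python
import numpy as np


def label_from_bounds(mov_avg, upper_bound, lower_bound):
    """Label a point 1 if any feature is above its upper or below its lower bound."""
    mov_avg = np.asarray(mov_avg, dtype=float)
    if upper_bound is None or lower_bound is None:
        # No separation keeps TNR at 1, so nothing is flagged.
        return np.zeros(mov_avg.shape[0], dtype=int)
    upper = np.asarray(upper_bound, dtype=float)
    lower = np.asarray(lower_bound, dtype=float)
    outside = (mov_avg > upper) | (mov_avg < lower)
    return outside.any(axis=1).astype(int)


# Made-up moving average values for a two-channel series.
mov_avg = np.array([
    [0.0, 1.0],   # inside both bounds
    [0.2, 5.0],   # second channel above its upper bound
    [-3.0, 1.1],  # first channel below its lower bound
    [0.1, 0.9],   # inside both bounds
])
predicted = label_from_bounds(mov_avg, upper_bound=[0.5, 2.0], lower_bound=[-0.5, 0.0])
no_bounds = label_from_bounds(mov_avg, upper_bound=None, lower_bound=None)
```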