A Distributed Outlier Detection Method for High-Dimensional Telemetry Using Sketches and Locality-Sensitive Hashing
Abstract
Modern observability stacks collect high-dimensional telemetry that mixes numeric metrics, sparse categorical signals, and derived features from logs and traces. Detecting outliers in such data is operationally important but computationally constrained, since telemetry arrives as a distributed stream and must be processed under tight latency and bandwidth budgets. This paper develops a distributed outlier detection method that combines linear sketches for compact representation with locality-sensitive hashing to generate cross-node candidate neighborhoods. Each worker maintains a sliding-window index of sketched telemetry points, periodically exchanging small hash and sketch summaries rather than raw vectors. A coordinator (or peer-to-peer overlay) merges hash collisions into candidate sets and performs approximate neighborhood scoring using only sketches, while adaptively requesting extra sketch detail when uncertainty remains. The method targets distance- and density-based outlier notions in high dimensions, where exact nearest-neighbor search is impractical and naive aggregation is bandwidth-dominated. We formalize the distributed streaming setting, specify sketch constructions that preserve norms and approximate distances with controllable distortion, and couple them to Euclidean and angular LSH families for candidate generation. Analytical results characterize detection error as a function of sketch dimension, hash parameters, window size, and nonstationarity, yielding explicit trade-offs between false alerts and communication. Empirical evaluation on production-like telemetry mixtures and controlled synthetic drift scenarios indicates that sketch-only scoring recovers most of the ranking performance of raw-vector baselines while reducing per-window communication by one to two orders of magnitude under typical parameterizations.
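To make the pipeline described above concrete, the following is a minimal single-process sketch, not the paper's implementation. It assumes a Gaussian random projection as the linear sketch, a p-stable Euclidean LSH family for candidate generation, and a k-th nearest-neighbor distance computed on sketches as the outlier score. All function names and parameter values (make_projection, sketch_knn_scores, sketch_dim, width, and so on) are hypothetical, and the distributed exchange of summaries, the sliding window, and the adaptive refinement step are omitted.

```python
import math
import random


def make_projection(dim, sketch_dim, rng):
    # Gaussian random projection, scaled by 1/sqrt(sketch_dim) so that
    # squared Euclidean norms are preserved in expectation.
    scale = 1.0 / math.sqrt(sketch_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)]
            for _ in range(sketch_dim)]


def sketch(proj, x):
    # Linear sketch: one dot product per projection row.
    return [sum(p * v for p, v in zip(row, x)) for row in proj]


def make_hashes(sketch_dim, n_hashes, width, rng):
    # Euclidean (p-stable) LSH: h(s) = floor((a . s + b) / width).
    return [([rng.gauss(0.0, 1.0) for _ in range(sketch_dim)],
             rng.uniform(0.0, width))
            for _ in range(n_hashes)]


def bucket_key(hashes, width, s):
    # Concatenate the individual hash values into one bucket key.
    return tuple(math.floor((sum(a * v for a, v in zip(vec, s)) + b) / width)
                 for vec, b in hashes)


def sketch_knn_scores(points, sketch_dim=16, n_tables=8, n_hashes=2,
                      width=20.0, k=3, seed=0):
    # Hypothetical parameter defaults, chosen only for the demo below.
    rng = random.Random(seed)
    proj = make_projection(len(points[0]), sketch_dim, rng)
    sketches = [sketch(proj, x) for x in points]

    # Candidate generation: points sharing a bucket in any table
    # become candidate neighbors of each other.
    candidates = [set() for _ in points]
    for _ in range(n_tables):
        hashes = make_hashes(sketch_dim, n_hashes, width, rng)
        buckets = {}
        for i, s in enumerate(sketches):
            buckets.setdefault(bucket_key(hashes, width, s), []).append(i)
        for members in buckets.values():
            for i in members:
                candidates[i].update(j for j in members if j != i)

    # Sketch-only scoring: distance to the k-th nearest candidate.
    # A point with fewer than k candidates is treated as maximally isolated.
    scores = []
    for i, cand in enumerate(candidates):
        dists = sorted(math.dist(sketches[i], sketches[j]) for j in cand)
        scores.append(dists[k - 1] if len(dists) >= k else math.inf)
    return scores


# Demo: 60 points near the origin in 50 dimensions plus one distant point.
rng = random.Random(1)
points = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(60)]
points.append([8.0] * 50)
scores = sketch_knn_scores(points)
top = scores.index(max(scores))  # index of the highest-scoring point
```

In this toy setting the raw vectors are only touched once, to compute the sketches; hashing and scoring operate on the 16-dimensional sketches alone, which is the property that would let workers exchange compact summaries instead of full telemetry vectors.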