-
PySAD: A Streaming Anomaly Detection Framework in Python
Authors:
Selim F. Yilmaz,
Suleyman S. Kozat
Abstract:
Streaming anomaly detection requires algorithms that operate under strict constraints: bounded memory, single-pass processing, and constant-time complexity. We present PySAD, a comprehensive Python framework addressing these challenges through a unified architecture. The framework implements 17+ streaming algorithms (e.g., LODA, Half-Space Trees, xStream) with specialized components including projectors, probability calibrators, and postprocessors. Unlike existing batch-focused frameworks, PySAD enables efficient real-time processing with bounded memory while maintaining compatibility with PyOD and scikit-learn. Supporting all learning paradigms for univariate and multivariate streams, PySAD provides the most comprehensive streaming anomaly detection toolkit in Python. The source code is publicly available at github.com/selimfirat/pysad.
Submitted 24 May, 2025; v1 submitted 5 September, 2020;
originally announced September 2020.
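The three constraints named in the abstract above (bounded memory, single-pass processing, constant time per sample) can be illustrated with a toy detector. The sketch below is an assumption-laden illustration, not PySAD's API or any of its models: the class `RunningZScoreDetector` and its method names are hypothetical, and the scoring rule is a plain running z-score maintained with Welford's update.

```python
import math


class RunningZScoreDetector:
    """Toy streaming detector: O(1) memory, one pass, O(1) work per sample.

    Hypothetical illustration only; this is NOT PySAD's API or one of its models.
    """

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations (Welford)

    def score(self, x):
        """Return the absolute z-score of x under the statistics seen so far."""
        if self.n < 2:
            return 0.0  # not enough history to estimate a variance
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(x - self.mean) / std if std > 0 else 0.0

    def update(self, x):
        """Fold x into the running mean and variance (Welford's update)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def score_then_update(self, x):
        """Score x first, then learn from it, so each sample is seen once."""
        s = self.score(x)
        self.update(x)
        return s


# A short univariate stream with one obvious outlier at index 6.
stream = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 25.0, 10.1]
detector = RunningZScoreDetector()
scores = [detector.score_then_update(x) for x in stream]
```

The detector stores only three numbers regardless of stream length, which is the sense in which memory is bounded here; real streaming models such as those listed in the abstract use richer state but follow the same score-then-update pattern.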
-
Multi-Label Sentiment Analysis on 100 Languages with Dynamic Weighting for Label Imbalance
Authors:
Selim F. Yilmaz,
E. Batuhan Kaynak,
Aykut Koç,
Hamdi Dibeklioğlu,
Suleyman S. Kozat
Abstract:
We investigate cross-lingual sentiment analysis, which has attracted significant attention due to its applications in various areas, including market research, politics, and social sciences. In particular, we introduce a sentiment analysis framework in a multi-label setting, as it follows Plutchik's wheel of emotions. We introduce a novel dynamic weighting method that balances the contribution from each class during training, unlike previous static weighting methods that assign fixed weights based on class frequency. Moreover, we adapt the focal loss, which favors harder instances, from the single-label object recognition literature to our multi-label setting. Furthermore, we derive a method to choose optimal class-specific thresholds that maximize the macro-F1 score in linear time complexity. Through an extensive set of experiments, we show that our method obtains state-of-the-art performance in 7 of 9 metrics in 3 different languages using a single model, compared to the common baselines and the best-performing methods in the SemEval competition. We publicly share the code for our model, which can perform sentiment analysis in 100 languages, to facilitate further research.
Submitted 26 August, 2020;
originally announced August 2020.
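The class-specific thresholding mentioned in the abstract above rests on one observation: macro-F1 is the mean of the per-class F1 scores, so each class's threshold can be tuned independently. The sketch below illustrates that idea with a brute-force sweep over observed scores. It is not the authors' linear-time method, and all function names and the toy data are hypothetical.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 for one class when predicting positive iff score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores, labels):
    """Try every observed score as a threshold and keep the best F1.

    Brute force (quadratic in the number of samples), for illustration only.
    """
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(scores, labels, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


def macro_f1_thresholds(score_matrix, label_matrix):
    """Pick one threshold per class (column); return thresholds and macro-F1."""
    n_classes = len(score_matrix[0])
    results = []
    for c in range(n_classes):
        col_scores = [row[c] for row in score_matrix]
        col_labels = [row[c] for row in label_matrix]
        results.append(best_threshold(col_scores, col_labels))
    thresholds = [t for t, _ in results]
    macro_f1 = sum(f for _, f in results) / n_classes
    return thresholds, macro_f1


# Toy data: 4 samples, 2 labels (rows are samples, columns are classes).
score_matrix = [[0.9, 0.2], [0.8, 0.6], [0.3, 0.7], [0.1, 0.4]]
label_matrix = [[1, 0], [1, 1], [0, 1], [0, 0]]
thresholds, macro_f1 = macro_f1_thresholds(score_matrix, label_matrix)
# thresholds == [0.8, 0.6], macro_f1 == 1.0
```

Because the classes decouple, the cost of threshold selection grows linearly with the number of classes; the per-class search itself is where a faster method than this sweep would matter.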
-
Achieving Online Regression Performance of LSTMs with Simple RNNs
Authors:
N. Mert Vural,
Fatih Ilhan,
Selim F. Yilmaz,
Salih Ergüt,
Suleyman S. Kozat
Abstract:
Recurrent Neural Networks (RNNs) are widely used for online regression due to their ability to generalize nonlinear temporal dependencies. Among RNN models, Long Short-Term Memory networks (LSTMs) are commonly preferred in practice, as they are capable of learning long-term dependencies while avoiding the vanishing gradient problem. However, due to their large number of parameters, LSTMs require considerably longer training time than simple RNNs (SRNNs). In this paper, we efficiently achieve the online regression performance of LSTMs with SRNNs. To this end, we introduce a first-order training algorithm with linear time complexity in the number of parameters. We show that when SRNNs are trained with our algorithm, they provide regression performance very similar to that of LSTMs in two to three times shorter training time. We support our experimental results with a strong theoretical analysis, providing regret bounds on the convergence rate of our algorithm. Through an extensive set of experiments, we verify our theoretical work and demonstrate significant performance improvements of our algorithm with respect to LSTMs and the other state-of-the-art learning models.
Submitted 31 May, 2021; v1 submitted 16 May, 2020;
originally announced May 2020.
-
Unsupervised Anomaly Detection via Deep Metric Learning with End-to-End Optimization
Authors:
Selim F. Yilmaz,
Suleyman S. Kozat
Abstract:
We investigate unsupervised anomaly detection for high-dimensional data and introduce a deep metric learning (DML) based framework. In particular, we learn a distance metric through a deep neural network. Through this metric, we project the data into a metric space that better separates the anomalies from the normal data and reduces the effect of the curse of dimensionality for high-dimensional data. We present a novel data distillation method through self-supervision to remedy the conventional practice of assuming all data as normal. We also employ the hard mining technique from the DML literature. We show that these components improve the performance of our model and significantly reduce the running time. Through an extensive set of experiments on 14 real-world datasets, our method demonstrates significant performance gains compared to the state-of-the-art unsupervised anomaly detection methods, e.g., an absolute improvement between 4.44% and 11.74% on average over the 14 datasets. Furthermore, we share the source code of our method on GitHub to facilitate further research.
Submitted 12 May, 2020;
originally announced May 2020.
-
RNN-based Online Learning: An Efficient First-Order Optimization Algorithm with a Convergence Guarantee
Authors:
N. Mert Vural,
Selim F. Yilmaz,
Fatih Ilhan,
Suleyman S. Kozat
Abstract:
We investigate online nonlinear regression with continually running recurrent neural networks (RNNs), i.e., RNN-based online learning. For RNN-based online learning, we introduce an efficient first-order training algorithm that is theoretically guaranteed to converge to the optimum network parameters. Our algorithm is truly online in that it does not make any assumption on the learning environment to guarantee convergence. Through numerical simulations, we verify our theoretical results and illustrate significant performance improvements achieved by our algorithm with respect to the state-of-the-art RNN training methods.
Submitted 31 May, 2021; v1 submitted 7 March, 2020;
originally announced March 2020.