Search | arXiv e-print repository

arXiv:2009.02572 [pdf, other]

PySAD: A Streaming Anomaly Detection Framework in Python

Authors: Selim F. Yilmaz, Suleyman S. Kozat

Abstract: Streaming anomaly detection requires algorithms that operate under strict constraints: bounded memory, single-pass processing, and constant-time complexity. We present PySAD, a comprehensive Python framework addressing these challenges through a unified architecture. The framework implements 17+ streaming algorithms (LODA, Half-Space Trees, xStream) with specialized components including projectors… ▽ More Streaming anomaly detection requires algorithms that operate under strict constraints: bounded memory, single-pass processing, and constant-time complexity. We present PySAD, a comprehensive Python framework addressing these challenges through a unified architecture. The framework implements 17+ streaming algorithms (LODA, Half-Space Trees, xStream) with specialized components including projectors, probability calibrators, and postprocessors. Unlike existing batch-focused frameworks, PySAD enables efficient real-time processing with bounded memory while maintaining compatibility with PyOD and scikit-learn. Supporting all learning paradigms for univariate and multivariate streams, PySAD provides the most comprehensive streaming anomaly detection toolkit in Python. The source code is publicly available at github.com/selimfirat/pysad. △ Less

Submitted 24 May, 2025; v1 submitted 5 September, 2020; originally announced September 2020.

Comments: 7 pages, 1 figure

arXiv:2008.11573 [pdf, other]

doi 10.1109/TNNLS.2021.3094304

Multi-Label Sentiment Analysis on 100 Languages with Dynamic Weighting for Label Imbalance

Authors: Selim F. Yilmaz, E. Batuhan Kaynak, Aykut Koç, Hamdi Dibeklioğlu, Suleyman S. Kozat

Abstract: We investigate cross-lingual sentiment analysis, which has attracted significant attention due to its applications in various areas including market research, politics and social sciences. In particular, we introduce a sentiment analysis framework in multi-label setting as it obeys Plutchik wheel of emotions. We introduce a novel dynamic weighting method that balances the contribution from each cl… ▽ More We investigate cross-lingual sentiment analysis, which has attracted significant attention due to its applications in various areas including market research, politics and social sciences. In particular, we introduce a sentiment analysis framework in multi-label setting as it obeys Plutchik wheel of emotions. We introduce a novel dynamic weighting method that balances the contribution from each class during training, unlike previous static weighting methods that assign non-changing weights based on their class frequency. Moreover, we adapt the focal loss that favors harder instances from single-label object recognition literature to our multi-label setting. Furthermore, we derive a method to choose optimal class-specific thresholds that maximize the macro-f1 score in linear time complexity. Through an extensive set of experiments, we show that our method obtains the state-of-the-art performance in 7 of 9 metrics in 3 different languages using a single model compared to the common baselines and the best-performing methods in the SemEval competition. We publicly share our code for our model, which can perform sentiment analysis in 100 languages, to facilitate further research. △ Less

Submitted 26 August, 2020; originally announced August 2020.

Comments: 11 pages, 6 figures

arXiv:2005.08948 [pdf, other]

Achieving Online Regression Performance of LSTMs with Simple RNNs

Authors: N. Mert Vural, Fatih Ilhan, Selim F. Yilmaz, Salih Ergüt, Suleyman S. Kozat

Abstract: Recurrent Neural Networks (RNNs) are widely used for online regression due to their ability to generalize nonlinear temporal dependencies. As an RNN model, Long-Short-Term-Memory Networks (LSTMs) are commonly preferred in practice, as these networks are capable of learning long-term dependencies while avoiding the vanishing gradient problem. However, due to their large number of parameters, traini… ▽ More Recurrent Neural Networks (RNNs) are widely used for online regression due to their ability to generalize nonlinear temporal dependencies. As an RNN model, Long-Short-Term-Memory Networks (LSTMs) are commonly preferred in practice, as these networks are capable of learning long-term dependencies while avoiding the vanishing gradient problem. However, due to their large number of parameters, training LSTMs requires considerably longer training time compared to simple RNNs (SRNNs). In this paper, we achieve the online regression performance of LSTMs with SRNNs efficiently. To this end, we introduce a first-order training algorithm with a linear time complexity in the number of parameters. We show that when SRNNs are trained with our algorithm, they provide very similar regression performance with the LSTMs in two to three times shorter training time. We provide strong theoretical analysis to support our experimental results by providing regret bounds on the convergence rate of our algorithm. Through an extensive set of experiments, we verify our theoretical work and demonstrate significant performance improvements of our algorithm with respect to LSTMs and the other state-of-the-art learning models. △ Less

Submitted 31 May, 2021; v1 submitted 16 May, 2020; originally announced May 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2003.03601

arXiv:2005.05865 [pdf, other]

Unsupervised Anomaly Detection via Deep Metric Learning with End-to-End Optimization

Authors: Selim F. Yilmaz, Suleyman S. Kozat

Abstract: We investigate unsupervised anomaly detection for high-dimensional data and introduce a deep metric learning (DML) based framework. In particular, we learn a distance metric through a deep neural network. Through this metric, we project the data into the metric space that better separates the anomalies from the normal data and reduces the effect of the curse of dimensionality for high-dimensional… ▽ More We investigate unsupervised anomaly detection for high-dimensional data and introduce a deep metric learning (DML) based framework. In particular, we learn a distance metric through a deep neural network. Through this metric, we project the data into the metric space that better separates the anomalies from the normal data and reduces the effect of the curse of dimensionality for high-dimensional data. We present a novel data distillation method through self-supervision to remedy the conventional practice of assuming all data as normal. We also employ the hard mining technique from the DML literature. We show these components improve the performance of our model and significantly reduce the running time. Through an extensive set of experiments on the 14 real-world datasets, our method demonstrates significant performance gains compared to the state-of-the-art unsupervised anomaly detection methods, e.g., an absolute improvement between 4.44% and 11.74% on the average over the 14 datasets. Furthermore, we share the source code of our method on Github to facilitate further research. △ Less

Submitted 12 May, 2020; originally announced May 2020.

Comments: 11 pages, 3 figures

arXiv:2003.03601

RNN-based Online Learning: An Efficient First-Order Optimization Algorithm with a Convergence Guarantee

Authors: N. Mert Vural, Selim F. Yilmaz, Fatih Ilhan, Suleyman S. Kozat

Abstract: We investigate online nonlinear regression with continually running recurrent neural network networks (RNNs), i.e., RNN-based online learning. For RNN-based online learning, we introduce an efficient first-order training algorithm that theoretically guarantees to converge to the optimum network parameters. Our algorithm is truly online such that it does not make any assumption on the learning envi… ▽ More We investigate online nonlinear regression with continually running recurrent neural network networks (RNNs), i.e., RNN-based online learning. For RNN-based online learning, we introduce an efficient first-order training algorithm that theoretically guarantees to converge to the optimum network parameters. Our algorithm is truly online such that it does not make any assumption on the learning environment to guarantee convergence. Through numerical simulations, we verify our theoretical results and illustrate significant performance improvements achieved by our algorithm with respect to the state-of-the-art RNN training methods. △ Less

Submitted 31 May, 2021; v1 submitted 7 March, 2020; originally announced March 2020.

Comments: This paper was an early draft of the presented results. We have written and published another paper (arXiv:2005.08948) where we have improved the material in this paper. The published paper covers most of the material presented in this paper as well. Therefore, we remove this paper from Arxiv and kindly refer the interested readers to arXiv:2005.08948

Showing 1–5 of 5 results for author: Yilmaz, S F