Showing 1–50 of 62 results for author: Akoglu, L

Searching in archive cs.
  1. arXiv:2410.01281  [pdf, other]

    cs.AI cs.LG

    Uncertainty-aware Human Mobility Modeling and Anomaly Detection

    Authors: Haomin Wen, Shurui Cao, Zeeshan Rasheed, Khurram Hassan Shafique, Leman Akoglu

    Abstract: Given the temporal GPS coordinates from a large set of human agents, how can we model their mobility behavior toward effective anomaly (e.g. bad-actor or malicious behavior) detection without any labeled data? Human mobility and trajectory modeling have been extensively studied, showcasing varying abilities to manage complex inputs and balance performance-efficiency trade-offs. In this work, we fo…

    Submitted 5 May, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

  2. arXiv:2409.05672  [pdf, other]

    cs.LG cs.AI

    Zero-shot Outlier Detection via Prior-data Fitted Networks: Model Selection Bygone!

    Authors: Yuchen Shen, Haomin Wen, Leman Akoglu

    Abstract: Outlier detection (OD) has a vast literature as it finds numerous real-world applications. Being an unsupervised task, model selection is a key bottleneck for OD without label supervision. Despite a long list of available OD algorithms with tunable hyperparameters, the lack of systematic approaches for unsupervised algorithm and hyperparameter selection limits their effective use in practice. In t…

    Submitted 16 May, 2025; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: preprint

  3. arXiv:2408.13667  [pdf, other]

    cs.LG cs.CY

    Outlier Detection Bias Busted: Understanding Sources of Algorithmic Bias through Data-centric Factors

    Authors: Xueying Ding, Rui Xi, Leman Akoglu

    Abstract: The astonishing successes of ML have raised growing concern for the fairness of modern methods when deployed in real world settings. However, studies on fairness have mostly focused on supervised ML, while unsupervised outlier detection (OD), with numerous applications in finance, security, etc., have attracted little attention. While a few studies proposed fairness-enhanced OD algorithms, they re…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 11 pages, 5 figures, 16 appendix pages, accepted at AIES 2024

  4. End-To-End Self-Tuning Self-Supervised Time Series Anomaly Detection

    Authors: Boje Deforce, Meng-Chieh Lee, Bart Baesens, Estefanía Serral Asensio, Jaemin Yoo, Leman Akoglu

    Abstract: Time series anomaly detection (TSAD) finds many applications such as monitoring environmental sensors, industry KPIs, patient biomarkers, etc. A two-fold challenge for TSAD is a versatile and unsupervised model that can detect various different types of time series anomalies (spikes, discontinuities, trend shifts, etc.) without any labeled data. Modern neural networks have outstanding ability in m…

    Submitted 3 April, 2025; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted at SDM 2025

  5. arXiv:2402.07860  [pdf, other]

    cs.SI cs.AI cs.GT

    On the Detection of Reviewer-Author Collusion Rings From Paper Bidding

    Authors: Steven Jecmen, Nihar B. Shah, Fei Fang, Leman Akoglu

    Abstract: A major threat to the peer-review systems of computer science conferences is the existence of "collusion rings" between reviewers. In such collusion rings, reviewers who have also submitted their own papers to the conference work together to manipulate the conference's paper assignment, with the aim of being assigned to review each other's papers. The most straightforward way that colluding review…

    Submitted 10 March, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  6. arXiv:2402.06087  [pdf, other]

    cs.LG

    Descriptive Kernel Convolution Network with Improved Random Walk Kernel

    Authors: Meng-Chieh Lee, Lingxiao Zhao, Leman Akoglu

    Abstract: Graph kernels used to be the dominant approach to feature engineering for structured data, which are superseded by modern GNNs as the former lacks learnability. Recently, a suite of Kernel Convolution Networks (KCNs) successfully revitalized graph kernels by introducing learnability, which convolves input with learnable hidden graphs using a certain graph kernel. The random walk kernel (RWK) has b…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: WWW 2024

  7. arXiv:2402.03701  [pdf, other]

    cs.LG stat.ML

    Unified Discrete Diffusion for Categorical Data

    Authors: Lingxiao Zhao, Xueying Ding, Lijun Yu, Leman Akoglu

    Abstract: Discrete diffusion models have seen a surge of attention with applications on naturally discrete data such as language and graphs. Although discrete-time discrete diffusion has been established for a while, only recently Campbell et al. (2022) introduced the first framework for continuous-time discrete diffusion. However, their training and sampling processes differ significantly from the discrete…

    Submitted 12 August, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Unify Discrete Denoising Diffusion

  8. arXiv:2402.03687  [pdf, other]

    cs.LG stat.ML

    Pard: Permutation-Invariant Autoregressive Diffusion for Graph Generation

    Authors: Lingxiao Zhao, Xueying Ding, Leman Akoglu

    Abstract: Graph generation has been dominated by autoregressive models due to their simplicity and effectiveness, despite their sensitivity to ordering. Yet diffusion models have garnered increasing attention, as they offer comparable performance while being permutation-invariant. Current graph diffusion models generate graphs in a one-shot fashion, but they require extra features and thousands of denoising…

    Submitted 2 December, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Diffusion Model on Graphs

    Journal ref: NeurIPS 2024

  9. arXiv:2311.07355  [pdf, other]

    cs.LG

    ADAMM: Anomaly Detection of Attributed Multi-graphs with Metadata: A Unified Neural Network Approach

    Authors: Konstantinos Sotiropoulos, Lingxiao Zhao, Pierre Jinghong Liang, Leman Akoglu

    Abstract: Given a complex graph database of node- and edge-attributed multi-graphs as well as associated metadata for each graph, how can we spot the anomalous instances? Many real-world problems can be cast as graph inference tasks where the graph representation could capture complex relational phenomena (e.g., transactions among financial accounts in a journal entry), along with metadata reflecting tabula…

    Submitted 17 November, 2023; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: Accepted at IEEE BigData 2023

  10. arXiv:2308.14380  [pdf, other]

    cs.LG

    Self-Supervision for Tackling Unsupervised Anomaly Detection: Pitfalls and Opportunities

    Authors: Leman Akoglu, Jaemin Yoo

    Abstract: Self-supervised learning (SSL) is a growing torrent that has recently transformed machine learning and its many real world applications, by learning on massive amounts of unlabeled data via self-generated supervisory signals. Unsupervised anomaly detection (AD) has also capitalized on SSL, by self-generating pseudo-anomalies through various data augmentation functions or external data exposure. In…

    Submitted 28 August, 2023; originally announced August 2023.

  11. arXiv:2307.10529  [pdf, other]

    cs.LG cs.AI

    Fast Unsupervised Deep Outlier Model Selection with Hypernetworks

    Authors: Xueying Ding, Yue Zhao, Leman Akoglu

    Abstract: Outlier detection (OD) finds many applications with a rich literature of numerous techniques. Deep neural network based OD (DOD) has seen a recent surge of attention thanks to the many advances in deep learning. In this paper, we consider a critical-yet-understudied challenge with unsupervised DOD, that is, effective hyperparameter (HP) tuning/model selection. While several prior work report the s…

    Submitted 24 August, 2024; v1 submitted 19 July, 2023; originally announced July 2023.

    Comments: 12 pages, 7 figures, accepted at KDD 2024

  12. arXiv:2307.06534  [pdf, other]

    cs.LG

    DSV: An Alignment Validation Loss for Self-supervised Outlier Model Selection

    Authors: Jaemin Yoo, Yue Zhao, Lingxiao Zhao, Leman Akoglu

    Abstract: Self-supervised learning (SSL) has proven effective in solving various problems by generating internal supervisory signals. Unsupervised anomaly detection, which faces the high cost of obtaining true labels, is an area that can greatly benefit from SSL. However, recent literature suggests that tuning the hyperparameters (HP) of data augmentation functions is crucial to the success of SSL-based ano…

    Submitted 12 July, 2023; originally announced July 2023.

    Comments: Accepted to ECML PKDD 2023

  13. arXiv:2306.12033  [pdf, other]

    cs.LG cs.CV

    End-to-End Augmentation Hyperparameter Tuning for Self-Supervised Anomaly Detection

    Authors: Jaemin Yoo, Lingxiao Zhao, Leman Akoglu

    Abstract: Self-supervised learning (SSL) has emerged as a promising paradigm that presents supervisory signals to real-world problems, bypassing the extensive cost of manual labeling. Consequently, self-supervised anomaly detection (SSAD) has seen a recent surge of interest, since SSL is especially attractive for unsupervised tasks. However, recent works have reported that the choice of a data augmentation…

    Submitted 2 March, 2025; v1 submitted 21 June, 2023; originally announced June 2023.

  14. arXiv:2304.03368  [pdf, other]

    cs.LG cs.HC

    From Explanation to Action: An End-to-End Human-in-the-loop Framework for Anomaly Reasoning and Management

    Authors: Xueying Ding, Nikita Seleznev, Senthil Kumar, C. Bayan Bruss, Leman Akoglu

    Abstract: Anomalies are often indicators of malfunction or inefficiency in various systems such as manufacturing, healthcare, finance, surveillance, to name a few. While the literature is abundant in effective detection algorithms due to this practical relevance, autonomous anomaly detection is rarely used in real-world scenarios. Especially in high-stakes applications, a human-in-the-loop is often involved…

    Submitted 6 April, 2023; originally announced April 2023.

  15. arXiv:2211.02927  [pdf, other]

    cs.CY cs.CR cs.LG

    Unsupervised Machine Learning for Explainable Health Care Fraud Detection

    Authors: Shubhranshu Shekhar, Jetson Leder-Luis, Leman Akoglu

    Abstract: The US federal government spends more than a trillion dollars per year on health care, largely provided by private third parties and reimbursed by the government. A major concern in this system is overbilling, waste and fraud by providers, who face incentives to misreport on their claims in order to receive higher payments. In this paper, we develop novel machine learning tools to identify provide…

    Submitted 23 February, 2023; v1 submitted 5 November, 2022; originally announced November 2022.

    Comments: NBER Working paper #30946

  16. arXiv:2211.01834  [pdf, other]

    cs.LG

    Toward Unsupervised Outlier Model Selection

    Authors: Yue Zhao, Sean Zhang, Leman Akoglu

    Abstract: Today there exists no shortage of outlier detection algorithms in the literature, yet the complementary and critical problem of unsupervised outlier model selection (UOMS) is vastly understudied. In this work we propose ELECT, a new approach to select an effective candidate model, i.e. an outlier detection algorithm and its hyperparameter(s), to employ on a new dataset without any labels. At its c…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: ICDM 2022. Code available at https://github.com/yzhao062/ELECT

  17. arXiv:2210.09535  [pdf, other]

    cs.LG cs.SI

    Graph Anomaly Detection with Unsupervised GNNs

    Authors: Lingxiao Zhao, Saurabh Sawlani, Arvind Srinivasan, Leman Akoglu

    Abstract: Graph-based anomaly detection finds numerous applications in the real-world. Thus, there exists extensive literature on the topic that has recently shifted toward deep detection models due to advances in deep learning and graph neural networks (GNNs). A vast majority of prior work focuses on detecting node/edge/subgraph anomalies within a single graph, with much less work on graph-level anomaly de…

    Submitted 20 October, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: ICDM 2022 Short Paper Extension

  18. arXiv:2210.09521  [pdf, other]

    cs.LG

    A Practical, Progressively-Expressive GNN

    Authors: Lingxiao Zhao, Louis Härtel, Neil Shah, Leman Akoglu

    Abstract: Message passing neural networks (MPNNs) have become a dominant flavor of graph neural networks (GNNs) in recent years. Yet, MPNNs come with notable limitations; namely, they are at most as powerful as the 1-dimensional Weisfeiler-Leman (1-WL) test in distinguishing graphs in a graph isomorphism testing framework. To this end, researchers have drawn inspiration from the k-WL hierarchy to develop m…

    Submitted 2 November, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  19. arXiv:2210.08212  [pdf, other]

    cs.LG cs.AI

    D.MCA: Outlier Detection with Explicit Micro-Cluster Assignments

    Authors: Shuli Jiang, Robson Leonardo Ferreira Cordeiro, Leman Akoglu

    Abstract: How can we detect outliers, both scattered and clustered, and also explicitly assign them to respective micro-clusters, without knowing apriori how many micro-clusters exist? How can we perform both tasks in-house, i.e., without any post-hoc processing, so that both detection and assignment can benefit simultaneously from each other? Presenting outliers in separate micro-clusters is informative to…

    Submitted 15 October, 2022; originally announced October 2022.

    Comments: Proceedings of the 22nd IEEE International Conference on Data Mining (ICDM 2022)

  20. arXiv:2208.11727  [pdf, other]

    cs.LG

    Hyperparameter Optimization for Unsupervised Outlier Detection

    Authors: Yue Zhao, Leman Akoglu

    Abstract: Given an unsupervised outlier detection (OD) algorithm, how can we optimize its hyperparameter(s) (HP) on a new dataset, without any labels? In this work, we address this challenging hyperparameter optimization for unsupervised OD problem, and propose the first systematic approach called HPOD that is based on meta-learning. HPOD capitalizes on the prior performance of a large collection of HPs on…

    Submitted 10 October, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

  21. arXiv:2208.07734  [pdf, other]

    cs.LG cs.AI

    Data Augmentation is a Hyperparameter: Cherry-picked Self-Supervision for Unsupervised Anomaly Detection is Creating the Illusion of Success

    Authors: Jaemin Yoo, Tiancheng Zhao, Leman Akoglu

    Abstract: Self-supervised learning (SSL) has emerged as a promising alternative to create supervisory signals to real-world problems, avoiding the extensive cost of manual labeling. SSL is particularly attractive for unsupervised tasks such as anomaly detection (AD), where labeled anomalies are rare or often nonexistent. A large catalog of augmentation functions has been used for SSL-based AD (SSAD) on imag…

    Submitted 27 July, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: Transactions on Machine Learning Research (TMLR)

  22. arXiv:2206.07674  [pdf, other]

    cs.SI cs.DB

    Summarizing Labeled Multi-Graphs

    Authors: Dimitris Berberidis, Pierre J. Liang, Leman Akoglu

    Abstract: Real-world graphs can be difficult to interpret and visualize beyond a certain size. To address this issue, graph summarization aims to simplify and shrink a graph, while maintaining its high-level structure and characteristics. Most summarization methods are designed for homogeneous, undirected, simple graphs; however, many real-world graphs are ornate; with characteristics including node labels,…

    Submitted 15 June, 2022; originally announced June 2022.

    Comments: 17 pages, 8 figures, 4 tables

  23. arXiv:2206.07647  [pdf, other]

    cs.LG cs.AI stat.ME

    Hyperparameter Sensitivity in Deep Outlier Detection: Analysis and a Scalable Hyper-Ensemble Solution

    Authors: Xueying Ding, Lingxiao Zhao, Leman Akoglu

    Abstract: Outlier detection (OD) literature exhibits numerous algorithms as it applies to diverse domains. However, given a new detection task, it is unclear how to choose an algorithm to use, nor how to set its hyperparameter(s) (HPs) in unsupervised settings. HP tuning is an ever-growing problem with the arrival of many new detectors based on deep learning, which usually come with a long list of HPs. Surp…

    Submitted 18 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: 19 pages, The code is available at: https://github.com/xyvivian/ROBOD

  24. Sparx: Distributed Outlier Detection at Scale

    Authors: Sean Zhang, Varun Ursekar, Leman Akoglu

    Abstract: There is no shortage of outlier detection (OD) algorithms in the literature, yet a vast body of them are designed for a single machine. With the increasing reality of already cloud-resident datasets comes the need for distributed OD techniques. This area, however, is not only understudied but also short of public-domain implementations for practical use. This paper aims to fill this gap: We design…

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: 11 pages, 7 figures, 14 tables

    Journal ref: ACM SIGKDD 2022

  25. Benefit-aware Early Prediction of Health Outcomes on Multivariate EEG Time Series

    Authors: Shubhranshu Shekhar, Dhivya Eswaran, Bryan Hooi, Jonathan Elmer, Christos Faloutsos, Leman Akoglu

    Abstract: Given a cardiac-arrest patient being monitored in the ICU (intensive care unit) for brain activity, how can we predict their health outcomes as early as possible? Early decision-making is critical in many applications, e.g. monitoring patients may assist in early intervention and improved care. On the other hand, early prediction on EEG data poses several challenges: (i) earliness-accuracy trade-o…

    Submitted 10 November, 2021; originally announced November 2021.

    Comments: arxiv submission

    Journal ref: Journal of Biomedical Informatics Volume 139, March 2023, 104296

  26. arXiv:2110.08257  [pdf, other]

    cs.LG cs.AI

    C-AllOut: Catching & Calling Outliers by Type

    Authors: Guilherme D. F. Silva, Leman Akoglu, Robson L. F. Cordeiro

    Abstract: Given an unlabeled dataset, wherein we have access only to pairwise similarities (or distances), how can we effectively (1) detect outliers, and (2) annotate/tag the outliers by type? Outlier detection has a large literature, yet we find a key gap in the field: to our knowledge, no existing work addresses the outlier annotation problem. Outliers are broadly classified into 3 types, representing di…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: 9+4 pages, 3 figures, 11 tables

  27. arXiv:2110.05228  [pdf, other]

    cs.LG stat.ML

    Fast Attributed Graph Embedding via Density of States

    Authors: Saurabh Sawlani, Lingxiao Zhao, Leman Akoglu

    Abstract: Given a node-attributed graph, how can we efficiently represent it with few numerical features that expressively reflect its topology and attribute information? We propose A-DOGE, for Attributed DOS-based Graph Embedding, based on density of states (DOS, a.k.a. spectral density) to tackle this problem. A-DOGE is designed to fulfill a long desiderata of desirable characteristics. Most notably, it c…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: ICDM 2021

  28. arXiv:2110.03753  [pdf, other]

    cs.LG stat.ML

    From Stars to Subgraphs: Uplifting Any GNN with Local Structure Awareness

    Authors: Lingxiao Zhao, Wei Jin, Leman Akoglu, Neil Shah

    Abstract: Message Passing Neural Networks (MPNNs) are a common type of Graph Neural Network (GNN), in which each node's representation is computed recursively by aggregating representations (messages) from its immediate neighbors akin to a star-shaped pattern. MPNNs are appealing for being efficient and scalable, however their expressiveness is upper-bounded by the 1st-order Weisfeiler-Lehman isomorphism t…

    Submitted 20 April, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Expressive GNN framework

  29. arXiv:2106.15352  [pdf, other]

    cs.SI

    Detecting Changed-Hands Online Review Accounts

    Authors: Geli Fei, Shuai Wang, Bing Liu, Leman Akoglu

    Abstract: A reputable social media or review account can be a good cover for spamming activities. It has become prevalent that spammers buy/sell such accounts openly on the Web. We call these sold/bought accounts the changed-hands (CH) accounts. They are hard to detect by existing spam detection algorithms as their spamming activities are under the disguise of clean histories. In this paper, we first propos…

    Submitted 25 June, 2021; originally announced June 2021.

    Comments: 8 pages, 1 figure

  30. A Comprehensive Survey on Graph Anomaly Detection with Deep Learning

    Authors: Xiaoxiao Ma, Jia Wu, Shan Xue, Jian Yang, Chuan Zhou, Quan Z. Sheng, Hui Xiong, Leman Akoglu

    Abstract: Anomalies represent rare observations (e.g., data records or events) that deviate significantly from others. Over several decades, research on anomaly mining has received increasing interests due to the implications of these occurrences in a wide range of disciplines. Anomaly detection, which aims to identify rare observations, is among the most vital tasks in the world, and has shown its power in…

    Submitted 19 April, 2022; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: 31 pages

    Journal ref: TKDE 2021

  31. arXiv:2105.10077  [pdf, ps, other]

    cs.LG cs.SI

    Anomaly Mining -- Past, Present and Future

    Authors: Leman Akoglu

    Abstract: Anomaly mining is an important problem that finds numerous applications in various real world domains such as environmental monitoring, cybersecurity, finance, healthcare and medicine, to name a few. In this article, I focus on two areas, (1) point-cloud and (2) graph-based anomaly mining. I aim to present a broad view of each area, and discuss classes of main research problems, recent trends and…

    Submitted 31 May, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: 6 pages

  32. arXiv:2104.01422  [pdf, other]

    cs.LG

    A Large-scale Study on Unsupervised Outlier Model Selection: Do Internal Strategies Suffice?

    Authors: Martin Q. Ma, Yue Zhao, Xiaorong Zhang, Leman Akoglu

    Abstract: Given an unsupervised outlier detection task, how should one select a detection algorithm as well as its hyperparameters (jointly called a model)? Unsupervised model selection is notoriously difficult, in the absence of hold-out validation data with ground-truth labels. Therefore, the problem is vastly understudied. In this work, we study the feasibility of employing internal model evaluation stra…

    Submitted 12 April, 2021; v1 submitted 3 April, 2021; originally announced April 2021.

  33. arXiv:2012.12931  [pdf, other]

    cs.LG stat.ML

    On Using Classification Datasets to Evaluate Graph-Level Outlier Detection: Peculiar Observations and New Insights

    Authors: Lingxiao Zhao, Leman Akoglu

    Abstract: It is common practice of the outlier mining community to repurpose classification datasets toward evaluating various detection models. To that end, often a binary classification dataset is used, where samples from one of the classes is designated as the inlier samples, and the other class is substantially down-sampled to create the ground-truth outlier samples. Graph-level outlier detection (GLOD)…

    Submitted 18 May, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

    Comments: extensive revision

  34. FairOD: Fairness-aware Outlier Detection

    Authors: Shubhranshu Shekhar, Neil Shah, Leman Akoglu

    Abstract: Fairness and Outlier Detection (OD) are closely related, as it is exactly the goal of OD to spot rare, minority samples in a given population. However, when being a minority (as defined by protected variables, such as race/ethnicity/sex/age) does not reflect positive-class membership (such as criminal/fraud), OD produces unjust outcomes. Surprisingly, fairness-aware OD has been almost untouched in…

    Submitted 30 August, 2021; v1 submitted 5 December, 2020; originally announced December 2020.

    Comments: Updated to AIES'21 version

  35. arXiv:2011.00447  [pdf, other]

    cs.SI

    AutoAudit: Mining Accounting and Time-Evolving Graphs

    Authors: Meng-Chieh Lee, Yue Zhao, Aluna Wang, Pierre Jinghong Liang, Leman Akoglu, Vincent S. Tseng, Christos Faloutsos

    Abstract: How can we spot money laundering in large-scale graph-like accounting datasets? How to identify the most suspicious period in a time-evolving accounting graph? What kind of accounts and events should practitioners prioritize under time constraints? To tackle these crucial challenges in accounting and auditing tasks, we propose a flexible system called AutoAudit, which can be valuable for auditors…

    Submitted 1 November, 2020; originally announced November 2020.

    Comments: In Proceedings of 2020 IEEE International Conference on Big Data (Big Data)

  36. arXiv:2010.03600  [pdf, other]

    cs.DB cs.AI cs.LG

    Anomaly Detection in Large Labeled Multi-Graph Databases

    Authors: Hung T. Nguyen, Pierre J. Liang, Leman Akoglu

    Abstract: Within a large database G containing graphs with labeled nodes and directed, multi-edges; how can we detect the anomalous graphs? Most existing work are designed for plain (unlabeled) and/or simple (unweighted) graphs. We introduce CODETECT, the first approach that addresses the anomaly detection task for graph databases with such complex nature. To this end, it identifies a small representative s…

    Submitted 1 May, 2022; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: 24 pages

  37. arXiv:2009.10606  [pdf, other]

    cs.LG cs.IR stat.ML

    Automating Outlier Detection via Meta-Learning

    Authors: Yue Zhao, Ryan A. Rossi, Leman Akoglu

    Abstract: Given an unsupervised outlier detection (OD) task on a new dataset, how can we automatically select a good outlier detection method and its hyperparameter(s) (collectively called a model)? Thus far, model selection for OD has been a "black art"; as any model evaluation is infeasible due to the lack of (i) hold-out data with labels, and (ii) a universal objective function. In this work, we develop…

    Submitted 17 March, 2021; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: 21 pages. The code is available at http://github.com/yzhao062/MetaOD

  38. arXiv:2006.12294  [pdf, other]

    cs.LG stat.ML

    Connecting Graph Convolutional Networks and Graph-Regularized PCA

    Authors: Lingxiao Zhao, Leman Akoglu

    Abstract: Graph convolution operator of the GCN model is originally motivated from a localized first-order approximation of spectral graph convolutions. This work stands on a different view; establishing a mathematical connection between graph convolution and graph-regularized PCA (GPCA). Based on this connection, GCN architecture, shaped by stacking graph convolution layers, shares a close relatio…

    Submitted 2 March, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: Under review

  39. arXiv:2006.11468  [pdf, other]

    cs.LG stat.ML

    Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs

    Authors: Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, Danai Koutra

    Abstract: We investigate the representation power of graph neural networks in the semi-supervised node classification task under heterophily or low homophily, i.e., in networks where connected nodes may have different class labels and dissimilar features. Many popular GNNs fail to generalize to this setting, and are even outperformed by models that ignore the graph structure (e.g., multilayer perceptrons).…

    Submitted 23 October, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: Accepted to NeurIPS 2020; version with full appendix

  40. arXiv:2003.05731  [pdf, other]

    cs.LG cs.DC cs.IR stat.ML

    SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection

    Authors: Yue Zhao, Xiyang Hu, Cheng Cheng, Cong Wang, Changlin Wan, Wen Wang, Jianing Yang, Haoping Bai, Zheng Li, Cao Xiao, Yunlong Wang, Zhi Qiao, Jimeng Sun, Leman Akoglu

    Abstract: Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples with numerous high-stake applications including fraud detection and intrusion detection. Due to the lack of ground truth labels, practitioners often have to build a large number of unsupervised, heterogeneous models (i.e., different algorithms with varying hyperparameters) for further c…

    Submitted 4 March, 2021; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: Proceedings of the 4th Conference on Machine Learning and Systems (MLSys). The code is available at http://github.com/yzhao062/SUOD. arXiv admin note: text overlap with arXiv:2002.03222

  41. arXiv:1911.02617  [pdf, ps, other]

    cs.LG cs.AI

    Coverage-based Outlier Explanation

    Authors: Yue Wu, Leman Akoglu, Ian Davidson

    Abstract: Outlier detection is a core task in data mining with a plethora of algorithms that have enjoyed wide scale usage. Existing algorithms are primarily focused on detection, that is the identification of outliers in a given dataset. In this paper we explore the relatively under-studied problem of the outlier explanation problem. Our goal is, given a dataset that is already divided into outliers and no…

    Submitted 6 November, 2019; originally announced November 2019.

  42. arXiv:1909.12385  [pdf, other]

    cs.LG cs.SI stat.ML

    A Quest for Structure: Jointly Learning the Graph Structure and Semi-Supervised Classification

    Authors: Xuan Wu, Lingxiao Zhao, Leman Akoglu

    Abstract: Semi-supervised learning (SSL) is effectively used for numerous classification problems, thanks to its ability to make use of abundant unlabeled data. The main assumption of various SSL algorithms is that the nearby points on the data manifold are likely to share a label. Graph-based SSL constructs a graph from point-cloud data as an approximation to the underlying manifold, followed by label infe…

    Submitted 26 September, 2019; originally announced September 2019.

    Comments: 11 pages, CIKM-2018

  43. arXiv:1909.12223  [pdf, ps, other]

    cs.LG stat.ML

    PairNorm: Tackling Oversmoothing in GNNs

    Authors: Lingxiao Zhao, Leman Akoglu

    Abstract: The performance of graph neural nets (GNNs) is known to gradually decrease with increasing number of layers. This decay is partly attributed to oversmoothing, where repeated graph convolutions eventually make node embeddings indistinguishable. We take a closer look at two different interpretations, aiming to quantify oversmoothing. Our main contribution is PairNorm, a novel normalization layer tha…

    Submitted 12 February, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: ICLR 2020 Camera Ready

  44. arXiv:1907.03813  [pdf, other]

    stat.ML cs.LG math.ST

    Statistical Analysis of Nearest Neighbor Methods for Anomaly Detection

    Authors: Xiaoyi Gu, Leman Akoglu, Alessandro Rinaldo

    Abstract: Nearest-neighbor (NN) procedures are well studied and widely used in both supervised and unsupervised learning problems. In this paper we are concerned with investigating the performance of NN-based methods for anomaly detection. We first show through extensive simulations that NN methods compare favorably to some of the other state-of-the-art algorithms for anomaly detection based on a set of ben…

    Submitted 8 July, 2019; originally announced July 2019.

  45. arXiv:1906.12218  [pdf, other]

    cs.LG stat.ML

    Continual Rare-Class Recognition with Emerging Novel Subclasses

    Authors: Hung Nguyen, Xuejian Wang, Leman Akoglu

    Abstract: Given a labeled dataset that contains a rare (or minority) class of of-interest instances, as well as a large class of instances that are not of interest, how can we learn to recognize future of-interest instances over a continuous stream? We introduce RaRecognize, which (i) estimates a general decision boundary between the rare and the majority class, (ii) learns to recognize individual rare subc…

    Submitted 28 June, 2019; originally announced June 2019.

    Comments: Accepted to ECML PKDD 2019

  46. arXiv:1805.02269  [pdf, other]

    cs.LG stat.ML

    Incorporating Privileged Information to Unsupervised Anomaly Detection

    Authors: Shubhranshu Shekhar, Leman Akoglu

    Abstract: We introduce a new unsupervised anomaly detection ensemble called SPI which can harness privileged information - data available only for training examples but not for (future) test examples. Our ideas build on the Learning Using Privileged Information (LUPI) paradigm pioneered by Vapnik et al. [19,17], which we extend to unsupervised learning and in particular to anomaly detection. SPI (for Spotti…

    Submitted 24 May, 2018; v1 submitted 6 May, 2018; originally announced May 2018.

  47. arXiv:1710.05333  [pdf, other]

    cs.SI

    LookOut on Time-Evolving Graphs: Succinctly Explaining Anomalies from Any Detector

    Authors: Nikhil Gupta, Dhivya Eswaran, Neil Shah, Leman Akoglu, Christos Faloutsos

    Abstract: Why is a given node in a time-evolving graph ($t$-graph) marked as an anomaly by an off-the-shelf detection algorithm? Is it because of the number of its outgoing or incoming edges, or their timings? How can we best convince a human analyst that the node is anomalous? Our work aims to provide succinct, interpretable, and simple explanations of anomalous behavior in $t$-graphs (communications, IP-I…

    Submitted 15 October, 2017; originally announced October 2017.

  48. arXiv:1708.05929  [pdf, other]

    cs.LG stat.ML

    Explaining Anomalies in Groups with Characterizing Subspace Rules

    Authors: Meghanath Macha, Leman Akoglu

    Abstract: Anomaly detection has numerous applications and has been studied vastly. We consider a complementary problem that has a much sparser literature: anomaly description. Interpretation of anomalies is crucial for practitioners for sense-making, troubleshooting, and planning actions. To this end, we present a new approach called x-PACS (for eXplaining Patterns of Anomalies with Characterizing Subspaces…

    Submitted 2 May, 2018; v1 submitted 19 August, 2017; originally announced August 2017.

    Comments: 31 pages, 6 figures, 9 tables

  49. arXiv:1702.05764  [pdf, other]

    cs.SI

    Fast, Warped Graph Embedding: Unifying Framework and One-Click Algorithm

    Authors: Siheng Chen, Sufeng Niu, Leman Akoglu, Jelena Kovačević, Christos Faloutsos

    Abstract: What is the best way to describe a user in a social network with just a few numbers? Mathematically, this is equivalent to assigning a vector representation to each node in a graph, a process called graph embedding. We propose a novel framework, GEM-D that unifies most of the past algorithms such as LapEigs, DeepWalk and node2vec. GEM-D achieves its goal by decomposing any graph embedding algorith…

    Submitted 19 February, 2017; originally announced February 2017.

  50. arXiv:1701.09039  [pdf, other]

    cs.SI cs.IR physics.soc-ph

    Ties That Bind - Characterizing Classes by Attributes and Social Ties

    Authors: Aria Rezaei, Bryan Perozzi, Leman Akoglu

    Abstract: Given a set of attributed subgraphs known to be from different classes, how can we discover their differences? There are many cases where collections of subgraphs may be contrasted against each other. For example, they may be assigned ground truth labels (spam/not-spam), or it may be desired to directly compare the biological networks of different species or compound networks of different chemical…

    Submitted 31 January, 2017; originally announced January 2017.

    Comments: WWW'17 Web Science, 9 pages