-
Data+Shift: Supporting visual investigation of data distribution shifts by data scientists
Authors:
João Palmeiro,
Beatriz Malveiro,
Rita Costa,
David Polido,
Ricardo Moreira,
Pedro Bizarro
Abstract:
Machine learning on data streams is increasingly more present in multiple domains. However, there is often data distribution shift that can lead machine learning models to make incorrect decisions. While there are automatic methods to detect when drift is happening, human analysis, often by data scientists, is essential to diagnose the causes of the problem and adjust the system. We propose Data+S…
▽ More
Machine learning on data streams is increasingly more present in multiple domains. However, there is often data distribution shift that can lead machine learning models to make incorrect decisions. While there are automatic methods to detect when drift is happening, human analysis, often by data scientists, is essential to diagnose the causes of the problem and adjust the system. We propose Data+Shift, a visual analytics tool to support data scientists in the task of investigating the underlying factors of shift in data features in the context of fraud detection. Design requirements were derived from interviews with data scientists. Data+Shift is integrated with JupyterLab and can be used alongside other data science tools. We validated our approach with a think-aloud experiment where a data scientist used the tool for a fraud detection use case.
△ Less
Submitted 29 April, 2022;
originally announced April 2022.
-
GUDIE: a flexible, user-defined method to extract subgraphs of interest from large graphs
Authors:
Maria Inês Silva,
David Aparício,
Beatriz Malveiro,
João Tiago Ascensão,
Pedro Bizarro
Abstract:
Large, dense, small-world networks often emerge from social phenomena, including financial networks, social media, or epidemiology. As networks grow in importance, it is often necessary to partition them into meaningful units of analysis. In this work, we propose GUDIE, a message-passing algorithm that extracts relevant context around seed nodes based on user-defined criteria. We design GUDIE for…
▽ More
Large, dense, small-world networks often emerge from social phenomena, including financial networks, social media, or epidemiology. As networks grow in importance, it is often necessary to partition them into meaningful units of analysis. In this work, we propose GUDIE, a message-passing algorithm that extracts relevant context around seed nodes based on user-defined criteria. We design GUDIE for rich, labeled graphs, and expansions consider node and edge attributes. Preliminary results indicate that GUDIE expands to insightful areas while avoiding unimportant connections. The resulting subgraphs contain the relevant context for a seed node and can accelerate and extend analysis capabilities in finance and other critical networks.
△ Less
Submitted 20 August, 2021;
originally announced August 2021.
-
Finding NeMo: Fishing in banking networks using network motifs
Authors:
Xavier Fontes,
David Aparício,
Maria Inês Silva,
Beatriz Malveiro,
João Tiago Ascensão,
Pedro Bizarro
Abstract:
Banking fraud causes billion-dollar losses for banks worldwide. In fraud detection, graphs help understand complex transaction patterns and discovering new fraud schemes. This work explores graph patterns in a real-world transaction dataset by extracting and analyzing its network motifs. Since banking graphs are heterogeneous, we focus on heterogeneous network motifs. Additionally, we propose a no…
▽ More
Banking fraud causes billion-dollar losses for banks worldwide. In fraud detection, graphs help understand complex transaction patterns and discovering new fraud schemes. This work explores graph patterns in a real-world transaction dataset by extracting and analyzing its network motifs. Since banking graphs are heterogeneous, we focus on heterogeneous network motifs. Additionally, we propose a novel network randomization process that generates valid banking graphs. From our exploratory analysis, we conclude that network motifs extract insightful and interpretable patterns.
△ Less
Submitted 10 August, 2021;
originally announced August 2021.