Data+Shift: Supporting visual investigation of data distribution shifts by data scientists

Authors: João Palmeiro, Beatriz Malveiro, Rita Costa, David Polido, Ricardo Moreira, Pedro Bizarro

Abstract: Machine learning on data streams is increasingly present in multiple domains. However, there is often data distribution shift that can lead machine learning models to make incorrect decisions. While there are automatic methods to detect when drift is happening, human analysis, often by data scientists, is essential to diagnose the causes of the problem and adjust the system. We propose Data+Shift, a visual analytics tool to support data scientists in the task of investigating the underlying factors of shift in data features in the context of fraud detection. Design requirements were derived from interviews with data scientists. Data+Shift is integrated with JupyterLab and can be used alongside other data science tools. We validated our approach with a think-aloud experiment where a data scientist used the tool for a fraud detection use case.

Submitted 29 April, 2022; originally announced April 2022.

Comments: 5 pages, 3 figures, short paper accepted at EuroVis 2022

arXiv:2108.09200 [pdf, other]

GUDIE: a flexible, user-defined method to extract subgraphs of interest from large graphs

Authors: Maria Inês Silva, David Aparício, Beatriz Malveiro, João Tiago Ascensão, Pedro Bizarro

Abstract: Large, dense, small-world networks often emerge from social phenomena, including financial networks, social media, or epidemiology. As networks grow in importance, it is often necessary to partition them into meaningful units of analysis. In this work, we propose GUDIE, a message-passing algorithm that extracts relevant context around seed nodes based on user-defined criteria. We design GUDIE for rich, labeled graphs, and expansions consider node and edge attributes. Preliminary results indicate that GUDIE expands to insightful areas while avoiding unimportant connections. The resulting subgraphs contain the relevant context for a seed node and can accelerate and extend analysis capabilities in finance and other critical networks.

Submitted 20 August, 2021; originally announced August 2021.

Comments: 16 pages, 8 figures, accepted at GEM2021

arXiv:2108.04494 [pdf, other]

Finding NeMo: Fishing in banking networks using network motifs

Authors: Xavier Fontes, David Aparício, Maria Inês Silva, Beatriz Malveiro, João Tiago Ascensão, Pedro Bizarro

Abstract: Banking fraud causes billion-dollar losses for banks worldwide. In fraud detection, graphs help understand complex transaction patterns and discover new fraud schemes. This work explores graph patterns in a real-world transaction dataset by extracting and analyzing its network motifs. Since banking graphs are heterogeneous, we focus on heterogeneous network motifs. Additionally, we propose a novel network randomization process that generates valid banking graphs. From our exploratory analysis, we conclude that network motifs extract insightful and interpretable patterns.

Submitted 10 August, 2021; originally announced August 2021.

Comments: 6 pages, 6 figures, accepted at SEAData 2021

Showing 1–3 of 3 results for author: Malveiro, B