
Showing 1–50 of 50 results for author: Rabbany, R

Searching in archive cs.
  1. arXiv:2506.22512  [pdf, ps, other]

    cs.CY cs.AI

    Ask before you Build: Rethinking AI-for-Good in Human Trafficking Interventions

    Authors: Pratheeksha Nair, Gabriel Lefebvre, Sophia Garrel, Maryam Molamohammadi, Reihaneh Rabbany

    Abstract: AI for good initiatives often rely on the assumption that technical interventions can resolve complex social problems. In the context of human trafficking (HT), such techno-solutionism risks oversimplifying exploitation, reinforcing power imbalances and causing harm to the very communities AI claims to support. In this paper, we introduce the Radical Questioning (RQ) framework as a five step, pre-…

    Submitted 26 June, 2025; originally announced June 2025.

  2. arXiv:2506.20702  [pdf]

    cs.AI cs.CY

    The Singapore Consensus on Global AI Safety Research Priorities

    Authors: Yoshua Bengio, Tegan Maharaj, Luke Ong, Stuart Russell, Dawn Song, Max Tegmark, Lan Xue, Ya-Qin Zhang, Stephen Casper, Wan Sie Lee, Sören Mindermann, Vanessa Wilfred, Vidhisha Balachandran, Fazl Barez, Michael Belinsky, Imane Bello, Malo Bourgon, Mark Brakel, Siméon Campos, Duncan Cass-Beggs, Jiahao Chen, Rumman Chowdhury, Kuan Chua Seah, Jeff Clune, Juntao Dai , et al. (63 additional authors not shown)

    Abstract: Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential -- it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash. The "2025 Singapore Conference on…

    Submitted 30 June, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

    Comments: Final report from the "2025 Singapore Conference on AI (SCAI)" held April 26: https://www.scai.gov.sg/2025/scai2025-report

  3. arXiv:2506.15794  [pdf, ps, other]

    cs.CL cs.AI cs.HC

    Veracity: An Open-Source AI Fact-Checking System

    Authors: Taylor Lynn Curtis, Maximilian Puelma Touzel, William Garneau, Manon Gruaz, Mike Pinder, Li Wei Wang, Sukanya Krishna, Luda Cohen, Jean-François Godbout, Reihaneh Rabbany, Kellin Pelrine

    Abstract: The proliferation of misinformation poses a significant threat to society, exacerbated by the capabilities of generative AI. This demo paper introduces Veracity, an open-source AI system designed to empower individuals to combat misinformation through transparent and accessible fact-checking. Veracity leverages the synergy between Large Language Models (LLMs) and web retrieval agents to analyze us…

    Submitted 18 June, 2025; originally announced June 2025.

  4. Unified Game Moderation: Soft-Prompting and LLM-Assisted Label Transfer for Resource-Efficient Toxicity Detection

    Authors: Zachary Yang, Domenico Tullo, Reihaneh Rabbany

    Abstract: Toxicity detection in gaming communities faces significant scaling challenges when expanding across multiple games and languages, particularly in real-time environments where computational efficiency is crucial. We present two key findings to address these challenges while building upon our previous work on ToxBuster, a BERT-based real-time toxicity detection system. First, we introduce a soft-pro…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 11 pages, 1 figure, 9 Tables, KDD 2025 ADS Track

    ACM Class: I.2.7; J.4

  5. arXiv:2506.05393  [pdf, ps, other]

    cs.CL cs.LG

    Are Large Language Models Good Temporal Graph Learners?

    Authors: Shenyang Huang, Ali Parviz, Emma Kondrup, Zachary Yang, Zifeng Ding, Michael Bronstein, Reihaneh Rabbany, Guillaume Rabusseau

    Abstract: Large Language Models (LLMs) have recently driven significant advancements in Natural Language Processing and various other applications. While a broad range of literature has explored the graph-reasoning capabilities of LLMs, including their use of predictors on graphs, the application of LLMs to dynamic graphs -- real world evolving networks -- remains relatively unexplored. Recent work studies…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 9 pages, 9 tables, 4 figures

  6. arXiv:2506.02451  [pdf, ps, other]

    cs.LG

    Weak Supervision for Real World Graphs

    Authors: Pratheeksha Nair, Reihaneh Rabbany

    Abstract: Node classification in real world graphs often suffers from label scarcity and noise, especially in high stakes domains like human trafficking detection and misinformation monitoring. While direct supervision is limited, such graphs frequently contain weak signals, noisy or indirect cues, that can still inform learning. We propose WSNET, a novel weakly supervised graph contrastive learning framewo…

    Submitted 3 June, 2025; originally announced June 2025.

    ACM Class: I.2.6

  7. arXiv:2504.09712  [pdf, ps, other]

    cs.CR cs.AI cs.CV

    The Structural Safety Generalization Problem

    Authors: Julius Broomfield, Tom Gibbs, Ethan Kosak-Hine, George Ingebretsen, Tia Nasir, Jason Zhang, Reihaneh Iranmanesh, Sara Pieri, Reihaneh Rabbany, Kellin Pelrine

    Abstract: LLM jailbreaks are a widespread safety challenge. Given this problem has not yet been tractable, we suggest targeting a key failure mechanism: the failure of safety to generalize across semantically equivalent inputs. We further focus the target by requiring desirable tractability properties of attacks to study: explainability, transferability between models, and transferability between goals. We…

    Submitted 30 May, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

  8. arXiv:2504.00408  [pdf, other]

    cs.CY cs.AI cs.HC

    From Intuition to Understanding: Using AI Peers to Overcome Physics Misconceptions

    Authors: Ruben Weijers, Denton Wu, Hannah Betts, Tamara Jacod, Yuxiang Guan, Vidya Sujaya, Kushal Dev, Toshali Goel, William Delooze, Reihaneh Rabbany, Ying Wu, Jean-François Godbout, Kellin Pelrine

    Abstract: Generative AI has the potential to transform personalization and accessibility of education. However, it raises serious concerns about accuracy and helping students become independent critical thinkers. In this study, we designed a helpful AI "Peer" to help students correct fundamental physics misconceptions related to Newtonian mechanics concepts. In contrast to approaches that seek near-perfect a…

    Submitted 1 April, 2025; originally announced April 2025.

  9. arXiv:2502.15210  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    PairBench: Are Vision-Language Models Reliable at Comparing What They See?

    Authors: Aarash Feizi, Sai Rajeswar, Adriana Romero-Soriano, Reihaneh Rabbany, Valentina Zantedeschi, Spandana Gella, João Monteiro

    Abstract: Understanding how effectively large vision language models (VLMs) compare visual inputs is crucial across numerous applications, yet this fundamental capability remains insufficiently assessed. While VLMs are increasingly deployed for tasks requiring comparative judgment, including automated evaluation, re-ranking, and retrieval-augmented generation, no systematic framework exists to measure their…

    Submitted 29 May, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

  10. arXiv:2501.12537  [pdf, other]

    cs.CL cs.CY

    Enhancing Privacy in the Early Detection of Sexual Predators Through Federated Learning and Differential Privacy

    Authors: Khaoula Chehbouni, Martine De Cock, Gilles Caporossi, Afaf Taik, Reihaneh Rabbany, Golnoosh Farnadi

    Abstract: The increased screen time and isolation caused by the COVID-19 pandemic have led to a significant surge in cases of online grooming, which is the use of strategies by predators to lure children into sexual exploitation. Previous efforts to detect grooming in industry and academia have involved accessing and monitoring private conversations through centrally-trained models or sending private conver…

    Submitted 15 April, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

    Comments: Accepted to AAAI-Social Impact Track - Oral

  11. arXiv:2501.10387  [pdf, other]

    cs.CY

    Online Influence Campaigns: Strategies and Vulnerabilities

    Authors: Andreea Musulan, Veronica Xia, Ethan Kosak-Hine, Tom Gibbs, Vidya Sujaya, Reihaneh Rabbany, Jean-François Godbout, Kellin Pelrine

    Abstract: In order to combat the creation and spread of harmful content online, this paper defines and contextualizes the concept of inauthentic, societal-scale manipulation by malicious actors. We review the literature on societally harmful content and how it proliferates to analyze the manipulation strategies used by such actors and the vulnerabilities they target. We also provide an overview of three cas…

    Submitted 18 December, 2024; originally announced January 2025.

    ACM Class: A.1

  12. arXiv:2412.10540  [pdf, other]

    cs.LG q-fin.ST

    Higher Order Transformers: Enhancing Stock Movement Prediction On Multimodal Time-Series Data

    Authors: Soroush Omranpour, Guillaume Rabusseau, Reihaneh Rabbany

    Abstract: In this paper, we tackle the challenge of predicting stock movements in financial markets by introducing Higher Order Transformers, a novel architecture designed for processing multivariate time-series data. We extend the self-attention mechanism and the transformer architecture to a higher order, effectively capturing complex market dynamics across time and variables. To manage computational comp…

    Submitted 13 December, 2024; originally announced December 2024.

    Comments: KDD 2024 Workshop on Machine Learning in Finance

  13. arXiv:2412.02919  [pdf, other]

    cs.LG cs.AI

    Higher Order Transformers: Efficient Attention Mechanism for Tensor Structured Data

    Authors: Soroush Omranpour, Guillaume Rabusseau, Reihaneh Rabbany

    Abstract: Transformers are now ubiquitous for sequence modeling tasks, but their extension to multi-dimensional data remains a challenge due to the quadratic cost of the attention mechanism. In this paper, we propose Higher-Order Transformers (HOT), a novel architecture designed to efficiently process data with more than two axes, i.e. higher-order tensors. To address the computational challenges associated…

    Submitted 3 December, 2024; originally announced December 2024.

  14. arXiv:2411.06528  [pdf, ps, other]

    cs.CL cs.AI cs.HC

    Epistemic Integrity in Large Language Models

    Authors: Bijean Ghafouri, Shahrad Mohammadzadeh, James Zhou, Pratheeksha Nair, Jacob-Junqi Tian, Hikaru Tsujimura, Mayank Goel, Sukanya Krishna, Reihaneh Rabbany, Jean-François Godbout, Kellin Pelrine

    Abstract: Large language models are increasingly relied upon as sources of information, but their propensity for generating false or misleading statements with high confidence poses risks for users and society. In this paper, we confront the critical problem of epistemic miscalibration -- where a model's linguistic assertiveness fails to reflect its true internal certainty. We introduce a new…

    Submitted 8 June, 2025; v1 submitted 10 November, 2024; originally announced November 2024.

  15. arXiv:2411.05060  [pdf, ps, other]

    cs.SI cs.CL cs.CY

    A Guide to Misinformation Detection Data and Evaluation

    Authors: Camille Thibault, Jacob-Junqi Tian, Gabrielle Peloquin-Skulski, Taylor Lynn Curtis, James Zhou, Florence Laflamme, Yuxiang Guan, Reihaneh Rabbany, Jean-François Godbout, Kellin Pelrine

    Abstract: Misinformation is a complex societal issue, and mitigating solutions are difficult to create due to data deficiencies. To address this, we have curated the largest collection of (mis)information datasets in the literature, totaling 75. From these, we evaluated the quality of 36 datasets that consist of statements or claims, as well as the 9 datasets that consist of data in purely paragraph form. W…

    Submitted 18 June, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

  16. arXiv:2410.15460  [pdf, other]

    cs.AI cs.CL math.SP

    Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training

    Authors: Shahrad Mohammadzadeh, Juan David Guerra, Marco Bonizzato, Reihaneh Rabbany, Golnoosh Farnadi

    Abstract: As large language models (LLMs) are increasingly deployed across various industries, concerns regarding their reliability, particularly due to hallucinations - outputs that are factually inaccurate or irrelevant to user input - have grown. Our research investigates the relationship between the training process and the emergence of hallucinations to address a key gap in existing research that focus…

    Submitted 7 January, 2025; v1 submitted 20 October, 2024; originally announced October 2024.

    Comments: 23 pages, 15 figures, under review at ICLR, accepted to Safe Generative AI Workshop @ NeurIPS 2024, resubmitting to change name to appropriate name

  17. arXiv:2410.13915  [pdf, other]

    cs.SI cs.AI cs.CY

    A Simulation System Towards Solving Societal-Scale Manipulation

    Authors: Maximilian Puelma Touzel, Sneheel Sarangi, Austin Welch, Gayatri Krishnakumar, Dan Zhao, Zachary Yang, Hao Yu, Ethan Kosak-Hine, Tom Gibbs, Andreea Musulan, Camille Thibault, Busra Tugce Gurbuz, Reihaneh Rabbany, Jean-François Godbout, Kellin Pelrine

    Abstract: The rise of AI-driven manipulation poses significant risks to societal trust and democratic processes. Yet, studying these effects in real-world settings at scale is ethically and logistically impractical, highlighting a need for simulation tools that can model these dynamics in controlled settings to enable experimentation with possible defenses. We present a simulation environment designed to ad…

    Submitted 16 October, 2024; originally announced October 2024.

  18. arXiv:2409.00137  [pdf, other]

    cs.CR cs.AI cs.CL

    Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks

    Authors: Tom Gibbs, Ethan Kosak-Hine, George Ingebretsen, Jason Zhang, Julius Broomfield, Sara Pieri, Reihaneh Iranmanesh, Reihaneh Rabbany, Kellin Pelrine

    Abstract: Large language models (LLMs) are improving at an exceptional rate. However, these models are still susceptible to jailbreak attacks, which are becoming increasingly dangerous as models become increasingly powerful. In this work, we introduce a dataset of jailbreaks where each example can be input in both a single or a multi-turn format. We show that while equivalent in content, they are not equiva…

    Submitted 29 August, 2024; originally announced September 2024.

  19. arXiv:2409.00009  [pdf, other]

    cs.IR cs.AI

    Web Retrieval Agents for Evidence-Based Misinformation Detection

    Authors: Jacob-Junqi Tian, Hao Yu, Yury Orlovskiy, Tyler Vergho, Mauricio Rivera, Mayank Goel, Zachary Yang, Jean-Francois Godbout, Reihaneh Rabbany, Kellin Pelrine

    Abstract: This paper develops an agent-based automated fact-checking approach for detecting misinformation. We demonstrate that combining a powerful LLM agent, which does not have access to the internet for searches, with an online web search agent yields better results than when each tool is used independently. Our approach is robust across multiple models, outperforming alternatives and increasing the mac…

    Submitted 9 October, 2024; v1 submitted 15 August, 2024; originally announced September 2024.

    Comments: 1 main figure, 8 tables, 10 pages, 12 figures in Appendix, 7 tables in Appendix GitHub URL: https://github.com/ComplexData-MILA/webretrieval

  20. arXiv:2407.12269  [pdf, other]

    cs.LG cs.SI

    UTG: Towards a Unified View of Snapshot and Event Based Models for Temporal Graphs

    Authors: Shenyang Huang, Farimah Poursafaei, Reihaneh Rabbany, Guillaume Rabusseau, Emanuele Rossi

    Abstract: Many real world graphs are inherently dynamic, constantly evolving with node and edge additions. These graphs can be represented by temporal graphs, either through a stream of edge events or a sequence of graph snapshots. Until now, the development of machine learning methods for both types has occurred largely in isolation, resulting in limited experimental comparison and theoretical crosspollina…

    Submitted 1 December, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  21. arXiv:2407.02807  [pdf, other]

    cs.SI

    Regional and Temporal Patterns of Partisan Polarization during the COVID-19 Pandemic in the United States and Canada

    Authors: Zachary Yang, Anne Imouza, Maximilian Puelma Touzel, Cecile Amadoro, Gabrielle Desrosiers-Brisebois, Kellin Pelrine, Sacha Levy, Jean-Francois Godbout, Reihaneh Rabbany

    Abstract: Public health measures were among the most polarizing topics debated online during the COVID-19 pandemic. Much of the discussion surrounded specific events, such as when and which particular interventions came into practice. In this work, we develop and apply an approach to measure subnational and event-driven variation of partisan polarization and explore how these dynamics varied both across and…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 19 pages (main paper), 9 figures, 1 table

    ACM Class: J.4

  22. arXiv:2406.10426  [pdf, other]

    cs.LG

    MiNT: Multi-Network Training for Transfer Learning on Temporal Graphs

    Authors: Kiarash Shamsi, Tran Gia Bao Ngo, Razieh Shirzadkhani, Shenyang Huang, Farimah Poursafaei, Poupak Azad, Reihaneh Rabbany, Baris Coskunuzer, Guillaume Rabusseau, Cuneyt Gurcan Akcora

    Abstract: Temporal Graph Learning (TGL) has become a robust framework for discovering patterns in dynamic networks and predicting future interactions. While existing research has largely concentrated on learning from individual networks, this study explores the potential of learning from multiple temporal networks and its ability to transfer to unobserved networks. To achieve this, we introduce Temporal Mul…

    Submitted 14 February, 2025; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 20 pages, 9 figures, preprint version

  23. arXiv:2406.09639  [pdf, other]

    cs.LG cs.SI

    TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Heterogeneous Graphs

    Authors: Julia Gastinger, Shenyang Huang, Mikhail Galkin, Erfan Loghmani, Ali Parviz, Farimah Poursafaei, Jacob Danovitch, Emanuele Rossi, Ioannis Koutis, Heiner Stuckenschmidt, Reihaneh Rabbany, Guillaume Rabusseau

    Abstract: Multi-relational temporal graphs are powerful tools for modeling real-world data, capturing the evolving and interconnected nature of entities over time. Recently, many novel models have been proposed for ML on such graphs, intensifying the need for robust evaluation and standardized benchmark datasets. However, the availability of such resources remains scarce and evaluation faces added complexity due t…

    Submitted 18 October, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 29 pages, 8 figures, 11 tables, accepted at NeurIPS 2024 Track on Datasets and Benchmarks

  24. arXiv:2402.03651  [pdf, other]

    cs.SI cs.LG

    Temporal Graph Analysis with TGX

    Authors: Razieh Shirzadkhani, Shenyang Huang, Elahe Kooshafar, Reihaneh Rabbany, Farimah Poursafaei

    Abstract: Real-world networks, with their evolving relations, are best captured as temporal graphs. However, existing software libraries are largely designed for static graphs where the dynamic nature of temporal graphs is ignored. Bridging this gap, we introduce TGX, a Python package specially designed for analysis of temporal networks that encompasses an automated pipeline for data loading, data processin…

    Submitted 5 February, 2024; originally announced February 2024.

  25. arXiv:2401.08694  [pdf, other]

    cs.CL cs.AI

    Combining Confidence Elicitation and Sample-based Methods for Uncertainty Quantification in Misinformation Mitigation

    Authors: Mauricio Rivera, Jean-François Godbout, Reihaneh Rabbany, Kellin Pelrine

    Abstract: Large Language Models have emerged as prime candidates to tackle misinformation mitigation. However, existing approaches struggle with hallucinations and overconfident predictions. We propose an uncertainty quantification framework that leverages both direct confidence elicitation and sample-based consistency methods to provide better calibration for NLP misinformation mitigation solutions. We fi…

    Submitted 30 January, 2024; v1 submitted 13 January, 2024; originally announced January 2024.

    Comments: 12 pages, 11 figures

  26. arXiv:2401.06920  [pdf, other]

    cs.CL

    Comparing GPT-4 and Open-Source Language Models in Misinformation Mitigation

    Authors: Tyler Vergho, Jean-Francois Godbout, Reihaneh Rabbany, Kellin Pelrine

    Abstract: Recent large language models (LLMs) have been shown to be effective for misinformation detection. However, the choice of LLMs for experiments varies widely, leading to uncertain conclusions. In particular, GPT-4 is known to be strong in this domain, but it is closed source, potentially expensive, and can show instability between different versions. Meanwhile, alternative LLMs have given mixed resu…

    Submitted 12 January, 2024; originally announced January 2024.

  27. arXiv:2401.01990  [pdf, other]

    cs.CV cs.AI cs.LG

    GPS-SSL: Guided Positive Sampling to Inject Prior Into Self-Supervised Learning

    Authors: Aarash Feizi, Randall Balestriero, Adriana Romero-Soriano, Reihaneh Rabbany

    Abstract: We propose Guided Positive Sampling Self-Supervised Learning (GPS-SSL), a general method to inject a priori knowledge into Self-Supervised Learning (SSL) positive samples selection. Current SSL methods leverage Data-Augmentations (DA) for generating positive samples and incorporate prior knowledge - an incorrect, or too weak DA will drastically reduce the quality of the learned representation. GPS…

    Submitted 9 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

  28. arXiv:2401.01197  [pdf, other]

    cs.CL cs.AI

    Uncertainty Resolution in Misinformation Detection

    Authors: Yury Orlovskiy, Camille Thibault, Anne Imouza, Jean-François Godbout, Reihaneh Rabbany, Kellin Pelrine

    Abstract: Misinformation poses a variety of risks, such as undermining public trust and distorting factual discourse. Large Language Models (LLMs) like GPT-4 have been shown effective in mitigating misinformation, particularly in handling statements where enough context is provided. However, they struggle to assess ambiguous or context-deficient statements accurately. This work introduces a new method to re…

    Submitted 2 January, 2024; originally announced January 2024.

  29. Towards Detecting Contextual Real-Time Toxicity for In-Game Chat

    Authors: Zachary Yang, Nicolas Grenon-Godbout, Reihaneh Rabbany

    Abstract: Real-time toxicity detection in online environments poses a significant challenge, due to the increasing prevalence of social media and gaming platforms. We introduce ToxBuster, a simple and scalable model that reliably detects toxic content in real-time for a line of chat by including chat history and metadata. ToxBuster consistently outperforms conventional toxicity models across popular multipl…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 9 pages, 4 figures, 13 tables. arXiv admin note: text overlap with arXiv:2305.12542

  30. arXiv:2310.04292  [pdf, other]

    cs.LG

    Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets

    Authors: Dominique Beaini, Shenyang Huang, Joao Alex Cunha, Zhiyi Li, Gabriela Moisescu-Pareja, Oleksandr Dymov, Samuel Maddrell-Mander, Callum McLean, Frederik Wenkel, Luis Müller, Jama Hussein Mohamud, Ali Parviz, Michael Craig, Michał Koziarski, Jiarui Lu, Zhaocheng Zhu, Cristian Gabellini, Kerstin Klaser, Josef Dean, Cas Wognum, Maciej Sypetkowski, Guillaume Rabusseau, Reihaneh Rabbany, Jian Tang, Christopher Morris , et al. (10 additional authors not shown)

    Abstract: Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by…

    Submitted 18 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  31. arXiv:2308.13699  [pdf, other]

    cs.SI cs.LG

    Party Prediction for Twitter

    Authors: Kellin Pelrine, Anne Imouza, Zachary Yang, Jacob-Junqi Tian, Sacha Lévy, Gabrielle Desrosiers-Brisebois, Aarash Feizi, Cécile Amadoro, André Blais, Jean-François Godbout, Reihaneh Rabbany

    Abstract: A large number of studies on social media compare the behaviour of users from different political parties. As a basic step, they employ a predictive model for inferring their political affiliation. The accuracy of this model can change the conclusions of a downstream analysis significantly, yet the choice between different models seems to be made arbitrarily. In this paper, we provide a comprehens…

    Submitted 25 August, 2023; originally announced August 2023.

  32. arXiv:2308.10092  [pdf, other]

    cs.CL cs.AI

    Open, Closed, or Small Language Models for Text Classification?

    Authors: Hao Yu, Zachary Yang, Kellin Pelrine, Jean Francois Godbout, Reihaneh Rabbany

    Abstract: Recent advancements in large language models have demonstrated remarkable capabilities across various NLP tasks. But many questions remain, including whether open-source models match closed ones, why these models excel or struggle with certain tasks, and what types of practical procedures can improve performance. We address these questions in the context of classification by evaluating three class…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: 14 pages, 15 Tables, 1 Figure

  33. arXiv:2307.01026  [pdf, other]

    cs.LG cs.AI

    Temporal Graph Benchmark for Machine Learning on Temporal Graphs

    Authors: Shenyang Huang, Farimah Poursafaei, Jacob Danovitch, Matthias Fey, Weihua Hu, Emanuele Rossi, Jure Leskovec, Michael Bronstein, Guillaume Rabusseau, Reihaneh Rabbany

    Abstract: We present the Temporal Graph Benchmark (TGB), a collection of challenging and diverse benchmark datasets for realistic, reproducible, and robust evaluation of machine learning models on temporal graphs. TGB datasets are of large scale, spanning years in duration, incorporate both node and edge-level prediction tasks and cover a diverse set of domains including social, trade, transaction, and tran…

    Submitted 27 September, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: 20 pages, 7 figures, 7 tables, accepted at NeurIPS 2023 Datasets and Benchmarks Track

  34. arXiv:2305.14928  [pdf, other]

    cs.CL cs.LG

    Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4

    Authors: Kellin Pelrine, Anne Imouza, Camille Thibault, Meilina Reksoprodjo, Caleb Gupta, Joel Christoph, Jean-François Godbout, Reihaneh Rabbany

    Abstract: Misinformation poses a critical societal challenge, and current approaches have yet to produce an effective solution. We propose focusing on generalization, uncertainty, and how to leverage recent large language models, in order to create more practical tools to evaluate information veracity in contexts where perfect classification is impossible. We first demonstrate that GPT-4 can outperform prio…

    Submitted 31 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  35. arXiv:2305.12542  [pdf, other]

    cs.CL cs.CY

    ToxBuster: In-game Chat Toxicity Buster with BERT

    Authors: Zachary Yang, Yasmine Maricar, MohammadReza Davari, Nicolas Grenon-Godbout, Reihaneh Rabbany

    Abstract: Detecting toxicity in online spaces is challenging and an ever more pressing problem given the increase in social media and gaming consumption. We introduce ToxBuster, a simple and scalable model trained on a relatively large dataset of 194k lines of game chat from Rainbow Six Siege and For Honor, carefully annotated for different kinds of toxicity. Compared to the existing state-of-the-art, ToxBu…

    Submitted 21 May, 2023; originally announced May 2023.

    Comments: 11 pages, 3 figures

  36. arXiv:2305.08750  [pdf, other]

    cs.LG

    Fast and Attributed Change Detection on Dynamic Graphs with Density of States

    Authors: Shenyang Huang, Jacob Danovitch, Guillaume Rabusseau, Reihaneh Rabbany

    Abstract: How can we detect traffic disturbances from international flight transportation logs or changes to collaboration dynamics in academic networks? These problems can be formulated as detecting anomalous change points in a dynamic graph. Current solutions do not scale well to large real-world graphs, lack robustness to large amounts of node additions/deletions, and overlook changes in node attributes.…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: in PAKDD 2023, 18 pages, 12 figures

  37. arXiv:2302.01204  [pdf, other]

    cs.LG

    Laplacian Change Point Detection for Single and Multi-view Dynamic Graphs

    Authors: Shenyang Huang, Samy Coulombe, Yasmeen Hitti, Reihaneh Rabbany, Guillaume Rabusseau

    Abstract: Dynamic graphs are rich data structures that are used to model complex relationships between entities over time. In particular, anomaly detection in temporal graphs is crucial for many real world applications such as intrusion identification in network systems, detection of ecosystem disturbances and detection of epidemic outbreaks. In this paper, we focus on change point detection in dynamic grap…

    Submitted 2 February, 2023; originally announced February 2023.

    Comments: 30 pages, 15 figures, extended version of previous paper "Laplacian Change Point Detection for Dynamic Graphs" with novel material. arXiv admin note: substantial text overlap with arXiv:2007.01229

  38. arXiv:2209.11135  [pdf, other]

    cs.SI cs.IR

    Active Keyword Selection to Track Evolving Topics on Twitter

    Authors: Sacha Lévy, Farimah Poursafaei, Kellin Pelrine, Reihaneh Rabbany

    Abstract: How can we study social interactions on evolving topics at a mass scale? Over the past decade, researchers from diverse fields such as economics, political science, and public health have often done this by querying Twitter's public API endpoints with hand-picked topical keywords to search or stream discussions. However, despite the API's accessibility, it remains difficult to select and update ke…

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: 10 pages, 3 figures

  39. arXiv:2207.10200  [pdf, other]

    cs.CV cs.DB

    Revisiting Hotels-50K and Hotel-ID

    Authors: Aarash Feizi, Arantxa Casanova, Adriana Romero-Soriano, Reihaneh Rabbany

    Abstract: In this paper, we propose revisited versions for two recent hotel recognition datasets: Hotels50K and Hotel-ID. The revisited versions provide evaluation setups with different levels of difficulty to better align with the intended real-world application, i.e. countering human trafficking. Real-world scenarios involve hotels and locations that are not captured in the current data sets, therefore it…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: ICML 2022 DataPerf Workshop

  40. arXiv:2207.10128  [pdf, other]

    cs.LG cs.SI

    Towards Better Evaluation for Dynamic Link Prediction

    Authors: Farimah Poursafaei, Shenyang Huang, Kellin Pelrine, Reihaneh Rabbany

    Abstract: Despite recent successes in learning from static graphs, learning from time-evolving graphs remains an open challenge. In this work, we design new, more stringent evaluation procedures for link prediction specific to dynamic graphs, which reflect real-world considerations, to better compare the strengths and weaknesses of methods. First, we create two visualization techniques to un…

    Submitted 11 September, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

  41. arXiv:2105.04037  [pdf, ps, other]

    cs.LG

    Graph Attention Networks with Positional Embeddings

    Authors: Liheng Ma, Reihaneh Rabbany, Adriana Romero-Soriano

    Abstract: Graph Neural Networks (GNNs) are deep learning methods that provide the current state-of-the-art performance in node classification tasks. GNNs often assume homophily (neighboring nodes having similar features and labels) and therefore may not reach their full potential on non-homophilic graphs. In this work, we focus on addressing this limitation and enable Graph Attention Net…

    Submitted 24 October, 2021; v1 submitted 9 May, 2021; originally announced May 2021.

  42. arXiv:2104.06952  [pdf, other]

    cs.CL cs.LG

    The Surprising Performance of Simple Baselines for Misinformation Detection

    Authors: Kellin Pelrine, Jacob Danovitch, Reihaneh Rabbany

    Abstract: As social media becomes more prominent in our day-to-day lives, it is increasingly important to detect informative content and prevent the spread of disinformation and unverified rumours. While many sophisticated and successful models have been proposed in the literature, they are often compared with older NLP baselines such as SVMs, CNNs, and LSTMs. In this paper, we examine the performan…

    Submitted 14 April, 2021; originally announced April 2021.

  43. arXiv:2010.03081  [pdf, other]

    cs.SI physics.soc-ph

    Contact Graph Epidemic Modelling of COVID-19 for Transmission and Intervention Strategies

    Authors: Abby Leung, Xiaoye Ding, Shenyang Huang, Reihaneh Rabbany

    Abstract: The coronavirus disease 2019 (COVID-19) pandemic has quickly become a global public health crisis unseen in recent years. It is known that the structure of the human contact network plays an important role in the spread of transmissible diseases. In this work, we study a structure-aware model of COVID-19, CGEM. This model becomes similar to the classical compartment-based models in epidemiology if…

    Submitted 6 October, 2020; originally announced October 2020.

  44. arXiv:2010.01408  [pdf, other]

    physics.soc-ph cs.SI

    Incorporating Dynamic Flight Network in SEIR to Model Mobility between Populations

    Authors: Xiaoye Ding, Shenyang Huang, Abby Leung, Reihaneh Rabbany

    Abstract: Current efforts at modelling COVID-19 are often based on standard compartmental models such as SEIR and their variations. As pre-symptomatic and asymptomatic cases can spread the disease between populations through travel, it is important to incorporate mobility between populations into epidemiological modelling. In this work, we propose to modify the commonly used SEIR model to account fo…

    Submitted 3 October, 2020; originally announced October 2020.

  45. arXiv:2007.01229  [pdf, other]

    cs.LG cs.SI stat.ML

    Laplacian Change Point Detection for Dynamic Graphs

    Authors: Shenyang Huang, Yasmeen Hitti, Guillaume Rabusseau, Reihaneh Rabbany

    Abstract: Dynamic and temporal graphs are rich data structures that are used to model complex relationships between entities over time. In particular, anomaly detection in temporal graphs is crucial for many real-world applications such as intrusion identification in network systems, detection of ecosystem disturbances and detection of epidemic outbreaks. In this paper, we focus on change point detection in…

    Submitted 2 July, 2020; originally announced July 2020.

    Comments: in KDD 2020, 10 pages

  46. arXiv:1910.07130  [pdf, other]

    cs.SI cs.IR

    SCG: Spotting Coordinated Groups in Social Media

    Authors: Junhao Wang, Sacha Levy, Ren Wang, Aayushi Kulshrestha, Reihaneh Rabbany

    Abstract: Recent events have led to a burgeoning awareness of the misuse of social media sites to affect political events, sway public opinion, and confuse voters. Such serious, hostile mass manipulation has motivated a large body of work on bot/troll detection and fake news detection, which mostly focuses on classifying at the user level based on the content generated by the users. In this study, we jo…

    Submitted 1 September, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

  47. arXiv:1906.12328  [pdf, other]

    cs.SI cs.LG stat.ML

    Anomaly Detection with Joint Representation Learning of Content and Connection

    Authors: Junhao Wang, Renhao Wang, Aayushi Kulshrestha, Reihaneh Rabbany

    Abstract: Social media sites are becoming a key factor in politics. These platforms are easy to manipulate for the purpose of distorting the information space to confuse and distract voters. Past work on identifying disruptive patterns has mostly focused on analyzing the content of tweets. In this study, we jointly embed the information from both user-posted content and a user's follower network to detect…

    Submitted 16 June, 2019; originally announced June 2019.

    Comments: 2019 International Conference on Machine Learning Workshop on AI for Social Good

  48. arXiv:1801.01229  [pdf, other]

    cs.SI physics.soc-ph

    Modular Networks for Validating Community Detection Algorithms

    Authors: Justin Fagnan, Afra Abnar, Reihaneh Rabbany, Osmar R. Zaiane

    Abstract: How can we accurately compare different community detection algorithms? These algorithms cluster nodes in a given network, and their performance is often validated on benchmark networks with explicit ground-truth communities. Given the lack of cluster labels in real-world networks, a model that generates realistic networks is required for accurate evaluation of these algorithms. In this paper, we p…

    Submitted 3 January, 2018; originally announced January 2018.

  49. arXiv:1412.2601  [pdf, other]

    cs.SI physics.soc-ph

    Generalization of Clustering Agreements and Distances for Overlapping Clusters and Network Communities

    Authors: Reihaneh Rabbany, Osmar R. Zaïane

    Abstract: A measure of distance between two clusterings has important applications, including clustering validation and ensemble clustering. Generally, such a distance measure provides navigation through the space of possible clusterings. Mostly used in cluster validation, a normalized clustering distance, a.k.a. an agreement measure, compares a given clustering result against the ground-truth clustering. Cluste…

    Submitted 5 March, 2015; v1 submitted 8 December, 2014; originally announced December 2014.

    Journal ref: Data Mining and Knowledge Discovery: Volume 29, Issue 5 (2015)

  50. arXiv:1406.0941  [pdf, other]

    cs.AI

    Augmentative Message Passing for Traveling Salesman Problem and Graph Partitioning

    Authors: Siamak Ravanbakhsh, Reihaneh Rabbany, Russell Greiner

    Abstract: The cutting plane method is an augmentative constrained optimization procedure that is often used with continuous-domain optimization techniques such as linear and convex programs. We investigate the viability of a similar idea within message passing -- which produces integral solutions -- in the context of two combinatorial problems: 1) For Traveling Salesman Problem (TSP), we propose a factor-gr…

    Submitted 4 June, 2014; originally announced June 2014.

    Report number: Advances in Neural Information Processing Systems 27 (NIPS 2014)