-
Scaling Human Judgment in Community Notes with LLMs
Authors:
Haiwen Li,
Soham De,
Manon Revel,
Andreas Haupt,
Brad Miller,
Keith Coleman,
Jay Baxter,
Martin Saveski,
Michiel A. Bakker
Abstract:
This paper argues for a new paradigm for Community Notes in the LLM era: an open ecosystem where both humans and LLMs can write notes, and the decision of which notes are helpful enough to show remains in the hands of humans. This approach can accelerate the delivery of notes, while maintaining trust and legitimacy through Community Notes' foundational principle: A community of diverse human raters collectively serve as the ultimate evaluator and arbiter of what is helpful. Further, the feedback from this diverse community can be used to improve LLMs' ability to produce accurate, unbiased, broadly helpful notes--what we term Reinforcement Learning from Community Feedback (RLCF). This becomes a two-way street: LLMs serve as an asset to humans--helping deliver context quickly and with minimal effort--while human feedback, in turn, enhances the performance of LLMs. This paper describes how such a system can work, its benefits, key new risks and challenges it introduces, and a research agenda to solve those challenges and realize the potential of this approach.
Submitted 30 June, 2025;
originally announced June 2025.
-
Community Moderation and the New Epistemology of Fact Checking on Social Media
Authors:
Isabelle Augenstein,
Michiel Bakker,
Tanmoy Chakraborty,
David Corney,
Emilio Ferrara,
Iryna Gurevych,
Scott Hale,
Eduard Hovy,
Heng Ji,
Irene Larraz,
Filippo Menczer,
Preslav Nakov,
Paolo Papotti,
Dhruv Sahnan,
Greta Warren,
Giovanni Zagni
Abstract:
Social media platforms have traditionally relied on internal moderation teams and partnerships with independent fact-checking organizations to identify and flag misleading content. Recently, however, platforms including X (formerly Twitter) and Meta have shifted towards community-driven content moderation by launching their own versions of crowd-sourced fact-checking -- Community Notes. If effectively scaled and governed, such crowd-checking initiatives have the potential to combat misinformation with increased scale and speed as successfully as community-driven efforts once did with spam. Nevertheless, general content moderation, especially for misinformation, is inherently more complex. Public perceptions of truth are often shaped by personal biases, political leanings, and cultural contexts, complicating consensus on what constitutes misleading content. This suggests that community efforts, while valuable, cannot replace the indispensable role of professional fact-checkers. Here we systematically examine the current approaches to misinformation detection across major platforms, explore the emerging role of community-driven moderation, and critically evaluate both the promises and challenges of crowd-checking at scale.
Submitted 26 May, 2025;
originally announced May 2025.
-
Value Profiles for Encoding Human Variation
Authors:
Taylor Sorensen,
Pushkar Mishra,
Roma Patel,
Michael Henry Tessler,
Michiel Bakker,
Georgina Evans,
Iason Gabriel,
Noah Goodman,
Verena Rieser
Abstract:
Modelling human variation in rating tasks is crucial for enabling AI systems for personalization, pluralistic model alignment, and computational social science. We propose representing individuals using value profiles -- natural language descriptions of underlying values compressed from in-context demonstrations -- along with a steerable decoder model to estimate ratings conditioned on a value profile or other rater information. To measure the predictive information in rater representations, we introduce an information-theoretic methodology. We find that demonstrations contain the most information, followed by value profiles and then demographics. However, value profiles offer advantages in terms of scrutability, interpretability, and steerability due to their compressed natural language format. Value profiles effectively compress the useful information from demonstrations (>70% information preservation). Furthermore, clustering value profiles to identify similarly behaving individuals better explains rater variation than the most predictive demographic groupings. Going beyond test set performance, we show that the decoder models interpretably change ratings according to semantic profile differences, are well-calibrated, and can help explain instance-level disagreement by simulating an annotator population. These results demonstrate that value profiles offer novel, predictive ways to describe individual variation beyond demographics or group information.
Submitted 19 March, 2025;
originally announced March 2025.
-
Using Collective Dialogues and AI to Find Common Ground Between Israeli and Palestinian Peacebuilders
Authors:
Andrew Konya,
Luke Thorburn,
Wasim Almasri,
Oded Adomi Leshem,
Ariel D. Procaccia,
Lisa Schirch,
Michiel A. Bakker
Abstract:
A growing body of work has shown that AI-assisted methods -- leveraging large language models, social choice methods, and collective dialogues -- can help navigate polarization and surface common ground in controlled lab settings. But what can these approaches contribute in real-world contexts? We present a case study applying these techniques to find common ground between Israeli and Palestinian peacebuilders in the period following October 7th, 2023. From April to July 2024 an iterative deliberative process combining LLMs, bridging-based ranking, and collective dialogues was conducted in partnership with the Alliance for Middle East Peace. Around 138 civil society peacebuilders participated including Israeli Jews, Palestinian citizens of Israel, and Palestinians from the West Bank and Gaza. The process resulted in a set of collective statements, including demands to world leaders, with at least 84% agreement from participants on each side. In this paper, we document the process, results, challenges, and important open questions.
Submitted 19 June, 2025; v1 submitted 3 March, 2025;
originally announced March 2025.
-
Tell Me Why: Incentivizing Explanations
Authors:
Siddarth Srinivasan,
Ezra Karger,
Michiel Bakker,
Yiling Chen
Abstract:
Common sense suggests that when individuals explain why they believe something, we can arrive at more accurate conclusions than when they simply state what they believe. Yet, there is no known mechanism that provides incentives to elicit explanations for beliefs from agents. This likely stems from the fact that standard Bayesian models make assumptions (like conditional independence of signals) that preempt the need for explanations, in order to show efficient information aggregation. A natural justification for the value of explanations is that agents' beliefs tend to be drawn from overlapping sources of information, so agents' belief reports do not reveal all that needs to be known. Indeed, this work argues that rationales (explanations of an agent's private information) lead to more efficient aggregation by allowing agents to efficiently identify what information they share and what information is new. Building on this model of rationales, we present a novel 'deliberation mechanism' to elicit rationales from agents in which truthful reporting of beliefs and rationales is a perfect Bayesian equilibrium.
Submitted 18 February, 2025;
originally announced February 2025.
-
Language Agents as Digital Representatives in Collective Decision-Making
Authors:
Daniel Jarrett,
Miruna Pîslar,
Michiel A. Bakker,
Michael Henry Tessler,
Raphael Köster,
Jan Balaguer,
Romuald Elie,
Christopher Summerfield,
Andrea Tacchetti
Abstract:
Consider the process of collective decision-making, in which a group of individuals interactively select a preferred outcome from among a universe of alternatives. In this context, "representation" is the activity of making an individual's preferences present in the process via participation by a proxy agent -- i.e. their "representative". To this end, learned models of human behavior have the potential to fill this role, with practical implications for multi-agent scenario studies and mechanism design. In this work, we investigate the possibility of training language agents to behave in the capacity of representatives of human agents, appropriately expressing the preferences of those individuals whom they stand for. First, we formalize the setting of collective decision-making -- as the episodic process of interaction between a group of agents and a decision mechanism. On this basis, we then formalize the problem of digital representation -- as the simulation of an agent's behavior to yield equivalent outcomes from the mechanism. Finally, we conduct an empirical case study in the setting of consensus-finding among diverse humans, and demonstrate the feasibility of fine-tuning large language models to act as digital representatives.
Submitted 13 February, 2025;
originally announced February 2025.
-
AI and the Future of Digital Public Squares
Authors:
Beth Goldberg,
Diana Acosta-Navas,
Michiel Bakker,
Ian Beacock,
Matt Botvinick,
Prateek Buch,
Renée DiResta,
Nandika Donthi,
Nathanael Fast,
Ravi Iyer,
Zaria Jalan,
Andrew Konya,
Grace Kwak Danciu,
Hélène Landemore,
Alice Marwick,
Carl Miller,
Aviv Ovadya,
Emily Saltz,
Lisa Schirch,
Dalit Shalom,
Divya Siddarth,
Felix Sieker,
Christopher Small,
Jonathan Stray,
Audrey Tang,
et al. (2 additional authors not shown)
Abstract:
Two substantial technological advances have reshaped the public square in recent decades: first with the advent of the internet and second with the recent introduction of large language models (LLMs). LLMs offer opportunities for a paradigm shift towards more decentralized, participatory online spaces that can be used to facilitate deliberative dialogues at scale, but also create risks of exacerbating societal schisms. Here, we explore four applications of LLMs to improve digital public squares: collective dialogue systems, bridging systems, community moderation, and proof-of-humanity systems. Building on the input from over 70 civil society experts and technologists, we argue that LLMs both afford promising opportunities to shift the paradigm for conversations at scale and pose distinct risks for digital public squares. We lay out an agenda for future research and investments in AI that will strengthen digital public squares and safeguard against potential misuses of AI.
Submitted 13 December, 2024;
originally announced December 2024.
-
Democratic AI is Possible. The Democracy Levels Framework Shows How It Might Work
Authors:
Aviv Ovadya,
Kyle Redman,
Luke Thorburn,
Quan Ze Chen,
Oliver Smith,
Flynn Devine,
Andrew Konya,
Smitha Milli,
Manon Revel,
K. J. Kevin Feng,
Amy X. Zhang,
Bilva Chandra,
Michiel A. Bakker,
Atoosa Kasirzadeh
Abstract:
This position paper argues that effectively "democratizing AI" requires democratic governance and alignment of AI, and that this is particularly valuable for decisions with systemic societal impacts. Initial steps -- such as Meta's Community Forums and Anthropic's Collective Constitutional AI -- have illustrated a promising direction, where democratic processes could be used to meaningfully improve public involvement and trust in critical decisions. To more concretely explore what increasingly democratic AI might look like, we provide a "Democracy Levels" framework and associated tools that: (i) define milestones toward meaningfully democratic AI, which is also crucial for substantively pluralistic, human-centered, participatory, and public-interest AI, (ii) can help guide organizations seeking to increase the legitimacy of their decisions on difficult AI governance and alignment questions, and (iii) support the evaluation of such efforts.
Submitted 18 June, 2025; v1 submitted 14 November, 2024;
originally announced November 2024.
-
Supernotes: Driving Consensus in Crowd-Sourced Fact-Checking
Authors:
Soham De,
Michiel A. Bakker,
Jay Baxter,
Martin Saveski
Abstract:
X's Community Notes, a crowd-sourced fact-checking system, allows users to annotate potentially misleading posts. Notes rated as helpful by a diverse set of users are prominently displayed below the original post. While demonstrably effective at reducing misinformation's impact when notes are displayed, there is an opportunity for notes to appear on many more posts: for 91% of posts where at least one note is proposed, no notes ultimately achieve sufficient support from diverse users to be shown on the platform. This motivates the development of Supernotes: AI-generated notes that synthesize information from several existing community notes and are written to foster consensus among a diverse set of users. Our framework uses an LLM to generate many diverse Supernote candidates from existing proposed notes. These candidates are then evaluated by a novel scoring model, trained on millions of historical Community Notes ratings, selecting candidates that are most likely to be rated helpful by a diverse set of users. To test our framework, we ran a human subjects experiment in which we asked participants to compare the Supernotes generated by our framework to the best existing community notes for 100 sample posts. We found that participants rated the Supernotes as significantly more helpful, and when asked to choose between the two, preferred the Supernotes 75.2% of the time. Participants also rated the Supernotes more favorably than the best existing notes on quality, clarity, coverage, context, and argumentativeness. Finally, in a follow-up experiment, we asked participants to compare the Supernotes against LLM-generated summaries and found that the participants rated the Supernotes significantly more helpful, demonstrating that both the LLM-based candidate generation and the consensus-driven scoring play crucial roles in creating notes that effectively build consensus among diverse users.
Submitted 9 November, 2024;
originally announced November 2024.
-
Evaluating Perceptual Deviations in Video See-Through Head-Mounted Displays while Utilizing Physical Touchscreens
Authors:
Rudy De-Xin de Lange,
Roemer Martin Bien Bakker,
Tanja Johanna Juliana Bos
Abstract:
Extended reality technology has become a useful tool in many applications, but still suffers from visual deviations that can hamper the utility of the technology. This paper discusses the types of persisting visual deviations experienced when observing the natural world through video see-through head-mounted displays. A generalizable method to measure the effect of these deviations on real-world interaction is designed and used in a human-in-the-loop experiment. The experiment compared video see-through sight through a head-mounted display with normal eyesight in a static set-up, focusing on (camera) lens distortions and display deviations. Participants interacted with a real touchscreen, locating the position of flashed markers shortly after disappearance, comparing both conditions to check for deviations in position and time. Results show significantly larger mean distance errors between the interaction locations and the original marker positions for video see-through compared to normal eyesight. Moreover, errors increase towards the screen periphery. No significant distance error improvement over time was found; however, response times did significantly decrease for both types of sight.
Submitted 29 October, 2024;
originally announced October 2024.
-
How will advanced AI systems impact democracy?
Authors:
Christopher Summerfield,
Lisa Argyle,
Michiel Bakker,
Teddy Collins,
Esin Durmus,
Tyna Eloundou,
Iason Gabriel,
Deep Ganguli,
Kobi Hackenburg,
Gillian Hadfield,
Luke Hewitt,
Saffron Huang,
Helene Landemore,
Nahema Marchal,
Aviv Ovadya,
Ariel Procaccia,
Mathias Risse,
Bruce Schneier,
Elizabeth Seger,
Divya Siddarth,
Henrik Skaug Sætra,
MH Tessler,
Matthew Botvinick
Abstract:
Advanced AI systems capable of generating humanlike text and multimodal content are now widely available. In this paper, we discuss the impacts that generative artificial intelligence may have on democratic processes. We consider the consequences of AI for citizens' ability to make informed choices about political representatives and issues (epistemic impacts). We ask how AI might be used to destabilise or support democratic mechanisms like elections (material impacts). Finally, we discuss whether AI will strengthen or weaken democratic principles (foundational impacts). It is widely acknowledged that new AI systems could pose significant challenges for democracy. However, it has also been argued that generative AI offers new opportunities to educate and learn from citizens, strengthen public discourse, help people find common ground, and to reimagine how democracies might work better.
Submitted 27 August, 2024;
originally announced September 2024.
-
Fine-tuning language models to find agreement among humans with diverse preferences
Authors:
Michiel A. Bakker,
Martin J. Chadwick,
Hannah R. Sheahan,
Michael Henry Tessler,
Lucy Campbell-Gillingham,
Jan Balaguer,
Nat McAleese,
Amelia Glaese,
John Aslanides,
Matthew M. Botvinick,
Christopher Summerfield
Abstract:
Recent work in large language models (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's generated candidate consensus statements for agreement and quality. A reward model is then trained to predict individual preferences, enabling it to quantify and rank consensus statements in terms of their appeal to the overall group, defined according to different aggregation (social welfare) functions. The model produces consensus statements that are preferred by human users over those from prompted LLMs (>70%) and significantly outperforms a tight fine-tuned baseline that lacks the final ranking step. Further, our best model's consensus statements are preferred over the best human-generated opinions (>65%). We find that when we silently constructed consensus statements from only a subset of group members, those who were excluded were more likely to dissent, revealing the sensitivity of the consensus to individual contributions. These results highlight the potential to use LLMs to help groups of humans align their values with one another.
Submitted 27 November, 2022;
originally announced November 2022.
-
Statistical discrimination in learning agents
Authors:
Edgar A. Duéñez-Guzmán,
Kevin R. McKee,
Yiran Mao,
Ben Coppin,
Silvia Chiappa,
Alexander Sasha Vezhnevets,
Michiel A. Bakker,
Yoram Bachrach,
Suzanne Sadedin,
William Isaac,
Karl Tuyls,
Joel Z. Leibo
Abstract:
Undesired bias afflicts both human and algorithmic decision making, and may be especially prevalent when information processing trade-offs incentivize the use of heuristics. One primary example is statistical discrimination -- selecting social partners based not on their underlying attributes, but on readily perceptible characteristics that covary with their suitability for the task at hand. We present a theoretical model to examine how information processing influences statistical discrimination and test its predictions using multi-agent reinforcement learning with various agent architectures in a partner choice-based social dilemma. As predicted, statistical discrimination emerges in agent policies as a function of both the bias in the training population and of agent architecture. All agents showed substantial statistical discrimination, defaulting to using the readily available correlates instead of the outcome relevant features. We show that less discrimination emerges with agents that use recurrent neural networks, and when their training environment has less bias. However, all agent algorithms we tried still exhibited substantial bias after learning in biased training populations.
Submitted 21 October, 2021;
originally announced October 2021.
-
Multi-task Semi-supervised Learning for Pulmonary Lobe Segmentation
Authors:
Jingnan Jia,
Zhiwei Zhai,
M. Els Bakker,
I. Hernandez Giron,
Marius Staring,
Berend C. Stoel
Abstract:
Pulmonary lobe segmentation is an important preprocessing task for the analysis of lung diseases. Traditional methods relying on fissure detection or other anatomical features, such as the distribution of pulmonary vessels and airways, could provide reasonably accurate lobe segmentations. Deep learning based methods can outperform these traditional approaches, but require large datasets. Deep multi-task learning is expected to utilize labels of multiple different structures. However, commonly such labels are distributed over multiple datasets. In this paper, we proposed a multi-task semi-supervised model that can leverage information of multiple structures from unannotated datasets and datasets annotated with different structures. A focused alternating training strategy is presented to balance the different tasks. We evaluated the trained model on an external independent CT dataset. The results show that our model significantly outperforms single-task alternatives, improving the mean surface distance from 7.174 mm to 4.196 mm. We also demonstrated that our approach is successful for different network architectures as backbones.
Submitted 22 April, 2021;
originally announced April 2021.
-
Integrating Information Theory and Adversarial Learning for Cross-modal Retrieval
Authors:
Wei Chen,
Yu Liu,
Erwin M. Bakker,
Michael S. Lew
Abstract:
Accurately matching visual and textual data in cross-modal retrieval has been widely studied in the multimedia community. To address the challenges posed by the heterogeneity gap and the semantic gap, we propose integrating Shannon information theory and adversarial learning. In terms of the heterogeneity gap, we integrate modality classification and information entropy maximization adversarially. For this purpose, a modality classifier (as a discriminator) is built to distinguish the text and image modalities according to their different statistical properties. This discriminator uses its output probabilities to compute Shannon information entropy, which measures the uncertainty of the modality classification it performs. Moreover, feature encoders (as a generator) project uni-modal features into a commonly shared space and attempt to fool the discriminator by maximizing its output information entropy. Thus, maximizing information entropy gradually reduces the distribution discrepancy of cross-modal features, thereby achieving a domain confusion state where the discriminator cannot classify two modalities confidently. To reduce the semantic gap, Kullback-Leibler (KL) divergence and bi-directional triplet loss are used to associate the intra- and inter-modality similarity between features in the shared space. Furthermore, a regularization term based on KL-divergence with temperature scaling is used to calibrate the biased label classifier caused by the data imbalance issue. Extensive experiments with four deep models on four benchmarks are conducted to demonstrate the effectiveness of the proposed approach.
Submitted 11 April, 2021;
originally announced April 2021.
-
Lifelong Person Re-Identification via Adaptive Knowledge Accumulation
Authors:
Nan Pu,
Wei Chen,
Yu Liu,
Erwin M. Bakker,
Michael S. Lew
Abstract:
Person ReID methods always learn through a stationary domain that is fixed by the choice of a given dataset. In many contexts (e.g., lifelong learning), those methods are ineffective because the domain is continually changing, in which case incremental learning over multiple domains may be required. In this work we explore a new and challenging ReID task, namely lifelong person re-identification (LReID), which enables continuous learning across multiple domains and even generalisation to new and unseen domains. Following the cognitive processes in the human brain, we design an Adaptive Knowledge Accumulation (AKA) framework that is endowed with two crucial abilities: knowledge representation and knowledge operation. Our method alleviates catastrophic forgetting on seen domains and demonstrates the ability to generalize to unseen domains. Correspondingly, we also provide a new and large-scale benchmark for LReID. Extensive experiments demonstrate our method outperforms other competitors by a margin of 5.8% mAP in generalising evaluation.
Submitted 23 March, 2021;
originally announced March 2021.
-
Modelling Cooperation in Network Games with Spatio-Temporal Complexity
Authors:
Michiel A. Bakker,
Richard Everett,
Laura Weidinger,
Iason Gabriel,
William S. Isaac,
Joel Z. Leibo,
Edward Hughes
Abstract:
The real world is awash with multi-agent problems that require collective action by self-interested agents, from the routing of packets across a computer network to the management of irrigation systems. Such systems have local incentives for individuals, whose behavior has an impact on the global outcome for the group. Given appropriate mechanisms describing agent interaction, groups may achieve socially beneficial outcomes, even in the face of short-term selfish incentives. In many cases, collective action problems possess an underlying graph structure, whose topology crucially determines the relationship between local decisions and emergent global effects. Such scenarios have received great attention through the lens of network games. However, this abstraction typically collapses important dimensions, such as geometry and time, relevant to the design of mechanisms promoting cooperation. In parallel work, multi-agent deep reinforcement learning has shown great promise in modelling the emergence of self-organized cooperation in complex gridworld domains. Here we apply this paradigm in graph-structured collective action problems. Using multi-agent deep reinforcement learning, we simulate an agent society for a variety of plausible mechanisms, finding clear transitions between different equilibria over time. We define analytic tools inspired by related literatures to measure the social outcomes, and use these to draw conclusions about the efficacy of different environmental interventions. Our methods have implications for mechanism design in both human and artificial agent systems.
Submitted 13 February, 2021;
originally announced February 2021.
-
Self-supervised asymmetric deep hashing with margin-scalable constraint
Authors:
Zhengyang Yu,
Song Wu,
Zhihao Dou,
Erwin M. Bakker
Abstract:
Due to their effectiveness and efficiency, deep hashing approaches are widely used for large-scale visual search. However, it is still challenging to produce compact and discriminative hash codes for images associated with multiple semantics, for two main reasons: 1) similarity constraints designed in most of the existing methods are based upon an oversimplified similarity assignment (i.e., 0 for instance pairs sharing no label, 1 for instance pairs sharing at least 1 label), and 2) the exploration of multi-semantic relevance is insufficient or even neglected in many of the existing methods. These problems significantly limit the discrimination of generated hash codes. In this paper, we propose a novel self-supervised asymmetric deep hashing method with a margin-scalable constraint (SADH) to cope with these problems. SADH implements a self-supervised network to sufficiently preserve semantic information in a semantic feature dictionary and a semantic code dictionary for the semantics of the given dataset, which efficiently and precisely guides a feature learning network to preserve multilabel semantic information using an asymmetric learning strategy. By further exploiting semantic dictionaries, a new margin-scalable constraint is employed for both precise similarity searching and robust hash code generation. Extensive empirical research on four popular benchmarks validates the proposed method and shows it outperforms several state-of-the-art approaches.
Submitted 23 July, 2021; v1 submitted 7 December, 2020;
originally announced December 2020.
-
On the Exploration of Incremental Learning for Fine-grained Image Retrieval
Authors:
Wei Chen,
Yu Liu,
Weiping Wang,
Tinne Tuytelaars,
Erwin M. Bakker,
Michael Lew
Abstract:
In this paper, we consider the problem of fine-grained image retrieval in an incremental setting, when new categories are added over time. On the one hand, repeatedly training the representation on the extended dataset is time-consuming. On the other hand, fine-tuning the learned representation only with the new classes leads to catastrophic forgetting. To this end, we propose an incremental learning method to mitigate retrieval performance degradation caused by the forgetting issue. Without accessing any samples of the original classes, the classifier of the original network provides soft "labels" to transfer knowledge to train the adaptive network, so as to preserve the previous capability for classification. More importantly, a regularization function based on Maximum Mean Discrepancy is devised to minimize the discrepancy between the new-class features produced by the original network and those produced by the adaptive network. Extensive experiments on two datasets show that our method effectively mitigates the catastrophic forgetting on the original classes while achieving high performance on the new classes.
Submitted 15 October, 2020;
originally announced October 2020.
-
Dual Gaussian-based Variational Subspace Disentanglement for Visible-Infrared Person Re-Identification
Authors:
Nan Pu,
Wei Chen,
Yu Liu,
Erwin M. Bakker,
Michael S. Lew
Abstract:
Visible-infrared person re-identification (VI-ReID) is a challenging and essential task in night-time intelligent surveillance systems. Besides the intra-modality variance that RGB-RGB person re-identification mainly overcomes, VI-ReID suffers from additional inter-modality variance caused by the inherent heterogeneous gap. To solve the problem, we present a carefully designed dual Gaussian-based variational auto-encoder (DG-VAE), which disentangles an identity-discriminable and an identity-ambiguous cross-modality feature subspace, following a mixture-of-Gaussians (MoG) prior and a standard Gaussian distribution prior, respectively. Disentangling cross-modality identity-discriminable features leads to more robust retrieval for VI-ReID. To achieve efficient optimization as in a conventional VAE, we theoretically derive two variational inference terms for the MoG prior under the supervised setting, which not only restrict the identity-discriminable subspace so that the model explicitly handles the cross-modality intra-identity variance, but also enable the MoG distribution to avoid posterior collapse. Furthermore, we propose a triplet swap reconstruction (TSR) strategy to promote the above disentangling process. Extensive experiments demonstrate that our method outperforms state-of-the-art methods on two VI-ReID datasets.
Submitted 6 August, 2020;
originally announced August 2020.
-
Assessing Disease Exposure Risk with Location Data: A Proposal for Cryptographic Preservation of Privacy
Authors:
Alex Berke,
Michiel Bakker,
Praneeth Vepakomma,
Kent Larson,
Alex 'Sandy' Pentland
Abstract:
Governments and researchers around the world are implementing digital contact tracing solutions to stem the spread of infectious disease, namely COVID-19. Many of these solutions threaten individual rights and privacy. Our goal is to break past the false dichotomy of effective versus privacy-preserving contact tracing. We offer an alternative approach to assess and communicate users' risk of exposure to an infectious disease while preserving individual privacy. Our proposal uses recent GPS location histories, which are transformed and encrypted, and a private set intersection protocol to interface with a semi-trusted authority.
There have been other recent proposals for privacy-preserving contact tracing, based on Bluetooth and decentralization, that could further eliminate the need for trust in authority. However, solutions with Bluetooth are currently limited to certain devices and contexts while decentralization adds complexity. The goal of this work is two-fold: we aim to propose a location-based system that is more privacy-preserving than what is currently being adopted by governments around the world, and that is also practical to implement with the immediacy needed to stem a viral outbreak.
Submitted 8 April, 2020; v1 submitted 31 March, 2020;
originally announced March 2020.
-
DADI: Dynamic Discovery of Fair Information with Adversarial Reinforcement Learning
Authors:
Michiel A. Bakker,
Duy Patrick Tu,
Humberto Riverón Valdés,
Krishna P. Gummadi,
Kush R. Varshney,
Adrian Weller,
Alex Pentland
Abstract:
We introduce a framework for dynamic adversarial discovery of information (DADI), motivated by a scenario where information (a feature set) is used by third parties with unknown objectives. We train a reinforcement learning agent to sequentially acquire a subset of the information while balancing accuracy and fairness of predictors downstream. Based on the set of already acquired features, the agent decides dynamically to either collect more information from the set of available features or to stop and predict using the information that is currently available. Building on previous work exploring adversarial representation learning, we attain group fairness (demographic parity) by rewarding the agent with the adversary's loss, computed over the final feature set. Importantly, however, the framework provides a more general starting point for fair or private dynamic information discovery. Finally, we demonstrate empirically, using two real-world datasets, that we can trade off fairness and predictive performance.
Submitted 30 October, 2019;
originally announced October 2019.
-
Sherlock: A Deep Learning Approach to Semantic Data Type Detection
Authors:
Madelon Hulsebos,
Kevin Hu,
Michiel Bakker,
Emanuel Zgraggen,
Arvind Satyanarayan,
Tim Kraska,
Çağatay Demiralp,
César Hidalgo
Abstract:
Correctly detecting the semantic type of data columns is crucial for data science tasks such as automated data cleaning, schema matching, and data discovery. Existing data preparation and analysis systems rely on dictionary lookups and regular expression matching to detect semantic types. However, these matching-based approaches often are not robust to dirty data and only detect a limited number of types. We introduce Sherlock, a multi-input deep neural network for detecting semantic types. We train Sherlock on $686,765$ data columns retrieved from the VizNet corpus by matching $78$ semantic types from DBpedia to column headers. We characterize each matched column with $1,588$ features describing the statistical properties, character distributions, word embeddings, and paragraph vectors of column values. Sherlock achieves a support-weighted F$_1$ score of $0.89$, exceeding that of machine learning baselines, dictionary and regular expression benchmarks, and the consensus of crowdsourced annotations.
Submitted 25 May, 2019;
originally announced May 2019.
-
VizNet: Towards A Large-Scale Visualization Learning and Benchmarking Repository
Authors:
Kevin Hu,
Neil Gaikwad,
Michiel Bakker,
Madelon Hulsebos,
Emanuel Zgraggen,
César Hidalgo,
Tim Kraska,
Guoliang Li,
Arvind Satyanarayan,
Çağatay Demiralp
Abstract:
Researchers currently rely on ad hoc datasets to train automated visualization tools and evaluate the effectiveness of visualization designs. These exemplars often lack the characteristics of real-world datasets, and their one-off nature makes it difficult to compare different techniques. In this paper, we present VizNet: a large-scale corpus of over 31 million datasets compiled from open data repositories and online visualization galleries. On average, these datasets comprise 17 records over 3 dimensions and across the corpus, we find 51% of the dimensions record categorical data, 44% quantitative, and only 5% temporal. VizNet provides the necessary common baseline for comparing visualization design techniques, and developing benchmark models and algorithms for automating visual analysis. To demonstrate VizNet's utility as a platform for conducting online crowdsourced experiments at scale, we replicate a prior study assessing the influence of user task and data distribution on visual encoding effectiveness, and extend it by considering an additional task: outlier detection. To contend with running such studies at scale, we demonstrate how a metric of perceptual effectiveness can be learned from experimental results, and show its predictive power across test datasets.
Submitted 11 May, 2019;
originally announced May 2019.
-
Active Fairness in Algorithmic Decision Making
Authors:
Alejandro Noriega-Campero,
Michiel A. Bakker,
Bernardo Garcia-Bulle,
Alex Pentland
Abstract:
Society increasingly relies on machine learning models for automated decision making. Yet, efficiency gains from automation have come paired with concern for algorithmic discrimination that can systematize inequality. Recent work has proposed optimal post-processing methods that randomize classification decisions for a fraction of individuals, in order to achieve fairness measures related to parity in errors and calibration. These methods, however, have raised concern due to the information inefficiency, intra-group unfairness, and Pareto sub-optimality they entail. The present work proposes an alternative active framework for fair classification, where, in deployment, a decision-maker adaptively acquires information according to the needs of different groups or individuals, towards balancing disparities in classification performance. We propose two such methods, where information collection is adapted to group- and individual-level needs respectively. We show on real-world datasets that these can achieve: 1) calibration and single error parity (e.g., equal opportunity); and 2) parity in both false positive and false negative rates (i.e., equal odds). Moreover, we show that by leveraging their additional degree of freedom, active approaches can substantially outperform randomization-based classifiers previously considered optimal, while avoiding limitations such as intra-group unfairness.
Submitted 7 November, 2018; v1 submitted 28 September, 2018;
originally announced October 2018.
-
VizML: A Machine Learning Approach to Visualization Recommendation
Authors:
Kevin Z. Hu,
Michiel A. Bakker,
Stephen Li,
Tim Kraska,
César A. Hidalgo
Abstract:
Data visualization should be accessible for all analysts with data, not just the few with technical expertise. Visualization recommender systems aim to lower the barrier to exploring basic visualizations by automatically generating results for analysts to search and select, rather than manually specify. Here, we demonstrate a novel machine learning-based approach to visualization recommendation that learns visualization design choices from a large corpus of datasets and associated visualizations. First, we identify five key design choices made by analysts while creating visualizations, such as selecting a visualization type and choosing to encode a column along the X- or Y-axis. We train models to predict these design choices using one million dataset-visualization pairs collected from a popular online visualization platform. Neural networks predict these design choices with high accuracy compared to baseline models. We report and interpret feature importances from one of these baseline models. To evaluate the generalizability and uncertainty of our approach, we benchmark with a crowdsourced test set, and show that the performance of our model is comparable to human performance when predicting consensus visualization type, and exceeds that of other ML-based systems.
Submitted 14 August, 2018;
originally announced August 2018.
-
Model-order reduction of biochemical reaction networks
Authors:
Shodhan Rao,
Arjan van der Schaft,
Karen van Eunen,
Barbara M. Bakker,
Bayu Jayawardhana
Abstract:
In this paper we propose a model-order reduction method for chemical reaction networks governed by general enzyme kinetics, including the mass-action and Michaelis-Menten kinetics. The model-order reduction method is based on the Kron reduction of the weighted Laplacian matrix which describes the graph structure of complexes in the chemical reaction network. We apply our method to a yeast glycolysis model, where the simulation result shows that the transient behaviour of a number of key metabolites of the reduced-order model is in good agreement with that of the full-order model.
Submitted 11 December, 2012;
originally announced December 2012.
-
Alignment of Microtubule Imagery
Authors:
Feiyang Yu,
Ard Oerlemans,
Erwin M. Bakker
Abstract:
This work discusses preliminary work aimed at simulating and visualizing the growth process of a tiny structure inside the cell: the microtubule. The difficulty of recording the process lies in the fact that the tissue preparation method for electron microscopes is highly destructive to live cells. In this paper, our approach is to take pictures of microtubules at different time slots and then appropriately combine these images into a coherent video. Experimental results are given on real data.
Submitted 30 May, 2011;
originally announced May 2011.
-
SPARK00: A Benchmark Package for the Compiler Evaluation of Irregular/Sparse Codes
Authors:
H. L. A. van der Spek,
E. M. Bakker,
H. A. G. Wijshoff
Abstract:
We propose a set of benchmarks that specifically targets a major cause of performance degradation in high performance computing platforms: irregular access patterns. These benchmarks are meant to be used to assess the performance of optimizing compilers on codes with a varying degree of irregular access. The irregularity caused by the use of pointers and indirection arrays is a major challenge for optimizing compilers. Codes containing such patterns are notoriously hard to optimize but they have a huge impact on the performance of modern architectures, which are under-utilized when encountering irregular memory accesses. In this paper, a set of benchmarks is described that explicitly measures the performance of kernels containing a variety of different access patterns found in real-world applications. By offering a varying degree of complexity, we provide a platform for measuring the effectiveness of transformations. The difference in complexity stems from a difference in traversal patterns, the use of multiple indirections and control flow statements. The kernels used cover a variety of different access patterns, namely pointer traversals, indirection arrays, dynamic loop bounds and run-time dependent if-conditions. The kernels are small enough to be fully understood, which makes this benchmark set very suitable for the evaluation of restructuring transformations.
Submitted 26 May, 2008;
originally announced May 2008.