Search | arXiv e-print repository

On the Generalization of Data-Assisted Control in port-Hamiltonian Systems (DAC-pH)

Authors: Mostafa Eslami, Maryam Babazadeh

Abstract: This paper introduces a hypothetical hybrid control framework for port-Hamiltonian (p$\mathcal{H}$) systems, employing a dynamic decomposition based on Data-Assisted Control (DAC). The system's evolution is split into two parts with fixed topology: Right-Hand Side (RHS)- an intrinsic Hamiltonian flow handling worst-case parametric uncertainties, and Left-Hand Side (LHS)- a dissipative/input flow a… ▽ More This paper introduces a hypothetical hybrid control framework for port-Hamiltonian (p$\mathcal{H}$) systems, employing a dynamic decomposition based on Data-Assisted Control (DAC). The system's evolution is split into two parts with fixed topology: Right-Hand Side (RHS)- an intrinsic Hamiltonian flow handling worst-case parametric uncertainties, and Left-Hand Side (LHS)- a dissipative/input flow addressing both structural and parametric uncertainties. A virtual port variable $Π$ serves as the interface between these two components. A nonlinear controller manages the intrinsic Hamiltonian flow, determining a desired port control value $Π_c$. Concurrently, Reinforcement Learning (RL) is applied to the dissipative/input flow to learn an agent for providing optimal policy in mapping $Π_c$ to the actual system input. This hybrid approach effectively manages RHS uncertainties while preserving the system's inherent structure. Key advantages include adjustable performance via LHS controller parameters, enhanced AI explainability and interpretability through the port variable $Π$, the ability to guarantee safety and state attainability with hard/soft constraints, reduced complexity in learning hypothesis classes compared to end-to-end solutions, and improved state/parameter estimation using LHS prior knowledge and system Hamiltonian to address partial observability. The paper details the p$\mathcal{H}$ formulation, derives the decomposition, and presents the modular controller architecture. Beyond design, crucial aspects of stability and robustness analysis and synthesis are investigated, paving the way for deeper theoretical investigations. An application example, a pendulum with nonlinear dynamics, is simulated to demonstrate the approach's empirical and phenomenological benefits for future research. △ Less

Submitted 8 June, 2025; originally announced June 2025.

Comments: This paper presents an early investigation of Data-Assisted Control (DAC) with reinforcement learning, showcasing its potential through a simple example. Theoretical analysis is ongoing to establish formal support and guarantees for the proposed approach

arXiv:2506.05687 [pdf, ps, other]

What Comes After Harm? Mapping Reparative Actions in AI through Justice Frameworks

Authors: Sijia Xiao, Haodi Zou, Alice Qian Zhang, Deepak Kumar, Hong Shen, Jason Hong, Motahhare Eslami

Abstract: As Artificial Intelligence (AI) systems are integrated into more aspects of society, they offer new capabilities but also cause a range of harms that are drawing increasing scrutiny. A large body of work in the Responsible AI community has focused on identifying and auditing these harms. However, much less is understood about what happens after harm occurs: what constitutes reparation, who initiat… ▽ More As Artificial Intelligence (AI) systems are integrated into more aspects of society, they offer new capabilities but also cause a range of harms that are drawing increasing scrutiny. A large body of work in the Responsible AI community has focused on identifying and auditing these harms. However, much less is understood about what happens after harm occurs: what constitutes reparation, who initiates it, and how effective these reparations are. In this paper, we develop a taxonomy of AI harm reparation based on a thematic analysis of real-world incidents. The taxonomy organizes reparative actions into four overarching goals: acknowledging harm, attributing responsibility, providing remedies, and enabling systemic change. We apply this framework to a dataset of 1,060 AI-related incidents, analyzing the prevalence of each action and the distribution of stakeholder involvement. Our findings show that reparation efforts are concentrated in early, symbolic stages, with limited actions toward accountability or structural reform. Drawing on theories of justice, we argue that existing responses fall short of delivering meaningful redress. This work contributes a foundation for advancing more accountable and reparative approaches to Responsible AI. △ Less

Submitted 5 June, 2025; originally announced June 2025.

arXiv:2506.00195 [pdf, ps, other]

Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences

Authors: Mingqian Zheng, Wenjia Hu, Patrick Zhao, Motahhare Eslami, Jena D. Hwang, Faeze Brahman, Carolyn Rose, Maarten Sap

Abstract: Current LLMs are trained to refuse potentially harmful input queries regardless of whether users actually had harmful intents, causing a tradeoff between safety and user experience. Through a study of 480 participants evaluating 3,840 query-response pairs, we examine how different refusal strategies affect user perceptions across varying motivations. Our findings reveal that response strategy larg… ▽ More Current LLMs are trained to refuse potentially harmful input queries regardless of whether users actually had harmful intents, causing a tradeoff between safety and user experience. Through a study of 480 participants evaluating 3,840 query-response pairs, we examine how different refusal strategies affect user perceptions across varying motivations. Our findings reveal that response strategy largely shapes user experience, while actual user motivation has negligible impact. Partial compliance -- providing general information without actionable details -- emerges as the optimal strategy, reducing negative user perceptions by over 50% to flat-out refusals. Complementing this, we analyze response patterns of 9 state-of-the-art LLMs and evaluate how 6 reward models score different refusal strategies, demonstrating that models rarely deploy partial compliance naturally and reward models currently undervalue it. This work demonstrates that effective guardrails require focusing on crafting thoughtful refusals rather than detecting intent, offering a path toward AI safety mechanisms that ensure both safety and sustained user engagement. △ Less

Submitted 30 May, 2025; originally announced June 2025.

arXiv:2505.04653 [pdf, ps, other]

Advancing Conversational Diagnostic AI with Multimodal Reasoning

Authors: Khaled Saab, Jan Freyberg, Chunjong Park, Tim Strother, Yong Cheng, Wei-Hung Weng, David G. T. Barrett, David Stutz, Nenad Tomasev, Anil Palepu, Valentin Liévin, Yash Sharma, Roma Ruparel, Abdullah Ahmed, Elahe Vedadi, Kimberly Kanada, Cian Hughes, Yun Liu, Geoff Brown, Yang Gao, Sean Li, S. Sara Mahdavi, James Manyika, Katherine Chou, Yossi Matias , et al. (11 additional authors not shown)

Abstract: Large Language Models (LLMs) have demonstrated great potential for conducting diagnostic conversations but evaluation has been largely limited to language-only interactions, deviating from the real-world requirements of remote care delivery. Instant messaging platforms permit clinicians and patients to upload and discuss multimodal medical artifacts seamlessly in medical consultation, but the abil… ▽ More Large Language Models (LLMs) have demonstrated great potential for conducting diagnostic conversations but evaluation has been largely limited to language-only interactions, deviating from the real-world requirements of remote care delivery. Instant messaging platforms permit clinicians and patients to upload and discuss multimodal medical artifacts seamlessly in medical consultation, but the ability of LLMs to reason over such data while preserving other attributes of competent diagnostic conversation remains unknown. Here we advance the conversational diagnosis and management performance of the Articulate Medical Intelligence Explorer (AMIE) through a new capability to gather and interpret multimodal data, and reason about this precisely during consultations. Leveraging Gemini 2.0 Flash, our system implements a state-aware dialogue framework, where conversation flow is dynamically controlled by intermediate model outputs reflecting patient states and evolving diagnoses. Follow-up questions are strategically directed by uncertainty in such patient states, leading to a more structured multimodal history-taking process that emulates experienced clinicians. We compared AMIE to primary care physicians (PCPs) in a randomized, blinded, OSCE-style study of chat-based consultations with patient actors. We constructed 105 evaluation scenarios using artifacts like smartphone skin photos, ECGs, and PDFs of clinical documents across diverse conditions and demographics. Our rubric assessed multimodal capabilities and other clinically meaningful axes like history-taking, diagnostic accuracy, management reasoning, communication, and empathy. Specialist evaluation showed AMIE to be superior to PCPs on 7/9 multimodal and 29/32 non-multimodal axes (including diagnostic accuracy). The results show clear progress in multimodal conversational diagnostic AI, but real-world translation needs further research. △ Less

Submitted 6 May, 2025; originally announced May 2025.

arXiv:2503.19252 [pdf, other]

MIRAGE: Multi-model Interface for Reviewing and Auditing Generative Text-to-Image AI

Authors: Matheus Kunzler Maldaner, Wesley Hanwen Deng, Jason Hong, Ken Holstein, Motahhare Eslami

Abstract: While generative AI systems have gained popularity in diverse applications, their potential to produce harmful outputs limits their trustworthiness and usability in different applications. Recent years have seen growing interest in engaging diverse AI users in auditing generative AI that might impact their lives. To this end, we propose MIRAGE as a web-based tool where AI users can compare outputs… ▽ More While generative AI systems have gained popularity in diverse applications, their potential to produce harmful outputs limits their trustworthiness and usability in different applications. Recent years have seen growing interest in engaging diverse AI users in auditing generative AI that might impact their lives. To this end, we propose MIRAGE as a web-based tool where AI users can compare outputs from multiple AI text-to-image (T2I) models by auditing AI-generated images, and report their findings in a structured way. We used MIRAGE to conduct a preliminary user study with five participants and found that MIRAGE users could leverage their own lived experiences and identities to surface previously unnoticed details around harmful biases when reviewing multiple T2I models' outputs compared to reviewing only one. △ Less

Submitted 24 March, 2025; originally announced March 2025.

Comments: 4 pages, 3 figures. Presented at the AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 2024. Available at https://www.humancomputation.com/assets/wip_2024/HCOMP_24_WIP_4.pdf

Journal ref: AAAI Conference on Human Computation and Crowdsourcing (HCOMP), Demo Track, 2024

arXiv:2503.11113 [pdf, other]

doi 10.1145/3706599.3719757

Vipera: Towards systematic auditing of generative text-to-image models at scale

Authors: Yanwei Huang, Wesley Hanwen Deng, Sijia Xiao, Motahhare Eslami, Jason I. Hong, Adam Perer

Abstract: Generative text-to-image (T2I) models are known for their risks related such as bias, offense, and misinformation. Current AI auditing methods face challenges in scalability and thoroughness, and it is even more challenging to enable auditors to explore the auditing space in a structural and effective way. Vipera employs multiple visual cues including a scene graph to facilitate image collection s… ▽ More Generative text-to-image (T2I) models are known for their risks related such as bias, offense, and misinformation. Current AI auditing methods face challenges in scalability and thoroughness, and it is even more challenging to enable auditors to explore the auditing space in a structural and effective way. Vipera employs multiple visual cues including a scene graph to facilitate image collection sensemaking and inspire auditors to explore and hierarchically organize the auditing criteria. Additionally, it leverages LLM-powered suggestions to facilitate exploration of unexplored auditing directions. An observational user study demonstrates Vipera's effectiveness in helping auditors organize their analyses while engaging with diverse criteria. △ Less

Submitted 14 March, 2025; originally announced March 2025.

Comments: Accepted to CHI Late-Breaking Work (LBW) 2025

arXiv:2502.18689 [pdf, ps, other]

Emerging Practices in Participatory AI Design in Public Sector Innovation

Authors: Devansh Saxena, Zoe Kahn, Erina Seh-Young Moon, Lauren M. Chambers, Corey Jackson, Min Kyung Lee, Motahhare Eslami, Shion Guha, Sheena Erete, Lilly Irani, Deirdre Mulligan, John Zimmerman

Abstract: Local and federal agencies are rapidly adopting AI systems to augment or automate critical decisions, efficiently use resources, and improve public service delivery. AI systems are being used to support tasks associated with urban planning, security, surveillance, energy and critical infrastructure, and support decisions that directly affect citizens and their ability to access essential services.… ▽ More Local and federal agencies are rapidly adopting AI systems to augment or automate critical decisions, efficiently use resources, and improve public service delivery. AI systems are being used to support tasks associated with urban planning, security, surveillance, energy and critical infrastructure, and support decisions that directly affect citizens and their ability to access essential services. Local governments act as the governance tier closest to citizens and must play a critical role in upholding democratic values and building community trust especially as it relates to smart city initiatives that seek to transform public services through the adoption of AI. Community-centered and participatory approaches have been central for ensuring the appropriate adoption of technology; however, AI innovation introduces new challenges in this context because participatory AI design methods require more robust formulation and face higher standards for implementation in the public sector compared to the private sector. This requires us to reassess traditional methods used in this space as well as develop new resources and methods. This workshop will explore emerging practices in participatory algorithm design - or the use of public participation and community engagement - in the scoping, design, adoption, and implementation of public sector algorithms. △ Less

Submitted 25 February, 2025; originally announced February 2025.

Comments: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25), April 26-May 1, 2025, Yokohama, Japan

arXiv:2502.18576 [pdf, other]

Investigating Youth AI Auditing

Authors: Jaemarie Solyst, Cindy Peng, Wesley Hanwen Deng, Praneetha Pratapa, Jessica Hammer, Amy Ogan, Jason Hong, Motahhare Eslami

Abstract: Youth are active users and stakeholders of artificial intelligence (AI), yet they are often not included in responsible AI (RAI) practices. Emerging efforts in RAI largely focus on adult populations, missing an opportunity to get unique perspectives of youth. This study explores the potential of youth (teens under the age of 18) to engage meaningfully in RAI, specifically through AI auditing. In a… ▽ More Youth are active users and stakeholders of artificial intelligence (AI), yet they are often not included in responsible AI (RAI) practices. Emerging efforts in RAI largely focus on adult populations, missing an opportunity to get unique perspectives of youth. This study explores the potential of youth (teens under the age of 18) to engage meaningfully in RAI, specifically through AI auditing. In a workshop study with 17 teens, we investigated how youth can actively identify problematic behaviors in youth-relevant ubiquitous AI (text-to-image generative AI, autocompletion in search bar, image search) and the impacts of supporting AI auditing with critical AI literacy scaffolding with guided discussion about AI ethics and an auditing tool. We found that youth can contribute quality insights, shaped by their expertise (e.g., hobbies and passions), lived experiences (e.g., social identities), and age-related knowledge (e.g., understanding of fast-moving trends). We discuss how empowering youth in AI auditing can result in more responsible AI, support their learning through doing, and lead to implications for including youth in various participatory RAI processes. △ Less

Submitted 25 February, 2025; originally announced February 2025.

arXiv:2501.14886 [pdf, other]

A Systematic Literature Review on Equity and Technology in HCI and Fairness: Navigating the Complexities and Nuances of Equity Research

Authors: Seyun Kim, Yuanchen Bai, Haiyi Zhu, Motahhare Eslami

Abstract: Equity is crucial to the ethical implications in technology development. However, implementing equity in practice comes with complexities and nuances. In response, the research community, especially the human-computer interaction (HCI) and Fairness community, has endeavored to integrate equity into technology design, addressing issues of societal inequities. With such increasing efforts, it is yet… ▽ More Equity is crucial to the ethical implications in technology development. However, implementing equity in practice comes with complexities and nuances. In response, the research community, especially the human-computer interaction (HCI) and Fairness community, has endeavored to integrate equity into technology design, addressing issues of societal inequities. With such increasing efforts, it is yet unclear why and how researchers discuss equity and its integration into technology, what research has been conducted, and what gaps need to be addressed. We conducted a systematic literature review on equity and technology, collecting and analyzing 202 papers published in HCI and Fairness-focused venues. Amidst the substantial growth of relevant publications within the past four years, we deliver three main contributions: (1) we elaborate a comprehensive understanding researchers' motivations for studying equity and technology, (2) we illustrate the different equity definitions and frameworks utilized to discuss equity, (3) we characterize the key themes addressing interventions as well as tensions and trade-offs when advancing and integrating equity to technology. Based on our findings, we elaborate an equity framework for researchers who seek to address existing gaps and advance equity in technology. △ Less

Submitted 24 January, 2025; originally announced January 2025.

arXiv:2501.01397 [pdf, other]

WeAudit: Scaffolding User Auditors and AI Practitioners in Auditing Generative AI

Authors: Wesley Hanwen Deng, Wang Claire, Howard Ziyu Han, Jason I. Hong, Kenneth Holstein, Motahhare Eslami

Abstract: There has been growing interest from both practitioners and researchers in engaging end users in AI auditing, to draw upon users' unique knowledge and lived experiences. However, we know little about how to effectively scaffold end users in auditing in ways that can generate actionable insights for AI practitioners. Through formative studies with both users and AI practitioners, we first identifie… ▽ More There has been growing interest from both practitioners and researchers in engaging end users in AI auditing, to draw upon users' unique knowledge and lived experiences. However, we know little about how to effectively scaffold end users in auditing in ways that can generate actionable insights for AI practitioners. Through formative studies with both users and AI practitioners, we first identified a set of design goals to support user-engaged AI auditing. We then developed WeAudit, a workflow and system that supports end users in auditing AI both individually and collectively. We evaluated WeAudit through a three-week user study with user auditors and interviews with industry Generative AI practitioners. Our findings offer insights into how WeAudit supports users in noticing and reflecting upon potential AI harms and in articulating their findings in ways that industry practitioners can act upon. Based on our observations and feedback from both users and practitioners, we identify several opportunities to better support user engagement in AI auditing processes. We discuss implications for future research to support effective and responsible user engagement in AI auditing and red-teaming. △ Less

Submitted 28 April, 2025; v1 submitted 2 January, 2025; originally announced January 2025.

arXiv:2412.00077 [pdf, other]

Selfish Evolution: Making Discoveries in Extreme Label Noise with the Help of Overfitting Dynamics

Authors: Nima Sedaghat, Tanawan Chatchadanoraset, Colin Orion Chandler, Ashish Mahabal, Maryam Eslami

Abstract: Motivated by the scarcity of proper labels in an astrophysical application, we have developed a novel technique, called Selfish Evolution, which allows for the detection and correction of corrupted labels in a weakly supervised fashion. Unlike methods based on early stopping, we let the model train on the noisy dataset. Only then do we intervene and allow the model to overfit to individual samples… ▽ More Motivated by the scarcity of proper labels in an astrophysical application, we have developed a novel technique, called Selfish Evolution, which allows for the detection and correction of corrupted labels in a weakly supervised fashion. Unlike methods based on early stopping, we let the model train on the noisy dataset. Only then do we intervene and allow the model to overfit to individual samples. The ``evolution'' of the model during this process reveals patterns with enough information about the noisiness of the label, as well as its correct version. We train a secondary network on these spatiotemporal ``evolution cubes'' to correct potentially corrupted labels. We incorporate the technique in a closed-loop fashion, allowing for automatic convergence towards a mostly clean dataset, without presumptions about the state of the network in which we intervene. We evaluate on the main task of the Supernova-hunting dataset but also demonstrate efficiency on the more standard MNIST dataset. △ Less

Submitted 26 November, 2024; originally announced December 2024.

arXiv:2411.04994 [pdf, other]

doi 10.1145/3715275.3732049

Legacy Procurement Practices Shape How U.S. Cities Govern AI: Understanding Government Employees' Practices, Challenges, and Needs

Authors: Nari Johnson, Elise Silva, Harrison Leon, Motahhare Eslami, Beth Schwanke, Ravit Dotan, Hoda Heidari

Abstract: Most AI tools adopted by governments are not developed internally, but instead are acquired from third-party vendors in a process called public procurement. In this paper, we conduct the first empirical study of how United States cities' procurement practices shape critical decisions surrounding public sector AI. We conduct semi-structured interviews with 19 city employees who oversee AI procureme… ▽ More Most AI tools adopted by governments are not developed internally, but instead are acquired from third-party vendors in a process called public procurement. In this paper, we conduct the first empirical study of how United States cities' procurement practices shape critical decisions surrounding public sector AI. We conduct semi-structured interviews with 19 city employees who oversee AI procurement across 7 U.S. cities. We found that cities' legacy procurement practices, which are shaped by decades-old laws and norms, establish infrastructure that determines which AI is purchased, and which actors hold decision-making power over procured AI. We characterize the emerging actions cities have taken to adapt their purchasing practices to address algorithmic harms. From employees' reflections on real-world AI procurements, we identify three key challenges that motivate but are not fully addressed by existing AI procurement reform initiatives. Based on these findings, we discuss implications and opportunities for the FAccT community to support cities in foreseeing and preventing AI harms throughout the public procurement processes. △ Less

Submitted 12 May, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

Comments: 10 pages, 2 column format. In proceedings of ACM FAccT 2025

arXiv:2411.00869 [pdf, other]

doi 10.1109/URTC65039.2024.10937616

Federated Learning for Diabetic Retinopathy Diagnosis: Enhancing Accuracy and Generalizability in Under-Resourced Regions

Authors: Gajan Mohan Raj, Michael G. Morley, Mohammad Eslami

Abstract: Diabetic retinopathy is the leading cause of vision loss in working-age adults worldwide, yet under-resourced regions lack ophthalmologists. Current state-of-the-art deep learning systems struggle at these institutions due to limited generalizability. This paper explores a novel federated learning system for diabetic retinopathy diagnosis with the EfficientNetB0 architecture to leverage fundus dat… ▽ More Diabetic retinopathy is the leading cause of vision loss in working-age adults worldwide, yet under-resourced regions lack ophthalmologists. Current state-of-the-art deep learning systems struggle at these institutions due to limited generalizability. This paper explores a novel federated learning system for diabetic retinopathy diagnosis with the EfficientNetB0 architecture to leverage fundus data from multiple institutions to improve diagnostic generalizability at under-resourced hospitals while preserving patient-privacy. The federated model achieved 93.21% accuracy in five-category classification on an unseen dataset and 91.05% on lower-quality images from a simulated under-resourced institution. The model was deployed onto two apps for quick and accurate diagnosis. △ Less

Submitted 30 October, 2024; originally announced November 2024.

arXiv:2409.19430 [pdf, other]

'Simulacrum of Stories': Examining Large Language Models as Qualitative Research Participants

Authors: Shivani Kapania, William Agnew, Motahhare Eslami, Hoda Heidari, Sarah Fox

Abstract: The recent excitement around generative models has sparked a wave of proposals suggesting the replacement of human participation and labor in research and development--e.g., through surveys, experiments, and interviews--with synthetic research data generated by large language models (LLMs). We conducted interviews with 19 qualitative researchers to understand their perspectives on this paradigm sh… ▽ More The recent excitement around generative models has sparked a wave of proposals suggesting the replacement of human participation and labor in research and development--e.g., through surveys, experiments, and interviews--with synthetic research data generated by large language models (LLMs). We conducted interviews with 19 qualitative researchers to understand their perspectives on this paradigm shift. Initially skeptical, researchers were surprised to see similar narratives emerge in the LLM-generated data when using the interview probe. However, over several conversational turns, they went on to identify fundamental limitations, such as how LLMs foreclose participants' consent and agency, produce responses lacking in palpability and contextual depth, and risk delegitimizing qualitative research methods. We argue that the use of LLMs as proxies for participants enacts the surrogate effect, raising ethical and epistemological concerns that extend beyond the technical limitations of current models to the core of whether LLMs fit within qualitative ways of knowing. △ Less

Submitted 28 September, 2024; originally announced September 2024.

arXiv:2409.12000 [pdf, ps, other]

"It Might be Technically Impressive, But It's Practically Useless to us": Motivations, Practices, Challenges, and Opportunities for Cross-Functional Collaboration around AI within the News Industry

Authors: Qing Xiao, Xianzhe Fan, Felix M. Simon, Bingbing Zhang, Motahhare Eslami

Abstract: Recently, an increasing number of news organizations have integrated artificial intelligence (AI) into their workflows, leading to a further influx of AI technologists and data workers into the news industry. This has initiated cross-functional collaborations between these professionals and journalists. Although prior research has explored the impact of AI-related roles entering the news industry,… ▽ More Recently, an increasing number of news organizations have integrated artificial intelligence (AI) into their workflows, leading to a further influx of AI technologists and data workers into the news industry. This has initiated cross-functional collaborations between these professionals and journalists. Although prior research has explored the impact of AI-related roles entering the news industry, there is a lack of studies on how internal cross-functional collaboration around AI unfolds between AI professionals and journalists within the news industry. Through interviews with 17 journalists, six AI technologists, and three AI workers with cross-functional experience from leading Chinese news organizations, we investigate the practices, challenges, and opportunities for internal cross-functional collaboration around AI in news industry. We first study how these journalists and AI professionals perceive existing internal cross-collaboration strategies. We explore the challenges of cross-functional collaboration and provide recommendations for enhancing future cross-functional collaboration around AI in the news industry. △ Less

Submitted 12 February, 2025; v1 submitted 18 September, 2024; originally announced September 2024.

Comments: 19 pages, Accepted by CHI '25

arXiv:2409.00737 [pdf, other]

doi 10.1145/3678884.3681829

Data Collectives as a means to Improve Accountability, Combat Surveillance and Reduce Inequalities

Authors: Jane Hsieh, Angie Zhang, Seyun Kim, Varun Nagaraj Rao, Samantha Dalal, Alexandra Mateescu, Rafael Do Nascimento Grohmann, Motahhare Eslami, Min Kyung Lee, Haiyi Zhu

Abstract: Platform-based laborers face unprecedented challenges and working conditions that result from algorithmic opacity, insufficient data transparency, and unclear policies and regulations. The CSCW and HCI communities increasingly turn to worker data collectives as a means to advance related policy and regulation, hold platforms accountable for data transparency and disclosure, and empower the collect… ▽ More Platform-based laborers face unprecedented challenges and working conditions that result from algorithmic opacity, insufficient data transparency, and unclear policies and regulations. The CSCW and HCI communities increasingly turn to worker data collectives as a means to advance related policy and regulation, hold platforms accountable for data transparency and disclosure, and empower the collective worker voice. However, fundamental questions remain for designing, governing and sustaining such data infrastructures. In this workshop, we leverage frameworks such as data feminism to design sustainable and power-aware data collectives that tackle challenges present in various types of online labor platforms (e.g., ridesharing, freelancing, crowdwork, carework). While data collectives aim to support worker collectives and complement relevant policy initiatives, the goal of this workshop is to encourage their designers to consider topics of governance, privacy, trust, and transparency. In this one-day session, we convene research and advocacy community members to reflect on critical platform work issues (e.g., worker surveillance, discrimination, wage theft, insufficient platform accountability) as well as to collaborate on codesigning data collectives that ethically and equitably address these concerns by supporting working collectivism and informing policy development. △ Less

Submitted 1 September, 2024; originally announced September 2024.

arXiv:2408.14680 [pdf]

On-Chip Learning with Memristor-Based Neural Networks: Assessing Accuracy and Efficiency Under Device Variations, Conductance Errors, and Input Noise

Authors: M. Reza Eslami, Dhiman Biswas, Soheib Takhtardeshir, Sarah S. Sharif, Yaser M. Banad

Abstract: This paper presents a memristor-based compute-in-memory hardware accelerator for on-chip training and inference, focusing on its accuracy and efficiency against device variations, conductance errors, and input noise. Utilizing realistic SPICE models of commercially available silver-based metal self-directed channel (M-SDC) memristors, the study incorporates inherent device non-idealities into the… ▽ More This paper presents a memristor-based compute-in-memory hardware accelerator for on-chip training and inference, focusing on its accuracy and efficiency against device variations, conductance errors, and input noise. Utilizing realistic SPICE models of commercially available silver-based metal self-directed channel (M-SDC) memristors, the study incorporates inherent device non-idealities into the circuit simulations. The hardware, consisting of 30 memristors and 4 neurons, utilizes three different M-SDC structures with tungsten, chromium, and carbon media to perform binary image classification tasks. An on-chip training algorithm precisely tunes memristor conductance to achieve target weights. Results show that incorporating moderate noise (<15%) during training enhances robustness to device variations and noisy input data, achieving up to 97% accuracy despite conductance variations and input noises. The network tolerates a 10% conductance error without significant accuracy loss. Notably, omitting the initial memristor reset pulse during training considerably reduces training time and energy consumption. The hardware designed with chromium-based memristors exhibits superior performance, achieving a training time of 2.4 seconds and an energy consumption of 18.9 mJ. This research provides insights for developing robust and energy-efficient memristor-based neural networks for on-chip learning in edge applications. △ Less

Submitted 26 August, 2024; originally announced August 2024.

arXiv:2405.18579 [pdf]

doi 10.1145/3663384.3663407

Public Technologies Transforming Work of the Public and the Public Sector

Authors: Seyun Kim, Bonnie Fan, Willa Yunqi Yang, Jessie Ramey, Sarah E Fox, Haiyi Zhu, John Zimmerman, Motahhare Eslami

Abstract: Technologies adopted by the public sector have transformed the work practices of employees in public agencies by creating different means of communication and decision-making. Although much of the recent research in the future of work domain has concentrated on the effects of technological advancements on public sector employees, the influence on work practices of external stakeholders engaging wi… ▽ More Technologies adopted by the public sector have transformed the work practices of employees in public agencies by creating different means of communication and decision-making. Although much of the recent research in the future of work domain has concentrated on the effects of technological advancements on public sector employees, the influence on work practices of external stakeholders engaging with this sector remains under-explored. In this paper, we focus on a digital platform called OneStop which is deployed by several building departments across the U.S. and aims to integrate various steps and services into a single point of online contact between public sector employees and the public. Drawing on semi-structured interviews with 22 stakeholders, including local business owners, experts involved in the construction process, community representatives, and building department employees, we investigate how this technology transition has impacted the work of these different stakeholders. We observe a multifaceted perspective and experience caused by the adoption of OneStop. OneStop exacerbated inequitable practices for local business owners due to a lack of face-to-face interactions with the department employees. For the public sector employees, OneStop standardized the work practices, representing the building department's priorities and values. Based on our findings, we discuss tensions around standardization, equality, and equity in technology transition, as well as design implications for equitable practices in the public sector. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.03162 [pdf, other]

Advancing Multimodal Medical Capabilities of Gemini

Authors: Lin Yang, Shawn Xu, Andrew Sellergren, Timo Kohlberger, Yuchen Zhou, Ira Ktena, Atilla Kiraly, Faruk Ahmed, Farhad Hormozdiari, Tiam Jaroensri, Eric Wang, Ellery Wulczyn, Fayaz Jamil, Theo Guidroz, Chuck Lau, Siyuan Qiao, Yun Liu, Akshay Goel, Kendall Park, Arnav Agharwal, Nick George, Yang Wang, Ryutaro Tanno, David G. T. Barrett, Wei-Hung Weng , et al. (22 additional authors not shown)

Abstract: Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histop… ▽ More Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histopathology, ophthalmology, dermatology and genomic data. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding previous best results across two separate datasets by an absolute margin of 1% and 12%, where 57% and 96% of AI reports on normal cases, and 43% and 65% on abnormal cases, are evaluated as "equivalent or better" than the original radiologists' reports. We demonstrate the first ever large multimodal model-based report generation for 3D computed tomography (CT) volumes using Med-Gemini-3D, with 53% of AI reports considered clinically acceptable, although additional research is needed to meet expert radiologist reporting quality. Beyond report generation, Med-Gemini-2D surpasses the previous best performance in CXR visual question answering (VQA) and performs well in CXR classification and radiology VQA, exceeding SoTA or baselines on 17 of 20 tasks. In histopathology, ophthalmology, and dermatology image classification, Med-Gemini-2D surpasses baselines across 18 out of 20 tasks and approaches task-specific model performance. Beyond imaging, Med-Gemini-Polygenic outperforms the standard linear polygenic risk score-based approach for disease risk prediction and generalizes to genetically correlated diseases for which it has never been trained. Although further development and evaluation are necessary in the safety-critical medical domain, our results highlight the potential of Med-Gemini across a wide range of medical tasks. △ Less

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.18416 [pdf, other]

Capabilities of Gemini Models in Medicine

Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby , et al. (42 additional authors not shown)

Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G… ▽ More Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain. △ Less

Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.14511 [pdf]

Children's Overtrust and Shifting Perspectives of Generative AI

Authors: Jaemarie Solyst, Ellia Yang, Shixian Xie, Jessica Hammer, Amy Ogan, Motahhare Eslami

Abstract: The capabilities of generative AI (genAI) have dramatically increased in recent times, and there are opportunities for children to leverage new features for personal and school-related endeavors. However, while the future of genAI is taking form, there remain potentially harmful limitations, such as generation of outputs with misinformation and bias. We ran a workshop study focused on ChatGPT to e… ▽ More The capabilities of generative AI (genAI) have dramatically increased in recent times, and there are opportunities for children to leverage new features for personal and school-related endeavors. However, while the future of genAI is taking form, there remain potentially harmful limitations, such as generation of outputs with misinformation and bias. We ran a workshop study focused on ChatGPT to explore middle school girls' (N = 26) attitudes and reasoning about how genAI works. We focused on girls who are often disproportionately impacted by algorithmic bias. We found that: (1) middle school girls were initially overtrusting of genAI, (2) deliberate exposure to the limitations and mistakes of generative AI shifted this overtrust to disillusionment about genAI capabilities, though they were still optimistic for future possibilities of genAI, and (3) their ideas about school policy were nuanced. This work informs how children think about genAI like ChatGPT and its integration in learning settings. △ Less

Submitted 29 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Journal ref: Proceedings of the 18th International Scoeity of the Learning Sciences (ICLS) 2024

arXiv:2404.13802 [pdf, other]

doi 10.1145/3630106.3658910

The Fall of an Algorithm: Characterizing the Dynamics Toward Abandonment

Authors: Nari Johnson, Sanika Moharana, Christina N. Harrington, Nazanin Andalibi, Hoda Heidari, Motahhare Eslami

Abstract: As more algorithmic systems have come under scrutiny for their potential to inflict societal harms, an increasing number of organizations that hold power over harmful algorithms have chosen (or were required under the law) to abandon them. While social movements and calls to abandon harmful algorithms have emerged across application domains, little academic attention has been paid to studying aban… ▽ More As more algorithmic systems have come under scrutiny for their potential to inflict societal harms, an increasing number of organizations that hold power over harmful algorithms have chosen (or were required under the law) to abandon them. While social movements and calls to abandon harmful algorithms have emerged across application domains, little academic attention has been paid to studying abandonment as a means to mitigate algorithmic harms. In this paper, we take a first step towards conceptualizing "algorithm abandonment" as an organization's decision to stop designing, developing, or using an algorithmic system due to its (potential) harms. We conduct a thematic analysis of real-world cases of algorithm abandonment to characterize the dynamics leading to this outcome. Our analysis of 40 cases reveals that campaigns to abandon an algorithm follow a common process of six iterative phases: discovery, diagnosis, dissemination, dialogue, decision, and death, which we term the "6 D's of abandonment". In addition, we highlight key factors that facilitate (or prohibit) abandonment, which include characteristics of both the technical and social systems that the algorithm is embedded within. We discuss implications for several stakeholders, including proprietors and technologists who have the power to influence an algorithm's (dis)continued use, FAccT researchers, and policymakers. △ Less

Submitted 12 May, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

Comments: 10 pages, 2 column format. In proceedings of ACM FAccT 2024

Journal ref: ACM Conference on Fairness, Accountability, and Transparency 2024

arXiv:2404.10897 [pdf]

The Future of Research on Social Technologies: CCC Workshop Visioning Report

Authors: Motahhare Eslami, Eric Gilbert, Sarita Schoenebeck, Eric P. S. Baumer, Eshwar Chandrasekharan, Michelle De Mooy, Karrie Karahalios, David Karger, Tressie McMillan Cottom, Andrés Monroy-Hernández, Loren Terveen, John Wihbey

Abstract: Social technologies are the systems, interfaces, features, infrastructures, and architectures that allow people to interact with each other online. These technologies dramatically shape the fabric of our everyday lives, from the information we consume to the people we interact with to the foundations of our culture and politics. While the benefits of social technologies are well documented, the ha… ▽ More Social technologies are the systems, interfaces, features, infrastructures, and architectures that allow people to interact with each other online. These technologies dramatically shape the fabric of our everyday lives, from the information we consume to the people we interact with to the foundations of our culture and politics. While the benefits of social technologies are well documented, the harms, too, have cast a long shadow. To address widespread problems like harassment, disinformation, information access, and mental health concerns, we need to rethink the foundations of how social technologies are designed, sustained, and governed. This report is based on discussions at the Computing Community Consortium Workshop, The Future of Research on Social Technologies, that was held November 2-3, 2023 in Washington, DC. The visioning workshop came together to focus on two questions. What should we know about social technologies, and what is needed to get there? The workshop brought together over 50 information and computer scientists, social scientists, communication and journalism scholars, and policy experts. We used a discussion format, with one day of guiding topics and a second day using an unconference model where participants created discussion topics. The interdisciplinary group of attendees discussed gaps in existing scholarship and the methods, resources, access, and collective effort needed to address those gaps. We also discussed approaches for translating scholarship for various audiences including citizens, funders, educators, industry professionals, and policymakers. This report presents a synthesis of major themes during our discussions. The themes presented are not a summary of what we know already, they are an exploration of what we do not know enough about, and what we should spend more effort and investment on in the coming years. △ Less

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2403.07150 [pdf]

Breaking Political Filter Bubbles via Social Comparison

Authors: Nouran Soliman, Motahhare Eslami, Karrie Karahalios

Abstract: Online social platforms allow users to filter out content they do not like. According to selective exposure theory, people tend to view content they agree with more to get more self-assurance. This causes people to live in ideological filter bubbles. We report on a user study that encourages users to break the political filter bubble of their Twitter feed by reading more diverse viewpoints through… ▽ More Online social platforms allow users to filter out content they do not like. According to selective exposure theory, people tend to view content they agree with more to get more self-assurance. This causes people to live in ideological filter bubbles. We report on a user study that encourages users to break the political filter bubble of their Twitter feed by reading more diverse viewpoints through social comparison. The user study is conducted using political-bias analyzing and Twitter-mirroring tools to compare the political slant of what a user reads and what other Twitter users read about a topic, and in general. The results show that social comparison can have a great impact on users' reading behavior by motivating them to read viewpoints from the opposing political party. △ Less

Submitted 11 March, 2024; originally announced March 2024.

Comments: * Both of the first two authors contributed equally to this work

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1112 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content. △ Less

Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.12162 [pdf, other]

doi 10.1109/TC.2024.3449082

SCARF: Securing Chips with a Robust Framework against Fabrication-time Hardware Trojans

Authors: Mohammad Eslami, Tara Ghasempouri, Samuel Pagliarini

Abstract: The globalization of the semiconductor industry has introduced security challenges to Integrated Circuits (ICs), particularly those related to the threat of Hardware Trojans (HTs) - malicious logic that can be introduced during IC fabrication. While significant efforts are directed towards verifying the correctness and reliability of ICs, their security is often overlooked. In this paper, we propo… ▽ More The globalization of the semiconductor industry has introduced security challenges to Integrated Circuits (ICs), particularly those related to the threat of Hardware Trojans (HTs) - malicious logic that can be introduced during IC fabrication. While significant efforts are directed towards verifying the correctness and reliability of ICs, their security is often overlooked. In this paper, we propose a comprehensive approach to enhance IC security from the front-end to back-end stages of design. Initially, we outline a systematic method to transform existing verification assets into potent security checkers by repurposing verification assertions. To further improve security, we introduce an innovative technique for integrating online monitors during physical synthesis - a back-end insertion providing an additional layer of defense. Experimental results demonstrate a significant increase in security, measured by our introduced metric, Security Coverage (SC), with a marginal rise in area and power consumption, typically under 20%. The insertion of online monitors during physical synthesis enhances security metrics by up to 33.5%. This holistic approach offers a comprehensive and resilient defense mechanism across the entire spectrum of IC design. △ Less

Submitted 28 August, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2312.11497 [pdf, other]

The Public Algorithms Survey in Allegheny County

Authors: Yu-Ru Lin, Beth Schwanke, Rosta Farzan, Bonnie Fan, Motahhare Eslami, Hong Shen, Sarah Fox

Abstract: This survey study focuses on public opinion regarding the use of algorithmic decision-making in government sectors, specifically in Allegheny County, Pennsylvania. Algorithms are becoming increasingly prevalent in various public domains, including both routine and high-stakes government functions. Despite their growing use, public sentiment remains divided, with concerns about privacy and accuracy… ▽ More This survey study focuses on public opinion regarding the use of algorithmic decision-making in government sectors, specifically in Allegheny County, Pennsylvania. Algorithms are becoming increasingly prevalent in various public domains, including both routine and high-stakes government functions. Despite their growing use, public sentiment remains divided, with concerns about privacy and accuracy juxtaposed against perceptions of fairness when compared to human decision-making. In April 2021, a survey was conducted among nearly 1,500 county residents to explore their awareness, experiences, and attitudes towards these algorithms. The study highlights diverse viewpoints influenced by factors such as race, age, education, gender, income, and urban or suburban living. The results demonstrate the complexity of public sentiment towards algorithmic governance and emphasize the need for a nuanced understanding and approach in policy and implementation. △ Less

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2310.03559 [pdf, other]

MedSyn: Text-guided Anatomy-aware Synthesis of High-Fidelity 3D CT Images

Authors: Yanwu Xu, Li Sun, Wei Peng, Shuyue Jia, Katelyn Morrison, Adam Perer, Afrooz Zandifar, Shyam Visweswaran, Motahhare Eslami, Kayhan Batmanghelich

Abstract: This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize radiology reports' abundant information. The radiology reports can enhance the generation process by pr… ▽ More This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize radiology reports' abundant information. The radiology reports can enhance the generation process by providing additional guidance and offering fine-grained control over the synthesis of images. Nevertheless, expanding text-guided generation to high-resolution 3D images poses significant memory and anatomical detail-preserving challenges. Addressing the memory issue, we introduce a hierarchical scheme that uses a modified UNet architecture. We start by synthesizing low-resolution images conditioned on the text, serving as a foundation for subsequent generators for complete volumetric data. To ensure the anatomical plausibility of the generated samples, we provide further guidance by generating vascular, airway, and lobular segmentation masks in conjunction with the CT images. The model demonstrates the capability to use textual input and segmentation tasks to generate synthesized images. The results of comparative assessments indicate that our approach exhibits superior performance compared to the most advanced models based on GAN and diffusion techniques, especially in accurately retaining crucial anatomical features such as fissure lines, airways, and vascular structures. This innovation introduces novel possibilities. This study focuses on two main objectives: (1) the development of a method for creating images based on textual prompts and anatomical components, and (2) the capability to generate new images conditioning on anatomical elements. The advancements in image generation can be applied to enhance numerous downstream tasks. △ Less

Submitted 15 October, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

arXiv:2308.06201 [pdf, other]

SALSy: Security-Aware Layout Synthesis

Authors: Mohammad Eslami, Tiago Perez, Samuel Pagliarini

Abstract: Integrated Circuits (ICs) are the target of diverse attacks during their lifetime. Fabrication-time attacks, such as the insertion of Hardware Trojans, can give an adversary access to privileged data and/or the means to corrupt the IC's internal computation. Post-fabrication attacks, where the end-user takes a malicious role, also attempt to obtain privileged information through means such as faul… ▽ More Integrated Circuits (ICs) are the target of diverse attacks during their lifetime. Fabrication-time attacks, such as the insertion of Hardware Trojans, can give an adversary access to privileged data and/or the means to corrupt the IC's internal computation. Post-fabrication attacks, where the end-user takes a malicious role, also attempt to obtain privileged information through means such as fault injection and probing. Taking these threats into account and at the same time, this paper proposes a methodology for Security-Aware Layout Synthesis (SALSy), such that ICs can be designed with security in mind in the same manner as power-performance-area (PPA) metrics are considered today, a concept known as security closure. Furthermore, the trade-offs between PPA and security are considered and a chip is fabricated in a 65nm CMOS commercial technology for validation purposes - a feature not seen in previous research on security closure. Measurements on the fabricated ICs indicate that SALSy promotes a modest increase in power in order to achieve significantly improved security metrics. △ Less

Submitted 21 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

arXiv:2306.06542 [pdf, ps, other]

Investigating Practices and Opportunities for Cross-functional Collaboration around AI Fairness in Industry Practice

Authors: Wesley Hanwen Deng, Nur Yildirim, Monica Chang, Motahhare Eslami, Ken Holstein, Michael Madaio

Abstract: An emerging body of research indicates that ineffective cross-functional collaboration -- the interdisciplinary work done by industry practitioners across roles -- represents a major barrier to addressing issues of fairness in AI design and development. In this research, we sought to better understand practitioners' current practices and tactics to enact cross-functional collaboration for AI fairn… ▽ More An emerging body of research indicates that ineffective cross-functional collaboration -- the interdisciplinary work done by industry practitioners across roles -- represents a major barrier to addressing issues of fairness in AI design and development. In this research, we sought to better understand practitioners' current practices and tactics to enact cross-functional collaboration for AI fairness, in order to identify opportunities to support more effective collaboration. We conducted a series of interviews and design workshops with 23 industry practitioners spanning various roles from 17 companies. We found that practitioners engaged in bridging work to overcome frictions in understanding, contextualization, and evaluation around AI fairness across roles. In addition, in organizational contexts with a lack of resources and incentives for fairness work, practitioners often piggybacked on existing requirements (e.g., for privacy assessments) and AI development norms (e.g., the use of quantitative evaluation metrics), although they worry that these tactics may be fundamentally compromised. Finally, we draw attention to the invisible labor that practitioners take on as part of this bridging and piggybacking work to enact interdisciplinary collaboration for fairness. We close by discussing opportunities for both FAccT researchers and AI practitioners to better support cross-functional collaboration for fairness in the design and development of AI systems. △ Less

Submitted 10 June, 2023; originally announced June 2023.

Comments: In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT '23)

arXiv:2304.02134 [pdf]

doi 10.1145/3544548.3582074

Participation and Division of Labor in User-Driven Algorithm Audits: How Do Everyday Users Work together to Surface Algorithmic Harms?

Authors: Rena Li, Sara Kingsley, Chelsea Fan, Proteeti Sinha, Nora Wai, Jaimie Lee, Hong Shen, Motahhare Eslami, Jason Hong

Abstract: Recent years have witnessed an interesting phenomenon in which users come together to interrogate potentially harmful algorithmic behaviors they encounter in their everyday lives. Researchers have started to develop theoretical and empirical understandings of these user driven audits, with a hope to harness the power of users in detecting harmful machine behaviors. However, little is known about u… ▽ More Recent years have witnessed an interesting phenomenon in which users come together to interrogate potentially harmful algorithmic behaviors they encounter in their everyday lives. Researchers have started to develop theoretical and empirical understandings of these user driven audits, with a hope to harness the power of users in detecting harmful machine behaviors. However, little is known about user participation and their division of labor in these audits, which are essential to support these collective efforts in the future. Through collecting and analyzing 17,984 tweets from four recent cases of user driven audits, we shed light on patterns of user participation and engagement, especially with the top contributors in each case. We also identified the various roles user generated content played in these audits, including hypothesizing, data collection, amplification, contextualization, and escalation. We discuss implications for designing tools to support user driven audits and users who labor to raise awareness of algorithm bias. △ Less

Submitted 4 April, 2023; originally announced April 2023.

arXiv:2304.00167 [pdf, other]

Towards "Anytime, Anywhere" Community Learning and Engagement around the Design of Public Sector AI

Authors: Wesley Hanwen Deng, Motahhare Eslami, Kenneth Holstein

Abstract: Data-driven algorithmic and AI systems are increasingly being deployed to automate or augment decision processes across a wide range of public service settings. Yet community members are often unaware of the presence, operation, and impacts of these systems on their lives. With the shift towards algorithmic decision-making in public services, technology developers increasingly assume the role of d… ▽ More Data-driven algorithmic and AI systems are increasingly being deployed to automate or augment decision processes across a wide range of public service settings. Yet community members are often unaware of the presence, operation, and impacts of these systems on their lives. With the shift towards algorithmic decision-making in public services, technology developers increasingly assume the role of de-facto policymakers, and opportunities for democratic participation are foreclosed. In this position paper, we articulate an early vision around the design of ubiquitous infrastructure for public learning and engagement around civic AI technologies. Building on this vision, we provide a list of questions that we hope can prompt stimulating conversations among the HCI community. △ Less

Submitted 21 April, 2023; v1 submitted 31 March, 2023; originally announced April 2023.

Journal ref: AI Literacy: Finding Common Threads between Education, Design, Policy, and Explainability Workshop at CHI 2023

arXiv:2302.13947 [pdf]

Investigating Girls' Perspectives and Knowledge Gaps on Ethics and Fairness in Artificial Intelligence in a Lightweight Workshop

Authors: Jaemarie Solyst, Alexis Axon, Angela E. B. Stewart, Motahhare Eslami, Amy Ogan

Abstract: Artificial intelligence (AI) is everywhere, with many children having increased exposure to AI technologies in daily life. We aimed to understand middle school girls' (a group often excluded group in tech) perceptions and knowledge gaps about AI. We created and explored the feasibility of a lightweight (less than 3 hours) educational workshop in which learners considered challenges in their lives… ▽ More Artificial intelligence (AI) is everywhere, with many children having increased exposure to AI technologies in daily life. We aimed to understand middle school girls' (a group often excluded group in tech) perceptions and knowledge gaps about AI. We created and explored the feasibility of a lightweight (less than 3 hours) educational workshop in which learners considered challenges in their lives and communities and critically considered how existing and future AI could have an impact. After the workshop, learners had nuanced perceptions of AI, understanding AI can both help and harm. We discuss design implications for creating educational experiences in AI and fairness that embolden learners. △ Less

Submitted 27 February, 2023; originally announced February 2023.

Comments: 8 pages, 2 figures (a table and a graphic with two parts)

Journal ref: Proceedings of the 16th International Society of the Learning Sciences (ICLS) 2022, pages 807-814

arXiv:2210.06433 [pdf, other]

Self-supervised video pretraining yields robust and more human-aligned visual representations

Authors: Nikhil Parthasarathy, S. M. Ali Eslami, João Carreira, Olivier J. Hénaff

Abstract: Humans learn powerful representations of objects and scenes by observing how they evolve over time. Yet, outside of specific tasks that require explicit temporal understanding, static image pretraining remains the dominant paradigm for learning visual foundation models. We question this mismatch, and ask whether video pretraining can yield visual representations that bear the hallmarks of human pe… ▽ More Humans learn powerful representations of objects and scenes by observing how they evolve over time. Yet, outside of specific tasks that require explicit temporal understanding, static image pretraining remains the dominant paradigm for learning visual foundation models. We question this mismatch, and ask whether video pretraining can yield visual representations that bear the hallmarks of human perception: generalisation across tasks, robustness to perturbations, and consistency with human judgements. To that end we propose a novel procedure for curating videos, and develop a contrastive framework which learns from the complex transformations therein. This simple paradigm for distilling knowledge from videos, called VITO, yields general representations that far outperform prior video pretraining methods on image understanding tasks, and image pretraining methods on video understanding tasks. Moreover, VITO representations are significantly more robust to natural and synthetic deformations than image-, video-, and adversarially-trained ones. Finally, VITO's predictions are strongly aligned with human judgements, surpassing models that were specifically trained for that purpose. Together, these results suggest that video pretraining could be a simple way of learning unified, robust, and human-aligned representations of the visual world. △ Less

Submitted 10 January, 2025; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted to 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2210.03709 [pdf, other]

doi 10.1145/3544548.3581026

Understanding Practices, Challenges, and Opportunities for User-Engaged Algorithm Auditing in Industry Practice

Authors: Wesley Hanwen Deng, Bill Boyuan Guo, Alicia DeVrio, Hong Shen, Motahhare Eslami, Kenneth Holstein

Abstract: Recent years have seen growing interest among both researchers and practitioners in user-engaged approaches to algorithm auditing, which directly engage users in detecting problematic behaviors in algorithmic systems. However, we know little about industry practitioners' current practices and challenges around user-engaged auditing, nor what opportunities exist for them to better leverage such app… ▽ More Recent years have seen growing interest among both researchers and practitioners in user-engaged approaches to algorithm auditing, which directly engage users in detecting problematic behaviors in algorithmic systems. However, we know little about industry practitioners' current practices and challenges around user-engaged auditing, nor what opportunities exist for them to better leverage such approaches in practice. To investigate, we conducted a series of interviews and iterative co-design activities with practitioners who employ user-engaged auditing approaches in their work. Our findings reveal several challenges practitioners face in appropriately recruiting and incentivizing user auditors, scaffolding user audits, and deriving actionable insights from user-engaged audit reports. Furthermore, practitioners shared organizational obstacles to user-engaged auditing, surfacing a complex relationship between practitioners and user auditors. Based on these findings, we discuss opportunities for future HCI research to help realize the potential (and the mitigate risks) of user-engaged auditing in industry practice. △ Less

Submitted 21 February, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

Comments: 18 pages. In Proceedings of CHI 2023

Journal ref: CHI 2023: ACM Conference on Human Factors in Computing Systems. April 23-28, 2023, Hamburg, Germany

arXiv:2205.04599 [pdf, other]

Affective Medical Estimation and Decision Making via Visualized Learning and Deep Learning

Authors: Mohammad Eslami, Solale Tabarestani, Ehsan Adeli, Glyn Elwyn, Tobias Elze, Mengyu Wang, Nazlee Zebardast, Nassir Navab, Malek Adjouadi

Abstract: With the advent of sophisticated machine learning (ML) techniques and the promising results they yield, especially in medical applications, where they have been investigated for different tasks to enhance the decision-making process. Since visualization is such an effective tool for human comprehension, memorization, and judgment, we have presented a first-of-its-kind estimation approach we refer… ▽ More With the advent of sophisticated machine learning (ML) techniques and the promising results they yield, especially in medical applications, where they have been investigated for different tasks to enhance the decision-making process. Since visualization is such an effective tool for human comprehension, memorization, and judgment, we have presented a first-of-its-kind estimation approach we refer to as Visualized Learning for Machine Learning (VL4ML) that not only can serve to assist physicians and clinicians in making reasoned medical decisions, but it also allows to appreciate the uncertainty visualization, which could raise incertitude in making the appropriate classification or prediction. For the proof of concept, and to demonstrate the generalized nature of this visualized estimation approach, five different case studies are examined for different types of tasks including classification, regression, and longitudinal prediction. A survey analysis with more than 100 individuals is also conducted to assess users' feedback on this visualized estimation method. The experiments and the survey demonstrate the practical merits of the VL4ML that include: (1) appreciating visually clinical/medical estimations; (2) getting closer to the patients' preferences; (3) improving doctor-patient communication, and (4) visualizing the uncertainty introduced through the black box effect of the deployed ML algorithm. All the source codes are shared via a GitHub repository. △ Less

Submitted 9 May, 2022; originally announced May 2022.

arXiv:2202.02411 [pdf]

doi 10.1109/IKT54664.2021.9685175

A Novel Service Deployment Policy in Fog Computing Considering The Degree of Availability and Fog Landscape Utilization Using Multiobjective Evolutionary Algorithms

Authors: Maryam Eslami, Mehdi Sakhaei

Abstract: Fog computing is a promising paradigm for real-time and mission-critical Internet of Things (IoT) applications. Regarding the high distribution, heterogeneity, and limitation of fog resources, applications should be placed in a distributed manner to fully utilize these resources. In this paper, we propose a linear formulation for assuring the different availability requirements of application serv… ▽ More Fog computing is a promising paradigm for real-time and mission-critical Internet of Things (IoT) applications. Regarding the high distribution, heterogeneity, and limitation of fog resources, applications should be placed in a distributed manner to fully utilize these resources. In this paper, we propose a linear formulation for assuring the different availability requirements of application services while maximizing the utilization of fog resources. We also compare three multiobjective evolutionary algorithms, namely MOPSO, NSGA-II, and MOEA/D for a trade-off between the mentioned optimization goals. The evaluation results in the iFogSim simulator demonstrate the efficiency of all three algorithms and a generally better behavior of MOPSO algorithm in terms of obtained objective values, application deadline satisfaction, and execution time. △ Less

Submitted 4 February, 2022; originally announced February 2022.

arXiv:2201.12204 [pdf, other]

From data to functa: Your data point is a function and you can treat it like one

Authors: Emilien Dupont, Hyunjik Kim, S. M. Ali Eslami, Danilo Rezende, Dan Rosenbaum

Abstract: It is common practice in deep learning to represent a measurement of the world on a discrete grid, e.g. a 2D grid of pixels. However, the underlying signal represented by these measurements is often continuous, e.g. the scene depicted in an image. A powerful continuous alternative is then to represent these measurements using an implicit neural representation, a neural function trained to output t… ▽ More It is common practice in deep learning to represent a measurement of the world on a discrete grid, e.g. a 2D grid of pixels. However, the underlying signal represented by these measurements is often continuous, e.g. the scene depicted in an image. A powerful continuous alternative is then to represent these measurements using an implicit neural representation, a neural function trained to output the appropriate measurement value for any input spatial location. In this paper, we take this idea to its next level: what would it take to perform deep learning on these functions instead, treating them as data? In this context we refer to the data as functa, and propose a framework for deep learning on functa. This view presents a number of challenges around efficient conversion from data to functa, compact representation of functa, and effectively solving downstream tasks on functa. We outline a recipe to overcome these challenges and apply it to a wide range of data modalities including images, 3D shapes, neural radiance fields (NeRF) and data on manifolds. We demonstrate that this approach has various compelling properties across data modalities, in particular on the canonical tasks of generative modeling, data imputation, novel view synthesis and classification. Code: https://github.com/deepmind/functa △ Less

Submitted 10 November, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

arXiv:2201.01130 [pdf, other]

doi 10.1109/ISQED54688.2022.9806292

Reusing Verification Assertions as Security Checkers for Hardware Trojan Detection

Authors: Mohammad Eslami, Tara Ghasempouri, Samuel Pagliarini

Abstract: Globalization in the semiconductor industry enables fabless design houses to reduce their costs, save time, and make use of newer technologies. However, the offshoring of Integrated Circuit (IC) fabrication has negative sides, including threats such as Hardware Trojans (HTs) - a type of malicious logic that is not trivial to detect. One aspect of IC design that is not affected by globalization is… ▽ More Globalization in the semiconductor industry enables fabless design houses to reduce their costs, save time, and make use of newer technologies. However, the offshoring of Integrated Circuit (IC) fabrication has negative sides, including threats such as Hardware Trojans (HTs) - a type of malicious logic that is not trivial to detect. One aspect of IC design that is not affected by globalization is the need for thorough verification. Verification engineers devise complex assets to make sure designs are bug-free, including assertions. This knowledge is typically not reused once verification is over. The premise of this paper is that verification assets that already exist can be turned into effective security checkers for HT detection. For this purpose, we show how assertions can be used as online monitors. To this end, we propose a security metric and an assertion selection flow that leverages Cadence JasperGold Security Path Verification (SPV). The experimental results show that our approach scales for industry-size circuits by analyzing more than 100 assertions for different Intellectual Properties (IPs) of the OpenTitan System-on-Chip (SoC). Moreover, our detection solution is pragmatic since it does not rely on the HT activation mechanism. △ Less

Submitted 30 January, 2023; v1 submitted 4 January, 2022; originally announced January 2022.

Comments: 6 pages, 6 figures

Journal ref: 2022 23rd International Symposium on Quality Electronic Design (ISQED), Santa Clara, CA, USA, 2022, pp. 1-6

arXiv:2109.10777 [pdf, other]

Deep Variational Clustering Framework for Self-labeling of Large-scale Medical Images

Authors: Farzin Soleymani, Mohammad Eslami, Tobias Elze, Bernd Bischl, Mina Rezaei

Abstract: We propose a Deep Variational Clustering (DVC) framework for unsupervised representation learning and clustering of large-scale medical images. DVC simultaneously learns the multivariate Gaussian posterior through the probabilistic convolutional encoder and the likelihood distribution with the probabilistic convolutional decoder; and optimizes cluster labels assignment. Here, the learned multivari… ▽ More We propose a Deep Variational Clustering (DVC) framework for unsupervised representation learning and clustering of large-scale medical images. DVC simultaneously learns the multivariate Gaussian posterior through the probabilistic convolutional encoder and the likelihood distribution with the probabilistic convolutional decoder; and optimizes cluster labels assignment. Here, the learned multivariate Gaussian posterior captures the latent distribution of a large set of unlabeled images. Then, we perform unsupervised clustering on top of the variational latent space using a clustering loss. In this approach, the probabilistic decoder helps to prevent the distortion of data points in the latent space and to preserve the local structure of data generating distribution. The training process can be considered as a self-training process to refine the latent space and simultaneously optimizing cluster assignments iteratively. We evaluated our proposed framework on three public datasets that represented different medical imaging modalities. Our experimental results show that our proposed framework generalizes better across different datasets. It achieves compelling results on several medical imaging benchmarks. Thus, our approach offers potential advantages over conventional deep unsupervised learning in real-world applications. The source code of the method and all the experiments are available publicly at: https://github.com/csfarzin/DVC △ Less

Submitted 22 September, 2021; originally announced September 2021.

Comments: arXiv admin note: text overlap with arXiv:2109.05232

arXiv:2106.14108 [pdf, other]

Inferring a Continuous Distribution of Atom Coordinates from Cryo-EM Images using VAEs

Authors: Dan Rosenbaum, Marta Garnelo, Michal Zielinski, Charlie Beattie, Ellen Clancy, Andrea Huber, Pushmeet Kohli, Andrew W. Senior, John Jumper, Carl Doersch, S. M. Ali Eslami, Olaf Ronneberger, Jonas Adler

Abstract: Cryo-electron microscopy (cryo-EM) has revolutionized experimental protein structure determination. Despite advances in high resolution reconstruction, a majority of cryo-EM experiments provide either a single state of the studied macromolecule, or a relatively small number of its conformations. This reduces the effectiveness of the technique for proteins with flexible regions, which are known to… ▽ More Cryo-electron microscopy (cryo-EM) has revolutionized experimental protein structure determination. Despite advances in high resolution reconstruction, a majority of cryo-EM experiments provide either a single state of the studied macromolecule, or a relatively small number of its conformations. This reduces the effectiveness of the technique for proteins with flexible regions, which are known to play a key role in protein function. Recent methods for capturing conformational heterogeneity in cryo-EM data model it in volume space, making recovery of continuous atomic structures challenging. Here we present a fully deep-learning-based approach using variational auto-encoders (VAEs) to recover a continuous distribution of atomic protein structures and poses directly from picked particle images and demonstrate its efficacy on realistic simulated data. We hope that methods built on this work will allow incorporation of stronger prior information about protein structure and enable better understanding of non-rigid protein structures. △ Less

Submitted 26 June, 2021; originally announced June 2021.

arXiv:2106.13884 [pdf, other]

Multimodal Few-Shot Learning with Frozen Language Models

Authors: Maria Tsimpoukelli, Jacob Menick, Serkan Cabi, S. M. Ali Eslami, Oriol Vinyals, Felix Hill

Abstract: When trained at sufficient scale, auto-regressive language models exhibit the notable ability to learn a new language task after being prompted with just a few examples. Here, we present a simple, yet effective, approach for transferring this few-shot learning ability to a multimodal setting (vision and language). Using aligned image and caption data, we train a vision encoder to represent each im… ▽ More When trained at sufficient scale, auto-regressive language models exhibit the notable ability to learn a new language task after being prompted with just a few examples. Here, we present a simple, yet effective, approach for transferring this few-shot learning ability to a multimodal setting (vision and language). Using aligned image and caption data, we train a vision encoder to represent each image as a sequence of continuous embeddings, such that a pre-trained, frozen language model prompted with this prefix generates the appropriate caption. The resulting system is a multimodal few-shot learner, with the surprising ability to learn a variety of new tasks when conditioned on examples, represented as a sequence of multiple interleaved image and text embeddings. We demonstrate that it can rapidly learn words for new objects and novel visual categories, do visual question-answering with only a handful of examples, and make use of outside knowledge, by measuring a single model on a variety of established and new benchmarks. △ Less

Submitted 3 July, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

arXiv:2106.07683 [pdf, other]

Extracting Global Dynamics of Loss Landscape in Deep Learning Models

Authors: Mohammed Eslami, Hamed Eramian, Marcio Gameiro, William Kalies, Konstantin Mischaikow

Abstract: Deep learning models evolve through training to learn the manifold in which the data exists to satisfy an objective. It is well known that evolution leads to different final states which produce inconsistent predictions of the same test data points. This calls for techniques to be able to empirically quantify the difference in the trajectories and highlight problematic regions. While much focus is… ▽ More Deep learning models evolve through training to learn the manifold in which the data exists to satisfy an objective. It is well known that evolution leads to different final states which produce inconsistent predictions of the same test data points. This calls for techniques to be able to empirically quantify the difference in the trajectories and highlight problematic regions. While much focus is placed on discovering what models learn, the question of how a model learns is less studied beyond theoretical landscape characterizations and local geometric approximations near optimal conditions. Here, we present a toolkit for the Dynamical Organization Of Deep Learning Loss Landscapes, or DOODL3. DOODL3 formulates the training of neural networks as a dynamical system, analyzes the learning process, and presents an interpretable global view of trajectories in the loss landscape. Our approach uses the coarseness of topology to capture the granularity of geometry to mitigate against states of instability or elongated training. Overall, our analysis presents an empirical framework to extract the global dynamics of a model and to use that information to guide the training of neural networks. △ Less

Submitted 14 June, 2021; originally announced June 2021.

Comments: 9 pages, 3 figures, Supplementary

arXiv:2105.14986 [pdf]

Feasibility Assessment of Multitasking in MRI Neuroimaging Analysis: Tissue Segmentation, Cross-Modality Conversion and Bias correction

Authors: Mohammad Eslami, Solale Tabarestani, Malek Adjouadi

Abstract: Neuroimaging is essential in brain studies for the diagnosis and identification of disease, structure, and function of the brain in its healthy and disease states. Literature shows that there are advantages of multitasking with some deep learning (DL) schemes in challenging neuroimaging applications. This study examines the feasibility of using multitasking in three different applications, includi… ▽ More Neuroimaging is essential in brain studies for the diagnosis and identification of disease, structure, and function of the brain in its healthy and disease states. Literature shows that there are advantages of multitasking with some deep learning (DL) schemes in challenging neuroimaging applications. This study examines the feasibility of using multitasking in three different applications, including tissue segmentation, cross-modality conversion, and bias-field correction. These applications reflect five different scenarios in which multitasking is explored and 280 training and testing sessions conducted for empirical evaluations. Two well-known networks, U-Net as a well-known convolutional neural network architecture, and a closed architecture based on the conditional generative adversarial network are implemented. Different metrics such as the normalized cross-correlation coefficient and Dice scores are used for comparison of methods and results of the different experiments. Statistical analysis is also provided by paired t-test. The present study explores the pros and cons of these methods and their practical impacts on multitasking in different implementation scenarios. This investigation shows that bias correction and cross-modality conversion applications are significantly easier than the segmentation application, and having multitasking with segmentation is not reasonable if one of them is identified as the main target application. However, when the main application is the segmentation of tissues, multitasking with cross-modality conversion is beneficial, especially for the U-net architecture. △ Less

Submitted 31 May, 2021; originally announced May 2021.

arXiv:2105.12196 [pdf, other]

From Motor Control to Team Play in Simulated Humanoid Football

Authors: Siqi Liu, Guy Lever, Zhe Wang, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, Thore Graepel, Nicolas Heess

Abstract: Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents… ▽ More Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents. Recent research in artificial intelligence has shown the promise of learning-based approaches to the respective problems of complex movement, longer-term planning and multi-agent coordination. However, there is limited research aimed at their integration. We study this problem by training teams of physically simulated humanoid avatars to play football in a realistic virtual environment. We develop a method that combines imitation learning, single- and multi-agent reinforcement learning and population-based training, and makes use of transferable representations of behaviour for decision making at different levels of abstraction. In a sequence of stages, players first learn to control a fully articulated body to perform realistic, human-like movements such as running and turning; they then acquire mid-level football skills such as dribbling and shooting; finally, they develop awareness of others and play as a team, bridging the gap between low-level motor control at a timescale of milliseconds, and coordinated goal-directed behaviour as a team at the timescale of tens of seconds. We investigate the emergence of behaviours at different levels of abstraction, as well as the representations that underlie these behaviours using several analysis techniques, including statistics from real-world sports analytics. Our work constitutes a complete demonstration of integrated decision-making at multiple scales in a physically embodied multi-agent setting. See project video at https://youtu.be/KHMwq9pv7mg. △ Less

Submitted 25 May, 2021; originally announced May 2021.

arXiv:2105.02980 [pdf, other]

doi 10.1145/3479577

Everyday algorithm auditing: Understanding the power of everyday users in surfacing harmful algorithmic behaviors

Authors: Hong Shen, Alicia DeVos, Motahhare Eslami, Kenneth Holstein

Abstract: A growing body of literature has proposed formal approaches to audit algorithmic systems for biased and harmful behaviors. While formal auditing approaches have been greatly impactful, they often suffer major blindspots, with critical issues surfacing only in the context of everyday use once systems are deployed. Recent years have seen many cases in which everyday users of algorithmic systems dete… ▽ More A growing body of literature has proposed formal approaches to audit algorithmic systems for biased and harmful behaviors. While formal auditing approaches have been greatly impactful, they often suffer major blindspots, with critical issues surfacing only in the context of everyday use once systems are deployed. Recent years have seen many cases in which everyday users of algorithmic systems detect and raise awareness about harmful behaviors that they encounter in the course of their everyday interactions with these systems. However, to date little academic attention has been granted to these bottom-up, user-driven auditing processes. In this paper, we propose and explore the concept of everyday algorithm auditing, a process in which users detect, understand, and interrogate problematic machine behaviors via their day-to-day interactions with algorithmic systems. We argue that everyday users are powerful in surfacing problematic machine behaviors that may elude detection via more centrally-organized forms of auditing, regardless of users' knowledge about the underlying algorithms. We analyze several real-world cases of everyday algorithm auditing, drawing lessons from these cases for the design of future platforms and tools that facilitate such auditing behaviors. Finally, we discuss work that lies ahead, toward bridging the gaps between formal auditing approaches and the organic auditing behaviors that emerge in everyday use of algorithmic systems. △ Less

Submitted 24 August, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

Comments: To appear in CSCW 2021. The co-first authors and co-senior authors each contributed equally to this work

arXiv:2105.00162 [pdf, other]

Generative Art Using Neural Visual Grammars and Dual Encoders

Authors: Chrisantha Fernando, S. M. Ali Eslami, Jean-Baptiste Alayrac, Piotr Mirowski, Dylan Banarse, Simon Osindero

Abstract: Whilst there are perhaps only a few scientific methods, there seem to be almost as many artistic methods as there are artists. Artistic processes appear to inhabit the highest order of open-endedness. To begin to understand some of the processes of art making it is helpful to try to automate them even partially. In this paper, a novel algorithm for producing generative art is described which allow… ▽ More Whilst there are perhaps only a few scientific methods, there seem to be almost as many artistic methods as there are artists. Artistic processes appear to inhabit the highest order of open-endedness. To begin to understand some of the processes of art making it is helpful to try to automate them even partially. In this paper, a novel algorithm for producing generative art is described which allows a user to input a text string, and which in a creative response to this string, outputs an image which interprets that string. It does so by evolving images using a hierarchical neural Lindenmeyer system, and evaluating these images along the way using an image text dual encoder trained on billions of images and their associated text from the internet. In doing so we have access to and control over an instance of an artistic process, allowing analysis of which aspects of the artistic process become the task of the algorithm, and which elements remain the responsibility of the artist. △ Less

Submitted 3 May, 2021; v1 submitted 1 May, 2021; originally announced May 2021.

arXiv:2101.04230 [pdf, other]

Explaining the Black-box Smoothly- A Counterfactual Approach

Authors: Sumedha Singla, Motahhare Eslami, Brian Pollack, Stephen Wallace, Kayhan Batmanghelich

Abstract: We propose a BlackBox Counterfactual Explainer, designed to explain image classification models for medical applications. Classical approaches (e.g., saliency maps) that assess feature importance do not explain "how" imaging features in important anatomical regions are relevant to the classification decision. Our framework explains the decision for a target class by gradually "exaggerating" the se… ▽ More We propose a BlackBox Counterfactual Explainer, designed to explain image classification models for medical applications. Classical approaches (e.g., saliency maps) that assess feature importance do not explain "how" imaging features in important anatomical regions are relevant to the classification decision. Our framework explains the decision for a target class by gradually "exaggerating" the semantic effect of the class in a query image. We adopted a Generative Adversarial Network (GAN) to generate a progressive set of perturbations to a query image, such that the classification decision changes from its original class to its negation. We used counterfactual explanations from our framework to audit a classifier trained on a chest x-ray dataset with multiple labels. We proposed clinically-relevant quantitative metrics such as cardiothoracic ratio and the score of a healthy costophrenic recess to evaluate our explanations. We conducted a human-grounded experiment with diagnostic radiology residents to compare different styles of explanations (no explanation, saliency map, cycleGAN explanation, and our counterfactual explanation) by evaluating different aspects of explanations: (1) understandability, (2) classifier's decision justification, (3) visual quality, (d) identity preservation, and (5) overall helpfulness of an explanation to the users. Our results show that our counterfactual explanation was the only explanation method that significantly improved the users' understanding of the classifier's decision compared to the no-explanation baseline. Our metrics established a benchmark for evaluating model explanation methods in medical images. Our explanations revealed that the classifier relied on clinically relevant radiographic features for its diagnostic decisions, thus making its decision-making process more transparent to the end-user. △ Less

Submitted 18 November, 2022; v1 submitted 11 January, 2021; originally announced January 2021.

Comments: Preprint Accepted in Medical image Analysis journal

arXiv:2011.09192 [pdf, other]

Game Plan: What AI can do for Football, and What Football can do for AI

Authors: Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adria Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Perolat, Bart De Vylder , et al. (11 additional authors not shown)

Abstract: The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with t… ▽ More The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing new scientific challenges involved in the analysis of both individual players' and coordinated teams' behaviors. The research challenges associated with predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision. In this paper, we provide an overarching perspective highlighting how the combination of these fields, in particular, forms a unique microcosm for AI research, while offering mutual benefits for professional teams, spectators, and broadcasters in the years to come. We illustrate that this duality makes football analytics a game changer of tremendous value, in terms of not only changing the game of football itself, but also in terms of what this domain can mean for the field of AI. We review the state-of-the-art and exemplify the types of analysis enabled by combining the aforementioned fields, including illustrative examples of counterfactual analysis using predictive models, and the combination of game-theoretic analysis of penalty kicks with statistical learning of player attributes. We conclude by highlighting envisioned downstream impacts, including possibilities for extensions to other sports (real and virtual). △ Less

Submitted 18 November, 2020; originally announced November 2020.

arXiv:2007.05566 [pdf, other]

Contrastive Training for Improved Out-of-Distribution Detection

Authors: Jim Winkens, Rudy Bunel, Abhijit Guha Roy, Robert Stanforth, Vivek Natarajan, Joseph R. Ledsam, Patricia MacWilliams, Pushmeet Kohli, Alan Karthikesalingam, Simon Kohl, Taylan Cemgil, S. M. Ali Eslami, Olaf Ronneberger

Abstract: Reliable detection of out-of-distribution (OOD) inputs is increasingly understood to be a precondition for deployment of machine learning systems. This paper proposes and investigates the use of contrastive training to boost OOD detection performance. Unlike leading methods for OOD detection, our approach does not require access to examples labeled explicitly as OOD, which can be difficult to coll… ▽ More Reliable detection of out-of-distribution (OOD) inputs is increasingly understood to be a precondition for deployment of machine learning systems. This paper proposes and investigates the use of contrastive training to boost OOD detection performance. Unlike leading methods for OOD detection, our approach does not require access to examples labeled explicitly as OOD, which can be difficult to collect in practice. We show in extensive experiments that contrastive training significantly helps OOD detection performance on a number of common benchmarks. By introducing and employing the Confusion Log Probability (CLP) score, which quantifies the difficulty of the OOD detection task by capturing the similarity of inlier and outlier datasets, we show that our method especially improves performance in the `near OOD' classes -- a particularly challenging setting for previous methods. △ Less

Submitted 10 July, 2020; originally announced July 2020.

Showing 1–50 of 80 results for author: Eslami, M