-
Dynamic-Dark SLAM: RGB-Thermal Cooperative Robot Vision Strategy for Multi-Person Tracking in Both Well-Lit and Low-Light Scenes
Authors:
Tatsuro Sakai,
Kanji Tanaka,
Jonathan Tay Yu Liang,
Muhammad Adil Luqman,
Daiki Iwata
Abstract:
In robot vision, thermal cameras hold great potential for recognizing humans even in complete darkness. However, their application to multi-person tracking (MPT) has been limited due to data scarcity and the inherent difficulty of distinguishing individuals. In this study, we propose a cooperative MPT system that utilizes co-located RGB and thermal cameras, where pseudo-annotations (bounding boxes…
▽ More
In robot vision, thermal cameras hold great potential for recognizing humans even in complete darkness. However, their application to multi-person tracking (MPT) has been limited due to data scarcity and the inherent difficulty of distinguishing individuals. In this study, we propose a cooperative MPT system that utilizes co-located RGB and thermal cameras, where pseudo-annotations (bounding boxes and person IDs) are used to train both RGB and thermal trackers. Evaluation experiments demonstrate that the thermal tracker performs robustly in both bright and dark environments. Moreover, the results suggest that a tracker-switching strategy -- guided by a binary brightness classifier -- is more effective for information integration than a tracker-fusion approach. As an application example, we present an image change pattern recognition (ICPR) method, the ``human-as-landmark,'' which combines two key properties: the thermal recognizability of humans in dark environments and the rich landmark characteristics -- appearance, geometry, and semantics -- of static objects (occluders). Whereas conventional SLAM focuses on mapping static landmarks in well-lit environments, the present study takes a first step toward a new Human-Only SLAM paradigm, ``DD-SLAM,'' which aims to map even dynamic landmarks in complete darkness.
△ Less
Submitted 13 April, 2025; v1 submitted 16 March, 2025;
originally announced March 2025.
-
Continual Multi-Robot Learning from Black-Box Visual Place Recognition Models
Authors:
Kenta Tsukahara,
Kanji Tanaka,
Daiki Iwata,
Jonathan Tay Yu Liang
Abstract:
In the context of visual place recognition (VPR), continual learning (CL) techniques offer significant potential for avoiding catastrophic forgetting when learning new places. However, existing CL methods often focus on knowledge transfer from a known model to a new one, overlooking the existence of unknown black-box models. We explore a novel multi-robot CL approach that enables knowledge transfe…
▽ More
In the context of visual place recognition (VPR), continual learning (CL) techniques offer significant potential for avoiding catastrophic forgetting when learning new places. However, existing CL methods often focus on knowledge transfer from a known model to a new one, overlooking the existence of unknown black-box models. We explore a novel multi-robot CL approach that enables knowledge transfer from black-box VPR models (teachers), such as those of local robots encountered by traveler robots (students) in unknown environments. Specifically, we introduce Membership Inference Attack, or MIA, the only major privacy attack applicable to black-box models, and leverage it to reconstruct pseudo training sets, which serve as the key knowledge to be exchanged between robots, from black-box VPR models. Furthermore, we aim to overcome the inherently low sampling efficiency of MIA by leveraging insights on place class prediction distribution and un-learned class detection imported from the VPR literature as a prior distribution. We also analyze both the individual effects of these methods and their combined impact. Experimental results demonstrate that our black-box MIA (BB-MIA) approach is remarkably powerful despite its simplicity, significantly enhancing the VPR capability of lower-performing robots through brief communication with other robots. This study contributes to optimizing knowledge sharing between robots in VPR and enhancing autonomy in open-world environments with multi-robot systems that are fault-tolerant and scalable.
△ Less
Submitted 3 March, 2025;
originally announced March 2025.
-
TableTalk: Scaffolding Spreadsheet Development with a Language Agent
Authors:
Jenny T. Liang,
Aayush Kumar,
Yasharth Bajpai,
Sumit Gulwani,
Vu Le,
Chris Parnin,
Arjun Radhakrishna,
Ashish Tiwari,
Emerson Murphy-Hill,
Guastavo Soares
Abstract:
Despite its ubiquity in the workforce, spreadsheet programming remains challenging as programmers need both spreadsheet-specific knowledge (e.g., APIs to write formulas) and problem-solving skills to create complex spreadsheets. Large language models (LLMs) can help automate aspects of this process, and recent advances in planning and reasoning have enabled language agents, which dynamically plan,…
▽ More
Despite its ubiquity in the workforce, spreadsheet programming remains challenging as programmers need both spreadsheet-specific knowledge (e.g., APIs to write formulas) and problem-solving skills to create complex spreadsheets. Large language models (LLMs) can help automate aspects of this process, and recent advances in planning and reasoning have enabled language agents, which dynamically plan, use tools, and take iterative actions to complete complex tasks. These agents observe, plan, and act, making them well-suited to scaffold spreadsheet programming by following expert processes.
We present TableTalk, a language agent that helps programmers build spreadsheets conversationally. Its design reifies three design principles -- scaffolding, flexibility, and incrementality -- which we derived from two studies of seven programmers and 62 Excel templates. TableTalk structures spreadsheet development by generating step-by-step plans and suggesting three next steps users can choose from. It also integrates tools that enable incremental spreadsheet construction. A user study with 20 programmers shows that TableTalk produces spreadsheets 2.3 times more likely to be preferred over a baseline agent, while reducing cognitive load and time spent reasoning about spreadsheet actions by 12.6%. TableTalk's approach has implications for human-agent collaboration. This includes providing persistent direct manipulation interfaces for stopping or undoing agent actions, while ensuring that such interfaces for accepting actions can be deactivated.
△ Less
Submitted 13 February, 2025;
originally announced February 2025.
-
How Developers Choose Debugging Strategies for Challenging Web Application Defects
Authors:
Maryam Arab,
Jenny T. Liang,
Valentina Hong,
Thomas D. LaToza
Abstract:
Effective debugging is a crucial aspect of software development, demanding problem-solving skills, expertise, and appropriate tools. Although previous research has studied expert developers' debugging strategies, the specific factors influencing strategy choice in complex scenarios remain underexplored. To investigate these contextual factors, we conducted two studies. First, we surveyed 35 develo…
▽ More
Effective debugging is a crucial aspect of software development, demanding problem-solving skills, expertise, and appropriate tools. Although previous research has studied expert developers' debugging strategies, the specific factors influencing strategy choice in complex scenarios remain underexplored. To investigate these contextual factors, we conducted two studies. First, we surveyed 35 developers to identify experiences with challenging debugging problems and contextual complexities. Second, we held semi-structured interviews with 16 experienced developers to gain deeper insight into strategic reasoning for complex debugging tasks. Insights from both groups enriched our understanding of debugging strategies at different expertise levels. We found that contextual factors interact in complex ways, and combinations of factors influence strategy choice, evolving throughout the debugging process. Hypothesis making is the baseline for debugging, with experience and code familiarity crucial for strategy selection. Our results show a gap between learning and effectively practicing strategies in challenging contexts, highlighting the need for carefully designed debugging tools and educational frameworks that align with problem contexts.
△ Less
Submitted 20 January, 2025;
originally announced January 2025.
-
Prompts Are Programs Too! Understanding How Developers Build Software Containing Prompts
Authors:
Jenny T. Liang,
Melissa Lin,
Nikitha Rao,
Brad A. Myers
Abstract:
Generative pre-trained models power intelligent software features used by millions of users controlled by developer-written natural language prompts. Despite the impact of prompt-powered software, little is known about its development process and its relationship to programming. In this work, we argue that some prompts are programs and that the development of prompts is a distinct phenomenon in pr…
▽ More
Generative pre-trained models power intelligent software features used by millions of users controlled by developer-written natural language prompts. Despite the impact of prompt-powered software, little is known about its development process and its relationship to programming. In this work, we argue that some prompts are programs and that the development of prompts is a distinct phenomenon in programming known as "prompt programming". We develop an understanding of prompt programming using Straussian grounded theory through interviews with 20 developers engaged in prompt development across a variety of contexts, models, domains, and prompt structures. We contribute 15 observations to form a preliminary understanding of current prompt programming practices. For example, rather than building mental models of code, prompt programmers develop mental models of the foundation model (FM)'s behavior on the prompt by interacting with the FM. While prior research shows that experts have well-formed mental models, we find that prompt programmers who have developed dozens of prompts still struggle to develop reliable mental models. Our observations show that prompt programming differs from traditional software development, motivating the creation of prompt programming tools and providing implications for software engineering stakeholders.
△ Less
Submitted 24 April, 2025; v1 submitted 18 September, 2024;
originally announced September 2024.
-
Counterspeakers' Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate
Authors:
Jimin Mun,
Cathy Buerger,
Jenny T. Liang,
Joshua Garland,
Maarten Sap
Abstract:
Counterspeech, i.e., direct responses against hate speech, has become an important tool to address the increasing amount of hate online while avoiding censorship. Although AI has been proposed to help scale up counterspeech efforts, this raises questions of how exactly AI could assist in this process, since counterspeech is a deeply empathetic and agentic process for those involved. In this work,…
▽ More
Counterspeech, i.e., direct responses against hate speech, has become an important tool to address the increasing amount of hate online while avoiding censorship. Although AI has been proposed to help scale up counterspeech efforts, this raises questions of how exactly AI could assist in this process, since counterspeech is a deeply empathetic and agentic process for those involved. In this work, we aim to answer this question, by conducting in-depth interviews with 10 extensively experienced counterspeakers and a large scale public survey with 342 everyday social media users. In participant responses, we identified four main types of barriers and AI needs related to resources, training, impact, and personal harms. However, our results also revealed overarching concerns of authenticity, agency, and functionality in using AI tools for counterspeech. To conclude, we discuss considerations for designing AI assistants that lower counterspeaking barriers without jeopardizing its meaning and purpose.
△ Less
Submitted 29 February, 2024;
originally announced March 2024.
-
Can GPT-4 Replicate Empirical Software Engineering Research?
Authors:
Jenny T. Liang,
Carmen Badea,
Christian Bird,
Robert DeLine,
Denae Ford,
Nicole Forsgren,
Thomas Zimmermann
Abstract:
Empirical software engineering research on production systems has brought forth a better understanding of the software engineering process for practitioners and researchers alike. However, only a small subset of production systems is studied, limiting the impact of this research. While software engineering practitioners could benefit from replicating research on their own data, this poses its own…
▽ More
Empirical software engineering research on production systems has brought forth a better understanding of the software engineering process for practitioners and researchers alike. However, only a small subset of production systems is studied, limiting the impact of this research. While software engineering practitioners could benefit from replicating research on their own data, this poses its own set of challenges, since performing replications requires a deep understanding of research methodologies and subtle nuances in software engineering data. Given that large language models (LLMs), such as GPT-4, show promise in tackling both software engineering- and science-related tasks, these models could help replicate and thus democratize empirical software engineering research.
In this paper, we examine GPT-4's abilities to perform replications of empirical software engineering research on new data. We study their ability to surface assumptions made in empirical software engineering research methodologies, as well as their ability to plan and generate code for analysis pipelines on seven empirical software engineering papers. We perform a user study with 14 participants with software engineering research expertise, who evaluate GPT-4-generated assumptions and analysis plans (i.e., a list of module specifications) from the papers. We find that GPT-4 is able to surface correct assumptions, but struggles to generate ones that apply common knowledge about software engineering data. In a manual analysis of the generated code, we find that the GPT-4-generated code contains correct high-level logic, given a subset of the methodology. However, the code contains many small implementation-level errors, reflecting a lack of software engineering knowledge. Our findings have implications for leveraging LLMs for software engineering research as well as practitioner data scientists in software teams.
△ Less
Submitted 19 June, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Walking = Traversable? : Traversability Prediction via Multiple Human Object Tracking under Occlusion
Authors:
Jonathan Tay Yu Liang,
Kanji Tanaka
Abstract:
The emerging ``Floor plan from human trails (PfH)" technique has great potential for improving indoor robot navigation by predicting the traversability of occluded floors. This study presents an innovative approach that replaces first-person-view sensors with a third-person-view monocular camera mounted on the observer robot. This approach can gather measurements from multiple humans, expanding it…
▽ More
The emerging ``Floor plan from human trails (PfH)" technique has great potential for improving indoor robot navigation by predicting the traversability of occluded floors. This study presents an innovative approach that replaces first-person-view sensors with a third-person-view monocular camera mounted on the observer robot. This approach can gather measurements from multiple humans, expanding its range of applications. The key idea is to use two types of trackers, SLAM and MOT, to monitor stationary objects and moving humans and assess their interactions. This method achieves stable predictions of traversability even in challenging visual scenarios, such as occlusions, nonlinear perspectives, depth uncertainty, and intersections involving multiple humans. Additionally, we extend map quality metrics to apply to traversability maps, facilitating future research. We validate our proposed method through fusion and comparison with established techniques.
△ Less
Submitted 29 September, 2023;
originally announced October 2023.
-
Active Robot Vision for Distant Object Change Detection: A Lightweight Training Simulator Inspired by Multi-Armed Bandits
Authors:
Kouki Terashima,
Kanji Tanaka,
Ryogo Yamamoto,
Jonathan Tay Yu Liang
Abstract:
In ground-view object change detection, the recently emerging mapless navigation has great potential to navigate a robot to objects distantly detected (e.g., books, cups, clothes) and acquire high-resolution object images, to identify their change states (no-change/appear/disappear). However, naively performing full journeys for every distant object requires huge sense/plan/action costs, proportio…
▽ More
In ground-view object change detection, the recently emerging mapless navigation has great potential to navigate a robot to objects distantly detected (e.g., books, cups, clothes) and acquire high-resolution object images, to identify their change states (no-change/appear/disappear). However, naively performing full journeys for every distant object requires huge sense/plan/action costs, proportional to the number of objects and the robot-to-object distance. To address this issue, we explore a new map-based active vision problem in this work: ``Which journey should the robot select next?" However, the feasibility of the active vision framework remains unclear; Since distant objects are only uncertainly recognized, it is unclear whether they can provide sufficient cues for action planning. This work presents an efficient simulator for feasibility testing, to accelerate the early-stage R&D cycles (e.g., prototyping, training, testing, and evaluation). The proposed simulator is designed to identify the degree of difficulty that a robot vision system (sensors/recognizers/planners/actuators) would face when applied to a given environment (workspace/objects). Notably, it requires only one real-world journey experience per distant object to function, making it suitable for an efficient R&D cycle. Another contribution of this work is to present a new lightweight planner inspired by the traditional multi-armed bandit problem. Specifically, we build a lightweight map-based planner on top of the mapless planner, which constitutes a hierarchical action planner. We verified the effectiveness of the proposed framework using a semantically non-trivial scenario ``sofa as bookshelf".
△ Less
Submitted 24 October, 2023; v1 submitted 26 July, 2023;
originally announced July 2023.
-
LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs
Authors:
Tongshuang Wu,
Haiyi Zhu,
Maya Albayrak,
Alexis Axon,
Amanda Bertsch,
Wenxing Deng,
Ziqi Ding,
Bill Guo,
Sireesh Gururaja,
Tzu-Sheng Kuo,
Jenny T. Liang,
Ryan Liu,
Ihita Mandal,
Jeremiah Milbauer,
Xiaolin Ni,
Namrata Padmanabhan,
Subhashini Ramkumar,
Alexis Sudjianto,
Jordan Taylor,
Ying-Jui Tseng,
Patricia Vaidos,
Zhijin Wu,
Wei Wu,
Chenyang Yang
Abstract:
LLMs have shown promise in replicating human-like behavior in crowdsourcing tasks that were previously thought to be exclusive to human abilities. However, current efforts focus mainly on simple atomic tasks. We explore whether LLMs can replicate more complex crowdsourcing pipelines. We find that modern LLMs can simulate some of crowdworkers' abilities in these ``human computation algorithms,'' bu…
▽ More
LLMs have shown promise in replicating human-like behavior in crowdsourcing tasks that were previously thought to be exclusive to human abilities. However, current efforts focus mainly on simple atomic tasks. We explore whether LLMs can replicate more complex crowdsourcing pipelines. We find that modern LLMs can simulate some of crowdworkers' abilities in these ``human computation algorithms,'' but the level of success is variable and influenced by requesters' understanding of LLM capabilities, the specific skills required for sub-tasks, and the optimal interaction modality for performing these sub-tasks. We reflect on human and LLMs' different sensitivities to instructions, stress the importance of enabling human-facing safeguards for LLMs, and discuss the potential of training humans and LLMs with complementary skill sets. Crucially, we show that replicating crowdsourcing pipelines offers a valuable platform to investigate 1) the relative LLM strengths on different tasks (by cross-comparing their performances on sub-tasks) and 2) LLMs' potential in complex tasks, where they can complete part of the tasks while leaving others to humans.
△ Less
Submitted 8 January, 2025; v1 submitted 19 July, 2023;
originally announced July 2023.
-
NLPositionality: Characterizing Design Biases of Datasets and Models
Authors:
Sebastin Santy,
Jenny T. Liang,
Ronan Le Bras,
Katharina Reinecke,
Maarten Sap
Abstract:
Design biases in NLP systems, such as performance differences for different populations, often stem from their creator's positionality, i.e., views and lived experiences shaped by identity and background. Despite the prevalence and risks of design biases, they are hard to quantify because researcher, system, and dataset positionality is often unobserved. We introduce NLPositionality, a framework f…
▽ More
Design biases in NLP systems, such as performance differences for different populations, often stem from their creator's positionality, i.e., views and lived experiences shaped by identity and background. Despite the prevalence and risks of design biases, they are hard to quantify because researcher, system, and dataset positionality is often unobserved. We introduce NLPositionality, a framework for characterizing design biases and quantifying the positionality of NLP datasets and models. Our framework continuously collects annotations from a diverse pool of volunteer participants on LabintheWild, and statistically quantifies alignment with dataset labels and model predictions. We apply NLPositionality to existing datasets and models for two tasks -- social acceptability and hate speech detection. To date, we have collected 16,299 annotations in over a year for 600 instances from 1,096 annotators across 87 countries. We find that datasets and models align predominantly with Western, White, college-educated, and younger populations. Additionally, certain groups, such as non-binary people and non-native English speakers, are further marginalized by datasets and models as they rank least in alignment across all tasks. Finally, we draw from prior literature to discuss how researchers can examine their own positionality and that of their datasets and models, opening the door for more inclusive NLP systems.
△ Less
Submitted 2 June, 2023;
originally announced June 2023.
-
A Large-Scale Survey on the Usability of AI Programming Assistants: Successes and Challenges
Authors:
Jenny T. Liang,
Chenyang Yang,
Brad A. Myers
Abstract:
The software engineering community recently has witnessed widespread deployment of AI programming assistants, such as GitHub Copilot. However, in practice, developers do not accept AI programming assistants' initial suggestions at a high frequency. This leaves a number of open questions related to the usability of these tools. To understand developers' practices while using these tools and the imp…
▽ More
The software engineering community recently has witnessed widespread deployment of AI programming assistants, such as GitHub Copilot. However, in practice, developers do not accept AI programming assistants' initial suggestions at a high frequency. This leaves a number of open questions related to the usability of these tools. To understand developers' practices while using these tools and the important usability challenges they face, we administered a survey to a large population of developers and received responses from a diverse set of 410 developers. Through a mix of qualitative and quantitative analyses, we found that developers are most motivated to use AI programming assistants because they help developers reduce key-strokes, finish programming tasks quickly, and recall syntax, but resonate less with using them to help brainstorm potential solutions. We also found the most important reasons why developers do not use these tools are because these tools do not output code that addresses certain functional or non-functional requirements and because developers have trouble controlling the tool to generate the desired output. Our findings have implications for both creators and users of AI programming assistants, such as designing minimal cognitive effort interactions with these tools to reduce distractions for users while they are programming.
△ Less
Submitted 17 September, 2023; v1 submitted 29 March, 2023;
originally announced March 2023.
-
A Qualitative Study on the Implementation Design Decisions of Developers
Authors:
Jenny T. Liang,
Maryam Arab,
Minhyuk Ko,
Amy J. Ko,
Thomas D. LaToza
Abstract:
Decision-making is a key software engineering skill. Developers constantly make choices throughout the software development process, from requirements to implementation. While prior work has studied developer decision-making, the choices made while choosing what solution to write in code remain understudied. In this mixed-methods study, we examine the phenomenon where developers select one specifi…
▽ More
Decision-making is a key software engineering skill. Developers constantly make choices throughout the software development process, from requirements to implementation. While prior work has studied developer decision-making, the choices made while choosing what solution to write in code remain understudied. In this mixed-methods study, we examine the phenomenon where developers select one specific way to implement a behavior in code, given many potential alternatives. We call these decisions implementation design decisions. Our mixed-methods study includes 46 survey responses and 14 semi-structured interviews with professional developers about their decision types, considerations, processes, and expertise for implementation design decisions. We find that implementation design decisions, rather than being a natural outcome from higher levels of design, require constant monitoring of higher level design choices, such as requirements and architecture. We also show that developers have a consistent general structure to their implementation decision-making process, but no single process is exactly the same. We discuss the implications of our findings on research, education, and practice, including insights on teaching developers how to make implementation design decisions.
△ Less
Submitted 23 January, 2023;
originally announced January 2023.
-
Understanding Skills for OSS Communities on GitHub
Authors:
Jenny T. Liang,
Thomas Zimmermann,
Denae Ford
Abstract:
The development of open source software (OSS) is a broad field which requires diverse skill sets. For example, maintainers help lead the project and promote its longevity, technical writers assist with documentation, bug reporters identify defects in software, and developers program the software. However, it is unknown which skills are used in OSS development as well as OSS contributors' general a…
▽ More
The development of open source software (OSS) is a broad field which requires diverse skill sets. For example, maintainers help lead the project and promote its longevity, technical writers assist with documentation, bug reporters identify defects in software, and developers program the software. However, it is unknown which skills are used in OSS development as well as OSS contributors' general attitudes towards skills in OSS. In this paper, we address this gap by administering a survey to a diverse set of 455 OSS contributors. Guided by these responses as well as prior literature on software development expertise and social factors of OSS, we develop a model of skills in OSS that considers the many contexts OSS contributors work in. This model has 45 skills in the following 9 categories: technical skills, working styles, problem solving, contribution types, project-specific skills, interpersonal skills, external relations, management, and characteristics. Through a mix of qualitative and quantitative analyses, we find that OSS contributors are actively motivated to improve skills and perceive many benefits in sharing their skills with others. We then use this analysis to derive a set of design implications and best practices for those who incorporate skills into OSS tools and platforms, such as GitHub.
△ Less
Submitted 6 September, 2022;
originally announced September 2022.
-
Towards Mining OSS Skills from GitHub Activity
Authors:
Jenny T. Liang,
Thomas Zimmermann,
Denae Ford
Abstract:
Open source software (OSS) development relies on diverse skill sets. However, to our knowledge, there are no tools which detect OSS-related skills. In this paper, we present a novel method to detect OSS skills and prototype it in a tool called Disko. Our approach relies on identifying relevant signals, which are measurable activities or cues associated with a skill. Our tool detects how contributo…
▽ More
Open source software (OSS) development relies on diverse skill sets. However, to our knowledge, there are no tools which detect OSS-related skills. In this paper, we present a novel method to detect OSS skills and prototype it in a tool called Disko. Our approach relies on identifying relevant signals, which are measurable activities or cues associated with a skill. Our tool detects how contributors 1) teach others to be involved in OSS projects, 2) show commitment towards an OSS project, 3) have knowledge in specific programming languages, and 4) are familiar with OSS practices. We then evaluate the tool by administering a survey to 455 OSS contributors. We demonstrate that Disko yields promising results: it detects the presence of these skills with precision scores between 77% to 97%. We also find that over 54% of participants would display their high-proficiency skills. Our approach can be used to transform existing OSS experiences, such as identifying collaborators, matching mentors to mentees, and assigning project roles. Given the positive results and potential impact of our approach, we outline future research opportunities in interpreting and sharing OSS skills.
△ Less
Submitted 3 March, 2022;
originally announced March 2022.