-
Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder
Authors:
Wonwoong Cho,
Yan-Ying Chen,
Matthew Klenk,
David I. Inouye,
Yanxia Zhang
Abstract:
Text-to-Image (T2I) Diffusion Models have achieved remarkable performance in generating high quality images. However, enabling precise control of continuous attributes, especially multiple attributes simultaneously, in a new domain (e.g., numeric values like eye openness or car width) with text-only guidance remains a significant challenge. To address this, we introduce the Attribute (Att) Adapter, a novel plug-and-play module designed to enable fine-grained, multi-attributes control in pretrained diffusion models. Our approach learns a single control adapter from a set of sample images that can be unpaired and contain multiple visual attributes. The Att-Adapter leverages the decoupled cross attention module to naturally harmonize the multiple domain attributes with text conditioning. We further introduce Conditional Variational Autoencoder (CVAE) to the Att-Adapter to mitigate overfitting, matching the diverse nature of the visual world. Evaluations on two public datasets show that Att-Adapter outperforms all LoRA-based baselines in controlling continuous attributes. Additionally, our method enables a broader control range and also improves disentanglement across multiple attributes, surpassing StyleGAN-based techniques. Notably, Att-Adapter is flexible, requiring no paired synthetic data for training, and is easily scalable to multiple attributes within a single model.
Submitted 1 April, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.
-
ConjointNet: Enhancing Conjoint Analysis for Preference Prediction with Representation Learning
Authors:
Yanxia Zhang,
Francine Chen,
Shabnam Hakimi,
Totte Harinen,
Alex Filipowicz,
Yan-Ying Chen,
Rumen Iliev,
Nikos Arechiga,
Kalani Murakami,
Kent Lyons,
Charlene Wu,
Matt Klenk
Abstract:
Understanding consumer preferences is essential to product design and predicting market response to these new products. Choice-based conjoint analysis is widely used to model user preferences using their choices in surveys. However, traditional conjoint estimation techniques assume simple linear models. This assumption may lead to limited predictability and inaccurate estimation of product attribute contributions, especially on data that has underlying non-linear relationships. In this work, we employ representation learning to efficiently alleviate this issue. We propose ConjointNet, which is composed of two novel neural architectures, to predict user preferences. We demonstrate that the proposed ConjointNet models outperform traditional conjoint estimation techniques on two preference datasets by over 5%, and offer insights into non-linear feature interactions.
Submitted 12 March, 2025;
originally announced March 2025.
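As a point of reference for the "simple linear models" that the ConjointNet abstract contrasts itself with, the sketch below shows a multinomial logit choice rule over linear utilities, a common formulation for choice-based conjoint analysis. It is not taken from the paper: the function name `choice_probabilities`, the attribute encoding, and the part-worth values are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def choice_probabilities(
    alternatives: Sequence[Sequence[float]],
    part_worths: Sequence[float],
) -> List[float]:
    """Multinomial logit over linear utilities (illustrative baseline only).

    Each alternative is a vector of attribute levels; its utility is the
    dot product with the part-worths. Probabilities are a softmax of the
    utilities across the alternatives in one choice task.
    """
    utilities = [
        sum(w * x for w, x in zip(part_worths, alt)) for alt in alternatives
    ]
    # Subtract the maximum utility for numerical stability.
    top = max(utilities)
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical choice task: two products described by (price, quality).
example = choice_probabilities([(1.0, 0.0), (0.0, 1.0)], part_worths=(-0.5, 1.0))
```

Because the utility is linear in the attributes, a model of this form cannot represent interactions between attributes unless they are added by hand, which is the limitation the abstract points to.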
-
Ethics of generative AI and manipulation: a design-oriented research agenda
Authors:
Michael Klenk
Abstract:
Generative AI enables automated, effective manipulation at scale. Despite the growing general ethical discussion around generative AI, the specific manipulation risks remain inadequately investigated. This article outlines essential inquiries encompassing conceptual, empirical, and design dimensions of manipulation, pivotal for comprehending and curbing manipulation risks. By highlighting these questions, the article underscores the necessity of an appropriate conceptualisation of manipulation to ensure the responsible development of Generative AI technologies.
Submitted 1 February, 2025;
originally announced March 2025.
-
Stylish and Functional: Guided Interpolation Subject to Physical Constraints
Authors:
Yan-Ying Chen,
Nikos Arechiga,
Chenyang Yuan,
Matthew Hong,
Matt Klenk,
Charlene Wu
Abstract:
Generative AI is revolutionizing engineering design practices by enabling rapid prototyping and manipulation of designs. One example of design manipulation involves taking two reference design images and using them as prompts to generate a design image that combines aspects of both. Real engineering designs have physical constraints and functional requirements in addition to aesthetic design considerations. Internet-scale foundation models commonly used for image generation, however, are unable to take these physical constraints and functional requirements into consideration as part of the generation process. We consider the problem of generating a design inspired by two input designs, and propose a zero-shot framework toward enforcing physical, functional requirements over the generation process by leveraging a pretrained diffusion model as the backbone. As a case study, we consider the example of rotational symmetry in generation of wheel designs. Automotive wheels are required to be rotationally symmetric for physical stability. We formulate the requirement of rotational symmetry by the use of a symmetrizer, and we use this symmetrizer to guide the diffusion process towards symmetric wheel generations. Our experimental results find that the proposed approach makes generated interpolations with higher realism than methods in related work, as evaluated by Fréchet inception distance (FID). We also find that our approach generates designs that more closely satisfy physical and functional requirements than generating without the symmetry guidance.
Submitted 19 December, 2024;
originally announced December 2024.
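The "Stylish and Functional" abstract above describes a symmetrizer that guides generation toward rotationally symmetric wheels. The abstract does not define it, so the following is only a minimal sketch of one way such an operator could look, assuming 4-fold symmetry and a square image so that NumPy's exact quarter-turn rotation can be used. The names `symmetrize_4fold` and `symmetry_loss` are hypothetical.

```python
import numpy as np


def symmetrize_4fold(image: np.ndarray) -> np.ndarray:
    """Average a square image over its four quarter-turn rotations.

    The result is unchanged by a 90-degree rotation. This is an assumed,
    simplified stand-in for the paper's symmetrizer, not its definition.
    """
    if image.shape[0] != image.shape[1]:
        raise ValueError("expects a square image (height == width)")
    rotations = [np.rot90(image, k) for k in range(4)]
    return np.mean(rotations, axis=0)


def symmetry_loss(image: np.ndarray) -> float:
    """Mean squared distance between an image and its symmetrized version."""
    return float(np.mean((image - symmetrize_4fold(image)) ** 2))


# An asymmetric toy "image" has a positive loss; its symmetrized version has none.
toy = np.arange(16.0).reshape(4, 4)
toy_loss = symmetry_loss(toy)
symmetric_loss = symmetry_loss(symmetrize_4fold(toy))
```

A loss of this kind is differentiable in the image, so in principle its gradient could serve as a guidance signal during sampling; whether the paper uses this exact form is not stated in the abstract.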
-
Parametric-ControlNet: Multimodal Control in Foundation Models for Precise Engineering Design Synthesis
Authors:
Rui Zhou,
Yanxia Zhang,
Chenyang Yuan,
Frank Permenter,
Nikos Arechiga,
Matt Klenk,
Faez Ahmed
Abstract:
This paper introduces a generative model designed for multimodal control over text-to-image foundation generative AI models such as Stable Diffusion, specifically tailored for engineering design synthesis. Our model proposes parametric, image, and text control modalities to enhance design precision and diversity. Firstly, it handles both partial and complete parametric inputs using a diffusion model that acts as a design autocomplete co-pilot, coupled with a parametric encoder to process the information. Secondly, the model utilizes assembly graphs to systematically assemble input component images, which are then processed through a component encoder to capture essential visual data. Thirdly, textual descriptions are integrated via CLIP encoding, ensuring a comprehensive interpretation of design intent. These diverse inputs are synthesized through a multimodal fusion technique, creating a joint embedding that acts as the input to a module inspired by ControlNet. This integration allows the model to apply robust multimodal control to foundation models, facilitating the generation of complex and precise engineering designs. This approach broadens the capabilities of AI-driven design tools and demonstrates significant advancements in precise control based on diverse data modalities for enhanced design generation.
Submitted 5 December, 2024;
originally announced December 2024.
-
Understanding the Cognitive Complexity in Language Elicited by Product Images
Authors:
Yan-Ying Chen,
Shabnam Hakimi,
Monica Van,
Francine Chen,
Matthew Hong,
Matt Klenk,
Charlene Wu
Abstract:
Product images (e.g., a phone) can be used to elicit a diverse set of consumer-reported features expressed through language, including surface-level perceptual attributes (e.g., "white") and more complex ones, like perceived utility (e.g., "battery"). The cognitive complexity of elicited language reveals the nature of cognitive processes and the context required to understand them; cognitive complexity also predicts consumers' subsequent choices. This work offers an approach for measuring and validating the cognitive complexity of human language elicited by product images, providing a tool for understanding the cognitive processes of human as well as virtual respondents simulated by Large Language Models (LLMs). We also introduce a large dataset that includes diverse descriptive labels for product images, including human-rated complexity. We demonstrate that human-rated cognitive complexity can be approximated using a set of natural language models that, combined, roughly capture the complexity construct. Moreover, this approach is minimally supervised and scalable, even in use cases with limited human assessment of complexity.
Submitted 24 September, 2024;
originally announced September 2024.
-
Bridging Design Gaps: A Parametric Data Completion Approach With Graph Guided Diffusion Models
Authors:
Rui Zhou,
Chenyang Yuan,
Frank Permenter,
Yanxia Zhang,
Nikos Arechiga,
Matt Klenk,
Faez Ahmed
Abstract:
This study introduces a generative imputation model leveraging graph attention networks and tabular diffusion models for completing missing parametric data in engineering designs. This model functions as an AI design co-pilot, providing multiple design options for incomplete designs, which we demonstrate using the bicycle design CAD dataset. Through comparative evaluations, we demonstrate that our model significantly outperforms existing classical methods, such as MissForest, hotDeck, PPCA, and tabular generative method TabCSDI in both the accuracy and diversity of imputation options. Generative modeling also enables a broader exploration of design possibilities, thereby enhancing design decision-making by allowing engineers to explore a variety of design completions. The graph model combines GNNs with the structural information contained in assembly graphs, enabling the model to understand and predict the complex interdependencies between different design parameters. The graph model helps accurately capture and impute complex parametric interdependencies from an assembly graph, which is key for design problems. By learning from an existing dataset of designs, the imputation capability allows the model to act as an intelligent assistant that autocompletes CAD designs based on user-defined partial parametric design, effectively bridging the gap between ideation and realization. The proposed work provides a pathway to not only facilitate informed design decisions but also promote creative exploration in design.
Submitted 17 June, 2024;
originally announced June 2024.
-
The Ethics of Advanced AI Assistants
Authors:
Iason Gabriel,
Arianna Manzini,
Geoff Keeling,
Lisa Anne Hendricks,
Verena Rieser,
Hasan Iqbal,
Nenad Tomašev,
Ira Ktena,
Zachary Kenton,
Mikel Rodriguez,
Seliem El-Sayed,
Sasha Brown,
Canfer Akbulut,
Andrew Trask,
Edward Hughes,
A. Stevie Bergman,
Renee Shelby,
Nahema Marchal,
Conor Griffin,
Juan Mateos-Garcia,
Laura Weidinger,
Winnie Street,
Benjamin Lange,
Alex Ingerman,
Alison Lentz, et al. (32 additional authors not shown)
Abstract:
This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. It then explores questions around AI value alignment, well-being, safety and malicious uses. Extending the circle of inquiry further, we next consider the relationship between advanced AI assistants and individual users in more detail, exploring topics such as manipulation and persuasion, anthropomorphism, appropriate relationships, trust and privacy. With this analysis in place, we consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants. Finally, we conclude by providing a range of recommendations for researchers, developers, policymakers and public stakeholders.
Submitted 28 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Algorithmic Transparency and Manipulation
Authors:
Michael Klenk
Abstract:
A series of recent papers raises worries about the manipulative potential of algorithmic transparency. But while the concern is apt and relevant, it is based on a fraught understanding of manipulation. Therefore, this paper draws attention to the indifference view of manipulation, which explains better than the vulnerability view why algorithmic transparency has manipulative potential. The paper also raises pertinent research questions for future studies of manipulation in the context of algorithmic transparency.
Submitted 22 November, 2023;
originally announced November 2023.
-
Generative AI for Product Design: Getting the Right Design and the Design Right
Authors:
Matthew K. Hong,
Shabnam Hakimi,
Yan-Ying Chen,
Heishiro Toyoda,
Charlene Wu,
Matt Klenk
Abstract:
Generative AI (GenAI) models excel in their ability to recognize patterns in existing data and generate new and unexpected content. Recent advances have motivated applications of GenAI tools (e.g., Stable Diffusion, ChatGPT) to professional practice across industries, including product design. While these generative capabilities may seem enticing on the surface, certain barriers limit their practical application for real-world use in industry settings. In this position paper, we articulate and situate these barriers within two phases of the product design process, namely "getting the right design" and "getting the design right," and propose a research agenda to stimulate discussions around opportunities for realizing the full potential of GenAI tools in product design.
Submitted 1 June, 2023;
originally announced June 2023.
-
Learning to Operate in Open Worlds by Adapting Planning Models
Authors:
Wiktor Piotrowski,
Roni Stern,
Yoni Sher,
Jacob Le,
Matthew Klenk,
Johan de Kleer,
Shiwali Mohan
Abstract:
Planning agents are ill-equipped to act in novel situations in which their domain model no longer accurately represents the world. We introduce an approach for such agents operating in open worlds that detects the presence of novelties and effectively adapts their domain models and consequent action selection. It uses observations of action execution and measures their divergence from what is expected, according to the environment model, to infer existence of a novelty. Then, it revises the model through a heuristics-guided search over model changes. We report empirical evaluations on the CartPole problem, a standard Reinforcement Learning (RL) benchmark. The results show that our approach can deal with a class of novelties very quickly and in an interpretable fashion.
Submitted 24 March, 2023;
originally announced March 2023.
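The open-world planning abstract above infers a novelty by measuring how far observed action outcomes diverge from what the environment model expects. The sketch below illustrates that idea only in outline, assuming a one-step prediction error averaged over a trajectory and a fixed threshold. The names `detect_novelty`, `make_step`, and `rollout`, and the toy falling-object dynamics, are hypothetical and stand in for the paper's CartPole setup.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]  # (position, velocity) in this toy example


def detect_novelty(
    model_step: Callable[[State], State],
    observations: Sequence[State],
    threshold: float,
) -> bool:
    """Flag a novelty when the mean one-step prediction error is too large."""
    if len(observations) < 2:
        raise ValueError("need at least two observations")
    errors = [
        math.dist(model_step(prev), actual)
        for prev, actual in zip(observations, observations[1:])
    ]
    return sum(errors) / len(errors) > threshold


def make_step(gravity: float, dt: float = 0.1) -> Callable[[State], State]:
    """Toy dynamics: an object falling under constant gravity."""
    return lambda s: (s[0] + s[1] * dt, s[1] - gravity * dt)


def rollout(step: Callable[[State], State], start: State, n: int) -> List[State]:
    """Generate n successive states from a start state."""
    states = [start]
    for _ in range(n):
        states.append(step(states[-1]))
    return states


# The agent's model assumes gravity 9.8; the second world has been changed.
agent_model = make_step(9.8)
nominal = rollout(make_step(9.8), (10.0, 0.0), 20)
shifted = rollout(make_step(12.0), (10.0, 0.0), 20)
```

With these toy settings, `detect_novelty(agent_model, shifted, 0.05)` flags the changed world while the nominal trajectory does not trigger it. The model-revision step the abstract describes, a heuristics-guided search over model changes, is not sketched here.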
-
Analogical Concept Memory for Architectures Implementing the Common Model of Cognition
Authors:
Shiwali Mohan,
Matthew Klenk
Abstract:
Architectures that implement the Common Model of Cognition - Soar, ACT-R, and Sigma - have a prominent place in research on cognitive modeling as well as on designing complex intelligent agents. In this paper, we explore how computational models of analogical processing can be brought into these architectures to enable concept acquisition from examples obtained interactively. We propose a new analogical concept memory for Soar that augments its current system of declarative long-term memories. We frame the problem of concept learning as embedded within the larger context of interactive task learning (ITL) and embodied language processing (ELP). We demonstrate that the analogical learning methods implemented in the proposed memory can quickly learn diverse types of novel concepts that are useful not only in recognition of a concept in the environment but also in action selection. Our approach has been instantiated in an implemented cognitive system AILEEN and evaluated on a simulated robotic domain.
Submitted 21 October, 2022;
originally announced October 2022.
-
Machine learning reveals how personalized climate communication can both succeed and backfire
Authors:
Totte Harinen,
Alexandre Filipowicz,
Shabnam Hakimi,
Rumen Iliev,
Matthew Klenk,
Emily Sumner
Abstract:
Different advertising messages work for different people. Machine learning can be an effective way to personalise climate communications. In this paper we use machine learning to reanalyse findings from a recent study, showing that online advertisements increased some people's belief in climate change while resulting in decreased belief in others. In particular, we show that the effect of the advertisements could change depending on people's age and ethnicity.
Submitted 10 September, 2021;
originally announced September 2021.
-
Playing Angry Birds with a Domain-Independent PDDL+ Planner
Authors:
Wiktor Piotrowski,
Roni Stern,
Matthew Klenk,
Alexandre Perez,
Shiwali Mohan,
Johan de Kleer,
Jacob Le
Abstract:
This demo paper presents the first system for playing the popular Angry Birds game using a domain-independent planner. Our system models Angry Birds levels using PDDL+, a planning language for mixed discrete/continuous domains. It uses a domain-independent PDDL+ planner to generate plans and executes them. In this demo paper, we present the system's PDDL+ model for this domain, identify key design decisions that reduce the problem complexity, and compare the performance of our system to model-specific methods for this domain. The results show that our system's performance is on par with other domain-specific systems for Angry Birds, suggesting the applicability of domain-independent planning to this benchmark AI challenge.
Submitted 9 July, 2021;
originally announced July 2021.
-
Characterizing an Analogical Concept Memory for Architectures Implementing the Common Model of Cognition
Authors:
Shiwali Mohan,
Matt Klenk,
Matthew Shreve,
Kent Evans,
Aaron Ang,
John Maxwell
Abstract:
Architectures that implement the Common Model of Cognition - Soar, ACT-R, and Sigma - have a prominent place in research on cognitive modeling as well as on designing complex intelligent agents. In this paper, we explore how computational models of analogical processing can be brought into these architectures to enable concept acquisition from examples obtained interactively. We propose a new analogical concept memory for Soar that augments its current system of declarative long-term memories. We frame the problem of concept learning as embedded within the larger context of interactive task learning (ITL) and embodied language processing (ELP). We demonstrate that the analogical learning methods implemented in the proposed memory can quickly learn diverse types of novel concepts that are useful not only in recognition of a concept in the environment but also in action selection. Our approach has been instantiated in an implemented cognitive system AILEEN and evaluated on a simulated robotic domain.
Submitted 29 July, 2020; v1 submitted 2 June, 2020;
originally announced June 2020.
-
An Extensible and Personalizable Multi-Modal Trip Planner
Authors:
Xudong Liu,
Christian Fritz,
Matthew Klenk
Abstract:
Despite a tremendous amount of work in the literature and in the commercial sectors, current approaches to multi-modal trip planning still fail to consistently generate plans that users deem optimal in practice. We believe that this is due to the fact that current planners fail to capture the true preferences of users, e.g., their preferences depend on aspects that are not modeled. An example of this could be a preference not to walk through an unsafe area at night. We present a novel multi-modal trip planner that allows users to upload auxiliary geographic data (e.g., crime rates) and to specify temporal constraints and preferences over these data in combination with typical metrics such as time and cost. Concretely, our planner supports the modes walking, biking, driving, public transit, and taxi, uses linear temporal logic to capture temporal constraints, and preferential cost functions to represent preferences. We show by examples that this allows the expression of very interesting preferences and constraints that, naturally, lead to quite diverse optimal plans.
Submitted 25 September, 2019;
originally announced September 2019.
-
Acceptable Planning: Influencing Individual Behavior to Reduce Transportation Energy Expenditure of a City
Authors:
Shiwali Mohan,
Hesham Rakha,
Matthew Klenk
Abstract:
Our research aims at developing intelligent systems to reduce the transportation-related energy expenditure of a large city by influencing individual behavior. We introduce COPTER - an intelligent travel assistant that evaluates multi-modal travel alternatives to find a plan that is acceptable to a person given their context and preferences. We propose a formulation for acceptable planning that brings together ideas from AI, machine learning, and economics. This formulation has been incorporated in COPTER that produces acceptable plans in real-time. We adopt a novel empirical evaluation framework that combines human decision data with a high fidelity multi-modal transportation simulation to demonstrate a 4% energy reduction and 20% delay reduction in a realistic deployment scenario in Los Angeles, California, USA.
Submitted 23 September, 2019;
originally announced September 2019.