-
From Stability to Inconsistency: A Study of Moral Preferences in LLMs
Authors:
Monika Jotautaite,
Mary Phuong,
Chatrik Singh Mangat,
Maria Angelica Martinez
Abstract:
As large language models (LLMs) increasingly integrate into our daily lives, it becomes crucial to understand their implicit biases and moral tendencies. To address this, we introduce a Moral Foundations LLM dataset (MFD-LLM) grounded in Moral Foundations Theory, which conceptualizes human morality through six core foundations. We propose a novel evaluation method that captures the full spectrum of LLMs' revealed moral preferences by having the models answer a range of real-world moral dilemmas. Our findings reveal that state-of-the-art models have remarkably homogeneous value preferences, yet demonstrate a lack of consistency.
Submitted 8 April, 2025;
originally announced April 2025.
-
Analysis of Eccentric Coaxial Waveguides Filled with Lossy Anisotropic Media via Finite Difference
Authors:
Raul O. Ribeiro,
Maria A. Martinez,
Guilherme S. Rosa,
Rafael A. Penchel
Abstract:
This study presents a finite difference method (FDM) to model electromagnetic field propagation in eccentric coaxial waveguides filled with lossy uniaxially anisotropic media. The formulation uses a conformal transformation to map the eccentric circular waveguide into an equivalent concentric one. In the concentric problem, we introduce a novel normalized Helmholtz equation to decouple TM and TE modes, and we solve this non-homogeneous partial differential equation using finite differences in cylindrical coordinates. The proposed approach was validated against perturbation-based, spectral element-based, and finite-integration-based numerical solutions. The preliminary results show that our solution requires less computational time. Furthermore, our FDM formulation can be extended with minimal adaptations to model complex media problems, such as metamaterial devices, optical fibers, and geophysical exploration sensors.
Submitted 23 January, 2025;
originally announced January 2025.
-
CELI: Controller-Embedded Language Model Interactions
Authors:
Jan-Samuel Wagner,
Dave DeCaprio,
Abishek Chiffon Muthu Raja,
Jonathan M. Holman,
Lauren K. Brady,
Sky C. Cheung,
Hosein Barzekar,
Eric Yang,
Mark Anthony Martinez II,
David Soong,
Sriram Sridhar,
Han Si,
Brandon W. Higgs,
Hisham Hamadeh,
Scott Ogden
Abstract:
We introduce Controller-Embedded Language Model Interactions (CELI), a framework that integrates control logic directly within language model (LM) prompts, facilitating complex, multi-stage task execution. CELI addresses limitations of existing prompt engineering and workflow optimization techniques by embedding control logic directly within the operational context of language models, enabling dynamic adaptation to evolving task requirements. Our framework transfers control from the traditional programming execution environment to the LMs, allowing them to autonomously manage computational workflows while maintaining seamless interaction with external systems and functions. CELI supports arbitrary function calls with variable arguments, bridging the gap between LMs' adaptive reasoning capabilities and conventional software paradigms' structured control mechanisms. To evaluate CELI's versatility and effectiveness, we conducted case studies in two distinct domains: code generation (HumanEval benchmark) and multi-stage content generation (Wikipedia-style articles). The results demonstrate notable performance improvements across a range of domains. CELI achieved a 4.9 percentage point improvement over the best reported score of the baseline GPT-4 model on the HumanEval code generation benchmark. In multi-stage content generation, 94.4% of CELI-produced Wikipedia-style articles met or exceeded first draft quality when optimally configured, with 44.4% achieving high quality. These outcomes underscore CELI's potential for optimizing AI-driven workflows across diverse computational domains.
Submitted 18 October, 2024;
originally announced October 2024.
-
Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack
Authors:
Leo McKee-Reid,
Christoph Sträter,
Maria Angelica Martinez,
Joe Needham,
Mikita Balesni
Abstract:
Previous work has shown that training "helpful-only" LLMs with reinforcement learning on a curriculum of gameable environments can lead models to generalize to egregious specification gaming, such as editing their own reward function or modifying task checklists to appear more successful. We show that gpt-4o, gpt-4o-mini, o1-preview, and o1-mini - frontier models trained to be helpful, harmless, and honest - can engage in specification gaming without training on a curriculum of tasks, purely from in-context iterative reflection (which we call in-context reinforcement learning, "ICRL"). We also show that using ICRL to generate highly-rewarded outputs for expert iteration (compared to the standard expert iteration reinforcement learning algorithm) may increase gpt-4o-mini's propensity to learn specification-gaming policies, generalizing (in very rare cases) to the most egregious strategy where gpt-4o-mini edits its own reward function. Our results point toward the strong ability of in-context reflection to discover rare specification-gaming strategies that models might not exhibit zero-shot or with normal training, highlighting the need for caution when relying on alignment of LLMs in zero-shot settings.
Submitted 8 October, 2024;
originally announced October 2024.
-
The Advent of Technological Singularity: a Formal Metric
Authors:
Juan A. Lara,
David Lizcano,
María A. Martínez,
Juan Pazos
Abstract:
The Technological Singularity, that is, the possibility of achieving a General Artificial Intelligence (AGI) that surpasses human intelligence, is one of the vital paradigms of today's humanity. However, until now only opinions about its possibility and/or achievement have been offered. Therefore, in this work, a metric is presented, for the first time, to objectively measure the current state of the advent of the technological singularity.
Submitted 25 June, 2019;
originally announced July 2019.