-
RADAR: Benchmarking Language Models on Imperfect Tabular Data
Authors:
Ken Gu,
Zhihan Zhang,
Kate Lin,
Yuwei Zhang,
Akshay Paruchuri,
Hong Yu,
Mehran Kazemi,
Kumar Ayush,
A. Ali Heydari,
Maxwell A. Xu,
Girish Narayanswamy,
Yun Liu,
Ming-Zher Poh,
Yuzhe Yang,
Mark Malhotra,
Shwetak Patel,
Hamid Palangi,
Xuhai Xu,
Daniel McDuff,
Tim Althoff,
Xin Liu
Abstract:
Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness -- the ability to recognize, reason over, and appropriately handle data artifacts such as missing values, outliers, and logical inconsistencies -- remains underexplored. These artifacts are especially common in real-world tabular data and, if mishandled, can significantly compromise the validity of analytical conclusions. To address this gap, we present RADAR, a benchmark for systematically evaluating data-aware reasoning on tabular data. We develop a framework that simulates data artifacts via programmatic perturbations, enabling targeted evaluation of model behavior. RADAR comprises 2980 table-query pairs, grounded in real-world data spanning 9 domains and 5 data artifact types. In addition to evaluating artifact handling, RADAR systematically varies table size to study how well reasoning performance holds up as tables grow. Our evaluation reveals that, despite decent performance on tables without data artifacts, frontier models degrade significantly when data artifacts are introduced, exposing critical gaps in their capacity for robust, data-aware analysis. Designed to be flexible and extensible, RADAR supports diverse perturbation types and controllable table sizes, offering a valuable resource for advancing tabular reasoning.
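As a rough illustration of what a programmatic perturbation can look like, the sketch below injects two of the artifact types named above (missing values and an outlier) into a copy of a clean table. The function names, the list-of-dicts table representation, the `fraction` and `scale` parameters, and the example data are all hypothetical; this is not RADAR's actual perturbation framework.

```python
import random

def inject_missing(rows, column, fraction, seed=0):
    """Return a copy of `rows` with `fraction` of the `column` values set to None."""
    rng = random.Random(seed)
    perturbed = [dict(row) for row in rows]  # copy so the clean table is kept intact
    k = round(fraction * len(perturbed))
    for i in rng.sample(range(len(perturbed)), k):
        perturbed[i][column] = None
    return perturbed

def inject_outlier(rows, column, scale=100.0, seed=0):
    """Return a copy of `rows` with one `column` value multiplied by `scale`."""
    rng = random.Random(seed)
    perturbed = [dict(row) for row in rows]
    i = rng.randrange(len(perturbed))
    perturbed[i][column] = perturbed[i][column] * scale
    return perturbed

# Hypothetical clean table: daily step counts over ten days.
clean = [{"day": d, "steps": 1000 + 10 * d} for d in range(10)]
with_missing = inject_missing(clean, "steps", fraction=0.3)
with_outlier = inject_outlier(clean, "steps")
```

Keeping the clean table untouched means a model's answer on the perturbed table can be compared against a ground truth computed on the original.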
Submitted 9 June, 2025;
originally announced June 2025.
-
Abstract Counterfactuals for Language Model Agents
Authors:
Edoardo Pona,
Milad Kazemi,
Yali Du,
David Watson,
Nicola Paoletti
Abstract:
Counterfactual inference is a powerful tool for analysing and evaluating autonomous agents, but its application to language model (LM) agents remains challenging. Existing work on counterfactuals in LMs has primarily focused on token-level counterfactuals, which are often inadequate for LM agents due to their open-ended action spaces. Unlike traditional agents with fixed, clearly defined action spaces, the actions of LM agents are often implicit in the strings they output, making their action spaces difficult to define and interpret. Furthermore, the meanings of individual tokens can shift depending on the context, adding complexity to token-level reasoning and sometimes leading to biased or meaningless counterfactuals. We introduce Abstract Counterfactuals, a framework that emphasises high-level characteristics of actions and interactions within an environment, enabling counterfactual reasoning tailored to user-relevant features. Our experiments demonstrate that the approach produces consistent and meaningful counterfactuals while minimising the undesired side effects of token-level methods. We conduct experiments on text-based games and counterfactual text generation, while considering both token-level and latent-space interventions.
Submitted 3 June, 2025;
originally announced June 2025.
-
Average Reward Reinforcement Learning for Omega-Regular and Mean-Payoff Objectives
Authors:
Milad Kazemi,
Mateo Perez,
Fabio Somenzi,
Sadegh Soudjani,
Ashutosh Trivedi,
Alvaro Velasquez
Abstract:
Recent advances in reinforcement learning (RL) have renewed focus on the design of reward functions that shape agent behavior. Manually designing reward functions is tedious and error-prone. A principled alternative is to specify behaviors in a formal language that can be automatically translated into rewards. Omega-regular languages are a natural choice for this purpose, given their established role in formal verification and synthesis. However, existing methods using omega-regular specifications typically rely on discounted reward RL in episodic settings, with periodic resets. This setup misaligns with the semantics of omega-regular specifications, which describe properties over infinite behavior traces. In such cases, the average reward criterion and the continuing setting -- where the agent interacts with the environment over a single, uninterrupted lifetime -- are more appropriate.
To address the challenges of infinite-horizon, continuing tasks, we focus on absolute liveness specifications -- a subclass of omega-regular languages that cannot be violated by any finite behavior prefix, making them well-suited to the continuing setting. We present the first model-free RL framework that translates absolute liveness specifications to average-reward objectives. Our approach enables learning in communicating MDPs without episodic resetting. We also introduce a reward structure for lexicographic multi-objective optimization, aiming to maximize an external average-reward objective among the policies that also maximize the satisfaction probability of a given omega-regular specification. Our method guarantees convergence in unknown communicating MDPs and supports on-the-fly reductions that do not require full knowledge of the environment, thus enabling model-free RL. Empirical results show that our average-reward approach in the continuing setting outperforms discount-based methods across benchmarks.
Submitted 21 May, 2025;
originally announced May 2025.
-
SimICD: A Closed-Loop Simulation Framework For ICD Therapy
Authors:
Hannah Lydon,
Milad Kazemi,
Martin Bishop,
Nicola Paoletti
Abstract:
Virtual studies of implantable cardioverter-defibrillator (ICD) behaviour are crucial for testing device functionality in a controlled environment prior to clinical application. Although previous works have shown the viability of in silico testing for diagnosis, there is a notable gap in available models that can simulate therapy progression decisions during arrhythmic episodes. This work introduces SimICD, a simulation tool that combines virtual ICD logic algorithms with cardiac electrophysiology simulations in a feedback loop, allowing the progression of ICD therapy protocols to be simulated for a range of tachyarrhythmia episodes. Using a cohort of virtual patients, we demonstrate the ability of SimICD to simulate realistic cardiac signals and ICD responses that align with the logic of real-world devices, facilitating the reprogramming of ICD parameters to adapt to specific episodes.
Submitted 2 May, 2025;
originally announced May 2025.
-
A Fully Asynchronous Unsourced Random Access Scheme
Authors:
Mert Ozates,
Mohammad Kazemi,
Gianluigi Liva,
Deniz Gündüz
Abstract:
We investigate fully asynchronous unsourced random access (URA), and propose a high-performing scheme that employs on-off division multiple access (ODMA). In this scheme, active users distribute their data over the transmit block based on a sparse transmission pattern without any limitations on the starting time. At the receiver side, we adopt a double sliding-window decoding approach, utilizing a smaller inner decoding window of two block lengths within a larger outer window to enhance the interference cancellation process. Within the inner window, the receiver iteratively applies preamble-free joint starting time and pattern detection, single-user decoding, and successive interference cancellation operations. A notable feature of the proposed scheme is its elimination of the need for a preamble for starting time detection; this is achieved using ODMA transmission patterns. Numerical results demonstrate that the proposed asynchronous URA scheme outperforms existing alternatives.
Submitted 15 April, 2025;
originally announced April 2025.
-
Gemma 3 Technical Report
Authors:
Gemma Team,
Aishwarya Kamath,
Johan Ferret,
Shreya Pathak,
Nino Vieillard,
Ramona Merhej,
Sarah Perrin,
Tatiana Matejovicova,
Alexandre Ramé,
Morgane Rivière,
Louis Rouillard,
Thomas Mesnard,
Geoffrey Cideron,
Jean-bastien Grill,
Sabela Ramos,
Edouard Yvinec,
Michelle Casbon,
Etienne Pot,
Ivo Penchev,
Gaël Liu,
Francesco Visin,
Kathleen Kenealy,
Lucas Beyer,
Xiaohai Zhai,
Anton Tsitsulin
, et al. (191 additional authors not shown)
Abstract:
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, wider coverage of languages, and a longer context of at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory, which tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers and keeping the span of local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction-finetuned versions. In particular, our novel post-training recipe significantly improves math, chat, instruction-following, and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
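The memory saving from interleaving local and global attention can be estimated with simple arithmetic. The sketch below assumes that every layer caches the same number of bytes per token, that global layers cache the whole context, and that local (sliding-window) layers cache only the last `window` tokens. The 5:1 local-to-global ratio and 1024-token window in the example are assumed values for illustration; they are not stated in this abstract, and the function name is hypothetical.

```python
def kv_cache_fraction(context_len, window, local_per_global):
    """KV-cache size of an interleaved local/global stack, relative to all-global layers.

    Assumes every layer caches the same number of bytes per token, so cache size
    is proportional to the number of cached token positions per layer.
    """
    layers_per_group = local_per_global + 1
    # One global layer caches the whole context; each local layer caches at most `window` tokens.
    interleaved = context_len + local_per_global * min(window, context_len)
    all_global = layers_per_group * context_len
    return interleaved / all_global

# Assumed settings: 128K-token context, 1024-token local window, 5 local layers per global layer.
frac = kv_cache_fraction(context_len=128 * 1024, window=1024, local_per_global=5)
print(f"{frac:.3f}")  # 0.173
```

Under these assumptions the interleaved stack keeps roughly 17% of the KV cache an all-global stack of the same depth would need at 128K tokens; when the context is shorter than the window, the two designs cache the same amount.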
Submitted 25 March, 2025;
originally announced March 2025.
-
Characterization of Deletion/Substitution Channel Capacity for Small Deletion and Substitution Probabilities
Authors:
Mohammad Kazemi,
Tolga M. Duman
Abstract:
We consider binary-input deletion/substitution channels, which model certain channels with synchronization errors encountered in practice. Specifically, we focus on the regime of small deletion and substitution probabilities and, by extending an approach developed for the deletion-only channel, obtain an asymptotic characterization of the capacity of independent and identically distributed (i.i.d.) deletion/substitution channels. We first present an upper bound on the capacity for arbitrary but fixed numbers of deletions and substitutions, and then extend the result to the case of random deletions and substitutions. Our final result is as follows: the i.i.d. deletion/substitution channel capacity is approximately $1 - H(p_d) - H(p_s)$ for $p_d, p_s \approx 0$, where $p_d$ is the deletion probability, $p_s$ is the substitution probability, and $H(\cdot)$ is the binary entropy function.
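The asymptotic expression can be evaluated directly. The sketch below computes $1 - H(p_d) - H(p_s)$ with $H$ the binary entropy in bits; the function names are illustrative, and the value is only meaningful in the small-probability regime the abstract considers.

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits, with the convention H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def approx_capacity(p_d, p_s):
    """Small-probability approximation C ~ 1 - H(p_d) - H(p_s); not valid for large p_d, p_s."""
    return 1 - binary_entropy(p_d) - binary_entropy(p_s)

print(round(approx_capacity(0.01, 0.01), 3))  # 0.838
```

Even at 1% deletion and 1% substitution probability, each impairment costs about 0.08 bits per channel use in this approximation, because $H(p)$ grows like $p \log_2(1/p)$ near zero.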
Submitted 23 April, 2025; v1 submitted 4 March, 2025;
originally announced March 2025.
-
BIG-Bench Extra Hard
Authors:
Mehran Kazemi,
Bahare Fatemi,
Hritik Bansal,
John Palowitch,
Chrysovalantis Anastasiou,
Sanket Vaibhav Mehta,
Lalit K. Jain,
Virginia Aglietti,
Disha Jindal,
Peter Chen,
Nishanth Dikkala,
Gladys Tyen,
Xin Liu,
Uri Shalit,
Silvia Chiappa,
Kate Olszewska,
Yi Tay,
Vinh Q. Tran,
Quoc V. Le,
Orhan Firat
Abstract:
Large language models (LLMs) are increasingly deployed in everyday applications, demanding robust general reasoning capabilities and a diverse reasoning skill set. However, current LLM reasoning benchmarks predominantly focus on mathematical and coding abilities, leaving a gap in evaluating broader reasoning proficiencies. One notable exception is the BIG-Bench dataset, which has served as a crucial benchmark for evaluating the general reasoning capabilities of LLMs, thanks to its diverse set of challenging tasks that allowed for a comprehensive assessment of general reasoning across various skills within a unified framework. However, recent advances in LLMs have led to saturation on BIG-Bench and its harder version, BIG-Bench Hard (BBH). State-of-the-art models achieve near-perfect scores on many tasks in BBH, thus diminishing its utility. To address this limitation, we introduce BIG-Bench Extra Hard (BBEH), a new benchmark designed to push the boundaries of LLM reasoning evaluation. BBEH replaces each task in BBH with a novel task that probes a similar reasoning capability but exhibits significantly increased difficulty. We evaluate various models on BBEH and observe a (harmonic) average accuracy of 9.8\% for the best general-purpose model and 44.8\% for the best reasoning-specialized model, indicating substantial room for improvement and highlighting the ongoing challenge of achieving robust general reasoning in LLMs. We release BBEH publicly at: https://github.com/google-deepmind/bbeh.
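To see why a harmonic average is a demanding summary, the sketch below compares it with the arithmetic mean on a few made-up per-task accuracies: a single weak task pulls the harmonic mean down sharply. It uses Python's `statistics.harmonic_mean`, which returns 0 if any value is 0; how BBEH itself treats zero-accuracy tasks is not stated in this abstract, so this illustrates the aggregate only and is not the benchmark's scoring code.

```python
from statistics import harmonic_mean, mean

# Made-up per-task accuracies: two moderate tasks and one near-failure.
task_accuracies = [0.60, 0.50, 0.05]

print(round(mean(task_accuracies), 3))           # 0.383
print(round(harmonic_mean(task_accuracies), 3))  # 0.127
```

The harmonic mean is dominated by the smallest values, so a model cannot compensate for failing one task by excelling at others.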
Submitted 6 May, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
Robust Counterfactual Inference in Markov Decision Processes
Authors:
Jessica Lally,
Milad Kazemi,
Nicola Paoletti
Abstract:
This paper addresses a key limitation in existing counterfactual inference methods for Markov Decision Processes (MDPs). Current approaches assume a specific causal model to make counterfactuals identifiable. However, there are usually many causal models that align with the observational and interventional distributions of an MDP, each yielding different counterfactual distributions, so fixing a particular causal model limits the validity (and usefulness) of counterfactual inference. We propose a novel non-parametric approach that computes tight bounds on counterfactual transition probabilities across all compatible causal models. Unlike previous methods that require solving prohibitively large optimisation problems (with variables that grow exponentially in the size of the MDP), our approach provides closed-form expressions for these bounds, making computation highly efficient and scalable for non-trivial MDPs. Once such an interval counterfactual MDP is constructed, our method identifies robust counterfactual policies that optimise the worst-case reward w.r.t. the uncertain interval MDP probabilities. We evaluate our method on various case studies, demonstrating improved robustness over existing methods.
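Optimising the worst-case reward over an interval MDP requires, at each Bellman backup, the minimum expected successor value over all distributions that respect the interval bounds. A standard way to compute this inner minimum (common to robust value iteration for interval MDPs in general, and assumed here rather than taken from the paper) is to start from the lower bounds and pour the leftover probability mass onto the lowest-value successors first. The sketch below shows that step only; the function names and example numbers are hypothetical, and it does not reproduce the paper's closed-form counterfactual bounds.

```python
def worst_case_distribution(lows, highs, values):
    """Pick p with lows <= p <= highs and sum(p) == 1 that minimises sum(p * values)."""
    if sum(lows) > 1.0 + 1e-12 or sum(highs) < 1.0 - 1e-12:
        raise ValueError("intervals admit no probability distribution")
    probs = list(lows)
    remaining = 1.0 - sum(lows)
    # Greedily push the leftover mass onto the lowest-value successors first.
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        extra = min(highs[i] - lows[i], remaining)
        probs[i] += extra
        remaining -= extra
        if remaining <= 0:
            break
    return probs

def worst_case_expectation(lows, highs, values):
    """Inner minimisation of a robust Bellman backup for one state-action pair."""
    probs = worst_case_distribution(lows, highs, values)
    return sum(p * v for p, v in zip(probs, values))

# Hypothetical interval transition probabilities to three successor states.
lows, highs = [0.1, 0.2, 0.1], [0.6, 0.5, 0.7]
values = [1.0, 0.0, 5.0]
print(worst_case_expectation(lows, highs, values))
```

A robust policy would then pick, in each state, the action whose reward plus this worst-case expectation is largest, and iterate until the values converge.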
Submitted 27 March, 2025; v1 submitted 19 February, 2025;
originally announced February 2025.
-
Investigation of intrinsic nonlinear effects in driven-dissipative optomechanical systems using the generalized linear response theory
Authors:
B. Askari,
A. Dalafi,
M. J. Kazemi
Abstract:
In this article, we study the effects of intrinsic nonlinear optomechanical interaction on the linear response of a driven-dissipative optomechanical system to a weak time-dependent perturbation. By calculating the linear response of the cavity optical mode to a weak probe laser in the framework of the generalized linear response theory, we show how the Stokes and anti-Stokes sideband amplitudes, the power reflection coefficient, and the density of states of the cavity optical mode are expressed in terms of photonic retarded Green's functions. Then, we derive the equations of motion of the retarded Green's functions of the system from nonlinear quantum Langevin equations and solve them. It is shown that, for a single-photon optomechanical coupling of the order of the cavity linewidth, the nonlinear effect does not manifest itself unless the system satisfies a resonance condition in which the frequency of the upper normal mode of the system is twice that of the lower one. Because the present approach is general and applies in all regimes, it also confirms the validity of the linearization approximation in the off-resonance regime.
Submitted 14 February, 2025;
originally announced February 2025.
-
Measuring Inaccuracies in the Proportional Hazard Rate Model based on Extropy using a Length-Biased Weighted Residual approach
Authors:
M. Hashempour,
M. R. Kazemi
Abstract:
In this paper, we consider the concept of the residual inaccuracy measure and extend it to a weighted version based on extropy. Properties of this measure are studied, and the discrimination principle is applied in the class of proportional hazard rate (PHR) models. A characterization problem for the proposed weighted extropy-inaccuracy measure is studied. We propose some alternative expressions for the weighted residual measure of inaccuracy. Additionally, we establish upper and lower bounds and various inequalities related to the weighted residual inaccuracy measure using extropy. Non-parametric estimators of the proposed measure based on kernel density estimation and the empirical distribution function are obtained, and the performance of the estimators is discussed through simulation studies. Finally, a real dataset is used to illustrate the proposed measure. Overall, our study highlights the potential of the weighted residual inaccuracy measure based on extropy as a tool for improving the quality and reliability of data analysis and modelling across various disciplines, and researchers and practitioners may benefit from incorporating it into their analytical toolkit.
Submitted 30 January, 2025;
originally announced January 2025.
-
Update Estimation and Scheduling for Over-the-Air Federated Learning with Energy Harvesting Devices
Authors:
Furkan Bagci,
Busra Tegin,
Mohammad Kazemi,
Tolga M. Duman
Abstract:
We study over-the-air (OTA) federated learning (FL) for energy harvesting devices with heterogeneous data distributions over a wireless fading multiple access channel (MAC). To address the impact of low energy arrivals and data heterogeneity on global learning, we propose user scheduling strategies. Specifically, we develop two approaches: 1) entropy-based scheduling for known data distributions and 2) least-squares-based user representation estimation for scheduling with unknown data distributions at the parameter server. Both methods aim to select diverse users, mitigating bias and enhancing convergence. Numerical and analytical results demonstrate improved learning performance through reduced redundancy and conserved energy.
Submitted 30 January, 2025;
originally announced January 2025.
-
ODMA-Based Cell-Free Unsourced Random Access with Successive Interference Cancellation
Authors:
Mert Ozates,
Mohammad Kazemi,
Eduard Jorswieck,
Deniz Gunduz
Abstract:
We consider the unsourced random access problem with multiple receivers and propose a cell-free solution for it. In our proposed scheme, the active users transmit their signals to access points (APs) distributed over a geographical area and connected to a central processing unit (CPU). Each transmitted signal is composed of a pilot and a polar codeword, where the polar codeword bits occupy a small fraction of the data part of the transmission frame. Pilot detection and channel and symbol estimation take place at the APs, while the actual message bits are detected at the CPU by combining the symbol estimates forwarded by the APs over the fronthaul. The contribution of the successfully decoded messages is then subtracted at the APs. Numerical examples illustrate that the proposed scheme can support up to 1400 users with high energy efficiency, and that the distributed structure decreases the error probability by more than two orders of magnitude.
Submitted 17 January, 2025;
originally announced January 2025.
-
Transformers Struggle to Learn to Search
Authors:
Abulhair Saparov,
Srushti Pawar,
Shreyas Pimpalgaonkar,
Nitish Joshi,
Richard Yuanzhe Pang,
Vishakh Padmakumar,
Seyed Mehran Kazemi,
Najoung Kim,
He He
Abstract:
Search is a foundational ability for many important tasks, and recent studies have shown that large language models (LLMs) struggle to perform search robustly. It is unknown whether this inability is due to a lack of data, insufficient model parameters, or fundamental limitations of the transformer architecture. In this work, we use the foundational graph connectivity problem as a testbed to generate effectively limitless high-coverage data to train small transformers and test whether they can learn to perform search. We find that, when given the right training distribution, the transformer is able to learn to search.
We analyze the algorithm that the transformer has learned through a novel mechanistic interpretability technique that enables us to extract the computation graph from the trained model. We find that transformers perform search at every vertex in parallel: For each vertex in the input graph, transformers compute the set of vertices reachable from that vertex. Each layer then progressively expands these sets, allowing the model to search over a number of vertices exponential in $n_{\text{layers}}$.
However, we find that as the input graph size increases, the transformer has greater difficulty in learning the task. This difficulty is not resolved even as the number of parameters is increased, suggesting that increasing model scale will not lead to robust search abilities. We also find that performing search in-context (i.e., chain-of-thought) does not resolve this inability to learn to search on larger graphs.
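The "exponential in $n_{\text{layers}}$" claim follows from set composition: if each layer replaces a vertex's reachable set with the union of the reachable sets of its members, the covered path length doubles per layer. The sketch below runs that idealised procedure on an explicit graph; it illustrates the counting argument only and is not the circuit extracted from the trained transformer. Function and variable names are hypothetical.

```python
def reachable_sets(num_vertices, edges, num_layers):
    """Reachable sets after `num_layers` rounds of parallel set expansion.

    Round 0 holds each vertex and its direct successors (paths of length <= 1);
    each further round composes the sets with themselves, doubling the covered
    path length, so after k rounds all paths of length <= 2**k are covered.
    """
    reach = [{v} for v in range(num_vertices)]
    for u, v in edges:
        reach[u].add(v)
    for _ in range(num_layers):
        # Every vertex updates in parallel from the previous round's sets.
        reach = [set().union(*(reach[u] for u in reach[v])) for v in range(num_vertices)]
    return reach

path = [(i, i + 1) for i in range(9)]  # directed path 0 -> 1 -> ... -> 9
print(sorted(reachable_sets(10, path, num_layers=3)[0]))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

On this 10-vertex path, three rounds cover paths of length up to $2^3 = 8$, so vertex 9 is only reached from vertex 0 after a fourth round.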
Submitted 16 March, 2025; v1 submitted 5 December, 2024;
originally announced December 2024.
-
EFX Allocations and Orientations on Bipartite Multi-graphs: A Complete Picture
Authors:
Mahyar Afshinmehr,
Alireza Danaei,
Mehrafarin Kazemi,
Kurt Mehlhorn,
Nidhi Rathi
Abstract:
We consider the fundamental problem of fairly allocating a set of indivisible items among agents whose valuations are represented by a multi-graph: agents appear as the vertices and items as the edges between them, and each vertex (agent) only values the set of its incident edges (items). The goal is to find a fair allocation, namely one that is envy-free up to any item (EFX). This model was recently introduced by Christodoulou et al. (EC'23), who show that EFX allocations always exist for monotone valuations on simple graphs, i.e., graphs in which any two agents share at most one edge (item). A natural question is whether this remains true beyond simple graphs, for various classes of multi-graphs.
We answer the above question affirmatively for the valuation class of bipartite multi-graphs and multi-cycles. Our main positive result is that EFX allocations on bipartite multi-graphs (and multi-cycles) always exist and can be computed in polynomial time for additive valuations. We, therefore, push the frontiers of our understanding of EFX allocations and expand the scenarios where they are known to exist for an arbitrary number of agents. Next, we study EFX orientations (i.e., allocations where every item is allocated to one of its two endpoint agents) and give a complete picture of when they exist for bipartite multi-graphs dependent on two parameters -- the number of edges shared between any two agents and the diameter of the graph. Finally, we prove that it is NP-complete to determine whether a given fair division instance on a bipartite multi-graph admits an EFX orientation.
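The EFX condition and the orientation constraint are easy to state as checks for additive valuations. In the sketch below, items are edge identifiers, each agent's valuation maps only its incident edges to values (all other items count as 0), and EFX is checked in the "any item" form: no agent prefers another agent's bundle with any single item removed. The data layout, example instance, and function names are assumptions for illustration; the paper's polynomial-time algorithm is not reproduced here.

```python
def value(valuation, bundle):
    """Additive value of a bundle; items the agent does not value count as 0."""
    return sum(valuation.get(item, 0) for item in bundle)

def is_efx(valuations, allocation):
    """True if no agent i prefers X_j minus any single item g to its own bundle X_i."""
    for i, own in allocation.items():
        own_value = value(valuations[i], own)
        for j, other in allocation.items():
            if i == j:
                continue
            for item in other:
                rest = [g for g in other if g != item]
                if value(valuations[i], rest) > own_value:
                    return False
    return True

def is_orientation(edges, allocation):
    """True if every edge (item) is given to one of its two endpoint agents."""
    owner = {item: agent for agent, bundle in allocation.items() for item in bundle}
    return all(owner.get(e) in endpoints for e, endpoints in edges.items())

# Hypothetical bipartite multi-graph: two parallel edges between agents 0 and 1.
edges = {"e1": (0, 1), "e2": (0, 1), "e3": (1, 2)}
valuations = {0: {"e1": 2, "e2": 1}, 1: {"e1": 1, "e2": 1, "e3": 3}, 2: {"e3": 1}}
alloc_a = {0: ["e1"], 1: ["e2"], 2: ["e3"]}
alloc_b = {0: [], 1: ["e1", "e2", "e3"], 2: []}
print(is_efx(valuations, alloc_a), is_efx(valuations, alloc_b))
```

Both allocations are orientations, but only the first is EFX: in the second, agent 0 still values agent 1's bundle at 3 after `e3` is removed.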
Submitted 26 October, 2024; v1 submitted 22 October, 2024;
originally announced October 2024.
-
GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models
Authors:
Aditya Sharma,
Aman Dalmia,
Mehran Kazemi,
Amal Zouaq,
Christopher J. Pal
Abstract:
Geometry problem-solving demands advanced reasoning abilities to process multimodal inputs and employ mathematical knowledge effectively. Vision-language models (VLMs) have made significant progress in various multimodal tasks. Yet, they still struggle with geometry problems and are significantly limited by their inability to perform mathematical operations not seen during pre-training, such as calculating the cosine of an arbitrary angle, and by difficulties in correctly applying relevant geometry formulas. To overcome these challenges, we present GeoCoder, which leverages modular code-finetuning to generate and execute code using a predefined geometry function library. By executing the code, we achieve accurate and deterministic calculations, in contrast to the stochastic nature of autoregressive token prediction, while the function library minimizes errors in formula usage. We also propose a multimodal retrieval-augmented variant of GeoCoder, named RAG-GeoCoder, which incorporates a non-parametric memory module for retrieving functions from the geometry library, thereby reducing reliance on parametric memory. Our modular code-finetuning approach enhances the geometric reasoning capabilities of VLMs, yielding an average improvement of over 16% across various question complexities on the GeomVerse dataset compared to other finetuning methods.
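The idea of executing code against a predefined function library, instead of predicting numbers token by token, can be sketched as follows. The two library functions and the example problem are hypothetical stand-ins, not GeoCoder's actual library; the point is that the cosine of an arbitrary angle is computed deterministically by `math.cos` rather than recalled by the model.

```python
import math

# Hypothetical entries in a geometry function library (names are illustrative).
def third_side_law_of_cosines(a, b, angle_deg):
    """Side opposite `angle_deg` in a triangle whose adjacent sides are a and b."""
    gamma = math.radians(angle_deg)
    return math.sqrt(a * a + b * b - 2 * a * b * math.cos(gamma))

def triangle_area_sas(a, b, angle_deg):
    """Area from two sides and the included angle: (1/2) * a * b * sin(angle)."""
    return 0.5 * a * b * math.sin(math.radians(angle_deg))

# Code a model might emit for: "Sides 7 and 9 meet at 37 degrees. Find the third side."
answer = third_side_law_of_cosines(7, 9, 37)
print(round(answer, 2))
```

Because the formula lives in the library, the model only has to choose the right function and arguments; the arithmetic is exact up to floating-point precision.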
Submitted 17 October, 2024;
originally announced October 2024.
-
Capacity Bounds for the Poisson-Repeat Channel
Authors:
Mohammad Kazemi,
Tolga M. Duman
Abstract:
We develop bounds on the capacity of Poisson-repeat channels (PRCs) for which each input bit is independently repeated according to a Poisson distribution. The upper bounds are obtained by considering an auxiliary channel where the output lengths corresponding to input blocks of a given length are provided as side information at the receiver. Numerical results show that the resulting upper bounds are significantly tighter than the best known one for a large range of the PRC parameter $λ$ (specifically, for $λ\ge 0.35$). We also describe a way of obtaining capacity lower bounds using information rates of the auxiliary channel and the entropy rate of the provided side information.
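A Poisson-repeat channel is straightforward to simulate, which can help when sanity-checking capacity bounds numerically. The sketch below repeats each input bit $N \sim \mathrm{Poisson}(\lambda)$ times, so a draw of 0 deletes the bit. The function names are hypothetical, and the Poisson sampler is Knuth's multiplication method, chosen here only because the Python standard library has no Poisson sampler; it is adequate for small $\lambda$ but slow for large values.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method: count uniforms until their product drops below e^-lam."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def apply_repeats(bits, counts):
    """Repeat bits[i] exactly counts[i] times; a count of 0 deletes the bit."""
    return [b for b, c in zip(bits, counts) for _ in range(c)]

def poisson_repeat_channel(bits, lam, seed=0):
    """Pass `bits` through a Poisson-repeat channel with parameter `lam`."""
    rng = random.Random(seed)
    counts = [sample_poisson(lam, rng) for _ in bits]
    return apply_repeats(bits, counts)

print(poisson_repeat_channel([0, 1, 1, 0, 1], lam=1.5, seed=3))
```

The receiver sees only the concatenated output, not the per-bit counts, which is why the auxiliary channel in the abstract, which reveals output lengths of input blocks as side information, is easier to analyse.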
Submitted 3 October, 2024;
originally announced October 2024.
-
MMS Approximations Under Additive Leveled Valuations
Authors:
Mahyar Afshinmehr,
Mehrafarin Kazemi,
Kurt Mehlhorn
Abstract:
We study the problem of fairly allocating indivisible goods to a set of agents with additive leveled valuations. A valuation function is called leveled if and only if bundles of larger size have larger value than bundles of smaller size. The economics literature has well studied such valuations.
We use the maximin-share (MMS) and EFX as standard notions of fairness. We show that an algorithm introduced by Christodoulou et al. ([11]) constructs an allocation that is EFX and $\frac{\lfloor \frac{m}{n} \rfloor}{\lfloor \frac{m}{n} \rfloor + 1}\text{-MMS}$. In that paper, it was claimed that the allocation is EFX and $\frac{2}{3}\text{-MMS}$. However, the proof of the MMS bound is incorrect. We give a counter-example to their proof and then prove a stronger approximation of MMS.
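For concreteness, the MMS factor stated above can be evaluated exactly with Python's `fractions` module. This sketch is only arithmetic on the stated formula, with illustrative (m, n) pairs for m goods and n agents. It says nothing about the allocation algorithm itself, and `mms_factor` is a hypothetical helper name.

```python
from fractions import Fraction


def mms_factor(m: int, n: int) -> Fraction:
    """floor(m/n) / (floor(m/n) + 1) for m goods and n agents, as an exact fraction."""
    if n <= 0 or m < 0:
        raise ValueError("need n >= 1 agents and m >= 0 goods")
    q = m // n
    return Fraction(q, q + 1)


if __name__ == "__main__":
    for m, n in [(5, 3), (7, 3), (10, 3), (20, 3)]:
        f = mms_factor(m, n)
        print(f"m={m}, n={n}: factor={f}, at least 2/3: {f >= Fraction(2, 3)}")
```

The factor equals 2/3 exactly when ⌊m/n⌋ = 2, equals 1/2 when ⌊m/n⌋ = 1, and approaches 1 as ⌊m/n⌋ grows.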
Submitted 3 October, 2024;
originally announced October 2024.
-
Unsourced Random Access: A Comprehensive Survey
Authors:
Mert Ozates,
Mohammad Javad Ahmadi,
Mohammad Kazemi,
Deniz Gündüz,
Tolga M. Duman
Abstract:
Multiple access communication systems enable numerous users to share common communication resources simultaneously, playing a crucial role in wireless networks. With the emergence of the sixth generation (6G) and beyond communication systems, supporting massive machine-type communications with sporadic activity patterns is expected to become a critical challenge. Unsourced random access (URA) has emerged as a promising paradigm to address this challenge by decoupling user identification from data transmission through the use of a common codebook. This survey provides a comprehensive overview of URA solutions, covering both theoretical foundations and practical implementations. We present a systematic classification of URA solutions across three main channel models: Gaussian multiple access channels (GMACs), single-antenna fading, and multiple-input multiple-output (MIMO) fading channels. For each category, we analyze and compare state-of-the-art solutions in terms of performance, complexity, and practical feasibility. Additionally, we discuss critical challenges such as interference management, computational complexity, and synchronization issues. The survey concludes with promising future research directions and potential methods to address existing limitations, providing a roadmap for researchers and practitioners in this rapidly evolving field.
Submitted 24 February, 2025; v1 submitted 23 September, 2024;
originally announced September 2024.
-
Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries
Authors:
Kiran Vodrahalli,
Santiago Ontanon,
Nilesh Tripuraneni,
Kelvin Xu,
Sanil Jain,
Rakesh Shivanna,
Jeffrey Hui,
Nishanth Dikkala,
Mehran Kazemi,
Bahare Fatemi,
Rohan Anil,
Ethan Dyer,
Siamak Shakeri,
Roopali Vij,
Harsh Mehta,
Vinay Ramasesh,
Quoc Le,
Ed Chi,
Yifeng Lu,
Orhan Firat,
Angeliki Lazaridou,
Jean-Baptiste Lespiau,
Nithya Attaluri,
Kate Olszewska
Abstract:
We introduce Michelangelo: a minimal, synthetic, and unleaked long-context reasoning evaluation for large language models which is also easy to automatically score. This evaluation is derived via a novel, unifying framework for evaluations over arbitrarily long contexts which measure the model's ability to do more than retrieve a single piece of information from its context. The central idea of the Latent Structure Queries framework (LSQ) is to construct tasks which require a model to "chisel away" the irrelevant information in the context, revealing a latent structure in the context. To verify a model's understanding of this latent structure, we query the model for details of the structure. Using LSQ, we produce three diagnostic long-context evaluations across code and natural-language domains intended to provide a stronger signal of long-context language model capabilities. We perform evaluations on several state-of-the-art models and demonstrate both that a) the proposed evaluations are high-signal and b) that there is significant room for improvement in synthesizing long-context information.
Submitted 19 September, 2024; v1 submitted 19 September, 2024;
originally announced September 2024.
-
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Authors:
Hritik Bansal,
Arian Hosseini,
Rishabh Agarwal,
Vinh Q. Tran,
Mehran Kazemi
Abstract:
Training on high-quality synthetic data from strong language models (LMs) is a common strategy to improve the reasoning performance of LMs. In this work, we revisit whether this strategy is compute-optimal under a fixed inference budget (e.g., FLOPs). To do so, we investigate the trade-offs between generating synthetic data using a stronger but more expensive (SE) model versus a weaker but cheaper (WC) model. We evaluate the generated data across three key metrics: coverage, diversity, and false positive rate, and show that the data from WC models may have higher coverage and diversity, but also exhibit higher false positive rates. We then finetune LMs on data from SE and WC models in different settings: knowledge distillation, self-improvement, and a novel weak-to-strong improvement setup where a weaker LM teaches reasoning to a stronger LM. Our findings reveal that models finetuned on WC-generated data consistently outperform those trained on SE-generated data across multiple benchmarks and multiple choices of WC and SE models. These results challenge the prevailing practice of relying on SE models for synthetic data generation, suggesting that WC may be the compute-optimal approach for training advanced LM reasoners.
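A back-of-the-envelope sketch of what a fixed sampling budget implies. Assume that generating one token costs roughly 2·P FLOPs for a model with P parameters, and that both models write solutions of similar length. Under those assumptions, the number of weak-model samples affordable per problem is the strong-model sample count times the parameter ratio. The function name and the 27/9 (billions) model sizes are illustrative assumptions, not figures from the abstract.

```python
def compute_matched_samples(samples_strong: int, params_strong: int, params_weak: int) -> int:
    """Weak-model samples per problem that fit in the strong model's sampling budget.

    Assumes per-token sampling cost ~ 2 * params FLOPs and equal solution lengths,
    so the factor of 2 and the length cancel and only the parameter ratio remains.
    Parameter counts are integers in a common unit (e.g. billions).
    """
    if params_strong <= 0 or params_weak <= 0 or samples_strong < 0:
        raise ValueError("parameter counts must be positive and sample count non-negative")
    return (samples_strong * params_strong) // params_weak


if __name__ == "__main__":
    # Hypothetical 27B "SE" model vs. 9B "WC" model.
    for k in (1, 10, 30):
        print(k, "SE samples ->", compute_matched_samples(k, 27, 9), "WC samples")
```

Under this assumption, a model with one third the parameters yields three times as many samples per problem, which is where any coverage and diversity advantage of WC data can come from.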
Submitted 7 October, 2024; v1 submitted 29 August, 2024;
originally announced August 2024.
-
Generative Verifiers: Reward Modeling as Next-Token Prediction
Authors:
Lunjun Zhang,
Arian Hosseini,
Hritik Bansal,
Mehran Kazemi,
Aviral Kumar,
Rishabh Agarwal
Abstract:
Verifiers or reward models are often used to enhance the reasoning performance of large language models (LLMs). A common approach is the Best-of-N method, where N candidate solutions generated by the LLM are ranked by a verifier, and the best one is selected. While LLM-based verifiers are typically trained as discriminative classifiers to score solutions, they do not utilize the text generation capabilities of pretrained LLMs. To overcome this limitation, we instead propose training verifiers using the ubiquitous next-token prediction objective, jointly on verification and solution generation. Compared to standard verifiers, such generative verifiers (GenRM) can benefit from several advantages of LLMs: they integrate seamlessly with instruction tuning, enable chain-of-thought reasoning, and can utilize additional test-time compute via majority voting for better verification. We demonstrate that GenRM outperforms discriminative, DPO verifiers, and LLM-as-a-Judge, resulting in large performance gains with Best-of-N, namely 5% $\rightarrow$ 45.3% on algorithmic tasks and 73% $\rightarrow$ 93.4% on GSM8K. In easy-to-hard generalization settings, we observe improvements of 28% $\rightarrow$ 44.6% on MATH, and 37.9% $\rightarrow$ 53.5% on MMLU abstract algebra. Furthermore, we find that training GenRM with synthetic verification rationales is sufficient to pick out subtle errors on math problems. Finally, we demonstrate that GenRM scales favorably with model size and test-time compute.
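The Best-of-N procedure described above can be sketched in a few lines of Python. Here `verify(solution, seed)` is a hypothetical callable standing in for a generative verifier: it is assumed to return the probability of a "Yes" token after one sampled verification rationale. Averaging that score over several sampled rationales is one simple reading of "majority voting for better verification." This is not the paper's code.

```python
from statistics import mean
from typing import Callable, Sequence


def best_of_n(
    candidates: Sequence[str],
    verify: Callable[[str, int], float],
    num_votes: int = 1,
) -> str:
    """Return the candidate with the highest verifier score.

    Each candidate's score is the mean of `num_votes` verifier calls (one per
    sampled rationale, indexed by seed), spending extra test-time compute.
    """
    if not candidates:
        raise ValueError("need at least one candidate")

    def score(solution: str) -> float:
        return mean(verify(solution, seed) for seed in range(num_votes))

    return max(candidates, key=score)


if __name__ == "__main__":
    # Toy verifier: a fixed per-solution score plus deterministic pseudo-noise.
    true_p = {"x = 4": 0.9, "x = 5": 0.3, "x = -4": 0.6}

    def toy_verify(solution: str, seed: int) -> float:
        noise = ((seed * 37) % 11 - 5) / 20  # in [-0.25, 0.25]
        return min(1.0, max(0.0, true_p[solution] + noise))

    print(best_of_n(list(true_p), toy_verify, num_votes=8))
```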
Submitted 22 February, 2025; v1 submitted 27 August, 2024;
originally announced August 2024.
-
RIS-Aided Unsourced Multiple Access (RISUMA): Coding Strategy and Performance Limits
Authors:
Mohammad Javad Ahmadi,
Mohammad Kazemi,
Tolga M. Duman
Abstract:
This paper considers an unsourced random access (URA) set-up equipped with a passive reconfigurable intelligent surface (RIS), where a massive number of unidentified users (only a small fraction of them being active at any given time) are connected to the base station (BS). We introduce a slotted coding scheme in which each active user chooses a slot at random to transmit its signal, which consists of a pilot part and a randomly spread polar codeword. The proposed decoder operates in two phases. In the first phase, called the RIS configuration phase, the BS detects the transmitted pilots. The detected pilots are then used to estimate the corresponding users' channel state information, which the BS uses to select the RIS phase shifts via the proposed RIS design algorithms. The proposed channel estimator can obtain the channel coefficients of users whose pilots interfere with each other without prior access to the list of transmitted pilots or the number of active users. In the second phase, called the data phase, the transmitted messages of the active users are decoded. Moreover, we establish an approximate achievability bound for the RIS-based URA scheme, providing a valuable benchmark. Computer simulations show that the proposed scheme outperforms the state of the art for RIS-aided URA.
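The random slot choice in the slotted scheme can be illustrated with a short Python sketch. It assumes that each active user picks one of the slots uniformly and independently (the abstract says only "at random"; uniformity is our assumption). It ignores pilots, codewords, and the RIS entirely, and the helper names are hypothetical. The sketch compares a Monte Carlo estimate of the number of unused slots with the standard occupancy formula S(1 − 1/S)^K for K users and S slots.

```python
import random
from collections import Counter


def assign_slots(num_active: int, num_slots: int, rng: random.Random) -> Counter:
    """Each active user independently picks one slot uniformly at random."""
    return Counter(rng.randrange(num_slots) for _ in range(num_active))


def expected_empty_slots(num_active: int, num_slots: int) -> float:
    """Occupancy formula: S * (1 - 1/S) ** K empty slots on average."""
    return num_slots * (1 - 1 / num_slots) ** num_active


def simulated_empty_slots(num_active: int, num_slots: int, trials: int,
                          rng: random.Random) -> float:
    """Monte Carlo estimate of the mean number of slots nobody chose."""
    total = 0
    for _ in range(trials):
        occupied = assign_slots(num_active, num_slots, rng)
        total += num_slots - len(occupied)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    print("users per slot:", dict(assign_slots(50, 20, rng)))
    print("empty slots, simulated:", simulated_empty_slots(50, 20, 5000, rng))
    print("empty slots, formula  :", expected_empty_slots(50, 20))
```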
Submitted 23 August, 2024;
originally announced August 2024.
-
Unlearning Trojans in Large Language Models: A Comparison Between Natural Language and Source Code
Authors:
Mahdi Kazemi,
Aftab Hussain,
Md Rafiqul Islam Rabin,
Mohammad Amin Alipour,
Sen Lin
Abstract:
This work investigates the application of Machine Unlearning (MU) for mitigating the impact of trojans embedded in conventional large language models of natural language (Text-LLMs) and large language models of code (Code-LLMs). We propose a novel unlearning approach, LYA, that leverages both gradient ascent and elastic weight consolidation, a Fisher Information Matrix (FIM) based regularization technique, to unlearn trojans from poisoned models. We compare the effectiveness of LYA against conventional techniques like fine-tuning, retraining, and vanilla gradient ascent. The subject models we investigate are BERT and CodeBERT, for sentiment analysis and code defect detection tasks, respectively. Our findings demonstrate that the combination of gradient ascent and FIM-based regularization, as done in LYA, outperforms existing methods in removing the trojan's influence from the poisoned model, while preserving its original functionality. To the best of our knowledge, this is the first work that compares and contrasts MU of trojans in LLMs in the NL and code domains.
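One natural way to write down a combination of the two ingredients named above is sketched here. Gradient ascent on trojan-triggering examples is expressed as minimizing the negated loss, and the standard elastic weight consolidation penalty anchors each parameter to its pre-unlearning value θ* in proportion to its diagonal Fisher information. The exact LYA objective, the weight λ, and the data used to estimate F are assumptions here, not details given in the abstract.

```latex
\min_{\theta}\;
\underbrace{-\,\mathcal{L}_{\mathrm{CE}}\big(\theta;\,\mathcal{D}_{\mathrm{trojan}}\big)}_{\text{gradient ascent}}
\;+\;
\underbrace{\frac{\lambda}{2}\sum_{i} F_i\,\big(\theta_i-\theta_i^{*}\big)^{2}}_{\text{EWC penalty}},
\qquad
F_i=\mathbb{E}_{(x,y)\sim\mathcal{D}_{\mathrm{clean}}}\!\left[\left(\frac{\partial \log p_{\theta^{*}}(y\mid x)}{\partial \theta_i}\right)^{2}\right]
```

Parameters with large F_i matter most for the clean task, so the penalty discourages moving them, which is how such an objective can remove the trojan while preserving the model's original functionality.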
Submitted 22 August, 2024;
originally announced August 2024.
-
Gemma 2: Improving Open Language Models at a Practical Size
Authors:
Gemma Team,
Morgane Riviere,
Shreya Pathak,
Pier Giuseppe Sessa,
Cassidy Hardin,
Surya Bhupatiraju,
Léonard Hussenot,
Thomas Mesnard,
Bobak Shahriari,
Alexandre Ramé,
Johan Ferret,
Peter Liu,
Pouya Tafti,
Abe Friesen,
Michelle Casbon,
Sabela Ramos,
Ravin Kumar,
Charline Le Lan,
Sammy Jerome,
Anton Tsitsulin,
Nino Vieillard,
Piotr Stanczyk,
Sertan Girgin,
Nikola Momchev,
Matt Hoffman
, et al. (173 additional authors not shown)
Abstract:
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
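Training with knowledge distillation "instead of next token prediction" means replacing the one-hot next-token target with the teacher's full next-token distribution. Below is a minimal per-position sketch in plain Python, assuming the teacher's probabilities over a toy vocabulary are available. It is the generic soft-target cross-entropy, −Σₓ P_T(x) log P_S(x), not Gemma 2's training code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def distillation_loss(teacher_probs: Sequence[float], student_logits: Sequence[float]) -> float:
    """Cross-entropy of the student's next-token distribution against the teacher's."""
    if len(teacher_probs) != len(student_logits):
        raise ValueError("teacher and student vocabularies differ in size")
    log_ps = log_softmax(student_logits)
    # Terms with zero teacher probability contribute nothing (0 * log p = 0).
    return -sum(pt * lp for pt, lp in zip(teacher_probs, log_ps) if pt > 0)


if __name__ == "__main__":
    teacher = softmax([1.0, 2.0, 3.0])  # toy 3-token vocabulary
    print(distillation_loss(teacher, [1.0, 2.0, 3.0]))  # student matches teacher
    print(distillation_loss(teacher, [3.0, 2.0, 1.0]))  # mismatched student
```

By Gibbs' inequality, the loss is smallest, and equal to the teacher's entropy, when the student's distribution matches the teacher's.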
Submitted 2 October, 2024; v1 submitted 31 July, 2024;
originally announced August 2024.
-
SocialQuotes: Learning Contextual Roles of Social Media Quotes on the Web
Authors:
John Palowitch,
Hamidreza Alvari,
Mehran Kazemi,
Tanvir Amin,
Filip Radlinski
Abstract:
Web authors frequently embed social media to support and enrich their content, creating the potential to derive web-based, cross-platform social media representations that can enable more effective social media retrieval systems and richer scientific analyses. As a step toward such capabilities, we introduce a novel language modeling framework that enables automatic annotation of roles that social media entities play in their embedded web context. Using related communication theory, we liken social media embeddings to quotes, formalize the page context as structured natural language signals, and identify a taxonomy of roles for quotes within the page context. We release SocialQuotes, a new data set built from the Common Crawl of over 32 million social quotes, 8.3k of them with crowdsourced quote annotations. Using SocialQuotes and the accompanying annotations, we provide a role classification case study, showing reasonable performance with modern-day LLMs, and exposing explainable aspects of our framework via page content ablations. We also classify a large batch of un-annotated quotes, revealing interesting cross-domain, cross-platform role distributions on the web.
Submitted 22 July, 2024;
originally announced July 2024.
-
ReMI: A Dataset for Reasoning with Multiple Images
Authors:
Mehran Kazemi,
Nishanth Dikkala,
Ankit Anand,
Petar Devic,
Ishita Dasgupta,
Fangyu Liu,
Bahare Fatemi,
Pranjal Awasthi,
Dee Guo,
Sreenivas Gollapudi,
Ahmed Qureshi
Abstract:
With the continuous advancement of large language models (LLMs), it is essential to create new benchmarks to effectively evaluate their expanding capabilities and identify areas for improvement. This work focuses on multi-image reasoning, an emerging capability in state-of-the-art LLMs. We introduce ReMI, a dataset designed to assess LLMs' ability to Reason with Multiple Images. This dataset encompasses a diverse range of tasks, spanning various reasoning domains such as math, physics, logic, code, table/chart understanding, and spatial and temporal reasoning. It also covers a broad spectrum of characteristics found in multi-image reasoning scenarios. We have benchmarked several cutting-edge LLMs using ReMI and found a substantial gap between their performance and human-level proficiency. This highlights the challenges in multi-image reasoning and the need for further research. Our analysis also reveals the strengths and weaknesses of different models, shedding light on the types of reasoning that are currently attainable and areas where future models require improvement. To foster further research in this area, we are releasing ReMI publicly: https://huggingface.co/datasets/mehrankazemi/ReMI.
Submitted 13 June, 2024;
originally announced June 2024.
-
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Authors:
Bahare Fatemi,
Mehran Kazemi,
Anton Tsitsulin,
Karishma Malkan,
Jinyeong Yim,
John Palowitch,
Sungyong Seo,
Jonathan Halcrow,
Bryan Perozzi
Abstract:
Large language models (LLMs) have showcased remarkable reasoning capabilities, yet they remain susceptible to errors, particularly in temporal reasoning tasks involving complex temporal logic. Existing research has explored LLM performance on temporal reasoning using diverse datasets and benchmarks. However, these studies often rely on real-world data that LLMs may have encountered during pre-training or employ anonymization techniques that can inadvertently introduce factual inconsistencies. In this work, we address these limitations by introducing novel synthetic datasets specifically designed to assess LLM temporal reasoning abilities in various scenarios. The diversity of question types across these datasets enables systematic investigation into the impact of the problem structure, size, question type, fact order, and other factors on LLM performance. Our findings provide valuable insights into the strengths and weaknesses of current LLMs in temporal reasoning tasks. To foster further research in this area, we are open-sourcing the datasets and evaluation framework used in our experiments: https://huggingface.co/datasets/baharef/ToT.
Submitted 13 June, 2024;
originally announced June 2024.
-
An ODMA-Based Unsourced Random Access Scheme with a Multiple Antenna Receiver
Authors:
Mert Ozates,
Mohammad Kazemi,
Tolga M. Duman
Abstract:
We investigate the unsourced random access scheme assuming that the base station is equipped with multiple antennas, and propose a high-performing solution utilizing on-off-division multiple access. We assume that each user spreads its pilot sequence and polar codeword to the pilot and data parts of the transmission frame, respectively, based on a transmission pattern. The iterative receiver operation consists of pilot and pattern detection followed by channel vector and symbol estimation, polar decoding, and successive interference cancellation. Numerical findings demonstrate that the proposed scheme has superior performance compared to the state-of-the-art in various antenna settings.
Submitted 10 June, 2024;
originally announced June 2024.
-
Understanding Transformer Reasoning Capabilities via Graph Algorithms
Authors:
Clayton Sanford,
Bahare Fatemi,
Ethan Hall,
Anton Tsitsulin,
Mehran Kazemi,
Jonathan Halcrow,
Bryan Perozzi,
Vahab Mirrokni
Abstract:
Which transformer scaling regimes are able to perfectly solve different classes of algorithmic problems? While tremendous empirical advances have been attained by transformer-based neural networks, a theoretical understanding of their algorithmic reasoning capabilities in realistic parameter regimes is lacking. We investigate this question in terms of the network's depth, width, and number of extra tokens for algorithm execution. Our novel representational hierarchy separates 9 algorithmic reasoning problems into classes solvable by transformers in different realistic parameter scaling regimes. We prove that logarithmic depth is necessary and sufficient for tasks like graph connectivity, while single-layer transformers with small embedding dimensions can solve contextual retrieval tasks. We also support our theoretical analysis with ample empirical evidence using the GraphQA benchmark. These results show that transformers excel at many graph reasoning tasks, even outperforming specialized graph neural networks.
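Why logarithmic depth is a natural scale for connectivity can be illustrated, though not proved, by the classic repeated-squaring algorithm. Each round composes the current reachability relation with itself, doubling the path length it certifies, so about ⌈log₂(n − 1)⌉ sequential rounds cover every simple path. This Python sketch is offered only as an analogy. It is not the paper's transformer construction or its proof, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[bool]]


def adjacency(n: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Boolean adjacency matrix of an undirected graph on nodes 0..n-1."""
    adj = [[False] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = True
    return adj


def reachability_by_squaring(adj: Matrix) -> Tuple[Matrix, int]:
    """Transitive closure via repeated boolean squaring.

    After r rounds, R[u][v] is True iff v is reachable from u in at most 2**r hops.
    Returns the closure and the number of rounds used.
    """
    n = len(adj)
    # Paths of length <= 1, including the trivial path from a node to itself.
    r_mat = [[adj[i][j] or i == j for j in range(n)] for i in range(n)]
    rounds, reach = 0, 1
    while reach < n - 1:  # a simple path has at most n - 1 edges
        r_mat = [[any(r_mat[i][k] and r_mat[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
        rounds += 1
        reach *= 2
    return r_mat, rounds


if __name__ == "__main__":
    # Path 0-1-2-3 plus an isolated node 4.
    closure, rounds = reachability_by_squaring(adjacency(5, [(0, 1), (1, 2), (2, 3)]))
    print("0 ~ 3:", closure[0][3], "| 0 ~ 4:", closure[0][4], "| rounds:", rounds)
```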
Submitted 28 May, 2024;
originally announced May 2024.
-
Optical transition parameters of the silicon T centre
Authors:
Chloe Clear,
Sara Hosseini,
Amirhossein AlizadehKhaledi,
Nicholas Brunelle,
Austin Woolverton,
Joshua Kanaganayagam,
Moein Kazemi,
Camille Chartrand,
Mehdi Keshavarz,
Yihuang Xiong,
Louis Alaerts,
Oney O. Soykal,
Geoffroy Hautier,
Valentin Karassiouk,
Mike Thewalt,
Daniel Higginbottom,
Stephanie Simmons
Abstract:
The silicon T centre's narrow, telecommunications-band optical emission, long spin coherence, and direct photonic integration have spurred interest in this emitter as a spin-photon interface for distributed quantum computing and networking. However, key parameters of the T centre's spin-selective optical transitions remain undetermined or ambiguous in the literature. In this paper we present a Hamiltonian of the T centre TX state and determine key parameters of the optical transition from T$_0$ to TX$_0$ from a combined analysis of published results, density functional theory, and new spectroscopy. We resolve ambiguous values of the internal defect potential in the literature, and we present the first measurements of electrically tuned T centre emission. As a result, we provide a model of the T centre's optical and spin properties under strain, electric, and magnetic fields that can be utilized for realizing quantum technologies.
Submitted 8 November, 2024; v1 submitted 11 May, 2024;
originally announced May 2024.
-
Using Domain Knowledge to Guide Dialog Structure Induction via Neural Probabilistic Soft Logic
Authors:
Connor Pryor,
Quan Yuan,
Jeremiah Liu,
Mehran Kazemi,
Deepak Ramachandran,
Tania Bedrax-Weiss,
Lise Getoor
Abstract:
Dialog Structure Induction (DSI) is the task of inferring the latent dialog structure (i.e., a set of dialog states and their temporal transitions) of a given goal-oriented dialog. It is a critical component for modern dialog system design and discourse analysis. Existing DSI approaches are often purely data-driven, deploy models that infer latent states without access to domain knowledge, underperform when the training corpus is limited/noisy, or have difficulty when test dialogs exhibit distributional shifts from the training domain. This work explores a neural-symbolic approach as a potential solution to these problems. We introduce Neural Probabilistic Soft Logic Dialogue Structure Induction (NEUPSL DSI), a principled approach that injects symbolic knowledge into the latent space of a generative neural model. We conduct a thorough empirical investigation on the effect of NEUPSL DSI learning on hidden representation quality, few-shot learning, and out-of-domain generalization performance. Over three dialog structure induction datasets and across unsupervised and semi-supervised settings for standard and cross-domain generalization, the injection of symbolic knowledge using NEUPSL DSI provides a consistent boost in performance over the canonical baselines.
Submitted 26 March, 2024;
originally announced March 2024.
-
Conformal Off-Policy Prediction for Multi-Agent Systems
Authors:
Tom Kuipers,
Renukanandan Tumu,
Shuo Yang,
Milad Kazemi,
Rahul Mangharam,
Nicola Paoletti
Abstract:
Off-Policy Prediction (OPP), i.e., predicting the outcomes of a target policy using only data collected under a nominal (behavioural) policy, is a paramount problem in data-driven analysis of safety-critical systems where the deployment of a new policy may be unsafe. To achieve dependable off-policy predictions, recent work on Conformal Off-Policy Prediction (COPP) leverages the conformal prediction framework to derive prediction regions with probabilistic guarantees under the target process. Existing COPP methods can account for the distribution shifts induced by policy switching, but are limited to single-agent systems and scalar outcomes (e.g., rewards). In this work, we introduce MA-COPP, the first conformal prediction method to solve OPP problems involving multi-agent systems, deriving joint prediction regions for all agents' trajectories when one or more ego agents change their policies. Unlike the single-agent scenario, this setting introduces higher complexity as the distribution shifts affect predictions for all agents, not just the ego agents, and the prediction task involves full multi-dimensional trajectories, not just reward values. A key contribution of MA-COPP is to avoid enumeration or exhaustive search of the output space of agent trajectories, which is instead required by existing COPP methods to construct the prediction region. We achieve this by showing that an over-approximation of the true joint prediction region (JPR) can be constructed, without enumeration, from the maximum density ratio of the JPR trajectories. We evaluate the effectiveness of MA-COPP in multi-agent systems from the PettingZoo library and the F1TENTH autonomous racing environment, achieving nominal coverage in higher dimensions and various shift settings.
Submitted 15 September, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1112 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 16 December, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
All-optical control of skyrmion configuration in CrI$_3$ monolayer
Authors:
M. Kazemi,
A. Kudlis,
P. F. Bessarab,
I. A. Shelykh
Abstract:
The potential for manipulating characteristics of skyrmions in a CrI$_3$ monolayer using circularly polarised light is explored. The effective skyrmion-light interaction is mediated by bright excitons whose magnetization is selectively influenced by the polarization of photons. The light-induced skyrmion dynamics is illustrated by the dependencies of the skyrmion size and the skyrmion lifetime on the intensity and polarization of the incident light pulse. Two-dimensional magnets hosting excitons thus represent a promising platform for the control of topological magnetic structures by light.
Submitted 4 March, 2024;
originally announced March 2024.
-
Non-Heisenbergian quantum mechanics
Authors:
MohammadJavad Kazemi,
Ghadir Jafari
Abstract:
Relaxing the postulates of an axiomatic theory is a natural way to find more general theories, and historically, the discovery of non-Euclidean geometry is a famous example of this procedure. Here, we follow this route to extend quantum mechanics by dropping the heart of Heisenberg's quantum mechanics: we do not assume the existence of a position operator that satisfies the Heisenberg commutation relation, $[\hat x,\hat p]=i\hbar$. The remaining axioms of quantum theory, together with Galilean symmetry, lead to a more general quantum theory with a free parameter $l_0$ of length dimension, such that as $l_0 \to 0$ the theory reduces to standard quantum theory. Perhaps surprisingly, this non-Heisenberg quantum theory, without an a priori assumption of the non-commutation relation, leads to a modified Heisenberg uncertainty relation, $\Delta x\,\Delta p \geq \sqrt{\hbar^2/4+l_0^2(\Delta p)^2}$, which ensures the existence of a minimal position uncertainty, $l_0$, as expected from various quantum gravity studies. By comparing the results of this framework with some observed data, which include the first longitudinal normal modes of the bar gravitational wave detector AURIGA and the $1S-2S$ transition in the hydrogen atom, we obtain upper bounds on $l_0$.
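Rearranging the modified relation shows where the minimal position uncertainty comes from. The steps below are our own algebra on the inequality quoted above (assuming $\Delta p > 0$), not a derivation reproduced from the paper.

```latex
\Delta x\,\Delta p \geq \sqrt{\frac{\hbar^2}{4} + l_0^2(\Delta p)^2}
\;\Longrightarrow\;
\Delta x \geq \sqrt{\frac{\hbar^2}{4(\Delta p)^2} + l_0^2} > l_0 ,
\qquad
\text{and for } l_0 \to 0:\; \Delta x\,\Delta p \geq \frac{\hbar}{2}.
```

The lower bound on $\Delta x$ approaches $l_0$ only as $\Delta p \to \infty$, so under this relation no state has a position uncertainty below $l_0$, while the standard Heisenberg bound is recovered in the $l_0 \to 0$ limit.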
Submitted 14 October, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Counterfactual Influence in Markov Decision Processes
Authors:
Milad Kazemi,
Jessica Lally,
Ekaterina Tishchenko,
Hana Chockler,
Nicola Paoletti
Abstract:
Our work addresses a fundamental problem in the context of counterfactual inference for Markov Decision Processes (MDPs). Given an MDP path $τ$, this kind of inference allows us to derive counterfactual paths $τ'$ describing what-if versions of $τ$ obtained under different action sequences than those observed in $τ$. However, as the counterfactual states and actions deviate from the observed ones over time, the observation $τ$ may no longer influence the counterfactual world, meaning that the analysis is no longer tailored to the individual observation, resulting in interventional outcomes rather than counterfactual ones. Even though this issue specifically affects the popular Gumbel-max structural causal model used for MDP counterfactuals, it has remained overlooked until now. In this work, we introduce a formal characterisation of influence based on comparing counterfactual and interventional distributions. We devise an algorithm to construct counterfactual models that automatically satisfy influence constraints. Leveraging such models, we derive counterfactual policies that are not just optimal for a given reward structure but also remain tailored to the observed path. Even though there is an unavoidable trade-off between policy optimality and strength of influence constraints, our experiments demonstrate that it is possible to derive (near-)optimal policies while remaining under the influence of the observation.
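The abstract refers to the Gumbel-max structural causal model used for MDP counterfactuals. The sketch below shows the standard Gumbel-max counterfactual for a single categorical transition: infer exogenous Gumbel noise consistent with the observed next state (via the top-down truncated-Gumbel construction), then replay that noise under an alternative action's transition probabilities. It illustrates only the baseline model whose influence problem the paper studies, not the paper's influence-constrained algorithm; all function names and probabilities are hypothetical.

```python
import math
import random


def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sample_gumbel(loc, rng):
    """Draw from a Gumbel distribution with the given location (scale 1)."""
    u = rng.uniform(1e-12, 1.0 - 1e-12)  # keep u away from 0 and 1 to avoid log(0)
    return loc - math.log(-math.log(u))


def truncated_gumbel(loc, upper, rng):
    """Draw from Gumbel(loc) conditioned on being below `upper`."""
    g = sample_gumbel(loc, rng)
    return -math.log(math.exp(-upper) + math.exp(-g))


def posterior_noise(logits, observed, rng):
    """Sample Gumbel noise such that argmax(logits + noise) == observed."""
    top = sample_gumbel(logsumexp(logits), rng)  # value of the perturbed maximum
    values = [
        top if i == observed else truncated_gumbel(l, top, rng)
        for i, l in enumerate(logits)
    ]
    return [v - l for v, l in zip(values, logits)]


def counterfactual_outcome(obs_logits, new_logits, observed, rng):
    """Replay the inferred noise under a different action's transition logits."""
    noise = posterior_noise(obs_logits, observed, rng)
    scores = [l + n for l, n in zip(new_logits, noise)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical P(s' | s, a) for the observed action and an alternative a'.
    obs_logits = [math.log(p) for p in (0.6, 0.3, 0.1)]
    alt_logits = [math.log(p) for p in (0.1, 0.3, 0.6)]
    print(counterfactual_outcome(obs_logits, alt_logits, observed=1, rng=rng))
```

When the alternative action has the same transition probabilities, the counterfactual always reproduces the observed state; as the paper notes, once counterfactual trajectories drift away from the observed ones, this tie to the observation can vanish.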
Submitted 27 March, 2025; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Let Your Graph Do the Talking: Encoding Structured Data for LLMs
Authors:
Bryan Perozzi,
Bahare Fatemi,
Dustin Zelle,
Anton Tsitsulin,
Mehran Kazemi,
Rami Al-Rfou,
Jonathan Halcrow
Abstract:
How can we best encode structured data into sequential form for use in large language models (LLMs)? In this work, we introduce a parameter-efficient method to explicitly represent structured data for LLMs. Our method, GraphToken, learns an encoding function to extend prompts with explicit structured information. Unlike other work that focuses on limited domains (e.g., knowledge graph representation), our work is the first effort focused on the general encoding of structured data to be used for various reasoning tasks. We show that explicitly representing the graph structure allows significant improvements on graph reasoning tasks. Specifically, we see across-the-board improvements, up to 73 percentage points, on node-, edge-, and graph-level tasks from the GraphQA benchmark.
Submitted 8 February, 2024;
originally announced February 2024.
-
In-context Learning with Retrieved Demonstrations for Language Models: A Survey
Authors:
Man Luo,
Xin Xu,
Yue Liu,
Panupong Pasupat,
Mehran Kazemi
Abstract:
Language models, especially pre-trained large language models, have showcased remarkable abilities as few-shot in-context learners (ICL), adept at adapting to new tasks with just a few demonstrations in the input context. However, the model's ability to perform ICL is sensitive to the choice of the few-shot demonstrations. Instead of using a fixed set of demonstrations, one recent development is to retrieve demonstrations tailored to each input query. The implementation of demonstration retrieval is relatively straightforward, leveraging existing databases and retrieval systems. This not only improves the efficiency and scalability of the learning process but also has been shown to reduce biases inherent in manual example selection. In light of the encouraging results and growing research in ICL with retrieved demonstrations, we conduct an extensive review of studies in this area. In this survey, we discuss and compare different design choices for retrieval models, retrieval training procedures, and inference algorithms.
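Retrieving demonstrations per query can be sketched as nearest-neighbour search over a pool of labelled examples. The version below is a minimal illustration under stated assumptions: a bag-of-words cosine similarity stands in for the learned retrievers the survey discusses, and the prompt template and function names are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector; a stand-in for a learned sentence embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_demonstrations(query, pool, k=2):
    """Return the k (input, output) pairs whose inputs are most similar to the query."""
    q = bow(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, bow(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, pool, k=2):
    """Assemble a few-shot prompt from the retrieved demonstrations."""
    demos = retrieve_demonstrations(query, pool, k)
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)
```

Swapping `bow` and `cosine` for a dense encoder and an approximate nearest-neighbour index would leave `build_prompt` unchanged, which is the design point: the demonstration set changes with each query rather than being fixed.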
Submitted 23 March, 2024; v1 submitted 21 January, 2024;
originally announced January 2024.
-
Evaluating Driver Readiness in Conditionally Automated Vehicles from Eye-Tracking Data and Head Pose
Authors:
Mostafa Kazemi,
Mahdi Rezaei,
Mohsen Azarmi
Abstract:
As automated driving technology advances, the driver's role in resuming control of conditionally automated vehicles becomes increasingly critical. In SAE Level 3 or partly automated vehicles, the driver needs to be available and ready to intervene when necessary, which makes it essential to evaluate their readiness accurately. This article presents a comprehensive analysis of driver readiness assessment by combining head pose features and eye-tracking data. The study explores the effectiveness of predictive models in evaluating driver readiness, addressing the challenges of dataset limitations and limited ground truth labels. Machine learning techniques, including LSTM architectures, are used to model driver readiness based on the spatio-temporal status of the driver's head pose and eye gaze. The experiments in this article revealed that a Bidirectional LSTM architecture combining both feature sets achieves a mean absolute error of 0.363 on the DMD dataset, demonstrating superior performance in assessing driver readiness. The modular architecture of the proposed model also allows the integration of additional driver-specific features, such as steering wheel activity, enhancing its adaptability and real-world applicability.
Submitted 20 January, 2024;
originally announced January 2024.
-
Quantum Complexity vs Classical Complexity: A Survey
Authors:
Arash Vaezi,
Ali Movaghar,
Mohammad Ghodsi,
Seyed Mohammad Hussein Kazemi,
Negin Bagheri Noghrehy,
Seyed Mohsen Kazemi
Abstract:
Scientists have demonstrated that quantum computing has presented novel approaches to address computational challenges, each varying in complexity. Adapting problem-solving strategies is crucial to harness the full potential of quantum computing. Nonetheless, there are defined boundaries to the capabilities of quantum computing. This paper concentrates on aggregating prior research efforts dedicated to solving intricate classical computational problems through quantum computing. The objective is to systematically compile an exhaustive inventory of these solutions and categorize a collection of demanding open problems that await further exploration. Through statistical analysis, we help the researchers with their further investigations.
Submitted 11 September, 2024; v1 submitted 16 December, 2023;
originally announced December 2023.
-
GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning
Authors:
Mehran Kazemi,
Hamidreza Alvari,
Ankit Anand,
Jialin Wu,
Xi Chen,
Radu Soricut
Abstract:
Large language models have shown impressive results for multi-hop mathematical reasoning when the input question is only textual. Many mathematical reasoning problems, however, contain both text and image. With the ever-increasing adoption of vision language models (VLMs), understanding their reasoning abilities for such problems is crucial. In this paper, we evaluate the reasoning capabilities of VLMs along various axes through the lens of geometry problems. We procedurally create a synthetic dataset of geometry questions with controllable difficulty levels along multiple axes, thus enabling a systematic evaluation. The empirical results obtained using our benchmark for state-of-the-art VLMs indicate that these models are not as capable in subjects like geometry (and, by generalization, other topics requiring similar reasoning) as suggested by previous benchmarks. This is made especially clear by the construction of our benchmark at various depth levels, since solving higher-depth problems requires long chains of reasoning rather than additional memorized knowledge. We release the dataset for further research in this area.
Submitted 19 December, 2023;
originally announced December 2023.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1326 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 9 May, 2025; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Assume-Guarantee Reinforcement Learning
Authors:
Milad Kazemi,
Mateo Perez,
Fabio Somenzi,
Sadegh Soudjani,
Ashutosh Trivedi,
Alvaro Velasquez
Abstract:
We present a modular approach to reinforcement learning (RL) in environments consisting of simpler components evolving in parallel. A monolithic view of such modular environments may be prohibitively large to learn, or may require unrealizable communication between the components in the form of a centralized controller. Our proposed approach is based on the assume-guarantee paradigm where the optimal control for the individual components is synthesized in isolation by making assumptions about the behaviors of neighboring components, and providing guarantees about their own behavior. We express these assume-guarantee contracts as regular languages and provide automatic translations to scalar rewards to be used in RL. By combining local probabilities of satisfaction for each component, we provide a lower bound on the probability of satisfaction of the complete system. By solving a Markov game for each component, RL can produce a controller for each component that maximizes this lower bound. The controller utilizes the information it receives through communication, observations, and any knowledge of a coarse model of other agents. We experimentally demonstrate the efficiency of the proposed approach on a variety of case studies.
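The abstract does not say how the local satisfaction probabilities are combined. One standard way to turn per-component probabilities into a system-level lower bound, valid without any independence assumption, is the union (Boole) bound, $P(\text{all hold}) \geq 1 - \sum_i (1 - p_i)$. The sketch below assumes that bound and may differ from the paper's construction; the function name is hypothetical.

```python
def system_satisfaction_lower_bound(local_probs):
    """Lower-bound P(every component satisfies its contract) via the union bound.

    Assumption: each p_i is the probability that component i satisfies its
    own assume-guarantee contract; no independence between components is needed.
    """
    for p in local_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    failure_mass = sum(1.0 - p for p in local_probs)  # union bound on failures
    return max(0.0, 1.0 - failure_mass)  # a probability cannot be negative
```

Because the bound increases monotonically in each $p_i$, maximizing every component's local satisfaction probability separately also maximizes this system-level bound, which is what makes per-component learning compatible with a global guarantee.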
Submitted 15 December, 2023;
originally announced December 2023.
-
TaskLAMA: Probing the Complex Task Understanding of Language Models
Authors:
Quan Yuan,
Mehran Kazemi,
Xin Xu,
Isaac Noble,
Vaiva Imbrasaite,
Deepak Ramachandran
Abstract:
Structured Complex Task Decomposition (SCTD) is the problem of breaking down a complex real-world task (such as planning a wedding) into a directed acyclic graph over individual steps that contribute to achieving the task, with edges specifying temporal dependencies between them. SCTD is an important component of assistive planning tools, and a challenge for commonsense reasoning systems. We probe how accurately SCTD can be done with the knowledge extracted from Large Language Models (LLMs). We introduce a high-quality human-annotated dataset for this problem and novel metrics to fairly assess performance of LLMs against several baselines. Our experiments reveal that LLMs are able to decompose complex tasks into individual steps effectively, with a relative improvement of 15% to 280% over the best baseline. We also propose a number of approaches to further improve their performance, with a relative improvement of 7% to 37% over the base model. However, we find that LLMs still struggle to predict pairwise temporal dependencies, which reveals a gap in their understanding of complex tasks.
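An SCTD output is a DAG whose edges are temporal dependencies, so any predicted decomposition can be checked for acyclicity and turned into an executable order. The sketch below uses Python's standard `graphlib` for that check; the wedding steps and dependencies are a hypothetical example, not data from the paper's dataset.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical decomposition of "plan a wedding": each step maps to the set
# of steps that must be completed before it (its temporal dependencies).
wedding_plan = {
    "set budget": set(),
    "choose date": set(),
    "book venue": {"set budget", "choose date"},
    "send invitations": {"book venue"},
    "hire caterer": {"book venue", "set budget"},
    "finalize seating": {"send invitations"},
}


def execution_order(dependencies):
    """Return one valid step order; raises CycleError if the graph is not a DAG."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(execution_order(wedding_plan))
```

A prediction with a missing or reversed dependency still yields an order, but possibly a wrong one, which is why pairwise temporal dependencies, where the abstract reports LLMs still struggle, matter as much as the step list itself.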
Submitted 29 August, 2023;
originally announced August 2023.
-
UGSL: A Unified Framework for Benchmarking Graph Structure Learning
Authors:
Bahare Fatemi,
Sami Abu-El-Haija,
Anton Tsitsulin,
Mehran Kazemi,
Dustin Zelle,
Neslihan Bulut,
Jonathan Halcrow,
Bryan Perozzi
Abstract:
Graph neural networks (GNNs) demonstrate outstanding performance in a broad range of applications. While the majority of GNN applications assume that a graph structure is given, some recent methods substantially expanded the applicability of GNNs by showing that they may be effective even when no graph structure is explicitly provided. The GNN parameters and a graph structure are jointly learned. Previous studies adopt different experimentation setups, making it difficult to compare their merits. In this paper, we propose a benchmarking strategy for graph structure learning using a unified framework. Our framework, called Unified Graph Structure Learning (UGSL), reformulates existing models into a single model. We implement a wide range of existing models in our framework and conduct extensive analyses of the effectiveness of different components in the framework. Our results provide a clear and concise understanding of the different methods in this area as well as their strengths and weaknesses. The benchmark code is available at https://github.com/google-research/google-research/tree/master/ugsl.
Submitted 21 August, 2023;
originally announced August 2023.
-
Application of Artificial Neural Networks for Investigation of Pressure Filtration Performance, a Zinc Leaching Filter Cake Moisture Modeling
Authors:
Masoume Kazemi,
Davood Moradkhani,
Alireza A. Alipour
Abstract:
Machine learning (ML) is a powerful tool for materials science applications. An artificial neural network (ANN) is a machine learning technique that can provide high prediction accuracy. This study aimed to develop an ANN model to predict the cake moisture of the pressure filtration process in zinc production. The cake moisture was influenced by seven parameters: temperature (35 and 65 °C), solid concentration (0.2 and 0.38 g/L), pH (2, 3.5, and 5), air-blow time (2, 10, and 15 min), cake thickness (14, 20, 26, and 34 mm), pressure, and filtration time. The study conducted 288 tests using two types of fabrics: polypropylene (S1) and polyester (S2). The ANN model was evaluated with the coefficient of determination (R²), the mean squared error (MSE), and the mean absolute error (MAE) for both datasets. The results showed R² values of 0.88 and 0.83, MSE values of 6.243×10⁻⁷ and 1.086×10⁻⁶, and MAE values of 0.00056 and 0.00088 for S1 and S2, respectively. These results indicated that the ANN model could predict the cake moisture of pressure filtration in the zinc leaching process with high accuracy.
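For reference, the three evaluation metrics named above can be computed as follows. This is a generic sketch of the standard definitions, not the authors' evaluation code. MSE is in squared units of the target while MAE is in the target's own units, which is why, for errors well below 1, the reported MSE values are orders of magnitude smaller than the MAE values.

```python
def regression_metrics(y_true, y_pred):
    """Return (R^2, MSE, MAE) for paired observations and predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)        # residual sum of squares
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1.0 - ss_res / ss_tot  # undefined (division by zero) if y_true is constant
    return r2, mse, mae
```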
Submitted 11 August, 2023;
originally announced August 2023.
-
Application of Random Forest and Support Vector Machine for Investigation of Pressure Filtration Performance, a Zinc Plant Filter Cake Modeling
Authors:
Masoume Kazemi,
Davood Moradkhani,
Alireza Abbas Alipour
Abstract:
The hydrometallurgical method of zinc production involves leaching zinc from ore and then separating the solid residue from the liquid solution by pressure filtration. This separation step is important because the solid residue retains some moisture, which can reduce the amount of zinc recovered. This study modeled the pressure filtration process with Random Forest (RF) and Support Vector Machine (SVM) methods. The models take continuous variables (extracted features) from the lab samples as inputs, so regression models, namely Random Forest Regression (RFR) and Support Vector Regression (SVR), were chosen. The dataset was collected during the pressure filtration process under two conditions: 1) polypropylene (S1) and 2) polyester (S2) fabrics. To predict the cake moisture, solids concentration (0.2 and 0.38), temperature (35 and 65 °C), pH (2, 3.5, and 5), pressure, cake thickness (14, 20, 26, and 34 mm), air-blow time (2, 10, and 15 min), and filtration time were used as input variables. The models' predictive accuracy was evaluated by the coefficient of determination (R²). The results revealed that the RFR model is superior to the SVR model for cake moisture prediction.
Submitted 26 July, 2023;
originally announced July 2023.
-
TeleBTC: Trustless Wrapped Bitcoin
Authors:
Mahyar Daneshpajooh,
Niusha Moshrefi,
Mahdi Darabi,
Sina Hashemi,
Mehrafarin Kazemi
Abstract:
This paper introduces TeleBTC, a fully decentralized protocol designed to wrap Bitcoin (BTC) on programmable blockchains. The creation of a decentralized wrapped BTC presents challenges due to the non-programmable nature of Bitcoin, making it difficult to custody BTCs in a decentralized way. Existing solutions have addressed this challenge by introducing an external layer of validators who take custody of users' BTCs. However, the security and decentralization of this layer are inferior to the underlying blockchains on which wrapped BTC is built. Moreover, the process of joining or leaving for a validator has become overly complex and expensive. To overcome these limitations, we propose a novel approach that eliminates the need for such an external layer by leveraging the light client bridge protocol. Additionally, we employ economic mechanisms such as incentivization and slashing, resulting in a secure and trust-minimized wrapped BTC solution. With TeleBTC, users can seamlessly transfer their BTC to other blockchains and utilize it within decentralized applications. Furthermore, they can unwrap their TeleBTC and reclaim the native BTC. To address the high costs associated with light client bridges, we present an optimistic approach that minimizes the cost. This approach significantly reduces the operational expenses of running the protocol.
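Light client bridges for Bitcoin generally rest on an SPV-style check: a relayed block header commits to its transactions through a Merkle root built with double SHA-256, and a short Merkle branch proves that a given transaction is included. The sketch below shows only that general check under simplifying assumptions: it ignores Bitcoin's byte-order conventions and all header validation (proof of work, chain selection), and its function names are hypothetical rather than TeleBTC's contract interface.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Bitcoin-style double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(leaves):
    """Bitcoin-style Merkle root over non-empty leaves (odd levels duplicate the last node)."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_branch(leaves, index):
    """Collect the sibling hashes needed to prove leaves[index]."""
    branch, level = [], list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        branch.append(level[index ^ 1])  # sibling at this level
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return branch


def verify_inclusion(leaf, index, branch, root):
    """Recompute the root from one leaf and its branch; True if it matches."""
    node = leaf
    for sibling in branch:
        # An odd index means the current node is the right child.
        node = dsha256(sibling + node) if index & 1 else dsha256(node + sibling)
        index //= 2
    return node == root
```

On-chain, only `verify_inclusion` and a stored header root are needed, so the proof size and verification cost grow logarithmically with the number of transactions in the block; the remaining cost of relaying headers is what the paper's optimistic approach targets.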
Submitted 25 July, 2023;
originally announced July 2023.
-
Unsourced Random Access Using Multiple Stages of Orthogonal Pilots: MIMO and Single-Antenna Structures
Authors:
Mohammad Javad Ahmadi,
Mohammad Kazemi,
Tolga M. Duman
Abstract:
We study the problem of unsourced random access (URA) over Rayleigh block-fading channels with a receiver equipped with multiple antennas. We propose a slotted structure with multiple stages of orthogonal pilots, each of which is randomly picked from a codebook. In the proposed signaling structure, each user encodes its message using a polar code and appends it to the selected pilot sequences to construct its transmitted signal. Accordingly, the transmitted signal is composed of multiple orthogonal pilot parts and a polar-coded part, which is sent through a randomly selected slot. The performance of the proposed scheme is further improved by randomly dividing users into different groups each having a unique interleaver-power pair. We also apply the idea of multiple stages of orthogonal pilots to the case of a single receive antenna. In all the set-ups, we use an iterative approach for decoding the transmitted messages along with a suitable successive interference cancellation technique. The use of orthogonal pilots and the slotted structure lead to improved accuracy and reduced computational complexity in the proposed set-ups, and make the implementation with short blocklengths more viable. Performance of the proposed set-ups is illustrated via extensive simulation results which show that the proposed set-ups with multiple antennas perform better than the existing MIMO URA solutions for both short and large blocklengths, and that the proposed single-antenna set-ups are superior to the existing single-antenna URA schemes.
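To illustrate why orthogonal pilots simplify detection, here is a minimal single-antenna, noiseless sketch: pilots are rows of a Sylvester-Hadamard matrix (an assumed codebook choice, not necessarily the paper's), each active user transmits its pilot scaled by a channel gain, and the receiver correlates against every codeword to find the active pilots and estimate their gains. It omits fading statistics, noise, slotting, polar coding, interleaving, and successive interference cancellation; all names are hypothetical.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # [[H, H], [H, -H]] doubles the size while keeping rows orthogonal.
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def transmit(codebook, active):
    """Superimpose gain-scaled pilots; `active` maps pilot index -> channel gain."""
    n = len(codebook[0])
    y = [0.0] * n
    for idx, gain in active.items():
        for t in range(n):
            y[t] += gain * codebook[idx][t]
    return y


def detect(codebook, y, threshold=1e-6):
    """Correlate with every pilot; orthogonality isolates each user's gain."""
    n = len(y)
    estimates = {}
    for idx, pilot in enumerate(codebook):
        corr = sum(a * b for a, b in zip(y, pilot)) / n
        if abs(corr) > threshold:
            estimates[idx] = corr
    return estimates


if __name__ == "__main__":
    codebook = hadamard(16)
    received = transmit(codebook, {2: 0.8, 5: -1.3, 11: 0.4})
    print(detect(codebook, received))
```

With orthogonal pilots the correlation step needs only one inner product per codeword, which is consistent with the reduced complexity the abstract reports. Two users that pick the same pilot would merge into a single estimate, so pilot collisions remain a limitation of any single-stage version of this scheme.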
Submitted 14 July, 2023;
originally announced July 2023.