-
Princeton365: A Diverse Dataset with Accurate Camera Pose
Authors:
Karhan Kayan,
Stamatis Alexandropoulos,
Rishabh Jain,
Yiming Zuo,
Erich Liang,
Jia Deng
Abstract:
We introduce Princeton365, a large-scale diverse dataset of 365 videos with accurate camera pose. Our dataset bridges the gap between accuracy and data diversity in current SLAM benchmarks by introducing a novel ground truth collection framework that leverages calibration boards and a 360-camera. We collect indoor, outdoor, and object scanning videos with synchronized monocular and stereo RGB video outputs as well as IMU. We further propose a new scene scale-aware evaluation metric for SLAM based on the optical flow induced by the camera pose estimation error. Unlike existing metrics such as Average Trajectory Error (ATE), our new metric allows the performance of SLAM methods to be compared across scenes, allowing researchers to analyze the failure modes of their methods. We also propose a challenging Novel View Synthesis (NVS) benchmark that covers cases not covered by current NVS benchmarks, such as fully non-Lambertian scenes with 360-degree camera trajectories. Please visit https://princeton365.cs.princeton.edu for the dataset, code, videos, and submission.
Submitted 10 June, 2025;
originally announced June 2025.
-
Reduce Computational Cost In Deep Reinforcement Learning Via Randomized Policy Learning
Authors:
Zhuochen Liu,
Rahul Jain,
Quan Nguyen
Abstract:
Recent advancements in reinforcement learning (RL) have leveraged neural networks to achieve state-of-the-art performance across various control tasks. However, these successes often come at the cost of significant computational resources, as training deep neural networks requires substantial time and data. In this paper, we introduce an actor-critic algorithm that utilizes randomized neural networks to drastically reduce computational costs while maintaining strong performance. Despite its simple architecture, our method effectively solves a range of control problems, including the locomotion control of a highly dynamic 12-motor quadruped robot, and achieves results comparable to leading algorithms such as Proximal Policy Optimization (PPO). Notably, our approach does not outperform other algorithms in terms of sample efficiency but rather in terms of wall-clock training time. That is, although our algorithm requires more timesteps to converge to an optimal policy, the actual time required for training turns out to be lower.
Submitted 25 May, 2025;
originally announced May 2025.
-
An Exploratory Study on Multi-modal Generative AI in AR Storytelling
Authors:
Hyungjun Doh,
Jingyu Shi,
Rahul Jain,
Heesoo Kim,
Karthik Ramani
Abstract:
Storytelling in AR has gained attention due to its multi-modality and interactivity. However, generating multi-modal content for AR storytelling requires expertise and effort for high-quality conveyance of the narrator's intention. Recently, Generative-AI (GenAI) has shown promising applications in multi-modal content generation. Despite the potential benefit, current research calls for validating the effect of AI-generated content (AIGC) in AR Storytelling. Therefore, we conducted an exploratory study to investigate the utilization of GenAI. Analyzing 223 AR videos, we identified a design space for multi-modal AR Storytelling. Based on the design space, we developed a testbed facilitating multi-modal content generation and atomic elements in AR Storytelling. Through two studies with N=30 experienced storytellers and live presenters, we (1) revealed participants' preferences for modalities, (2) evaluated the interactions with AI to generate content, and (3) assessed the quality of the AIGC for AR Storytelling. We further discussed design considerations for future AR Storytelling with GenAI.
Submitted 21 May, 2025;
originally announced May 2025.
-
Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models
Authors:
Akhil Agnihotri,
Rahul Jain,
Deepak Ramachandran,
Zheng Wen
Abstract:
Post-training of LLMs with RLHF, and subsequently preference optimization algorithms such as DPO, IPO, etc., made a big difference in improving human alignment. However, all such techniques can only work with a single (human) objective. In practice, human users have multiple objectives, such as helpfulness and harmlessness, and there is no natural way to aggregate them into a single objective. In this paper, we address the multi-objective preference-alignment problem, where a policy must optimize several, potentially conflicting, objectives. We introduce the Multi-Objective Preference Optimization (MOPO) algorithm, which frames alignment as a constrained KL-regularized optimization: the primary objective is maximized while secondary objectives are lower-bounded by tunable safety thresholds. Unlike prior work, MOPO operates directly on pairwise preference data, requires no point-wise reward assumption, and avoids heuristic prompt-context engineering. The method recovers policies on the Pareto front whenever the front is attainable; practically, it reduces to simple closed-form iterative updates suitable for large-scale training. On synthetic benchmarks with diverse canonical preference structures, we show that MOPO approximates the Pareto front. When fine-tuning a 1.3B-parameter language model on real-world human-preference datasets, MOPO attains higher rewards and yields policies that Pareto-dominate baselines; ablation studies confirm optimization stability and robustness to hyperparameters.
Submitted 16 May, 2025;
originally announced May 2025.
-
Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry
Authors:
Mohammed Adnan,
Rohan Jain,
Ekansh Sharma,
Rahul Krishnan,
Yani Ioannou
Abstract:
The Lottery Ticket Hypothesis (LTH) suggests there exists a sparse LTH mask and weights that achieve the same generalization performance as the dense model while using significantly fewer parameters. However, finding an LTH solution is computationally expensive, and an LTH sparsity mask does not generalize to other random weight initializations. Recent work has suggested that neural networks trained from random initialization find solutions within the same basin modulo permutation, and proposes a method to align trained models within the same loss basin. We hypothesize that misalignment of basins is the reason why LTH masks do not generalize to new random initializations and propose permuting the LTH mask to align with the new optimization basin when performing sparse training from a different random init. We empirically show a significant increase in generalization when sparse training from random initialization with the permuted mask as compared to using the non-permuted LTH mask, on multiple datasets (CIFAR-10, CIFAR-100 and ImageNet) and models (VGG11, ResNet20 and ResNet50).
Submitted 9 June, 2025; v1 submitted 8 May, 2025;
originally announced May 2025.
-
EOPose : Exemplar-based object reposing using Generalized Pose Correspondences
Authors:
Sarthak Mehrotra,
Rishabh Jain,
Mayur Hemani,
Balaji Krishnamurthy,
Mausoom Sarkar
Abstract:
Reposing objects in images has a myriad of applications, especially for e-commerce where several variants of product images need to be produced quickly. In this work, we leverage the recent advances in unsupervised keypoint correspondence detection between different object images of the same class to propose an end-to-end framework for generic object reposing. Our method, EOPose, takes a target pose-guidance image as input and uses its keypoint correspondence with the source object image to warp and re-render the latter into the target pose using a novel three-step approach. Unlike generative approaches, our method also preserves the fine-grained details of the object such as its exact colors, textures, and brand marks. We also prepare a new dataset of paired objects based on the Objaverse dataset to train and test our network. EOPose produces high-quality reposing output as evidenced by different image quality metrics (PSNR, SSIM and FID). Besides a description of the method and the dataset, the paper also includes detailed ablation and user studies to indicate the efficacy of the proposed method.
Submitted 6 May, 2025;
originally announced May 2025.
-
World Food Atlas Project
Authors:
Ali Rostami,
Z Xie,
A Ishino,
Y Yamakata,
K Aizawa,
Ramesh Jain
Abstract:
The coronavirus pandemic is forcing people all over the world to stay "at home". Living with hardly any outings, many of us have realized how the food we eat affects our bodies. What can we do to know our food better and control it better? As a step toward an answer, we are trying to build a World Food Atlas (WFA) that collects all the knowledge about food in the world. In this paper, we present two of our trials. The first is the Food Knowledge Graph (FKG), which is a graphical representation of knowledge about food and ingredient relationships derived from recipes and food nutrition data. The second is FoodLog Athl and RecipeLog, applications for collecting people's detailed records of their food habits. We also discuss several problems that we try to solve to build the WFA by integrating these two ideas.
Submitted 25 April, 2025;
originally announced April 2025.
-
Achieving the positivity of the secret key in a BB84 like quantum key distribution protocol
Authors:
Rashi Jain,
Satyabrata Adhikari
Abstract:
Woodhead [Phys. Rev. A 88, 012331 (2013)] derived the lower bound of the secret key rate for a Bennett-Brassard (BB84) like quantum key distribution protocol under collective attacks. However, this lower bound does not always assure the generation of the secret key, and thus the protocol may sometimes have to be aborted. We therefore modify Woodhead's lower bound of the secret key rate in such a way that the secret key is always generated in a BB84 like quantum key distribution protocol. Exploiting the modified lower bound of the secret key rate, we analyze two state-dependent quantum cloning machines (QCMs) that may be used by the eavesdropper to extract information from the intercepted state: (i) the Wootters-Zurek QCM and (ii) a modified Buzek-Hillery QCM, constructed by fixing the cloning machine parameters of the Buzek-Hillery QCM. We then show that it is possible for the communicating parties to distill a secret key, even in the presence of an eavesdropper. Moreover, we discuss the effect of the efficiency of the QCM on the generation of the secret key for a successful key distribution protocol.
Submitted 23 April, 2025;
originally announced April 2025.
-
NTIRE 2025 Challenge on Event-Based Image Deblurring: Methods and Results
Authors:
Lei Sun,
Andrea Alfarano,
Peiqi Duan,
Shaolin Su,
Kaiwei Wang,
Boxin Shi,
Radu Timofte,
Danda Pani Paudel,
Luc Van Gool,
Qinglin Liu,
Wei Yu,
Xiaoqian Lv,
Lu Yang,
Shuigen Wang,
Shengping Zhang,
Xiangyang Ji,
Long Bao,
Yuqiang Yang,
Jinao Song,
Ziyi Wang,
Shuang Wen,
Heng Sun,
Kean Liu,
Mingchen Zhong,
Senyan Xu
, et al. (63 additional authors not shown)
Abstract:
This paper presents an overview of the NTIRE 2025 First Challenge on Event-Based Image Deblurring, detailing the proposed methodologies and corresponding results. The primary goal of the challenge is to design an event-based method that achieves high-quality image deblurring, with performance quantitatively assessed using Peak Signal-to-Noise Ratio (PSNR). Notably, there are no restrictions on computational complexity or model size. The task focuses on leveraging both events and images as inputs for single-image deblurring. A total of 199 participants registered, among whom 15 teams successfully submitted valid results, offering valuable insights into the current state of event-based image deblurring. We anticipate that this challenge will drive further advancements in event-based vision research.
Submitted 16 April, 2025;
originally announced April 2025.
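The entry above names PSNR as the challenge's ranking metric. For readers unfamiliar with it, the following is a minimal sketch of the standard PSNR definition, 10 * log10(MAX^2 / MSE). It is not the challenge's official scoring code: the function name `psnr`, the flat-list image representation, and the default peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Return the PSNR in dB between two images of equal size.

    Both images are given as flat sequences of pixel intensities
    (an assumed representation chosen to keep the sketch dependency-free).
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("images must be non-empty and have the same size")

    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    sharp = [100, 100, 100, 100]
    deblurred = [110, 90, 110, 90]  # every pixel off by 10, so MSE = 100
    print(f"PSNR: {psnr(sharp, deblurred):.2f} dB")
```

Higher values indicate a restored image closer to the reference; identical images give an infinite PSNR.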
-
Crust composition and the Shallow Heat Source in KS 1731-260
Authors:
R. Jain,
E. F. Brown,
H. Schatz,
A. V. Afanasjev,
M. Beard,
L. R. Gasques,
J. Grace,
A. Heger,
G. W. Hitt,
W. R. Hix,
R. Lau,
W. -J. Ong,
M. Wiescher,
Y. Xu
Abstract:
The presence of a shallow heat source of unknown origin in accreting neutron star crusts has been inferred by analyzing their cooling behavior in quiescence. To investigate a diverse bursting history for KS 1731-260 during accretion outbursts, we use realistic crust compositions and nuclear heating and cooling sources from detailed nuclear reaction network calculations to interpret observed cooling curves. We find that the required strength of the shallow heat source is reduced by more than a factor of 3 compared to previous analysis, and obtain constraints on the most likely dominant surface burning modes of KS 1731-260 over its history. Our analysis suggests an impure nuclear pasta layer in the inner crust, though future observations will provide more stringent constraints.
Submitted 11 April, 2025;
originally announced April 2025.
-
Local symmetry and smoothness in the space of vector-valued continuous functions
Authors:
Mohit,
Ranjana Jain
Abstract:
In this article, we characterize the left symmetric points in $C(K,X)$, where $K$ is a compact Hausdorff space and $X$ is a Banach space. We also provide necessary and sufficient conditions for the right symmetric points in $C(K,X)$. Further, we identify the smooth points in the space $C_0(K,X)$, $K$ being a locally compact Hausdorff space and $X$ being a Banach space.
Submitted 4 April, 2025;
originally announced April 2025.
-
An Approach to Technical AGI Safety and Security
Authors:
Rohin Shah,
Alex Irpan,
Alexander Matt Turner,
Anna Wang,
Arthur Conmy,
David Lindner,
Jonah Brown-Cohen,
Lewis Ho,
Neel Nanda,
Raluca Ada Popa,
Rishub Jain,
Rory Greig,
Samuel Albanie,
Scott Emmons,
Sebastian Farquhar,
Sébastien Krier,
Senthooran Rajamanoharan,
Sophie Bridgers,
Tobi Ijitoye,
Tom Everitt,
Victoria Krakovna,
Vikrant Varma,
Vladimir Mikulik,
Zachary Kenton,
Dave Orr
, et al. (5 additional authors not shown)
Abstract:
Artificial General Intelligence (AGI) promises transformative benefits but also presents significant risks. We develop an approach to address the risk of harms consequential enough to significantly harm humanity. We identify four areas of risk: misuse, misalignment, mistakes, and structural risks. Of these, we focus on technical approaches to misuse and misalignment. For misuse, our strategy aims to prevent threat actors from accessing dangerous capabilities, by proactively identifying dangerous capabilities, and implementing robust security, access restrictions, monitoring, and model safety mitigations. To address misalignment, we outline two lines of defense. First, model-level mitigations such as amplified oversight and robust training can help to build an aligned model. Second, system-level security measures such as monitoring and access control can mitigate harm even if the model is misaligned. Techniques from interpretability, uncertainty estimation, and safer design patterns can enhance the effectiveness of these mitigations. Finally, we briefly outline how these ingredients could be combined to produce safety cases for AGI systems.
Submitted 2 April, 2025;
originally announced April 2025.
-
Command A: An Enterprise-Ready Large Language Model
Authors:
Team Cohere,
Aakanksha,
Arash Ahmadian,
Marwan Ahmed,
Jay Alammar,
Milad Alizadeh,
Yazeed Alnumay,
Sophia Althammer,
Arkady Arkhangorodsky,
Viraat Aryabumi,
Dennis Aumiller,
Raphaël Avalos,
Zahara Aviv,
Sammie Bae,
Saurabh Baji,
Alexandre Barbet,
Max Bartolo,
Björn Bebensee,
Neeral Beladia,
Walter Beller-Morales,
Alexandre Bérard,
Andrew Berneshawi,
Anna Bialas,
Phil Blunsom
, et al. (205 additional authors not shown)
Abstract:
In this report we describe the development of Command A, a powerful large language model purpose-built to excel at real-world enterprise use cases. Command A is an agent-optimised and multilingual-capable model, with support for 23 languages of global business, and a novel hybrid architecture balancing efficiency with top of the range performance. It offers best-in-class Retrieval Augmented Generation (RAG) capabilities with grounding and tool use to automate sophisticated business processes. These abilities are achieved through a decentralised training approach, including self-refinement algorithms and model merging techniques. We also include results for Command R7B, which shares capability and architectural similarities with Command A. Weights for both models have been released for research purposes. This technical report details our original training pipeline and presents an extensive evaluation of our models across a suite of enterprise-relevant tasks and public benchmarks, demonstrating excellent performance and efficiency.
Submitted 14 April, 2025; v1 submitted 1 April, 2025;
originally announced April 2025.
-
Crossover between the zeptosecond and attosecond physics
Authors:
T. Nandi,
Yash Kumar,
Adya P. Mishra,
Nishchal R. Dwivedi,
Chandra Kumar,
Gajendra Singh,
N. Sowmya,
H. C. Manjunatha,
Sudhir R. Jain,
A. S. Kheifets
Abstract:
Nuclear orbiting resonances have been revealed at the sub-barrier energies as an atomic phenomenon by means of x-ray spectroscopy experiments. This interpretation is supported by several phenomenological models and theoretical estimates of the nuclear orbiting timescale and cross-section, inelastic scattering cross section including both nuclear and Coulomb excitation, and the Wigner-Smith time delay. We demonstrate that a multi-photon exchange during nuclear orbiting is responsible for an atomic excitation. Furthermore, proximity of the projectile and target nucleus during the nuclear orbiting modifies the effective charge of the projectile. Even though this orbiting induced excitation is triggered in zeptoseconds, it can still be observed in the attosecond time scale because of the Wigner-Smith time delay inherent to autoionization. Thus, we demonstrate the crossover between the zeptosecond and attosecond time scales which are native to nuclear and atomic physics, respectively. Markedly, this crossover may be the reason for x-ray production from ultra short nuclear processes ($\leq 10^{-21}$ sec). This explanation is likely to resolve the fission time scale anomaly and can stimulate cross-disciplinary research ranging from solid state to high-energy physics.
Submitted 27 March, 2025;
originally announced March 2025.
-
Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis
Authors:
Alexander Partin,
Priyanka Vasanthakumari,
Oleksandr Narykov,
Andreas Wilke,
Natasha Koussa,
Sara E. Jones,
Yitan Zhu,
Jamie C. Overbeek,
Rajeev Jain,
Gayara Demini Fernando,
Cesar Sanchez-Villalobos,
Cristina Garcia-Cardona,
Jamaludin Mohd-Yusof,
Nicholas Chia,
Justin M. Wozniak,
Souparno Ghosh,
Ranadip Pal,
Thomas S. Brettin,
M. Ryan Weil,
Rick L. Stevens
Abstract:
Deep learning (DL) and machine learning (ML) models have shown promise in drug response prediction (DRP), yet their ability to generalize across datasets remains an open question, raising concerns about their real-world applicability. Due to the lack of standardized benchmarking approaches, model evaluations and comparisons often rely on inconsistent datasets and evaluation criteria, making it difficult to assess true predictive capabilities. In this work, we introduce a benchmarking framework for evaluating cross-dataset prediction generalization in DRP models. Our framework incorporates five publicly available drug screening datasets, six standardized DRP models, and a scalable workflow for systematic evaluation. To assess model generalization, we introduce a set of evaluation metrics that quantify both absolute performance (e.g., predictive accuracy across datasets) and relative performance (e.g., performance drop compared to within-dataset results), enabling a more comprehensive assessment of model transferability. Our results reveal substantial performance drops when models are tested on unseen datasets, underscoring the importance of rigorous generalization assessments. While several models demonstrate relatively strong cross-dataset generalization, no single model consistently outperforms across all datasets. Furthermore, we identify CTRPv2 as the most effective source dataset for training, yielding higher generalization scores across target datasets. By sharing this standardized evaluation framework with the community, our study aims to establish a rigorous foundation for model comparison, and accelerate the development of robust DRP models for real-world applications.
Submitted 18 March, 2025;
originally announced March 2025.
-
Vulnerability Detection: From Formal Verification to Large Language Models and Hybrid Approaches: A Comprehensive Overview
Authors:
Norbert Tihanyi,
Tamas Bisztray,
Mohamed Amine Ferrag,
Bilel Cherif,
Richard A. Dubniczky,
Ridhi Jain,
Lucas C. Cordeiro
Abstract:
Software testing and verification are critical for ensuring the reliability and security of modern software systems. Traditionally, formal verification techniques, such as model checking and theorem proving, have provided rigorous frameworks for detecting bugs and vulnerabilities. However, these methods often face scalability challenges when applied to complex, real-world programs. Recently, the advent of Large Language Models (LLMs) has introduced a new paradigm for software analysis, leveraging their ability to understand insecure coding practices. Although LLMs demonstrate promising capabilities in tasks such as bug prediction and invariant generation, they lack the formal guarantees of classical methods. This paper presents a comprehensive study of state-of-the-art software testing and verification, focusing on three key approaches: classical formal methods, LLM-based analysis, and emerging hybrid techniques, which combine their strengths. We explore each approach's strengths, limitations, and practical applications, highlighting the potential of hybrid systems to address the weaknesses of standalone methods. We analyze whether integrating formal rigor with LLM-driven insights can enhance the effectiveness and scalability of software verification, exploring their viability as a pathway toward more robust and adaptive testing frameworks.
Submitted 13 March, 2025;
originally announced March 2025.
-
Streaming Algorithms for Network Design
Authors:
Chandra Chekuri,
Rhea Jain,
Sepideh Mahabadi,
Ali Vakilian
Abstract:
We consider the Survivable Network Design problem (SNDP) in the single-pass insertion-only streaming model. The input to SNDP is an edge-weighted graph $G = (V, E)$ and an integer connectivity requirement $r(uv)$ for each $u, v \in V$. The objective is to find a min-weight subgraph $H \subseteq G$ such that, for every pair $u, v \in V$, $u$ and $v$ are $r(uv)$-edge/vertex-connected. Recent work by Jin et al. [JKMV24] obtained approximation algorithms for edge-connectivity augmentation, and via that, also derived algorithms for edge-connectivity SNDP (EC-SNDP). We consider the vertex-connectivity setting (VC-SNDP) and obtain several results for it as well as improved results for EC-SNDP.
* We provide a general framework for solving connectivity problems in streaming; this is based on a connection to fault-tolerant spanners. For VC-SNDP, we provide an $O(tk)$-approximation in $\tilde O(k^{1-1/t}n^{1 + 1/t})$ space, where $k$ is the maximum connectivity requirement, assuming an exact algorithm at the end of the stream. Using a refined LP-based analysis, we provide an $O(βt)$-approximation where $β$ is the integrality gap of the natural cut-based LP relaxation. When applied to the EC-SNDP, our framework provides an $O(t)$-approximation in $\tilde O(k^{1/2-1/(2t)}n^{1 + 1/t} + kn)$ space, improving the $O(t \log k)$-approximation of [JKMV24] using $\tilde O(kn^{1+1/t})$ space; this also extends to element-connectivity SNDP.
* We consider vertex-connectivity augmentation in the link-arrival model. The input is a $k$-vertex-connected subgraph $G$, and the weighted links $L$ arrive in the stream; the goal is to store the min-weight set of links s.t. $G \cup L$ is $(k+1)$-vertex-connected. We obtain $O(1)$ approximations in near-linear space for $k = 1, 2$. Our result for $k=2$ is based on the SPQR tree, a novel application for this well-known representation of $2$-connected graphs.
Submitted 15 April, 2025; v1 submitted 1 March, 2025;
originally announced March 2025.
-
AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models
Authors:
Sohan Patnaik,
Rishabh Jain,
Balaji Krishnamurthy,
Mausoom Sarkar
Abstract:
Visual layouts are essential in graphic design fields such as advertising, posters, and web interfaces. The application of generative models for content-aware layout generation has recently gained traction. However, these models fail to understand the contextual aesthetic requirements of layout design and do not align with human-like preferences, primarily treating it as a prediction task without considering the final rendered output. To overcome these problems, we offer Aesthetic-Aware Preference Alignment (AAPA), a novel technique to train a Multi-modal Large Language Model (MLLM) for layout prediction that uses an MLLM's aesthetic preferences for Direct Preference Optimization over graphic layouts. We propose a data filtering protocol utilizing our layout-quality heuristics for AAPA to ensure training happens on high-quality layouts. Additionally, we introduce a novel evaluation metric that uses another MLLM to compute the win rate of the generated layout against the ground-truth layout based on aesthetics criteria. We also demonstrate the applicability of AAPA for MLLMs of varying scales (1B to 8B parameters) and LLM families (Qwen, Phi, InternLM). By conducting thorough qualitative and quantitative analyses, we verify the efficacy of our approach on two challenging benchmarks, Crello and Webui, showcasing 17% and 16% improvements over current state-of-the-art methods, thereby highlighting the potential of MLLMs in aesthetic-aware layout generation.
Submitted 1 March, 2025;
originally announced March 2025.
-
Exploring $β$ decay and $β$-delayed neutron emission in exotic $^{46,47}$Cl isotopes
Authors:
Vandana Tripathi,
B. Longfellow,
A. Volya,
E. Rubino,
C. Benetti,
J. F. Perello,
S. L. Tabor,
S. N. Liddick,
P. C. Bender,
M. P. Carpenter,
J. J. Carroll,
A. Chester,
C. J. Chiara,
K. Childers,
B. R. Clark,
B. P. Crider,
J. T. Harke,
R. Jain,
S. Luitel,
M. J. Mogannam,
T. H. Ogunbeku,
A. L. Richard,
S. Saha,
O. A. Shehu,
R. Unz
, et al. (2 additional authors not shown)
Abstract:
In this paper, $β^-$ and $β$-delayed neutron decays of $^{46,47}$Cl are reported from an experiment carried out at the National Superconducting Cyclotron Laboratory using the Beta Counting System. The half-lives of both $^{46}$Cl and $^{47}$Cl were extracted. Based on the delayed $γ$-ray transitions observed, the level structure of $N = 28$ $^{46}$Ar was determined. Completely different sets of excited states above the first $2^+$ state in $^{46}$Ar were populated in the $^{46}$Cl $\beta0n$ and $^{47}$Cl $\beta1n$ decay channels. Two new $γ$-ray transitions in $^{47}$Ar were identified from the very weak $^{47}$Cl $\beta0n$ decay. Furthermore, $^{46}$Cl $\beta1n$ and $^{47}$Cl $\beta2n$ were also observed to yield different population patterns for levels in $^{45}$Ar, including states of different parities. The experimental results allow us to address some of the open questions related to the delayed neutron emission process. For isotopes with large neutron excess and high $Q_β$ values, delayed neutron emission remains an important decay mode and can be utilized as a powerful spectroscopic tool. Experimental results were compared with shell-model calculations using the FSU and $V_{MU}$ effective interactions.
Submitted 26 February, 2025;
originally announced February 2025.
-
Cardiac Evidence Backtracking for Eating Behavior Monitoring using Collocative Electrocardiogram Imagining
Authors:
Xu-Lu Zhang,
Zhen-Qun Yang,
Dong-Mei Jiang,
Ga Liao,
Qing Li,
Ramesh Jain,
Xiao-Yong Wei
Abstract:
Eating monitoring has remained an open challenge in medical research for years due to the lack of non-invasive sensors for continuous monitoring and of reliable methods for automatic behavior detection. In this paper, we present a pilot study using a wearable 24-hour ECG for sensing and tailoring sophisticated deep learning for ad-hoc and interpretable detection. This is accomplished using a collocative learning framework in which 1) we construct collocative tensors as pseudo-images from 1D ECG signals to improve the feasibility of 2D image-based deep models; 2) we formulate the cardiac logic of analyzing the ECG data in a comparative way as periodic attention regulators so as to guide the deep inference to collect evidence in a human-comprehensible manner; and 3) we improve the interpretability of the framework by enabling the backtracking of evidence with a set of methods designed for Class Activation Mapping (CAM) decoding and decision tree/forest generation. The effectiveness of the proposed framework has been validated on the largest ECG dataset of eating behavior with superior performance over conventional models, and its capacity for cardiac evidence mining has also been verified through the consistency between the evidence it backtracked and that of previous medical studies.
Submitted 20 February, 2025;
originally announced February 2025.
-
Constraining circular polarization of high-frequency gravitational waves with CMB
Authors:
Ashu Kushwaha,
Rajeev Kumar Jain
Abstract:
Circular polarization in the cosmic microwave background (CMB) offers a promising probe of the parity-violating physics of the early universe. In this paper, we propose a novel method to constrain the primordial circular polarization of high-frequency gravitational waves (GW) in the GHz range. An efficient conversion of gravitons to photons in a transverse cosmological magnetic field at the epoch of last scattering can generate excess chiral photons if the GW background is chiral in nature. This excess radiation distorts the CMB thermal black-body spectrum, which can be estimated by measuring the V-Stokes parameter in the CMB polarization. Using current upper limits on the angular power spectrum of circular polarization $C_l^{VV}$ from the CLASS, MIPOL, and SPIDER experiments, we obtain the most stringent constraints on the characteristic strain and circular polarization of the isotropic background of stochastic GWs at ${40\,\rm GHz}$ and ${150\,\rm GHz}$, respectively. Our work, therefore, provides an interesting possibility to constrain the circular polarization of high-frequency GWs using the V-mode polarization measurements of CMB.
Submitted 28 February, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
LegalCore: A Dataset for Event Coreference Resolution in Legal Documents
Authors:
Kangda Wei,
Xi Shi,
Jonathan Tong,
Sai Ramana Reddy,
Anandhavelu Natarajan,
Rajiv Jain,
Aparna Garimella,
Ruihong Huang
Abstract:
Recognizing events and their coreferential mentions in a document is essential for understanding semantic meanings of text. The existing research on event coreference resolution is mostly limited to news articles. In this paper, we present the first dataset for the legal domain, LegalCore, which has been annotated with comprehensive event and event coreference information. The legal contract documents we annotated in this dataset are several times longer than news articles, with an average length of around 25k tokens per document. The annotations show that legal documents have dense event mentions and feature both short-distance and super long-distance coreference links between event mentions. We further benchmark mainstream Large Language Models (LLMs) on this dataset for both event detection and event coreference resolution tasks, and find that this dataset poses significant challenges for state-of-the-art open-source and proprietary LLMs, which perform significantly worse than a supervised baseline. We will publish the dataset as well as the code.
Submitted 20 March, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
Enhancing Depression Detection with Chain-of-Thought Prompting: From Emotion to Reasoning Using Large Language Models
Authors:
Shiyu Teng,
Jiaqing Liu,
Rahul Kumar Jain,
Shurong Chai,
Ruibo Hou,
Tomoko Tateyama,
Lanfen Lin,
Yen-wei Chen
Abstract:
Depression is one of the leading causes of disability worldwide, posing a severe burden on individuals, healthcare systems, and society at large. Recent advancements in Large Language Models (LLMs) have shown promise in addressing mental health challenges, including the detection of depression through text-based analysis. However, current LLM-based methods often struggle with nuanced symptom identification and lack a transparent, step-by-step reasoning process, making it difficult to accurately classify and explain mental health conditions. To address these challenges, we propose a Chain-of-Thought Prompting approach that enhances both the performance and interpretability of LLM-based depression detection. Our method breaks down the detection process into four stages: (1) sentiment analysis, (2) binary depression classification, (3) identification of underlying causes, and (4) assessment of severity. By guiding the model through these structured reasoning steps, we improve interpretability and reduce the risk of overlooking subtle clinical indicators. We validate our method on the E-DAIC dataset, where we test multiple state-of-the-art large language models. Experimental results indicate that our Chain-of-Thought Prompting technique yields superior performance in both classification accuracy and the granularity of diagnostic insights, compared to baseline approaches.
Submitted 9 February, 2025;
originally announced February 2025.
-
Robust LLM Alignment via Distributionally Robust Direct Preference Optimization
Authors:
Zaiyan Xu,
Sushil Vemuri,
Kishan Panaganti,
Dileep Kalathil,
Rahul Jain,
Deepak Ramachandran
Abstract:
A major challenge in aligning large language models (LLMs) with human preferences is the issue of distribution shift. LLM alignment algorithms rely on static preference datasets, assuming that they accurately represent real-world user preferences. However, user preferences vary significantly across geographical regions, demographics, linguistic patterns, and evolving cultural trends. This preference distribution shift leads to catastrophic alignment failures in many real-world applications. We address this problem using the principled framework of distributionally robust optimization, and develop two novel distributionally robust direct preference optimization (DPO) algorithms, namely, Wasserstein DPO (WDPO) and Kullback-Leibler DPO (KLDPO). We characterize the sample complexity of learning the optimal policy parameters for WDPO and KLDPO. Moreover, we propose scalable gradient descent-style learning algorithms by developing suitable approximations for the challenging minimax loss functions of WDPO and KLDPO. Our empirical experiments using benchmark data sets and LLMs demonstrate the superior performance of WDPO and KLDPO in substantially improving the alignment when there is a preference distribution shift.
Submitted 27 May, 2025; v1 submitted 3 February, 2025;
originally announced February 2025.
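As background for the entry above, here is a minimal sketch of the standard (non-robust) DPO loss for a single preference pair, which is the objective that distributionally robust variants build on. It does not reproduce WDPO or KLDPO. The function name `dpo_loss`, the default `beta`, and the scalar log-probability inputs are illustrative assumptions, not taken from the paper.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are sequence log-probabilities under the trained policy and
    under a frozen reference model. `beta` is a hypothetical default.
    """
    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference, the margin is zero and the loss is log 2; the loss falls as the policy raises the chosen response's likelihood relative to the rejected one.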
-
Active RLHF via Best Policy Learning from Trajectory Preference Feedback
Authors:
Akhil Agnihotri,
Rahul Jain,
Deepak Ramachandran,
Zheng Wen
Abstract:
We address the problem of best policy identification in preference-based reinforcement learning (PbRL), where learning occurs from noisy binary preferences over trajectory pairs rather than explicit numerical rewards. This approach is useful for post-training optimization of generative AI models during multi-turn user interactions, where preference feedback is more robust than handcrafted reward models. In this setting, learning is driven by both an offline preference dataset -- collected from a rater of unknown `competence' -- and online data collected with pure exploration. Since offline datasets may exhibit out-of-distribution (OOD) biases, principled online data collection is necessary. To address this, we propose Posterior Sampling for Preference Learning ($\mathsf{PSPL}$), a novel algorithm inspired by Top-Two Thompson Sampling, that maintains independent posteriors over the true reward model and transition dynamics. We provide the first theoretical guarantees for PbRL in this setting, establishing an upper bound on the simple Bayesian regret of $\mathsf{PSPL}$. Since the exact algorithm can be computationally impractical, we also provide an approximate version that outperforms existing baselines.
Submitted 16 May, 2025; v1 submitted 30 January, 2025;
originally announced January 2025.
-
The Effect of Covid-19 Lockdown on Human Behaviour Using Analytical Hierarchy Process
Authors:
Rashi Jain,
Mansi Yadav
Abstract:
The coronavirus pandemic is a serious global health crisis which changed not only the way people used to live but also how people behaved in their daily lives. Information from social and behavioural sciences can help in modifying human behaviour to comply with the recommendations of health officials, as the pandemic requires large-scale behaviour change and puts significant mental stress on individuals. The aim of this paper is to examine the changes in human behaviour brought about by the COVID-19 pandemic, which has caused a global health crisis and altered the way people live and interact. Data were collected online, people's behaviour was observed, and the results were analysed using the Analytical Hierarchy Process (AHP), a multi-criteria decision-making method, to rank the factors that had the greatest impact on the changes in human behaviour. The parameters considered were those most likely to affect human behaviour as a result of the COVID-19 lockdown: health, relationships with family and friends, overall lifestyle, online education and work from home, screen time, etc. The paper explains each criterion and how it affected human behaviour the most.
Submitted 18 January, 2025;
originally announced January 2025.
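To illustrate how AHP turns pairwise judgments into a ranking, the sketch below computes priority weights as the principal eigenvector of a reciprocal comparison matrix (via power iteration) together with Saaty's consistency index. The three criteria and the comparison values are hypothetical and are not data from the paper; `ahp_weights` is an illustrative name.

```python
def ahp_weights(matrix, iterations=100):
    """Return (weights, consistency_index) for a pairwise comparison matrix.

    matrix[i][j] states how much more important criterion i is than j;
    a reciprocal matrix has matrix[j][i] == 1 / matrix[i][j].
    """
    n = len(matrix)
    weights = [1.0 / n] * n
    # Power iteration converges to the principal eigenvector.
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n))
                   for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]
    # Estimate the principal eigenvalue, then the consistency index
    # CI = (lambda_max - n) / (n - 1); CI == 0 for perfectly consistent input.
    product = [sum(matrix[i][j] * weights[j] for j in range(n))
               for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return weights, consistency_index


# Hypothetical judgments over three criteria: health, lifestyle, screen time.
example = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights, ci = ahp_weights(example)
```

Because the example matrix is perfectly consistent, the weights come out proportional to 4 : 2 : 1 and the consistency index is zero; real survey judgments would generally give a positive index.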
-
Object Detection with Deep Learning for Rare Event Search in the GADGET II TPC
Authors:
Tyler Wheeler,
S. Ravishankar,
C. Wrede,
A. Andalib,
A. Anthony,
Y. Ayyad,
B. Jain,
A. Jaros,
R. Mahajan,
L. Schaedig,
A. Adams,
S. Ahn,
J. M. Allmond,
D. Bardayan,
D. Bazin,
K. Bosmpotinis,
T. Budner,
S. R. Carmichael,
S. M. Cha,
A. Chen,
K. A. Chipps,
J. M. Christie,
I. Cox,
J. Dopfer,
M. Friedman
, et al. (28 additional authors not shown)
Abstract:
In the pursuit of identifying rare two-particle events within the GADGET II Time Projection Chamber (TPC), this paper presents a comprehensive approach for leveraging Convolutional Neural Networks (CNNs) and various data processing methods. To address the inherent complexities of 3D TPC track reconstructions, the data is expressed in 2D projections and 1D quantities. This approach capitalizes on the diverse data modalities of the TPC, allowing for the efficient representation of the distinct features of the 3D events, with no loss in topology uniqueness. Additionally, it leverages the computational efficiency of 2D CNNs and benefits from the extensive availability of pre-trained models. Given the scarcity of real training data for the rare events of interest, simulated events are used to train the models to detect real events. To account for potential distribution shifts when predominantly depending on simulations, significant perturbations are embedded within the simulations. This produces a broad parameter space that works to account for potential physics parameter and detector response variations and uncertainties. These parameter-varied simulations are used to train sensitive 2D CNN object detectors. When combined with 1D histogram peak detection algorithms, this multi-modal detection framework is highly adept at identifying rare, two-particle events in data taken during experiment 21072 at the Facility for Rare Isotope Beams (FRIB), demonstrating a 100% recall for events of interest. We present the methods and outcomes of our investigation and discuss the potential future applications of these techniques.
Submitted 28 January, 2025;
originally announced January 2025.
-
AI Governance through Markets
Authors:
Philip Moreira Tomei,
Rupal Jain,
Matija Franklin
Abstract:
This paper argues that market governance mechanisms should be considered a key approach in the governance of artificial intelligence (AI), alongside traditional regulatory frameworks. While current governance approaches have predominantly focused on regulation, we contend that market-based mechanisms offer effective incentives for responsible AI development. We examine four emerging vectors of market governance: insurance, auditing, procurement, and due diligence, demonstrating how these mechanisms can affirm the relationship between AI risk and financial risk while addressing capital allocation inefficiencies. While we do not claim that market forces alone can adequately protect societal interests, we maintain that standardised AI disclosures and market mechanisms can create powerful incentives for safe and responsible AI development. This paper urges regulators, economists, and machine learning researchers to investigate and implement market-based approaches to AI governance.
Submitted 5 March, 2025; v1 submitted 29 January, 2025;
originally announced January 2025.
-
CARING-AI: Towards Authoring Context-aware Augmented Reality INstruction through Generative Artificial Intelligence
Authors:
Jingyu Shi,
Rahul Jain,
Seungguen Chi,
Hyungjun Doh,
Hyunggun Chi,
Alexander J. Quinn,
Karthik Ramani
Abstract:
Context-aware AR instruction enables adaptive and in-situ learning experiences. However, hardware limitations and expertise requirements constrain the creation of such instructions. With recent developments in Generative Artificial Intelligence (Gen-AI), current research tries to tackle these constraints by deploying AI-generated content (AIGC) in AR applications. However, our preliminary study with six AR practitioners revealed that the current AIGC lacks contextual information to adapt to varying application scenarios and is therefore limited in authoring. To utilize the strong generative power of GenAI to ease the authoring of AR instruction while capturing the context, we developed CARING-AI, an AR system to author context-aware humanoid-avatar-based instructions with GenAI. By navigating in the environment, users naturally provide contextual information to generate humanoid-avatar animation as AR instructions that blend in the context spatially and temporally. We showcased three application scenarios of CARING-AI: Asynchronous Instructions, Remote Instructions, and Ad Hoc Instructions based on a design space of AIGC in AR Instructions. With two user studies (N=12), we assessed the system usability of CARING-AI and demonstrated the easiness and effectiveness of authoring with Gen-AI.
Submitted 27 January, 2025;
originally announced January 2025.
-
Assessing Large Language Models in Comprehending and Verifying Concurrent Programs across Memory Models
Authors:
Ridhi Jain,
Rahul Purandare
Abstract:
As concurrent programming becomes increasingly prevalent, effectively identifying and addressing concurrency issues such as data races and deadlocks is critical. This study evaluates the performance of several leading large language models (LLMs), including GPT-3.5-turbo, GPT-4, GPT-4o, GPT-4o-mini, and Mistral-AI's Large2, in understanding and analyzing concurrency issues within software programs. Given that relaxed memory models, such as Total Store Order (TSO) and Partial Store Order (PSO), are widely implemented and adapted in modern systems, supported even by commodity architectures like ARM and x86, our evaluation focuses not only on sequentially consistent memory models but also on these relaxed memory models. Specifically, we assess two main aspects: the models' capacity to detect concurrency problems under a sequentially consistent memory model and their ability to verify the correctness conditions of concurrent programs across both sequentially consistent and relaxed memory models. To do this, we leverage SV-COMP's pthread tests and 25 ARM Litmus tests designed to evaluate Total Store Order (TSO) and Partial Store Order (PSO) memory models. The experimental results reveal that GPT-4, GPT-4o, and Mistral-AI's Large2 demonstrate a robust understanding of concurrency issues, effectively identifying data races and deadlocks when assessed under a sequentially consistent memory model. However, despite their superior performance, all selected LLMs face significant challenges verifying program correctness under relaxed memory models. These LLMs exhibit limitations in accurately capturing memory ordering constraints, and their current capabilities fall short in verifying even small programs in these complex scenarios.
Submitted 24 January, 2025;
originally announced January 2025.
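To make the gap between sequential consistency and relaxed models concrete, the sketch below enumerates every sequentially consistent interleaving of the textbook store-buffering litmus test (thread 0: `x = 1; r0 = y`, thread 1: `y = 1; r1 = x`). This is a generic illustration, not one of the tests used in the paper, and the names `THREADS` and `sc_outcomes` are hypothetical. Under sequential consistency the outcome `r0 = r1 = 0` never appears, whereas TSO permits it because each store may wait in a store buffer while the following load reads memory.

```python
from itertools import permutations

# Program order for each thread; all shared variables start at 0.
THREADS = {
    0: [("store", "x"), ("load", "y")],
    1: [("store", "y"), ("load", "x")],
}


def sc_outcomes(threads):
    """Return the set of (r0, r1) values reachable under sequential consistency."""
    ops = [(tid, idx) for tid, prog in threads.items()
           for idx in range(len(prog))]
    outcomes = set()
    for order in permutations(ops):
        # Keep only interleavings that respect each thread's program order.
        next_idx = {tid: 0 for tid in threads}
        valid = True
        for tid, idx in order:
            if idx != next_idx[tid]:
                valid = False
                break
            next_idx[tid] += 1
        if not valid:
            continue
        # Execute the interleaving against a single shared memory.
        memory = {"x": 0, "y": 0}
        registers = {}
        for tid, idx in order:
            kind, var = threads[tid][idx]
            if kind == "store":
                memory[var] = 1
            else:
                registers[tid] = memory[var]
        outcomes.add((registers[0], registers[1]))
    return outcomes


allowed = sc_outcomes(THREADS)
```

Brute-force enumeration is only practical for litmus-sized programs; it is used here because it makes the sequentially consistent semantics explicit.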
-
Humanity's Last Exam
Authors:
Long Phan,
Alice Gatti,
Ziwen Han,
Nathaniel Li,
Josephina Hu,
Hugh Zhang,
Chen Bo Calvin Zhang,
Mohamed Shaaban,
John Ling,
Sean Shi,
Michael Choi,
Anish Agrawal,
Arnav Chopra,
Adam Khoja,
Ryan Kim,
Richard Ren,
Jason Hausenloy,
Oliver Zhang,
Mantas Mazeika,
Dmitry Dodonov,
Tung Nguyen,
Jaeho Lee,
Daron Anderson,
Mikhail Doroshenko,
Alun Cennyth Stokes
, et al. (1084 additional authors not shown)
Abstract:
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
Submitted 19 April, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Layer-Wise Security Framework and Analysis for the Quantum Internet
Authors:
Zebo Yang,
Ali Ghubaish,
Raj Jain,
Ala Al-Fuqaha,
Aiman Erbad,
Ramana Kompella,
Hassan Shapourian,
Reza Nejabati
Abstract:
With its significant security potential, the quantum internet is poised to revolutionize technologies like cryptography and communications. Although it boasts enhanced security over traditional networks, the quantum internet still encounters unique security challenges essential for safeguarding its Confidentiality, Integrity, and Availability (CIA). This study explores these challenges by analyzing the vulnerabilities and the corresponding mitigation strategies across different layers of the quantum internet, including physical, link, network, and application layers. We assess the severity of potential attacks, evaluate the expected effectiveness of mitigation strategies, and identify vulnerabilities within diverse network configurations, integrating both classical and quantum approaches. Our research highlights the dynamic nature of these security issues and emphasizes the necessity for adaptive security measures. The findings underline the need for ongoing research into the security dimension of the quantum internet to ensure its robustness, encourage its adoption, and maximize its impact on society.
Submitted 12 January, 2025;
originally announced January 2025.
-
PT-Symmetric $SU(2)$-like Random Matrix Ensembles: Invariant Distributions and Spectral Fluctuations
Authors:
Stalin Abraham,
A. Bhagwat,
Sudhir Ranjan Jain
Abstract:
We consider an ensemble of $2\times 2$ normal matrices with complex entries representing operators in the quantum mechanics of 2-level parity-time reversal (PT) symmetric systems. The randomness of the ensemble is endowed by obtaining probability distributions based on symmetry and statistical independence. The probability densities turn out to be power law with exponents that depend on the boundedness of the domain. For small spacings, $σ$, the probability density varies as $σ^ν$, $ν\geq 2$. The degree of level repulsion is a parameter of great interest as it makes a connection to quantum chaos; the lower bound of $ν$ for our ensemble coincides with the Gaussian Unitary Ensemble. We believe that the systematic development presented here paves the way for further generalizations in the field of random matrix theory for PT-symmetric quantum systems.
Submitted 11 January, 2025;
originally announced January 2025.
-
Controlling and engineering a quantum state in a multi-qubit system employing the quantum Zeno effect
Authors:
Dhruva Naik,
Garima Rajpoot,
Sudhir Ranjan Jain
Abstract:
Controlling quantum jumps is crucial for reliable quantum computing. In this work, we demonstrate how the quantum Zeno effect can be applied to a two-qubit system interacting with an ancilla which is a component of surface code architecture used to control undesired transitions. Further, we show that by designing the interaction and tuning the measurement frequency, the quantum Zeno effect can be used to achieve the desired target state in both single-qubit and two-qubit systems.
Submitted 10 December, 2024;
originally announced December 2024.
-
A quantized anomalous Hall effect above 4.2 K in stacked topological insulator/magnet bilayers
Authors:
Rakshit Jain,
Matthew Roddy,
Vishakha Gupta,
Benjamin Huang,
Hasan M. Sayeed,
Husain F. Alnaser,
Amit Vashist,
Kenji Watanabe,
Takashi Taniguchi,
Vikram V. Deshpande,
Taylor D. Sparks,
Daniel C. Ralph
Abstract:
Quantized anomalous Hall effects (QAHEs) occur in remarkable electronic states which possess not only quantized Hall signals but in some cases regions of dissipationless electron transport. The initial demonstrations of a QAHE in a magnetically-doped topological insulator (TI) required temperatures below 100 mK, and since then a major focus of the field has been to increase the temperature scale. Here, we report quantized Hall signals up to 10 K (in what is known as the parity anomaly state) in TI/magnet bilayers made by mechanical assembly, rather than by conventional deposition techniques. This is a factor of 100 higher temperature than any previous realization of a QAHE in a proximity-coupled TI/magnet heterostructure made by deposition, and approximately twice the previous record for any QAHE system.
Submitted 6 December, 2024;
originally announced December 2024.
-
Enhancing FKG.in: automating Indian food composition analysis
Authors:
Saransh Kumar Gupta,
Lipika Dey,
Partha Pratim Das,
Geeta Trilok-Kumar,
Ramesh Jain
Abstract:
This paper presents a novel approach to compute food composition data for Indian recipes using a knowledge graph for Indian food (FKG.in) and LLMs. The primary focus is to provide a broad overview of an automated food composition analysis workflow and describe its core functionalities: nutrition data aggregation, food composition analysis, and LLM-augmented information resolution. This workflow aims to complement FKG.in and iteratively supplement food composition data from verified knowledge bases. Additionally, this paper highlights the challenges of representing Indian food and accessing food composition data digitally. It also reviews three key sources of food composition data: the Indian Food Composition Tables, the Indian Nutrient Databank, and the Nutritionix API. Furthermore, it briefly outlines how users can interact with the workflow to obtain diet-based health recommendations and detailed food composition information for numerous recipes. We then explore the complex challenges of analyzing Indian recipe information across dimensions such as structure, multilingualism, and uncertainty, and present our ongoing work on LLM-based solutions to address these issues. The methods proposed in this workshop paper for AI-driven knowledge curation and information resolution are application-agnostic, generalizable, and replicable for any domain.
Submitted 9 December, 2024; v1 submitted 6 December, 2024;
originally announced December 2024.
-
A grand-design spiral galaxy 1.5 billion years after the Big Bang with JWST
Authors:
Rashi Jain,
Yogesh Wadadekar
Abstract:
We report the discovery of a large ($\sim 10$ kpc diameter), massive ($\log(M_\star/M_\odot) = 10.15^{+0.01}_{-0.01}$), grand-design spiral galaxy with photometric redshift $z_{\text{phot}} = 4.03$ in the UNCOVER and Medium Band Mega Science surveys with JWST. This is the highest redshift spiral galaxy discovered with JWST so far. In the rest-frame near-UV and far-UV, we clearly see the beads-on-a-string pattern of star formation; in the rest-frame visible bands, each string appears as an arm. Spectral energy distribution modeling using the Bagpipes code is strongly constrained by detections and flux measurements in 21 JWST and HST filters. The stellar mass-weighted age is 228 Myr, implying that 50% of the stars in the galaxy formed after $z \sim 4.5$. This is a highly star-forming galaxy with a star formation rate (SFR) of $57.57^{+1.80}_{-1.90} \, M_\odot\, \text{yr}^{-1}$. We detect strong H-$α$ + [NII] emission from the entire disk. The detection of a spiral galaxy at $z \sim 4$ indicates that massive and large spiral galaxies and disks were already in place merely 1.5 billion years after the Big Bang.
Submitted 23 December, 2024; v1 submitted 6 December, 2024;
originally announced December 2024.
-
Exploring cosmological imprints of phantom crossing with dynamical dark energy in Horndeski gravity
Authors:
Yashi Tiwari,
Ujjwal Upadhyay,
Rajeev Kumar Jain
Abstract:
In the current era of precision cosmology, the persistence of cosmological tensions, most notably the Hubble tension and the $S_8$ tension, challenges the standard $Λ$CDM model. To reconcile these tensions via late-time modifications to expansion history, various features such as phantom crossing in the dark energy equation of state, a negative energy density at high redshifts, etc., are favoured. However, these scenarios cannot be realized within the framework of GR without introducing ghost or gradient instabilities. In this work, we investigate a dynamical dark energy scenario within the framework of Horndeski gravity, incorporating nonminimal coupling to gravity and self-interactions. We highlight that the model can exhibit novel features like phantom crossing and negative dark energy densities at high redshifts without introducing any instabilities. For this specific Horndeski model, we perform a comprehensive analysis of the background evolution along with the effects on perturbations, examining observables like growth rate, matter and CMB power spectrum. To check the consistency of the model with the observational data, we employ MCMC analysis using BAO/$fσ_8$, Supernovae, and CMB data. While the model does not outperform the standard $Λ$CDM framework in a combined likelihood analysis, there remains a preference for non-zero values of the model parameters within the data. This suggests that dynamical dark energy scenarios, particularly those with non-minimal couplings, merit further exploration as promising alternatives to GR, offering rich phenomenology that can be tested against a broader range of current and upcoming observational datasets.
Submitted 15 February, 2025; v1 submitted 1 December, 2024;
originally announced December 2024.
-
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models
Authors:
Vipula Rawte,
Sarthak Jain,
Aarush Sinha,
Garv Kaushik,
Aman Bansal,
Prathiksha Rumale Vishwanath,
Samyak Rajesh Jain,
Aishwarya Naresh Reganti,
Vinija Jain,
Aman Chadha,
Amit P. Sheth,
Amitava Das
Abstract:
Recent advances in Large Multimodal Models (LMMs) have expanded their capabilities to video understanding, with Text-to-Video (T2V) models excelling in generating videos from textual prompts. However, they still frequently produce hallucinated content, revealing AI-generated inconsistencies. We introduce ViBe (https://vibe-t2v-bench.github.io/): a large-scale dataset of hallucinated videos from open-source T2V models. We identify five major hallucination types: Vanishing Subject, Omission Error, Numeric Variability, Subject Dysmorphia, and Visual Incongruity. Using ten T2V models, we generated and manually annotated 3,782 videos from 837 diverse MS COCO captions. Our proposed benchmark includes a dataset of hallucinated videos and a classification framework using video embeddings. ViBe serves as a critical resource for evaluating T2V reliability and advancing hallucination detection. We establish classification as a baseline, with the TimeSFormer + CNN ensemble achieving the best performance (0.345 accuracy, 0.342 F1 score). The modest accuracy of these initial baselines highlights the difficulty of automated hallucination detection and the need for improved methods. Our research aims to drive the development of more robust T2V models and evaluate their outputs based on user preferences.
Submitted 19 March, 2025; v1 submitted 16 November, 2024;
originally announced November 2024.
-
Some geometric properties of spaces of vector-valued integrable functions
Authors:
Mohit,
Ranjana Jain
Abstract:
We identify the smooth points of $L^1(μ,X)$, and provide some necessary and sufficient conditions for left and right symmetry of points with respect to Birkhoff-James orthogonality in $L^p(μ,X), 1\leq p<\infty$, where $μ$ is any complete positive measure and $X$ is a Banach space with some suitable properties.
Submitted 4 April, 2025; v1 submitted 6 November, 2024;
originally announced November 2024.
-
Machine-Learning-Enabled Measurements of Astrophysical (p,n) Reactions with the SECAR Recoil Separator
Authors:
P. Tsintari,
N. Dimitrakopoulos,
R. Garg,
K. Hermansen,
C. Marshall,
F. Montes,
G. Perdikakis,
H. Schatz,
K. Setoodehnia,
H. Arora,
G. P. A. Berg,
R. Bhandari,
J. C. Blackmon,
C. R. Brune,
K. A. Chipps,
M. Couder,
C. Deibel,
A. Hood,
M. Horana Gamage,
R. Jain,
C. Maher,
S. Miskovitch,
J. Pereira,
T. Ruland,
M. S. Smith
, et al. (7 additional authors not shown)
Abstract:
The synthesis of heavy elements in supernovae is affected by low-energy (n,p) and (p,n) reactions on unstable nuclei, yet experimental data on such reaction rates are scarce. The SECAR (SEparator for CApture Reactions) recoil separator at FRIB (Facility for Rare Isotope Beams) was originally designed to measure astrophysical reactions that change the mass of a nucleus significantly. We used a novel approach that integrates machine learning with ion-optical simulations to find an ion-optical solution for the separator that enables the measurement of (p,n) reactions, despite the reaction leaving the mass of the nucleus nearly unchanged. A new measurement of the $^{58}$Fe(p,n)$^{58}$Co reaction in inverse kinematics with a 3.66$\pm$0.12 MeV/nucleon $^{58}$Fe beam (corresponding to 3.69$\pm$0.12 MeV proton energy in normal kinematics) yielded a cross-section of 20.3$\pm$6.3 mb and served as a benchmark for the new technique demonstrating its effectiveness in achieving the required performance criteria. This novel approach marks a significant advancement in experimental nuclear astrophysics, as it paves the way for studying astrophysically important (p,n) reactions on unstable nuclei produced at FRIB.
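The correspondence quoted above between the beam energy in inverse kinematics and the proton energy in normal kinematics can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes a non-relativistic treatment (equal relative velocity, so kinetic energy scales with the projectile mass), assumes "MeV/nucleon" means total kinetic energy divided by the mass number A = 58, and uses approximate atomic masses. The function name is hypothetical.

```python
# Approximate atomic masses in unified atomic mass units (assumed values).
M_FE58_U = 57.93327   # 58Fe
M_H1_U = 1.007825     # 1H

def equivalent_light_projectile_energy(beam_mev_per_nucleon, mass_number,
                                       heavy_mass_u, light_mass_u):
    """Non-relativistic conversion from inverse to normal kinematics.

    A heavy beam hitting a light target at rest has the same relative
    velocity as the light particle hitting the heavy nucleus at rest when
    E_light = E_heavy * (m_light / m_heavy).
    """
    total_beam_energy_mev = beam_mev_per_nucleon * mass_number
    return total_beam_energy_mev * light_mass_u / heavy_mass_u

proton_energy_mev = equivalent_light_projectile_energy(
    3.66, 58, M_FE58_U, M_H1_U
)
print(f"Equivalent proton energy: {proton_energy_mev:.2f} MeV")
```

Under these assumptions the result agrees with the 3.69 MeV proton energy stated in the abstract to within rounding.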
Submitted 19 December, 2024; v1 submitted 31 October, 2024;
originally announced November 2024.
-
GRS-QA -- Graph Reasoning-Structured Question Answering Dataset
Authors:
Anish Pahilajani,
Devasha Trivedi,
Jincen Shuai,
Khin S. Yone,
Samyak Rajesh Jain,
Namyong Park,
Ryan A. Rossi,
Nesreen K. Ahmed,
Franck Dernoncourt,
Yu Wang
Abstract:
Large Language Models (LLMs) have excelled in multi-hop question-answering (M-QA) due to their advanced reasoning abilities. However, the impact of the inherent reasoning structures on LLM M-QA performance remains unclear, largely due to the absence of QA datasets that provide fine-grained reasoning structures. To address this gap, we introduce the Graph Reasoning-Structured Question Answering Dataset (GRS-QA), which includes both semantic contexts and reasoning structures for QA pairs. Unlike existing M-QA datasets, where different reasoning structures are entangled together, GRS-QA explicitly captures intricate reasoning pathways by constructing reasoning graphs, where nodes represent textual contexts and edges denote logical flows. These reasoning graphs of different structures enable a fine-grained evaluation of LLM reasoning capabilities across various reasoning structures. Our empirical analysis reveals that LLMs perform differently when handling questions with varying reasoning structures. This finding facilitates the exploration of textual structures as compared with semantics.
Submitted 7 November, 2024; v1 submitted 1 November, 2024;
originally announced November 2024.
-
Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs
Authors:
Rishabh Jain,
Vivek M. Bhasi,
Adwait Jog,
Anand Sivasubramaniam,
Mahmut T. Kandemir,
Chita R. Das
Abstract:
Personalized recommendation is a ubiquitous application on the internet, with many industries and hyperscalers extensively leveraging Deep Learning Recommendation Models (DLRMs) for their personalization needs (like ad serving or movie suggestions). With growing model and dataset sizes pushing computation and memory requirements, GPUs are being increasingly preferred for executing DLRM inference. However, serving newer DLRMs while meeting acceptable latencies remains challenging, making traditional deployments increasingly GPU-hungry and resulting in higher inference serving costs. In this paper, we show that the embedding stage continues to be the primary bottleneck in the GPU inference pipeline, leading to up to a 3.2x embedding-only performance slowdown.
To thoroughly grasp the problem, we conduct a detailed microarchitecture characterization and highlight the presence of low occupancy in the standard embedding kernels. By leveraging direct compiler optimizations, we achieve optimal occupancy, pushing the performance by up to 53%. Yet, long memory latency stalls continue to exist. To tackle this challenge, we propose specialized plug-and-play-based software prefetching and L2 pinning techniques, which help in hiding and decreasing the latencies. Further, we propose combining them, as they complement each other. Experimental evaluations using A100 GPUs with large models and datasets show that our proposed techniques improve performance by up to 103% for the embedding stage, and up to 77% for the overall DLRM inference pipeline.
Submitted 29 October, 2024;
originally announced October 2024.
-
A Polylogarithmic Approximation for Directed Steiner Forest in Planar Digraphs
Authors:
Chandra Chekuri,
Rhea Jain
Abstract:
We consider Directed Steiner Forest (DSF), a fundamental problem in network design. The input to DSF is a directed edge-weighted graph $G = (V, E)$ and a collection of vertex pairs $\{(s_i, t_i)\}_{i \in [k]}$. The goal is to find a minimum cost subgraph $H$ of $G$ such that $H$ contains an $s_i$-$t_i$ path for each $i \in [k]$. DSF is NP-Hard and is known to be hard to approximate to a factor of $Ω(2^{\log^{1 - ε}(n)})$ for any fixed $ε> 0$ [DK'99]. DSF admits approximation ratios of $O(k^{1/2 + ε})$ [CEGS'11] and $O(n^{2/3 + ε})$ [BBMRY'13].
In this work we show that in planar digraphs, an important and useful class of graphs in both theory and practice, DSF is much more tractable. We obtain an $O(\log^6 k)$-approximation algorithm via the junction tree technique. Our main technical contribution is to prove the existence of a low density junction tree in planar digraphs. To find an approximate junction tree we rely on recent results on rooted directed network design problems [FM'23, CJKZZ'24], in particular, on an LP-based algorithm for the Directed Steiner Tree problem [CJKZZ'24]. Our work and several other recent ones on algorithms for planar digraphs [FM'23, KS'21, CJKZZ'24] are built upon structural insights on planar graph reachability and shortest path separators [Thorup'04].
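To make the DSF problem statement above concrete, the following sketch checks whether a candidate subgraph $H$ is feasible (it contains an $s_i$-$t_i$ path for every pair) and computes its cost. It illustrates only the problem definition, not the paper's approximation algorithm; the edge-list representation, function names, and the toy instance are assumptions made for this example.

```python
from collections import deque

def reachable_from(adjacency, source):
    """Return the set of vertices reachable from `source` via BFS."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_feasible(subgraph_edges, pairs):
    """True if the directed subgraph has an s-t path for every (s, t) pair."""
    adjacency = {}
    for u, v, _weight in subgraph_edges:
        adjacency.setdefault(u, []).append(v)
    return all(t in reachable_from(adjacency, s) for s, t in pairs)

def cost(subgraph_edges):
    """Total edge weight of the subgraph; the quantity DSF minimizes."""
    return sum(weight for _u, _v, weight in subgraph_edges)

# Toy instance: edges are (tail, head, weight) triples.
H = [("a", "b", 2.0), ("b", "c", 1.0), ("d", "c", 4.0)]
pairs = [("a", "c"), ("d", "c")]
print(is_feasible(H, pairs), cost(H))
```

Because edges are directed, reversing a pair generally breaks feasibility: in the toy instance there is no path from `c` back to `a`.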
Submitted 22 October, 2024;
originally announced October 2024.
-
DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding
Authors:
Manan Suri,
Puneet Mathur,
Franck Dernoncourt,
Rajiv Jain,
Vlad I Morariu,
Ramit Sawhney,
Preslav Nakov,
Dinesh Manocha
Abstract:
Document structure editing involves manipulating localized textual, visual, and layout components in document images based on the user's requests. Past works have shown that multimodal grounding of user requests in the document image and identifying the accurate structural components and their associated attributes remain key challenges for this task. To address these, we introduce DocEdit-v2, a novel framework that performs end-to-end document editing by leveraging Large Multimodal Models (LMMs). It consists of three novel components: (1) Doc2Command, which simultaneously localizes edit regions of interest (RoI) and disambiguates user edit requests into edit commands; (2) LLM-based Command Reformulation prompting, which tailors edit commands originally intended for specialized software into edit instructions suitable for generalist LMMs; and (3) processing of these outputs via Large Multimodal Models like GPT-4V and Gemini to parse the document layout, execute edits on the grounded Region of Interest (RoI), and generate the edited document image. Extensive experiments on the DocEdit dataset show that DocEdit-v2 significantly outperforms strong baselines on edit command generation (2-33%), RoI bounding box detection (12-31%), and overall document editing (1-12%) tasks.
Submitted 21 October, 2024;
originally announced October 2024.
-
FlexDoc: Flexible Document Adaptation through Optimizing both Content and Layout
Authors:
Yue Jiang,
Christof Lutteroth,
Rajiv Jain,
Christopher Tensmeyer,
Varun Manjunatha,
Wolfgang Stuerzlinger,
Vlad Morariu
Abstract:
Designing adaptive documents that are visually appealing across various devices and for diverse viewers is a challenging task. This is due to the wide variety of devices and different viewer requirements and preferences. Alterations to a document's content, style, or layout often necessitate numerous adjustments, potentially leading to a complete layout redesign. We introduce FlexDoc, a framework for creating and consuming documents that seamlessly adapt to different devices, author, and viewer preferences and interactions. It eliminates the need for manually creating multiple document layouts, as FlexDoc enables authors to define desired document properties using templates and employs both discrete and continuous optimization in a novel comprehensive optimization process, which leverages automatic text summarization and image carving techniques to adapt both layout and content during consumption dynamically. Furthermore, we demonstrate FlexDoc in multiple real-world application scenarios, such as news readers and academic papers.
Submitted 20 October, 2024;
originally announced October 2024.
-
Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence
Authors:
Norbert Tihanyi,
Tamas Bisztray,
Richard A. Dubniczky,
Rebeka Toth,
Bertalan Borsos,
Bilel Cherif,
Mohamed Amine Ferrag,
Lajos Muzsai,
Ridhi Jain,
Ryan Marinelli,
Lucas C. Cordeiro,
Merouane Debbah,
Vasileios Mavroeidis,
Audun Josang
Abstract:
As machine intelligence evolves, the need to test and compare the problem-solving abilities of different AI models grows. However, current benchmarks are often simplistic, allowing models to perform uniformly well and making it difficult to distinguish their capabilities. Additionally, benchmarks typically rely on static question-answer pairs that the models might memorize or guess. To address these limitations, we introduce Dynamic Intelligence Assessment (DIA), a novel methodology for testing AI models using dynamic question templates and improved metrics across multiple disciplines such as mathematics, cryptography, cybersecurity, and computer science. The accompanying dataset, DIA-Bench, contains a diverse collection of challenge templates with mutable parameters presented in various formats, including text, PDFs, compiled binaries, visual puzzles, and CTF-style cybersecurity challenges. Our framework introduces four new metrics to assess a model's reliability and confidence across multiple attempts. These metrics revealed that even simple questions are frequently answered incorrectly when posed in varying forms, highlighting significant gaps in models' reliability. Notably, API models like GPT-4o often overestimated their mathematical capabilities, while ChatGPT-4o demonstrated better performance due to effective tool usage. In self-assessment, OpenAI's o1-mini proved to have the best judgement on what tasks it should attempt to solve. We evaluated 25 state-of-the-art LLMs using DIA-Bench, showing that current models struggle with complex tasks and often display unexpectedly low confidence, even with simpler questions. The DIA framework sets a new standard for assessing not only problem-solving but also a model's adaptive intelligence and ability to assess its limitations. The dataset is publicly available on the project's page: https://github.com/DIA-Bench.
Submitted 22 November, 2024; v1 submitted 20 October, 2024;
originally announced October 2024.
-
Maximal chirality transfer in the photon-graviton conversion in the early universe
Authors:
Ashu Kushwaha,
Rajeev Kumar Jain
Abstract:
While photons and gravitons do not interact significantly, photons can be converted to gravitons in a background magnetic field -- a phenomenon known as the Gertsenshtein effect. In this paper, we investigate whether chiral electromagnetic (EM) waves can be converted to chiral gravitational waves (GW) in the presence of primordial magnetic fields during the radiation-dominated epoch of the early universe. We consider two situations wherein chirality is either present in the propagating EM waves or it exists in the background magnetic field. Our analysis shows that while the conversion probability increases with stronger magnetic fields, it remains insensitive to the chiral nature of the background magnetic field. Consequently, the net chirality parameter is independent of the chirality of the background field in both cases. Finally, we demonstrate that the present-day energy density of the produced chiral GWs peaks at a frequency of $\sim 100$ GHz, and the corresponding characteristic strain can be sensitive to current and future missions designed to detect high-frequency GWs.
Submitted 22 December, 2024; v1 submitted 9 October, 2024;
originally announced October 2024.
-
MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging
Authors:
Noel C. F. Codella,
Ying Jin,
Shrey Jain,
Yu Gu,
Ho Hin Lee,
Asma Ben Abacha,
Alberto Santamaria-Pang,
Will Guyman,
Naiteek Sangani,
Sheng Zhang,
Hoifung Poon,
Stephanie Hyland,
Shruthi Bannur,
Javier Alvarez-Valle,
Xue Li,
John Garrett,
Alan McMillan,
Gaurav Rajguru,
Madhu Maddi,
Nilesh Vijayrania,
Rehaan Bhimai,
Nick Mecklenburg,
Rupal Jain,
Daniel Holstein,
Naveen Gaur
, et al. (6 additional authors not shown)
Abstract:
In this work, we present MedImageInsight, an open-source medical imaging embedding model. MedImageInsight is trained on medical images with associated text and labels across a diverse collection of domains, including X-Ray, CT, MRI, dermoscopy, OCT, fundus photography, ultrasound, histopathology, and mammography. Rigorous evaluations demonstrate MedImageInsight's ability to achieve state-of-the-art (SOTA) or human expert level performance across classification, image-image search, and fine-tuning tasks. Specifically, on public datasets, MedImageInsight achieves SOTA in CT 3D medical image retrieval, as well as SOTA in disease classification and search for chest X-ray, dermatology, and OCT imaging. Furthermore, MedImageInsight achieves human expert performance in bone age estimation (on both public and partner data), as well as AUC above 0.9 in most other domains. When paired with a text decoder, MedImageInsight achieves near SOTA level single image report findings generation with less than 10% the parameters of other models. Compared to fine-tuning GPT-4o with only MIMIC-CXR data for the same task, MedImageInsight outperforms in clinical metrics, but underperforms on lexical metrics where GPT-4o sets a new SOTA. Importantly for regulatory purposes, MedImageInsight can generate ROC curves, adjust sensitivity and specificity based on clinical need, and provide evidence-based decision support through image-image search (which can also enable retrieval augmented generation). In an independent clinical evaluation of image-image search in chest X-ray, MedImageInsight outperformed every other publicly available foundation model evaluated by large margins (over 6 points AUC), and significantly outperformed other models in terms of AI fairness (across age and gender). We hope releasing MedImageInsight will help enhance collective progress in medical imaging AI research and development.
Submitted 9 October, 2024;
originally announced October 2024.
-
Compositional Planning for Logically Constrained Multi-Agent Markov Decision Processes
Authors:
Krishna C. Kalagarla,
Matthew Low,
Rahul Jain,
Ashutosh Nayyar,
Pierluigi Nuzzo
Abstract:
Designing control policies for large, distributed systems is challenging, especially in the context of critical, temporal logic based specifications (e.g., safety) that must be met with high probability. Compositional methods for such problems are needed for scalability, yet relying on worst-case assumptions for decomposition tends to be overly conservative. In this work, we use the framework of Constrained Markov Decision Processes (CMDPs) to provide an assume-guarantee based decomposition for synthesizing decentralized control policies, subject to logical constraints in a multi-agent setting. The returned policies are guaranteed to satisfy the constraints with high probability and provide a lower bound on the achieved objective reward. We empirically find the returned policies to achieve near-optimal rewards while enjoying an order of magnitude reduction in problem size and execution time.
Submitted 4 October, 2024;
originally announced October 2024.