
Showing 1–50 of 492 results for author: Sachin

Searching in archive cs.
  1. arXiv:2506.11973  [pdf, ps, other]

    cs.LG

    Self-Regulating Cars: Automating Traffic Control in Free Flow Road Networks

    Authors: Ankit Bhardwaj, Rohail Asim, Sachin Chauhan, Yasir Zaki, Lakshminarayanan Subramanian

    Abstract: Free-flow road networks, such as suburban highways, are increasingly experiencing traffic congestion due to growing commuter inflow and limited infrastructure. Traditional control mechanisms, such as traffic signals or local heuristics, are ineffective or infeasible in these high-speed, signal-free environments. We introduce self-regulating cars, a reinforcement learning-based traffic control prot…

    Submitted 13 June, 2025; originally announced June 2025.

  2. arXiv:2506.10999  [pdf]

    cs.SE cs.AI

    Automated Validation of COBOL to Java Transformation

    Authors: Atul Kumar, Diptikalyan Saha, Toshikai Yasue, Kohichi Ono, Saravanan Krishnan, Sandeep Hans, Fumiko Satoh, Gerald Mitchell, Sachin Kumar

    Abstract: Recent advances in Large Language Model (LLM) based Generative AI techniques have made it feasible to translate enterprise-level code from legacy languages such as COBOL to modern languages such as Java or Python. While the results of LLM-based automatic transformation are encouraging, the resulting code cannot be trusted to correctly translate the original code. We propose a framework and a tool t…

    Submitted 14 April, 2025; originally announced June 2025.

    Comments: arXiv admin note: text overlap with arXiv:2504.10548

    Journal ref: ASE 2024

  3. arXiv:2506.05168  [pdf, ps, other]

    cs.RO

    Fabrica: Dual-Arm Assembly of General Multi-Part Objects via Integrated Planning and Learning

    Authors: Yunsheng Tian, Joshua Jacob, Yijiang Huang, Jialiang Zhao, Edward Gu, Pingchuan Ma, Annan Zhang, Farhad Javid, Branden Romero, Sachin Chitta, Shinjiro Sueda, Hui Li, Wojciech Matusik

    Abstract: Multi-part assembly poses significant challenges for robots to execute long-horizon, contact-rich manipulation with generalization across complex geometries. We present Fabrica, a dual-arm robotic system capable of end-to-end planning and control for autonomous assembly of general multi-part objects. For planning over long horizons, we develop hierarchies of precedence, sequence, grasp, and motion…

    Submitted 5 June, 2025; originally announced June 2025.

  4. arXiv:2506.04178  [pdf, ps, other]

    cs.LG

    OpenThoughts: Data Recipes for Reasoning Models

    Authors: Etash Guha, Ryan Marten, Sedrick Keh, Negin Raoof, Georgios Smyrnis, Hritik Bansal, Marianna Nezhurina, Jean Mercat, Trung Vu, Zayne Sprague, Ashima Suvarna, Benjamin Feuer, Liangyu Chen, Zaid Khan, Eric Frankel, Sachin Grover, Caroline Choi, Niklas Muennighoff, Shiye Su, Wanjia Zhao, John Yang, Shreyas Pimpalgaonkar, Kartik Sharma, Charlie Cheng-Jie Ji, Yichuan Deng, et al. (25 additional authors not shown)

    Abstract: Reasoning models have made rapid progress on many benchmarks involving math, code, and science. Yet, there are still many open questions about the best training recipes for reasoning since state-of-the-art models often rely on proprietary datasets with little to no public information available. To address this, the goal of the OpenThoughts project is to create open-source datasets for training rea…

    Submitted 4 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

    Comments: https://www.openthoughts.ai/blog/ot3. arXiv admin note: text overlap with arXiv:2505.23754 by other authors

  5. arXiv:2505.23791  [pdf, ps, other]

    cs.CR cs.AI cs.LG

    Evaluating Query Efficiency and Accuracy of Transfer Learning-based Model Extraction Attack in Federated Learning

    Authors: Sayyed Farid Ahamed, Sandip Roy, Soumya Banerjee, Marc Vucovich, Kevin Choi, Abdul Rahman, Alison Hu, Edward Bowen, Sachin Shetty

    Abstract: Federated Learning (FL) is a collaborative learning framework designed to protect client data, yet it remains highly vulnerable to Intellectual Property (IP) threats. Model extraction (ME) attacks pose a significant risk to Machine Learning as a Service (MLaaS) platforms, enabling attackers to replicate confidential models by querying black-box (without internal insight) APIs. Despite FL's privacy…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: Accepted at IEEE IWCMC. 6 pages, 4 Figures, 3 tables

    ACM Class: I.2.6; D.4.6

  6. arXiv:2505.19364  [pdf, ps, other]

    cs.CR

    RADEP: A Resilient Adaptive Defense Framework Against Model Extraction Attacks

    Authors: Amit Chakraborty, Sayyed Farid Ahamed, Sandip Roy, Soumya Banerjee, Kevin Choi, Abdul Rahman, Alison Hu, Edward Bowen, Sachin Shetty

    Abstract: Machine Learning as a Service (MLaaS) enables users to leverage powerful machine learning models through cloud-based APIs, offering scalability and ease of deployment. However, these services are vulnerable to model extraction attacks, where adversaries repeatedly query the application programming interface (API) to reconstruct a functionally similar model, compromising intellectual property and s…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: Presented at the IEEE International Wireless Communications and Mobile Computing Conference (IWCMC) 2025

    ACM Class: I.2.6; D.4.6; K.6.5

  7. arXiv:2505.17985  [pdf, ps, other]

    physics.optics cs.RO

    AI-Driven Robotics for Free-Space Optics

    Authors: Shiekh Zia Uddin, Sachin Vaidya, Shrish Choudhary, Zhuo Chen, Raafat K. Salib, Luke Huang, Dirk R. Englund, Marin Soljačić

    Abstract: Tabletop optical experiments are foundational to research in many areas of science, including photonics, quantum optics, materials science, metrology, and biomedical imaging. However, these experiments remain fundamentally reliant on manual design, assembly, and alignment, limiting throughput and reproducibility. Optics currently lacks generalizable robotic systems capable of operating across a div…

    Submitted 7 May, 2025; originally announced May 2025.

  8. arXiv:2505.13723  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Turbocharging Gaussian Process Inference with Approximate Sketch-and-Project

    Authors: Pratik Rathore, Zachary Frangella, Sachin Garg, Shaghayegh Fazliani, Michał Dereziński, Madeleine Udell

    Abstract: Gaussian processes (GPs) play an essential role in biostatistics, scientific machine learning, and Bayesian optimization for their ability to provide probabilistic predictions and model uncertainty. However, GP inference struggles to scale to large datasets (which are common in modern applications), since it requires the solution of a linear system whose size scales quadratically with the number o…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 28 pages, 6 figures, 2 tables

  9. The Effects of Demographic Instructions on LLM Personas

    Authors: Angel Felipe Magnossão de Paula, J. Shane Culpepper, Alistair Moffat, Sachin Pathiyan Cherumanal, Falk Scholer, Johanne Trippas

    Abstract: Social media platforms must filter sexist content in compliance with governmental regulations. Current machine learning approaches can reliably detect sexism based on standardized definitions, but often neglect the subjective nature of sexist language and fail to consider individual users' perspectives. To address this gap, we adopt a perspectivist approach, retaining diverse annotations rather th…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: Accepted at SIGIR'25, Padua, Italy

  10. arXiv:2505.10691  [pdf]

    eess.IV cs.AI cs.CV

    Predicting Risk of Pulmonary Fibrosis Formation in PASC Patients

    Authors: Wanying Dou, Gorkem Durak, Koushik Biswas, Ziliang Hong, Andrea Mia Bejar, Elif Keles, Kaan Akin, Sukru Mehmet Erturk, Alpay Medetalibeyoglu, Marc Sala, Alexander Misharin, Hatice Savas, Mary Salvatore, Sachin Jambawalikar, Drew Torigian, Jayaram K. Udupa, Ulas Bagci

    Abstract: While the acute phase of the COVID-19 pandemic has subsided, its long-term effects persist through Post-Acute Sequelae of COVID-19 (PASC), commonly known as Long COVID. There remains substantial uncertainty regarding both its duration and optimal management strategies. PASC manifests as a diverse array of persistent or newly emerging symptoms--ranging from fatigue, dyspnea, and neurologic impairme…

    Submitted 15 May, 2025; originally announced May 2025.

  11. arXiv:2505.07440  [pdf, other]

    cs.CL

    Matching Tasks with Industry Groups for Augmenting Commonsense Knowledge

    Authors: Rituraj Singh, Sachin Pawar, Girish Palshikar

    Abstract: Commonsense knowledge bases (KB) are a source of specialized knowledge that is widely used to improve machine learning applications. However, even for a large KB such as ConceptNet, capturing explicit knowledge from each industry domain is challenging. For example, only a few samples of general tasks performed by various industries are available in ConceptNet. Here, a task is a well-defined…

    Submitted 12 May, 2025; originally announced May 2025.

  12. arXiv:2505.03054  [pdf, other]

    cs.AI cs.CL cs.SD eess.AS

    BLAB: Brutally Long Audio Bench

    Authors: Orevaoghene Ahia, Martijn Bartelds, Kabir Ahuja, Hila Gonen, Valentin Hofmann, Siddhant Arora, Shuyue Stella Li, Vishal Puttagunta, Mofetoluwa Adeyemi, Charishma Buchireddy, Ben Walls, Noah Bennett, Shinji Watanabe, Noah A. Smith, Yulia Tsvetkov, Sachin Kumar

    Abstract: Developing large audio language models (LMs) capable of understanding diverse spoken interactions is essential for accommodating the multimodal nature of human communication and can increase the accessibility of language technologies across different user populations. Recent work on audio LMs has primarily evaluated their performance on short audio segments, typically under 30 seconds, with limite…

    Submitted 12 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  13. arXiv:2505.02329  [pdf, ps, other]

    cs.CY cs.HC cs.SE

    Regulating Algorithmic Management: A Multi-Stakeholder Study of Challenges in Aligning Software and the Law for Workplace Scheduling

    Authors: Jonathan Lynn, Rachel Y. Kim, Sicun Gao, Daniel Schneider, Sachin S. Pandya, Min Kyung Lee

    Abstract: Algorithmic management (AM)'s impact on worker well-being has led to calls for regulation. However, little is known about the effectiveness and challenges in real-world AM regulation across the regulatory process -- rule operationalization, software use, and enforcement. Our multi-stakeholder study addresses this gap within workplace scheduling, one of the few AM domains with implemented regulatio…

    Submitted 25 May, 2025; v1 submitted 4 May, 2025; originally announced May 2025.

    Comments: To appear in FAccT'25

  14. arXiv:2504.21656  [pdf]

    cs.NI

    DBSCAN-based Vehicle Clustering and UAV Placement for NOMA-based Resource Management in Cellular V2X Communications

    Authors: Hossein Davoudi, Behrouz Shahgholi Ghahfarokhi, Neda Moghim, Sachin Shetty

    Abstract: In future wireless networks, terrestrial, aerial, space, and maritime wireless networks will be integrated into a unified network to meet the needs of a fully connected global network. Vehicular communication has become one of the challenging applications of wireless networks. In this article, we aim to address the radio resource management in Cellular V2X (C-V2X) networks using Unmanned…

    Submitted 30 April, 2025; originally announced April 2025.

  15. arXiv:2504.20939  [pdf, other]

    cs.NI eess.SP

    Flexible Semantic-Aware Resource Allocation: Serving More Users Through Similarity Range Constraints

    Authors: Nasrin Gholami, Neda Moghim, Behrouz Shahgholi Ghahfarokhi, Pouyan Salavati, Christo Kurisummoottil Thomas, Sachin Shetty, Tahereh Rahmati

    Abstract: Semantic communication (SemCom) aims to enhance the resource efficiency of next-generation networks by transmitting the underlying meaning of messages, focusing on information relevant to the end user. Existing literature on SemCom primarily emphasizes learning the encoder and decoder through end-to-end deep learning frameworks, with the objective of minimizing a task-specific semantic loss functi…

    Submitted 29 April, 2025; originally announced April 2025.

  16. arXiv:2504.20910  [pdf, other]

    cs.CY cs.AI cs.HC

    When Testing AI Tests Us: Safeguarding Mental Health on the Digital Frontlines

    Authors: Sachin R. Pendse, Darren Gergle, Rachel Kornfield, Jonah Meyerhoff, David Mohr, Jina Suh, Annie Wescott, Casey Williams, Jessica Schleider

    Abstract: Red-teaming is a core part of the infrastructure that ensures that AI models do not produce harmful content. Unlike past technologies, the black box nature of generative AI systems necessitates a uniquely interactional mode of testing, one in which individuals on red teams actively interact with the system, leveraging natural language to simulate malicious actors and solicit harmful outputs. This…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: Accepted to ACM Conference on Fairness, Accountability, and Transparency (FAccT 2025)

  17. arXiv:2504.18942  [pdf, other]

    cs.CL cs.AI cs.LG

    LawFlow: Collecting and Simulating Lawyers' Thought Processes

    Authors: Debarati Das, Khanh Chi Le, Ritik Sachin Parkar, Karin De Langis, Brendan Madson, Chad M. Berryman, Robin M. Willis, Daniel H. Moses, Brett McDonnell, Daniel Schwarcz, Dongyeop Kang

    Abstract: Legal practitioners, particularly those early in their careers, face complex, high-stakes tasks that require adaptive, context-sensitive reasoning. While AI holds promise in supporting legal work, current datasets and models are narrowly focused on isolated subtasks and fail to capture the end-to-end decision-making required in real-world practice. To address this gap, we introduce LawFlow, a data…

    Submitted 26 April, 2025; originally announced April 2025.

    Comments: submitted to COLM 2025

  18. arXiv:2504.18671  [pdf, other]

    cs.AI

    Proof-of-TBI -- Fine-Tuned Vision Language Model Consortium and OpenAI-o3 Reasoning LLM-Based Medical Diagnosis Support System for Mild Traumatic Brain Injury (TBI) Prediction

    Authors: Ross Gore, Eranga Bandara, Sachin Shetty, Alberto E. Musto, Pratip Rana, Ambrosio Valencia-Romero, Christopher Rhea, Lobat Tayebi, Heather Richter, Atmaram Yarlagadda, Donna Edmonds, Steven Wallace, Donna Broshek

    Abstract: Mild Traumatic Brain Injury (TBI) detection presents significant challenges due to the subtle and often ambiguous presentation of symptoms in medical imaging, making accurate diagnosis a complex task. To address these challenges, we propose Proof-of-TBI, a medical diagnosis support system that integrates multiple fine-tuned vision-language models with the OpenAI-o3 reasoning large language model (…

    Submitted 25 April, 2025; originally announced April 2025.

  19. arXiv:2504.16980  [pdf, other]

    cs.LG

    Safety Pretraining: Toward the Next Generation of Safe AI

    Authors: Pratyush Maini, Sachin Goyal, Dylan Sam, Alex Robey, Yash Savani, Yiding Jiang, Andy Zou, Zachary C. Lipton, J. Zico Kolter

    Abstract: As large language models (LLMs) are increasingly deployed in high-stakes settings, the risk of generating harmful or toxic content remains a central challenge. Post-hoc alignment methods are brittle: once unsafe patterns are learned during pretraining, they are hard to remove. We present a data-centric pretraining framework that builds safety into the model from the start. Our contributions includ…

    Submitted 23 April, 2025; originally announced April 2025.

  20. arXiv:2504.15753  [pdf, ps, other]

    math.OC cs.LG eess.SY math.PR math.ST

    Markov Kernels, Distances and Optimal Control: A Parable of Linear Quadratic Non-Gaussian Distribution Steering

    Authors: Alexis M. H. Teter, Wenqing Wang, Sachin Shivakumar, Abhishek Halder

    Abstract: For a controllable linear time-varying (LTV) pair $(\boldsymbol{A}_t,\boldsymbol{B}_t)$ and $\boldsymbol{Q}_{t}$ positive semidefinite, we derive the Markov kernel for the Itô diffusion ${\mathrm{d}}\boldsymbol{x}_{t}=\boldsymbol{A}_{t}\boldsymbol{x}_t {\mathrm{d}} t + \sqrt{2}\boldsymbol{B}_{t}{\mathrm{d}}\boldsymbol{w}_{t}$ with an accompanying killing of probability mass at rate…

    Submitted 22 April, 2025; originally announced April 2025.

  21. arXiv:2504.14582  [pdf, other]

    cs.CV

    NTIRE 2025 Challenge on Image Super-Resolution ($\times$4): Methods and Results

    Authors: Zheng Chen, Kai Liu, Jue Gong, Jingkai Wang, Lei Sun, Zongwei Wu, Radu Timofte, Yulun Zhang, Xiangyu Kong, Xiaoxuan Yu, Hyunhee Park, Suejin Han, Hakjae Jeon, Dafeng Zhang, Hyung-Ju Chun, Donghun Ryou, Inju Ha, Bohyung Han, Lu Zhao, Yuyi Zhang, Pengyu Yan, Jiawei Hu, Pengwei Liu, Fengjun Guo, Hongyuan Yu, et al. (86 additional authors not shown)

    Abstract: This paper presents the NTIRE 2025 image super-resolution ($\times$4) challenge, one of the associated competitions of the 10th NTIRE Workshop at CVPR 2025. The challenge aims to recover high-resolution (HR) images from low-resolution (LR) counterparts generated through bicubic downsampling with a $\times$4 scaling factor. The objective is to develop effective network designs or solutions that ach…

    Submitted 28 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: NTIRE 2025 webpage: https://www.cvlai.net/ntire/2025. Code: https://github.com/zhengchen1999/NTIRE2025_ImageSR_x4

  22. arXiv:2504.12276  [pdf, other]

    cs.CV

    The Tenth NTIRE 2025 Image Denoising Challenge Report

    Authors: Lei Sun, Hang Guo, Bin Ren, Luc Van Gool, Radu Timofte, Yawei Li, Xiangyu Kong, Hyunhee Park, Xiaoxuan Yu, Suejin Han, Hakjae Jeon, Jia Li, Hyung-Ju Chun, Donghun Ryou, Inju Ha, Bohyung Han, Jingyu Ma, Zhijuan Huang, Huiyuan Fu, Hongyuan Yu, Boqi Zhang, Jiawei Shi, Heng Zhang, Huadong Ma, Deepak Kumar Tyagi, et al. (69 additional authors not shown)

    Abstract: This paper presents an overview of the NTIRE 2025 Image Denoising Challenge (σ = 50), highlighting the proposed methodologies and corresponding results. The primary objective is to develop a network architecture capable of achieving high-quality denoising performance, quantitatively evaluated using PSNR, without constraints on computational complexity or model size. The task assumes independent ad…

    Submitted 16 April, 2025; originally announced April 2025.

  23. arXiv:2504.10686  [pdf, other]

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  24. arXiv:2504.10548  [pdf]

    cs.SE cs.AI

    Automated Testing of COBOL to Java Transformation

    Authors: Sandeep Hans, Atul Kumar, Toshikai Yasue, Kouichi Ono, Saravanan Krishnan, Devika Sondhi, Fumiko Satoh, Gerald Mitchell, Sachin Kumar, Diptikalyan Saha

    Abstract: Recent advances in Large Language Model (LLM) based Generative AI techniques have made it feasible to translate enterprise-level code from legacy languages such as COBOL to modern languages such as Java or Python. While the results of LLM-based automatic transformation are encouraging, the resulting code cannot be trusted to correctly translate the original code, making manual validation of transl…

    Submitted 14 April, 2025; originally announced April 2025.

  25. arXiv:2504.09019  [pdf, other]

    cs.NI

    Empirically Measuring Data Localization in the EU

    Authors: Alexander Gamero-Garrido, Kicho Yu, Sumukh Vasisht Shankar, Sachin Kumar Singh, Sindhya Balasubramanian, Alexander Wilcox, David Choffnes

    Abstract: EU data localization regulations limit data transfers to non-EU countries with the GDPR. However, BGP, DNS and other Internet protocols were not designed to enforce jurisdictional constraints, so implementing data localization is challenging. Despite initial research on the topic, little is known about if or how companies currently operate their server infrastructure to comply with the regulations…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: To appear in Proceedings on Privacy Enhancing Technologies (PETS) 2025

  26. arXiv:2504.07070  [pdf, other]

    cs.CL

    A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models

    Authors: Zhouhang Xie, Junda Wu, Yiran Shen, Yu Xia, Xintong Li, Aaron Chang, Ryan Rossi, Sachin Kumar, Bodhisattwa Prasad Majumder, Jingbo Shang, Prithviraj Ammanabrolu, Julian McAuley

    Abstract: Personalized preference alignment for large language models (LLMs), the process of tailoring LLMs to individual users' preferences, is an emerging research direction spanning the area of NLP and personalization. In this survey, we present an analysis of works on personalized alignment and modeling for LLMs. We introduce a taxonomy of preference alignment techniques, including training time, infere…

    Submitted 9 April, 2025; originally announced April 2025.

  27. arXiv:2504.06950  [pdf, other]

    cs.CV

    PathSegDiff: Pathology Segmentation using Diffusion model representations

    Authors: Sachin Kumar Danisetty, Alexandros Graikos, Srikar Yellapragada, Dimitris Samaras

    Abstract: Image segmentation is crucial in many computational pathology pipelines, including accurate disease diagnosis, subtyping, outcome, and survivability prediction. The common approach for training a segmentation model relies on a pre-trained feature extractor and a dataset of paired image and mask annotations. These are used to train a lightweight prediction model that translates features into per-pi…

    Submitted 9 April, 2025; originally announced April 2025.

  28. arXiv:2504.04635  [pdf, other]

    cs.CL

    Steering off Course: Reliability Challenges in Steering Language Models

    Authors: Patrick Queiroz Da Silva, Hari Sethuraman, Dheeraj Rajagopal, Hannaneh Hajishirzi, Sachin Kumar

    Abstract: Steering methods for language models (LMs) have gained traction as lightweight alternatives to fine-tuning, enabling targeted modifications to model activations. However, prior studies primarily report results on a few models, leaving critical gaps in understanding the robustness of these methods. In this work, we systematically examine three prominent steering methods -- DoLa, function vectors, a…

    Submitted 6 April, 2025; originally announced April 2025.

  29. arXiv:2504.02107  [pdf, ps, other]

    cs.LG cs.CL

    TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining

    Authors: Jeffrey Li, Mohammadreza Armandpour, Iman Mirzadeh, Sachin Mehta, Vaishaal Shankar, Raviteja Vemulapalli, Samy Bengio, Oncel Tuzel, Mehrdad Farajtabar, Hadi Pouransari, Fartash Faghri

    Abstract: Large Language Models (LLMs) trained on historical web data inevitably become outdated. We investigate evaluation strategies and update methods for LLMs as new data becomes available. We introduce a web-scale dataset for time-continual pretraining of LLMs derived from 114 dumps of Common Crawl (CC) - orders of magnitude larger than previous continual language modeling benchmarks. We also design ti…

    Submitted 6 June, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

    Comments: Code available at: https://github.com/apple/ml-tic-lm

  30. arXiv:2503.23059  [pdf, ps, other]

    cs.IT

    A Note on Function Correcting Codes for b-Symbol Read Channels

    Authors: Sachin Sampath, B. Sundar Rajan

    Abstract: Function-Correcting Codes (FCCs) are a novel paradigm in Error Control Coding, introduced by Lenz et al. (2023) for the binary substitution channel. FCCs aim to protect the function evaluation of data against errors instead of the data itself, thereby relaxing the redundancy requirements of the code. Later, R. Premlal et al. gave new bounds on the optimal redundancy of FCCs and…

    Submitted 29 March, 2025; originally announced March 2025.

    Comments: Four pages, Extended version under preparation

  31. arXiv:2503.21943  [pdf, other]

    cs.CV cs.AI eess.IV

    Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models

    Authors: Haoming Cai, Tsung-Wei Huang, Shiv Gehlot, Brandon Y. Feng, Sachin Shah, Guan-Ming Su, Christopher Metzler

    Abstract: Text-to-image diffusion models excel at generating diverse portraits, but lack intuitive shadow control. Existing editing approaches, as post-processing, struggle to offer effective manipulation across diverse styles. Additionally, these methods either rely on expensive real-world light-stage data collection or require extensive computational resources for training. To address these limitations, w…

    Submitted 7 April, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

    Comments: ShadowDirector Arxiv Version. Fix the arxiv title text issue

  32. arXiv:2503.19206  [pdf, other]

    cs.CL cs.AI

    Overtrained Language Models Are Harder to Fine-Tune

    Authors: Jacob Mitchell Springer, Sachin Goyal, Kaiyue Wen, Tanishq Kumar, Xiang Yue, Sadhika Malladi, Graham Neubig, Aditi Raghunathan

    Abstract: Large language models are pre-trained on ever-growing token budgets under the assumption that better pre-training performance translates to improved downstream models. In this work, we challenge this assumption and show that extended pre-training can make models harder to fine-tune, leading to degraded final performance. We term this phenomenon catastrophic overtraining. For example, the instructi…

    Submitted 27 March, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: 72 pages, 65 figures, 6 tables

  33. arXiv:2503.18260  [pdf, other]

    cs.CL cs.DC cs.LG

    Bridging Emotions and Architecture: Sentiment Analysis in Modern Distributed Systems

    Authors: Mahak Shah, Akaash Vishal Hazarika, Meetu Malhotra, Sachin C. Patil, Joshit Mohanty

    Abstract: Sentiment analysis is a field within NLP that has gained importance because it is applied in various areas, such as social media surveillance, customer feedback evaluation, and market research. At the same time, distributed systems allow for effective processing of large amounts of data. Therefore, this paper examines how sentiment analysis converges with distributed systems by concentrating on dif…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: IEEE 3rd International Conference on Advancements in Smart, Secure and Intelligent Computing (ASSIC)

  34. arXiv:2503.09513  [pdf, other]

    cs.CR cs.AI

    RESTRAIN: Reinforcement Learning-Based Secure Framework for Trigger-Action IoT Environment

    Authors: Md Morshed Alam, Lokesh Chandra Das, Sandip Roy, Sachin Shetty, Weichao Wang

    Abstract: Internet of Things (IoT) platforms with trigger-action capability allow event conditions to trigger actions in IoT devices autonomously by creating a chain of interactions. Adversaries exploit this chain of interactions to maliciously inject fake event conditions into IoT hubs, triggering unauthorized actions on target IoT devices to implement remote injection attacks. Existing defense mechanisms…

    Submitted 12 March, 2025; originally announced March 2025.

  35. arXiv:2503.08290  [pdf, other]

    cs.CV

    SegDesicNet: Lightweight Semantic Segmentation in Remote Sensing with Geo-Coordinate Embeddings for Domain Adaptation

    Authors: Sachin Verma, Frank Lindseth, Gabriel Kiss

    Abstract: Semantic segmentation is essential for analyzing high-definition remote sensing images (HRSIs) because it allows the precise classification of objects and regions at the pixel level. However, remote sensing data present challenges owing to geographical location, weather, and environmental variations, making it difficult for semantic segmentation models to generalize across diverse scenarios. Existi…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: https://openaccess.thecvf.com/content/WACV2025/papers/Verma_SegDesicNet_Lightweight_Semantic_Segmentation_in_Remote_Sensing_with_Geo-Coordinate_Embeddings_WACV_2025_paper.pdf

    Journal ref: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025

  36. arXiv:2503.01442  [pdf, other]

    cs.SI cs.AI cs.IR

    Leveraging LLMs for Mental Health: Detection and Recommendations from Social Discussions

    Authors: Vaishali Aggarwal, Sachin Thukral, Krushil Patel, Arnab Chatterjee

    Abstract: Textual data from social platforms captures various aspects of mental health through discussions around and across issues, while users reach out for help and others sympathize and offer support. We propose a comprehensive framework that leverages Natural Language Processing (NLP) and Generative AI techniques to identify and assess mental health disorders, detect their severity, and create recommen…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: 5 pages, 4 figures, 3 tables, to be published in WI-IAT 2024

  37. arXiv:2502.19399  [pdf, other]

    cs.ET

    DROID: Discrete-Time Simulation for Ring-Oscillator-Based Ising Design

    Authors: Abhimanyu Kumar, Ramprasath S., Chris H. Kim, Ulya R. Karpuzcu, Sachin S. Sapatnekar

    Abstract: Many combinatorial problems can be mapped to Ising machines, i.e., networks of coupled oscillators that settle to a minimum-energy ground state, from which the problem solution is inferred. This work proposes DROID, a novel event-driven method for simulating the evolution of a CMOS Ising machine to its ground state. The approach is accurate under general delay-phase relations that include the effe…

    Submitted 26 February, 2025; originally announced February 2025.

  38. Dynamic LLM Routing and Selection based on User Preferences: Balancing Performance, Cost, and Ethics

    Authors: Deepak Babu Piskala, Vijay Raajaa, Sachin Mishra, Bruno Bozza

    Abstract: With the widespread deployment of large language models (LLMs) such as GPT4, BART, and LLaMA, the need for a system that can intelligently select the most suitable model for specific tasks while balancing cost, latency, accuracy, and ethical considerations has become increasingly important. Recognizing that not all tasks necessitate models with over 100 billion parameters, we introduce OptiRoute,…

    Submitted 23 February, 2025; originally announced February 2025.

    Journal ref: International Journal of Computer Applications, Vol. 186, No. 51, November 2024, pp. 1-7

  39. arXiv:2502.13917  [pdf, ps, other]

    cs.CL

    TESS 2: A Large-Scale Generalist Diffusion Language Model

    Authors: Jaesung Tae, Hamish Ivison, Sachin Kumar, Arman Cohan

    Abstract: We introduce TESS 2, a general instruction-following diffusion language model that outperforms contemporary instruction-tuned diffusion models, as well as matches and sometimes exceeds strong autoregressive (AR) models. We train TESS 2 by first adapting a strong AR model via continued pretraining with the usual cross-entropy as diffusion loss, and then performing further instruction tuning. We fin…

    Submitted 31 May, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: ACL 2025 camera-ready

  40. arXiv:2502.13450  [pdf, other]

    cs.LG cs.AI

    Interleaved Gibbs Diffusion for Constrained Generation

    Authors: Gautham Govind Anil, Sachin Yadav, Dheeraj Nagaraj, Karthikeyan Shanmugam, Prateek Jain

    Abstract: We introduce Interleaved Gibbs Diffusion (IGD), a novel generative modeling framework for mixed continuous-discrete data, focusing on constrained generation problems. Prior works on discrete and continuous-discrete diffusion models assume factorized denoising distribution for fast generation, which can hinder the modeling of strong dependencies between random variables encountered in constrained g…

    Submitted 19 February, 2025; originally announced February 2025.

  41. arXiv:2502.12325  [pdf, other]

    cs.CL

    From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs

    Authors: Kumari Nishu, Sachin Mehta, Samira Abnar, Mehrdad Farajtabar, Maxwell Horton, Mahyar Najibi, Moin Nabi, Minsik Cho, Devang Naik

    Abstract: Training large language models (LLMs) for different inference constraints is computationally expensive, limiting control over efficiency-accuracy trade-offs. Moreover, once trained, these models typically process tokens uniformly, regardless of their complexity, leading to static and inflexible behavior. In this paper, we introduce a post-training optimization framework, DynaMoE, that adapts a pre…

    Submitted 17 February, 2025; originally announced February 2025.

  42. arXiv:2502.07101  [pdf, other]

    cs.CL

    SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation

    Authors: Saurabh Kumar Pandey, Sachin Vashistha, Debrup Das, Somak Aditya, Monojit Choudhury

    Abstract: To understand the complexity of sequence classification tasks, Hahn et al. (2021) proposed sensitivity as the number of disjoint subsets of the input sequence that can each be individually changed to change the output. Though effective, calculating sensitivity at scale using this framework is costly because of exponential time complexity. Therefore, we introduce a Sensitivity-based Multi-Armed Ban…

    Submitted 10 February, 2025; originally announced February 2025.

  43. arXiv:2502.03605  [pdf, other]

    cs.AR

    Accelerating OTA Circuit Design: Transistor Sizing Based on a Transformer Model and Precomputed Lookup Tables

    Authors: Subhadip Ghosh, Endalk Y. Gebru, Chandramouli V. Kashyap, Ramesh Harjani, Sachin S. Sapatnekar

    Abstract: Device sizing is crucial for meeting performance specifications in operational transconductance amplifiers (OTAs), and this work proposes an automated sizing framework based on a transformer model. The approach first leverages the driving-point signal flow graph (DP-SFG) to map an OTA circuit and its specifications into transformer-friendly sequential data. A specialized tokenization approach is a…

    Submitted 5 February, 2025; originally announced February 2025.

    Comments: Accepted in Proceedings of Design, Automation and Test in Europe, 2025. 7 pages, 7 figures, 9 tables

    ACM Class: B.7.2

  44. arXiv:2502.01981  [pdf, other]

    cs.DC cs.ET cs.PF cs.SE

    Evaluating Fault Tolerance and Scalability in Distributed File Systems: A Case Study of GFS, HDFS, and MinIO

    Authors: Shubham Malhotra, Fnu Yashu, Muhammad Saqib, Dipkumar Mehta, Jagdish Jangid, Sachin Dixit

    Abstract: Distributed File Systems (DFS) are essential for managing vast datasets across multiple servers, offering benefits in scalability, fault tolerance, and data accessibility. This paper presents a comprehensive evaluation of three prominent DFSs - Google File System (GFS), Hadoop Distributed File System (HDFS), and MinIO - focusing on their fault tolerance mechanisms and scalability under varying dat…

    Submitted 28 February, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: 9 pages, 3 figures, 3 tables

  45. arXiv:2502.01966  [pdf, other]

    cs.CR cs.DC cs.ET cs.SE

    Optimizing Spot Instance Reliability and Security Using Cloud-Native Data and Tools

    Authors: Muhammad Saqib, Shubham Malhotra, Dipkumar Mehta, Jagdish Jangid, Fnu Yashu, Sachin Dixit

    Abstract: This paper presents "Cloudlab", a comprehensive, cloud-native laboratory designed to support network security research and training. Built on Google Cloud and adhering to GitOps methodologies, Cloudlab facilitates the creation, testing, and deployment of secure, containerized workloads using Kubernetes and serverless architectures. The lab integrates tools like Palo Alto Networks firewalls…

    Submitted 6 March, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: 7 pages, 5 figures

  46. arXiv:2502.01800  [pdf, other]

    cs.RO cs.AI cs.LG

    Flow-based Domain Randomization for Learning and Sequencing Robotic Skills

    Authors: Aidan Curtis, Eric Li, Michael Noseworthy, Nishad Gothoskar, Sachin Chitta, Hui Li, Leslie Pack Kaelbling, Nicole Carey

    Abstract: Domain randomization in reinforcement learning is an established technique for increasing the robustness of control policies trained in simulation. By randomizing environment properties during training, the learned policy can become robust to uncertainties along the randomized dimensions. While the environment distribution is typically specified by hand, in this paper we investigate automatically…

    Submitted 5 May, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

  47. arXiv:2502.01129   

    cs.DC cs.AI cs.ET cs.LG

    Deep Reinforcement Learning for Dynamic Resource Allocation in Wireless Networks

    Authors: Shubham Malhotra, Fnu Yashu, Muhammad Saqib, Dipkumar Mehta, Jagdish Jangid, Sachin Dixit

    Abstract: This report investigates the application of deep reinforcement learning (DRL) algorithms for dynamic resource allocation in wireless communication systems. An environment that includes a base station, multiple antennas, and user equipment is created. Using the RLlib library, various DRL algorithms such as Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) are then applied. These algorithm…

    Submitted 13 March, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: Upon further review, we found inconsistencies in our analysis and decided to conduct additional research before resubmitting a revised version

  48. arXiv:2501.14249  [pdf, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes , et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  49. arXiv:2501.11712  [pdf, other]

    cs.CL

    YouLeQD: Decoding the Cognitive Complexity of Questions and Engagement in Online Educational Videos from Learners' Perspectives

    Authors: Nong Ming, Sachin Sharma, Jiho Noh

    Abstract: Questioning is a fundamental aspect of education, as it helps assess students' understanding, promotes critical thinking, and encourages active engagement. With the rise of artificial intelligence in education, there is a growing interest in developing intelligent systems that can automatically generate and answer questions and facilitate interactions in both virtual and in-person education settin…

    Submitted 20 January, 2025; originally announced January 2025.

    Comments: 11 pages. Extended version, Jan 2025. A shortened version was resubmitted and published in IEEE Conference on Semantic Computing, Feb 2025

  50. arXiv:2501.10809  [pdf]

    cs.CV cs.AI

    Efficient auto-labeling of large-scale poultry datasets (ALPD) using an ensemble model with self- and active-learning approaches

    Authors: Ramesh Bahadur Bist, Lilong Chai, Shawna Weimer, Hannah Atungulua, Chantel Pennicott, Xiao Yang, Sachin Subedi, Chaitanya Pallerla, Yang Tian, Dongyi Wang

    Abstract: The rapid growth of artificial intelligence in poultry farming has highlighted the challenge of efficiently labeling large, diverse datasets. Manual annotation is time-consuming and costly, making it impractical for modern systems that continuously generate data. This study addresses this challenge by exploring semi-supervised auto-labeling methods, integrating self and active learning approaches…

    Submitted 21 February, 2025; v1 submitted 18 January, 2025; originally announced January 2025.