Showing 1–50 of 182 results for author: Hariharan

Searching in archive cs.
  1. arXiv:2506.09199  [pdf, ps, other]

    cs.LG cs.AI cs.DC

    FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models

    Authors: Hariharan Ramesh, Jyotikrishna Dass

    Abstract: Integrating Low-Rank Adaptation (LoRA) into federated learning offers a promising solution for parameter-efficient fine-tuning of Large Language Models (LLMs) without sharing local data. However, several methods designed for federated LoRA present significant challenges in balancing communication efficiency, model accuracy, and computational cost, particularly among heterogeneous clients. These me…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 21 pages, 12 figures

  2. arXiv:2506.00172  [pdf, ps, other]

    cs.LG

    Breakpoint: Scalable evaluation of system-level reasoning in LLM code agents

    Authors: Kaivalya Hariharan, Uzay Girit, Atticus Wang, Jacob Andreas

    Abstract: Benchmarks for large language models (LLMs) have predominantly assessed short-horizon, localized reasoning. Existing long-horizon suites (e.g. SWE-bench) rely on manually curated issues, so expanding or tuning difficulty demands expensive human effort and evaluations quickly saturate. However, many real-world tasks, such as software engineering or scientific research, require agents to rapidly com…

    Submitted 30 May, 2025; originally announced June 2025.

    Comments: 21 pages, 14 figures

  3. arXiv:2505.11204  [pdf, other]

    cs.LG cs.AI

    RanDeS: Randomized Delta Superposition for Multi-Model Compression

    Authors: Hangyu Zhou, Aaron Gokaslan, Volodymyr Kuleshov, Bharath Hariharan

    Abstract: From a multi-model compression perspective, model merging enables memory-efficient serving of multiple models fine-tuned from the same base, but suffers from degraded performance due to interference among their task-specific parameter adjustments (i.e., deltas). In this paper, we reformulate model merging as a compress-and-retrieve scheme, revealing that the task interference arises from the summa…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: https://github.com/Zhou-Hangyu/randes

  4. arXiv:2504.14039  [pdf, other]

    cs.CL cs.AI

    MEQA: A Meta-Evaluation Framework for Question & Answer LLM Benchmarks

    Authors: Jaime Raldua Veuthey, Zainab Ali Majid, Suhas Hariharan, Jacob Haimes

    Abstract: As Large Language Models (LLMs) advance, their potential for widespread societal impact grows simultaneously. Hence, rigorous LLM evaluations are both a technical necessity and social imperative. While numerous evaluation benchmarks have been developed, there remains a critical gap in meta-evaluation: effectively assessing benchmarks' quality. We propose MEQA, a framework for the meta-evaluation o…

    Submitted 18 April, 2025; originally announced April 2025.

  5. arXiv:2504.12110  [pdf, other]

    cs.AI

    Towards LLM Agents for Earth Observation

    Authors: Chia Hsiang Kao, Wenting Zhao, Shreelekha Revankar, Samuel Speas, Snehal Bhagat, Rajeev Datta, Cheng Perng Phoo, Utkarsh Mall, Carl Vondrick, Kavita Bala, Bharath Hariharan

    Abstract: Earth Observation (EO) provides critical planetary data for environmental monitoring, disaster management, climate science, and other scientific domains. Here we ask: Are AI systems ready for reliable Earth Observation? We introduce \datasetnamenospace, a benchmark of 140 yes/no questions from NASA Earth Observatory articles across 13 topics and 17 satellite sensors. Using Google Earth Engine API…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 36 pages

  6. arXiv:2504.07093  [pdf, ps, other]

    cs.CV

    FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution

    Authors: Gene Chou, Wenqi Xian, Guandao Yang, Mohamed Abdelfattah, Bharath Hariharan, Noah Snavely, Ning Yu, Paul Debevec

    Abstract: A versatile video depth estimation model should (1) be accurate and consistent across frames, (2) produce high-resolution depth maps, and (3) support real-time streaming. We propose FlashDepth, a method that satisfies all three requirements, performing depth estimation on a 2044x1148 streaming video at 24 FPS. We show that, with careful modifications to pretrained single-image depth models, these…

    Submitted 30 May, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  7. arXiv:2504.00409  [pdf]

    cs.CL cs.AI

    Semantic Mastery: Enhancing LLMs with Advanced Natural Language Understanding

    Authors: Mohanakrishnan Hariharan

    Abstract: Large language models (LLMs) have greatly improved their capability in performing NLP tasks. However, deeper semantic understanding, contextual coherence, and more subtle reasoning are still difficult to obtain. The paper discusses state-of-the-art methodologies that advance LLMs with more advanced NLU techniques, such as semantic parsing, knowledge integration, and contextual reinforcement learni…

    Submitted 1 April, 2025; originally announced April 2025.

  8. arXiv:2503.16335  [pdf]

    cs.AI cs.ET

    Enhancing Software Quality Assurance with an Adaptive Differential Evolution based Quantum Variational Autoencoder-Transformer Model

    Authors: Seshu Babu Barma, Mohanakrishnan Hariharan, Satish Arvapalli

    Abstract: An AI-powered quality engineering platform uses artificial intelligence to boost software quality assessments through automated defect prediction and optimized performance alongside improved feature extraction. Existing models result in difficulties addressing noisy data types together with imbalances, pattern recognition complexities, ineffective feature extraction, and generalization weaknesses.…

    Submitted 20 March, 2025; originally announced March 2025.

  9. arXiv:2503.11511  [pdf, other]

    eess.IV cs.AI cs.CV

    Alzheimer's Disease Classification Using Retinal OCT: TransnetOCT and Swin Transformer Models

    Authors: Siva Manohar Reddy Kesu, Neelam Sinha, Hariharan Ramasangu, Thomas Gregor Issac

    Abstract: Retinal optical coherence tomography (OCT) images are the biomarkers for neurodegenerative diseases, which are rising in prevalence. Early detection of Alzheimer's disease using retinal OCT is a primary challenging task. This work utilizes advanced deep learning techniques to classify retinal OCT images of subjects with Alzheimer's disease (AD) and healthy controls (CO). The goal is to enhance dia…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 18 pages, 25 figures

  10. arXiv:2503.06459  [pdf, other]

    math.CO cs.DS

    Deterministically approximating the volume of a Kostka polytope

    Authors: Hariharan Narayanan, Piyush Srivastava

    Abstract: Polynomial-time deterministic approximation of volumes of polytopes, up to an approximation factor that grows at most sub-exponentially with the dimension, remains an open problem. Recent work on this question has focused on identifying interesting classes of polytopes for which such approximation algorithms can be obtained. In this paper, we focus on one such class of polytopes: the Kostka polyto…

    Submitted 5 April, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

    Comments: Added further discussion

  11. arXiv:2502.17541  [pdf, ps, other]

    cs.AI cs.CL

    Dataset Featurization: Uncovering Natural Language Features through Unsupervised Data Reconstruction

    Authors: Michal Bravansky, Vaclav Kubon, Suhas Hariharan, Robert Kirk

    Abstract: Interpreting data is central to modern research. Large language models (LLMs) show promise in providing such natural language interpretations of data, yet simple feature extraction methods such as prompting often fail to produce accurate and versatile descriptions for diverse datasets and lack control over granularity and scale. To address these limitations, we propose a domain-agnostic method for…

    Submitted 29 May, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

  12. arXiv:2502.14156  [pdf, other]

    cs.CV

    Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration

    Authors: Katie Z Luo, Minh-Quan Dao, Zhenzhen Liu, Mark Campbell, Wei-Lun Chao, Kilian Q. Weinberger, Ezio Malis, Vincent Fremont, Bharath Hariharan, Mao Shan, Stewart Worrall, Julie Stephany Berrio Perez

    Abstract: Vehicle-to-everything (V2X) collaborative perception has emerged as a promising solution to address the limitations of single-vehicle perception systems. However, existing V2X datasets are limited in scope, diversity, and quality. To address these gaps, we present Mixed Signals, a comprehensive V2X dataset featuring 45.1k point clouds and 240.6k bounding boxes collected from three connected autono…

    Submitted 19 February, 2025; originally announced February 2025.

  13. arXiv:2502.10638  [pdf, other]

    cs.HC

    Script&Shift: A Layered Interface Paradigm for Integrating Content Development and Rhetorical Strategy with LLM Writing Assistants

    Authors: Momin Siddiqui, Roy Pea, Hari Subramonyam

    Abstract: Good writing is a dynamic process of knowledge transformation, where writers refine and evolve ideas through planning, translating, and reviewing. Generative AI-powered writing tools can enhance this process but may also disrupt the natural flow of writing, such as when using LLMs for complex tasks like restructuring content across different sections or creating smooth transitions. We introduce Sc…

    Submitted 14 February, 2025; originally announced February 2025.

  14. arXiv:2502.10060  [pdf, other]

    cs.CV cs.LG

    DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery

    Authors: Utkarsh Mall, Cheng Perng Phoo, Mia Chiquier, Bharath Hariharan, Kavita Bala, Carl Vondrick

    Abstract: Visual data is used in numerous different scientific workflows ranging from remote sensing to ecology. As the amount of observation data increases, the challenge is not just to make accurate predictions but also to understand the underlying mechanisms for those predictions. Good interpretation is important in scientific workflows, as it allows for better decision-making by providing insights into…

    Submitted 14 February, 2025; originally announced February 2025.

  15. arXiv:2502.06682  [pdf, other]

    cs.CV

    Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene

    Authors: Tai-Yu Pan, Sooyoung Jeon, Mengdi Fan, Jinsu Yoo, Zhenyang Feng, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao

    Abstract: Self-driving cars relying solely on ego-centric perception face limitations in sensing, often failing to detect occluded, faraway objects. Collaborative autonomous driving (CAV) seems like a promising direction, but collecting data for development is non-trivial. It requires placing multiple sensor-equipped agents in a real-world driving scene, simultaneously! As such, existing datasets are limite…

    Submitted 1 April, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: Accepted to CVPR 2025

  16. arXiv:2501.04896  [pdf, other]

    cs.LG cs.AI cs.CY

    Quantifying Itch and its Impact on Sleep Using Machine Learning and Radio Signals

    Authors: Michail Ouroutzoglou, Mingmin Zhao, Joshua Hellerstein, Hariharan Rahul, Asima Badic, Brian S. Kim, Dina Katabi

    Abstract: Chronic itch affects 13% of the US population, is highly debilitating, and underlies many medical conditions. A major challenge in clinical care and new therapeutics development is the lack of an objective measure for quantifying itch, leading to reliance on subjective measures like patients' self-assessment of itch severity. In this paper, we show that a home radio device paired with artificial i…

    Submitted 8 January, 2025; originally announced January 2025.

  17. arXiv:2501.04654  [pdf, other]

    cs.DC cs.PF

    Recorder: Comprehensive Parallel I/O Tracing and Analysis

    Authors: Chen Wang, Izzet Yildirim, Hariharan Devarajan, Kathryn Mohror, Marc Snir

    Abstract: This paper presents Recorder, a parallel I/O tracing tool designed to capture comprehensive I/O information on HPC applications. Recorder traces I/O calls across various I/O layers, storing all function parameters for each captured call. The volume of stored information scales linearly with the application's execution scale. To address this, we present a sophisticated pattern-recognition-based compress…

    Submitted 8 January, 2025; originally announced January 2025.

    Comments: 29 pages. Under Review. Submitted to the Journal of Supercomputing

  18. arXiv:2412.09551  [pdf, other]

    cs.CV

    Video Creation by Demonstration

    Authors: Yihong Sun, Hao Zhou, Liangzhe Yuan, Jennifer J. Sun, Yandong Li, Xuhui Jia, Hartwig Adam, Bharath Hariharan, Long Zhao, Ting Liu

    Abstract: We explore a novel video creation experience, namely Video Creation by Demonstration. Given a demonstration video and a context image from a different scene, we generate a physically plausible video that continues naturally from the context image and carries out the action concepts from the demonstration. To enable this capability, we present $δ$-Diffusion, a self-supervised training approach that…

    Submitted 12 December, 2024; originally announced December 2024.

    Comments: Project page at https://delta-diffusion.github.io/

  19. arXiv:2412.08563  [pdf]

    cs.CV cs.GR

    Physics Based Differentiable Rendering for Inverse Problems and Beyond

    Authors: Preetish Kakkar, Srijani Mukherjee, Hariharan Ragothaman, Vishal Mehta

    Abstract: Physics-based differentiable rendering (PBDR) has become an efficient method in computer vision, graphics, and machine learning for addressing an array of inverse problems. PBDR allows patterns to be generated from perceptions which can be applied to enhance object attributes like geometry, substances, and lighting by adding physical models of light propagation and materials interaction. Due to th…

    Submitted 9 January, 2025; v1 submitted 11 December, 2024; originally announced December 2024.

    Journal ref: Journal of Electrical systems, Vol. 20 No. 11s (2024)

  20. arXiv:2412.03174  [pdf, other]

    cs.RO math.OC

    Resilient Timed Elastic Band Planner for Collision-Free Navigation in Unknown Environments

    Authors: Geesara Kulathunga, Abdurrahman Yilmaz, Zhuoling Huang, Ibrahim Hroob, Hariharan Arunachalam, Leonardo Guevara, Alexandr Klimchik, Grzegorz Cielniak, Marc Hanheide

    Abstract: In autonomous navigation, trajectory replanning, refinement, and control command generation are essential for effective motion planning. This paper presents a resilient approach to trajectory replanning addressing scenarios where the initial planner's solution becomes infeasible. The proposed method incorporates a hybrid A* algorithm to generate feasible trajectories when the primary planner fails…

    Submitted 4 December, 2024; originally announced December 2024.

  21. arXiv:2411.16016  [pdf]

    cs.RO

    Establishing Design Routines for Efficient Control of Automated Robots

    Authors: Hariharan Ragothaman, Harihar M, SK Guhananthan

    Abstract: With continual advancements in technology, efforts to develop robots simulating human behavior have intensified. Cognitive robotics, combined with artificial intelligence (AI), has proven effective in surveying and research analysis. However, despite progress, human intervention remains necessary, and incorporating AI into robotic systems continues to pose challenges. This paper explores methodolo…

    Submitted 24 November, 2024; originally announced November 2024.

    Journal ref: Young Scientists Convention, MSEC, March 2012

  22. arXiv:2411.13549  [pdf, other]

    cs.CV

    Generating 3D-Consistent Videos from Unposed Internet Photos

    Authors: Gene Chou, Kai Zhang, Sai Bi, Hao Tan, Zexiang Xu, Fujun Luan, Bharath Hariharan, Noah Snavely

    Abstract: We address the problem of generating videos from unposed internet photos. A handful of input images serve as keyframes, and our model interpolates between them to simulate a path moving between the cameras. Given random images, a model's ability to capture underlying geometry, recognize scene identity, and relate frames in terms of camera position and orientation reflects a fundamental understandi…

    Submitted 20 November, 2024; originally announced November 2024.

  23. arXiv:2411.08813  [pdf, other]

    cs.AI

    Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique

    Authors: Suhas Hariharan, Zainab Ali Majid, Jaime Raldua Veuthey, Jacob Haimes

    Abstract: A key development in the cybersecurity evaluations space is the work carried out by Meta, through their CyberSecEval approach. While this work is undoubtedly a useful contribution to a nascent field, there are notable features that limit its utility. Key drawbacks focus on the insecure code detection part of Meta's methodology. We explore these limitations, and use our exploration as a test case f…

    Submitted 13 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024, 2 pages

  24. arXiv:2411.02095  [pdf]

    cs.GR cs.CV cs.HC

    The evolution of volumetric video: A survey of smart transcoding and compression approaches

    Authors: Preetish Kakkar, Hariharan Ragothaman

    Abstract: Volumetric video, the capture and display of three-dimensional (3D) imagery, has emerged as a revolutionary technology poised to transform the media landscape, enabling immersive experiences that transcend the limitations of traditional 2D video. One of the key challenges in this domain is the efficient delivery of these high-bandwidth, data-intensive volumetric video streams, which requires innov…

    Submitted 9 January, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

    Journal ref: International Journal of Computer Graphics & Animation (IJCGA) 2024

  25. arXiv:2411.00210  [pdf, other]

    cs.CV

    Scale-Aware Recognition in Satellite Images under Resource Constraints

    Authors: Shreelekha Revankar, Cheng Perng Phoo, Utkarsh Mall, Bharath Hariharan, Kavita Bala

    Abstract: Recognition of features in satellite imagery (forests, swimming pools, etc.) depends strongly on the spatial scale of the concept and therefore the resolution of the images. This poses two challenges: Which resolution is best suited for recognizing a given concept, and where and when should the costlier higher-resolution (HR) imagery be acquired? We present a novel scheme to address these challe…

    Submitted 2 February, 2025; v1 submitted 31 October, 2024; originally announced November 2024.

    Comments: 16 pages, 4 figures

  26. arXiv:2410.23891  [pdf, other]

    cs.CV cs.AI

    AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery

    Authors: Hangyu Zhou, Chia-Hsiang Kao, Cheng Perng Phoo, Utkarsh Mall, Bharath Hariharan, Kavita Bala

    Abstract: Clouds in satellite imagery pose a significant challenge for downstream applications. A major challenge in current cloud removal research is the absence of a comprehensive benchmark and a sufficiently large and diverse training dataset. To address this problem, we introduce the largest public dataset -- $\textit{AllClear}$ for cloud removal, featuring 23,742 globally distributed regions of interes…

    Submitted 31 October, 2024; originally announced October 2024.

    Comments: Accepted at NeurIPS 2024 Datasets and Benchmarks Track. Code and data available at https://allclear.cs.cornell.edu/

  27. arXiv:2410.02646  [pdf, other]

    cs.CV

    Learning 3D Perception from Others' Predictions

    Authors: Jinsu Yoo, Zhenyang Feng, Tai-Yu Pan, Yihong Sun, Cheng Perng Phoo, Xiangyu Chen, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao

    Abstract: Accurate 3D object detection in real-world environments requires a huge amount of annotated data with high quality. Acquiring such data is tedious and expensive, and often needs repeated effort when a new sensor is adopted or when the detector is deployed in a new environment. We investigate a new scenario to construct 3D object detectors: learning from the predictions of a nearby unit that is equ…

    Submitted 29 March, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: Accepted to ICLR 2025

  28. arXiv:2409.19841  [pdf, other]

    cs.LG cs.AI cs.NE

    Counter-Current Learning: A Biologically Plausible Dual Network Approach for Deep Learning

    Authors: Chia-Hsiang Kao, Bharath Hariharan

    Abstract: Despite its widespread use in neural networks, error backpropagation has faced criticism for its lack of biological plausibility, suffering from issues such as the backward locking problem and the weight transport problem. These limitations have motivated researchers to explore more biologically plausible learning algorithms that could potentially shed light on how biological neural systems adapt…

    Submitted 23 October, 2024; v1 submitted 29 September, 2024; originally announced September 2024.

    Comments: Accepted at NeurIPS 2024. Code available at https://github.com/IandRover/CCL-NeurIPS24

  29. arXiv:2409.16484  [pdf, other]

    cs.RO

    BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes

    Authors: Kasun Weerakoon, Mohamed Elnoor, Gershom Seneviratne, Vignesh Rajagopal, Senthil Hariharan Arul, Jing Liang, Mohamed Khalid M Jaffar, Dinesh Manocha

    Abstract: We present BehAV, a novel approach for autonomous robot navigation in outdoor scenes guided by human instructions and leveraging Vision Language Models (VLMs). Our method interprets human commands using a Large Language Model (LLM) and categorizes the instructions into navigation and behavioral guidelines. Navigation guidelines consist of directional commands (e.g., "move forward until") and assoc…

    Submitted 2 October, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

  30. arXiv:2409.15394  [pdf, other]

    cs.LG cs.AI cs.GR math.NA

    Neural Control Variates with Automatic Integration

    Authors: Zilu Li, Guandao Yang, Qingqing Zhao, Xi Deng, Leonidas Guibas, Bharath Hariharan, Gordon Wetzstein

    Abstract: This paper presents a method to leverage arbitrary neural network architecture for control variates. Control variates are crucial in reducing the variance of Monte Carlo integration, but they hinge on finding a function that both correlates with the integrand and has a known analytical integral. Traditional approaches rely on heuristics to choose this function, which might not be expressive enough…

    Submitted 23 September, 2024; originally announced September 2024.

    Journal ref: SIGGRAPH Conference Papers 2024

  31. arXiv:2408.10240  [pdf, other]

    cs.HC cs.AI cs.CV

    AltCanvas: A Tile-Based Image Editor with Generative AI for Blind or Visually Impaired People

    Authors: Seonghee Lee, Maho Kohga, Steve Landau, Sile O'Modhrain, Hari Subramonyam

    Abstract: People with visual impairments often struggle to create content that relies heavily on visual elements, particularly when conveying spatial and structural information. Existing accessible drawing tools, which construct images line by line, are suitable for simple tasks like math but not for more expressive artwork. On the other hand, emerging generative AI-based text-to-image tools can produce exp…

    Submitted 4 August, 2024; originally announced August 2024.

  32. arXiv:2408.10239  [pdf, ps, other]

    cs.CY cs.AI cs.LG cs.SE

    A Conceptual Framework for Ethical Evaluation of Machine Learning Systems

    Authors: Neha R. Gupta, Jessica Hullman, Hari Subramonyam

    Abstract: Research in Responsible AI has developed a range of principles and practices to ensure that machine learning systems are used in a manner that is ethical and aligned with human values. However, a critical yet often neglected aspect of ethical ML is the ethical implications that appear when designing evaluations of ML systems. For instance, teams may have to balance a trade-off between highly infor…

    Submitted 4 August, 2024; originally announced August 2024.

  33. arXiv:2408.08301  [pdf, other]

    cs.RO

    VLPG-Nav: Object Navigation Using Visual Language Pose Graph and Object Localization Probability Maps

    Authors: Senthil Hariharan Arul, Dhruva Kumar, Vivek Sugirtharaj, Richard Kim, Xuewei Qi, Rajasimman Madhivanan, Arnie Sen, Dinesh Manocha

    Abstract: We present VLPG-Nav, a visual language navigation method for guiding robots to specified objects within household scenes. Unlike existing methods primarily focused on navigating the robot toward objects, our approach considers the additional challenge of centering the object within the robot's camera view. Our method builds a visual language pose graph (VLPG) that functions as a spatial map of VL…

    Submitted 15 August, 2024; originally announced August 2024.

  34. arXiv:2407.19108  [pdf, other]

    cs.CV

    ObjectCarver: Semi-automatic segmentation, reconstruction and separation of 3D objects

    Authors: Gemmechu Hassena, Jonathan Moon, Ryan Fujii, Andrew Yuen, Noah Snavely, Steve Marschner, Bharath Hariharan

    Abstract: Implicit neural fields have made remarkable progress in reconstructing 3D surfaces from multiple images; however, they encounter challenges when it comes to separating individual objects within a scene. Previous work has attempted to tackle this problem by introducing a framework to train separate signed distance fields (SDFs) simultaneously for each of N objects and using a regularization term to…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Project page is: https://objectcarver.github.io/

  35. arXiv:2407.04694  [pdf, other]

    cs.CL cs.AI cs.LG

    Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

    Authors: Rudolf Laine, Bilal Chughtai, Jan Betley, Kaivalya Hariharan, Jeremy Scheurer, Mikita Balesni, Marius Hobbhahn, Alexander Meinke, Owain Evans

    Abstract: AI assistants such as ChatGPT are trained to respond to users by saying, "I am a large language model". This raises questions. Do such models know that they are LLMs and reliably act on this knowledge? Are they aware of their current circumstances, such as being deployed to the public? We refer to a model's knowledge of itself and its circumstances as situational awareness. To quantify situational…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 11 page main body, 98 page appendix, 58 figures

  36. arXiv:2406.11819  [pdf, other]

    cs.CV

    MegaScenes: Scene-Level View Synthesis at Scale

    Authors: Joseph Tung, Gene Chou, Ruojin Cai, Guandao Yang, Kai Zhang, Gordon Wetzstein, Bharath Hariharan, Noah Snavely

    Abstract: Scene-level novel view synthesis (NVS) is fundamental to many vision and graphics applications. Recently, pose-conditioned diffusion models have led to significant progress by extracting 3D information from 2D foundation models, but these methods are limited by the lack of scene-level training data. Common dataset choices either consist of isolated objects (Objaverse), or of object-centric scenes…

    Submitted 21 August, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted at ECCV 2024. Our project page is at https://megascenes.github.io

  37. arXiv:2406.06613  [pdf, other]

    cs.CL cs.AI

    GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents

    Authors: Anthony Costarelli, Mat Allen, Roman Hauksson, Grace Sodunke, Suhas Hariharan, Carlson Cheng, Wenjie Li, Joshua Clymer, Arjun Yadav

    Abstract: Large language models have demonstrated remarkable few-shot performance on many natural language understanding tasks. Despite several demonstrations of using large language models in complex, strategic scenarios, there is no comprehensive framework for evaluating agents' performance across various types of reasoning found in games. To address this gap, we introduce GameBench, a cross-domain benc…

    Submitted 22 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  38. arXiv:2405.16034  [pdf, other]

    cs.CV

    DiffuBox: Refining 3D Object Detection with Point Diffusion

    Authors: Xiangyu Chen, Zhenzhen Liu, Katie Z Luo, Siddhartha Datta, Adhitya Polavaram, Yan Wang, Yurong You, Boyi Li, Marco Pavone, Wei-Lun Chao, Mark Campbell, Bharath Hariharan, Kilian Q. Weinberger

    Abstract: Ensuring robust 3D object detection and localization is crucial for many applications in robotics and autonomous driving. Recent models, however, face difficulties in maintaining high performance when applied to domains with differing sensor setups or geographic locations, often resulting in poor localization accuracy due to domain shift. To overcome this challenge, we introduce a novel diffusion-…

    Submitted 6 December, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  39. arXiv:2405.14841  [pdf, other]

    cs.CV

    MOD-UV: Learning Mobile Object Detectors from Unlabeled Videos

    Authors: Yihong Sun, Bharath Hariharan

    Abstract: Embodied agents must detect and localize objects of interest, e.g. traffic participants for self-driving cars. Supervision in the form of bounding boxes for this task is extremely expensive. As such, prior work has looked at unsupervised instance detection and segmentation, but in the absence of annotated boxes, it is unclear how pixels must be grouped into objects and which objects are of interes…

    Submitted 31 July, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: ECCV 2024

  40. arXiv:2405.12946  [pdf, other]

    cs.HC cs.LG

    Tutorly: Turning Programming Videos Into Apprenticeship Learning Environments with LLMs

    Authors: Wengxi Li, Roy Pea, Nick Haber, Hari Subramonyam

    Abstract: Online programming videos, including tutorials and streamcasts, are widely popular and contain a wealth of expert knowledge. However, effectively utilizing these resources to achieve targeted learning goals can be challenging. Unlike direct tutoring, video content lacks tailored guidance based on individual learning paces, personalized feedback, and interactive engagement necessary for support and…

    Submitted 21 May, 2024; originally announced May 2024.

  41. arXiv:2405.02260  [pdf, other]

    cs.HC

    Leveraging Large Language Models to Enhance Domain Expert Inclusion in Data Science Workflows

    Authors: Jasmine Y. Shih, Vishal Mohanty, Yannis Katsis, Hariharan Subramonyam

    Abstract: Domain experts can play a crucial role in guiding data scientists to optimize machine learning models while ensuring contextual relevance for downstream use. However, in current workflows, such collaboration is challenging due to differing expertise, abstract documentation practices, and lack of access and visibility into low-level implementation artifacts. To address these challenges and enable d…

    Submitted 3 May, 2024; originally announced May 2024.

  42. arXiv:2404.17673  [pdf, other]

    cs.LG cs.RO

    Learning Manipulation Tasks in Dynamic and Shared 3D Spaces

    Authors: Hariharan Arunachalam, Marc Hanheide, Sariah Mghames

    Abstract: Automating the segregation process is a need for every sector experiencing a high volume of materials handling, repetitive and exhaustive operations, in addition to risky exposures. Learning automated pick-and-place operations can be efficiently done by introducing collaborative autonomous systems (e.g. manipulators) in the workplace and among human operators. In this paper, we propose a deep rein…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 5 pages

  43. arXiv:2404.05139  [pdf, other]

    cs.CV cs.RO

    Better Monocular 3D Detectors with LiDAR from the Past

    Authors: Yurong You, Cheng Perng Phoo, Carlos Andres Diaz-Ruiz, Katie Z Luo, Wei-Lun Chao, Mark Campbell, Bharath Hariharan, Kilian Q Weinberger

    Abstract: Accurate 3D object detection is crucial to autonomous driving. Though LiDAR-based detectors have achieved impressive performance, the high cost of LiDAR sensors precludes their widespread adoption in affordable vehicles. Camera-based detectors are cheaper alternatives but often suffer inferior performance compared to their LiDAR-based counterparts due to inherent depth ambiguities in images. In th…

    Submitted 9 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by ICRA 2024. The code can be found at https://github.com/YurongYou/AsyncDepth

  44. arXiv:2402.17721  [pdf, other]

    cs.HC cs.SE

    Prototyping with Prompts: Emerging Approaches and Challenges in Generative AI Design for Collaborative Software Teams

    Authors: Hari Subramonyam, Divy Thakkar, Andrew Ku, Jürgen Dieber, Anoop Sinha

    Abstract: Generative AI models are increasingly being integrated into human task workflows, enabling the production of expressive content across a wide range of contexts. Unlike traditional human-AI design methods, the new approach to designing generative capabilities focuses heavily on prompt engineering strategies. This shift requires a deeper understanding of how collaborative software teams establish an…

    Submitted 30 March, 2025; v1 submitted 27 February, 2024; originally announced February 2024.

  45. arXiv:2402.17699  [pdf, other]

    cs.LG stat.ML

    Gradient-based Discrete Sampling with Automatic Cyclical Scheduling

    Authors: Patrick Pynadath, Riddhiman Bhattacharya, Arun Hariharan, Ruqi Zhang

    Abstract: Discrete distributions, particularly in high-dimensional deep models, are often highly multimodal due to inherent discontinuities. While gradient-based discrete sampling has proven effective, it is susceptible to becoming trapped in local modes due to the gradient information. To tackle this challenge, we propose an automatic cyclical scheduling, designed for efficient and accurate sampling in mul…

    Submitted 24 October, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  46. AINeedsPlanner: A Workbook to Support Effective Collaboration Between AI Experts and Clients

    Authors: Dae Hyun Kim, Hyungyu Shin, Shakhnozakhon Yadgarova, Jinho Son, Hariharan Subramonyam, Juho Kim

    Abstract: Clients often partner with AI experts to develop AI applications tailored to their needs. In these partnerships, careful planning and clear communication are critical, as inaccurate or incomplete specifications can result in misaligned model characteristics, expensive reworks, and potential friction between collaborators. Unfortunately, given the complexity of requirements ranging from functionali…

    Submitted 26 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: To appear in DIS 2024

  47. arXiv:2312.08793  [pdf, other]

    cs.LG cs.AI cs.CL cs.CR

    Forbidden Facts: An Investigation of Competing Objectives in Llama-2

    Authors: Tony T. Wang, Miles Wang, Kaivalya Hariharan, Nir Shavit

    Abstract: LLMs often face competing pressures (for example helpfulness vs. harmlessness). To understand how models resolve such conflicts, we study Llama-2-chat models on the forbidden fact task. Specifically, we instruct Llama-2 to truthfully complete a factual recall statement while forbidding it from saying the correct answer. This often makes the model give incorrect answers. We decompose Llama-2 into 1…

    Submitted 31 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted to the ATTRIB and SoLaR workshops at NeurIPS 2023; (v3: clarified experimental details)

  48. arXiv:2312.06960  [pdf, other]

    cs.CV cs.LG

    Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment

    Authors: Utkarsh Mall, Cheng Perng Phoo, Meilin Kelsey Liu, Carl Vondrick, Bharath Hariharan, Kavita Bala

    Abstract: We introduce a method to train vision-language models for remote-sensing images without using any textual annotations. Our key insight is to use co-located internet imagery taken on the ground as an intermediary for connecting remote-sensing images and language. Specifically, we train an image encoder for remote sensing images to align with the image encoder of CLIP using a large amount of paired…

    Submitted 11 December, 2023; originally announced December 2023.

  49. arXiv:2312.06131  [pdf, other]

    cs.DC

    ML-based Modeling to Predict I/O Performance on Different Storage Sub-systems

    Authors: Yiheng Xu, Pranav Sivaraman, Hariharan Devarajan, Kathryn Mohror, Abhinav Bhatele

    Abstract: Parallel applications can spend a significant amount of time performing I/O on large-scale supercomputers. Fast near-compute storage accelerators called burst buffers can reduce the time a processor spends performing I/O and mitigate I/O bottlenecks. However, determining if a given application could be accelerated using burst buffers is not straightforward even for storage experts. The relationshi…

    Submitted 11 January, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  50. arXiv:2312.05984  [pdf, ps, other]

    cs.CV cs.AI cs.GR cs.LG

    Accurate Differential Operators for Hybrid Neural Fields

    Authors: Aditya Chetan, Guandao Yang, Zichen Wang, Steve Marschner, Bharath Hariharan

    Abstract: Neural fields have become widely used in various fields, from shape representation to neural rendering, and for solving partial differential equations (PDEs). With the advent of hybrid neural field representations like Instant NGP that leverage small MLPs and explicit representations, these models train quickly and can fit large scenes. Yet in many applications like rendering and simulation, hybri…

    Submitted 1 June, 2025; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: Accepted in CVPR 2025. Project page is available at https://justachetan.github.io/hnf-derivatives/