Showing 1–50 of 120 results for author: Hsu, J

Searching in archive cs.
  1. arXiv:2506.23414  [pdf, ps, other]

    cs.CV

    A High-Throughput Platform to Bench Test Smartphone-Based Heart Rate Measurements Derived From Video

    Authors: Ming-Zher Poh, Jonathan Wang, Jonathan Hsu, Lawrence Cai, Eric Teasley, James A. Taylor, Jameson K. Rogers, Anupam Pathak, Shwetak Patel

    Abstract: Smartphone-based heart rate (HR) monitoring apps using finger-over-camera photoplethysmography (PPG) face significant challenges in performance evaluation and device compatibility due to device variability and fragmentation. Manual testing is impractical, and standardized methods are lacking. This paper presents a novel, high-throughput bench-testing platform to address this critical need. We desi…

    Submitted 29 June, 2025; originally announced June 2025.

  2. arXiv:2505.18479  [pdf, other]

    cs.CV

    Syn3DTxt: Embedding 3D Cues for Scene Text Generation

    Authors: Li-Syun Hsiung, Jun-Kai Tu, Kuan-Wu Chu, Yu-Hsuan Chiu, Yan-Tsung Peng, Sheng-Luen Chung, Gee-Sern Jison Hsu

    Abstract: This study aims to investigate the challenge of insufficient three-dimensional context in synthetic datasets for scene text rendering. Although recent advances in diffusion models and related techniques have improved certain aspects of scene text generation, most existing approaches continue to rely on 2D data, sourcing authentic training examples from movie posters and book covers, which limits t…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: CVPR workshop 2025: SyntaGen

  3. arXiv:2505.14613  [pdf, ps, other]

    cs.LG q-bio.QM

    Virtual Cells: Predict, Explain, Discover

    Authors: Emmanuel Noutahi, Jason Hartford, Prudencio Tossou, Shawn Whitfield, Alisandra K. Denton, Cas Wognum, Kristina Ulicna, Michael Craig, Jonathan Hsu, Michael Cuccarese, Emmanuel Bengio, Dominique Beaini, Christopher Gibson, Daniel Cohen, Berton Earnshaw

    Abstract: Drug discovery is fundamentally a process of inferring the effects of treatments on patients, and would therefore benefit immensely from computational models that can reliably simulate patient responses, enabling researchers to generate and test large numbers of therapeutic hypotheses safely and economically before initiating costly clinical trials. Even a more specific model that predicts the fun…

    Submitted 4 June, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  4. arXiv:2504.03943  [pdf, ps, other]

    stat.ML cond-mat.mtrl-sci cs.LG

    Multi-Variable Batch Bayesian Optimization in Materials Research: Synthetic Data Analysis of Noise Sensitivity and Problem Landscape Effects

    Authors: Imon Mia, Armi Tiihonen, Anna Ernst, Anusha Srivastava, Tonio Buonassisi, William Vandenberghe, Julia W. P. Hsu

    Abstract: Bayesian Optimization (BO), a machine learning method, is increasingly used to guide experimental optimization tasks in materials science. To emulate the large number of input variables and noise-containing results in experimental materials research, we perform batch BO simulation of six design variables with a range of noise levels. Two test cases relevant for materials science problems are examined:…

    Submitted 11 June, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

  5. arXiv:2503.19405  [pdf, other]

    cs.CV

    Multi-modal 3D Pose and Shape Estimation with Computed Tomography

    Authors: Mingxiao Tu, Hoijoon Jung, Alireza Moghadam, Jineel Raythatha, Lachlan Allan, Jeremy Hsu, Andre Kyme, Jinman Kim

    Abstract: In perioperative care, precise in-bed 3D patient pose and shape estimation (PSE) can be vital in optimizing patient positioning in preoperative planning, enabling accurate overlay of medical images for augmented reality-based surgical navigation, and mitigating risks of prolonged immobility during recovery. Conventional PSE methods relying on modalities such as RGB-D, infrared, or pressure maps of…

    Submitted 25 March, 2025; originally announced March 2025.

  6. arXiv:2502.12481  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Predicate Hierarchies Improve Few-Shot State Classification

    Authors: Emily Jin, Joy Hsu, Jiajun Wu

    Abstract: State classification of objects and their relations is core to many long-horizon tasks, particularly in robot planning and manipulation. However, the combinatorial explosion of possible object-predicate combinations, coupled with the need to adapt to novel real-world environments, makes it a desideratum for state classification models to generalize to novel queries with few examples. To this end,…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: ICLR 2025. First two authors contributed equally. Project page: https://emilyzjin.github.io/projects/phier.html

  7. arXiv:2501.14550  [pdf, ps, other]

    cs.PL cs.LO math.NA

    Bean: A Language for Backward Error Analysis

    Authors: Ariel E. Kellison, Laura Zielinski, David Bindel, Justin Hsu

    Abstract: Backward error analysis offers a method for assessing the quality of numerical programs in the presence of floating-point rounding errors. However, techniques from the numerical analysis literature for quantifying backward error require substantial human effort, and there are currently no tools or automated methods for statically deriving sound backward error bounds. To address this gap, we propos…

    Submitted 24 January, 2025; originally announced January 2025.

  8. arXiv:2411.10548  [pdf, ps, other]

    cs.LG q-bio.BM

    BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery

    Authors: Peter St. John, Dejun Lin, Polina Binder, Malcolm Greaves, Vega Shah, John St. John, Adrian Lange, Patrick Hsu, Rajesh Illango, Arvind Ramanathan, Anima Anandkumar, David H Brookes, Akosua Busia, Abhishaike Mahajan, Stephen Malina, Neha Prasad, Sam Sinai, Lindsay Edwards, Thomas Gaudelet, Cristian Regep, Martin Steinegger, Burkhard Rost, Alexander Brace, Kyle Hippe, Luca Naef, et al. (68 additional authors not shown)

    Abstract: Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLM) training on hundreds of graphical processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational bio…

    Submitted 12 June, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

  9. arXiv:2411.09635  [pdf, ps, other]

    stat.ML cs.LG

    Counterfactual Uncertainty Quantification of Factual Estimand of Efficacy from Before-and-After Treatment Repeated Measures Randomized Controlled Trials

    Authors: Xingya Wang, Yang Han, Yushi Liu, Szu-Yu Tang, Jason C. Hsu

    Abstract: This article quantifies the uncertainty reduction achievable for counterfactual estimand, and cautions against potential bias when the estimand uses Digital Twins. Posed by Neyman (1923a) who showed unbiased point estimation from designed factual experiments is possible, counterfactual uncertainty quantification (CUQ) remained an open challenge for about one hun…

    Submitted 14 June, 2025; v1 submitted 14 November, 2024; originally announced November 2024.

  10. arXiv:2410.19471  [pdf, other]

    cs.LG cs.AI

    Improving Inverse Folding for Peptide Design with Diversity-regularized Direct Preference Optimization

    Authors: Ryan Park, Darren J. Hsu, C. Brian Roland, Maria Korshunova, Chen Tessler, Shie Mannor, Olivia Viessmann, Bruno Trentini

    Abstract: Inverse folding models play an important role in structure-based design by predicting amino acid sequences that fold into desired reference structures. Models like ProteinMPNN, a message-passing encoder-decoder model, are trained to reliably produce new sequences from a reference structure. However, when applied to peptides, these models are prone to generating repetitive sequences that do not fol…

    Submitted 25 October, 2024; originally announced October 2024.

    Comments: Preprint. 10 pages plus appendices

  11. arXiv:2409.09090  [pdf, other]

    cs.DL cs.CL

    An Evaluation of GPT-4V for Transcribing the Urban Renewal Hand-Written Collection

    Authors: Myeong Lee, Julia H. P. Hsu

    Abstract: Between 1960 and 1980, urban renewal transformed many cities, creating vast handwritten records. These documents posed a significant challenge for researchers due to their volume and handwritten nature. The launch of GPT-4V in November 2023 offered a breakthrough, enabling large-scale, efficient transcription and analysis of these historical urban renewal documents.

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: Published in Digital Humanities (DH 2024). Aug 6-9. Arlington, VA

  12. arXiv:2409.08825  [pdf]

    eess.SY cs.RO

    Flight Testing of Latch Valve with Lightweight LV-Servo Direct Drive Mechanism

    Authors: Hao-Che Huang, Chih-Shin Chang, Jui-Cheng Hsu, Shih-Sin Wei

    Abstract: In the field of rocket technology, the latch valve assumes a pivotal role in regulating the flow of fuel gases and liquids to ensure the requisite energy supply. This project endeavors to innovate by replacing the conventional step motor mechanism with a servo motor for latch valve control. The selected servo motor, boasting a more compact form factor and reduced mass, aligns seamlessly with the p…

    Submitted 17 September, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: 21 pages, 14 figures and 1 table

    MSC Class: 74F10; ACM Class: J.2

  13. arXiv:2409.08202  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    What Makes a Maze Look Like a Maze?

    Authors: Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Noah D. Goodman, Jiajun Wu

    Abstract: A unique aspect of human visual understanding is the ability to flexibly interpret abstract concepts: acquiring lifted rules explaining what they symbolize, grounding them across familiar and unfamiliar contexts, and making predictions or reasoning about them. While off-the-shelf vision-language models excel at making literal interpretations of images (e.g., recognizing object categories such as t…

    Submitted 17 February, 2025; v1 submitted 12 September, 2024; originally announced September 2024.

    Comments: ICLR 2025

  14. arXiv:2408.16030  [pdf]

    cs.SD cs.AI cs.LG eess.AS

    Deep Learning-Based Automatic Multi-Level Airway Collapse Monitoring on Obstructive Sleep Apnea Patients

    Authors: Ying-Chieh Hsu, Stanley Yung-Chuan Liu, Chao-Jung Huang, Chi-Wei Wu, Ren-Kai Cheng, Jane Yung-Jen Hsu, Shang-Ran Huang, Yuan-Ren Cheng, Fu-Shun Hsu

    Abstract: This study investigated the use of deep learning to identify multi-level upper airway collapses in obstructive sleep apnea (OSA) patients based on snoring sounds. We fine-tuned ResNet-50 and Audio Spectrogram Transformer (AST) models using snoring recordings from 37 subjects undergoing drug-induced sleep endoscopy (DISE) between 2020 and 2021. Snoring sounds were labeled according to the VOTE (Ve…

    Submitted 9 January, 2025; v1 submitted 28 August, 2024; originally announced August 2024.

  15. arXiv:2407.13460  [pdf, other]

    cs.CV cs.LG

    SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders

    Authors: Sheng-Wei Li, Zi-Xiang Wei, Wei-Jie Chen, Yi-Hsin Yu, Chih-Yuan Yang, Jane Yung-jen Hsu

    Abstract: Existing zero-shot skeleton-based action recognition methods utilize projection networks to learn a shared latent space of skeleton features and semantic embeddings. The inherent imbalance in action recognition datasets, characterized by variable skeleton sequences yet constant class labels, presents significant challenges for alignment. To address the imbalance, we propose SA-DVAE -- Semantic Ali…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  16. arXiv:2407.07775  [pdf, other]

    cs.RO cs.AI

    Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs

    Authors: Hao-Tien Lewis Chiang, Zhuo Xu, Zipeng Fu, Mithun George Jacob, Tingnan Zhang, Tsang-Wei Edward Lee, Wenhao Yu, Connor Schenck, David Rendleman, Dhruv Shah, Fei Xia, Jasmine Hsu, Jonathan Hoech, Pete Florence, Sean Kirmani, Sumeet Singh, Vikas Sindhwani, Carolina Parada, Chelsea Finn, Peng Xu, Sergey Levine, Jie Tan

    Abstract: An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and image, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recor…

    Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  17. arXiv:2405.14782  [pdf, other]

    cs.CL

    Lessons from the Trenches on Reproducible Evaluation of Language Models

    Authors: Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan, et al. (5 additional authors not shown)

    Abstract: Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of reproducibility and transparency. In this paper we draw on three years of experience in evaluating large language models to provide guidance and lessons…

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  18. arXiv:2405.14068  [pdf, other]

    cs.GT cs.PL

    Verifying Cake-Cutting, Faster

    Authors: Noah Bertram, Tean Lai, Justin Hsu

    Abstract: Envy-free cake-cutting protocols procedurally divide an infinitely divisible good among a set of agents so that no agent prefers another's allocation to their own. These protocols are highly complex and difficult to prove correct. Recently, Bertram, Levinson, and Hsu introduced a language called Slice for describing and verifying cake-cutting protocols. Slice programs can be translated to formulas…

    Submitted 30 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: 53 Pages, 12 Figures, CAV 2024

    ACM Class: D.3.1; J.4

  19. arXiv:2405.05876  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Composable Part-Based Manipulation

    Authors: Weiyu Liu, Jiayuan Mao, Joy Hsu, Tucker Hermans, Animesh Garg, Jiajun Wu

    Abstract: In this paper, we propose composable part-based manipulation (CPM), a novel approach that leverages object-part decomposition and part-part correspondences to improve learning and generalization of robotic manipulation skills. By considering the functional correspondences between object parts, we conceptualize functional actions, such as pouring and constrained placing, as combinations of differen…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Presented at CoRL 2023. For videos and additional results, see our website: https://cpmcorl2023.github.io/

  20. arXiv:2405.04612  [pdf, ps, other]

    cs.PL math.NA

    Numerical Fuzz: A Type System for Rounding Error Analysis

    Authors: Ariel E. Kellison, Justin Hsu

    Abstract: Algorithms operating on real numbers are implemented as floating-point computations in practice, but floating-point operations introduce roundoff errors that can degrade the accuracy of the result. We propose $Λ_{num}$, a functional programming language with a type system that can express quantitative bounds on roundoff error. Our type system combines a sensitivity analysis, enforced through a lin…

    Submitted 8 April, 2025; v1 submitted 7 May, 2024; originally announced May 2024.

  21. arXiv:2405.03864  [pdf, other]

    cs.RO cs.AI

    Learning Planning Abstractions from Language

    Authors: Weiyu Liu, Geng Chen, Joy Hsu, Jiayuan Mao, Jiajun Wu

    Abstract: This paper presents a framework for learning state and action abstractions in sequential decision-making domains. Our framework, planning abstraction from language (PARL), utilizes language-annotated demonstrations to automatically discover a symbolic and abstract action space and induce a latent state abstraction based on it. PARL consists of three stages: 1) recovering object-level and action co…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: The first two authors contributed equally. The last two authors provide equal advising. Project website: https://parl2024.github.io/

  22. arXiv:2404.19696  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners

    Authors: Chun Feng, Joy Hsu, Weiyu Liu, Jiajun Wu

    Abstract: 3D visual grounding is a challenging task that often requires direct and dense supervision, notably the semantic label for each object in the scene. In this paper, we instead study the naturally supervised setting that learns from only 3D scene and QA pairs, where prior works underperform. We propose the Language-Regularized Concept Learner (LARC), which uses constraints from language as regulariz…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. The first two authors contributed equally

  23. arXiv:2404.06479  [pdf, ps, other]

    cs.CL cs.AI cs.CV

    Visually Descriptive Language Model for Vector Graphics Reasoning

    Authors: Zhenhailong Wang, Joy Hsu, Xingyao Wang, Kuan-Hao Huang, Manling Li, Jiajun Wu, Heng Ji

    Abstract: Despite significant advancements, large multimodal models (LMMs) still struggle to bridge the gap between low-level visual perception -- focusing on shapes, sizes, and layouts -- and high-level language reasoning, such as semantics and logic. This limitation is evident in tasks that require precise visual perception, like comparing geometric properties or solving visual reasoning problems. To stud…

    Submitted 12 June, 2025; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Project page: https://mikewangwzhl.github.io/VDLM/

    Journal ref: TMLR 2025

  24. arXiv:2402.11450  [pdf, other]

    cs.RO

    Learning to Learn Faster from Human Feedback with Language Model Predictive Control

    Authors: Jacky Liang, Fei Xia, Wenhao Yu, Andy Zeng, Montserrat Gonzalez Arenas, Maria Attarian, Maria Bauza, Matthew Bennice, Alex Bewley, Adil Dostmohamed, Chuyuan Kelly Fu, Nimrod Gileadi, Marissa Giustina, Keerthana Gopalakrishnan, Leonard Hasenclever, Jan Humplik, Jasmine Hsu, Nikhil Joshi, Ben Jyenis, Chase Kew, Sean Kirmani, Tsang-Wei Edward Lee, Kuang-Huei Lee, Assaf Hurwitz Michaely, Joss Moore, et al. (25 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for o…

    Submitted 31 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  25. arXiv:2401.05842  [pdf, ps, other]

    cs.LO

    A Categorical Approach to DIBI Models

    Authors: Tao Gu, Jialu Bao, Justin Hsu, Alexandra Silva, Fabio Zanasi

    Abstract: The logic of Dependence and Independence Bunched Implications (DIBI) is a logic to reason about conditional independence (CI); for instance, DIBI formulas can characterise CI in probability distributions and relational databases, using the probabilistic and relational DIBI models, respectively. Despite the similarity of the probabilistic and relational models, a uniform, more abstract account rema…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 33 pages

  26. arXiv:2310.16035  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG stat.ML

    What's Left? Concept Grounding with Logic-Enhanced Foundation Models

    Authors: Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu

    Abstract: Recent works such as VisProg and ViperGPT have smartly composed foundation models for visual reasoning, using large language models (LLMs) to produce programs that can be executed by pre-trained vision-language models. However, they operate in limited domains, such as 2D images, not fully exploiting the generalization of language: abstract concepts like "left" can also be grounded in 3D, temporal,…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023. First two authors contributed equally. Project page: https://web.stanford.edu/~joycj/projects/left_neurips_2023

  27. arXiv:2310.08864  [pdf, other]

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (269 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 14 May, 2025; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  28. arXiv:2310.02971  [pdf, other]

    eess.AS cs.CL eess.SP

    Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model

    Authors: Kai-Wei Chang, Ming-Hsin Chen, Yun-Ping Lin, Jing Neng Hsu, Paul Kuo-Ming Huang, Chien-yu Huang, Shang-Wen Li, Hung-yi Lee

    Abstract: Prompting and adapter tuning have emerged as efficient alternatives to fine-tuning (FT) methods. However, existing studies on speech prompting focused on classification tasks and failed on more complex sequence generation tasks. Besides, adapter tuning is primarily applied with a focus on encoder-only self-supervised models. Our experiments show that prompting on Wav2Seq, a self-supervised encoder…

    Submitted 14 November, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted to IEEE ASRU 2023

  29. arXiv:2308.07024  [pdf, other]

    cs.CV

    PGT-Net: Progressive Guided Multi-task Neural Network for Small-area Wet Fingerprint Denoising and Recognition

    Authors: Yu-Ting Li, Ching-Te Chiu, An-Ting Hsieh, Mao-Hsiu Hsu, Long Wenyong, Jui-Min Hsu

    Abstract: Fingerprint recognition on mobile devices is an important method for identity verification. However, real fingerprints usually contain sweat and moisture which leads to poor recognition performance. In addition, for rolling out slimmer and thinner phones, technology companies reduce the size of recognition sensors by embedding them with the power button. Therefore, the limited size of fingerprint…

    Submitted 14 August, 2023; originally announced August 2023.

  30. arXiv:2307.15818  [pdf, other]

    cs.RO cs.CL cs.CV cs.LG

    RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

    Authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, et al. (29 additional authors not shown)

    Abstract: We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web.…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: Website: https://robotics-transformer.github.io/

  31. arXiv:2305.08953  [pdf, other]

    cs.CV cs.AI cs.LG

    Motion Question Answering via Modular Motion Programs

    Authors: Mark Endo, Joy Hsu, Jiaman Li, Jiajun Wu

    Abstract: In order to build artificial intelligence systems that can perceive and reason with human behavior in the real world, we must first design models that conduct complex spatio-temporal reasoning over motion sequences. Moving towards this goal, we propose the HumanMotionQA task to evaluate complex, multi-step reasoning abilities of models on long-form human motion sequences. We generate a dataset of…

    Submitted 17 May, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: In ICML 2023; first two authors contributed equally to this work

  32. arXiv:2304.13826  [pdf, other]

    cs.AI cs.CV cs.RO

    Programmatically Grounded, Compositionally Generalizable Robotic Manipulation

    Authors: Renhao Wang, Jiayuan Mao, Joy Hsu, Hang Zhao, Jiajun Wu, Yang Gao

    Abstract: Robots operating in the real world require both rich manipulation skills as well as the ability to semantically reason about when to apply those skills. Towards this goal, recent works have integrated semantic representations from large-scale pretrained vision-language (VL) models into manipulation models, imparting them with more general reasoning capabilities. However, we show that the conventio…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: ICLR 2023 camera-ready

  33. Cutting the Cake: A Language for Fair Division

    Authors: Noah Bertram, Alex Levinson, Justin Hsu

    Abstract: The fair division literature in economics considers how to divide resources between multiple agents such that the allocation is envy-free: each agent receives their favorite piece. Researchers have developed a variety of fair division protocols for the most standard setting, where the agents want to split a single item; however, the protocols are highly intricate and the proofs of envy-freeness in…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: 31 pages, 15 figures, PLDI 2023

    ACM Class: D.3.1; J.4

  34. arXiv:2303.13483  [pdf, other]

    cs.CV cs.AI

    NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations

    Authors: Joy Hsu, Jiayuan Mao, Jiajun Wu

    Abstract: Grounding object properties and relations in 3D scenes is a prerequisite for a wide range of artificial intelligence tasks, such as visually grounded dialogues and embodied manipulation. However, the variability of the 3D domain induces two fundamental challenges: 1) the expense of labeling and 2) the complexity of 3D grounded language. Hence, essential desiderata for models are to be data-efficie…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: In CVPR 2023

  35. arXiv:2303.01616  [pdf, other]

    cs.PL cs.LO

    Separated and Shared Effects in Higher-Order Languages

    Authors: Pedro H. Azevedo de Amorim, Justin Hsu

    Abstract: Effectful programs interact in ways that go beyond simple input-output, making compositional reasoning challenging. Existing work has shown that when such programs are "separate", i.e., when programs do not interfere with each other, it can be easier to reason about them. While reasoning about separated resources has been well-studied, there has been little work on reasoning about separated effe…

    Submitted 2 March, 2023; originally announced March 2023.

  36. arXiv:2212.06817  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    RT-1: Robotics Transformer for Real-World Control at Scale

    Authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Malla, Deeksha Manjunath, et al. (26 additional authors not shown)

    Abstract: By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, wher…

    Submitted 11 August, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: See website at robotics-transformer1.github.io

  37. arXiv:2211.16663  [pdf, other]

    cs.CV

    Geoclidean: Few-Shot Generalization in Euclidean Geometry

    Authors: Joy Hsu, Jiajun Wu, Noah D. Goodman

    Abstract: Euclidean geometry is among the earliest forms of mathematical thinking. While the geometric primitives underlying its constructions, such as perfect lines and circles, do not often occur in the natural world, humans rarely struggle to perceive and reason with them. Will computer vision models trained on natural images show the same sensitivity to Euclidean geometry? Here we explore these question…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: To appear at NeurIPS 2022

  38. Symbolic Execution for Randomized Programs

    Authors: Zachary Susag, Sumit Lahiri, Justin Hsu, Subhajit Roy

    Abstract: We propose a symbolic execution method for programs that can draw random samples. In contrast to existing work, our method can verify randomized programs with unknown inputs and can prove probabilistic properties that universally quantify over all possible inputs. Our technique augments standard symbolic execution with a new class of probabilistic symbolic variables, which represent the res…

    Submitted 16 September, 2022; originally announced September 2022.

    Comments: 47 pages, 9 figures, to appear at OOPSLA 2022

    ACM Class: D.2.4; F.3.1; G.3

  39. arXiv:2205.09185  [pdf, other]

    physics.ins-det cs.LG hep-ex nucl-ex physics.comp-ph

    AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider

    Authors: C. Fanelli, Z. Papandreou, K. Suresh, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann, et al. (258 additional authors not shown)

    Abstract: The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to…

    Submitted 19 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

    Comments: 16 pages, 18 figures, 2 appendices, 3 tables

  40. arXiv:2204.08105  [pdf, other]

    cs.CL cs.IT

    Monte Carlo Tree Search for Interpreting Stress in Natural Language

    Authors: Kyle Swanson, Joy Hsu, Mirac Suzgun

    Abstract: Natural language processing can facilitate the analysis of a person's mental state from text they have written. Previous studies have developed models that can predict whether a person is experiencing a mental health condition from social media posts with high accuracy. Yet, these models cannot explain why the person is experiencing a particular mental state. In this work, we present a new method…

    Submitted 17 April, 2022; originally announced April 2022.

    Comments: Second Workshop on LT-EDI at ACL 2022

  41. arXiv:2204.06407  [pdf, other]

    cs.LG cs.AI

    Flexible Multiple-Objective Reinforcement Learning for Chip Placement

    Authors: Fu-Chieh Chang, Yu-Wei Tseng, Ya-Wen Yu, Ssu-Rui Lee, Alexandru Cioba, I-Lun Tseng, Da-shan Shiu, Jhih-Wei Hsu, Cheng-Yuan Wang, Chien-Yi Yang, Ren-Chu Wang, Yao-Wen Chang, Tai-Chen Chen, Tung-Chieh Chen

    Abstract: Recently, successful applications of reinforcement learning to chip placement have emerged. Pretrained models are necessary to improve efficiency and effectiveness. Currently, the weights of objective metrics (e.g., wirelength, congestion, and timing) are fixed during pretraining. However, fixed-weighed models cannot generate the diversity of placements required for engineers to accommodate changi…

    Submitted 13 April, 2022; originally announced April 2022.

    Comments: A short version of this article is published in DAC'22:LBR (see ACM DOI 10.1145/3489517.3530617)

  42. arXiv:2204.03113  [pdf, other]

    cs.PL cs.CR cs.NI

    P4BID: Information Flow Control in P4

    Authors: Karuna Grewal, Loris D'Antoni, Justin Hsu

    Abstract: Modern programmable network switches can implement custom applications using efficient packet processing hardware, and the programming language P4 provides high-level constructs to program such switches. The increase in speed and programmability has inspired research in dataplane programming, where many complex functionalities, e.g., key-value stores and load balancers, can be implemented entirely…

    Submitted 14 June, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

  43. arXiv:2204.01691  [pdf, other]

    cs.RO cs.CL cs.LG

    Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

    Authors: Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, et al. (20 additional authors not shown)

    Abstract: Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embo…

    Submitted 16 August, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: See website at https://say-can.github.io/ V1. Initial Upload. V2. Added PaLM results. Added study about new capabilities (drawer manipulation, chain of thought prompting, multilingual instructions). Added an ablation study of language model size. Added an open-source version of \algname on a simulated tabletop environment. Improved readability

  44. arXiv:2202.00478  [pdf]

    cs.CL

    NeuraHealth: An Automated Screening Pipeline to Detect Undiagnosed Cognitive Impairment in Electronic Health Records with Deep Learning and Natural Language Processing

    Authors: Tanish Tyagi, Colin G. Magdamo, Ayush Noori, Zhaozhi Li, Xiao Liu, Mayuresh Deodhar, Zhuoqiao Hong, Wendong Ge, Elissa M. Ye, Yi-han Sheu, Haitham Alabsi, Laura Brenner, Gregory K. Robbins, Sahar Zafar, Nicole Benson, Lidia Moura, John Hsu, Alberto Serrano-Pozo, Dimitry Prokopenko, Rudolph E. Tanzi, Bradley T. Hyman, Deborah Blacker, Shibani S. Mukerji, M. Brandon Westover, Sudeshna Das

    Abstract: Dementia related cognitive impairment (CI) is a neurodegenerative disorder, affecting over 55 million people worldwide and growing rapidly at the rate of one new case every 3 seconds. 75% cases go undiagnosed globally with up to 90% in low-and-middle-income countries, leading to an estimated annual worldwide cost of USD 1.3 trillion, forecasted to reach 2.8 trillion by 2030. With no cure, a recurr…

    Submitted 20 June, 2022; v1 submitted 12 January, 2022; originally announced February 2022.

  45. arXiv:2112.00894  [pdf, other]

    cs.CL

    Context-Dependent Semantic Parsing for Temporal Relation Extraction

    Authors: Bo-Ying Su, Shang-Ling Hsu, Kuan-Yin Lai, Jane Yung-jen Hsu

    Abstract: Extracting temporal relations among events from unstructured text has extensive applications, such as temporal reasoning and question answering. While it is difficult, recent development of Neural-symbolic methods has shown promising results on solving similar tasks. Current temporal relation extraction methods usually suffer from limited expressivity and inconsistent relation inference. For examp…

    Submitted 1 December, 2021; originally announced December 2021.

  46. arXiv:2111.14917  [pdf, ps, other]

    cs.PL cs.LO

    A Separation Logic for Negative Dependence

    Authors: Jialu Bao, Marco Gaboardi, Justin Hsu, Joseph Tassarotti

    Abstract: Formal reasoning about hashing-based probabilistic data structures often requires reasoning about random variables where when one variable gets larger (such as the number of elements hashed into one bucket), the others tend to be smaller (like the number of elements hashed into the other buckets). This is an example of negative dependence, a generalization of probabilistic independence that has re…

    Submitted 29 November, 2021; originally announced November 2021.

    Comments: 61 pages, 9 figures, to appear in Proceedings of the ACM on Programming Languages (POPL 2022)

  47. arXiv:2111.09115  [pdf, other]

    cs.CL cs.LG

    Using Deep Learning to Identify Patients with Cognitive Impairment in Electronic Health Records

    Authors: Tanish Tyagi, Colin G. Magdamo, Ayush Noori, Zhaozhi Li, Xiao Liu, Mayuresh Deodhar, Zhuoqiao Hong, Wendong Ge, Elissa M. Ye, Yi-han Sheu, Haitham Alabsi, Laura Brenner, Gregory K. Robbins, Sahar Zafar, Nicole Benson, Lidia Moura, John Hsu, Alberto Serrano-Pozo, Dimitry Prokopenko, Rudolph E. Tanzi, Bradley T. Hyman, Deborah Blacker, Shibani S. Mukerji, M. Brandon Westover, Sudeshna Das

    Abstract: Dementia is a neurodegenerative disorder that causes cognitive decline and affects more than 50 million people worldwide. Dementia is under-diagnosed by healthcare professionals - only one in four people who suffer from dementia are diagnosed. Even when a diagnosis is made, it may not be entered as a structured International Classification of Diseases (ICD) diagnosis code in a patient's charts. In…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: Machine Learning for Health (ML4H) - Extended Abstract

  48. arXiv:2108.09448  [pdf, other]

    cs.HC

    Thing Constellation Visualizer: Exploring Emergent Relationships of Everyday Objects

    Authors: Yi-Ching 'Janet' Huang, Yu-Ting Cheng, Rung-Huei Liang, Jane Yung-jen Hsu, Lin-Lin Chen

    Abstract: Designing future IoT ecosystems requires new approaches and perspectives to understand everyday practices. While researchers recognize the importance of understanding social aspects of everyday objects, limited studies have explored the possibilities of combining data-driven patterns with human interpretations to investigate emergent relationships among objects. This work presents Thing Constellat…

    Submitted 25 August, 2021; v1 submitted 21 August, 2021; originally announced August 2021.

    Comments: Accepted at CSCW 2021

  49. Data-Driven Invariant Learning for Probabilistic Programs

    Authors: Jialu Bao, Nitesh Trivedi, Drashti Pathak, Justin Hsu, Subhajit Roy

    Abstract: Morgan and McIver's weakest pre-expectation framework is one of the most well-established methods for deductive verification of probabilistic programs. Roughly, the idea is to generalize binary state assertions to real-valued expectations, which can measure expected values of probabilistic program quantities. While loop-free programs can be analyzed by mechanically transforming expectations, verif…

    Submitted 7 March, 2025; v1 submitted 9 June, 2021; originally announced June 2021.

    Comments: 37 pages

    Journal ref: Formal Methods in System Design 2024 (CAV Collection)

  50. arXiv:2106.00497  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Omnizart: A General Toolbox for Automatic Music Transcription

    Authors: Yu-Te Wu, Yin-Jyun Luo, Tsung-Ping Chen, I-Chieh Wei, Jui-Yang Hsu, Yi-Chin Chuang, Li Su

    Abstract: We present and release Omnizart, a new Python library that provides a streamlined solution to automatic music transcription (AMT). Omnizart encompasses modules that construct the life-cycle of deep learning-based AMT, and is designed for ease of use with a compact command-line interface. To the best of our knowledge, Omnizart is the first transcription toolkit which offers models covering a wide c…

    Submitted 1 June, 2021; originally announced June 2021.