Showing 1–50 of 226 results for author: Pfister, H

Searching in archive cs.
  1. arXiv:2507.03866  [pdf, ps, other]

    cs.LG cs.CV cs.HC

    A Rigorous Behavior Assessment of CNNs Using a Data-Domain Sampling Regime

    Authors: Shuning Jiang, Wei-Lun Chao, Daniel Haehn, Hanspeter Pfister, Jian Chen

    Abstract: We present a data-domain sampling regime for quantifying CNNs' graphic perception behaviors. This regime lets us evaluate CNNs' ratio estimation ability in bar charts from three perspectives: sensitivity to training-test distribution discrepancies, stability to limited samples, and relative expertise to human observers. After analyzing 16 million trials from 800 CNN models and 6,825 trials from 1…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: This is a preprint of a paper that has been conditionally accepted for publication at IEEE VIS 2025. The final version may be different upon publication. 9 pages main text, 11 pages supplementary contents, 37 figures

  2. arXiv:2506.17403  [pdf, ps, other]

    cs.CV

    Spatial-Temporal Pre-Training for Embryo Viability Prediction Using Time-Lapse Videos

    Authors: Zhiyi Shi, Junsik Kim, Helen Y. Yang, Yonghyun Song, Hyun-Jic Oh, Dalit Ben-Yosef, Daniel Needleman, Hanspeter Pfister

    Abstract: Automating embryo viability prediction for in vitro fertilization (IVF) is important but challenging due to the limited availability of labeled pregnancy outcome data, as only a small fraction of embryos are labeled after transfer. Self-supervised learning (SSL) can leverage both labeled and unlabeled data to improve prediction. However, existing SSL methods for videos are not directly applicable…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: Preprint submitted to Medical Image Analysis

  3. arXiv:2506.17076  [pdf, ps, other]

    cs.IT cs.LG

    Neural Polar Decoders for DNA Data Storage

    Authors: Ziv Aharoni, Henry D. Pfister

    Abstract: Synchronization errors, such as insertions and deletions, present a fundamental challenge in DNA-based data storage systems, arising from both synthesis and sequencing noise. These channels are often modeled as insertion-deletion-substitution (IDS) channels, for which designing maximum-likelihood decoders is computationally expensive. In this work, we propose a data-driven approach based on neural…

    Submitted 20 June, 2025; originally announced June 2025.

  4. arXiv:2506.15836  [pdf, ps, other]

    cs.IT cs.LG

    Code Rate Optimization via Neural Polar Decoders

    Authors: Ziv Aharoni, Bashar Huleihel, Henry D Pfister, Haim H Permuter

    Abstract: This paper proposes a method to optimize communication code rates via the application of neural polar decoders (NPDs). Employing this approach enables simultaneous optimization of code rates over input distributions while providing a practical coding scheme within the framework of polar codes. The proposed approach is designed for scenarios where the channel model is unknown, treating the channel…

    Submitted 18 June, 2025; originally announced June 2025.

  5. arXiv:2506.15786  [pdf, ps, other]

    cs.GR cs.AI cs.LG physics.comp-ph physics.optics

    Graphics4Science: Computer Graphics for Scientific Impacts

    Authors: Peter Yichen Chen, Minghao Guo, Hanspeter Pfister, Ming Lin, William Freeman, Qixing Huang, Han-Wei Shen, Wojciech Matusik

    Abstract: Computer graphics, often associated with films, games, and visual effects, has long been a powerful tool for addressing scientific challenges--from its origins in 3D visualization for medical imaging to its role in modern computational modeling and simulation. This course explores the deep and evolving relationship between computer graphics and science, highlighting past achievements, ongoing cont…

    Submitted 18 June, 2025; originally announced June 2025.

  6. arXiv:2506.13638  [pdf, ps, other]

    cs.CV cs.AI

    DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models

    Authors: Zhiyi Shi, Binjie Wang, Chongjie Si, Yichen Wu, Junsik Kim, Hanspeter Pfister

    Abstract: Model editing aims to efficiently update a pre-trained model's knowledge without the need for time-consuming full retraining. While existing pioneering editing methods achieve promising results, they primarily focus on editing single-modal language models (LLMs). However, for vision-language models (VLMs), which involve multiple modalities, the role and impact of each modality on editing performan…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Under Review

  7. arXiv:2505.18306  [pdf, ps, other]

    cs.CV

    CTRL-GS: Cascaded Temporal Residue Learning for 4D Gaussian Splatting

    Authors: Karly Hou, Wanhua Li, Hanspeter Pfister

    Abstract: Recently, Gaussian Splatting methods have emerged as a desirable substitute for prior Radiance Field methods for novel-view synthesis of scenes captured with multi-view images or videos. In this work, we propose a novel extension to 4D Gaussian Splatting for dynamic scenes. Drawing on ideas from residual learning, we hierarchically decompose the dynamic scene into a "video-segment-frame" structure…

    Submitted 31 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: Accepted to 4D Vision Workshop @ CVPR 2025

  8. arXiv:2504.15394  [pdf, ps, other]

    cs.IT

    Capacity on BMS Channels via Code Symmetry and Nesting

    Authors: Henry D. Pfister, Galen Reeves

    Abstract: The past decade has seen notable advances in our understanding of structured error-correcting codes, particularly binary Reed--Muller (RM) codes. While initial breakthroughs were for erasure channels based on symmetry, extending these results to the binary symmetric channel (BSC) and other binary memoryless symmetric (BMS) channels required new tools and conditions. Recent work uses nesting to obt…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 35 pages, 1 figure

  9. arXiv:2503.24270  [pdf, other]

    cs.CV cs.AI

    Visual Acoustic Fields

    Authors: Yuelei Li, Hyunjin Kim, Fangneng Zhan, Ri-Zhao Qiu, Mazeyu Ji, Xiaojun Shan, Xueyan Zou, Paul Liang, Hanspeter Pfister, Xiaolong Wang

    Abstract: Objects produce different sounds when hit, and humans can intuitively infer how an object might sound based on its appearance and material properties. Inspired by this intuition, we propose Visual Acoustic Fields, a framework that bridges hitting sounds and visual signals within a 3D space using 3D Gaussian Splatting (3DGS). Our approach features two key modules: sound generation and sound localiz…

    Submitted 31 March, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

  10. arXiv:2503.10437  [pdf, other]

    cs.CV

    4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

    Authors: Wanhua Li, Renping Zhou, Jiawei Zhou, Yingwei Song, Johannes Herter, Minghan Qin, Gao Huang, Hanspeter Pfister

    Abstract: Learning 4D language fields to enable time-sensitive, open-ended language queries in dynamic scenes is essential for many real-world applications. While LangSplat successfully grounds CLIP features into 3D Gaussian representations, achieving precision and efficiency in 3D static scenes, it lacks the ability to handle dynamic 4D fields as CLIP, designed for static image-text tasks, cannot capture t…

    Submitted 31 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: CVPR 2025. Project Page: https://4d-langsplat.github.io

  11. Enhancing User Performance and Human Factors through Visual Guidance in AR Assembly Tasks

    Authors: Leon Pietschmann, Michel Schimpf, Zhu-Tian Chen, Hanspeter Pfister, Thomas Bohné

    Abstract: This study investigates the influence of Visual Guidance (VG) on user performance and human factors within Augmented Reality (AR) via a between-subjects experiment. VG is a crucial component in AR applications, serving as a bridge between digital information and real-world interactions. Unlike prior research, which often produced inconsistent outcomes, our study focuses on varying types of support…

    Submitted 7 March, 2025; originally announced March 2025.

  12. arXiv:2503.00086  [pdf, other]

    cs.CV cs.AI cs.HC cs.LG

    Generalization of CNNs on Relational Reasoning with Bar Charts

    Authors: Zhenxing Cui, Lu Chen, Yunhai Wang, Daniel Haehn, Yong Wang, Hanspeter Pfister

    Abstract: This paper presents a systematic study of the generalization of convolutional neural networks (CNNs) and humans on relational reasoning tasks with bar charts. We first revisit previous experiments on graphical perception and update the benchmark performance of CNNs. We then test the generalization performance of CNNs on a classic relational reasoning task: estimating bar length ratios in a bar cha…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: Accepted by TVCG. GitHub repository: https://github.com/Ideas-Laboratory/Graphical-Perception

  13. arXiv:2502.08621  [pdf, other]

    cs.HC

    SportsBuddy: Designing and Evaluating an AI-Powered Sports Video Storytelling Tool Through Real-World Deployment

    Authors: Tica Lin, Ruxun Xiang, Gardenia Liu, Divyanshu Tiwari, Meng-Chia Chiang, Chenjiayi Ye, Hanspeter Pfister, Chen Zhu-Tian

    Abstract: Video storytelling is essential for sports performance analysis and fan engagement, enabling sports professionals and fans to effectively communicate and interpret the spatial and temporal dynamics of gameplay. Traditional methods rely on manual annotation and verbal explanations, placing significant demands on creators for video editing skills and on viewers for cognitive focus. However, these ap…

    Submitted 14 February, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

    Comments: Accepted at PacificVIS 2025

  14. arXiv:2502.03785  [pdf, ps, other]

    cs.IT quant-ph

    Reed-Muller Codes on CQ Channels via a New Correlation Bound for Quantum Observables

    Authors: Avijit Mandal, Henry D. Pfister

    Abstract: The question of whether Reed-Muller (RM) codes achieve capacity on binary memoryless symmetric (BMS) channels has drawn attention since it was resolved positively for the binary erasure channel by Kudekar et al. in 2016. In 2021, Reeves and Pfister extended this to prove the bit-error probability vanishes on BMS channels when the code rate is less than capacity. In 2023, Abbe and Sandon improved t…

    Submitted 8 February, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

    Comments: Extended Version of ISIT 2025 Submission

  15. arXiv:2502.02305  [pdf, ps, other]

    stat.ML cs.IT cs.LG

    Information-Theoretic Proofs for Diffusion Sampling

    Authors: Galen Reeves, Henry D. Pfister

    Abstract: This paper provides an elementary, self-contained analysis of diffusion-based sampling methods for generative modeling. In contrast to existing approaches that rely on continuous-time processes and then discretize, our treatment works directly with discrete-time stochastic processes and yields precise non-asymptotic convergence guarantees under broad assumptions. The key insight is to couple the s…

    Submitted 23 June, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

  16. arXiv:2501.13198  [pdf, other]

    cs.LG

    SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning

    Authors: Yichen Wu, Hongming Piao, Long-Kai Huang, Renzhen Wang, Wanhua Li, Hanspeter Pfister, Deyu Meng, Kede Ma, Ying Wei

    Abstract: Continual Learning (CL) with foundation models has recently emerged as a promising paradigm to exploit abundant knowledge acquired during pre-training for tackling sequential tasks. However, existing prompt-based and Low-Rank Adaptation-based (LoRA-based) methods often require expanding a prompt/LoRA pool or retaining samples of previous tasks, which poses significant scalability challenges as the…

    Submitted 6 March, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

  17. arXiv:2501.05748  [pdf, ps, other]

    cs.IT

    From Bit to Block: Decoding on Erasure Channels

    Authors: Henry D. Pfister, Oscar Sprumont, Gilles Zémor

    Abstract: We provide a general framework for bounding the block error threshold of a linear code $C\subseteq \mathbb{F}_2^N$ over the erasure channel in terms of its bit error threshold. Our approach relies on understanding the minimum support weight of any $r$-dimensional subcode of $C$, for all small values of $r$. As a proof of concept, we use our machinery to obtain a new proof of the celebrated result…

    Submitted 25 February, 2025; v1 submitted 10 January, 2025; originally announced January 2025.

    Comments: Slightly simplified and improved the analysis

  18. arXiv:2412.14462  [pdf, other]

    cs.CV

    Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion

    Authors: Jixuan He, Wanhua Li, Ye Liu, Junsik Kim, Donglai Wei, Hanspeter Pfister

    Abstract: As a common image editing operation, image composition involves integrating foreground objects into background scenes. In this paper, we expand the application of the concept of Affordance from human-centered image composition tasks to a more general object-scene composition framework, addressing the complex interplay between foreground objects and background scenes. Following the principle of Aff…

    Submitted 20 April, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: Code is available at: https://github.com/KaKituken/affordance-aware-any. Project page at: https://kakituken.github.io/affordance-any.github.io/

  19. arXiv:2412.08817  [pdf, ps, other]

    cs.IT quant-ph

    Cluster Decomposition for Improved Erasure Decoding of Quantum LDPC Codes

    Authors: Hanwen Yao, Mert Gökduman, Henry D. Pfister

    Abstract: We introduce a new erasure decoder that applies to arbitrary quantum LDPC codes. Dubbed the cluster decoder, it generalizes the decomposition idea of Vertical-Horizontal (VH) decoding introduced by Connelly et al. in 2022. Like the VH decoder, the idea is to first run the peeling decoder and then post-process the resulting stopping set. The cluster decoder breaks the stopping set into a tree of cl…

    Submitted 11 December, 2024; originally announced December 2024.

    Comments: 12 pages, 8 figures

  20. arXiv:2412.07947  [pdf, other]

    cs.LG cs.AI

    GPT-2 Through the Lens of Vector Symbolic Architectures

    Authors: Johannes Knittel, Tushaar Gangavarapu, Hendrik Strobelt, Hanspeter Pfister

    Abstract: Understanding the general principles behind transformer models remains a complex endeavor. Experiments with probing and disentangling features using sparse autoencoders (SAE) suggest that these models might manage linear features embedded as directions in the residual stream. This paper explores the resemblance between decoder-only transformer architecture and vector symbolic architectures (VSA)…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: 2nd Workshop on Attributing Model Behavior at Scale (ATTRIB) at NeurIPS 2024

  21. Erasure Decoding for Quantum LDPC Codes via Belief Propagation with Guided Decimation

    Authors: Mert Gökduman, Hanwen Yao, Henry D. Pfister

    Abstract: Quantum low-density parity-check (LDPC) codes are a promising family of quantum error-correcting codes for fault tolerant quantum computing with low overhead. Decoding quantum LDPC codes on quantum erasure channels has received more attention recently due to advances in erasure conversion for various types of qubits including neutral atoms, trapped ions, and superconducting qubits. Belief propagat…

    Submitted 15 November, 2024; v1 submitted 12 November, 2024; originally announced November 2024.

    Comments: Published in 2024 60th Annual Allerton Conference Proceedings

  22. arXiv:2411.00257  [pdf, other]

    cs.AI cs.CV

    Understanding Graphical Perception in Data Visualization through Zero-shot Prompting of Vision-Language Models

    Authors: Grace Guo, Jenna Jiayi Kang, Raj Sanjay Shah, Hanspeter Pfister, Sashank Varma

    Abstract: Vision Language Models (VLMs) have been successful at many chart comprehension tasks that require attending to both the images of charts and their accompanying textual descriptions. However, it is not well established how VLM performance profiles map to human-like behaviors. If VLMs can be shown to have human-like chart comprehension abilities, they can then be applied to a broader range of tasks,…

    Submitted 31 October, 2024; originally announced November 2024.

  23. arXiv:2410.21411  [pdf, other]

    cs.CV

    SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization

    Authors: Wanhua Li, Zibin Meng, Jiawei Zhou, Donglai Wei, Chuang Gan, Hanspeter Pfister

    Abstract: Social relation reasoning aims to identify relation categories such as friends, spouses, and colleagues from images. While current methods adopt the paradigm of training a dedicated network end-to-end using labeled image data, they are limited in terms of generalizability and interpretability. To address these issues, we first present a simple yet well-crafted framework named SocialGPT, which combin…

    Submitted 28 October, 2024; originally announced October 2024.

    Comments: Accepted by NeurIPS 2024. Project page: https://mengzibin.github.io/SocialGPT.github.io/

  24. arXiv:2410.15581  [pdf, other]

    cs.CV cs.LG

    Multimodal Learning for Embryo Viability Prediction in Clinical IVF

    Authors: Junsik Kim, Zhiyi Shi, Davin Jeong, Johannes Knittel, Helen Y. Yang, Yonghyun Song, Wanhua Li, Yicong Li, Dalit Ben-Yosef, Daniel Needleman, Hanspeter Pfister

    Abstract: In clinical In-Vitro Fertilization (IVF), identifying the most viable embryo for transfer is important to increasing the likelihood of a successful pregnancy. Traditionally, this process involves embryologists manually assessing embryos' static morphological features at specific intervals using light microscopy. This manual evaluation is not only time-intensive and costly, due to the need for expe…

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: Accepted to MICCAI 2024

  25. arXiv:2410.11201  [pdf, other]

    cs.CV cs.AI cs.LG

    Tree of Attributes Prompt Learning for Vision-Language Models

    Authors: Tong Ding, Wanhua Li, Zhongqi Miao, Hanspeter Pfister

    Abstract: Prompt learning has proven effective in adapting vision language models for downstream tasks. However, existing methods usually append learnable prompt tokens solely with the category names to obtain textual features, which fails to fully leverage the rich context indicated in the category name. To address this issue, we propose the Tree of Attributes Prompt learning (TAP), which first instructs L…

    Submitted 21 April, 2025; v1 submitted 14 October, 2024; originally announced October 2024.

  26. arXiv:2410.04634  [pdf, other]

    cs.CV

    Is What You Ask For What You Get? Investigating Concept Associations in Text-to-Image Models

    Authors: Salma S. Abdel Magid, Weiwei Pan, Simon Warchol, Grace Guo, Junsik Kim, Mahia Rahman, Hanspeter Pfister

    Abstract: Text-to-image (T2I) models are increasingly used in impactful real-life applications. As such, there is a growing need to audit these models to ensure that they generate desirable, task-appropriate images. However, systematically inspecting the associations between prompts and generated content in a human-understandable way remains challenging. To address this, we propose Concept2Concept, a framew…

    Submitted 7 May, 2025; v1 submitted 6 October, 2024; originally announced October 2024.

  27. arXiv:2409.13859  [pdf, other]

    cs.HC cs.GR

    PanoCoach: Enhancing Tactical Coaching and Communication in Soccer with Mixed-Reality Telepresence

    Authors: Andrew Kang, Hanspeter Pfister, Tica Lin

    Abstract: Soccer, as a dynamic team sport, requires seamless coordination and integration of tactical strategies across all players. Adapting to new tactical systems is a critical but often challenging aspect of soccer at all professional levels. Even the best players can struggle with this process, primarily due to the complexities of conveying and internalizing intricate tactical patterns. Traditional com…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: 4 pages, 2 figures; Presented at IEEE VIS Workshop

  28. arXiv:2409.01035  [pdf, other]

    cs.CL cs.CV cs.LG

    Task-Specific Directions: Definition, Exploration, and Utilization in Parameter Efficient Fine-Tuning

    Authors: Chongjie Si, Zhiyi Shi, Shifan Zhang, Xiaokang Yang, Hanspeter Pfister, Wei Shen

    Abstract: Large language models demonstrate impressive performance on downstream tasks, yet they require extensive resource consumption when fully fine-tuning all parameters. To mitigate this, Parameter Efficient Fine-Tuning (PEFT) strategies, such as LoRA, have been developed. In this paper, we delve into the concept of task-specific directions (TSDs), which are critical for transitioning large models from…

    Submitted 20 April, 2025; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: Codes in https://github.com/Chongjie-Si/Subspace-Tuning

  29. arXiv:2408.09064  [pdf, other]

    cs.CV cs.LG

    MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality

    Authors: Zhiyi Shi, Junsik Kim, Wanhua Li, Yicong Li, Hanspeter Pfister

    Abstract: Multi-modal pre-trained models efficiently extract and fuse features from different modalities with low memory requirements for fine-tuning. Despite this efficiency, their application in disease diagnosis is under-explored. A significant challenge is the frequent occurrence of missing modalities, which impairs performance. Additionally, fine-tuning the entire pre-trained model demands substantial…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Accepted by MICCAI 2024

  30. arXiv:2408.05123  [pdf, other]

    cs.HC

    Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video

    Authors: Chunggi Lee, Tica Lin, Hanspeter Pfister, Chen Zhu-Tian

    Abstract: As basketball's popularity surges, fans often find themselves confused and overwhelmed by the rapid game pace and complexity. Basketball tactics, involving a complex series of actions, require substantial knowledge to be fully understood. This complexity leads to a need for additional information and explanation, which can distract fans from the game. To tackle these challenges, we present Sportif…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 14 pages, 8 figures, conference

  31. arXiv:2407.13676  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

    Authors: Arda Senocak, Hyeonggon Ryu, Junsik Kim, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung

    Abstract: Recent studies on learning-based sound source localization have mainly focused on the localization performance perspective. However, prior work and existing benchmarks overlook a crucial aspect: cross-modal interaction, which is essential for interactive sound source localization. Cross-modal interaction is vital for understanding semantically matched or mismatched audio-visual events, such as sil…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Journal Extension of ICCV 2023 paper (arXiv:2309.10724). Code is available at https://github.com/kaistmm/SSLalignment

  32. Lite2Relight: 3D-aware Single Image Portrait Relighting

    Authors: Pramod Rao, Gereon Fox, Abhimitra Meka, Mallikarjun B R, Fangneng Zhan, Tim Weyrich, Bernd Bickel, Hanspeter Pfister, Wojciech Matusik, Mohamed Elgharib, Christian Theobalt

    Abstract: Achieving photorealistic 3D view synthesis and relighting of human portraits is pivotal for advancing AR/VR applications. Existing methodologies in portrait relighting demonstrate substantial limitations in terms of generalization and 3D consistency, coupled with inaccuracies in physically realistic lighting and identity preservation. Furthermore, personalization from a single view is difficult to…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted at SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers

  33. arXiv:2406.16935  [pdf, other]

    eess.SP cs.AI

    Benchmarking Out-of-Distribution Generalization Capabilities of DNN-based Encoding Models for the Ventral Visual Cortex

    Authors: Spandan Madan, Will Xiao, Mingran Cao, Hanspeter Pfister, Margaret Livingstone, Gabriel Kreiman

    Abstract: We characterized the generalization capabilities of DNN-based encoding models when predicting neuronal responses from the visual cortex. We collected \textit{MacaqueITBench}, a large-scale dataset of neural population responses from the macaque inferior temporal (IT) cortex to over $300,000$ images, comprising $8,233$ unique natural images presented to seven monkeys over $109$ sessions. Using \tex…

    Submitted 16 June, 2024; originally announced June 2024.

  34. arXiv:2406.11331  [pdf, other]

    cs.CV cs.IR cs.LG

    They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias

    Authors: Salma Abdel Magid, Jui-Hsien Wang, Kushal Kafle, Hanspeter Pfister

    Abstract: Vision Language Models (VLMs) such as CLIP are powerful models; however they can exhibit unwanted biases, making them less safe when deployed directly in applications such as text-to-image, text-to-video retrievals, reverse search, or classification tasks. In this work, we propose a novel framework to generate synthetic counterfactual images to create a diverse and balanced dataset that can be use…

    Submitted 17 June, 2024; originally announced June 2024.

  35. arXiv:2406.10772  [pdf, ps, other]

    cs.DM

    On the maximal L1 influence of real-valued boolean functions

    Authors: Andrew J. Young, Henry D. Pfister

    Abstract: We show that any sequence of well-behaved (e.g. bounded and non-constant) real-valued functions of $n$ boolean variables $\{f_n\}$ admits a sequence of coordinates whose $L^1$ influence under the $p$-biased distribution, for any $p\in(0,1)$, is $Ω(\text{var}(f_n) \frac{\ln n}{n})$.

    Submitted 15 June, 2024; originally announced June 2024.

  36. arXiv:2405.20643  [pdf, other]

    cs.CV cs.AI

    Learning Gaze-aware Compositional GAN

    Authors: Nerea Aranjuelo, Siyu Huang, Ignacio Arganda-Carreras, Luis Unzueta, Oihana Otaegui, Hanspeter Pfister, Donglai Wei

    Abstract: Gaze-annotated facial data is crucial for training deep neural networks (DNNs) for gaze estimation. However, obtaining these data is labor-intensive and requires specialized equipment due to the challenge of accurately annotating the gaze direction of a subject. In this work, we present a generative framework to create annotated gaze data by leveraging the benefits of labeled and unlabeled data so…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted by ETRA 2024 as Full paper, and as journal paper in Proceedings of the ACM on Computer Graphics and Interactive Techniques

    Journal ref: Proceedings of the ACM on Computer Graphics and Interactive Techniques, 2024

  37. arXiv:2404.14435  [pdf, other]

    cs.CV eess.IV

    Frenet-Serret Frame-based Decomposition for Part Segmentation of 3D Curvilinear Structures

    Authors: Leslie Gu, Jason Ken Adhinarta, Mikhail Bessmeltsev, Jiancheng Yang, Yongjie Jessica Zhang, Wenjie Yin, Daniel Berger, Jeff Lichtman, Hanspeter Pfister, Donglai Wei

    Abstract: Accurately segmenting 3D curvilinear structures in medical imaging remains challenging due to their complex geometry and the scarcity of diverse, large-scale datasets for algorithm development and evaluation. In this paper, we use dendritic spine segmentation as a case study and address these challenges by introducing a novel Frenet--Serret Frame-based Decomposition, which decomposes 3D curvilinea…

    Submitted 24 October, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: 10 pages, 4 figures

  38. arXiv:2404.01976  [pdf, other]

    cs.CV cs.AI cs.LG

    Joint-Task Regularization for Partially Labeled Multi-Task Learning

    Authors: Kento Nishi, Junsik Kim, Wanhua Li, Hanspeter Pfister

    Abstract: Multi-task learning has become increasingly popular in the machine learning field, but its practicality is hindered by the need for large, labeled datasets. Most multi-task learning methods depend on fully labeled datasets wherein each input example is accompanied by ground-truth labels for all target tasks. Unfortunately, curating such datasets can be prohibitively expensive and impractical, espe…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted paper to CVPR 2024 (main conference)

  39. arXiv:2404.00801  [pdf, other]

    cs.CV

    $R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding

    Authors: Ye Liu, Jixuan He, Wanhua Li, Junsik Kim, Donglai Wei, Hanspeter Pfister, Chang Wen Chen

    Abstract: Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to ground relevant clips in untrimmed videos given natural language queries. Most existing VTG models are built upon frame-wise final-layer CLIP features, aided by additional temporal backbones (e.g., SlowFast) with sophisticated temporal reasoning mechanisms. In this work, we claim that CLIP itself already show…

    Submitted 21 July, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: ECCV 2024 Camera Ready

  40. arXiv:2402.18684  [pdf, ps, other]

    quant-ph cs.IT

    Quantum State Compression with Polar Codes

    Authors: Jack Weinberg, Avijit Mandal, Henry D. Pfister

    Abstract: In the quantum compression scheme proposed by Schumacher, Alice compresses a message that Bob decompresses. In that approach, there is some probability of failure and, even when successful, some distortion of the state. For sufficiently large blocklengths, both of these imperfections can be made arbitrarily small while achieving a compression rate that asymptotically approaches the source coding b…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Extended Version of ISIT 2024 Submission

  41. arXiv:2402.10962  [pdf, other]

    cs.CL cs.AI cs.LG

    Measuring and Controlling Instruction (In)Stability in Language Model Dialogs

    Authors: Kenneth Li, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

    Abstract: System-prompting is a standard tool for customizing language-model chatbots, enabling them to follow a specific instruction. An implicit assumption in the use of system prompts is that they will be stable, so the chatbot will continue to generate text according to the stipulated instructions for the duration of a conversation. We propose a quantitative benchmark to test this assumption, evaluating…

    Submitted 25 July, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: COLM 2024; Code and data: https://github.com/likenneth/persona_drift

  42. arXiv:2402.09372  [pdf, other]

    eess.IV cs.AI cs.CV

    Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

    Authors: Jiancheng Yang, Rui Shi, Liang Jin, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, Pengfei Liu, Zhizhong Chai, Yongjie Xiao, Hao Chen, Liming Xu, Bang Du, Xiangyi Yan, Hao Tang, Adam Alessio, Gregory Holste, Jiapeng Zhang, Xiaoming Wang, Jianye He, Lixuan Che, Hanspeter Pfister, Ming Li, Bingbing Ni

    Abstract: Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmar…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Challenge paper for MICCAI RibFrac Challenge (https://ribfrac.grand-challenge.org/)

  43. arXiv:2402.03700  [pdf, other]

    cs.HC cs.AI

    GenLens: A Systematic Evaluation of Visual GenAI Model Outputs

    Authors: Tica Lin, Hanspeter Pfister, Jui-Hsien Wang

    Abstract: The rapid development of generative AI (GenAI) models in computer vision necessitates effective evaluation methods to ensure their quality and fairness. Existing tools primarily focus on dataset quality assurance and model explainability, leaving a significant gap in GenAI output evaluation during model development. Current practices often depend on developers' subjective visual assessments, which…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: To Appear in IEEE PacificVis 2024

  44. arXiv:2401.15856  [pdf, other]

    cs.LG cs.AI

    The Indoor-Training Effect: unexpected gains from distribution shifts in the transition function

    Authors: Serena Bono, Spandan Madan, Ishaan Grover, Mao Yasueda, Cynthia Breazeal, Hanspeter Pfister, Gabriel Kreiman

    Abstract: Is it better to perform tennis training in a pristine indoor environment or a noisy outdoor one? To model this problem, here we investigate whether shifts in the transition probabilities between the training and testing environments in reinforcement learning problems can lead to better performance under certain conditions. We generate new Markov Decision Processes (MDPs) starting from a given MDP,…

    Submitted 8 January, 2025; v1 submitted 28 January, 2024; originally announced January 2024.

  45. arXiv:2401.13961  [pdf, other]

    cs.CV

    TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images

    Authors: Jia Wan, Wanhua Li, Jason Ken Adhinarta, Atmadeep Banerjee, Evelina Sjostedt, Jingpeng Wu, Jeff Lichtman, Hanspeter Pfister, Donglai Wei

    Abstract: While imaging techniques at macro and mesoscales have garnered substantial attention and resources, microscale Volume Electron Microscopy (vEM) imaging, capable of revealing intricate vascular details, has lacked the necessary benchmarking infrastructure. In this paper, we address a significant gap in this field of neuroimaging by introducing the first-in-class public benchmark, BvEM, designed spe…

    Submitted 15 August, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: BvEM-Mouse can be visualized at: https://tinyurl.com/yc2s38x9

  46. arXiv:2401.07167  [pdf, ps, other]

    cs.IT

    Polar Codes for CQ Channels: Decoding via Belief-Propagation with Quantum Messages

    Authors: Avijit Mandal, S. Brandsen, Henry D. Pfister

    Abstract: This paper considers the design and decoding of polar codes for general classical-quantum (CQ) channels. It focuses on decoding via belief-propagation with quantum messages (BPQM) and, in particular, the idea of paired-measurement BPQM (PM-BPQM) decoding. Since the PM-BPQM decoder admits a classical density evolution (DE) analysis, one can use DE to design a polar code for any CQ channel and then…

    Submitted 13 January, 2024; originally announced January 2024.

  47. arXiv:2312.16084  [pdf, other]

    cs.CV

    LangSplat: 3D Language Gaussian Splatting

    Authors: Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, Hanspeter Pfister

    Abstract: Humans live in a 3D world and commonly use natural language to interact with a 3D scene. Modeling a 3D language field to support open-ended language queries in 3D has gained increasing attention recently. This paper introduces LangSplat, which constructs a 3D language field that enables precise and efficient open-vocabulary querying within 3D spaces. Unlike existing methods that ground CLIP langua…

    Submitted 31 March, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

    Comments: CVPR 2024. Project Page: https://langsplat.github.io

  48. arXiv:2312.14965  [pdf, other]

    cs.CV cs.LG

    Unraveling the Temporal Dynamics of the Unet in Diffusion Models

    Authors: Vidya Prasad, Chen Zhu-Tian, Anna Vilanova, Hanspeter Pfister, Nicola Pezzotti, Hendrik Strobelt

    Abstract: Diffusion models have garnered significant attention since they can effectively learn complex multivariate Gaussian distributions, resulting in diverse, high-quality outcomes. They introduce Gaussian noise into training data and reconstruct the original data iteratively. Central to this iterative process is a single Unet, adapting across time steps to facilitate generation. Recent work revealed th…

    Submitted 16 December, 2023; originally announced December 2023.

  49. arXiv:2312.10950  [pdf, other]

    cs.IT quant-ph

    Belief Propagation Decoding of Quantum LDPC Codes with Guided Decimation

    Authors: Hanwen Yao, Waleed Abu Laban, Christian Häger, Alexandre Graell i Amat, Henry D. Pfister

    Abstract: Quantum low-density parity-check (QLDPC) codes have emerged as a promising technique for quantum error correction. A variety of decoders have been proposed for QLDPC codes and many of them utilize belief propagation (BP) decoding in some fashion. However, the use of BP decoding for degenerate QLDPC codes is known to have issues with convergence. These issues are typically attributed to short cycle…

    Submitted 21 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: 19 pages, 8 figures

  50. arXiv:2310.16783  [pdf, other]

    cs.CV

    S$^3$-TTA: Scale-Style Selection for Test-Time Augmentation in Biomedical Image Segmentation

    Authors: Kangxian Xie, Siyu Huang, Sebastian Andres Cajas Ordonez, Hanspeter Pfister, Donglai Wei

    Abstract: Deep-learning models have been successful in biomedical image segmentation. To generalize for real-world deployment, test-time augmentation (TTA) methods are often used to transform the test image into different versions that are hopefully closer to the training domain. Unfortunately, due to the vast diversity of instance scale and image styles, many augmented test images produce undesirable resul…

    Submitted 6 January, 2024; v1 submitted 25 October, 2023; originally announced October 2023.