Showing 1–50 of 108 results for author: Sha, F

  1. arXiv:2505.07110  [pdf]

    cs.HC cs.CV

    DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems

    Authors: Tong Zhang, Fenghua Shao, Runsheng Zhang, Yifan Zhuang, Liuqingqing Yang

    Abstract: Based on the DeepSORT algorithm, this study explores the application of visual tracking technology in intelligent human-computer interaction, especially in the field of gesture recognition and tracking. With the rapid development of artificial intelligence and deep learning technology, visual-based interaction has gradually replaced traditional input devices and become an important way for intelli…

    Submitted 11 May, 2025; originally announced May 2025.

  2. arXiv:2504.09737  [pdf, other]

    cs.AI cs.CL cs.HC cs.LG

    Can LLM feedback enhance review quality? A randomized study of 20K reviews at ICLR 2025

    Authors: Nitya Thakkar, Mert Yuksekgonul, Jake Silberg, Animesh Garg, Nanyun Peng, Fei Sha, Rose Yu, Carl Vondrick, James Zou

    Abstract: Peer review at AI conferences is stressed by rapidly rising submission volumes, leading to deteriorating review quality and increased author dissatisfaction. To address these issues, we developed Review Feedback Agent, a system leveraging multiple large language models (LLMs) to improve review clarity and actionability by providing automated feedback on vague comments, content misunderstandings, a…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: 30 pages, 7 figures

  3. arXiv:2503.20229  [pdf]

    cs.HC

    Automated UI Interface Generation via Diffusion Models: Enhancing Personalization and Efficiency

    Authors: Yifei Duan, Liuqingqing Yang, Tong Zhang, Zhijun Song, Fenghua Shao

    Abstract: This study proposes a UI interface generation method based on a diffusion model, aiming to achieve high-quality, diversified, and personalized interface design through generative artificial intelligence technology. The diffusion model is based on its step-by-step denoising generation process. By combining the conditional generation mechanism, design optimization module, and user feedback mechanism…

    Submitted 26 March, 2025; originally announced March 2025.

  4. arXiv:2503.17523  [pdf, other]

    cs.CL cs.AI

    Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models

    Authors: Linlu Qiu, Fei Sha, Kelsey Allen, Yoon Kim, Tal Linzen, Sjoerd van Steenkiste

    Abstract: Artificial intelligence systems based on large language models (LLMs) are increasingly used as agents that interact with users and with the world. To do so successfully, LLMs need to construct internal representations of the world and form probabilistic beliefs about those representations. To provide a user with personalized recommendations, for example, the LLM needs to gradually infer the user's…

    Submitted 21 March, 2025; originally announced March 2025.

  5. arXiv:2502.10961  [pdf, other]

    cs.LG cs.AI

    Graders should cheat: privileged information enables expert-level automated evaluations

    Authors: Jin Peng Zhou, Sébastien M. R. Arnold, Nan Ding, Kilian Q. Weinberger, Nan Hua, Fei Sha

    Abstract: Auto-evaluating language models (LMs), i.e., using a grader LM to evaluate the candidate LM, is an appealing way to accelerate the evaluation process and the cost associated with it. But this presents a paradox: how can we trust the grader LM, which is presumably weaker than the candidate LM, to assess problems that are beyond the frontier of the capabilities of either model or both? For instance,…

    Submitted 15 February, 2025; originally announced February 2025.

  6. arXiv:2501.09426  [pdf, other]

    cs.CL

    AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling

    Authors: Ancheng Xu, Di Yang, Renhao Li, Jingwei Zhu, Minghuan Tan, Min Yang, Wanxin Qiu, Mingchen Ma, Haihong Wu, Bingyu Li, Feng Sha, Chengming Li, Xiping Hu, Qiang Qu, Derek F. Wong, Ruifeng Xu

    Abstract: Traditional in-person psychological counseling remains primarily niche, often chosen by individuals with psychological issues, while online automated counseling offers a potential solution for those hesitant to seek help due to feelings of shame. Cognitive Behavioral Therapy (CBT) is an essential and widely used approach in psychological counseling. The advent of large language models (LLMs) and a…

    Submitted 16 January, 2025; originally announced January 2025.

  7. arXiv:2412.18321  [pdf]

    cs.CV

    Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer

    Authors: Fenghua Shao, Tong Zhang, Shang Gao, Qi Sun, Liuqingqing Yang

    Abstract: This study mainly explores the application of natural gesture recognition based on computer vision in human-computer interaction, aiming to improve the fluency and naturalness of human-computer interaction through gesture recognition technology. In the fields of virtual reality, augmented reality and smart home, traditional input methods have gradually failed to meet the needs of users for interac…

    Submitted 24 December, 2024; originally announced December 2024.

  8. arXiv:2412.08079  [pdf, other]

    cs.LG math.NA physics.ao-ph

    Statistical Downscaling via High-Dimensional Distribution Matching with Generative Models

    Authors: Zhong Yi Wan, Ignacio Lopez-Gomez, Robert Carver, Tapio Schneider, John Anderson, Fei Sha, Leonardo Zepeda-Núñez

    Abstract: Statistical downscaling is a technique used in climate modeling to increase the resolution of climate simulations. High-resolution climate information is essential for various high-impact applications, including natural hazard risk assessment. However, simulating climate at high resolution is intractable. Thus, climate simulations are often conducted at a coarse scale and then downscaled to the de…

    Submitted 10 December, 2024; originally announced December 2024.

  9. arXiv:2412.04746  [pdf, other]

    cs.SD cs.IR cs.MM eess.AS

    Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance

    Authors: Xuchan Bao, Judith Yue Li, Zhong Yi Wan, Kun Su, Timo Denk, Joonseok Lee, Dima Kuzmin, Fei Sha

    Abstract: Modern music retrieval systems often rely on fixed representations of user preferences, limiting their ability to capture users' diverse and uncertain retrieval needs. To address this limitation, we introduce Diff4Steer, a novel generative retrieval framework that employs lightweight diffusion models to synthesize diverse seed embeddings from user queries that represent potential directions for mu…

    Submitted 5 December, 2024; originally announced December 2024.

    Comments: NeurIPS 2024 Creative AI Track

    Journal ref: Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

  10. arXiv:2411.16773  [pdf, other]

    cs.CV

    MICAS: Multi-grained In-Context Adaptive Sampling for 3D Point Cloud Processing

    Authors: Feifei Shao, Ping Liu, Zhao Wang, Yawei Luo, Hongwei Wang, Jun Xiao

    Abstract: Point cloud processing (PCP) encompasses tasks like reconstruction, denoising, registration, and segmentation, each often requiring specialized models to address unique task characteristics. While in-context learning (ICL) has shown promise across tasks by using a single model with task-specific demonstration prompts, its application to PCP reveals significant limitations. We identify inter-task a…

    Submitted 27 November, 2024; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: 15 pages, 6 figures, 3 tables

  11. arXiv:2411.15598  [pdf]

    cs.HC

    Optimizing Gesture Recognition for Seamless UI Interaction Using Convolutional Neural Networks

    Authors: Qi Sun, Tong Zhang, Shang Gao, Liuqingqing Yang, Fenghua Shao

    Abstract: This study introduces an advanced gesture recognition and user interface (UI) interaction system powered by deep learning, highlighting its transformative impact on UI design and functionality. By utilizing optimized convolutional neural networks (CNNs), the system achieves high-precision gesture recognition, significantly improving user interactions with digital interfaces. The process begins wit…

    Submitted 23 November, 2024; originally announced November 2024.

  12. arXiv:2411.07728  [pdf, other]

    cs.CV cs.AI eess.IV

    No-Reference Point Cloud Quality Assessment via Graph Convolutional Network

    Authors: Wu Chen, Qiuping Jiang, Wei Zhou, Feng Shao, Guangtao Zhai, Weisi Lin

    Abstract: Three-dimensional (3D) point cloud, as an emerging visual media format, is increasingly favored by consumers as it can provide more realistic visual information than two-dimensional (2D) data. Similar to 2D plane images and videos, point clouds inevitably suffer from quality degradation and information loss through multimedia communication systems. Therefore, automatic point cloud quality assessme…

    Submitted 12 November, 2024; originally announced November 2024.

    Comments: Accepted by IEEE Transactions on Multimedia

  13. arXiv:2410.01776  [pdf, other]

    physics.ao-ph cs.LG

    Dynamical-generative downscaling of climate model ensembles

    Authors: Ignacio Lopez-Gomez, Zhong Yi Wan, Leonardo Zepeda-Núñez, Tapio Schneider, John Anderson, Fei Sha

    Abstract: Regional high-resolution climate projections are crucial for many applications, such as agriculture, hydrology, and natural hazard risk assessment. Dynamical downscaling, the state-of-the-art method to produce localized future climate information, involves running a regional climate model (RCM) driven by an Earth System Model (ESM), but it is too computationally expensive to apply to large climate…

    Submitted 2 October, 2024; originally announced October 2024.

  14. arXiv:2409.18359  [pdf, other]

    cs.LG math.NA physics.flu-dyn

    Generative AI for fast and accurate statistical computation of fluids

    Authors: Roberto Molinaro, Samuel Lanthaler, Bogdan Raonić, Tobias Rohner, Victor Armegioiu, Stephan Simonis, Dana Grund, Yannick Ramic, Zhong Yi Wan, Fei Sha, Siddhartha Mishra, Leonardo Zepeda-Núñez

    Abstract: We present a generative AI algorithm for addressing the pressing task of fast, accurate, and robust statistical computation of three-dimensional turbulent fluid flows. Our algorithm, termed as GenCFD, is based on an end-to-end conditional score-based diffusion model. Through extensive numerical experimentation with a set of challenging fluid flows, we demonstrate that GenCFD provides an accurate a…

    Submitted 2 February, 2025; v1 submitted 26 September, 2024; originally announced September 2024.

    Comments: 120 pages, 33 figures

  15. arXiv:2409.09217  [pdf, other]

    math.NA cs.LG

    Rational-WENO: A lightweight, physically-consistent three-point weighted essentially non-oscillatory scheme

    Authors: Shantanu Shahane, Sheide Chammas, Deniz A. Bezgin, Aaron B. Buhendwa, Steffen J. Schmidt, Nikolaus A. Adams, Spencer H. Bryngelson, Yi-Fan Chen, Qing Wang, Fei Sha, Leonardo Zepeda-Núñez

    Abstract: Conventional WENO3 methods are known to be highly dissipative at lower resolutions, introducing significant errors in the pre-asymptotic regime. In this paper, we employ a rational neural network to accurately estimate the local smoothness of the solution, dynamically adapting the stencil weights based on local solution features. As rational neural networks can represent fast transitions between s…

    Submitted 13 September, 2024; originally announced September 2024.

  16. arXiv:2408.02688  [pdf, other]

    cs.LG math.DS physics.ao-ph physics.flu-dyn

    A probabilistic framework for learning non-intrusive corrections to long-time climate simulations from short-time training data

    Authors: Benedikt Barthel Sorensen, Leonardo Zepeda-Núñez, Ignacio Lopez-Gomez, Zhong Yi Wan, Rob Carver, Fei Sha, Themistoklis Sapsis

    Abstract: Chaotic systems, such as turbulent flows, are ubiquitous in science and engineering. However, their study remains a challenge due to the large range scales, and the strong interaction with other, often not fully understood, physics. As a consequence, the spatiotemporal resolution required for accurate simulation of these systems is typically computationally infeasible, particularly for application…

    Submitted 22 November, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

  17. arXiv:2402.04467  [pdf, other]

    cs.LG math.DS

    DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems

    Authors: Yair Schiff, Zhong Yi Wan, Jeffrey B. Parker, Stephan Hoyer, Volodymyr Kuleshov, Fei Sha, Leonardo Zepeda-Núñez

    Abstract: Learning dynamics from dissipative chaotic systems is notoriously difficult due to their inherent instability, as formalized by their positive Lyapunov exponents, which exponentially amplify errors in the learned dynamics. However, many of these systems exhibit ergodicity and an attractor: a compact and highly complex manifold, to which trajectories converge in finite-time, that supports an invari…

    Submitted 5 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: ICML 2024; Code to reproduce our experiments is available at https://github.com/google-research/swirl-dynamics/tree/main/swirl_dynamics/projects/ergodic

  18. NODLINK: An Online System for Fine-Grained APT Attack Detection and Investigation

    Authors: Shaofei Li, Feng Dong, Xusheng Xiao, Haoyu Wang, Fei Shao, Jiedong Chen, Yao Guo, Xiangqun Chen, Ding Li

    Abstract: Advanced Persistent Threats (APT) attacks have plagued modern enterprises, causing significant financial losses. To counter these attacks, researchers propose techniques that capture the complex and stealthy scenarios of APT attacks by using provenance graphs to model system entities and their dependencies. Particularly, to accelerate attack detection and reduce financial losses, online provenance…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: The final version of this paper is going to appear in the Conference on Network and Distributed System Security Symposium (NDSS'24), 26 Feb - 1 Mar 2024, San Diego, California

  19. arXiv:2311.00445  [pdf, other]

    cs.CL cs.AI cs.LG

    A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models

    Authors: Tiwalayo Eisape, MH Tessler, Ishita Dasgupta, Fei Sha, Sjoerd van Steenkiste, Tal Linzen

    Abstract: A central component of rational behavior is logical inference: the process of determining which conclusions follow from a set of premises. Psychologists have documented several ways in which humans' inferences deviate from the rules of logic. Do language models, which are trained on text generated by humans, replicate such human biases, or are they able to overcome them? Focusing on the case of sy…

    Submitted 11 April, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  20. arXiv:2310.19956  [pdf, other]

    cs.CL

    The Impact of Depth on Compositional Generalization in Transformer Language Models

    Authors: Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Dan Garrette, Tal Linzen

    Abstract: To process novel sentences, language models (LMs) must generalize compositionally -- combine familiar elements in new ways. What aspects of a model's structure promote compositional generalization? Focusing on transformers, we test the hypothesis, motivated by theoretical and empirical work, that deeper transformers generalize more compositionally. Simply adding layers increases the total number o…

    Submitted 10 April, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted to NAACL 2024

  21. arXiv:2308.15560  [pdf, other]

    physics.ao-ph cs.AI

    WeatherBench 2: A benchmark for the next generation of data-driven global weather models

    Authors: Stephan Rasp, Stephan Hoyer, Alexander Merose, Ian Langmore, Peter Battaglia, Tyler Russel, Alvaro Sanchez-Gonzalez, Vivian Yang, Rob Carver, Shreya Agrawal, Matthew Chantry, Zied Ben Bouallegue, Peter Dueben, Carla Bromberg, Jared Sisk, Luke Barrington, Aaron Bell, Fei Sha

    Abstract: WeatherBench 2 is an update to the global, medium-range (1-14 day) weather forecasting benchmark proposed by Rasp et al. (2020), designed with the aim to accelerate progress in data-driven weather modeling. WeatherBench 2 consists of an open-source evaluation framework, publicly available training, ground truth and baseline data as well as a continuously updated website with the latest metrics and…

    Submitted 26 January, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

  22. arXiv:2307.09972  [pdf, other]

    cs.CV

    Fine-grained Text-Video Retrieval with Frozen Image Encoders

    Authors: Zuozhuo Dai, Fangtao Shao, Qingkun Su, Zilong Dong, Siyu Zhu

    Abstract: State-of-the-art text-video retrieval (TVR) methods typically utilize CLIP and cosine similarity for efficient retrieval. Meanwhile, cross attention methods, which employ a transformer decoder to compute attention between each text query and all frames in a video, offer a more comprehensive interaction between text and videos. However, these methods lack important fine-grained spatial information…

    Submitted 13 July, 2023; originally announced July 2023.

  23. arXiv:2306.14066  [pdf, other]

    cs.LG physics.ao-ph

    SEEDS: Emulation of Weather Forecast Ensembles with Diffusion Models

    Authors: Lizao Li, Rob Carver, Ignacio Lopez-Gomez, Fei Sha, John Anderson

    Abstract: Uncertainty quantification is crucial to decision-making. A prominent example is probabilistic forecasting in numerical weather prediction. The dominant approach to representing uncertainty in weather forecasting is to generate an ensemble of forecasts. This is done by running many physics-based simulations under different conditions, which is a computationally costly process. We propose to amorti…

    Submitted 8 October, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

    Comments: fixed a mistake of the previous version; the paper has not been submitted to neurips 2023

  24. arXiv:2306.09224  [pdf, other]

    cs.CV

    Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

    Authors: Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari

    Abstract: We propose Encyclopedic-VQA, a large scale visual question answering (VQA) dataset featuring visual questions about detailed properties of fine-grained categories and instances. It contains 221k unique question+answer pairs each matched with (up to) 5 images, resulting in a total of 1M VQA samples. Moreover, our dataset comes with a controlled knowledge base derived from Wikipedia, marking the evi…

    Submitted 24 July, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: ICCV'23

  25. arXiv:2306.07526  [pdf, other]

    cs.LG cs.AI

    User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems

    Authors: Marc Finzi, Anudhyan Boral, Andrew Gordon Wilson, Fei Sha, Leonardo Zepeda-Núñez

    Abstract: Diffusion models are a class of probabilistic generative models that have been widely used as a prior for image processing tasks like text conditional generation and inpainting. We demonstrate that these models can be adapted to make predictions and provide uncertainty quantification for chaotic dynamical systems. In these applications, diffusion models can implicitly represent knowledge about out…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: ICML 2023 Conference

  26. arXiv:2306.01174  [pdf, other]

    cs.LG math.NA

    Neural Ideal Large Eddy Simulation: Modeling Turbulence with Neural Stochastic Differential Equations

    Authors: Anudhyan Boral, Zhong Yi Wan, Leonardo Zepeda-Núñez, James Lottes, Qing Wang, Yi-fan Chen, John Roberts Anderson, Fei Sha

    Abstract: We introduce a data-driven learning framework that assimilates two powerful ideas: ideal large eddy simulation (LES) from turbulence closure modeling and neural stochastic differential equations (SDE) for stochastic modeling. The ideal LES models the LES flow by treating each full-order trajectory as a random realization of the underlying dynamics, as such, the effect of small-scales is marginaliz…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 18 pages

  27. arXiv:2305.15618  [pdf, other]

    cs.LG physics.app-ph

    Debias Coarsely, Sample Conditionally: Statistical Downscaling through Optimal Transport and Probabilistic Diffusion Models

    Authors: Zhong Yi Wan, Ricardo Baptista, Yi-fan Chen, John Anderson, Anudhyan Boral, Fei Sha, Leonardo Zepeda-Núñez

    Abstract: We introduce a two-stage probabilistic framework for statistical downscaling using unpaired data. Statistical downscaling seeks a probabilistic map to transform low-resolution data from a biased coarse-grained numerical scheme to high-resolution data that is consistent with a high-fidelity scheme. Our framework tackles the problem by composing two transformations: (i) a debiasing step via an optim…

    Submitted 30 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023 (spotlight)

  28. arXiv:2305.15354  [pdf, other]

    cs.CV

    Counterfactual Co-occurring Learning for Bias Mitigation in Weakly-supervised Object Localization

    Authors: Feifei Shao, Yawei Luo, Lei Chen, Ping Liu, Wei Yang, Yi Yang, Jun Xiao

    Abstract: Contemporary weakly-supervised object localization (WSOL) methods have primarily focused on addressing the challenge of localizing the most discriminative region while largely overlooking the relatively less explored issue of biased activation -- incorrectly spotlighting co-occurring background with the foreground feature. In this paper, we conduct a thorough causal analysis to investigate the ori…

    Submitted 9 March, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 10 pages, 6 figures, 8 tables

  29. arXiv:2305.06594  [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    V2Meow: Meowing to the Visual Beat via Video-to-Music Generation

    Authors: Kun Su, Judith Yue Li, Qingqing Huang, Dima Kuzmin, Joonseok Lee, Chris Donahue, Fei Sha, Aren Jansen, Yu Wang, Mauro Verzetti, Timo I. Denk

    Abstract: Video-to-music generation demands both a temporally localized high-quality listening experience and globally aligned video-acoustic signatures. While recent music generation models excel at the former through advanced audio codecs, the exploration of video-acoustic signatures has been confined to specific visual scenarios. In contrast, our research confronts the challenge of learning globally alig…

    Submitted 22 February, 2024; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: accepted at AAAI 2024, music samples available at https://tinyurl.com/v2meow

  30. arXiv:2302.06009  [pdf, other]

    cs.LG cs.CV

    Policy-Induced Self-Supervision Improves Representation Finetuning in Visual RL

    Authors: Sébastien M. R. Arnold, Fei Sha

    Abstract: We study how to transfer representations pretrained on source tasks to target tasks in visual percept based RL. We analyze two popular approaches: freezing or finetuning the pretrained representations. Empirical studies on a set of popular tasks reveal several properties of pretrained representations. First, finetuning is required even when pretrained representations perfectly capture the informat…

    Submitted 12 February, 2023; originally announced February 2023.

  31. arXiv:2301.10448  [pdf, other]

    cs.CL cs.AI cs.LG

    Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute

    Authors: Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Joshua Ainslie, Sumit Sanghai, Fei Sha, William Cohen

    Abstract: Retrieval-augmented language models such as Fusion-in-Decoder are powerful, setting the state of the art on a variety of knowledge-intensive tasks. However, they are also expensive, due to the need to encode a large number of retrieved passages. Some work avoids this cost by pre-encoding a text corpus into a memory and retrieving dense representations directly. However, pre-encoding memory incurs…

    Submitted 2 June, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

    Comments: ICML 2023

  32. arXiv:2301.10391  [pdf, other]

    cs.LG physics.comp-ph

    Evolve Smoothly, Fit Consistently: Learning Smooth Latent Dynamics For Advection-Dominated Systems

    Authors: Zhong Yi Wan, Leonardo Zepeda-Núñez, Anudhyan Boral, Fei Sha

    Abstract: We present a data-driven, space-time continuous framework to learn surrogate models for complex physical systems described by advection-dominated partial differential equations. Those systems have slow-decaying Kolmogorov n-width that hinders standard methods, including reduced order modeling, from producing high-fidelity simulations at low cost. In this work, we construct hypernetwork-based laten…

    Submitted 6 February, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

    Comments: 25 pages, 9 figures

  33. arXiv:2301.09416  [pdf, other]

    cs.CV

    Towards Robust Video Instance Segmentation with Temporal-Aware Transformer

    Authors: Zhenghao Zhang, Fangtao Shao, Zuozhuo Dai, Siyu Zhu

    Abstract: Most existing transformer based video instance segmentation methods extract per frame features independently, hence it is challenging to solve the appearance deformation problem. In this paper, we observe the temporal information is important as well and we propose TAFormer to aggregate spatio-temporal features both in transformer encoder and decoder. Specifically, in transformer encoder, we propo…

    Submitted 20 January, 2023; originally announced January 2023.

  34. arXiv:2301.01060  [pdf, other]

    cs.CV

    Knowledge-guided Causal Intervention for Weakly-supervised Object Localization

    Authors: Feifei Shao, Yawei Luo, Fei Gao, Yi Yang, Jun Xiao

    Abstract: Previous weakly-supervised object localization (WSOL) methods aim to expand activation map discriminative areas to cover the whole objects, yet neglect two inherent challenges when relying solely on image-level labels. First, the "entangled context" issue arises from object-context co-occurrence (e.g., fish and water), making the model inspection hard to distinguish object boundaries clearly. Sec…

    Submitted 12 March, 2024; v1 submitted 3 January, 2023; originally announced January 2023.

    Comments: 13 pages, 7 figures, 7 tables

  35. arXiv:2212.08153  [pdf, other]

    cs.CL cs.AI cs.LG

    FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference

    Authors: Michiel de Jong, Yury Zemlyanskiy, Joshua Ainslie, Nicholas FitzGerald, Sumit Sanghai, Fei Sha, William Cohen

    Abstract: Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model that sets the state-of-the-art on many knowledge-intensive NLP tasks. However, the architecture used for FiD was chosen by making minimal modifications to a standard T5 model, which our analysis shows to be highly suboptimal for a retrieval-augmented model. In particular, FiD allocates the bulk of FLOPs to the encoder, while…

    Submitted 2 June, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: ACL Findings 2023

  36. arXiv:2209.14899  [pdf, other]

    cs.CL

    Generate-and-Retrieve: use your predictions to improve retrieval for semantic parsing

    Authors: Yury Zemlyanskiy, Michiel de Jong, Joshua Ainslie, Panupong Pasupat, Peter Shaw, Linlu Qiu, Sumit Sanghai, Fei Sha

    Abstract: A common recent approach to semantic parsing augments sequence-to-sequence models by retrieving and appending a set of training samples, called exemplars. The effectiveness of this recipe is limited by the ability to retrieve informative exemplars that help produce the correct parse, which is especially challenging in low-resource settings. Existing retrieval is commonly based on similarity of que…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: To appear in the proceedings of COLING 2022

  37. arXiv:2208.00623  [pdf, other]

    cs.CV cs.MM eess.IV

    Quality Evaluation of Arbitrary Style Transfer: Subjective Study and Objective Metric

    Authors: Hangwei Chen, Feng Shao, Xiongli Chai, Yuese Gu, Qiuping Jiang, Xiangchao Meng, Yo-Sung Ho

    Abstract: Arbitrary neural style transfer is a vital topic with great research value and wide industrial application, which strives to render the structure of one image using the style of another. Recent researches have devoted great efforts on the task of arbitrary style transfer (AST) for improving the stylization quality. However, there are very few explorations about the quality evaluation of AST images…

    Submitted 29 January, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

    Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology 2022, Code and Dataset: https://github.com/Hangwei-Chen/AST-IQAD-SRQE

  38. arXiv:2205.14205  [pdf, other]

    cs.LG

    ALMA: Hierarchical Learning for Composite Multi-Agent Tasks

    Authors: Shariq Iqbal, Robby Costales, Fei Sha

    Abstract: Despite significant progress on multi-agent reinforcement learning (MARL) in recent years, coordination in complex domains remains a challenge. Work in MARL often focuses on solving tasks where agents interact with all other agents and entities in the environment; however, we observe that real-world tasks are often composed of several isolated instances of local agent interactions (subtasks), and…

    Submitted 25 September, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022 Camera Ready

  39. arXiv:2205.12253  [pdf, other]

    cs.CL

    Evaluating the Impact of Model Scale for Compositional Generalization in Semantic Parsing

    Authors: Linlu Qiu, Peter Shaw, Panupong Pasupat, Tianze Shi, Jonathan Herzig, Emily Pitler, Fei Sha, Kristina Toutanova

    Abstract: Despite their strong performance on many tasks, pre-trained language models have been shown to struggle on out-of-distribution compositional generalization. Meanwhile, recent work has shown considerable improvements on many NLP tasks from model scaling. Can scaling up model size also improve compositional generalization in semantic parsing? We evaluate encoder-decoder models up to 11B parameters a…

    Submitted 24 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022

  40. arXiv:2203.12686  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Possibility Before Utility: Learning And Using Hierarchical Affordances

    Authors: Robby Costales, Shariq Iqbal, Fei Sha

    Abstract: Reinforcement learning algorithms struggle on tasks with complex hierarchical dependency structures. Humans and other intelligent agents do not waste time assessing the utility of every high-level action in existence, but instead only consider ones they deem possible in the first place. By focusing only on what is feasible, or "afforded", at the present moment, an agent can spend more time both ev…

    Submitted 23 March, 2022; originally announced March 2022.

    Comments: ICLR 2022 camera-ready

  41. arXiv:2202.12588  [pdf, other]

    cs.CV

    Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning

    Authors: Feifei Shao, Yawei Luo, Ping Liu, Jie Chen, Yi Yang, Yulei Lu, Jun Xiao

    Abstract: The expensive annotation cost is notoriously known as the main constraint for the development of the point cloud semantic segmentation technique. Active learning methods endeavor to reduce such cost by selecting and labeling only a subset of the point clouds, yet previous attempts ignore the spatial-structural diversity of the selected samples, inducing the model to select clustered candidates wit…

    Submitted 18 April, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

    Comments: 9 pages, 6 figures, 2 tables

  42. arXiv:2202.07808  [pdf, other]

    cs.LG

    Policy Learning and Evaluation with Randomized Quasi-Monte Carlo

    Authors: Sebastien M. R. Arnold, Pierre L'Ecuyer, Liyu Chen, Yi-fan Chen, Fei Sha

    Abstract: Reinforcement learning constantly deals with hard integrals, for example when computing expectations in policy evaluation and policy iteration. These integrals are rarely analytically solvable and typically estimated with the Monte Carlo method, which induces high variance in policy values and gradients. In this work, we propose to replace Monte Carlo samples with low-discrepancy point sets. We co…

    Submitted 21 February, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

    Comments: AISTATS 2022 camera ready; more info at: http://seba1511.net/projects/qrl/

  43. arXiv:2112.07610  [pdf, other]

    cs.CL

    Improving Compositional Generalization with Latent Structure and Data Augmentation

    Authors: Linlu Qiu, Peter Shaw, Panupong Pasupat, Paweł Krzysztof Nowak, Tal Linzen, Fei Sha, Kristina Toutanova

    Abstract: Generic unstructured neural networks have been shown to struggle on out-of-distribution compositional generalization. Compositional data augmentation via example recombination has transferred some prior knowledge about compositionality to such black-box neural models for several semantic parsing tasks, but this often required task-specific engineering or provided limited gains. We present a more…

    Submitted 4 May, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: NAACL 2022

  44. arXiv:2112.07175  [pdf, other]

    cs.CV

    Co-training Transformer with Videos and Images Improves Action Recognition

    Authors: Bowen Zhang, Jiahui Yu, Christopher Fifty, Wei Han, Andrew M. Dai, Ruoming Pang, Fei Sha

    Abstract: In learning action recognition, models are typically pre-trained on object recognition with images, such as ImageNet, and later fine-tuned on target action recognition with videos. This approach has achieved good empirical performance especially with recent transformer-based video architectures. While recently many works aim to design more advanced transformer architectures for action recognition,…

    Submitted 14 December, 2021; originally announced December 2021.

  45. arXiv:2111.05013  [pdf, other]

    cs.CL cs.LG

    Learning to Generalize Compositionally by Transferring Across Semantic Parsing Tasks

    Authors: Wang Zhu, Peter Shaw, Tal Linzen, Fei Sha

    Abstract: Neural network models often generalize poorly to mismatched domains or distributions. In NLP, this issue arises in particular when models are expected to generalize compositionally, that is, to novel combinations of familiar words and constructions. We investigate learning representations that facilitate transfer learning from one compositional task to another: the representation and the task-spec…

    Submitted 9 November, 2021; originally announced November 2021.

  46. arXiv:2111.01008  [pdf, other]

    cs.LG physics.comp-ph

    HyperPINN: Learning parameterized differential equations with physics-informed hypernetworks

    Authors: Filipe de Avila Belbute-Peres, Yi-fan Chen, Fei Sha

    Abstract: Many types of physics-informed neural network models have been proposed in recent years as approaches for learning solutions to differential equations. When a particular task requires solving a differential equation at multiple parameterizations, this requires either re-training the model, or expanding its representation capacity to include the parameterization -- both solutions that increase its c…

    Submitted 28 October, 2021; originally announced November 2021.

  47. arXiv:2110.06176  [pdf, other]

    cs.CL cs.AI cs.LG

    Mention Memory: incorporating textual knowledge into Transformers through entity mention attention

    Authors: Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Fei Sha, William Cohen

    Abstract: Natural language understanding tasks such as open-domain question answering often require retrieving and assimilating factual information from multiple sources. We propose to address this problem by integrating a semi-parametric representation of a large text corpus into a Transformer model as a source of factual knowledge. Specifically, our method represents knowledge with 'mention memory', a tab…

    Submitted 19 April, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

  48. arXiv:2109.14115  [pdf, other]

    cs.CV cs.AI

    Visually Grounded Concept Composition

    Authors: Bowen Zhang, Hexiang Hu, Linlu Qiu, Peter Shaw, Fei Sha

    Abstract: We investigate ways to compose complex concepts in texts from primitive ones while grounding them in images. We propose Concept and Relation Graph (CRG), which builds on top of constituency analysis and consists of recursively combined concepts with predicate functions. Meanwhile, we propose a concept composition neural network called Composer to leverage the CRG for visually grounded concept lear…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: Findings of EMNLP 2021

  49. arXiv:2109.12243  [pdf, other]

    cs.CL

    Systematic Generalization on gSCAN: What is Nearly Solved and What is Next?

    Authors: Linlu Qiu, Hexiang Hu, Bowen Zhang, Peter Shaw, Fei Sha

    Abstract: We analyze the grounded SCAN (gSCAN) benchmark, which was recently proposed to study systematic generalization for grounded language understanding. First, we study which aspects of the original benchmark can be solved by commonly used methods in multi-modal research. We find that a general-purpose Transformer-based model with cross-modal attention achieves strong performance on a majority of the g…

    Submitted 24 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  50. Towards Top-Down Just Noticeable Difference Estimation of Natural Images

    Authors: Qiuping Jiang, Zhentao Liu, Shiqi Wang, Feng Shao, Weisi Lin

    Abstract: Just noticeable difference (JND) of natural images refers to the maximum pixel intensity change magnitude that the typical human visual system (HVS) cannot perceive. Existing efforts on JND estimation are mainly dedicated to modeling the diverse masking effects in either/both spatial or/and frequency domains, and then fusing them into an overall JND estimate. In this work, we turn to a dramatically differe…

    Submitted 24 May, 2022; v1 submitted 11 August, 2021; originally announced August 2021.

    Comments: 16 pages, 16 figures

    Journal ref: IEEE Transactions on Image Processing, 2022