
Showing 1–50 of 129 results for author: Liao, R

Searching in archive cs.
  1. arXiv:2506.21028  [pdf, ps, other]

    cs.LG

    TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence

    Authors: Feng Jiang, Mangal Prakash, Hehuan Ma, Jianyuan Deng, Yuzhi Guo, Amina Mollaysa, Tommaso Mansi, Rui Liao, Junzhou Huang

    Abstract: Molecular property prediction aims to learn representations that map chemical structures to functional properties. While multimodal learning has emerged as a powerful paradigm to learn molecular representations, prior works have largely overlooked textual and taxonomic information of molecules for representation learning. We introduce TRIDENT, a novel framework that integrates molecular SMILES, te…

    Submitted 26 June, 2025; originally announced June 2025.

  2. arXiv:2506.15864  [pdf, ps, other]

    cs.LG

    Improving Rectified Flow with Boundary Conditions

    Authors: Xixi Hu, Runlong Liao, Keyang Xu, Bo Liu, Yeqing Li, Eugene Ie, Hongliang Fei, Qiang Liu

    Abstract: Rectified Flow offers a simple and effective approach to high-quality generative modeling by learning a velocity field. However, we identify a limitation in directly modeling the velocity with an unconstrained neural network: the learned velocity often fails to satisfy certain boundary conditions, leading to inaccurate velocity field estimations that deviate from the desired ODE. This issue is par…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 14 pages

  3. arXiv:2506.11182  [pdf, ps, other]

    q-bio.GN cs.AI

    Multimodal Modeling of CRISPR-Cas12 Activity Using Foundation Models and Chromatin Accessibility Data

    Authors: Azim Dehghani Amirabad, Yanfei Zhang, Artem Moskalev, Sowmya Rajesh, Tommaso Mansi, Shuwei Li, Mangal Prakash, Rui Liao

    Abstract: Predicting guide RNA (gRNA) activity is critical for effective CRISPR-Cas12 genome editing but remains challenging due to limited data, variation across protospacer adjacent motifs (PAMs, short sequence requirements for Cas binding), and reliance on large-scale training. We investigate whether a pre-trained biological foundation model originally trained on transcriptomic data can improve gRNA activit…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: This manuscript has been accepted at an ICML 2025 workshop

  4. arXiv:2506.10453  [pdf, ps, other]

    cs.CV eess.IV

    Rethinking Generative Human Video Coding with Implicit Motion Transformation

    Authors: Bolin Chen, Ru-Ling Liao, Jie Chen, Yan Ye

    Abstract: Beyond traditional hybrid-based video codecs, generative video codecs could achieve promising compression performance by evolving high-dimensional signals into compact feature representations for bitstream compactness at the encoder side and developing explicit motion fields as intermediate supervision for high-quality reconstruction at the decoder side. This paradigm has achieved significant succes…

    Submitted 12 June, 2025; originally announced June 2025.

  5. arXiv:2506.08936  [pdf, ps, other]

    cs.LG

    BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models

    Authors: Amina Mollaysa, Artem Moskalev, Pushpak Pati, Tommaso Mansi, Mangal Prakash, Rui Liao

    Abstract: We present BioLangFusion, a simple approach for integrating pre-trained DNA, mRNA, and protein language models into unified molecular representations. Motivated by the central dogma of molecular biology (information flow from gene to transcript to protein), we align per-modality embeddings at the biologically meaningful codon level (three nucleotides encoding one amino acid) to ensure direct cross…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: Proceedings of ICML 2025 Workshop on Multi-modal Foundation Models and Large Language Models for Life Sciences

  6. arXiv:2506.08862  [pdf, ps, other]

    cs.CV cs.LG

    StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams

    Authors: Zike Wu, Qi Yan, Xuanyu Yi, Lele Wang, Renjie Liao

    Abstract: Real-time reconstruction of dynamic 3D scenes from uncalibrated video streams is crucial for numerous real-world applications. However, existing methods struggle to jointly address three key challenges: 1) processing uncalibrated inputs in real time, 2) accurately modeling dynamic scene evolution, and 3) maintaining long-term stability and computational efficiency. To this end, we introduce Stream…

    Submitted 10 June, 2025; originally announced June 2025.

  7. arXiv:2506.08541  [pdf, ps, other]

    cs.CV cs.AI

    TrajFlow: Multi-modal Motion Prediction via Flow Matching

    Authors: Qi Yan, Brian Zhang, Yutong Zhang, Daniel Yang, Joshua White, Di Chen, Jiachao Liu, Langechuan Liu, Binnan Zhuang, Shaoshuai Shi, Renjie Liao

    Abstract: Efficient and accurate motion prediction is crucial for ensuring safety and informed decision-making in autonomous driving, particularly under dynamic real-world conditions that necessitate multi-modal forecasts. We introduce TrajFlow, a novel flow matching-based motion prediction framework that addresses the scalability and efficiency challenges of existing generative trajectory prediction method…

    Submitted 5 July, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: IROS 2025

  8. arXiv:2506.04542  [pdf, other]

    cs.LG

    Neural MJD: Neural Non-Stationary Merton Jump Diffusion for Time Series Prediction

    Authors: Yuanpei Gao, Qi Yan, Yan Leng, Renjie Liao

    Abstract: While deep learning methods have achieved strong performance in time series prediction, their black-box nature and inability to explicitly model underlying stochastic processes often limit their generalization to non-stationary data, especially in the presence of abrupt changes. In this work, we introduce Neural MJD, a neural-network-based non-stationary Merton jump diffusion (MJD) model. Our mode…

    Submitted 4 June, 2025; originally announced June 2025.

  9. arXiv:2506.04439  [pdf, ps, other]

    cs.LG

    RETRO SYNFLOW: Discrete Flow Matching for Accurate and Diverse Single-Step Retrosynthesis

    Authors: Robin Yadav, Qi Yan, Guy Wolf, Avishek Joey Bose, Renjie Liao

    Abstract: A fundamental problem in organic chemistry is identifying and predicting the series of reactions that synthesize a desired target product molecule. Due to the combinatorial nature of the chemical search space, single-step reactant prediction (i.e., single-step retrosynthesis) remains challenging even for existing state-of-the-art template-free generative approaches to produce an accurate yet di…

    Submitted 4 June, 2025; originally announced June 2025.

  10. arXiv:2506.00327  [pdf, ps, other]

    cs.CV cs.AI

    Latent Guidance in Diffusion Models for Perceptual Evaluations

    Authors: Shreshth Saini, Ru-Ling Liao, Yan Ye, Alan C. Bovik

    Abstract: Despite recent advancements in latent diffusion models that generate high-dimensional image data and perform various downstream tasks, there has been little exploration into perceptual consistency within these models on the task of No-Reference Image Quality Assessment (NR-IQA). In this paper, we hypothesize that latent diffusion models implicitly exhibit perceptually consistent local regions with…

    Submitted 30 May, 2025; originally announced June 2025.

    Comments: 24 pages, 7 figures, 10 tables

  11. arXiv:2505.22560  [pdf, ps, other]

    cs.LG

    Geometric Hyena Networks for Large-scale Equivariant Learning

    Authors: Artem Moskalev, Mangal Prakash, Junjie Xu, Tianyu Cui, Rui Liao, Tommaso Mansi

    Abstract: Processing global geometric context while preserving equivariance is crucial when modeling biological, chemical, and physical systems. Yet, this is challenging due to the computational demands of equivariance and global context at scale. Standard methods such as equivariant self-attention suffer from quadratic complexity, while local methods such as distance-based message passing sacrifice global…

    Submitted 28 May, 2025; originally announced May 2025.

  12. arXiv:2505.16152  [pdf, other]

    eess.IV cs.CV

    Compressing Human Body Video with Interactive Semantics: A Generative Approach

    Authors: Bolin Chen, Shanzhi Yin, Hanwei Zhu, Lingyu Zhu, Zihan Zhang, Jie Chen, Ru-Ling Liao, Shiqi Wang, Yan Ye

    Abstract: In this paper, we propose to compress human body video with interactive semantics, which can facilitate video coding to be interactive and controllable by manipulating semantic-level representations embedded in the coded bitstream. In particular, the proposed encoder employs a 3D human model to disentangle nonlinear dynamics and complex motion of human body signal into a series of configurable emb…

    Submitted 21 May, 2025; originally announced May 2025.

  13. arXiv:2504.13109  [pdf, other]

    cs.CV

    UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models

    Authors: Guanlong Jiao, Biqing Huang, Kuan-Chieh Wang, Renjie Liao

    Abstract: Flow matching models have emerged as a strong alternative to diffusion models, but existing inversion and editing methods designed for diffusion are often ineffective or inapplicable to them. The straight-line, non-crossing trajectories of flow models pose challenges for diffusion-based approaches but also open avenues for novel solutions. In this paper, we introduce a predictor-corrector-based fr…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Project page: https://uniedit-flow.github.io/

  14. arXiv:2503.12015  [pdf, other]

    cs.CV

    QDM: Quadtree-Based Region-Adaptive Sparse Diffusion Models for Efficient Image Super-Resolution

    Authors: Donglin Yang, Paul Vicol, Xiaojuan Qi, Renjie Liao, Xiaofan Zhang

    Abstract: Deep learning-based super-resolution (SR) methods often perform pixel-wise computations uniformly across entire images, even in homogeneous regions where high-resolution refinement is redundant. We propose the Quadtree Diffusion Model (QDM), a region-adaptive diffusion framework that leverages a quadtree structure to selectively enhance detail-rich regions while reducing computations in homogeneou…

    Submitted 15 March, 2025; originally announced March 2025.

  15. arXiv:2503.09950  [pdf, other]

    cs.CV cs.AI cs.LG

    MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation

    Authors: Yuxiang Fu, Qi Yan, Lele Wang, Ke Li, Renjie Liao

    Abstract: In this paper, we address the problem of human trajectory forecasting, which aims to predict the inherently multi-modal future movements of humans based on their past trajectories and other contextual cues. We propose a novel motion prediction conditional flow matching model, termed MoFlow, to predict K-shot future trajectories for all agents in a given scene. We design a novel flow matching loss…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025

  16. arXiv:2503.09013  [pdf, other]

    cs.CV

    Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal

    Authors: Rongxin Liao, Feng Li, Yanyan Wei, Zenglin Shi, Le Zhang, Huihui Bai, Meng Wang

    Abstract: Universal adverse weather removal (UAWR) seeks to address various weather degradations within a unified framework. Recent methods are inspired by prompt learning using pre-trained vision-language models (e.g., CLIP), leveraging degradation-aware prompts to facilitate weather-free image restoration, yielding significant improvements. In this work, we propose CyclicPrompt, an innovative cyclic promp…

    Submitted 11 March, 2025; originally announced March 2025.

  17. arXiv:2503.04483  [pdf, ps, other]

    stat.ML cs.LG q-bio.QM

    InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference

    Authors: Tianyu Cui, Song-Jun Xu, Artem Moskalev, Shuwei Li, Tommaso Mansi, Mangal Prakash, Rui Liao

    Abstract: Inferring Gene Regulatory Networks (GRNs) from gene expression data is crucial for understanding biological processes. While supervised models are reported to achieve high performance for this task, they rely on costly ground truth (GT) labels and risk learning gene-specific biases, such as class imbalances of GT interactions, rather than true regulatory mechanisms. To address these issues, we int…

    Submitted 8 June, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: ICML 2025

  18. arXiv:2502.17085  [pdf, other]

    cs.CV eess.IV

    Pleno-Generation: A Scalable Generative Face Video Compression Framework with Bandwidth Intelligence

    Authors: Bolin Chen, Hanwei Zhu, Shanzhi Yin, Lingyu Zhu, Jie Chen, Ru-Ling Liao, Shiqi Wang, Yan Ye

    Abstract: Generative-model-based compact video compression typically operates within a relatively narrow range of bitrates, often with an emphasis on ultra-low-rate applications. There has been an increasing consensus in the video communication industry that full bitrate coverage should be enabled by generative coding. However, this is an extremely difficult task, largely because generation and compres…

    Submitted 24 February, 2025; originally announced February 2025.

  19. arXiv:2501.14275  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation

    Authors: Sadegh Mahdavi, Muchen Li, Kaiwen Liu, Christos Thrampoulidis, Leonid Sigal, Renjie Liao

    Abstract: Advances in Large Language Models (LLMs) have sparked interest in their ability to solve Olympiad-level math problems. However, the training and evaluation of these models are constrained by the limited size and quality of available datasets, as creating large-scale data for such advanced problems requires extensive effort from human experts. In addition, current benchmarks are prone to contaminat…

    Submitted 26 June, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: ICML 2025 Camera Ready

  20. arXiv:2501.01930  [pdf, other]

    cs.LG

    GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction

    Authors: Yuwei Miao, Yuzhi Guo, Hehuan Ma, Jingquan Yan, Feng Jiang, Rui Liao, Junzhou Huang

    Abstract: Exploring the functions of genes and gene products is crucial to a wide range of fields, including medical research, evolutionary biology, and environmental science. However, discovering new functions largely relies on expensive and exhaustive wet lab experiments. Existing methods of automatic function annotation or prediction mainly focus on protein function prediction with sequence, 3D-structure…

    Submitted 3 January, 2025; originally announced January 2025.

    Comments: Accepted by AAAI-25

  21. arXiv:2412.19413  [pdf, other]

    cs.CV

    Multi-scale Latent Point Consistency Models for 3D Shape Generation

    Authors: Bi'an Du, Wei Hu, Renjie Liao

    Abstract: Consistency Models (CMs) have significantly accelerated the sampling process in diffusion models, yielding impressive results in synthesizing high-resolution images. To explore and extend these advancements to point-cloud-based 3D shape generation, we propose a novel Multi-scale Latent Point Consistency Model (MLPCM). Our MLPCM follows a latent diffusion framework and introduces hierarchical level…

    Submitted 26 December, 2024; originally announced December 2024.

  22. arXiv:2411.17236  [pdf, other]

    cs.LG cs.AI

    From Graph Diffusion to Graph Classification

    Authors: Jia Jun Cheng Xian, Sadegh Mahdavi, Renjie Liao, Oliver Schulte

    Abstract: Generative models such as diffusion models have achieved remarkable success in state-of-the-art image and text tasks. Recently, score-based diffusion models have extended their success beyond image generation, showing competitive performance with discriminative methods in image classification tasks [zimmermann2021score]. However, their application to classification in the graph do…

    Submitted 26 November, 2024; originally announced November 2024.

  23. arXiv:2410.23628  [pdf]

    eess.IV cs.CV physics.med-ph

    Cycle-Constrained Adversarial Denoising Convolutional Network for PET Image Denoising: Multi-Dimensional Validation on Large Datasets with Reader Study and Real Low-Dose Data

    Authors: Yucun Hou, Fenglin Zhan, Xin Cheng, Chenxi Li, Ziquan Yuan, Runze Liao, Haihao Wang, Jianlang Hua, Jing Wu, Jianyong Jiang

    Abstract: Positron emission tomography (PET) is a critical tool for diagnosing tumors and neurological disorders but poses radiation risks to patients, particularly to sensitive populations. While reducing injected radiation dose mitigates this risk, it often compromises image quality. To reconstruct full-dose-quality images from low-dose scans, we propose a Cycle-constrained Adversarial Denoising Convoluti…

    Submitted 31 October, 2024; originally announced October 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  24. arXiv:2410.16613  [pdf, other]

    eess.SP cs.AI cs.LG cs.NE q-bio.NC

    Real-time Sub-milliwatt Epilepsy Detection Implemented on a Spiking Neural Network Edge Inference Processor

    Authors: Ruixin Li, Guoxu Zhao, Dylan Richard Muir, Yuya Ling, Karla Burelo, Mina Khoei, Dong Wang, Yannan Xing, Ning Qiao

    Abstract: Analyzing electroencephalogram (EEG) signals to detect the epileptic seizure status of a subject presents a challenge to existing technologies aimed at providing timely and efficient diagnosis. In this study, we aimed to detect interictal and ictal periods of epileptic seizures using a spiking neural network (SNN). Our proposed approach provides an online and real-time preliminary diagnosis of epi…

    Submitted 21 October, 2024; originally announced October 2024.

    Journal ref: Computers in Biology and Medicine (2024), 183, 109225

  25. arXiv:2410.15105  [pdf, other]

    cs.CV

    Standardizing Generative Face Video Compression using Supplemental Enhancement Information

    Authors: Bolin Chen, Yan Ye, Jie Chen, Ru-Ling Liao, Shanzhi Yin, Shiqi Wang, Kaifa Yang, Yue Li, Yiling Xu, Ye-Kui Wang, Shiv Gehlot, Guan-Ming Su, Peng Yin, Sean McCarthy, Gary J. Sullivan

    Abstract: This paper proposes a Generative Face Video Compression (GFVC) approach using Supplemental Enhancement Information (SEI), where a series of compact spatial and temporal representations of a face video signal (i.e., 2D/3D key-points, facial semantics and compact features) can be coded using SEI message and inserted into the coded video bitstream. At the time of writing, the proposed GFVC approach u…

    Submitted 18 December, 2024; v1 submitted 19 October, 2024; originally announced October 2024.

  26. arXiv:2410.12459  [pdf, other]

    cs.LG cs.CE

    HELM: Hierarchical Encoding for mRNA Language Modeling

    Authors: Mehdi Yazdani-Jahromi, Mangal Prakash, Tommaso Mansi, Artem Moskalev, Rui Liao

    Abstract: Messenger RNA (mRNA) plays a crucial role in protein synthesis, with its codon structure directly impacting biological properties. While Language Models (LMs) have shown promise in analyzing biological sequences, existing approaches fail to account for the hierarchical nature of mRNA's codon structure. We introduce Hierarchical Encoding for mRNA Language Modeling (HELM), a novel pre-training strat…

    Submitted 12 March, 2025; v1 submitted 16 October, 2024; originally announced October 2024.

  27. arXiv:2410.11933  [pdf, other]

    q-bio.QM cs.AI cs.LG q-bio.BM

    Beyond Sequence: Impact of Geometric Context for RNA Property Prediction

    Authors: Junjie Xu, Artem Moskalev, Tommaso Mansi, Mangal Prakash, Rui Liao

    Abstract: Accurate prediction of RNA properties, such as stability and interactions, is crucial for advancing our understanding of biological processes and developing RNA-based therapeutics. RNA structures can be represented as 1D sequences, 2D topological graphs, or 3D all-atom models, each offering different insights into its function. Existing works predominantly focus on 1D sequence-based models, which…

    Submitted 20 April, 2025; v1 submitted 15 October, 2024; originally announced October 2024.

  28. arXiv:2410.08485  [pdf, other]

    eess.IV cs.CV

    Beyond GFVC: A Progressive Face Video Compression Framework with Adaptive Visual Tokens

    Authors: Bolin Chen, Shanzhi Yin, Zihan Zhang, Jie Chen, Ru-Ling Liao, Lingyu Zhu, Shiqi Wang, Yan Ye

    Abstract: Recently, deep generative models have greatly advanced the progress of face video coding towards promising rate-distortion performance and diverse application functionalities. Beyond traditional hybrid video coding paradigms, Generative Face Video Compression (GFVC) relying on the strong capabilities of deep generative models and the philosophy of early Model-Based Coding (MBC) can facilitate the…

    Submitted 10 October, 2024; originally announced October 2024.

  29. arXiv:2410.02942  [pdf, other]

    cs.LG cs.AI cs.CV

    SymmetricDiffusers: Learning Discrete Diffusion on Finite Symmetric Groups

    Authors: Yongxing Zhang, Donglin Yang, Renjie Liao

    Abstract: Finite symmetric groups $S_n$ are essential in fields such as combinatorics, physics, and chemistry. However, learning a probability distribution over $S_n$ poses significant challenges due to its intractable size and discrete nature. In this paper, we introduce SymmetricDiffusers, a novel discrete diffusion model that simplifies the task of learning a complicated distribution over $S_n$ by decomp…

    Submitted 5 March, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: ICLR 2025 Oral

  30. arXiv:2409.20365  [pdf, other]

    cs.CV

    VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs

    Authors: Ruotong Liao, Max Erler, Huiyu Wang, Guangyao Zhai, Gengyuan Zhang, Yunpu Ma, Volker Tresp

    Abstract: In the video-language domain, recent works in leveraging zero-shot Large Language Model-based reasoning for video understanding have become competitive challengers to previous end-to-end models. However, long video understanding presents unique challenges due to the complexity of reasoning over extended timespans, even for zero-shot LLM-based approaches. The challenge of information redundancy in…

    Submitted 4 October, 2024; v1 submitted 30 September, 2024; originally announced September 2024.

    Comments: EMNLP 2024 Findings; 22 pages; Code: https://github.com/mayhugotong/VideoINSTA

  31. arXiv:2408.14475  [pdf, other]

    cs.OH cs.RO

    Crowdsense Roadside Parking Spaces with Dynamic Gap Reduction Algorithm

    Authors: Wenjun Zheng, Zhan Shi, Qianyu Ou, Ruizhi Liao

    Abstract: In the context of smart city development, mobile sensing emerges as a cost-effective alternative to fixed sensing for on-street parking detection. However, its practicality is often challenged by the inherent accuracy limitations arising from detection intervals. This paper introduces a novel Dynamic Gap Reduction Algorithm (DGRA), which is a crowdsensing-based approach aimed at addressing this qu…

    Submitted 10 August, 2024; originally announced August 2024.

  32. arXiv:2407.16124  [pdf, other]

    cs.CV

    Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos

    Authors: Jiahe Liu, Youran Qu, Qi Yan, Xiaohui Zeng, Lele Wang, Renjie Liao

    Abstract: Significant advancements have been made in video generative models recently. Unlike image generation, video generation presents greater challenges, requiring not only generating high-quality frames but also ensuring temporal consistency across these frames. Despite the impressive progress, research on metrics for evaluating the quality of generated videos, especially concerning temporal and motion…

    Submitted 22 July, 2024; originally announced July 2024.

  33. arXiv:2407.02052  [pdf, other]

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for The ICMC-ASR Challenge

    Authors: Minghui Wu, Luzhen Xu, Jie Zhang, Haitao Tang, Yanyan Yue, Ruizhi Liao, Jintao Zhao, Zhengzhe Zhang, Yichi Wang, Haoyin Yan, Hongliang Yu, Tongle Ma, Jiachen Liu, Chongliang Wu, Yongchao Li, Yanyong Zhang, Xin Fang, Yue Zhang

    Abstract: This report describes the submitted system to the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) challenge, which considers the ASR task with multi-speaker overlapping and Mandarin accent dynamics in the ICMC case. We implement the front-end speaker diarization using the self-supervised learning representation based multi-speaker embedding and beamforming using the speaker position,…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted at ICASSP 2024

  34. arXiv:2407.01049  [pdf, other]

    cs.LG

    SE(3)-Hyena Operator for Scalable Equivariant Learning

    Authors: Artem Moskalev, Mangal Prakash, Rui Liao, Tommaso Mansi

    Abstract: Modeling global geometric context while maintaining equivariance is crucial for accurate predictions in many fields such as biology, chemistry, or vision. Yet, this is challenging due to the computational demands of processing high-dimensional data at scale. Existing approaches such as equivariant self-attention or distance-based message passing suffer from quadratic complexity with respect to se…

    Submitted 13 August, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  35. arXiv:2406.08959  [pdf, other]

    cs.HC cs.AI

    Beyond Recommendations: From Backward to Forward AI Support of Pilots' Decision-Making Process

    Authors: Zelun Tony Zhang, Sebastian S. Feger, Lucas Dullenkopf, Rulu Liao, Lukas Süsslin, Yuanting Liu, Andreas Butz

    Abstract: AI is anticipated to enhance human decision-making in high-stakes domains like aviation, but adoption is often hindered by challenges such as inappropriate reliance and poor alignment with users' decision-making. Recent research suggests that a core underlying issue is the recommendation-centric design of many AI systems, i.e., they give end-to-end recommendations and ignore the rest of the decisi…

    Submitted 20 September, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to CSCW 2024, to be published in PACM HCI Vol. 8, No. CSCW2

  36. arXiv:2405.00915  [pdf, other]

    cs.CV cs.AI cs.LG

    EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion

    Authors: Guangyao Zhai, Evin Pınar Örnek, Dave Zhenyu Chen, Ruotong Liao, Yan Di, Nassir Navab, Federico Tombari, Benjamin Busam

    Abstract: We present EchoScene, an interactive and controllable generative model that generates 3D indoor scenes on scene graphs. EchoScene leverages a dual-branch diffusion model that dynamically adapts to scene graphs. Existing methods struggle to handle scene graphs due to varying numbers of nodes, multiple edge combinations, and manipulator-induced node-edge operations. EchoScene overcomes this by assoc…

    Submitted 27 February, 2025; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: Nectar Track at 3DV 2025

  37. arXiv:2404.17170  [pdf, other]

    cs.CV eess.IV

    Image Quality Assessment With Compressed Sampling

    Authors: Ronghua Liao, Chen Hui, Lang Yuan, Haiqi Zhu, Feng Jiang

    Abstract: No-Reference Image Quality Assessment (NR-IQA) aims at estimating image quality in accordance with subjective human perception. However, most methods focus on exploring increasingly complex networks to improve the final performance, accompanied by limitations on input images. Especially when applied to high-resolution (HR) images, these methods often have to adjust the size of original image to mee…

    Submitted 11 September, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

  38. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge addresses a major problem in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  39. arXiv:2404.08567  [pdf, other]

    cs.CL cs.AI

    CATP: Cross-Attention Token Pruning for Accuracy Preserved Multimodal Model Inference

    Authors: Ruqi Liao, Chuqing Zhao, Jin Li, Weiqi Feng

    Abstract: In response to the rising interest in large multimodal models, we introduce Cross-Attention Token Pruning (CATP), a precision-focused token pruning method. Our approach leverages cross-attention layers in multimodal models, exemplified by BLIP-2, to extract valuable information for token importance determination. CATP employs a refined voting strategy across model heads and layers. In evaluations,…

    Submitted 2 April, 2024; originally announced April 2024.

  40. arXiv:2403.19895  [pdf, ps, other]

    cs.IT cs.LG

    An Information-Theoretic Framework for Out-of-Distribution Generalization with Applications to Stochastic Gradient Langevin Dynamics

    Authors: Wenliang Liu, Guanding Yu, Lele Wang, Renjie Liao

    Abstract: We study the Out-of-Distribution (OOD) generalization in machine learning and propose a general framework that establishes information-theoretic generalization bounds. Our framework interpolates freely between Integral Probability Metric (IPM) and $f$-divergence, which naturally recovers some known results (including Wasserstein- and KL-bounds) and yields new generalization bounds. Additio…

    Submitted 13 December, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: This work was accepted in part at the 2024 IEEE International Symposium on Information Theory and the 2024 Canadian Workshop on Information Theory. This work was submitted to IEEE Transactions on Information Theory

  41. arXiv:2403.13660  [pdf]

    cs.CV

    ProMamba: Prompt-Mamba for polyp segmentation

    Authors: Jianhao Xie, Ruofan Liao, Ziang Zhang, Sida Yi, Yuesheng Zhu, Guibo Luo

    Abstract: Detecting polyps through colonoscopy is an important task in medical image segmentation, which provides significant assistance and reference value for clinical surgery. However, accurate segmentation of polyps is a challenging task due to two main reasons. Firstly, polyps exhibit various shapes and colors. Secondly, the boundaries between polyps and their normal surroundings are often unclear. Add…

    Submitted 26 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: 10 pages, 2 figures, 3 tables

  42. arXiv:2402.17464

    cs.CV

    Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing

    Authors: Bi'an Du, Xiang Gao, Wei Hu, Renjie Liao

    Abstract: Generative 3D part assembly involves understanding part relationships and predicting their 6-DoF poses for assembling a realistic 3D shape. Prior work often focuses on the geometry of individual parts, neglecting part-whole hierarchies of objects. Leveraging two key observations: 1) super-part poses provide strong hints about part poses, and 2) predicting super-part poses is easier due to fewer supe…

    Submitted 26 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  43. arXiv:2402.06463

    eess.IV cs.CV cs.LG

    Cardiac ultrasound simulation for autonomous ultrasound navigation

    Authors: Abdoul Aziz Amadou, Laura Peralta, Paul Dryburgh, Paul Klein, Kaloian Petkov, Richard James Housden, Vivek Singh, Rui Liao, Young-Ho Kim, Florin Christian Ghesu, Tommaso Mansi, Ronak Rajani, Alistair Young, Kawal Rhode

    Abstract: Ultrasound is well-established as an imaging modality for diagnostic and interventional purposes. However, the image quality varies with operator skills as acquiring and interpreting ultrasound images requires extensive training due to the imaging artefacts, the range of acquisition parameters and the variability of patient anatomies. Automating the image acquisition task could improve acquisition…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 24 pages, 10 figures, 5 tables

    ACM Class: I.6.0; I.5.4; J.3

  44. arXiv:2401.01130

    cs.CV

    Joint Generative Modeling of Scene Graphs and Images via Diffusion Models

    Authors: Bicheng Xu, Qi Yan, Renjie Liao, Lele Wang, Leonid Sigal

    Abstract: In this paper, we present a novel generative task: joint scene graph - image generation. While previous works have explored image generation conditioned on scene graphs or layouts, our task is distinctive and important as it involves generating scene graphs themselves unconditionally from noise, enabling efficient and interpretable control for image generation. Our task is challenging, requiring t…

    Submitted 2 January, 2024; originally announced January 2024.

  45. arXiv:2311.10112

    cs.AI cs.CL cs.LG

    zrLLM: Zero-Shot Relational Learning on Temporal Knowledge Graphs with Large Language Models

    Authors: Zifeng Ding, Heling Cai, Jingpei Wu, Yunpu Ma, Ruotong Liao, Bo Xiong, Volker Tresp

    Abstract: Modeling evolving knowledge over temporal knowledge graphs (TKGs) has become a hot topic. Various methods have been proposed to forecast links on TKGs. Most of them are embedding-based, where hidden representations are learned to represent knowledge graph (KG) entities and relations based on the observed graph contexts. Although these methods show strong performance on traditional TKG forecasti…

    Submitted 15 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024 main conference

  46. arXiv:2310.08487

    cs.CL

    GraphextQA: A Benchmark for Evaluating Graph-Enhanced Large Language Models

    Authors: Yuanchun Shen, Ruotong Liao, Zhen Han, Yunpu Ma, Volker Tresp

    Abstract: While multi-modal models have successfully integrated information from image, video, and audio modalities, integrating graph modality into large language models (LLMs) remains unexplored. This discrepancy largely stems from the inherent divergence between structured graph data and unstructured text data. Incorporating graph knowledge provides a reliable source of information, enabling potential so…

    Submitted 12 October, 2023; originally announced October 2023.

  47. arXiv:2310.07793

    cs.CL cs.AI cs.LG

    GenTKG: Generative Forecasting on Temporal Knowledge Graph with Large Language Models

    Authors: Ruotong Liao, Xu Jia, Yangzhe Li, Yunpu Ma, Volker Tresp

    Abstract: The rapid advancements in large language models (LLMs) have ignited interest in the temporal knowledge graph (tKG) domain, where conventional embedding-based and rule-based methods dominate. The question remains open of whether pre-trained LLMs can understand structured temporal relational data and replace them as the foundation model for temporal relational forecasting. Therefore, we bring tempor…

    Submitted 16 April, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 14 pages, Findings of NAACL 2024, Spotlight on TGL@NeurIPS2023

  48. arXiv:2308.13217

    cs.CV cs.LG

    GEMTrans: A General, Echocardiography-based, Multi-Level Transformer Framework for Cardiovascular Diagnosis

    Authors: Masoud Mokhtari, Neda Ahmadi, Teresa S. M. Tsang, Purang Abolmaesumi, Renjie Liao

    Abstract: Echocardiography (echo) is an ultrasound imaging modality that is widely used for various cardiovascular diagnosis tasks. Due to inter-observer variability in echo-based diagnosis, which arises from the variability in echo image acquisition and the interpretation of echo images based on clinical experience, vision-based machine learning (ML) methods have gained popularity to act as secondary layer…

    Submitted 25 August, 2023; originally announced August 2023.

    Comments: To be published in MLMI 2023

  49. arXiv:2307.12980

    cs.CV

    A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models

    Authors: Jindong Gu, Zhen Han, Shuo Chen, Ahmad Beirami, Bailan He, Gengyuan Zhang, Ruotong Liao, Yao Qin, Volker Tresp, Philip Torr

    Abstract: Prompt engineering is a technique that involves augmenting a large pre-trained model with task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be created manually as natural language instructions or generated automatically as either natural language instructions or vector representations. Prompt engineering enables the ability to perform predictions based solely on p…

    Submitted 24 July, 2023; originally announced July 2023.

  50. arXiv:2307.12229

    cs.CV cs.LG

    EchoGLAD: Hierarchical Graph Neural Networks for Left Ventricle Landmark Detection on Echocardiograms

    Authors: Masoud Mokhtari, Mobina Mahdavi, Hooman Vaseli, Christina Luong, Purang Abolmaesumi, Teresa S. M. Tsang, Renjie Liao

    Abstract: The functional assessment of the left ventricle chamber of the heart requires detecting four landmark locations and measuring the internal dimension of the left ventricle and the approximate mass of the surrounding muscle. The key challenge of automating this task with machine learning is the sparsity of clinical labels, i.e., only a few landmark pixels in a high-dimensional image are annotated, l…

    Submitted 23 July, 2023; originally announced July 2023.

    Comments: To be published in MICCAI 2023