-
FLARE: A Dataflow-Aware and Scalable Hardware Architecture for Neural-Hybrid Scientific Lossy Compression
Authors:
Wenqi Jia,
Ying Huang,
Jian Xu,
Zhewen Hu,
Sian Jin,
Jiannan Tian,
Yuede Ji,
Miao Yin
Abstract:
Scientific simulation leveraging high-performance computing (HPC) systems is crucial for modeling complex systems and phenomena in fields such as astrophysics, climate science, and fluid dynamics, generating massive datasets that often reach petabyte to exabyte scales. However, managing these vast data volumes introduces significant I/O and network bottlenecks, limiting practical performance and s…
▽ More
Scientific simulation leveraging high-performance computing (HPC) systems is crucial for modeling complex systems and phenomena in fields such as astrophysics, climate science, and fluid dynamics, generating massive datasets that often reach petabyte to exabyte scales. However, managing these vast data volumes introduces significant I/O and network bottlenecks, limiting practical performance and scalability. While cutting-edge lossy compression frameworks powered by deep neural networks (DNNs) have demonstrated superior compression ratios by capturing complex data correlations, their integration into HPC workflows poses substantial challenges due to the hybrid non-neural and neural computation patterns, causing excessive memory access overhead, large sequential stalls, and limited adaptability to varying data sizes and workloads in existing hardware platforms. To overcome these challenges and push the limit of high-performance scientific computing, we for the first time propose FLARE, a dataflow-aware and scalable hardware architecture for neural-hybrid scientific lossy compression. FLARE minimizes off-chip data access, reduces bubble overhead through efficient dataflow, and adopts a modular design that provides both scalability and flexibility, significantly enhancing throughput and energy efficiency on modern HPC systems. Particularly, the proposed FLARE achieves runtime speedups ranging from $3.50 \times$ to $96.07 \times$, and energy efficiency improvements ranging from $24.51 \times$ to $520.68 \times$, across various datasets and hardware platforms.
△ Less
Submitted 1 July, 2025;
originally announced July 2025.
-
Bayesian Invariance Modeling of Multi-Environment Data
Authors:
Luhuan Wu,
Mingzhang Yin,
Yixin Wang,
John P. Cunningham,
David M. Blei
Abstract:
Invariant prediction [Peters et al., 2016] analyzes feature/outcome data from multiple environments to identify invariant features - those with a stable predictive relationship to the outcome. Such features support generalization to new environments and help reveal causal mechanisms. Previous methods have primarily tackled this problem through hypothesis testing or regularized optimization. Here w…
▽ More
Invariant prediction [Peters et al., 2016] analyzes feature/outcome data from multiple environments to identify invariant features - those with a stable predictive relationship to the outcome. Such features support generalization to new environments and help reveal causal mechanisms. Previous methods have primarily tackled this problem through hypothesis testing or regularized optimization. Here we develop Bayesian Invariant Prediction (BIP), a probabilistic model for invariant prediction. BIP encodes the indices of invariant features as a latent variable and recover them by posterior inference. Under the assumptions of Peters et al. [2016], the BIP posterior targets the true invariant features. We prove that the posterior is consistent and that greater environment heterogeneity leads to faster posterior contraction. To handle many features, we design an efficient variational approximation called VI-BIP. In simulations and real data, we find that BIP and VI-BIP are more accurate and scalable than existing methods for invariant prediction.
△ Less
Submitted 2 July, 2025; v1 submitted 27 June, 2025;
originally announced June 2025.
-
Reinforcement Learning with Physics-Informed Symbolic Program Priors for Zero-Shot Wireless Indoor Navigation
Authors:
Tao Li,
Haozhe Lei,
Mingsheng Yin,
Yaqi Hu
Abstract:
When using reinforcement learning (RL) to tackle physical control tasks, inductive biases that encode physics priors can help improve sample efficiency during training and enhance generalization in testing. However, the current practice of incorporating these helpful physics-informed inductive biases inevitably runs into significant manual labor and domain expertise, making them prohibitive for ge…
▽ More
When using reinforcement learning (RL) to tackle physical control tasks, inductive biases that encode physics priors can help improve sample efficiency during training and enhance generalization in testing. However, the current practice of incorporating these helpful physics-informed inductive biases inevitably runs into significant manual labor and domain expertise, making them prohibitive for general users. This work explores a symbolic approach to distill physics-informed inductive biases into RL agents, where the physics priors are expressed in a domain-specific language (DSL) that is human-readable and naturally explainable. Yet, the DSL priors do not translate directly into an implementable policy due to partial and noisy observations and additional physical constraints in navigation tasks. To address this gap, we develop a physics-informed program-guided RL (PiPRL) framework with applications to indoor navigation. PiPRL adopts a hierarchical and modularized neuro-symbolic integration, where a meta symbolic program receives semantically meaningful features from a neural perception module, which form the bases for symbolic programming that encodes physics priors and guides the RL process of a low-level neural controller. Extensive experiments demonstrate that PiPRL consistently outperforms purely symbolic or neural policies and reduces training time by over 26% with the help of the program-based inductive biases.
△ Less
Submitted 27 June, 2025;
originally announced June 2025.
-
ZeroReg3D: A Zero-shot Registration Pipeline for 3D Consecutive Histopathology Image Reconstruction
Authors:
Juming Xiong,
Ruining Deng,
Jialin Yue,
Siqi Lu,
Junlin Guo,
Marilyn Lionts,
Tianyuan Yao,
Can Cui,
Junchao Zhu,
Chongyu Qu,
Mengmeng Yin,
Haichun Yang,
Yuankai Huo
Abstract:
Histological analysis plays a crucial role in understanding tissue structure and pathology. While recent advancements in registration methods have improved 2D histological analysis, they often struggle to preserve critical 3D spatial relationships, limiting their utility in both clinical and research applications. Specifically, constructing accurate 3D models from 2D slices remains challenging due…
▽ More
Histological analysis plays a crucial role in understanding tissue structure and pathology. While recent advancements in registration methods have improved 2D histological analysis, they often struggle to preserve critical 3D spatial relationships, limiting their utility in both clinical and research applications. Specifically, constructing accurate 3D models from 2D slices remains challenging due to tissue deformation, sectioning artifacts, variability in imaging techniques, and inconsistent illumination. Deep learning-based registration methods have demonstrated improved performance but suffer from limited generalizability and require large-scale training data. In contrast, non-deep-learning approaches offer better generalizability but often compromise on accuracy. In this study, we introduced ZeroReg3D, a novel zero-shot registration pipeline tailored for accurate 3D reconstruction from serial histological sections. By combining zero-shot deep learning-based keypoint matching with optimization-based affine and non-rigid registration techniques, ZeroReg3D effectively addresses critical challenges such as tissue deformation, sectioning artifacts, staining variability, and inconsistent illumination without requiring retraining or fine-tuning. The code has been made publicly available at https://github.com/hrlblab/ZeroReg3D
△ Less
Submitted 27 June, 2025;
originally announced June 2025.
-
Co-persona: Leveraging LLMs and Expert Collaboration to Understand User Personas through Social Media Data Analysis
Authors:
Min Yin,
Haoyu Liu,
Boyi Lian,
Chunlei Chai
Abstract:
This study introduces Co-Persona, a methodological framework bridging large-scale social media analysis with authentic user understanding through systematic integration of Large Language Models and expert validation. Through a case study of B.Co, a Chinese manufacturer, we investigated Co-Persona application in bedside lamp development. Our methodology analyzed over 38 million posts from Xiao Hong…
▽ More
This study introduces Co-Persona, a methodological framework bridging large-scale social media analysis with authentic user understanding through systematic integration of Large Language Models and expert validation. Through a case study of B.Co, a Chinese manufacturer, we investigated Co-Persona application in bedside lamp development. Our methodology analyzed over 38 million posts from Xiao Hongshu, employing multi-stage data processing combining advanced NLP with expert validation. Analysis revealed five user personas derived from bedtime behaviors: Health Aficionados, Night Owls, Interior Decorators, Child-care Workers, and Workaholics-each showing unique pre-sleep activities and product preferences. Findings demonstrate Co-Persona enhances manufacturers' ability to process large datasets while maintaining user understanding. The methodology provides structured approaches for targeted marketing and product strategies. Research contributes to theoretical understanding of data-driven persona development and practical applications in consumer-driven innovation. Code and data available at https://github.com/INFPa/LLMwithPersona.
△ Less
Submitted 24 June, 2025; v1 submitted 22 June, 2025;
originally announced June 2025.
-
IRS: Incremental Relationship-guided Segmentation for Digital Pathology
Authors:
Ruining Deng,
Junchao Zhu,
Juming Xiong,
Can Cui,
Tianyuan Yao,
Junlin Guo,
Siqi Lu,
Marilyn Lionts,
Mengmeng Yin,
Yu Wang,
Shilin Zhao,
Yucheng Tang,
Yihe Yang,
Paul Dennis Simonson,
Mert R. Sabuncu,
Haichun Yang,
Yuankai Huo
Abstract:
Continual learning is rapidly emerging as a key focus in computer vision, aiming to develop AI systems capable of continuous improvement, thereby enhancing their value and practicality in diverse real-world applications. In healthcare, continual learning holds great promise for continuously acquired digital pathology data, which is collected in hospitals on a daily basis. However, panoramic segmen…
▽ More
Continual learning is rapidly emerging as a key focus in computer vision, aiming to develop AI systems capable of continuous improvement, thereby enhancing their value and practicality in diverse real-world applications. In healthcare, continual learning holds great promise for continuously acquired digital pathology data, which is collected in hospitals on a daily basis. However, panoramic segmentation on digital whole slide images (WSIs) presents significant challenges, as it is often infeasible to obtain comprehensive annotations for all potential objects, spanning from coarse structures (e.g., regions and unit objects) to fine structures (e.g., cells). This results in temporally and partially annotated data, posing a major challenge in developing a holistic segmentation framework. Moreover, an ideal segmentation model should incorporate new phenotypes, unseen diseases, and diverse populations, making this task even more complex. In this paper, we introduce a novel and unified Incremental Relationship-guided Segmentation (IRS) learning scheme to address temporally acquired, partially annotated data while maintaining out-of-distribution (OOD) continual learning capacity in digital pathology. The key innovation of IRS lies in its ability to realize a new spatial-temporal OOD continual learning paradigm by mathematically modeling anatomical relationships between existing and newly introduced classes through a simple incremental universal proposition matrix. Experimental results demonstrate that the IRS method effectively handles the multi-scale nature of pathological segmentation, enabling precise kidney segmentation across various structures (regions, units, and cells) as well as OOD disease lesions at multiple magnifications. This capability significantly enhances domain generalization, making IRS a robust approach for real-world digital pathology applications.
△ Less
Submitted 28 May, 2025;
originally announced May 2025.
-
Toward Scientific Reasoning in LLMs: Training from Expert Discussions via Reinforcement Learning
Authors:
Ming Yin,
Yuanhao Qu,
Ling Yang,
Le Cong,
Mengdi Wang
Abstract:
We investigate how to teach large language models (LLMs) to perform scientific reasoning by leveraging expert discussions as a learning signal. Focusing on the genomics domain, we develop an automated pipeline to extract trainable data and introduce Genome-Bench, a new benchmark constructed from over a decade of scientific forum discussions on genome engineering. Our pipeline transforms raw intera…
▽ More
We investigate how to teach large language models (LLMs) to perform scientific reasoning by leveraging expert discussions as a learning signal. Focusing on the genomics domain, we develop an automated pipeline to extract trainable data and introduce Genome-Bench, a new benchmark constructed from over a decade of scientific forum discussions on genome engineering. Our pipeline transforms raw interactions into a reinforcement learning-friendly multiple-choice questions format, supported by 3000+ high-quality question-answer pairs spanning foundational biology, experimental troubleshooting, tool usage, and beyond. We fine-tune an LLM using RL with a rule-based reward signal derived from the synthetic MCQ dataset to enhance domain-specific reasoning. Our results show that reinforcement learning from scientific discussions improves model performance by over 15% compared to the base model on Genome-Bench, narrowing the gap between open-source LLMs and expert-level reasoning. To our knowledge, this is the first end-to-end pipeline for teaching LLMs to reason from scientific discussions, with promising potential for generalization across scientific domains beyond biology.
△ Less
Submitted 2 June, 2025; v1 submitted 26 May, 2025;
originally announced May 2025.
-
Enhancing CTR Prediction with De-correlated Expert Networks
Authors:
Jiancheng Wang,
Mingjia Yin,
Junwei Pan,
Ximei Wang,
Hao Wang,
Enhong Chen
Abstract:
Modeling feature interactions is essential for accurate click-through rate (CTR) prediction in advertising systems. Recent studies have adopted the Mixture-of-Experts (MoE) approach to improve performance by ensembling multiple feature interaction experts. These studies employ various strategies, such as learning independent embedding tables for each expert or utilizing heterogeneous expert archit…
▽ More
Modeling feature interactions is essential for accurate click-through rate (CTR) prediction in advertising systems. Recent studies have adopted the Mixture-of-Experts (MoE) approach to improve performance by ensembling multiple feature interaction experts. These studies employ various strategies, such as learning independent embedding tables for each expert or utilizing heterogeneous expert architectures, to differentiate the experts, which we refer to expert \emph{de-correlation}. However, it remains unclear whether these strategies effectively achieve de-correlated experts. To address this, we propose a De-Correlated MoE (D-MoE) framework, which introduces a Cross-Expert De-Correlation loss to minimize expert correlations.Additionally, we propose a novel metric, termed Cross-Expert Correlation, to quantitatively evaluate the expert de-correlation degree. Based on this metric, we identify a key finding for MoE framework design: \emph{different de-correlation strategies are mutually compatible, and progressively employing them leads to reduced correlation and enhanced performance}.Extensive experiments have been conducted to validate the effectiveness of D-MoE and the de-correlation principle. Moreover, online A/B testing on Tencent's advertising platforms demonstrates that D-MoE achieves a significant 1.19\% Gross Merchandise Volume (GMV) lift compared to the Multi-Embedding MoE baseline.
△ Less
Submitted 23 May, 2025;
originally announced May 2025.
-
Causal Discovery in Symmetric Dynamic Systems with Convergent Cross Mapping
Authors:
Yiting Duan,
Yi Guo,
Jack Yang,
Ming Yin
Abstract:
This paper systematically discusses how the inherent properties of chaotic attractors influence the results of discovering causality from time series using convergent cross mapping, particularly how convergent cross mapping misleads bidirectional causality as unidirectional when the chaotic attractor exhibits symmetry. We propose a novel method based on the k-means clustering method to address the…
▽ More
This paper systematically discusses how the inherent properties of chaotic attractors influence the results of discovering causality from time series using convergent cross mapping, particularly how convergent cross mapping misleads bidirectional causality as unidirectional when the chaotic attractor exhibits symmetry. We propose a novel method based on the k-means clustering method to address the challenges when the chaotic attractor exhibits two-fold rotation symmetry. This method is demonstrated to recover the symmetry of the latent chaotic attractor and discover the correct causality between time series without introducing information from other variables. We validate the accuracy of this method using time series derived from low-dimension and high-dimensional chaotic symmetric attractors for which convergent cross mapping may conclude erroneous results.
△ Less
Submitted 7 May, 2025;
originally announced May 2025.
-
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Authors:
Shaokun Zhang,
Ming Yin,
Jieyu Zhang,
Jiale Liu,
Zhiguang Han,
Jingyang Zhang,
Beibin Li,
Chi Wang,
Huazheng Wang,
Yiran Chen,
Qingyun Wu
Abstract:
Failure attribution in LLM multi-agent systems-identifying the agent and step responsible for task failures-provides crucial clues for systems debugging but remains underexplored and labor-intensive. In this paper, we propose and formulate a new research area: automated failure attribution for LLM multi-agent systems. To support this initiative, we introduce the Who&When dataset, comprising extens…
▽ More
Failure attribution in LLM multi-agent systems-identifying the agent and step responsible for task failures-provides crucial clues for systems debugging but remains underexplored and labor-intensive. In this paper, we propose and formulate a new research area: automated failure attribution for LLM multi-agent systems. To support this initiative, we introduce the Who&When dataset, comprising extensive failure logs from 127 LLM multi-agent systems with fine-grained annotations linking failures to specific agents and decisive error steps. Using the Who&When, we develop and evaluate three automated failure attribution methods, summarizing their corresponding pros and cons. The best method achieves 53.5% accuracy in identifying failure-responsible agents but only 14.2% in pinpointing failure steps, with some methods performing below random. Even SOTA reasoning models, such as OpenAI o1 and DeepSeek R1, fail to achieve practical usability. These results highlight the task's complexity and the need for further research in this area. Code and dataset are available at https://github.com/mingyin1/Agents_Failure_Attribution
△ Less
Submitted 1 June, 2025; v1 submitted 30 April, 2025;
originally announced May 2025.
-
Toward Realization of Low-Altitude Economy Networks: Core Architecture, Integrated Technologies, and Future Directions
Authors:
Yixian Wang,
Geng Sun,
Zemin Sun,
Jiacheng Wang,
Jiahui Li,
Changyuan Zhao,
Jing Wu,
Shuang Liang,
Minghao Yin,
Pengfei Wang,
Dusit Niyato,
Sumei Sun,
Dong In Kim
Abstract:
The rise of the low-altitude economy (LAE) is propelling urban development and emerging industries by integrating advanced technologies to enhance efficiency, safety, and sustainability in low-altitude operations. The widespread adoption of unmanned aerial vehicles (UAVs) and electric vertical takeoff and landing (eVTOL) aircraft plays a crucial role in enabling key applications within LAE, such a…
▽ More
The rise of the low-altitude economy (LAE) is propelling urban development and emerging industries by integrating advanced technologies to enhance efficiency, safety, and sustainability in low-altitude operations. The widespread adoption of unmanned aerial vehicles (UAVs) and electric vertical takeoff and landing (eVTOL) aircraft plays a crucial role in enabling key applications within LAE, such as urban logistics, emergency rescue, and aerial mobility. However, unlike traditional UAV networks, LAE networks encounter increased airspace management demands due to dense flying nodes and potential interference with ground communication systems. In addition, there are heightened and extended security risks in real-time operations, particularly the vulnerability of low-altitude aircraft to cyberattacks from ground-based threats. To address these, this paper first explores related standards and core architecture that support the development of LAE networks. Subsequently, we highlight the integration of technologies such as communication, sensing, computing, positioning, navigation, surveillance, flight control, and airspace management. This synergy of multi-technology drives the advancement of real-world LAE applications, particularly in improving operational efficiency, optimizing airspace usage, and ensuring safety. Finally, we outline future research directions for LAE networks, such as intelligent and adaptive optimization, security and privacy protection, sustainable energy and power management, quantum-driven coordination, generative governance, and three-dimensional (3D) airspace coverage, which collectively underscore the potential of collaborative technologies to advance LAE networks.
△ Less
Submitted 30 April, 2025;
originally announced April 2025.
-
Charge Densities in Crystals and Triply-Periodic Minimal Surfaces
Authors:
Mengdi Yin,
Dimitri D. Vvedensky
Abstract:
The relationship between surfaces of constant charge density and triply-periodic minimal surfaces (TPMS) has been the subject of considerable speculation over many years. Zero-potential surfaces generated in crystals by an electrostatic field from a distribution of point charges provide an approximate description of the TPMS for that crystal. We have recently provided a first-principles alternativ…
▽ More
The relationship between surfaces of constant charge density and triply-periodic minimal surfaces (TPMS) has been the subject of considerable speculation over many years. Zero-potential surfaces generated in crystals by an electrostatic field from a distribution of point charges provide an approximate description of the TPMS for that crystal. We have recently provided a first-principles alternative to such phenomenological comparisons based on the Vienna ab initio simulation package (VASP). We showed that the surfaces of a zero charge density calculated for the crystal structure of a material converges to the TPMS of the corresponding crystalline material. The exchange-correlation potentials are chosen for the particular material based on the benchmarking of various approximations for these potentials carried out by others. Here, we report an extension of our previous work by giving additional examples of our theory that shows the zero electron density of an equilibrium structure corresponds to a TPMS for a variety of materials and crystalline structures. We study the ground states of elemental materials that differ electronically and structurally, Na, Cu, Al, Zr, and a compound, NiTi, as well as different phases of the elemental solids that are observed under different thermodynamic conditions.
△ Less
Submitted 18 April, 2025;
originally announced April 2025.
-
ProtFlow: Fast Protein Sequence Design via Flow Matching on Compressed Protein Language Model Embeddings
Authors:
Zitai Kong,
Yiheng Zhu,
Yinlong Xu,
Hanjing Zhou,
Mingzhe Yin,
Jialu Wu,
Hongxia Xu,
Chang-Yu Hsieh,
Tingjun Hou,
Jian Wu
Abstract:
The design of protein sequences with desired functionalities is a fundamental task in protein engineering. Deep generative methods, such as autoregressive models and diffusion models, have greatly accelerated the discovery of novel protein sequences. However, these methods mainly focus on local or shallow residual semantics and suffer from low inference efficiency, large modeling space and high tr…
▽ More
The design of protein sequences with desired functionalities is a fundamental task in protein engineering. Deep generative methods, such as autoregressive models and diffusion models, have greatly accelerated the discovery of novel protein sequences. However, these methods mainly focus on local or shallow residual semantics and suffer from low inference efficiency, large modeling space and high training cost. To address these challenges, we introduce ProtFlow, a fast flow matching-based protein sequence design framework that operates on embeddings derived from semantically meaningful latent space of protein language models. By compressing and smoothing the latent space, ProtFlow enhances performance while training on limited computational resources. Leveraging reflow techniques, ProtFlow enables high-quality single-step sequence generation. Additionally, we develop a joint design pipeline for the design scene of multichain proteins. We evaluate ProtFlow across diverse protein design tasks, including general peptides and long-chain proteins, antimicrobial peptides, and antibodies. Experimental results demonstrate that ProtFlow outperforms task-specific methods in these applications, underscoring its potential and broad applicability in computational protein sequence design and analysis.
△ Less
Submitted 15 April, 2025;
originally announced April 2025.
-
Aligning Anime Video Generation with Human Feedback
Authors:
Bingwen Zhu,
Yudong Jiang,
Baohan Xu,
Siqian Yang,
Mingyu Yin,
Yidi Wu,
Huyang Sun,
Zuxuan Wu
Abstract:
Anime video generation faces significant challenges due to the scarcity of anime data and unusual motion patterns, leading to issues such as motion distortion and flickering artifacts, which result in misalignment with human preferences. Existing reward models, designed primarily for real-world videos, fail to capture the unique appearance and consistency requirements of anime. In this work, we pr…
▽ More
Anime video generation faces significant challenges due to the scarcity of anime data and unusual motion patterns, leading to issues such as motion distortion and flickering artifacts, which result in misalignment with human preferences. Existing reward models, designed primarily for real-world videos, fail to capture the unique appearance and consistency requirements of anime. In this work, we propose a pipeline to enhance anime video generation by leveraging human feedback for better alignment. Specifically, we construct the first multi-dimensional reward dataset for anime videos, comprising 30k human-annotated samples that incorporating human preferences for both visual appearance and visual consistency. Based on this, we develop AnimeReward, a powerful reward model that employs specialized vision-language models for different evaluation dimensions to guide preference alignment. Furthermore, we introduce Gap-Aware Preference Optimization (GAPO), a novel training method that explicitly incorporates preference gaps into the optimization process, enhancing alignment performance and efficiency. Extensive experiment results show that AnimeReward outperforms existing reward models, and the inclusion of GAPO leads to superior alignment in both quantitative benchmarks and human evaluations, demonstrating the effectiveness of our pipeline in enhancing anime video quality. Our code and dataset are publicly available at https://github.com/bilibili/Index-anisora.
△ Less
Submitted 24 June, 2025; v1 submitted 14 April, 2025;
originally announced April 2025.
-
Entertainers Between Real and Virtual -- Investigating Viewer Interaction, Engagement, and Relationships with Avatarized Virtual Livestreamers
Authors:
Michael Yin,
Chenxinran Shen,
Robert Xiao
Abstract:
Virtual YouTubers (VTubers) are avatar-based livestreamers that are voiced and played by human actors. VTubers have been popular in East Asia for years and have more recently seen widespread international growth. Despite their emergent popularity, research has been scarce into the interactions and relationships that exist between avatarized VTubers and their viewers, particularly in contrast to no…
▽ More
Virtual YouTubers (VTubers) are avatar-based livestreamers that are voiced and played by human actors. VTubers have been popular in East Asia for years and have more recently seen widespread international growth. Despite their emergent popularity, research has been scarce into the interactions and relationships that exist between avatarized VTubers and their viewers, particularly in contrast to non-avatarized streamers. To address this gap, we performed in-depth interviews with self-reported VTuber viewers (n=21). Our findings first reveal that the avatarized nature of VTubers fosters new forms of theatrical engagement, as factors of the virtual blend with the real to create a mixture of fantasy and realism in possible livestream interactions. Avatarization furthermore results in a unique audience perception regarding the identity of VTubers - an identity which comprises a dynamic, distinct mix of the real human (the voice actor/actress) and the virtual character. Our findings suggest that each of these dual identities both individually and symbiotically affect viewer interactions and relationships with VTubers. Whereas the performer's identity mediates social factors such as intimacy, relatability, and authenticity, the virtual character's identity offers feelings of escapism, novelty in interactions, and a sense of continuity beyond the livestream. We situate our findings within existing livestreaming literature to highlight how avatarization drives unique, character-based interactions as well as reshapes the motivations and relationships that viewers form with livestreamers. Finally, we provide suggestions and recommendations for areas of future exploration to address the challenges involved in present livestreamed avatarized entertainment.
△ Less
Submitted 11 April, 2025;
originally announced April 2025.
-
VIBES: Exploring Viewer Spatial Interactions as Direct Input for Livestreamed Content
Authors:
Michael Yin,
Robert Xiao
Abstract:
Livestreaming has rapidly become a popular online pastime, with real-time interaction between streamer and viewer being a key motivating feature. However, viewers have traditionally had limited opportunity to directly influence the streamed content; even when such interactions are possible, it has been reliant on text-based chat. We investigate the potential of spatial interaction on the livestrea…
▽ More
Livestreaming has rapidly become a popular online pastime, with real-time interaction between streamer and viewer being a key motivating feature. However, viewers have traditionally had limited opportunity to directly influence the streamed content; even when such interactions are possible, it has been reliant on text-based chat. We investigate the potential of spatial interaction on the livestreamed video content as a form of direct, real-time input for livestreamed applications. We developed VIBES, a flexible digital system that registers viewers' mouse interactions on the streamed video, i.e., clicks or movements, and transmits it directly into the streamed application. We used VIBES as a technology probe; first designing possible demonstrative interactions and using these interactions to explore streamers' perception of viewer influence and possible challenges and opportunities. We then deployed applications built using VIBES in two livestreams to explore its effects on audience engagement and investigate their relationships with the stream, the streamer, and fellow audience members. The use of spatial interactions enhances engagement and participation and opens up new avenues for both streamer-viewer and viewer-viewer participation. We contextualize our findings around a broader understanding of motivations and engagement in livestreaming, and we propose design guidelines and extensions for future research.
△ Less
Submitted 11 April, 2025;
originally announced April 2025.
-
Pharmacokinetic characteristics of Jinhong tablets in normal, chronic superficial gastritis and intestinal microbial disorder rats
Authors:
Tingyu Zhang,
Jian Feng,
Xia Gao,
Xialin Chen,
Hongyu Peng,
Xiaoxue Fan,
Xin Meng,
Mingke Yin,
Zhenzhong Wang,
Bo Zhang,
Liang Cao
Abstract:
Jinhong tablet (JHT), a traditional Chinese medicine made from four herbs, effectively treats chronic superficial gastritis (CSG) by soothing the liver, relieving depression, regulating qi, and promoting blood circulation. However, its pharmacokinetics are underexplored. This study investigates JHT's pharmacokinetics in normal rats and its differences in normal, CSG, and intestinal microbial disor…
▽ More
Jinhong tablet (JHT), a traditional Chinese medicine made from four herbs, effectively treats chronic superficial gastritis (CSG) by soothing the liver, relieving depression, regulating qi, and promoting blood circulation. However, its pharmacokinetics are underexplored. This study investigates JHT's pharmacokinetics in normal rats and its differences in normal, CSG, and intestinal microbial disorder rats. A quantitative method for seven active ingredients in rat plasma was established using UPLC-TQ-MS/MS. After administering various JHT doses, plasma concentrations were measured to assess pharmacokinetics in normal rats. The pharmacokinetics of four main ingredients were compared in normal, CSG, and fecal microbiota transplantation (FMT) rats. Intestinal microbial changes were evaluated by high-throughput sequencing. Spearman correlation analysis linked ingredient exposure to gut microbiota disturbances. The method showed good linearity, precision, accuracy, extraction recovery, and stability. In normal rats, all seven ingredients were rapidly absorbed. Tetrahydropalmatine, corydaline, costunolide, and rhamnosylvitexin had good exposure, while dehydrocorydaline, allocryptopine, and palmatine hydrochloride had low exposure. Tetrahydropalmatine, corydaline, and costunolide followed linear pharmacokinetics (AUC0-t, Cmax) at doses of 0.7-5.6 g/kg, while rhamnosylvitexin and dehydrocorydaline showed linearity at 0.7-2.8 g/kg. In CSG and FMT rats, pharmacokinetic differences were observed. CSG enhanced costunolide exposure and Cmax, and increased rhamnosylvitexin exposure. FMT raised corydaline exposure and rhamnosylvitexin Cmax, linked to 20 bacterial genera.
△ Less
Submitted 31 March, 2025;
originally announced April 2025.
-
Dynamic Carrier Modulation via Nonlinear Acoustoelectric Transport in van der Waals Heterostructures
Authors:
Timothy J. McSorley,
Kaustubh Simha,
James E. Corcoran,
Izzie J. Catanzaro,
Haochong Zhang,
Meitong Yin,
Tzu-Ming Lu,
Davis Thuillier,
Marshall A. Campbell,
Thomas Scaffidi,
Luis A. Jauregui
Abstract:
Dynamically manipulating carriers in van der Waals heterostructures could enable solid-state quantum simulators with tunable lattice parameters. A key requirement is forming deep potential wells to reliably trap excitations. Here, we report the observation of nonlinear acoustoelectric transport and dynamic carrier modulation in boron nitride-encapsulated graphene devices coupled to intense surface…
▽ More
Dynamically manipulating carriers in van der Waals heterostructures could enable solid-state quantum simulators with tunable lattice parameters. A key requirement is forming deep potential wells to reliably trap excitations. Here, we report the observation of nonlinear acoustoelectric transport and dynamic carrier modulation in boron nitride-encapsulated graphene devices coupled to intense surface acoustic waves (SAWs) on LiNbO3 substrates. SAWs generate strong acoustoelectric current densities (JAE), transitioning from linear to nonlinear regimes with increasing SAW intensity. In the nonlinear regime, periodic carrier (electrons, holes, or their mixtures) stripes emerge. Using counter-propagating SAWs, we create standing SAWs (SSAWs) to dynamically manipulate charge distributions without static gates. The saturation of JAE, attenuation transitions, and tunable resistance peaks confirm strong carrier localization. These results establish SAWs as a powerful tool for controlling carrier dynamics in two-dimensional (2D) materials, paving the way for the development of time-dependent quantum systems and acoustic lattices for quantum simulation.
△ Less
Submitted 20 March, 2025;
originally announced March 2025.
-
Density-Functional Theory and Triply-Periodic Minimal Surfaces
Authors:
Mengdi Yin,
Jing Zhang,
Dimitri D Vvedensky
Abstract:
Several authors have suggested that the surfaces of vanishing potential generated by the electrostatic fields from a distribution of point charges resemble triply periodic minimal surfaces (TPMS) corresponding to the positions of the point charges. We provide a theoretical basis for this phenomenological comparison by starting with the Boltzmann equation to show that the surface corresponding to z…
▽ More
Several authors have suggested that the surfaces of vanishing potential generated by the electrostatic fields from a distribution of point charges resemble triply periodic minimal surfaces (TPMS) corresponding to the positions of the point charges. We provide a theoretical basis for this phenomenological comparison by starting with the Boltzmann equation to show that the surface corresponding to zero charge density is a minimal surface. We then use density-functional calculations for elemental materials that differ electronically and structurally, Na, Cu, and Al, to show that surfaces of vanishing charge density converge to the corresponding TPMS.
△ Less
Submitted 18 April, 2025; v1 submitted 16 March, 2025;
originally announced March 2025.
-
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing
Authors:
Yudong Liu,
Jingwei Sun,
Yueqian Lin,
Jingyang Zhang,
Ming Yin,
Qinsi Wang,
Jianyi Zhang,
Hai Li,
Yiran Chen
Abstract:
Vision language models (VLMs) demonstrate strong capabilities in jointly processing visual and textual data. However, they often incur substantial computational overhead due to redundant visual information, particularly in long-form video scenarios. Existing approaches predominantly focus on either vision token pruning, which may overlook spatio-temporal dependencies, or keyframe selection, which…
▽ More
Vision language models (VLMs) demonstrate strong capabilities in jointly processing visual and textual data. However, they often incur substantial computational overhead due to redundant visual information, particularly in long-form video scenarios. Existing approaches predominantly focus on either vision token pruning, which may overlook spatio-temporal dependencies, or keyframe selection, which identifies informative frames but discards others, thus disrupting contextual continuity. In this work, we propose KVTP (Keyframe-oriented Vision Token Pruning), a novel framework that overcomes the drawbacks of token pruning and keyframe selection. By adaptively assigning pruning rates based on frame relevance to the query, KVTP effectively retains essential contextual information while significantly reducing redundant computation. To thoroughly evaluate the long-form video understanding capacities of VLMs, we curated and reorganized subsets from VideoMME, EgoSchema, and NextQA into a unified benchmark named SparseKV-QA that highlights real-world scenarios with sparse but crucial events. Our experiments with VLMs of various scales show that KVTP can reduce token usage by 80% without compromising spatiotemporal and contextual consistency, significantly cutting computation while maintaining the performance. These results demonstrate our approach's effectiveness in efficient long-video processing, facilitating more scalable VLM deployment.
△ Less
Submitted 24 April, 2025; v1 submitted 13 March, 2025;
originally announced March 2025.
-
Low-Rank Matrix Regression via Least-Angle Regression
Authors:
Mingzhou Yin,
Matthias A. Müller
Abstract:
Low-rank matrix regression is a fundamental problem in data science with various applications in systems and control. Nuclear norm regularization has been widely applied to solve this problem due to its convexity. However, it suffers from high computational complexity and the inability to directly specify the rank. This work introduces a novel framework for low-rank matrix regression that addresse…
▽ More
Low-rank matrix regression is a fundamental problem in data science with various applications in systems and control. Nuclear norm regularization has been widely applied to solve this problem due to its convexity. However, it suffers from high computational complexity and the inability to directly specify the rank. This work introduces a novel framework for low-rank matrix regression that addresses both unstructured and Hankel matrices. By decomposing the low-rank matrix into rank-1 bases, the problem is reformulated as an infinite-dimensional sparse learning problem. The least-angle regression (LAR) algorithm is then employed to solve this problem efficiently. For unstructured matrices, a closed-form LAR solution is derived with equivalence to a normalized nuclear norm regularization problem. For Hankel matrices, a real-valued polynomial basis reformulation enables effective LAR implementation. Two numerical examples in network modeling and system realization demonstrate that the proposed approach significantly outperforms the nuclear norm method in terms of estimation accuracy and computational efficiency.
△ Less
Submitted 3 June, 2025; v1 submitted 13 March, 2025;
originally announced March 2025.
-
Quantization Design for Deep Learning-Based CSI Feedback
Authors:
Manru Yin,
Shengqian Han,
Chenyang Yang
Abstract:
Deep learning-based autoencoders have been employed to compress and reconstruct channel state information (CSI) in frequency-division duplex systems. Practical implementations require judicious quantization of encoder outputs for digital transmission. In this paper, we propose a novel quantization module with bit allocation among encoder outputs and develop a method for joint training the module a…
▽ More
Deep learning-based autoencoders have been employed to compress and reconstruct channel state information (CSI) in frequency-division duplex systems. Practical implementations require judicious quantization of encoder outputs for digital transmission. In this paper, we propose a novel quantization module with bit allocation among encoder outputs and develop a method for joint training the module and the autoencoder. To enhance learning performance, we design a loss function that adaptively weights the quantization loss and the logarithm of reconstruction loss. Simulation results show the performance gain of the proposed method over existing baselines.
△ Less
Submitted 11 March, 2025;
originally announced March 2025.
-
Learning of Uplink Resource Allocation with Multiuser QoS Constraints
Authors:
Manru Yin,
Shengqian Han,
Chenyang Yang
Abstract:
In the paper the joint optimization of uplink multiuser power and resource block (RB) allocation are studied, where each user has quality of service (QoS) constraints on both long- and short-blocklength transmissions. The objective is to minimize the consumption of RBs for meeting the QoS requirements, leading to a mixed-integer nonlinear programming (MINLP) problem. We resort to deep learning to…
▽ More
In the paper the joint optimization of uplink multiuser power and resource block (RB) allocation are studied, where each user has quality of service (QoS) constraints on both long- and short-blocklength transmissions. The objective is to minimize the consumption of RBs for meeting the QoS requirements, leading to a mixed-integer nonlinear programming (MINLP) problem. We resort to deep learning to solve the problem with low inference complexity. To provide a performance benchmark for learning based methods, we propose a hierarchical algorithm to find the global optimal solution in the single-user scenario, which is then extended to the multiuser scenario. The design of the learning method, however, is challenging due to the discrete policy to be learned, which results in either vanishing or exploding gradient during neural network training. We introduce two types of smoothing functions to approximate the involved discretizing processes and propose a smoothing parameter adaption method. Another critical challenge lies in guaranteeing the QoS constraints. To address it, we design a nonlinear function to intensify the penalties for minor constraint violations. Simulation results demonstrate the advantages of the proposed method in reducing the number of occupied RBs and satisfying QoS constraints reliably.
△ Less
Submitted 9 March, 2025;
originally announced March 2025.
-
A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery
Authors:
Yiheng Zhu,
Mingyang Li,
Junlong Liu,
Kun Fu,
Jiansheng Wu,
Qiuyi Li,
Mingze Yin,
Jieping Ye,
Jian Wu,
Zheng Wang
Abstract:
Structure-based drug discovery (SBDD) is a systematic scientific process that develops new drugs by leveraging the detailed physical structure of the target protein. Recent advancements in pre-trained models for biomolecules have demonstrated remarkable success across various biochemical applications, including drug discovery and protein engineering. However, in most approaches, the pre-trained mo…
▽ More
Structure-based drug discovery (SBDD) is a systematic scientific process that develops new drugs by leveraging the detailed physical structure of the target protein. Recent advancements in pre-trained models for biomolecules have demonstrated remarkable success across various biochemical applications, including drug discovery and protein engineering. However, in most approaches, the pre-trained models primarily focus on the characteristics of either small molecules or proteins, without delving into their binding interactions which are essential cross-domain relationships pivotal to SBDD. To fill this gap, we propose a general-purpose foundation model named BIT (an abbreviation for Biomolecular Interaction Transformer), which is capable of encoding a range of biochemical entities, including small molecules, proteins, and protein-ligand complexes, as well as various data formats, encompassing both 2D and 3D structures. Specifically, we introduce Mixture-of-Domain-Experts (MoDE) to handle the biomolecules from diverse biochemical domains and Mixture-of-Structure-Experts (MoSE) to capture positional dependencies in the molecular structures. The proposed mixture-of-experts approach enables BIT to achieve both deep fusion and domain-specific encoding, effectively capturing fine-grained molecular interactions within protein-ligand complexes. Then, we perform cross-domain pre-training on the shared Transformer backbone via several unified self-supervised denoising tasks. Experimental results on various benchmarks demonstrate that BIT achieves exceptional performance in downstream tasks, including binding affinity prediction, structure-based virtual screening, and molecular property prediction.
△ Less
Submitted 6 March, 2025;
originally announced March 2025.
-
Descents and flag major index on conjugacy classes of colored permutation groups without short cycles
Authors:
Kevin Liu,
Mei Yin
Abstract:
We consider the descent and flag major index statistics on the colored permutation groups, which are wreath products of the form $\mathfrak{S}_{n,r}=\mathbb{Z}_r\wr \mathfrak{S}_n$. We show that the $k$-th moments of these statistics on $\mathfrak{S}_{n,r}$ will coincide with the corresponding moments on all conjugacy classes without cycles of lengths $1,2,\ldots,2k$. Using this, we establish the…
▽ More
We consider the descent and flag major index statistics on the colored permutation groups, which are wreath products of the form $\mathfrak{S}_{n,r}=\mathbb{Z}_r\wr \mathfrak{S}_n$. We show that the $k$-th moments of these statistics on $\mathfrak{S}_{n,r}$ will coincide with the corresponding moments on all conjugacy classes without cycles of lengths $1,2,\ldots,2k$. Using this, we establish the asymptotic normality of the descent and flag major index statistics on conjugacy classes of $\mathfrak{S}_{n,r}$ with sufficiently long cycles. Our results generalize prior work of Fulman involving the descent and major index statistics on the symmetric group $\mathfrak{S}_n$. Our methods involve an intricate extension of Fulman's work on $\mathfrak{S}_n$ combined with the theory of the degree for a colored permutation statistic, as introduced by Campion Loth, Levet, Liu, Sundaram, and Yin.
△ Less
Submitted 4 March, 2025;
originally announced March 2025.
-
MagNet: Multi-Level Attention Graph Network for Predicting High-Resolution Spatial Transcriptomics
Authors:
Junchao Zhu,
Ruining Deng,
Tianyuan Yao,
Juming Xiong,
Chongyu Qu,
Junlin Guo,
Siqi Lu,
Yucheng Tang,
Daguang Xu,
Mengmeng Yin,
Yu Wang,
Shilin Zhao,
Yaohong Wang,
Haichun Yang,
Yuankai Huo
Abstract:
The rapid development of spatial transcriptomics (ST) offers new opportunities to explore the gene expression patterns within the spatial microenvironment. Current research integrates pathological images to infer gene expression, addressing the high costs and time-consuming processes to generate spatial transcriptomics data. However, as spatial transcriptomics resolution continues to improve, exis…
▽ More
The rapid development of spatial transcriptomics (ST) offers new opportunities to explore the gene expression patterns within the spatial microenvironment. Current research integrates pathological images to infer gene expression, addressing the high costs and time-consuming processes to generate spatial transcriptomics data. However, as spatial transcriptomics resolution continues to improve, existing methods remain primarily focused on gene expression prediction at low-resolution spot levels. These methods face significant challenges, especially the information bottleneck, when they are applied to high-resolution HD data. To bridge this gap, this paper introduces MagNet, a multi-level attention graph network designed for accurate prediction of high-resolution HD data. MagNet employs cross-attention layers to integrate features from multi-resolution image patches hierarchically and utilizes a GAT-Transformer module to aggregate neighborhood information. By integrating multilevel features, MagNet overcomes the limitations posed by low-resolution inputs in predicting high-resolution gene expression. We systematically evaluated MagNet and existing ST prediction models on both a private spatial transcriptomics dataset and a public dataset at three different resolution levels. The results demonstrate that MagNet achieves state-of-the-art performance at both spot level and high-resolution bin levels, providing a novel methodology and benchmark for future research and applications in high-resolution HD-level spatial transcriptomics. Code is available at https://github.com/Junchao-Zhu/MagNet.
△ Less
Submitted 28 February, 2025;
originally announced February 2025.
-
From Text to Trust: Empowering AI-assisted Decision Making with Adaptive LLM-powered Analysis
Authors:
Zhuoyan Li,
Hangxiao Zhu,
Zhuoran Lu,
Ziang Xiao,
Ming Yin
Abstract:
AI-assisted decision making becomes increasingly prevalent, yet individuals often fail to utilize AI-based decision aids appropriately especially when the AI explanations are absent, potentially as they do not %understand reflect on AI's decision recommendations critically. Large language models (LLMs), with their exceptional conversational and analytical capabilities, present great opportunities…
▽ More
AI-assisted decision making becomes increasingly prevalent, yet individuals often fail to utilize AI-based decision aids appropriately especially when the AI explanations are absent, potentially as they do not %understand reflect on AI's decision recommendations critically. Large language models (LLMs), with their exceptional conversational and analytical capabilities, present great opportunities to enhance AI-assisted decision making in the absence of AI explanations by providing natural-language-based analysis of AI's decision recommendation, e.g., how each feature of a decision making task might contribute to the AI recommendation. In this paper, via a randomized experiment, we first show that presenting LLM-powered analysis of each task feature, either sequentially or concurrently, does not significantly improve people's AI-assisted decision performance. To enable decision makers to better leverage LLM-powered analysis, we then propose an algorithmic framework to characterize the effects of LLM-powered analysis on human decisions and dynamically decide which analysis to present. Our evaluation with human subjects shows that this approach effectively improves decision makers' appropriate reliance on AI in AI-assisted decision making.
△ Less
Submitted 17 February, 2025;
originally announced February 2025.
-
CASC-AI: Consensus-aware Self-corrective Learning for Noise Cell Segmentation
Authors:
Ruining Deng,
Yihe Yang,
David J. Pisapia,
Benjamin Liechty,
Junchao Zhu,
Juming Xiong,
Junlin Guo,
Zhengyi Lu,
Jiacheng Wang,
Xing Yao,
Runxuan Yu,
Rendong Zhang,
Gaurav Rudravaram,
Mengmeng Yin,
Pinaki Sarder,
Haichun Yang,
Yuankai Huo,
Mert R. Sabuncu
Abstract:
Multi-class cell segmentation in high-resolution gigapixel whole slide images (WSIs) is crucial for various clinical applications. However, training such models typically requires labor-intensive, pixel-wise annotations by domain experts. Recent efforts have democratized this process by involving lay annotators without medical expertise. However, conventional non-corrective approaches struggle to…
▽ More
Multi-class cell segmentation in high-resolution gigapixel whole slide images (WSIs) is crucial for various clinical applications. However, training such models typically requires labor-intensive, pixel-wise annotations by domain experts. Recent efforts have democratized this process by involving lay annotators without medical expertise. However, conventional non-corrective approaches struggle to handle annotation noise adaptively because they lack mechanisms to mitigate false positives (FP) and false negatives (FN) at both the image-feature and pixel levels. In this paper, we propose a consensus-aware self-corrective AI agent that leverages the Consensus Matrix to guide its learning process. The Consensus Matrix defines regions where both the AI and annotators agree on cell and non-cell annotations, which are prioritized with stronger supervision. Conversely, areas of disagreement are adaptively weighted based on their feature similarity to high-confidence consensus regions, with more similar regions receiving greater attention. Additionally, contrastive learning is employed to separate features of noisy regions from those of reliable consensus regions by maximizing their dissimilarity. This paradigm enables the model to iteratively refine noisy labels, enhancing its robustness. Validated on one real-world lay-annotated cell dataset and two reasoning-guided simulated noisy datasets, our method demonstrates improved segmentation performance, effectively correcting FP and FN errors and showcasing its potential for training robust models on noisy datasets. The official implementation and cell annotations are publicly available at https://github.com/ddrrnn123/CASC-AI.
△ Less
Submitted 10 March, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
-
Investigating Creativity in Humans and Generative AI Through Circles Exercises
Authors:
Runlin Duan,
Shao-Kang Hsia,
Yuzhao Chen,
Yichen Hu,
Ming Yin,
Karthik Ramani
Abstract:
Generative AI (GenAI) is transforming the creativity process. However, as presented in this paper, GenAI encounters "narrow creativity" barriers. We observe that both humans and GenAI focus on limited subsets of the design space. We investigate this phenomenon using the "Circles Exercise," a creativity test widely used to examine the creativity of humans. Quantitative analysis reveals that humans…
▽ More
Generative AI (GenAI) is transforming the creativity process. However, as presented in this paper, GenAI encounters "narrow creativity" barriers. We observe that both humans and GenAI focus on limited subsets of the design space. We investigate this phenomenon using the "Circles Exercise," a creativity test widely used to examine the creativity of humans. Quantitative analysis reveals that humans tend to generate familiar, high-frequency ideas, while GenAI produces a larger volume of incremental innovations at a low cost. However, similar to humans, it struggles to significantly expand creative boundaries. Moreover, advanced prompting strategies, such as Chain-of-Thought (CoT) prompting, mitigate narrow creativity issues but still fall short of substantially broadening the creative scope of humans and GenAI. These findings underscore both the challenges and opportunities for advancing GenAI-powered human creativity support tools.
△ Less
Submitted 11 February, 2025;
originally announced February 2025.
-
KPIs 2024 Challenge: Advancing Glomerular Segmentation from Patch- to Slide-Level
Authors:
Ruining Deng,
Tianyuan Yao,
Yucheng Tang,
Junlin Guo,
Siqi Lu,
Juming Xiong,
Lining Yu,
Quan Huu Cap,
Pengzhou Cai,
Libin Lan,
Ze Zhao,
Adrian Galdran,
Amit Kumar,
Gunjan Deotale,
Dev Kumar Das,
Inyoung Paik,
Joonho Lee,
Geongyu Lee,
Yujia Chen,
Wangkai Li,
Zhaoyang Li,
Xuege Hou,
Zeyuan Wu,
Shengjin Wang,
Maximilian Fischer
, et al. (22 additional authors not shown)
Abstract:
Chronic kidney disease (CKD) is a major global health issue, affecting over 10% of the population and causing significant mortality. While kidney biopsy remains the gold standard for CKD diagnosis and treatment, the lack of comprehensive benchmarks for kidney pathology segmentation hinders progress in the field. To address this, we organized the Kidney Pathology Image Segmentation (KPIs) Challenge…
▽ More
Chronic kidney disease (CKD) is a major global health issue, affecting over 10% of the population and causing significant mortality. While kidney biopsy remains the gold standard for CKD diagnosis and treatment, the lack of comprehensive benchmarks for kidney pathology segmentation hinders progress in the field. To address this, we organized the Kidney Pathology Image Segmentation (KPIs) Challenge, introducing a dataset that incorporates preclinical rodent models of CKD with over 10,000 annotated glomeruli from 60+ Periodic Acid Schiff (PAS)-stained whole slide images. The challenge includes two tasks, patch-level segmentation and whole slide image segmentation and detection, evaluated using the Dice Similarity Coefficient (DSC) and F1-score. By encouraging innovative segmentation methods that adapt to diverse CKD models and tissue conditions, the KPIs Challenge aims to advance kidney pathology analysis, establish new benchmarks, and enable precise, large-scale quantification for disease research and diagnosis.
△ Less
Submitted 11 February, 2025;
originally announced February 2025.
-
Limit distributions for cycles of random parking functions
Authors:
J. E. Paguyo,
Mei Yin
Abstract:
We study the asymptotic behavior of cycles of uniformly random parking functions. Our results are multifold: we obtain an explicit formula for the number of parking functions with a prescribed number of cyclic points and show that the scaled number of cyclic points of a random parking function is asymptotically Rayleigh distributed; we establish the classical trio of limit theorems (law of large n…
▽ More
We study the asymptotic behavior of cycles of uniformly random parking functions. Our results are multifold: we obtain an explicit formula for the number of parking functions with a prescribed number of cyclic points and show that the scaled number of cyclic points of a random parking function is asymptotically Rayleigh distributed; we establish the classical trio of limit theorems (law of large numbers, central limit theorem, large deviation principle) for the number of cycles in a random parking function; we also compute the asymptotic mean of the length of the $r$th longest cycle in a random parking function for all valid $r$. A variety of tools from probability theory and combinatorics are used in our investigation. Corresponding results for the class of prime parking functions are obtained.
△ Less
Submitted 10 February, 2025;
originally announced February 2025.
-
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
Authors:
Kaixuan Huang,
Jiacheng Guo,
Zihao Li,
Xiang Ji,
Jiawei Ge,
Wenzhe Li,
Yingqing Guo,
Tianle Cai,
Hui Yuan,
Runzhe Wang,
Yue Wu,
Ming Yin,
Shange Tang,
Yangsibo Huang,
Chi Jin,
Xinyun Chen,
Chiyuan Zhang,
Mengdi Wang
Abstract:
Large language models have demonstrated impressive performance on challenging mathematical reasoning tasks, which has triggered the discussion of whether the performance is achieved by true reasoning capability or memorization. To investigate this question, prior work has constructed mathematical benchmarks when questions undergo simple perturbations -- modifications that still preserve the underl…
▽ More
Large language models have demonstrated impressive performance on challenging mathematical reasoning tasks, which has triggered the discussion of whether the performance is achieved by true reasoning capability or memorization. To investigate this question, prior work has constructed mathematical benchmarks when questions undergo simple perturbations -- modifications that still preserve the underlying reasoning patterns of the solutions. However, no work has explored hard perturbations, which fundamentally change the nature of the problem so that the original solution steps do not apply. To bridge the gap, we construct MATH-P-Simple and MATH-P-Hard via simple perturbation and hard perturbation, respectively. Each consists of 279 perturbed math problems derived from level-5 (hardest) problems in the MATH dataset (Hendrycksmath et. al., 2021). We observe significant performance drops on MATH-P-Hard across various models, including o1-mini (-16.49%) and gemini-2.0-flash-thinking (-12.9%). We also raise concerns about a novel form of memorization where models blindly apply learned problem-solving skills without assessing their applicability to modified contexts. This issue is amplified when using original problems for in-context learning. We call for research efforts to address this challenge, which is critical for developing more robust and reliable reasoning models.
△ Less
Submitted 12 February, 2025; v1 submitted 10 February, 2025;
originally announced February 2025.
-
ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models
Authors:
Ying Zhang,
Maoliang Yin,
Wenfu Bi,
Haibao Yan,
Shaohan Bian,
Cui-Hua Zhang,
Changchun Hua
Abstract:
Service robots operating in unstructured environments must effectively recognize and segment unknown objects to enhance their functionality. Traditional supervised learningbased segmentation techniques require extensive annotated datasets, which are impractical for the diversity of objects encountered in real-world scenarios. Unseen Object Instance Segmentation (UOIS) methods aim to address this b…
▽ More
Service robots operating in unstructured environments must effectively recognize and segment unknown objects to enhance their functionality. Traditional supervised learningbased segmentation techniques require extensive annotated datasets, which are impractical for the diversity of objects encountered in real-world scenarios. Unseen Object Instance Segmentation (UOIS) methods aim to address this by training models on synthetic data to generalize to novel objects, but they often suffer from the simulation-to-reality gap. This paper proposes a novel approach (ZISVFM) for solving UOIS by leveraging the powerful zero-shot capability of the segment anything model (SAM) and explicit visual representations from a selfsupervised vision transformer (ViT). The proposed framework operates in three stages: (1) generating object-agnostic mask proposals from colorized depth images using SAM, (2) refining these proposals using attention-based features from the selfsupervised ViT to filter non-object masks, and (3) applying K-Medoids clustering to generate point prompts that guide SAM towards precise object segmentation. Experimental validation on two benchmark datasets and a self-collected dataset demonstrates the superior performance of ZISVFM in complex environments, including hierarchical settings such as cabinets, drawers, and handheld objects. Our source code is available at https://github.com/Yinmlmaoliang/zisvfm.
△ Less
Submitted 5 February, 2025;
originally announced February 2025.
-
TD3: Tucker Decomposition Based Dataset Distillation Method for Sequential Recommendation
Authors:
Jiaqing Zhang,
Mingjia Yin,
Hao Wang,
Yawen Li,
Yuyang Ye,
Xingyu Lou,
Junping Du,
Enhong Chen
Abstract:
In the era of data-centric AI, the focus of recommender systems has shifted from model-centric innovations to data-centric approaches. The success of modern AI models is built on large-scale datasets, but this also results in significant training costs. Dataset distillation has emerged as a key solution, condensing large datasets to accelerate model training while preserving model performance. How…
▽ More
In the era of data-centric AI, the focus of recommender systems has shifted from model-centric innovations to data-centric approaches. The success of modern AI models is built on large-scale datasets, but this also results in significant training costs. Dataset distillation has emerged as a key solution, condensing large datasets to accelerate model training while preserving model performance. However, condensing discrete and sequentially correlated user-item interactions, particularly with extensive item sets, presents considerable challenges. This paper introduces \textbf{TD3}, a novel \textbf{T}ucker \textbf{D}ecomposition based \textbf{D}ataset \textbf{D}istillation method within a meta-learning framework, designed for sequential recommendation. TD3 distills a fully expressive \emph{synthetic sequence summary} from original data. To efficiently reduce computational complexity and extract refined latent patterns, Tucker decomposition decouples the summary into four factors: \emph{synthetic user latent factor}, \emph{temporal dynamics latent factor}, \emph{shared item latent factor}, and a \emph{relation core} that models their interconnections. Additionally, a surrogate objective in bi-level optimization is proposed to align feature spaces extracted from models trained on both original data and synthetic sequence summary beyond the naïve performance matching approach. In the \emph{inner-loop}, an augmentation technique allows the learner to closely fit the synthetic summary, ensuring an accurate update of it in the \emph{outer-loop}. To accelerate the optimization process and address long dependencies, RaT-BPTT is employed for bi-level optimization. Experiments and analyses on multiple public datasets have confirmed the superiority and cross-architecture generalizability of the proposed designs. Codes are released at https://github.com/USTC-StarTeam/TD3.
△ Less
Submitted 6 February, 2025; v1 submitted 4 February, 2025;
originally announced February 2025.
-
Scalable Precise Computation of Shannon Entropy
Authors:
Yong Lai,
Haolong Tong,
Zhenghang Xu,
Minghao Yin
Abstract:
Quantitative information flow analyses (QIF) are a class of techniques for measuring the amount of confidential information leaked by a program to its public outputs. Shannon entropy is an important method to quantify the amount of leakage in QIF. This paper focuses on the programs modeled in Boolean constraints and optimizes the two stages of the Shannon entropy computation to implement a scalabl…
▽ More
Quantitative information flow analyses (QIF) are a class of techniques for measuring the amount of confidential information leaked by a program to its public outputs. Shannon entropy is an important method to quantify the amount of leakage in QIF. This paper focuses on the programs modeled in Boolean constraints and optimizes the two stages of the Shannon entropy computation to implement a scalable precise tool PSE. In the first stage, we design a knowledge compilation language called \ADDAND that combines Algebraic Decision Diagrams and conjunctive decomposition. \ADDAND avoids enumerating possible outputs of a program and supports tractable entropy computation. In the second stage, we optimize the model counting queries that are used to compute the probabilities of outputs. We compare PSE with the state-of-the-art probabilistic approximately correct tool EntropyEstimation, which was shown to significantly outperform the previous precise tools. The experimental results demonstrate that PSE solved 56 more benchmarks compared to EntropyEstimation in a total of 459. For 98\% of the benchmarks that both PSE and EntropyEstimation solved, PSE is at least $10\times$ as efficient as EntropyEstimation.
△ Less
Submitted 14 June, 2025; v1 submitted 3 February, 2025;
originally announced February 2025.
-
Probabilistic $(m,n)$-Parking Functions
Authors:
Pamela E. Harris,
Rodrigo Ribeiro,
Mei Yin
Abstract:
In this article, we establish new results on the probabilistic parking model (introduced by Durmíc, Han, Harris, Ribeiro, and Yin) with $m$ cars and $n$ parking spots and probability parameter $p\in[0,1]$. For any $ m \leq n$ and $p \in [0,1]$, we study the parking preference of the last car, denoted $a_m$, and determine the conditional distribution of $a_m$ and compute its expected value. We show…
▽ More
In this article, we establish new results on the probabilistic parking model (introduced by Durmíc, Han, Harris, Ribeiro, and Yin) with $m$ cars and $n$ parking spots and probability parameter $p\in[0,1]$. For any $ m \leq n$ and $p \in [0,1]$, we study the parking preference of the last car, denoted $a_m$, and determine the conditional distribution of $a_m$ and compute its expected value. We show that both formulas depict explicit dependence on the probability parameter $p$. We study the case where $m = cn $ for some $ 0 < c < 1 $ and investigate the asymptotic behavior and show that the presence of ``extra spots'' on the street significantly affects the rate at which the conditional distribution of $ a_m $ converges to the uniform distribution on $[n]$. Even for small $ \varepsilon = 1 - c $, an $ \varepsilon $-proportion of extra spots reduces the convergence rate from $ 1/\sqrt{n} $ to $ 1/n $ when $ p \neq 1/2 $. Additionally, we examine how the convergence rate depends on $c$, while keeping $n$ and $p$ fixed. We establish that as $c$ approaches zero, the total variation distance between the conditional distribution of $a_m$ and the uniform distribution on $[n]$ decreases at least linearly in $c$.
△ Less
Submitted 31 January, 2025;
originally announced February 2025.
-
Gaussian Process-Based Prediction and Control of Hammerstein-Wiener Systems
Authors:
Mingzhou Yin,
Matthias A. Müller
Abstract:
This work investigates data-driven prediction and control of Hammerstein-Wiener systems using physics-informed Gaussian process models. Data-driven prediction algorithms have been developed for structured nonlinear systems based on Willems' fundamental lemma. However, existing frameworks cannot treat output nonlinearities and require a dictionary of basis functions for Hammerstein systems. In this…
▽ More
This work investigates data-driven prediction and control of Hammerstein-Wiener systems using physics-informed Gaussian process models. Data-driven prediction algorithms have been developed for structured nonlinear systems based on Willems' fundamental lemma. However, existing frameworks cannot treat output nonlinearities and require a dictionary of basis functions for Hammerstein systems. In this work, an implicit predictor structure is considered, leveraging the multi-step-ahead ARX structure for the linear part of the model. This implicit function is learned by Gaussian process regression with kernel functions designed from Gaussian process priors for the nonlinearities. The linear model parameters are estimated as hyperparameters by assuming a stable spline hyperprior. The implicit Gaussian process model provides explicit output prediction by optimizing selected optimality criteria. The model is also applied to receding horizon control with the expected control cost and chance constraint satisfaction guarantee. Numerical results demonstrate that the proposed prediction and control algorithms are superior to black-box Gaussian process models.
△ Less
Submitted 27 January, 2025;
originally announced January 2025.
-
Humanity's Last Exam
Authors:
Long Phan,
Alice Gatti,
Ziwen Han,
Nathaniel Li,
Josephina Hu,
Hugh Zhang,
Chen Bo Calvin Zhang,
Mohamed Shaaban,
John Ling,
Sean Shi,
Michael Choi,
Anish Agrawal,
Arnav Chopra,
Adam Khoja,
Ryan Kim,
Richard Ren,
Jason Hausenloy,
Oliver Zhang,
Mantas Mazeika,
Dmitry Dodonov,
Tung Nguyen,
Jaeho Lee,
Daron Anderson,
Mikhail Doroshenko,
Alun Cennyth Stokes
, et al. (1084 additional authors not shown)
Abstract:
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…
▽ More
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
△ Less
Submitted 19 April, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Enhancing Generalization in Chain of Thought Reasoning for Smaller Models
Authors:
Maxwell J. Yin,
Dingyi Jiang,
Yongbing Chen,
Boyu Wang,
Charles Ling
Abstract:
Chain-of-Thought (CoT) reasoning in smaller language models is a challenging natural language process problem yet highly desirable in many real-life applications. Existing CoT knowledge distillation methods often suffer from overly conservative memorization in smaller LLMs, leading to low generalization confidence. As fully preserving the CoT ability of teacher model is impossible, we hypothesize…
▽ More
Chain-of-Thought (CoT) reasoning in smaller language models is a challenging natural language process problem yet highly desirable in many real-life applications. Existing CoT knowledge distillation methods often suffer from overly conservative memorization in smaller LLMs, leading to low generalization confidence. As fully preserving the CoT ability of teacher model is impossible, we hypothesize that adversarial CoT fine-tuning is crucial for developing smaller LLM with robust CoT generalization. To this end, we propose \textit{PRompt-Assisted Domain-Adversarial fine-tuning} (PRADA), a principled fine-tuning framework that integrates diverse CoT domains. Specifically, PRADA pioneers two CoT improvements in smaller LLM: (1) Recovering the domain-invariant feature insight which typically lost during distillation with domain adversarial fine-tuning; (2) Enhancing the domain adaptability of CoT prompt engineering by employing domain-adversarial approaches. We theoretically demonstrate the effectiveness of our approach and empirically show that it significantly outperforms the state of the arts in a wide range of tasks. Moreover, our empirical findings reveal that the smaller LLM, when leveraging PRADA, aligns closely with domain knowledge, thereby improving the explainability of our approach.
△ Less
Submitted 16 January, 2025;
originally announced January 2025.
-
Low-Complex Waveform, Modulation and Coding Designs for 3GPP Ambient IoT
Authors:
Mingxi Yin,
Chao Wei,
Kazuki Takeda,
Yinhua Jia,
Changlong Xu,
Chengjin Zhang,
Hao Xu
Abstract:
This paper presents a comprehensive study on low-complexity waveform, modulation and coding (WMC) designs for the 3rd Generation Partnership Project (3GPP) Ambient Internet of Things (A-IoT). A-IoT is a low-cost, low-power IoT system inspired by Ultra High Frequency (UHF) Radio Frequency Identification (RFID) and aims to leverage existing cellular network infrastructure for efficient RF tag manage…
▽ More
This paper presents a comprehensive study on low-complexity waveform, modulation and coding (WMC) designs for the 3rd Generation Partnership Project (3GPP) Ambient Internet of Things (A-IoT). A-IoT is a low-cost, low-power IoT system inspired by Ultra High Frequency (UHF) Radio Frequency Identification (RFID) and aims to leverage existing cellular network infrastructure for efficient RF tag management. The paper compares the physical layer (PHY) design challenges and requirements of RFID and A-IoT, particularly focusing on backscatter communications. An overview of the standardization for PHY designs in Release 19 A-IoT is provided, along with detailed schemes of the proposed low-complex WMC designs. The performance of device-to-reader link designs is validated through simulations, demonstrating 6 dB improvements of the proposed baseband waveform with coherent receivers compared to RFID line coding-based solutions with non-coherent receivers when channel coding is adopted.
△ Less
Submitted 14 January, 2025;
originally announced January 2025.
-
PySpatial: A High-Speed Whole Slide Image Pathomics Toolkit
Authors:
Yuechen Yang,
Yu Wang,
Tianyuan Yao,
Ruining Deng,
Mengmeng Yin,
Shilin Zhao,
Haichun Yang,
Yuankai Huo
Abstract:
Whole Slide Image (WSI) analysis plays a crucial role in modern digital pathology, enabling large-scale feature extraction from tissue samples. However, traditional feature extraction pipelines based on tools like CellProfiler often involve lengthy workflows, requiring WSI segmentation into patches, feature extraction at the patch level, and subsequent mapping back to the original WSI. To address…
▽ More
Whole Slide Image (WSI) analysis plays a crucial role in modern digital pathology, enabling large-scale feature extraction from tissue samples. However, traditional feature extraction pipelines based on tools like CellProfiler often involve lengthy workflows, requiring WSI segmentation into patches, feature extraction at the patch level, and subsequent mapping back to the original WSI. To address these challenges, we present PySpatial, a high-speed pathomics toolkit specifically designed for WSI-level analysis. PySpatial streamlines the conventional pipeline by directly operating on computational regions of interest, reducing redundant processing steps. Utilizing rtree-based spatial indexing and matrix-based computation, PySpatial efficiently maps and processes computational regions, significantly accelerating feature extraction while maintaining high accuracy. Our experiments on two datasets-Perivascular Epithelioid Cell (PEC) and data from the Kidney Precision Medicine Project (KPMP)-demonstrate substantial performance improvements. For smaller and sparse objects in PEC datasets, PySpatial achieves nearly a 10-fold speedup compared to standard CellProfiler pipelines. For larger objects, such as glomeruli and arteries in KPMP datasets, PySpatial achieves a 2-fold speedup. These results highlight PySpatial's potential to handle large-scale WSI analysis with enhanced efficiency and accuracy, paving the way for broader applications in digital pathology.
△ Less
Submitted 10 January, 2025;
originally announced January 2025.
-
No-Regret Linear Bandits under Gap-Adjusted Misspecification
Authors:
Chong Liu,
Dan Qiao,
Ming Yin,
Ilija Bogunovic,
Yu-Xiang Wang
Abstract:
This work studies linear bandits under a new notion of gap-adjusted misspecification and is an extension of Liu et al. (2023). When the underlying reward function is not linear, existing linear bandits work usually relies on a uniform misspecification parameter $ε$ that measures the sup-norm error of the best linear approximation. This results in an unavoidable linear regret whenever $ε> 0$. We pr…
▽ More
This work studies linear bandits under a new notion of gap-adjusted misspecification and is an extension of Liu et al. (2023). When the underlying reward function is not linear, existing linear bandits work usually relies on a uniform misspecification parameter $ε$ that measures the sup-norm error of the best linear approximation. This results in an unavoidable linear regret whenever $ε> 0$. We propose a more natural model of misspecification which only requires the approximation error at each input $x$ to be proportional to the suboptimality gap at $x$. It captures the intuition that, for optimization problems, near-optimal regions should matter more and we can tolerate larger approximation errors in suboptimal regions.
Quite surprisingly, we show that the classical LinUCB algorithm -- designed for the realizable case -- is automatically robust against such $ρ$-gap-adjusted misspecification with parameter $ρ$ diminishing at $O(1/(d \sqrt{\log T}))$. It achieves a near-optimal $O(\sqrt{T})$ regret for problems that the best-known regret is almost linear in time horizon $T$. We further advance this frontier by presenting a novel phased elimination-based algorithm whose gap-adjusted misspecification parameter $ρ= O(1/\sqrt{d})$ does not scale with $T$. This algorithm attains optimal $O(\sqrt{T})$ regret and is deployment-efficient, requiring only $\log T$ batches of exploration. It also enjoys an adaptive $O(\log T)$ regret when a constant suboptimality gap exists. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself, and a new inductive lemma that limits the misspecification error within the suboptimality gap for all valid actions in each batch selected by G-optimal design.
△ Less
Submitted 9 January, 2025;
originally announced January 2025.
-
On the Statistical Complexity for Offline and Low-Adaptive Reinforcement Learning with Structures
Authors:
Ming Yin,
Mengdi Wang,
Yu-Xiang Wang
Abstract:
This article reviews the recent advances on the statistical foundation of reinforcement learning (RL) in the offline and low-adaptive settings. We will start by arguing why offline RL is the appropriate model for almost any real-life ML problems, even if they have nothing to do with the recent AI breakthroughs that use RL. Then we will zoom into two fundamental problems of offline RL: offline poli…
▽ More
This article reviews the recent advances on the statistical foundation of reinforcement learning (RL) in the offline and low-adaptive settings. We will start by arguing why offline RL is the appropriate model for almost any real-life ML problems, even if they have nothing to do with the recent AI breakthroughs that use RL. Then we will zoom into two fundamental problems of offline RL: offline policy evaluation (OPE) and offline policy learning (OPL). It may be surprising to people that tight bounds for these problems were not known even for tabular and linear cases until recently. We delineate the differences between worst-case minimax bounds and instance-dependent bounds. We also cover key algorithmic ideas and proof techniques behind near-optimal instance-dependent methods in OPE and OPL. Finally, we discuss the limitations of offline RL and review a burgeoning problem of \emph{low-adaptive exploration} which addresses these limitations by providing a sweet middle ground between offline and online RL.
△ Less
Submitted 3 January, 2025;
originally announced January 2025.
-
ProtCLIP: Function-Informed Protein Multi-Modal Learning
Authors:
Hanjing Zhou,
Mingze Yin,
Wei Wu,
Mingyang Li,
Kun Fu,
Jintai Chen,
Jian Wu,
Zheng Wang
Abstract:
Multi-modality pre-training paradigm that aligns protein sequences and biological descriptions has learned general protein representations and achieved promising performance in various downstream applications. However, these works were still unable to replicate the extraordinary success of language-supervised visual foundation models due to the ineffective usage of aligned protein-text paired data…
▽ More
Multi-modality pre-training paradigm that aligns protein sequences and biological descriptions has learned general protein representations and achieved promising performance in various downstream applications. However, these works were still unable to replicate the extraordinary success of language-supervised visual foundation models due to the ineffective usage of aligned protein-text paired data and the lack of an effective function-informed pre-training paradigm. To address these issues, this paper curates a large-scale protein-text paired dataset called ProtAnno with a property-driven sampling strategy, and introduces a novel function-informed protein pre-training paradigm. Specifically, the sampling strategy determines selecting probability based on the sample confidence and property coverage, balancing the data quality and data quantity in face of large-scale noisy data. Furthermore, motivated by significance of the protein specific functional mechanism, the proposed paradigm explicitly model protein static and dynamic functional segments by two segment-wise pre-training objectives, injecting fine-grained information in a function-informed manner. Leveraging all these innovations, we develop ProtCLIP, a multi-modality foundation model that comprehensively represents function-aware protein embeddings. On 22 different protein benchmarks within 5 types, including protein functionality classification, mutation effect prediction, cross-modal transformation, semantic similarity inference and protein-protein interaction prediction, our ProtCLIP consistently achieves SOTA performance, with remarkable improvements of 75% on average in five cross-modal transformation benchmarks, 59.9% in GO-CC and 39.7% in GO-BP protein function prediction. The experimental results verify the extraordinary potential of ProtCLIP serving as the protein multi-modality foundation model.
△ Less
Submitted 27 December, 2024;
originally announced December 2024.
-
Convolutional Deep Operator Networks for Learning Nonlinear Focused Ultrasound Wave Propagation in Heterogeneous Spinal Cord Anatomy
Authors:
Avisha Kumar,
Xuzhe Zhi,
Zan Ahmad,
Minglang Yin,
Amir Manbachi
Abstract:
Focused ultrasound (FUS) therapy is a promising tool for optimally targeted treatment of spinal cord injuries (SCI), offering submillimeter precision to enhance blood flow at injury sites while minimizing impact on surrounding tissues. However, its efficacy is highly sensitive to the placement of the ultrasound source, as the spinal cord's complex geometry and acoustic heterogeneity distort and at…
▽ More
Focused ultrasound (FUS) therapy is a promising tool for optimally targeted treatment of spinal cord injuries (SCI), offering submillimeter precision to enhance blood flow at injury sites while minimizing impact on surrounding tissues. However, its efficacy is highly sensitive to the placement of the ultrasound source, as the spinal cord's complex geometry and acoustic heterogeneity distort and attenuate the FUS signal. Current approaches rely on computer simulations to solve the governing wave propagation equations and compute patient-specific pressure maps using ultrasound images of the spinal cord anatomy. While accurate, these high-fidelity simulations are computationally intensive, taking up to hours to complete parameter sweeps, which is impractical for real-time surgical decision-making. To address this bottleneck, we propose a convolutional deep operator network (DeepONet) to rapidly predict FUS pressure fields in patient spinal cords. Unlike conventional neural networks, DeepONets are well equipped to approximate the solution operator of the parametric partial differential equations (PDEs) that govern the behavior of FUS waves with varying initial and boundary conditions (i.e., new transducer locations or spinal cord geometries) without requiring extensive simulations. Trained on simulated pressure maps across diverse patient anatomies, this surrogate model achieves real-time predictions with only a 2% loss on the test set, significantly accelerating the modeling of nonlinear physical systems in heterogeneous domains. By facilitating rapid parameter sweeps in surgical settings, this work provides a crucial step toward precise and individualized solutions in neurosurgical treatments.
△ Less
Submitted 20 December, 2024;
originally announced December 2024.
-
AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era
Authors:
Yudong Jiang,
Baohan Xu,
Siqian Yang,
Mingyu Yin,
Jing Liu,
Chao Xu,
Siqi Wang,
Yidi Wu,
Bingwen Zhu,
Xinwen Zhang,
Xingyu Zheng,
Jixuan Xu,
Yue Zhang,
Jinlong Hou,
Huyang Sun
Abstract:
Animation has gained significant interest in the recent film and TV industry. Despite the success of advanced video generation models like Sora, Kling, and CogVideoX in generating natural videos, they lack the same effectiveness in handling animation videos. Evaluating animation video generation is also a great challenge due to its unique artist styles, violating the laws of physics and exaggerate…
▽ More
Animation has gained significant interest in the recent film and TV industry. Despite the success of advanced video generation models like Sora, Kling, and CogVideoX in generating natural videos, they lack the same effectiveness in handling animation videos. Evaluating animation video generation is also a great challenge due to its unique artist styles, violating the laws of physics and exaggerated motions. In this paper, we present a comprehensive system, AniSora, designed for animation video generation, which includes a data processing pipeline, a controllable generation model, and an evaluation benchmark. Supported by the data processing pipeline with over 10M high-quality data, the generation model incorporates a spatiotemporal mask module to facilitate key animation production functions such as image-to-video generation, frame interpolation, and localized image-guided animation. We also collect an evaluation benchmark of 948 various animation videos, with specifically developed metrics for animation video generation. Our entire project is publicly available on https://github.com/bilibili/Index-anisora/tree/main.
△ Less
Submitted 22 May, 2025; v1 submitted 13 December, 2024;
originally announced December 2024.
-
ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics
Authors:
Junchao Zhu,
Ruining Deng,
Tianyuan Yao,
Juming Xiong,
Chongyu Qu,
Junlin Guo,
Siqi Lu,
Mengmeng Yin,
Yu Wang,
Shilin Zhao,
Haichun Yang,
Yuankai Huo
Abstract:
Spatial transcriptomics (ST) is an emerging technology that enables medical computer vision scientists to automatically interpret the molecular profiles underlying morphological features. Currently, however, most deep learning-based ST analyses are limited to two-dimensional (2D) sections, which can introduce diagnostic errors due to the heterogeneity of pathological tissues across 3D sections. Ex…
▽ More
Spatial transcriptomics (ST) is an emerging technology that enables medical computer vision scientists to automatically interpret the molecular profiles underlying morphological features. Currently, however, most deep learning-based ST analyses are limited to two-dimensional (2D) sections, which can introduce diagnostic errors due to the heterogeneity of pathological tissues across 3D sections. Expanding ST to three-dimensional (3D) volumes is challenging due to the prohibitive costs; a 2D ST acquisition already costs over 50 times more than whole slide imaging (WSI), and a full 3D volume with 10 sections can be an order of magnitude more expensive. To reduce costs, scientists have attempted to predict ST data directly from WSI without performing actual ST acquisition. However, these methods typically yield unsatisfying results. To address this, we introduce a novel problem setting: 3D ST imputation using 3D WSI histology sections combined with a single 2D ST slide. To do so, we present the Anatomy-aware Spatial Imputation Graph Network (ASIGN) for more precise, yet affordable, 3D ST modeling. The ASIGN architecture extends existing 2D spatial relationships into 3D by leveraging cross-layer overlap and similarity-based expansion. Moreover, a multi-level spatial attention graph network integrates features comprehensively across different data sources. We evaluated ASIGN on three public spatial transcriptomics datasets, with experimental results demonstrating that ASIGN achieves state-of-the-art performance on both 2D and 3D scenarios. Code is available at https://github.com/hrlblab/ASIGN.
△ Less
Submitted 3 December, 2024;
originally announced December 2024.
-
GloFinder: AI-empowered QuPath Plugin for WSI-level Glomerular Detection, Visualization, and Curation
Authors:
Jialin Yue,
Tianyuan Yao,
Ruining Deng,
Siqi Lu,
Junlin Guo,
Quan Liu,
Mengmeng Yin,
Juming Xiong,
Haichun Yang,
Yuankai Huo
Abstract:
Artificial intelligence (AI) has demonstrated significant success in automating the detection of glomeruli, the key functional units of the kidney, from whole slide images (WSIs) in kidney pathology. However, existing open-source tools are often distributed as source code or Docker containers, requiring advanced programming skills that hinder accessibility for non-programmers, such as clinicians.…
▽ More
Artificial intelligence (AI) has demonstrated significant success in automating the detection of glomeruli, the key functional units of the kidney, from whole slide images (WSIs) in kidney pathology. However, existing open-source tools are often distributed as source code or Docker containers, requiring advanced programming skills that hinder accessibility for non-programmers, such as clinicians. Additionally, current models are typically trained on a single dataset and lack flexibility in adjusting confidence levels for predictions. To overcome these challenges, we introduce GloFinder, a QuPath plugin designed for single-click automated glomeruli detection across entire WSIs with online editing through the graphical user interface (GUI). GloFinder employs CircleNet, an anchor-free detection framework utilizing circle representations for precise object localization, with models trained on approximately 160,000 manually annotated glomeruli. To further enhance accuracy, the plugin incorporates Weighted Circle Fusion (WCF), an ensemble method that combines confidence scores from multiple CircleNet models to produce refined predictions, achieving superior performance in glomerular detection. GloFinder enables direct visualization and editing of results in QuPath, facilitating seamless interaction for clinicians and providing a powerful tool for nephropathology research and clinical practice.
△ Less
Submitted 27 November, 2024;
originally announced November 2024.
-
Diffeomorphic Latent Neural Operators for Data-Efficient Learning of Solutions to Partial Differential Equations
Authors:
Zan Ahmad,
Shiyi Chen,
Minglang Yin,
Avisha Kumar,
Nicolas Charon,
Natalia Trayanova,
Mauro Maggioni
Abstract:
A computed approximation of the solution operator to a system of partial differential equations (PDEs) is needed in various areas of science and engineering. Neural operators have been shown to be quite effective at predicting these solution generators after training on high-fidelity ground truth data (e.g. numerical simulations). However, in order to generalize well to unseen spatial domains, neu…
▽ More
A computed approximation of the solution operator to a system of partial differential equations (PDEs) is needed in various areas of science and engineering. Neural operators have been shown to be quite effective at predicting these solution generators after training on high-fidelity ground truth data (e.g. numerical simulations). However, in order to generalize well to unseen spatial domains, neural operators must be trained on an extensive amount of geometrically varying data samples that may not be feasible to acquire or simulate in certain contexts (e.g., patient-specific medical data, large-scale computationally intensive simulations.) We propose that in order to learn a PDE solution operator that can generalize across multiple domains without needing to sample enough data expressive enough for all possible geometries, we can train instead a latent neural operator on just a few ground truth solution fields diffeomorphically mapped from different geometric/spatial domains to a fixed reference configuration. Furthermore, the form of the solutions is dependent on the choice of mapping to and from the reference domain. We emphasize that preserving properties of the differential operator when constructing these mappings can significantly reduce the data requirement for achieving an accurate model due to the regularity of the solution fields that the latent neural operator is training on. We provide motivating numerical experimentation that demonstrates an extreme case of this consideration by exploiting the conformal invariance of the Laplacian
△ Less
Submitted 29 November, 2024; v1 submitted 26 November, 2024;
originally announced November 2024.
-
Glo-In-One-v2: Holistic Identification of Glomerular Cells, Tissues, and Lesions in Human and Mouse Histopathology
Authors:
Lining Yu,
Mengmeng Yin,
Ruining Deng,
Quan Liu,
Tianyuan Yao,
Can Cui,
Junlin Guo,
Yu Wang,
Yaohong Wang,
Shilin Zhao,
Haichun Yang,
Yuankai Huo
Abstract:
Segmenting glomerular intraglomerular tissue and lesions traditionally depends on detailed morphological evaluations by expert nephropathologists, a labor-intensive process susceptible to interobserver variability. Our group previously developed the Glo-In-One toolkit for integrated detection and segmentation of glomeruli. In this study, we leverage the Glo-In-One toolkit to version 2 with fine-gr…
▽ More
Segmenting glomerular intraglomerular tissue and lesions traditionally depends on detailed morphological evaluations by expert nephropathologists, a labor-intensive process susceptible to interobserver variability. Our group previously developed the Glo-In-One toolkit for integrated detection and segmentation of glomeruli. In this study, we leverage the Glo-In-One toolkit to version 2 with fine-grained segmentation capabilities, curating 14 distinct labels for tissue regions, cells, and lesions across a dataset of 23,529 annotated glomeruli across human and mouse histopathology data. To our knowledge, this dataset is among the largest of its kind to date.In this study, we present a single dynamic head deep learning architecture designed to segment 14 classes within partially labeled images of human and mouse pathology data. Our model was trained using a training set derived from 368 annotated kidney whole-slide images (WSIs) to identify 5 key intraglomerular tissues covering Bowman's capsule, glomerular tuft, mesangium, mesangial cells, and podocytes. Additionally, the network segments 9 glomerular lesion classes including adhesion, capsular drop, global sclerosis, hyalinosis, mesangial lysis, microaneurysm, nodular sclerosis, mesangial expansion, and segmental sclerosis. The glomerulus segmentation model achieved a decent performance compared with baselines, and achieved a 76.5 % average Dice Similarity Coefficient (DSC). Additional, transfer learning from rodent to human for glomerular lesion segmentation model has enhanced the average segmentation accuracy across different types of lesions by more than 3 %, as measured by Dice scores. The Glo-In-One-v2 model and trained weight have been made publicly available at https: //github.com/hrlblab/Glo-In-One_v2.
△ Less
Submitted 25 November, 2024;
originally announced November 2024.