Showing 1–50 of 156 results for author: Ren, M

Searching in archive cs.
  1. arXiv:2507.06739  [pdf, ps, other]

    cs.CV

    PromptTea: Let Prompts Tell TeaCache the Optimal Threshold

    Authors: Zishen Huang, Chunyu Yang, Mengyuan Ren

    Abstract: Despite recent progress in video generation, inference speed remains a major bottleneck. A common acceleration strategy involves reusing model outputs via caching mechanisms at fixed intervals. However, we find that such fixed-frequency reuse significantly degrades quality in complex scenes, while manually tuning reuse thresholds is inefficient and lacks robustness. To address this, we propose Pro…

    Submitted 9 July, 2025; originally announced July 2025.

  2. arXiv:2507.04221  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Context Tuning for In-Context Optimization

    Authors: Jack Lu, Ryan Teehan, Zhenbang Yang, Mengye Ren

    Abstract: We introduce Context Tuning, a simple and effective method to significantly enhance few-shot adaptation of language models (LLMs) without fine-tuning model parameters. While prompt-based adaptation techniques have demonstrated the effectiveness of lightweight adaptation methods for large language models (LLMs), they typically initialize a trainable prompt or prefix with irrelevant tokens for the t…

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: A short version of this paper has been accepted for publication in the Workshop on Test-Time Adaptation (PUT) at the International Conference on Machine Learning (ICML) 2025

  3. arXiv:2506.14373  [pdf, ps, other]

    cs.CV

    Discrete JEPA: Learning Discrete Token Representations without Reconstruction

    Authors: Junyeob Baek, Hosung Lee, Christopher Hoang, Mengye Ren, Sungjin Ahn

    Abstract: The cornerstone of cognitive intelligence lies in extracting hidden patterns from observations and leveraging these principles to systematically predict future outcomes. However, current image tokenization methods demonstrate significant limitations in tasks requiring symbolic abstraction and logical reasoning capabilities essential for systematic inference. To address this challenge, we propose D…

    Submitted 22 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

  4. arXiv:2506.04377  [pdf, ps, other]

    cs.LG

    Replay Can Provably Increase Forgetting

    Authors: Yasaman Mahdaviyeh, James Lucas, Mengye Ren, Andreas S. Tolias, Richard Zemel, Toniann Pitassi

    Abstract: Continual learning seeks to enable machine learning systems to solve an increasing corpus of tasks sequentially. A critical challenge for continual learning is forgetting, where the performance on previously learned tasks decreases as new tasks are introduced. One of the commonly used techniques to mitigate forgetting, sample replay, has been shown empirically to reduce forgetting by retaining som…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: To appear in the Proceedings of the Conference on Lifelong Learning Agents (CoLLAs) 2025

  5. arXiv:2505.21661  [pdf, ps, other]

    cs.DC cs.PL

    KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads

    Authors: Yue Guan, Yuanwei Fang, Keren Zhou, Corbin Robeck, Manman Ren, Zhongkai Yu, Yufei Ding, Adnan Aziz

    Abstract: In this work, we propose KPerfIR, a novel multilevel compiler-centric infrastructure to enable the development of customizable, extendable, and portable profiling tools tailored for modern artificial intelligence (AI) workloads on modern GPUs. Our approach integrates profiling capabilities directly into the compiler workflow, allowing profiling functionalities to be implemented as compiler passes,…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Accepted to OSDI 2025

  6. arXiv:2505.16512  [pdf, ps, other]

    cs.CV cs.AI

    Beyond Face Swapping: A Diffusion-Based Digital Human Benchmark for Multimodal Deepfake Detection

    Authors: Jiaxin Liu, Jia Wang, Saihui Hou, Min Ren, Huijia Wu, Long Ma, Renwang Pei, Zhaofeng He

    Abstract: In recent years, the explosive advancement of deepfake technology has posed a critical and escalating threat to public security: diffusion-based digital human generation. Unlike traditional face manipulation methods, such models can generate highly realistic videos with consistency via multimodal control signals. Their flexibility and covertness pose severe challenges to existing detection strateg…

    Submitted 3 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

  7. arXiv:2505.08811  [pdf, other]

    cs.CV cs.RO

    TUGS: Physics-based Compact Representation of Underwater Scenes by Tensorized Gaussian

    Authors: Shijie Lian, Ziyi Zhang, Laurence Tianruo Yang, Mengyu Ren, Debin Liu, Hua Li

    Abstract: Underwater 3D scene reconstruction is crucial for underwater robotic perception and navigation. However, the task is significantly challenged by the complex interplay between light propagation, water medium, and object surfaces, with existing methods unable to model their interactions accurately. Additionally, expensive training and rendering costs limit their practical application in underwater ro…

    Submitted 12 May, 2025; originally announced May 2025.

  8. arXiv:2505.01212  [pdf, other]

    cs.CV eess.IV

    High Dynamic Range Novel View Synthesis with Single Exposure

    Authors: Kaixuan Zhang, Hu Wang, Minxian Li, Mingwu Ren, Mao Ye, Xiatian Zhu

    Abstract: High Dynamic Range Novel View Synthesis (HDR-NVS) aims to establish a 3D scene HDR model from Low Dynamic Range (LDR) imagery. Typically, multiple-exposure LDR images are employed to capture a wider range of brightness levels in a scene, as a single LDR image cannot represent both the brightest and darkest regions simultaneously. While effective, this multiple-exposure HDR-NVS approach has signifi…

    Submitted 19 May, 2025; v1 submitted 2 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  9. arXiv:2504.11519  [pdf, other]

    physics.med-ph cs.CV cs.LG

    FACT: Foundation Model for Assessing Cancer Tissue Margins with Mass Spectrometry

    Authors: Mohammad Farahmand, Amoon Jamzad, Fahimeh Fooladgar, Laura Connolly, Martin Kaufmann, Kevin Yi Mi Ren, John Rudan, Doug McKay, Gabor Fichtinger, Parvin Mousavi

    Abstract: Purpose: Accurately classifying tissue margins during cancer surgeries is crucial for ensuring complete tumor removal. Rapid Evaporative Ionization Mass Spectrometry (REIMS), a tool for real-time intraoperative margin assessment, generates spectra that require machine learning models to support clinical decision-making. However, the scarcity of labeled data in surgical contexts presents a signific…

    Submitted 15 April, 2025; originally announced April 2025.

    Journal ref: International Journal of Computer Assisted Radiology and Surgery (2025)

  10. arXiv:2504.10519  [pdf, other]

    cs.AI cs.CL cs.LG cs.MA

    Toward Super Agent System with Hybrid AI Routers

    Authors: Yuhang Yao, Haixin Wang, Yibo Chen, Jiawen Wang, Min Chang Jordan Ren, Bosheng Ding, Salman Avestimehr, Chaoyang He

    Abstract: AI Agents powered by Large Language Models are transforming the world through enormous applications. A super agent has the potential to fulfill diverse user needs, such as summarization, coding, and research, by accurately understanding user intent and leveraging the appropriate tools to solve tasks. However, to make such an agent viable for real-world deployment and accessible at scale, significa…

    Submitted 10 April, 2025; originally announced April 2025.

  11. arXiv:2503.20244  [pdf, other]

    cs.CR

    Software Vulnerability Analysis Across Programming Language and Program Representation Landscapes: A Survey

    Authors: Zhuoyun Qian, Fangtian Zhong, Qin Hu, Yili Jiang, Jiaqi Huang, Mengfei Ren, Jiguo Yu

    Abstract: Modern software systems are developed in diverse programming languages and often harbor critical vulnerabilities that attackers can exploit to compromise security. These vulnerabilities have been actively targeted in real-world attacks, causing substantial harm to users and cyberinfrastructure. Since many of these flaws originate from the code itself, a variety of techniques have been proposed to…

    Submitted 26 March, 2025; originally announced March 2025.

  12. arXiv:2503.18626  [pdf, other]

    cs.CV

    Generative Dataset Distillation using Min-Max Diffusion Model

    Authors: Junqiao Fan, Yunjiao Zhou, Min Chang Jordan Ren, Jianfei Yang

    Abstract: In this paper, we address the problem of generative dataset distillation that utilizes generative models to synthesize images. The generator may produce any number of images under a preserved evaluation time. In this work, we leverage the popular diffusion model as the generator to compute a surrogate dataset, boosted by a min-max loss to control the dataset's diversity and representativeness duri…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: The paper is accepted as the ECCV2024 workshop paper and achieved second place in the generative track of The First Dataset Distillation Challenge of ECCV2024, https://www.dd-challenge.com/#/

    Journal ref: ECCV 2024 Workshop Paper

  13. arXiv:2503.15497  [pdf, other]

    cs.HC cs.AI cs.CY

    The Impact of Big Five Personality Traits on AI Agent Decision-Making in Public Spaces: A Social Simulation Study

    Authors: Mingjun Ren, Wentao Xu

    Abstract: This study investigates how the Big Five personality traits influence decision-making processes in AI agents within public spaces. Using AgentVerse framework and GPT-3.5-turbo, we simulated interactions among 10 AI agents, each embodying different dimensions of the Big Five personality traits, in a classroom environment responding to misinformation. The experiment assessed both public expressions…

    Submitted 15 January, 2025; originally announced March 2025.

  14. arXiv:2503.09634  [pdf, other]

    cs.GR

    Identity Preserving Latent Diffusion for Brain Aging Modeling

    Authors: Gexin Huang, Zhangsihao Yang, Yalin Wang, Guido Gerig, Mengwei Ren, Xiaoxiao Li

    Abstract: Structural and appearance changes in brain imaging over time are crucial indicators of neurodevelopment and neurodegeneration. The rapid advancement of large-scale generative models provides a promising backbone for modeling these complex global and local changes in brain images, such as transforming the age of a source image to a target age. However, current generative models, typically trained o…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: 19 pages, 10 figures

  15. arXiv:2503.01046  [pdf, other]

    physics.optics cs.AI cs.ET

    MAPS: Multi-Fidelity AI-Augmented Photonic Simulation and Inverse Design Infrastructure

    Authors: Pingchuan Ma, Zhengqi Gao, Meng Zhang, Haoyu Yang, Mark Ren, Rena Huang, Duane S. Boning, Jiaqi Gu

    Abstract: Inverse design has emerged as a transformative approach for photonic device optimization, enabling the exploration of high-dimensional, non-intuitive design spaces to create ultra-compact devices and advance photonic integrated circuits (PICs) in computing and interconnects. However, practical challenges, such as suboptimal device performance, limited manufacturability, high sensitivity to variati…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: 6 pages. Accepted to DATE 2025

  16. arXiv:2501.12254  [pdf, ps, other]

    cs.CV cs.LG

    Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos

    Authors: Yanlai Yang, Mengye Ren

    Abstract: Self-supervised learning holds the promise of learning good representations from real-world continuous uncurated data streams. However, most existing works in visual self-supervised learning focus on static images or artificial data streams. Towards exploring a more realistic learning substrate, we investigate streaming self-supervised learning from long-form real-world egocentric video streams. I…

    Submitted 3 July, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

    Comments: Fourth Conference on Lifelong Learning Agents - CoLLAs 2025 (Oral)

  17. arXiv:2501.09756  [pdf, other]

    cs.CV cs.GR

    SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces

    Authors: Sumit Chaturvedi, Mengwei Ren, Yannick Hold-Geoffroy, Jingyuan Liu, Julie Dorsey, Zhixin Shu

    Abstract: We introduce SynthLight, a diffusion model for portrait relighting. Our approach frames image relighting as a re-rendering problem, where pixels are transformed in response to changes in environmental lighting conditions. Using a physically-based rendering engine, we synthesize a dataset to simulate this lighting-conditioned transformation with 3D head assets under varying lighting. We propose two…

    Submitted 16 January, 2025; originally announced January 2025.

    Comments: 27 pages, 25 figures, Project Page https://vrroom.github.io/synthlight/

  18. arXiv:2501.06848  [pdf, other]

    cs.LG cs.CL cs.CV

    A General Framework for Inference-time Scaling and Steering of Diffusion Models

    Authors: Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, Rajesh Ranganath

    Abstract: Diffusion models produce impressive results in modalities ranging from images and video to protein design and text. However, generating samples with user-specified properties remains a challenge. Recent research proposes fine-tuning models to maximize rewards that capture desired properties, but these methods require expensive training and are prone to mode collapse. In this work, we propose Feynm…

    Submitted 15 January, 2025; v1 submitted 12 January, 2025; originally announced January 2025.

  19. arXiv:2501.02869  [pdf, other]

    cs.CL cs.AI

    IIMedGPT: Promoting Large Language Model Capabilities of Medical Tasks by Efficient Human Preference Alignment

    Authors: Yiming Zhang, Zheng Chang, Wentao Cai, MengXing Ren, Kang Yuan, Yining Sun, Zenghui Ding

    Abstract: Recent research on large language models (LLMs), which are pre-trained on massive general-purpose corpora, has achieved breakthroughs in responding to human queries. However, these methods face challenges including insufficient data to support extensive pre-training and cannot align responses with users' instructions. To address these issues, we introduce a medical instruction dataset, CMed…

    Submitted 6 January, 2025; originally announced January 2025.

  20. arXiv:2412.19102  [pdf, other]

    cs.CL

    "I've Heard of You!": Generate Spoken Named Entity Recognition Data for Unseen Entities

    Authors: Jiawei Yu, Xiang Geng, Yuang Li, Mengxin Ren, Wei Tang, Jiahuan Li, Zhibin Lan, Min Zhang, Hao Yang, Shujian Huang, Jinsong Su

    Abstract: Spoken named entity recognition (NER) aims to identify named entities from speech, playing an important role in speech processing. New named entities appear every day; however, annotating their Spoken NER data is costly. In this paper, we demonstrate that existing Spoken NER systems perform poorly when dealing with previously unseen named entities. To tackle this challenge, we propose a method for…

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: Accepted by ICASSP 2025

  21. arXiv:2412.15373  [pdf, other]

    cs.LG cs.AI stat.ML

    Granger Causality Detection with Kolmogorov-Arnold Networks

    Authors: Hongyu Lin, Mohan Ren, Paolo Barucca, Tomaso Aste

    Abstract: Discovering causal relationships in time series data is central in many scientific areas, ranging from economics to climate science. Granger causality is a powerful tool for causality detection. However, its original formulation is limited by its linear form and only recently nonlinear machine-learning generalizations have been introduced. This study contributes to the definition of neural Granger…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: 8 pages, 2 figures, 2 tables

  22. arXiv:2412.13734  [pdf, other]

    cs.CV

    Text2Relight: Creative Portrait Relighting with Text Guidance

    Authors: Junuk Cha, Mengwei Ren, Krishna Kumar Singh, He Zhang, Yannick Hold-Geoffroy, Seunghyun Yoon, HyunJoon Jung, Jae Shin Yoon, Seungryul Baek

    Abstract: We present a lighting-aware image editing pipeline that, given a portrait image and a text prompt, performs single image relighting. Our model modifies the lighting and color of both the foreground and background to align with the provided text description. The unbounded nature in creativeness of a text allows us to describe the lighting of a scene with any sensory features including temperature,…

    Submitted 18 December, 2024; originally announced December 2024.

  23. arXiv:2412.07298  [pdf, other]

    cs.CL

    The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model

    Authors: Jiawei Chen, Wentao Chen, Jing Su, Jingjing Xu, Hongyu Lin, Mengjie Ren, Yaojie Lu, Xianpei Han, Le Sun

    Abstract: Large language models (LLMs) have shown significant multilingual capabilities. However, the mechanisms underlying the development of these capabilities during pre-training are not well understood. In this paper, we use code LLMs as an experimental platform to explore the evolution of multilingual capabilities in LLMs during the pre-training process. Based on our observations, we propose the Babel…

    Submitted 3 March, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

    Comments: Accepted to ICLR 2025

  24. arXiv:2411.18941  [pdf, other]

    cs.CV

    Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition

    Authors: Hongda Liu, Yunfan Liu, Min Ren, Hao Wang, Yunlong Wang, Zhenan Sun

    Abstract: In skeleton-based action recognition, a key challenge is distinguishing between actions with similar trajectories of joints due to the lack of image-level details in skeletal representations. Recognizing that the differentiation of similar actions relies on subtle motion details in specific body parts, we direct our approach to focus on the fine-grained motion of local skeleton components. To this…

    Submitted 20 March, 2025; v1 submitted 28 November, 2024; originally announced November 2024.

    Comments: Accepted by CVPR 2025

  25. arXiv:2411.17864  [pdf, other]

    cs.CV

    Generative Image Layer Decomposition with Visual Effects

    Authors: Jinrui Yang, Qing Liu, Yijun Li, Soo Ye Kim, Daniil Pakhomov, Mengwei Ren, Jianming Zhang, Zhe Lin, Cihang Xie, Yuyin Zhou

    Abstract: Recent advancements in large generative models, particularly diffusion-based methods, have significantly enhanced the capabilities of image editing. However, achieving precise control over image composition tasks remains a challenge. Layered representations, which allow for independent editing of image components, are essential for user-driven content creation, yet existing approaches often strugg…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: The project page: https://rayjryang.github.io/LayerDecomp

  26. arXiv:2411.14384  [pdf, ps, other]

    cs.CV cs.GR

    Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction

    Authors: Yuanhao Cai, He Zhang, Kai Zhang, Yixun Liang, Mengwei Ren, Fujun Luan, Qing Liu, Soo Ye Kim, Jianming Zhang, Zhifei Zhang, Yuqian Zhou, Yulun Zhang, Xiaokang Yang, Zhe Lin, Alan Yuille

    Abstract: Existing feedforward image-to-3D methods mainly rely on 2D multi-view diffusion models that cannot guarantee 3D consistency. These methods easily collapse when changing the prompt view direction and mainly handle object-centric cases. In this paper, we propose a novel single-stage 3D diffusion model, DiffusionGS, for object generation and scene reconstruction from a single view. DiffusionGS direct…

    Submitted 26 June, 2025; v1 submitted 21 November, 2024; originally announced November 2024.

    Comments: ICCV 2025; A novel one-stage 3DGS-based diffusion for 3D object generation and scene reconstruction from a single view in ~6 seconds

  27. arXiv:2411.08324  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle

    Authors: Hui Dai, Ryan Teehan, Mengye Ren

    Abstract: Many existing evaluation benchmarks for Large Language Models (LLMs) quickly become outdated due to the emergence of new models and training data. These benchmarks also fall short in assessing how LLM performance changes over time, as they consist of a static set of questions without a temporal dimension. To address these limitations, we propose using future event prediction as a continuous evalua…

    Submitted 8 July, 2025; v1 submitted 12 November, 2024; originally announced November 2024.

    Comments: ICML 2025

  28. arXiv:2411.05676  [pdf, other]

    cs.LG cs.AI

    Improving Molecular Graph Generation with Flow Matching and Optimal Transport

    Authors: Xiaoyang Hou, Tian Zhu, Milong Ren, Dongbo Bu, Xin Gao, Chunming Zhang, Shiwei Sun

    Abstract: Generating molecular graphs is crucial in drug design and discovery but remains challenging due to the complex interdependencies between nodes and edges. While diffusion models have demonstrated their potentiality in molecular graph design, they often suffer from unstable training and inefficient sampling. To enhance generation performance and training stability, we propose GGFlow, a discrete flow…

    Submitted 8 November, 2024; originally announced November 2024.

  29. arXiv:2411.02445  [pdf, other]

    cs.CV

    WiCV@CVPR2024: The Thirteenth Women In Computer Vision Workshop at the Annual CVPR Conference

    Authors: Asra Aslam, Sachini Herath, Ziqi Huang, Estefania Talavera, Deblina Bhattacharjee, Himangi Mittal, Vanessa Staderini, Mengwei Ren, Azade Farshad

    Abstract: In this paper, we present the details of Women in Computer Vision Workshop - WiCV 2024, organized alongside the CVPR 2024 in Seattle, Washington, United States. WiCV aims to amplify the voices of underrepresented women in the computer vision community, fostering increased visibility in both academia and industry. We believe that such events play a vital role in addressing gender imbalances within…

    Submitted 2 November, 2024; originally announced November 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2309.12768

  30. arXiv:2411.02372  [pdf, other]

    cs.CV cs.LG

    Learning General-Purpose Biomedical Volume Representations using Randomized Synthesis

    Authors: Neel Dey, Benjamin Billot, Hallee E. Wong, Clinton J. Wang, Mengwei Ren, P. Ellen Grant, Adrian V. Dalca, Polina Golland

    Abstract: Current volumetric biomedical foundation models struggle to generalize as public 3D datasets are small and do not cover the broad diversity of medical procedures, conditions, anatomical regions, and imaging protocols. We address this by creating a representation learning method that instead anticipates strong domain shifts at training time itself. We first propose a data engine that synthesizes hi…

    Submitted 2 March, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

    Comments: ICLR 2025: International Conference on Learning Representations. Code and model weights available at https://github.com/neel-dey/anatomix. Keywords: synthetic data, representation learning, medical image analysis, image registration, image segmentation

  31. arXiv:2410.12896  [pdf, other]

    cs.CL

    A Survey on Data Synthesis and Augmentation for Large Language Models

    Authors: Ke Wang, Jiahui Zhu, Minjie Ren, Zeming Liu, Shiwei Li, Zongye Zhang, Chenkai Zhang, Xiaoyu Wu, Qiqi Zhan, Qingjie Liu, Yunhong Wang

    Abstract: The success of Large Language Models (LLMs) is inherently linked to the availability of vast, diverse, and high-quality data for training and evaluation. However, the growth rate of high-quality data is significantly outpaced by the expansion of training datasets, leading to a looming data exhaustion crisis. This underscores the urgent need to enhance data efficiency and explore new data sources.…

    Submitted 16 October, 2024; originally announced October 2024.

  32. arXiv:2410.05525  [pdf, other]

    cs.CV

    Generative Portrait Shadow Removal

    Authors: Jae Shin Yoon, Zhixin Shu, Mengwei Ren, Xuaner Zhang, Yannick Hold-Geoffroy, Krishna Kumar Singh, He Zhang

    Abstract: We introduce a high-fidelity portrait shadow removal model that can effectively enhance the image of a portrait by predicting its appearance under disturbing shadows and highlights. Portrait shadow removal is a highly ill-posed problem where multiple plausible solutions can be found based on a single image. While existing works have solved this problem by predicting the appearance residuals that c…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: 17 pages, SIGGRAPH Asia, TOG

  33. arXiv:2410.02309  [pdf, other]

    cs.CV

    Decoupling Layout from Glyph in Online Chinese Handwriting Generation

    Authors: Min-Si Ren, Yan-Ming Zhang, Yi Chen

    Abstract: Text plays a crucial role in the transmission of human civilization, and teaching machines to generate online handwritten text in various styles presents an interesting and significant challenge. However, most prior work has concentrated on generating individual Chinese fonts, leaving complete text line generation largely unexplored. In this paper, we identify that text lines can naturally be di…

    Submitted 24 February, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: Accepted by ICLR2025

  34. arXiv:2409.12311  [pdf, other]

    cs.RO eess.SY

    Towards Closing the Loop in Robotic Pollination for Indoor Farming via Autonomous Microscopic Inspection

    Authors: Chuizheng Kong, Alex Qiu, Idris Wibowo, Marvin Ren, Aishik Dhori, Kai-Shu Ling, Ai-Ping Hu, Shreyas Kousik

    Abstract: Effective pollination is a key challenge for indoor farming, since bees struggle to navigate without the sun. While a variety of robotic system solutions have been proposed, it remains difficult to autonomously check that a flower has been sufficiently pollinated to produce high-quality fruit, which is especially critical for self-pollinating crops such as strawberries. To this end, this work prop…

    Submitted 18 September, 2024; originally announced September 2024.

  35. arXiv:2409.09715  [pdf, ps, other]

    cs.IT cs.GT

    Generative Semantic Communication via Textual Prompts: Latency Performance Tradeoffs

    Authors: Mengmeng Ren, Li Qiao, Long Yang, Zhen Gao, Jian Chen, Mahdi Boloursaz Mashhadi, Pei Xiao, Rahim Tafazolli, Mehdi Bennis

    Abstract: This paper develops an edge-device collaborative Generative Semantic Communications (Gen SemCom) framework leveraging pre-trained Multi-modal/Vision Language Models (M/VLMs) for ultra-low-rate semantic communication via textual prompts. The proposed framework optimizes the use of M/VLMs on the wireless edge/device to generate high-fidelity textual prompts through visual captioning/question answeri…

    Submitted 2 May, 2025; v1 submitted 15 September, 2024; originally announced September 2024.

    Comments: Accepted by IEEE Transactions on Vehicular Technology

  36. arXiv:2408.11208  [pdf, other]

    cs.CV cs.LG

    PooDLe: Pooled and dense self-supervised learning from naturalistic videos

    Authors: Alex N. Wang, Christopher Hoang, Yuwen Xiong, Yann LeCun, Mengye Ren

    Abstract: Self-supervised learning has driven significant progress in learning from single-subject, iconic images. However, there are still unanswered questions about the use of minimally-curated, naturalistic video data, which contain dense scenes with many independent objects, imbalanced class distributions, and varying object sizes. In this paper, we propose PooDLe, a self-supervised learning method that…

    Submitted 23 April, 2025; v1 submitted 20 August, 2024; originally announced August 2024.

    Comments: Project page: https://agenticlearning.ai/poodle/

  37. arXiv:2408.03281  [pdf, other]

    cs.CL cs.AI cs.LG

    StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation

    Authors: Boxi Cao, Mengjie Ren, Hongyu Lin, Xianpei Han, Feng Zhang, Junfeng Zhan, Le Sun

    Abstract: Evaluation is the baton for the development of large language models. Current evaluations typically employ a single-item assessment paradigm for each atomic test objective, which struggles to discern whether a model genuinely possesses the required capabilities or merely memorizes/guesses the answers to specific questions. To this end, we propose a novel evaluation framework referred to as StructE…

    Submitted 6 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: ACL 2024;Benchmark at https://github.com/c-box/StructEval ;Leaderboard at https://huggingface.co/spaces/Bowieee/StructEval_leaderboard

  38. arXiv:2408.02226  [pdf, other]

    cs.CV

    ProCreate, Don't Reproduce! Propulsive Energy Diffusion for Creative Generation

    Authors: Jack Lu, Ryan Teehan, Mengye Ren

    Abstract: In this paper, we propose ProCreate, a simple and easy-to-implement method to improve sample diversity and creativity of diffusion-based image generative models and to prevent training data reproduction. ProCreate operates on a set of reference images and actively propels the generated image embedding away from the reference embeddings during the generation process. We propose FSCG-8 (Few-Shot Cre…

    Submitted 6 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024. Project page: https://procreate-diffusion.github.io

  39. arXiv:2407.20741  [pdf, other]

    cs.LG math.DS math.NA

    Improving PINNs By Algebraic Inclusion of Boundary and Initial Conditions

    Authors: Mohan Ren, Zhihao Fang, Keren Li, Anirbit Mukherjee

    Abstract: "AI for Science" aims to solve fundamental scientific problems using AI techniques. As most physical phenomena can be described as Partial Differential Equations (PDEs), approximating their solutions using neural networks has evolved as a central component of scientific-ML. Physics-Informed Neural Networks (PINNs) is the general method that has evolved for this task but its training is well-known…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 48 Pages, 25 Figures

  40. arXiv:2407.16985  [pdf, ps, other]

    cs.LG

    Orientation-Aware Sparse Tensor PCA for Efficient Unsupervised Feature Selection

    Authors: Junjing Zheng, Xinyu Zhang, Weidong Jiang, Xiangfeng Qiu, Mingjian Ren

    Abstract: Recently, introducing Tensor Decomposition (TD) techniques into unsupervised feature selection (UFS) has been an emerging research topic. A tensor structure is beneficial for mining the relations between different modes and helps relieve the computation burden. However, while existing methods exploit TD to preserve the data tensor structure, they do not consider the influence of data orientation a…

    Submitted 3 July, 2025; v1 submitted 24 July, 2024; originally announced July 2024.

  41. arXiv:2407.06551  [pdf, other]

    cs.CL

    OffsetBias: Leveraging Debiased Data for Tuning Evaluators

    Authors: Junsoo Park, Seungyeon Jwa, Meiying Ren, Daeyoung Kim, Sanghyuk Choi

    Abstract: Employing Large Language Models (LLMs) to assess the quality of generated responses, such as prompting instruct-tuned models or fine-tuning judge models, has become a widely adopted evaluation method. It is also known that such evaluators are vulnerable to biases, such as favoring longer responses. While it is important to overcome this problem, the specifics of these biases remain under-explored.…

    Submitted 7 October, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: EMNLP2024 Findings

  42. arXiv:2407.00322  [pdf]

    cs.CL

    LLM-Generated Natural Language Meets Scaling Laws: New Explorations and Data Augmentation Methods

    Authors: Zhenhua Wang, Guang Xu, Ming Ren

    Abstract: With the ascent of large language models (LLM), natural language processing has witnessed enhancements, such as LLM-based data augmentation. Nonetheless, prior research harbors two primary concerns: firstly, a lack of contemplation regarding whether the natural language generated by LLM (LLMNL) truly aligns with human natural language (HNL), a critical foundational question; secondly, an oversight…

    Submitted 29 June, 2024; originally announced July 2024.

  43. Artificial Immune System of Secure Face Recognition Against Adversarial Attacks

    Authors: Min Ren, Yunlong Wang, Yuhao Zhu, Yongzhen Huang, Zhenan Sun, Qi Li, Tieniu Tan

    Abstract: Insect production for food and feed presents a promising supplement to ensure food safety and address the adverse impacts of agriculture on climate and environment in the future. However, optimisation is required for insect production to realise its full potential. This can be by targeted improvement of traits of interest through selective breeding, an approach which has so far been underexplored…

    Submitted 26 June, 2024; originally announced June 2024.

    Journal ref: International Journal of Computer Vision (IJCV), 2024

  44. arXiv:2406.08980  [pdf, other]

    q-bio.BM cs.LG

    From Theory to Therapy: Reframing SBDD Model Evaluation via Practical Metrics

    Authors: Bowen Gao, Haichuan Tan, Yanwen Huang, Minsi Ren, Xiao Huang, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

    Abstract: Recent advancements in structure-based drug design (SBDD) have significantly enhanced the efficiency and precision of drug discovery by generating molecules tailored to bind specific protein pockets. Despite these technological strides, their practical application in real-world drug development remains challenging due to the complexities of synthesizing and testing these molecules. The reliability…

    Submitted 13 June, 2024; originally announced June 2024.

  45. arXiv:2406.05613  [pdf, other]

    cs.RO

    Distributed Motion Control of Multiple Mobile Manipulators for Reducing Interaction Wrench in Object Manipulation

    Authors: Wenhang Liu, Meng Ren, Kun Song, Gaoming Chen, Michael Yu Wang, Zhenhua Xiong

    Abstract: In real-world cooperative manipulation of objects, multiple mobile manipulator systems may suffer from disturbances and asynchrony, leading to excessive interaction wrenches and potentially causing object damage or emergency stops. Existing methods often rely on torque control and dynamic models, which are uncommon in many industrial robots and settings. Additionally, dynamic models often neglect…

    Submitted 7 April, 2025; v1 submitted 8 June, 2024; originally announced June 2024.

  46. arXiv:2406.01252  [pdf, other]

    cs.CL cs.AI stat.ML

    Towards Scalable Automated Alignment of LLMs: A Survey

    Authors: Boxi Cao, Keming Lu, Xinyu Lu, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, Bowen Yu

    Abstract: Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human-annotation are increasingly unable to meet the scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approach…

    Submitted 3 September, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Paper List: https://github.com/cascip/awesome-auto-alignment

  47. arXiv:2405.03178  [pdf, other]

    cs.SD eess.AS

    POPDG: Popular 3D Dance Generation with PopDanceSet

    Authors: Zhenye Luo, Min Ren, Xuecai Hu, Yongzhen Huang, Li Yao

    Abstract: Generating dances that are both lifelike and well-aligned with music continues to be a challenging task in the cross-modal domain. This paper introduces PopDanceSet, the first dataset tailored to the preferences of young audiences, enabling the generation of aesthetically oriented dances. And it surpasses the AIST++ dataset in music genre diversity and the intricacy and depth of dance movements. M…

    Submitted 27 December, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  48. arXiv:2404.19132  [pdf, other]

    cs.LG cs.CV

    Integrating Present and Past in Unsupervised Continual Learning

    Authors: Yipeng Zhang, Laurent Charlin, Richard Zemel, Mengye Ren

    Abstract: We formulate a unifying framework for unsupervised continual learning (UCL), which disentangles learning objectives that are specific to the present and the past data, encompassing stability, plasticity, and cross-task consolidation. The framework reveals that many existing UCL approaches overlook cross-task consolidation and try to balance plasticity and stability in a shared embedding space. Thi…

    Submitted 12 August, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: CoLLAs 2024 (Oral)

  49. arXiv:2404.04904  [pdf, other]

    cs.SD cs.AI eess.AS

    Cross-Domain Audio Deepfake Detection: Dataset and Analysis

    Authors: Yuang Li, Min Zhang, Mengxin Ren, Miaomiao Ma, Daimeng Wei, Hao Yang

    Abstract: Audio deepfake detection (ADD) is essential for preventing the misuse of synthetic voices that may infringe on personal rights and privacy. Recent zero-shot text-to-speech (TTS) models pose higher risks as they can clone voices with a single utterance. However, the existing ADD datasets are outdated, leading to suboptimal generalization of detection models. In this paper, we construct a new cross-…

    Submitted 20 September, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

  50. arXiv:2403.15362  [pdf, other]

    cs.CL cs.AI

    CoLLEGe: Concept Embedding Generation for Large Language Models

    Authors: Ryan Teehan, Brenden Lake, Mengye Ren

    Abstract: Current language models are unable to quickly learn new concepts on the fly, often requiring a more involved finetuning process to learn robustly. Prompting in-context is not robust to context distractions, and often fails to confer much information about the new concepts. Classic methods for few-shot word learning in NLP, relying on global word vectors, are less applicable to large language model…

    Submitted 16 October, 2024; v1 submitted 22 March, 2024; originally announced March 2024.