Showing 1–50 of 129 results for author: Ni, B

Searching in archive cs.
  1. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3278 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  2. arXiv:2506.13233  [pdf, ps, other]

    cs.CV

    High-Quality Facial Albedo Generation for 3D Face Reconstruction from a Single Image using a Coarse-to-Fine Approach

    Authors: Jiashu Dai, Along Wang, Binfan Ni, Tao Cao

    Abstract: Facial texture generation is crucial for high-fidelity 3D face reconstruction from a single image. However, existing methods struggle to generate UV albedo maps with high-frequency details. To address this challenge, we propose a novel end-to-end coarse-to-fine approach for UV albedo map generation. Our method first utilizes a UV Albedo Parametric Model (UVAPM), driven by low-dimensional coefficie…

    Submitted 16 June, 2025; originally announced June 2025.

  3. arXiv:2505.15431  [pdf, ps, other]

    cs.CL

    Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

    Authors: Tencent Hunyuan Team, Ao Liu, Botong Zhou, Can Xu, Chayse Zhou, ChenChen Zhang, Chengcheng Xu, Chenhao Wang, Decheng Wu, Dengpeng Wu, Dian Jiao, Dong Du, Dong Wang, Feng Zhang, Fengzong Lian, Guanghui Xu, Guanwei Zhang, Hai Wang, Haipeng Luo, Han Hu, Huilin Xu, Jiajia Wu, Jianchen Zhu, Jianfeng Yan, Jiaqi Zhu, et al. (230 additional authors not shown)

    Abstract: As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid response…

    Submitted 4 July, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  4. arXiv:2505.14106  [pdf, ps, other]

    cs.CL cs.AI

    A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations

    Authors: Li Li, Peilin Cai, Ryan A. Rossi, Franck Dernoncourt, Branislav Kveton, Junda Wu, Tong Yu, Linxin Song, Tiankai Yang, Yuehan Qin, Nesreen K. Ahmed, Samyadeep Basu, Subhojyoti Mukherjee, Ruiyi Zhang, Zhengmian Hu, Bo Ni, Yuxiao Zhou, Zichao Wang, Yue Huang, Yu Wang, Xiangliang Zhang, Philip S. Yu, Xiyang Hu, Yue Zhao

    Abstract: We present PersonaConvBench, a large-scale benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). Unlike existing work that focuses on either personalization or conversational structure in isolation, PersonaConvBench integrates both, offering three core tasks: sentence classification, impact regression, and user-centric text ge…

    Submitted 25 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  5. arXiv:2505.11216  [pdf, ps, other]

    cs.CV

    GeoMM: On Geodesic Perspective for Multi-modal Learning

    Authors: Shibin Mei, Hang Wang, Bingbing Ni

    Abstract: Geodesic distance serves as a reliable means of measuring distance in nonlinear spaces, and such nonlinear manifolds are prevalent in the current multimodal learning. In these scenarios, some samples may exhibit high similarity, yet they convey different semantics, making traditional distance metrics inadequate for distinguishing between positive and negative samples. This paper introduces geodesi…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 15 pages, 3 figures, accepted by CVPR2025

  6. arXiv:2505.02018  [pdf, ps, other]

    cs.CV

    R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation

    Authors: Meng-Hao Guo, Jiajun Xu, Yi Zhang, Jiaxi Song, Haoyang Peng, Yi-Xuan Deng, Xinzhi Dong, Kiyohiro Nakayama, Zhengyang Geng, Chen Wang, Bolin Ni, Guo-Wei Yang, Yongming Rao, Houwen Peng, Han Hu, Gordon Wetzstein, Shi-min Hu

    Abstract: Reasoning stands as a cornerstone of intelligence, enabling the synthesis of existing knowledge to solve complex problems. Despite remarkable progress, existing reasoning benchmarks often fail to rigorously evaluate the nuanced reasoning capabilities required for complex, real-world problem-solving, particularly in multi-disciplinary and multimodal contexts. In this paper, we introduce a graduate-l…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: 18 pages

  7. arXiv:2504.06620  [pdf, other]

    cs.CV

    InstantSticker: Realistic Decal Blending via Disentangled Object Reconstruction

    Authors: Yi Zhang, Xiaoyang Huang, Yishun Dou, Yue Shi, Rui Shi, Ye Chen, Bingbing Ni, Wenjun Zhang

    Abstract: We present InstantSticker, a disentangled reconstruction pipeline based on Image-Based Lighting (IBL), which focuses on highly realistic decal blending, simulates stickers attached to the reconstructed surface, and allows for instant editing and real-time rendering. To achieve stereoscopic impression of the decal, we introduce shadow factor into IBL, which can be adaptively optimized during traini…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: Accepted by AAAI 2025

  8. arXiv:2503.10257  [pdf, other]

    cs.LG

    AMR-Transformer: Enabling Efficient Long-range Interaction for Complex Neural Fluid Simulation

    Authors: Zeyi Xu, Jinfan Liu, Kuangxu Chen, Ye Chen, Zhangli Hu, Bingbing Ni

    Abstract: Accurately and efficiently simulating complex fluid dynamics is a challenging task that has traditionally relied on computationally intensive methods. Neural network-based approaches, such as convolutional and graph neural networks, have partially alleviated this burden by enabling efficient local feature extraction. However, they struggle to capture long-range dependencies due to limited receptiv…

    Submitted 13 March, 2025; originally announced March 2025.

  9. arXiv:2502.16302  [pdf, other]

    cs.CV

    DualNeRF: Text-Driven 3D Scene Editing via Dual-Field Representation

    Authors: Yuxuan Xiong, Yue Shi, Yishun Dou, Bingbing Ni

    Abstract: Recently, denoising diffusion models have achieved promising results in 2D image generation and editing. Instruct-NeRF2NeRF (IN2N) introduces the success of diffusion into 3D scene editing through an "Iterative dataset update" (IDU) strategy. Though achieving fascinating results, IN2N suffers from problems of blurry backgrounds and trapping in local optima. The first problem is caused by IN2N's la…

    Submitted 22 February, 2025; originally announced February 2025.

  10. arXiv:2502.10173  [pdf]

    q-bio.BM cond-mat.mes-hall cond-mat.mtrl-sci cs.LG

    Agentic End-to-End De Novo Protein Design for Tailored Dynamics Using a Language Diffusion Model

    Authors: Bo Ni, Markus J. Buehler

    Abstract: Proteins are dynamic molecular machines whose biological functions, spanning enzymatic catalysis, signal transduction, and structural adaptation, are intrinsically linked to their motions. Designing proteins with targeted dynamic properties, however, remains a challenge due to the complex, degenerate relationships between sequence, structure, and molecular motion. Here, we introduce VibeGen, a gen…

    Submitted 14 February, 2025; originally announced February 2025.

  11. arXiv:2502.06872  [pdf, other]

    cs.CL cs.AI

    Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey

    Authors: Bo Ni, Zheyuan Liu, Leyao Wang, Yongjia Lei, Yuying Zhao, Xueqi Cheng, Qingkai Zeng, Luna Dong, Yinglong Xia, Krishnaram Kenthapadi, Ryan Rossi, Franck Dernoncourt, Md Mehrab Tanjim, Nesreen Ahmed, Xiaorui Liu, Wenqi Fan, Erik Blasch, Yu Wang, Meng Jiang, Tyler Derr

    Abstract: Retrieval-Augmented Generation (RAG) is an advanced technique designed to address the challenges of Artificial Intelligence-Generated Content (AIGC). By integrating context retrieval into content generation, RAG provides reliable and up-to-date external knowledge, reduces hallucinations, and ensures relevant context across a wide range of tasks. However, despite RAG's success and potential, recent…

    Submitted 8 February, 2025; originally announced February 2025.

  12. arXiv:2501.09705  [pdf, other]

    cs.CV cs.AI cs.LG

    Practical Continual Forgetting for Pre-trained Vision Models

    Authors: Hongbo Zhao, Fei Zhu, Bolin Ni, Feng Zhu, Gaofeng Meng, Zhaoxiang Zhang

    Abstract: For privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming evident nowadays. In real-world scenarios, erasure requests originate at any time from both users and model owners, and these requests usually form a sequence. Therefore, under such a setting, selective information is expected to be continuously removed from a pre-trained model whil…

    Submitted 16 January, 2025; originally announced January 2025.

  13. arXiv:2411.19528  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation

    Authors: Xianfeng Tan, Yuhan Li, Wenxiang Shang, Yubo Wu, Jian Wang, Xuanhong Chen, Yi Zhang, Ran Lin, Bingbing Ni

    Abstract: Standard clothing asset generation involves creating forward-facing flat-lay garment images displayed on a clear background by extracting clothing information from diverse real-world contexts, which presents significant challenges due to highly standardized sampling distributions and precise structural requirements in the generated images. Existing models have limited spatial perception and often…

    Submitted 29 November, 2024; originally announced November 2024.

    Comments: Project website: https://colorful-liyu.github.io/RAGDiffusion-page/

  14. arXiv:2411.11562  [pdf, other]

    cs.CV eess.IV

    MSSIDD: A Benchmark for Multi-Sensor Denoising

    Authors: Shibin Mei, Hang Wang, Bingbing Ni

    Abstract: The cameras equipped on mobile terminals employ different sensors in different photograph modes, and the transferability of raw domain denoising models between these sensors is significant but still lacks sufficient exploration. Industrial solutions either develop distinct training strategies and models for different sensors or ignore the differences between sensors and simply extend existing models t…

    Submitted 18 November, 2024; originally announced November 2024.

    Comments: 15 pages, 7 figures

  15. arXiv:2410.16882  [pdf, other]

    cs.AI cs.LG cs.SI

    SaVe-TAG: Semantic-aware Vicinal Risk Minimization for Long-Tailed Text-Attributed Graphs

    Authors: Leyao Wang, Yu Wang, Bo Ni, Yuying Zhao, Hanyu Wang, Yao Ma, Tyler Derr

    Abstract: Real-world graph data often follows long-tailed distributions, making it difficult for Graph Neural Networks (GNNs) to generalize well across both head and tail classes. Recent advances in Vicinal Risk Minimization (VRM) have shown promise in mitigating class imbalance with numeric interpolation; however, existing approaches largely rely on embedding-space arithmetic, which fails to capture the ri…

    Submitted 25 May, 2025; v1 submitted 22 October, 2024; originally announced October 2024.

    Comments: 25 pages

  16. arXiv:2410.08985  [pdf, other]

    cs.AI cs.CL

    Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective

    Authors: Bo Ni, Yu Wang, Lu Cheng, Erik Blasch, Tyler Derr

    Abstract: Recently, Knowledge Graphs (KGs) have been successfully coupled with Large Language Models (LLMs) to mitigate their hallucinations and enhance their reasoning capability, such as in KG-based retrieval-augmented frameworks. However, current KG-LLM frameworks lack rigorous uncertainty estimation, limiting their reliable deployment in high-stakes applications. Directly incorporating uncertainty quant…

    Submitted 20 October, 2024; v1 submitted 11 October, 2024; originally announced October 2024.

  17. arXiv:2410.07690  [pdf, other]

    cs.GT

    Stackelberg vs. Nash in the Lottery Colonel Blotto Game

    Authors: Yan Liu, Bonan Ni, Weiran Shen, Zihe Wang, Jie Zhang

    Abstract: Resource competition problems are often modeled using Colonel Blotto games, where players take simultaneous actions. However, many real-world scenarios involve sequential decision-making rather than simultaneous moves. To model these dynamics, we represent the Lottery Colonel Blotto game as a Stackelberg game, in which one player, the leader, commits to a strategy first, and the other player, th…

    Submitted 10 May, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

  18. arXiv:2408.06286  [pdf, other]

    cs.CV

    Mipmap-GS: Let Gaussians Deform with Scale-specific Mipmap for Anti-aliasing Rendering

    Authors: Jiameng Li, Yue Shi, Jiezhang Cao, Bingbing Ni, Wenjun Zhang, Kai Zhang, Luc Van Gool

    Abstract: 3D Gaussian Splatting (3DGS) has attracted great attention in novel view synthesis because of its superior rendering efficiency and high fidelity. However, the trained Gaussians suffer from severe zooming degradation due to non-adjustable representation derived from single-scale training. Though some methods attempt to tackle this problem via post-processing techniques such as selective rendering…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 9 pages

  19. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, et al. (536 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 23 November, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  20. arXiv:2407.11468  [pdf, other]

    cs.CV

    AU-vMAE: Knowledge-Guide Action Units Detection via Video Masked Autoencoder

    Authors: Qiaoqiao Jin, Rui Shi, Yishun Dou, Bingbing Ni

    Abstract: Current Facial Action Unit (FAU) detection methods generally encounter difficulties due to the scarcity of labeled video training data and the limited number of training face IDs, which renders the trained feature extractor insufficient coverage for modeling the large diversity of inter-person facial structures and movements. To explicitly address the above challenges, we propose a novel video-lev…

    Submitted 16 July, 2024; originally announced July 2024.

  21. arXiv:2407.00315  [pdf, other]

    cs.CV

    Learning Unsupervised Gaze Representation via Eye Mask Driven Information Bottleneck

    Authors: Yangzhou Jiang, Yinxin Lin, Yaoming Wang, Teng Li, Bilian Ke, Bingbing Ni

    Abstract: Appearance-based supervised methods with full-face image input have made tremendous advances in recent gaze estimation tasks. However, intensive human annotation requirement inhibits current methods from achieving industrial level accuracy and robustness. Although current unsupervised pre-training frameworks have achieved success in many image recognition tasks, due to the deep coupling between fa…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 12 pages, 6 figures, 7 tables

  22. arXiv:2406.11409  [pdf, other]

    cs.CL cs.AI

    CodeGemma: Open Code Models Based on Gemma

    Authors: CodeGemma Team, Heri Zhao, Jeffrey Hui, Joshua Howland, Nam Nguyen, Siqi Zuo, Andrea Hu, Christopher A. Choquette-Choo, Jingyue Shen, Joe Kelley, Kshitij Bansal, Luke Vilnis, Mateo Wirth, Paul Michel, Peter Choy, Pratik Joshi, Ravin Kumar, Sarmad Hashmi, Shubham Agrawal, Zhitao Gong, Jane Fine, Tris Warkentin, Ale Jakse Hartman, Bin Ni, Kathy Korevec, et al. (2 additional authors not shown)

    Abstract: This paper introduces CodeGemma, a collection of specialized open code models built on top of Gemma, capable of a variety of code and natural language generation tasks. We release three model variants. CodeGemma 7B pretrained (PT) and instruction-tuned (IT) variants have remarkably resilient natural language understanding, excel in mathematical reasoning, and match code capabilities of other open…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: v1: 11 pages, 4 figures, 5 tables. v2: Update metadata

  23. arXiv:2405.20335  [pdf, other]

    cs.CL

    Xwin-LM: Strong and Scalable Alignment Practice for LLMs

    Authors: Bolin Ni, JingCheng Hu, Yixuan Wei, Houwen Peng, Zheng Zhang, Gaofeng Meng, Han Hu

    Abstract: In this work, we present Xwin-LM, a comprehensive suite of alignment methodologies for large language models (LLMs). This suite encompasses several key techniques, including supervised finetuning (SFT), reward modeling (RM), rejection sampling finetuning (RS), and direct preference optimization (DPO). The key components are as follows: (1) Xwin-LM-SFT, models initially finetuned with high-quality…

    Submitted 30 May, 2024; originally announced May 2024.

  24. arXiv:2405.18172  [pdf, other]

    cs.CV cs.AI cs.LG

    AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario

    Authors: Yuhan Li, Hao Zhou, Wenxiang Shang, Ran Lin, Xuanhong Chen, Bingbing Ni

    Abstract: While image-based virtual try-on has made significant strides, emerging approaches still fall short of delivering high-fidelity and robust fitting images across various scenarios, as their models suffer from issues of ill-fitted garment styles and quality degrading during the training process, not to mention the lack of support for various combinations of attire. Therefore, we first propose a ligh…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Project website: https://colorful-liyu.github.io/anyfit-page/

  25. arXiv:2405.17602  [pdf, other]

    cs.IR

    Augmenting Textual Generation via Topology Aware Retrieval

    Authors: Yu Wang, Nedim Lipka, Ruiyi Zhang, Alexa Siu, Yuying Zhao, Bo Ni, Xin Wang, Ryan Rossi, Tyler Derr

    Abstract: Despite the impressive advancements of Large Language Models (LLMs) in generating text, they are often limited by the knowledge contained in the input and prone to producing inaccurate or hallucinated content. To tackle these issues, Retrieval-augmented Generation (RAG) is employed as an effective strategy to enhance the available knowledge base and anchor the responses in reality by pulling addit…

    Submitted 27 May, 2024; originally announced May 2024.

  26. arXiv:2404.02617  [pdf, other]

    cs.CV

    Neural Radiance Fields with Torch Units

    Authors: Bingnan Ni, Huanyu Wang, Dongfeng Bai, Minghe Weng, Dexin Qi, Weichao Qiu, Bingbing Liu

    Abstract: Neural Radiance Fields (NeRF) give rise to learning-based 3D reconstruction methods widely used in industrial applications. Although prevalent methods achieve considerable improvements in small-scale scenes, accomplishing reconstruction in complex and large-scale scenes is still challenging. First, the background in complex scenes shows a large variance among different views. Second, the current i…

    Submitted 3 April, 2024; originally announced April 2024.

  27. arXiv:2403.16124  [pdf, other]

    cs.CV

    Enhancing Visual Continual Learning with Language-Guided Supervision

    Authors: Bolin Ni, Hongbo Zhao, Chenghao Zhang, Ke Hu, Gaofeng Meng, Zhaoxiang Zhang, Shiming Xiang

    Abstract: Continual learning (CL) aims to empower models to learn new tasks without forgetting previously acquired knowledge. Most prior works concentrate on the techniques of architectures, replay data, regularization, etc. However, the category name of each class is largely neglected. Existing methods commonly utilize the one-hot labels and randomly initialize the classifier head. We argue that the scarc…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  28. arXiv:2403.15033  [pdf, other]

    cs.CV

    Toward Tiny and High-quality Facial Makeup with Data Amplify Learning

    Authors: Qiaoqiao Jin, Xuanhong Chen, Meiguang Jin, Ying Chen, Rui Shi, Yucheng Zheng, Yupeng Zhu, Bingbing Ni

    Abstract: Contemporary makeup approaches primarily hinge on unpaired learning paradigms, yet they grapple with the challenges of inaccurate supervision (e.g., face misalignment) and sophisticated facial prompts (including face parsing, and landmark detection). These challenges prohibit low-cost deployment of facial makeup models, especially on mobile devices. To solve above problems, we propose a brand-new…

    Submitted 25 September, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  29. arXiv:2403.14910  [pdf, other]

    cs.CV

    Defying Imbalanced Forgetting in Class Incremental Learning

    Authors: Shixiong Xu, Gaofeng Meng, Xing Nie, Bolin Ni, Bin Fan, Shiming Xiang

    Abstract: We observe a high level of imbalance in the accuracy of different classes in the same old task for the first time. This intriguing phenomenon, discovered in replay-based Class Incremental Learning (CIL), highlights the imbalanced forgetting of learned classes, as their accuracy is similar before the occurrence of catastrophic forgetting. This discovery remains previously unidentified due to the re…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: AAAI2024

  30. arXiv:2403.11530  [pdf, other]

    cs.CV

    Continual Forgetting for Pre-trained Vision Models

    Authors: Hongbo Zhao, Bolin Ni, Haochen Wang, Junsong Fan, Fei Zhu, Yuxi Wang, Yuntao Chen, Gaofeng Meng, Zhaoxiang Zhang

    Abstract: For privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming evident nowadays. In real-world scenarios, erasure requests originate at any time from both users and model owners. These requests usually form a sequence. Therefore, under such a setting, selective information is expected to be continuously removed from a pre-trained model while ma…

    Submitted 18 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024, latest version ahead of camera-ready version

  31. arXiv:2402.09372  [pdf, other]

    eess.IV cs.AI cs.CV

    Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

    Authors: Jiancheng Yang, Rui Shi, Liang Jin, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, Pengfei Liu, Zhizhong Chai, Yongjie Xiao, Hao Chen, Liming Xu, Bang Du, Xiangyi Yan, Hao Tang, Adam Alessio, Gregory Holste, Jiapeng Zhang, Xiaoming Wang, Jianye He, Lixuan Che, Hanspeter Pfister, Ming Li, Bingbing Ni

    Abstract: Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmar…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Challenge paper for MICCAI RibFrac Challenge (https://ribfrac.grand-challenge.org/)

  32. arXiv:2401.14613  [pdf, other]

    cs.GT

    Multiplayer General Lotto game

    Authors: Yan Liu, Bonan Ni, Weiran Shen, Zihe Wang, Jie Zhang

    Abstract: In this paper, we investigate the multiplayer General Lotto game across multiple battlefields, a significant variant of the Colonel Blotto game. In this version, each player employs a probability distribution for resource allocation, ensuring that their expected expenditure does not exceed their budget. We first establish the existence of the Nash equilibrium in a general setting, where players' b…

    Submitted 16 October, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  33. arXiv:2311.08166  [pdf]

    cs.AI cond-mat.dis-nn cond-mat.mtrl-sci cs.CL cs.LG

    MechAgents: Large language model multi-agent collaborations can solve mechanics problems, generate new data, and integrate knowledge

    Authors: Bo Ni, Markus J. Buehler

    Abstract: Solving mechanics problems using numerical methods requires comprehensive intelligent capability of retrieving relevant knowledge and theory, constructing and executing codes, analyzing the results, a task that has thus far mainly been reserved for humans. While emerging AI methods can provide effective approaches to solve end-to-end problems, for instance via the use of deep surrogate models or v…

    Submitted 14 November, 2023; originally announced November 2023.

  34. arXiv:2310.18313  [pdf, other]

    cs.LG cs.CL

    FP8-LM: Training FP8 Large Language Models

    Authors: Houwen Peng, Kan Wu, Yixuan Wei, Guoshuai Zhao, Yuxiang Yang, Ze Liu, Yifan Xiong, Ziyue Yang, Bolin Ni, Jingcheng Hu, Ruihang Li, Miaosen Zhang, Chen Li, Jia Ning, Ruizhe Wang, Zheng Zhang, Shuguang Liu, Joe Chau, Han Hu, Peng Cheng

    Abstract: In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs). Our key insight is that most variables, such as gradients and optimizer states, in LLM training can employ low-precision data formats without compromising model accuracy and requiring no changes to hyper-parameters. Specifically, we propose a new FP8 automatic mixed-precision framework for tr…

    Submitted 19 December, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

  35. arXiv:2310.10605  [pdf]

    cond-mat.mtrl-sci cond-mat.mes-hall cs.CL cs.LG q-bio.BM

    ForceGen: End-to-end de novo protein generation based on nonlinear mechanical unfolding responses using a protein language diffusion model

    Authors: Bo Ni, David L. Kaplan, Markus J. Buehler

    Abstract: Through evolution, nature has presented a set of remarkable protein materials, including elastins, silks, keratins and collagens with superior mechanical performances that play crucial roles in mechanobiology. However, going beyond natural designs to discover proteins that meet specified mechanical properties remains challenging. Here we report a generative model that predicts protein designs to m…

    Submitted 15 December, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

  36. arXiv:2310.08332  [pdf, other]

    cs.CV

    Real-Time Neural BRDF with Spherically Distributed Primitives

    Authors: Yishun Dou, Zhong Zheng, Qiaoqiao Jin, Bingbing Ni, Yugang Chen, Junxiang Ke

    Abstract: We propose a novel compact and efficient neural BRDF offering highly versatile material representation, yet with very-light memory and neural computation consumption towards achieving real-time rendering. The results in Figure 1, rendered at full HD resolution on a current desktop machine, show that our system achieves real-time rendering with a wide variety of appearances, which is approached by…

    Submitted 12 October, 2023; originally announced October 2023.

  37. arXiv:2308.12530  [pdf, other]

    cs.CV cs.LG

    SieveNet: Selecting Point-Based Features for Mesh Networks

    Authors: Shengchao Yuan, Yishun Dou, Rui Shi, Bingbing Ni, Zhong Zheng

    Abstract: Meshes are widely used in 3D computer vision and graphics, but their irregular topology poses challenges in applying them to existing neural network architectures. Recent advances in mesh neural networks turn to remeshing and push the boundary of pioneer methods that solely take the raw meshes as input. Although the remeshing offers a regular topology that significantly facilitates the design of m…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: The project homepage is https://sievenet.github.io/

  38. arXiv:2308.10608  [pdf, other]

    cs.CV cs.GR cs.LG

    FocalDreamer: Text-driven 3D Editing via Focal-fusion Assembly

    Authors: Yuhan Li, Yishun Dou, Yue Shi, Yu Lei, Xuanhong Chen, Yi Zhang, Peng Zhou, Bingbing Ni

    Abstract: While text-3D editing has made significant strides in leveraging score distillation sampling, emerging approaches still fall short in delivering separable, precise and consistent outcomes that are vital to content creation. In response, we introduce FocalDreamer, a framework that merges base shape with editable parts according to text prompts for fine-grained editing within desired regions. Specif…

    Submitted 21 August, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: Project website: https://focaldreamer.github.io

  39. arXiv:2304.10244  [pdf, other]

    cs.CV

    Omni Aggregation Networks for Lightweight Image Super-Resolution

    Authors: Hang Wang, Xuanhong Chen, Bingbing Ni, Yutian Liu, Jinfan Liu

    Abstract: While lightweight ViT framework has made tremendous progress in image super-resolution, its uni-dimensional self-attention modeling, as well as homogeneous aggregation scheme, limit its effective receptive field (ERF) to include more comprehensive interactions from both spatial and channel dimensions. To tackle these drawbacks, this work proposes two enhanced components under a new Omni-SR archite…

    Submitted 24 April, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR2023. Code is available at https://github.com/Francis0625/Omni-SR

  40. arXiv:2304.04446  [pdf, other]

    cs.CV cs.GR

    Inferring Fluid Dynamics via Inverse Rendering

    Authors: Jinxian Liu, Ye Chen, Bingbing Ni, Jiyao Mao, Zhenbo Yu

    Abstract: Humans have a strong intuitive understanding of physical processes such as fluid falling by just a glimpse of such a scene picture, i.e., quickly derived from our immersive visual experiences in memory. This work achieves such a photo-to-fluid-dynamics reconstruction functionality learned from unannotated videos, without any supervision of ground-truth fluid dynamics. In a nutshell, a differentiab…

    Submitted 10 April, 2023; originally announced April 2023.

  41. arXiv:2303.10619  [pdf, ps, other]

    cs.GT

    Sequential Persuasion Using Limited Experiments

    Authors: Bonan Ni, Weiran Shen, Pingzhong Tang

    Abstract: Bayesian persuasion and its derived information design problem has been one of the main research agendas in the economics and computation literature over the past decade. However, when attempting to apply its model and theory, one is often limited by the fact that the sender can only implement very restricted information structures. Moreover, in this case, the sender can possibly achieve higher ex…

    Submitted 19 March, 2023; originally announced March 2023.

  42. arXiv:2303.10406  [pdf, other]

    cs.CV cs.AI cs.LG

    3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process

    Authors: Yuhan Li, Yishun Dou, Xuanhong Chen, Bingbing Ni, Yilin Sun, Yutian Liu, Fuzhen Wang

    Abstract: We develop a generalized 3D shape generation prior model, tailored for multiple 3D tasks including unconditional shape generation, point cloud completion, and cross-modality shape generation, etc. On one hand, to precisely capture local fine detailed shape information, a vector quantized variational autoencoder (VQ-VAE) is utilized to index local geometry from a compactly learned codebook based on…

    Submitted 18 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  43. arXiv:2303.07596  [pdf, other]

    cs.CV

    Frequency-Modulated Point Cloud Rendering with Easy Editing

    Authors: Yi Zhang, Xiaoyang Huang, Bingbing Ni, Teng Li, Wenjun Zhang

    Abstract: We develop an effective point cloud rendering pipeline for novel view synthesis, which enables high fidelity local detail reconstruction, real-time rendering and user-friendly editing. In the heart of our pipeline is an adaptive frequency modulation module called Adaptive Frequency Net (AFNet), which utilizes a hypernetwork to learn the local texture frequency encoding that is consecutively inject…

    Submitted 18 March, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  44. arXiv:2302.10518  [pdf, other]

    cs.CV

    USR: Unsupervised Separated 3D Garment and Human Reconstruction via Geometry and Semantic Consistency

    Authors: Yue Shi, Yuxuan Xiong, Jingyi Chai, Bingbing Ni, Wenjun Zhang

    Abstract: Dressed people reconstruction from images is a popular task with promising applications in the creative media and game industry. However, most existing methods reconstruct the human body and garments as a whole with the supervision of 3D models, which hinders the downstream interaction tasks and requires hard-to-obtain data. To address these issues, we propose an unsupervised separated 3D garments…

    Submitted 2 March, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

  45. arXiv:2301.12613  [pdf, other]

    cs.CV cs.MM

    AudioEar: Single-View Ear Reconstruction for Personalized Spatial Audio

    Authors: Xiaoyang Huang, Yanjun Wang, Yang Liu, Bingbing Ni, Wenjun Zhang, Jinxian Liu, Teng Li

    Abstract: Spatial audio, which focuses on immersive 3D sound rendering, is widely applied in the acoustic industry. One of the key problems of current spatial audio rendering methods is the lack of personalization based on different anatomies of individuals, which is essential to produce accurate sound source positions. In this work, we address this problem from an interdisciplinary perspective. The renderi…

    Submitted 29 January, 2023; originally announced January 2023.

    Comments: Accepted by Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023)

  46. arXiv:2212.04362  [pdf, other]

    cs.CV

    CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution

    Authors: Jiezhang Cao, Qin Wang, Yongqin Xian, Yawei Li, Bingbing Ni, Zhiming Pi, Kai Zhang, Yulun Zhang, Radu Timofte, Luc Van Gool

    Abstract: Learning continuous image representations is recently gaining popularity for image super-resolution (SR) because of its ability to reconstruct high-resolution images with arbitrary scales from low-resolution inputs. Existing methods mostly ensemble nearby features to predict the new pixel at any queried coordinate in the SR image. Such a local ensemble suffers from some limitations: i) it has no l…

    Submitted 13 April, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: CVPR 2023

  47. arXiv:2212.03499  [pdf, other]

    cs.CV cs.AI

    Learning Continuous Depth Representation via Geometric Spatial Aggregator

    Authors: Xiaohang Wang, Xuanhong Chen, Bingbing Ni, Zhengyan Tong, Hang Wang

    Abstract: Depth map super-resolution (DSR) has been a fundamental task for 3D computer vision. While arbitrary scale DSR is a more realistic setting in this scenario, previous approaches predominantly suffer from the issue of inefficient real-numbered scale upsampling. To explicitly address this issue, we propose a novel continuous depth representation for DSR. The heart of this representation is our propos…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: Accepted to AAAI 2023. Code is available at https://github.com/nana01219/GeoDSR

    ACM Class: I.4

  48. arXiv:2212.02280  [pdf, other]

    cs.CV

    DARF: Depth-Aware Generalizable Neural Radiance Field

    Authors: Yue Shi, Dingyi Rong, Chang Chen, Chaofan Ma, Bingbing Ni, Wenjun Zhang

    Abstract: Neural Radiance Field (NeRF) has revolutionized novel-view rendering tasks and achieved impressive results. However, the inefficient sampling and per-scene optimization hinder its wide applications. Though some generalizable NeRFs have been proposed, the rendering quality is unsatisfactory due to the lack of geometry and scene uniqueness. To address these issues, we propose the Depth-Aware General…

    Submitted 15 February, 2025; v1 submitted 5 December, 2022; originally announced December 2022.

  49. arXiv:2210.15107  [pdf, other]

    cs.CV

    Boosting Point Clouds Rendering via Radiance Mapping

    Authors: Xiaoyang Huang, Yi Zhang, Bingbing Ni, Teng Li, Kai Chen, Wenjun Zhang

    Abstract: Recent years we have witnessed rapid development in NeRF-based image rendering due to its high quality. However, point clouds rendering is somehow less explored. Compared to NeRF-based rendering which suffers from dense spatial sampling, point clouds rendering is naturally less computation intensive, which enables its deployment in mobile computing device. In this work, we focus on boosting the im…

    Submitted 7 December, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted by Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023)

  50. arXiv:2210.09309  [pdf, other]

    eess.IV cs.CV cs.LG

    RibSeg v2: A Large-scale Benchmark for Rib Labeling and Anatomical Centerline Extraction

    Authors: Liang Jin, Shixuan Gu, Donglai Wei, Jason Ken Adhinarta, Kaiming Kuang, Yongjie Jessica Zhang, Hanspeter Pfister, Bingbing Ni, Jiancheng Yang, Ming Li

    Abstract: Automatic rib labeling and anatomical centerline extraction are common prerequisites for various clinical applications. Prior studies either use in-house datasets that are inaccessible to communities, or focus on rib segmentation that neglects the clinical significance of rib labeling. To address these issues, we extend our prior dataset (RibSeg) on the binary rib segmentation task to a comprehens…

    Submitted 1 August, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: 10 pages, 6 figures, journal