Showing 1–29 of 29 results for author: Dia, M

  1. arXiv:2505.20665  [pdf, ps, other]

    cs.CV

    DriveRX: A Vision-Language Reasoning Model for Cross-Task Autonomous Driving

    Authors: Muxi Diao, Lele Yang, Hongbo Yin, Zhexu Wang, Yejie Wang, Daxin Tian, Kongming Liang, Zhanyu Ma

    Abstract: Autonomous driving requires real-time, robust reasoning across perception, prediction, planning, and behavior. However, conventional end-to-end models fail to generalize in complex scenarios due to the lack of structured reasoning. Recent vision-language models (VLMs) have been applied to driving tasks, but they typically rely on isolated modules and static supervision, limiting their ability to s…

    Submitted 26 May, 2025; originally announced May 2025.

  2. arXiv:2505.15172  [pdf, ps, other]

    cs.CV

    Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation

    Authors: Xinran Wang, Muxi Diao, Yuanzhi Liu, Chunyu Wang, Kongming Liang, Zhanyu Ma, Jun Guo

    Abstract: Training text-to-image (T2I) models with detailed captions can significantly improve their generation quality. Existing methods often rely on simplistic metrics like caption length to represent the detailness of the caption in the T2I training set. In this paper, we propose a new metric to estimate caption detailness based on two aspects: image coverage rate (ICR), which evaluates whether the capt…

    Submitted 21 May, 2025; originally announced May 2025.

  3. arXiv:2505.15145  [pdf, ps, other]

    cs.CV

    CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation

    Authors: Xinran Wang, Songyu Xu, Xiangxuan Shan, Yuxuan Zhang, Muxi Diao, Xueyan Duan, Yanhua Huang, Kongming Liang, Zhanyu Ma

    Abstract: Cinematography is a cornerstone of film production and appreciation, shaping mood, emotion, and narrative through visual elements such as camera movement, shot composition, and lighting. Despite recent progress in multimodal large language models (MLLMs) and video generation models, the capacity of current models to grasp and reproduce cinematographic techniques remains largely uncharted, hindered…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Under review

  4. arXiv:2505.03923  [pdf, other]

    cs.LG

    SAND: One-Shot Feature Selection with Additive Noise Distortion

    Authors: Pedram Pad, Hadi Hammoud, Mohamad Dia, Nadim Maamari, L. Andrea Dunbar

    Abstract: Feature selection is a critical step in data-driven applications, reducing input dimensionality to enhance learning accuracy, computational efficiency, and interpretability. Existing state-of-the-art methods often require post-selection retraining and extensive hyperparameter tuning, complicating their adoption. We introduce a novel, non-intrusive feature selection layer that, given a target featu…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Proceedings of the 42nd International Conference on Machine Learning (ICML), Vancouver, Canada. PMLR 267, 2025

  5. arXiv:2412.11025  [pdf, other]

    cs.CV cs.AI

    From Simple to Professional: A Combinatorial Controllable Image Captioning Agent

    Authors: Xinran Wang, Muxi Diao, Baoteng Li, Haiwen Zhang, Kongming Liang, Zhanyu Ma

    Abstract: The Controllable Image Captioning Agent (CapAgent) is an innovative system designed to bridge the gap between user simplicity and professional-level outputs in image captioning tasks. CapAgent automatically transforms user-provided simple instructions into detailed, professional instructions, enabling precise and context-aware caption generation. By leveraging multimodal large language models (MLL…

    Submitted 14 December, 2024; originally announced December 2024.

    Comments: A technical report. Project: https://github.com/xin-ran-w/CapAgent

  6. arXiv:2409.03810  [pdf, other]

    cs.SE cs.AI cs.CL cs.LG

    How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data

    Authors: Yejie Wang, Keqing He, Dayuan Fu, Zhuoma Gongque, Heyang Xu, Yanxu Chen, Zhexu Wang, Yujia Fu, Guanting Dong, Muxi Diao, Jingang Wang, Mengdi Zhang, Xunliang Cai, Weiran Xu

    Abstract: Recently, there has been a growing interest in studying how to construct better code instruction tuning data. However, we observe Code models trained with these datasets exhibit high performance on HumanEval but perform worse on other benchmarks such as LiveCodeBench. Upon further investigation, we find that many datasets suffer from severe data leakage. After cleaning up most of the leaked data,…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Work in progress

  7. arXiv:2408.02632  [pdf, other]

    cs.CL cs.AI

    SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models

    Authors: Muxi Diao, Rumei Li, Shiyang Liu, Guogang Liao, Jingang Wang, Xunliang Cai, Weiran Xu

    Abstract: As large language models (LLMs) continue to advance in capability and influence, ensuring their security and preventing harmful outputs has become crucial. A promising approach to address these concerns involves training models to automatically generate adversarial prompts for red teaming. However, the evolving subtlety of vulnerabilities in LLMs challenges the effectiveness of current adversarial…

    Submitted 23 December, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  8. arXiv:2407.01284  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.SC

    We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

    Authors: Runqi Qiao, Qiuna Tan, Guanting Dong, Minhui Wu, Chong Sun, Xiaoshuai Song, Zhuoma GongQue, Shanglin Lei, Zhe Wei, Miaoxuan Zhang, Runfeng Qiao, Yifan Zhang, Xiao Zong, Yida Xu, Muxi Diao, Zhimin Bao, Chen Li, Honggang Zhang

    Abstract: Visual mathematical reasoning, as a fundamental visual reasoning ability, has received widespread attention from the Large Multimodal Models (LMMs) community. Existing benchmarks, such as MathVista and MathVerse, focus more on the result-oriented performance but neglect the underlying principles in knowledge acquisition and generalization. Inspired by human-like mathematical reasoning, we introduc…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Work in progress

  9. arXiv:2406.08587  [pdf, other]

    cs.CL cs.AI cs.LG

    CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

    Authors: Xiaoshuai Song, Muxi Diao, Guanting Dong, Zhengyang Wang, Yujia Fu, Runqi Qiao, Zhexu Wang, Dayuan Fu, Huangxuan Wu, Bin Liang, Weihao Zeng, Yejie Wang, Zhuoma GongQue, Jianing Yu, Qiuna Tan, Weiran Xu

    Abstract: Large language models (LLMs) have demonstrated significant potential in advancing various fields of research and society. However, the current community of LLMs overly focuses on benchmarks for analyzing specific foundational skills (e.g. mathematics and code generation), neglecting an all-round evaluation of the computer science field. To bridge this gap, we introduce CS-Bench, the first multilin…

    Submitted 28 February, 2025; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at ICLR 2025

  10. A Novel Stochastic Transformer-based Approach for Post-Traumatic Stress Disorder Detection using Audio Recording of Clinical Interviews

    Authors: Mamadou Dia, Ghazaleh Khodabandelou, Alice Othmani

    Abstract: Post-traumatic stress disorder (PTSD) is a mental disorder that can be developed after witnessing or experiencing extremely traumatic events. PTSD can affect anyone, regardless of ethnicity, or culture. An estimated one in every eleven people will experience PTSD during their lifetime. The Clinician-Administered PTSD Scale (CAPS) and the PTSD Check List for Civilians (PCL-C) interviews are gold st…

    Submitted 28 March, 2024; originally announced March 2024.

    Journal ref: 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (2023) 700-705

  11. arXiv:2403.01403  [pdf, other]

    stat.CO stat.ME

    Greedy selection of optimal location of sensors for uncertainty reduction in seismic moment tensor inversion

    Authors: Ben Mansour Dia, Michael Fehler, SanLinn I. Kaka, Andrea Scarinci, Umair bin Waheed, Chen Gu

    Abstract: We address an optimal sensor placement problem through Bayesian experimental design for seismic full waveform inversion for the recovery of the associated moment tensor. The objective is that of optimally choosing the location of the sensors (stations) from which to collect the observed data. The Shannon expected information gain is used as the objective function to search for the optimal network…

    Submitted 3 March, 2024; originally announced March 2024.

  12. arXiv:2402.11279  [pdf, other]

    cs.CL cs.AI

    Multi-Perspective Consistency Enhances Confidence Estimation in Large Language Models

    Authors: Pei Wang, Yejie Wang, Muxi Diao, Keqing He, Guanting Dong, Weiran Xu

    Abstract: In the deployment of large language models (LLMs), accurate confidence estimation is critical for assessing the credibility of model predictions. However, existing methods often fail to overcome the issue of overconfidence on incorrect answers. In this work, we focus on improving the confidence estimation of large language models. Considering the fragility of self-awareness in language models, we…

    Submitted 17 February, 2024; originally announced February 2024.

  13. arXiv:2402.09136  [pdf, other]

    cs.CL cs.AI

    DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning

    Authors: Yejie Wang, Keqing He, Guanting Dong, Pei Wang, Weihao Zeng, Muxi Diao, Yutao Mou, Mengdi Zhang, Jingang Wang, Xunliang Cai, Weiran Xu

    Abstract: Code Large Language Models (Code LLMs) have demonstrated outstanding performance in code-related tasks. Several instruction tuning approaches have been proposed to boost the code generation performance of pre-trained Code LLMs. In this paper, we introduce a diverse instruction model (DolphCoder) with self-evaluating for code generation. It learns diverse instruction targets and combines a code eva…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 14 pages, 6 figures

  14. arXiv:2304.05398  [pdf, other]

    math.ST cs.LG math.OC

    Forward-backward Gaussian variational inference via JKO in the Bures-Wasserstein Space

    Authors: Michael Diao, Krishnakumar Balasubramanian, Sinho Chewi, Adil Salim

    Abstract: Variational inference (VI) seeks to approximate a target distribution $π$ by an element of a tractable family of distributions. Of key interest in statistics and machine learning is Gaussian VI, which approximates $π$ by minimizing the Kullback-Leibler (KL) divergence to $π$ over the space of Gaussians. In this work, we develop the (Stochastic) Forward-Backward Gaussian Variational Inference (FB-G…

    Submitted 10 April, 2023; originally announced April 2023.

  15. arXiv:1911.11650  [pdf, other]

    stat.CO

    A continuation method in Bayesian inference

    Authors: Ben Mansour Dia

    Abstract: We present a continuation method that entails generating a sequence of transition probability density functions from the prior to the posterior in the context of Bayesian inference for parameter estimation problems. The characterization of transition distributions, by tempering the likelihood function, results in a homogeneous nonlinear partial integro-differential equation whose existence and uni…

    Submitted 26 November, 2019; originally announced November 2019.

    Comments: 27 pages, 9 figures

  16. arXiv:1909.12160  [pdf, other]

    cs.LG astro-ph.GA eess.IV stat.ML

    Galaxy Image Simulation Using Progressive GANs

    Authors: Mohamad Dia, Elodie Savary, Martin Melchior, Frederic Courbin

    Abstract: In this work, we provide an efficient and realistic data-driven approach to simulate astronomical images using deep generative models from machine learning. Our solution is based on a variant of the generative adversarial network (GAN) with progressive training methodology and Wasserstein cost function. The proposed solution generates naturalistic images of galaxies that show complex structures an…

    Submitted 26 September, 2019; originally announced September 2019.

    Comments: Submitted to the Astronomical Data Analysis Software & Systems Conference (ADASS), 2019

  17. arXiv:1812.02537  [pdf, other]

    cs.IT cs.LG

    Rank-one matrix estimation: analysis of algorithmic and information theoretic limits by the spatial coupling method

    Authors: Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Lenka Zdeborová

    Abstract: Factorizing low-rank matrices is a problem with many applications in machine learning and statistics, ranging from sparse PCA to community detection and sub-matrix localization. For probabilistic models in the Bayes optimal setting, general expressions for the mutual information have been proposed using powerful heuristic statistical physics computations via the replica and cavity methods, and pro…

    Submitted 6 December, 2018; originally announced December 2018.

    Comments: Submitted to Journal of Machine Learning Research (JMLR)

  18. arXiv:1811.11469  [pdf, other]

    math.NA

    Multilevel Double Loop Monte Carlo and Stochastic Collocation Methods with Importance Sampling for Bayesian Optimal Experimental Design

    Authors: Joakim Beck, Ben Mansour Dia, Luis F. R. Espath, Raul Tempone

    Abstract: An optimal experimental set-up maximizes the value of data for statistical inferences and predictions. The efficiency of strategies for finding optimal experimental set-ups is particularly important for experiments that are time-consuming or expensive to perform. For instance, in the situation when the experiments are modeled by Partial Differential Equations (PDEs), multilevel methods have been p…

    Submitted 3 February, 2020; v1 submitted 28 November, 2018; originally announced November 2018.

  19. arXiv:1807.00653  [pdf, other]

    math.NA

    Nesterov-aided Stochastic Gradient Methods using Laplace Approximation for Bayesian Design Optimization

    Authors: Andre Gustavo Carlon, Ben Mansour Dia, Luis FR Espath, Rafael Holdorf Lopez, Raul Tempone

    Abstract: Finding the best setup for experiments is the primary concern for Optimal Experimental Design (OED). Here, we focus on the Bayesian experimental design problem of finding the setup that maximizes the Shannon expected information gain. We use the stochastic gradient descent and its accelerated counterpart, which employs Nesterov's method, to solve the optimization problem in OED. We adapt a restart…

    Submitted 26 February, 2020; v1 submitted 2 July, 2018; originally announced July 2018.

    Comments: 36 pages, 14 figures

    MSC Class: 62K05; 65N21; 65C60; 65C05

  20. arXiv:1804.00602  [pdf, other]

    cs.IT stat.ML

    A Compressed Sensing Approach for Distribution Matching

    Authors: Mohamad Dia, Vahid Aref, Laurent Schmalen

    Abstract: In this work, we formulate the fixed-length distribution matching as a Bayesian inference problem. Our proposed solution is inspired from the compressed sensing paradigm and the sparse superposition (SS) codes. First, we introduce sparsity in the binary source via position modulation (PM). We then present a simple and exact matcher based on Gaussian signal quantization. At the receiver, the dematc…

    Submitted 25 November, 2018; v1 submitted 2 April, 2018; originally announced April 2018.

    Comments: in the 2018 IEEE International Symposium on Information Theory (ISIT)

  21. Fast Bayesian experimental design: Laplace-based importance sampling for the expected information gain

    Authors: Joakim Beck, Ben Mansour Dia, Luis FR Espath, Quan Long, Raul Tempone

    Abstract: In calculating expected information gain in optimal Bayesian experimental design, the computation of the inner loop in the classical double-loop Monte Carlo requires a large number of samples and suffers from underflow if the number of samples is small. These drawbacks can be avoided by using an importance sampling approach. We present a computationally efficient method for optimal Bayesian experi…

    Submitted 10 October, 2017; originally announced October 2017.

    Comments: 42 pages, 35 figures

    MSC Class: 62K05; 65N21; 65C60; 65C05

  22. arXiv:1707.04203  [pdf, other]

    cs.IT

    Universal Sparse Superposition Codes with Spatial Coupling and GAMP Decoding

    Authors: Jean Barbier, Mohamad Dia, Nicolas Macris

    Abstract: Sparse superposition codes, or sparse regression codes, constitute a new class of codes which was first introduced for communication over the additive white Gaussian noise (AWGN) channel. It has been shown that such codes are capacity-achieving over the AWGN channel under optimal maximum-likelihood decoding as well as under various efficient iterative decoding schemes equipped with power allocatio…

    Submitted 8 November, 2018; v1 submitted 13 July, 2017; originally announced July 2017.

    Comments: Submitted to the IEEE transactions on information theory

  23. arXiv:1701.05823  [pdf, other]

    cs.IT cond-mat.dis-nn math-ph

    Mutual Information and Optimality of Approximate Message-Passing in Random Linear Estimation

    Authors: Jean Barbier, Nicolas Macris, Mohamad Dia, Florent Krzakala

    Abstract: We consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections. A few examples where this problem is relevant are compressed sensing, sparse superposition codes, and code division multiple access. There has been a number of works considering the mutual information for this problem using the replica method from statistical physics. Here we put these consid…

    Submitted 28 August, 2020; v1 submitted 20 January, 2017; originally announced January 2017.

    Journal ref: IEEE Transactions on Information Theory, vol. 66, no. 7, pp. 4270-4303, July 2020

  24. Generalized Approximate Message-Passing Decoder for Universal Sparse Superposition Codes

    Authors: Erdem Biyik, Jean Barbier, Mohamad Dia

    Abstract: Sparse superposition (SS) codes were originally proposed as a capacity-achieving communication scheme over the additive white Gaussian noise channel (AWGNC) [1]. Very recently, it was discovered that these codes are universal, in the sense that they achieve capacity over any memoryless channel under generalized approximate message-passing (GAMP) decoding [2], although this decoder has never been s…

    Submitted 13 January, 2017; originally announced January 2017.

  25. The Mutual Information in Random Linear Estimation

    Authors: Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala

    Abstract: We consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections, a problem relevant in compressed sensing, sparse superposition codes or code division multiple access just to cite few. There has been a number of works considering the mutual information for this problem using the heuristic replica method from statistical physics. Here we put these considera…

    Submitted 6 September, 2016; v1 submitted 8 July, 2016; originally announced July 2016.

    Comments: Presented at the 54th Annual Allerton Conference on Communication, Control, and Computing, 2016

    Journal ref: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Pages: 625 - 632

  26. arXiv:1606.04142  [pdf, other]

    cs.IT cond-mat.dis-nn cs.LG math-ph

    Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula

    Authors: Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Thibault Lesieur, Lenka Zdeborova

    Abstract: Factorizing low-rank matrices has many applications in machine learning and statistics. For probabilistic models in the Bayes optimal setting, a general expression for the mutual information has been proposed using heuristic statistical physics computations, and proven in few specific cases. Here, we show how to rigorously prove the conjectured formula for the symmetric rank-one case. This allows…

    Submitted 13 June, 2016; originally announced June 2016.

    Journal ref: Advances in Neural Information Processing Systems 29 (NIPS 2016) pp 424-432

  27. arXiv:1603.04591  [pdf, other]

    cs.IT cond-mat.dis-nn

    Threshold Saturation of Spatially Coupled Sparse Superposition Codes for All Memoryless Channels

    Authors: Jean Barbier, Mohamad Dia, Nicolas Macris

    Abstract: We recently proved threshold saturation for spatially coupled sparse superposition codes on the additive white Gaussian noise channel. Here we generalize our analysis to a much broader setting. We show for any memoryless channel that spatial coupling allows generalized approximate message-passing (GAMP) decoding to reach the potential (or Bayes optimal) threshold of the code ensemble. Moreover in…

    Submitted 15 March, 2016; originally announced March 2016.

    Comments: Submitted to the Information Theory Workshop (ITW) 2016, Cambridge, United Kingdom

  28. arXiv:1603.01817  [pdf, other]

    cs.IT cond-mat.dis-nn

    Proof of Threshold Saturation for Spatially Coupled Sparse Superposition Codes

    Authors: Jean Barbier, Mohamad Dia, Nicolas Macris

    Abstract: Recently, a new class of codes, called sparse superposition or sparse regression codes, has been proposed for communication over the AWGN channel. It has been proven that they achieve capacity using power allocation and various forms of iterative decoding. Empirical evidence has also strongly suggested that the codes achieve capacity when spatial coupling and approximate message passing decoding a…

    Submitted 6 March, 2016; originally announced March 2016.

    Comments: Submitted to the International Symposium on Information Theory (ISIT) 2016, Barcelona, Spain

  29. arXiv:1404.3389  [pdf, other]

    math.OC cs.GT eess.SY math.DS math.PR

    Mean-Field Games for Marriage

    Authors: Dario Bauso, Ben Mansour Dia, Boualem Djehiche, Hamidou Tembine, Raul Tempone

    Abstract: This article examines mean-field games for marriage. The results support the argument that optimizing the long-term well-being through effort and social feeling state distribution (mean-field) will help to stabilize marriage. However, if the cost of effort is very high, the couple fluctuates in a bad feeling state or the marriage breaks down. We then examine the influence of society on a couple us…

    Submitted 13 April, 2014; originally announced April 2014.

    Comments: 22 figures. Accepted and to appear in PLoS One