High Dynamic Range Novel View Synthesis with Single Exposure

Authors: Kaixuan Zhang, Hu Wang, Minxian Li, Mingwu Ren, Mao Ye, Xiatian Zhu

Abstract: High Dynamic Range Novel View Synthesis (HDR-NVS) aims to establish a 3D scene HDR model from Low Dynamic Range (LDR) imagery. Typically, multiple-exposure LDR images are employed to capture a wider range of brightness levels in a scene, as a single LDR image cannot represent both the brightest and darkest regions simultaneously. While effective, this multiple-exposure HDR-NVS approach has significant limitations, including susceptibility to motion artifacts (e.g., ghosting and blurring) and high capture and storage costs. To overcome these challenges, we introduce, for the first time, the single-exposure HDR-NVS problem, where only single-exposure LDR images are available during training. We further introduce a novel approach, Mono-HDR-3D, featuring two dedicated modules formulated by the LDR image formation principles, one for converting LDR colors to HDR counterparts, and the other for transforming HDR images to LDR format so that unsupervised learning is enabled in a closed loop. Designed as a meta-algorithm, our approach can be seamlessly integrated with existing NVS models. Extensive experiments show that Mono-HDR-3D significantly outperforms previous methods. Source code will be released.

Submitted 19 May, 2025; v1 submitted 2 May, 2025; originally announced May 2025.

Comments: Accepted by ICML 2025

arXiv:2409.12311 [pdf, other]

Towards Closing the Loop in Robotic Pollination for Indoor Farming via Autonomous Microscopic Inspection

Authors: Chuizheng Kong, Alex Qiu, Idris Wibowo, Marvin Ren, Aishik Dhori, Kai-Shu Ling, Ai-Ping Hu, Shreyas Kousik

Abstract: Effective pollination is a key challenge for indoor farming, since bees struggle to navigate without the sun. While a variety of robotic system solutions have been proposed, it remains difficult to autonomously check that a flower has been sufficiently pollinated to produce high-quality fruit, which is especially critical for self-pollinating crops such as strawberries. To this end, this work proposes a novel robotic system for indoor farming. The proposed hardware combines a 7-degree-of-freedom (DOF) manipulator arm with a custom end-effector, comprised of an endoscope camera, a 2-DOF microscope subsystem, and a custom vibrating pollination tool; this is paired with algorithms to detect and estimate the pose of strawberry flowers, navigate to each flower, pollinate using the tool, and inspect with the microscope. The key novelty is vibrating the flower from below while simultaneously inspecting with a microscope from above. Each subsystem is validated via extensive experiments.

Submitted 18 September, 2024; originally announced September 2024.

arXiv:2405.03178 [pdf, other]

POPDG: Popular 3D Dance Generation with PopDanceSet

Authors: Zhenye Luo, Min Ren, Xuecai Hu, Yongzhen Huang, Li Yao

Abstract: Generating dances that are both lifelike and well-aligned with music continues to be a challenging task in the cross-modal domain. This paper introduces PopDanceSet, the first dataset tailored to the preferences of young audiences, enabling the generation of aesthetically oriented dances. It surpasses the AIST++ dataset in music genre diversity and in the intricacy and depth of dance movements. Moreover, the proposed POPDG model within the iDDPM framework enhances dance diversity and, through the Space Augmentation Algorithm, strengthens spatial physical connections between human body joints, ensuring that increased diversity does not compromise generation quality. A streamlined Alignment Module is also designed to improve the temporal alignment between dance and music. Extensive experiments show that POPDG achieves SOTA results on two datasets. The paper also expands on current evaluation metrics. The dataset and code are available at https://github.com/Luke-Luo1/POPDG.

Submitted 27 December, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: Accepted by IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2404.04904 [pdf, other]

Cross-Domain Audio Deepfake Detection: Dataset and Analysis

Authors: Yuang Li, Min Zhang, Mengxin Ren, Miaomiao Ma, Daimeng Wei, Hao Yang

Abstract: Audio deepfake detection (ADD) is essential for preventing the misuse of synthetic voices that may infringe on personal rights and privacy. Recent zero-shot text-to-speech (TTS) models pose higher risks as they can clone voices with a single utterance. However, the existing ADD datasets are outdated, leading to suboptimal generalization of detection models. In this paper, we construct a new cross-domain ADD dataset comprising over 300 hours of speech data that is generated by five advanced zero-shot TTS models. To simulate real-world scenarios, we employ diverse attack methods and audio prompts from different datasets. Experiments show that, through novel attack-augmented training, the Wav2Vec2-large and Whisper-medium models achieve equal error rates of 4.1% and 6.5% respectively. Additionally, we demonstrate our models' outstanding few-shot ADD ability by fine-tuning with just one minute of target-domain data. Nonetheless, neural codec compressors greatly affect the detection accuracy, necessitating further research.

Submitted 20 September, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

arXiv:2301.10624 [pdf, ps, other]

Energy-Delay Tradeoff in Helper-Assisted NOMA-MEC Systems: A Four-Sided Matching Algorithm

Authors: Mengmeng Ren, Long Yang, Hai Jiang, Jian Chen, Yuchen Zhou

Abstract: This paper designs a helper-assisted resource allocation strategy in non-orthogonal multiple access (NOMA)-enabled mobile edge computing (MEC) systems, in order to guarantee the quality of service (QoS) of the energy/delay-sensitive user equipments (UEs). To achieve a tradeoff between the energy consumption and the delay, we introduce a novel performance metric, called energy-delay tradeoff, which is defined as the weighted sum of energy consumption and delay. The joint optimization of user association, resource block (RB) assignment, power allocation, task assignment, and computation resource allocation is formulated as a mixed-integer nonlinear programming problem with the aim of minimizing the maximal energy-delay tradeoff. Due to the non-convexity of the formulated problem with coupled and 0-1 variables, this problem cannot be directly solved with polynomial complexity. To tackle this challenge, we first decouple the formulated problem into a power allocation, task assignment and computation resource allocation (PATACRA) subproblem. Then, with the solution obtained from the PATACRA subproblem, we equivalently reformulate the original problem as a discrete user association and RB assignment (DUARA) problem. For the PATACRA subproblem, an iterative parametric convex approximation (IPCA) algorithm is proposed. Then, based on the solution obtained from the PATACRA subproblem, we first model the DUARA problem as a four-sided matching problem, and then propose a low-complexity four-sided UE-RB-helper-server matching (FS-URHSM) algorithm. Theoretical analysis demonstrates that the proposed algorithms are guaranteed to converge to stable solutions with polynomial complexity. Finally, simulation results are provided to show the superior performance of our proposed algorithm in terms of the energy consumption and the delay.

Submitted 25 January, 2023; originally announced January 2023.

arXiv:2106.13188 [pdf, other]

Q-space Conditioned Translation Networks for Directional Synthesis of Diffusion Weighted Images from Multi-modal Structural MRI

Authors: Mengwei Ren, Heejong Kim, Neel Dey, Guido Gerig

Abstract: Current deep learning approaches for diffusion MRI modeling circumvent the need for densely-sampled diffusion-weighted images (DWIs) by directly predicting microstructural indices from sparsely-sampled DWIs. However, they implicitly make unrealistic assumptions of static $q$-space sampling during training and reconstruction. Further, such approaches can restrict downstream usage of variably sampled DWIs for applications including the estimation of microstructural indices or tractography. We propose a generative adversarial translation framework for high-quality DWI synthesis with arbitrary $q$-space sampling given commonly acquired structural images (e.g., B0, T1, T2). Our translation network linearly modulates its internal representations conditioned on continuous $q$-space information, thus removing the need for fixed sampling schemes. Moreover, this approach enables downstream estimation of high-quality microstructural maps from arbitrarily subsampled DWIs, which may be particularly important in cases with sparsely sampled DWIs. Across several recent methodologies, the proposed approach yields improved DWI synthesis accuracy and fidelity with enhanced downstream utility as quantified by the accuracy of scalar microstructure indices estimated from the synthesized images. Code is available at https://github.com/mengweiren/q-space-conditioned-dwi-synthesis.

Submitted 24 June, 2021; originally announced June 2021.

Comments: Accepted by MICCAI 2021. Project page: https://heejongkim.com/dwi-synthesis; Code: https://github.com/mengweiren/q-space-conditioned-dwi-synthesis

arXiv:2105.04349 [pdf, other]

Generative Adversarial Registration for Improved Conditional Deformable Templates

Authors: Neel Dey, Mengwei Ren, Adrian V. Dalca, Guido Gerig

Abstract: Deformable templates are essential to large-scale medical image registration, segmentation, and population analysis. Current conventional and deep network-based methods for template construction use only regularized registration objectives and often yield templates with blurry and/or anatomically implausible appearance, confounding downstream biomedical interpretation. We reformulate deformable registration and conditional template estimation as an adversarial game wherein we encourage realism in the moved templates with a generative adversarial registration framework conditioned on flexible image covariates. The resulting templates exhibit significant gain in specificity to attributes such as age and disease, better fit underlying group-wise spatiotemporal trends, and achieve improved sharpness and centrality. These improvements enable more accurate population modeling with diverse covariates for standardized downstream analyses and easier anatomical delineation for structures of interest.

Submitted 17 March, 2022; v1 submitted 7 May, 2021; originally announced May 2021.

Comments: ICCV 2021 camera-ready. 24 pages, 15 figures. Project page: https://www.neeldey.com/deformable-templates/ Code: https://github.com/neel-dey/Atlas-GAN

Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision 2021

arXiv:2102.06315 [pdf, other]

doi 10.1109/TMI.2021.3059726

Segmentation-Renormalized Deep Feature Modulation for Unpaired Image Harmonization

Authors: Mengwei Ren, Neel Dey, James Fishbaugh, Guido Gerig

Abstract: Deep networks are now ubiquitous in large-scale multi-center imaging studies. However, the direct aggregation of images across sites is contraindicated for downstream statistical and deep learning-based image analysis due to inconsistent contrast, resolution, and noise. To this end, in the absence of paired data, variations of Cycle-consistent Generative Adversarial Networks have been used to harmonize image sets between a source and target domain. Importantly, these methods are prone to instability, contrast inversion, intractable manipulation of pathology, and steganographic mappings which limit their reliable adoption in real-world medical imaging. In this work, based on an underlying assumption that morphological shape is consistent across imaging sites, we propose a segmentation-renormalized image translation framework to reduce inter-scanner heterogeneity while preserving anatomical layout. We replace the affine transformations used in the normalization layers within generative networks with trainable scale and shift parameters conditioned on jointly learned anatomical segmentation embeddings to modulate features at every level of translation. We evaluate our methodologies against recent baselines across several imaging modalities (T1w MRI, FLAIR MRI, and OCT) on datasets with and without lesions. Segmentation-renormalization for translation GANs yields superior image harmonization as quantified by Inception distances, demonstrates improved downstream utility via post-hoc segmentation accuracy, and improved robustness to translation perturbation and self-adversarial attacks.

Submitted 15 February, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

Comments: Accepted by IEEE Transactions on Medical Imaging. Code available at https://github.com/mengweiren/segmentation-renormalized-harmonization

arXiv:2009.00294 [pdf, other]

Recognition Oriented Iris Image Quality Assessment in the Feature Space

Authors: Leyuan Wang, Kunbo Zhang, Min Ren, Yunlong Wang, Zhenan Sun

Abstract: A large portion of iris images captured in real-world scenarios are of poor quality due to the uncontrolled environment and the non-cooperative subject. To ensure that the recognition algorithm is not affected by low-quality images, traditional methods based on hand-crafted factors discard most images, which causes system timeouts and disrupts the user experience. In this paper, we propose a recognition-oriented quality metric and assessment method for iris images to deal with the problem. The method regards the iris image embedding's Distance in Feature Space (DFS) as the quality metric, and the prediction is based on deep neural networks with the attention mechanism. The quality metric proposed in this paper can significantly improve the performance of the recognition algorithm while reducing the number of images discarded for recognition, which is advantageous over iris quality assessment methods based on hand-crafted factors. The relationship between Image Rejection Rate (IRR) and Equal Error Rate (EER) is proposed to evaluate the performance of the quality assessment algorithm under the same image quality distribution and the same recognition algorithm. Compared with methods based on hand-crafted factors, the proposed method is an attempt to bridge the gap between image quality assessment and biometric recognition. The code is available at https://github.com/Debatrix/DFSNet.

Submitted 27 September, 2020; v1 submitted 1 September, 2020; originally announced September 2020.

arXiv:1512.00399 [pdf, ps, other]

Joint Group Testing of Time-varying Faulty Sensors and System State Estimation in Large Sensor Networks

Authors: Mengqi Ren, Ruixin Niu

Abstract: The problem of faulty sensor detection is investigated in large sensor networks where the sensor faults are sparse and time-varying, such as those caused by attacks launched by an adversary. Group testing and the Kalman filter are designed jointly to perform real time system state estimation and time-varying faulty sensor detection with a small number of tests. Numerical results show that the faulty sensors are efficiently detected and removed, and the system state estimation performance is significantly improved via the proposed method. Compared with an approach that tests sensors one by one, the proposed approach reduces the number of tests significantly while maintaining a similar fault detection performance.

Submitted 1 December, 2015; originally announced December 2015.

Comments: 5 pages, 3 figures, and 2 tables

Showing 1–10 of 10 results for author: Ren, M