Large-scale compressive microscopy via diffractive multiplexing across a sensor array

Authors: Kevin C. Zhou, Chaoying Gu, Muneki Ikeda, Tina M. Hayward, Nicholas Antipa, Rajesh Menon, Roarke Horstmeyer, Saul Kato, Laura Waller

Abstract: Microscopes face a trade-off between spatial resolution, field-of-view, and frame rate -- improving one of these properties typically requires sacrificing the others, due to the limited spatiotemporal throughput of the sensor. To overcome this, we propose a new microscope that achieves snapshot gigapixel-scale imaging with a sensor array and a diffractive optical element (DOE). We improve the spatiotemporal throughput in two ways. First, we capture data with an array of 48 sensors resulting in 48x more pixels than a single sensor. Second, we use point spread function (PSF) engineering and compressive sensing algorithms to fill in the missing information from the gaps surrounding the individual sensors in the array, further increasing the spatiotemporal throughput of the system by an additional >5.4x. The array of sensors is modeled as a single large-format "super-sensor," with erasures corresponding to the gaps between the individual sensors. The array is placed at the output of a (nearly) 4f imaging system, and we design a DOE for the Fourier plane that generates a distributed PSF that encodes information from the entire super-sensor area, including the gaps. We then computationally recover the large-scale image, assuming the object is sparse in some domain. Our calibration-free microscope can achieve ~3 μm resolution over >5.2 cm^2 FOVs at up to 120 fps, culminating in a total spatiotemporal throughput of 25.2 billion pixels per second. We demonstrate the versatility of our microscope in two different modes: structural imaging via darkfield contrast and functional fluorescence imaging of calcium dynamics across dozens of freely moving C. elegans simultaneously.

Submitted 18 July, 2025; originally announced July 2025.

arXiv:2406.17302 [pdf]

HD snapshot diffractive spectral imaging and inferencing

Authors: Apratim Majumder, Monjurul Meem, Fernando Gonzalez del Cueto, Fernando Guevara-Vasquez, Syed N. Qadri, Freddie Santiago, Rajesh Menon

Abstract: We present a novel high-definition (HD) snapshot diffractive spectral imaging system utilizing a diffractive filter array (DFA) to capture a single image that encodes both spatial and spectral information. This single diffractogram can be computationally reconstructed into a spectral image cube, providing a high-resolution representation of the scene across 25 spectral channels in the 440-800 nm range at 1304x744 spatial pixels (~1 MP). This unique approach offers numerous advantages including snapshot capture, a form of optical compression, flexible offline reconstruction, the ability to select the spectral basis after capture, and high light throughput due to the absence of lossy filters. We demonstrate a 30-50 nm spectral resolution and compare our reconstructed spectra against ground truth obtained by conventional spectrometers. Proof-of-concept experiments in diverse applications including biological tissue classification, food quality assessment, and simulated stellar photometry validate our system's capability to perform robust and accurate inference. These results establish the DFA-based imaging system as a versatile and powerful tool for advancing scientific and industrial imaging applications.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 33 pages, 16 figures

arXiv:2307.12750 [pdf, other]

DawnIK: Decentralized Collision-Aware Inverse Kinematics Solver for Heterogeneous Multi-Arm Systems

Authors: Salih Marangoz, Rohit Menon, Nils Dengler, Maren Bennewitz

Abstract: Although inverse kinematics of serial manipulators is a well-studied problem, challenges still exist in finding smooth feasible solutions that are also collision aware. Furthermore, with collaborative service robots gaining traction, different robotic systems have to work in close proximity. This means that current inverse kinematics approaches must avoid not only collisions with themselves but also collisions with other robot arms. Therefore, we present a novel approach to compute inverse kinematics for serial manipulators that takes into account different constraints while trying to reach a desired end-effector pose that avoids collisions with themselves and other arms. Unlike other constraint-based approaches, we neither perform expensive inverse Jacobian computations nor require arms with redundant degrees of freedom. Instead, we formulate different constraints as weighted cost functions to be optimized by a non-linear optimization solver. Our approach is superior to the state-of-the-art CollisionIK in terms of collision avoidance in the presence of multiple arms in confined spaces, with no collisions occurring in any of the experimental scenarios. When the probability of collision is low, our approach shows better performance at trajectory tracking as well. Additionally, our approach is capable of simultaneous yet decentralized control of multiple arms for trajectory tracking in intersecting workspaces without any collisions.

Submitted 31 October, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

Comments: Salih Marangoz and Rohit Menon have equal authorship. Publication to appear in IEEE RAS Intl Conference on Humanoid Robotics (Humanoids), 2023

arXiv:2108.06174 [pdf, ps, other]

doi 10.1016/j.csl.2021.101275

Feature learning for efficient ASR-free keyword spotting in low-resource languages

Authors: Ewald van der Westhuizen, Herman Kamper, Raghav Menon, John Quinn, Thomas Niesler

Abstract: We consider feature learning for efficient keyword spotting that can be applied in severely under-resourced settings. The objective is to support humanitarian relief programmes by the United Nations in parts of Africa in which almost no language resources are available. For rapid development in such languages, we rely on a small, easily-compiled set of isolated keywords. These keyword templates are applied to a large corpus of in-domain but untranscribed speech using dynamic time warping (DTW). The resulting DTW alignment scores are used to train a convolutional neural network (CNN) which is orders of magnitude more computationally efficient and suitable for real-time application. We optimise this neural network keyword spotter by identifying robust acoustic features in this almost zero-resource setting. First, we incorporate information from well-resourced but unrelated languages using a multilingual bottleneck feature (BNF) extractor. Next, we consider features extracted from an autoencoder (AE) trained on in-domain but untranscribed data. Finally, we consider correspondence autoencoder (CAE) features which are fine-tuned on the small set of in-domain labelled data. Experiments in South African English and Luganda, a low-resource language, show that BNF and CAE features achieve a 5% relative performance improvement over baseline MFCCs. However, using BNFs as input to the CAE results in a more than 27% relative improvement over MFCCs in ROC area-under-the-curve (AUC) and more than twice as many top-10 retrievals. We show that, using these features, the CNN-DTW keyword spotter performs almost as well as the DTW keyword spotter while outperforming a baseline CNN trained only on the keyword templates. The CNN-DTW keyword spotter using BNF-derived CAE features represents an efficient approach with competitive performance suited to rapid deployment in a severely under-resourced scenario.

Submitted 13 August, 2021; originally announced August 2021.

Comments: 37 pages, 14 figures, Preprint accepted for publication in Computer Speech and Language

arXiv:2011.07184 [pdf]

doi 10.1364/AO.415059

A needle-based deep-neural-network camera

Authors: Ruipeng Guo, Soren Nelson, Rajesh Menon

Abstract: We experimentally demonstrate a camera whose primary optic is a cannula (diameter=0.22mm and length=12.5mm) that acts as a lightpipe transporting light intensity from an object plane (35cm away) to its opposite end. Deep neural networks (DNNs) are used to reconstruct color and grayscale images with field of view of 18° and angular resolution of ~0.4°. When trained on images with depth information, the DNN can create depth maps. Finally, we show DNN-based classification of the EMNIST dataset without and with image reconstructions. The former could be useful for imaging with enhanced privacy.

Submitted 13 November, 2020; originally announced November 2020.

arXiv:2011.05132 [pdf]

Classification of optics-free images with deep neural networks

Authors: Soren Nelson, Rajesh Menon

Abstract: The thinnest possible camera is achieved by removing all optics, leaving only the image sensor. We train deep neural networks to perform multi-class detection and binary classification (with accuracy of 92%) on optics-free images without the need for anthropocentric image reconstructions. Inferencing from optics-free images has the potential for enhanced privacy and power efficiency.

Submitted 10 November, 2020; originally announced November 2020.

arXiv:2007.09430 [pdf]

doi 10.1364/OE.403238

3D Computational Cannula Fluorescence Microscopy enabled by Artificial Neural Networks

Authors: Ruipeng Guo, Zhimeng Pan, Andrew Taibi, Jason Shepherd, Rajesh Menon

Abstract: Computational Cannula Microscopy (CCM) is a high-resolution widefield fluorescence imaging approach deep inside tissue, which is minimally invasive. Rather than using conventional lenses, a surgical cannula acts as a lightpipe for both excitation and fluorescence emission, where computational methods are used for image visualization. Here, we enhance CCM with artificial neural networks to enable 3D imaging of cultured neurons and fluorescent beads, the latter inside a volumetric phantom. We experimentally demonstrate transverse resolution of ~6um, field of view ~200um and axial sectioning of ~50um for depths down to ~700um, all achieved with computation time of ~3ms/frame on a laptop computer.

Submitted 18 July, 2020; originally announced July 2020.

arXiv:2002.11141 [pdf]

Optics-free imaging of complex, non-sparse QR-codes with Deep Neural Networks

Authors: Evan Scullion, Soren Nelson, Rajesh Menon

Abstract: We demonstrate optics-free imaging of complex QR-codes using a bare image sensor and a trained artificial neural network (ANN). The ANN is trained to interpret the raw sensor data for human visualization. The image sensor is placed at a specified gap from the QR code. We studied the robustness of our approach by experimentally testing the output of the ANNs with system perturbations of this gap, and the translational and rotational alignments of the QR code to the image sensor. Our demonstration opens up the possibility of using completely optics-free cameras for application-specific imaging of complex, non-sparse objects.

Submitted 25 February, 2020; originally announced February 2020.

arXiv:2001.06523 [pdf]

Large-area, high-NA Multi-level Diffractive Lens via inverse design

Authors: Monjurul Meem, Sourangsu Banerji, Christian Pies, Timo Oberbiermann, Beradi Sensale-Rodriguez, Rajesh Menon

Abstract: Flat lenses enable thinner, lighter, and simpler imaging systems. However, large-area and high-NA flat lenses have been elusive due to computational and fabrication challenges. Here, we applied inverse design to create a multi-level diffractive lens (MDL) with thickness <1.35μm, diameter of 4.13mm, NA=0.9 at wavelength of 850nm. Since the MDL is created in polymer, it can be cost-effectively replicated via imprint lithography.

Submitted 17 January, 2020; originally announced January 2020.

arXiv:2001.01097 [pdf]

doi 10.1364/OL.387496

Computational Cannula Microscopy of neurons using Neural Networks

Authors: Ruipeng Guo, Zhimeng Pan, Andrew Taibi, Jason Shepherd, Rajesh Menon

Abstract: Computational Cannula Microscopy is a minimally invasive imaging technique that can enable high-resolution imaging deep inside tissue. Here, we apply artificial neural networks to enable fast, power-efficient image reconstructions that are more efficiently scalable to larger fields of view. Specifically, we demonstrate widefield fluorescence microscopy of cultured neurons and fluorescent beads with field of view of 200 μm (diameter) and resolution of less than 10 μm using a cannula of diameter of only 220 μm. In addition, we show that this approach can also be extended to macro-photography.

Submitted 4 January, 2020; originally announced January 2020.

arXiv:1912.13423 [pdf, other]

Learning Wavefront Coding for Extended Depth of Field Imaging

Authors: Ugur Akpinar, Erdem Sahin, Monjurul Meem, Rajesh Menon, Atanas Gotchev

Abstract: Depth of field is an important factor of imaging systems that highly affects the quality of the acquired spatial information. Extended depth of field (EDoF) imaging is a challenging ill-posed problem and has been extensively addressed in the literature. We propose a computational imaging approach for EDoF, where we employ wavefront coding via a diffractive optical element (DOE) and we achieve deblurring through a convolutional neural network. Thanks to the end-to-end differentiable modeling of optical image formation and computational post-processing, we jointly optimize the optical design, i.e., DOE, and the deblurring through standard gradient descent methods. Based on the properties of the underlying refractive lens and the desired EDoF range, we provide an analytical expression for the search space of the DOE, which is instrumental in the convergence of the end-to-end network. We achieve superior EDoF imaging performance compared to the state of the art, where we demonstrate results with minimal artifacts in various scenarios, including deep 3D scenes and broadband imaging.

Submitted 25 May, 2020; v1 submitted 31 December, 2019; originally announced December 2019.

arXiv:1908.09401 [pdf]

Machine-learning enables Image Reconstruction and Classification in a "see-through" camera

Authors: Zhimeng Pan, Brian Rodriguez, Rajesh Menon

Abstract: We demonstrate that image reconstruction can be achieved via a convolutional neural network for a "see-through" computational camera comprised of a transparent window and a CMOS image sensor. Furthermore, we compared classification results using a classifier network for the raw sensor data vs the reconstructed images. The results suggest that similar classification accuracy is likely possible in both cases with appropriate network optimizations. All networks were trained and tested for the MNIST (6 classes), EMNIST and the Kanji49 datasets.

Submitted 25 August, 2019; originally announced August 2019.

arXiv:1907.03064 [pdf, other]

Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training

Authors: Astik Biswas, Raghav Menon, Ewald van der Westhuizen, Thomas Niesler

Abstract: We present improvements in automatic speech recognition (ASR) for Somali, a currently extremely under-resourced language. This forms part of a continuing United Nations (UN) effort to employ ASR-based keyword spotting systems to support humanitarian relief programmes in rural Africa. Using just 1.57 hours of annotated speech data as a seed corpus, we increase the pool of training data by applying semi-supervised training to 17.55 hours of untranscribed speech. We make use of factorised time-delay neural networks (TDNN-F) for acoustic modelling, since these have recently been shown to be effective in resource-scarce situations. Three semi-supervised training passes were performed, where the decoded output from each pass was used for acoustic model training in the subsequent pass. The automatic transcriptions from the best performing pass were used for language model augmentation. To ensure the quality of automatic transcriptions, decoder confidence is used as a threshold. The acoustic and language models obtained from the semi-supervised approach show significant improvement in terms of WER and perplexity compared to the baseline. Incorporating the automatically generated transcriptions yields a 6.55% improvement in language model perplexity. The use of 17.55 hours of Somali acoustic data in semi-supervised training shows an improvement of 7.74% relative over the baseline.

Submitted 5 July, 2019; originally announced July 2019.

Comments: 5 pages, 6 Tables, 3 figures, 22 references (Accepted at Interspeech 2019)

arXiv:1811.08284 [pdf, other]

Feature exploration for almost zero-resource ASR-free keyword spotting using a multilingual bottleneck extractor and correspondence autoencoders

Authors: Raghav Menon, Herman Kamper, Ewald van der Westhuizen, John Quinn, Thomas Niesler

Abstract: We compare features for dynamic time warping (DTW) when used to bootstrap keyword spotting (KWS) in an almost zero-resource setting. Such quickly-deployable systems aim to support United Nations (UN) humanitarian relief efforts in parts of Africa with severely under-resourced languages. Our objective is to identify acoustic features that provide acceptable KWS performance in such environments. As a supervised resource, we restrict ourselves to a small, easily acquired and independently compiled set of isolated keywords. For feature extraction, a multilingual bottleneck feature (BNF) extractor, trained on well-resourced out-of-domain languages, is integrated with a correspondence autoencoder (CAE) trained on extremely sparse in-domain data. On their own, BNFs and CAE features are shown to achieve a more than 2% absolute performance improvement over baseline MFCCs. However, by using BNFs as input to the CAE, even better performance is achieved, with a more than 11% absolute improvement in ROC AUC over MFCCs and more than twice as many top-10 retrievals for two evaluated languages, English and Luganda. We conclude that integrating BNFs with the CAE allows both large out-of-domain and sparse in-domain resources to be exploited for improved ASR-free keyword spotting.

Submitted 12 July, 2019; v1 submitted 14 November, 2018; originally announced November 2018.

Comments: 5 pages, 2 figures, 2 tables, 38 references, Accepted at Interspeech 2019

Showing 1–14 of 14 results for author: Menon, R