-
Time Stretch with Continuous-Wave Lasers
Authors:
Tingyi Zhou,
Yuta Goto,
Takeshi Makino,
Callen MacPhee,
Yiming Zhou,
Asad M. Madni,
Hideaki Furukawa,
Naoya Wada,
Bahram Jalali
Abstract:
A high-throughput, single-shot measurement technique for ultrafast phenomena enables the capture of rare events on short time scales, facilitating the exploration of rare ultrafast processes. Photonic time stretch is a highly effective method both for detecting fast events and for achieving high speed in imaging and ranging applications. The current time stretch method relies on costly passively mode-locked lasers with continuous, fixed spectra to capture fast transients and dilate their time scale using dispersion. This reliance hinders the broad application of time stretch technology and makes it difficult to synchronize the laser with the ultrafast events being measured. Here we report the first implementation of time stretch using continuous-wave (CW) diode lasers with discrete, tunable spectra, of the kind common in wavelength-division multiplexing (WDM) optical communication. This approach offers the potential for more cost-effective and compact time stretch systems and simplifies laser synchronization with the input signal. Two embodiments, built in the United States and Japan, demonstrate the technique's operation and limitations, as well as its potential applications to time stretch imaging and angular light scattering.
Submitted 1 November, 2023; v1 submitted 19 September, 2023;
originally announced September 2023.
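The wavelength-to-time mapping that time stretch relies on can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the authors' setup: it places hypothetical CW laser lines on an evenly spaced frequency grid (100 GHz spacing around 1550 nm, a common WDM grid choice), applies a first-order dispersive group delay tau = D * (lambda - lambda_ref), and computes the stretch factor M = 1 + D2/D1 used in two-stage dispersive time stretch. The function names and all numerical values are hypothetical.

```python
# Sketch: dispersive wavelength-to-time mapping of a discrete grid of CW laser
# lines. All parameter values are hypothetical illustrations, not from the paper.

C_NM_PER_PS = 299_792.458  # speed of light in nm/ps


def channel_wavelengths_nm(center_nm, spacing_ghz, n_channels):
    """Wavelengths (nm) of n_channels lines on an evenly spaced frequency grid."""
    f_center_thz = C_NM_PER_PS / center_nm  # (nm/ps) / nm = 1/ps = THz
    half = (n_channels - 1) / 2
    freqs_thz = [f_center_thz + (k - half) * spacing_ghz * 1e-3 for k in range(n_channels)]
    return [C_NM_PER_PS / f for f in freqs_thz]


def dispersive_delays_ps(wavelengths_nm, dispersion_ps_per_nm, ref_nm):
    """First-order group delay of each line: tau = D * (lambda - lambda_ref)."""
    return [dispersion_ps_per_nm * (lam - ref_nm) for lam in wavelengths_nm]


def stretch_factor(d1_ps_per_nm, d2_ps_per_nm):
    """Time-stretch factor M = 1 + D2/D1 for a two-stage dispersive setup."""
    return 1.0 + d2_ps_per_nm / d1_ps_per_nm


if __name__ == "__main__":
    lams = channel_wavelengths_nm(1550.0, 100.0, 8)
    delays = dispersive_delays_ps(lams, 1000.0, 1550.0)
    for lam, tau in zip(lams, delays):
        print(f"{lam:9.3f} nm -> {tau:8.1f} ps")
    print("M =", stretch_factor(100.0, 1000.0))
```

With D = 1000 ps/nm, adjacent 100 GHz channels (about 0.8 nm apart) land roughly 800 ps apart, so each discrete laser line occupies its own time slot; the spacing varies slightly across the grid because wavelength is inversely proportional to frequency.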
-
End-to-End Multi-Person Audio/Visual Automatic Speech Recognition
Authors:
Otavio Braga,
Takaki Makino,
Olivier Siohan,
Hank Liao
Abstract:
Traditionally, audio-visual automatic speech recognition (A/V ASR) has been studied under the assumption that the speaking face in the visual signal is the face matching the audio. However, in a more realistic setting, when multiple faces are potentially on screen, one needs to decide which face to feed to the A/V ASR system. The present work takes the recent progress of A/V ASR one step further and considers the scenario where multiple people are simultaneously on screen (multi-person A/V ASR). We propose a fully differentiable A/V ASR model that is able to handle multiple face tracks in a video. Instead of relying on two separate models for speaker face selection and audio-visual ASR on a single face track, we introduce an attention layer to the ASR encoder that is able to soft-select the appropriate face video track. Experiments carried out on an A/V system trained on over 30k hours of YouTube videos illustrate that the proposed approach can automatically select the proper face tracks with only minor word error rate (WER) degradation compared to an oracle selection of the speaking face, while still retaining the benefit of using the visual signal over audio alone.
Submitted 11 May, 2022;
originally announced May 2022.
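The soft selection of a face track can be pictured as attention over per-track visual features. The sketch below uses plain scaled dot-product attention with an audio-derived query, and the track embeddings serving as both keys and values; this is an assumption for illustration, since the abstract does not specify the layer's exact form, and the names `soft_select_tracks` and `audio_query` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_select_tracks(audio_query, track_embeddings):
    """Scaled dot-product attention over face tracks for a single frame.

    audio_query: hypothetical query vector derived from the acoustic features.
    track_embeddings: one visual feature vector per face track (same length);
    used here as both keys and values (a simplifying assumption).
    Returns (weights, fused): per-track weights and the weighted visual feature.
    """
    d = len(audio_query)
    scores = [
        sum(q * k for q, k in zip(audio_query, emb)) / math.sqrt(d)
        for emb in track_embeddings
    ]
    weights = softmax(scores)
    fused = [
        sum(w * emb[i] for w, emb in zip(weights, track_embeddings))
        for i in range(d)
    ]
    return weights, fused


if __name__ == "__main__":
    # Track 0 aligns with the query, track 1 does not.
    weights, fused = soft_select_tracks([1.0, 0.0], [[4.0, 0.0], [0.0, 4.0]])
    print([round(w, 3) for w in weights], [round(x, 3) for x in fused])
```

Because the weights come from a softmax rather than an argmax, the selection stays differentiable, which is what allows a single model to be trained end to end instead of pairing a separate face-selection model with a single-track A/V ASR model.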
-
Differences between human and machine perception in medical diagnosis
Authors:
Taro Makino,
Stanislaw Jastrzebski,
Witold Oleszkiewicz,
Celin Chacko,
Robin Ehrenpreis,
Naziya Samreen,
Chloe Chhor,
Eric Kim,
Jiyon Lee,
Kristine Pysarenko,
Beatriu Reig,
Hildegard Toth,
Divya Awal,
Linda Du,
Alice Kim,
James Park,
Daniel K. Sodickson,
Laura Heacock,
Linda Moy,
Kyunghyun Cho,
Krzysztof J. Geras
Abstract:
Deep neural networks (DNNs) show promise in image-based medical diagnosis, but cannot be fully trusted since their performance can be severely degraded by dataset shifts to which human perception remains invariant. If we can better understand the differences between human and machine perception, we can potentially characterize and mitigate this effect. We therefore propose a framework for comparing human and machine perception in medical diagnosis. The two are compared with respect to their sensitivity to the removal of clinically meaningful information, and to the regions of an image deemed most suspicious. Drawing inspiration from the natural image domain, we frame both comparisons in terms of perturbation robustness. The novelty of our framework is that separate analyses are performed for subgroups with clinically meaningful differences. We argue that this is necessary to avert Simpson's paradox and draw correct conclusions. We demonstrate our framework with a case study in breast cancer screening and reveal significant differences between radiologists and DNNs. We compare the two with respect to their robustness to Gaussian low-pass filtering, performing a subgroup analysis on microcalcifications and soft tissue lesions. For microcalcifications, DNNs use a different set of high-frequency components than radiologists do, some of which lie outside the image regions that radiologists consider most suspicious. These features may be spurious, but if they are not, they could represent new biomarkers. For soft tissue lesions, the divergence between radiologists and DNNs is even starker, with DNNs relying heavily on spurious high-frequency components that radiologists ignore. Importantly, this deviation in soft tissue lesions was observable only through subgroup analysis, which highlights the importance of incorporating medical domain knowledge into our comparison framework.
Submitted 27 November, 2020;
originally announced November 2020.
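The Gaussian low-pass perturbation can be sketched as a separable blur: filter each row with a normalized 1-D Gaussian kernel, then each column. This is a generic implementation under assumptions (kernel radius of 3 sigma, edge replication at the borders, sigma measured in pixels), not the authors' code; increasing sigma removes progressively more high-frequency content.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Normalized 1-D Gaussian kernel; radius defaults to ceil(3 * sigma)."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]


def blur_1d(signal, kernel):
    """Convolve a 1-D signal with a symmetric kernel, replicating edge values."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), n - 1)  # clamp index at the borders
            acc += w * signal[idx]
        out.append(acc)
    return out


def gaussian_low_pass_2d(image, sigma):
    """Separable Gaussian blur of a 2-D list: filter rows, then columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    checker = [[float((i + j) % 2) for j in range(8)] for i in range(8)]
    blurred = gaussian_low_pass_2d(checker, sigma=2.0)
    print(max(abs(v - 0.5) for row in blurred for v in row))
```

A pixel-level checkerboard, the highest-frequency pattern an image can hold, is pushed toward its mean of 0.5 by the filter, while a constant image passes through unchanged.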
-
Reducing false-positive biopsies with deep neural networks that utilize local and global information in screening mammograms
Authors:
Nan Wu,
Zhe Huang,
Yiqiu Shen,
Jungkyu Park,
Jason Phang,
Taro Makino,
S. Gene Kim,
Kyunghyun Cho,
Laura Heacock,
Linda Moy,
Krzysztof J. Geras
Abstract:
Breast cancer is the most common cancer in women, and hundreds of thousands of unnecessary biopsies are performed around the world at tremendous cost. It is crucial to reduce the rate of biopsies that turn out to be benign tissue. In this study, we build deep neural networks (DNNs) to classify biopsied lesions as either malignant or benign, with the goal of using these networks as second readers that assist radiologists in further reducing the number of false-positive findings. We enhance the performance of DNNs trained on small image patches by integrating global context, in the form of saliency maps learned from the entire image, into their reasoning, similar to how radiologists consider global context when evaluating areas of interest. Our experiments are conducted on a dataset of 229,426 screening mammography exams from 141,473 patients. We achieve an AUC of 0.8 on a test set consisting of 464 benign and 136 malignant lesions.
Submitted 19 September, 2020;
originally announced September 2020.
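For a binary classifier, the AUC reported above equals the probability that a randomly chosen positive example (here assumed to be a malignant lesion) receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; the function name and toy scores are hypothetical, and the pairwise loop is chosen for clarity rather than speed (464 x 136 is only about 63,000 pairs).

```python
def auc(scores, labels):
    """ROC AUC as P(score_pos > score_neg), counting ties as 1/2.

    labels: 1 for the positive class (e.g. malignant), 0 for negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores: one positive (0.4) is outranked by one negative (0.6).
    print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```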
-
An artificial intelligence system for predicting the deterioration of COVID-19 patients in the emergency department
Authors:
Farah E. Shamout,
Yiqiu Shen,
Nan Wu,
Aakash Kaku,
Jungkyu Park,
Taro Makino,
Stanisław Jastrzębski,
Jan Witowski,
Duo Wang,
Ben Zhang,
Siddhant Dogra,
Meng Cao,
Narges Razavian,
David Kudlowitz,
Lea Azour,
William Moore,
Yvonne W. Lui,
Yindalon Aphinyanaphongs,
Carlos Fernandez-Granda,
Krzysztof J. Geras
Abstract:
During the coronavirus disease 2019 (COVID-19) pandemic, rapid and accurate triage of patients in the emergency department is critical to inform decision-making. We propose a data-driven approach for automatic prediction of deterioration risk using a deep neural network that learns from chest X-ray images and a gradient boosting model that learns from routine clinical variables. Our AI prognosis system, trained using data from 3,661 patients, achieves an area under the receiver operating characteristic curve (AUC) of 0.786 (95% CI: 0.745-0.830) when predicting deterioration within 96 hours. The deep neural network extracts informative areas of chest X-ray images to assist clinicians in interpreting the predictions and performs comparably to two radiologists in a reader study. To verify performance in a real clinical setting, we silently deployed a preliminary version of the deep neural network at New York University Langone Health during the first wave of the pandemic, which produced accurate predictions in real time. In summary, our findings demonstrate the potential of the proposed system for assisting front-line physicians in the triage of COVID-19 patients.
Submitted 3 November, 2020; v1 submitted 4 August, 2020;
originally announced August 2020.
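The abstract reports a 95% confidence interval for the AUC but does not say how it was computed. One common approach, assumed here purely for illustration, is the percentile bootstrap: resample patients with replacement, recompute the AUC each time, and take the 2.5th and 97.5th percentiles. The rank-based AUC helper (the Mann-Whitney U statistic, with tied scores given average ranks) and all function names are hypothetical.

```python
import random


def auc_rank(scores, labels):
    """AUC via the Mann-Whitney U statistic; tied scores get average ranks."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both classes in the resample
            stats.append(auc_rank([scores[i] for i in idx], ys))
    if not stats:
        raise ValueError("no bootstrap resample contained both classes")
    stats.sort()
    lo = stats[int(alpha / 2 * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lo, hi


if __name__ == "__main__":
    scores = [0.2, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]
    labels = [0, 0, 1, 0, 1, 0, 1, 1]
    print(auc_rank(scores, labels), bootstrap_auc_ci(scores, labels, n_boot=500))
```

Resamples that happen to contain only one class are skipped, since the AUC is undefined for them.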
-
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition
Authors:
Takaki Makino,
Hank Liao,
Yannis Assael,
Brendan Shillingford,
Basilio Garcia,
Otavio Braga,
Olivier Siohan
Abstract:
This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture. To support the development of such a system, we built a large audio-visual (A/V) dataset of segmented utterances extracted from public YouTube videos, yielding 31k hours of audio-visual training content. The performance of audio-only, visual-only, and audio-visual systems is compared on two large-vocabulary test sets: a set of utterance segments from public YouTube videos called YTDEV18, and the publicly available LRS3-TED set. To highlight the contribution of the visual modality, we also evaluate our system on the YTDEV18 set artificially corrupted with background noise and overlapping speech. To the best of our knowledge, our system significantly improves on the state of the art on the LRS3-TED set.
Submitted 8 November, 2019;
originally announced November 2019.
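An RNN-T defines the probability of a label sequence by summing over every alignment path through a T x (U+1) lattice of (frame, label) positions, where each step either emits a blank (advancing one frame) or emits the next target label (advancing one label). The sketch below is the standard forward algorithm for that sum, computed in log space. It assumes the per-node blank and next-label log-probabilities have already been produced by the model's joint network, and it reflects nothing specific to the audio-visual encoder described above; the function names are hypothetical.

```python
import math

NEG_INF = float("-inf")


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def rnnt_log_likelihood(log_blank, log_label):
    """log P(y | x) for an RNN-T via the forward algorithm.

    log_blank[t][u]: log prob of emitting blank at lattice node (t, u); T x (U+1).
    log_label[t][u]: log prob of emitting target label y_{u+1} at (t, u); T x U.
    """
    T = len(log_blank)
    U = len(log_blank[0]) - 1
    alpha = [[NEG_INF] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0  # start of the lattice with probability 1
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            a = NEG_INF
            if t > 0:  # arrived by emitting blank at (t-1, u)
                a = log_add(a, alpha[t - 1][u] + log_blank[t - 1][u])
            if u > 0:  # arrived by emitting label y_u at (t, u-1)
                a = log_add(a, alpha[t][u - 1] + log_label[t][u - 1])
            alpha[t][u] = a
    # Every alignment ends with a final blank at the last node.
    return alpha[T - 1][U] + log_blank[T - 1][U]


if __name__ == "__main__":
    half = math.log(0.5)
    lb = [[half, half], [half, half]]  # T = 2 frames, U = 1 label
    ll = [[half], [half]]
    print(math.exp(rnnt_log_likelihood(lb, ll)))
```

For T = 2 frames, one target label, and probability 0.5 for every blank and label emission, there are two alignments, each with probability 0.5 cubed, so the likelihood is 0.25.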