-
Limits of nonlinear and dispersive fiber propagation for an optical fiber-based extreme learning machine
Authors:
Andrei V. Ermolaev,
Mathilde Hary,
Lev Leybov,
Piotr Ryczkowski,
Anas Skalli,
Daniel Brunner,
Goëry Genty,
John M. Dudley
Abstract:
We report a generalized nonlinear Schrödinger equation simulation model of an extreme learning machine (ELM) based on optical fiber propagation. Using the MNIST handwritten digit dataset as a benchmark, we study how accuracy depends on propagation dynamics, as well as parameters governing spectral encoding, readout, and noise. For this dataset and with quantum-noise-limited input, test accuracies of over 91% and 93% are found for propagation in the anomalous and normal dispersion regimes, respectively. Our results also suggest that quantum noise on the input pulses introduces an intrinsic penalty to ELM performance.
Submitted 11 June, 2025; v1 submitted 5 March, 2025;
originally announced March 2025.
-
EnVisionVR: A Scene Interpretation Tool for Visual Accessibility in Virtual Reality
Authors:
Junlong Chen,
Rosella P. Galindo Esparza,
Vanja Garaj,
Per Ola Kristensson,
John Dudley
Abstract:
Effective visual accessibility in Virtual Reality (VR) is crucial for Blind and Low Vision (BLV) users. However, designing visual accessibility systems is challenging due to the complexity of 3D VR environments and the need for techniques that can be easily retrofitted into existing applications. While prior work has studied how to enhance or translate visual information, the advancement of Vision Language Models (VLMs) provides an exciting opportunity to advance the scene interpretation capability of current systems. This paper presents EnVisionVR, an accessibility tool for VR scene interpretation. Through a formative study of usability barriers, we confirmed the lack of visual accessibility features as a key barrier for BLV users of VR content and applications. In response, we designed and developed EnVisionVR, a novel visual accessibility system leveraging a VLM, voice input and multimodal feedback for scene interpretation and virtual object interaction in VR. An evaluation with 12 BLV users demonstrated that EnVisionVR significantly improved their ability to locate virtual objects, effectively supporting scene understanding and object interaction.
Submitted 5 February, 2025;
originally announced February 2025.
-
Principles and Metrics of Extreme Learning Machines Using a Highly Nonlinear Fiber
Authors:
Mathilde Hary,
Daniel Brunner,
Lev Leybov,
Piotr Ryczkowski,
John M. Dudley,
Goëry Genty
Abstract:
Optical computing offers potential for ultra high-speed and low latency computation by leveraging the intrinsic properties of light. Here, we explore the use of highly nonlinear optical fibers (HNLFs) as platforms for optical computing based on the concept of Extreme Learning Machines. Task-independent evaluations are introduced to the field for the first time and focus on the fundamental metrics of effective dimensionality and consistency, which we experimentally characterize for different nonlinear and dispersive conditions. We show that input power and fiber characteristics significantly influence the dimensionality of the computational system, with longer fibers and higher dispersion producing up to 100 principal components (PCs) at input power levels of 30 mW, where the PCs correspond to the linearly independent dimensions of the system. The spectral distribution of the PCs' eigenvectors reveals that the high-dimensional dynamics facilitating computing through dimensionality expansion are located within 40 nm of the pump wavelength at 1560 nm, providing general insight for computing with nonlinear Schrödinger equation systems. Task-dependent results demonstrate the effectiveness of HNLFs in classifying MNIST dataset images. Using input data compression through PC analysis, we inject MNIST images of various input dimensionality into the system and study the impact of input power upon classification accuracy. At optimized power levels we achieve a classification test accuracy of 88%, significantly surpassing the baseline of 83.7% from linear systems. Notably, we find that the best performance is not obtained at maximal input power, i.e. maximal system dimensionality, but at a power more than one order of magnitude lower. The same is confirmed regarding the compression of the MNIST images, where accuracy is substantially improved when strongly compressing each image to fewer than 50 PCs.
Submitted 20 June, 2025; v1 submitted 9 January, 2025;
originally announced January 2025.
-
Swarm manipulation: An efficient and accurate technique for multi-object manipulation in virtual reality
Authors:
Xiang Li,
Jin-Du Wang,
John J. Dudley,
Per Ola Kristensson
Abstract:
The theory of swarm control shows promise for controlling multiple objects; however, scalability is hindered by cost constraints, such as hardware and infrastructure. Virtual Reality (VR) can overcome these limitations, but research on swarm interaction in VR is limited. This paper introduces a novel Swarm Manipulation interaction technique and compares it with two baseline techniques: Virtual Hand and Controller (ray-casting). We evaluated these techniques in a user study (N = 12) in three tasks (selection, rotation, and resizing) across five conditions. Our results indicate that Swarm Manipulation yielded superior performance, with significantly faster speeds in most conditions across the three tasks. It notably reduced size deviations in the resizing task but introduced a trade-off between speed and accuracy in the rotation task. Additionally, we conducted a follow-up user study (N = 6) using Swarm Manipulation in two complex VR scenarios and obtained insights through semi-structured interviews, shedding light on optimized swarm control mechanisms and perceptual changes induced by this interaction paradigm. These results demonstrate the potential of the Swarm Manipulation technique to enhance the usability and user experience in VR compared to conventional manipulation techniques. In future studies, we aim to understand and improve swarm interaction via internal swarm particle cooperation.
Submitted 24 October, 2024;
originally announced October 2024.
-
FAIR Universe HiggsML Uncertainty Challenge Competition
Authors:
Wahid Bhimji,
Paolo Calafiura,
Ragansu Chakkappai,
Po-Wen Chang,
Yuan-Tang Chou,
Sascha Diefenbacher,
Jordan Dudley,
Steven Farrell,
Aishik Ghosh,
Isabelle Guyon,
Chris Harris,
Shih-Chieh Hsu,
Elham E Khoda,
Rémy Lyscar,
Alexandre Michon,
Benjamin Nachman,
Peter Nugent,
Mathis Reymond,
David Rousseau,
Benjamin Sluijter,
Benjamin Thorne,
Ihsan Ullah,
Yulei Zhang
Abstract:
The FAIR Universe -- HiggsML Uncertainty Challenge focuses on measuring the physics properties of elementary particles with imperfect simulators due to differences in modelling systematic errors. Additionally, the challenge is leveraging a large-compute-scale AI platform for sharing datasets, training models, and hosting machine learning competitions. Our challenge brings together the physics and machine learning communities to advance our understanding and methodologies in handling systematic (epistemic) uncertainties within AI techniques.
Submitted 18 December, 2024; v1 submitted 3 October, 2024;
originally announced October 2024.
-
Working in Extended Reality in the Wild: Worker and Bystander Experiences of XR Virtual Displays in Real-World Settings
Authors:
Leonardo Pavanatto,
Verena Biener,
Jennifer Chandran,
Snehanjali Kalamkar,
Feiyu Lu,
John J. Dudley,
Jinghui Hu,
G. Nikki Ramirez-Saffy,
Per Ola Kristensson,
Alexander Giovannelli,
Luke Schlueter,
Jörg Müller,
Jens Grubert,
Doug A. Bowman
Abstract:
Although access to sufficient screen space is crucial to knowledge work, workers often find themselves with limited access to display infrastructure in remote or public settings. While virtual displays can be used to extend the available screen space through extended reality (XR) head-worn displays (HWD), we must better understand the implications of working with them in public settings from both users' and bystanders' viewpoints. To this end, we conducted two user studies. We first explored the usage of a hybrid AR display across real-world settings and tasks. We focused on how users take advantage of virtual displays and what social and environmental factors impact their usage of the system. A second study investigated the differences between working with a laptop, an AR system, or a VR system in public. We focused on a single location and participants performed a predefined task to enable direct comparisons between the conditions while also gathering data from bystanders. The combined results suggest a positive acceptance of XR technology in public settings and show that virtual displays can be used to accompany existing devices. We highlighted some environmental and social factors. We saw that previous XR experience and personality can influence how people perceive the use of XR in public. In addition, we confirmed that using XR in public still makes users stand out and that bystanders are curious about the devices, yet have no clear understanding of how they can be used.
Submitted 19 August, 2024;
originally announced August 2024.
-
Hold Tight: Identifying Behavioral Patterns During Prolonged Work in VR through Video Analysis
Authors:
Verena Biener,
Forouzan Farzinnejad,
Rinaldo Schuster,
Seyedmasih Tabaei,
Leon Lindlein,
Jinghui Hu,
Negar Nouri,
John J. Dudley,
Per Ola Kristensson,
Jörg Müller,
Jens Grubert
Abstract:
VR devices have recently been actively promoted as tools for knowledge workers and prior work has demonstrated that VR can support some knowledge worker tasks. However, only a few studies have explored the effects of prolonged use of VR, such as a study observing 16 participants working in VR and a physical environment for one work-week each and reporting mainly on subjective feedback. As a nuanced understanding of participants' behavior in VR and how it evolves over time is still missing, we report on the results from an analysis of 559 hours of video material obtained in this prior study. Among other findings, we report that (1) the frequency of actions related to adjusting the headset reduced by 46% and the frequency of actions related to supporting the headset reduced by 42% over the five days; (2) the HMD was removed 31% less frequently over the five days but for 41% longer periods; (3) wearing an HMD is disruptive to normal patterns of eating and drinking, but not to social interactions, such as talking. The combined findings in this work demonstrate the value of long-term studies of deployed VR systems and can be used to inform the design of better, more ergonomic VR systems as tools for knowledge workers.
Submitted 29 January, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Working with XR in Public: Effects on Users and Bystanders
Authors:
Verena Biener,
Snehanjali Kalamkar,
John J Dudley,
Jinghui Hu,
Per Ola Kristensson,
Jörg Müller,
Jens Grubert
Abstract:
Recent commercial off-the-shelf virtual and augmented reality devices have been promoted as tools for knowledge work and research findings show how this kind of work can benefit from the affordances of extended reality (XR). One major advantage that XR can provide is the enlarged display space that can be used to display virtual screens, which is a feature already readily available in many commercial devices. This could be especially helpful in mobile contexts, in which users might not have access to their optimal physical work setup. Such situations often occur in a public setting, for example when working on a train while traveling to a business meeting. At the same time, the use of XR devices is still uncommon in public, which might impact both users and bystanders. Hence, there is a need to better understand the implications of using XR devices for work in public, both for users themselves and for bystanders. We report the results of a study in a university cafeteria in which participants used three different systems. In one setup, they only used a laptop with a single screen; in a second setup, they combined the laptop with an optical see-through AR headset; and in the third, they combined the laptop with an immersive VR headset. In addition, we also collected 231 responses from bystanders through a questionnaire. The combined results indicate that (1) users feel safer if they can see their physical surroundings; (2) current use of XR in public makes users stand out; and (3) prior XR experience can influence how users feel when using XR in public.
Submitted 15 October, 2023;
originally announced October 2023.
-
Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques
Authors:
Junxiao Shen,
John J. Dudley,
Jingyao Zheng,
Bill Byrne,
Per Ola Kristensson
Abstract:
Text entry is an essential task in our day-to-day digital interactions. Numerous intelligent features have been developed to streamline this process, making text entry more effective, efficient, and fluid. These improvements include sentence prediction and user personalization. However, as deep learning-based language models become the norm for these advanced features, the necessity for data collection and model fine-tuning increases. These challenges can be mitigated by harnessing the in-context learning capability of large language models such as GPT-3.5. This unique feature allows the language model to acquire new skills through prompts, eliminating the need for data collection and fine-tuning. Consequently, large language models can learn various text prediction techniques. We initially showed that, for a sentence prediction task, merely prompting GPT-3.5 surpassed a GPT-2-backed system and is comparable with a fine-tuned GPT-3.5 model, with the latter two methods requiring costly data collection, fine-tuning and post-processing. However, the task of prompting large language models to specialize in specific text prediction tasks can be challenging, particularly for designers without expertise in prompt engineering. To address this, we introduce Promptor, a conversational prompt generation agent designed to engage proactively with designers. Promptor can automatically generate complex prompts tailored to meet specific needs, thus offering a solution to this challenge. We conducted a user study involving 24 participants creating prompts for three intelligent text entry tasks; half of the participants used Promptor while the other half designed prompts themselves. The results show that Promptor-designed prompts result in a 35% increase in similarity and 22% in coherence over those by designers.
Submitted 15 October, 2023; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Video Analysis of Behavioral Patterns During Prolonged Work in VR
Authors:
Verena Biener,
Forouzan Farzinnejad,
Rinaldo Schuster,
Seyedmasih Tabaei,
Leon Lindlein,
Jinghui Hu,
Negar Nouri,
John J. Dudley,
Per Ola Kristensson,
Jörg Müller,
Jens Grubert
Abstract:
VR has recently been promoted as a tool for knowledge workers and studies have shown that it has the potential to improve knowledge work. However, studies on its prolonged use have been scarce. A prior study compared working in VR for one week to working in a physical environment, focusing on performance measures and subjective feedback. However, a nuanced understanding and comparison of participants' behavior in VR and the physical environment is still missing. To this end, we analyzed video material made available from this previously conducted experiment, carried out over a working week, and present our findings on comparing the behavior of participants while working in VR and in a physical environment.
Submitted 23 August, 2023;
originally announced August 2023.
-
Encode-Store-Retrieve: Augmenting Human Memory through Language-Encoded Egocentric Perception
Authors:
Junxiao Shen,
John Dudley,
Per Ola Kristensson
Abstract:
We depend on our own memory to encode, store, and retrieve our experiences. However, memory lapses can occur. One promising avenue for achieving memory augmentation is through the use of augmented reality head-mounted displays to capture and preserve egocentric videos, a practice commonly referred to as lifelogging. However, a significant challenge arises from the sheer volume of video data generated through lifelogging, as the current technology lacks the capability to encode and store such large amounts of data efficiently. Further, retrieving specific information from extensive video archives requires substantial computational power, further complicating the task of quickly accessing desired content. To address these challenges, we propose a memory augmentation agent that involves leveraging natural language encoding for video data and storing them in a vector database. This approach harnesses the power of large vision language models to perform the language encoding process. Additionally, we propose using large language models to facilitate natural language querying. Our agent underwent extensive evaluation using the QA-Ego4D dataset and achieved state-of-the-art results with a BLEU score of 8.3, outperforming conventional machine learning models that scored between 3.4 and 5.8. Additionally, we conducted a user study in which participants interacted with the human memory augmentation agent through episodic memory and open-ended questions. The results of this study show that the agent results in significantly better recall performance on episodic memory tasks compared to human participants. The results also highlight the agent's practical applicability and user acceptance.
Submitted 18 October, 2024; v1 submitted 10 August, 2023;
originally announced August 2023.
-
Predicting nonlinear reshaping of periodic signals in optical fibre with a neural network
Authors:
Sonia Boscolo,
J. M. Dudley,
Christophe Finot
Abstract:
We deploy a supervised machine-learning model based on a neural network to predict the temporal and spectral reshaping of a simple sinusoidal modulation into a pulse train having a comb structure in the frequency domain, which occurs upon nonlinear propagation in an optical fibre. Both normal and anomalous second-order dispersion regimes of the fibre are studied, and the speed of the neural network is leveraged to probe the space of input parameters for the generation of custom combs or the occurrence of significant temporal or spectral focusing.
Submitted 16 March, 2023;
originally announced March 2023.
-
Quantifying the Effects of Working in VR for One Week
Authors:
Verena Biener,
Snehanjali Kalamkar,
Negar Nouri,
Eyal Ofek,
Michel Pahud,
John J. Dudley,
Jinghui Hu,
Per Ola Kristensson,
Maheshya Weerasinghe,
Klen Čopič Pucihar,
Matjaž Kljun,
Stephan Streuber,
Jens Grubert
Abstract:
Virtual Reality (VR) provides new possibilities for modern knowledge work. However, the potential advantages of virtual work environments can only be used if it is feasible to work in them for an extended period of time. To date, there have been few studies of long-term effects when working in VR. This paper addresses the need for understanding such long-term effects. Specifically, we report on a comparative study (n=16), in which participants were working in VR for an entire week -- for five days, eight hours each day -- as well as in a baseline physical desktop environment. This study aims to quantify the effects of exchanging a desktop-based work environment with a VR-based environment. Hence, during this study, we do not present the participants with the best possible VR system but rather a setup delivering a comparable experience to working in the physical desktop environment. The study reveals that, as expected, VR results in significantly worse ratings across most measures. Among other results, we found concerning levels of simulator sickness and below-average usability ratings, and two participants dropped out on the first day using VR, due to migraine, nausea and anxiety. Nevertheless, there is some indication that participants gradually overcame negative first impressions and initial discomfort. Overall, this study helps lay the groundwork for subsequent research, by clearly highlighting current shortcomings and identifying opportunities for improving the experience of working in VR.
Submitted 8 June, 2022; v1 submitted 7 June, 2022;
originally announced June 2022.
-
Investigating Positive and Negative Qualities of Human-in-the-Loop Optimization for Designing Interaction Techniques
Authors:
Liwei Chan,
Yi-Chi Liao,
George B. Mo,
John J. Dudley,
Chun-Lien Cheng,
Per Ola Kristensson,
Antti Oulasvirta
Abstract:
Designers reportedly struggle with design optimization tasks where they are asked to find a combination of design parameters that maximizes a given set of objectives. In HCI, design optimization problems are often exceedingly complex, involving multiple objectives and expensive empirical evaluations. Model-based computational design algorithms assist designers by generating design examples during design; however, they assume a model of the interaction domain. Black box methods for assistance, on the other hand, can work with any design problem. However, virtually all empirical studies of this human-in-the-loop approach have been carried out by either researchers or end-users. The question remains whether such methods can help designers in realistic tasks. In this paper, we study Bayesian optimization as an algorithmic method to guide the design optimization process. It operates by proposing to a designer which design candidate to try next, given previous observations. We report observations from a comparative study with 40 novice designers who were tasked to optimize a complex 3D touch interaction technique. The optimizer helped designers explore larger proportions of the design space and arrive at a better solution; however, they reported lower agency and expressiveness. Designers guided by an optimizer reported lower mental effort but also felt less creative and less in charge of the progress. We conclude that human-in-the-loop optimization can support novice designers in cases where agency is not critical.
Submitted 15 April, 2022;
originally announced April 2022.
-
The Imaginative Generative Adversarial Network: Automatic Data Augmentation for Dynamic Skeleton-Based Hand Gesture and Human Action Recognition
Authors:
Junxiao Shen,
John Dudley,
Per Ola Kristensson
Abstract:
Deep learning approaches deliver state-of-the-art performance in recognition of spatiotemporal human motion data. However, one of the main challenges in these recognition tasks is limited available training data. Insufficient training data results in over-fitting and data augmentation is one approach to address this challenge. Existing data augmentation strategies based on scaling, shifting and interpolating offer limited generalizability and typically require detailed inspection of the dataset as well as hundreds of GPU hours for hyperparameter optimization. In this paper, we present a novel automatic data augmentation model, the Imaginative Generative Adversarial Network (GAN), that approximates the distribution of the input data and samples new data from this distribution. It is automatic in that it requires no data inspection and little hyperparameter tuning and therefore it is a low-cost and low-effort approach to generate synthetic data. We demonstrate our approach on small-scale skeleton-based datasets with a comprehensive experimental analysis. Our results show that the augmentation strategy is fast to train and can improve classification accuracy for both conventional neural networks and state-of-the-art methods.
Submitted 10 August, 2023; v1 submitted 27 May, 2021;
originally announced May 2021.
-
Deep Representation Learning of Electronic Health Records to Unlock Patient Stratification at Scale
Authors:
Isotta Landi,
Benjamin S. Glicksberg,
Hao-Chih Lee,
Sarah Cherng,
Giulia Landi,
Matteo Danieletto,
Joel T. Dudley,
Cesare Furlanello,
Riccardo Miotto
Abstract:
Deriving disease subtypes from electronic health records (EHRs) can guide next-generation personalized medicine. However, challenges in summarizing and representing patient data prevent widespread practice of scalable EHR-based stratification analysis. Here we present an unsupervised framework based on deep learning to process heterogeneous EHRs and derive patient representations that can efficiently and effectively enable patient stratification at scale. We considered EHRs of 1,608,741 patients from a diverse hospital cohort comprising a total of 57,464 clinical concepts. We introduce a representation learning model based on word embeddings, convolutional neural networks, and autoencoders (i.e., ConvAE) to transform patient trajectories into low-dimensional latent vectors. We evaluated these representations as broadly enabling patient stratification by applying hierarchical clustering to different multi-disease and disease-specific patient cohorts. ConvAE significantly outperformed several baselines in a clustering task to identify patients with different complex conditions, with 2.61 entropy and 0.31 purity average scores. When applied to stratify patients within a certain condition, ConvAE led to various clinically relevant subtypes for different disorders, including type 2 diabetes, Parkinson's disease and Alzheimer's disease, largely related to comorbidities, disease progression, and symptom severity. With these results, we demonstrate that ConvAE can generate patient representations that lead to clinically meaningful insights. This scalable framework can help better understand varying etiologies in heterogeneous sub-populations and unlock patterns for EHR-based research in the realm of personalized medicine.
Submitted 18 July, 2020; v1 submitted 13 March, 2020;
originally announced March 2020.
-
Scaling structural learning with NO-BEARS to infer causal transcriptome networks
Authors:
Hao-Chih Lee,
Matteo Danieletto,
Riccardo Miotto,
Sarah T. Cherng,
Joel T. Dudley
Abstract:
Constructing gene regulatory networks is a critical step in revealing disease mechanisms from transcriptomic data. In this work, we present NO-BEARS, a novel algorithm for estimating gene regulatory networks. The NO-BEARS algorithm builds on the NOTEARS algorithm with two improvements. First, we propose a new constraint and its fast approximation to reduce the computational cost of the NOTEARS algorithm. Next, we introduce a polynomial regression loss to handle non-linearity in gene expression. Our implementation utilizes modern GPU computation that can reduce hours-long CPU computation to seconds. Using synthetic data, we demonstrate improved performance, both in processing time and accuracy, on inferring gene regulatory networks from gene expression data.
Submitted 31 October, 2019;
originally announced November 2019.
-
Enhancing high-content imaging for studying microtubule networks at large-scale
Authors:
Hao-Chih Lee,
Sarah T Cherng,
Riccardo Miotto,
Joel T Dudley
Abstract:
Given the crucial role of microtubules for cell survival, many researchers have found success using microtubule-targeting agents in the search for effective cancer therapeutics. Understanding microtubule responses to targeted interventions requires that the microtubule network within cells can be consistently observed across a large sample of images. However, fluorescence noise sources captured simultaneously with biological signals while using wide-field microscopes can obfuscate fine microtubule structures. Such requirements are particularly challenging for high-throughput imaging, where researchers must make decisions related to the trade-off between imaging quality and speed. Here, we propose a computational framework to enhance the quality of high-throughput imaging data to achieve fast speed and high quality simultaneously. Using CycleGAN, we learn an image model from low-throughput, high-resolution images to enhance features such as microtubule networks in high-throughput, low-resolution images. We show that CycleGAN is effective in identifying microtubules with 0.93+ AUC-ROC and that these results are robust to different kinds of image noise. We further apply CycleGAN to quantify the changes in microtubule density as a result of the application of drug compounds, and show that the quantified responses correspond well with known drug effects.
Submitted 1 October, 2019;
originally announced October 2019.
-
Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review
Authors:
Seyedmostafa Sheikhalishahi,
Riccardo Miotto,
Joel T Dudley,
Alberto Lavelli,
Fabio Rinaldi,
Venet Osmani
Abstract:
Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers resulted in identification of 43 chronic diseases, which were then further classified into 10 disease categories using ICD-10. The majority of studies focused on diseases of the circulatory system (n=38) while endocrine and metabolic diseases were fewest (n=14). This was due to the structure of clinical records related to metabolic diseases, which typically contain much more structured data, compared with medical records for diseases of the circulatory system, which focus more on unstructured data and consequently have seen a stronger focus of NLP. The review has shown that there is a significant increase in the use of machine learning methods compared to rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype with only a handful of papers addressing extraction of comorbidities from the free text or integration of clinical notes with structured data. There is a notable use of relatively simple methods, such as shallow classifiers (or combination with rule-based methods), due to the interpretability of predictions, which still represents a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes. Further efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.
Submitted 15 August, 2019;
originally announced August 2019.
-
Deep Learning Predicts Hip Fracture using Confounding Patient and Healthcare Variables
Authors:
Marcus A. Badgeley,
John R. Zech,
Luke Oakden-Rayner,
Benjamin S. Glicksberg,
Manway Liu,
William Gale,
Michael V. McConnell,
Beth Percha,
Thomas M. Snyder,
Joel T. Dudley
Abstract:
Hip fractures are a leading cause of death and disability among older adults. Hip fractures are also the most commonly missed diagnosis on pelvic radiographs. Computer-Aided Diagnosis (CAD) algorithms have shown promise for helping radiologists detect fractures, but the image features underpinning their predictions are notoriously difficult to understand. In this study, we trained deep learning models on 17,587 radiographs to classify fracture, five patient traits, and 14 hospital process variables. All 20 variables could be predicted from a radiograph (p < 0.05), with the best performances on scanner model (AUC=1.00), scanner brand (AUC=0.98), and whether the order was marked "priority" (AUC=0.79). Fracture was predicted moderately well from the image (AUC=0.78) and better when combining image features with patient data (AUC=0.86, p=2e-9) or patient data plus hospital process features (AUC=0.91, p=1e-21). The model performance on a test set with matched patient variables was significantly lower than a random test set (AUC=0.67, p=0.003); and when the test set was matched on patient and image acquisition variables, the model performed randomly (AUC=0.52, 95% CI 0.46-0.58), indicating that these variables were the main source of the model's predictive ability overall. We also used Naive Bayes to combine evidence from image models with patient and hospital data and found their inclusion improved performance, but that this approach was nevertheless inferior to directly modeling all variables. If CAD algorithms are inexplicably leveraging patient and process variables in their predictions, it is unclear how radiologists should interpret their predictions in the context of other known patient data. Further research is needed to illuminate deep learning decision processes so that computers and clinicians can effectively cooperate.
Submitted 8 November, 2018;
originally announced November 2018.
-
Machine learning for prediction of extreme statistics in modulation instability
Authors:
Mikko Närhi,
Lauri Salmela,
Juha Toivonen,
Cyril Billet,
John M. Dudley,
Goëry Genty
Abstract:
A central area of research in nonlinear science is the study of instabilities that drive the emergence of extreme events. Unfortunately, experimental techniques for measuring such phenomena often provide only partial characterization. For example, real-time studies of instabilities in nonlinear fibre optics frequently use only spectral data, precluding detailed predictions about the associated temporal properties. Here, we show how Machine Learning can overcome this limitation by predicting statistics for the maximum intensity of temporal peaks in modulation instability based only on spectral measurements. Specifically, we train a neural network based Machine Learning model to correlate spectral and temporal properties of optical fibre modulation instability using data from numerical simulations, and we then use this model to predict the temporal probability distribution based on high-dynamic range spectral data from experiments. These results open novel perspectives in all systems exhibiting chaos and instability where direct time-domain observations are difficult.
Submitted 28 May, 2018;
originally announced June 2018.
-
Processing of Electronic Health Records using Deep Learning: A review
Authors:
Venet Osmani,
Li Li,
Matteo Danieletto,
Benjamin Glicksberg,
Joel Dudley,
Oscar Mayora
Abstract:
The availability of large amounts of clinical data is opening up new research avenues in a number of fields. An exciting field in this respect is healthcare, where secondary use of healthcare data is beginning to revolutionize the field. Besides the availability of Big Data, both medical data from healthcare institutions (such as EMR data) and data generated from health and wellbeing devices (such as personal trackers), a significant contribution to this trend is also being made by recent advances in machine learning, specifically deep learning algorithms.
Submitted 5 April, 2018;
originally announced April 2018.