EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos

Authors: Yuxuan Li, Vijay Veerabadran, Michael L. Iuzzolino, Brett D. Roads, Asli Celikyilmaz, Karl Ridgeway

Abstract: We introduce EgoToM, a new video question-answering benchmark that extends Theory-of-Mind (ToM) evaluation to egocentric domains. Using a causal ToM model, we generate multi-choice video QA instances for the Ego4D dataset to benchmark the ability to predict a camera wearer's goals, beliefs, and next actions. We study the performance of both humans and state of the art multimodal large language models (MLLMs) on these three interconnected inference problems. Our evaluation shows that MLLMs achieve close to human-level accuracy on inferring goals from egocentric videos. However, MLLMs (including the largest ones we tested with over 100B parameters) fall short of human performance when inferring the camera wearers' in-the-moment belief states and future actions that are most consistent with the unseen video future. We believe that our results will shape the future design of an important class of egocentric digital assistants which are equipped with a reasonable model of the user's internal mental states.

Submitted 28 March, 2025; originally announced March 2025.

arXiv:2502.19410

Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices

Authors: Xinru Wang, Mengjie Yu, Hannah Nguyen, Michael Iuzzolino, Tianyi Wang, Peiqi Tang, Natasha Lynova, Co Tran, Ting Zhang, Naveen Sendhilnathan, Hrvoje Benko, Haijun Xia, Tanya Jonker

Abstract: Large Language Models (LLMs) have shown remarkable potential in recommending everyday actions as personal AI assistants, while Explainable AI (XAI) techniques are being increasingly utilized to help users understand why a recommendation is given. Personal AI assistants today are often located on ultra-small devices such as smartwatches, which have limited screen space. The verbosity of LLM-generated explanations, however, makes it challenging to deliver glanceable LLM explanations on such ultra-small devices. To address this, we explored 1) spatially structuring an LLM's explanation text using defined contextual components during prompting and 2) presenting temporally adaptive explanations to users based on confidence levels. We conducted a user study to understand how these approaches impacted user experiences when interacting with LLM recommendations and explanations on ultra-small devices. The results showed that structured explanations reduced users' time to action and cognitive load when reading an explanation. Always-on structured explanations increased users' acceptance of AI recommendations. However, users were less satisfied with structured explanations compared to unstructured ones due to their lack of sufficient, readable details. Additionally, adaptively presenting structured explanations was less effective at improving user perceptions of the AI compared to the always-on structured explanations. Together with users' interview feedback, the results led to design implications to be mindful of when personalizing the content and timing of LLM explanations that are displayed on ultra-small devices.

Submitted 26 February, 2025; originally announced February 2025.

arXiv:2307.12854

Multiscale Video Pretraining for Long-Term Activity Forecasting

Authors: Reuben Tan, Matthias De Lange, Michael Iuzzolino, Bryan A. Plummer, Kate Saenko, Karl Ridgeway, Lorenzo Torresani

Abstract: Long-term activity forecasting is an especially challenging research problem because it requires understanding the temporal relationships between observed actions, as well as the variability and complexity of human activities. Despite relying on strong supervision via expensive human annotations, state-of-the-art forecasting approaches often generalize poorly to unseen data. To alleviate this issue, we propose Multiscale Video Pretraining (MVP), a novel self-supervised pretraining approach that learns robust representations for forecasting by learning to predict contextualized representations of future video clips over multiple timescales. MVP is based on our observation that actions in videos have a multiscale nature, where atomic actions typically occur at a short timescale and more complex actions may span longer timescales. We compare MVP to state-of-the-art self-supervised video learning approaches on downstream long-term forecasting tasks including long-term action anticipation and video summary prediction. Our comprehensive experiments across the Ego4D and Epic-Kitchens-55/100 datasets demonstrate that MVP out-performs state-of-the-art methods by significant margins. Notably, MVP obtains a relative performance gain of over 20% accuracy in video summary forecasting over existing methods.

Submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.05784

EgoAdapt: A multi-stream evaluation study of adaptation to real-world egocentric user video

Authors: Matthias De Lange, Hamid Eghbalzadeh, Reuben Tan, Michael Iuzzolino, Franziska Meier, Karl Ridgeway

Abstract: In egocentric action recognition a single population model is typically trained and subsequently embodied on a head-mounted device, such as an augmented reality headset. While this model remains static for new users and environments, we introduce an adaptive paradigm of two phases, where after pretraining a population model, the model adapts on-device and online to the user's experience. This setting is highly challenging due to the change from population to user domain and the distribution shifts in the user's data stream. Coping with the latter in-stream distribution shifts is the focus of continual learning, where progress has been rooted in controlled benchmarks but challenges faced in real-world applications often remain unaddressed. We introduce EgoAdapt, a benchmark for real-world egocentric action recognition that facilitates our two-phased adaptive paradigm, and real-world challenges naturally occur in the egocentric video streams from Ego4d, such as long-tailed action distributions and large-scale classification over 2740 actions. We introduce an evaluation framework that directly exploits the user's data stream with new metrics to measure the adaptation gain over the population model, online generalization, and hindsight performance. In contrast to single-stream evaluation in existing works, our framework proposes a meta-evaluation that aggregates the results from 50 independent user streams. We provide an extensive empirical study for finetuning and experience replay.

Submitted 11 July, 2023; originally announced July 2023.

Comments: Preprint

arXiv:2304.09179

Pretrained Language Models as Visual Planners for Human Assistance

Authors: Dhruvesh Patel, Hamid Eghbalzadeh, Nitin Kamra, Michael Louis Iuzzolino, Unnat Jain, Ruta Desai

Abstract: In our pursuit of advancing multi-modal AI assistants capable of guiding users to achieve complex multi-step goals, we propose the task of "Visual Planning for Assistance (VPA)". Given a succinct natural language goal, e.g., "make a shelf", and a video of the user's progress so far, the aim of VPA is to devise a plan, i.e., a sequence of actions such as "sand shelf", "paint shelf", etc. to realize the specified goal. This requires assessing the user's progress from the (untrimmed) video, and relating it to the requirements of natural language goal, i.e., which actions to select and in what order? Consequently, this requires handling long video history and arbitrarily complex action dependencies. To address these challenges, we decompose VPA into video action segmentation and forecasting. Importantly, we experiment by formulating the forecasting step as a multi-modal sequence modeling problem, allowing us to leverage the strength of pre-trained LMs (as the sequence model). This novel approach, which we call Visual Language Model based Planner (VLaMP), outperforms baselines across a suite of metrics that gauge the quality of the generated plans. Furthermore, through comprehensive ablations, we also isolate the value of each component--language pre-training, visual observations, and goal information. We have open-sourced all the data, model checkpoints, and training code.

Submitted 26 August, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: Accepted at ICCV 2023

arXiv:2302.05330

Action Dynamics Task Graphs for Learning Plannable Representations of Procedural Tasks

Authors: Weichao Mao, Ruta Desai, Michael Louis Iuzzolino, Nitin Kamra

Abstract: Given video demonstrations and paired narrations of an at-home procedural task such as changing a tire, we present an approach to extract the underlying task structure -- relevant actions and their temporal dependencies -- via action-centric task graphs. Learnt structured representations from our method, Action Dynamics Task Graphs (ADTG), can then be used for understanding such tasks in unseen videos of humans performing them. Furthermore, ADTG can enable providing user-centric guidance to humans in these tasks, either for performing them better or for learning new tasks. Specifically, we show how ADTG can be used for: (1) tracking an ongoing task, (2) recommending next actions, and (3) planning a sequence of actions to accomplish a procedural task. We compare against state-of-the-art Neural Task Graph method and demonstrate substantial gains on 18 procedural tasks from the CrossTask dataset, including 30.1% improvement in task tracking accuracy and 20.3% accuracy gain in next action prediction.

Submitted 11 January, 2023; originally announced February 2023.

Comments: AAAI 2023 Workshop on User-Centric Artificial Intelligence for Assistance in At-Home Tasks

arXiv:2112.06857

Software solutions for numerical modeling of wide-field telescopes

Authors: Salvatore Savarese, Pietro Schipani, Giulio Capasso, Mirko Colapietro, Sergio D'Orsi, Marcella Iuzzolino, Laurent Marty, Francesco Perrotta, Giacomo Basile

Abstract: This paper presents an integrated modeling software to analyze the PSF of wide-field telescopes affected by misalignments. Even relatively small misalignments in the optical system of a telescope can significantly deteriorate the image quality by introducing large aberrations. In particular, wide-field telescopes are critically affected by these errors, insomuch that usually a closed-loop active optics system is adopted for a continuous correction, rather than for sporadic alignment procedures. Typically, a ray-tracing software such as Zemax OpticStudio is employed to accurately analyze the system during the optical design. However, an analytical model of the optical system is preferable when the PSF of the telescope must be reconstructed quickly for algorithmic purposes. Here the analytical model is derived through a hybrid approach and developed in a custom software package, designed to be general and flexible in order to be tailored to different optical configurations. First, leveraging on the Zemax OpticStudio API, the ray-tracing software is integrated into a Matlab pipeline. This allows to perform a statistical analysis by automatically simulating the system response in a variety of misaligned working conditions. Then, the resulting dataset is employed to populate a database of parameters describing the model.

Submitted 13 December, 2021; originally announced December 2021.

Comments: 4 pages, 3 figures, ADASS 2021 Conference

arXiv:2109.05675

Online Unsupervised Learning of Visual Representations and Categories

Authors: Mengye Ren, Tyler R. Scott, Michael L. Iuzzolino, Michael C. Mozer, Richard Zemel

Abstract: Real world learning scenarios involve a nonstationary distribution of classes with sequential dependencies among the samples, in contrast to the standard machine learning formulation of drawing samples independently from a fixed, typically uniform distribution. Furthermore, real world interactions demand learning on-the-fly from few or no class labels. In this work, we propose an unsupervised model that simultaneously performs online visual representation learning and few-shot learning of new categories without relying on any class labels. Our model is a prototype-based memory network with a control component that determines when to form a new class prototype. We formulate it as an online mixture model, where components are created with only a single new example, and assignments do not have to be balanced, which permits an approximation to natural imbalanced distributions from uncurated raw data. Learning includes a contrastive loss that encourages different views of the same image to be assigned to the same prototype. The result is a mechanism that forms categorical representations of objects in nonstationary environments. Experiments show that our method can learn from an online stream of visual input data and its learned representations are significantly better at category recognition compared to state-of-the-art self-supervised learning methods.

Submitted 28 May, 2022; v1 submitted 12 September, 2021; originally announced September 2021.

Comments: Technical report, 32 pages

arXiv:2102.09808

Improving Anytime Prediction with Parallel Cascaded Networks and a Temporal-Difference Loss

Authors: Michael L. Iuzzolino, Michael C. Mozer, Samy Bengio

Abstract: Although deep feedforward neural networks share some characteristics with the primate visual system, a key distinction is their dynamics. Deep nets typically operate in serial stages wherein each layer completes its computation before processing begins in subsequent layers. In contrast, biological systems have cascaded dynamics: information propagates from neurons at all layers in parallel but transmission occurs gradually over time, leading to speed-accuracy trade offs even in feedforward architectures. We explore the consequences of biologically inspired parallel hardware by constructing cascaded ResNets in which each residual block has propagation delays but all blocks update in parallel in a stateful manner. Because information transmitted through skip connections avoids delays, the functional depth of the architecture increases over time, yielding anytime predictions that improve with internal-processing time. We introduce a temporal-difference training loss that achieves a strictly superior speed-accuracy profile over standard losses and enables the cascaded architecture to outperform state-of-the-art anytime-prediction methods. The cascaded architecture has intriguing properties, including: it classifies typical instances more rapidly than atypical instances; it is more robust to both persistent and transient noise than is a conventional ResNet; and its time-varying output trace provides a signal that can be exploited to improve information processing and inference.

Submitted 2 November, 2021; v1 submitted 19 February, 2021; originally announced February 2021.

arXiv:2007.04546

Wandering Within a World: Online Contextualized Few-Shot Learning

Authors: Mengye Ren, Michael L. Iuzzolino, Michael C. Mozer, Richard S. Zemel

Abstract: We aim to bridge the gap between typical human and machine-learning environments by extending the standard framework of few-shot learning to an online, continual setting. In this setting, episodes do not have separate training and testing phases, and instead models are evaluated online while learning novel classes. As in the real world, where the presence of spatiotemporal context helps us retrieve learned skills in the past, our online few-shot learning setting also features an underlying context that changes throughout time. Object classes are correlated within a context and inferring the correct context can lead to better performance. Building upon this setting, we propose a new few-shot learning dataset based on large scale indoor imagery that mimics the visual experience of an agent wandering within a world. Furthermore, we convert popular few-shot learning approaches into online versions and we also propose a new contextual prototypical memory model that can make use of spatiotemporal contextual information from the recent past.

Submitted 22 April, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

Comments: ICLR 2021

arXiv:2004.00762

In Automation We Trust: Investigating the Role of Uncertainty in Active Learning Systems

Authors: Michael L. Iuzzolino, Tetsumichi Umada, Nisar R. Ahmed, Danielle A. Szafir

Abstract: We investigate how different active learning (AL) query policies coupled with classification uncertainty visualizations affect analyst trust in automated classification systems. A current standard policy for AL is to query the oracle (e.g., the analyst) to refine labels for datapoints where the classifier has the highest uncertainty. This is an optimal policy for the automation system as it yields maximal information gain. However, model-centric policies neglect the effects of this uncertainty on the human component of the system and the consequent manner in which the human will interact with the system post-training. In this paper, we present an empirical study evaluating how AL query policies and visualizations lending transparency to classification influence trust in automated classification of image data. We found that query policy significantly influences an analyst's trust in an image classification system, and we use these results to propose a set of oracle query policies and visualizations for use during AL training phases that can influence analyst trust in classification.

Submitted 1 April, 2020; originally announced April 2020.

arXiv:2002.10562

doi 10.1051/0004-6361/201937369

The GAPS Programme at TNG XXI -- A GIARPS case-study of known young planetary candidates: confirmation of HD 285507 b and refutation of AD Leo b

Authors: I. Carleo, L. Malavolta, A. F. Lanza, M. Damasso, S. Desidera, F. Borsa, M. Mallonn, M. Pinamonti, R. Gratton, E. Alei, S. Benatti, L. Mancini, J. Maldonado, K. Biazzo, M. Esposito, G. Frustagli, E. González-Álvarez, G. Micela, G. Scandariato, A. Sozzetti, L. Affer, A. Bignamini, A. S. Bonomo, R. Claudi, R. Cosentino, et al. (45 additional authors not shown)

Abstract: The existence of hot Jupiters is still not well understood. Two main channels are thought to be responsible for their current location: a smooth planet migration through the proto-planetary disk or the circularization of an initial high eccentric orbit by tidal dissipation leading to a strong decrease of the semimajor axis. Different formation scenarios result in different observable effects, such as orbital parameters (obliquity/eccentricity), or frequency of planets at different stellar ages. In the context of the GAPS Young-Objects project, we are carrying out a radial velocity survey with the aim to search and characterize young hot-Jupiter planets. Our purpose is to put constraints on evolutionary models and establish statistical properties, such as the frequency of these planets from a homogeneous sample. Since young stars are in general magnetically very active, we performed multi-band (visible and near-infrared) spectroscopy with simultaneous GIANO-B + HARPS-N (GIARPS) observing mode at TNG. This helps to deal with stellar activity and distinguish the nature of radial velocity variations: stellar activity will introduce a wavelength-dependent radial velocity amplitude, whereas a Keplerian signal is achromatic. As a pilot study, we present here the cases of two already claimed hot Jupiters orbiting young stars: HD285507 b and AD Leo b. Our analysis of simultaneous high-precision GIARPS spectroscopic data confirms the Keplerian nature of HD285507's radial velocities variation and refines the orbital parameters of the hot Jupiter, obtaining an eccentricity consistent with a circular orbit. On the other hand, our analysis does not confirm the signal previously attributed to a planet orbiting AD Leo. This demonstrates the power of the multi-band spectroscopic technique when observing active stars.

Submitted 24 February, 2020; originally announced February 2020.

Journal ref: A&A 638, A5 (2020)

arXiv:1911.08670

MMTM: Multimodal Transfer Module for CNN Fusion

Authors: Hamid Reza Vaezi Joze, Amirreza Shaban, Michael L. Iuzzolino, Kazuhito Koishida

Abstract: In late fusion, each modality is processed in a separate unimodal Convolutional Neural Network (CNN) stream and the scores of each modality are fused at the end. Due to its simplicity late fusion is still the predominant approach in many state-of-the-art multimodal applications. In this paper, we present a simple neural network module for leveraging the knowledge from multiple modalities in convolutional neural networks. The proposed unit, named Multimodal Transfer Module (MMTM), can be added at different levels of the feature hierarchy, enabling slow modality fusion. Using squeeze and excitation operations, MMTM utilizes the knowledge of multiple modalities to recalibrate the channel-wise features in each CNN stream. Unlike other intermediate fusion methods, the proposed module could be used for feature modality fusion in convolution layers with different spatial dimensions. Another advantage of the proposed method is that it could be added among unimodal branches with minimum changes in their network architectures, allowing each branch to be initialized with existing pretrained weights. Experimental results show that our framework improves the recognition accuracy of well-known multimodal networks. We demonstrate state-of-the-art or competitive performance on four datasets that span the task domains of dynamic hand gesture recognition, speech enhancement, and action recognition with RGB and body joints.

Submitted 30 March, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020

arXiv:1910.09795

doi 10.1051/0004-6361/201936610

Experimental characterization of modal noise in multimode fibers for astronomical spectrometers

Authors: E. Oliva, M. Rainer, A. Tozzi, N. Sanna, M. Iuzzolino, A. Brucalassi

Abstract: Starting from our puzzling on-sky experience with the GIANO-TNG spectrometer we set up an infrared high resolution spectrometer in our laboratory and used this instrument to characterize the modal noise generated in fibers of different types (circular and octagonal) and sizes. Our experiment includes two conventional scrambling systems for fibers: a mechanical agitator and an optical double scrambler. We find that the strength of the modal noise primarily depends on how the fiber is illuminated. It dramatically increases when the fiber is under-illuminated, either in the near field or in the far field. The modal noise is similar in circular and octagonal fibers. The Fourier spectrum of the noise decreases exponentially with frequency; i.e., the modal noise is not white but favors broad spectral features. Using the optical double scrambler has no effect on modal noise. The mechanical agitator has effects that vary between different types of fibers and input illuminations. In some cases this agitator has virtually no effect. In other cases, it mitigates the modal noise, but flattens the noise spectrum in Fourier space; i.e., the mechanical agitator preferentially filters the broad spectral features. Our results show that modal noise is frustratingly insensitive to the use of octagonal fibers and optical double scramblers; i.e., the conventional systems used to improve the performances of spectrographs fed via unevenly illuminated fibers. Fiber agitation may help in some cases, but its effect has to be verified on a case-by-case basis. More generally, our results indicate that the design of the fiber link feeding a spectrograph should be coupled with laboratory measurements that reproduce, as closely as possible, the conditions expected at the telescope.

Submitted 23 October, 2019; v1 submitted 22 October, 2019; originally announced October 2019.

Comments: 7 pages, 6 figures, accepted by Astronomy and Astrophysics

Journal ref: A&A 632, A21 (2019)

arXiv:1906.03504

Convolutional Bipartite Attractor Networks

Authors: Michael Iuzzolino, Yoram Singer, Michael C. Mozer

Abstract: In human perception and cognition, a fundamental operation that brains perform is interpretation: constructing coherent neural states from noisy, incomplete, and intrinsically ambiguous evidence. The problem of interpretation is well matched to an early and often overlooked architecture, the attractor network---a recurrent neural net that performs constraint satisfaction, imputation of missing features, and clean up of noisy data via energy minimization dynamics. We revisit attractor nets in light of modern deep learning methods and propose a convolutional bipartite architecture with a novel training loss, activation function, and connectivity constraints. We tackle larger problems than have been previously explored with attractor nets and demonstrate their potential for image completion and super-resolution. We argue that this architecture is better motivated than ever-deeper feedforward models and is a viable alternative to more costly sampling-based generative methods on a range of supervised and unsupervised tasks.

Submitted 26 September, 2019; v1 submitted 8 June, 2019; originally announced June 2019.

arXiv:1901.05599

doi 10.1109/IROS.2018.8593883

Virtual-to-Real-World Transfer Learning for Robots on Wilderness Trails

Authors: Michael L. Iuzzolino, Michael E. Walker, Daniel Szafir

Abstract: Robots hold promise in many scenarios involving outdoor use, such as search-and-rescue, wildlife management, and collecting data to improve environment, climate, and weather forecasting. However, autonomous navigation of outdoor trails remains a challenging problem. Recent work has sought to address this issue using deep learning. Although this approach has achieved state-of-the-art results, the deep learning paradigm may be limited due to a reliance on large amounts of annotated training data. Collecting and curating training datasets may not be feasible or practical in many situations, especially as trail conditions may change due to seasonal weather variations, storms, and natural erosion. In this paper, we explore an approach to address this issue through virtual-to-real-world transfer learning using a variety of deep learning models trained to classify the direction of a trail in an image. Our approach utilizes synthetic data gathered from virtual environments for model training, bypassing the need to collect a large amount of real images of the outdoors. We validate our approach in three main ways. First, we demonstrate that our models achieve classification accuracies upwards of 95% on our synthetic data set. Next, we utilize our classification models in the control system of a simulated robot to demonstrate feasibility. Finally, we evaluate our models on real-world trail data and demonstrate the potential of virtual-to-real-world transfer learning.

Submitted 16 January, 2019; originally announced January 2019.

Comments: IROS 2018

Journal ref: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 576-582)

arXiv:1808.03184 [pdf, other]

GIARPS: commissioning and first scientific results

Authors: R. Claudi, S. Benatti, I. Carleo, A. Ghedina, J. Guerra, F. Ghinassi, A. Harutyunyan, G. Micela, E. Molinari, E. Oliva, M. Rainer, A. Tozzi, C. Baffa, A. Baruffolo, V. Biliotti, N. Buchschacher, M. Cecconi, R. Cosentino, G. Falcini, D. Fantinel, L. Fini, E. Giani, E. Gonzalez-Alvarez, M. Gonzalez, C. Gonzalez, et al. (20 additional authors not shown)

Abstract: GIARPS (GIAno & haRPS) is a project that places both high-resolution spectrographs, HARPS-N (VIS) and GIANO-B (NIR), on the same focal station of the Telescopio Nazionale Galileo (TNG), working simultaneously. It can be considered the first and only instrument worldwide providing cross-dispersed echelle spectroscopy at a resolution of 50,000 in the NIR range and 115,000 in the VIS, over a wide spectral range (0.383-2.45 μm) in a single exposure. The science case is very broad, given the versatility of such an instrument and its large wavelength range. A number of outstanding science cases can be considered, mainly in extra-solar planet science, from the search for rocky planets and hot Jupiters to atmosphere characterization. Furthermore, both instruments can measure high-precision radial velocities in a single exposure by means of the simultaneous thorium technique (HARPS-N) and the absorbing cell technique (GIANO-B). Other science cases are also possible. GIARPS, as a brand new observing mode of the TNG, started after GIANO-A (a fiber-fed spectrograph) was moved from Nasmyth-A to Nasmyth-B, where it was reborn as GIANO-B (no longer a fiber-fed spectrograph). The official commissioning finished in March 2017, and the mode was then offered to the community, although the work is not yet finished. In this paper we describe the preliminary scientific results obtained with GIANO-B and the GIARPS observing mode, using data taken during commissioning and the first open time observations.

Submitted 9 August, 2018; originally announced August 2018.

Comments: 10 pages, 11 figures, Telescopes and Astronomical instrumentation, SPIE Conf. 2018

arXiv:1805.01281 [pdf, ps, other]

doi 10.1051/0004-6361/201732350

Multi-band high resolution spectroscopy rules out the hot Jupiter BD+20 1790b - First data from the GIARPS Commissioning

Authors: I. Carleo, S. Benatti, A. F. Lanza, R. Gratton, R. Claudi, S. Desidera, G. N. Mace, S. Messina, N. Sanna, E. Sissa, A. Ghedina, F. Ghinassi, J. Guerra, A. Harutyunyan, G. Micela, E. Molinari, E. Oliva, A. Tozzi, C. Baffa, A. Baruffolo, A. Bignamini, N. Buchschacher, M. Cecconi, R. Cosentino, M. Endl, et al. (29 additional authors not shown)

Abstract: Context. Stellar activity is currently challenging the detection of young planets via the radial velocity (RV) technique. Aims. We attempt to definitively discriminate the nature of the RV variations for the young active K5 star BD+20 1790, for which visible (VIS) RV measurements show divergent results on the existence of a substellar companion. Methods. We compare VIS data with high precision RVs in the near infrared (NIR) range by using the GIANO-B and IGRINS spectrographs. In addition, we present for the first time simultaneous VIS-NIR observations obtained with GIARPS (GIANO-B and HARPS-N) at Telescopio Nazionale Galileo (TNG). Orbital RVs are achromatic, so the RV amplitude does not change at different wavelengths, while stellar activity induces wavelength-dependent RV variations, which are significantly reduced in the NIR range with respect to the VIS. Results. The NIR radial velocity measurements from GIANO-B and IGRINS show an average amplitude of about one quarter with respect to previously published VIS data, as expected when the RV jitter is due to stellar activity. Coeval multi-band photometry surprisingly shows larger amplitudes in the NIR range, explainable with a mixture of cool and hot spots in the same active region. Conclusions. In this work, the claimed massive planet around BD+20 1790 is ruled out by our data. We exploited the crucial role of multi-wavelength spectroscopy when observing young active stars: thanks to facilities like GIARPS that provide simultaneous observations, this method can reach its maximum potential.

Submitted 3 May, 2018; originally announced May 2018.

Comments: 12 pages, 7 figures

Journal ref: A&A 613, A50 (2018)

arXiv:1611.07603 [pdf]

doi 10.1117/12.2231845

GIARPS: the unique VIS-NIR high precision radial velocity facility in this world

Authors: Riccardo Claudi, Serena Benatti, Ilaria Carleo, Adriano Ghedina, Emilio Molinari, Ernesto Oliva, Andrea Tozzi, Andrea Baruffolo, Massimo Cecconi, Rosario Cosentino, Daniela Fantinel, Luca Fini, Francesca Ghinassi, Manuel Gonzalez, Raffaele Gratton, Jose Guerra, Avet Harutyunyan, Nauzet Hernandez, Marcella Iuzzolino, Marcello Lodi, Luca Malavolta, Jesus Maldonado, Giusi Micela, Nicoletta Sanna, Jose Sanjuan, et al. (8 additional authors not shown)

Abstract: GIARPS (GIAno & haRPS) is a project that places both high-resolution spectrographs, HARPS-N (VIS) and GIANO (NIR), on the same focal station of the Telescopio Nazionale Galileo (TNG), working simultaneously. It can be considered the first and only instrument worldwide providing cross-dispersed echelle spectroscopy at high resolution (R=115,000 in the visual and R=50,000 in the IR) over a wide spectral range (0.383-2.45 micron) in a single exposure. The science case is very broad, given the versatility of such an instrument and the large wavelength range. A number of outstanding science cases can be considered, mainly in extra-solar planet science, from the search for rocky planets and hot Jupiters to atmosphere characterization. Furthermore, both instruments can measure high-precision radial velocities in a single exposure by means of the simultaneous thorium technique (HARPS-N) and the absorbing cell technique (GIANO). Other science cases are also possible. Young stars and proto-planetary disks, cool stars and stellar populations, moving minor bodies in the solar system, bursting young stellar objects, cataclysmic variables and X-ray binary transients in our Galaxy, and supernovae up to gamma-ray bursts in the very distant and young Universe can all take advantage of the uniqueness of this facility, both in terms of simultaneous wide wavelength coverage and high resolution spectroscopy.

Submitted 22 November, 2016; originally announced November 2016.

Comments: 8 pages, 5 figures, SPIE Conference Proceedings

arXiv:1607.03729 [pdf]

doi 10.1117/12.2231866

The new SOXS instrument for the ESO NTT

Authors: P. Schipani, R. Claudi, S. Campana, A. Baruffolo, S. Basa, S. Basso, E. Cappellaro, E. Cascone, R. Cosentino, F. D'Alessio, V. De Caprio, M. Della Valle, A. de Ugarte Postigo, S. D'Orsi, R. Franzen, J. Fynbo, A. Gal-Yam, D. Gardiol, E. Giro, M. Hamuy, M. Iuzzolino, D. Loreggia, S. Mattila, M. Munari, G. Pignata, et al. (6 additional authors not shown)

Abstract: SOXS (Son Of X-Shooter) will be a unique spectroscopic facility for the ESO-NTT 3.5-m telescope in La Silla (Chile), able to cover the optical/NIR band (350-1750 nm). The design foresees a high-efficiency spectrograph with a resolution-slit product of ~4,500, capable of simultaneously observing the complete spectral range 350 - 1750 nm with a good sensitivity, with light imaging capabilities in the visible band. This paper outlines the status of the project.

Submitted 13 July, 2016; originally announced July 2016.

Comments: 10 pages, submitted to SPIE Astronomical Telescopes & Instrumentation 2016, paper 9908-152

arXiv:1510.06870 [pdf, other]

doi 10.1051/0004-6361/201526649

GIANO-TNG spectroscopy of red supergiants in the young star cluster RSGC3

Authors: L. Origlia, E. Oliva, N. Sanna, A. Mucciarelli, E. Dalessandro, S. Scuderi, C. Baffa, V. Biliotti, L. Carbonaro, G. Falcini, E. Giani, M. Iuzzolino, F. Massi, M. Sozzi, A. Tozzi, A. Ghedina, F. Ghinassi, M. Lodi, A. Harutyunyan, M. Pedani

Abstract: The Scutum complex in the inner disk of the Galaxy has a number of young star clusters dominated by red supergiants that are heavily obscured by dust extinction and observable only at infrared wavelengths. These clusters are important tracers of the recent star formation and chemical enrichment history in the inner Galaxy. During the technical commissioning and as a first science verification of the GIANO spectrograph at the Telescopio Nazionale Galileo, we secured high-resolution (R=50,000) near-infrared spectra of five red supergiants in the young Scutum cluster RSGC3. Taking advantage of the full YJHK spectral coverage of GIANO in a single exposure, we were able to measure several tens of atomic and molecular lines that were suitable for determining chemical abundances. By means of spectral synthesis and line equivalent width measurements, we obtained abundances of Fe and iron-peak elements such as Ni, Cr, and Cu, alpha (O, Mg, Si, Ca, Ti), other light elements (C, N, F, Na, Al, and Sc), and some s-process elements (Y, Sr). We found average half-solar iron abundances and solar-scaled [X/Fe] abundance patterns for most of the elements, consistent with a thin-disk chemistry. We found depletion of [C/Fe] and enhancement of [N/Fe], consistent with standard CN burning, and low 12C/13C abundance ratios (between 9 and 11), which require extra-mixing processes in the stellar interiors during the post-main sequence evolution. We also found local standard of rest V(LSR)=106 km/s and heliocentric V(HEL)=90 km/s radial velocities with a dispersion of 2.3 km/s. The inferred radial velocities, abundances, and abundance patterns of RSGC3 are very similar to those previously measured in the other two young clusters of the Scutum complex, RSGC1 and RSGC2, suggesting a common kinematics and chemistry within the Scutum complex.

Submitted 23 October, 2015; originally announced October 2015.

arXiv:1506.09004 [pdf, ps, other]

doi 10.1051/0004-6361/201526291

Lines and continuum sky emission in the near infrared: observational constraints from deep high spectral resolution spectra with GIANO-TNG

Authors: E. Oliva, L. Origlia, S. Scuderi, S. Benatti, I. Carleo, E. Lapenna, A. Mucciarelli, C. Baffa, V. Biliotti, L. Carbonaro, G. Falcini, E. Giani, M. Iuzzolino, F. Massi, N. Sanna, M. Sozzi, A. Tozzi, A. Ghedina, F. Ghinassi, M. Lodi, A. Harutyunyan, M. Pedani

Abstract: Aims. Determining the intensity of lines and continuum airglow emission in the H-band is important for the design of faint-object infrared spectrographs. Existing spectra at low/medium resolution cannot disentangle the true sky-continuum from instrumental effects (e.g. diffuse light in the wings of strong lines). We aim to obtain, for the first time, a high resolution infrared spectrum deep enough to set significant constraints on the continuum emission between the lines in the H-band. Methods. During the second commissioning run of the GIANO high-resolution infrared spectrograph at La Palma Observatory, we pointed the instrument directly to the sky and obtained a deep spectrum that extends from 0.97 to 2.4 micron. Results. The spectrum shows about 1500 emission lines, a factor of two more than in previous works. Of these, 80% are identified as OH transitions; half of these are from highly excited molecules (hot-OH component) that are not included in the OH airglow emission models normally used for astronomical applications. The other lines are attributable to O2 or unidentified. Several of the faint lines are in spectral regions that were previously believed to be free of line emission. The continuum in the H-band is marginally detected at a level of about 300 photons/m^2/s/arcsec^2/micron, equivalent to 20.1 AB-mag/arcsec^2. The observed spectrum and the list of observed sky-lines are published in electronic format. Conclusions. Our measurements indicate that the sky continuum in the H-band could be even darker than previously believed. However, the myriad of airglow emission lines severely limits the spectral ranges where very low background can be effectively achieved with low/medium resolution spectrographs. We identify a few spectral bands that could still remain quite dark at the resolving power foreseen for VLT-MOONS (R ~6,600).

Submitted 30 June, 2015; originally announced June 2015.

Comments: 7 pages, 4 figures, to be published in Astronomy & Astrophysics

Journal ref: A&A 581, A47 (2015)

arXiv:1407.3126 [pdf]

doi 10.1117/12.2054094

The fiber-fed preslit of GIANO at T.N.G

Authors: A. Tozzi, E. Oliva, L. Origlia, C. Baffa, V. Biliotti, G. Falcini, E. Giani, M. Iuzzolino, F. Massi, N. Sanna, S. Scuderi, M. Sozzi

Abstract: Giano is a cryogenic spectrograph located at the T.N.G. (Spain) and commissioned in 2013. It works in the range 950-2500 nm with a resolving power of 50000. This instrument was designed and built for direct feeding from the telescope [2]. However, due to constraints imposed on the telescope interfacing during the pre-commissioning phase, it had to be positioned on the rotating building, far from the telescope focus. Therefore, a new interface to the telescope, based on IR-transmitting ZBLAN fibers with 85 μm core, was developed. Although the instrument was originally designed to work directly at the f/11 Nasmyth focus of the telescope, in 2011 it was decided to feed it with a fiber. The beam from the telescope is focused on a double fiber bundle by a Preslit Optical Bench attached to the Nasmyth A interface of the telescope. This Optical Bench contains the fiber feeding system and other important features such as a guiding system, a fiber viewer, a fiber feed calibration lamp and a nodding facility between the two fibers. The use of two fibers allows us to record two spectrograms side by side in the echellogram in the same acquisition: either one of the star and the other of the sky, or the star and a calibration lamp simultaneously. Before entering the cryostat, the light from the fiber is collected by a second Preslit Optical Bench attached directly to the Giano cryostat: this bench generates the correct f-number to illuminate the cold stop and also carries an image slicer to increase the efficiency of the system.

Submitted 11 July, 2014; originally announced July 2014.

Comments: 21 pages, 24 figures, 3 tables. Presented at SPIE Astronomical Telescopes + Instrumentation 2014 (Ground-based and Airborne Instrumentation for Astronomy 5, 9147-360). To be published in Proceedings of SPIE Volume 9147

arXiv:1407.3054 [pdf]

doi 10.1117/12.2054425

Updated optical design and trade-off study for MOONS, the Multi-Object Optical and Near Infrared spectrometer for the VLT

Authors: E. Oliva, S. Todd, M. Cirasuolo, H. Schnetler, D. Lunney, P. Rees, A. Bianco, E. Diolaiti, D. Ferruzzi, M. Fisher, I. Guinouard, M. Iuzzolino, I. Parry, X. Sun, A. Tozzi, F. Vitali

Abstract: This paper presents the latest optical design for the MOONS triple-arm spectrographs. MOONS will be a Multi-Object Optical and Near-infrared Spectrograph and will be installed on one of the European Southern Observatory (ESO) Very Large Telescopes (VLT). Included in this paper is a trade-off analysis of different types of collimators, cameras, dichroics and filters.

Submitted 11 July, 2014; originally announced July 2014.

Comments: 10 pages, 8 figures, 5 tables. Presented at SPIE Astronomical Telescopes + Instrumentation 2014 (Ground-based and Airborne Instrumentation for Astronomy 5, 9147-84). To be published in Proceedings of SPIE Volume 9147

arXiv:1407.3052 [pdf]

doi 10.1117/12.2055093

Preliminary results on the characterization and performances of ZBLAN fiber for infrared spectrographs

Authors: M. Iuzzolino, A. Tozzi, N. Sanna, L. Zangrilli, E. Oliva

Abstract: Present telescopes and future extremely large telescopes make use of fiber-fed spectrographs to observe at optical and infrared wavelengths. The use of fibers largely simplifies the interfacing of the spectrograph to the telescope. At a high spectral resolution (R>50,000) the fibers can be used to achieve very high spectral accuracy. GIANO is an infrared (0.95-2.5 μm) high resolution (R=50,000) spectrometer [1] [2] [3] that was recently commissioned at the TNG telescope (La Palma). This instrument was designed and built for direct feeding from the telescope [4]. However, due to constraints imposed on the telescope interfacing during the pre-commissioning phase, it had to be positioned on the rotating building, far from the telescope focus. Therefore, a new interface to the telescope, based on IR-transmitting ZBLAN fibers with 85 μm core, was developed. In this article we report the first, preliminary results on the effects of these fibers on the quality of the spectra recorded with GIANO and with a similar spectrograph that we set up in the laboratory. The effects can be primarily associated with modal noise (MN) that, in GIANO, is much more evident than in optical spectrometers because of the much longer wavelengths.

Submitted 11 July, 2014; originally announced July 2014.

Comments: 11 pages, 5 figures, 1 table. Presented at SPIE Astronomical Telescopes + Instrumentation 2014 (Ground-based and Airborne Instrumentation for Astronomy 5, 9147-231). To be published in Proceedings of SPIE Volume 9147

Showing 1–25 of 25 results for author: Iuzzolino, M