-
Depth-Constrained ASV Navigation with Deep RL and Limited Sensing
Authors:
Amirhossein Zhalehmehrabi,
Daniele Meli,
Francesco Dal Santo,
Francesco Trotti,
Alessandro Farinelli
Abstract:
Autonomous Surface Vehicles (ASVs) play a crucial role in maritime operations, yet their navigation in shallow-water environments remains challenging due to dynamic disturbances and depth constraints. Traditional navigation strategies struggle with limited sensor information, making safe and efficient operation difficult. In this paper, we propose a reinforcement learning (RL) framework for ASV navigation under depth constraints, where the vehicle must reach a target while avoiding unsafe areas with only a single depth measurement per timestep from a downward-facing Single Beam Echosounder (SBES). To enhance environmental awareness, we integrate Gaussian Process (GP) regression into the RL framework, enabling the agent to progressively estimate a bathymetric depth map from sparse sonar readings. This approach improves decision-making by providing a richer representation of the environment. Furthermore, we demonstrate effective sim-to-real transfer, ensuring that trained policies generalize well to real-world aquatic conditions. Experimental results validate our method's capability to improve ASV navigation performance while maintaining safety in challenging shallow-water environments.
Submitted 2 June, 2025; v1 submitted 25 April, 2025;
originally announced April 2025.
-
From Fake Perfects to Conversational Imperfects: Exploring Image-Generative AI as a Boundary Object for Participatory Design of Public Spaces
Authors:
Jose A. Guridi,
Angel Hsing-Chi Hwang,
Duarte Santo,
Maria Goula,
Cristobal Cheyre,
Lee Humphreys,
Marco Rangel
Abstract:
Designing public spaces requires balancing the interests of diverse stakeholders within a constrained physical and institutional space. Designers usually approach these problems through participatory methods but struggle to incorporate diverse perspectives into design outputs. The growing capabilities of image-generative artificial intelligence (IGAI) could support participatory design. Prior work in leveraging IGAI's capabilities in design has focused on augmenting the experience and performance of individual creators. We study how IGAI could facilitate participatory processes when designing public spaces, a complex collaborative task. We conducted workshops and IGAI-mediated interviews in a real-world participatory process to upgrade a park in Los Angeles. We found (1) a shift from focusing on accuracy to fostering richer conversations as the desirable outcome of adopting IGAI in participatory design, (2) that IGAI promoted more space-aware conversations, and (3) that IGAI-mediated conversations are subject to the abilities of the facilitators in managing the interaction between themselves, the AI, and stakeholders. We contribute by discussing practical implications for using IGAI in participatory design, including success metrics, relevant skills, and asymmetries between designers and stakeholders. We finish by proposing a series of open research questions.
Submitted 1 November, 2024;
originally announced November 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1112 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 16 December, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Optimizing tiny colorless feedback delay networks
Authors:
Gloria Dal Santo,
Karolina Prawda,
Sebastian J. Schlecht,
Vesa Välimäki
Abstract:
A common bane of artificial reverberation algorithms is spectral coloration in the synthesized sound, typically manifesting as metallic ringing, leading to a degradation in the perceived sound quality. In delay network methods, coloration is more pronounced when fewer delay lines are used. This paper presents an optimization framework in which a tiny differentiable feedback delay network, with as few as four delay lines, is used to learn a set of parameters to iteratively reduce coloration. The parameters under optimization include the feedback matrix, as well as the input and output gains. The optimization objective is twofold: to maximize spectral flatness through a spectral loss while maintaining temporal density by penalizing sparseness in the parameter values. A favorable narrow distribution of modal excitation is achieved while maintaining the desired impulse response density. In a subjective assessment, the new method proves effective in reducing perceptual coloration of late reverberation. Compared to the authors' previous work, which serves as the baseline and utilizes a sparsity loss in the time domain, the proposed method achieves computational savings while maintaining performance. The effectiveness of this work is demonstrated through two application scenarios where smooth-sounding synthetic room impulse responses are obtained via the introduction of attenuation filters and an optimizable scattering feedback matrix.
Submitted 12 March, 2025; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1326 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 9 May, 2025; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Improving CFD simulations by local machine-learned correction
Authors:
Peetak Mitra,
Majid Haghshenas,
Niccolo Dal Santo,
Conor Daly,
David P. Schmidt
Abstract:
High-fidelity computational fluid dynamics (CFD) simulations for design space explorations can be exceedingly expensive due to the cost associated with resolving the finer scales. This computational cost/accuracy trade-off is a major challenge for modern CFD simulations. In the present study, we propose a method that uses a trained machine learning model that has learned to predict the discretization error as a function of large-scale flow features to inversely estimate the degree of lost information due to mesh coarsening. This information is then added back to the low-resolution solution during runtime, thereby enhancing the quality of the under-resolved coarse mesh simulation. The use of a coarser mesh produces a non-linear benefit in speed, while inferring and correcting for the lost information has a linear cost. We demonstrate the numerical stability of the method on a problem of engineering interest, a 3D turbulent channel flow. In addition to this demonstration, we further show the potential for speedup without sacrificing solution accuracy using this method, thereby making the cost/accuracy trade-off of CFD more favorable.
Submitted 28 April, 2023;
originally announced May 2023.
-
Data driven approximation of parametrized PDEs by Reduced Basis and Neural Networks
Authors:
Niccolò Dal Santo,
Simone Deparis,
Luca Pegolotti
Abstract:
We are interested in the approximation of partial differential equations with a data-driven approach based on the reduced basis method and machine learning. We suppose that the phenomenon of interest can be modeled by a parametrized partial differential equation, but that the values of the physical parameters are unknown or difficult to measure directly. Our method allows us to estimate fields of interest, for instance the temperature of a sample of material or the velocity of a fluid, given data at a handful of points in the domain. We propose to accomplish this task with a neural network embedding a reduced basis solver as an exotic activation function in the last layer. The reduced basis solver accounts for the underlying physical phenomenon and is constructed from snapshots obtained from randomly selected values of the physical parameters during an expensive offline phase. The same full order solutions are then employed for the training of the neural network. As a matter of fact, the chosen architecture resembles an asymmetric autoencoder in which the decoder is the reduced basis solver and as such does not contain trainable parameters. The resulting latent space of our autoencoder includes parameter-dependent quantities feeding the reduced basis solver, which -- depending on the considered partial differential equation -- are the values of the physical parameters themselves or the affine decomposition coefficients of the differential operators.
Submitted 29 June, 2019; v1 submitted 2 April, 2019;
originally announced April 2019.
-
DELTA: Data Extraction and Logging Tool for Android
Authors:
Mauro Conti,
Elia Dal Santo,
Riccardo Spolaor
Abstract:
In the past few years, the use of smartphones has increased exponentially, and so have the capabilities of such devices. Together with an increase in raw processing power, modern smartphones are equipped with a wide variety of sensors and expose an extensive set of APIs (Application Programming Interfaces). These capabilities allow us to extract a wide spectrum of data that ranges from information about the environment (e.g., position, orientation) to user habits (e.g., which apps she uses and when), as well as about the status of the operating system itself (e.g., memory, network adapters). This data can be extremely valuable in many research fields such as user authentication, intrusion detection and detection of information leaks. For these reasons, researchers need to use a solid and reliable logging tool to collect data from mobile devices.
In this paper, we first survey the existing logging tools available on the Android platform, comparing the features offered by different tools and their impact on the system, and highlighting some of their shortcomings. Then, we present DELTA - Data Extraction and Logging Tool for Android, which improves on the existing Android logging solutions in terms of flexibility, fine-grained tuning capabilities, extensibility, and available set of logging features. We performed a full implementation of DELTA and ran a thorough evaluation of its performance. The results show that our tool has low impact on the performance of the system, on battery consumption, and on user experience. Finally, we make the DELTA source code and toolset available to the research community.
Submitted 9 September, 2016;
originally announced September 2016.