-
Lessons from Defending Gemini Against Indirect Prompt Injections
Authors:
Chongyang Shi,
Sharon Lin,
Shuang Song,
Jamie Hayes,
Ilia Shumailov,
Itay Yona,
Juliette Pluto,
Aneesh Pappu,
Christopher A. Choquette-Choo,
Milad Nasr,
Chawin Sitawarin,
Gena Gibson,
Andreas Terzis,
John "Four" Flynn
Abstract:
Gemini is increasingly used to perform tasks on behalf of users, where function-calling and tool-use capabilities enable the model to access user data. Some tools, however, require access to untrusted data introducing risk. Adversaries can embed malicious instructions in untrusted data which cause the model to deviate from the user's expectations and mishandle their data or permissions. In this report, we set out Google DeepMind's approach to evaluating the adversarial robustness of Gemini models and describe the main lessons learned from the process. We test how Gemini performs against a sophisticated adversary through an adversarial evaluation framework, which deploys a suite of adaptive attack techniques to run continuously against past, current, and future versions of Gemini. We describe how these ongoing evaluations directly help make Gemini more resilient against manipulation.
Submitted 20 May, 2025;
originally announced May 2025.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1112 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 16 December, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Stair Climbing using the Angular Momentum Linear Inverted Pendulum Model and Model Predictive Control
Authors:
Oluwami Dosunmu-Ogunbi,
Aayushi Shrivastava,
Grant Gibson,
Jessy W Grizzle
Abstract:
A new control paradigm using angular momentum and foot placement as state variables in the linear inverted pendulum model has expanded the realm of possibilities for the control of bipedal robots. This new paradigm, known as the ALIP model, has shown effectiveness in cases where a robot's center of mass height can be assumed to be constant or near constant as well as in cases where there are no non-kinematic restrictions on foot placement. Walking up and down stairs violates both of these assumptions, where center of mass height varies significantly within a step and the geometry of the stairs restricts the effectiveness of foot placement. In this paper, we explore a variation of the ALIP model that allows the length of the virtual pendulum formed by the robot's stance foot and center of mass to follow smooth trajectories during a step. We couple this model with a control strategy constructed from a novel combination of virtual constraint-based control and a model predictive control algorithm to stabilize a stair climbing gait that does not rely solely on foot placement. Simulations on a 20-degree-of-freedom model of the Cassie biped in the SimMechanics simulation environment show that the controller is able to achieve a periodic gait.
Submitted 10 July, 2023; v1 submitted 5 July, 2023;
originally announced July 2023.
-
Exploring Kinodynamic Fabrics for Reactive Whole-Body Control of Underactuated Humanoid Robots
Authors:
Alphonsus Adu-Bredu,
Grant Gibson,
Jessy W. Grizzle
Abstract:
For bipedal humanoid robots to successfully operate in the real world, they must be competent at simultaneously executing multiple motion tasks while reacting to unforeseen external disturbances in real-time. We propose Kinodynamic Fabrics as an approach for the specification, solution and simultaneous execution of multiple motion tasks in real-time while being reactive to dynamism in the environment. Kinodynamic Fabrics allows for the specification of prioritized motion tasks as forced spectral semi-sprays and solves for desired robot joint accelerations at real-time frequencies. We evaluate the capabilities of Kinodynamic Fabrics on diverse physically challenging whole-body control tasks with a bipedal humanoid robot both in simulation and in the real world. Kinodynamic Fabrics outperforms the state-of-the-art Quadratic Program based whole-body controller on a variety of whole-body control tasks on run-time and reactivity metrics in our experiments. Our open-source implementation of Kinodynamic Fabrics as well as robot demonstration videos can be found at this URL: https://adubredu.github.io/kinofabs.
Submitted 23 August, 2023; v1 submitted 7 March, 2023;
originally announced March 2023.
-
Terrain-Adaptive, ALIP-Based Bipedal Locomotion Controller via Model Predictive Control and Virtual Constraints
Authors:
Grant Gibson,
Oluwami Dosunmu-Ogunbi,
Yukai Gong,
Jessy Grizzle
Abstract:
This paper presents a gait controller for bipedal robots to achieve highly agile walking over various terrains given local slope and friction cone information. Without these considerations, untimely impacts can cause a robot to trip and inadequate tangential reaction forces at the stance foot can cause slippages. We address these challenges by combining, in a novel manner, a model based on an Angular Momentum Linear Inverted Pendulum (ALIP) and a Model Predictive Control (MPC) foot placement planner that is executed by the method of virtual constraints. The process starts with abstracting from the full dynamics of a Cassie 3D bipedal robot, an exact low-dimensional representation of its center of mass dynamics, parameterized by angular momentum. Under a piecewise planar terrain assumption and the elimination of terms for the angular momentum about the robot's center of mass, the centroidal dynamics about the contact point become linear and have dimension four. Importantly, we include the intra-step dynamics at uniformly-spaced intervals in the MPC formulation so that realistic workspace constraints on the robot's evolution can be imposed from step-to-step. The output of the low-dimensional MPC controller is directly implemented on a high-dimensional Cassie robot through the method of virtual constraints. In experiments, we validate the performance of our control strategy for the robot on a variety of surfaces with varied inclinations and textures.
Submitted 28 July, 2022; v1 submitted 30 September, 2021;
originally announced September 2021.
-
Machine Learning Applications for Therapeutic Tasks with Genomics Data
Authors:
Kexin Huang,
Cao Xiao,
Lucas M. Glass,
Cathy W. Critchlow,
Greg Gibson,
Jimeng Sun
Abstract:
Thanks to the increasing availability of genomics and other biomedical data, many machine learning approaches have been proposed for a wide range of therapeutic discovery and development tasks. In this survey, we review the literature on machine learning applications for genomics through the lens of therapeutic development. We investigate the interplay among genomics, compounds, proteins, electronic health records (EHR), cellular images, and clinical texts. We identify twenty-two machine learning in genomics applications across the entire therapeutics pipeline, from discovering novel targets, personalized medicine, developing gene-editing tools all the way to clinical trials and post-market studies. We also pinpoint seven important challenges in this field with opportunities for expansion and impact. This survey overviews recent research at the intersection of machine learning, genomics, and therapeutic development.
Submitted 3 May, 2021;
originally announced May 2021.
-
Priority-based Parameter Propagation for Distributed DNN Training
Authors:
Anand Jayarajan,
Jinliang Wei,
Garth Gibson,
Alexandra Fedorova,
Gennady Pekhimenko
Abstract:
Data parallel training is widely used for scaling distributed deep neural network (DNN) training. However, the performance benefits are often limited by the communication-heavy parameter synchronization step. In this paper, we take advantage of the domain-specific knowledge of DNN training and overlap parameter synchronization with computation in order to improve the training performance. We make two key observations: (1) the optimal data representation granularity for the communication may differ from that used by the underlying DNN model implementation and (2) different parameters can afford different synchronization delays. Based on these observations, we propose a new synchronization mechanism called Priority-based Parameter Propagation (P3). P3 synchronizes parameters at a finer granularity and schedules data transmission in such a way that the training process incurs minimal communication delay. We show that P3 can improve the training throughput of ResNet-50, Sockeye and VGG-19 by as much as 25%, 38% and 66% respectively on clusters with realistic network bandwidth.
Submitted 10 May, 2019;
originally announced May 2019.
-
User interface design for military AR applications
Authors:
Mark A. Livingston,
Zhuming Ai,
Kevin Karsch,
Gregory O. Gibson
Abstract:
Designing a user interface for military situation awareness presents challenges for managing information in a useful and usable manner. We present an integrated set of functions for the presentation of and interaction with information for a mobile augmented reality application for military applications. Our research has concentrated on four areas. We filter information based on relevance to the user (in turn based on location), evaluate methods for presenting information that represents entities occluded from the user's view, enable interaction through a top-down map view metaphor akin to current techniques used in the military, and facilitate collaboration with other mobile users and/or a command center. In addition, we refined the user interface architecture to conform to requirements from subject matter experts. We discuss the lessons learned in our work and directions for future research.
Submitted 20 April, 2019;
originally announced April 2019.
-
MLSys: The New Frontier of Machine Learning Systems
Authors:
Alexander Ratner,
Dan Alistarh,
Gustavo Alonso,
David G. Andersen,
Peter Bailis,
Sarah Bird,
Nicholas Carlini,
Bryan Catanzaro,
Jennifer Chayes,
Eric Chung,
Bill Dally,
Jeff Dean,
Inderjit S. Dhillon,
Alexandros Dimakis,
Pradeep Dubey,
Charles Elkan,
Grigori Fursin,
Gregory R. Ganger,
Lise Getoor,
Phillip B. Gibbons,
Garth A. Gibson,
Joseph E. Gonzalez,
Justin Gottschlich,
Song Han,
Kim Hazelwood
, et al. (44 additional authors not shown)
Abstract:
Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a new systems machine learning research community at the intersection of the traditional systems and ML communities, focused on topics such as hardware systems for ML, software systems for ML, and ML optimized for metrics beyond predictive accuracy. To do this, we describe a new conference, MLSys, that explicitly targets research at the intersection of systems and machine learning with a program committee split evenly between experts in systems and ML, and an explicit focus on topics at the intersection of the two.
Submitted 1 December, 2019; v1 submitted 29 March, 2019;
originally announced April 2019.
-
Adaptive foveated single-pixel imaging with dynamic super-sampling
Authors:
David B. Phillips,
Ming-Jie Sun,
Jonathan M. Taylor,
Matthew P. Edgar,
Stephen M. Barnett,
Graham G. Gibson,
Miles J. Padgett
Abstract:
As an alternative to conventional multi-pixel cameras, single-pixel cameras enable images to be recorded using a single detector that measures the correlations between the scene and a set of patterns. However, to fully sample a scene in this way requires at least the same number of correlation measurements as there are pixels in the reconstructed image. Therefore single-pixel imaging systems typically exhibit low frame-rates. To mitigate this, a range of compressive sensing techniques have been developed which rely on a priori knowledge of the scene to reconstruct images from an under-sampled set of measurements. In this work we take a different approach and adopt a strategy inspired by the foveated vision systems found in the animal kingdom - a framework that exploits the spatio-temporal redundancy present in many dynamic scenes. In our single-pixel imaging system a high-resolution foveal region follows motion within the scene, but unlike a simple zoom, every frame delivers new spatial information from across the entire field-of-view. Using this approach we demonstrate a four-fold reduction in the time taken to record the detail of rapidly evolving features, whilst simultaneously accumulating detail of more slowly evolving regions over several consecutive frames. This tiered super-sampling technique enables the reconstruction of video streams in which both the resolution and the effective exposure-time spatially vary and adapt dynamically in response to the evolution of the scene. The methods described here can complement existing compressive sensing approaches and may be applied to enhance a variety of computational imagers that rely on sequential correlation measurements.
Submitted 27 July, 2016;
originally announced July 2016.
-
High-Performance Distributed ML at Scale through Parameter Server Consistency Models
Authors:
Wei Dai,
Abhimanu Kumar,
Jinliang Wei,
Qirong Ho,
Garth Gibson,
Eric P. Xing
Abstract:
As Machine Learning (ML) applications increase in data size and model complexity, practitioners turn to distributed clusters to satisfy the increased computational and memory demands. Unfortunately, effective use of clusters for ML requires considerable expertise in writing distributed code, while highly-abstracted frameworks like Hadoop have not, in practice, approached the performance seen in specialized ML implementations. The recent Parameter Server (PS) paradigm is a middle ground between these extremes, allowing easy conversion of single-machine parallel ML applications into distributed ones, while maintaining high throughput through relaxed "consistency models" that allow inconsistent parameter reads. However, due to insufficient theoretical study, it is not clear which of these consistency models can really ensure correct ML algorithm output; at the same time, there remain many theoretically-motivated but undiscovered opportunities to maximize computational throughput. Motivated by this challenge, we study both the theoretical guarantees and empirical behavior of iterative-convergent ML algorithms in existing PS consistency models. We then use the gleaned insights to improve a consistency model using an "eager" PS communication mechanism, and implement it as a new PS system that enables ML algorithms to reach their solution more quickly.
Submitted 29 October, 2014;
originally announced October 2014.
-
Primitives for Dynamic Big Model Parallelism
Authors:
Seunghak Lee,
Jin Kyu Kim,
Xun Zheng,
Qirong Ho,
Garth A. Gibson,
Eric P. Xing
Abstract:
When training large machine learning models with many variables or parameters, a single machine is often inadequate since the model may be too large to fit in memory, while training can take a long time even with stochastic updates. A natural recourse is to turn to distributed cluster computing, in order to harness additional memory and processors. However, naive, unstructured parallelization of ML algorithms can make inefficient use of distributed memory, while failing to obtain proportional convergence speedups - or can even result in divergence. We develop a framework of primitives for dynamic model-parallelism, STRADS, in order to explore partitioning and update scheduling of model variables in distributed ML algorithms - thus improving their memory efficiency while presenting new opportunities to speed up convergence without compromising inference correctness. We demonstrate the efficacy of model-parallel algorithms implemented in STRADS versus popular implementations for Topic Modeling, Matrix Factorization and Lasso.
Submitted 17 June, 2014;
originally announced June 2014.
-
Structure-Aware Dynamic Scheduler for Parallel Machine Learning
Authors:
Seunghak Lee,
Jin Kyu Kim,
Qirong Ho,
Garth A. Gibson,
Eric P. Xing
Abstract:
Training large machine learning (ML) models with many variables or parameters can take a long time if one employs sequential procedures even with stochastic updates. A natural solution is to turn to distributed computing on a cluster; however, naive, unstructured parallelization of ML algorithms does not usually lead to a proportional speedup and can even result in divergence, because dependencies between model elements can attenuate the computational gains from parallelization and compromise correctness of inference. Recent efforts toward this issue have benefited from exploiting the static, a priori block structures residing in ML algorithms. In this paper, we take this path further by exploring the dynamic block structures and workloads therein present during ML program execution, which offers new opportunities for improving convergence, correctness, and load balancing in distributed ML. We propose and showcase a general-purpose scheduler, STRADS, for coordinating distributed updates in ML algorithms, which harnesses the aforementioned opportunities in a systematic way. We provide theoretical guarantees for our scheduler, and demonstrate its efficacy versus static block structures on Lasso and Matrix Factorization.
Submitted 30 December, 2013; v1 submitted 19 December, 2013;
originally announced December 2013.