-
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering
Authors:
Guangtao Zeng,
Maohao Shen,
Delin Chen,
Zhenting Qi,
Subhro Das,
Dan Gutfreund,
David Cox,
Gregory Wornell,
Wei Lu,
Zhang-Wei Hong,
Chuang Gan
Abstract:
Language models (LMs) perform well on standardized coding benchmarks but struggle with real-world software engineering tasks such as resolving GitHub issues in SWE-Bench, especially when model parameters are less than 100B. While smaller models are preferable in practice due to their lower computational cost, improving their performance remains challenging. Existing approaches primarily rely on supervised fine-tuning (SFT) with high-quality data, which is expensive to curate at scale. An alternative is test-time scaling: generating multiple outputs, scoring them using a verifier, and selecting the best one. Although effective, this strategy often requires excessive sampling and costly scoring, limiting its practical application. We propose Evolutionary Test-Time Scaling (EvoScale), a sample-efficient method that treats generation as an evolutionary process. By iteratively refining outputs via selection and mutation, EvoScale shifts the output distribution toward higher-scoring regions, reducing the number of samples needed to find correct solutions. To reduce the overhead from repeated sampling and selection, we train the model to self-evolve using reinforcement learning (RL). Rather than relying on external verifiers at inference time, the model learns to self-improve the scores of its own generations across iterations. Evaluated on SWE-Bench-Verified, EvoScale enables our 32B model, Satori-SWE-32B, to match or exceed the performance of models with over 100B parameters while using a few samples. Code, data, and models will be fully open-sourced.
Submitted 29 May, 2025;
originally announced May 2025.
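The selection-and-mutation loop mentioned in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the function `evolve`, its parameters, and the integer "candidates" with `toy_score` and `toy_mutate` are all hypothetical stand-ins, and the sketch uses an explicit scoring function rather than the RL-trained self-evolution the abstract describes.

```python
import random


def evolve(seeds, score, mutate, iterations=5, keep=2, children=4, seed=0):
    """Toy evolutionary refinement: select the best candidates, then mutate them.

    All names and defaults here are illustrative assumptions, not EvoScale's API.
    """
    rng = random.Random(seed)
    population = list(seeds)
    for _ in range(iterations):
        # Selection: keep only the highest-scoring candidates.
        survivors = sorted(population, key=score, reverse=True)[:keep]
        # Mutation: derive new candidates from randomly chosen survivors.
        offspring = [mutate(rng.choice(survivors), rng) for _ in range(children)]
        # Survivors are carried over, so the best score never decreases.
        population = survivors + offspring
    return max(population, key=score)


def toy_score(x):
    # Hypothetical verifier: candidates closer to 42 score higher.
    return -abs(x - 42)


def toy_mutate(x, rng):
    # Hypothetical mutation: a small random perturbation of the candidate.
    return x + rng.randint(-5, 5)


seeds = [0, 10, 90]
best = evolve(seeds, toy_score, toy_mutate, iterations=20)
print(best, toy_score(best))
```

Because survivors are kept in the population at every iteration, the returned candidate scores at least as well as the best seed; how quickly it improves depends on the mutation operator and the sample budget.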
-
Inline calibration of spatial light modulators in nonlinear microscopy
Authors:
Daniël W. S. Cox,
Harish Sasikumar,
Ivo M. Vellekoop
Abstract:
We present a method for calibrating the response of a phase-only spatial light modulator in nonlinear microscopy. Our method uses the microscope image itself as calibration measurement and requires no additional hardware components. Our method is adapted to the nonlinear signals encountered in multi-photon excitation fluorescence microscopes, and works well even under low light conditions and with strong photobleaching.
Submitted 28 May, 2025;
originally announced May 2025.
-
Chiral near-field control of quantum light generation using magneto-optical graphene
Authors:
Mikkel Have Eriksen,
Joel D. Cox
Abstract:
We theoretically explore strategies to actively control photon emission from quantum light sources by leveraging the large magneto-optical response of graphene. The quantum electrodynamic response of graphene -- characterized by the Purcell factor and the Lamb shift of a proximal emitter -- is analyzed for extended two-dimensional sheets, one-dimensional nanoribbons, and zero-dimensional nanodisks, all of which are endowed with an intrinsic chiral near-field response under a static perpendicular magnetic field. Using rigorous semianalytical models of these systems, we reveal that the emission properties can be readily tuned by variations in doping charge carrier density and applied magnetic field strength, both with respect to magnetoplasmon resonances (at infrared frequencies) and Shubnikov-de-Haas oscillations (entering telecommunication bands) associated with optical transitions between discrete Landau levels. Localized magnetoplasmons in graphene nanoribbons are predicted to induce large dissymmetry in the spontaneous emission from left-hand and right-hand circularly polarized transitions in a proximal quantum emitter, presenting applications for chiral quantum optical waveguiding. This chiral dissymmetry is further enhanced in gyrotropic graphene nanodisks, signaling that the spatial shaping of near-fields in nanostructured graphene can significantly boost the intrinsic chiral response induced by the magnetic field. These results indicate that magneto-optical graphene constitutes a versatile and highly tunable platform for quantum light generation and manipulation at the nanoscale.
Submitted 15 May, 2025;
originally announced May 2025.
-
Nonlocal electrodynamics of two-dimensional anisotropic magneto-plasmons
Authors:
A. J. Chaves,
Line Jelver,
D. R. da Costa,
Joel D. Cox,
N. Asger Mortensen,
Nuno M. R. Peres
Abstract:
We present a hydrodynamic model, grounded in Madelung's formalism, to describe collective electronic motion in anisotropic materials. This model incorporates nonlocal contributions from the Thomas-Fermi quantum pressure and quantum effects arising from the Bohm potential. We derive analytical expressions for the magnetoplasmon dispersion and nonlocal optical conductivity. To demonstrate the applicability of the model, we examine electrons in the conduction band of monolayer phosphorene, an exemplary anisotropic two-dimensional electron gas. The dispersion of plasmons derived from our hydrodynamic approach is closely aligned with that predicted by ab initio calculations. Then, we use our model to analyze few-layer black phosphorus, whose measured infrared optical response is hyperbolic. Our results reveal that the incorporation of nonlocal and quantum effects in the optical conductivity prevents black phosphorus from supporting hyperbolic surface plasmon polaritons. We further demonstrate that the predicted wavefront generated by an electric dipole exhibits a significant difference between the local and nonlocal descriptions for the optical conductivity. This study underscores the necessity of moving beyond local approximations when investigating anisotropic systems capable of hosting strongly confined plasmon-polaritons.
Submitted 13 May, 2025;
originally announced May 2025.
-
Stow: Robotic Packing of Items into Fabric Pods
Authors:
Nicolas Hudson,
Josh Hooks,
Rahul Warrier,
Curt Salisbury,
Ross Hartley,
Kislay Kumar,
Bhavana Chandrashekhar,
Paul Birkmeyer,
Bosch Tang,
Matt Frost,
Shantanu Thakar,
Tony Piaskowy,
Petter Nilsson,
Josh Petersen,
Neel Doshi,
Alan Slatter,
Ankit Bhatia,
Cassie Meeker,
Yuechuan Xue,
Dylan Cox,
Alex Kyriazis,
Bai Lou,
Nadeem Hasan,
Asif Rana,
Nikhil Chacko
, et al. (12 additional authors not shown)
Abstract:
This paper presents a compliant manipulation system capable of placing items onto densely packed shelves. The wide diversity of items and strict business requirements for high producing rates and low defect generation have prohibited warehouse robotics from performing this task. Our innovations in hardware, perception, decision-making, motion planning, and control have enabled this system to perform over 500,000 stows in a large e-commerce fulfillment center. The system achieves human levels of packing density and speed while prioritizing work on overhead shelves to enhance the safety of humans working alongside the robots.
Submitted 7 May, 2025;
originally announced May 2025.
-
Activated LoRA: Fine-tuned LLMs for Intrinsics
Authors:
Kristjan Greenewald,
Luis Lastras,
Thomas Parnell,
Vraj Shah,
Lucian Popa,
Giulio Zizzo,
Chulaka Gunasekara,
Ambrish Rawat,
David Cox
Abstract:
Low-Rank Adaptation (LoRA) has emerged as a highly efficient framework for finetuning the weights of large foundation models, and has become the go-to method for data-driven customization of LLMs. Despite the promise of highly customized behaviors and capabilities, switching between relevant LoRAs in a multiturn setting is inefficient, as the key-value (KV) cache of the entire turn history must be recomputed with the LoRA weights before generation can begin. To address this problem, we propose Activated LoRA (aLoRA), an adapter architecture which modifies the LoRA framework to only adapt weights for the tokens in the sequence \emph{after} the aLoRA is invoked. This change crucially allows aLoRA to accept the base model's KV cache of the input string, meaning that aLoRA can be instantly activated whenever needed in a chain without recomputing the cache. This enables building what we call \emph{intrinsics}, i.e. specialized models invoked to perform well-defined operations on portions of an input chain or conversation that otherwise uses the base model by default. We train a set of aLoRA-based intrinsics models, demonstrating competitive accuracy with standard LoRA while achieving significant inference benefits. We include a codebase implementing aLoRA in the supplementary material.
Submitted 23 May, 2025; v1 submitted 16 April, 2025;
originally announced April 2025.
-
Roadmap for Photonics with 2D Materials
Authors:
F. Javier García de Abajo,
D. N. Basov,
Frank H. L. Koppens,
Lorenzo Orsini,
Matteo Ceccanti,
Sebastián Castilla,
Lorenzo Cavicchi,
Marco Polini,
P. A. D. Gonçalves,
A. T. Costa,
N. M. R. Peres,
N. Asger Mortensen,
Sathwik Bharadwaj,
Zubin Jacob,
P. J. Schuck,
A. N. Pasupathy,
Milan Delor,
M. K. Liu,
Aitor Mugarza,
Pablo Merino,
Marc G. Cuxart,
Emigdio Chávez-Angel,
Martin Svec,
Luiz H. G. Tizei,
Florian Dirnberger
, et al. (123 additional authors not shown)
Abstract:
Triggered by the development of exfoliation and the identification of a wide range of extraordinary physical properties in self-standing films consisting of one or few atomic layers, two-dimensional (2D) materials such as graphene, transition metal dichalcogenides (TMDs), and other van der Waals (vdW) crystals currently constitute a wide research field protruding in multiple directions in combination with layer stacking and twisting, nanofabrication, surface-science methods, and integration into nanostructured environments. Photonics encompasses a multidisciplinary collection of those directions, where 2D materials contribute with polaritons of unique characteristics such as strong spatial confinement, large optical-field enhancement, long lifetimes, high sensitivity to external stimuli (e.g., electric and magnetic fields, heating, and strain), a broad spectral range from the far infrared to the ultraviolet, and hybridization with spin and momentum textures of electronic band structures. The explosion of photonics with 2D materials as a vibrant research area is producing breakthroughs, including the discovery and design of new materials and metasurfaces with unprecedented properties as well as applications in integrated photonics, light emission, optical sensing, and exciting prospects for applications in quantum information, and nanoscale thermal transport. This Roadmap summarizes the state of the art in the field, identifies challenges and opportunities, and discusses future goals and how to meet them through a wide collection of topical sections prepared by leading practitioners.
Submitted 14 April, 2025; v1 submitted 6 April, 2025;
originally announced April 2025.
-
Roadmap on Nonlocality in Photonic Materials and Metamaterials
Authors:
Francesco Monticone,
N. Asger Mortensen,
Antonio I. Fernández-Domínguez,
Yu Luo,
Xuezhi Zheng,
Christos Tserkezis,
Jacob B. Khurgin,
Tigran V. Shahbazyan,
André J. Chaves,
Nuno M. R. Peres,
Gino Wegner,
Kurt Busch,
Huatian Hu,
Fabio Della Sala,
Pu Zhang,
Cristian Ciracì,
Javier Aizpurua,
Antton Babaze,
Andrei G. Borisov,
Xue-Wen Chen,
Thomas Christensen,
Wei Yan,
Yi Yang,
Ulrich Hohenester,
Lorenz Huber
, et al. (41 additional authors not shown)
Abstract:
Photonic technologies continue to drive the quest for new optical materials with unprecedented responses. A major frontier in this field is the exploration of nonlocal (spatially dispersive) materials, going beyond the local, wavevector-independent assumption traditionally made in optical material modeling. On one end, the growing interest in plasmonic, polaritonic and quantum materials has revealed naturally occurring nonlocalities, emphasizing the need for more accurate models to predict and design their optical responses. This has major implications also for topological, nonreciprocal, and time-varying systems based on these material platforms. Beyond natural materials, artificially structured materials--metamaterials and metasurfaces--can provide even stronger and engineered nonlocal effects, emerging from long-range interactions or multipolar effects. This is a rapidly expanding area in the field of photonic metamaterials, with open frontiers yet to be explored. In the case of metasurfaces, in particular, nonlocality engineering has become a powerful tool for designing strongly wavevector-dependent responses, enabling enhanced wavefront control, spatial compression, multifunctional devices, and wave-based computing. Furthermore, nonlocality and related concepts play a critical role in defining the ultimate limits of what is possible in optics, photonics, and wave physics. This Roadmap aims to survey the most exciting developments in nonlocal photonic materials, highlight new opportunities and open challenges, and chart new pathways that will drive this emerging field forward--toward new scientific discoveries and technological advancements.
Submitted 28 March, 2025; v1 submitted 1 March, 2025;
originally announced March 2025.
-
Granite Embedding Models
Authors:
Parul Awasthy,
Aashka Trivedi,
Yulong Li,
Mihaela Bornea,
David Cox,
Abraham Daniels,
Martin Franz,
Gabe Goodhart,
Bhavani Iyer,
Vishwajeet Kumar,
Luis Lastras,
Scott McCarley,
Rudra Murthy,
Vignesh P,
Sara Rosenthal,
Salim Roukos,
Jaydeep Sen,
Sukriti Sharma,
Avirup Sil,
Kate Soule,
Arafat Sultan,
Radu Florian
Abstract:
We introduce the Granite Embedding models, a family of encoder-based embedding models designed for retrieval tasks, spanning dense-retrieval and sparse retrieval architectures, with both English and Multilingual capabilities. This report provides the technical details of training these highly effective 12 layer embedding models, along with their efficient 6 layer distilled counterparts. Extensive evaluations show that the models, developed with techniques like retrieval oriented pretraining, contrastive finetuning, knowledge distillation, and model merging significantly outperform publicly available models of similar sizes on both internal IBM retrieval and search tasks, and have equivalent performance on widely used information retrieval benchmarks, while being trained on high-quality data suitable for enterprise use. We publicly release all our Granite Embedding models under the Apache 2.0 license, allowing both research and commercial use at https://huggingface.co/collections/ibm-granite.
Submitted 27 February, 2025;
originally announced February 2025.
-
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
Authors:
Granite Vision Team,
Leonid Karlinsky,
Assaf Arbelle,
Abraham Daniels,
Ahmed Nassar,
Amit Alfassi,
Bo Wu,
Eli Schwartz,
Dhiraj Joshi,
Jovana Kondic,
Nimrod Shabtay,
Pengyuan Li,
Roei Herzig,
Shafiq Abedin,
Shaked Perek,
Sivan Harary,
Udi Barzelay,
Adi Raz Goldfarb,
Aude Oliva,
Ben Wieles,
Bishwaranjan Bhattacharjee,
Brandon Huang,
Christoph Auer,
Dan Gutfreund,
David Beymer
, et al. (38 additional authors not shown)
Abstract:
We introduce Granite Vision, a lightweight large language model with vision capabilities, specifically designed to excel in enterprise use cases, particularly in visual document understanding. Our model is trained on a comprehensive instruction-following dataset, including document-related tasks, such as content extraction from tables, charts, diagrams, sketches, and infographics, as well as general image tasks. The architecture of Granite Vision is centered around visual modality alignment with a decoder-only, 2 billion parameter Granite large language model. Additionally, we introduce a dedicated safety classification approach in test-time that leverages a sparse set of attention vectors to identify potential harmful inputs. Despite its lightweight architecture, Granite Vision achieves strong results in standard benchmarks related to visual document understanding, as well as on the LiveXiv benchmark, which is designed to avoid test set contamination by using a constantly updated corpus of recently published Arxiv papers. We are releasing the model under the Apache-2 license, allowing for both research and commercial use, while offering complete visibility into the training data and other relevant details. See https://huggingface.co/ibm-granite/ for model weights.
Submitted 14 February, 2025;
originally announced February 2025.
-
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Authors:
Maohao Shen,
Guangtao Zeng,
Zhenting Qi,
Zhang-Wei Hong,
Zhenfang Chen,
Wei Lu,
Gregory Wornell,
Subhro Das,
David Cox,
Chuang Gan
Abstract:
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains. Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities. This typically involves extensive sampling at inference time guided by an external LLM verifier, resulting in a two-player system. Despite external guidance, the effectiveness of this system demonstrates the potential of a single LLM to tackle complex tasks. Thus, we pose a new research problem: Can we internalize the searching capabilities to fundamentally enhance the reasoning abilities of a single LLM? This work explores an orthogonal direction focusing on post-training LLMs for autoregressive searching (i.e., an extended reasoning process with self-reflection and self-exploration of new strategies). To achieve this, we propose the Chain-of-Action-Thought (COAT) reasoning and a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning. Our approach results in Satori, a 7B LLM trained on open-source models and data. Extensive empirical evaluations demonstrate that Satori achieves state-of-the-art performance on mathematical reasoning benchmarks while exhibiting strong generalization to out-of-domain tasks. Code, data, and models are fully open-sourced.
Submitted 2 June, 2025; v1 submitted 4 February, 2025;
originally announced February 2025.
-
Roadmap on Atomic-scale Semiconductor Devices
Authors:
Steven R. Schofield,
Andrew J. Fisher,
Eran Ginossar,
Joseph W. Lyding,
Richard Silver,
Fan Fei,
Pradeep Namboodiri,
Jonathan Wyrick,
M. G. Masteghin,
D. C. Cox,
B. N. Murdin,
S. K Clowes,
Joris G. Keizer,
Michelle Y. Simmons,
Holly G. Stemp,
Andrea Morello,
Benoit Voisin,
Sven Rogge,
Robert A. Wolkow,
Lucian Livadaru,
Jason Pitters,
Taylor J. Z. Stock,
Neil J. Curson,
Robert E. Butera,
Tatiana V. Pavlova
, et al. (25 additional authors not shown)
Abstract:
Spin states in semiconductors provide exceptionally stable and noise-resistant environments for qubits, positioning them as optimal candidates for reliable quantum computing technologies. The proposal to use nuclear and electronic spins of donor atoms in silicon, introduced by Kane in 1998, sparked a new research field focused on the precise positioning of individual impurity atoms for quantum devices, utilising scanning tunnelling microscopy and ion implantation. This roadmap article reviews the advancements in the 25 years since Kane's proposal, the current challenges, and the future directions in atomic-scale semiconductor device fabrication and measurement. It covers the quest to create a silicon-based quantum computer and expands to include diverse material systems and fabrication techniques, highlighting the potential for a broad range of semiconductor quantum technological applications. Key developments include phosphorus in silicon devices such as single-atom transistors, arrayed few-donor devices, one- and two-qubit gates, three-dimensional architectures, and the development of a toolbox for future quantum integrated circuits. The roadmap also explores new impurity species like arsenic and antimony for enhanced scalability and higher-dimensional spin systems, new chemistry for dopant precursors and lithographic resists, and the potential for germanium-based devices. Emerging methods, such as photon-based lithography and electron beam manipulation, are discussed for their disruptive potential. This roadmap charts the path toward scalable quantum computing and advanced semiconductor quantum technologies, emphasising the critical intersections of experiment, technological development, and theory.
Submitted 22 January, 2025; v1 submitted 8 January, 2025;
originally announced January 2025.
-
Graph Burning On Large $p$-Caterpillars
Authors:
Danielle Cox,
M. E. Messinger,
Kerry Ojakian
Abstract:
Graph burning models the spread of information or contagion in a graph. At each time step, two events occur: neighbours of already burned vertices become burned, and a new vertex is chosen to be burned. The big conjecture is known as the {\it burning number conjecture}: for any connected graph on $n$ vertices, all $n$ vertices can be burned after at most $\lceil \sqrt{n}\ \rceil$ time steps. It is well-known that to prove the conjecture, it suffices to prove it for trees. We prove the conjecture for sufficiently large $p$-caterpillars.
Submitted 17 December, 2024;
originally announced December 2024.
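The burning process described in the abstract above can be simulated in a few lines. The following is a minimal sketch, not taken from the paper: the helper names `burn`, `path_graph`, and `conjectured_bound` are hypothetical, and the example uses a plain path rather than a $p$-caterpillar. It checks whether a given sequence of chosen vertices burns the whole graph and computes the conjectured bound $\lceil \sqrt{n}\ \rceil$.

```python
from math import isqrt


def burn(adjacency, sources):
    """Return True if the sequence of chosen vertices burns every vertex.

    adjacency: dict mapping each vertex to a list of its neighbours.
    sources: the vertex chosen to be burned at each time step, in order.
    """
    burned = set()
    for source in sources:
        # Event 1: neighbours of already burned vertices become burned.
        spread = {v for u in burned for v in adjacency[u]}
        burned |= spread
        # Event 2: a new vertex is chosen to be burned.
        burned.add(source)
    return len(burned) == len(adjacency)


def path_graph(n):
    """Path on vertices 0..n-1 (an illustrative tree, not a p-caterpillar)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def conjectured_bound(n):
    """ceil(sqrt(n)) for n >= 1, computed with exact integer arithmetic."""
    return isqrt(n - 1) + 1


path = path_graph(9)
print(conjectured_bound(9))     # 3
print(burn(path, [2, 6, 8]))    # True: three steps suffice for this path
print(burn(path, [2, 6]))       # False: two steps leave vertices unburned
```

On the 9-vertex path, the sequence 2, 6, 8 burns every vertex in three steps, matching the conjectured bound for $n = 9$.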
-
Chiral Light-Matter Interactions with Thermal Magnetoplasmons in Graphene Nanodisks
Authors:
Mikkel Have Eriksen,
Juan R. Deop-Ruano,
Joel D. Cox,
Alejandro Manjavacas
Abstract:
We investigate the emergence of self-hybridized thermal magnetoplasmons in doped graphene nanodisks at finite temperatures when subjected to an external magnetic field. Using a semianalytical approach, which fully describes the eigenmodes and polarizability of the graphene nanodisks, we show that the hybridization originates from the coupling of transitions between thermally populated Landau levels and localized magnetoplasmon resonances of the nanodisks. Owing to their origin, these modes combine the extraordinary magneto-optical response of graphene with the strong field enhancement of plasmons, making them an ideal tool for achieving strong chiral light-matter interactions, with the additional advantage of being tunable through carrier concentration, magnetic field, and temperature. As a demonstration of their capabilities, we show that the thermal magnetoplasmons supported by an array of graphene nanodisks enable chiral perfect absorption and chiral thermal emission.
Submitted 30 December, 2024; v1 submitted 14 November, 2024;
originally announced November 2024.
-
Notes on Three Formulas of Abel
Authors:
David A. Cox
Abstract:
These notes explore three amazing formulas proved by Abel in his 1826 Paris memoir on what we now call Abelian integrals. We discuss the first two formulas from the point of view of symbolic computation and explain their connection to residues and partial fractions. The third formula arises from the first two and is related to the genus and lattice points in the Newton polygon.
Submitted 1 October, 2024;
originally announced October 2024.
-
Orthonormalization of phase-only basis functions
Authors:
Daniël W. S. Cox,
Ivo M. Vellekoop
Abstract:
Orthonormal bases serve as a powerful mathematical tool in theoretical and experimental optics. However, producing arbitrary optical fields in real-world experiments is limited by the hardware, which in many cases involves a phase-only spatial light modulator. Since most basis functions also have a varying amplitude component, they cannot be represented truthfully. We present a general method to construct an orthonormal phase-only basis, optionally possessing desirable properties like smoothness and symmetry. We demonstrate the practical benefit of our approach in a wavefront shaping experiment, achieving a factor 1.5 increase in performance over a non-orthonormal phase only basis.
Submitted 6 September, 2024;
originally announced September 2024.
-
Nonlinear thermoplasmonics in graphene nanostructures
Authors:
Line Jelver,
Joel D. Cox
Abstract:
The linear electronic dispersion relation of graphene endows the atomically thin carbon layer with a large intrinsic optical nonlinearity, with regard to both parametric and photothermal processes. While plasmons in graphene nanostructures can further enhance nonlinear optical phenomena, boosting resonances to the technologically relevant mid- and near-infrared (IR) spectral regime necessitates patterning on $\sim10$ nm length scales, for which quantum finite-size effects play a crucial role. Here we show that thermoplasmons in narrow graphene nanoribbons can be activated at mid- and near-IR frequencies with moderate absorbed energy density, and furthermore can drive substantial third-harmonic generation and optical Kerr nonlinearities. Our findings suggest that photothermal excitation by ultrashort optical pulses offers a promising approach to enable nonlinear plasmonic phenomena in nanostructured graphene that avoids potentially invasive electrical gating schemes and excessive charge carrier doping levels.
Submitted 27 August, 2024;
originally announced August 2024.
-
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
Authors:
Yikang Shen,
Matthew Stallone,
Mayank Mishra,
Gaoyuan Zhang,
Shawn Tan,
Aditya Prasad,
Adriana Meza Soria,
David D. Cox,
Rameswar Panda
Abstract:
Finding the optimal learning rate for language model pretraining is a challenging task. This is not only because there is a complicated correlation between learning rate, batch size, number of training tokens, model size, and other hyperparameters but also because it is prohibitively expensive to perform a hyperparameter search for large language models with billions or trillions of parameters. Recent studies propose using small proxy models and small corpora to perform hyperparameter searches and transposing the optimal parameters to large models and large corpora. While the zero-shot transferability is theoretically and empirically proven for model-size-related hyperparameters, like depth and width, the zero-shot transfer from small corpora to large corpora is underexplored. In this paper, we study the correlation between optimal learning rate, batch size, and number of training tokens for the recently proposed WSD scheduler. After thousands of small experiments, we found a power-law relationship between variables and demonstrated its transferability across model sizes. Based on the observation, we propose a new learning rate scheduler, Power scheduler, that is agnostic about the number of training tokens and batch size. The experiment shows that combining the Power scheduler with Maximum Update Parameterization (muP) can consistently achieve impressive performance with one set of hyperparameters regardless of the number of training tokens, batch size, model size, and even model architecture. Our 3B dense and MoE models trained with the Power scheduler achieve performance comparable to state-of-the-art small language models. We open-source these pretrained models at https://ibm.biz/BdKhLa.
Submitted 11 September, 2024; v1 submitted 23 August, 2024;
originally announced August 2024.
-
Scaling Granite Code Models to 128K Context
Authors:
Matt Stallone,
Vaibhav Saxena,
Leonid Karlinsky,
Bridget McGinn,
Tim Bula,
Mayank Mishra,
Adriana Meza Soria,
Gaoyuan Zhang,
Aditya Prasad,
Yikang Shen,
Saptha Surendran,
Shanmukha Guttula,
Hima Patel,
Parameswaran Selvam,
Xuan-Hong Dang,
Yan Koyfman,
Atin Sood,
Rogerio Feris,
Nirmit Desai,
David D. Cox,
Ruchir Puri,
Rameswar Panda
Abstract:
This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling the context length of Granite 3B/8B code models from 2K/4K to 128K consists of lightweight continual pretraining that gradually increases the RoPE base frequency, with repository-level file packing and length-upsampled long-context data. We also release instruction-tuned models with long-context support, which are derived by further finetuning the long-context base models on a mix of permissively licensed short and long-context instruction-response pairs. Compared to the original short-context Granite code models, our long-context models achieve significant improvements on long-context tasks without any noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval). We release all our long-context Granite code models under an Apache 2.0 license for both research and commercial use.
Submitted 18 July, 2024;
originally announced July 2024.
-
The infrastructure powering IBM's Gen AI model development
Authors:
Talia Gershon,
Seetharami Seelam,
Brian Belgodere,
Milton Bonilla,
Lan Hoang,
Danny Barnett,
I-Hsin Chung,
Apoorve Mohan,
Ming-Hung Chen,
Lixiang Luo,
Robert Walkup,
Constantinos Evangelinos,
Shweta Salaria,
Marc Dombrowa,
Yoonho Park,
Apo Kayi,
Liran Schour,
Alim Alim,
Ali Sydney,
Pavlos Maniotis,
Laurent Schares,
Bernard Metzler,
Bengi Karacali-Akyamac,
Sophia Wen,
Tatsuhiro Chiba
, et al. (122 additional authors not shown)
Abstract:
AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering efficient and high-performing AI training requires an end-to-end solution that combines hardware, software and holistic telemetry to cater for multiple types of AI workloads. In this report, we describe IBM's hybrid cloud infrastructure that powers our generative AI model development. This infrastructure includes (1) Vela: an AI-optimized supercomputing capability directly integrated into the IBM Cloud, delivering scalable, dynamic, multi-tenant and geographically distributed infrastructure for large-scale model training and other AI workflow steps and (2) Blue Vela: a large-scale, purpose-built, on-premises hosting environment that is optimized to support our largest and most ambitious AI model training tasks. Vela provides IBM with the dual benefit of high performance for internal use along with the flexibility to adapt to an evolving commercial landscape. Blue Vela provides us with the benefits of rapid development of our largest and most ambitious models, as well as future-proofing against the evolving model landscape in the industry. Taken together, they provide IBM with the ability to rapidly innovate in the development of both AI models and commercial offerings.
Submitted 13 January, 2025; v1 submitted 7 July, 2024;
originally announced July 2024.
-
Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
Authors:
Ibrahim Abdelaziz,
Kinjal Basu,
Mayank Agarwal,
Sadhana Kumaravel,
Matthew Stallone,
Rameswar Panda,
Yara Rizk,
GP Bhargav,
Maxwell Crouse,
Chulaka Gunasekara,
Shajith Ikbal,
Sachin Joshi,
Hima Karanam,
Vineet Kumar,
Asim Munawar,
Sumit Neelam,
Dinesh Raghu,
Udit Sharma,
Adriana Meza Soria,
Dheeraj Sreedhar,
Praveen Venkateswaran,
Merve Unuvar,
David Cox,
Salim Roukos,
Luis Lastras
, et al. (1 additional author not shown)
Abstract:
Large language models (LLMs) have recently shown tremendous promise in serving as the backbone to agentic systems, as demonstrated by their performance in multi-faceted, challenging benchmarks like SWE-Bench and Agent-Bench. However, to realize the true potential of LLMs as autonomous agents, they must learn to identify, call, and interact with external tools and application program interfaces (APIs) to complete complex tasks. These tasks together are termed function calling. Endowing LLMs with function calling abilities leads to a myriad of advantages, such as access to current and domain-specific information in databases and knowledge sources, and the ability to outsource tasks that can be reliably performed by tools, e.g., a Python interpreter or calculator. While there has been significant progress in function calling with LLMs, there is still a dearth of open models that perform on par with proprietary LLMs like GPT, Claude, and Gemini. Therefore, in this work, we introduce the GRANITE-20B-FUNCTIONCALLING model under an Apache 2.0 license. The model is trained using a multi-task training approach on seven fundamental tasks encompassed in function calling: Nested Function Calling, Function Chaining, Parallel Functions, Function Name Detection, Parameter-Value Pair Detection, Next-Best Function, and Response Generation. We present a comprehensive evaluation on multiple out-of-domain datasets comparing GRANITE-20B-FUNCTIONCALLING to more than 15 other leading proprietary and open models. GRANITE-20B-FUNCTIONCALLING provides the best performance among all open models on the Berkeley Function Calling Leaderboard and ranks fourth overall. As a result of the diverse tasks and datasets used for training our model, we show that GRANITE-20B-FUNCTIONCALLING has better generalizability on multiple tasks in seven different evaluation datasets.
Submitted 27 June, 2024;
originally announced July 2024.
-
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Authors:
Junmo Kang,
Leonid Karlinsky,
Hongyin Luo,
Zhen Wang,
Jacob Hansen,
James Glass,
David Cox,
Rameswar Panda,
Rogerio Feris,
Alan Ritter
Abstract:
We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts, named MiXSE (MiXture of Self-specialized Experts). Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data, each equipping a shared base LLM with distinct domain-specific capabilities, activated via self-optimized routing. This allows for dynamic and capability-specific handling of various target tasks, enhancing overall capabilities, without extensive human-labeled data and added parameters. Our empirical results reveal that specializing LLMs may exhibit potential trade-offs in performance on non-specialized tasks. On the other hand, our Self-MoE demonstrates substantial improvements (6.5%p on average) over the base LLM across diverse benchmarks such as knowledge, reasoning, math, and coding. It also consistently outperforms other methods, including instance merging and weight merging, while offering better flexibility and interpretability by design with semantic experts and routing. Our findings highlight the critical role of modularity, the applicability of Self-MoE to multiple base LLMs, and the potential of self-improvement in achieving efficient, scalable, and adaptable systems.
Submitted 7 October, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Wavefront Threading Enables Effective High-Level Synthesis
Authors:
Blake Pelton,
Adam Sapek,
Ken Eguro,
Daniel Lo,
Alessandro Forin,
Matt Humphrey,
Jinwen Xi,
David Cox,
Rajas Karandikar,
Johannes de Fine Licht,
Evgeny Babin,
Adrian Caulfield,
Doug Burger
Abstract:
Digital systems are growing in importance and computing hardware is growing more heterogeneous. Hardware design, however, remains laborious and expensive, in part due to the limitations of conventional hardware description languages (HDLs) like VHDL and Verilog. A longstanding research goal has been programming hardware like software, with high-level languages that can generate efficient hardware designs. This paper describes Kanagawa, a language that takes a new approach to combine the programmer productivity benefits of traditional High-Level Synthesis (HLS) approaches with the expressibility and hardware efficiency of Register-Transfer Level (RTL) design. The language's concise syntax, matched with a hardware design-friendly execution model, permits a relatively simple toolchain to map high-level code into efficient hardware implementations.
Submitted 10 June, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
$\textit{Trans-LoRA}$: towards data-free Transferable Parameter Efficient Finetuning
Authors:
Runqian Wang,
Soumya Ghosh,
David Cox,
Diego Antognini,
Aude Oliva,
Rogerio Feris,
Leonid Karlinsky
Abstract:
Low-rank adapters (LoRA) and their variants are popular parameter-efficient fine-tuning (PEFT) techniques that closely match full-model fine-tuning performance while requiring only a small number of additional parameters. These additional LoRA parameters are specific to the base model being adapted. When the base model needs to be deprecated and replaced with a new one, all the associated LoRA modules need to be re-trained. Such re-training requires access to the data used to train the LoRA for the original base model. This is especially problematic for commercial cloud applications where the LoRA modules and the base models are hosted by service providers who may not be allowed to host proprietary client task data. To address this challenge, we propose $\textit{Trans-LoRA}$ -- a novel method for lossless, nearly data-free transfer of LoRAs across base models. Our approach relies on synthetic data to transfer LoRA modules. Using large language models, we design a synthetic data generator to approximate the data-generating process of the $\textit{observed}$ task data subset. Training on the resulting synthetic dataset transfers LoRA modules to new models. We show the effectiveness of our approach using both Llama and Gemma model families. Our approach achieves lossless (mostly improved) LoRA transfer between models within and across different base model families, and even between different PEFT methods, on a wide variety of tasks.
Submitted 27 May, 2024;
originally announced May 2024.
-
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Authors:
Mayank Mishra,
Matt Stallone,
Gaoyuan Zhang,
Yikang Shen,
Aditya Prasad,
Adriana Meza Soria,
Michele Merler,
Parameswaran Selvam,
Saptha Surendran,
Shivdeep Singh,
Manish Sethi,
Xuan-Hong Dang,
Pengyuan Li,
Kun-Lung Wu,
Syed Zawad,
Andrew Coleman,
Matthew White,
Mark Lewis,
Raju Pavuluri,
Yan Koyfman,
Boris Lublinsky,
Maximilien de Bayser,
Ibrahim Abdelaziz,
Kinjal Basu,
Mayank Agarwal
, et al. (21 additional authors not shown)
Abstract:
Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, fixing bugs, explaining and documenting code, maintaining repositories, and more. In this work, we introduce the Granite series of decoder-only code models for code generative tasks, trained with code written in 116 programming languages. The Granite Code model family consists of models ranging in size from 3 to 34 billion parameters, suitable for applications ranging from complex application modernization tasks to on-device memory-constrained use cases. Evaluation on a comprehensive set of tasks demonstrates that Granite Code models consistently reach state-of-the-art performance among available open-source code LLMs. The Granite Code model family was optimized for enterprise software development workflows and performs well across a range of coding tasks (e.g., code generation, fixing, and explanation), making it a versatile all-around code model. We release all our Granite Code models under an Apache 2.0 license for both research and commercial use.
Submitted 7 May, 2024;
originally announced May 2024.
-
Practical considerations for high-fidelity wavefront shaping experiments
Authors:
Bahareh Mastiani,
Daniël W. S. Cox,
Ivo M. Vellekoop
Abstract:
Wavefront shaping is a technique for directing light through turbid media. The theoretical aspects of wavefront shaping are well understood, and under near-ideal experimental conditions, accurate predictions for the expected signal enhancement can be given. In practice, however, there are many experimental factors that negatively affect the outcome of the experiment. Here, we present a comprehensive overview of these experimental factors, including the effect of sample scattering properties, noise, and response of the spatial light modulator. We present simple means to identify experimental imperfections and to minimize their negative effect on the outcome of the experiment. This paper is accompanied by Python code for automatically quantifying experimental problems using the OpenWFS framework for running and simulating wavefront shaping experiments.
Submitted 22 March, 2024;
originally announced March 2024.
-
LAB: Large-Scale Alignment for ChatBots
Authors:
Shivchander Sudalairaj,
Abhishek Bhandwaldar,
Aldo Pareja,
Kai Xu,
David D. Cox,
Akash Srivastava
Abstract:
This work introduces LAB (Large-scale Alignment for chatBots), a novel methodology designed to overcome the scalability challenges in the instruction-tuning phase of large language model (LLM) training. Leveraging a taxonomy-guided synthetic data generation process and a multi-phase tuning framework, LAB significantly reduces reliance on expensive human annotations and proprietary models like GPT-4. We demonstrate that LAB-trained models can achieve competitive performance across several benchmarks compared to models trained with traditional human-annotated or GPT-4 generated synthetic data. LAB thus offers a scalable, cost-effective solution for enhancing LLM capabilities and instruction-following behaviors without the drawbacks of catastrophic forgetting, marking a step forward in the efficient training of LLMs for a wide range of applications.
Submitted 29 April, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
Generation of entangled waveguided photon pairs by free electrons
Authors:
Theis P. Rasmussen,
Álvaro Rodríguez Echarri,
Joel D. Cox,
F. Javier García de Abajo
Abstract:
Entangled photon pairs are a key resource in future quantum-optical communication and information technologies. While high-power laser light propagating in bulk nonlinear optical crystals is conventionally used to generate entangled photons that are routed into optical configurations, such schemes suffer from low efficiency due to the weak intrinsic nonlinear optical response of known materials and losses associated with photon in- and out-coupling. Here, we propose a scheme to generate entangled polariton pairs directly within optical waveguides using free electrons, whereby the measured energy loss of undeflected electrons heralds the production of counter-propagating polariton pairs that are entangled in energy and direction of emission. As a paradigmatic example, we study the excitation of plasmon polaritons in metal strip waveguides that, within specific frequency regimes, strongly enhance light-matter interactions that lead to two-plasmon generation in comparison to the probability of single-plasmon excitation. We demonstrate that, under appropriate conditions, an electron energy loss detected in an optimal frequency range can reliably signal the generation of a plasmon pair entangled in energy and momentum. Our proposed scheme can be directly applied to other types of optical waveguides for in situ generation of entangled photon pairs in quantum-optics applications.
Submitted 19 December, 2023;
originally announced December 2023.
-
Model-based aberration corrected microscopy inside a glass tube
Authors:
D. W. S. Cox,
T. Knop,
I. M. Vellekoop
Abstract:
Microscope objectives achieve near diffraction-limited performance only when used under the conditions they are designed for. In non-standard geometries, such as thick cover slips or curved surfaces, severe aberrations arise, inevitably impairing high-resolution imaging. Correcting such large aberrations using standard adaptive optics can be challenging: existing solutions are either not suited for strong aberrations, or require extensive feedback measurements, consequently taking a significant portion of the photon budget. We demonstrate that it is possible to pre-compute the corrections needed for high-resolution imaging inside a glass tube based on a priori information only. Our ray-tracing based method achieved over an order of magnitude increase in image contrast without the need for a feedback signal.
Submitted 22 November, 2023;
originally announced November 2023.
-
Audio-Visual Neural Syntax Acquisition
Authors:
Cheng-I Jeff Lai,
Freda Shi,
Puyuan Peng,
Yoon Kim,
Kevin Gimpel,
Shiyu Chang,
Yung-Sung Chuang,
Saurabhchand Bhati,
David Cox,
David Harwath,
Yang Zhang,
Karen Livescu,
James Glass
Abstract:
We study phrase structure induction from visually-grounded speech. The core idea is to first segment the speech waveform into sequences of word segments, and subsequently induce phrase structure using the inferred segment-level continuous representations. We present the Audio-Visual Neural Syntax Learner (AV-NSL) that learns phrase structure by listening to audio and looking at images, without ever being exposed to text. By training on paired images and spoken captions, AV-NSL exhibits the capability to infer meaningful phrase structures that are comparable to those derived by naturally-supervised text parsers, for both English and German. Our findings extend prior work in unsupervised language acquisition from speech and grounded grammar induction, and present one approach to bridge the gap between the two topics.
Submitted 11 October, 2023;
originally announced October 2023.
-
SALMON: Self-Alignment with Instructable Reward Models
Authors:
Zhiqing Sun,
Yikang Shen,
Hongxin Zhang,
Qinhong Zhou,
Zhenfang Chen,
David Cox,
Yiming Yang,
Chuang Gan
Abstract:
Supervised Fine-Tuning (SFT) on response demonstrations combined with Reinforcement Learning from Human Feedback (RLHF) constitutes a powerful paradigm for aligning LLM-based AI agents. However, a significant limitation of such an approach is its dependency on high-quality human annotations, making its application to intricate tasks challenging due to difficulties in obtaining consistent response demonstrations and in-distribution response preferences. This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision, using only a small set of human-defined principles, yet achieving superior performance. Central to our approach is an instructable reward model. Trained on synthetic preference data, this model can generate reward scores based on arbitrary human-defined principles. By merely adjusting these principles during the RL training phase, we gain full control over the preferences with the instructable reward model, subsequently influencing the behavior of the RL-trained policy models, and reducing the reliance on the collection of online human preferences. Applying our method to the LLaMA-2-70b base language model, we developed an AI assistant named Dromedary-2. With only 6 exemplars for in-context learning and 31 human-defined principles, Dromedary-2 significantly surpasses the performance of several state-of-the-art AI systems, including LLaMA-2-Chat-70b, on various benchmark datasets. We have open-sourced the code and model weights to encourage further research into aligning LLM-based AI agents with enhanced supervision efficiency, improved controllability, and scalable oversight.
Submitted 9 April, 2024; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Detection Sensitivity Limit of Hundreds of Atoms with X-Ray Fluorescence Microscopy
Authors:
Mateus G. Masteghin,
Toussaint Gervais,
Steven K. Clowes,
David C. Cox,
Veronika Zelyk,
Ajith Pattammattel,
Yong S. Chu,
Nikola Kolev,
Taylor Z. Stock,
Neil Curson,
Paul G. Evans,
Michael Stuckelberger,
Benedict N. Murdin
Abstract:
We report X-ray fluorescence (XRF) imaging of nanoscale inclusions of impurities for quantum technology. A very bright diffraction-limited focus of the X-ray beam produces very high sensitivity and resolution. We investigated gallium (Ga) dopants in silicon (Si) produced by a focused ion beam (FIB). These dopants might provide 3/2-spin qubits or p-type electrical contacts and quantum dots. We find that the ion beam spot is somewhat larger than expected, and the technique provides a useful calibration for the resolution of FIBs. Enticingly, we demonstrate that with a single-shot detection of 1 second integration time, the sensitivity of the XRF would be sufficient to find amongst background a single isolated inclusion of unknown location comprising only 3000 Ga impurities (a mass of just 350 zg) without any need for specialized nm-thickness lamellae, and down from $>10^5$ atoms in previous reports of similar work. With increased integration we were able to detect 650 impurities. The results show that planned facility upgrades might achieve single atom sensitivity with a generally applicable, non-destructive technique in the near future.
Submitted 5 October, 2023;
originally announced October 2023.
-
Self-Specialization: Uncovering Latent Expertise within Large Language Models
Authors:
Junmo Kang,
Hongyin Luo,
Yada Zhu,
Jacob Hansen,
James Glass,
David Cox,
Alan Ritter,
Rogerio Feris,
Leonid Karlinsky
Abstract:
Recent works have demonstrated the effectiveness of self-alignment, in which a large language model is aligned to follow general instructions using instructional data generated from the model itself, starting from a handful of human-written seeds. Instead of general alignment, in this work, we focus on self-alignment for expert domain specialization (e.g., biomedicine, finance). As a preliminary, we quantitatively show the marginal effect that generic instruction-following training has on downstream expert domains' performance. To remedy this, we propose self-specialization, allowing for effective model specialization while achieving cross-task generalization by leveraging only a few labeled seeds. Self-specialization offers a data- and parameter-efficient way of "carving out" an expert model out of a generalist pre-trained LLM. Exploring a variety of popular open large models as a base for specialization, our experimental results in both biomedical and financial domains show that our self-specialized models outperform their base models by a large margin, and even larger models that are generally instruction-tuned or that have been adapted to the target domain by other means.
Submitted 5 June, 2024; v1 submitted 29 September, 2023;
originally announced October 2023.
-
Nonlocal effects in atom-plasmon interactions
Authors:
Mikkel Have Eriksen,
Christos Tserkezis,
N. Asger Mortensen,
Joel D. Cox
Abstract:
Nonlocal and quantum mechanical phenomena in noble metal nanostructures become increasingly crucial when the relevant length scales in hybrid nanostructures reach the few-nanometer regime. In practice, such mesoscopic effects at metal-dielectric interfaces can be described using exemplary surface-response functions (SRFs) embodied by the Feibelman $d$-parameters. Here we show that SRFs dramatically influence quantum electrodynamic phenomena -- such as the Purcell enhancement and Lamb shift -- for quantum emitters close to a diverse range of noble metal nanostructures interfacing different homogeneous media. Dielectric environments with higher permittivities are shown to increase the magnitude of SRFs calculated within the specular-reflection model. In parallel, the role of SRFs is enhanced in nanostructures characterized by large surface-to-volume ratios, such as thin planar metallic films or shells of core-shell nanoparticles. By investigating emitter quantum dynamics close to such plasmonic architectures, we show that decreasing the width of the metal region, or increasing the permittivity of the interfacing dielectric, leads to a significant change in the Purcell enhancement, Lamb shift, and visible far-field spontaneous emission spectrum, as an immediate consequence of SRFs. We anticipate that fitting the theoretically modelled spectra to experiments could allow for experimental determination of the $d$-parameters.
Submitted 17 August, 2023;
originally announced August 2023.
-
Quantum-mechanical effects in photoluminescence from thin crystalline gold films
Authors:
Alan R. Bowman,
Álvaro Rodríguez Echarri,
Fatemeh Kiani,
Fadil Iyikanat,
Ted V. Tsoulos,
Joel D. Cox,
Ravishankar Sundararaman,
F. Javier García de Abajo,
Giulia Tagliabue
Abstract:
Luminescence constitutes a unique source of insight into hot carrier processes in metals, including those in plasmonic nanostructures used for sensing and energy applications. However, being weak in nature, metal luminescence remains poorly understood, its microscopic origin strongly debated, and its potential for unravelling nanoscale carrier dynamics largely unexploited. Here, we reveal quantum-mechanical effects emanating in the luminescence from thin monocrystalline gold flakes. Specifically, we present experimental evidence, supported by first-principles simulations, to demonstrate its photoluminescence origin when exciting in the interband regime. Our model allows us to identify changes to the measured gold luminescence due to quantum-mechanical effects as the gold film thickness is reduced. Excitingly, such effects are observable in the luminescence signal from flakes up to 40 nm in thickness, associated with the out-of-plane discreteness of the electronic band structure near the Fermi level. We qualitatively reproduce the observations with first-principles modelling, thus establishing a unified description of luminescence in gold and enabling its widespread application as a probe of carrier dynamics and light-matter interactions in this material. Our study paves the way for future explorations of hot-carriers and charge-transfer dynamics in a multitude of material systems.
Submitted 25 September, 2023; v1 submitted 17 July, 2023;
originally announced July 2023.
-
Nuclear Level Density and $γ$-ray Strength Function of $^{67}\mathrm{Ni}$ and the impact on the i-process
Authors:
V. W. Ingeberg,
S. Siem,
M. Wiedeking,
A. Choplin,
S. Goriely,
L. Siess,
K. J. Abrahams,
K. Arnswald,
F. Bello Garrote,
D. L. Bleuel,
J. Cederkäll,
T. L. Christoffersen,
D. M. Cox,
H. De Witte,
L. P. Gaffney,
A. Görgen,
C. Henrich,
A. Illana,
P. Jones,
B. V. Kheswa,
T. Kröll,
S. N. T. Majola,
K. L. Malatji,
J. Ojala,
J. Pakarinen
, et al. (7 additional authors not shown)
Abstract:
Proton-$γ$ coincidences from $(\mathrm{d},\mathrm{p})$ reactions between a $^{66}\mathrm{Ni}$ beam and a deuterated polyethylene target have been analyzed with the inverse-Oslo method to find the nuclear level density (NLD) and $γ$-ray strength function ($γ$SF) of $^{67}\mathrm{Ni}$. The $^{66}\mathrm{Ni}(n,γ)$ capture cross section has been calculated using the Hauser-Feshbach model in TALYS using the measured NLD and $γ$SF as constraints. The results confirm that the $^{66}\mathrm{Ni}(n,γ)$ reaction acts as a bottleneck when relying on one-zone nucleosynthesis calculations. However, the impact of this reaction is strongly dampened in multi-zone models of low-metallicity AGB stars experiencing i-process nucleosynthesis.
Submitted 14 November, 2024; v1 submitted 14 July, 2023;
originally announced July 2023.
-
Simultaneous $γ$-ray and electron spectroscopy of $^{182,184,186}$Hg isotopes
Authors:
M. Stryjczyk,
B. Andel,
J. G. Cubiss,
K. Rezynkina,
T. R. Rodríguez,
J. E. García-Ramos,
A. N. Andreyev,
J. Pakarinen,
P. Van Duppen,
S. Antalic,
T. Berry,
M. J. G. Borge,
C. Clisu,
D. M. Cox,
H. De Witte,
L. M. Fraile,
H. O. U. Fynbo,
L. P. Gaffney,
L. J. Harkness-Brennan,
M. Huyse,
A. Illana,
D. S. Judson,
J. Konki,
J. Kurcewicz,
I. Lazarus
, et al. (26 additional authors not shown)
Abstract:
Background: The mercury isotopes around $N=104$ are a well-known example of nuclei exhibiting shape coexistence. Mixing of configurations can be studied by measuring the monopole strength $ρ^2(E0)$, however, currently the experimental information is scarce and lacks precision, especially for the $I^π\rightarrow I^π$ ($I \neq 0$) transitions. Purpose: The goals of this study were to increase the precision of the known branching ratios and internal conversion coefficients, to increase the amount of available information regarding excited states in $^{182,184,186}$Hg and to interpret the results in the framework of shape coexistence using different models. Method: The low-energy structures in $^{182,184,186}$Hg were populated in the $β$ decay of $^{182,184,186}$Tl, produced at ISOLDE and purified by laser ionization and mass separation. The $γ$-ray and internal conversion electron events were detected by five germanium clover detectors and a segmented silicon detector, respectively, and correlated in time to build decay schemes. Results: In total, 193, 178 and 156 transitions, including 144, 140 and 108 observed for the first time in a $β$-decay experiment, were assigned to $^{182,184,186}$Hg, respectively. Internal conversion coefficients were determined for 23 transitions, out of which 12 had an $E0$ component. Extracted branching ratios allowed the sign of the interference term in $^{182}$Hg as well as $ρ^2(E0;0^+_2\rightarrow 0^+_1)$ and $B(E2;0^+_2\rightarrow 2^+_1)$ in $^{184}$Hg to be determined. By means of electron-electron coincidences, the $0^+_3$ state was identified in $^{184}$Hg. The experimental results were qualitatively reproduced by five theoretical approaches, the IBM with configuration mixing with two different parametrizations, the General Bohr Hamiltonian, the BMF model and the SCCM model. However, a quantitative description is lacking.
Submitted 6 June, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Plasmons in phosphorene nanoribbons
Authors:
Line Jelver,
Joel D. Cox
Abstract:
Phosphorene has emerged as an atomically-thin platform for optoelectronics and nanophotonics due to its excellent nonlinear optical properties and the possibility of actively tuning light-matter interactions through electrical doping. While phosphorene is a two-dimensional semiconductor, plasmon resonances characterized by pronounced anisotropy and strong optical confinement are anticipated to emerge in highly-doped samples. Here we show that the localized plasmons supported by phosphorene nanoribbons (PNRs) exhibit high tunability in relation to both edge termination and doping charge polarity, and can trigger an intense nonlinear optical response at moderate doping levels. Our explorations are based on a second-principles theoretical framework, employing maximally localized Wannier functions constructed from ab initio electronic structure calculations, which we introduce here to describe the linear and nonlinear optical response of PNRs on mesoscopic length scales. Atomistic simulations reveal the high tunability of plasmons in doped PNRs at near-infrared frequencies, which can facilitate synergy between electronic band structure and plasmonic field confinement in doped PNRs to drive efficient high-harmonic generation. Our findings establish phosphorene nanoribbons as a versatile atomically-thin material candidate for nonlinear plasmonics.
Submitted 16 May, 2023;
originally announced May 2023.
-
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Authors:
Zhiqing Sun,
Yikang Shen,
Qinhong Zhou,
Hongxin Zhang,
Zhenfang Chen,
David Cox,
Yiming Yang,
Chuang Gan
Abstract:
Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable. However, this dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision and the related issues on quality, reliability, diversity, self-consistency, and undesirable biases. To address these challenges, we propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision. Our approach encompasses four stages: first, we use an LLM to generate synthetic prompts, and a topic-guided method to augment the prompt diversity; second, we use a small set of human-written principles for AI models to follow, and guide the LLM through in-context learning from demonstrations (of principles application) to produce helpful, ethical, and reliable responses to user's queries; third, we fine-tune the original LLM with the high-quality self-aligned responses so that the resulting model can generate desirable responses for each query directly without the principle set and the demonstrations anymore; and finally, we offer a refinement step to address the issues of overly-brief or indirect responses. Applying SELF-ALIGN to the LLaMA-65b base language model, we develop an AI assistant named Dromedary. With fewer than 300 lines of human annotations (including < 200 seed prompts, 16 generic principles, and 5 exemplars for in-context learning), Dromedary significantly surpasses the performance of several state-of-the-art AI systems, including Text-Davinci-003 and Alpaca, on benchmark datasets with various settings.
Submitted 2 December, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following
Authors:
Mingyu Ding,
Yan Xu,
Zhenfang Chen,
David Daniel Cox,
Ping Luo,
Joshua B. Tenenbaum,
Chuang Gan
Abstract:
Humans, even at a very early age, can learn visual concepts and understand geometry and layout through active interaction with the environment, and generalize their compositions to complete tasks described by natural languages in novel scenes. To mimic such capability, we propose Embodied Concept Learner (ECL) in an interactive 3D environment. Specifically, a robot agent can ground visual concepts, build semantic maps and plan actions to complete tasks by learning purely from human demonstrations and language instructions, without access to ground-truth semantic and depth supervisions from simulations. ECL consists of: (i) an instruction parser that translates the natural languages into executable programs; (ii) an embodied concept learner that grounds visual concepts based on language descriptions; (iii) a map constructor that estimates depth and constructs semantic maps by leveraging the learned concepts; and (iv) a program executor with deterministic policies to execute each program. ECL has several appealing benefits thanks to its modularized design. Firstly, it enables the robotic agent to learn semantics and depth unsupervisedly acting like babies, e.g., ground concepts through active interaction and perceive depth by disparities when moving forward. Secondly, ECL is fully transparent and step-by-step interpretable in long-term planning. Thirdly, ECL could be beneficial for the embodied instruction following (EIF), outperforming previous works on the ALFRED benchmark when the semantic label is not provided. Also, the learned concept can be reused for other downstream tasks, such as reasoning of object states. Project page: http://ecl.csail.mit.edu/
Submitted 7 April, 2023;
originally announced April 2023.
-
Learning to Grow Pretrained Models for Efficient Transformer Training
Authors:
Peihao Wang,
Rameswar Panda,
Lucas Torroba Hennigen,
Philip Greengard,
Leonid Karlinsky,
Rogerio Feris,
David Daniel Cox,
Zhangyang Wang,
Yoon Kim
Abstract:
Scaling transformers has led to significant breakthroughs in many domains, leading to a paradigm in which larger versions of existing models are trained and released on a periodic basis. New instances of such models are typically trained completely from scratch, despite the fact that they are often just scaled-up versions of their smaller counterparts. How can we use the implicit knowledge in the parameters of smaller, extant models to enable faster training of newer, larger models? This paper describes an approach for accelerating transformer training by learning to grow pretrained transformers, where we learn to linearly map the parameters of the smaller model to initialize the larger model. For tractable learning, we factorize the linear transformation as a composition of (linear) width- and depth-growth operators, and further employ a Kronecker factorization of these growth operators to encode architectural knowledge. Extensive experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% computational cost of training from scratch, while also consistently outperforming strong baselines that also reuse smaller pretrained models to initialize larger models.
Submitted 2 March, 2023;
originally announced March 2023.
-
Rapid Development of Compositional AI
Authors:
Lee Martie,
Jessie Rosenberg,
Veronique Demers,
Gaoyuan Zhang,
Onkar Bhardwaj,
John Henning,
Aditya Prasad,
Matt Stallone,
Ja Young Lee,
Lucy Yip,
Damilola Adesina,
Elahe Paikari,
Oscar Resendiz,
Sarah Shaw,
David Cox
Abstract:
Compositional AI systems, which combine multiple artificial intelligence components together with other application components to solve a larger problem, have no known pattern of development and are often approached in a bespoke and ad hoc style. This makes development slower and harder to reuse for future applications. To support the full rapid development cycle of compositional AI applications, we have developed a novel framework called (Bee)* (written as a regular expression and pronounced as "beestar"). We illustrate how (Bee)* supports building integrated, scalable, and interactive compositional AI applications with a simplified developer experience.
Submitted 12 February, 2023;
originally announced February 2023.
-
Nonlinear photoluminescence in gold thin films
Authors:
A. Rodríguez Echarri,
F. Iyikanat,
S. Boroviks,
N. Asger Mortensen,
Joel D. Cox,
F. Javier García de Abajo
Abstract:
Promising applications in photonics are driven by the ability to fabricate crystal-quality metal thin films of controlled thickness down to a few nanometers. In particular, these materials exhibit a highly nonlinear response to optical fields owing to the induced ultrafast electron dynamics, which is however poorly understood on such mesoscopic length scales. Here, we reveal a new mechanism that controls the nonlinear optical response of thin metallic films, dominated by ultrafast electronic heat transport when the thickness is sufficiently small. By experimentally and theoretically studying electronic transport in such materials, we explain the observed temporal evolution of photoluminescence in pump-probe measurements that we report for crystalline gold flakes. Incorporating a first-principles description of the electronic band structures, we model electronic transport and find that ultrafast thermal dynamics plays a pivotal role in determining the strength and time-dependent characteristics of the nonlinear photoluminescence signal, which is largely influenced by the distribution of hot electrons and holes, subject to diffusion across the film as well as relaxation to lattice modes. Our findings introduce conceptually novel elements triggering the nonlinear optical response of nanoscale materials while suggesting additional ways to control and leverage hot carrier distributions in metallic films.
Submitted 17 January, 2023;
originally announced January 2023.
-
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
Authors:
James Seale Smith,
Paola Cascante-Bonilla,
Assaf Arbelle,
Donghyun Kim,
Rameswar Panda,
David Cox,
Diyi Yang,
Zsolt Kira,
Rogerio Feris,
Leonid Karlinsky
Abstract:
Recently, large-scale pre-trained Vision-and-Language (VL) foundation models have demonstrated remarkable capabilities in many zero-shot downstream tasks, achieving competitive results for recognizing objects defined by as little as short text prompts. However, it has also been shown that VL models are still brittle in Structured VL Concept (SVLC) reasoning, such as the ability to recognize object attributes, states, and inter-object relations. This leads to reasoning mistakes, which need to be corrected as they occur by teaching VL models the missing SVLC skills; often this must be done using private data where the issue was found, which naturally leads to a data-free continual (no task-id) VL learning setting. In this work, we introduce the first Continual Data-Free Structured VL Concepts Learning (ConStruct-VL) benchmark and show it is challenging for many existing data-free CL strategies. We, therefore, propose a data-free method comprised of a new approach of Adversarial Pseudo-Replay (APR) which generates adversarial reminders of past tasks from past task models. To use this method efficiently, we also propose a continual parameter-efficient Layered-LoRA (LaLo) neural architecture allowing no-memory-cost access to all past models at train time. We show this approach outperforms all data-free methods by as much as ~7% while even matching some levels of experience-replay (prohibitive for applications where data-privacy must be preserved). Our code is publicly available at https://github.com/jamessealesmith/ConStruct-VL
Submitted 30 March, 2023; v1 submitted 17 November, 2022;
originally announced November 2022.
-
Optoelectronic control of atomic bistability with graphene
Authors:
Mikkel Have Eriksen,
Jakob E. Olsen,
Christian Wolff,
Joel D. Cox
Abstract:
We explore the emergence and active control of optical bistability in a two-level atom near a graphene sheet. Our theory incorporates self-interaction of the optically-driven atom and its coupling to electromagnetic vacuum modes, both of which are sensitive to the electrically-tunable interband transition threshold in graphene. We show that electro-optical bistability and hysteresis can manifest in the intensity, spectrum, and quantum statistics of the light emitted by the atom, which undergoes critical slow-down to steady-state. The optically-driven atom-graphene interaction constitutes a platform for active control of driven atomic systems in quantum coherent control and atomic physics.
Submitted 25 July, 2022;
originally announced July 2022.
-
Nonlinear quantum logic with colliding graphene plasmons
Authors:
Giuseppe Calajò,
Philipp K. Jenke,
Lee A. Rozema,
Philip Walther,
Darrick E. Chang,
Joel D. Cox
Abstract:
Graphene has emerged as a promising platform to bring nonlinear quantum optics to the nanoscale, where a large intrinsic optical nonlinearity enables long-lived and actively tunable plasmon polaritons to strongly interact. Here we theoretically study the collision between two counter-propagating plasmons in a graphene nanoribbon, where transversal subwavelength confinement endows propagating plasmons with a flat band dispersion that enhances their interaction. This scenario presents interesting possibilities towards the implementation of multi-mode polaritonic gates that circumvent limitations imposed by the Shapiro no-go theorem for photonic gates in nonlinear optical fibers. As a paradigmatic example we demonstrate the feasibility of a high fidelity conditional Pi phase shift (CZ), where the gate performance is fundamentally limited only by the single-plasmon lifetime. These results open new exciting avenues towards quantum information and many-body applications with strongly-interacting polaritons.
Submitted 18 March, 2023; v1 submitted 11 July, 2022;
originally announced July 2022.
-
Existence of Optimally-Greatest Digraphs for Strongly Connected Node Reliability
Authors:
Danielle Cox,
Kyle MacKeigan,
Emily Wright
Abstract:
In this paper, we introduce a new model to study network reliability with node failures. This model, strongly connected node reliability, is the directed variant of node reliability and measures the probability that the operational vertices induce a subdigraph that is strongly connected. If we are restricted to directed graphs with $n$ vertices and $n+1\leq m\leq 2n-3$ or $m=2n$ arcs, an optimally-greatest digraph does not exist. Furthermore, we study optimally-greatest directed circulant graphs when the vertices operate with probability $p$ near zero and near one.
In particular, we show that the graph $Γ\left(\mathbb{Z}_n,\{1,-1\}\right)$ is optimally-greatest for values of $p$ near zero. Then, we determine that the graph $Γ\left(\mathbb{Z}_{n},\{1,\frac{n+2}{2}\}\right)$ is optimally-greatest for values of $p$ near one when $n$ is even. Next, we show that the graph $Γ\left(\mathbb{Z}_{n},\{1,2(3^{-1})\}\right)$ is optimally-greatest for values of $p$ near one when $n$ is odd and not divisible by three and that $Γ\left(\mathbb{Z}_{n},\{1,3(2^{-1})\}\right)$ is optimally-greatest for values of $p$ near one when $n$ is odd and divisible by three. We conclude with a discussion of open problems.
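The definition of strongly connected node reliability can be made concrete with a brute-force computation for small digraphs. This is a minimal sketch rather than the authors' method: the function names are hypothetical, vertices are assumed to operate independently with probability $p$, and the empty vertex set is assumed not to count as strongly connected.

```python
from itertools import combinations


def is_strongly_connected(vertices, arcs):
    """True if the subdigraph induced by `vertices` is strongly connected."""
    vertices = set(vertices)
    if not vertices:
        return False  # assumption: the empty set is not counted as operational
    fwd, rev = {}, {}
    for u, v in arcs:
        if u in vertices and v in vertices:
            fwd.setdefault(u, []).append(v)
            rev.setdefault(v, []).append(u)

    def reaches_all(adj):
        # Depth-first search from an arbitrary vertex of the induced subdigraph.
        start = next(iter(vertices))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen == vertices

    # Strongly connected iff every vertex is reachable from `start`
    # along the arcs and along the reversed arcs.
    return reaches_all(fwd) and reaches_all(rev)


def sc_node_reliability(n, arcs, p):
    """Probability that the operational vertices of a digraph on 0..n-1
    induce a strongly connected subdigraph (exhaustive over all subsets)."""
    total = 0.0
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            if is_strongly_connected(subset, arcs):
                total += p**k * (1 - p) ** (n - k)
    return total


def circulant_arcs(n, connection_set):
    """Arcs i -> i + s (mod n) of the directed circulant graph on Z_n."""
    return [(i, (i + s) % n) for i in range(n) for s in connection_set]


if __name__ == "__main__":
    arcs = circulant_arcs(4, [1, -1])  # the graph Γ(Z_4, {1, -1})
    print(sc_node_reliability(4, arcs, 0.5))
```

The enumeration is exponential in $n$, so it is only useful for checking small cases such as comparing circulant graphs with different connection sets at a given $p$.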
Submitted 24 June, 2022;
originally announced June 2022.
-
Hot methanol in the [BHB2007] 11 protobinary system: hot corino versus shock origin? : FAUST V
Authors:
C. Vastel,
F. Alves,
C. Ceccarelli,
M. Bouvier,
I. Jimenez-Serra,
T. Sakai,
P. Caselli,
L. Evans,
F. Fontani,
R. Le Gal,
C. J. Chandler,
B. Svoboda,
L. Maud,
C. Codella,
N. Sakai,
A. Lopez-Sepulcre,
G. Moellenbrock,
Y. Aikawa,
N. Balucani,
E. Bianchi,
G. Busquet,
E. Caux,
S. Charnley,
N. Cuello,
M. De Simone
, et al. (41 additional authors not shown)
Abstract:
Methanol is a ubiquitous species commonly found in the molecular interstellar medium. It is also a crucial seed species for the building-up of the chemical complexity in star forming regions. Thus, understanding how its abundance evolves during the star formation process and whether it enriches the emerging planetary system is of paramount importance. We used new data from the ALMA Large Program FAUST (Fifty AU STudy of the chemistry in the disk/envelope system of Solar-like protostars) to study the methanol line emission towards the [BHB2007] 11 protobinary system (sources A and B), where a complex structure of filaments connecting the two sources with a larger circumbinary disk has been previously detected. Twelve methanol lines have been detected with upper energies in the range [45-537] K along with one 13CH3OH transition. The methanol emission is compact and encompasses both protostars, separated by only 28 au and presents three velocity components, not spatially resolved by our observations, associated with three different spatial regions, with two of them close to 11B and the third one associated with 11A. A non-LTE radiative transfer analysis of the methanol lines concludes that the gas is hot and dense and highly enriched in methanol with an abundance as high as 1e-5. Using previous continuum data, we show that dust opacity can potentially completely absorb the methanol line emission from the two binary objects. Although we cannot firmly exclude other possibilities, we suggest that the detected hot methanol is resulting from the shocked gas from the incoming filaments streaming towards [BHB2007] 11 A and B, respectively. Higher spatial resolution observations are necessary to confirm this hypothesis.
Submitted 21 June, 2022;
originally announced June 2022.
-
VALHALLA: Visual Hallucination for Machine Translation
Authors:
Yi Li,
Rameswar Panda,
Yoon Kim,
Chun-Fu Chen,
Rogerio Feris,
David Cox,
Nuno Vasconcelos
Abstract:
Designing better machine translation systems by considering auxiliary inputs such as images has attracted much attention in recent years. While existing methods show promising performance over the conventional text-only translation systems, they typically require paired text and image as input during inference, which limits their applicability to real-world scenarios. In this paper, we introduce a visual hallucination framework, called VALHALLA, which requires only source sentences at inference time and instead uses hallucinated visual representations for multimodal machine translation. In particular, given a source sentence an autoregressive hallucination transformer is used to predict a discrete visual representation from the input text, and the combined text and hallucinated representations are utilized to obtain the target translation. We train the hallucination transformer jointly with the translation transformer using standard backpropagation with cross-entropy losses while being guided by an additional loss that encourages consistency between predictions using either ground-truth or hallucinated visual representations. Extensive experiments on three standard translation datasets with a diverse set of language pairs demonstrate the effectiveness of our approach over both text-only baselines and state-of-the-art methods. Project page: http://www.svcl.ucsd.edu/projects/valhalla.
Submitted 31 May, 2022;
originally announced June 2022.
-
Giant enhancement of third-harmonic generation in graphene-metal heterostructures
Authors:
Irati Alonso Calafell,
Lee A. Rozema,
David Alcaraz Iranzo,
Alessandro Trenti,
Joel D. Cox,
Avinash Kumar,
Hlib Bieliaiev,
Sebastian Nanot,
Cheng Peng,
Dmitri K. Efetov,
Jin Yong Hong,
Jing Kong,
Dirk R. Englund,
F. Javier García de Abajo,
Frank H. L. Koppens,
Philip Walther
Abstract:
Nonlinear nanophotonics leverages engineered nanostructures to funnel light into small volumes and intensify nonlinear optical processes with spectral and spatial control. Due to its intrinsically large and electrically tunable nonlinear optical response, graphene is an especially promising nanomaterial for nonlinear optoelectronic applications. Here we report on exceptionally strong optical nonlinearities in graphene-insulator-metal heterostructures, demonstrating an enhancement by three orders of magnitude in the third-harmonic signal compared to bare graphene. Furthermore, by increasing the graphene Fermi energy through an external gate voltage, we find that graphene plasmons mediate the optical nonlinearity and modify the third-harmonic signal. Our findings show that graphene-insulator-metal is a promising heterostructure for optically-controlled and electrically-tunable nano-optoelectronic components.
Submitted 25 May, 2022;
originally announced May 2022.