
The Amazon Nova Family of Models: Technical Report and Model Card

Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional-grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.

Submitted 17 March, 2025; originally announced June 2025.

Comments: 48 pages, 10 figures

Report number: 20250317

arXiv:2506.04708 [pdf, other]

Accelerated Test-Time Scaling with Model-Free Speculative Sampling

Authors: Woomin Song, Saket Dingliwal, Sai Muralidhar Jayanthi, Bhavana Ganesh, Jinwoo Shin, Aram Galstyan, Sravan Babu Bodapati

Abstract: Language models have demonstrated remarkable capabilities in reasoning tasks through test-time scaling techniques like best-of-N sampling and tree search. However, these approaches often demand substantial computational resources, creating a critical trade-off between performance and efficiency. We introduce STAND (STochastic Adaptive N-gram Drafting), a novel model-free speculative decoding approach that leverages the inherent redundancy in reasoning trajectories to achieve significant acceleration without compromising accuracy. Our analysis reveals that reasoning paths frequently reuse similar reasoning patterns, enabling efficient model-free token prediction without requiring separate draft models. By introducing stochastic drafting and preserving probabilistic information through a memory-efficient logit-based N-gram module, combined with optimized Gumbel-Top-K sampling and data-driven tree construction, STAND significantly improves token acceptance rates. Extensive evaluations across multiple models and reasoning tasks (AIME-2024, GPQA-Diamond, and LiveCodeBench) demonstrate that STAND reduces inference latency by 60-65% compared to standard autoregressive decoding while maintaining accuracy. Furthermore, STAND outperforms state-of-the-art speculative decoding methods by 14-28% in throughput and shows strong performance even in single-trajectory scenarios, reducing inference latency by 48-58%. As a model-free approach, STAND can be applied to any existing language model without additional training, making it a powerful plug-and-play solution for accelerating language model reasoning.

Submitted 5 June, 2025; originally announced June 2025.

arXiv:2504.18572 [pdf]

BELL: Benchmarking the Explainability of Large Language Models

Authors: Syed Quiser Ahmed, Bharathi Vokkaliga Ganesh, Jagadish Babu P, Karthick Selvaraj, ReddySiva Naga Parvathi Devi, Sravya Kappala

Abstract: Large Language Models have demonstrated remarkable capabilities in natural language processing, yet their decision-making processes often lack transparency. This opacity raises significant concerns regarding trust, bias, and model performance. To address these issues, understanding and evaluating the interpretability of LLMs is crucial. This paper introduces a standardised benchmarking technique, Benchmarking the Explainability of Large Language Models (BELL), designed to evaluate the explainability of large language models.

Submitted 22 April, 2025; originally announced April 2025.

arXiv:2503.04992 [pdf, ps, other]

Wanda++: Pruning Large Language Models via Regional Gradients

Authors: Yifan Yang, Kai Zhen, Bhavana Ganesh, Aram Galstyan, Goeric Huybrechts, Markus Müller, Jonas M. Kübler, Rupak Vignesh Swaminathan, Athanasios Mouchtaris, Sravan Babu Bodapati, Nathan Susanj, Zheng Zhang, Jack FitzGerald, Abhishek Kumar

Abstract: Large Language Model (LLM) pruning seeks to remove unimportant weights for inference speedup with minimal accuracy impact. However, existing methods often suffer from accuracy degradation without full-model sparsity-aware fine-tuning. This paper presents Wanda++, a novel pruning framework that outperforms the state-of-the-art methods by utilizing decoder-block-level regional gradients. Specifically, Wanda++ improves the pruning score with regional gradients for the first time and proposes an efficient regional optimization method to minimize pruning-induced output discrepancies between the dense and sparse decoder outputs. Notably, Wanda++ improves perplexity by up to 32% over Wanda in the language modeling task and generalizes effectively to downstream tasks. Moreover, despite updating weights with regional optimization, Wanda++ remains orthogonal to sparsity-aware fine-tuning, further reducing perplexity with LoRA to a great extent. Our approach is lightweight, pruning a 7B LLaMA model in under 10 minutes on a single H100 GPU.

Submitted 1 June, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

Comments: Paper accepted at ACL 2025 Findings

arXiv:2412.11384 [pdf]

A Comprehensive Review of Adversarial Attacks on Machine Learning

Authors: Syed Quiser Ahmed, Bharathi Vokkaliga Ganesh, Sathyanarayana Sampath Kumar, Prakhar Mishra, Ravi Anand, Bhanuteja Akurathi

Abstract: This research provides a comprehensive overview of adversarial attacks on AI and ML models, exploring various attack types, techniques, and their potential harms. We also delve into the business implications, mitigation strategies, and future research directions. To gain practical insights, we employ the Adversarial Robustness Toolbox (ART) [1] library to simulate these attacks on real-world use cases, such as self-driving cars. Our goal is to inform practitioners and researchers about the challenges and opportunities in defending AI systems against adversarial threats. By providing a comprehensive comparison of different attack methods, we aim to contribute to the development of more robust and secure AI systems.

Submitted 15 December, 2024; originally announced December 2024.

arXiv:2410.20252 [pdf, other]

Adaptive Video Understanding Agent: Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning

Authors: Sullam Jeoung, Goeric Huybrechts, Bhavana Ganesh, Aram Galstyan, Sravan Bodapati

Abstract: Understanding long-form video content presents significant challenges due to its temporal complexity and the substantial computational resources required. In this work, we propose an agent-based approach to enhance both the efficiency and effectiveness of long-form video understanding by utilizing large language models (LLMs) and their tool-harnessing ability. A key aspect of our method is query-adaptive frame sampling, which leverages the reasoning capabilities of LLMs to process only the most relevant frames in real time, and addresses an important limitation of existing methods, which typically involve sampling redundant or irrelevant frames. To enhance the reasoning abilities of our video-understanding agent, we leverage the self-reflective capabilities of LLMs to provide verbal reinforcement to the agent, which leads to improved performance while minimizing the number of frames accessed. We evaluate our method across several video understanding benchmarks and demonstrate that it not only improves on state-of-the-art performance but also improves efficiency by reducing the number of frames sampled.

Submitted 26 October, 2024; originally announced October 2024.

arXiv:2410.09362 [pdf, other]

SeRA: Self-Reviewing and Alignment of Large Language Models using Implicit Reward Margins

Authors: Jongwoo Ko, Saket Dingliwal, Bhavana Ganesh, Sailik Sengupta, Sravan Bodapati, Aram Galstyan

Abstract: Direct alignment algorithms (DAAs), such as direct preference optimization (DPO), have become popular alternatives for Reinforcement Learning from Human Feedback (RLHF) due to their simplicity, efficiency, and stability. However, the preferences used in DAAs are usually collected before the alignment training begins and remain unchanged (off-policy). This can lead to two problems where the policy model (1) picks up on spurious correlations in the dataset (as opposed to learning the intended alignment expressed in the human preference labels), and (2) overfits to feedback on off-policy trajectories that have less likelihood of being generated by an updated policy model. To address these issues, we introduce Self-Reviewing and Alignment (SeRA), a cost-efficient and effective method that can be readily combined with existing DAAs. SeRA comprises two components: (1) sample selection using implicit reward margins, which helps alleviate over-fitting to some undesired features, and (2) preference bootstrapping using implicit rewards to augment preference data with updated policy models in a cost-efficient manner. Extensive experimentation, including some on instruction-following tasks, demonstrates the effectiveness and generality of SeRA in training LLMs on offline preference datasets with DAAs.

Submitted 12 October, 2024; originally announced October 2024.

arXiv:2211.04780 [pdf, other]

On the Robustness of Explanations of Deep Neural Network Models: A Survey

Authors: Amlan Jyoti, Karthik Balaji Ganesh, Manoj Gayala, Nandita Lakshmi Tunuguntla, Sandesh Kamath, Vineeth N Balasubramanian

Abstract: Explainability has been widely stated as a cornerstone of the responsible and trustworthy use of machine learning models. With the ubiquitous use of Deep Neural Network (DNN) models expanding to risk-sensitive and safety-critical domains, many methods have been proposed to explain the decisions of these models. Recent years have also seen concerted efforts that have shown how such explanations can be distorted (attacked) by minor input perturbations. While there have been many surveys that review explainability methods themselves, there has been no effort hitherto to assimilate the different methods and metrics proposed to study the robustness of explanations of DNN models. In this work, we present a comprehensive survey of methods that study, understand, attack, and defend explanations of DNN models. We also present a detailed review of different metrics used to evaluate explanation methods, as well as describe attributional attack and defense methods. We conclude with lessons and take-aways for the community towards ensuring robust explanations of DNN model predictions.

Submitted 9 November, 2022; originally announced November 2022.

Comments: Under Review ACM Computing Surveys "Special Issue on Trustworthy AI"

arXiv:2201.11674 [pdf, other]

Vision Checklist: Towards Testable Error Analysis of Image Models to Help System Designers Interrogate Model Capabilities

Authors: Xin Du, Benedicte Legastelois, Bhargavi Ganesh, Ajitha Rajan, Hana Chockler, Vaishak Belle, Stuart Anderson, Subramanian Ramamoorthy

Abstract: Using large pre-trained models for image recognition tasks is becoming increasingly common owing to the well-acknowledged success of recent models like vision transformers and CNN-based models like VGG and ResNet. The high accuracy of these models on benchmark tasks has translated into their practical use across many domains including safety-critical applications like autonomous driving and medical diagnostics. Despite their widespread use, image models have been shown to be fragile to changes in the operating environment, bringing their robustness into question. There is an urgent need for methods that systematically characterise and quantify the capabilities of these models to help designers understand and provide guarantees about their safety and robustness. In this paper, we propose Vision Checklist, a framework aimed at interrogating the capabilities of a model in order to produce a report that can be used by a system designer for robustness evaluations. This framework proposes a set of perturbation operations that can be applied on the underlying data to generate test samples of different types. The perturbations reflect potential changes in operating environments, and interrogate various properties ranging from the strictly quantitative to more qualitative. Our framework is evaluated on multiple datasets like Tiny ImageNet, CIFAR10, CIFAR100 and Camelyon17 and for models like ViT and ResNet. Our Vision Checklist proposes a specific set of evaluations that can be integrated into the previously proposed concept of a model card. Robustness evaluations like our checklist will be crucial in future safety evaluations of visual perception modules, and be useful for a wide range of stakeholders including designers, deployers, and regulators involved in the certification of these systems. Source code of Vision Checklist will be open for public use.

Submitted 31 January, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: 17 pages, 18 figures

MSC Class: 62R07; ACM Class: I.4.0

arXiv:2010.14950 [pdf]

Predicting Engagement with the Internet Research Agency's Facebook and Instagram Campaigns around the 2016 U.S. Presidential Election

Authors: Dimitra Liotsiou, Bharath Ganesh, Philip N. Howard

Abstract: The Russian Internet Research Agency's (IRA) online interference campaign in the 2016 U.S. presidential election represents a turning point in the trajectory of democratic elections in the digital age. What can we learn about how the IRA engages U.S. audiences, ahead of the 2020 U.S. presidential election? We provide the first in-depth analysis of the relationships between IRA content characteristics and user engagement on Facebook and Instagram around the 2016 election. We find that content targeting right-wing and non-Black marginalised groups had the strongest positive association with engagement on both Facebook and Instagram, in contrast to findings from the IRA campaign on Twitter and to some previous commentary in the media. Higher engagement was associated with posting later in the 2015-2017 period and using less text on both platforms, using negative wording and not including links on Facebook, and using fewer hashtags on Instagram. The sub-audiences and sub-issues associated with most engagement differed across the platforms.

Submitted 28 October, 2020; originally announced October 2020.

arXiv:2001.11461 [pdf]

Echo Chambers Exist! (But They're Full of Opposing Views)

Authors: Jonathan Bright, Nahema Marchal, Bharath Ganesh, Stevan Rudinac

Abstract: The theory of echo chambers, which suggests that online political discussions take place in conditions of ideological homogeneity, has recently gained popularity as an explanation for patterns of political polarization and radicalization observed in many democratic countries. However, while micro-level experimental work has shown evidence that individuals may gravitate towards information that supports their beliefs, recent macro-level studies have cast doubt on whether this tendency generates echo chambers in practice, instead suggesting that cross-cutting exposures are a common feature of digital life. In this article, we offer an explanation for these diverging results. Building on cognitive dissonance theory, and making use of observational trace data taken from an online white nationalist website, we explore how individuals in an ideological 'echo chamber' engage with opposing viewpoints. We show that this type of exposure, far from being detrimental to radical online discussions, is actually a core feature of such spaces that encourages people to stay engaged. The most common 'echoes' in this echo chamber are in fact the sound of opposing viewpoints being undermined and marginalized. Hence echo chambers exist not only in spite of but thanks to the unifying presence of oppositional viewpoints. We conclude with reflections on policy implications of our study for those seeking to promote a more moderate political internet.

Submitted 30 January, 2020; originally announced January 2020.

arXiv:1710.07087 [pdf]

Does Campaigning on Social Media Make a Difference? Evidence from candidate use of Twitter during the 2015 and 2017 UK Elections

Authors: Jonathan Bright, Scott A Hale, Bharath Ganesh, Andrew Bulovsky, Helen Margetts, Phil Howard

Abstract: Social media are now a routine part of political campaigns all over the world. However, studies of the impact of campaigning on social platforms have thus far been limited to cross-sectional datasets from one election period, which are vulnerable to unobserved variable bias. Hence empirical evidence on the effectiveness of political social media activity is thin. We address this deficit by analysing a novel panel dataset of political Twitter activity in the 2015 and 2017 elections in the United Kingdom. We find that Twitter-based campaigning does seem to help win votes, a finding which is consistent across a variety of different model specifications including a first difference regression. The impact of Twitter use is small in absolute terms, though comparable with that of campaign spending. Our data also support the idea that effects are mediated through other communication channels, hence challenging the relevance of engaging in an interactive fashion.

Submitted 27 July, 2018; v1 submitted 19 October, 2017; originally announced October 2017.

Showing 1–12 of 12 results for author: Ganesh, B