
Showing 1–32 of 32 results for author: Raman, K

Searching in archive cs.
  1. arXiv:2506.12346

    cs.CL cs.AI

    Refract ICL: Rethinking Example Selection in the Era of Million-Token Models

    Authors: Arjun R. Akula, Kazuma Hashimoto, Krishna Srinivasan, Aditi Chaudhary, Karthik Raman, Michael Bendersky

    Abstract: The emergence of long-context large language models (LLMs) has enabled the use of hundreds, or even thousands, of demonstrations for in-context learning (ICL) - a previously impractical regime. This paper investigates whether traditional ICL selection strategies, which balance the similarity of ICL examples to the test input (using a text retriever) with diversity within the ICL set, remain effect…

    Submitted 14 June, 2025; originally announced June 2025.

  2. arXiv:2506.12103

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  3. arXiv:2311.09619

    cs.CL

    Take One Step at a Time to Know Incremental Utility of Demonstration: An Analysis on Reranking for Few-Shot In-Context Learning

    Authors: Kazuma Hashimoto, Karthik Raman, Michael Bendersky

    Abstract: In-Context Learning (ICL) is an emergent capability of Large Language Models (LLMs). Only a few demonstrations enable LLMs to be used as blackbox for new tasks. Previous studies have shown that using LLMs' outputs as labels is effective in training models to select demonstrations. Such a label is expected to estimate utility of a demonstration in ICL; however, it has not been well understood how d…

    Submitted 2 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted as a long paper at NAACL 2024

  4. arXiv:2311.07930

    cs.CL

    It's All Relative! -- A Synthetic Query Generation Approach for Improving Zero-Shot Relevance Prediction

    Authors: Aditi Chaudhary, Karthik Raman, Michael Bendersky

    Abstract: Recent developments in large language models (LLMs) have shown promise in their ability to generate synthetic query-document pairs by prompting with as few as 8 demonstrations. This has enabled building better IR models, especially for tasks with no training data readily available. Typically, such synthetic query generation (QGen) approaches condition on an input context (e.g. a text document) and…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 18 pages

  5. arXiv:2309.07900

    cs.CL cs.IR

    Ambiguity-Aware In-Context Learning with Large Language Models

    Authors: Lingyu Gao, Aditi Chaudhary, Krishna Srinivasan, Kazuma Hashimoto, Karthik Raman, Michael Bendersky

    Abstract: In-context learning (ICL) i.e. showing LLMs only a few task-specific demonstrations has led to downstream gains with no task-specific fine-tuning required. However, LLMs are sensitive to the choice of prompts, and therefore a crucial research question is how to select good demonstrations for ICL. One effective strategy is leveraging semantic similarity between the ICL demonstrations and test input…

    Submitted 30 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: 15 pages in total

  6. arXiv:2305.11944

    cs.IR cs.CL

    Exploring the Viability of Synthetic Query Generation for Relevance Prediction

    Authors: Aditi Chaudhary, Karthik Raman, Krishna Srinivasan, Kazuma Hashimoto, Mike Bendersky, Marc Najork

    Abstract: Query-document relevance prediction is a critical problem in Information Retrieval systems. This problem has increasingly been tackled using (pretrained) transformer-based models which are finetuned using large collections of labeled data. However, in specialized domains such as e-commerce and healthcare, the viability of this approach is limited by the dearth of large in-domain data. To address t…

    Submitted 16 June, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: In Proceedings of ACM SIGIR Workshop on eCommerce (SIGIR eCom 23)

  7. arXiv:2212.10767

    cs.CL

    How Does Beam Search improve Span-Level Confidence Estimation in Generative Sequence Labeling?

    Authors: Kazuma Hashimoto, Iftekhar Naim, Karthik Raman

    Abstract: Sequence labeling is a core task in text understanding for IE/IR systems. Text generation models have increasingly become the go-to solution for such tasks (e.g., entity extraction and dialog slot filling). While most research has focused on the labeling accuracy, a key aspect -- of vital practical importance -- has slipped through the cracks: understanding model confidence. More specifically, we…

    Submitted 30 January, 2024; v1 submitted 21 December, 2022; originally announced December 2022.

    Comments: UncertaiNLP 2024 (an EACL 2024 workshop: https://uncertainlp.github.io/)

  8. arXiv:2210.15718

    cs.CL cs.IR

    QUILL: Query Intent with Large Language Models using Retrieval Augmentation and Multi-stage Distillation

    Authors: Krishna Srinivasan, Karthik Raman, Anupam Samanta, Lingrui Liao, Luca Bertelli, Mike Bendersky

    Abstract: Large Language Models (LLMs) have shown impressive results on a variety of text understanding tasks. Search queries though pose a unique challenge, given their short-length and lack of nuance or context. Complicated feature engineering efforts do not always lead to downstream improvements as their performance benefits may be offset by increased complexity of knowledge distillation. Thus, in this p…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022 Industry Track

  9. arXiv:2209.14694

    cs.CL

    GROOT: Corrective Reward Optimization for Generative Sequential Labeling

    Authors: Kazuma Hashimoto, Karthik Raman

    Abstract: Sequential labeling is a fundamental NLP task, forming the backbone of many applications. Supervised learning of Seq2Seq models has shown great success on these problems. However, the training objectives are still significantly disconnected with the metrics and desiderata we care about in practice. For example, a practical sequence tagging application may want to optimize for a certain precision-r…

    Submitted 21 December, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

  10. arXiv:2209.14290

    cs.CL cs.IR

    FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation

    Authors: Sebastian Hofstätter, Jiecao Chen, Karthik Raman, Hamed Zamani

    Abstract: Retrieval-augmented generation models offer many benefits over standalone language models: besides a textual answer to a given query they provide provenance items retrieved from an updateable knowledge base. However, they are also more complex systems and need to handle long inputs. In this work, we introduce FiD-Light to strongly increase the efficiency of the state-of-the-art retrieval-augmented…

    Submitted 28 September, 2022; originally announced September 2022.

  11. arXiv:2207.03030

    cs.CL cs.IR

    Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

    Authors: Sebastian Hofstätter, Jiecao Chen, Karthik Raman, Hamed Zamani

    Abstract: This paper studies multi-task training of retrieval-augmented generation models for knowledge-intensive tasks. We propose to clean the training set by utilizing a distinct property of knowledge-intensive generation: The connection of query-answer pairs to items in the knowledge base. We filter training examples via a threshold of confidence on the relevance labels, whether a pair is answerable by…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: Accepted at the ICML 2022 Workshop on Knowledge Retrieval and Language Models (KRLM)

  12. arXiv:2203.08378

    cs.CL

    Transforming Sequence Tagging Into A Seq2Seq Task

    Authors: Karthik Raman, Iftekhar Naim, Jiecao Chen, Kazuma Hashimoto, Kiran Yalasangi, Krishna Srinivasan

    Abstract: Pretrained, large, generative language models (LMs) have had great success in a wide range of sequence tagging and structured prediction tasks. Casting a sequence tagging task as a Seq2Seq one requires deciding the formats of the input and output sequences. However, we lack a principled understanding of the trade-offs associated with these formats (such as the effect on model accuracy, sequence le…

    Submitted 25 October, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: Accepted at EMNLP 2022

  13. arXiv:2111.04508

    q-bio.MN cs.ET

    Designing biological circuits: from principles to applications

    Authors: Debomita Chakraborty, Raghunathan Rengaswamy, Karthik Raman

    Abstract: Genetic circuit design is a well-studied problem in synthetic biology. Ever since the first genetic circuits -- the repressilator and the toggle switch -- were designed and implemented, many advances have been made in this area of research. The current review systematically organizes a number of key works in this domain by employing the versatile framework of generalized morphological analysis. Li…

    Submitted 5 November, 2021; originally announced November 2021.

  14. arXiv:2103.01913

    cs.CV cs.CL cs.IR

    WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning

    Authors: Krishna Srinivasan, Karthik Raman, Jiecao Chen, Michael Bendersky, Marc Najork

    Abstract: The milestone improvements brought about by deep representation learning and pre-training techniques have led to large performance gains across downstream NLP, IR and Vision tasks. Multimodal modeling techniques aim to leverage large high-quality visio-linguistic datasets for learning complementary information (across image and text modalities). In this paper, we introduce the Wikipedia-based Imag…

    Submitted 3 March, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

  15. arXiv:2010.16399

    cs.LG cs.AI

    Goal directed molecule generation using Monte Carlo Tree Search

    Authors: Anand A. Rajasekar, Karthik Raman, Balaraman Ravindran

    Abstract: One challenging and essential task in biochemistry is the generation of novel molecules with desired properties. Novel molecule generation remains a challenge since the molecule space is difficult to navigate through, and the generated molecules should obey the rules of chemical valency. Through this work, we propose a novel method, which we call unitMCTS, to perform molecule generation by making…

    Submitted 11 December, 2020; v1 submitted 30 October, 2020; originally announced October 2020.

    Comments: 6 pages, 2 tables, 1 figure

  16. arXiv:2010.12566

    cs.CL

    DICT-MLM: Improved Multilingual Pre-Training using Bilingual Dictionaries

    Authors: Aditi Chaudhary, Karthik Raman, Krishna Srinivasan, Jiecao Chen

    Abstract: Pre-trained multilingual language models such as mBERT have shown immense gains for several natural language processing (NLP) tasks, especially in the zero-shot cross-lingual setting. Most, if not all, of these pre-trained models rely on the masked-language modeling (MLM) objective as the key language learning objective. The principle behind these approaches is that predicting the masked words wit…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: 13 pages

  17. DiPair: Fast and Accurate Distillation for Trillion-Scale Text Matching and Pair Modeling

    Authors: Jiecao Chen, Liu Yang, Karthik Raman, Michael Bendersky, Jung-Jung Yeh, Yun Zhou, Marc Najork, Danyang Cai, Ehsan Emadzadeh

    Abstract: Pre-trained models like BERT (Devlin et al., 2018) have dominated NLP / IR applications such as single sentence classification, text pair classification, and question answering. However, deploying these models in real systems is highly non-trivial due to their exorbitant computational costs. A common remedy to this is knowledge distillation (Hinton et al., 2015), leading to faster inference. Howev…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: 13 pages. Accepted to Findings of EMNLP 2020

  18. arXiv:2009.01265

    cs.CR

    Google COVID-19 Search Trends Symptoms Dataset: Anonymization Process Description (version 1.0)

    Authors: Shailesh Bavadekar, Andrew Dai, John Davis, Damien Desfontaines, Ilya Eckstein, Katie Everett, Alex Fabrikant, Gerardo Flores, Evgeniy Gabrilovich, Krishna Gadepalli, Shane Glass, Rayman Huang, Chaitanya Kamath, Dennis Kraft, Akim Kumok, Hinali Marfatia, Yael Mayer, Benjamin Miller, Adam Pearce, Irippuge Milinda Perera, Venky Ramachandran, Karthik Raman, Thomas Roessler, Izhak Shafran, Tomer Shekel, et al. (5 additional authors not shown)

    Abstract: This report describes the aggregation and anonymization process applied to the initial version of COVID-19 Search Trends symptoms dataset (published at https://goo.gle/covid19symptomdataset on September 2, 2020), a publicly available dataset that shows aggregated, anonymized trends in Google searches for symptoms (and some related topics). The anonymization process is designed to protect the daily…

    Submitted 2 September, 2020; originally announced September 2020.

  19. arXiv:2006.08003

    eess.IV cs.CV cs.LG

    CompressNet: Generative Compression at Extremely Low Bitrates

    Authors: Suraj Kiran Raman, Aditya Ramesh, Vijayakrishna Naganoor, Shubham Dash, Giridharan Kumaravelu, Honglak Lee

    Abstract: Compressing images at extremely low bitrates (< 0.1 bpp) has always been a challenging task since the quality of reconstruction significantly reduces due to the strong imposed constraint on the number of bits allocated for the compressed data. With the increasing need to transfer large amounts of images with limited bandwidth, compressing images to very low sizes is a crucial task. However, the ex…

    Submitted 14 June, 2020; originally announced June 2020.

  20. arXiv:2001.03869

    cs.IT

    Finite-Sample Analysis of Image Registration

    Authors: Ravi Kiran Raman, Lav R. Varshney

    Abstract: We study the problem of image registration in the finite-resolution regime and characterize the error probability of algorithms as a function of properties of the transformation and the image capture noise. Specifically, we define a channel-aware Feinstein decoder to obtain upper bounds on the minimum achievable error probability under finite resolution. We specifically focus on the higher-order t…

    Submitted 12 January, 2020; originally announced January 2020.

    Comments: 16 pages, 3 figures

  21. arXiv:1912.12284

    cs.IT eess.SP

    Decision Making in Star Networks with Incorrect Beliefs

    Authors: Daewon Seo, Ravi Kiran Raman, Lav R. Varshney

    Abstract: Consider a Bayesian binary decision-making problem in star networks, where local agents make selfish decisions independently, and a fusion agent makes a final decision based on aggregated decisions and its own private signal. In particular, we assume all agents have private beliefs for the true prior probability, based on which they perform Bayesian decision making. We focus on the Bayes risk of t…

    Submitted 26 October, 2021; v1 submitted 27 December, 2019; originally announced December 2019.

    Comments: final version, to appear in IEEE Transactions on Signal Processing

  22. arXiv:1909.00437

    cs.CL

    Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation

    Authors: Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa, Ankur Bapna, Orhan Firat, Karthik Raman

    Abstract: The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating over 100 languages to and from English within a single model. Its improved translation performance on low resource languages hints at potential cross-lingual transfer capability for downstream tasks. In this paper, we evaluate the cross-lingual effectiveness of represent…

    Submitted 1 September, 2019; originally announced September 2019.

    Journal ref: AAAI 2020

  23. arXiv:1905.12260

    cs.CL cs.AI cs.CV cs.LG

    Learning Multilingual Word Embeddings Using Image-Text Data

    Authors: Karan Singhal, Karthik Raman, Balder ten Cate

    Abstract: There has been significant interest recently in learning multilingual word embeddings -- in which semantically similar words across languages have similar embeddings. State-of-the-art approaches have relied on expensive labeled data, which is unavailable for low-resource languages, or have involved post-hoc unification of monolingual embeddings. In the present paper, we investigate the efficacy of…

    Submitted 29 May, 2019; originally announced May 2019.

    Report number: W19-1807

  24. Beliefs in Decision-Making Cascades

    Authors: Daewon Seo, Ravi Kiran Raman, Joong Bum Rhim, Vivek K Goyal, Lav R Varshney

    Abstract: This work explores a social learning problem with agents having nonidentical noise variances and mismatched beliefs. We consider an $N$-agent binary hypothesis test in which each agent sequentially makes a decision based not only on a private observation, but also on preceding agents' decisions. In addition, the agents have their own beliefs instead of the true prior, and have nonidentical noise v…

    Submitted 5 August, 2019; v1 submitted 23 November, 2018; originally announced December 2018.

    Comments: final version, to appear in IEEE Transactions on Signal Processing

  25. arXiv:1810.11126

    cs.DC

    Promoting Distributed Trust in Machine Learning and Computational Simulation via a Blockchain Network

    Authors: Nelson Kibichii Bore, Ravi Kiran Raman, Isaac M. Markus, Sekou L. Remy, Oliver Bent, Michael Hind, Eleftheria K. Pissadaki, Biplav Srivastava, Roman Vaculin, Kush R. Varshney, Komminist Weldemariam

    Abstract: Policy decisions are increasingly dependent on the outcomes of simulations and/or machine learning models. The ability to share and interact with these outcomes is relevant across multiple fields and is especially critical in the disease modeling community where models are often only accessible and workable to the researchers that generate them. This work presents a blockchain-enabled system that…

    Submitted 25 October, 2018; originally announced October 2018.

  26. arXiv:1809.08438

    cs.DC cs.IT eess.SY stat.ML

    Trusted Multi-Party Computation and Verifiable Simulations: A Scalable Blockchain Approach

    Authors: Ravi Kiran Raman, Roman Vaculin, Michael Hind, Sekou L. Remy, Eleftheria K. Pissadaki, Nelson Kibichii Bore, Roozbeh Daneshvar, Biplav Srivastava, Kush R. Varshney

    Abstract: Large-scale computational experiments, often running over weeks and over large datasets, are used extensively in fields such as epidemiology, meteorology, computational biology, and healthcare to understand phenomena, and design high-stakes policies affecting everyday health and economy. For instance, the OpenMalaria framework is a computationally-intensive simulation used by various non-governmen…

    Submitted 22 September, 2018; originally announced September 2018.

    Comments: 16 pages, 8 figures

  27. arXiv:1804.06438

    cs.CV

    Vision Based Dynamic Offside Line Marker for Soccer Games

    Authors: Karthik Muthuraman, Pranav Joshi, Suraj Kiran Raman

    Abstract: Offside detection in soccer has emerged as one of the most important decisions with an average of 50 offside decisions every game. False detections and rash calls adversely affect game conditions and in many cases drastically change the outcome of the game. The human eye has finite precision and can only discern a limited amount of detail in a given instance. Current offside decisions are made man…

    Submitted 17 April, 2018; originally announced April 2018.

  28. arXiv:1711.07617

    cs.IT cs.CR

    Dynamic Distributed Storage for Scaling Blockchains

    Authors: Ravi Kiran Raman, Lav R. Varshney

    Abstract: Blockchain uses the idea of storing transaction data in the form of a distributed ledger wherein each node in the network stores a current copy of the sequence of transactions in the form of a hash chain. This requirement of storing the entire ledger incurs a high storage cost that grows undesirably large for high transaction rates and large networks. In this work we use the ideas of secret key sh…

    Submitted 7 January, 2018; v1 submitted 20 November, 2017; originally announced November 2017.

    Comments: 19 pages, 6 figures

  29. arXiv:1701.02776

    cs.IT stat.ML

    Universal Joint Image Clustering and Registration using Partition Information

    Authors: Ravi Kiran Raman, Lav R. Varshney

    Abstract: We consider the problem of universal joint clustering and registration of images and define algorithms using multivariate information functionals. We first study registering two images using maximum mutual information and prove its asymptotic optimality. We then show the shortcomings of pairwise registration in multi-image registration, and design an asymptotically optimal algorithm based on multi…

    Submitted 30 November, 2017; v1 submitted 10 January, 2017; originally announced January 2017.

    Comments: 22 pages, 4 figures

  30. arXiv:1610.02276

    cs.HC cs.IT stat.ML

    Universal Clustering via Crowdsourcing

    Authors: Ravi Kiran Raman, Lav Varshney

    Abstract: Consider unsupervised clustering of objects drawn from a discrete set, through the use of human intelligence available in crowdsourcing platforms. This paper defines and studies the problem of universal clustering using responses of crowd workers, without knowledge of worker reliability or task difficulty. We model stochastic worker response distributions by incorporating traits of memory for simi…

    Submitted 5 October, 2016; originally announced October 2016.

  31. arXiv:1404.3656

    cs.LG cs.IR

    Methods for Ordinal Peer Grading

    Authors: Karthik Raman, Thorsten Joachims

    Abstract: MOOCs have the potential to revolutionize higher education with their wide outreach and accessibility, but they require instructors to come up with scalable alternates to traditional student evaluation. Peer grading -- having students assess each other -- is a promising approach to tackling the problem of evaluation at scale, since the number of "graders" naturally scales with the number of studen…

    Submitted 14 April, 2014; originally announced April 2014.

    Comments: Submitted to KDD 2014

    ACM Class: H.4

  32. arXiv:1108.2754

    cs.IR

    Structured Learning of Two-Level Dynamic Rankings

    Authors: Karthik Raman, Thorsten Joachims, Pannaga Shivaswamy

    Abstract: For ambiguous queries, conventional retrieval systems are bound by two conflicting goals. On the one hand, they should diversify and strive to present results for as many query intents as possible. On the other hand, they should provide depth for each intent by displaying more than a single result. Since both diversity and depth cannot be achieved simultaneously in the conventional static retrieva…

    Submitted 12 August, 2011; originally announced August 2011.

    Comments: 10 Pages (Longer Version of CIKM 2011 paper containing more details and experiments)

    ACM Class: H.3.3