Skip to main content

Showing 1–9 of 9 results for author: Majumder, O

Searching in archive cs. Search in all archives.
.
  1. arXiv:2506.12103  [pdf, other

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, AdriĆ  de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue , et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents… ▽ More

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  2. arXiv:2412.12391  [pdf, other

    cs.CV cs.CL cs.LG

    Efficient Scaling of Diffusion Transformers for Text-to-Image Generation

    Authors: Hao Li, Shamit Lal, Zhiheng Li, Yusheng Xie, Ying Wang, Yang Zou, Orchid Majumder, R. Manmatha, Zhuowen Tu, Stefano Ermon, Stefano Soatto, Ashwin Swaminathan

    Abstract: We empirically study the scaling properties of various Diffusion Transformers (DiTs) for text-to-image generation by performing extensive and rigorous ablations, including training scaled DiTs ranging from 0.3B upto 8B parameters on datasets up to 600M images. We find that U-ViT, a pure self-attention based DiT model provides a simpler design and scales more effectively in comparison with cross-at… ▽ More

    Submitted 16 December, 2024; originally announced December 2024.

  3. arXiv:2404.02883  [pdf, other

    cs.CV cs.AI cs.LG

    On the Scalability of Diffusion-based Text-to-Image Generation

    Authors: Hao Li, Yang Zou, Ying Wang, Orchid Majumder, Yusheng Xie, R. Manmatha, Ashwin Swaminathan, Zhuowen Tu, Stefano Ermon, Stefano Soatto

    Abstract: Scaling up model and data size has been quite successful for the evolution of LLMs. However, the scaling law for the diffusion based text-to-image (T2I) models is not fully explored. It is also unclear how to efficiently scale the model for better performance at reduced cost. The different training settings and expensive training cost make a fair model comparison extremely difficult. In this work,… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: CVPR2024

  4. arXiv:2101.11058  [pdf, other

    cs.CV cs.LG

    Supervised Momentum Contrastive Learning for Few-Shot Classification

    Authors: Orchid Majumder, Avinash Ravichandran, Subhransu Maji, Alessandro Achille, Marzia Polito, Stefano Soatto

    Abstract: Few-shot learning aims to transfer information from one task to enable generalization on novel tasks given a few examples. This information is present both in the domain and the class labels. In this work we investigate the complementary roles of these two sources of information by combining instance-discriminative contrastive learning and supervised learning in a single framework called Supervise… ▽ More

    Submitted 21 June, 2021; v1 submitted 26 January, 2021; originally announced January 2021.

    Comments: V2 version; updated with new experiments and figures

  5. arXiv:2101.06640  [pdf, other

    cs.LG stat.ML

    Estimating informativeness of samples with Smooth Unique Information

    Authors: Hrayr Harutyunyan, Alessandro Achille, Giovanni Paolini, Orchid Majumder, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto

    Abstract: We define a notion of information that an individual sample provides to the training of a neural network, and we specialize it to measure both how much a sample informs the final weights and how much it informs the function computed by the weights. Though related, we show that these quantities have a qualitatively different behavior. We give efficient approximations of these quantities using a lin… ▽ More

    Submitted 28 March, 2021; v1 submitted 17 January, 2021; originally announced January 2021.

    Comments: ICLR 2021, 22 pages

  6. arXiv:2002.04162  [pdf, other

    cs.LG cs.CV stat.ML

    Incremental Meta-Learning via Indirect Discriminant Alignment

    Authors: Qing Liu, Orchid Majumder, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto

    Abstract: Majority of the modern meta-learning methods for few-shot classification tasks operate in two phases: a meta-training phase where the meta-learner learns a generic representation by solving multiple few-shot tasks sampled from a large dataset and a testing phase, where the meta-learner leverages its learnt internal representation for a specific few-shot task involving classes which were not seen d… ▽ More

    Submitted 21 April, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  7. arXiv:1910.11959  [pdf, other

    cs.CL

    FineText: Text Classification via Attention-based Language Model Fine-tuning

    Authors: Yunzhe Tao, Saurabh Gupta, Satyapriya Krishna, Xiong Zhou, Orchid Majumder, Vineet Khare

    Abstract: Training deep neural networks from scratch on natural language processing (NLP) tasks requires significant amount of manually labeled text corpus and substantial time to converge, which usually cannot be satisfied by the customers. In this paper, we aim to develop an effective transfer learning algorithm by fine-tuning a pre-trained language model. The goal is to provide expressive and convenient-… ▽ More

    Submitted 25 October, 2019; originally announced October 2019.

  8. arXiv:1910.08525  [pdf, other

    cs.LG stat.ML

    MARTHE: Scheduling the Learning Rate Via Online Hypergradients

    Authors: Michele Donini, Luca Franceschi, Massimiliano Pontil, Orchid Majumder, Paolo Frasconi

    Abstract: We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization, aiming at good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rate schedule -- the hypergradient. Based on this, we introduce MARTHE, a novel online algorithm guided by cheap approximations of the hypergradient that uses pas… ▽ More

    Submitted 17 May, 2020; v1 submitted 18 October, 2019; originally announced October 2019.

    Comments: IJCAI 2020. Larger images. Code available at https://github.com/awslabs/adatune

  9. arXiv:1905.12775  [pdf, other

    cs.CV cs.LG

    $d$-SNE: Domain Adaptation using Stochastic Neighborhood Embedding

    Authors: Xiang Xu, Xiong Zhou, Ragav Venkatesan, Gurumurthy Swaminathan, Orchid Majumder

    Abstract: Deep neural networks often require copious amount of labeled-data to train their scads of parameters. Training larger and deeper networks is hard without appropriate regularization, particularly while using a small dataset. Laterally, collecting well-annotated data is expensive, time-consuming and often infeasible. A popular way to regularize these networks is to simply train the network with more… ▽ More

    Submitted 29 May, 2019; originally announced May 2019.

    Comments: Accepted as Oral at CVPR 2019