
Showing 1–7 of 7 results for author: Karpathy, A

Searching in archive cs.
  1. arXiv:2410.21276  [pdf, other]

    cs.CL cs.AI cs.CV cs.CY cs.LG cs.SD eess.AS

    GPT-4o System Card

    Authors: OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, et al. (395 additional authors not shown)

    Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 mil…

    Submitted 25 October, 2024; originally announced October 2024.

  2. arXiv:1701.05517  [pdf, other]

    cs.LG stat.ML

    PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications

    Authors: Tim Salimans, Andrej Karpathy, Xi Chen, Diederik P. Kingma

    Abstract: PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs which we make available at https://github.com/openai/pixel-cnn. Our implementation contains a number of modifications to the original model that both simplify its structure and improve its performance. 1) We use a discretized logistic mixture likelihood o…

    Submitted 19 January, 2017; originally announced January 2017.

  3. arXiv:1511.07571  [pdf, other]

    cs.CV cs.LG

    DenseCap: Fully Convolutional Localization Networks for Dense Captioning

    Authors: Justin Johnson, Andrej Karpathy, Li Fei-Fei

    Abstract: We introduce the dense captioning task, which requires a computer vision system to both localize and describe salient regions in images in natural language. The dense captioning task generalizes object detection when the descriptions consist of a single word, and Image Captioning when one predicted region covers the full image. To address the localization and description task jointly we propose a…

    Submitted 24 November, 2015; originally announced November 2015.

  4. arXiv:1506.02078  [pdf, other]

    cs.LG cs.CL cs.NE

    Visualizing and Understanding Recurrent Networks

    Authors: Andrej Karpathy, Justin Johnson, Li Fei-Fei

    Abstract: Recurrent Neural Networks (RNNs), and specifically a variant with Long Short-Term Memory (LSTM), are enjoying renewed interest as a result of successful applications in a wide range of machine learning problems that involve sequential data. However, while LSTMs provide exceptional results in practice, the source of their performance and their limitations remain rather poorly understood. Using char…

    Submitted 16 November, 2015; v1 submitted 5 June, 2015; originally announced June 2015.

    Comments: changing style, adding references, minor changes to text

  5. arXiv:1412.2306  [pdf, other]

    cs.CV

    Deep Visual-Semantic Alignments for Generating Image Descriptions

    Authors: Andrej Karpathy, Li Fei-Fei

    Abstract: We present a model that generates natural language descriptions of images and their regions. Our approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data. Our alignment model is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over se…

    Submitted 14 April, 2015; v1 submitted 6 December, 2014; originally announced December 2014.

  6. arXiv:1409.0575  [pdf, other]

    cs.CV

    ImageNet Large Scale Visual Recognition Challenge

    Authors: Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei

    Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that ha…

    Submitted 29 January, 2015; v1 submitted 1 September, 2014; originally announced September 2014.

    Comments: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL VOC (per-category comparisons in Table 3, distribution of localization difficulty in Fig 16), a list of queries used for obtaining object detection images (Appendix C), and some additional references

    ACM Class: I.4.8; I.5.2

  7. arXiv:1406.5679  [pdf, other]

    cs.CV cs.CL cs.LG

    Deep Fragment Embeddings for Bidirectional Image Sentence Mapping

    Authors: Andrej Karpathy, Armand Joulin, Li Fei-Fei

    Abstract: We introduce a model for bidirectional retrieval of images and sentences through a multi-modal embedding of visual and natural language data. Unlike previous models that directly map images or sentences into a common embedding space, our model works on a finer level and embeds fragments of images (objects) and fragments of sentences (typed dependency tree relations) into a common space. In additio…

    Submitted 22 June, 2014; originally announced June 2014.