Showing 1–12 of 12 results for author: Kiros, J

Searching in archive cs.
  1. arXiv:2410.21276  [pdf, other]

    cs.CL cs.AI cs.CV cs.CY cs.LG cs.SD eess.AS

    GPT-4o System Card

    Authors: OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, et al. (395 additional authors not shown)

    Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 mil…

    Submitted 25 October, 2024; originally announced October 2024.

  2. arXiv:2303.08774  [pdf, other]

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  3. arXiv:2106.06168  [pdf, other]

    cs.LG

    Generate, Annotate, and Learn: NLP with Synthetic Text

    Authors: Xuanli He, Islam Nassar, Jamie Kiros, Gholamreza Haffari, Mohammad Norouzi

    Abstract: This paper studies the use of language models as a source of synthetic unlabeled text for NLP. We formulate a general framework called "generate, annotate, and learn (GAL)" to take advantage of synthetic text within knowledge distillation, self-training, and few-shot learning applications. To generate high-quality task-specific text, we either fine-tune LMs on inputs from the task of interest, o…

    Submitted 31 May, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: Accepted to TACL 2022

  4. arXiv:2010.04438  [pdf, other]

    cs.CL cs.LG stat.ML

    Multichannel Generative Language Model: Learning All Possible Factorizations Within and Across Channels

    Authors: Harris Chan, Jamie Kiros, William Chan

    Abstract: A channel corresponds to a viewpoint or transformation of an underlying meaning. A pair of parallel sentences in English and French express the same underlying meaning, but through two separate channels corresponding to their languages. In this work, we present the Multichannel Generative Language Model (MGLM). MGLM is a generative joint distribution model over channels. MGLM marginalizes over all…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: 10 pages (+3 appendix), 11 figures, 5 tables. Accepted to Findings of EMNLP 2020

  5. arXiv:2002.08866  [pdf, other]

    cs.CL cs.LG

    Contextual Lensing of Universal Sentence Representations

    Authors: Jamie Kiros

    Abstract: What makes a universal sentence encoder universal? The notion of a generic encoder of text appears to be at odds with the inherent contextualization and non-permanence of language use in a dynamic world. However, mapping sentences into generic fixed-length vectors for downstream similarity and retrieval tasks has been fruitful, particularly for multilingual applications. How do we manage this dile…

    Submitted 20 February, 2020; originally announced February 2020.

    Comments: 10 pages

  6. arXiv:1910.13437  [pdf, ps, other]

    cs.CL cs.LG

    An Empirical Study of Generation Order for Machine Translation

    Authors: William Chan, Mitchell Stern, Jamie Kiros, Jakob Uszkoreit

    Abstract: In this work, we present an empirical study of generation order for machine translation. Building on recent advances in insertion-based modeling, we first introduce a soft order-reward framework that enables us to train models to follow arbitrary oracle generation policies. We then make use of this framework to explore a large variety of generation orders, including uninformed orders, location-bas…

    Submitted 29 October, 2019; originally announced October 2019.

  7. arXiv:1905.13177  [pdf, other]

    cs.LG stat.ML

    Graph Normalizing Flows

    Authors: Jenny Liu, Aviral Kumar, Jimmy Ba, Jamie Kiros, Kevin Swersky

    Abstract: We introduce graph normalizing flows: a new, reversible graph neural network model for prediction and generation. On supervised tasks, graph normalizing flows perform similarly to message passing neural networks, but at a significantly reduced memory footprint, allowing them to scale to larger graphs. In the unsupervised case, we combine graph normalizing flows with a novel graph auto-encoder to c…

    Submitted 30 May, 2019; originally announced May 2019.

  8. arXiv:1902.07257  [pdf, other]

    cs.LG stat.ML

    DOM-Q-NET: Grounded RL on Structured Language

    Authors: Sheng Jia, Jamie Kiros, Jimmy Ba

    Abstract: Building agents to interact with the web would allow for significant improvements in knowledge understanding and representation learning. However, web navigation tasks are difficult for current deep reinforcement learning (RL) models due to the large discrete action space and the varying number of actions between the states. In this work, we introduce DOM-Q-NET, a novel architecture for RL-based w…

    Submitted 19 February, 2019; originally announced February 2019.

    Comments: International Conference on Learning Representations (ICLR), 2019

  9. arXiv:1902.04546  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning

    Authors: Harris Chan, Yuhuai Wu, Jamie Kiros, Sanja Fidler, Jimmy Ba

    Abstract: Sparse reward is one of the most challenging problems in reinforcement learning (RL). Hindsight Experience Replay (HER) attempts to address this issue by converting a failed experience to a successful one by relabeling the goals. Despite its effectiveness, HER has limited applicability because it lacks a compact and universal goal representation. We present Augmenting experienCe via TeacheR's advi…

    Submitted 12 February, 2019; originally announced February 2019.

  10. arXiv:1902.03249  [pdf, other]

    cs.CL cs.LG stat.ML

    Insertion Transformer: Flexible Sequence Generation via Insertion Operations

    Authors: Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit

    Abstract: We present the Insertion Transformer, an iterative, partially autoregressive model for sequence generation based on insertion operations. Unlike typical autoregressive models which rely on a fixed, often left-to-right ordering of the output, our approach accommodates arbitrary orderings by allowing for tokens to be inserted anywhere in the sequence during decoding. This flexibility confers a numbe…

    Submitted 8 February, 2019; originally announced February 2019.

  11. arXiv:1707.05612  [pdf, other]

    cs.LG cs.CL cs.CV

    VSE++: Improving Visual-Semantic Embeddings with Hard Negatives

    Authors: Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, Sanja Fidler

    Abstract: We present a new technique for learning visual-semantic embeddings for cross-modal retrieval. Inspired by hard negative mining, the use of hard negatives in structured prediction, and ranking loss functions, we introduce a simple change to common loss functions used for multi-modal embeddings. That, combined with fine-tuning and use of augmented data, yields significant gains in retrieval performa…

    Submitted 29 July, 2018; v1 submitted 18 July, 2017; originally announced July 2017.

    Comments: Accepted as spotlight presentation at British Machine Vision Conference (BMVC) 2018. Code: https://github.com/fartashf/vsepp

  12. arXiv:1607.06450  [pdf, other]

    stat.ML cs.LG

    Layer Normalization

    Authors: Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

    Abstract: Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which are then used to normalize the summed input to that n…

    Submitted 21 July, 2016; originally announced July 2016.