Showing 1–7 of 7 results for author: Erdoğan, G

Searching in archive cs.
  1. arXiv:2412.15212  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Scaling 4D Representations

    Authors: João Carreira, Dilara Gokay, Michael King, Chuhan Zhang, Ignacio Rocco, Aravindh Mahendran, Thomas Albert Keck, Joseph Heyward, Skanda Koppula, Etienne Pot, Goker Erdogan, Yana Hasson, Yi Yang, Klaus Greff, Guillaume Le Moing, Sjoerd van Steenkiste, Daniel Zoran, Drew A. Hudson, Pedro Vélez, Luisa Polanía, Luke Friedman, Chris Duvarney, Ross Goroshin, Kelsey Allen, Jacob Walker, et al. (10 additional authors not shown)

    Abstract: Scaling has not yet been convincingly demonstrated for pure self-supervised learning from video. However, prior work has focused evaluations on semantic-related tasks – action classification, ImageNet classification, etc. In this paper we focus on evaluating self-supervised learning on non-semantic vision tasks that are more spatial (3D) and temporal (+1D = 4D), such as camera pose…

    Submitted 9 July, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

  2. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1112 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2401.09865  [pdf, other]

    cs.CV cs.AI cs.LG

    Improving fine-grained understanding in image-text pre-training

    Authors: Ioana Bica, Anastasija Ilić, Matthias Bauer, Goker Erdogan, Matko Bošnjak, Christos Kaplanis, Alexey A. Gritsenko, Matthias Minderer, Charles Blundell, Razvan Pascanu, Jovana Mitrović

    Abstract: We introduce SPARse Fine-grained Contrastive Alignment (SPARC), a simple method for pretraining more fine-grained multimodal representations from image-text pairs. Given that multiple image patches often correspond to single words, we propose to learn a grouping of image patches for every token in the caption. To achieve this, we use a sparse similarity metric between image patches and language to…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: 26 pages

  4. arXiv:2206.10735  [pdf, ps, other]

    cs.IT

    Signature Codes for a Noisy Adder Multiple Access Channel

    Authors: Gökberk Erdoğan, Georg Maringer, Nikita Polyanskii

    Abstract: In this work, we consider $q$-ary signature codes of length $k$ and size $n$ for a noisy adder multiple access channel. A signature code in this model has the property that any subset of codewords can be uniquely reconstructed based on any vector that is obtained from the sum (over integers) of these codewords. We show that there exists an algorithm to construct a signature code of length…

    Submitted 23 July, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: 12 pages, 0 figures, submitted to 2022 IEEE Information Theory Workshop

  5. arXiv:2106.03849  [pdf, other]

    cs.CV cs.LG

    SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition

    Authors: Rishabh Kabra, Daniel Zoran, Goker Erdogan, Loic Matthey, Antonia Creswell, Matthew Botvinick, Alexander Lerchner, Christopher P. Burgess

    Abstract: To help agents reason about scenes in terms of their building blocks, we wish to extract the compositional structure of any given scene (in particular, the configuration and characteristics of objects comprising the scene). This problem is especially difficult when scene structure needs to be inferred while also estimating the agent's location/viewpoint, as the two variables jointly give rise to t…

    Submitted 6 December, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: Animated figures are available at https://sites.google.com/view/simone-scene-understanding/

  6. arXiv:1409.6745  [pdf, other]

    cs.CV

    A Concept Learning Approach to Multisensory Object Perception

    Authors: Ifeoma Nwogu, Goker Erdogan, Ilker Yildirim, Robert Jacobs

    Abstract: This paper presents a computational model of concept learning using Bayesian inference for a grammatically structured hypothesis space, and tests the model on multisensory (visual and haptics) recognition of 3D objects. The study is performed on a set of artificially generated 3D objects known as fribbles, which are complex, multipart objects with categorical structures. The goal of this work is to…

    Submitted 23 September, 2014; originally announced September 2014.

    Comments: 6 pages and 6 figures

  7. arXiv:1404.6696  [pdf, other]

    cs.AI

    Hybrid Metaheuristics for the Clustered Vehicle Routing Problem

    Authors: Thibaut Vidal, Maria Battarra, Anand Subramanian, Güneş Erdoǧan

    Abstract: The Clustered Vehicle Routing Problem (CluVRP) is a variant of the Capacitated Vehicle Routing Problem in which customers are grouped into clusters. Each cluster has to be visited once, and a vehicle entering a cluster cannot leave it until all customers have been visited. This article presents two alternative hybrid metaheuristic algorithms for the CluVRP. The first algorithm is based on an Itera…

    Submitted 26 April, 2014; originally announced April 2014.

    Comments: Working Paper, MIT -- 22 pages