Showing 1–8 of 8 results for author: Gokay, D

Searching in archive cs.
  1. arXiv:2504.01961  [pdf, other]

    cs.CV

    Learning from Streaming Video with Orthogonal Gradients

    Authors: Tengda Han, Dilara Gokay, Joseph Heyward, Chuhan Zhang, Daniel Zoran, Viorica Pătrăucean, João Carreira, Dima Damen, Andrew Zisserman

    Abstract: We address the challenge of representation learning from a continuous stream of video as input, in a self-supervised manner. This differs from the standard approaches to video learning where videos are chopped and shuffled during training in order to create a non-redundant batch that satisfies the independently and identically distributed (IID) sample assumption expected by conventional training p…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: CVPR2025

  2. arXiv:2412.15212  [pdf, other]

    cs.CV cs.AI cs.LG

    Scaling 4D Representations

    Authors: João Carreira, Dilara Gokay, Michael King, Chuhan Zhang, Ignacio Rocco, Aravindh Mahendran, Thomas Albert Keck, Joseph Heyward, Skanda Koppula, Etienne Pot, Goker Erdogan, Yana Hasson, Yi Yang, Klaus Greff, Guillaume Le Moing, Sjoerd van Steenkiste, Daniel Zoran, Drew A. Hudson, Pedro Vélez, Luisa Polanía, Luke Friedman, Chris Duvarney, Ross Goroshin, Kelsey Allen, Jacob Walker, et al. (10 additional authors not shown)

    Abstract: Scaling has not yet been convincingly demonstrated for pure self-supervised learning from video. However, prior work has focused evaluations on semantic-related tasks – action classification, ImageNet classification, etc. In this paper we focus on evaluating self-supervised learning on non-semantic vision tasks that are more spatial (3D) and temporal (+1D = 4D), such as camera pose…

    Submitted 19 December, 2024; originally announced December 2024.

  3. arXiv:2411.05927  [pdf, other]

    cs.CV cs.AI cs.LG

    Moving Off-the-Grid: Scene-Grounded Video Representations

    Authors: Sjoerd van Steenkiste, Daniel Zoran, Yi Yang, Yulia Rubanova, Rishabh Kabra, Carl Doersch, Dilara Gokay, Joseph Heyward, Etienne Pot, Klaus Greff, Drew A. Hudson, Thomas Albert Keck, Joao Carreira, Alexey Dosovitskiy, Mehdi S. M. Sajjadi, Thomas Kipf

    Abstract: Current vision models typically maintain a fixed correspondence between their representation structure and image space. Each layer comprises a set of tokens arranged "on-the-grid," which biases patches or tokens to encode information at a specific spatio(-temporal) location. In this work we present Moving Off-the-Grid (MooG), a self-supervised video representation model that offers an alternative…

    Submitted 8 November, 2024; originally announced November 2024.

    Comments: Accepted to NeurIPS 2024 (spotlight). Project page: https://moog-paper.github.io/

  4. arXiv:2402.00847  [pdf, other]

    cs.CV stat.ML

    BootsTAP: Bootstrapped Training for Tracking-Any-Point

    Authors: Carl Doersch, Pauline Luc, Yi Yang, Dilara Gokay, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ignacio Rocco, Ross Goroshin, João Carreira, Andrew Zisserman

    Abstract: To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the algorithm to track any point on solid surfaces in a video, potentially densely in space and time. Large-scale groundtruth training data for TAP is only available in simulat…

    Submitted 23 May, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  5. arXiv:2312.00598  [pdf, other]

    cs.CV cs.AI

    Learning from One Continuous Video Stream

    Authors: João Carreira, Michael King, Viorica Pătrăucean, Dilara Gokay, Cătălin Ionescu, Yi Yang, Daniel Zoran, Joseph Heyward, Carl Doersch, Yusuf Aytar, Dima Damen, Andrew Zisserman

    Abstract: We introduce a framework for online learning from a single continuous video stream -- the way people and animals learn, without mini-batches, data augmentation or shuffling. This poses great challenges given the high correlation between consecutive video frames and there is very little prior work on it. Our framework allows us to do a first deep dive into the topic and includes a collection of str…

    Submitted 28 March, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: CVPR camera ready version

  6. arXiv:2306.08637  [pdf, other]

    cs.CV

    TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

    Authors: Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Ankush Gupta, Yusuf Aytar, Joao Carreira, Andrew Zisserman

    Abstract: We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on loc…

    Submitted 30 August, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: Published at ICCV 2023

  7. arXiv:2108.09752  [pdf, other]

    cs.CV

    Graph2Pix: A Graph-Based Image to Image Translation Framework

    Authors: Dilara Gokay, Enis Simsar, Efehan Atici, Alper Ahmetoglu, Atif Emre Yuksel, Pinar Yanardag

    Abstract: In this paper, we propose a graph-based image-to-image translation framework for generating images. We use rich data collected from the popular creativity platform Artbreeder (http://artbreeder.com), where users interpolate multiple GAN-generated images to create artworks. This unique approach of creating new images leads to a tree-like structure where one can track historical data about the creat…

    Submitted 22 August, 2021; originally announced August 2021.

  8. Analytical Derivation of the Impulse Response for the Bounded 2-D Diffusion Channel

    Authors: Fatih Dinc, Bayram Cevdet Akdeniz, Ecda Erol, Dilara Gokay, Ezgi Tekgul, Ali Emre Pusane, Tuna Tugcu

    Abstract: This paper focuses on the derivation of the distribution of diffused particles absorbed by an agent in a bounded environment. In particular, we analogously consider to derive the impulse response of a molecular communication channel in 2-D and 3-D environment. In 2-D, the channel involves a point transmitter that releases molecules to a circular absorbing receiver that absorbs incoming molecules i…

    Submitted 24 September, 2018; originally announced September 2018.

    Comments: 13 pages and 5 figures