Showing 1–15 of 15 results for author: Green, J R

Searching in archive cs.
  1. arXiv:2505.18051  [pdf, other]

    cs.CV

    LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision

    Authors: Anthony Fuller, Yousef Yassin, Junfeng Wen, Daniel G. Kyrollos, Tarek Ibrahim, James R. Green, Evan Shelhamer

    Abstract: Vision transformers are ever larger, more accurate, and more expensive to compute. The expense is even more extreme at high resolution as the number of tokens grows quadratically with the image size. We turn to adaptive computation to cope with this cost by learning to predict where to compute. Our LookWhere method divides the computation between a low-resolution selector and a high-resolution ext…

    Submitted 23 May, 2025; originally announced May 2025.
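The LookWhere abstract notes that the number of tokens grows quadratically with image size. As a rough illustration that is not taken from the paper, assume a plain ViT that tiles a square image into non-overlapping 16x16 patches (the patch size and the helper names `num_patch_tokens` and `attention_pairs` are hypothetical choices for this sketch). Token count then scales with the square of the side length, and full self-attention scores every token against every other token, so its pairwise work grows with the square of the token count.

```python
def num_patch_tokens(height: int, width: int, patch: int = 16) -> int:
    """Patch tokens for an assumed ViT that tiles the image into patch x patch squares.

    The CLS token is not counted; patch=16 is an illustrative assumption.
    """
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    return (height // patch) * (width // patch)


def attention_pairs(tokens: int) -> int:
    """Query-key pairs scored by full self-attention (every token attends to every token)."""
    return tokens * tokens


for side in (224, 448, 896):
    n = num_patch_tokens(side, side)
    print(f"{side}x{side}: {n} tokens, {attention_pairs(n)} attention pairs")
```

Under these assumptions the loop reports 196, 784 and 3,136 tokens for 224-, 448- and 896-pixel sides: each doubling of the side length multiplies the token count by 4 and the attention pairs by 16, which is the high-resolution cost the abstract's adaptive computation is aimed at.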

  2. arXiv:2502.15021  [pdf, other]

    cs.CV

    Simpler Fast Vision Transformers with a Jumbo CLS Token

    Authors: Anthony Fuller, Yousef Yassin, Daniel G. Kyrollos, Evan Shelhamer, James R. Green

    Abstract: We introduce a simple enhancement of vision transformers (ViTs) to improve accuracy while maintaining throughput. Our approach, Jumbo, creates a wider CLS token, which is split to match the patch token width before attention, processed with self-attention, and reassembled. After attention, Jumbo applies a dedicated, wider FFN to this token. Since there is only one Jumbo token, its cost is minimal,…

    Submitted 23 May, 2025; v1 submitted 20 February, 2025; originally announced February 2025.
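The Jumbo abstract describes a wide CLS token that is split to the patch token width for self-attention, reassembled afterwards, and then passed through its own wider FFN. The following is a toy, pure-Python reading of that description only, not the authors' code. Assumptions: a single attention head with identity query/key/value projections, no layer normalization, residual connections around attention and FFN, an ordinary FFN for the patch tokens, and caller-supplied FFN functions (`relu` is a stand-in, not a learned FFN). All names (`self_attention`, `jumbo_block`, `relu`) are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head softmax attention; identity Q/K/V projections are an assumption."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) / total for i in range(d)])
    return out


def jumbo_block(cls: Vector, patches: List[Vector],
                cls_ffn: Callable[[Vector], Vector],
                patch_ffn: Callable[[Vector], Vector]) -> Tuple[Vector, List[Vector]]:
    """Hypothetical block: split wide CLS, attend jointly, reassemble, apply wide FFN."""
    d = len(patches[0])
    if len(cls) % d:
        raise ValueError("CLS width must be a multiple of the patch token width")
    j = len(cls) // d
    # Split the wide CLS token into j pieces that match the patch token width.
    pieces = [cls[i * d:(i + 1) * d] for i in range(j)]
    seq = pieces + patches
    # Joint self-attention over CLS pieces and patches, with a residual connection.
    attended = [[x + y for x, y in zip(t, a)] for t, a in zip(seq, self_attention(seq))]
    # Reassemble the j CLS pieces into one wide token.
    wide = [x for piece in attended[:j] for x in piece]
    # Dedicated wide FFN for the CLS token; an (assumed) ordinary FFN for patch tokens.
    new_cls = [x + y for x, y in zip(wide, cls_ffn(wide))]
    new_patches = [[x + y for x, y in zip(p, patch_ffn(p))] for p in attended[j:]]
    return new_cls, new_patches


def relu(v: Vector) -> Vector:
    """Toy stand-in for a learned FFN (assumption, not part of Jumbo)."""
    return [max(0.0, x) for x in v]


cls = [0.1 * i for i in range(12)]  # CLS width 12 = 3 x patch width 4
patches = [[1.0, 0.0, -1.0, 0.5], [0.0, 1.0, 0.5, -0.5]]
new_cls, new_patches = jumbo_block(cls, patches, relu, relu)
print(len(new_cls), [len(p) for p in new_patches])  # 12 [4, 4]
```

The wide CLS token keeps its 12 features from input to output while the patch tokens keep width 4. Because there is only one such token, the wider FFN touches a single row, which is consistent with the abstract's remark that its cost is minimal.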

  3. arXiv:2502.09356  [pdf, ps, other]

    cs.CV

    Galileo: Learning Global & Local Features of Many Remote Sensing Modalities

    Authors: Gabriel Tseng, Anthony Fuller, Marlena Reil, Henry Herzog, Patrick Beukema, Favyen Bastani, James R. Green, Evan Shelhamer, Hannah Kerner, David Rolnick

    Abstract: We introduce a highly multimodal transformer to represent many remote sensing modalities - multispectral optical, synthetic aperture radar, elevation, weather, pseudo-labels, and more - across space and time. These inputs are useful for diverse remote sensing tasks, such as crop mapping and flood detection. However, learning shared representations of remote sensing data is challenging, given the d…

    Submitted 4 June, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

  4. arXiv:2409.09190  [pdf, other]

    eess.AS cs.SD

    Learnings from curating a trustworthy, well-annotated, and useful dataset of disordered English speech

    Authors: Pan-Pan Jiang, Jimmy Tobin, Katrin Tomanek, Robert L. MacDonald, Katie Seaver, Richard Cave, Marilyn Ladewig, Rus Heywood, Jordan R. Green

    Abstract: Project Euphonia, a Google initiative, is dedicated to improving automatic speech recognition (ASR) of disordered speech. A central objective of the project is to create a large, high-quality, and diverse speech corpus. This report describes the project's latest advancements in data collection and annotation methodologies, such as expanding speaker diversity in the database, adding human-reviewed…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: Interspeech 2024

  5. arXiv:2405.13985  [pdf, other]

    cs.CV

    LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate

    Authors: Anthony Fuller, Daniel G. Kyrollos, Yousef Yassin, James R. Green

    Abstract: High-resolution images offer more information about scenes that can improve model accuracy. However, the dominant model architecture in computer vision, the vision transformer (ViT), cannot effectively leverage larger images without finetuning -- ViTs poorly extrapolate to more patches at test time, although transformers offer sequence length flexibility. We attribute this shortcoming to the curre…

    Submitted 29 October, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024 Camera Ready

  6. arXiv:2311.00566  [pdf, other]

    cs.CV

    CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders

    Authors: Anthony Fuller, Koreen Millard, James R. Green

    Abstract: A vital and rapidly growing application, remote sensing offers vast yet sparsely labeled, spatially aligned multimodal data; this makes self-supervised learning algorithms invaluable. We present CROMA: a framework that combines contrastive and reconstruction self-supervised objectives to learn rich unimodal and multimodal representations. Our method separately encodes masked-out multispectral opti…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023 Camera Ready

  7. arXiv:2211.00089  [pdf, other]

    eess.AS cs.CL cs.SD

    An analysis of degenerating speech due to progressive dysarthria on ASR performance

    Authors: Katrin Tomanek, Katie Seaver, Pan-Pan Jiang, Richard Cave, Lauren Harrel, Jordan R. Green

    Abstract: Although personalized automatic speech recognition (ASR) models have recently been designed to recognize even severely impaired speech, model performance may degrade over time for persons with degenerating speech. The aims of this study were to (1) analyze the change of performance of ASR over time in individuals with degrading speech, and (2) explore mitigation strategies to optimize recognition…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  8. arXiv:2210.11616  [pdf, other]

    cs.LG

    Generalized Reciprocal Perspective

    Authors: Kevin Dick, Daniel G. Kyrollos, James R. Green

    Abstract: Across many domains, real-world problems can be represented as a network. Nodes represent domain-specific elements and edges capture the relationship between elements. Leveraging high-performance computing and optimized link prediction algorithms, it is increasingly possible to evaluate every possible combination of nodal pairs enabling the generation of a comprehensive prediction matrix (CPM) tha…

    Submitted 20 October, 2022; originally announced October 2022.

  9. Under the Cover Infant Pose Estimation using Multimodal Data

    Authors: Daniel G. Kyrollos, Anthony Fuller, Kim Greenwood, JoAnn Harrold, James R. Green

    Abstract: Infant pose monitoring during sleep has multiple applications in both healthcare and home settings. In a healthcare setting, pose detection can be used for region of interest detection and movement detection for non-contact monitoring systems. In a home setting, pose detection can be used to detect sleep positions, which have been shown to have a strong influence on multiple health factors. However,…

    Submitted 15 February, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

  10. arXiv:2209.14969  [pdf]

    cs.CV

    Transfer Learning with Pretrained Remote Sensing Transformers

    Authors: Anthony Fuller, Koreen Millard, James R. Green

    Abstract: Although the remote sensing (RS) community has begun to pretrain transformers (intended to be fine-tuned on RS tasks), it is unclear how these models perform under distribution shifts. Here, we pretrain a new RS transformer--called SatViT-V2--on 1.3 million satellite-derived RS images, then fine-tune it (along with five other models) to investigate how it performs on distributions not seen during…

    Submitted 28 September, 2022; originally announced September 2022.

    Comments: Draft of manuscript that is being prepared for IEEE TGRS

  11. arXiv:2209.04406  [pdf, other]

    q-bio.NC cs.SD eess.AS

    Longitudinal Acoustic Speech Tracking Following Pediatric Traumatic Brain Injury

    Authors: Camille Noufi, Adam C. Lammert, Daryush D. Mehta, James R. Williamson, Gregory Ciccarelli, Douglas Sturim, Jordan R. Green, Thomas F. Quatieri, Thomas F. Campbell

    Abstract: Recommendations for common outcome measures following pediatric traumatic brain injury (TBI) support the integration of instrumental measurements alongside perceptual assessment in recovery and treatment plans. A comprehensive set of sensitive, robust and non-invasive measurements is therefore essential in assessing variations in speech characteristics over time following pediatric TBI. In this ar…

    Submitted 9 September, 2022; originally announced September 2022.

  12. arXiv:2208.14621  [pdf, other]

    cs.CV cs.SD

    Audiogram Digitization Tool for Audiological Reports

    Authors: François Charih, James R. Green

    Abstract: A number of private and public insurers compensate workers whose hearing loss can be directly attributed to excessive exposure to noise in the workplace. The claim assessment process is typically lengthy and requires significant effort from human adjudicators who must interpret hand-recorded audiograms, often sent via fax or equivalent. In this work, we present a solution developed in partnership…

    Submitted 13 September, 2022; v1 submitted 30 August, 2022; originally announced August 2022.

  13. arXiv:2107.03985  [pdf, other]

    eess.AS cs.LG cs.SD

    Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility Of Disordered Speech On Selected Phrases

    Authors: Subhashini Venugopalan, Joel Shor, Manoj Plakal, Jimmy Tobin, Katrin Tomanek, Jordan R. Green, Michael P. Brenner

    Abstract: Automatic classification of disordered speech can provide an objective tool for identifying the presence and severity of speech impairment. Classification approaches can also help identify hard-to-recognize speech samples to teach ASR systems about the variable manifestations of impaired speech. Here, we develop and compare different deep learning techniques to classify the intelligibility of diso…

    Submitted 8 July, 2021; originally announced July 2021.

    Comments: Accepted at INTERSPEECH 2021

  14. arXiv:2104.07310  [pdf, other]

    eess.AS cs.SD

    Investigating the Utility of Multimodal Conversational Technology and Audiovisual Analytic Measures for the Assessment and Monitoring of Amyotrophic Lateral Sclerosis at Scale

    Authors: Michael Neumann, Oliver Roesler, Jackson Liscombe, Hardik Kothare, David Suendermann-Oeft, David Pautler, Indu Navar, Aria Anvar, Jochen Kumm, Raquel Norel, Ernest Fraenkel, Alexander V. Sherman, James D. Berry, Gary L. Pattee, Jun Wang, Jordan R. Green, Vikram Ramanarayanan

    Abstract: We propose a cloud-based multimodal dialog platform for the remote assessment and monitoring of Amyotrophic Lateral Sclerosis (ALS) at scale. This paper presents our vision, technology setup, and an initial investigation of the efficacy of the various acoustic and visual speech metrics automatically extracted by the platform. 82 healthy controls and 54 people with ALS (pALS) were instructed to int…

    Submitted 15 April, 2021; originally announced April 2021.

  15. A Sparse Non-negative Matrix Factorization Framework for Identifying Functional Units of Tongue Behavior from MRI

    Authors: Jonghye Woo, Jerry L. Prince, Maureen Stone, Fangxu Xing, Arnold Gomez, Jordan R. Green, Christopher J. Hartnick, Thomas J. Brady, Timothy G. Reese, Van J. Wedeen, Georges El Fakhri

    Abstract: Muscle coordination patterns of lingual behaviors are synergies generated by deforming local muscle groups in a variety of ways. Functional units are functional muscle groups of local structural elements within the tongue that compress, expand, and move in a cohesive and consistent manner. Identifying the functional units using tagged-Magnetic Resonance Imaging (MRI) sheds light on the mechanisms…

    Submitted 29 September, 2018; v1 submitted 15 April, 2018; originally announced April 2018.

    Comments: Accepted at IEEE TMI (https://ieeexplore.ieee.org/document/8467354)