Showing 1–34 of 34 results for author: Shaker, A

Searching in archive cs.
  1. arXiv:2506.05349

    cs.CV

    VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

    Authors: Hanoona Rasheed, Abdelrahman Shaker, Anqi Tang, Muhammad Maaz, Ming-Hsuan Yang, Salman Khan, Fahad Shahbaz Khan

    Abstract: Mathematical reasoning in real-world video settings presents a fundamentally different challenge than in static images or text. It requires interpreting fine-grained visual information, accurately reading handwritten or digital text, and integrating spoken cues, often dispersed non-linearly over time. In such multimodal contexts, success hinges not just on perception, but on selectively identifyin…

    Submitted 24 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

    Comments: VideoMathQA Technical Report

  2. arXiv:2506.05336

    cs.CV

    VideoMolmo: Spatio-Temporal Grounding Meets Pointing

    Authors: Ghazi Shazan Ahmad, Ahmed Heakl, Hanan Gani, Abdelrahman Shaker, Zhiqiang Shen, Ranjay Krishna, Fahad Shahbaz Khan, Salman Khan

    Abstract: Spatio-temporal localization is vital for precise interactions across diverse domains, from biological research to autonomous navigation and interactive interfaces. Current video-based approaches, while proficient in tracking, lack the sophisticated reasoning capabilities of large language models, limiting their contextual understanding and generalization. We introduce VideoMolmo, a large multimod…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 20 pages, 13 figures

  3. arXiv:2503.21782

    cs.CV

    Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model

    Authors: Abdelrahman Shaker, Muhammad Maaz, Chenhui Gou, Hamid Rezatofighi, Salman Khan, Fahad Shahbaz Khan

    Abstract: Video understanding models often struggle with high computational requirements, extensive parameter counts, and slow inference speed, making them inefficient for practical use. To tackle these challenges, we propose Mobile-VideoGPT, an efficient multimodal framework designed to operate with fewer than a billion parameters. Unlike traditional video large multimodal models (LMMs), Mobile-VideoGPT co…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Technical Report. Project Page: https://amshaker.github.io/Mobile-VideoGPT

  4. arXiv:2412.15499

    cs.LG cs.AI cs.CV

    A Robust Prototype-Based Network with Interpretable RBF Classifier Foundations

    Authors: Sascha Saralajew, Ashish Rana, Thomas Villmann, Ammar Shaker

    Abstract: Prototype-based classification learning methods are known to be inherently interpretable. However, this paradigm suffers from major limitations compared to deep models, such as lower performance. This led to the development of the so-called deep Prototype-Based Networks (PBNs), also known as prototypical parts models. In this work, we analyze these models with respect to different properties, incl…

    Submitted 17 April, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

    Comments: To appear at AAAI 2025. Includes the Appendix of the AAAI submission. In v2, the font size has been increased in some figures. In v3, an incorrect hyperparameter specification (Table 6; $\lambda$) has been corrected

  5. arXiv:2411.16508

    cs.CV cs.CL

    All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

    Authors: Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana, Noor Ahsan, Nevasini Sasikumar, Omkar Thawakar, Henok Biadglign Ademtew, Yahya Hmaiti, Amandeep Kumar, Kartik Kuckreja, Mykola Maslych, Wafa Al Ghallabi, Mihail Mihaylov, Chao Qin, Abdelrahman M Shaker, Mike Zhang, Mahardika Krisna Ihsani, Amiel Esplana, Monil Gokani, Shachar Mirkin, Harsh Singh, Ashay Srivastava, Endre Hamerlik, Fathinah Asma Izzati, Fadillah Adamsyah Maani, et al. (44 additional authors not shown)

    Abstract: Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All La…

    Submitted 30 April, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: A Multilingual Multimodal cultural benchmark for 100 languages

  6. arXiv:2407.13772

    cs.CV

    GroupMamba: Efficient Group-Based Visual State Space Model

    Authors: Abdelrahman Shaker, Syed Talal Wasim, Salman Khan, Juergen Gall, Fahad Shahbaz Khan

    Abstract: State-space models (SSMs) have recently shown promise in capturing long-range dependencies with subquadratic computational complexity, making them attractive for various applications. However, purely SSM-based models face critical challenges related to stability and achieving state-of-the-art performance in computer vision tasks. Our paper addresses the challenges of scaling SSM-based models for c…

    Submitted 28 March, 2025; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted at CVPR-2025

  7. arXiv:2403.17937

    cs.CV

    Efficient Video Object Segmentation via Modulated Cross-Attention Memory

    Authors: Abdelrahman Shaker, Syed Talal Wasim, Martin Danelljan, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

    Abstract: Recently, transformer-based approaches have shown promising results for semi-supervised video object segmentation. However, these approaches typically struggle on long videos due to increased GPU memory demands, as they frequently expand the memory bank every few frames. We propose a transformer-based approach, named MAVOS, that introduces an optimized and dynamic long-term modulated cross-attenti…

    Submitted 26 September, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: WACV 2025

  8. arXiv:2402.14818

    cs.CL cs.CV

    PALO: A Polyglot Large Multimodal Model for 5B People

    Authors: Muhammad Maaz, Hanoona Rasheed, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Tim Baldwin, Michael Felsberg, Fahad S. Khan

    Abstract: In pursuit of more inclusive Vision-Language Models (VLMs), this study introduces a Large Multilingual Multimodal Model called PALO. PALO offers visual reasoning capabilities in 10 major languages, including English, Chinese, Hindi, Spanish, French, Arabic, Bengali, Russian, Urdu, and Japanese, that span a total of ~5B people (65% of the world population). Our approach involves a semi-automated tr…

    Submitted 5 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Technical Report of PALO

  9. arXiv:2402.13812

    cs.LG cs.SD eess.AS

    Voice-Driven Mortality Prediction in Hospitalized Heart Failure Patients: A Machine Learning Approach Enhanced with Diagnostic Biomarkers

    Authors: Nihat Ahmadli, Mehmet Ali Sarsil, Berk Mizrak, Kurtulus Karauzum, Ata Shaker, Erol Tulumen, Didar Mirzamidinov, Dilek Ural, Onur Ergen

    Abstract: Addressing heart failure (HF) as a prevalent global health concern poses difficulties in implementing innovative approaches for enhanced patient care. Predicting mortality rates in HF patients, in particular, is difficult yet critical, necessitating individualized care, proactive management, and enabling educated decision-making to enhance outcomes. Recently, the significance of voice biomarkers c…

    Submitted 14 August, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: 11 pages, 6 figures, 5 tables. The first 2 authors have contributed equally

  10. Arabic Mini-ClimateGPT: A Climate Change and Sustainability Tailored Arabic LLM

    Authors: Sahal Shaji Mullappilly, Abdelrahman Shaker, Omkar Thawakar, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan

    Abstract: Climate change is one of the most significant challenges we face together as a society. Creating awareness and educating policy makers about the wide-ranging impact of climate change is an essential step towards a sustainable future. Recently, Large Language Models (LLMs) like ChatGPT and Bard have shown impressive conversational abilities and excel in a wide variety of NLP tasks. While these models are…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted to EMNLP 2023 (Findings)

    Journal ref: Findings of the Association for Computational Linguistics: EMNLP 2023, pages 14126-14136

  11. arXiv:2311.16179

    cs.CV cs.AI

    Next-gen traffic surveillance: AI-assisted mobile traffic violation detection system

    Authors: Dila Dede, Mehmet Ali Sarsıl, Ata Shaker, Olgu Altıntaş, Onur Ergen

    Abstract: Road traffic accidents pose a significant global public health concern, leading to injuries, fatalities, and vehicle damage. Approximately 1.3 million people lose their lives each year due to traffic accidents [World Health Organization, 2022]. Addressing this issue requires accurate traffic law violation detection systems to ensure adherence to regulations. The integration of Artificial Intelligence…

    Submitted 24 November, 2023; originally announced November 2023.

  12. arXiv:2311.03356

    cs.CV cs.AI

    GLaMM: Pixel Grounding Large Multimodal Model

    Authors: Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Eric Xing, Ming-Hsuan Yang, Fahad S. Khan

    Abstract: Large Multimodal Models (LMMs) extend Large Language Models to the vision domain. Initial LMMs used holistic images and text prompts to generate ungrounded textual responses. Recently, region-level LMMs have been used to generate visually grounded responses. However, they are limited to only referring to a single object category at a time, require users to specify the regions, or cannot offer dens…

    Submitted 1 June, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: CVPR 2024

  13. arXiv:2306.09320

    eess.IV cs.CV

    Learnable Weight Initialization for Volumetric Medical Image Segmentation

    Authors: Shahina Kunhimon, Abdelrahman Shaker, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

    Abstract: Hybrid volumetric medical image segmentation models, combining the advantages of local convolution and global attention, have recently received considerable attention. While mainly focusing on architectural modifications, most existing hybrid approaches still use conventional data-independent weight initialization schemes which restrict their performance due to ignoring the inherent volumetric nat…

    Submitted 3 April, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Accepted at Elsevier AI in Medicine Journal

  14. arXiv:2306.07971

    cs.CV

    XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models

    Authors: Omkar Thawakar, Abdelrahman Shaker, Sahal Shaji Mullappilly, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Fahad Shahbaz Khan

    Abstract: The latest breakthroughs in large vision-language models, such as Bard and GPT-4, have showcased extraordinary abilities in performing a wide range of tasks. Such models are trained on massive datasets comprising billions of public image-text pairs with diverse tasks. However, their performance on task-specific domains, such as radiology, is still under-investigated and potentially limited due to…

    Submitted 7 May, 2025; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Accepted at ACL 2024-BIONLP Workshop. Code: https://github.com/mbzuai-oryx/XrayGPT

  15. arXiv:2304.03399

    cs.CL

    Using LSTM and GRU With a New Dataset for Named Entity Recognition in the Arabic Language

    Authors: Alaa Shaker, Alaa Aldarf, Igor Bessmertny

    Abstract: Named entity recognition (NER) is a natural language processing (NLP) task that aims to identify named entities and classify them into categories such as person, location, and organization. The Arabic language has a considerable amount of unstructured data, and it requires different preprocessing tools than languages such as English, Russian, and German. This underlines the importance of build…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: Proceedings of the 13th Majorov International Conference on Software Engineering and Computer Systems

  16. Uncertainty Propagation in Node Classification

    Authors: Zhao Xu, Carolin Lawrence, Ammar Shaker, Raman Siarheyeu

    Abstract: Quantifying predictive uncertainty of neural networks has recently attracted increasing attention. In this work, we focus on measuring uncertainty of graph neural networks (GNNs) for the task of node classification. Most existing GNNs model message passing among nodes. The messages are often deterministic. Questions naturally arise: Does there exist uncertainty in the messages? How could we propag…

    Submitted 3 April, 2023; originally announced April 2023.

  17. arXiv:2303.15446

    cs.CV

    SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

    Authors: Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

    Abstract: Self-attention has become a de facto choice for capturing global context in various vision applications. However, its quadratic computational complexity with respect to image resolution limits its use in real-time applications, especially for deployment on resource-constrained mobile devices. Although hybrid approaches have been proposed to combine the advantages of convolutions and self-attention…

    Submitted 25 July, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted at ICCV 2023

  18. arXiv:2212.04497

    cs.CV

    UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation

    Authors: Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

    Abstract: Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks. Within the transformer models, the self-attention mechanism is one of the main building blocks that strives to capture long-range dependencies. However, the self-attention operation has quadratic complexity which proves to be a computational bottleneck, especially in volumetric medi…

    Submitted 4 May, 2024; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: Accepted at IEEE TMI-2024

  19. arXiv:2212.00424

    cs.LG cs.AI stat.ML

    Multi-Source Survival Domain Adaptation

    Authors: Ammar Shaker, Carolin Lawrence

    Abstract: Survival analysis is the branch of statistics that studies the relation between the characteristics of living entities and their respective survival times, taking into account the partial information held by censored cases. A good analysis can, for example, determine whether one medical treatment for a group of patients is better than another. With the rise of machine learning, survival analysis c…

    Submitted 6 March, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

    Comments: 37th AAAI Conference on Artificial Intelligence, 2023. Includes Appendix

  20. arXiv:2207.04447

    cs.CL

    Human-Centric Research for NLP: Towards a Definition and Guiding Questions

    Authors: Bhushan Kotnis, Kiril Gashteovski, Julia Gastinger, Giuseppe Serra, Francesco Alesiani, Timo Sztyler, Ammar Shaker, Na Gong, Carolin Lawrence, Zhao Xu

    Abstract: With Human-Centric Research (HCR), we can steer research activities so that the research outcome is beneficial for human stakeholders, such as end users. But what exactly makes research human-centric? We address this question by providing a working definition and by defining how a research pipeline can be split into different stages in which human-centric components can be added. Additionally, we discus…

    Submitted 10 July, 2022; originally announced July 2022.

  21. arXiv:2206.10589

    cs.CV

    EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

    Authors: Muhammad Maaz, Abdelrahman Shaker, Hisham Cholakkal, Salman Khan, Syed Waqas Zamir, Rao Muhammad Anwer, Fahad Shahbaz Khan

    Abstract: In the pursuit of achieving ever-increasing accuracy, large and complex neural networks are usually developed. Such models demand high computational resources and therefore cannot be deployed on edge devices. It is of great interest to build resource-efficient general purpose networks due to their usefulness in several application areas. In this work, we strive to effectively combine the strengths…

    Submitted 22 October, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: Accepted at ECCVW 2022 (Oral, CADL: Computational Aspects of Deep Learning)

    Report number: 197

  22. arXiv:2205.12749

    cs.AI cs.HC

    A Human-Centric Assessment Framework for AI

    Authors: Sascha Saralajew, Ammar Shaker, Zhao Xu, Kiril Gashteovski, Bhushan Kotnis, Wiem Ben Rim, Jürgen Quittek, Carolin Lawrence

    Abstract: With the rise of AI systems in real-world applications comes the need for reliable and trustworthy AI. Explainable AI systems are an essential aspect of this. However, there is no agreed standard on how explainable AI systems should be assessed. Inspired by the Turing test, we introduce a human-centric assessment framework where a leading domain expert accepts or rejects the solutions of an AI sys…

    Submitted 1 July, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted as submission to ICML 2022 Workshop on Human-Machine Collaboration and Teaming

  23. arXiv:2110.08144

    cs.CL cs.AI

    milIE: Modular & Iterative Multilingual Open Information Extraction

    Authors: Bhushan Kotnis, Kiril Gashteovski, Daniel Oñoro Rubio, Vanesa Rodriguez-Tembras, Ammar Shaker, Makoto Takamoto, Mathias Niepert, Carolin Lawrence

    Abstract: Open Information Extraction (OpenIE) is the task of extracting (subject, predicate, object) triples from natural language sentences. Current OpenIE systems extract all triple slots independently. In contrast, we explore the hypothesis that it may be beneficial to extract triple slots iteratively: first extract easy slots, followed by the difficult ones by conditioning on the easy slots, and theref…

    Submitted 25 April, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

  24. arXiv:2108.03531

    cs.LG stat.ML

    Learning to Transfer with von Neumann Conditional Divergence

    Authors: Ammar Shaker, Shujian Yu, Daniel Oñoro-Rubio

    Abstract: The similarity of feature representations plays a pivotal role in the success of problems related to domain adaptation. Feature similarity includes both the invariance of marginal distributions and the closeness of conditional distributions given the desired response $y$ (e.g., class labels). Unfortunately, traditional methods always learn such features without fully taking into consideration the…

    Submitted 6 January, 2022; v1 submitted 7 August, 2021; originally announced August 2021.

    Comments: Accepted at AAAI 2022

  25. arXiv:2102.06777

    cs.CV cs.LG

    INSTA-YOLO: Real-Time Instance Segmentation

    Authors: Eslam Mohamed, Abdelrahman Shaker, Ahmad El-Sallab, Mayada Hadhoud

    Abstract: Instance segmentation has recently gained considerable attention in various computer vision applications. It aims to assign different IDs to different objects in the scene, even if they belong to the same class. This is useful in various scenarios, especially under occlusion. Instance segmentation is usually performed as a two-stage pipeline. First, an object is detected, then semantic segmentation within…

    Submitted 2 September, 2024; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: Accepted at ICML-2020 Workshop on AI for Autonomous Driving

  26. arXiv:2011.01272   

    cs.LG stat.ML

    Modular-Relatedness for Continual Learning

    Authors: Ammar Shaker, Shujian Yu, Francesco Alesiani

    Abstract: In this paper, we propose a continual learning (CL) technique that is beneficial to sequential task learners by improving their retained accuracy and reducing catastrophic forgetting. The principal target of our approach is the automatic extraction of modular parts of the neural network and then estimating the relatedness between the tasks given these modular components. This technique is applicab…

    Submitted 17 January, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: We realized that one conclusion in the submission is erroneous and disconnected from the results shown in one theorem. We decided to withdraw the current version to avoid a misleading conclusion

  27. arXiv:2011.01168

    cs.LG

    Bilevel Continual Learning

    Authors: Ammar Shaker, Francesco Alesiani, Shujian Yu, Wenzhe Yin

    Abstract: Continual learning (CL) studies the problem of learning a sequence of tasks, one at a time, such that the learning of each new task does not lead to the deterioration in performance on the previously seen ones while exploiting previously learned features. This paper presents Bilevel Continual Learning (BiCL), a general framework for continual learning that fuses bilevel optimization and recent adv…

    Submitted 2 November, 2020; originally announced November 2020.

  28. arXiv:2009.05618

    cs.LG stat.ML

    Learning an Interpretable Graph Structure in Multi-Task Learning

    Authors: Shujian Yu, Francesco Alesiani, Ammar Shaker, Wenzhe Yin

    Abstract: We present a novel methodology to jointly perform multi-task learning and infer intrinsic relationships among tasks through an interpretable and sparse graph. Unlike existing multi-task learning methodologies, the graph structure is not assumed to be known a priori or estimated separately in a preprocessing step. Instead, our graph is learned simultaneously with model parameters of each task, thus it re…

    Submitted 11 September, 2020; originally announced September 2020.

    Comments: 11 pages, 7 figures

  29. arXiv:2009.05483

    cs.LG stat.ML

    Towards Interpretable Multi-Task Learning Using Bilevel Programming

    Authors: Francesco Alesiani, Shujian Yu, Ammar Shaker, Wenzhe Yin

    Abstract: Interpretable Multi-Task Learning can be expressed as learning a sparse graph of the task relationship based on the prediction performance of the learned models. Since many natural phenomena exhibit sparse structures, enforcing sparsity on learned models reveals the underlying task relationship. Moreover, different sparsification degrees from a fully connected graph uncover various types of struc…

    Submitted 11 September, 2020; originally announced September 2020.

    Comments: Manuscript accepted at ECML PKDD 2020

  30. arXiv:2005.02196

    cs.LG cs.IT stat.ML

    Measuring the Discrepancy between Conditional Distributions: Methods, Properties and Applications

    Authors: Shujian Yu, Ammar Shaker, Francesco Alesiani, Jose C. Principe

    Abstract: We propose a simple yet powerful test statistic to quantify the discrepancy between two conditional distributions. The new statistic avoids the explicit estimation of the underlying distributions in high-dimensional space, and it operates on the cone of symmetric positive semidefinite (SPS) matrices using the Bregman matrix divergence. Moreover, it inherits the merits of the correntropy function to ex…

    Submitted 28 December, 2020; v1 submitted 5 May, 2020; originally announced May 2020.

    Comments: Manuscript accepted at IJCAI 2020; added additional notes on computational complexity and auto-differentiable property; code is available at https://github.com/SJYuCNEL/Bregman-Correntropy-Conditional-Divergence

  31. arXiv:1911.03951

    cs.LG stat.ML

    TSK-Streams: Learning TSK Fuzzy Systems on Data Streams

    Authors: Ammar Shaker, Eyke Hüllermeier

    Abstract: The problem of adaptive learning from evolving and possibly non-stationary data streams has attracted a lot of interest in machine learning in the recent past, and also stimulated research in related fields, such as computational intelligence and fuzzy systems. In particular, several rule-based methods for the incremental induction of regression models have been proposed. In this paper, we develop…

    Submitted 10 November, 2019; originally announced November 2019.

  32. arXiv:1906.05388

    cs.CV

    Assisted Excitation of Activations: A Learning Technique to Improve Object Detectors

    Authors: Mohammad Mahdi Derakhshani, Saeed Masoudnia, Amir Hossein Shaker, Omid Mersa, Mohammad Amin Sadeghi, Mohammad Rastegari, Babak N. Araabi

    Abstract: We present a simple and effective learning technique that significantly improves mAP of YOLO object detectors without compromising their speed. During network training, we carefully feed in localization information. We excite certain activations in order to help the network learn to better localize. In the later stages of training, we gradually reduce our assisted excitation to zero. We reached a…

    Submitted 12 June, 2019; originally announced June 2019.

  33. arXiv:1811.05695

    cs.LG stat.ML

    Efficient and Scalable Multi-task Regression on Massive Number of Tasks

    Authors: Xiao He, Francesco Alesiani, Ammar Shaker

    Abstract: Many real-world large-scale regression problems can be formulated as Multi-task Learning (MTL) problems with a massive number of tasks, as in retail and transportation domains. However, existing MTL methods still fail to offer both the generalization performance and the scalability for such problems. Scaling up MTL methods to problems with a tremendous number of tasks is a big challenge. Here, we…

    Submitted 14 November, 2018; originally announced November 2018.

    Comments: Accepted at AAAI 2019

  34. arXiv:1804.06207

    cs.LG stat.ML

    MetaBags: Bagged Meta-Decision Trees for Regression

    Authors: Jihed Khiari, Luis Moreira-Matias, Ammar Shaker, Bernard Zenko, Saso Dzeroski

    Abstract: Ensembles are popular methods for solving practical supervised learning problems. They reduce the risk of having underperforming models in production-grade software. Although critical, methods for learning heterogeneous regression ensembles have not been proposed at large scale, whereas in classical ML literature, stacking, cascading and voting are mostly restricted to classification problems. Reg…

    Submitted 17 April, 2018; originally announced April 2018.