Showing 1–21 of 21 results for author: Tajbakhsh, N

  1. arXiv:2504.11409  [pdf, other]

    cs.CL

    Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning

    Authors: Ali Taghibakhshi, Sharath Turuvekere Sreenivas, Saurav Muralidharan, Marcin Chochowski, Yashaswi Karnati, Raviraj Joshi, Ameya Sunil Mahabaleshwarkar, Zijia Chen, Yoshi Suhara, Oluwatobi Olabiyi, Daniel Korzekwa, Mostofa Patwary, Mohammad Shoeybi, Jan Kautz, Bryan Catanzaro, Ashwath Aithal, Nima Tajbakhsh, Pavlo Molchanov

    Abstract: Hybrid LLM architectures that combine Attention and State Space Models (SSMs) achieve state-of-the-art accuracy and runtime performance. Recent work has demonstrated that applying compression and distillation to Attention-only models yields smaller, more accurate models at a fraction of the training cost. In this work, we explore the effectiveness of compressing Hybrid architectures. We introduce…

    Submitted 15 April, 2025; originally announced April 2025.

  2. arXiv:2504.03624  [pdf, other]

    cs.CL cs.AI cs.LG

    Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

    Authors: NVIDIA: Aaron Blakeman, Aarti Basant, Abhinav Khattar, Adithya Renduchintala, Akhiad Bercovich, Aleksander Ficek, Alexis Bjorlin, Ali Taghibakhshi, Amala Sanjay Deshmukh, Ameya Sunil Mahabaleshwarkar, Andrew Tao, Anna Shors, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Bobby Chen, Boris Ginsburg, Boxin Wang, Brandon Norick, Brian Butterfield, Bryan Catanzaro, Carlo del Mundo, et al. (176 additional authors not shown)

    Abstract: As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transf…

    Submitted 15 April, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

  3. arXiv:2503.12964  [pdf, other]

    cs.CV cs.AI cs.LG

    Training Video Foundation Models with NVIDIA NeMo

    Authors: Zeeshan Patel, Ethan He, Parth Mannan, Xiaowei Ren, Ryan Wolf, Niket Agarwal, Jacob Huffman, Zhuoyao Wang, Carl Wang, Jack Chang, Yan Bai, Tommy Huang, Linnan Wang, Sahil Jain, Shanmugam Ramasamy, Joseph Jennings, Ekaterina Sirazitdinova, Oleg Sudakov, Mingyuan Ma, Bobby Chen, Forrest Lin, Hao Wang, Vasanth Rao Naik Sabavat, Sriharsha Niverty, Rong Ou, et al. (4 additional authors not shown)

    Abstract: Video Foundation Models (VFMs) have recently been used to simulate the real world to train physical AI systems and develop creative visual experiences. However, there are significant challenges in training large-scale, high quality VFMs that can generate high-quality videos. We present a scalable, open-source VFM training pipeline with NVIDIA NeMo, providing accelerated video dataset curation, mul…

    Submitted 17 March, 2025; originally announced March 2025.

  4. arXiv:2412.09952  [pdf, other]

    cs.LG

    Llama 3 Meets MoE: Efficient Upcycling

    Authors: Aditya Vavre, Ethan He, Dennis Liu, Zijie Yan, June Yang, Nima Tajbakhsh, Ashwath Aithal

    Abstract: Scaling large language models (LLMs) significantly improves performance but comes with prohibitive computational costs. Mixture-of-Experts (MoE) models offer an efficient alternative, increasing capacity without a proportional rise in compute requirements. However, training MoE models from scratch poses challenges like overfitting and routing instability. We present an efficient training recipe le…

    Submitted 13 December, 2024; originally announced December 2024.

  5. arXiv:2409.06493  [pdf, other]

    cs.CV cs.AI

    Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models

    Authors: Rohit Jena, Ali Taghibakhshi, Sahil Jain, Gerald Shen, Nima Tajbakhsh, Arash Vahdat

    Abstract: Text-to-image (T2I) diffusion models have become prominent tools for generating high-fidelity images from text prompts. However, when trained on unfiltered internet data, these models can produce unsafe, incorrect, or stylistically undesirable images that are not aligned with human preferences. To address this, recent approaches have incorporated human preference datasets to fine-tune T2I models o…

    Submitted 9 September, 2024; originally announced September 2024.

  6. arXiv:2402.19173  [pdf, other]

    cs.SE cs.AI

    StarCoder 2 and The Stack v2: The Next Generation

    Authors: Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, et al. (41 additional authors not shown)

    Abstract: The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data…

    Submitted 29 February, 2024; originally announced February 2024.

  7. arXiv:2308.03349  [pdf, other]

    cs.CL cs.AI cs.CV

    SciGraphQA: A Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs

    Authors: Shengzhi Li, Nima Tajbakhsh

    Abstract: In this work, we present SciGraphQA, a synthetic multi-turn question-answer dataset related to academic graphs. SciGraphQA is 13 times larger than ChartVQA, the previously largest chart-visual question-answering dataset. It is also the largest open-sourced chart VQA dataset with non-synthetic charts. To build our dataset, we selected 290,000 Computer Science or Machine Learning ArXiv papers publis…

    Submitted 7 August, 2023; originally announced August 2023.

  8. arXiv:2203.06363  [pdf, other]

    eess.IV cs.CV

    MDT-Net: Multi-domain Transfer by Perceptual Supervision for Unpaired Images in OCT Scan

    Authors: Weinan Song, Gaurav Fotedar, Nima Tajbakhsh, Ziheng Zhou, Lei He, Xiaowei Ding

    Abstract: Deep learning models tend to underperform in the presence of domain shifts. Domain transfer has recently emerged as a promising approach wherein images exhibiting a domain shift are transformed into other domains for augmentation or adaptation. However, with the absence of paired and annotated images, models merely learned by adversarial loss and cycle consistency loss could result in poor consist…

    Submitted 25 October, 2022; v1 submitted 12 March, 2022; originally announced March 2022.

  9. arXiv:2103.10178  [pdf, other]

    cs.CV

    A Location-Sensitive Local Prototype Network for Few-Shot Medical Image Segmentation

    Authors: Qinji Yu, Kang Dang, Nima Tajbakhsh, Demetri Terzopoulos, Xiaowei Ding

    Abstract: Despite the tremendous success of deep neural networks in medical image segmentation, they typically require a large amount of costly, expert-level annotated data. Few-shot segmentation approaches address this issue by learning to transfer knowledge from limited quantities of labeled examples. Incorporating appropriate prior knowledge is critical in designing high-performance few-shot segmentation…

    Submitted 18 March, 2021; originally announced March 2021.

    Comments: ISBI2021 accepted

  10. arXiv:2004.11966  [pdf, other]

    cs.CV cs.LG

    Extreme Consistency: Overcoming Annotation Scarcity and Domain Shifts

    Authors: Gaurav Fotedar, Nima Tajbakhsh, Shilpa Ananth, Xiaowei Ding

    Abstract: Supervised learning has proved effective for medical image analysis. However, it can utilize only the small labeled portion of data; it fails to leverage the large amounts of unlabeled data that is often available in medical image datasets. Supervised models are further handicapped by domain shifts, when the labeled dataset, despite being large enough, fails to cover different protocols or ethnici…

    Submitted 15 April, 2020; originally announced April 2020.

    Comments: submitted for peer-review on March 17

  11. arXiv:2003.13440  [pdf]

    eess.IV cs.CV

    Computer Aided Detection for Pulmonary Embolism Challenge (CAD-PE)

    Authors: Germán González, Daniel Jimenez-Carretero, Sara Rodríguez-López, Carlos Cano-Espinosa, Miguel Cazorla, Tanya Agarwal, Vinit Agarwal, Nima Tajbakhsh, Michael B. Gotway, Jianming Liang, Mojtaba Masoudi, Noushin Eftekhari, Mahdi Saadatmand, Hamid-Reza Pourreza, Patricia Fraga-Rivas, Eduardo Fraile, Frank J. Rybicki, Ara Kassarjian, Raúl San José Estépar, Maria J. Ledesma-Carbayo

    Abstract: Rationale: Computer aided detection (CAD) algorithms for Pulmonary Embolism (PE) have been shown to increase radiologists' sensitivity with a small increase in specificity. However, CAD for PE has not been adopted into clinical practice, likely because of the high number of false positives current CAD software produces. Objective: To generate a database of annotated computed tomography…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: 8 pages, 3 figures

  12. arXiv:1912.05074  [pdf, other]

    eess.IV cs.CV cs.LG

    UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation

    Authors: Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, Jianming Liang

    Abstract: The state-of-the-art models for medical image segmentation are variants of U-Net and fully convolutional networks (FCN). Despite their success, these models have two limitations: (1) their optimal depth is a priori unknown, requiring extensive architecture search or inefficient ensemble of models of varying depths; and (2) their skip connections impose an unnecessarily restrictive fusion scheme, fo…

    Submitted 28 January, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: Journal of IEEE Transactions on Medical Imaging

  13. arXiv:1910.04814  [pdf, other]

    eess.IV cs.CV cs.LG

    ErrorNet: Learning error representations from limited data to improve vascular segmentation

    Authors: Nima Tajbakhsh, Brian Lai, Shilpa Ananth, Xiaowei Ding

    Abstract: Deep convolutional neural networks have proved effective in segmenting lesions and anatomies in various medical imaging modalities. However, in the presence of small sample size and domain shift problems, these models often produce masks with non-intuitive segmentation mistakes. In this paper, we propose a segmentation framework called ErrorNet, which learns to correct these segmentation mistakes…

    Submitted 1 February, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

    Comments: Accepted in ISBI 2019. The supplementary material is only available in the arxiv version of our paper

  14. arXiv:1908.10454  [pdf, other]

    eess.IV cs.CV cs.LG

    Embracing Imperfect Datasets: A Review of Deep Learning Solutions for Medical Image Segmentation

    Authors: Nima Tajbakhsh, Laura Jeyaseelan, Qian Li, Jeffrey Chiang, Zhihao Wu, Xiaowei Ding

    Abstract: The medical imaging literature has witnessed remarkable progress in high-performing segmentation models based on convolutional neural networks. Despite the new performance highs, the recent advanced segmentation models still require large, representative, and high quality annotated datasets. However, rarely do we have a perfect training dataset, particularly in the field of medical imaging, where…

    Submitted 11 February, 2020; v1 submitted 27 August, 2019; originally announced August 2019.

    Comments: Accepted for publication in the journal of Medical Image Analysis

  15. arXiv:1908.06965  [pdf, other]

    eess.IV cs.CV

    Learning Fixed Points in Generative Adversarial Networks: From Image-to-Image Translation to Disease Detection and Localization

    Authors: Md Mahfuzur Rahman Siddiquee, Zongwei Zhou, Nima Tajbakhsh, Ruibin Feng, Michael B. Gotway, Yoshua Bengio, Jianming Liang

    Abstract: Generative adversarial networks (GANs) have ushered in a revolution in image-to-image translation. The development and proliferation of GANs raises an interesting question: can we train a GAN to remove an object, if present, from an image while otherwise preserving the image? Specifically, can a GAN "virtually heal" anyone by turning his medical image, with an unknown health status (diseased or he…

    Submitted 29 August, 2019; v1 submitted 16 August, 2019; originally announced August 2019.

  16. arXiv:1908.06912  [pdf, other]

    eess.IV cs.CV

    Models Genesis: Generic Autodidactic Models for 3D Medical Image Analysis

    Authors: Zongwei Zhou, Vatsal Sodha, Md Mahfuzur Rahman Siddiquee, Ruibin Feng, Nima Tajbakhsh, Michael B. Gotway, Jianming Liang

    Abstract: Transfer learning from natural image to medical image has been established as one of the most practical paradigms in deep learning for medical image analysis. However, to fit this paradigm, 3D imaging tasks in the most prominent imaging modalities (e.g., CT and MRI) have to be reformulated and solved in 2D, losing rich 3D anatomical information and inevitably compromising the performance. To overcome t…

    Submitted 19 August, 2019; originally announced August 2019.

    Comments: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI)

  17. Automatic Segmentation of Pulmonary Lobes Using a Progressive Dense V-Network

    Authors: Abdullah-Al-Zubaer Imran, Ali Hatamizadeh, Shilpa P. Ananth, Xiaowei Ding, Demetri Terzopoulos, Nima Tajbakhsh

    Abstract: Reliable and automatic segmentation of lung lobes is important for diagnosis, assessment, and quantification of pulmonary diseases. The existing techniques are prohibitively slow, undesirably rely on prior (airway/vessel) segmentation, and/or require user interactions for optimal results. This work presents a reliable, fast, and fully automated lung lobe segmentation based on a progressive dense V…

    Submitted 17 February, 2019; originally announced February 2019.

  18. arXiv:1901.08707  [pdf, other]

    cs.CV

    Surrogate Supervision for Medical Image Analysis: Effective Deep Learning From Limited Quantities of Labeled Data

    Authors: Nima Tajbakhsh, Yufei Hu, Junli Cao, Xingjian Yan, Yi Xiao, Yong Lu, Jianming Liang, Demetri Terzopoulos, Xiaowei Ding

    Abstract: We investigate the effectiveness of a simple solution to the common problem of deep learning in medical image analysis with limited quantities of labeled training data. The underlying idea is to assign artificial labels to abundantly available unlabeled medical images and, through a process known as surrogate supervision, pre-train a deep neural network model for the target medical image analysis…

    Submitted 24 January, 2019; originally announced January 2019.

    Comments: Accepted in IEEE International Symposium on Biomedical Imaging (ISBI 2019)

  19. arXiv:1807.10165  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    UNet++: A Nested U-Net Architecture for Medical Image Segmentation

    Authors: Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, Jianming Liang

    Abstract: In this paper, we present UNet++, a new, more powerful architecture for medical image segmentation. Our architecture is essentially a deeply-supervised encoder-decoder network where the encoder and decoder sub-networks are connected through a series of nested, dense skip pathways. The re-designed skip pathways aim at reducing the semantic gap between the feature maps of the encoder and decoder sub…

    Submitted 18 July, 2018; originally announced July 2018.

    Comments: 8 pages, 3 figures, 3 tables, accepted by 4th Deep Learning in Medical Image Analysis (DLMIA) Workshop

  20. arXiv:1706.00719  [pdf, other]

    cs.CV cs.LG

    Automating Carotid Intima-Media Thickness Video Interpretation with Convolutional Neural Networks

    Authors: Jae Y. Shin, Nima Tajbakhsh, R. Todd Hurst, Christopher B. Kendall, Jianming Liang

    Abstract: Cardiovascular disease (CVD) is the leading cause of mortality yet largely preventable, but the key to prevention is to identify at-risk individuals before adverse events. For predicting individual CVD risk, carotid intima-media thickness (CIMT), a noninvasive ultrasound method, has proven to be valuable, offering several advantages over CT coronary artery calcium score. However, each CIMT examina…

    Submitted 2 June, 2017; originally announced June 2017.

    Comments: J. Y. Shin, N. Tajbakhsh, R. T. Hurst, C. B. Kendall, and J. Liang. Automating carotid intima-media thickness video interpretation with convolutional neural networks. CVPR 2016, pp 2526-2535; N. Tajbakhsh, J. Y. Shin, R. T. Hurst, C. B. Kendall, and J. Liang. Automatic interpretation of CIMT videos using convolutional neural networks. Deep Learning for Medical Image Analysis, Academic Press, 2017

  21. Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?

    Authors: Nima Tajbakhsh, Jae Y. Shin, Suryakanth R. Gurudu, R. Todd Hurst, Christopher B. Kendall, Michael B. Gotway, Jianming Liang

    Abstract: Training a deep convolutional neural network (CNN) from scratch is difficult because it requires a large amount of labeled training data and a great deal of expertise to ensure proper convergence. A promising alternative is to fine-tune a CNN that has been pre-trained using, for instance, a large set of labeled natural images. However, the substantial differences between natural and medical images…

    Submitted 2 June, 2017; originally announced June 2017.

    Journal ref: IEEE Transactions on Medical Imaging. 35(5):1299-1312 (2016)