Showing 1–6 of 6 results for author: Rintamaki, T

  1. arXiv:2504.15271  [pdf, other]

    cs.CV

    Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

    Authors: Guo Chen, Zhiqi Li, Shihao Wang, Jindong Jiang, Yicheng Liu, Lidong Lu, De-An Huang, Wonmin Byeon, Matthieu Le, Tuomas Rintamaki, Tyler Poon, Max Ehrlich, Tong Lu, Limin Wang, Bryan Catanzaro, Jan Kautz, Andrew Tao, Zhiding Yu, Guilin Liu

    Abstract: We introduce Eagle 2.5, a family of frontier vision-language models (VLMs) for long-context multimodal learning. Our work addresses the challenges in long video comprehension and high-resolution image understanding, introducing a generalist framework for both tasks. The proposed training framework incorporates Automatic Degrade Sampling and Image Area Preservation, two techniques that preserve con…

    Submitted 21 April, 2025; originally announced April 2025.

  2. arXiv:2504.03624  [pdf, other]

    cs.CL cs.AI cs.LG

    Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

    Authors: NVIDIA, Aaron Blakeman, Aarti Basant, Abhinav Khattar, Adithya Renduchintala, Akhiad Bercovich, Aleksander Ficek, Alexis Bjorlin, Ali Taghibakhshi, Amala Sanjay Deshmukh, Ameya Sunil Mahabaleshwarkar, Andrew Tao, Anna Shors, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Bobby Chen, Boris Ginsburg, Boxin Wang, Brandon Norick, Brian Butterfield, Bryan Catanzaro, Carlo del Mundo, et al. (176 additional authors not shown)

    Abstract: As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transf…

    Submitted 15 April, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

  3. arXiv:2501.14818  [pdf, other]

    cs.CV cs.AI cs.LG

    Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models

    Authors: Zhiqi Li, Guo Chen, Shilong Liu, Shihao Wang, Vibashan VS, Yishen Ji, Shiyi Lan, Hao Zhang, Yilin Zhao, Subhashree Radhakrishnan, Nadine Chang, Karan Sapra, Amala Sanjay Deshmukh, Tuomas Rintamaki, Matthieu Le, Ilia Karmanov, Lukas Voegtle, Philipp Fischer, De-An Huang, Timo Roman, Tong Lu, Jose M. Alvarez, Bryan Catanzaro, Jan Kautz, Andrew Tao, et al. (2 additional authors not shown)

    Abstract: Recently, promising progress has been made by open-source vision-language models (VLMs) in bringing their capabilities closer to those of proprietary frontier models. However, most open-source models only publish their final model weights, leaving the critical details of data strategies and implementation largely opaque. In this work, we address VLM post-training from a data-centric perspective, s…

    Submitted 20 January, 2025; originally announced January 2025.

  4. arXiv:2501.14756  [pdf, other]

    cs.CY cs.AI

    Towards An Automated AI Act FRIA Tool That Can Reuse GDPR's DPIA

    Authors: Tytti Rintamaki, Harshvardhan J. Pandit

    Abstract: The AI Act introduces the obligation to conduct a Fundamental Rights Impact Assessment (FRIA), with the possibility to reuse a Data Protection Impact Assessment (DPIA), and requires the EU Commission to create an automated tool to support the FRIA process. In this article, we provide our novel exploration of the DPIA and FRIA as information processes to enable the creation of automated tools. W…

    Submitted 23 December, 2024; originally announced January 2025.

    Comments: Presented at CLAIRvoyant (ConventicLE on Artificial Intelligence Regulation) Workshop 2024

  5. arXiv:2501.10391  [pdf, other]

    cs.CY cs.AI

    Developing an Ontology for AI Act Fundamental Rights Impact Assessments

    Authors: Tytti Rintamaki, Harshvardhan J. Pandit

    Abstract: The recently published EU Artificial Intelligence Act (AI Act) is a landmark regulation that regulates the use of AI technologies. One of its novel requirements is the obligation to conduct a Fundamental Rights Impact Assessment (FRIA), where organisations in the role of deployers must assess the risks of their AI system regarding health, safety, and fundamental rights. Another novelty in the AI A…

    Submitted 19 December, 2024; originally announced January 2025.

    Comments: Presented at CLAIRvoyant (ConventicLE on Artificial Intelligence Regulation) Workshop 2024

  6. arXiv:2409.11402  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG cs.MM

    NVLM: Open Frontier-Class Multimodal LLMs

    Authors: Wenliang Dai, Nayeon Lee, Boxin Wang, Zhuolin Yang, Zihan Liu, Jon Barker, Tuomas Rintamaki, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping

    Abstract: We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., Llama 3-V 405B and InternVL 2). Remarkably, NVLM 1.0 shows improved text-only performance over its LLM backbone after multimodal training. In terms of model desi…

    Submitted 22 October, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

    Comments: Fixed the typos. For more information, please visit our project page at: https://research.nvidia.com/labs/adlr/NVLM-1