
Showing 1–13 of 13 results for author: Lausen, L

  1. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  2. arXiv:2502.12340  [pdf, other]

    cs.LG cs.DC

    Understanding Silent Data Corruption in LLM Training

    Authors: Jeffrey Ma, Hengzhi Pei, Leonard Lausen, George Karypis

    Abstract: As the scale of training large language models (LLMs) increases, one emergent failure is silent data corruption (SDC), where hardware produces incorrect computations without explicit failure signals. In this work, we are the first to investigate the impact of real-world SDCs on LLM training by comparing model training between healthy production nodes and unhealthy nodes exhibiting SDCs. With the h…

    Submitted 17 February, 2025; originally announced February 2025.

  3. arXiv:2409.01483  [pdf, other]

    cs.LG cs.CL

    Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning

    Authors: Soumajyoti Sarkar, Leonard Lausen, Volkan Cevher, Sheng Zha, Thomas Brox, George Karypis

    Abstract: Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling. These models use conditionally activated feedforward subnetworks in transformer blocks, allowing for a separation between total model parameters and per-example computation. However, large token-routed SMoE models face a significant challenge: during inference, the entire model must…

    Submitted 2 September, 2024; originally announced September 2024.

  4. arXiv:2407.20061  [pdf, other]

    cond-mat.mes-hall cs.ET cs.LG quant-ph

    Autonomous Bootstrapping of Quantum Dot Devices

    Authors: Anton Zubchenko, Danielle Middlebrooks, Torbjørn Rasmussen, Lara Lausen, Ferdinand Kuemmeth, Anasua Chatterjee, Justyna P. Zwolak

    Abstract: Semiconductor quantum dots (QDs) are a promising platform for multiple different qubit implementations, all of which are voltage controlled by programmable gate electrodes. However, as the QD arrays grow in size and complexity, tuning procedures that can fully autonomously handle the increasing number of control parameters are becoming essential for enabling scalability. We propose a bootstrapping…

    Submitted 28 January, 2025; v1 submitted 29 July, 2024; originally announced July 2024.

    Comments: 9 pages, 3 figures, 1 table

    Report number: NBI QDEV 2024

    Journal ref: Phys. Rev. Applied 23, 014072 (2025)

  5. arXiv:2310.00789  [pdf, ps, other]

    cs.CL cs.LG

    Testing the Limits of Unified Sequence to Sequence LLM Pretraining on Diverse Table Data Tasks

    Authors: Soumajyoti Sarkar, Leonard Lausen

    Abstract: Tables stored in databases and tables which are present in web pages and articles account for a large part of semi-structured data that is available on the internet. It then becomes pertinent to develop a modeling approach with large language models (LLMs) that can be used to solve diverse table tasks such as semantic parsing, question answering as well as classification problems. Traditionally, t…

    Submitted 1 October, 2023; originally announced October 2023.

  6. arXiv:2307.08623  [pdf, other]

    cs.LG cs.AI cs.CL

    HYTREL: Hypergraph-enhanced Tabular Data Representation Learning

    Authors: Pei Chen, Soumajyoti Sarkar, Leonard Lausen, Balasubramaniam Srinivasan, Sheng Zha, Ruihong Huang, George Karypis

    Abstract: Language models pretrained on large collections of tabular data have demonstrated their effectiveness in several downstream tasks. However, many of these models do not take into account the row/column permutation invariances, hierarchical structure, etc. that exist in tabular data. To alleviate these limitations, we propose HYTREL, a tabular language model, that captures the permutation invariance…

    Submitted 26 October, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023 (spotlight)

  7. arXiv:2306.03438  [pdf, other]

    cs.LG cs.AI cs.CL cs.SE

    Large Language Models of Code Fail at Completing Code with Potential Bugs

    Authors: Tuan Dinh, Jinman Zhao, Samson Tan, Renato Negrinho, Leonard Lausen, Sheng Zha, George Karypis

    Abstract: Large language models of code (Code-LLMs) have recently brought tremendous advances to code completion, a fundamental feature of programming assistance and code intelligence. However, most existing works ignore the possible presence of bugs in the code context for generation, which are inevitable in software development. Therefore, we introduce and study the buggy-code completion problem, inspired…

    Submitted 30 November, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: 27 pages, accepted to NeurIPS 2023

  8. arXiv:2306.00381  [pdf, other]

    cs.SE cs.LG

    Better Context Makes Better Code Language Models: A Case Study on Function Call Argument Completion

    Authors: Hengzhi Pei, Jinman Zhao, Leonard Lausen, Sheng Zha, George Karypis

    Abstract: Pretrained code language models have enabled great progress towards program synthesis. However, common approaches only consider in-file local context and thus miss information and constraints imposed by other parts of the codebase and its external dependencies. Existing code completion benchmarks also lack such context. To resolve these restrictions we curate a new dataset of permissively licensed…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 12 pages. Accepted to AAAI 2023

    ACM Class: I.2.2; I.2.7

  9. arXiv:2211.03966  [pdf, ps, other]

    cs.CL cs.LG

    Parameter and Data Efficient Continual Pre-training for Robustness to Dialectal Variance in Arabic

    Authors: Soumajyoti Sarkar, Kaixiang Lin, Sailik Sengupta, Leonard Lausen, Sheng Zha, Saab Mansour

    Abstract: The use of multilingual language models for tasks in low and high-resource languages has been a success story in deep learning. In recent times, Arabic has been receiving widespread attention on account of its dialectal variance. While prior research studies have tried to adapt these multilingual models for dialectal variants of Arabic, it still remains a challenging problem owing to the lack of s…

    Submitted 7 November, 2022; originally announced November 2022.

  10. arXiv:2204.11117  [pdf, other]

    cs.CL cs.LG

    Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning

    Authors: Vishakh Padmakumar, Leonard Lausen, Miguel Ballesteros, Sheng Zha, He He, George Karypis

    Abstract: Recent work has found that multi-task training with a large number of diverse tasks can uniformly improve downstream performance on unseen target tasks. In contrast, literature on task transferability has established that the choice of intermediate tasks can heavily affect downstream task performance. In this work, we aim to disentangle the effect of scale and relatedness of tasks in multi-task re…

    Submitted 12 July, 2022; v1 submitted 23 April, 2022; originally announced April 2022.

    Comments: NAACL 2022 - Camera ready version

  11. arXiv:1907.04433  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

    Authors: Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu

    Abstract: We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customiza…

    Submitted 12 February, 2020; v1 submitted 9 July, 2019; originally announced July 2019.

    Journal ref: Journal of Machine Learning Research 21 (2020) 1-7

  12. arXiv:1712.05902  [pdf, other]

    cs.LG cs.DC

    NSML: A Machine Learning Platform That Enables You to Focus on Your Models

    Authors: Nako Sung, Minkyu Kim, Hyunwoo Jo, Youngil Yang, Jingwoong Kim, Leonard Lausen, Youngkwan Kim, Gayoung Lee, Donghyun Kwak, Jung-Woo Ha, Sunghun Kim

    Abstract: Machine learning libraries such as TensorFlow and PyTorch simplify model implementation. However, researchers are still required to perform a non-trivial amount of manual tasks such as GPU allocation, training status tracking, and comparison of models with different hyperparameter settings. We propose a system to handle these tasks and help researchers focus on models. We present the requirements…

    Submitted 15 December, 2017; originally announced December 2017.

    Comments: 8 pages, 4 figures

  13. arXiv:1706.03458  [pdf, other]

    cs.CV

    Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model

    Authors: Xingjian Shi, Zhihan Gao, Leonard Lausen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo

    Abstract: With the goal of making high-resolution forecasts of regional rainfall, precipitation nowcasting has become an important and fundamental technology underlying various public services ranging from rainstorm warnings to flight safety. Recently, the Convolutional LSTM (ConvLSTM) model has been shown to outperform traditional optical flow based methods for precipitation nowcasting, suggesting that dee…

    Submitted 5 October, 2017; v1 submitted 12 June, 2017; originally announced June 2017.

    Comments: NIPS 2017 Spotlight