Search | arXiv e-print repository

Exploring the Impact of Temperature on Large Language Models: Hot or Cold?

Authors: Lujun Li, Lama Sleem, Niccolo' Gentile, Geoffrey Nichil, Radu State

Abstract: The sampling temperature, a critical hyperparameter in large language models (LLMs), modifies the logits before the softmax layer, thereby reshaping the distribution of output tokens. Recent studies have challenged the Stochastic Parrots analogy by demonstrating that LLMs are capable of understanding semantics rather than merely memorizing data and that randomness, modulated by sampling temperature, plays a crucial role in model inference. In this study, we systematically evaluated the impact of temperature in the range of 0 to 2 on data sets designed to assess six different capabilities, conducting statistical analyses on open-source models of three different sizes: small (1B–4B), medium (6B–13B), and large (40B–80B). Our findings reveal distinct skill-specific effects of temperature on model performance, highlighting the complexity of optimal temperature selection in practical applications. To address this challenge, we propose a BERT-based temperature selector that takes advantage of these observed effects to identify the optimal temperature for a given prompt. We demonstrate that this approach can significantly improve the performance of small and medium models on the SuperGLUE datasets. Furthermore, our study extends to FP16 precision inference, revealing that temperature effects are consistent with those observed in 4-bit quantized models. By evaluating temperature effects up to 4.0 in three quantized models, we find that the Mutation Temperature (the point at which significant performance changes occur) increases with model size.
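The first sentence of this abstract describes temperature scaling: the logits are divided by a temperature T before the softmax, so T < 1 sharpens the output distribution and T > 1 flattens it. The following is a minimal plain-Python sketch of that mechanism, not the paper's code. The function names `temperature_softmax` and `sample_token` are hypothetical, and treating T = 0 as greedy (argmax) decoding is an assumed convention, since softmax(z / T) is undefined at T = 0.

```python
import math
import random
from typing import Optional, Sequence


def temperature_softmax(logits: Sequence[float], temperature: float) -> list:
    """Return softmax(logits / temperature) as a list of probabilities.

    Assumption: temperature == 0 is treated as greedy decoding, putting all
    probability mass on the first maximal logit (a common convention).
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: Sequence[float], temperature: float,
                 rng: Optional[random.Random] = None) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    if rng is None:
        rng = random.Random()
    probs = temperature_softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.0]  # hypothetical logits for a 3-token vocabulary
    for t in (0.5, 1.0, 2.0):
        probs = temperature_softmax(example_logits, t)
        print(t, [round(p, 3) for p in probs])
```

With these hypothetical logits, the top token's probability falls from about 0.87 at T = 0.5 to about 0.51 at T = 2, which illustrates how raising T flattens the distribution that sampling draws from.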

Submitted 8 June, 2025; originally announced June 2025.

arXiv:2505.16078 [pdf, ps, other]

Small Language Models in the Real World: Insights from Industrial Text Classification

Authors: Lujun Li, Lama Sleem, Niccolo' Gentile, Geoffrey Nichil, Radu State

Abstract: With the emergence of ChatGPT, Transformer models have significantly advanced text classification and related tasks. Decoder-only models such as Llama exhibit strong performance and flexibility, yet they are inefficient at inference because of token-by-token generation, and their effectiveness in text classification tasks depends heavily on prompt quality. Moreover, their substantial GPU resource requirements often limit widespread adoption. Thus, whether smaller language models can effectively handle text classification tasks has become a topic of significant interest. However, the selection of appropriate models and methodologies remains largely underexplored. In this paper, we conduct a comprehensive evaluation of prompt engineering and supervised fine-tuning methods for transformer-based text classification. Specifically, we focus on practical industrial scenarios, including email classification, legal document categorization, and the classification of extremely long academic texts. We examine the strengths and limitations of smaller models, with particular attention to both their performance and their efficiency in Video Random-Access Memory (VRAM) utilization, thereby providing valuable insights for the local deployment and application of compact models in industrial settings.

Submitted 23 June, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

Comments: This paper has been accepted as a conference paper in the Industry Track of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)

arXiv:2503.24102 [pdf, ps, other]

Is LLM the Silver Bullet to Low-Resource Languages Machine Translation?

Authors: Yewei Song, Lujun Li, Cedric Lothritz, Saad Ezzini, Lama Sleem, Niccolo Gentile, Radu State, Tegawendé F. Bissyandé, Jacques Klein

Abstract: Low-Resource Languages (LRLs) present significant challenges in natural language processing due to their limited linguistic resources and underrepresentation in standard datasets. While recent advances in Large Language Models (LLMs) and Neural Machine Translation have substantially improved translation capabilities for high-resource languages, performance disparities persist for LRLs, particularly impacting privacy-sensitive and resource-constrained scenarios. This paper systematically evaluates current LLMs across 200 languages using the FLORES-200 benchmark and demonstrates their limitations in LRL translation. We also explore alternative data sources, including news articles and bilingual dictionaries, and demonstrate how knowledge distillation from large pre-trained teacher models can significantly improve the performance of small LLMs on LRL translation tasks. For example, this approach raises the LLM-as-a-Judge score for EN->LB translation on the validation set from 0.36 to 0.89 for Llama-3.2-3B. Furthermore, we examine different fine-tuning configurations, providing practical insights into optimal data scale, training efficiency, and the preservation of the generalization capabilities of the models under study.

Submitted 5 June, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

arXiv:2009.01905 [pdf, other]

Frequency-Dependent Material Motion Benchmarks for Radiative Transfer

Authors: Ryan G. McClarren, N. A. Gentile

Abstract: We present a general solution for the radiation intensity in front of a purely absorbing slab moving toward an observer at constant speed and with a constant temperature. The solution is obtained by integrating the lab-frame radiation transport equation through the slab to the observer. We present comparisons between our benchmark and results from the Kull simulation code for an aluminum slab moving toward the observer at 2% of the speed of light. We demonstrate that ignoring certain material motion correction terms in the transport equation can lead to errors of 20–80%, with the error magnitude growing as the frequency resolution is improved. Our results also indicate that our benchmark can identify potential errors in the implementation of material motion corrections.
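The abstract obtains its benchmark by integrating the lab-frame transport equation along the ray through the slab. As a hedged illustration only, the stationary limit (slab at rest) of that integration for a purely absorbing, isothermal slab has a closed form. The symbols σ_a (absorption coefficient), L (path length through the slab), B_ν (Planck function at the slab temperature T), and the assumption of zero incident intensity are choices made for this sketch, not the paper's notation:

```latex
\frac{dI_\nu}{ds} = \sigma_a(\nu)\,\bigl[B_\nu(T) - I_\nu(s)\bigr],
\qquad I_\nu(0) = 0
\quad\Longrightarrow\quad
I_\nu(L) = B_\nu(T)\,\bigl(1 - e^{-\sigma_a(\nu)\,L}\bigr).
```

When the slab moves, the lab-frame opacity and emission acquire direction- and frequency-dependent factors from the Doppler shift between the lab and comoving frames. These are the material motion correction terms whose omission, according to the abstract, produces 20–80% errors.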

Submitted 3 August, 2020; originally announced September 2020.

Comments: Submitted to ANS M&C 2021 - The International Conference on Mathematics and Computational Methods Applied to Nuclear Science and Engineering

Report number: LLNL-ABS-813186

Showing 1–4 of 4 results for author: Gentile, N