-
Humanity's Last Exam
Authors:
Long Phan,
Alice Gatti,
Ziwen Han,
Nathaniel Li,
Josephina Hu,
Hugh Zhang,
Chen Bo Calvin Zhang,
Mohamed Shaaban,
John Ling,
Sean Shi,
Michael Choi,
Anish Agrawal,
Arnav Chopra,
Adam Khoja,
Ryan Kim,
Richard Ren,
Jason Hausenloy,
Oliver Zhang,
Mantas Mazeika,
Dmitry Dodonov,
Tung Nguyen,
Jaeho Lee,
Daron Anderson,
Mikhail Doroshenko,
Alun Cennyth Stokes
, et al. (1084 additional authors not shown)
Abstract:
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
Submitted 19 April, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Nudge: Haptic Pre-Cueing to Communicate Automotive Intent
Authors:
Nikhil Gowda,
Srinath Sibi,
Sonia Baltodano,
Nikolas Martelaro,
Rohan Maheshwari,
David Miller,
Wendy Ju
Abstract:
To increase driver awareness in a fully autonomous vehicle, we developed several haptic interaction prototypes that signal what the car is planning to do next. The goal was to use haptic cues so that the driver could be situation aware but not distracted from the non-driving tasks they may be engaged in. This paper discusses the three prototypes tested and the guiding metaphor behind each concept. We also highlight the Wizard of Oz protocol adopted to test the haptic interaction prototypes and some key findings from the pilot study.
Submitted 4 November, 2024;
originally announced November 2024.
-
MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results
Authors:
Yuekun Dai,
Dafeng Zhang,
Xiaoming Li,
Zongsheng Yue,
Chongyi Li,
Shangchen Zhou,
Ruicheng Feng,
Peiqing Yang,
Zhezhu Jin,
Guanqun Liu,
Chen Change Loy,
Lize Zhang,
Shuai Liu,
Chaoyu Feng,
Luyang Wang,
Shuan Chen,
Guangqi Shao,
Xiaotao Wang,
Lei Lei,
Qirui Yang,
Qihua Cheng,
Zhiqiang Xu,
Yihao Liu,
Huanjing Yue,
Jingyu Yang
, et al. (38 additional authors not shown)
Abstract:
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.
Submitted 27 May, 2024; v1 submitted 30 April, 2024;
originally announced April 2024.
-
Machine Learning For Classification Of Antithetical Emotional States
Authors:
Jeevanshi Sharma,
Rajat Maheshwari,
Yusuf Uzzaman Khan
Abstract:
Emotion Classification through EEG signals has achieved many advancements. However, problems such as lack of data and learning the important features and patterns have always been areas with scope for improvement, both computationally and in prediction accuracy. This work analyses the baseline machine learning classifiers' performance on the DEAP dataset along with a tabular learning approach that provided results comparable to the state of the art, leveraging the performance boost due to its deep learning architecture without deploying heavy neural networks.
Submitted 6 September, 2022;
originally announced September 2022.
-
QuickSync: A Quickly Synchronizing PoS-Based Blockchain Protocol
Authors:
Shoeb Siddiqui,
Varul Srivastava,
Raj Maheshwari,
Sujit Gujar
Abstract:
To implement a blockchain, we need a blockchain protocol for all the nodes to follow. To design a blockchain protocol, we need a block publisher selection mechanism and a chain selection rule. In Proof-of-Stake (PoS) based blockchain protocols, the block publisher selection mechanism selects the node to publish the next block based on the relative stake held by the node. However, PoS protocols, such as Ouroboros v1, may face vulnerability to fully adaptive corruptions.
In this paper, we propose a novel PoS-based blockchain protocol, QuickSync, to achieve security against fully adaptive corruptions while improving on performance. We propose a metric called block power, a value defined for each block, derived from the output of the verifiable random function based on the digital signature of the block publisher. With this metric, we compute chain power, the sum of block powers of all the blocks comprising the chain, for all the valid chains. These metrics are a function of the block publisher's stake to enable the PoS aspect of the protocol. The chain selection rule selects the chain with the highest chain power as the one to extend. This chain selection rule hence determines the selected block publisher of the previous block. When we use metrics to define the chain selection rule, it may lead to vulnerabilities against Sybil attacks. QuickSync uses a Sybil attack resistant function implemented using histogram matching. We prove that QuickSync satisfies common prefix, chain growth, and chain quality properties and hence it is secure. We also show that it is resilient to different types of adversarial attack strategies. Our analysis demonstrates that QuickSync performs better than Bitcoin by an order of magnitude on both transactions per second and time to finality, and better than Ouroboros v1 by a factor of three on time to finality.
Submitted 16 March, 2023; v1 submitted 7 May, 2020;
originally announced May 2020.
-
BHAAV- A Text Corpus for Emotion Analysis from Hindi Stories
Authors:
Yaman Kumar,
Debanjan Mahata,
Sagar Aggarwal,
Anmol Chugh,
Rajat Maheshwari,
Rajiv Ratn Shah
Abstract:
In this paper, we introduce the first and largest Hindi text corpus, named BHAAV, which means emotions in Hindi, for analyzing emotions that a writer expresses through his characters in a story, as perceived by a narrator/reader. The corpus consists of 20,304 sentences collected from 230 different short stories spanning across 18 genres such as Inspirational and Mystery. Each sentence has been annotated into one of the five emotion categories - anger, joy, suspense, sad, and neutral, by three native Hindi speakers with at least ten years of formal education in Hindi. We also discuss challenges in the annotation of low resource languages such as Hindi, and discuss the scope of the proposed corpus along with its possible uses. We also provide a detailed analysis of the dataset and train strong baseline classifiers reporting their performances.
Submitted 9 October, 2019;
originally announced October 2019.