-
LMUnit: Fine-grained Evaluation with Natural Language Unit Tests
Authors:
Jon Saad-Falcon,
Rajan Vivek,
William Berrios,
Nandita Shankar Naik,
Matija Franklin,
Bertie Vidgen,
Amanpreet Singh,
Douwe Kiela,
Shikib Mehri
Abstract:
As language models become integral to critical workflows, assessing their behavior remains a fundamental challenge -- human evaluation is costly and noisy, while automated metrics provide only coarse, difficult-to-interpret signals. We introduce natural language unit tests, a paradigm that decomposes response quality into explicit, testable criteria, along with a unified scoring model, LMUnit, whi…
▽ More
As language models become integral to critical workflows, assessing their behavior remains a fundamental challenge -- human evaluation is costly and noisy, while automated metrics provide only coarse, difficult-to-interpret signals. We introduce natural language unit tests, a paradigm that decomposes response quality into explicit, testable criteria, along with a unified scoring model, LMUnit, which combines multi-objective training across preferences, direct ratings, and natural language rationales. Through controlled human studies, we show this paradigm significantly improves inter-annotator agreement and enables more effective LLM development workflows. LMUnit achieves state-of-the-art performance on evaluation benchmarks (FLASK, BigGenBench) and competitive results on RewardBench. These results validate both our proposed paradigm and scoring model, suggesting a promising path forward for language model evaluation and development.
△ Less
Submitted 17 December, 2024;
originally announced December 2024.
-
CommVQA: Situating Visual Question Answering in Communicative Contexts
Authors:
Nandita Shankar Naik,
Christopher Potts,
Elisa Kreiss
Abstract:
Current visual question answering (VQA) models tend to be trained and evaluated on image-question pairs in isolation. However, the questions people ask are dependent on their informational needs and prior knowledge about the image content. To evaluate how situating images within naturalistic contexts shapes visual questions, we introduce CommVQA, a VQA dataset consisting of images, image descripti…
▽ More
Current visual question answering (VQA) models tend to be trained and evaluated on image-question pairs in isolation. However, the questions people ask are dependent on their informational needs and prior knowledge about the image content. To evaluate how situating images within naturalistic contexts shapes visual questions, we introduce CommVQA, a VQA dataset consisting of images, image descriptions, real-world communicative scenarios where the image might appear (e.g., a travel website), and follow-up questions and answers conditioned on the scenario and description. CommVQA, which contains 1000 images and 8,949 question-answer pairs, poses a challenge for current models. Error analyses and a human-subjects study suggest that generated answers still contain high rates of hallucinations, fail to fittingly address unanswerable questions, and don't suitably reflect contextual information. Overall, we show that access to contextual information is essential for solving CommVQA, leading to the highest performing VQA model and highlighting the relevance of situating systems within communicative scenarios.
△ Less
Submitted 3 October, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Reinforcing Security and Usability of Crypto-Wallet with Post-Quantum Cryptography and Zero-Knowledge Proof
Authors:
Yathin Kethepalli,
Rony Joseph,
Sai Raja Vajrala,
Jashwanth Vemula,
Nenavath Srinivas Naik
Abstract:
Crypto-wallets or digital asset wallets are a crucial aspect of managing cryptocurrencies and other digital assets such as NFTs. However, these wallets are not immune to security threats, particularly from the growing risk of quantum computing. The use of traditional public-key cryptography systems in digital asset wallets makes them vulnerable to attacks from quantum computers, which may increase…
▽ More
Crypto-wallets or digital asset wallets are a crucial aspect of managing cryptocurrencies and other digital assets such as NFTs. However, these wallets are not immune to security threats, particularly from the growing risk of quantum computing. The use of traditional public-key cryptography systems in digital asset wallets makes them vulnerable to attacks from quantum computers, which may increase in the future. Moreover, current digital wallets require users to keep track of seed-phrases, which can be challenging and lead to additional security risks. To overcome these challenges, a new algorithm is proposed that uses post-quantum cryptography (PQC) and zero-knowledge proof (ZKP) to enhance the security of digital asset wallets. The research focuses on the use of the Lattice-based Threshold Secret Sharing Scheme (LTSSS), Kyber Algorithm for key generation and ZKP for wallet unlocking, providing a more secure and user-friendly alternative to seed-phrase, brain and multi-sig protocol wallets. This algorithm also includes several innovative security features such as recovery of wallets in case of downtime of the server, and the ability to rekey the private key associated with a specific username-password combination, offering improved security and usability. The incorporation of PQC and ZKP provides a robust and comprehensive framework for securing digital assets in the present and future. This research aims to address the security challenges faced by digital asset wallets and proposes practical solutions to ensure their safety in the era of quantum computing.
△ Less
Submitted 29 August, 2023; v1 submitted 14 August, 2023;
originally announced August 2023.
-
An Efficient Reconfigurable FIR Digital Filter Using Modified Distribute Arithmetic Technique
Authors:
Naveen S Naik,
Kiran A Gupta
Abstract:
This paper provides modified Distributed Arithmetic based technique to compute sum of products saving appreciable number of Multiply And accumulation blocks and this consecutively reduces circuit size. In this technique multiplexer based structure is used to reuse the blocks so as to reduce the required memory locations. In this technique a Carry Look Ahead based adder tree is used to have better…
▽ More
This paper provides modified Distributed Arithmetic based technique to compute sum of products saving appreciable number of Multiply And accumulation blocks and this consecutively reduces circuit size. In this technique multiplexer based structure is used to reuse the blocks so as to reduce the required memory locations. In this technique a Carry Look Ahead based adder tree is used to have better area-delay product. Designing of FIR filter is done using VHDL and synthesized using Xilinx 12.2 synthesis tool and ISIM simulator. The power analysis is done using Xilinx Xpower analyzer. The proposed structure requires nearly 42% less cells, 40% less LUT flip-flop pairs used, and also 2% less power compared with existing structure.
△ Less
Submitted 27 April, 2017;
originally announced April 2017.