Search | arXiv e-print repository

Uncovering the Computational Ingredients of Human-Like Representations in LLMs

Authors: Zach Studdiford, Timothy T. Rogers, Kushin Mukherjee, Siddharth Suresh

Abstract: The ability to translate diverse patterns of inputs into structured patterns of behavior has been thought to rest on both humans' and machines' ability to learn robust representations of relevant concepts. The rapid advancement of transformer-based large language models (LLMs) has led to a diversity of computational ingredients -- architectures, fine tuning methods, and training datasets among others -- but it remains unclear which of these ingredients are most crucial for building models that develop human-like representations. Further, most current LLM benchmarks are not suited to measuring representational alignment between humans and models, making benchmark scores unreliable for assessing if current LLMs are making progress towards becoming useful cognitive models. We address these limitations by first evaluating a set of over 70 models that widely vary in their computational ingredients on a triplet similarity task, a method well established in the cognitive sciences for measuring human conceptual representations, using concepts from the THINGS database. Comparing human and model representations, we find that models that undergo instruction-finetuning and which have larger dimensionality of attention heads are among the most human aligned, while multimodal pretraining and parameter size have limited bearing on alignment. Correlations between alignment scores and scores on existing benchmarks reveal that while some benchmarks (e.g., MMLU) are better suited than others (e.g., MUSR) for capturing representational alignment, no existing benchmark is capable of fully accounting for the variance of alignment scores, demonstrating their insufficiency in capturing human-AI alignment. Taken together, our findings help highlight the computational ingredients most essential for advancing LLMs towards models of human conceptual representation and address a key benchmarking gap in LLM evaluation.

Submitted 1 October, 2025; originally announced October 2025.

Comments: 9 pages
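
The triplet-based alignment measure described in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the odd-one-out form of the task, the use of cosine similarity over fixed embedding vectors, the toy vectors, and the hypothetical names `odd_one_out` and `alignment`. The paper's models may well be queried in a different way, so this is not the authors' procedure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def odd_one_out(triplet, emb):
    """Return the item left out of the most similar pair (assumed task form)."""
    a, b, c = triplet
    # Key each pairwise similarity by the item that the pair excludes.
    pair_sim = {
        c: cosine(emb[a], emb[b]),
        b: cosine(emb[a], emb[c]),
        a: cosine(emb[b], emb[c]),
    }
    return max(pair_sim, key=pair_sim.get)

def alignment(judgments, emb):
    """Fraction of triplets where the model's choice matches the human choice."""
    hits = sum(odd_one_out(triplet, emb) == human for triplet, human in judgments)
    return hits / len(judgments)

# Toy embeddings and human judgments, invented for illustration only.
emb = {
    "dog": [1.0, 0.1],
    "cat": [0.9, 0.2],
    "car": [0.0, 1.0],
    "truck": [0.1, 0.9],
}
judgments = [
    (("dog", "cat", "car"), "car"),
    (("car", "truck", "dog"), "dog"),
    (("dog", "car", "truck"), "car"),  # the model picks "dog" here, a mismatch
]

score = alignment(judgments, emb)
print(f"alignment: {score:.3f}")
```

In this toy setup the model agrees with two of the three human judgments. A score of this kind, computed per model, is the sort of quantity that could then be correlated with benchmark scores.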

arXiv:2509.25520 [pdf, ps, other]

doi 10.1109/LRA.2025.3614045

Robust Visual Localization in Compute-Constrained Environments by Salient Edge Rendering and Weighted Hamming Similarity

Authors: Tu-Hoa Pham, Philip Bailey, Daniel Posada, Georgios Georgakis, Jorge Enriquez, Surya Suresh, Marco Dolci, Philip Twu

Abstract: We consider the problem of vision-based 6-DoF object pose estimation in the context of the notional Mars Sample Return campaign, in which a robotic arm would need to localize multiple objects of interest for low-clearance pickup and insertion, under severely constrained hardware. We propose a novel localization algorithm leveraging a custom renderer together with a new template matching metric tailored to the edge domain to achieve robust pose estimation using only low-fidelity, textureless 3D models as inputs. Extensive evaluations on synthetic datasets, on physical testbeds on Earth, and on in situ Mars imagery show that our method consistently beats the state of the art in compute and memory-constrained localization, both in terms of robustness and accuracy, in turn enabling new possibilities for cheap and reliable localization on general-purpose hardware.

Submitted 29 September, 2025; originally announced September 2025.

Comments: To appear in IEEE Robotics and Automation Letters

arXiv:2509.16369 [pdf, ps, other]

Enhancing Financial RAG with Agentic AI and Multi-HyDE: A Novel Approach to Knowledge Retrieval and Hallucination Reduction

Authors: Akshay Govind Srinivasan, Ryan Jacob George, Jayden Koshy Joe, Hrushikesh Kant, Harshith M R, Sachin Sundar, Sudharshan Suresh, Rahul Vimalkanth, Vijayavallabh

Abstract: Accurate and reliable knowledge retrieval is vital for financial question-answering, where continually updated data sources and complex, high-stakes contexts demand precision. Traditional retrieval systems rely on a single database and retriever, but financial applications require more sophisticated approaches to handle intricate regulatory filings, market analyses, and extensive multi-year reports. We introduce a framework for financial Retrieval Augmented Generation (RAG) that leverages agentic AI and the Multi-HyDE system, an approach that generates multiple, nonequivalent queries to boost the effectiveness and coverage of retrieval from large, structured financial corpora. Our pipeline is optimized for token efficiency and multi-step financial reasoning, and we demonstrate that their combination improves accuracy by 11.2% and reduces hallucinations by 15%. Our method is evaluated on standard financial QA benchmarks, showing that integrating domain-specific retrieval mechanisms such as Multi-HyDE with robust toolsets, including keyword and table-based retrieval, significantly enhances both the accuracy and reliability of answers. This research not only delivers a modular, adaptable retrieval framework for finance but also highlights the importance of structured agent workflows and multi-perspective retrieval for trustworthy deployment of AI in high-stakes financial applications.

Submitted 19 September, 2025; originally announced September 2025.

Comments: 14 Pages, 8 Tables, 2 Figures. Accepted and to be published in the proceedings of FinNLP, Empirical Methods in Natural Language Processing 2025

ACM Class: H.4; H.5; H.3.3

arXiv:2509.10430 [pdf, ps, other]

Global vs. Local Discrimination of Locally Implementable Multipartite Unitaries

Authors: Satyaki Manna, Sneha Suresh, Anandamay Das Bhowmik, Debashis Saha

Abstract: We study single-shot distinguishability of locally implementable multipartite unitaries under Local Operations and Classical Communication (LOCC) and global operations. As unitary discrimination depends on both the choice of probing states and the measurements on the evolved states, we classify LOCC and global distinguishability into two categories: adaptive strategies, where probing states are chosen based on measurement outcomes from other subsystems, and restricted strategies, where probing states remain fixed. Our findings uncover three surprising features in the bipartite setting and establish new structural limits for unitary discrimination: (i) Certain pairs of unitaries are globally distinguishable with restricted strategies but indistinguishable under LOCC, even with adaptive strategies. (ii) There exist sets of four unitaries that are distinguishable via LOCC, yet remain globally indistinguishable with restricted strategies. (iii) Some sets of unitaries are globally indistinguishable under adaptive strategies, when probed with separable states, but become distinguishable via LOCC.

Submitted 12 September, 2025; originally announced September 2025.

arXiv:2508.06591 [pdf, ps, other]

Generative Artificial Intelligence Extracts Structure-Function Relationships from Plants for New Materials

Authors: Rachel K. Luu, Jingyu Deng, Mohammed Shahrudin Ibrahim, Nam-Joon Cho, Ming Dao, Subra Suresh, Markus J. Buehler

Abstract: Large language models (LLMs) have reshaped the research landscape by enabling new approaches to knowledge retrieval and creative ideation. Yet their application in discipline-specific experimental science, particularly in highly multi-disciplinary domains like materials science, remains limited. We present a first-of-its-kind framework that integrates generative AI with literature from hitherto-unconnected fields such as plant science, biomimetics, and materials engineering to extract insights and design experiments for materials. We focus on humidity-responsive systems such as pollen-based materials and Rhapis excelsa (broadleaf lady palm) leaves, which exhibit self-actuation and adaptive performance. Using a suite of AI tools, including a fine-tuned model (BioinspiredLLM), Retrieval-Augmented Generation (RAG), agentic systems, and a Hierarchical Sampling strategy, we extract structure-property relationships and translate them into new classes of bioinspired materials. Structured inference protocols generate and evaluate hundreds of hypotheses from a single query, surfacing novel and experimentally tractable ideas. We validate our approach through real-world implementation: LLM-generated procedures, materials designs, and mechanical predictions were tested in the laboratory, culminating in the fabrication of a novel pollen-based adhesive with tunable morphology and measured shear strength, establishing a foundation for future plant-derived adhesive design. This work demonstrates how AI-assisted ideation can drive real-world materials design and enable effective human-AI collaboration.

Submitted 8 August, 2025; originally announced August 2025.

arXiv:2507.21476 [pdf, ps, other]

Which LLMs Get the Joke? Probing Non-STEM Reasoning Abilities with HumorBench

Authors: Reuben Narad, Siddharth Suresh, Jiayi Chen, Pine S. L. Dysart-Bricken, Bob Mankoff, Robert Nowak, Jifan Zhang, Lalit Jain

Abstract: We present HumorBench, a benchmark designed to evaluate large language models' (LLMs) ability to reason about and explain sophisticated humor in cartoon captions. As reasoning models increasingly saturate existing benchmarks in mathematics and science, novel and challenging evaluations of model intelligence beyond STEM domains are essential. Reasoning is fundamentally involved in text-based humor comprehension, requiring the identification of connections between concepts in cartoons/captions and external cultural references, wordplay, and other mechanisms. HumorBench includes approximately 300 unique cartoon-caption pairs from the New Yorker Caption Contest and Cartoonstock.com, with expert-annotated evaluation rubrics identifying essential joke elements. LLMs are evaluated on their explanations of the humor and their ability to identify the joke elements. To perform well on this task, models must form and test hypotheses about associations between concepts, potentially backtracking from initial interpretations to arrive at the most plausible explanation. Our extensive benchmarking of current SOTA models reveals three key insights: (1) LLM progress on STEM reasoning transfers effectively to humor comprehension; (2) models trained exclusively on STEM reasoning data still perform well on HumorBench, demonstrating strong transferability of reasoning abilities; and (3) test-time scaling by increasing thinking token budgets yields mixed results across different models in humor reasoning.

Submitted 28 July, 2025; originally announced July 2025.

arXiv:2507.04888 [pdf, ps, other]

SimLab: A Platform for Simulation-based Evaluation of Conversational Information Access Systems

Authors: Nolwenn Bernard, Sharath Chandra Etagi Suresh, Krisztian Balog, ChengXiang Zhai

Abstract: Progress in conversational information access (CIA) systems has been hindered by the difficulty of evaluating such systems with reproducible experiments. While user simulation offers a promising solution, the lack of infrastructure and tooling to support this evaluation paradigm remains a significant barrier. To address this gap, we introduce SimLab, the first cloud-based platform providing a centralized solution for the community to benchmark both conversational systems and user simulators in a controlled and reproducible setting. We articulate the requirements for such a platform and propose a general infrastructure to meet them. We then present the design and implementation of an initial version of SimLab and showcase its features through an initial simulation-based evaluation task in conversational movie recommendation. Furthermore, we discuss the platform's sustainability and future opportunities for development, inviting the community to drive further progress in the fields of CIA and user simulation.

Submitted 24 October, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

arXiv:2506.24072 [pdf, ps, other]

Protocol insecurity with finitely many sessions and XOR

Authors: R Ramanujam, Vaishnavi Sundararajan, S P Suresh

Abstract: We present a different proof of the insecurity problem for XOR, solved by Chevalier, Kuesters, Rusinowitch and Turuani (2005). Our proof uses the notion of typed terms and well-typed proofs, and removes a restriction on the class of protocols to which the [CKRT05] proof applies, by introducing a slightly different (but very natural) notion of protocols, where honest agent sends are derivable from previous receives in the same session.

Submitted 30 June, 2025; originally announced June 2025.

arXiv:2505.19333 [pdf, ps, other]

Evaluating Steering Techniques using Human Similarity Judgments

Authors: Zach Studdiford, Timothy T. Rogers, Siddharth Suresh, Kushin Mukherjee

Abstract: Current evaluations of Large Language Model (LLM) steering techniques focus on task-specific performance, overlooking how well steered representations align with human cognition. Using a well-established triadic similarity judgment task, we assessed steered LLMs on their ability to flexibly judge similarity between concepts based on size or kind. We found that prompt-based steering methods outperformed other methods both in terms of steering accuracy and model-to-human alignment. We also found LLMs were biased towards 'kind' similarity and struggled with 'size' alignment. This evaluation approach, grounded in human cognition, adds further support to the efficacy of prompt-based steering and reveals privileged representational axes in LLMs prior to steering.

Submitted 25 May, 2025; originally announced May 2025.

ACM Class: I.2.7

arXiv:2505.13559 [pdf, ps, other]

CS-Sum: A Benchmark for Code-Switching Dialogue Summarization and the Limits of Large Language Models

Authors: Sathya Krishnan Suresh, Tanmay Surana, Lim Zhi Hao, Eng Siong Chng

Abstract: Code-switching (CS) poses a significant challenge for Large Language Models (LLMs), yet how well LLMs comprehend it remains underexplored. We introduce CS-Sum to evaluate LLMs' comprehension of CS through CS dialogue to English summarization. CS-Sum is the first benchmark for CS dialogue summarization across Mandarin-English (EN-ZH), Tamil-English (EN-TA), and Malay-English (EN-MS), with 900-1300 human-annotated dialogues per language pair. Evaluating ten LLMs, including open and closed-source models, we analyze performance across few-shot, translate-summarize, and fine-tuning (LoRA, QLoRA on synthetic data) approaches. Our findings show that though the scores on automated metrics are high, LLMs make subtle mistakes that alter the complete meaning of the dialogue. To this end, we introduce the three most common types of errors that LLMs make when handling CS input. Error rates vary across CS pairs and LLMs, with some LLMs showing more frequent errors on certain language pairs, underscoring the need for specialized training on code-switched data.

Submitted 19 May, 2025; originally announced May 2025.

Comments: 17 pages, 5 figures and 11 tables

arXiv:2505.10718 [pdf, other]

AI-enhanced semantic feature norms for 786 concepts

Authors: Siddharth Suresh, Kushin Mukherjee, Tyler Giallanza, Xizheng Yu, Mia Patil, Jonathan D. Cohen, Timothy T. Rogers

Abstract: Semantic feature norms have been foundational in the study of human conceptual knowledge, yet traditional methods face trade-offs between concept/feature coverage and verifiability of quality due to the labor-intensive nature of norming studies. Here, we introduce a novel approach that augments a dataset of human-generated feature norms with responses from large language models (LLMs) while verifying the quality of norms against reliable human judgments. We find that our AI-enhanced feature norm dataset, NOVA: Norms Optimized Via AI, shows much higher feature density and overlap among concepts while outperforming a comparable human-only norm dataset and word-embedding models in predicting people's semantic similarity judgments. Taken together, we demonstrate that human conceptual knowledge is richer than captured in previous norm datasets and show that, with proper validation, LLMs can serve as powerful tools for cognitive science research.

Submitted 15 May, 2025; originally announced May 2025.

Comments: 8 pages, 5 figures

arXiv:2505.06950 [pdf, ps, other]

Copula Analysis of Risk: A Multivariate Risk Analysis for VaR and CoVaR using Copulas and DCC-GARCH

Authors: Aryan Singh, Paul O Reilly, Daim Sharif, Patrick Haughey, Eoghan McCarthy, Sathvika Thorali Suresh, Aakhil Anvar, Adarsh Sajeev Kumar

Abstract: A multivariate risk analysis for VaR and CVaR using different copula families is performed on historical financial time series fitted with DCC-GARCH models. A theoretical background is provided alongside a comparison of goodness-of-fit across different copula families to estimate the validity and effectiveness of approaches discussed.

Submitted 11 May, 2025; originally announced May 2025.

Comments: 15 pages, 12 figures, presented as part of the CS7DS1 - Data Analytics module at Trinity College Dublin, May 2025

MSC Class: 60G70; 62H05; 91G70 ACM Class: G.3; I.5.1; I.2.6

arXiv:2504.18553 [pdf]

Cracking in polymer substrates for flexible devices and its mitigation

Authors: Anush Ranka, Madhuja Layek, Sayaka Kochiyama, Cristina Lopez-Pernia, Alicia M. Chandler, Conrad A. Kocoj, Erica Magliano, Aldo Di Carlo, Francesca Brunetti, Peijun Guo, Subra Suresh, David C. Paine, Haneesh Kesari, Nitin P. Padture

Abstract: Mechanical reliability plays an outsized role in determining the durability of flexible electronic devices because of the significant mechanical stresses they can experience during manufacturing and operation. These devices are typically built on sheets comprising stiff thin-film electrodes on compliant polymer substrates, and it is generally assumed that the high-toughness substrates do not crack easily. Contrary to this widespread assumption, here we reveal severe, pervasive, and extensive cracking in the polymer substrates during bending of electrode/substrate sheets, which compromises the overall mechanical integrity of the entire device. The substrate-cracking phenomenon appears to be general, and it is driven by the amplified stress intensity factor caused by the elastic mismatch at the film/substrate interface. To mitigate this substrate cracking, an interlayer-engineering approach is designed and experimentally demonstrated. This approach is generic, and it is potentially applicable to myriad flexible electronic devices that utilize stiff films on compliant substrates, for improving their durability and reliability.

Submitted 14 April, 2025; originally announced April 2025.

Comments: 22 pages, 5 main figures, 7 supplementary figures, 2 supplementary tables, 2 supplementary notes

arXiv:2502.20356 [pdf, other]

Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs

Authors: Kuan Lok Zhou, Jiayi Chen, Siddharth Suresh, Reuben Narad, Timothy T. Rogers, Lalit K Jain, Robert D Nowak, Bob Mankoff, Jifan Zhang

Abstract: Large Language Models (LLMs) have shown significant limitations in understanding creative content, as demonstrated by Hessel et al. (2023)'s influential work on the New Yorker Cartoon Caption Contest (NYCCC). Their study exposed a substantial gap between LLMs and humans in humor comprehension, establishing that understanding and evaluating creative content is a key challenge in AI development. We revisit this challenge by decomposing humor understanding into three components and systematically improving each: enhancing visual understanding through improved annotation, utilizing LLM-generated humor reasoning and explanations, and implementing targeted alignment with human preference data. Our refined approach achieves 82.4% accuracy in caption ranking, significantly improving upon the previous 67% benchmark and matching the performance of world-renowned human experts in this domain. Notably, while attempts to mimic subgroup preferences through various persona prompts showed minimal impact, model finetuning with crowd preferences proved remarkably effective. These findings reveal that LLM limitations in creative judgment can be effectively addressed through focused alignment to specific subgroups and individuals. Lastly, we propose the position that achieving artificial general intelligence necessitates systematic collection of human preference data across creative domains. We advocate that just as human creativity is deeply influenced by individual and cultural preferences, training LLMs with diverse human preference data may be essential for developing true creative understanding.

Submitted 27 February, 2025; originally announced February 2025.

arXiv:2501.17310 [pdf, ps, other]

Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding

Authors: Yun-Shiuan Chuang, Sameer Narendran, Nikunj Harlalka, Alexander Cheung, Sizhe Gao, Siddharth Suresh, Junjie Hu, Timothy T. Rogers

Abstract: Guesstimation -- the task of making approximate quantitative estimates about objects or events -- is a common real-world skill, yet remains underexplored in large language model (LLM) research. We introduce three guesstimation datasets: MARBLES, FUTURE, and ELECPRED, spanning physical estimation (e.g., how many marbles fit in a cup) to abstract predictions (e.g., the 2024 U.S. presidential election). Inspired by the social science concept of Wisdom of Crowds (WOC) -- where the median of multiple estimates improves accuracy -- we propose WOC decoding for LLMs. We replicate WOC effects in human participants and find that LLMs exhibit similar benefits: median aggregation across sampled responses consistently improves accuracy over greedy decoding, self-consistency decoding, and mean decoding. This suggests that LLMs encode a world model that supports approximate reasoning. Our results position guesstimation as a useful probe of LLM world knowledge and highlight WOC decoding as a strategy for enhancing LLM guesstimation performance on real-world tasks.

Submitted 23 September, 2025; v1 submitted 28 January, 2025; originally announced January 2025.
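
The median aggregation behind the WOC decoding described in the abstract above can be sketched in a few lines. This is only an illustrative sketch: the function names `woc_decode` and `mean_decode` are hypothetical, the sample values are invented, and the step of sampling numeric answers from an LLM is assumed to have happened already.

```python
import statistics

def woc_decode(estimates):
    """Aggregate sampled numeric estimates by their median (WOC-style)."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return statistics.median(estimates)

def mean_decode(estimates):
    """Baseline for comparison: aggregate by the arithmetic mean."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return statistics.fmean(estimates)

# Invented sample: five numeric answers to one guesstimation question,
# one of which is a wild outlier.
samples = [120.0, 150.0, 135.0, 140.0, 5000.0]

median_estimate = woc_decode(samples)
mean_estimate = mean_decode(samples)
print(median_estimate, mean_estimate)
```

With these invented values the median stays at 140.0 while the mean is pulled up to 1109.0 by the single outlier, which is one plausible reason median aggregation can be more robust than mean decoding.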

arXiv:2501.14249 [pdf, ps, other]

Humanity's Last Exam

Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1087 additional authors not shown)

Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.

Submitted 25 September, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

Comments: 29 pages, 6 figures

arXiv:2412.05868 [pdf]

Automated Extraction and Creation of FBS Design Reasoning Knowledge Graphs from Structured Data in Product Catalogues Lacking Contextual Information

Authors: Vijayalaxmi Sahadevan, Sushil Mario, Yash Jaiswal, Divyanshu Bajpai, Vishal Singh, Hiralal Aggarwal, Suhas Suresh, Manjunath Maigur

Abstract: Ontology-based knowledge graphs (KG) are desirable for effective knowledge management and reuse in various decision making scenarios, including design. Creating and populating extensive KG based on specific ontological models can be highly labour and time-intensive unless automated processes are developed for knowledge extraction and graph creation. Most research and development on automated extraction and creation of KG is based on extensive unstructured data sets that provide contextual information. However, some of the most useful information about the products and services of a company has traditionally been recorded as structured data. Such structured data sets rarely follow a standard ontology, do not capture explicit mapping of relationships between the entities, and provide no contextual information. Therefore, this research reports a method and digital workflow developed to address this gap. The developed method and workflow employ rule-based techniques to extract and create a Function-Behaviour-Structure (FBS) ontology-based KG from legacy structured data, especially specification sheets and product catalogues. The solution approach consists of two main components: a process for deriving context and context-based classification rules for FBS ontology concepts and a workflow for populating and retrieving the FBS ontology-based KG. KG and Natural Language Processing (NLP) are used to automate knowledge extraction, representation, and retrieval. The workflow's effectiveness is demonstrated via pilot implementation in an industrial context. Insights gained from the pilot study are reported regarding the challenges and opportunities, including a discussion of the FBS ontology and concepts.

Submitted 8 December, 2024; originally announced December 2024.

Comments: 31 pages, with 17 figures and 10 tables

arXiv:2411.16511 [pdf, other]

Use-Inspired Mobile Robot to Improve Safety of Building Retrofit Workforce in Constrained Spaces

Authors: Smruti Suresh, Michael Angelo Carvajal, Nathaniel Hanson, Ethan Holand, Samuel Hibbard, Taskin Padir

Abstract: The inspection of confined critical infrastructure such as attics or crawlspaces is challenging for human operators due to insufficient task space, limited visibility, and the presence of hazardous materials. This paper introduces a prototype of PARIS (Precision Application Robot for Inaccessible Spaces): a use-inspired teleoperated mobile robot manipulator system that was conceived, developed, and tested for and selected as a Phase I winner of the U.S. Department of Energy's E-ROBOT Prize. To improve the thermal efficiency of buildings, the PARIS platform supports: 1) teleoperated mapping and navigation, enabling the human operator to explore compact spaces; 2) inspection and sensing, facilitating the identification and localization of under-insulated areas; and 3) air-sealing targeted gaps and cracks through which thermal energy is lost. The resulting versatile platform can also be tailored for targeted application of treatments and remediation in constrained spaces.

Submitted 25 November, 2024; originally announced November 2024.

Comments: 6 Pages, 7 Figures. Accepted for publication in the Proceedings of 2024 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR)

arXiv:2410.10632 [pdf, ps, other]

doi 10.1103/PhysRevA.111.022221

Single-shot Distinguishability and Anti-distinguishability of Quantum Measurements

Authors: Satyaki Manna, Sneha Suresh, Manan Singh Kachhawaha, Debashis Saha

Abstract: Among the surprising features of quantum measurements, the problem of distinguishing and antidistinguishing general quantum measurements is fundamentally appealing. Unlike classical systems, quantum theory offers entangled states and a peculiar state update rule for the post-measurement state, which gives rise to four distinct scenarios: (i) probing single systems without access to the Post-measurement States (PMS), (ii) probing entangled systems without access to the PMS, (iii) probing single systems with access to the PMS, and (iv) probing entangled systems with access to the PMS. We study the probability of distinguishing (and antidistinguishing) quantum measurements sampled from a given set in the single-shot regime. For some scenarios, we provide the analytical expressions of distinguishability (and antidistinguishability) for qubit projective measurements. We show that the distinguishability of any pair of qubit projective measurements in scenario (iii) is always greater than its value in scenario (ii). Interestingly, certain pairs of non-projective qubit measurements achieve optimal distinguishability in scenario (ii) with a non-maximally entangled state. In general, for any set of measurements, distinguishability (and antidistinguishability) in scenario (i) never exceeds that in any other scenario, while it reaches its highest possible value in scenario (iv). We establish that there is no hierarchical relation between scenarios (ii) and (iii). In particular, we introduce different variants of the well-known 'trine' qubit measurement to construct pairs (and triples) of qubit quantum measurements such that they are perfectly distinguishable (and antidistinguishable) in scenario (ii) but not in scenario (iii), and vice versa. Additionally, we present qubit measurements that are perfectly distinguishable (and antidistinguishable) in scenario (iv) but not in any other scenario.

Submitted 17 August, 2025; v1 submitted 14 October, 2024; originally announced October 2024.

Comments: 22 pages, 11 figures. Close to the published version

Journal ref: Phys. Rev. A 111, 022221 (2025)

arXiv:2410.04236 [pdf, other]

Overview of Factify5WQA: Fact Verification through 5W Question-Answering

Authors: Suryavardan Suresh, Anku Rani, Parth Patwa, Aishwarya Reganti, Vinija Jain, Aman Chadha, Amitava Das, Amit Sheth, Asif Ekbal

Abstract: Researchers have found that fake news spreads many times faster than real news. This is a major problem, especially in today's world where social media is the key source of news for many among the younger population. Fact verification, thus, becomes an important task and many media sites contribute to the cause. Manual fact verification is a tedious task, given the volume of fake news online. The Factify5WQA shared task aims to increase research towards automated fake news detection by providing a dataset with an aspect-based, question-answering-based fact verification method. Each claim and its supporting document is associated with 5W questions that help compare the two information sources. Performance in the task is measured objectively by comparing answers using the BLEU score to assess the accuracy of the answers, followed by an accuracy measure of the classification. The task had submissions using custom training setups and pre-trained language models, among others. The best performing team posted an accuracy of 69.56%, which is a near 35% improvement over the baseline.

Submitted 5 October, 2024; originally announced October 2024.

Comments: Accepted at defactify3@aaai2024

arXiv:2410.01790 [pdf, other]

Open Human-Robot Collaboration using Decentralized Inverse Reinforcement Learning

Authors: Prasanth Sengadu Suresh, Siddarth Jain, Prashant Doshi, Diego Romeres

Abstract: The growing interest in human-robot collaboration (HRC), where humans and robots cooperate towards shared goals, has seen significant advancements over the past decade. While previous research has addressed various challenges, several key issues remain unresolved. Many domains within HRC involve activities that do not necessarily require human presence throughout the entire task. Existing literature typically models HRC as a closed system, where all agents are present for the entire duration of the task. In contrast, an open model offers flexibility by allowing an agent to enter and exit the collaboration as needed, enabling them to concurrently manage other tasks. In this paper, we introduce a novel multiagent framework called oDec-MDP, designed specifically to model open HRC scenarios where agents can join or leave tasks flexibly during execution. We generalize a recent multiagent inverse reinforcement learning method, Dec-AIRL, to learn from open systems modeled using the oDec-MDP. Our method is validated through experiments conducted in both a simplified toy firefighting domain and a realistic dyadic human-robot collaborative assembly. Results show that our framework and learning method improve upon their closed system counterparts.

Submitted 2 October, 2024; originally announced October 2024.

arXiv:2409.19020 [pdf, other]

DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications

Authors: Sathya Krishnan Suresh, Wu Mengjun, Tushar Pranav, Eng Siong Chng

Abstract: The scarcity of domain-specific dialogue datasets limits the development of dialogue systems across applications. Existing research is constrained by general or niche datasets that lack sufficient scale for training dialogue systems. To address this gap, we introduce DiaSynth, a synthetic dialogue generation framework capable of generating high-quality, contextually rich dialogues across a wide range of domains. Unlike existing frameworks, DiaSynth uses Large Language Models (LLMs) and Chain of Thought (CoT) reasoning to generate dynamic, domain-specific dialogues with simulated personas and diverse conversational features. We perform our experiments by generating synthetic data using different LLMs and few-shot examples from DialogSum and SAMSum. The pretrained language models fine-tuned on the synthetic data outperform the base models by 16.47% on dialogue summarization, while the comparison between models fine-tuned on in-domain data and synthetic data shows that the synthetic data is able to capture 90.48% of the performance distribution of the in-domain data on dialogue summarization. The quality of the data generated also increases as we increase the size of the LLM from 3B to 8B. These results validate DiaSynth's potential as a robust alternative to traditional data collection methods. We open source the code and data generated for future research.

Submitted 10 February, 2025; v1 submitted 25 September, 2024; originally announced September 2024.

Comments: 13 pages, 1 figure

Journal ref: NAACL 2025

arXiv:2409.15027 [pdf, other]

Generative LLM Powered Conversational AI Application for Personalized Risk Assessment: A Case Study in COVID-19

Authors: Mohammad Amin Roshani, Xiangyu Zhou, Yao Qiang, Srinivasan Suresh, Steve Hicks, Usha Sethuraman, Dongxiao Zhu

Abstract: Large language models (LLMs) have shown remarkable capabilities in various natural language tasks and are increasingly being applied in healthcare domains. This work demonstrates a new LLM-powered disease risk assessment approach via streaming human-AI conversation, eliminating the need for programming required by traditional machine learning approaches. In a COVID-19 severity risk assessment case study, we fine-tune pre-trained generative LLMs (e.g., Llama2-7b and Flan-t5-xl) using a few shots of natural language examples, comparing their performance with traditional classifiers (i.e., Logistic Regression, XGBoost, Random Forest) that are trained de novo using tabular data across various experimental settings. We develop a mobile application that uses these fine-tuned LLMs as its generative AI (GenAI) core to facilitate real-time interaction between clinicians and patients, providing no-code risk assessment through conversational interfaces. This integration not only allows for the use of streaming Questions and Answers (QA) as inputs but also offers personalized feature importance analysis derived from the LLM's attention layers, enhancing the interpretability of risk assessments. By achieving high Area Under the Curve (AUC) scores with a limited number of fine-tuning samples, our results demonstrate the potential of generative LLMs to outperform discriminative classification methods in low-data regimes, highlighting their real-world adaptability and effectiveness. This work aims to fill the existing gap in leveraging generative LLMs for interactive no-code risk assessment and to encourage further research in this emerging field.

Submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.13171 [pdf, other]

Deep Learning based Optical Image Super-Resolution via Generative Diffusion Models for Layerwise in-situ LPBF Monitoring

Authors: Francis Ogoke, Sumesh Kalambettu Suresh, Jesse Adamczyk, Dan Bolintineanu, Anthony Garland, Michael Heiden, Amir Barati Farimani

Abstract: The stochastic formation of defects during Laser Powder Bed Fusion (L-PBF) negatively impacts its adoption for high-precision use cases. Optical monitoring techniques can be used to identify defects based on layer-wise imaging, but these methods are difficult to scale to high resolutions due to cost and memory constraints. Therefore, we implement generative deep learning models to link low-cost, low-resolution images of the build plate to detailed high-resolution optical images of the build plate, enabling cost-efficient process monitoring. To do so, a conditional latent probabilistic diffusion model is trained to produce realistic high-resolution images of the build plate from low-resolution webcam images, recovering the distribution of small-scale features and surface roughness. We first evaluate the performance of the model by analyzing the reconstruction quality of the generated images using peak-signal-to-noise-ratio (PSNR), structural similarity index measure (SSIM) and wavelet covariance metrics that describe the preservation of high-frequency information. Additionally, we design a framework based upon the Segment Anything foundation model to recreate the 3D morphology of the printed part and analyze the surface roughness of the reconstructed samples. Finally, we explore the zero-shot generalization capabilities of the implemented framework to other part geometries by creating synthetic low-resolution data.

Submitted 19 September, 2024; originally announced September 2024.

arXiv:2407.17891 [pdf, other]

doi 10.1051/0004-6361/202451665

Role of NH3 Binding Energy in the Early Evolution of Protostellar Cores

Authors: S. Kakkenpara Suresh, O. Sipila, P. Caselli, F. Dulieu

Abstract: NH$_{3}$ (ammonia) plays a critical role in the chemistry of star and planet formation, yet uncertainties in its binding energy (BE) values complicate accurate estimates of its abundances. Recent research suggests a multi-binding energy approach, challenging the previous single-value notion. In this work, we use different values of NH$_{3}$ binding energy to examine its effects on the NH$_{3}$ abundances and, consequently, on the early evolution of protostellar cores. Using a gas-grain chemical network, we systematically vary the values of NH$_{3}$ binding energies in a model Class 0 protostellar core and study the effects of these binding energies on the NH$_{3}$ abundances. Our simulations indicate that abundance profiles of NH$_{3}$ are highly sensitive to the binding energy used, particularly in the warmer inner regions of the core. Higher binding energies lead to lower gas-phase NH$_{3}$ abundances, while lower values of binding energy have the opposite effect. Furthermore, this BE-dependent abundance variation of NH$_{3}$ significantly affects the formation pathways and abundances of key species such as HNC, HCN, and CN. Our tests also reveal that the size variation of the emitting region due to binding energy becomes discernible only with beam sizes of 10 arcsec or less. These findings underscore the importance of considering a range of binding energies in astrochemical models and highlight the need for higher resolution observations to better understand the subtleties of molecular cloud chemistry and star formation processes.

Submitted 25 July, 2024; originally announced July 2024.

Journal ref: A&A 696, A71 (2025)

arXiv:2406.10522 [pdf, other]

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning

Authors: Jifan Zhang, Lalit Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, Siddharth Suresh, Andrew Wagenmaker, Scott Sievert, Timothy Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak

Abstract: We present a novel multimodal preference dataset for creative tasks, consisting of over 250 million human ratings on more than 2.2 million captions, collected through crowdsourcing rating data for The New Yorker's weekly cartoon caption contest over the past eight years. This unique dataset supports the development and evaluation of multimodal large language models and preference-based fine-tuning algorithms for humorous caption generation. We propose novel benchmarks for judging the quality of model-generated captions, utilizing both GPT4 and human judgments to establish ranking-based evaluation strategies. Our experimental results highlight the limitations of current fine-tuning methods, such as RLHF and DPO, when applied to creative tasks. Furthermore, we demonstrate that even state-of-the-art models like GPT4 and Claude currently underperform top human contestants in generating humorous captions. As we conclude this extensive data collection effort, we release the entire preference dataset to the research community, fostering further advancements in AI humor generation and evaluation.

Submitted 18 December, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.00204 [pdf]

doi 10.1103/PhysRevMaterials.8.123605

Learning from metastable grain boundaries

Authors: Avanish Mishra, Sumit A. Suresh, Saryu J. Fensin, Nithin Mathew, Edward M. Kober

Abstract: Grain boundaries (GBs) govern critical properties of polycrystals. Although significant advancements have been made in characterizing minimum energy GBs, real GBs are seldom found in such states, making it challenging to establish structure-property relationships. This diversity of atomic arrangements in metastable states motivates using data-driven methods to establish these relationships. In this study, we utilize a vast atomistic database (~5000) of minimum energy and metastable states of symmetric tilt copper GBs, combined with physically-motivated local atomic environment (LAE) descriptors (Strain Functional Descriptors, SFDs) to predict GB properties. Our regression models exhibit robust predictive capabilities using only 19 descriptors, generalizing to atomic environments in nanocrystals. A significant highlight of our work is the integration of an unsupervised method with SFDs to elucidate LAEs at GBs and their role in determining properties. Our research underscores the role of a physics-based representation of LAEs and the efficacy of data-driven methods in establishing GB structure-property relationships.

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.03029 [pdf, other]

Optimal Box Contraction for Solving Linear Systems via Simulated and Quantum Annealing

Authors: Sanjay Suresh, Krishnan Suresh

Abstract: Solving linear systems of equations is an important problem in science and engineering. Many quantum algorithms, such as the Harrow-Hassidim-Lloyd (HHL) algorithm (for quantum-gate computers) and the box algorithm (for quantum-annealing machines), have been proposed for solving such systems. The focus of this paper is on improving the efficiency of the box algorithm. The basic principle behind this algorithm is to transform the linear system into a series of quadratic unconstrained binary optimization (QUBO) problems, which are then solved on annealing machines. The computational efficiency of the box algorithm is entirely determined by the number of iterations, which, in turn, depends on the box contraction ratio, typically set to 0.5. Here, we show through theory that a contraction ratio of 0.5 is sub-optimal and that we can achieve a speed-up with a contraction ratio of 0.2. This is confirmed through numerical experiments where a speed-up between $20\%$ and $60\%$ is observed when the optimal contraction ratio is used.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2404.14462 [pdf, other]

Towards smaller, faster decoder-only transformers: Architectural variants and their implications

Authors: Sathya Krishnan Suresh, Shunmugapriya P

Abstract: In recent times, the research on Large Language Models (LLMs) has grown exponentially, predominantly focusing on models underpinned by the transformer architecture, as established by [1], and further developed through the decoder-only variations by [2]. Contemporary efforts in this field primarily aim to enhance model capabilities by scaling up both the architecture and data volumes utilized during training. However, exploration into reducing these model sizes while preserving their efficacy remains scant. In this study, we introduce three modifications to the decoder-only transformer architecture, namely ParallelGPT (pgpt), LinearGPT (lgpt), and ConvGPT (cgpt). These variants demonstrate comparable performance to the conventional architecture in language generation, yet benefit from reduced model sizes and faster training processes. We open-source the model weights and the complete codebase for these implementations for further research.

Submitted 8 October, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 10 pages, 6 figures

arXiv:2402.18105 [pdf, ps, other]

JEL ratio test for independence between a continuous and a categorical random variable

Authors: Saparya Suresh, Sudheesh K. Kattumannil

Abstract: The categorical Gini covariance is a dependence measure between a numerical variable and a categorical variable. The Gini covariance measures dependence by quantifying the difference between the conditional and unconditional distribution functions. The categorical Gini covariance equals zero if and only if the numerical variable and the categorical variable are independent. We propose a non-parametric test for testing the independence between a numerical and a categorical variable using a modified categorical Gini covariance. We use the theory of U-statistics to find the test statistic and study its properties. The test has an asymptotic normal distribution. Since the implementation of a normal-based test is difficult, we develop a jackknife empirical likelihood (JEL) ratio test for testing independence. Extensive Monte Carlo simulation studies are carried out to validate the performance of the proposed JEL ratio test. We illustrate the test procedure using the Iris flower data set.

Submitted 19 September, 2025; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: This is the first test developed in this direction

arXiv:2402.05086 [pdf, other]

Hyperspectral acquisition with ScanImage at the single pixel level: Application to time domain coherent Raman imaging

Authors: Samuel Metais, Sisira Suresh, Paulo Diniz, Siddarth Shivkumar, Randy Bartels, Nicolas Forget, Hervé Rigneault

Abstract: We present a comprehensive strategy and its practical implementation using the commercial ScanImage software platform to perform hyperspectral point scanning microscopy when a fast time-dependent signal varies at each pixel level. In the proposed acquisition scheme, the scan along the X axis is slowed down while the data acquisition is maintained at a high pace to enable the rapid acquisition of the time-dependent signal at each pixel level. The raw 2D images generated by ScanImage have a very asymmetric aspect ratio between X and Y, with the X axis encoding both space and time acquisition. The result is an X-axis macro-pixel in which the associated time-dependent signal is sampled, thereby providing hyperspectral information. We exemplify the proposed hyperspectral scheme in the context of time domain coherent Raman imaging, where a pump pulse impulsively excites molecular vibrations that are subsequently probed by a time-delayed probe pulse. In this case the time-dependent signal is a fast acousto-optics delay line that can scan a delay of 4.5ps in 25$μ$s, at each pixel level. With this acquisition scheme we demonstrate ultra-fast hyperspectral vibrational imaging in the low frequency range [10 $cm^{-1}$, 150 $cm^{-1}$] over a 500 $μm$ field of view in 14ms (7 frames/s). The proposed acquisition scheme can be readily extended to other applications that require acquiring a fast-evolving signal at each pixel level.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2401.09816 [pdf, other]

Jackknife empirical likelihood ratio test for testing the equality of semivariance

Authors: Saparya Suresh, Sudheesh K. Kattumannil

Abstract: Semivariance is a measure of the dispersion of all observations that fall above the mean or target value of a random variable and it plays an important role in life-length, actuarial and income studies. In this paper, we develop a new non-parametric test for equality of upper semivariance. We use the U-statistic theory to derive the test statistic and then study the asymptotic properties of the test statistic. We also develop a jackknife empirical likelihood (JEL) ratio test for equality of upper semivariance. Extensive Monte Carlo simulation studies are carried out to validate the performance of the proposed JEL-based test. We illustrate the test procedure using real data.

Submitted 18 January, 2024; originally announced January 2024.

MSC Class: 62G10

arXiv:2312.13469 [pdf, other]

Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation

Authors: Sudharshan Suresh, Haozhi Qi, Tingfan Wu, Taosha Fan, Luis Pineda, Mike Lambeta, Jitendra Malik, Mrinal Kalakrishnan, Roberto Calandra, Michael Kaess, Joseph Ortiz, Mustafa Mukadam

Abstract: To achieve human-level dexterity, robots must infer spatial awareness from multimodal sensing to reason over contact interactions. During in-hand manipulation of novel objects, such spatial awareness involves estimating the object's pose and shape. The status quo for in-hand perception primarily employs vision, and restricts to tracking a priori known objects. Moreover, visual occlusion of objects in-hand is imminent during manipulation, preventing current systems from pushing beyond tasks without occlusion. We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation. Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem. We study multimodal in-hand perception in simulation and the real-world, interacting with different objects via a proprioception-driven policy. Our experiments show final reconstruction F-scores of $81$% and average pose drifts of $4.7\,\text{mm}$, further reduced to $2.3\,\text{mm}$ with known CAD models. Additionally, we observe that under heavy visual occlusion we can achieve up to $94$% improvements in tracking compared to vision-only methods. Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation. We release our evaluation dataset of 70 experiments, FeelSight, as a step towards benchmarking in this domain. Our neural representation driven by multimodal sensing can serve as a perception backbone towards advancing robot dexterity. Videos can be found on our project website https://suddhu.github.io/neural-feels/

Submitted 20 December, 2023; originally announced December 2023.

Comments: 43 pages, 20 figures, 1 table; https://suddhu.github.io/neural-feels/

arXiv:2311.18619 [pdf, other]

Experimental study of the binding energy of NH3 on different types of ice and its impact on the snow line of NH3 and H2O

Authors: S. Kakkenpara Suresh, F. Dulieu, J. Vitorino, P. Caselli

Abstract: N-bearing molecules (like N$_2$H$^+$ or NH$_3$) are excellent tracers of high-density, low-temperature regions like dense cloud cores and could shed light on snowlines in protoplanetary disks and the chemical evolution of comets. However, uncertainties exist about the grain surface chemistry of these molecules -- which could play an important role in their formation and evolution. This study explores experimentally the behaviour of NH$_3$ on surfaces mimicking grains under interstellar conditions alongside other major interstellar ice components (i.e., H$_2$O, CO, CO$_2$). We performed co-deposition experiments using the Ultra High Vacuum (UHV) setup VENUS (VErs des NoUvelles Syntheses) of NH$_3$ along with other adsorbates (here, H$_2$O, $^{13}$CO and CO$_2$) and performed Temperature Programmed Desorption (TPD) and Temperature Programmed-During Exposure Desorption (TP-DED) experiments. We obtained the binding energy (BE) distribution of NH$_3$ on crystalline ice (CI) and compact amorphous solid water (c-ASW) by analysing the TPD profiles of NH$_3$ on the substrates. Firstly, we observe a significant delay in the desorption and a decrease in the desorption rate of NH$_3$ when H$_2$O is introduced into the co-deposited mixture of NH$_3$-$^{13}$CO or NH$_3$-CO$_2$, an effect that is absent without H$_2$O. Secondly, H$_2$O traps nearly 5-9 per cent of the co-deposited NH$_3$, which is released during water's amorphous-to-crystalline phase change. Thirdly, for CI, we obtained a BE distribution between 3780 K and 4080 K, and for c-ASW between 3780 K and 5280 K -- using a pre-exponential factor A = 1.94$\times 10^{15}$/s.
We conclude that NH$_3$ behaviour is significantly influenced by the presence of H$_2$O due to the formation of hydrogen bonds, in line with quantum calculations. This interaction preserves NH$_3$ on grain surfaces to higher temperatures, making it available to the central protostar in protoplanetary disks. It also explains why NH$_3$ freeze-out in pre-stellar cores is efficient.

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.10127 [pdf, other]

Learning interactions to boost human creativity with bandits and GPT-4

Authors: Ara Vartanian, Xiaoxi Sun, Yun-Shiuan Chuang, Siddharth Suresh, Xiaojin Zhu, Timothy T. Rogers

Abstract: This paper considers how interactions with AI algorithms can boost human creative thought. We employ a psychological task that demonstrates limits on human creativity, namely semantic feature generation: given a concept name, respondents must list as many of its features as possible. Human participants typically produce only a fraction of the features they know before getting "stuck." In experiments with humans and with a language AI (GPT-4) we contrast behavior in the standard task versus a variant in which participants can ask for algorithmically-generated hints. Algorithm choice is administered by a multi-armed bandit whose reward indicates whether the hint helped generate more features. Humans and the AI show similar benefits from hints, and remarkably, bandits learning from AI responses prefer the same prompting strategy as those learning from human behavior. The results suggest that strategies for boosting human creativity via computer interactions can be learned by bandits run on groups of simulated participants.

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.09947 [pdf]

Natural Disaster Analysis using Satellite Imagery and Social-Media Data for Emergency Response Situations

Authors: Sukeerthi Mandyam, Shanmuga Priya MG, Shalini Suresh, Kavitha Srinivasan

Abstract: Disaster management is one of the most promising research areas because of its significant economic, environmental and social repercussions. This research focuses on analyzing different types of data (pre- and post-disaster satellite images and Twitter data) related to disaster management for in-depth analysis of location-wise emergency requirements. This research has been divided into two stages, namely, satellite image analysis and Twitter data analysis, followed by integration using location. The first stage involves pre- and post-disaster satellite image analysis of the location using a multi-class land cover segmentation technique based on the U-Net architecture. The second stage focuses on mapping the region with essential information about the disaster situation and immediate requirements for relief operations. The severely affected regions are demarcated and Twitter data is extracted using keywords specific to that location. The extraction of situational information from a large corpus of raw tweets adopts the Content Word based Tweet Summarization (COWTS) technique. An integration of these modules using real-time location-based mapping and a frequency analysis technique gathers multi-dimensional information in the event of a disaster, such as the Kerala and Mississippi floods that were analyzed and validated as test cases.
The novelty of this research lies in the application of segmented satellite images for disaster relief using highlighted land cover changes and the integration of Twitter data by mapping these region-specific filters to obtain a complete overview of the disaster.

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.09665 [pdf, other]

The Wisdom of Partisan Crowds: Comparing Collective Intelligence in Humans and LLM-based Agents

Authors: Yun-Shiuan Chuang, Siddharth Suresh, Nikunj Harlalka, Agam Goyal, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers

Abstract: Human groups are able to converge on more accurate beliefs through deliberation, even in the presence of polarization and partisan bias -- a phenomenon known as the "wisdom of partisan crowds." Generated agents powered by Large Language Models (LLMs) are increasingly used to simulate human collective behavior, yet few benchmarks exist for evaluating their dynamics against the behavior of human groups. In this paper, we examine the extent to which the wisdom of partisan crowds emerges in groups of LLM-based agents that are prompted to role-play as partisan personas (e.g., Democrat or Republican). We find that they not only display human-like partisan biases, but also converge to more accurate beliefs through deliberation as humans do. We then identify several factors that interfere with convergence, including the use of chain-of-thought prompting and a lack of detail in personas. Conversely, fine-tuning on human data appears to enhance convergence. These findings show the potential and limitations of LLM-based agents as a model of human collective intelligence.

Submitted 16 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.09618 [pdf, other]

Simulating Opinion Dynamics with Networks of LLM-based Agents

Authors: Yun-Shiuan Chuang, Agam Goyal, Nikunj Harlalka, Siddharth Suresh, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers

Abstract: Accurately simulating human opinion dynamics is crucial for understanding a variety of societal phenomena, including polarization and the spread of misinformation. However, the agent-based models (ABMs) commonly used for such simulations often over-simplify human behavior. We propose a new approach to simulating opinion dynamics based on populations of Large Language Models (LLMs). Our findings reveal a strong inherent bias in LLM agents towards producing accurate information, leading simulated agents to consensus in line with scientific reality. This bias limits their utility for understanding resistance to consensus views on issues like climate change. After inducing confirmation bias through prompt engineering, however, we observed opinion fragmentation in line with existing agent-based modeling and opinion dynamics research. These insights highlight the promise and limitations of LLM agents in this domain and suggest a path forward: refining LLMs with real-world discourse to better simulate the evolution of human beliefs.

Submitted 31 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.04592 [pdf, other]

On Characterizing the Evolution of Embedding Space of Neural Networks using Algebraic Topology

Authors: Suryaka Suresh, Bishshoy Das, Vinayak Abrol, Sumantra Dutta Roy

Abstract: We study how the topology of feature embedding space changes as it passes through the layers of a well-trained deep neural network (DNN) through Betti numbers. Motivated by existing studies using simplicial complexes on shallow fully connected networks (FCN), we present an extended analysis using cubical homology instead, with a variety of popular deep architectures and real image datasets. We demonstrate that as depth increases, a topologically complicated dataset is transformed into a simple one, resulting in Betti numbers attaining their lowest possible value. The rate of decay in topological complexity (as a metric) helps quantify the impact of architectural choices on the generalization ability. Interestingly, from a representation learning perspective, we highlight several invariances, such as topological invariance of (1) an architecture on similar datasets; (2) embedding space of a dataset for architectures of variable depth; (3) embedding space to input resolution/size; and (4) data sub-sampling. In order to further demonstrate the link between expressivity and the generalization capability of a network, we consider the task of ranking pre-trained models for a downstream classification task (transfer learning). Compared to existing approaches, the proposed metric has a better correlation with the accuracy actually achievable by fine-tuning the pre-trained model.

Submitted 9 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

arXiv:2310.14340 [pdf, other]

Social Commonsense-Guided Search Query Generation for Open-Domain Knowledge-Powered Conversations

Authors: Revanth Gangi Reddy, Hao Bai, Wentao Yao, Sharath Chandra Etagi Suresh, Heng Ji, ChengXiang Zhai

Abstract: Open-domain dialog involves generating search queries that help obtain relevant knowledge for holding informative conversations. However, it can be challenging to determine what information to retrieve when the user is passive and does not express a clear need or request. To tackle this issue, we present a novel approach that focuses on generating internet search queries that are guided by social commonsense. Specifically, we leverage a commonsense dialog system to establish connections related to the conversation topic, which subsequently guides our query generation. Our proposed framework addresses passive user interactions by integrating topic tracking, commonsense response generation and instruction-driven query generation. Through extensive evaluations, we show that our approach overcomes limitations of existing query generation techniques that rely solely on explicit dialog information, and produces search queries that are more relevant, specific, and compelling, ultimately resulting in more engaging responses.

Submitted 22 October, 2023; originally announced October 2023.

Comments: Accepted in EMNLP 2023 Findings

arXiv:2310.02388 [pdf, other]

Computing a Sparse Approximate Inverse on Quantum Annealing Machines

Authors: Sanjay Suresh, Krishnan Suresh

Abstract: Many engineering problems involve solving large linear systems of equations. Conjugate gradient (CG) is one of the most popular iterative methods for solving such systems. However, CG typically requires a good preconditioner to speed up convergence. One such preconditioner is the sparse approximate inverse (SPAI). In this paper, we explore the computation of an SPAI on quantum annealing machines by solving a series of quadratic unconstrained binary optimization (QUBO) problems. Numerical experiments are conducted using both well-conditioned and poorly-conditioned linear systems arising from a 2D finite difference formulation of the Poisson problem.

Submitted 3 October, 2023; originally announced October 2023.

Comments: 16 pages, 8 figures

arXiv:2310.02273 [pdf, other]

A New measure of income inequality

Authors: Sudheesh K Kattumannil, Saparya Suresh

Abstract: A new measure of income inequality that captures the heavy tail behavior of the income distribution is proposed. We discuss two different approaches to find the estimators of the proposed measure. We show that these estimators are consistent and have an asymptotically normal distribution. We also obtain a jackknife empirical likelihood (JEL) confidence interval of the income inequality measure. A Monte Carlo simulation study is conducted to evaluate the finite sample properties of the estimators and the JEL-based confidence interval. Finally, we use our measure to study the income inequality of three states in India.

Submitted 20 August, 2024; v1 submitted 28 September, 2023; originally announced October 2023.

arXiv:2309.09979 [pdf, other]

General In-Hand Object Rotation with Vision and Touch

Authors: Haozhi Qi, Brent Yi, Sudharshan Suresh, Mike Lambeta, Yi Ma, Roberto Calandra, Jitendra Malik

Abstract: We introduce RotateIt, a system that enables fingertip-based object rotation along multiple axes by leveraging multimodal sensory inputs. Our system is trained in simulation, where it has access to ground-truth object shapes and physical properties. Then we distill it to operate on realistic yet noisy simulated visuotactile and proprioceptive sensory inputs. These multimodal inputs are fused via a visuotactile transformer, enabling online inference of object shapes and physical properties during deployment. We show significant performance improvements over prior methods and the importance of visual and tactile sensing.

Submitted 28 September, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: CoRL 2023; Website: https://haozhi.io/rotateit/

arXiv:2308.13773 [pdf, other]

Solving the insecurity problem for assertions

Authors: R Ramanujam, Vaishnavi Sundararajan, S P Suresh

Abstract: In the symbolic verification of cryptographic protocols, a central problem is deciding whether a protocol admits an execution which leaks a designated secret to the malicious intruder. Rusinowitch & Turuani (2003) show that, when considering finitely many sessions, this "insecurity problem" is NP-complete. Central to their proof strategy is the observation that any execution of a protocol can be simulated by one where the intruder only communicates terms of bounded size. However, when we consider models where, in addition to terms, one can also communicate logical statements about terms, the analysis of the insecurity problem becomes tricky when both these inference systems are considered together. In this paper we consider the insecurity problem for protocols with logical statements that include equality on terms and existential quantification. Witnesses for existential quantifiers may be unbounded, and obtaining small witness terms while maintaining equality proofs complicates the analysis considerably. We extend techniques from Rusinowitch & Turuani (2003) to show that this problem is also in NP.

Submitted 26 January, 2024; v1 submitted 26 August, 2023; originally announced August 2023.

arXiv:2307.16393 [pdf, other]

Modular Self-Lock Origami: design, modeling, and simulation to improve the performance of a rotational joint

Authors: Samira Zare, Alex Spaeth, Sandya Suresh, Mircea Teodorescu

Abstract: Origami structures have been widely explored in robotics due to their many potential advantages. Origami robots can be very compact, as well as cheap and efficient to produce. In particular, they can be constructed in a flat format using modern manufacturing techniques. Rotational motion is essential for robotics, and a variety of origami rotational joints have been proposed in the literature. However, few of these are even approximately flat-foldable. One potential enabler of flat origami rotational joints is the inclusion of lightweight pneumatic pouches which actuate the origami's folds; however, pouch actuators only enable a relatively small amount of rotational displacement. The previously proposed Four-Vertex Origami is a flat-foldable structure which provides an angular multiplier for a pouch actuator, but suffers from a degenerate state. This paper presents a novel rigid origami, the Self-Lock Origami, which eliminates this degeneracy by slightly relaxing the assumption of flat-foldability. This joint is analysed in terms of a trade-off between the angular multiplier and the mechanical advantage. Furthermore, the Self-Lock Origami is a modular joint which can be connected to similar or different joints to produce complex movements for various applications; three different manipulator designs are introduced as a proof of concept.

Submitted 30 July, 2023; originally announced July 2023.

Comments: 11 pages, 8 figures

arXiv:2306.15236 [pdf, other]

doi 10.1088/1367-2630/acd94e

Towards Stirling engine using an optically confined particle subjected to asymmetric temperature profile

Authors: Gokul Nalupurackal, Muruga Lokesh, Sarangi Suresh, Srestha Roy, Snigdhadev Chakraborty, Jayesh Goswami, Arnab Pal, Basudev Roy

Abstract: The realization of microscopic heat engines has attracted a surge of research interest in statistical physics, soft matter, and biological physics. A typical microscopic heat engine employs a colloidal particle trapped in a confining potential, which is modulated in time to mimic the cycle operations. Here, we use a lanthanide-doped upconverting particle (UCP) suspended in a passive aqueous bath, which is highly absorptive at 975 nm and converts NIR photons to visible, as the working substance of the engine. When a single UCP is optically trapped with a 975 nm laser, it behaves like an active particle by executing motion subjected to an asymmetric temperature profile along the direction of propagation of the laser. The strong absorption of 975 nm light by the particle introduces a temperature gradient and results in significant thermophoretic diffusion along the temperature gradient. However, the activity of the particle vanishes when the trapping wavelength is switched to 1064 nm. We carefully regulate the wavelength-dependent activity of the particle to engineer all four cycles of a Stirling engine by using a combination of 1064 nm and 975 nm wavelengths. Since the motion of the particle is stochastic, the work done on the particle due to the stiffness modulation per cycle is random. We provide a statistical estimate of this work averaged over 5 cycles, which can be extended towards several cycles to make a Stirling engine.
Our experiment proposes a robust set-up to systematically harness temperature, which is a crucial factor behind building microscopic engines.

Submitted 27 June, 2023; originally announced June 2023.

Comments: For published version, see https://iopscience.iop.org/article/10.1088/1367-2630/acd94e/meta

Journal ref: New J. Phys. 25 063001 (2023)

arXiv:2304.05591 [pdf, other]

Semantic Feature Verification in FLAN-T5

Authors: Siddharth Suresh, Kushin Mukherjee, Timothy T. Rogers

Abstract: This study evaluates the potential of a large language model for aiding in generation of semantic feature norms - a critical tool for evaluating conceptual structure in cognitive science. Building from an existing human-generated dataset, we show that machine-verified norms capture aspects of conceptual structure beyond what is expressed in human norms alone, and better explain human judgments of semantic similarity amongst items that are distally related. The results suggest that LLMs can greatly enhance traditional methods of semantic feature norm verification, with implications for our understanding of conceptual representation in humans and machines.

Submitted 11 April, 2023; originally announced April 2023.

Comments: To appear as a Tiny Paper at ICLR 2023

arXiv:2304.05012 [pdf, other]

Human-machine cooperation for semantic feature listing

Authors: Kushin Mukherjee, Siddharth Suresh, Timothy T. Rogers

Abstract: Semantic feature norms, lists of features that concepts do and do not possess, have played a central role in characterizing human conceptual knowledge, but require extensive human labor. Large language models (LLMs) offer a novel avenue for the automatic generation of such feature lists, but are prone to significant error. Here, we present a new method for combining a learned model of human lexical-semantics from limited data with LLM-generated data to efficiently generate high-quality feature norms.

Submitted 11 April, 2023; originally announced April 2023.

Comments: To be published in the ICLR TinyPaper track

arXiv:2304.02754 [pdf, other]

Conceptual structure coheres in human cognition but not in large language models

Authors: Siddharth Suresh, Kushin Mukherjee, Xizheng Yu, Wei-Chun Huang, Lisa Padua, Timothy T Rogers

Abstract: Neural network models of language have long been used as a tool for developing hypotheses about conceptual representation in the mind and brain. For many years, such use involved extracting vector-space representations of words and using distances among these to predict or understand human behavior in various semantic tasks. Contemporary large language models (LLMs), however, make it possible to interrogate the latent structure of conceptual representations using experimental methods nearly identical to those commonly used with human participants. The current work utilizes three common techniques borrowed from cognitive psychology to estimate and compare the structure of concepts in humans and a suite of LLMs. In humans, we show that conceptual structure is robust to differences in culture, language, and method of estimation. Structures estimated from LLM behavior, while individually fairly consistent with those estimated from human behavior, vary much more depending upon the particular task used to generate responses--across tasks, estimates of conceptual structure from the very same model cohere less with one another than do human structure estimates. These results highlight an important difference between contemporary LLMs and human cognition, with implications for understanding some fundamental limitations of contemporary machine language.

Submitted 10 November, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

arXiv:2304.01396 [pdf, other]

Lidar based 3D Tracking and State Estimation of Dynamic Objects

Authors: Patil Shubham Suresh, Gautham Narayan Narasimhan

Abstract: State estimation of oncoming vehicles: Earlier research has been based on determining states like position, velocity, orientation, angular velocity, etc. of the ego vehicle. Our approach focuses on estimating the states of non-ego vehicles, which is crucial for motion planning and decision-making. Dynamic scene based localization: Our project will work on dynamic scenes with moving ego (self) and non-ego vehicles. Previous methods were focused on static environments.

Submitted 3 April, 2023; originally announced April 2023.

Comments: 6 pages, 12 figures, Carnegie Mellon University work

Showing 1–50 of 95 results for author: Suresh, S