-
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models
Authors:
Liyan Tang,
Grace Kim,
Xinyu Zhao,
Thom Lake,
Wenxuan Ding,
Fangcong Yin,
Prasann Singhal,
Manya Wadhwa,
Zeyu Leo Liu,
Zayne Sprague,
Ramya Namuduri,
Bodun Hu,
Juan Diego Rodriguez,
Puyuan Peng,
Greg Durrett
Abstract:
Chart understanding presents a unique challenge for large vision-language models (LVLMs), as it requires the integration of sophisticated textual and visual reasoning capabilities. However, current LVLMs exhibit a notable imbalance between these skills, falling short on visual reasoning that is difficult to perform in text. We conduct a case study using a synthetic dataset solvable only through visual reasoning and show that model performance degrades significantly with increasing visual complexity, while human performance remains robust. We then introduce ChartMuseum, a new Chart Question Answering (QA) benchmark containing 1,162 expert-annotated questions spanning multiple reasoning types, curated from real-world charts across 184 sources, specifically built to evaluate complex visual and textual reasoning. Unlike prior chart understanding benchmarks -- where frontier models perform similarly and near saturation -- our benchmark exposes a substantial gap between model and human performance, while effectively differentiating model capabilities: although humans achieve 93% accuracy, the best-performing model Gemini-2.5-Pro attains only 63.0%, and the leading open-source LVLM Qwen2.5-VL-72B-Instruct achieves only 38.5%. Moreover, on questions requiring primarily visual reasoning, all models experience a 35%-55% performance drop from text-reasoning-heavy question performance. Lastly, our qualitative error analysis reveals specific categories of visual reasoning that are challenging for current LVLMs.
Submitted 19 May, 2025;
originally announced May 2025.
-
EvalAgent: Discovering Implicit Evaluation Criteria from the Web
Authors:
Manya Wadhwa,
Zayne Sprague,
Chaitanya Malaviya,
Philippe Laban,
Junyi Jessy Li,
Greg Durrett
Abstract:
Evaluation of language model outputs on structured writing tasks is typically conducted with a number of desirable criteria presented to human evaluators or large language models (LLMs). For instance, on a prompt like "Help me draft an academic talk on coffee intake vs research productivity", a model response may be evaluated for criteria like accuracy and coherence. However, high-quality responses should do more than just satisfy basic task requirements. An effective response to this query should include quintessential features of an academic talk, such as a compelling opening, clear research questions, and a takeaway. To help identify these implicit criteria, we introduce EvalAgent, a novel framework designed to automatically uncover nuanced and task-specific criteria. EvalAgent first mines expert-authored online guidance. It then uses this evidence to propose diverse, long-tail evaluation criteria that are grounded in reliable external sources. Our experiments demonstrate that the grounded criteria produced by EvalAgent are often implicit (not directly stated in the user's prompt), yet specific (high degree of lexical precision). Further, EvalAgent criteria are often not satisfied by initial responses but they are actionable, such that responses can be refined to satisfy them. Finally, we show that combining LLM-generated and EvalAgent criteria uncovers more human-valued criteria than using LLMs alone.
Submitted 21 April, 2025;
originally announced April 2025.
-
Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation
Authors:
Tuhina Tripathi,
Manya Wadhwa,
Greg Durrett,
Scott Niekum
Abstract:
Large Language Models (LLMs) are widely used as proxies for human labelers in both training (Reinforcement Learning from AI Feedback) and large-scale response evaluation (LLM-as-a-judge). Alignment and evaluation are critical components in the development of reliable LLMs, and the choice of feedback protocol plays a central role in both but remains understudied. In this work, we show that the choice of feedback protocol (absolute scores versus relative preferences) can significantly affect evaluation reliability and induce systematic biases. In particular, we show that pairwise evaluation protocols are more vulnerable to distracted evaluation. Generator models can exploit spurious attributes (or distractor features) favored by the LLM judge, resulting in inflated scores for lower-quality outputs and misleading training signals. We find that absolute scoring is more robust to such manipulation, producing judgments that better reflect response quality and are less influenced by distractor features. Our results demonstrate that generator models can flip preferences by embedding distractor features, skewing LLM-as-a-judge comparisons and leading to inaccurate conclusions about model quality in benchmark evaluations. Pairwise preferences flip in about 35% of the cases, compared to only 9% for absolute scores. We offer recommendations for choosing feedback protocols based on dataset characteristics and evaluation objectives.
Submitted 20 April, 2025;
originally announced April 2025.
-
QUDsim: Quantifying Discourse Similarities in LLM-Generated Text
Authors:
Ramya Namuduri,
Yating Wu,
Anshun Asher Zheng,
Manya Wadhwa,
Greg Durrett,
Junyi Jessy Li
Abstract:
As large language models become increasingly capable at various writing tasks, their weakness at generating unique and creative content becomes a major liability. Although LLMs have the ability to generate text covering diverse topics, there is an overall sense of repetitiveness across texts that we aim to formalize and quantify via a similarity metric. The familiarity between documents arises from the persistence of underlying discourse structures. However, existing similarity metrics dependent on lexical overlap and syntactic patterns largely capture $\textit{content}$ overlap, thus making them unsuitable for detecting $\textit{structural}$ similarities. We introduce an abstraction based on linguistic theories in Questions Under Discussion (QUD) and question semantics to help quantify differences in discourse progression. We then use this framework to build $\textbf{QUDsim}$, a similarity metric that can detect discursive parallels between documents. Using QUDsim, we find that LLMs often reuse discourse structures (more so than humans) across samples, even when content differs. Furthermore, LLMs are not only repetitive and structurally uniform, but are also divergent from human authors in the types of structures they use.
Submitted 12 April, 2025;
originally announced April 2025.
-
Strange quark stars in modified vector MIT bag model: role of $ρ$ and $φ$ mesons
Authors:
Mukul Wadhwa,
Manisha Kumari,
Arvind Kumar
Abstract:
In the present work, we study the properties of strange quark stars (SQSs) using the vector MIT bag model with modifications in the vector channels. Unlike recent studies, which only consider interactions through $ω$ mesons, we analyze the possibility of $ρ$ and $φ$ vector channels. We consider two types of higher-order non-linear self-interaction terms for the vector mesons. With these modifications, we compute the equation of state (EoS) and the mass-radius relation of strange stars for different values of the vector coupling strength. We also calculate the tidal deformability, the Love number $k_2$, and the gravitational redshift of SQSs.
Submitted 19 January, 2025;
originally announced January 2025.
-
The Ni isotopic composition of Ryugu reveals a common accretion region for carbonaceous chondrites
Authors:
Fridolin Spitzer,
Thorsten Kleine,
Christoph Burkhardt,
Timo Hopp,
Tetsuya Yokoyama,
Yoshinari Abe,
Jérôme Aléon,
Conel M. O'D. Alexander,
Sachiko Amari,
Yuri Amelin,
Ken-ichi Bajo,
Martin Bizzarro,
Audrey Bouvier,
Richard W. Carlson,
Marc Chaussidon,
Byeon-Gak Choi,
Nicolas Dauphas,
Andrew M. Davis,
Tommaso Di Rocco,
Wataru Fujiya,
Ryota Fukai,
Ikshu Gautam,
Makiko K. Haba,
Yuki Hibiya,
Hiroshi Hidaka
, et al. (66 additional authors not shown)
Abstract:
The isotopic compositions of samples returned from Cb-type asteroid Ryugu and Ivuna-type (CI) chondrites are distinct from other carbonaceous chondrites, which has led to the suggestion that Ryugu and CI chondrites formed in a different region of the accretion disk, possibly around the orbits of Uranus and Neptune. We show that, like for Fe, Ryugu and CI chondrites also have indistinguishable Ni isotope anomalies, which differ from those of other carbonaceous chondrites. We propose that this unique Fe and Ni isotopic composition reflects different accretion efficiencies of small FeNi metal grains among the carbonaceous chondrite parent bodies. The CI chondrites incorporated these grains more efficiently, possibly because they formed at the end of the disk's lifetime, when planetesimal formation was also triggered by photoevaporation of the disk. Isotopic variations among carbonaceous chondrites may thus reflect fractionation of distinct dust components from a common reservoir, implying CI chondrites and Ryugu may have formed in the same region of the accretion disk as other carbonaceous chondrites.
Submitted 5 October, 2024;
originally announced October 2024.
-
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Authors:
Zayne Sprague,
Fangcong Yin,
Juan Diego Rodriguez,
Dongwei Jiang,
Manya Wadhwa,
Prasann Singhal,
Xinyu Zhao,
Xi Ye,
Kyle Mahowald,
Greg Durrett
Abstract:
Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs). But for what kinds of tasks is this extra "thinking" really helpful? To analyze this, we conducted a quantitative meta-analysis covering over 100 papers using CoT and ran our own evaluations of 20 datasets across 14 models. Our results show that CoT gives strong performance benefits primarily on tasks involving math or logic, with much smaller gains on other types of tasks. On MMLU, directly generating the answer without CoT leads to almost identical accuracy as CoT unless the question or model's response contains an equals sign, indicating symbolic operations and reasoning. Following this finding, we analyze the behavior of CoT on these problems by separating planning and execution and comparing against tool-augmented LLMs. Much of CoT's gain comes from improving symbolic execution, but it underperforms relative to using a symbolic solver. Our results indicate that CoT can be applied selectively, maintaining performance while saving inference costs. Furthermore, they suggest a need to move beyond prompt-based CoT to new paradigms that better leverage intermediate computation across the whole range of LLM applications.
Submitted 7 May, 2025; v1 submitted 18 September, 2024;
originally announced September 2024.
-
Learning to Refine with Fine-Grained Natural Language Feedback
Authors:
Manya Wadhwa,
Xinyu Zhao,
Junyi Jessy Li,
Greg Durrett
Abstract:
Recent work has explored the capability of large language models (LLMs) to identify and correct errors in LLM-generated responses. These refinement approaches frequently evaluate what sizes of models are able to do refinement for what problems, but less attention is paid to what effective feedback for refinement looks like. In this work, we propose looking at refinement with feedback as a composition of three distinct LLM competencies: (1) detection of bad generations; (2) fine-grained natural language critique generation; (3) refining with fine-grained feedback. The first step can be implemented with a high-performing discriminative model and steps 2 and 3 can be implemented either via prompted or fine-tuned LLMs. A key property of the proposed Detect, Critique, Refine ("DCR") method is that the step 2 critique model can give fine-grained feedback about errors, made possible by offloading the discrimination to a separate model in step 1. We show that models of different capabilities benefit from refining with DCR on the task of improving factual consistency of document grounded summaries. Overall, DCR consistently outperforms existing end-to-end refinement approaches and current trained models not fine-tuned for factuality critiquing.
Submitted 3 October, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
Using Natural Language Explanations to Rescale Human Judgments
Authors:
Manya Wadhwa,
Jifan Chen,
Junyi Jessy Li,
Greg Durrett
Abstract:
The rise of large language models (LLMs) has brought a critical need for high-quality human-labeled data, particularly for processes like human feedback and evaluation. A common practice is to label data via consensus annotation over human judgments. However, annotators' judgments for subjective tasks can differ in many ways: they may reflect different qualitative judgments about an example, and they may be mapped to a labeling scheme in different ways. We show that these nuances can be captured by natural language explanations, and propose a method to rescale ordinal annotations and explanations using LLMs. Specifically, we feed annotators' Likert ratings and corresponding explanations into an LLM and prompt it to produce a numeric score anchored in a scoring rubric. These scores should reflect the annotators' underlying assessments of the example. The rubric can be designed or modified after annotation, and include distinctions that may not have been known when the original error taxonomy was devised. We explore our technique in the context of rating system outputs for a document-grounded question answering task, where LLMs achieve near-human performance. Our method rescales the raw judgments without impacting agreement and brings the scores closer to human judgments grounded in the same scoring rubric.
Submitted 9 September, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
-
PFSL: Personalized & Fair Split Learning with Data & Label Privacy for thin clients
Authors:
Manas Wadhwa,
Gagan Raj Gupta,
Ashutosh Sahu,
Rahul Saini,
Vidhi Mittal
Abstract:
The traditional framework of federated learning (FL) requires each client to re-train their models in every iteration, making it infeasible for resource-constrained mobile devices to train deep-learning (DL) models. Split learning (SL) provides an alternative by using a centralized server to offload the computation of activations and gradients for a subset of the model but suffers from problems of slow convergence and lower accuracy. In this paper, we implement PFSL, a new framework of distributed split learning where a large number of thin clients perform transfer learning in parallel, starting with a pre-trained DL model without sharing their data or labels with a central server. We implement a lightweight step of personalization of client models to provide high performance for their respective data distributions. Furthermore, we evaluate performance fairness amongst clients under a work fairness constraint for various scenarios of non-i.i.d. data distributions and unequal sample sizes. Our accuracy far exceeds that of current SL algorithms and is very close to that of centralized learning on several real-life benchmarks. It has a very low computation cost compared to FL variants and promises to deliver the full benefits of DL to extremely thin, resource-constrained clients.
Submitted 19 March, 2023;
originally announced March 2023.
-
Presolar stardust in asteroid Ryugu
Authors:
Jens Barosch,
Larry R. Nittler,
Jianhua Wang,
Conel M. O'D. Alexander,
Bradley T. De Gregorio,
Cécile Engrand,
Yoko Kebukawa,
Kazuhide Nagashima,
Rhonda M. Stroud,
Hikaru Yabuta,
Yoshinari Abe,
Jérôme Aléon,
Sachiko Amari,
Yuri Amelin,
Ken-ichi Bajo,
Laure Bejach,
Martin Bizzarro,
Lydie Bonal,
Audrey Bouvier,
Richard W. Carlson,
Marc Chaussidon,
Byeon-Gak Choi,
George D. Cody,
Emmanuel Dartois,
Nicolas Dauphas
, et al. (99 additional authors not shown)
Abstract:
We have conducted a NanoSIMS-based search for presolar material in samples recently returned from C-type asteroid Ryugu as part of JAXA's Hayabusa2 mission. We report the detection of all major presolar grain types with O- and C-anomalous isotopic compositions typically identified in carbonaceous chondrite meteorites: 1 silicate, 1 oxide, 1 O-anomalous supernova grain of ambiguous phase, 38 SiC, and 16 carbonaceous grains. At least two of the carbonaceous grains are presolar graphites, whereas several grains with moderate C isotopic anomalies are probably organics. The presolar silicate was located in a clast with a less altered lithology than the typical extensively aqueously altered Ryugu matrix. The matrix-normalized presolar grain abundances in Ryugu are 4.8$^{+4.7}_{-2.6}$ ppm for O-anomalous grains, 25$^{+6}_{-5}$ ppm for SiC grains and 11$^{+5}_{-3}$ ppm for carbonaceous grains. Ryugu is isotopically and petrologically similar to carbonaceous Ivuna-type (CI) chondrites. To compare the in situ presolar grain abundances of Ryugu with CI chondrites, we also mapped Ivuna and Orgueil samples and found a total of SiC grains and 6 carbonaceous grains. No O-anomalous grains were detected. The matrix-normalized presolar grain abundances in the CI chondrites are similar to those in Ryugu: 23 $^{+7}_{-6}$ ppm SiC and 9.0$^{+5.3}_{-4.6}$ ppm carbonaceous grains. Thus, our results provide further evidence in support of the Ryugu-CI connection. They also reveal intriguing hints of small-scale heterogeneities in the Ryugu samples, such as locally distinct degrees of alteration that allowed the preservation of delicate presolar material.
Submitted 16 August, 2022;
originally announced August 2022.
-
Fairness for Text Classification Tasks with Identity Information Data Augmentation Methods
Authors:
Mohit Wadhwa,
Mohan Bhambhani,
Ashvini Jindal,
Uma Sawant,
Ramanujam Madhavan
Abstract:
Counterfactual fairness methods address the question: How would the prediction change if the sensitive identity attributes referenced in the text instance were different? These methods are entirely based on generating counterfactuals for the given training and test set instances. Counterfactual instances are commonly prepared by replacing sensitive identity terms, i.e., the identity terms present in the instance are replaced with other identity terms that fall under the same sensitive category. Therefore, the efficacy of these methods depends heavily on the quality and comprehensiveness of identity pairs. In this paper, we offer a two-step data augmentation process where (1) the first stage consists of a novel method for preparing a comprehensive list of identity pairs with word embeddings, and (2) the second stage leverages the prepared list of identity pairs to enhance the training instances by applying three simple operations (namely identity pair replacement, identity term blindness, and identity pair swap). We empirically show that the two-stage augmentation process leads to diverse identity pairs and an enhanced training set, with an improved counterfactual token-based fairness metric score on two well-known text classification tasks.
Submitted 4 February, 2022;
originally announced March 2022.
-
SSMF: Shifting Seasonal Matrix Factorization
Authors:
Koki Kawabata,
Siddharth Bhatia,
Rui Liu,
Mohit Wadhwa,
Bryan Hooi
Abstract:
Given taxi-ride counts between departure and destination locations, how can we forecast future demand? In general, given a data stream of events with seasonal patterns that innovate over time, how can we effectively and efficiently forecast future events? In this paper, we propose the Shifting Seasonal Matrix Factorization approach (SSMF), which can adaptively learn multiple seasonal patterns (called regimes) and switch between them. Our proposed method has the following properties: (a) it accurately forecasts future events by detecting regime shifts in seasonal patterns as the data stream evolves; (b) it works in an online setting, i.e., processes each observation in constant time and memory; (c) it effectively realizes regime shifts without human intervention by using a lossless data compression scheme. We demonstrate that our algorithm outperforms state-of-the-art baseline methods by accurately forecasting upcoming events on three real-world data streams.
Submitted 25 October, 2021;
originally announced October 2021.
-
Sketch-Based Anomaly Detection in Streaming Graphs
Authors:
Siddharth Bhatia,
Mohit Wadhwa,
Kenji Kawaguchi,
Neil Shah,
Philip S. Yu,
Bryan Hooi
Abstract:
Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges and subgraphs in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? For example, in intrusion detection, existing work seeks to detect either anomalous edges or anomalous subgraphs, but not both. In this paper, we first extend the count-min sketch data structure to a higher-order sketch. This higher-order sketch has the useful property of preserving the dense subgraph structure (dense subgraphs in the input turn into dense submatrices in the data structure). We then propose 4 online algorithms that utilize this enhanced data structure, which (a) detect both edge and graph anomalies; (b) process each edge and graph in constant memory and constant update time per newly arriving edge; and (c) outperform state-of-the-art baselines on 4 real-world datasets. Our method is the first streaming approach that incorporates dense subgraph search to detect graph anomalies in constant memory and time.
Submitted 13 July, 2023; v1 submitted 8 June, 2021;
originally announced June 2021.
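For readers unfamiliar with the count-min sketch named in the abstract above, the following is a minimal sketch of the standard data structure only, not the higher-order extension the paper proposes. The class name `CountMinSketch`, the `width` and `depth` defaults, and the SHA-256 salting scheme are illustrative assumptions, not taken from the paper.

```python
import hashlib


class CountMinSketch:
    """Standard count-min sketch: `depth` hash rows of `width` counters each.

    Illustrative only; this is NOT the paper's higher-order sketch.
    """

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash function per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        # Each update touches exactly one counter per row: constant time and memory.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum over rows
        # is the tightest available upper bound on the true count.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


# Hypothetical edge stream: each edge is a (source, destination) pair.
sketch = CountMinSketch()
for edge in [("a", "b"), ("a", "b"), ("b", "c")]:
    sketch.add(edge)
```

The estimate never undercounts, which is why the structure suits streaming settings where memory must stay fixed regardless of stream length.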
-
Directed Graph Representation through Vector Cross Product
Authors:
Ramanujam Madhavan,
Mohit Wadhwa
Abstract:
Graph embedding methods embed the nodes of a graph in a low-dimensional vector space while preserving graph topology, to support downstream tasks such as link prediction, node recommendation, and clustering. These tasks depend on a similarity measure between a pair of embeddings, such as cosine similarity or Euclidean distance, that is symmetric in nature and hence not well suited to directed graphs. Recent work on directed graphs, HOPE, APP, and NERD, proposed to preserve the direction of edges among nodes by learning two embeddings, source and target, for every node. However, these methods do not take into account the properties of directed edges explicitly. To understand the directional relation among nodes, we propose a novel approach that takes advantage of the non-commutative property of the vector cross product to learn embeddings that inherently preserve the direction of edges among nodes. We learn the node embeddings through a Siamese neural network where the cross-product operation is incorporated into the network architecture. Although the cross product between a pair of vectors is defined in three dimensions, the approach is extended to learn N-dimensional embeddings while maintaining the non-commutative property. In our empirical experiments on three real-world datasets, we observed that even very low-dimensional embeddings could effectively preserve the directional property while outperforming some of the state-of-the-art methods on link prediction and node recommendation tasks.
Submitted 20 October, 2020;
originally announced October 2020.
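The non-commutative property that the abstract above relies on can be checked directly in three dimensions: swapping the operands of a cross product negates the result, so an ordered pair (source, target) is distinguishable from (target, source). The snippet below is a minimal illustration of that property only; the helper `cross` and the example vectors are assumptions for illustration and do not reproduce the paper's Siamese network or its N-dimensional extension.

```python
def cross(u, v):
    """Cross product of two 3-dimensional vectors given as tuples."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


# Hypothetical 3-dimensional embeddings for two nodes a and b.
a = (1, 2, 3)
b = (4, 5, 6)

forward = cross(a, b)   # direction a -> b
backward = cross(b, a)  # direction b -> a, the negation of `forward`
```

A symmetric measure such as cosine similarity would return the same value for both orderings, which is the limitation the abstract points out.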
-
The Case for Non-Cryogenic Comet Nucleus Sample Return
Authors:
Keiko Nakamura-Messenger,
Alexander G. Hayes,
Scott Sandford,
Carol Raymond,
Steven W. Squyres,
Larry R. Nittler,
Samuel Birch,
Denis Bodewits,
Nancy Chabot,
Meenakshi Wadhwa,
Mathieu Choukroun,
Simon J. Clemett,
Maitrayee Bose,
Neil Dello Russo,
Jason P. Dworkin,
Jamie E. Elsila,
Kenton Fisher,
Perry Gerakines,
Daniel P. Glavin,
Julie Mitchell,
Michael Mumma,
Ann. N. Nguyen,
Lisa Pace,
Jason Soderblom,
Jessica M. Sunshine
Abstract:
Comets hold answers to mysteries of the Solar System by recording presolar history, the initial states of planet formation and prebiotic organics and volatiles to the early Earth. Analysis of returned samples from a comet nucleus will provide unparalleled knowledge about the Solar System starting materials and how they came together to form planets and give rise to life:
1. How did comets form?
2. Is comet material primordial, or has it undergone a complex alteration history?
3. Does aqueous alteration occur in comets?
4. What is the composition of cometary organics?
5. Did comets supply a substantial fraction of Earth's volatiles?
6. Did cometary organics contribute to the homochirality in life on Earth?
7. How do complex organic molecules form and evolve in interstellar, nebular, and planetary environments?
8. What can comets tell us about the mixing of materials in the protosolar nebula?
Submitted 29 September, 2020;
originally announced September 2020.
-
Volatile Sample Return in the Solar System
Authors:
Stefanie N. Milam,
Jason P. Dworkin,
Jamie E. Elsila,
Daniel P. Glavin,
Perry A. Gerakines,
Julie L. Mitchell,
Keiko Nakamura-Messenger,
Marc Neveu,
Larry Nittler,
James Parker,
Elisa Quintana,
Scott A. Sandford,
Joshua E. Schlieder,
Rhonda Stroud,
Melissa G. Trainer,
Meenakshi Wadhwa,
Andrew J. Westphal,
Michael Zolensky,
Dennis Bodewits,
Simon Clemett
Abstract:
We advocate for the realization of volatile sample return from various destinations including: small bodies, the Moon, Mars, ocean worlds/satellites, and plumes. As part of recent mission studies (e.g., Comet Astrobiology Exploration SAmple Return (CAESAR) and Mars Sample Return), new concepts, technologies, and protocols have been considered for specific environments and cost. Here we provide a plan for volatile sample collection and identify the associated challenges with the environment, transit/storage, Earth re-entry, and curation. Laboratory and theoretical simulations are proposed to verify sample integrity during each mission phase. Sample collection mechanisms are evaluated for a given environment with consideration for alteration. Transport and curation are essential for sample return to maximize the science investment and ensure pristine samples for analysis upon return and after years of preservation. All aspects of a volatile sample return mission are driven by the science motivation: isotope fractionation, noble gases, organics and prebiotic species; plus planetary protection considerations for collection and for the sample.
The science value of sample return missions has been clearly demonstrated by previous sample return programs and missions.
Sample return of volatile material is key to understanding (exo)planet formation, evolution, and habitability.
Returning planetary volatiles poses unique and potentially severe technical challenges. These include preventing changes to samples between (and including) collection and analyses, and meeting planetary protection requirements.
Submitted 29 July, 2020;
originally announced July 2020.
-
Fairness-Aware Learning with Prejudice Free Representations
Authors:
Ramanujam Madhavan,
Mohit Wadhwa
Abstract:
Machine learning models are extensively being used to make decisions that have a significant impact on human life. These models are trained over historical data that may contain information about sensitive attributes such as race, sex, religion, etc. The presence of such sensitive attributes can impact certain population subgroups unfairly. It is straightforward to remove sensitive features from the data; however, a model could pick up prejudice from latent sensitive attributes that may exist in the training data. This has led to the growing apprehension about the fairness of the employed models. In this paper, we propose a novel algorithm that can effectively identify and treat latent discriminating features. The approach is agnostic of the learning algorithm and generalizes well for classification as well as regression tasks. It can also be used as a key aid in proving that the model is free of discrimination towards regulatory compliance if the need arises. The approach helps to collect discrimination-free features that would improve the model performance while ensuring the fairness of the model. The experimental results from our evaluations on publicly available real-world datasets show a near-ideal fairness measurement in comparison to other methods.
Submitted 26 February, 2020;
originally announced February 2020.
-
Group Affect Prediction Using Multimodal Distributions
Authors:
Saqib Shamsi,
Bhanu Pratap Singh Rawat,
Manya Wadhwa
Abstract:
We describe our approach towards building an efficient predictive model to detect emotions for a group of people in an image. We propose that training a Convolutional Neural Network (CNN) model on the emotion heatmaps extracted from the image outperforms a CNN model trained entirely on the raw images. The models are compared on a recently published dataset from the Emotion Recognition in the Wild (EmotiW) challenge, 2017. The proposed method achieved a validation accuracy of 55.23%, which is 2.44% above the baseline accuracy provided by the EmotiW organizers.
Submitted 12 March, 2018; v1 submitted 17 September, 2017;
originally announced October 2017.
-
Rules in Play: On the Complexity of Routing Tables and Firewalls
Authors:
Mohit Wadhwa,
Ambar Pal,
Ayush Shah,
Paritosh Mittal,
H. B. Acharya
Abstract:
A fundamental component of networking infrastructure is the policy, used in routing tables and firewalls. Accordingly, there has been extensive study of policies. However, the theory of such policies indicates that the size of the decision tree for a policy is very large (O((2n)^d), where the policy has n rules and examines d features of packets). If this were indeed the case, the existing algorithms to detect anomalies, conflicts, and redundancies would not be tractable for practical policies (say, n = 1000 and d = 10). In this paper, we clear up this apparent paradox. Using the concept of 'rules in play', we calculate the actual upper bound on the size of the decision tree, and demonstrate how three other factors (narrow fields, singletons, and all-matches) make the problem tractable in practice. We also show how this concept may be used to solve an open problem: pruning a policy to the minimum possible number of rules, without changing its meaning.
Submitted 27 October, 2015;
originally announced October 2015.
-
Iron-60 evidence for early injection and efficient mixing of stellar debris in the protosolar nebula
Authors:
N. Dauphas,
D. L. Cook,
A. Sacarabany,
C. Frohlich,
A. M. Davis,
M. Wadhwa,
A. Pourmand,
T. Rauscher,
R. Gallino
Abstract:
Among extinct radioactivities present in meteorites, 60Fe (t1/2 = 1.49 Myr) plays a key role as a high-resolution chronometer, a heat source in planetesimals, and a fingerprint of the astrophysical setting of solar system formation. A critical issue with 60Fe is that it could have been heterogeneously distributed in the protoplanetary disk, calling into question the efficiency of mixing in the solar nebula or the timing of 60Fe injection relative to planetesimal formation. If this were the case, one would expect meteorites that did not incorporate 60Fe (either because of late injection or incomplete mixing) to show 60Ni deficits (from lack of 60Fe decay) and collateral effects on other neutron-rich isotopes of Fe and Ni (coproduced with 60Fe in core-collapse supernovae and AGB-stars). Here, we show that measured iron meteorites and chondrites have Fe and Ni isotopic compositions identical to Earth. This demonstrates that 60Fe must have been injected into the protosolar nebula and mixed to less than 10 % heterogeneity before formation of planetary bodies.
Submitted 16 May, 2008;
originally announced May 2008.
-
Double Tag Events in Two-Photon Collisions at LEP
Authors:
M. Wadhwa
Abstract:
Double tag events in two-photon collisions are studied using the L3 detector at the LEP center-of-mass energies $\sqrt{s} \simeq 189-202$ GeV. The cross-section of $γ^* γ^*$ collisions is measured at an average photon virtuality $\langle Q^2 \rangle = 15~\mathrm{GeV}^2$. The results are in agreement with Monte Carlo predictions based on perturbative QCD, while the Quark Parton Model alone is insufficient to describe the data. The measurements are compared to the LO and the NLO BFKL calculations.
Submitted 13 October, 2000;
originally announced October 2000.
-
Two Photon Physics at LEP
Authors:
Maneesh Wadhwa
Abstract:
LEP offers an excellent opportunity to measure two-photon processes over a large kinematical range and thus study the complex nature of the photon. This article reviews the experimental status of "Two Photon Physics" at LEP. The recent results on resonances, multi-hadron production and photon structure functions are discussed.
Submitted 1 September, 1999;
originally announced September 1999.