-
DynamicEval: Rethinking Evaluation for Dynamic Text-to-Video Synthesis
Authors:
Nithin C. Babu,
Aniruddha Mahapatra,
Harsh Rangwani,
Rajiv Soundararajan,
Kuldeep Kulkarni
Abstract:
Existing text-to-video (T2V) evaluation benchmarks, such as VBench and EvalCrafter, suffer from two limitations. (i) While the emphasis is on subject-centric prompts or static camera scenes, camera motion essential for producing cinematic shots and existing metrics under dynamic motion are largely unexplored. (ii) These benchmarks typically aggregate video-level scores into a single model-level sc…
▽ More
Existing text-to-video (T2V) evaluation benchmarks, such as VBench and EvalCrafter, suffer from two limitations. (i) While the emphasis is on subject-centric prompts or static camera scenes, camera motion essential for producing cinematic shots and existing metrics under dynamic motion are largely unexplored. (ii) These benchmarks typically aggregate video-level scores into a single model-level score for ranking generative models. Such aggregation, however, overlook video-level evaluation, which is vital to selecting the better video among the candidate videos generated for a given prompt. To address these gaps, we introduce DynamicEval, a benchmark consisting of systematically curated prompts emphasizing dynamic camera motion, paired with 45k human annotations on video pairs from 3k videos generated by ten T2V models. DynamicEval evaluates two key dimensions of video quality: background scene consistency and foreground object consistency. For background scene consistency, we obtain the interpretable error maps based on the Vbench motion smoothness metric. We observe that while the Vbench motion smoothness metric shows promising alignment with human judgments, it fails in two cases: occlusions/disocclusions arising from camera and foreground object movements. Building on this, we propose a new background consistency metric that leverages object error maps to correct two failure cases in a principled manner. Our second innovation is the introduction of a foreground consistency metric that tracks points and their neighbors within each object instance to assess object fidelity. Extensive experiments demonstrate that our proposed metrics achieve stronger correlations with human preferences at both the video level and the model level (an improvement of more than 2% points), establishing DynamicEval as a more comprehensive benchmark for evaluating T2V models under dynamic camera motion.
△ Less
Submitted 8 October, 2025;
originally announced October 2025.
-
Revisiting Metric Reliability for Fine-grained Evaluation of Machine Translation and Summarization in Indian Languages
Authors:
Amir Hossein Yari,
Kalmit Kulkarni,
Ahmad Raza Khan,
Fajri Koto
Abstract:
While automatic metrics drive progress in Machine Translation (MT) and Text Summarization (TS), existing metrics have been developed and validated almost exclusively for English and other high-resource languages. This narrow focus leaves Indian languages, spoken by over 1.5 billion people, largely overlooked, casting doubt on the universality of current evaluation practices. To address this gap, w…
▽ More
While automatic metrics drive progress in Machine Translation (MT) and Text Summarization (TS), existing metrics have been developed and validated almost exclusively for English and other high-resource languages. This narrow focus leaves Indian languages, spoken by over 1.5 billion people, largely overlooked, casting doubt on the universality of current evaluation practices. To address this gap, we introduce ITEM, a large-scale benchmark that systematically evaluates the alignment of 26 automatic metrics with human judgments across six major Indian languages, enriched with fine-grained annotations. Our extensive evaluation, covering agreement with human judgments, sensitivity to outliers, language-specific reliability, inter-metric correlations, and resilience to controlled perturbations, reveals four central findings: (1) LLM-based evaluators show the strongest alignment with human judgments at both segment and system levels; (2) outliers exert a significant impact on metric-human agreement; (3) in TS, metrics are more effective at capturing content fidelity, whereas in MT, they better reflect fluency; and (4) metrics differ in their robustness and sensitivity when subjected to diverse perturbations. Collectively, these findings offer critical guidance for advancing metric design and evaluation in Indian languages.
△ Less
Submitted 8 October, 2025;
originally announced October 2025.
-
ChannelFlow-Tools: A Standardized Dataset Creation Pipeline for 3D Obstructed Channel Flows
Authors:
Shubham Kavane,
Kajol Kulkarni,
Harald Koestler
Abstract:
We present ChannelFlow-Tools, a configuration-driven framework that standardizes the end-to-end path from programmatic CAD solid generation to ML-ready inputs and targets for 3D obstructed channel flows. The toolchain integrates geometry synthesis with feasibility checks, signed distance field (SDF) voxelization, automated solver orchestration on HPC (waLBerla LBM), and Cartesian resampling to co-…
▽ More
We present ChannelFlow-Tools, a configuration-driven framework that standardizes the end-to-end path from programmatic CAD solid generation to ML-ready inputs and targets for 3D obstructed channel flows. The toolchain integrates geometry synthesis with feasibility checks, signed distance field (SDF) voxelization, automated solver orchestration on HPC (waLBerla LBM), and Cartesian resampling to co-registered multi-resolution tensors. A single Hydra/OmegaConf configuration governs all stages, enabling deterministic reproduction and controlled ablations. As a case study, we generate 10k+ scenes spanning Re=100-15000 with diverse shapes and poses. An end-to-end evaluation of storage trade-offs directly from the emitted artifacts, a minimal 3D U-Net at 128x32x32, and example surrogate models with dataset size illustrate that the standardized representations support reproducible ML training. ChannelFlow-Tools turns one-off dataset creation into a reproducible, configurable pipeline for CFD surrogate modeling.
△ Less
Submitted 17 September, 2025;
originally announced September 2025.
-
Realization of a Chiral Photonic-Crystal Cavity with Broken Time-Reversal Symmetry
Authors:
Kiran M. Kulkarni,
Hongjing Xu,
Fuyang Tay,
Gustavo M. Rodriguez-Barrios,
Dasom Kim,
Alessandro Alabastri,
Vasil Rokaj,
Ceren B. Dag,
Andrey Baydin,
Junichiro Kono
Abstract:
Light-matter interactions in chiral cavities offer a compelling route to manipulate material properties by breaking fundamental symmetries such as time-reversal symmetry. However, only a limited number of chiral cavity implementations exhibiting broken time-reversal symmetry have been demonstrated to date. These typically rely on either the application of strong magnetic fields, circularly polariz…
▽ More
Light-matter interactions in chiral cavities offer a compelling route to manipulate material properties by breaking fundamental symmetries such as time-reversal symmetry. However, only a limited number of chiral cavity implementations exhibiting broken time-reversal symmetry have been demonstrated to date. These typically rely on either the application of strong magnetic fields, circularly polarized Floquet driving, or the hybridization of cavity modes with matter excitations in the ultrastrong coupling regime. Here, we present a one-dimensional terahertz photonic-crystal cavity that exhibits broken time-reversal symmetry. The cavity consists of a high-resistivity silicon wafer sandwiched between lightly n-doped InSb wafers. By exploiting the nonreciprocal response of a terahertz magnetoplasma and the exceptionally low effective mass of electrons in InSb, we demonstrate a circularly polarized cavity mode at 0.67 THz under a modest magnetic field of 0.3 T, with a quality factor exceeding 50. Temperature-, magnetic field-, and polarization-dependent measurements, supported by simulations, confirm the realization of a chiral cavity with broken time-reversal symmetry. This platform offers a robust and accessible approach for exploring chiral light--matter interactions and vacuum dressed quantum condensed matter in the terahertz regime.
△ Less
Submitted 17 September, 2025;
originally announced September 2025.
-
Towards physician-centered oversight of conversational diagnostic AI
Authors:
Elahe Vedadi,
David Barrett,
Natalie Harris,
Ellery Wulczyn,
Shashir Reddy,
Roma Ruparel,
Mike Schaekermann,
Tim Strother,
Ryutaro Tanno,
Yash Sharma,
Jihyeon Lee,
Cían Hughes,
Dylan Slack,
Anil Palepu,
Jan Freyberg,
Khaled Saab,
Valentin Liévin,
Wei-Hung Weng,
Tao Tu,
Yun Liu,
Nenad Tomasev,
Kavita Kulkarni,
S. Sara Mahdavi,
Kelvin Guu,
Joëlle Barral
, et al. (10 additional authors not shown)
Abstract:
Recent work has demonstrated the promise of conversational AI systems for diagnostic dialogue. However, real-world assurance of patient safety means that providing individual diagnoses and treatment plans is considered a regulated activity by licensed professionals. Furthermore, physicians commonly oversee other team members in such activities, including nurse practitioners (NPs) or physician assi…
▽ More
Recent work has demonstrated the promise of conversational AI systems for diagnostic dialogue. However, real-world assurance of patient safety means that providing individual diagnoses and treatment plans is considered a regulated activity by licensed professionals. Furthermore, physicians commonly oversee other team members in such activities, including nurse practitioners (NPs) or physician assistants/associates (PAs). Inspired by this, we propose a framework for effective, asynchronous oversight of the Articulate Medical Intelligence Explorer (AMIE) AI system. We propose guardrailed-AMIE (g-AMIE), a multi-agent system that performs history taking within guardrails, abstaining from individualized medical advice. Afterwards, g-AMIE conveys assessments to an overseeing primary care physician (PCP) in a clinician cockpit interface. The PCP provides oversight and retains accountability of the clinical decision. This effectively decouples oversight from intake and can thus happen asynchronously. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) of text consultations with asynchronous oversight, we compared g-AMIE to NPs/PAs or a group of PCPs under the same guardrails. Across 60 scenarios, g-AMIE outperformed both groups in performing high-quality intake, summarizing cases, and proposing diagnoses and management plans for the overseeing PCP to review. This resulted in higher quality composite decisions. PCP oversight of g-AMIE was also more time-efficient than standalone PCP consultations in prior work. While our study does not replicate existing clinical practices and likely underestimates clinicians' capabilities, our results demonstrate the promise of asynchronous oversight as a feasible paradigm for diagnostic AI systems to operate under expert human oversight for enhancing real-world care.
△ Less
Submitted 21 July, 2025;
originally announced July 2025.
-
Comparative Analysis of Deep Learning Models for Crop Disease Detection: A Transfer Learning Approach
Authors:
Saundarya Subramaniam,
Shalini Majumdar,
Shantanu Nadar,
Kaustubh Kulkarni
Abstract:
This research presents the development of an Artificial Intelligence (AI) - driven crop disease detection system designed to assist farmers in rural areas with limited resources. We aim to compare different deep learning models for a comparative analysis, focusing on their efficacy in transfer learning. By leveraging deep learning models, including EfficientNet, ResNet101, MobileNetV2, and our cus…
▽ More
This research presents the development of an Artificial Intelligence (AI) - driven crop disease detection system designed to assist farmers in rural areas with limited resources. We aim to compare different deep learning models for a comparative analysis, focusing on their efficacy in transfer learning. By leveraging deep learning models, including EfficientNet, ResNet101, MobileNetV2, and our custom CNN, which achieved a validation accuracy of 95.76%, the system effectively classifies plant diseases. This research demonstrates the potential of transfer learning in reshaping agricultural practices, improving crop health management, and supporting sustainable farming in rural environments.
△ Less
Submitted 25 June, 2025;
originally announced June 2025.
-
Code Generation for Near-Roofline Finite Element Actions on GPUs from Symbolic Variational Forms
Authors:
Kaushik Kulkarni,
Andreas Klöckner
Abstract:
We present a novel parallelization strategy for evaluating Finite Element Method (FEM) variational forms on GPUs, focusing on those that are expressible through the Unified Form Language (UFL) on simplex meshes. We base our approach on code transformations, wherein we construct a space of scheduling candidates and rank them via a heuristic cost model to effectively handle the large diversity of co…
▽ More
We present a novel parallelization strategy for evaluating Finite Element Method (FEM) variational forms on GPUs, focusing on those that are expressible through the Unified Form Language (UFL) on simplex meshes. We base our approach on code transformations, wherein we construct a space of scheduling candidates and rank them via a heuristic cost model to effectively handle the large diversity of computational workloads that can be expressed in this way. We present a design of a search space to which the cost model is applied, along with an associated pruning strategy to limit the number of configurations that need to be empirically evaluated. The goal of our design is to strike a balance between the device's latency-hiding capabilities and the amount of state space, a key factor in attaining near-roofline performance.
To make our work widely available, we have prototyped our parallelization strategy within the \textsc{Firedrake} framework, a UFL-based FEM solver. We evaluate the performance of our parallelization scheme on two generations of Nvidia GPUs, specifically the Titan V (Volta architecture) and Tesla K40c (Kepler architecture), across a range of operators commonly used in applications, including fluid dynamics, wave propagation, and structural mechanics, in 2D and 3D geometries. Our results demonstrate that our proposed algorithm achieves more than $50\%$ roofline performance in $65\%$ of the test cases on both devices.
△ Less
Submitted 20 June, 2025;
originally announced June 2025.
-
The Amazon Nova Family of Models: Technical Report and Model Card
Authors:
Amazon AGI,
Aaron Langford,
Aayush Shah,
Abhanshu Gupta,
Abhimanyu Bhatter,
Abhinav Goyal,
Abhinav Mathur,
Abhinav Mohanty,
Abhishek Kumar,
Abhishek Sethi,
Abi Komma,
Abner Pena,
Achin Jain,
Adam Kunysz,
Adam Opyrchal,
Adarsh Singh,
Aditya Rawal,
Adok Achar Budihal Prasad,
Adrià de Gispert,
Agnika Kumar,
Aishwarya Aryamane,
Ajay Nair,
Akilan M,
Akshaya Iyengar,
Akshaya Vishnu Kudlu Shanbhogue
, et al. (761 additional authors not shown)
Abstract:
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…
▽ More
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.
△ Less
Submitted 17 March, 2025;
originally announced June 2025.
-
vApps: Verifiable Applications at Internet Scale
Authors:
Isaac Zhang,
Kshitij Kulkarni,
Tan Li,
Daniel Wong,
Thomas Kim,
John Guibas,
Uma Roy,
Bryan Pellegrino,
Ryan Zarick
Abstract:
Blockchain technology promises a decentralized, trustless, and interoperable infrastructure. However, widespread adoption remains hindered by issues such as limited scalability, high transaction costs, and the complexity of maintaining coherent verification logic across different blockchain layers. This paper introduces Verifiable Applications (vApps), a novel development framework designed to str…
▽ More
Blockchain technology promises a decentralized, trustless, and interoperable infrastructure. However, widespread adoption remains hindered by issues such as limited scalability, high transaction costs, and the complexity of maintaining coherent verification logic across different blockchain layers. This paper introduces Verifiable Applications (vApps), a novel development framework designed to streamline the creation and deployment of verifiable blockchain computing applications. vApps offer a unified Rust-based Domain-Specific Language (DSL) within a comprehensive SDK, featuring modular abstractions for verification, proof generation, and inter-chain connectivity. This eases the developer's burden in securing diverse software components, allowing them to focus on application logic. The DSL also ensures that applications can automatically take advantage of specialized precompiles and hardware acceleration to achieve consistently high performance with minimal developer effort, as demonstrated by benchmark results for zero-knowledge virtual machines (zkVMs). Experiments show that native Rust execution eliminates interpretation overhead, delivering up to an 197x cycle count improvement compared to EVM-based approaches. Precompiled circuits can accelerate the proof by more than 95%, while GPU acceleration increases throughput by up to 30x and recursion compresses the proof size by up to 230x, enabling succinct and efficient verification. The framework also supports seamless integration with the Web2 and Web3 systems, enabling developers to focus solely on their application logic. Through modular architecture, robust security guarantees, and composability, vApps pave the way toward a trust-minimized and verifiable Internet-scale application environment.
△ Less
Submitted 29 April, 2025; v1 submitted 20 April, 2025;
originally announced April 2025.
-
Towards Conversational AI for Disease Management
Authors:
Anil Palepu,
Valentin Liévin,
Wei-Hung Weng,
Khaled Saab,
David Stutz,
Yong Cheng,
Kavita Kulkarni,
S. Sara Mahdavi,
Joëlle Barral,
Dale R. Webster,
Katherine Chou,
Avinatan Hassidim,
Yossi Matias,
James Manyika,
Ryutaro Tanno,
Vivek Natarajan,
Adam Rodman,
Tao Tu,
Alan Karthikesalingam,
Mike Schaekermann
Abstract:
While large language models (LLMs) have shown promise in diagnostic dialogue, their capabilities for effective management reasoning - including disease progression, therapeutic response, and safe medication prescription - remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE) through a new LLM-based agentic syste…
▽ More
While large language models (LLMs) have shown promise in diagnostic dialogue, their capabilities for effective management reasoning - including disease progression, therapeutic response, and safe medication prescription - remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE) through a new LLM-based agentic system optimised for clinical management and dialogue, incorporating reasoning over the evolution of disease and multiple patient visit encounters, response to therapy, and professional competence in medication prescription. To ground its reasoning in authoritative clinical knowledge, AMIE leverages Gemini's long-context capabilities, combining in-context retrieval with structured reasoning to align its output with relevant and up-to-date clinical practice guidelines and drug formularies. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) study, AMIE was compared to 21 primary care physicians (PCPs) across 100 multi-visit case scenarios designed to reflect UK NICE Guidance and BMJ Best Practice guidelines. AMIE was non-inferior to PCPs in management reasoning as assessed by specialist physicians and scored better in both preciseness of treatments and investigations, and in its alignment with and grounding of management plans in clinical guidelines. To benchmark medication reasoning, we developed RxQA, a multiple-choice question benchmark derived from two national drug formularies (US, UK) and validated by board-certified pharmacists. While AMIE and PCPs both benefited from the ability to access external drug information, AMIE outperformed PCPs on higher difficulty questions. While further research would be needed before real-world translation, AMIE's strong performance across evaluations marks a significant step towards conversational AI as a tool in disease management.
△ Less
Submitted 8 March, 2025;
originally announced March 2025.
-
Towards an AI co-scientist
Authors:
Juraj Gottweis,
Wei-Hung Weng,
Alexander Daryin,
Tao Tu,
Anil Palepu,
Petar Sirkovic,
Artiom Myaskovsky,
Felix Weissenberger,
Keran Rong,
Ryutaro Tanno,
Khaled Saab,
Dan Popovici,
Jacob Blum,
Fan Zhang,
Katherine Chou,
Avinatan Hassidim,
Burak Gokturk,
Amin Vahdat,
Pushmeet Kohli,
Yossi Matias,
Andrew Carroll,
Kavita Kulkarni,
Nenad Tomasev,
Yuan Guan,
Vikram Dhillon
, et al. (9 additional authors not shown)
Abstract:
Scientific discovery relies on scientists generating novel hypotheses that undergo rigorous experimental validation. To augment this process, we introduce an AI co-scientist, a multi-agent system built on Gemini 2.0. The AI co-scientist is intended to help uncover new, original knowledge and to formulate demonstrably novel research hypotheses and proposals, building upon prior evidence and aligned…
▽ More
Scientific discovery relies on scientists generating novel hypotheses that undergo rigorous experimental validation. To augment this process, we introduce an AI co-scientist, a multi-agent system built on Gemini 2.0. The AI co-scientist is intended to help uncover new, original knowledge and to formulate demonstrably novel research hypotheses and proposals, building upon prior evidence and aligned to scientist-provided research objectives and guidance. The system's design incorporates a generate, debate, and evolve approach to hypothesis generation, inspired by the scientific method and accelerated by scaling test-time compute. Key contributions include: (1) a multi-agent architecture with an asynchronous task execution framework for flexible compute scaling; (2) a tournament evolution process for self-improving hypotheses generation. Automated evaluations show continued benefits of test-time compute, improving hypothesis quality. While general purpose, we focus development and validation in three biomedical areas: drug repurposing, novel target discovery, and explaining mechanisms of bacterial evolution and anti-microbial resistance. For drug repurposing, the system proposes candidates with promising validation findings, including candidates for acute myeloid leukemia that show tumor inhibition in vitro at clinically applicable concentrations. For novel target discovery, the AI co-scientist proposed new epigenetic targets for liver fibrosis, validated by anti-fibrotic activity and liver cell regeneration in human hepatic organoids. Finally, the AI co-scientist recapitulated unpublished experimental results via a parallel in silico discovery of a novel gene transfer mechanism in bacterial evolution. These results, detailed in separate, co-timed reports, demonstrate the potential to augment biomedical and scientific discovery and usher an era of AI empowered scientists.
△ Less
Submitted 26 February, 2025;
originally announced February 2025.
-
NeuroTree: Hierarchical Functional Brain Pathway Decoding for Mental Health Disorders
Authors:
Jun-En Ding,
Dongsheng Luo,
Anna Zilverstand,
Kaustubh Kulkarni,
Feng Liu
Abstract:
Mental disorders are among the most widespread diseases globally. Analyzing functional brain networks through functional magnetic resonance imaging (fMRI) is crucial for understanding mental disorder behaviors. Although existing fMRI-based graph neural networks (GNNs) have demonstrated significant potential in brain network feature extraction, they often fail to characterize complex relationships…
▽ More
Mental disorders are among the most widespread diseases globally. Analyzing functional brain networks through functional magnetic resonance imaging (fMRI) is crucial for understanding mental disorder behaviors. Although existing fMRI-based graph neural networks (GNNs) have demonstrated significant potential in brain network feature extraction, they often fail to characterize complex relationships between brain regions and demographic information in mental disorders. To overcome these limitations, we propose a learnable NeuroTree framework that integrates a k-hop AGE-GCN with neural ordinary differential equations (ODEs) and contrastive masked functional connectivity (CMFC) to enhance similarities and dissimilarities of brain region distance. Furthermore, NeuroTree effectively decodes fMRI network features into tree structures, which improves the capture of high-order brain regional pathway features and enables the identification of hierarchical neural behavioral patterns essential for understanding disease-related brain subnetworks. Our empirical evaluations demonstrate that NeuroTree achieves state-of-the-art performance across two distinct mental disorder datasets. It provides valuable insights into age-related deterioration patterns, elucidating their underlying neural mechanisms.
△ Less
Submitted 23 May, 2025; v1 submitted 25 February, 2025;
originally announced February 2025.
-
Optimal Routing in the Presence of Hooks: Three Case Studies
Authors:
Tarun Chitra,
Kshitij Kulkarni,
Karthik Srinivasan
Abstract:
We consider the problem of optimally executing a user trade over networks of constant function market makers (CFMMs) in the presence of hooks. Hooks, introduced in an upcoming version of Uniswap, are auxiliary smart contracts that allow for extra information to be added to liquidity pools. This allows liquidity providers to enable constraints on trades, allowing CFMMs to read external data, such a…
▽ More
We consider the problem of optimally executing a user trade over networks of constant function market makers (CFMMs) in the presence of hooks. Hooks, introduced in an upcoming version of Uniswap, are auxiliary smart contracts that allow for extra information to be added to liquidity pools. This allows liquidity providers to enable constraints on trades, allowing CFMMs to read external data, such as volatility information, and implement additional features, such as onchain limit orders. We consider three important case studies for how to optimally route trades in the presence of hooks: 1) routing through limit orders, 2) optimal liquidations and time-weighted average market makers (TWAMMs), and 3) noncomposable hooks, which provide additional output in exchange for fill risk. Leveraging tools from convex optimization and dynamic programming, we propose simple methods for formulating and solving these problems that can be useful for practitioners.
△ Less
Submitted 4 February, 2025;
originally announced February 2025.
-
Enhanced Rooftop Solar Panel Detection by Efficiently Aggregating Local Features
Authors:
Kuldeep Kurte,
Kedar Kulkarni
Abstract:
In this paper, we present an enhanced Convolutional Neural Network (CNN)-based rooftop solar photovoltaic (PV) panel detection approach using satellite images. We propose to use pre-trained CNN-based model to extract the local convolutional features of rooftops. These local features are then combined using the Vectors of Locally Aggregated Descriptors (VLAD) technique to obtain rooftop-level globa…
▽ More
In this paper, we present an enhanced Convolutional Neural Network (CNN)-based rooftop solar photovoltaic (PV) panel detection approach using satellite images. We propose to use pre-trained CNN-based model to extract the local convolutional features of rooftops. These local features are then combined using the Vectors of Locally Aggregated Descriptors (VLAD) technique to obtain rooftop-level global features, which are then used to train traditional Machine Learning (ML) models to identify rooftop images that do and do not contain PV panels. On the dataset used in this study, the proposed approach achieved rooftop-PV classification scores exceeding the predefined threshold of 0.9 across all three cities for each of the feature extractor networks evaluated. Moreover, we propose a 3-phase approach to enable efficient utilization of the previously trained models on a new city or region with limited labelled data. We illustrate the effectiveness of this 3-phase approach for multi-city rooftop-PV detection task.
△ Less
Submitted 6 January, 2025;
originally announced January 2025.
-
FloAt: Flow Warping of Self-Attention for Clothing Animation Generation
Authors:
Swasti Shreya Mishra,
Kuldeep Kulkarni,
Duygu Ceylan,
Balaji Vasan Srinivasan
Abstract:
We propose a diffusion model-based approach, FloAtControlNet to generate cinemagraphs composed of animations of human clothing. We focus on human clothing like dresses, skirts and pants. The input to our model is a text prompt depicting the type of clothing and the texture of clothing like leopard, striped, or plain, and a sequence of normal maps that capture the underlying animation that we desir…
▽ More
We propose a diffusion model-based approach, FloAtControlNet to generate cinemagraphs composed of animations of human clothing. We focus on human clothing like dresses, skirts and pants. The input to our model is a text prompt depicting the type of clothing and the texture of clothing like leopard, striped, or plain, and a sequence of normal maps that capture the underlying animation that we desire in the output. The backbone of our method is a normal-map conditioned ControlNet which is operated in a training-free regime. The key observation is that the underlying animation is embedded in the flow of the normal maps. We utilize the flow thus obtained to manipulate the self-attention maps of appropriate layers. Specifically, the self-attention maps of a particular layer and frame are recomputed as a linear combination of itself and the self-attention maps of the same layer and the previous frame, warped by the flow on the normal maps of the two frames. We show that manipulating the self-attention maps greatly enhances the quality of the clothing animation, making it look more natural as well as suppressing the background artifacts. Through extensive experiments, we show that the method proposed beats all baselines both qualitatively in terms of visual results and user study. Specifically, our method is able to alleviate the background flickering that exists in other diffusion model-based baselines that we consider. In addition, we show that our method beats all baselines in terms of RMSE and PSNR computed using the input normal map sequences and the normal map sequences obtained from the output RGB frames. Further, we show that well-established evaluation metrics like LPIPS, SSIM, and CLIP scores that are generally for visual quality are not necessarily suitable for capturing the subtle motions in human clothing animations.
△ Less
Submitted 22 November, 2024;
originally announced November 2024.
-
Exploring Large Language Models for Specialist-level Oncology Care
Authors:
Anil Palepu,
Vikram Dhillon,
Polly Niravath,
Wei-Hung Weng,
Preethi Prasad,
Khaled Saab,
Ryutaro Tanno,
Yong Cheng,
Hanh Mai,
Ethan Burns,
Zainub Ajmal,
Kavita Kulkarni,
Philip Mansfield,
Dale Webster,
Joelle Barral,
Juraj Gottweis,
Mike Schaekermann,
S. Sara Mahdavi,
Vivek Natarajan,
Alan Karthikesalingam,
Tao Tu
Abstract:
Large language models (LLMs) have shown remarkable progress in encoding clinical knowledge and responding to complex medical queries with appropriate clinical reasoning. However, their applicability in subspecialist or complex medical settings remains underexplored. In this work, we probe the performance of AMIE, a research conversational diagnostic AI system, in the subspecialist domain of breast…
▽ More
Large language models (LLMs) have shown remarkable progress in encoding clinical knowledge and responding to complex medical queries with appropriate clinical reasoning. However, their applicability in subspecialist or complex medical settings remains underexplored. In this work, we probe the performance of AMIE, a research conversational diagnostic AI system, in the subspecialist domain of breast oncology care without specific fine-tuning to this challenging domain. To perform this evaluation, we curated a set of 50 synthetic breast cancer vignettes representing a range of treatment-naive and treatment-refractory cases and mirroring the key information available to a multidisciplinary tumor board for decision-making (openly released with this work). We developed a detailed clinical rubric for evaluating management plans, including axes such as the quality of case summarization, safety of the proposed care plan, and recommendations for chemotherapy, radiotherapy, surgery and hormonal therapy. To improve performance, we enhanced AMIE with the inference-time ability to perform web search retrieval to gather relevant and up-to-date clinical knowledge and refine its responses with a multi-stage self-critique pipeline. We compare response quality of AMIE with internal medicine trainees, oncology fellows, and general oncology attendings under both automated and specialist clinician evaluations. In our evaluations, AMIE outperformed trainees and fellows demonstrating the potential of the system in this challenging and important domain. We further demonstrate through qualitative examples, how systems such as AMIE might facilitate conversational interactions to assist clinicians in their decision making. However, AMIE's performance was overall inferior to attending oncologists suggesting that further research is needed prior to consideration of prospective uses.
△ Less
Submitted 5 November, 2024;
originally announced November 2024.
-
Towards Democratization of Subspeciality Medical Expertise
Authors:
Jack W. O'Sullivan,
Anil Palepu,
Khaled Saab,
Wei-Hung Weng,
Yong Cheng,
Emily Chu,
Yaanik Desai,
Aly Elezaby,
Daniel Seung Kim,
Roy Lan,
Wilson Tang,
Natalie Tapaskar,
Victoria Parikh,
Sneha S. Jain,
Kavita Kulkarni,
Philip Mansfield,
Dale Webster,
Juraj Gottweis,
Joelle Barral,
Mike Schaekermann,
Ryutaro Tanno,
S. Sara Mahdavi,
Vivek Natarajan,
Alan Karthikesalingam,
Euan Ashley
, et al. (1 additional authors not shown)
Abstract:
The scarcity of subspecialist medical expertise, particularly in rare, complex and life-threatening diseases, poses a significant challenge for healthcare delivery. This issue is particularly acute in cardiology where timely, accurate management determines outcomes. We explored the potential of AMIE (Articulate Medical Intelligence Explorer), a large language model (LLM)-based experimental AI syst…
▽ More
The scarcity of subspecialist medical expertise, particularly in rare, complex and life-threatening diseases, poses a significant challenge for healthcare delivery. This issue is particularly acute in cardiology where timely, accurate management determines outcomes. We explored the potential of AMIE (Articulate Medical Intelligence Explorer), a large language model (LLM)-based experimental AI system optimized for diagnostic dialogue, to potentially augment and support clinical decision-making in this challenging context. We curated a real-world dataset of 204 complex cases from a subspecialist cardiology practice, including results for electrocardiograms, echocardiograms, cardiac MRI, genetic tests, and cardiopulmonary stress tests. We developed a ten-domain evaluation rubric used by subspecialists to evaluate the quality of diagnosis and clinical management plans produced by general cardiologists or AMIE, the latter enhanced with web-search and self-critique capabilities. AMIE was rated superior to general cardiologists for 5 of the 10 domains (with preference ranging from 9% to 20%), and equivalent for the rest. Access to AMIE's response improved cardiologists' overall response quality in 63.7% of cases while lowering quality in just 3.4%. Cardiologists' responses with access to AMIE were superior to cardiologist responses without access to AMIE for all 10 domains. Qualitative examinations suggest AMIE and general cardiologist could complement each other, with AMIE thorough and sensitive, while general cardiologist concise and specific. Overall, our results suggest that specialized medical LLMs have the potential to augment general cardiologists' capabilities by bridging gaps in subspecialty expertise, though further research and validation are essential for wide clinical utility.
△ Less
Submitted 1 October, 2024;
originally announced October 2024.
-
ImPoster: Text and Frequency Guidance for Subject Driven Action Personalization using Diffusion Models
Authors:
Divya Kothandaraman,
Kuldeep Kulkarni,
Sumit Shekhar,
Balaji Vasan Srinivasan,
Dinesh Manocha
Abstract:
We present ImPoster, a novel algorithm for generating a target image of a 'source' subject performing a 'driving' action. The inputs to our algorithm are a single pair of a source image with the subject that we wish to edit and a driving image with a subject of an arbitrary class performing the driving action, along with the text descriptions of the two images. Our approach is completely unsupervi…
▽ More
We present ImPoster, a novel algorithm for generating a target image of a 'source' subject performing a 'driving' action. The inputs to our algorithm are a single pair of a source image with the subject that we wish to edit and a driving image with a subject of an arbitrary class performing the driving action, along with the text descriptions of the two images. Our approach is completely unsupervised and does not require any access to additional annotations like keypoints or pose. Our approach builds on a pretrained text-to-image latent diffusion model and learns the characteristics of the source and the driving image by finetuning the diffusion model for a small number of iterations. At inference time, ImPoster performs step-wise text prompting i.e. it denoises by first moving in the direction of the image manifold corresponding to the driving image followed by the direction of the image manifold corresponding to the text description of the desired target image. We propose a novel diffusion guidance formulation, image frequency guidance, to steer the generation towards the manifold of the source subject and the driving action at every step of the inference denoising. Our frequency guidance formulations are derived from the frequency domain properties of images. We extensively evaluate ImPoster on a diverse set of source-driving image pairs to demonstrate improvements over baselines. To the best of our knowledge, ImPoster is the first approach towards achieving both subject-driven as well as action-driven image personalization. Code and data is available at https://github.com/divyakraman/ImPosterDiffusion2024.
△ Less
Submitted 23 September, 2024;
originally announced September 2024.
-
Composing Parts for Expressive Object Generation
Authors:
Harsh Rangwani,
Aishwarya Agarwal,
Kuldeep Kulkarni,
R. Venkatesh Babu,
Srikrishna Karanam
Abstract:
Image composition and generation are processes where the artists need control over various parts of the generated images. However, the current state-of-the-art generation models, like Stable Diffusion, cannot handle fine-grained part-level attributes in the text prompts. Specifically, when additional attribute details are added to the base text prompt, these text-to-image models either generate an…
▽ More
Image composition and generation are processes where the artists need control over various parts of the generated images. However, the current state-of-the-art generation models, like Stable Diffusion, cannot handle fine-grained part-level attributes in the text prompts. Specifically, when additional attribute details are added to the base text prompt, these text-to-image models either generate an image vastly different from the image generated from the base prompt or ignore the attribute details. To mitigate these issues, we introduce PartComposer, a training-free method that enables image generation based on fine-grained part-level attributes specified for objects in the base text prompt. This allows more control for artists and enables novel object compositions by combining distinctive object parts. PartComposer first localizes object parts by denoising the object region from a specific diffusion process. This enables each part token to be localized to the right region. After obtaining part masks, we run a localized diffusion process in each part region based on fine-grained part attributes and combine them to produce the final image. All stages of PartComposer are based on repurposing a pre-trained diffusion model, which enables it to generalize across domains. We demonstrate the effectiveness of part-level control provided by PartComposer through qualitative visual examples and quantitative comparisons with contemporary baselines.
△ Less
Submitted 29 June, 2025; v1 submitted 14 June, 2024;
originally announced June 2024.
-
Adaptive Incentive Design with Learning Agents
Authors:
Chinmay Maheshwari,
Kshitij Kulkarni,
Manxi Wu,
Shankar Sastry
Abstract:
We propose an adaptive incentive mechanism that learns the optimal incentives in environments where players continuously update their strategies. Our mechanism updates incentives based on each player's externality, defined as the difference between the player's marginal cost and the operator's marginal cost at each time step. The proposed mechanism updates the incentives on a slower timescale comp…
▽ More
We propose an adaptive incentive mechanism that learns the optimal incentives in environments where players continuously update their strategies. Our mechanism updates incentives based on each player's externality, defined as the difference between the player's marginal cost and the operator's marginal cost at each time step. The proposed mechanism updates the incentives on a slower timescale compared to the players' learning dynamics, resulting in a two-timescale coupled dynamical system. Notably, this mechanism is agnostic to the specific learning dynamics used by players to update their strategies. We show that any fixed point of this adaptive incentive mechanism corresponds to the optimal incentive mechanism, ensuring that the Nash equilibrium coincides with the socially optimal strategy. Additionally, we provide sufficient conditions under which the adaptive mechanism converges to a fixed point. Our results apply to both atomic and non-atomic games. To demonstrate the effectiveness of our proposed mechanism, we verify the convergence conditions in two practically relevant classes of games: atomic aggregative games and non-atomic routing games.
△ Less
Submitted 1 March, 2025; v1 submitted 26 May, 2024;
originally announced May 2024.
-
Capabilities of Gemini Models in Medicine
Authors:
Khaled Saab,
Tao Tu,
Wei-Hung Weng,
Ryutaro Tanno,
David Stutz,
Ellery Wulczyn,
Fan Zhang,
Tim Strother,
Chunjong Park,
Elahe Vedadi,
Juanma Zambrano Chaves,
Szu-Yeu Hu,
Mike Schaekermann,
Aishwarya Kamath,
Yong Cheng,
David G. T. Barrett,
Cathy Cheung,
Basil Mustafa,
Anil Palepu,
Daniel McDuff,
Le Hou,
Tomer Golany,
Luyang Liu,
Jean-baptiste Alayrac,
Neil Houlsby
, et al. (42 additional authors not shown)
Abstract:
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G…
▽ More
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.
△ Less
Submitted 1 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Understanding the Impact of Coalitions between EV Charging Stations
Authors:
Sukanya Kudva,
Kshitij Kulkarni,
Chinmay Maheshwari,
Anil Aswani,
Shankar Sastry
Abstract:
The rapid growth of electric vehicles (EVs) is driving the expansion of charging infrastructure globally. As charging stations become ubiquitous, their substantial electricity consumption can influence grid operation and electricity pricing. Naturally, \textit{some} groups of charging stations, which could be jointly operated by a company, may coordinate to decide their charging profile. While coo…
▽ More
The rapid growth of electric vehicles (EVs) is driving the expansion of charging infrastructure globally. As charging stations become ubiquitous, their substantial electricity consumption can influence grid operation and electricity pricing. Naturally, \textit{some} groups of charging stations, which could be jointly operated by a company, may coordinate to decide their charging profile. While coordination among all charging stations is ideal, it is unclear if coordination of some charging stations is better than no coordination. In this paper, we analyze this intermediate regime between no and full coordination of charging stations. We model EV charging as a non-cooperative aggregative game, where each station's cost is determined by both monetary payments tied to reactive electricity prices on the grid and its sensitivity to deviations from a desired charging profile. We consider a solution concept that we call $\mathcal{C}$-Nash equilibrium, which is tied to a coalition $\mathcal{C}$ of charging stations coordinating to reduce their costs. We provide sufficient conditions, in terms of the demand and sensitivity of charging stations, to determine when independent (aka uncoordinated) operation of charging stations could result in lower overall costs to charging stations, coalition and charging stations outside the coalition. Somewhat counter to common intuition, we show numerical instances where allowing charging stations to operate independently is better than coordinating a subset of stations as a coalition. Jointly, these results provide operators of charging stations insights into how to coordinate their charging behavior, and open several research directions.
△ Less
Submitted 3 October, 2024; v1 submitted 5 April, 2024;
originally announced April 2024.
-
An Analysis of Intent-Based Markets
Authors:
Tarun Chitra,
Kshitij Kulkarni,
Mallesh Pai,
Theo Diamandis
Abstract:
Mechanisms for decentralized finance on blockchains suffer from various problems, including suboptimal price execution for users, latency, and a worse user experience compared to their centralized counterparts. Recently, off-chain marketplaces, colloquially called `intent markets,' have been proposed as a solution to these problems. In these markets, agents called \emph{solvers} compete to satisfy…
▽ More
Mechanisms for decentralized finance on blockchains suffer from various problems, including suboptimal price execution for users, latency, and a worse user experience compared to their centralized counterparts. Recently, off-chain marketplaces, colloquially called `intent markets,' have been proposed as a solution to these problems. In these markets, agents called \emph{solvers} compete to satisfy user orders, which may include complicated user-specified conditions. We provide two formal models of solvers' strategic behavior: one probabilistic and another deterministic. In our first model, solvers initially pay upfront costs to enter a Dutch auction to fill the user's order and then exert congestive, costly effort to search for prices for the user. Our results show that the costs incurred by solvers result in restricted entry in the market. Further, in the presence of costly effort and congestion, our results counter-intuitively show that a planner who aims to maximize user welfare may actually prefer to restrict entry, resulting in limited oligopoly. We then introduce an alternative, optimization-based deterministic model which corroborates these results. We conclude with extensions of our model to other auctions within blockchains and non-cryptocurrency applications, such as the US SEC's Proposal 615.
△ Less
Submitted 6 March, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Spatial Craving Patterns in Marijuana Users: Insights from fMRI Brain Connectivity Analysis with High-Order Graph Attention Neural Networks
Authors:
Jun-En Ding,
Shihao Yang,
Anna Zilverstand,
Kaustubh R. Kulkarni,
Xiaosi Gu,
Feng Liu
Abstract:
The excessive consumption of marijuana can induce substantial psychological and social consequences. In this investigation, we propose an elucidative framework termed high-order graph attention neural networks (HOGANN) for the classification of Marijuana addiction, coupled with an analysis of localized brain network communities exhibiting abnormal activities among chronic marijuana users. HOGANN i…
▽ More
The excessive consumption of marijuana can induce substantial psychological and social consequences. In this investigation, we propose an elucidative framework termed high-order graph attention neural networks (HOGANN) for the classification of Marijuana addiction, coupled with an analysis of localized brain network communities exhibiting abnormal activities among chronic marijuana users. HOGANN integrates dynamic intrinsic functional brain networks, estimated from functional magnetic resonance imaging (fMRI), using graph attention-based long short-term memory (GAT-LSTM) to capture temporal network dynamics. We employ a high-order attention module for information fusion and message passing among neighboring nodes, enhancing the network community analysis. Our model is validated across two distinct data cohorts, yielding substantially higher classification accuracy than benchmark algorithms. Furthermore, we discern the most pertinent subnetworks and cognitive regions affected by persistent marijuana consumption, indicating adverse effects on functional brain networks, particularly within the dorsal attention and frontoparietal networks. Intriguingly, our model demonstrates superior performance in cohorts exhibiting prolonged dependence, implying that prolonged marijuana usage induces more pronounced alterations in brain networks. The model proficiently identifies craving brain maps, thereby delineating critical brain regions for analysis
△ Less
Submitted 8 September, 2024; v1 submitted 28 February, 2024;
originally announced March 2024.
-
Congestion Pricing for Efficiency and Equity: Theory and Applications to the San Francisco Bay Area
Authors:
Chinmay Maheshwari,
Kshitij Kulkarni,
Druv Pai,
Jiarui Yang,
Manxi Wu,
Shankar Sastry
Abstract:
Congestion pricing, while adopted by many cities to alleviate traffic congestion, raises concerns about widening socioeconomic disparities due to its disproportionate impact on low-income travelers. We address this concern by proposing a new class of congestion pricing schemes that not only minimize total travel time, but also incorporate an equity objective, reducing disparities in the relative c…
▽ More
Congestion pricing, while adopted by many cities to alleviate traffic congestion, raises concerns about widening socioeconomic disparities due to its disproportionate impact on low-income travelers. We address this concern by proposing a new class of congestion pricing schemes that not only minimize total travel time, but also incorporate an equity objective, reducing disparities in the relative change in travel costs across populations with different incomes, following the implementation of tolls. Our analysis builds on a congestion game model with heterogeneous traveler populations. We present four pricing schemes that account for practical considerations, such as the ability to charge differentiated tolls to various traveler populations and the option to toll all or only a subset of edges in the network. We evaluate our pricing schemes in the calibrated freeway network of the San Francisco Bay Area. We demonstrate that the proposed congestion pricing schemes improve both the total travel time and the equity objective compared to the current pricing scheme.
Our results further show that pricing schemes charging differentiated prices to traveler populations with varying value-of-time lead to a more equitable distribution of travel costs compared to those that charge a homogeneous price to all.
△ Less
Submitted 20 September, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Towards Conversational Diagnostic AI
Authors:
Tao Tu,
Anil Palepu,
Mike Schaekermann,
Khaled Saab,
Jan Freyberg,
Ryutaro Tanno,
Amy Wang,
Brenna Li,
Mohamed Amin,
Nenad Tomasev,
Shekoofeh Azizi,
Karan Singhal,
Yong Cheng,
Le Hou,
Albert Webson,
Kavita Kulkarni,
S Sara Mahdavi,
Christopher Semturs,
Juraj Gottweis,
Joelle Barral,
Katherine Chou,
Greg S Corrado,
Yossi Matias,
Alan Karthikesalingam,
Vivek Natarajan
Abstract:
At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introdu…
▽ More
At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue.
AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically-meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text-chat which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.
△ Less
Submitted 10 January, 2024;
originally announced January 2024.
-
Towards Accurate Differential Diagnosis with Large Language Models
Authors:
Daniel McDuff,
Mike Schaekermann,
Tao Tu,
Anil Palepu,
Amy Wang,
Jake Garrison,
Karan Singhal,
Yash Sharma,
Shekoofeh Azizi,
Kavita Kulkarni,
Le Hou,
Yong Cheng,
Yun Liu,
S Sara Mahdavi,
Sushant Prakash,
Anupam Pathak,
Christopher Semturs,
Shwetak Patel,
Dale R Webster,
Ewa Dominowska,
Juraj Gottweis,
Joelle Barral,
Katherine Chou,
Greg S Corrado,
Yossi Matias
, et al. (3 additional authors not shown)
Abstract:
An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM op…
▽ More
An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs 33.6%, [p = 0.04]). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) compared to clinicians without its assistance (36.1%) (McNemar's Test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance. Our study suggests that our LLM for DDx has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients' access to specialist-level expertise.
△ Less
Submitted 30 November, 2023;
originally announced December 2023.
-
The Specter (and Spectra) of Miner Extractable Value
Authors:
Guillermo Angeris,
Tarun Chitra,
Theo Diamandis,
Kshitij Kulkarni
Abstract:
Miner extractable value (MEV) refers to any excess value that a transaction validator can realize by manipulating the ordering of transactions. In this work, we introduce a simple theoretical definition of the 'cost of MEV', prove some basic properties, and show that the definition is useful via a number of examples. In a variety of settings, this definition is related to the 'smoothness' of a fun…
▽ More
Miner extractable value (MEV) refers to any excess value that a transaction validator can realize by manipulating the ordering of transactions. In this work, we introduce a simple theoretical definition of the 'cost of MEV', prove some basic properties, and show that the definition is useful via a number of examples. In a variety of settings, this definition is related to the 'smoothness' of a function over the symmetric group. From this definition and some basic observations, we recover a number of results from the literature.
△ Less
Submitted 12 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
The Geometry of Constant Function Market Makers
Authors:
Guillermo Angeris,
Tarun Chitra,
Theo Diamandis,
Alex Evans,
Kshitij Kulkarni
Abstract:
Constant function market makers (CFMMs) are the most popular type of decentralized trading venue for cryptocurrency tokens. In this paper, we give a very general geometric framework (or 'axioms') which encompass and generalize many of the known results for CFMMs in the literature, without requiring strong conditions such as differentiability or homogeneity. One particular consequence of this frame…
▽ More
Constant function market makers (CFMMs) are the most popular type of decentralized trading venue for cryptocurrency tokens. In this paper, we give a very general geometric framework (or 'axioms') which encompass and generalize many of the known results for CFMMs in the literature, without requiring strong conditions such as differentiability or homogeneity. One particular consequence of this framework is that every CFMM has a (unique) canonical trading function that is nondecreasing, concave, and homogeneous, showing that many results known only for homogeneous trading functions are actually fully general. We also show that CFMMs satisfy a number of intuitive and geometric composition rules, and give a new proof, via conic duality, of the equivalence of the portfolio value function and the trading function. Many results are extended to the general setting where the CFMM is not assumed to be path-independent, but only one trade is allowed. Finally, we show that all 'path-independent' CFMMs have a simple geometric description that does not depend on any notion of a 'trading history'.
△ Less
Submitted 15 August, 2023;
originally announced August 2023.
-
Attacks on Dynamic DeFi Interest Rate Curves
Authors:
Tarun Chitra,
Peteris Erins,
Kshitij Kulkarni
Abstract:
As decentralized money market protocols continue to grow in value locked, there have been a number of optimizations proposed for improving capital efficiency. One set of proposals from Euler Finance and Mars Protocol is to have an interest rate curve that is a proportional-integral-derivative (PID) controller. In this paper, we demonstrate attacks on proportional and proportional-integral controll…
▽ More
As decentralized money market protocols continue to grow in value locked, there have been a number of optimizations proposed for improving capital efficiency. One set of proposals from Euler Finance and Mars Protocol is to have an interest rate curve that is a proportional-integral-derivative (PID) controller. In this paper, we demonstrate attacks on proportional and proportional-integral controlled interest rate curves. The attack allows one to manipulate the interest rate curve to take a higher proportion of the earned yield than their pro-rata share of the lending pool. We conclude with an argument that PID interest rate curves can actually \emph{reduce} capital efficiency (due to attack mitigations) unless supply and demand elasticity to rate changes are sufficiently high.
△ Less
Submitted 24 July, 2023;
originally announced July 2023.
-
Blowing in the Wind: CycleNet for Human Cinemagraphs from Still Images
Authors:
Hugo Bertiche,
Niloy J. Mitra,
Kuldeep Kulkarni,
Chun-Hao Paul Huang,
Tuanfeng Y. Wang,
Meysam Madadi,
Sergio Escalera,
Duygu Ceylan
Abstract:
Cinemagraphs are short looping videos created by adding subtle motions to a static image. This kind of media is popular and engaging. However, automatic generation of cinemagraphs is an underexplored area and current solutions require tedious low-level manual authoring by artists. In this paper, we present an automatic method that allows generating human cinemagraphs from single RGB images. We inv…
▽ More
Cinemagraphs are short looping videos created by adding subtle motions to a static image. This kind of media is popular and engaging. However, automatic generation of cinemagraphs is an underexplored area and current solutions require tedious low-level manual authoring by artists. In this paper, we present an automatic method that allows generating human cinemagraphs from single RGB images. We investigate the problem in the context of dressed humans under the wind. At the core of our method is a novel cyclic neural network that produces looping cinemagraphs for the target loop duration. To circumvent the problem of collecting real data, we demonstrate that it is possible, by working in the image normal space, to learn garment motion dynamics on synthetic data and generalize to real data. We evaluate our method on both synthetic and real data and demonstrate that it is possible to create compelling and plausible cinemagraphs from single RGB images.
△ Less
Submitted 15 March, 2023;
originally announced March 2023.
-
DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
Authors:
Shikha Baghel,
Shreyas Ramoji,
Sidharth,
Ranjana H,
Prachi Singh,
Somil Jain,
Pratik Roy Chowdhuri,
Kaustubh Kulkarni,
Swapnil Padhi,
Deepu Vijayasenan,
Sriram Ganapathy
Abstract:
In multilingual societies, social conversations often involve code-mixed speech. The current speech technology may not be well equipped to extract information from multi-lingual multi-speaker conversations. The DISPLACE challenge entails a first-of-kind task to benchmark speaker and language diarization on the same data, as the data contains multi-speaker conversations in multilingual code-mixed s…
▽ More
In multilingual societies, social conversations often involve code-mixed speech. The current speech technology may not be well equipped to extract information from multi-lingual multi-speaker conversations. The DISPLACE challenge entails a first-of-kind task to benchmark speaker and language diarization on the same data, as the data contains multi-speaker conversations in multilingual code-mixed speech. The challenge attempts to highlight outstanding issues in speaker diarization (SD) in multilingual settings with code-mixing. Further, language diarization (LD) in multi-speaker settings also introduces new challenges, where the system has to disambiguate speaker switches with code switches. For this challenge, a natural multilingual, multi-speaker conversational dataset is distributed for development and evaluation purposes. The systems are evaluated on single-channel far-field recordings. We also release a baseline system and report the highlights of the system submissions.
△ Less
Submitted 5 June, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Table Tennis Stroke Detection and Recognition Using Ball Trajectory Data
Authors:
Kaustubh Milind Kulkarni,
Rohan S Jamadagni,
Jeffrey Aaron Paul,
Sucheth Shenoy
Abstract:
In this work, the novel task of detecting and classifying table tennis strokes solely using the ball trajectory has been explored. A single camera setup positioned in the umpire's view has been employed to procure a dataset consisting of six stroke classes executed by four professional table tennis players. Ball tracking using YOLOv4, a traditional object detection model, and TrackNetv2, a tempora…
▽ More
In this work, the novel task of detecting and classifying table tennis strokes solely using the ball trajectory has been explored. A single camera setup positioned in the umpire's view has been employed to procure a dataset consisting of six stroke classes executed by four professional table tennis players. Ball tracking using YOLOv4, a traditional object detection model, and TrackNetv2, a temporal heatmap based model, have been implemented on our dataset and their performances have been benchmarked. A mathematical approach developed to extract temporal boundaries of strokes using the ball trajectory data yielded a total of 2023 valid strokes in our dataset, while also detecting services and missed strokes successfully. The temporal convolutional network developed performed stroke recognition on completely unseen data with an accuracy of 87.155%. Several machine learning and deep learning based model architectures have been trained for stroke recognition using ball trajectory input and benchmarked based on their performances. While stroke recognition in the field of table tennis has been extensively explored based on human action recognition using video data focused on the player's actions, the use of ball trajectory data for the same is an unexplored characteristic of the sport. Hence, the motivation behind the work is to demonstrate that meaningful inferences such as stroke detection and recognition can be drawn using minimal input information.
△ Less
Submitted 19 February, 2023;
originally announced February 2023.
-
Self-supervised Multi-view Disentanglement for Expansion of Visual Collections
Authors:
Nihal Jain,
Praneetha Vaddamanu,
Paridhi Maheshwari,
Vishwa Vinay,
Kuldeep Kulkarni
Abstract:
Image search engines enable the retrieval of images relevant to a query image. In this work, we consider the setting where a query for similar images is derived from a collection of images. For visual search, the similarity measurements may be made along multiple axes, or views, such as style and color. We assume access to a set of feature extractors, each of which computes representations for a s…
▽ More
Image search engines enable the retrieval of images relevant to a query image. In this work, we consider the setting where a query for similar images is derived from a collection of images. For visual search, the similarity measurements may be made along multiple axes, or views, such as style and color. We assume access to a set of feature extractors, each of which computes representations for a specific view. Our objective is to design a retrieval algorithm that effectively combines similarities computed over representations from multiple views. To this end, we propose a self-supervised learning method for extracting disentangled view-specific representations for images such that the inter-view overlap is minimized. We show how this allows us to compute the intent of a collection as a distribution over views. We show how effective retrieval can be performed by prioritizing candidate expansion images that match the intent of a query collection. Finally, we present a new querying mechanism for image search enabled by composing multiple collections and perform retrieval under this setting using the techniques presented in this paper.
△ Less
Submitted 4 February, 2023;
originally announced February 2023.
-
Credible, Optimal Auctions via Public Broadcast
Authors:
Tarun Chitra,
Matheus V. X. Ferreira,
Kshitij Kulkarni
Abstract:
We study auction design in a setting where agents can communicate over a censorship-resistant broadcast channel like the ones we can implement over a public blockchain. We seek to design credible, strategyproof auctions in a model that differs from the traditional mechanism design framework because communication is not centralized via the auctioneer. We prove this allows us to design a larger clas…
▽ More
We study auction design in a setting where agents can communicate over a censorship-resistant broadcast channel like the ones we can implement over a public blockchain. We seek to design credible, strategyproof auctions in a model that differs from the traditional mechanism design framework because communication is not centralized via the auctioneer. We prove this allows us to design a larger class of credible auctions where the auctioneer has no incentive to be strategic. Intuitively, a decentralized communication model weakens the auctioneer's adversarial capabilities because they can only inject messages into the communication channel but not delete, delay, or modify the messages from legitimate buyers. Our main result is a separation in the following sense: we give the first instance of an auction that is credible only if communication is decentralized. Moreover, we construct the first two-round auction that is credible, strategyproof, and optimal when bidder valuations are $α$-strongly regular, for $α> 0$. Our result relies on mild assumptions -- namely, the existence of a broadcast channel and cryptographic commitments.
△ Less
Submitted 3 September, 2024; v1 submitted 29 January, 2023;
originally announced January 2023.
-
The prevalence and influence of circumstellar material around hydrogen-rich supernova progenitors
Authors:
Rachel J. Bruch,
Avishay Gal-Yam,
Ofer Yaron,
Ping Chen,
Nora L. Strotjohann,
Ido Irani,
Erez Zimmerman,
Steve Schulze,
Yi Yang,
Young-Lo Kim,
Mattia Bulla,
Jesper Sollerman,
Mickael Rigault,
Eran Ofek,
Maayane Soumagnac,
Frank J. Masci,
Christoffer Fremling,
Daniel Perley,
Jakob Nordin,
S. Bradley Cenko,
Anna Y. Q. Ho,
S. Adams,
Igor Adreoni,
Eric C. Bellm,
Nadia Blagorodnova
, et al. (22 additional authors not shown)
Abstract:
Narrow transient emission lines (flash-ionization features) in early supernova (SN) spectra trace the presence of circumstellar material (CSM) around the massive progenitor stars of core-collapse SNe. The lines disappear within days after the SN explosion, suggesting that this material is spatially confined, and originates from enhanced mass loss shortly (months to a few years) prior to explosion.…
▽ More
Narrow transient emission lines (flash-ionization features) in early supernova (SN) spectra trace the presence of circumstellar material (CSM) around the massive progenitor stars of core-collapse SNe. The lines disappear within days after the SN explosion, suggesting that this material is spatially confined, and originates from enhanced mass loss shortly (months to a few years) prior to explosion. We performed a systematic survey of H-rich (Type II) SNe discovered within less than two days from explosion during the first phase of the Zwicky Transient Facility (ZTF) survey (2018-2020), finding thirty events for which a first spectrum was obtained within $< 2$ days from explosion. The measured fraction of events showing flash ionisation features ($>36\%$ at $95\%$ confidence level) confirms that elevated mass loss in massive stars prior to SN explosion is common. We find that SNe II showing flash ionisation features are not significantly brighter, nor bluer, nor more slowly rising than those without. This implies that CSM interaction does not contribute significantly to their early continuum emission, and that the CSM is likely optically thin. We measured the persistence duration of flash ionisation emission and find that most SNe show flash features for $\approx 5 $ days. Rarer events, with persistence timescales $>10$ days, are brighter and rise longer, suggesting these may be intermediate between regular SNe II and strongly-interacting SNe IIn.
△ Less
Submitted 13 December, 2022; v1 submitted 6 December, 2022;
originally announced December 2022.
-
Towards Dynamic Causal Discovery with Rare Events: A Nonparametric Conditional Independence Test
Authors:
Chih-Yuan Chiu,
Kshitij Kulkarni,
Shankar Sastry
Abstract:
Causal phenomena associated with rare events occur across a wide range of engineering problems, such as risk-sensitive safety analysis, accident analysis and prevention, and extreme value theory. However, current methods for causal discovery are often unable to uncover causal links, between random variables in a dynamic setting, that manifest only when the variables first experience low-probabilit…
▽ More
Causal phenomena associated with rare events occur across a wide range of engineering problems, such as risk-sensitive safety analysis, accident analysis and prevention, and extreme value theory. However, current methods for causal discovery are often unable to uncover causal links, between random variables in a dynamic setting, that manifest only when the variables first experience low-probability realizations. To address this issue, we introduce a novel statistical independence test on data collected from time-invariant dynamical systems in which rare but consequential events occur. In particular, we exploit the time-invariance of the underlying data to construct a superimposed dataset of the system state before rare events happen at different timesteps. We then design a conditional independence test on the reorganized data. We provide non-asymptotic sample complexity bounds for the consistency of our method, and validate its performance across various simulated and real-world datasets, including incident data collected from the Caltrans Performance Measurement System (PeMS). Code containing the datasets and experiments is publicly available.
△ Less
Submitted 17 July, 2023; v1 submitted 29 November, 2022;
originally announced November 2022.
-
EraseNet: A Recurrent Residual Network for Supervised Document Cleaning
Authors:
Yashowardhan Shinde,
Kishore Kulkarni,
Sachin Kuberkar
Abstract:
Document denoising is considered one of the most challenging tasks in computer vision. There exist millions of documents that are still to be digitized, but problems like document degradation due to natural and man-made factors make this task very difficult. This paper introduces a supervised approach for cleaning dirty documents using a new fully convolutional auto-encoder architecture. This pape…
▽ More
Document denoising is considered one of the most challenging tasks in computer vision. There exist millions of documents that are still to be digitized, but problems like document degradation due to natural and man-made factors make this task very difficult. This paper introduces a supervised approach for cleaning dirty documents using a new fully convolutional auto-encoder architecture. This paper focuses on restoring documents with discrepancies like deformities caused due to aging of a document, creases left on the pages that were xeroxed, random black patches, lightly visible text, etc., and also improving the quality of the image for better optical character recognition system (OCR) performance. Removing noise from scanned documents is a very important step before the documents as this noise can severely affect the performance of an OCR system. The experiments in this paper have shown promising results as the model is able to learn a variety of ordinary as well as unusual noises and rectify them efficiently.
△ Less
Submitted 4 July, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Towards a Theory of Maximal Extractable Value I: Constant Function Market Makers
Authors:
Kshitij Kulkarni,
Theo Diamandis,
Tarun Chitra
Abstract:
Maximal Extractable Value (MEV) refers to excess value captured by miners (or validators) from users in a cryptocurrency network. This excess value often comes from reordering users' transactions to maximize fees or from inserting new transactions that front-run users' transactions. One of the most common types of MEV involves a `sandwich attack' against a user trading on a constant function marke…
▽ More
Maximal Extractable Value (MEV) refers to excess value captured by miners (or validators) from users in a cryptocurrency network. This excess value often comes from reordering users' transactions to maximize fees or from inserting new transactions that front-run users' transactions. One of the most common types of MEV involves a `sandwich attack' against a user trading on a constant function market maker (CFMM), which is a popular class of automated market maker. We analyze game theoretic properties of MEV in CFMMs that we call \textit{routing} and \textit{reordering} MEV. In the case of routing, we present examples where the existence of MEV both degrades and, counterintuitively, \emph{improves} the quality of routing. We construct an analogue of the price of anarchy for this setting and demonstrate that if the impact of a sandwich attack is localized in a suitable sense, then the price of anarchy is constant. In the case of reordering, we show conditions when the maximum price impact caused by the reordering of sandwich attacks in a sequence of trades, relative to the average price, impact is $O(\log n)$ in the number of user trades. Combined, our results suggest methods that both MEV searchers and CFMM designers can utilize for estimating costs and profits of MEV.
△ Less
Submitted 30 April, 2023; v1 submitted 24 July, 2022;
originally announced July 2022.
-
GEMS: Scene Expansion using Generative Models of Graphs
Authors:
Rishi Agarwal,
Tirupati Saketh Chandra,
Vaidehi Patil,
Aniruddha Mahapatra,
Kuldeep Kulkarni,
Vishwa Vinay
Abstract:
Applications based on image retrieval require editing and associating in intermediate spaces that are representative of the high-level concepts like objects and their relationships rather than dense, pixel-level representations like RGB images or semantic-label maps. We focus on one such representation, scene graphs, and propose a novel scene expansion task where we enrich an input seed graph by a…
▽ More
Applications based on image retrieval require editing and associating in intermediate spaces that are representative of the high-level concepts like objects and their relationships rather than dense, pixel-level representations like RGB images or semantic-label maps. We focus on one such representation, scene graphs, and propose a novel scene expansion task where we enrich an input seed graph by adding new nodes (objects) and the corresponding relationships. To this end, we formulate scene graph expansion as a sequential prediction task involving multiple steps of first predicting a new node and then predicting the set of relationships between the newly predicted node and previous nodes in the graph. We propose a sequencing strategy for observed graphs that retains the clustering patterns amongst nodes. In addition, we leverage external knowledge to train our graph generation model, enabling greater generalization of node predictions. Due to the inefficiency of existing maximum mean discrepancy (MMD) based metrics for graph generation problems in evaluating predicted relationships between nodes (objects), we design novel metrics that comprehensively evaluate different aspects of predicted relations. We conduct extensive experiments on Visual Genome and VRD datasets to evaluate the expanded scene graphs using the standard MMD-based metrics and our proposed metrics. We observe that the graphs generated by our method, GEMS, better represent the real distribution of the scene graphs than the baseline methods like GraphRNN.
△ Less
Submitted 8 July, 2022;
originally announced July 2022.
-
Inducing Social Optimality in Games via Adaptive Incentive Design
Authors:
Chinmay Maheshwari,
Kshitij Kulkarni,
Manxi Wu,
Shankar Sastry
Abstract:
How can a social planner adaptively incentivize selfish agents who are learning in a strategic environment to induce a socially optimal outcome in the long run? We propose a two-timescale learning dynamics to answer this question in both atomic and non-atomic games. In our learning dynamics, players adopt a class of learning rules to update their strategies at a faster timescale, while a social pl…
▽ More
How can a social planner adaptively incentivize selfish agents who are learning in a strategic environment to induce a socially optimal outcome in the long run? We propose a two-timescale learning dynamics to answer this question in both atomic and non-atomic games. In our learning dynamics, players adopt a class of learning rules to update their strategies at a faster timescale, while a social planner updates the incentive mechanism at a slower timescale. In particular, the update of the incentive mechanism is based on each player's externality, which is evaluated as the difference between the player's marginal cost and the society's marginal cost in each time step. We show that any fixed point of our learning dynamics corresponds to the optimal incentive mechanism such that the corresponding Nash equilibrium also achieves social optimality. We also provide sufficient conditions for the learning dynamics to converge to a fixed point so that the adaptive incentive mechanism eventually induces a socially optimal outcome. Finally, we demonstrate that the sufficient conditions for convergence are satisfied in a variety of games, including (i) atomic networked quadratic aggregative games, (ii) atomic Cournot competition, and (iii) non-atomic network routing games.
△ Less
Submitted 11 April, 2022;
originally announced April 2022.
-
Controllable Animation of Fluid Elements in Still Images
Authors:
Aniruddha Mahapatra,
Kuldeep Kulkarni
Abstract:
We propose a method to interactively control the animation of fluid elements in still images to generate cinemagraphs. Specifically, we focus on the animation of fluid elements like water, smoke, fire, which have the properties of repeating textures and continuous fluid motion. Taking inspiration from prior works, we represent the motion of such fluid elements in the image in the form of a constan…
▽ More
We propose a method to interactively control the animation of fluid elements in still images to generate cinemagraphs. Specifically, we focus on the animation of fluid elements like water, smoke, fire, which have the properties of repeating textures and continuous fluid motion. Taking inspiration from prior works, we represent the motion of such fluid elements in the image in the form of a constant 2D optical flow map. To this end, we allow the user to provide any number of arrow directions and their associated speeds along with a mask of the regions the user wants to animate. The user-provided input arrow directions, their corresponding speed values, and the mask are then converted into a dense flow map representing a constant optical flow map (FD). We observe that FD, obtained using simple exponential operations can closely approximate the plausible motion of elements in the image. We further refine computed dense optical flow map FD using a generative-adversarial network (GAN) to obtain a more realistic flow map. We devise a novel UNet based architecture to autoregressively generate future frames using the refined optical flow map by forward-warping the input image features at different resolutions. We conduct extensive experiments on a publicly available dataset and show that our method is superior to the baselines in terms of qualitative and quantitative metrics. In addition, we show the qualitative animations of the objects in directions that did not exist in the training set and provide a way to synthesize videos that otherwise would not exist in the real world.
△ Less
Submitted 25 September, 2023; v1 submitted 6 December, 2021;
originally announced December 2021.
-
Hilbert's Irreducibility Theorem and Ideal Class Groups of Quadratic Fields
Authors:
Kaivalya Kulkarni,
Aaron Levin
Abstract:
We prove a version of Hilbert's Irreducibility Theorem in the quadratic case, giving a quantitative improvement to a result of Bilu-Gillibert in this restricted setting. As an application, we give improvements to several quantitative results counting quadratic fields with certain types of ideal class groups. The proof of the main theorem is based on a result of Stewart and Top on values of binary…
▽ More
We prove a version of Hilbert's Irreducibility Theorem in the quadratic case, giving a quantitative improvement to a result of Bilu-Gillibert in this restricted setting. As an application, we give improvements to several quantitative results counting quadratic fields with certain types of ideal class groups. The proof of the main theorem is based on a result of Stewart and Top on values of binary forms modulo squares.
△ Less
Submitted 30 November, 2021;
originally announced November 2021.
-
Dynamic Tolling for Inducing Socially Optimal Traffic Loads
Authors:
Chinmay Maheshwari,
Kshitij Kulkarni,
Manxi Wu,
Shankar Sastry
Abstract:
How to design tolls that induce socially optimal traffic loads with dynamically arriving travelers who make selfish routing decisions? We propose a two-timescale discrete-time stochastic dynamics that adaptively adjusts the toll prices on a parallel link network while accounting for the updates of traffic loads induced by the incoming and outgoing travelers and their route choices. The updates of…
▽ More
How to design tolls that induce socially optimal traffic loads with dynamically arriving travelers who make selfish routing decisions? We propose a two-timescale discrete-time stochastic dynamics that adaptively adjusts the toll prices on a parallel link network while accounting for the updates of traffic loads induced by the incoming and outgoing travelers and their route choices. The updates of loads and tolls in our dynamics have three key features: (i) The total demand of incoming and outgoing travelers is stochastically realized; (ii) Travelers are myopic and selfish in that they choose routes according to a perturbed best response given the current latency and tolls on parallel links; (iii) The update of tolls is at a slower timescale as compared to the the update of loads. We show that the loads and the tolls eventually concentrate in a neighborhood of the fixed point, which corresponds to the socially optimal load and toll price. Moreover, the fixed point load is also a stochastic user equilibrium with respect to the toll price. Our results are useful for traffic authorities to efficiently manage traffic loads in response to the arrival and departure of travelers.
△ Less
Submitted 17 October, 2021;
originally announced October 2021.
-
SemIE: Semantically-aware Image Extrapolation
Authors:
Bholeshwar Khurana,
Soumya Ranjan Dash,
Abhishek Bhatia,
Aniruddha Mahapatra,
Hrituraj Singh,
Kuldeep Kulkarni
Abstract:
We propose a semantically-aware novel paradigm to perform image extrapolation that enables the addition of new object instances. All previous methods are limited in their capability of extrapolation to merely extending the already existing objects in the image. However, our proposed approach focuses not only on (i) extending the already present objects but also on (ii) adding new objects in the ex…
▽ More
We propose a semantically-aware novel paradigm to perform image extrapolation that enables the addition of new object instances. All previous methods are limited in their capability of extrapolation to merely extending the already existing objects in the image. However, our proposed approach focuses not only on (i) extending the already present objects but also on (ii) adding new objects in the extended region based on the context. To this end, for a given image, we first obtain an object segmentation map using a state-of-the-art semantic segmentation method. The, thus, obtained segmentation map is fed into a network to compute the extrapolated semantic segmentation and the corresponding panoptic segmentation maps. The input image and the obtained segmentation maps are further utilized to generate the final extrapolated image. We conduct experiments on Cityscapes and ADE20K-bedroom datasets and show that our method outperforms all baselines in terms of FID, and similarity in object co-occurrence statistics.
△ Less
Submitted 31 August, 2021;
originally announced August 2021.
-
Table Tennis Stroke Recognition Using Two-Dimensional Human Pose Estimation
Authors:
Kaustubh Milind Kulkarni,
Sucheth Shenoy
Abstract:
We introduce a novel method for collecting table tennis video data and perform stroke detection and classification. A diverse dataset containing video data of 11 basic strokes obtained from 14 professional table tennis players, summing up to a total of 22111 videos has been collected using the proposed setup. The temporal convolutional neural network model developed using 2D pose estimation perfor…
▽ More
We introduce a novel method for collecting table tennis video data and perform stroke detection and classification. A diverse dataset containing video data of 11 basic strokes obtained from 14 professional table tennis players, summing up to a total of 22111 videos has been collected using the proposed setup. The temporal convolutional neural network model developed using 2D pose estimation performs multiclass classification of these 11 table tennis strokes with a validation accuracy of 99.37%. Moreover, the neural network generalizes well over the data of a player excluded from the training and validation dataset, classifying the fresh strokes with an overall best accuracy of 98.72%. Various model architectures using machine learning and deep learning based approaches have been trained for stroke recognition and their performances have been compared and benchmarked. Inferences such as performance monitoring and stroke comparison of the players using the model have been discussed. Therefore, we are contributing to the development of a computer vision based sports analytics system for the sport of table tennis that focuses on the previously unexploited aspect of the sport i.e., a player's strokes, which is extremely insightful for performance improvement.
△ Less
Submitted 31 May, 2021; v1 submitted 20 April, 2021;
originally announced April 2021.
-
Social Choice with Changing Preferences: Representation Theorems and Long-Run Policies
Authors:
Kshitij Kulkarni,
Sven Neth
Abstract:
We study group decision making with changing preferences as a Markov Decision Process. We are motivated by the increasing prevalence of automated decision-making systems when making choices for groups of people over time. Our main contribution is to show how classic representation theorems from social choice theory can be adapted to characterize optimal policies in this dynamic setting. We provide…
▽ More
We study group decision making with changing preferences as a Markov Decision Process. We are motivated by the increasing prevalence of automated decision-making systems when making choices for groups of people over time. Our main contribution is to show how classic representation theorems from social choice theory can be adapted to characterize optimal policies in this dynamic setting. We provide an axiomatic characterization of MDP reward functions that agree with the Utilitarianism social welfare functionals of social choice theory. We also provide discussion of cases when the implementation of social choice-theoretic axioms may fail to lead to long-run optimal outcomes.
△ Less
Submitted 4 November, 2020;
originally announced November 2020.
-
Diverse R-PPG: Camera-Based Heart Rate Estimation for Diverse Subject Skin-Tones and Scenes
Authors:
Pradyumna Chari,
Krish Kabra,
Doruk Karinca,
Soumyarup Lahiri,
Diplav Srivastava,
Kimaya Kulkarni,
Tianyuan Chen,
Maxime Cannesson,
Laleh Jalilian,
Achuta Kadambi
Abstract:
Heart rate (HR) is an essential clinical measure for the assessment of cardiorespiratory instability. Since communities of color are disproportionately affected by both COVID-19 and cardiovascular disease, there is a pressing need to deploy contactless HR sensing solutions for high-quality telemedicine evaluations. Existing computer vision methods that estimate HR from facial videos exhibit biased…
▽ More
Heart rate (HR) is an essential clinical measure for the assessment of cardiorespiratory instability. Since communities of color are disproportionately affected by both COVID-19 and cardiovascular disease, there is a pressing need to deploy contactless HR sensing solutions for high-quality telemedicine evaluations. Existing computer vision methods that estimate HR from facial videos exhibit biased performance against dark skin tones. We present a novel physics-driven algorithm that boosts performance on darker skin tones in our reported data. We assess the performance of our method through the creation of the first telemedicine-focused remote vital signs dataset, the VITAL dataset. 432 videos (~864 minutes) of 54 subjects with diverse skin tones are recorded under realistic scene conditions with corresponding vital sign data. Our method reduces errors due to lighting changes, shadows, and specular highlights and imparts unbiased performance gains across skin tones, setting the stage for making medically inclusive non-contact HR sensing technologies a viable reality for patients of all skin tones.
△ Less
Submitted 9 December, 2020; v1 submitted 24 October, 2020;
originally announced October 2020.
-
Hyper-local sustainable assortment planning
Authors:
Nupur Aggarwal,
Abhishek Bansal,
Kushagra Manglik,
Kedar Kulkarni,
Vikas Raykar
Abstract:
Assortment planning, an important seasonal activity for any retailer, involves choosing the right subset of products to stock in each store.While existing approaches only maximize the expected revenue, we propose including the environmental impact too, through the Higg Material Sustainability Index. The trade-off between revenue and environmental impact is balanced through a multi-objective optimi…
▽ More
Assortment planning, an important seasonal activity for any retailer, involves choosing the right subset of products to stock in each store.While existing approaches only maximize the expected revenue, we propose including the environmental impact too, through the Higg Material Sustainability Index. The trade-off between revenue and environmental impact is balanced through a multi-objective optimization approach, that yields a Pareto-front of optimal assortments for merchandisers to choose from. Using the proposed approach on a few product categories of a leading fashion retailer shows that choosing assortments with lower environmental impact with a minimal impact on revenue is possible.
△ Less
Submitted 27 July, 2020;
originally announced July 2020.
-
Halluci-Net: Scene Completion by Exploiting Object Co-occurrence Relationships
Authors:
Kuldeep Kulkarni,
Tejas Gokhale,
Rajhans Singh,
Pavan Turaga,
Aswin Sankaranarayanan
Abstract:
Recently, there has been substantial progress in image synthesis from semantic labelmaps. However, methods used for this task assume the availability of complete and unambiguous labelmaps, with instance boundaries of objects, and class labels for each pixel. This reliance on heavily annotated inputs restricts the application of image synthesis techniques to real-world applications, especially unde…
▽ More
Recently, there has been substantial progress in image synthesis from semantic labelmaps. However, methods used for this task assume the availability of complete and unambiguous labelmaps, with instance boundaries of objects, and class labels for each pixel. This reliance on heavily annotated inputs restricts the application of image synthesis techniques to real-world applications, especially under uncertainty due to weather, occlusion, or noise. On the other hand, algorithms that can synthesize images from sparse labelmaps or sketches are highly desirable as tools that can guide content creators and artists to quickly generate scenes by simply specifying locations of a few objects. In this paper, we address the problem of complex scene completion from sparse labelmaps. Under this setting, very few details about the scene (30\% of object instances) are available as input for image synthesis. We propose a two-stage deep network based method, called `Halluci-Net', that learns co-occurence relationships between objects in scenes, and then exploits these relationships to produce a dense and complete labelmap. The generated dense labelmap can then be used as input by state-of-the-art image synthesis techniques like pix2pixHD to obtain the final image. The proposed method is evaluated on the Cityscapes dataset and it outperforms two baselines methods on performance metrics like Fréchet Inception Distance (FID), semantic segmentation accuracy, and similarity in object co-occurrences. We also show qualitative results on a subset of ADE20K dataset that contains bedroom images.
△ Less
Submitted 20 May, 2021; v1 submitted 18 April, 2020;
originally announced April 2020.