-
Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models
Authors:
Bingchen Liu,
Ehsan Akhgari,
Alexander Visheratin,
Aleks Kamko,
Linmiao Xu,
Shivam Shrirao,
Chase Lambert,
Joao Souza,
Suhail Doshi,
Daiqing Li
Abstract:
We introduce Playground v3 (PGv3), our latest text-to-image model that achieves state-of-the-art (SoTA) performance across multiple testing benchmarks, excels in graphic design abilities and introduces new capabilities. Unlike traditional text-to-image generative models that rely on pre-trained language models like T5 or CLIP text encoders, our approach fully integrates Large Language Models (LLMs) with a novel structure that leverages text conditions exclusively from a decoder-only LLM. Additionally, to enhance image captioning quality, we developed an in-house captioner, capable of generating captions with varying levels of detail, enriching the diversity of text structures. We also introduce a new benchmark, CapsBench, to evaluate detailed image captioning performance. Experimental results demonstrate that PGv3 excels in text prompt adherence, complex reasoning, and accurate text rendering. User preference studies indicate the super-human graphic design ability of our model for common design applications, such as stickers, posters, and logo designs. Furthermore, PGv3 introduces new capabilities, including precise RGB color control and robust multilingual understanding.
Submitted 21 October, 2024; v1 submitted 16 September, 2024;
originally announced September 2024.
-
A Bird-Eye view on DNA Storage Simulators
Authors:
Sanket Doshi,
Mihir Gohel,
Manish K. Gupta
Abstract:
Given the huge current demand for storage, DNA-based storage solutions sound quite promising because of their longevity, low power consumption, and high capacity. However, in real life, storing data in the form of DNA is quite expensive and challenging. Researchers and developers therefore build software that helps simulate real-life DNA storage without worrying about the cost. This paper aims to review some of the software that performs DNA storage simulations in different domains. The paper also explains core concepts such as synthesis, sequencing, clustering, reconstruction, GC window, and K-mer window, and gives an overview of existing algorithms. Further, we present three different software tools on the basis of domain, implementation techniques, and customer/commercial usability.
Submitted 7 April, 2024;
originally announced April 2024.
-
Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation
Authors:
Daiqing Li,
Aleks Kamko,
Ehsan Akhgari,
Ali Sabet,
Linmiao Xu,
Suhail Doshi
Abstract:
In this work, we share three insights for achieving state-of-the-art aesthetic quality in text-to-image generative models. We focus on three critical aspects for model improvement: enhancing color and contrast, improving generation across multiple aspect ratios, and improving human-centric fine details. First, we delve into the significance of the noise schedule in training a diffusion model, demonstrating its profound impact on realism and visual fidelity. Second, we address the challenge of accommodating various aspect ratios in image generation, emphasizing the importance of preparing a balanced bucketed dataset. Lastly, we investigate the crucial role of aligning model outputs with human preferences, ensuring that generated images resonate with human perceptual expectations. Through extensive analysis and experiments, Playground v2.5 demonstrates state-of-the-art performance in terms of aesthetic quality under various conditions and aspect ratios, outperforming both widely-used open-source models like SDXL and Playground v2, and closed-source commercial systems such as DALLE 3 and Midjourney v5.2. Our model is open-source, and we hope the development of Playground v2.5 provides valuable guidelines for researchers aiming to elevate the aesthetic quality of diffusion-based image generation models.
Submitted 27 February, 2024;
originally announced February 2024.
-
Immersive Virtual Reality and Robotics for Upper Extremity Rehabilitation
Authors:
Vuthea Chheang,
Rakshith Lokesh,
Amit Chaudhari,
Qile Wang,
Lauren Baron,
Behdokht Kiafar,
Sagar Doshi,
Erik Thostenson,
Joshua Cashaback,
Roghayeh Leila Barmaki
Abstract:
Stroke patients often experience upper limb impairments that restrict their mobility and daily activities. Physical therapy (PT) is the most effective method to improve impairments, but low patient adherence and participation in PT exercises pose significant challenges. To overcome these barriers, a combination of virtual reality (VR) and robotics in PT is promising. However, few systems effectively integrate VR with robotics, especially for upper limb rehabilitation. This work introduces a new virtual rehabilitation solution that combines VR with robotics and a wearable sensor to analyze elbow joint movements. The framework also enhances the capabilities of a traditional robotic device (KinArm) used for motor dysfunction assessment and rehabilitation. A pilot user study (n = 16) was conducted to evaluate the effectiveness and usability of the proposed VR framework. We used a two-way repeated measures experimental design where participants performed two tasks (Circle and Diamond) with two conditions (VR and VR KinArm). We observed no significant differences in the main effect of conditions for task completion time. However, there were significant differences in both the normalized number of mistakes and recorded elbow joint angles (captured as resistance change values from the wearable sleeve sensor) between the Circle and Diamond tasks. Additionally, we report the system usability, task load, and presence in the proposed VR framework. This system demonstrates the potential advantages of an immersive, multi-sensory approach and provides future avenues for research in developing more cost-effective, tailored, and personalized upper limb solutions for home therapy applications.
Submitted 29 June, 2023; v1 submitted 21 April, 2023;
originally announced April 2023.
-
Learning to Precode for Integrated Sensing and Communications Systems
Authors:
R. S. Prasobh Sankar,
Sidharth S. Nair,
Siddhant Doshi,
Sundeep Prabhakar Chepuri
Abstract:
In this paper, we present an unsupervised learning neural model to design transmit precoders for integrated sensing and communication (ISAC) systems to maximize the worst-case target illumination power while ensuring a minimum signal-to-interference-plus-noise ratio (SINR) for all the users. The problem of learning transmit precoders from uplink pilots and echoes can be viewed as a parameterized function estimation problem, and we propose to learn this function using a neural network model. To learn the neural network parameters, we develop a novel loss function based on the first-order optimality conditions to incorporate the SINR and power constraints. Through numerical simulations, we demonstrate that the proposed method outperforms traditional optimization-based methods in the presence of channel estimation errors while incurring lower computational complexity and generalizing well across different channel conditions that were not shown during training.
Submitted 11 March, 2023;
originally announced March 2023.
-
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Authors:
Yizhong Wang,
Swaroop Mishra,
Pegah Alipoormolabashi,
Yeganeh Kordi,
Amirreza Mirzaei,
Anjana Arunkumar,
Arjun Ashok,
Arut Selvan Dhanasekaran,
Atharva Naik,
David Stap,
Eshaan Pathak,
Giannis Karamanolakis,
Haizhi Gary Lai,
Ishan Purohit,
Ishani Mondal,
Jacob Anderson,
Kirby Kuznia,
Krima Doshi,
Maitreya Patel,
Kuntal Kumar Pal,
Mehrad Moradshahi,
Mihir Parmar,
Mirali Purohit,
Neeraj Varshney,
Phani Rohitha Kaza
, et al. (15 additional authors not shown)
Abstract:
How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce Super-NaturalInstructions, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions. Our collection covers 76 distinct task types, including but not limited to classification, extraction, infilling, sequence tagging, text rewriting, and text composition. This large and diverse collection of tasks enables rigorous benchmarking of cross-task generalization under instructions -- training models to follow instructions on a subset of tasks and evaluating them on the remaining unseen ones. Furthermore, we build Tk-Instruct, a transformer model trained to follow a variety of in-context instructions (plain language task definitions or k-shot examples). Our experiments show that Tk-Instruct outperforms existing instruction-following models such as InstructGPT by over 9% on our benchmark despite being an order of magnitude smaller. We further analyze generalization as a function of various scaling parameters, such as the number of observed tasks, the number of instances per task, and model sizes. We hope our dataset and model facilitate future progress towards more general-purpose NLP models.
Submitted 24 October, 2022; v1 submitted 15 April, 2022;
originally announced April 2022.
-
Graph Neural Networks with Parallel Neighborhood Aggregations for Graph Classification
Authors:
Siddhant Doshi,
Sundeep Prabhakar Chepuri
Abstract:
We focus on graph classification using a graph neural network (GNN) model that precomputes the node features using a bank of neighborhood aggregation graph operators arranged in parallel. These GNN models have a natural advantage of reduced training and inference time due to the precomputations but are also fundamentally different from popular GNN variants that update node features through a sequential neighborhood aggregation procedure during training. We provide theoretical conditions under which generic GNN models with parallel neighborhood aggregations (PA-GNNs, in short) are provably as powerful as the well-known Weisfeiler-Lehman (WL) graph isomorphism test in discriminating non-isomorphic graphs. Although PA-GNN models do not have an apparent relationship with the WL test, we show that the graph embeddings obtained from these two methods are injectively related. We then propose a specialized PA-GNN model, called SPIN, which obeys the developed conditions. We demonstrate via numerical experiments that the developed model achieves state-of-the-art performance on many diverse real-world datasets while maintaining the discriminative power of the WL test and the computational advantage of preprocessing graphs before the training process.
Submitted 22 November, 2021;
originally announced November 2021.
-
Dr-COVID: Graph Neural Networks for SARS-CoV-2 Drug Repurposing
Authors:
Siddhant Doshi,
Sundeep Prabhakar Chepuri
Abstract:
The 2019 novel coronavirus (SARS-CoV-2) pandemic has resulted in more than a million deaths, high morbidities, and economic distress worldwide. There is an urgent need to identify medications that would treat and prevent novel diseases like the 2019 coronavirus disease (COVID-19). Drug repurposing is a promising strategy to discover new medical indications of existing approved drugs due to several advantages in terms of costs, safety factors, and quick results compared to new drug design and discovery. In this work, we explore computational data-driven methods for drug repurposing and propose a dedicated graph neural network (GNN) based drug repurposing model, called Dr-COVID. Although we analyze the predicted drugs in detail for COVID-19, the model is generic and can be used for any novel disease. We construct a four-layered heterogeneous graph to model the complex interactions between drugs, diseases, genes, and anatomies. We pose drug repurposing as a link prediction problem. Specifically, we design an encoder based on the scalable inceptive graph neural network (SIGN) to generate embeddings for all the nodes in the four-layered graph and propose a quadratic norm scorer as a decoder to predict treatment for a disease. We provide a detailed analysis of the 150 potential drugs (such as Dexamethasone, Ivermectin) predicted by Dr-COVID for COVID-19 from different pharmacological classes (e.g., corticosteroids, antivirals, antiparasitics). Out of these 150 drugs, 46 drugs are currently in clinical trials. Dr-COVID is evaluated in terms of its prediction performance and its ability to rank the known treatment drugs for diseases as high as possible. For a majority of the diseases, Dr-COVID ranks the actual treatment drug in the top 15.
Submitted 3 December, 2020;
originally announced December 2020.
-
Human-centered manipulation and navigation with Robot DE NIRO
Authors:
Fabian Falck,
Sagar Doshi,
Nico Smuts,
John Lingi,
Kim Rants,
Petar Kormushev
Abstract:
Social assistance robots in health and elderly care have the potential to support and ease human lives. Given the macrosocial trends of aging and long-lived populations, robotics-based care research has mainly focused on helping the elderly live independently. In this paper, we introduce Robot DE NIRO, a research platform that aims to support the supporter (the caregiver) and also offers direct human-robot interaction for the care recipient. Augmented by several sensors, DE NIRO is capable of complex manipulation tasks. It reliably interacts with humans and can autonomously and swiftly navigate through dynamically changing environments. We describe preliminary experiments in a demonstrative scenario and discuss DE NIRO's design and capabilities. We place particular emphasis on safe, human-centered interaction procedures implemented in both hardware and software, including collision avoidance in manipulation and navigation as well as an intuitive perception stack through speech and face recognition.
Submitted 23 October, 2018;
originally announced October 2018.