
The Design of an LLM-powered Unstructured Analytics System

Authors: Eric Anderson, Jonathan Fritz, Austin Lee, Bohou Li, Mark Lindblad, Henry Lindeman, Alex Meyer, Parth Parmar, Tanvi Ranade, Mehul A. Shah, Benjamin Sowell, Dan Tecuci, Vinayak Thapliyal, Matt Welsh

Abstract: LLMs demonstrate an uncanny ability to process unstructured data, and as such, have the potential to go beyond search and run complex, semantic analyses at scale. We describe the design of an unstructured analytics system, Aryn, and the tenets and use cases that motivate its design. With Aryn, users specify queries in natural language and the system automatically determines a semantic plan and executes it to compute an answer from a large collection of unstructured documents. At the core of Aryn is Sycamore, a declarative document processing engine, that provides a reliable distributed abstraction called DocSets. Sycamore allows users to analyze, enrich, and transform complex documents at scale. Aryn includes Luna, a query planner that translates natural language queries to Sycamore scripts, and DocParse, which takes raw PDFs and document images, and converts them to DocSets for downstream processing. We show how these pieces come together to achieve better accuracy than RAG on analytics queries over real world reports from the National Transportation Safety Board (NTSB). Also, given current limitations of LLMs, we argue that an analytics system must provide explainability to be practical, and show how Aryn's user interface does this to help build trust.

Submitted 28 December, 2024; v1 submitted 1 September, 2024; originally announced September 2024.

Comments: Included in the proceedings of The Conference on Innovative Data Systems Research (CIDR) 2025

arXiv:2103.15246

On Arroyo-Figueroa's Proof that $\mathrm{P} \neq \mathrm{NP}$

Authors: Mandar Juvekar, David E. Narváez, Melissa Welsh

Abstract: We critique Javier Arroyo-Figueroa's paper titled ``The existence of the Tau one-way functions class as a proof that $\mathrm{P} \neq \mathrm{NP}$,'' which claims to prove $\mathrm{P} \neq \mathrm{NP}$ by showing the existence of a class of one-way functions. We summarize our best interpretation of Arroyo-Figueroa's argument, and show why it fails to prove the existence of one-way functions. Hence, we show that Arroyo-Figueroa fails to prove $\mathrm{P} \neq \mathrm{NP}$.

Submitted 28 March, 2021; originally announced March 2021.

Comments: 5 pages

arXiv:2102.08054

Could you become more credible by being White? Assessing Impact of Race on Credibility with Deepfakes

Authors: Kurtis Haut, Caleb Wohn, Victor Antony, Aidan Goldfarb, Melissa Welsh, Dillanie Sumanthiran, Ji-ze Jang, Md. Rafayet Ali, Ehsan Hoque

Abstract: Computer mediated conversations (e.g., videoconferencing) is now the new mainstream media. How would credibility be impacted if one could change their race on the fly in these environments? We propose an approach using Deepfakes and a supporting GAN architecture to isolate visual features and alter racial perception. We then crowd-sourced over 800 survey responses to measure how credibility was influenced by changing the perceived race. We evaluate the effect of showing a still image of a Black person versus a still image of a White person using the same audio clip for each survey. We also test the effect of showing either an original video or an altered video where the appearance of the person in the original video is modified to appear more White. We measure credibility as the percent of participant responses who believed the speaker was telling the truth. We found that changing the race of a person in a static image has negligible impact on credibility. However, the same manipulation of race on a video increases credibility significantly (61\% to 73\% with p $<$ 0.05). Furthermore, a VADER sentiment analysis over the free response survey questions reveals that more positive sentiment is used to justify the credibility of a White individual in a video.

Submitted 16 February, 2021; originally announced February 2021.

Comments: 10 pages, 5 figures

Showing 1–3 of 3 results for author: Welsh, M