-
APE: Active Learning-based Tooling for Finding Informative Few-shot Examples for LLM-based Entity Matching
Authors:
Kun Qian,
Yisi Sang,
Farima Fatahi Bayat,
Anton Belyi,
Xianqi Chu,
Yash Govind,
Samira Khorshidi,
Rahul Khot,
Katherine Luna,
Azadeh Nikfarjam,
Xiaoguang Qi,
Fei Wu,
Xianhan Zhang,
Yunyao Li
Abstract:
Prompt engineering is an iterative procedure often requiring extensive manual effort to formulate suitable instructions for effectively directing large language models (LLMs) in specific tasks. Incorporating few-shot examples is a vital and effective approach to providing LLMs with precise instructions, leading to improved LLM performance. Nonetheless, identifying the most informative demonstrations for LLMs is labor-intensive, frequently entailing sifting through an extensive search space. In this demonstration, we showcase a human-in-the-loop tool called APE (Active Prompt Engineering) designed for refining prompts through active learning. Drawing inspiration from active learning, APE iteratively selects the most ambiguous examples for human feedback, which will be transformed into few-shot examples within the prompt. The demo recording can be found with the submission or be viewed at https://youtu.be/OwQ6MQx53-Y.
Submitted 29 July, 2024;
originally announced August 2024.
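The selection step described in the APE abstract can be illustrated with a minimal sketch of uncertainty sampling. This is not the authors' implementation: the abstract does not say how ambiguity is measured, so the sketch assumes a hypothetical `match_prob` callable that returns the model's estimated match probability for a record pair, and treats pairs whose probability is closest to 0.5 as the most ambiguous. The prompt layout in `to_few_shot` is likewise an assumption.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # two record descriptions to compare


def select_most_ambiguous(
    pairs: List[Pair], match_prob: Callable[[Pair], float], k: int
) -> List[Pair]:
    """Return the k pairs whose match probability is closest to 0.5.

    Assumption: ambiguity is approximated by distance from 0.5;
    `match_prob` is a hypothetical scoring function, not an APE API.
    """
    return sorted(pairs, key=lambda p: abs(match_prob(p) - 0.5))[:k]


def to_few_shot(labeled: List[Tuple[Pair, bool]]) -> str:
    """Format human-labeled pairs as few-shot examples (assumed layout)."""
    blocks = []
    for (a, b), is_match in labeled:
        answer = "yes" if is_match else "no"
        blocks.append(f"Record A: {a}\nRecord B: {b}\nMatch: {answer}")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Toy scores standing in for LLM confidence; values are made up.
    scores = {
        ("Apple Inc.", "Apple Incorporated"): 0.95,
        ("J. Smith", "John Smith"): 0.55,
        ("IBM", "Intel"): 0.10,
        ("NYC Pizza", "New York Pizza Co."): 0.40,
    }
    picked = select_most_ambiguous(list(scores), scores.__getitem__, k=2)
    # A human would label the picked pairs; labels here are placeholders.
    print(to_few_shot([(picked[0], True), (picked[1], False)]))
```

In an iterative loop, the labeled pairs would be appended to the prompt, the remaining pairs rescored, and the selection repeated until the prompt performs acceptably.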
-
Open Domain Knowledge Extraction for Knowledge Graphs
Authors:
Kun Qian,
Anton Belyi,
Fei Wu,
Samira Khorshidi,
Azadeh Nikfarjam,
Rahul Khot,
Yisi Sang,
Katherine Luna,
Xianqi Chu,
Eric Choi,
Yash Govind,
Chloe Seivwright,
Yiwen Sun,
Ahmed Fakhry,
Theo Rekatsinas,
Ihab Ilyas,
Xiaoguang Qi,
Yunyao Li
Abstract:
The quality of a knowledge graph directly impacts the quality of downstream applications (e.g., the number of questions that can be answered using the graph). One ongoing challenge when building a knowledge graph is to ensure the completeness and freshness of the graph's entities and facts. In this paper, we introduce ODKE, a scalable and extensible framework that sources high-quality entities and facts from the open web at scale. ODKE utilizes a wide range of extraction models and supports both streaming and batch processing at different latencies. We reflect on the challenges and design decisions made and share lessons learned when building and deploying ODKE to grow an industry-scale open domain knowledge graph.
Submitted 30 October, 2023;
originally announced December 2023.
-
Toward a System Building Agenda for Data Integration
Authors:
AnHai Doan,
Adel Ardalan,
Jeffrey R. Ballard,
Sanjib Das,
Yash Govind,
Pradap Konda,
Han Li,
Erik Paulson,
Paul Suganthan G. C.,
Haojun Zhang
Abstract:
In this paper we argue that the data management community should devote far more effort to building data integration (DI) systems, in order to truly advance the field. Toward this goal, we make three contributions. First, we draw on our recent industrial experience to discuss the limitations of current DI systems. Second, we propose an agenda to build a new kind of DI system to address these limitations. These systems guide users through the DI workflow, step by step. They provide tools to address the "pain points" of the steps, and these tools are built on top of the Python data science and Big Data ecosystem (PyData). We discuss how to foster an ecosystem of such tools within PyData, then use it to build DI systems for collaborative/cloud/crowd/lay user settings. Finally, we discuss ongoing work at Wisconsin, which suggests that these DI systems are highly promising and that building them raises many interesting research challenges.
Submitted 29 September, 2017;
originally announced October 2017.