-
On the Merits of LLM-Based Corpus Enrichment
Authors:
Gal Zur,
Tommy Mordo,
Moshe Tennenholtz,
Oren Kurland
Abstract:
Generative AI (genAI) technologies -- specifically, large language models (LLMs) -- and search have an evolving relationship. We argue for a novel perspective: using genAI to enrich a document corpus so as to improve query-based retrieval effectiveness. The enrichment is based on modifying existing documents or generating new ones. As an empirical proof of concept, we use LLMs to generate documents relevant to a topic that are more retrievable than existing ones. In addition, we demonstrate the potential merits of using corpus enrichment for retrieval-augmented generation (RAG) and answer attribution in question answering.
Submitted 6 June, 2025;
originally announced June 2025.
-
CSP: A Simulator For Multi-Agent Ranking Competitions
Authors:
Tommy Mordo,
Tomer Kordonsky,
Haya Nachimovsky,
Moshe Tennenholtz,
Oren Kurland
Abstract:
In ranking competitions, document authors compete for the highest rankings by modifying their content in response to past rankings. Previous studies focused on human participants, primarily students, in controlled settings. The rise of generative AI, particularly Large Language Models (LLMs), introduces a new paradigm: using LLMs as document authors. This approach addresses scalability constraints in human-based competitions and reflects the growing role of LLM-generated content on the Web, a prime example of a ranking competition. We introduce a highly configurable ranking competition simulator that leverages LLMs as document authors. It includes analytical tools to examine the resulting datasets. We demonstrate its capabilities by generating multiple datasets and conducting an extensive analysis. Our code and datasets are publicly available for research.
Submitted 16 February, 2025;
originally announced February 2025.
-
White Hat Search Engine Optimization using Large Language Models
Authors:
Niv Bardas,
Tommy Mordo,
Oren Kurland,
Moshe Tennenholtz,
Gal Zur
Abstract:
We present novel white-hat search engine optimization techniques based on genAI and demonstrate their empirical merits.
Submitted 23 February, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
-
Search results diversification in competitive search
Authors:
Tommy Mordo,
Itamar Reinman,
Moshe Tennenholtz,
Oren Kurland
Abstract:
In Web retrieval, there are many cases of competition between authors of Web documents: their incentive is to have their documents highly ranked for queries of interest. As such, the Web is a prominent example of a competitive search setting. Past work on competitive search focused on ranking functions based solely on relevance estimation. We study ranking functions that integrate a results-diversification aspect. We show that the competitive search setting with diversity-based ranking has an equilibrium. Furthermore, we theoretically and empirically show that the phenomenon of authors mimicking content in documents highly ranked in the past, which was demonstrated in previous work, is mitigated when search results diversification is applied.
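The abstract does not specify which diversity-based ranking function is studied. As a minimal illustration only, the sketch below uses Maximal Marginal Relevance (MMR), a standard greedy diversification method swapped in here for concreteness, not necessarily the paper's ranking function. The function names `mmr_rank` and `cosine`, the trade-off parameter `lam`, and the toy document vectors are all hypothetical.

```python
import math
from typing import Dict, List, Optional


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mmr_rank(
    relevance: Dict[str, float],
    vectors: Dict[str, List[float]],
    lam: float = 0.5,
    k: Optional[int] = None,
) -> List[str]:
    """Greedy MMR (illustrative): trade off a document's estimated relevance
    against its maximum similarity to documents already placed in the ranking."""
    k = len(relevance) if k is None else k
    remaining = sorted(relevance)  # sorted so ties break deterministically by doc id
    selected: List[str] = []
    while remaining and len(selected) < k:
        def mmr_score(doc: str) -> float:
            # Redundancy = similarity to the closest already-selected document.
            redundancy = max(
                (cosine(vectors[doc], vectors[s]) for s in selected), default=0.0
            )
            return lam * relevance[doc] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a toy example where `d2` is a near-copy of the top-ranked `d1` (identical vectors, slightly lower relevance) and `d3` is less relevant but distinct, a relevance-only ranking yields `d1, d2, d3`, whereas `mmr_rank` with `lam=0.5` yields `d1, d3, d2`. This hints at why diversification can reduce the payoff of mimicking previously top-ranked content, although the paper's own analysis may rely on a different ranking function.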
Submitted 24 January, 2025;
originally announced January 2025.
-
Sponsored Question Answering
Authors:
Tommy Mordo,
Moshe Tennenholtz,
Oren Kurland
Abstract:
The potential move from search to question answering (QA) has raised the question of what the move from sponsored search to sponsored QA should look like. We present the first formal analysis of a sponsored QA platform. The platform fuses an organic answer to a question with an ad to produce a so-called sponsored answer. Advertisers then bid on their sponsored answers. Inspired by Generalized Second Price (GSP) auctions, the QA platform selects the winning advertiser, sets her payment, and shows the user the sponsored answer. We prove an array of results. For example, advertisers are incentivized to be truthful in their bids, i.e., to set them to their true value for the sponsored answer. The resulting setting is stable and has properties of VCG auctions.
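The abstract does not give the exact allocation and payment rule. A minimal sketch of a GSP-style rule for a single sponsored-answer slot, assuming each advertiser's bid is weighted by a hypothetical per-advertiser `quality` score of its fused sponsored answer, could look as follows; `Bid`, `select_winner`, the quality weighting, and the zero reserve price are all illustrative assumptions, not the paper's mechanism.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Bid:
    advertiser: str
    bid: float      # advertiser's bid on its sponsored answer
    quality: float  # hypothetical quality score of the fused sponsored answer (> 0)


def select_winner(bids: List[Bid]) -> Tuple[str, float]:
    """Illustrative quality-weighted second-price rule for one sponsored-answer slot.

    The winner maximizes bid * quality and pays the smallest bid that would
    still keep it on top, i.e., the runner-up's weighted score divided by
    the winner's quality.
    """
    if not bids:
        raise ValueError("need at least one bid")
    ranked = sorted(bids, key=lambda b: b.bid * b.quality, reverse=True)
    winner = ranked[0]
    if len(ranked) == 1:
        return winner.advertiser, 0.0  # no competitor; reserve price assumed to be 0
    runner_up = ranked[1]
    payment = runner_up.bid * runner_up.quality / winner.quality
    return winner.advertiser, payment
```

For example, with bids `A` (bid 10, quality 0.5), `B` (bid 6, quality 1.0) and `C` (bid 4, quality 0.9), the weighted scores are 5.0, 6.0 and 3.6, so `B` wins and pays 5.0, which is below its bid of 6. With a single slot, a GSP-style rule reduces to a (weighted) second-price auction, in which bidding one's true value is a dominant strategy; this is consistent with the truthfulness result stated above, though the paper's formal setting may differ.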
Submitted 5 July, 2024;
originally announced July 2024.