-
DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers
Authors:
Navve Wasserman,
Oliver Heinimann,
Yuval Golbari,
Tal Zimbalist,
Eli Schwartz,
Michal Irani
Abstract:
Rerankers play a critical role in multimodal Retrieval-Augmented Generation (RAG) by refining ranking of an initial set of retrieved documents. Rerankers are typically trained using hard negative mining, whose goal is to select pages for each query which rank high, but are actually irrelevant. However, this selection process is typically passive and restricted to what the retriever can find in the…
▽ More
Rerankers play a critical role in multimodal Retrieval-Augmented Generation (RAG) by refining ranking of an initial set of retrieved documents. Rerankers are typically trained using hard negative mining, whose goal is to select pages for each query which rank high, but are actually irrelevant. However, this selection process is typically passive and restricted to what the retriever can find in the available corpus, leading to several inherent limitations. These include: limited diversity, negative examples which are often not hard enough, low controllability, and frequent false negatives which harm training. Our paper proposes an alternative approach: Single-Page Hard Negative Query Generation, which goes the other way around. Instead of retrieving negative pages per query, we generate hard negative queries per page. Using an automated LLM-VLM pipeline, and given a page and its positive query, we create hard negatives by rephrasing the query to be as similar as possible in form and context, yet not answerable from the page. This paradigm enables fine-grained control over the generated queries, resulting in diverse, hard, and targeted negatives. It also supports efficient false negative verification. Our experiments show that rerankers trained with data generated using our approach outperform existing models and significantly improve retrieval performance.
△ Less
Submitted 28 May, 2025;
originally announced May 2025.
-
KernelFusion: Assumption-Free Blind Super-Resolution via Patch Diffusion
Authors:
Oliver Heinimann,
Assaf Shocher,
Tal Zimbalist,
Michal Irani
Abstract:
Traditional super-resolution (SR) methods assume an ``ideal'' downscaling SR-kernel (e.g., bicubic downscaling) between the high-resolution (HR) image and the low-resolution (LR) image. Such methods fail once the LR images are generated differently. Current blind-SR methods aim to remove this assumption, but are still fundamentally restricted to rather simplistic downscaling SR-kernels (e.g., anis…
▽ More
Traditional super-resolution (SR) methods assume an ``ideal'' downscaling SR-kernel (e.g., bicubic downscaling) between the high-resolution (HR) image and the low-resolution (LR) image. Such methods fail once the LR images are generated differently. Current blind-SR methods aim to remove this assumption, but are still fundamentally restricted to rather simplistic downscaling SR-kernels (e.g., anisotropic Gaussian kernels), and fail on more complex (out of distribution) downscaling degradations. However, using the correct SR-kernel is often more important than using a sophisticated SR algorithm. In ``KernelFusion'' we introduce a zero-shot diffusion-based method that makes no assumptions about the kernel. Our method recovers the unique image-specific SR-kernel directly from the LR input image, while simultaneously recovering its corresponding HR image. KernelFusion exploits the principle that the correct SR-kernel is the one that maximizes patch similarity across different scales of the LR image. We first train an image-specific patch-based diffusion model on the single LR input image, capturing its unique internal patch statistics. We then reconstruct a larger HR image with the same learned patch distribution, while simultaneously recovering the correct downscaling SR-kernel that maintains this cross-scale relation between the HR and LR images. Empirical results show that KernelFusion vastly outperforms all SR baselines on complex downscaling degradations, where existing SotA Blind-SR methods fail miserably. By breaking free from predefined kernel assumptions, KernelFusion pushes Blind-SR into a new assumption-free paradigm, handling downscaling kernels previously thought impossible.
△ Less
Submitted 27 March, 2025;
originally announced March 2025.
-
Detecting Bone Lesions in X-Ray Under Diverse Acquisition Conditions
Authors:
Tal Zimbalist,
Ronnie Rosen,
Keren Peri-Hanania,
Yaron Caspi,
Bar Rinott,
Carmel Zeltser-Dekel,
Eyal Bercovich,
Yonina C. Eldar,
Shai Bagon
Abstract:
The diagnosis of primary bone tumors is challenging, as the initial complaints are often non-specific. Early detection of bone cancer is crucial for a favorable prognosis. Incidentally, lesions may be found on radiographs obtained for other reasons. However, these early indications are often missed. In this work, we propose an automatic algorithm to detect bone lesions in conventional radiographs…
▽ More
The diagnosis of primary bone tumors is challenging, as the initial complaints are often non-specific. Early detection of bone cancer is crucial for a favorable prognosis. Incidentally, lesions may be found on radiographs obtained for other reasons. However, these early indications are often missed. In this work, we propose an automatic algorithm to detect bone lesions in conventional radiographs to facilitate early diagnosis. Detecting lesions in such radiographs is challenging: first, the prevalence of bone cancer is very low; any method must show high precision to avoid a prohibitive number of false alarms. Second, radiographs taken in health maintenance organizations (HMOs) or emergency departments (EDs) suffer from inherent diversity due to different X-ray machines, technicians and imaging protocols. This diversity poses a major challenge to any automatic analysis method. We propose to train an off-the-shelf object detection algorithm to detect lesions in radiographs. The novelty of our approach stems from a dedicated preprocessing stage that directly addresses the diversity of the data. The preprocessing consists of self-supervised region-of-interest detection using vision transformer (ViT), and a foreground-based histogram equalization for contrast enhancement to relevant regions only. We evaluate our method via a retrospective study that analyzes bone tumors on radiographs acquired from January 2003 to December 2018 under diverse acquisition protocols. Our method obtains 82.43% sensitivity at 1.5% false-positive rate and surpasses existing preprocessing methods. For lesion detection, our method achieves 82.5% accuracy and an IoU of 0.69. The proposed preprocessing method enables to effectively cope with the inherent diversity of radiographs acquired in HMOs and EDs.
△ Less
Submitted 21 March, 2024; v1 submitted 15 December, 2022;
originally announced December 2022.