Target Prompting for Information Extraction with Vision Language Model
Authors:
Dipankar Medhi
Abstract:
Recent progress in large vision language models (VLMs) has changed how information extraction systems are built. VLMs have set a new benchmark with state-of-the-art techniques for understanding documents and building question-answering systems across various industries. They are significantly better at generating text from document images and providing accurate answers to questions. However, challenges remain in using these models effectively to build a precise conversational system. General prompting techniques used with large language models are often unsuitable for these specially designed vision language models. The output generated from such generic prompts is ordinary and may contain information gaps when compared with the actual content of the document. To obtain more accurate and specific answers, the vision language model requires a well-targeted prompt along with the document image. This paper discusses a technique called Target prompting, which explicitly targets parts of document images and generates answers from those specific regions only. The paper also evaluates the responses for each prompting technique using different user queries and input prompts.
Submitted 7 August, 2024;
originally announced August 2024.
Energy-Aware Aggregation of Dynamic Temporal Workload in Data Centers
Authors:
Haiyang Qian,
Fu Li,
Ravishankar Ravindran,
Deep Medhi
Abstract:
Data center providers seek to minimize their total cost of ownership (TCO), while power consumption has become a social concern. We present formulations to minimize server energy consumption and server cost under three different data center server setups (homogeneous, heterogeneous, and hybrid hetero-homogeneous clusters) with dynamic temporal workload. Our studies show that the homogeneous model differs significantly from the heterogeneous model in computational time (by an order of magnitude). To compute optimal configurations in near real-time for large-scale data centers, we propose two modes: aggregation by maximum and aggregation by mean. In addition, we propose two aggregation methods: static (periodic) aggregation and dynamic (aperiodic) aggregation. We found that in the aggregation by maximum mode, dynamic aggregation resulted in cost savings of up to approximately 18% over static aggregation. In the aggregation by mean mode, dynamic aggregation reduced workload rearrangement by up to approximately 50% compared to static aggregation. Overall, our methodology helps to understand the trade-off in energy-aware aggregation.
Submitted 16 September, 2013;
originally announced September 2013.
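The two aggregation modes named in the abstract above (by maximum and by mean) can be illustrated for the static (periodic) case with a minimal sketch. This is not the authors' formulation, which is an optimization model; it only shows what collapsing a workload time series into fixed-length periods might look like. The function name `aggregate_static`, its parameters, and the sample workload values are hypothetical.

```python
from statistics import mean


def aggregate_static(workload, period, mode="max"):
    """Collapse each fixed-length window of `workload` into a single value.

    Hypothetical helper: `period` is the number of time slots per window,
    and `mode` selects aggregation by maximum ("max") or by mean ("mean").
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if mode == "max":
        reducer = max
    elif mode == "mean":
        reducer = mean
    else:
        raise ValueError("mode must be 'max' or 'mean'")
    # Step through the series one window at a time; the last window
    # may be shorter if the length is not a multiple of `period`.
    return [
        reducer(workload[start:start + period])
        for start in range(0, len(workload), period)
    ]


# Made-up workload samples, one per time slot.
samples = [10, 14, 9, 30, 28, 35, 12, 8]
by_max = aggregate_static(samples, 4, mode="max")
by_mean = aggregate_static(samples, 4, mode="mean")
print(by_max, by_mean)
```

A dynamic (aperiodic) variant would choose window boundaries from the workload itself rather than from a fixed `period`; how the paper does this is not stated in the abstract, so it is not sketched here.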