-
PromptDistill: Query-based Selective Token Retention in Intermediate Layers for Efficient Large Language Model Inference
Authors:
Weisheng Jin,
Maojia Song,
Tej Deep Pala,
Yew Ken Chia,
Amir Zadeh,
Chuan Li,
Soujanya Poria
Abstract:
As large language models (LLMs) tackle increasingly complex tasks and longer documents, their computational and memory costs during inference become a major bottleneck. To address this, we propose PromptDistill, a novel, training-free method that improves inference efficiency while preserving generation quality. PromptDistill identifies and retains the most informative tokens by leveraging attention interactions in early layers, preserving their hidden states while reducing the computational burden in later layers. This allows the model to focus on essential contextual information without fully processing all tokens. Unlike previous methods such as H2O and SnapKV, which perform compression only after processing the entire input, or GemFilter, which selects a fixed portion of the initial prompt without considering contextual dependencies, PromptDistill dynamically allocates computational resources to the most relevant tokens while maintaining a global awareness of the input. Experiments using our method and baseline approaches with base models such as LLaMA 3.1 8B Instruct, Phi 3.5 Mini Instruct, and Qwen2 7B Instruct on benchmarks including LongBench, InfBench, and Needle in a Haystack demonstrate that PromptDistill significantly improves efficiency while having minimal impact on output quality compared to the original models. With a single-stage selection strategy, PromptDistill effectively balances performance and efficiency, outperforming prior methods like GemFilter, H2O, and SnapKV due to its superior ability to retain essential information. Specifically, compared to GemFilter, PromptDistill achieves an overall 1% to 5% performance improvement while also offering better time efficiency. Additionally, we explore multi-stage selection, which further improves efficiency while maintaining strong generation performance.
Submitted 29 March, 2025;
originally announced March 2025.
-
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
Authors:
Vernon Y. H. Toh,
Yew Ken Chia,
Deepanway Ghosal,
Soujanya Poria
Abstract:
The releases of OpenAI's o-[n] series, such as o1, o3, and o4-mini, mark a significant paradigm shift in Large Language Models towards advanced reasoning capabilities. Notably, models like o3 have demonstrated strong performance on benchmarks like the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI). However, this benchmark is limited to symbolic patterns, whereas humans often perceive and reason about multimodal scenarios involving both vision and language data. Thus, there is an urgent need to investigate advanced reasoning capabilities in multimodal tasks. To this end, we track the evolution of the GPT-[n] and o-[n] series models (including o1, o3, and o4-mini) on challenging multimodal puzzles from PuzzleVQA and AlgoPuzzleVQA, which demand fine-grained visual perception. Our results reveal that o-[n] series, particularly later iterations like o3 and o4-mini, significantly outperform the GPT-[n] series and show strong scalability in multimodal reasoning. Nonetheless, despite these substantial advancements and the superior capabilities demonstrated by the o-[n] series, our findings highlight that even these leading models face persistent challenges. Difficulties are particularly evident in tasks requiring precise visual perception, robust compositional reasoning across multiple visual attributes, and solving complex algorithmic or highly combinatorial puzzles, indicating critical areas for future AGI development. We plan to continuously track new models in the series and update our results in this paper accordingly. All resources used in this evaluation are openly available at https://github.com/declare-lab/LLM-PuzzleTest.
Submitted 21 May, 2025; v1 submitted 3 February, 2025;
originally announced February 2025.
-
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
Authors:
Yew Ken Chia,
Liying Cheng,
Hou Pong Chan,
Chaoqun Liu,
Maojia Song,
Sharifah Mahani Aljunied,
Soujanya Poria,
Lidong Bing
Abstract:
The ability to understand and answer questions over documents can be useful in many business and practical applications. However, documents often contain lengthy and diverse multimodal contents such as texts, figures, and tables, which are very time-consuming for humans to read thoroughly. Hence, there is an urgent need to develop effective and automated methods to aid humans in this task. In this work, we introduce M-LongDoc, a benchmark of 851 samples, and an automated framework to evaluate the performance of large multimodal models. We further propose a retrieval-aware tuning approach for efficient and effective multimodal document reading. Compared to existing works, our benchmark consists of more recent and lengthy documents with hundreds of pages, while also requiring open-ended solutions and not just extractive answers. To our knowledge, our training framework is the first to directly address the retrieval setting for multimodal long documents. To enable tuning open-source models, we construct a training corpus in a fully automatic manner for the question-answering task over such documents. Experiments show that our tuning approach achieves a relative improvement of 4.6% for the correctness of model responses, compared to the baseline open-source models. Our data, code, and models are available at https://multimodal-documents.github.io.
Submitted 9 November, 2024;
originally announced November 2024.
-
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths
Authors:
Yew Ken Chia,
Guizhen Chen,
Weiwen Xu,
Luu Anh Tuan,
Soujanya Poria,
Lidong Bing
Abstract:
Advanced models such as OpenAI o1 exhibit impressive problem-solving capabilities through step-by-step reasoning. However, they may still falter on more complex problems, making errors that disrupt their reasoning paths. We attribute this to the expansive solution space, where each step has the risk of diverging into mistakes. To enhance language model reasoning, we introduce a specialized training framework called Reasoning Paths Optimization (RPO), which enables learning to reason and explore from diverse paths. Our approach encourages favorable branches at each reasoning step while penalizing unfavorable ones, enhancing the model's overall problem-solving performance. Reasoning Paths Optimization does not rely on large-scale human-annotated rationales or outputs from closed-source models, making it scalable and data-efficient. We focus on multi-step reasoning tasks, such as math word problems and science-based exam questions. The experiments demonstrate that our framework significantly enhances the reasoning performance of large language models, with up to 3.1% and 4.3% improvement on GSM8K and MMLU (STEM) respectively. Our data and code can be found at https://reasoning-paths.github.io.
Submitted 7 October, 2024;
originally announced October 2024.
-
Can-Do! A Dataset and Neuro-Symbolic Grounded Framework for Embodied Planning with Large Multimodal Models
Authors:
Yew Ken Chia,
Qi Sun,
Lidong Bing,
Soujanya Poria
Abstract:
Large multimodal models have demonstrated impressive problem-solving abilities in vision and language tasks, and have the potential to encode extensive world knowledge. However, it remains an open challenge for these models to perceive, reason, plan, and act in realistic environments. In this work, we introduce Can-Do, a benchmark dataset designed to evaluate embodied planning abilities through more diverse and complex scenarios than previous datasets. Our dataset includes 400 multimodal samples, each consisting of natural language user instructions, visual images depicting the environment, state changes, and corresponding action plans. The data encompasses diverse aspects of commonsense knowledge, physical understanding, and safety awareness. Our fine-grained analysis reveals that state-of-the-art models, including GPT-4V, face bottlenecks in visual perception, comprehension, and reasoning abilities. To address these challenges, we propose NeuroGround, a neurosymbolic framework that first grounds the plan generation in the perceived environment states and then leverages symbolic planning engines to augment the model-generated plans. Experimental results demonstrate the effectiveness of our framework compared to strong baselines. Our code and dataset are available at https://embodied-planning.github.io.
Submitted 21 September, 2024;
originally announced September 2024.
-
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages
Authors:
Wenxuan Zhang,
Hou Pong Chan,
Yiran Zhao,
Mahani Aljunied,
Jianyu Wang,
Chaoqun Liu,
Yue Deng,
Zhiqiang Hu,
Weiwen Xu,
Yew Ken Chia,
Xin Li,
Lidong Bing
Abstract:
Large Language Models (LLMs) have shown remarkable abilities across various tasks, yet their development has predominantly centered on high-resource languages like English and Chinese, leaving low-resource languages underserved. To address this disparity, we present SeaLLMs 3, the latest iteration of the SeaLLMs model family, tailored for Southeast Asian languages. This region, characterized by its rich linguistic diversity, has lacked adequate language technology support. SeaLLMs 3 aims to bridge this gap by covering a comprehensive range of languages spoken in this region, including English, Chinese, Indonesian, Vietnamese, Thai, Tagalog, Malay, Burmese, Khmer, Lao, Tamil, and Javanese. Leveraging efficient language enhancement techniques and a specially constructed instruction tuning dataset, SeaLLMs 3 significantly reduces training costs while maintaining high performance and versatility. Our model excels in tasks such as world knowledge, mathematical reasoning, translation, and instruction following, achieving state-of-the-art performance among similarly sized models. Additionally, we prioritized safety and reliability by addressing both general and culture-specific considerations and incorporated mechanisms to reduce hallucinations. This work underscores the importance of inclusive AI, showing that advanced LLM capabilities can benefit underserved linguistic and cultural communities.
Submitted 28 July, 2024;
originally announced July 2024.
-
GCS*: Forward Heuristic Search on Implicit Graphs of Convex Sets
Authors:
Shao Yuan Chew Chia,
Rebecca H. Jiang,
Bernhard Paus Graesdal,
Leslie Pack Kaelbling,
Russ Tedrake
Abstract:
We consider large-scale, implicit-search-based solutions to Shortest Path Problems on Graphs of Convex Sets (GCS). We propose GCS*, a forward heuristic search algorithm that generalizes A* search to the GCS setting, where a continuous-valued decision is made at each graph vertex, and constraints across graph edges couple these decisions, influencing costs and feasibility. Such mixed discrete-continuous planning is needed in many domains, including motion planning around obstacles and planning through contact. This setting provides a unique challenge for best-first search algorithms: the cost and feasibility of a path depend on continuous-valued points chosen along the entire path. We show that by pruning paths that are cost-dominated over their entire terminal vertex, GCS* can search efficiently while still guaranteeing cost-optimality and completeness. To find satisficing solutions quickly, we also present a complete but suboptimal variation, pruning instead reachability-dominated paths. We implement these checks using polyhedral-containment or sampling-based methods. The former implementation is complete and cost-optimal, while the latter is probabilistically complete and asymptotically cost-optimal and performs effectively even with minimal samples in practice. We demonstrate GCS* on planar pushing tasks where the combinatorial explosion of contact modes renders prior methods intractable and show it performs favorably compared to the state-of-the-art. Project website: https://shaoyuan.cc/research/gcs-star/
Submitted 9 December, 2024; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Auto-Arena: Automating LLM Evaluations with Agent Peer Battles and Committee Discussions
Authors:
Ruochen Zhao,
Wenxuan Zhang,
Yew Ken Chia,
Weiwen Xu,
Deli Zhao,
Lidong Bing
Abstract:
As LLMs continuously evolve, there is an urgent need for a reliable evaluation method that delivers trustworthy results promptly. Currently, static benchmarks suffer from inflexibility and unreliability, leading users to prefer human voting platforms like Chatbot Arena. However, human evaluations require significant manual effort. To address this, we propose the Auto-Arena, an innovative framework that automates the entire evaluation process using LLM-powered agents. Firstly, an LLM examiner generates questions. Then, two LLM candidates engage in a multi-round peer battle based on individual questions, aiming at revealing their true performance differences. Finally, a committee of LLM judges collaboratively discusses and decides the winner, reducing bias and enhancing fairness. During the peer battles, we observe intriguing scenarios where the LLM candidates display competitive behaviors and even learn from the opponents. In our extensive experiments involving 15 recent LLMs, Auto-Arena shows a 92.14% correlation with human preferences, surpassing all previous expert-annotated benchmarks without any manual efforts. As a result, Auto-Arena offers a promising alternative to current human evaluation platforms for evaluating LLMs automatically.
Submitted 6 October, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns
Authors:
Yew Ken Chia,
Vernon Toh Yan Han,
Deepanway Ghosal,
Lidong Bing,
Soujanya Poria
Abstract:
Large multimodal models extend the impressive capabilities of large language models by integrating multimodal understanding abilities. However, it is not clear how they can emulate the general intelligence and reasoning ability of humans. As recognizing patterns and abstracting concepts are key to general intelligence, we introduce PuzzleVQA, a collection of 2000 puzzle instances based on abstract patterns. With this dataset, we evaluate large multimodal models with abstract patterns based on fundamental concepts, including colors, numbers, sizes, and shapes. Through our experiments on state-of-the-art large multimodal models, we find that they are not able to generalize well to simple abstract patterns. Notably, GPT-4V achieves a score of 46.4% on single-concept puzzles, which shows that state-of-the-art models struggle on our dataset. To diagnose the reasoning challenges in large multimodal models, we progressively guide the models with our ground truth reasoning explanations for visual perception, inductive reasoning, and deductive reasoning. Our systematic analysis finds that the main bottlenecks of GPT-4V are weaker visual perception and inductive reasoning abilities. Through this work, we hope to shed light on the limitations of large multimodal models and how they can better emulate human cognitive processes in the future. Our data and code are available at https://puzzlevqa.github.io
Submitted 17 August, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
Towards Tight Convex Relaxations for Contact-Rich Manipulation
Authors:
Bernhard Paus Graesdal,
Shao Yuan Chew Chia,
Tobia Marcucci,
Savva Morozov,
Alexandre Amice,
Pablo A. Parrilo,
Russ Tedrake
Abstract:
We present a novel method for global motion planning of robotic systems that interact with the environment through contacts. Our method directly handles the hybrid nature of such tasks using tools from convex optimization. We formulate the motion-planning problem as a shortest-path problem in a graph of convex sets, where a path in the graph corresponds to a contact sequence and a convex set models the quasi-static dynamics within a fixed contact mode. For each contact mode, we use semidefinite programming to relax the nonconvex dynamics that results from the simultaneous optimization of the object's pose, contact locations, and contact forces. The result is a tight convex relaxation of the overall planning problem, that can be efficiently solved and quickly rounded to find a feasible contact-rich trajectory. As an initial application for evaluating our method, we apply it on the task of planar pushing. Exhaustive experiments show that our convex-optimization method generates plans that are consistently within a small percentage of the global optimum, without relying on an initial guess, and that our method succeeds in finding trajectories where a state-of-the-art baseline for contact-rich planning usually fails. We demonstrate the quality of these plans on a real robotic system.
Submitted 5 July, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
SeaLLMs -- Large Language Models for Southeast Asia
Authors:
Xuan-Phi Nguyen,
Wenxuan Zhang,
Xin Li,
Mahani Aljunied,
Zhiqiang Hu,
Chenhui Shen,
Yew Ken Chia,
Xingxuan Li,
Jianyu Wang,
Qingyu Tan,
Liying Cheng,
Guanzheng Chen,
Yue Deng,
Sen Yang,
Chaoqun Liu,
Hang Zhang,
Lidong Bing
Abstract:
Despite the remarkable achievements of large language models (LLMs) in various tasks, there remains a linguistic bias that favors high-resource languages, such as English, often at the expense of low-resource and regional languages. To address this imbalance, we introduce SeaLLMs, an innovative series of language models that specifically focuses on Southeast Asian (SEA) languages. SeaLLMs are built upon the Llama-2 model and further advanced through continued pre-training with an extended vocabulary, specialized instruction and alignment tuning to better capture the intricacies of regional languages. This allows them to respect and reflect local cultural norms, customs, stylistic preferences, and legal considerations. Our comprehensive evaluation demonstrates that SeaLLM-13b models exhibit superior performance across a wide spectrum of linguistic tasks and assistant-style instruction-following capabilities relative to comparable open-source models. Moreover, they outperform ChatGPT-3.5 in non-Latin languages, such as Thai, Khmer, Lao, and Burmese, by large margins while remaining lightweight and cost-effective to operate.
Submitted 1 July, 2024; v1 submitted 1 December, 2023;
originally announced December 2023.
-
Contrastive Chain-of-Thought Prompting
Authors:
Yew Ken Chia,
Guizhen Chen,
Luu Anh Tuan,
Soujanya Poria,
Lidong Bing
Abstract:
Despite the success of chain of thought in enhancing language model reasoning, the underlying process remains less well understood. Although logically sound reasoning appears inherently crucial for chain of thought, prior studies surprisingly reveal minimal impact when using invalid demonstrations instead. Furthermore, the conventional chain of thought does not inform language models on what mistakes to avoid, which potentially leads to more errors. Hence, inspired by how humans can learn from both positive and negative examples, we propose contrastive chain of thought to enhance language model reasoning. Compared to the conventional chain of thought, our approach provides both valid and invalid reasoning demonstrations, to guide the model to reason step-by-step while reducing reasoning mistakes. To improve generalization, we introduce an automatic method to construct contrastive demonstrations. Our experiments on reasoning benchmarks demonstrate that contrastive chain of thought can serve as a general enhancement of chain-of-thought prompting.
Submitted 15 November, 2023;
originally announced November 2023.
-
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning
Authors:
Deepanway Ghosal,
Yew Ken Chia,
Navonil Majumder,
Soujanya Poria
Abstract:
Recently, the release of INSTRUCTEVAL has provided valuable insights into the performance of large language models (LLMs) that utilize encoder-decoder or decoder-only architecture. Interestingly, despite being introduced four years ago, T5-based LLMs, such as FLAN-T5, continue to outperform the latest decoder-based LLMs, such as LLAMA and VICUNA, on tasks that require general problem-solving skills. This performance discrepancy can be attributed to three key factors: (1) Pre-training data, (2) Backbone architecture, and (3) Instruction dataset. In this technical report, our main focus is on investigating the impact of the third factor by leveraging VICUNA, a large language model based on LLAMA, which has undergone fine-tuning on ChatGPT conversations. To achieve this objective, we fine-tuned VICUNA using a customized instruction dataset collection called FLANMINI. This collection includes a subset of the large-scale instruction dataset known as FLAN, as well as various code-related datasets and conversational datasets derived from ChatGPT/GPT-4. This dataset comprises a large number of tasks that demand problem-solving skills. Our experimental findings strongly indicate that the enhanced problem-solving abilities of our model, FLACUNA, are obtained through fine-tuning VICUNA on the FLAN dataset, leading to significant improvements across numerous benchmark datasets in INSTRUCTEVAL. FLACUNA is publicly available at https://huggingface.co/declare-lab/flacuna-13b-v1.0.
Submitted 5 July, 2023;
originally announced July 2023.
-
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models
Authors:
Wenxuan Zhang,
Sharifah Mahani Aljunied,
Chang Gao,
Yew Ken Chia,
Lidong Bing
Abstract:
Despite the existence of various benchmarks for evaluating natural language processing models, we argue that human exams are a more suitable means of evaluating general intelligence for large language models (LLMs), as they inherently demand a much wider range of abilities such as language understanding, domain knowledge, and problem-solving skills. To this end, we introduce M3Exam, a novel benchmark sourced from real and official human exam questions for evaluating LLMs in a multilingual, multimodal, and multilevel context. M3Exam exhibits three unique characteristics: (1) multilingualism, encompassing questions from multiple countries that require strong multilingual proficiency and cultural knowledge; (2) multimodality, accounting for the multimodal nature of many exam questions to test the model's multimodal understanding capability; and (3) multilevel structure, featuring exams from three critical educational periods to comprehensively assess a model's proficiency at different levels. In total, M3Exam contains 12,317 questions in 9 diverse languages with three educational levels, where about 23% of the questions require processing images for successful solving. We assess the performance of top-performing LLMs on M3Exam and find that current models, including GPT-4, still struggle with multilingual text, particularly in low-resource and non-Latin script languages. Multimodal LLMs also perform poorly with complex multimodal questions. We believe that M3Exam can be a valuable resource for comprehensively evaluating LLMs by examining their multilingual and multimodal abilities and tracking their development. Data and evaluation code is available at https://github.com/DAMO-NLP-SG/M3Exam.
Submitted 9 November, 2023; v1 submitted 8 June, 2023;
originally announced June 2023.
-
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models
Authors:
Yew Ken Chia,
Pengfei Hong,
Lidong Bing,
Soujanya Poria
Abstract:
Instruction-tuned large language models have revolutionized natural language processing and have shown great potential in applications such as conversational agents. These models, such as GPT-4, can not only master language but also solve complex tasks in areas like mathematics, coding, medicine, and law. Despite their impressive capabilities, there is still a lack of comprehensive understanding regarding their full potential, primarily due to the black-box nature of many models and the absence of holistic evaluation studies. To address these challenges, we present INSTRUCTEVAL, a more comprehensive evaluation suite designed specifically for instruction-tuned large language models. Unlike previous works, our evaluation involves a rigorous assessment of models based on problem-solving, writing ability, and alignment to human values. We take a holistic approach to analyze various factors affecting model performance, including the pretraining foundation, instruction-tuning data, and training methods. Our findings reveal that the quality of instruction data is the most crucial factor in scaling model performance. While open-source models demonstrate impressive writing abilities, there is substantial room for improvement in problem-solving and alignment. We are encouraged by the rapid development of models by the open-source community, but we also highlight the need for rigorous evaluation to support claims made about these models. Through INSTRUCTEVAL, we aim to foster a deeper understanding of instruction-tuned models and advancements in their capabilities. INSTRUCTEVAL is publicly available at https://github.com/declare-lab/instruct-eval.
Submitted 15 June, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction
Authors:
Yew Ken Chia,
Hui Chen,
Wei Han,
Guizhen Chen,
Sharifah Mahani Aljunied,
Soujanya Poria,
Lidong Bing
Abstract:
Aspect Sentiment Triplet Extraction (ASTE) is a challenging task in sentiment analysis, aiming to provide fine-grained insights into human sentiments. However, existing benchmarks are limited to two domains and do not evaluate model performance on unseen domains, raising concerns about the generalization of proposed methods. Furthermore, it remains unclear if large language models (LLMs) can effectively handle complex sentiment tasks like ASTE. In this work, we address the issue of generalization in ASTE from both a benchmarking and modeling perspective. We introduce a domain-expanded benchmark by annotating samples from diverse domains, enabling evaluation of models in both in-domain and out-of-domain settings. Additionally, we propose CASE, a simple and effective decoding strategy that enhances trustworthiness and performance of LLMs in ASTE. Through comprehensive experiments involving multiple tasks, settings, and models, we demonstrate that CASE can serve as a general decoding strategy for complex sentiment tasks. By expanding the scope of evaluation and providing a more reliable decoding strategy, we aim to inspire the research community to reevaluate the generalizability of benchmarks and models for ASTE. Our code, data, and models are available at https://github.com/DAMO-NLP-SG/domain-expanded-aste.
Submitted 30 October, 2024; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources
Authors:
Xingxuan Li,
Ruochen Zhao,
Yew Ken Chia,
Bosheng Ding,
Shafiq Joty,
Soujanya Poria,
Lidong Bing
Abstract:
We present chain-of-knowledge (CoK), a novel framework that augments large language models (LLMs) by dynamically incorporating grounding information from heterogeneous sources. It results in more factual rationales and reduced hallucination in generation. Specifically, CoK consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation. Given a knowledge-intensive question, CoK first prepares several preliminary rationales and answers while identifying the relevant knowledge domains. If there is no majority consensus among the answers from samples, CoK corrects the rationales step by step by adapting knowledge from the identified domains. These corrected rationales can plausibly serve as a better foundation for the final answer consolidation. Unlike prior studies that primarily use unstructured data, CoK also leverages structured knowledge sources such as Wikidata and tables that provide more reliable factual information. To access both unstructured and structured knowledge sources in the dynamic knowledge adapting stage, we propose an adaptive query generator that allows the generation of queries for various types of query languages, including SPARQL, SQL, and natural sentences. Moreover, to minimize error propagation between rationales, CoK corrects the rationales progressively using preceding corrected rationales to generate and correct subsequent rationales. Extensive experiments show that CoK consistently improves the performance of LLMs on knowledge-intensive tasks across different domains.
Submitted 21 February, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Can ChatGPT-like Generative Models Guarantee Factual Accuracy? On the Mistakes of New Generation Search Engines
Authors:
Ruochen Zhao,
Xingxuan Li,
Yew Ken Chia,
Bosheng Ding,
Lidong Bing
Abstract:
Although large conversational AI models such as OpenAI's ChatGPT have demonstrated great potential, we question whether such models can guarantee factual accuracy. Recently, technology companies such as Microsoft and Google have announced new services which aim to combine search engines with conversational AI. However, we have found numerous mistakes in the public demonstrations that suggest we should not easily trust the factual claims of the AI models. Rather than criticizing specific models or companies, we hope to call on researchers and developers to improve AI models' transparency and factual correctness.
Submitted 2 March, 2023;
originally announced April 2023.
-
Is GPT-3 a Good Data Annotator?
Authors:
Bosheng Ding,
Chengwei Qin,
Linlin Liu,
Yew Ken Chia,
Shafiq Joty,
Boyang Li,
Lidong Bing
Abstract:
Data annotation is the process of labeling data that could be used to train machine learning models. Having high-quality annotation is crucial, as it allows the model to learn the relationship between the input data and the desired output. GPT-3, a large-scale language model developed by OpenAI, has demonstrated impressive zero- and few-shot performance on a wide range of NLP tasks. It is therefore natural to wonder whether it can be used to effectively annotate data for NLP tasks. In this paper, we evaluate the performance of GPT-3 as a data annotator by comparing it with traditional data annotation methods and analyzing its output on a range of tasks. Through this analysis, we aim to provide insight into the potential of GPT-3 as a general-purpose data annotator in NLP.
Submitted 14 June, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach
Authors:
Yew Ken Chia,
Lidong Bing,
Sharifah Mahani Aljunied,
Luo Si,
Soujanya Poria
Abstract:
Relation extraction has the potential for large-scale knowledge graph construction, but current methods do not consider the qualifier attributes for each relation triplet, such as time, quantity or location. The qualifiers form hyper-relational facts which better capture the rich and complex knowledge graph structure. For example, the relation triplet (Leonard Parker, Educated At, Harvard University) can be factually enriched by including the qualifier (End Time, 1967). Hence, we propose the task of hyper-relational extraction to extract more specific and complete facts from text. To support the task, we construct HyperRED, a large-scale and general-purpose dataset. Existing models cannot perform hyper-relational extraction as it requires a model to consider the interaction between three entities. Hence, we propose CubeRE, a cube-filling model that is inspired by table-filling approaches and explicitly considers the interaction between relation triplets and qualifiers. To improve model scalability and reduce negative class imbalance, we further propose a cube-pruning method. Our experiments show that CubeRE outperforms strong baselines and reveal possible directions for future research. Our code and data are available at github.com/declare-lab/HyperRED.
Submitted 17 November, 2022;
originally announced November 2022.
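As an illustration of the hyper-relational fact described in the abstract above, the following is a minimal sketch of one possible in-memory representation. The class name `HyperRelationalFact` and its field names are hypothetical and are not taken from the HyperRED code or data format; only the example values come from the abstract.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HyperRelationalFact:
    """Hypothetical container: a relation triplet plus optional qualifiers."""
    head: str
    relation: str
    tail: str
    # Each qualifier is a (label, value) pair, e.g. ("End Time", "1967").
    qualifiers: Tuple[Tuple[str, str], ...] = ()

    def triplet(self) -> Tuple[str, str, str]:
        """Return the plain relation triplet without qualifiers."""
        return (self.head, self.relation, self.tail)


# The example from the abstract: a triplet enriched with one qualifier.
fact = HyperRelationalFact(
    head="Leonard Parker",
    relation="Educated At",
    tail="Harvard University",
    qualifiers=(("End Time", "1967"),),
)
```

Keeping the qualifiers separate from the triplet means the same object can be read either as an ordinary relation triplet or as the enriched fact.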
-
RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction
Authors:
Yew Ken Chia,
Lidong Bing,
Soujanya Poria,
Luo Si
Abstract:
Despite the importance of relation extraction in building and representing knowledge, less research is focused on generalizing to unseen relation types. We introduce the task setting of Zero-Shot Relation Triplet Extraction (ZeroRTE) to encourage further research in low-resource relation extraction methods. Given an input sentence, each extracted triplet consists of the head entity, relation label, and tail entity where the relation label is not seen at the training stage. To solve ZeroRTE, we propose to synthesize relation examples by prompting language models to generate structured texts. Concretely, we unify language model prompts and structured text approaches to design a structured prompt template for generating synthetic relation samples when conditioning on relation label prompts (RelationPrompt). To overcome the limitation in extracting multiple relation triplets in a sentence, we design a novel Triplet Search Decoding method. Experiments on FewRel and Wiki-ZSL datasets show the efficacy of RelationPrompt for the ZeroRTE task and zero-shot relation classification. Our code and data are available at github.com/declare-lab/RelationPrompt.
Submitted 17 March, 2022;
originally announced March 2022.
-
Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction
Authors:
Lu Xu,
Yew Ken Chia,
Lidong Bing
Abstract:
Aspect Sentiment Triplet Extraction (ASTE) is the most recent subtask of ABSA which outputs triplets of an aspect target, its associated sentiment, and the corresponding opinion term. Recent models perform the triplet extraction in an end-to-end manner but heavily rely on the interactions between each target word and opinion word. Thereby, they cannot perform well on targets and opinions which contain multiple words. Our proposed span-level approach explicitly considers the interaction between the whole spans of targets and opinions when predicting their sentiment relation. Thus, it can make predictions with the semantics of whole spans, ensuring better sentiment consistency. To ease the high computational cost caused by span enumeration, we propose a dual-channel span pruning strategy by incorporating supervision from the Aspect Term Extraction (ATE) and Opinion Term Extraction (OTE) tasks. This strategy not only improves computational efficiency but also distinguishes the opinion and target spans more properly. Our framework simultaneously achieves strong performance for the ASTE as well as ATE and OTE tasks. In particular, our analysis shows that our span-level approach achieves more significant improvements over the baselines on triplets with multi-word targets or opinions.
Submitted 26 July, 2021;
originally announced July 2021.
-
Red Dragon AI at TextGraphs 2020 Shared Task: LIT : LSTM-Interleaved Transformer for Multi-Hop Explanation Ranking
Authors:
Yew Ken Chia,
Sam Witteveen,
Martin Andrews
Abstract:
Explainable question answering for science questions is a challenging task that requires multi-hop inference over a large set of fact sentences. To counter the limitations of methods that view each query-document pair in isolation, we propose the LSTM-Interleaved Transformer which incorporates cross-document interactions for improved multi-hop ranking. The LIT architecture can leverage prior ranking positions in the re-ranking setting. Our model is competitive on the current leaderboard for the TextGraphs 2020 shared task, achieving a test-set MAP of 0.5607, and would have gained third place had we submitted before the competition deadline. Our code implementation is made available at https://github.com/mdda/worldtree_corpus/tree/textgraphs_2020
Submitted 28 December, 2020;
originally announced December 2020.
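The abstract above reports a test-set MAP (mean average precision). As a reminder of what that metric measures, here is a minimal sketch of the standard definition over ranked lists with binary relevance. The function names and the toy data are assumptions for illustration; the shared task's official scorer may differ in details.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average of precision@k taken at each rank k where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Dividing by the number of relevant items penalizes relevant items never retrieved.
    return precision_sum / len(relevant)


def mean_average_precision(queries: List[Tuple[Sequence[Hashable], Set[Hashable]]]) -> float:
    """Mean of the per-query average precision values."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Toy data (hypothetical): two queries with ranked fact IDs and gold fact sets.
toy_queries = [
    (["a", "b", "c"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
toy_map = mean_average_precision(toy_queries)
```

In the toy data, the first query scores (1/1 + 2/3) / 2 and the second scores 1/2, so the mean is 2/3.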
-
Red Dragon AI at TextGraphs 2019 Shared Task: Language Model Assisted Explanation Generation
Authors:
Yew Ken Chia,
Sam Witteveen,
Martin Andrews
Abstract:
The TextGraphs-13 Shared Task on Explanation Regeneration asked participants to develop methods to reconstruct gold explanations for elementary science questions. Red Dragon AI's entries used the language of the questions and explanation text directly, rather than constructing a separate graph-like representation. Our leaderboard submission placed us 3rd in the competition, but we present here three methods of increasing sophistication, each of which scored successively higher on the test set after the competition close.
Submitted 20 November, 2019;
originally announced November 2019.
-
Scene Graph Parsing by Attention Graph
Authors:
Martin Andrews,
Yew Ken Chia,
Sam Witteveen
Abstract:
Scene graph representations, which form a graph of visual object nodes together with their attributes and relations, have proved useful across a variety of vision and language applications. Recent work in the area has used Natural Language Processing dependency tree methods to automatically build scene graphs.
In this work, we present an 'Attention Graph' mechanism that can be trained end-to-end, and produces a scene graph structure that can be lifted directly from the top layer of a standard Transformer model.
The scene graphs generated by our model achieve an F-score similarity of 52.21% to ground-truth graphs on the evaluation set using the SPICE metric, surpassing the best previous approaches by 2.5%.
Submitted 13 September, 2019;
originally announced September 2019.
-
Transformer to CNN: Label-scarce distillation for efficient text classification
Authors:
Yew Ken Chia,
Sam Witteveen,
Martin Andrews
Abstract:
Significant advances have been made in Natural Language Processing (NLP) modelling since the beginning of 2018. The new approaches allow for accurate results, even when there is little labelled data, because these NLP models can benefit from training on both task-agnostic and task-specific unlabelled data. However, these advantages come with significant size and computational costs. This workshop paper outlines how our proposed convolutional student architecture, having been trained by a distillation process from a large-scale model, can achieve 300x inference speedup and 39x reduction in parameter count. In some cases, the student model performance surpasses its teacher on the studied tasks.
Submitted 8 September, 2019;
originally announced September 2019.
-
A learning-based approach for automatic image and video colorization
Authors:
Raj Kumar Gupta,
Alex Yong-Sang Chia,
Deepu Rajan,
Huang Zhiyong
Abstract:
In this paper, we present a color transfer algorithm to colorize a broad range of gray images without any user intervention. The algorithm uses a machine learning-based approach to automatically colorize grayscale images. The algorithm uses the superpixel representation of the reference color images to learn the relationship between different image features and their corresponding color values. We use this learned information to predict the color value of each grayscale image superpixel. As compared to processing individual image pixels, our use of superpixels helps us to achieve a much higher degree of spatial consistency as well as speeds up the colorization process. The predicted color values of the gray-scale image superpixels are used to provide a 'micro-scribble' at the centroid of the superpixels. These color scribbles are refined by using a voting based approach. To generate the final colorization result, we use an optimization-based approach to smoothly spread the color scribble across all pixels within a superpixel. Experimental results on a broad range of images and the comparison with existing state-of-the-art colorization methods demonstrate the greater effectiveness of the proposed algorithm.
Submitted 15 April, 2017;
originally announced April 2017.
-
Unsupervised Cross-Media Hashing with Structure Preservation
Authors:
Xiangyu Wang,
Alex Yong-Sang Chia
Abstract:
Recent years have seen the exponential growth of heterogeneous multimedia data. The need for effective and accurate data retrieval from heterogeneous data sources has attracted much research interest in cross-media retrieval. Here, given a query of any media type, cross-media retrieval seeks to find relevant results of different media types from heterogeneous data sources. To facilitate large-scale cross-media retrieval, we propose a novel unsupervised cross-media hashing method. Our method incorporates local affinity and distance repulsion constraints into a matrix factorization framework. Correspondingly, the proposed method learns hash functions that generate unified hash codes from different media types, while ensuring that the intrinsic geometric structure of the data distribution is preserved. These hash codes allow the similarity between data of different media types to be evaluated directly. Experimental results on two large-scale multimedia datasets demonstrate the effectiveness of the proposed method, where we outperform the state-of-the-art methods.
Submitted 18 March, 2016;
originally announced March 2016.
-
Mobile Data Offloading through A Third-Party WiFi Access Point: An Operator's Perspective
Authors:
Xin Kang,
Yeow-Khiang Chia,
Sumei Sun,
Hon Fah Chong
Abstract:
WiFi offloading is regarded as one of the most promising techniques to deal with the explosive data increase in cellular networks due to its high data transmission rate and low requirement on devices. In this paper, we investigate the mobile data offloading problem through a third-party WiFi access point (AP) for a cellular mobile system. From the cellular operator's perspective, by assuming a usage-based charging model, we formulate the problem as a utility maximization problem. In particular, we consider three scenarios: (i) successive interference cancellation (SIC) available at both the base station (BS) and the AP; (ii) SIC available at neither the BS nor the AP; (iii) SIC available at only the BS. For (i), we show that the utility maximization problem can be solved by considering its relaxation problem, and we prove that our proposed data offloading scheme is near-optimal when the number of users is large. For (ii), we prove that with high probability the optimal solution is One-One-Association, i.e., one user connects to the BS and one user connects to the AP. For (iii), we show that with high probability there is at most one user connecting to the AP, and all the other users connect to the BS. By comparing these three scenarios, we prove that SIC decoders help the cellular operator maximize its utility. To relieve the computational burden of the BS, we propose a threshold-based distributed data offloading scheme. We show that the proposed distributed scheme performs well if the threshold is properly chosen.
Submitted 22 August, 2014;
originally announced August 2014.
-
Cost minimization for fading channels with energy harvesting and conventional energy
Authors:
Xin Kang,
Yeow-Khiang Chia,
Chin Keong Ho,
Sumei Sun
Abstract:
In this paper, we investigate resource allocation strategies for a point-to-point wireless communications system with hybrid energy sources consisting of an energy harvester and a conventional energy source. In particular, as an incentive to promote the use of renewable energy, we assume that the renewable energy has a lower cost than the conventional energy. Then, by assuming that the non-causal information of the energy arrivals and the channel power gains are available, we minimize the total energy cost of such a system over $N$ fading slots under a proposed outage constraint together with the energy harvesting constraints. The outage constraint requires a minimum fixed number of slots to be reliably decoded, and thus leads to a mixed-integer programming formulation for the optimization problem. This constraint is useful, for example, if an outer code is used to recover all the data bits. Optimal linear time algorithms are obtained for two extreme cases, i.e., the number of outage slots is $1$ or $N-1$. For the general case, a lower bound based on the linear programming relaxation, and two suboptimal algorithms are proposed. It is shown that the proposed suboptimal algorithms exhibit only a small gap from the lower bound. We then extend the proposed algorithms to the multi-cycle scenario in which the outage constraint is imposed for each cycle separately. Finally, we investigate the resource allocation strategies when only causal information on the energy arrivals and only channel statistics are available. It is shown that the greedy energy allocation is optimal for this scenario.
Submitted 4 April, 2014;
originally announced April 2014.
-
Ergodic Sum-Rate Maximization for Fading Cognitive Multiple Access Channels without Successive Interference Cancellation
Authors:
Xin Kang,
Hon Fah Chong,
Yeow-Khiang Chia,
Sumei Sun
Abstract:
In this paper, the ergodic sum-rate of a fading cognitive multiple access channel (C-MAC) is studied, where a secondary network (SN) with multiple secondary users (SUs) transmitting to a secondary base station (SBS) shares the spectrum band with a primary user (PU). An interference power constraint (IPC) is imposed on the SN to protect the PU. Under such a constraint and the individual transmit power constraint (TPC) imposed on each SU, we investigate the power allocation strategies to maximize the ergodic sum-rate of a fading C-MAC without successive interference cancellation (SIC). In particular, this paper considers two types of constraints: (1) average TPC and average IPC, (2) peak TPC and peak IPC. For the first case, it is proved that the optimal power allocation is dynamic time-division multiple-access (D-TDMA), which is exactly the same as the optimal power allocation to maximize the ergodic sum-rate of the fading C-MAC with SIC under the same constraints. For the second case, it is proved that the optimal solution must be at the extreme points of the feasible region. It is shown that D-TDMA is optimal with high probability when the number of SUs is large. Besides, we show that, when the SUs can be sorted in a certain order, an algorithm with linear complexity can be used to find the optimal power allocation.
Submitted 3 March, 2014;
originally announced March 2014.
-
Energy-Efficient, Large-scale Distributed-Antenna System (L-DAS) for Multiple Users
Authors:
Jingon Joung,
Yeow Khiang Chia,
Sumei Sun
Abstract:
A large-scale distributed-antenna system (L-DAS) with a very large number of distributed antennas, possibly up to a few hundred antennas, is considered. A few major issues of the L-DAS, such as high latency, energy consumption, computational complexity, and large feedback (signaling) overhead, are identified. The potential capability of the L-DAS is illuminated in terms of energy efficiency (EE) throughout the paper. We first give a general model of the power consumption of an L-DAS, and formulate an EE maximization problem. To tackle two crucial issues, namely the huge computational complexity and large amount of feedback (signaling) information, we propose a channel-gain-based antenna selection (AS) method and an interference-based user clustering (UC) method. The original problem is then split into multiple subproblems, one per cluster, and each cluster's precoding and power control are managed in parallel for high EE. Simulation results reveal that i) using all antennas for zero-forcing multiuser multiple-input multiple-output (MU-MIMO) is energy inefficient if there is nonnegligible overhead power consumption on MU-MIMO processing, and ii) increasing the number of antennas does not necessarily result in a high EE. Furthermore, the results validate and underpin the EE merit of the proposed L-DAS combined with the AS, UC, precoding, and power control by comparing with non-clustering L-DAS and colocated antenna systems.
Submitted 20 January, 2014; v1 submitted 6 December, 2013;
originally announced December 2013.
-
A Note on Broadcast Channels with Stale State Information at the Transmitter
Authors:
Hyeji Kim,
Yeow-Khiang Chia,
Abbas El Gamal
Abstract:
This paper shows that the Maddah-Ali--Tse (MAT) scheme which establishes the symmetric capacity of two example broadcast channels with strictly causal state information at the transmitter is a simple special case of the Shayevitz--Wigger scheme for the broadcast channel with generalized feedback, which involves block Markov coding, compression, superposition coding, Marton coding, and coded time sharing. Focusing on the class of symmetric broadcast channels with state, we derive an expression for the maximum achievable symmetric rate using the Shayevitz--Wigger scheme. We show that the MAT results can be recovered by evaluating this expression for the special case in which superposition coding and Marton coding are not used. We then introduce a new broadcast channel example that shares many features of the MAT examples. We show that another special case of our maximum symmetric rate expression in which superposition coding is also used attains a higher symmetric rate than the MAT scheme. The symmetric capacity of this example is not known, however.
Submitted 16 June, 2014; v1 submitted 28 September, 2013;
originally announced September 2013.
-
Secure Source Coding with a Public Helper
Authors:
Kittipong Kittichokechai,
Yeow-Khiang Chia,
Tobias J. Oechtering,
Mikael Skoglund,
Tsachy Weissman
Abstract:
We consider secure multi-terminal source coding problems in the presence of a public helper. Two main scenarios are studied: 1) source coding with a helper where the coded side information from the helper is eavesdropped by an external eavesdropper; 2) triangular source coding with a helper where the helper is considered as a public terminal. We are interested in how the helper can support the source transmission subject to a constraint on the amount of information leaked due to its public nature. We characterize the tradeoff between transmission rate, incurred distortion, and information leakage rate at the helper/eavesdropper in the form of a rate-distortion-leakage region for various classes of problems.
Submitted 4 July, 2013;
originally announced July 2013.
-
On Secure Source Coding with Side Information at the Encoder
Authors:
Yeow-Khiang Chia,
Kittipong Kittichokechai
Abstract:
We consider a secure source coding problem with side information (S.I.) at the decoder and the eavesdropper. The encoder has a source that it wishes to describe with limited distortion through a rate limited link to a legitimate decoder. The message sent is also observed by the eavesdropper. The encoder aims to minimize both the distortion incurred by the legitimate decoder and the information leakage rate at the eavesdropper. When the encoder has access to the uncoded S.I. at the decoder, we characterize the rate-distortion-information leakage rate (R.D.I.) region under a Markov chain assumption and when S.I. at the encoder does not improve the rate-distortion region as compared to the case when S.I. is absent. When the decoder also has access to the eavesdropper's S.I., we characterize the R.D.I. region without the Markov chain condition. We then consider a related setting where the encoder and decoder obtain coded S.I. through a rate limited helper, and characterize the R.D.I. region for several special cases, including special cases under logarithmic loss distortion and special cases of the Quadratic Gaussian setting. Finally, we consider the amplification measures of list or entropy constraint at the decoder, and show that the R.D.I. regions for the settings considered in this paper under these amplification measures coincide with R.D.I. regions under per symbol logarithmic loss distortion constraint at the decoder.
Submitted 3 July, 2013;
originally announced July 2013.
-
Energy Cooperation in Cellular Networks with Renewable Powered Base Stations
Authors:
Yeow-Khiang Chia,
Sumei Sun,
Rui Zhang
Abstract:
In this paper, we propose a model for energy cooperation between cellular base stations (BSs) with individual hybrid power supplies (including both the conventional grid and renewable energy sources), limited energy storages, and connected by resistive power lines for energy sharing. When the renewable energy profile and energy demand profile at all BSs are deterministic or known ahead of time, we show that the optimal energy cooperation policy for the BSs can be found by solving a linear program. We show the benefits of energy cooperation in this regime. When the renewable energy and demand profiles are stochastic and only causally known at the BSs, we propose an online energy cooperation algorithm and show the optimality properties of this algorithm under certain conditions. Furthermore, the energy-saving performances of the developed offline and online algorithms are compared by simulations, and the effect of the availability of energy state information (ESI) on the performance gains of the BSs' energy cooperation is investigated. Finally, we propose a hybrid algorithm that can incorporate offline information about the energy profiles, but operates in an online manner.
Submitted 27 May, 2013; v1 submitted 21 January, 2013;
originally announced January 2013.
-
Epitome for Automatic Image Colorization
Authors:
Yingzhen Yang,
Xinqi Chu,
Tian-Tsong Ng,
Alex Yong-Sang Chia,
Shuicheng Yan,
Thomas S. Huang
Abstract:
Image colorization adds color to grayscale images. It not only increases the visual appeal of grayscale images, but also enriches the information contained in scientific images that lack color information. Most existing methods of colorization require laborious user interaction for scribbles or image segmentation. To eliminate the need for human labor, we develop an automatic image colorization method using epitome. Built upon a generative graphical model, epitome is a condensed image appearance and shape model which also proves to be an effective summary of color information for the colorization task. We train the epitome from the reference images and perform inference in the epitome to colorize grayscale images, rendering better colorization results than previous methods in our experiments.
Submitted 8 October, 2012;
originally announced October 2012.
-
Compression with Actions
Authors:
Lei Zhao,
Yeow-Khiang Chia,
Tsachy Weissman
Abstract:
We consider the setting where actions can be used to modify a state sequence before compression. The minimum rate needed to losslessly describe the optimal modified sequence is characterized when the state sequence is either non-causally or causally available at the action encoder. The achievability is closely related to the optimal channel coding strategy for channels with states. We also extend the analysis to the lossy case.
Submitted 10 April, 2012;
originally announced April 2012.
-
Estimation with a helper who knows the interference
Authors:
Yeow-Khiang Chia,
Rajiv Soundararajan,
Tsachy Weissman
Abstract:
We consider the problem of estimating a signal corrupted by independent interference with the assistance of a cost-constrained helper who knows the interference causally or noncausally. When the interference is known causally, we characterize the minimum distortion incurred in estimating the desired signal. In the noncausal case, we present a general achievable scheme for discrete memoryless systems and novel lower bounds on the distortion for the binary and Gaussian settings. Our Gaussian setting coincides with that of assisted interference suppression introduced by Grover and Sahai. Our lower bound for this setting is based on the relation recently established by Verdú between divergence and minimum mean squared error. We illustrate with a few examples that this lower bound can improve on those previously developed. Our bounds also allow us to characterize the optimal distortion in several interesting regimes. Moreover, we show that causal and noncausal estimation are not equivalent for this problem. Finally, we consider the case where the desired signal is also available at the helper. We develop new lower bounds for this setting that improve on those previously developed, and characterize the optimal distortion up to a constant multiplicative factor for some regimes of interest.
Submitted 19 March, 2012;
originally announced March 2012.
-
Multi-Terminal Source Coding With Action Dependent Side Information
Authors:
Yeow-Khiang Chia,
Himanshu Asnani,
Tsachy Weissman
Abstract:
We consider multi-terminal source coding with a single encoder and multiple decoders where either the encoder or the decoders can take cost constrained actions which affect the quality of the side information present at the decoders. For the scenario where decoders take actions, we characterize the rate-cost trade-off region for lossless source coding, and give an achievability scheme for lossy source coding for two decoders which is optimum for a variety of special cases of interest. For the case where the encoder takes actions, we characterize the rate-cost trade-off for a class of lossless source coding scenarios with multiple decoders. Finally, we also consider extensions to other multi-terminal source coding settings with actions, and characterize the rate-distortion-cost tradeoff for a case of successive refinement with actions.
Submitted 31 October, 2011;
originally announced October 2011.
-
Cascade, Triangular and Two Way Source Coding with degraded side information at the second user
Authors:
Yeow Khiang Chia,
Haim Permuter,
Tsachy Weissman
Abstract:
We consider the Cascade and Triangular rate-distortion problems where the same side information is available at the source node and User 1, and the side information available at User 2 is a degraded version of the side information at the source node and User 1. We characterize the rate-distortion region for these problems. For the Cascade setup, we show that, at User 1, decoding and re-binning the codeword sent by the source node for User 2 is optimum. We then extend our results to the Two Way Cascade and Triangular setting, where the source node is interested in lossy reconstruction of the side information at User 2 via a rate limited link from User 2 to the source node. We characterize the rate-distortion regions for these settings. Complete explicit characterizations for all settings are also given in the Quadratic Gaussian case. We conclude with two further extensions: a triangular source coding problem with a helper, and an extension of our Two Way Cascade setting in the Quadratic Gaussian case.
Submitted 18 October, 2010;
originally announced October 2010.
-
An Achievability Scheme for the Compound Channel with State Noncausally Available at the Encoder
Authors:
Chandra Nair,
Abbas El Gamal,
Yeow-Khiang Chia
Abstract:
A new achievability scheme for the compound channel with discrete memoryless (DM) state noncausally available at the encoder is established. Achievability is proved using superposition coding, Marton coding, joint typicality encoding, and indirect decoding. The scheme is shown to achieve strictly higher rate than the straightforward extension of the Gelfand-Pinsker coding scheme for a single DMC with DM state, and is optimal for some classes of channels.
Submitted 22 April, 2010; v1 submitted 20 April, 2010;
originally announced April 2010.
-
Wiretap Channel with Causal State Information
Authors:
Yeow-Khiang Chia,
Abbas El Gamal
Abstract:
A lower bound on the secrecy capacity of the wiretap channel with state information available causally at both the encoder and decoder is established. The lower bound is shown to be strictly larger than that for the noncausal case by Liu and Chen. Achievability is proved using block Markov coding, Shannon strategy, and key generation from common state information. The state sequence available at the end of each block is used to generate a key, which is used to enhance the transmission rate of the confidential message in the following block. An upper bound on the secrecy capacity when the state is available noncausally at the encoder and decoder is established and is shown to coincide with the lower bound for several classes of wiretap channels with state.
Submitted 2 June, 2010; v1 submitted 13 January, 2010;
originally announced January 2010.
-
3-Receiver Broadcast Channels with Common and Confidential Messages
Authors:
Yeow-Khiang Chia,
Abbas El Gamal
Abstract:
This paper establishes inner bounds on the secrecy capacity regions for the general 3-receiver broadcast channel with one common and one confidential message sets. We consider two setups. The first is when the confidential message is to be sent to two receivers and kept secret from the third receiver. Achievability is established using indirect decoding, Wyner wiretap channel coding, and the new idea of generating secrecy from a publicly available superposition codebook. The inner bound is shown to be tight for a class of reversely degraded broadcast channels and when both legitimate receivers are less noisy than the third receiver. The second setup investigated in this paper is when the confidential message is to be sent to one receiver and kept secret from the other two receivers. Achievability in this case follows from Wyner wiretap channel coding and indirect decoding. This inner bound is also shown to be tight for several special cases.
Submitted 18 June, 2011; v1 submitted 8 October, 2009;
originally announced October 2009.
-
Faster-than-light effects and negative group delays in optics and electronics, and their applications
Authors:
Raymond Y. Chiao,
Jandir M. Hickmann,
Daniel Solli
Abstract:
Recent manifestations of apparently faster-than-light effects confirmed our predictions that the group velocity in transparent optical media can exceed c. Special relativity is not violated by these phenomena. Moreover, in the electronic domain, the causality principle does not forbid negative group delays of analytic signals in electronic circuits, in which the peak of an output pulse leaves the exit port of a circuit before the peak of the input pulse enters the input port. Furthermore, pulse distortion for these superluminal analytic signals can be negligible in both the optical and electronic domains. Here we suggest an extension of these ideas to the microelectronic domain. The underlying principle is that negative feedback can be used to produce negative group delays. Such negative group delays can be used to cancel out the positive group delays due to transistor latency (e.g., the finite RC rise time of MOSFETs caused by their intrinsic gate capacitance), as well as the propagation delays due to the interconnects between transistors. Using this principle, it is possible to speed up computer systems.
Submitted 12 March, 2001;
originally announced March 2001.