-
AnswerCarefully: A Dataset for Improving the Safety of Japanese LLM Output
Authors:
Hisami Suzuki,
Satoru Katsumata,
Takashi Kodama,
Tetsuro Takahashi,
Kouta Nakayama,
Satoshi Sekine
Abstract:
In this paper we present AnswerCarefully, a dataset for promoting the safety and appropriateness of Japanese LLM outputs. The dataset consists of 1,800 pairs of questions and reference answers, where the questions require special attention in answering. It covers a wide range of risk categories established in prior English-language datasets, but the data samples are original in that they are manua…
▽ More
In this paper we present AnswerCarefully, a dataset for promoting the safety and appropriateness of Japanese LLM outputs. The dataset consists of 1,800 pairs of questions and reference answers, where the questions require special attention in answering. It covers a wide range of risk categories established in prior English-language datasets, but the data samples are original in that they are manually created to reflect the socio-cultural context of LLM usage in Japan. We show that using this dataset for instruction to fine-tune a Japanese LLM led to improved output safety without compromising the utility of general responses. We also report the results of a safety evaluation of 12 Japanese LLMs using this dataset as a benchmark. Finally, we describe the latest update on the dataset which provides English translations and annotations of the questions, aimed at facilitating the derivation of similar datasets in different languages and regions.
△ Less
Submitted 2 June, 2025;
originally announced June 2025.
-
Optimizing Item-based Marketing Promotion Efficiency in C2C Marketplace with Dynamic Sequential Coupon Allocation Framework
Authors:
Jie Yang,
Padunna Valappil Krishnaraj Sekhar,
Sho Sekine,
Yilin Li
Abstract:
In e-commerce platforms, coupons play a crucial role in boosting transactions. In the customer-to-customer (C2C) marketplace, ensuring the satisfaction of both buyers and sellers is essential. While buyer-focused marketing strategies often receive more attention, addressing the needs of sellers is equally important. Additionally, the existing strategies tend to optimize each promotion independentl…
▽ More
In e-commerce platforms, coupons play a crucial role in boosting transactions. In the customer-to-customer (C2C) marketplace, ensuring the satisfaction of both buyers and sellers is essential. While buyer-focused marketing strategies often receive more attention, addressing the needs of sellers is equally important. Additionally, the existing strategies tend to optimize each promotion independently, resulting in a lack of continuity between promotions and unnecessary costs in the pursuit of short-term impact within each promotion period.
We introduce a Dynamic Sequential Coupon Allocation Framework (DSCAF) to optimize item coupon allocation strategies across a series of promotions. DSCAF provides sequential recommendations for coupon configurations and timing to target items. In cases where initial suggestions do not lead to sales, it dynamically adjusts the strategy and offers subsequent solutions. It integrates two predictors for estimating the sale propensity in the current and subsequent rounds of coupon allocation, and a decision-making process to determine the coupon allocation solution. It runs iteratively until the item is sold. The goal of the framework is to maximize Return on Investment (ROI) while ensuring lift Sell-through Rate (STR) remains above a specified threshold. DSCAF aims to optimize sequential coupon efficiency with a long-term perspective rather than solely focusing on the lift achieved in each individual promotion. It has been applied for item coupon allocation in Mercari.
△ Less
Submitted 13 September, 2024;
originally announced September 2024.
-
Strategic Coupon Allocation for Increasing Providers' Sales Experiences in Two-sided Marketplaces
Authors:
Koya Ohashi,
Sho Sekine,
Deddy Jobson,
Jie Yang,
Naoki Nishimura,
Noriyoshi Sukegawa,
Yuichi Takano
Abstract:
In a two-sided marketplace, network effects are crucial for competitiveness, and platforms need to retain users through advanced customer relationship management as much as possible. Maintaining numerous providers' stable and active presence on the platform is highly important to enhance the marketplace's scale and diversity. The strongest motivation for providers to continue using the platform is…
▽ More
In a two-sided marketplace, network effects are crucial for competitiveness, and platforms need to retain users through advanced customer relationship management as much as possible. Maintaining numerous providers' stable and active presence on the platform is highly important to enhance the marketplace's scale and diversity. The strongest motivation for providers to continue using the platform is to realize actual profits through sales. Then, we propose a personalized promotion to increase the number of successful providers with sales experiences on the platform. The main contributions of our research are twofold. First, we introduce a new perspective in provider management with the distribution of successful sales experiences. Second, we propose a personalized promotion optimization method to maximize the number of providers' sales experiences. By utilizing this approach, we ensure equal opportunities for providers to experience sales without being monopolized by a few providers. Through experiments using actual data on coupon distribution, we confirm that our method enables the implementation of coupon allocation strategies that significantly increase the total number of providers having sales experiences.
△ Less
Submitted 20 July, 2024;
originally announced July 2024.
-
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs
Authors:
LLM-jp,
:,
Akiko Aizawa,
Eiji Aramaki,
Bowen Chen,
Fei Cheng,
Hiroyuki Deguchi,
Rintaro Enomoto,
Kazuki Fujii,
Kensuke Fukumoto,
Takuya Fukushima,
Namgi Han,
Yuto Harada,
Chikara Hashimoto,
Tatsuya Hiraoka,
Shohei Hisada,
Sosuke Hosokawa,
Lu Jie,
Keisuke Kamata,
Teruhito Kanazawa,
Hiroki Kanezashi,
Hiroshi Kataoka,
Satoru Katsumata,
Daisuke Kawahara,
Seiya Kawano
, et al. (58 additional authors not shown)
Abstract:
This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its…
▽ More
This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp. For the latest activities, visit https://llm-jp.nii.ac.jp/en/.
△ Less
Submitted 30 December, 2024; v1 submitted 4 July, 2024;
originally announced July 2024.
-
Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance
Authors:
Ziqi Yin,
Hao Wang,
Kaito Horio,
Daisuke Kawahara,
Satoshi Sekine
Abstract:
We investigate the impact of politeness levels in prompts on the performance of large language models (LLMs). Polite language in human communications often garners more compliance and effectiveness, while rudeness can cause aversion, impacting response quality. We consider that LLMs mirror human communication traits, suggesting they align with human cultural norms. We assess the impact of politene…
▽ More
We investigate the impact of politeness levels in prompts on the performance of large language models (LLMs). Polite language in human communications often garners more compliance and effectiveness, while rudeness can cause aversion, impacting response quality. We consider that LLMs mirror human communication traits, suggesting they align with human cultural norms. We assess the impact of politeness in prompts on LLMs across English, Chinese, and Japanese tasks. We observed that impolite prompts often result in poor performance, but overly polite language does not guarantee better outcomes. The best politeness level is different according to the language. This phenomenon suggests that LLMs not only reflect human behavior but are also influenced by language, particularly in different cultural contexts. Our findings highlight the need to factor in politeness for cross-cultural natural language processing and LLM usage.
△ Less
Submitted 14 October, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
WikiSQE: A Large-Scale Dataset for Sentence Quality Estimation in Wikipedia
Authors:
Kenichiro Ando,
Satoshi Sekine,
Mamoru Komachi
Abstract:
Wikipedia can be edited by anyone and thus contains various quality sentences. Therefore, Wikipedia includes some poor-quality edits, which are often marked up by other editors. While editors' reviews enhance the credibility of Wikipedia, it is hard to check all edited text. Assisting in this process is very important, but a large and comprehensive dataset for studying it does not currently exist.…
▽ More
Wikipedia can be edited by anyone and thus contains various quality sentences. Therefore, Wikipedia includes some poor-quality edits, which are often marked up by other editors. While editors' reviews enhance the credibility of Wikipedia, it is hard to check all edited text. Assisting in this process is very important, but a large and comprehensive dataset for studying it does not currently exist. Here, we propose WikiSQE, the first large-scale dataset for sentence quality estimation in Wikipedia. Each sentence is extracted from the entire revision history of English Wikipedia, and the target quality labels were carefully investigated and selected. WikiSQE has about 3.4 M sentences with 153 quality labels. In the experiment with automatic classification using competitive machine learning models, sentences that had problems with citation, syntax/semantics, or propositions were found to be more difficult to detect. In addition, by performing human annotation, we found that the model we developed performed better than the crowdsourced workers. WikiSQE is expected to be a valuable resource for other tasks in NLP.
△ Less
Submitted 29 December, 2023; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Classifying Wikipedia in a fine-grained hierarchy: what graphs can contribute
Authors:
Tiphaine Viard,
Thomas McLachlan,
Hamidreza Ghader,
Satoshi Sekine
Abstract:
Wikipedia is a huge opportunity for machine learning, being the largest semi-structured base of knowledge available. Because of this, many works examine its contents, and focus on structuring it in order to make it usable in learning tasks, for example by classifying it into an ontology. Beyond its textual contents, Wikipedia also displays a typical graph structure, where pages are linked together…
▽ More
Wikipedia is a huge opportunity for machine learning, being the largest semi-structured base of knowledge available. Because of this, many works examine its contents, and focus on structuring it in order to make it usable in learning tasks, for example by classifying it into an ontology. Beyond its textual contents, Wikipedia also displays a typical graph structure, where pages are linked together through citations. In this paper, we address the task of integrating graph (i.e. structure) information to classify Wikipedia into a fine-grained named entity ontology (NE), the Extended Named Entity hierarchy. To address this task, we first start by assessing the relevance of the graph structure for NE classification. We then explore two directions, one related to feature vectors using graph descriptors commonly used in large-scale network analysis, and one extending flat classification to a weighted model taking into account semantic similarity. We conduct at-scale practical experiments, on a manually labeled subset of 22,000 pages extracted from the Japanese Wikipedia. Our results show that integrating graph information succeeds at reducing sparsity of the input feature space, and yields classification results that are comparable or better than previous works.
△ Less
Submitted 22 January, 2020; v1 submitted 21 January, 2020;
originally announced January 2020.
-
Multi-class Multilingual Classification of Wikipedia Articles Using Extended Named Entity Tag Set
Authors:
Hassan S. Shavarani,
Satoshi Sekine
Abstract:
Wikipedia is a great source of general world knowledge which can guide NLP models better understand their motivation to make predictions. Structuring Wikipedia is the initial step towards this goal which can facilitate fine-grain classification of articles. In this work, we introduce the Shinra 5-Language Categorization Dataset (SHINRA-5LDS), a large multi-lingual and multi-labeled set of annotate…
▽ More
Wikipedia is a great source of general world knowledge which can guide NLP models better understand their motivation to make predictions. Structuring Wikipedia is the initial step towards this goal which can facilitate fine-grain classification of articles. In this work, we introduce the Shinra 5-Language Categorization Dataset (SHINRA-5LDS), a large multi-lingual and multi-labeled set of annotated Wikipedia articles in Japanese, English, French, German, and Farsi using Extended Named Entity (ENE) tag set. We evaluate the dataset using the best models provided for ENE label set classification and show that the currently available classification models struggle with large datasets using fine-grained tag sets.
△ Less
Submitted 5 March, 2020; v1 submitted 13 September, 2019;
originally announced September 2019.
-
Select and Attend: Towards Controllable Content Selection in Text Generation
Authors:
Xiaoyu Shen,
Jun Suzuki,
Kentaro Inui,
Hui Su,
Dietrich Klakow,
Satoshi Sekine
Abstract:
Many text generation tasks naturally contain two steps: content selection and surface realization. Current neural encoder-decoder models conflate both steps into a black-box architecture. As a result, the content to be described in the text cannot be explicitly controlled. This paper tackles this problem by decoupling content selection from the decoder. The decoupled content selection is human int…
▽ More
Many text generation tasks naturally contain two steps: content selection and surface realization. Current neural encoder-decoder models conflate both steps into a black-box architecture. As a result, the content to be described in the text cannot be explicitly controlled. This paper tackles this problem by decoupling content selection from the decoder. The decoupled content selection is human interpretable, whose value can be manually manipulated to control the content of generated text. The model can be trained end-to-end without human annotations by maximizing a lower bound of the marginal likelihood. We further propose an effective way to trade-off between performance and controllability with a single adjustable hyperparameter. In both data-to-text and headline generation tasks, our model achieves promising results, paving the way for controllable content selection in text generation.
△ Less
Submitted 10 September, 2019;
originally announced September 2019.
-
Can neural networks understand monotonicity reasoning?
Authors:
Hitomi Yanaka,
Koji Mineshima,
Daisuke Bekki,
Kentaro Inui,
Satoshi Sekine,
Lasha Abzianidze,
Johan Bos
Abstract:
Monotonicity reasoning is one of the important reasoning skills for any intelligent natural language inference (NLI) model in that it requires the ability to capture the interaction between lexical and syntactic structures. Since no test set has been developed for monotonicity reasoning with wide coverage, it is still unclear whether neural models can perform monotonicity reasoning in a proper way…
▽ More
Monotonicity reasoning is one of the important reasoning skills for any intelligent natural language inference (NLI) model in that it requires the ability to capture the interaction between lexical and syntactic structures. Since no test set has been developed for monotonicity reasoning with wide coverage, it is still unclear whether neural models can perform monotonicity reasoning in a proper way. To investigate this issue, we introduce the Monotonicity Entailment Dataset (MED). Performance by state-of-the-art NLI models on the new test set is substantially worse, under 55%, especially on downward reasoning. In addition, analysis using a monotonicity-driven data augmentation method showed that these models might be limited in their generalization ability in upward and downward reasoning.
△ Less
Submitted 27 June, 2019; v1 submitted 14 June, 2019;
originally announced June 2019.
-
HELP: A Dataset for Identifying Shortcomings of Neural Models in Monotonicity Reasoning
Authors:
Hitomi Yanaka,
Koji Mineshima,
Daisuke Bekki,
Kentaro Inui,
Satoshi Sekine,
Lasha Abzianidze,
Johan Bos
Abstract:
Large crowdsourced datasets are widely used for training and evaluating neural models on natural language inference (NLI). Despite these efforts, neural models have a hard time capturing logical inferences, including those licensed by phrase replacements, so-called monotonicity reasoning. Since no large dataset has been developed for monotonicity reasoning, it is still unclear whether the main obs…
▽ More
Large crowdsourced datasets are widely used for training and evaluating neural models on natural language inference (NLI). Despite these efforts, neural models have a hard time capturing logical inferences, including those licensed by phrase replacements, so-called monotonicity reasoning. Since no large dataset has been developed for monotonicity reasoning, it is still unclear whether the main obstacle is the size of datasets or the model architectures themselves. To investigate this issue, we introduce a new dataset, called HELP, for handling entailments with lexical and logical phenomena. We add it to training data for the state-of-the-art neural models and evaluate them on test sets for monotonicity phenomena. The results showed that our data augmentation improved the overall accuracy. We also find that the improvement is better on monotonicity inferences with lexical replacements than on downward inferences with disjunction and modification. This suggests that some types of inferences can be improved by our data augmentation while others are immune to it.
△ Less
Submitted 27 April, 2019;
originally announced April 2019.
-
Multi-Task Learning with Contextualized Word Representations for Extented Named Entity Recognition
Authors:
Thai-Hoang Pham,
Khai Mai,
Nguyen Minh Trung,
Nguyen Tuan Duc,
Danushka Bolegala,
Ryohei Sasano,
Satoshi Sekine
Abstract:
Fine-Grained Named Entity Recognition (FG-NER) is critical for many NLP applications. While classical named entity recognition (NER) has attracted a substantial amount of research, FG-NER is still an open research domain. The current state-of-the-art (SOTA) model for FG-NER relies heavily on manual efforts for building a dictionary and designing hand-crafted features. The end-to-end framework whic…
▽ More
Fine-Grained Named Entity Recognition (FG-NER) is critical for many NLP applications. While classical named entity recognition (NER) has attracted a substantial amount of research, FG-NER is still an open research domain. The current state-of-the-art (SOTA) model for FG-NER relies heavily on manual efforts for building a dictionary and designing hand-crafted features. The end-to-end framework which achieved the SOTA result for NER did not get the competitive result compared to SOTA model for FG-NER. In this paper, we investigate how effective multi-task learning approaches are in an end-to-end framework for FG-NER in different aspects. Our experiments show that using multi-task learning approaches with contextualized word representation can help an end-to-end neural network model achieve SOTA results without using any additional manual effort for creating data and designing features.
△ Less
Submitted 26 February, 2019;
originally announced February 2019.
-
What Makes Reading Comprehension Questions Easier?
Authors:
Saku Sugawara,
Kentaro Inui,
Satoshi Sekine,
Akiko Aizawa
Abstract:
A challenge in creating a dataset for machine reading comprehension (MRC) is to collect questions that require a sophisticated understanding of language to answer beyond using superficial cues. In this work, we investigate what makes questions easier across recent 12 MRC datasets with three question styles (answer extraction, description, and multiple choice). We propose to employ simple heuristic…
▽ More
A challenge in creating a dataset for machine reading comprehension (MRC) is to collect questions that require a sophisticated understanding of language to answer beyond using superficial cues. In this work, we investigate what makes questions easier across recent 12 MRC datasets with three question styles (answer extraction, description, and multiple choice). We propose to employ simple heuristics to split each dataset into easy and hard subsets and examine the performance of two baseline models for each of the subsets. We then manually annotate questions sampled from each subset with both validity and requisite reasoning skills to investigate which skills explain the difference between easy and hard questions. From this study, we observed that (i) the baseline performances for the hard subsets remarkably degrade compared to those of entire datasets, (ii) hard questions require knowledge inference and multiple-sentence reasoning in comparison with easy questions, and (iii) multiple-choice questions tend to require a broader range of reasoning skills than answer extraction and description questions. These results suggest that one might overestimate recent advances in MRC.
△ Less
Submitted 28 August, 2018;
originally announced August 2018.