
Showing 1–4 of 4 results for author: Yahata, S

  1. arXiv:2503.01302  [pdf, ps, other]

    cs.CL

    Causal Tree Extraction from Medical Case Reports: A Novel Task for Experts-like Text Comprehension

    Authors: Sakiko Yahata, Zhen Wan, Fei Cheng, Sadao Kurohashi, Hisahiko Sato, Ryozo Nagai

    Abstract: Extracting causal relationships from a medical case report is essential for comprehending the case, particularly its diagnostic process. Since the diagnostic process is regarded as a bottom-up inference, causal relationships in cases naturally form a multi-layered tree structure. The existing tasks, such as medical relation extraction, are insufficient for capturing the causal relationships of an…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Work in progress

  2. arXiv:2407.03963  [pdf, other]

    cs.CL cs.AI

    LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

    Authors: LLM-jp: Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano, et al. (58 additional authors not shown)

    Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its…

    Submitted 30 December, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  3. arXiv:2405.13233  [pdf, other]

    cs.CL

    MELD-ST: An Emotion-aware Speech Translation Dataset

    Authors: Sirou Chen, Sakiko Yahata, Shuichiro Shimizu, Zhengdong Yang, Yihang Li, Chenhui Chu, Sadao Kurohashi

    Abstract: Emotion plays a crucial role in human conversation. This paper underscores the significance of considering emotion in speech translation. We present the MELD-ST dataset for the emotion-aware speech translation task, comprising English-to-Japanese and English-to-German language pairs. Each language pair includes about 10,000 utterances annotated with emotion labels from the MELD dataset. Baseline e…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 9 pages. Accepted to ACL 2024 Findings. Dataset: https://huggingface.co/datasets/ku-nlp/MELD-ST

  4. arXiv:2403.03690  [pdf]

    cs.CL cs.AI

    Rapidly Developing High-quality Instruction Data and Evaluation Benchmark for Large Language Models with Minimal Human Effort: A Case Study on Japanese

    Authors: Yikun Sun, Zhen Wan, Nobuhiro Ueda, Sakiko Yahata, Fei Cheng, Chenhui Chu, Sadao Kurohashi

    Abstract: The creation of instruction data and evaluation benchmarks for serving Large language models often involves enormous human annotation. This issue becomes particularly pronounced when rapidly developing such resources for a non-English language like Japanese. Instead of following the popular practice of directly translating existing English resources into Japanese (e.g., Japanese-Alpaca), we propos…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: COLING 2024. Our code is available here: self-instruct data (https://github.com/hitoshizuku7/awesome-Ja-self-instruct) and evaluation benchmark (https://github.com/ku-nlp/ja-vicuna-qa-benchmark)