The 17th LLM-jp Meeting

The 17th LLM-jp meeting was held on March 25th, 2025, at the National Institute of Informatics and online.

Program

  • LLM-jp Status Report (Kurohashi) Oral

<Evaluation and Tuning/Principle Elucidation WG>

  • Are Checklists Really Useful for Automatic Evaluation of Generative Tasks? (Furuhashi) [PDF]
  • Introduction of the Open Japanese LLM Leaderboard and Statistical Analysis of Evaluation Results (Namgi Han) [PDF]
  • Analyzing the Pretraining of Japanese Large Language Models (Nishida) [PDF]
  • llm-jp-judge: Japanese LLM-as-a-Judge Evaluation Tool (Kodama) [PDF]
  • Understanding the Role of Persona and Internal Mechanisms in Large Language Models (Ozaki) [PDF]
  • How LLMs Learn: Tracing Internal Representations with Sparse Autoencoders (Inaba) [PDF]
  • Massive Fine-tuned LLMs from Diverse Models, Tasks, and Methods (Harada) [PDF]
  • Comparative analysis of the Geospatial Representations in Large Language Models across Models and Languages (Otake) [PDF]
  • Large-Scale Human Evaluation of LLMs for Japanese (Inoue) [PDF]
  • A Study on Fine-tuning Methods for Balancing Usefulness and Safety in Japanese Large Language Models (Katsumata) [PDF]

<Multi-modal WG>

  • Developing Japanese CLIP Models Leveraging an Open-weight LLM for Large-scale Dataset Translation (Sugiura) [PDF]
  • llm-jp-eval-mm: An Evaluation Framework for Japanese-centric Vision and Language Models (Sugiura) [PDF]
  • LLM-jp-3 VILA: Construction of Japanese Multimodal Data and Powerful Japanese Multimodal Model (Sasagawa) [PDF]

<Model Building WG>

  • Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization (Nakamura) [PDF]

<Safety WG>

  • Large-Scale Human Evaluation of LLM Safety (Takahashi) [PDF]
  • AnswerCarefully: A Dataset for Promoting Safety of Japanese LLMs (Suzuki) [PDF]
  • Developing a Dataset of Misinformation from Social Media and an Accuracy Benchmark for Large Language Models (Nakazato) [PDF]
  • Development of Prompt Attack Data Collection Application for LLMs and Analysis of Collected Data Characteristics (Hayashi) [PDF]

<Corpus Construction WG>

  • A Comprehensive Analysis of Memorization in Large Language Models (Kiyomaru) [PDF]
  • Detection of Sensitive Personal Information in the Pre-training Corpus for Large Language Models (Minamoto) [PDF]
  • Integrated Framework for LLM Domain Adaptation Based on Synthetic Data (Ogawa) [PDF]

Participants

27 on-site and about 88 online
