The 17th LLM-jp Meeting

March 26, 2025

The 17th LLM-jp meeting was held on March 25th, 2025, at the National Institute of Informatics and online.

＜Evaluation and Tuning / Principle Elucidation WG＞

Are Checklists Really Useful for Automatic Evaluation of Generative Tasks? (Furuhashi) [PDF]
Introduction of Open Japanese LLM leaderboard and statistical analysis on evaluation results. (Namgi Han) [PDF]
Analyzing the Pretraining of Japanese Large Language Models. (Nishida) [PDF]
llm-jp-judge: Japanese LLM-as-a-Judge Evaluation Tool. (Kodama) [PDF]
Understanding the Role of Persona and Internal Mechanisms in Large Language Models. (Ozaki) [PDF]
How LLMs Learn: Tracing Internal Representations with Sparse Autoencoders. (Inaba) [PDF]
A Massive Collection of Fine-tuned LLMs from Diverse Models, Tasks, and Methods (Harada) [PDF]
Comparative Analysis of the Geospatial Representations in Large Language Models across Models and Languages (Otake) [PDF]
Large-Scale Human Evaluation of LLMs for Japanese (Inoue) [PDF]
A Study on Fine-tuning Methods for Balancing Usefulness and Safety in Japanese Large Language Models. (Katsumata) [PDF]

＜Multi-modal WG＞

Developing Japanese CLIP Models Leveraging an Open-weight LLM for Large-scale Dataset Translation. (Sugiura) [PDF]
llm-jp-eval-mm: An Evaluation Framework for Japanese-centric Vision and Language Models. (Sugiura) [PDF]
LLM-jp-3 VILA: Construction of Japanese Multimodal Data and Powerful Japanese Multimodal Model (Sasagawa) [PDF]

＜Model Building WG＞

Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization (Nakamura) [PDF]

＜Safety WG＞

Large-Scale Human Evaluation of LLM Safety (Takahashi) [PDF]
AnswerCarefully: A Dataset for Promoting Safety of Japanese LLMs (Suzuki) [PDF]
Developing a Dataset of Misinformation from Social Media and an Accuracy Benchmark for Large Language Models (Nakazato) [PDF]
Development of Prompt Attack Data Collection Application for LLMs and Analysis of Collected Data Characteristics (Hayashi) [PDF]

＜Corpus Construction WG＞

A Comprehensive Analysis of Memorization in Large Language Models (Kiyomaru) [PDF]
Detection of Sensitive Personal Information in the Pre-training Corpus for Large Language Models (Minamoto) [PDF]
Integrated Framework for LLM Domain Adaptation Based on Synthetic Data (Ogawa) [PDF]

Participants: 27 on-site and about 88 online.