Overview of Japanese LLMs

[ English | Français | 日本語 ]

📖 Please visit the more readable Web version
The content of this README is available in a more readable format at llm-jp.github.io/awesome-japanese-llm. We recommend viewing the Web version to avoid table display issues and layout problems.

Parameter sizes of Japanese and non-Japanese LLMs over time

Evolution of parameter sizes for Japanese LLMs and non-Japanese LLMs. The information on the Japanese models is derived from this article, while the information on the non-Japanese models can be referred from the Models table on LifeArchitect.ai. However, due to space constraints in the figure, some models have been omitted. Additionally, estimates are included in the parameter count for non-Japanese models. Please notify us of any corrections, additions, or updates.

Figure Update Notice

The above figure is based on data up to the end of 2024 and has not been updated since 2025. This is because recent LLM development has shifted from focusing on parameter count increases to competing through improvements in training methods and datasets. Please refer to the tables below for the latest model information.

A list of publicly available LLMs trained with a focus on Japanese, along with their evaluation benchmarks, maintained by volunteers from various sources like academic papers and other public resources.

Caution

We can't guarantee the accuracy or completeness of any information here.
Some information is based on conjecture and might not reflect your specific use case.
While many models are released under permissive licenses like MIT or Apache 2.0, some are subject to more restrictive terms including non-commercial use clauses (e.g CC BY-NC-SA 4.0) or other stipulations.

Please point out any errors on the issues page. Feel free to contribute directly with a pull request.

Table of Contents

Text Generation Models

For multimodal models, see below.

Models built from scratch

General purpose

	Release Year	Architecture	Max Context Length	Training Data	Developer	License / Terms of Use
Sarashina2-8x70B	2024	MoE (8x70b (465b))	8,192	Sparse Upcycling on Sarashina2 (70B)	SB Intuitions	Sarashina Model NonCommercial License
LLM-jp-3 172B	2024	Llama (172b, 172b-instruct2, 172b-instruct3)	4,096	Pre-training: llm-jp-corpus-v3 (2.1T tokens) Instruction Tuning: ichikara-instruction, AnswerCarefully Dataset, magpie-sft-v1.0, Daring-Anteater, FLAN, ichikara-instruction-format, AutoMultiTurnByCalm3-22B, ramdom-to-fixed-multiturn-Calm3, wizardlm8x22b-logical-math-coding-sft-ja, wizardlm8x22b-logical-math-coding-sft_additional-ja, Synthetic-JP-EN-Coding-Dataset-567k DPO (instruct3 only): aya-ja-evol-inst, ac-self-inst	Research and Development Center for Large Language Models	Pre-trained model: LLM-jp-3 172B Terms of Use Post-trained model: llm-jp-3-172b-instruct3 Terms of Use
LLM-jp-3 172B beta2	2024	Llama (172b-beta2, 172b-beta2-instruct2)	4,096	Pre-training: part of llm-jp-corpus-v3 (1.4T tokens) Instruction Tuning: ichikara-instruction, AnswerCarefully Dataset, magpie-sft-v1.0, Daring-Anteater, FLAN, ichikara-instruction-format, AutoMultiTurnByCalm3-22B, ramdom-to-fixed-multiturn-Calm3, wizardlm8x22b-logical-math-coding-sft-ja, wizardlm8x22b-logical-math-coding-sft_additional-ja, Synthetic-JP-EN-Coding-Dataset-567k	Research and Development Center for Large Language Models	LLM-jp-3 172B beta2 Terms of Use
LLM-jp-3 172B beta1	2024	Llama (172b-beta1, 172b-beta1-instruct)	4,096	Pre-training: part of llm-jp-corpus-v3 (0.7T tokens) Instruction Tuning: ichikara-instruction, AnswerCarefully Dataset, Dolly Dataset, OASST1, OASST2, Aya Dataset, ichikara-instruction-format, Daring-Anteater, FLAN	Research and Development Center for Large Language Models	LLM-jp-3 172B beta1 Terms of Use
LLM-jp-3 172B alpha	2024	Llama (172b-alpha1, 172b-alpha1-instruct, 172b-alpha2, 172b-alpha2-instruct)	4,096	Pre-training: part of llm-jp-corpus-v3 (alpha1: 0.7T tokens, alpha2: 1.4T tokens) Instruction Tuning: ichikara-instruction, AnswerCarefully Dataset, Dolly Dataset, OASST1, OASST2, Aya Dataset, ichikara-instruction-format, Daring-Anteater, FLAN	Research and Development Center for Large Language Models	Apache 2.0
Stockmark-2-100B-Instruct-beta	2025	Llama (100B-Instruct-beta, 100B-Instruct-beta-AWQ)	4,096	Pre-training: 1.5T tokens Instruction Tuning DPO	Stockmark	MIT
Stockmark-100b	2024	Llama (100b, 100b-instruct-v0.1)	4,096	Pre-training: RedPajama, Japanese Wikipedia, Japanese mC4, Japanese CommonCrawl, Japanese Patent, Stockmark Web Corpus (910B tokens) Instruction Tuning (LoRA): ichikara-instruction	Stockmark	MIT
PLaMo-100B-Pretrained	2024	Llama^[1] (100b)	4,096	Pre-training: Japanese CommonCrawl, RefinedWeb, undisclosed (2.0T tokens)	Preferred Elements (Preferred Networks)	PLaMo Non-Commercial License
LLM-jp-3.1	2025	Llama/MoE (8x13b-instruct4, 13b-instruct4, 1.8b-instruct4)	4,096	Pre-training: llm-jp-corpus-v3 (2.5T tokens) Continual pre-training: instruction-response pairs (90B tokens) SFT + DPO	Research and Development Center for Large Language Models	Apache 2.0
LLM-jp-3 MoE	2025	MoE (8x1.8b (9.3b), 8x1.8b (9.3b)-instruct2, 8x1.8b (9.3b)-instruct3, 8x13b (73b), 8x13b (73b)-instruct2, 8x13b (73b)-instruct3)	4,096	Drop-Upcycling on LLM-jp-3 (1.8b, 13b)	Research and Development Center for Large Language Models	Apache 2.0
Sarashina2	2024	Llama (7b, 13b, 70b)	7b, 13b: 4,096 70b: 8,192	Pre-training: Japanese Common Crawl, SlimPajama, StarCoder (2.1T tokens)	SB Intuitions	MIT
Sarashina1	2024	GPT-NeoX (7b, 13b, 65b)	2,048	Pre-training: Japanese Common Crawl (1T tokens)	SB Intuitions	MIT
Tanuki-8×8B	2024	MoE (47b) (v1.0, v1.0-AWQ, v1.0-GPTQ-4bit, v1.0-GPTQ-8bit, v1.0-GGUF)	4,096	Pre-training: various Web & synthetic datasets（1.7T tokens） SFT, DPO: various synthetic datasets ^[2]	Matsuo Lab LLM Development Project	Apache 2.0
CyberAgentLM3 (CALM3)	2024	Llama (22b-chat, 22b-chat-selfimprove-experimental)	16,384	undisclosed (2.0T tokens)	CyberAgent	Apache 2.0
LLM-jp-3 13B instruct3	2025	Llama (150m, 150m-instruct2, 150m-instruct3, 440m, 440m-instruct2, 440m-instruct3, 980m, 980m-instruct2, 980m-instruct3, 1.8b-instrcut2, 1.8b-instruct3, 3.7b-instruct2, 3.7b-instruct3, 7.2b-instruct2, 7.2b-instruct3, 13b-instruct2, 13b-instruct3)	4,096	Pre-training: llm-jp-corpus-v3 (2.1T tokens) Instruction Tuning: ichikara-instruction, AnswerCarefully Dataset, magpie-sft-v1.0, Daring-Anteater, FLAN, ichikara-instruction-format, AutoMultiTurnByCalm3-22B, ramdom-to-fixed-multiturn-Calm3, wizardlm8x22b-logical-math-coding-sft-ja, Synthetic-JP-EN-Coding-Dataset-567k DPO (instruct3 only): aya-ja-evol-inst, ac-self-inst	Research and Development Center for Large Language Models	Apache 2.0
LLM-jp-3 13B	2024	Llama (1.8b, 1.8b-instruct, 3.7b, 3.7b-instruct, 7.2b, 7.2b-instruct, 13b, 13b-instruct)	4,096	Pre-training: llm-jp-corpus-v3 (2.1T tokens) Instruction Tuning: ichikara-instruction, AnswerCarefully Dataset, FLAN, ichikara-instruction-format, AutoMultiTurnByCalm3-22B, ramdom-to-fixed-multiturn-Calm3, wizardlm8x22b-logical-math-coding-sft_additional-ja, Synthetic-JP-EN-Coding-Dataset-567k	Research and Development Center for Large Language Models	Apache 2.0
llm-jp-3-3.7b-instruct-EZO	2024	Llama (3.7b-instruct-EZO-Common, 3.7b-instruct-EZO-Humanities)	4,096	additionally trained on LLM-jp-3 (3.7B)	Axcxept	Apache 2.0
LLM-jp-13B v2.0	2024	Llama (13b-v2.0, 13b-instruct-full-dolly-ichikara_004_001_single-oasst-oasst2-v2.0, 13b-instruct-full-ac_001-dolly-ichikara_004_001_single-oasst-oasst2-v2.0, 13b-instruct-full-ac_001_16x-dolly-ichikara_004_001_single-oasst-oasst2-v2.0)	4,096	Pre-training: llm-jp-corpus-v2 (260B tokens) Instruction Tuning: ichikara-instruction, AnswerCarefully Dataset, Dolly Dataset, OASST1, OASST2	LLM-jp	Apache 2.0
Fugaku-LLM	2024	GPT (13B, 13B-instruct, 13B-instruct-gguf)	2,048	Pre-training: undisclosed dataset Instruction Tuning: OASST1, Dolly Dataset, GSM8K	Titech, Tohoku Univ., Fujitsu, RIKEN, Nagoya Univ., CyberAgent, Kotoba Technologies	Fugaku-LLM Terms of Use
LLM-jp-13B v1.1	2024	GPT (13b-instruct-lora-dolly_en-dolly_ja-ichikara_003_001-oasst_en-oasst_ja-v1.1, 13b-instruct-full-dolly_en-dolly_ja-ichikara_003_001-oasst_en-oasst_ja-v1.1, 13b-dpo-lora-hh_rlhf_ja-v1.1)	2,048	Instruction Tuning (LoRA or Full-parameter FT): Dolly Dataset, OASST1, ichikara-instruction DPO (LoRA): HH RLHF	LLM-jp	Apache 2.0
LLM-jp-13B	2023	GPT (1.3b-v1.0, 13b-v1.0, 13b-instruct-full-jaster-v1.0, 13b-instruct-full-jaster-dolly-oasst-v1.0, 13b-instruct-full-dolly-oasst-v1.0, 13b-instruct-lora-jaster-v1.0, 13b-instruct-lora-jaster-dolly-oasst-v1.0, 13b-instruct-lora-dolly-oasst-v1.0)	2,048	Pre-training: llm-jp-corpus (Wikipedia, Japanese mC4, The Pile, Stack) (300B tokens) Instruction Tuning (Full-parameter FT or LoRA): jaster, Dolly Dataset, OASST1	LLM-jp	Apache 2.0
PLaMo-13B	2023	Llama^[3] (13b, 13b-instruct, 13b-instruct-nc)	base: 4,096 instruct, instruct-nc: 8,192	Pre-training: C4, Project Gutenberg, RedPajama, Japanese Wikipedia, Japanese mC4 (1.5T tokens) Instruction Tuning: Dolly, HH RLHF, OASST1, wikinews (+Alpaca in NC model)	Preferred Networks	Apache 2.0 (CC BY-NC 4.0 as for NC model)
Stockmark-13b	2023	Llama (13b, 13b-instruct)	2,048	Pre-training: Japanese Wikipedia, Japanese CC-100, Japanese mC4, Japanese CommonCrawl, Japanese Patent, Stockmark Web Corpus (220B tokens) Instruction Tuning (LoRA): ichikara-instruction	Stockmark	base: MIT instruct: CC BY-NC-SA 4.0
Weblab-10B	2023	GPT-NeoX (10b, 10b-instruction-sft)	2,048	Japanese mC4, The Pile (600B tokens) Instruction Tuning: Alpaca, FLAN	University of Tokyo Matsuo Lab	CC BY‑NC 4.0
PLaMo 2.1 8B	2025	hybrid architecture like Samba (8b-cpt)	32,768	Training details undisclosed	Preferred Networks	PLaMo community license
PLaMo 2 8B	2025	hybrid architecture like Samba (8b)		mainly Japanese and English data (6T tokens)	Preferred Networks	PLaMo community license
Tanuki-8B	2024	Tanuki (8b) (v1.0, v1.0-AWQ, v1.0-GPTQ-4bit, v1.0-GPTQ-8bit, v1.0-GGUF)	4,096	Pre-training: various Web & synthetic datasets（1.3T tokens） SFT, DPO: various synthetic datasets ^[2:1]	Matsuo Lab LLM Development Project	Apache 2.0
Japanese StableLM Alpha	2023	GPT-NeoX (base-alpha-7b, instruct-alpha-7b, instruct-alpha-7b-v2)	2,048	Wikipedia, Japanese CC‑100, Japanese mC4, Japanese OSCAR, RedPajama, private datasets^[4] (750B tokens) Instruction Tuning: Dolly, HH‑RLHF, wikinews, Alpaca (discarded in v2)	Stability AI	base: Apache 2.0 instruct (v1): Research license instruct (v2): Apache 2.0
CyberAgentLM2 (CALM2)	2023	Llama (7b, 7b-chat, 7b-chat-dpo-experimental)	base: 4,096 chat: 32,768	publicly available Japanese and English datasets (details unknown) (1.3T tokens) DPO: Chatbot Arena Conversations JA (calm2) Dataset	CyberAgent	Apache 2.0 (CC BY 4.0 as for DPO model)
OpenCALM	2023	GPT-NeoX (small, medium, large, 1b(1.4b), 3b(2.7b), 7b(6.8b))	2,048	Japanese Wikipedia, Japanese mC4, Japanese CC‑100	CyberAgent	CC BY‑SA 4.0
Stormy	2023	GPT-NeoX (7b(6.8b))	2,048	OpenCALM fine-tuned on llm-japanese-dataset v0 non-translation tasks	University of Tokyo Izumi Lab	CC BY‑SA 4.0
rinna GPT (En-Ja Bilingual)	2023	GPT-NeoX (4b(3.8b), 4b(3.8b)-8k, 4b(3.8b)-instruction-sft, 4b(3.8b)-instruction-ppo)	8k model: 8,192 others: 2,048	Wikipedia, Japanese CC‑100, Japanese C4, RedPajama, The Pile (524B tokens) Instruction Tuning: HH‑RLHF, FLAN PPO: HH‑RLHF for reinforcement learning 8k: trained with long context	rinna	MIT
japanese-large-lm	2023	GPT-NeoX (1.7b, 3.6b, 1.7b-instruction-sft, 3.6b-instruction-sft)	2,048	Japanese Wikipedia, Japanese CC‑100, Japanese C4, Japanese OSCAR and private datasets (650GB) Instruction Tuning: OASST1	LINE	Apache 2.0
rinna GPT (Japanese only)	2023	GPT / GPT-NeoX (xsmall, small, medium, 1b, neox-small, neox-3.6b, neox-3.6b-instruction-sft, neox-3.6b-instruction-sft-v2, neox-3.6b-instruction-ppo)	≤ 2,048	Japanese Wikipedia, Japanese CC‑100 (1b and up models add Japanese mC4) Instruction Tuning: HH‑RLHF, FLAN, SHP PPO: HH‑RLHF for reinforcement learning	rinna	MIT
Sarashina2.2	2025	Llama (0.5b, 0.5b-instruct-v0.1, 1b, 1b-instruct-v0.1, 3b, 3b-instruct-v0.1)	8,192		SB Intuitions	MIT
RetrievaT5	2023	T5 (small (short), small (medium), small (long), base (short), base (medium), base (long), large (short), large (medium), large (long), xl(3b))		Japanese Wikipedia, Japanese mC4	Retrieva	CC BY‑SA 4.0
Spiral-RetNet-3b-base	2024	RetNet (3b)	2,048	Wikipedia, Japanese CC-100, CulturaX	Spiral.AI	MIT
kotomamba-2.8B	2024	Mamba (2.8B-v1.0)	2,048	Japanese Wikipedia, Swallow Corpus, SlimPajama	Kotoba Technologies	Apache 2.0
ABEJA GPT	2022	GPT / GPT-NeoX (large, neox-2.7b)		Japanese Wikipedia, Japanese CC‑100, Japanese OSCAR	ABEJA	MIT
PLaMo 2.1 2B	2025	Causal decoder-only transformer (2b-cpt)	32,768	Training details undisclosed	Preferred Networks	PLaMo community license
Rakuten AI 2.0 mini	2025	Mistral (mini(1.5b), mini(1.5b)-instruct)	131,072		Rakuten	Apache 2.0
WasedaGPT	2022	GPT (small, xl(1.5b))		Japanese Wikipedia, Japanese CC‑100	Waseda Kawahara Lab	CC BY‑SA 4.0
StockmarkGPT	2023	GPT-NeoX (1.4b)		Japanese Wikipedia (0.88B tokens), Japanese CC‑100 (10.5B tokens), private data (8.6B tokens)	Stockmark	MIT
YellowbackGPT	2021	GPT-NeoX (1.3b)		Japanese Wikipedia, Japanese CC‑100, Japanese OSCAR	Yellowback	Apache 2.0
PLaMo 2 1B	2025	hybrid architecture like Samba (1b)		mainly Japanese and English data (4T tokens)	Preferred Elements (Preferred Networks)	Apache 2.0
Sarashina2.1-1B	2024	Llama (1b)	8,192	Japanese and English data on the web (10T tokens)	SB Intuitions	Sarashina Model NonCommercial License
colorfulscoop GPT	2021	GPT (small)		Japanese Wikipedia	Colorful Scoop	CC BY‑SA 3.0
TitechGPT	2023	GPT (medium, medium-reversed) ^[5]		Japanese Wikipedia, Japanese CC‑100	Titech Okazaki Lab	CC BY‑SA 4.0
KyotoUniversityGPT	2022	GPT (small, medium, large)		Japanese Wikipedia (3.2GB), Japanese CC‑100 (85GB), Japanese OSCAR (54GB)	Kyoto University Language Media Processing Lab	CC BY‑SA 4.0
JapaneseBART	2023	BART (base, large)		Japanese Wikipedia (18M sentences)	Kyoto University Language Media Processing Lab	CC BY‑SA 4.0
Megagon Labs T5	2021	T5 (base)		Japanese mC4 (782 GB), Japanese wiki40b (2 GB)	Megagon Labs (Recruit Co.,Ltd.)	Apache 2.0

Domain Specific

	Domain	Architecture	Training Data	Developer	License
SIP-med-LLM/SIP-jmed-llm-2-8x13b-OP-instruct	Medical	MoE	Pre-trained on a medical corpus (44.2B tokens) added to LLM-jp-3 MoE (8x13b), followed by Instruction Tuning	Strategic Innovation Promotion Program (SIP) Phase 3 Project "Generative AI Utilization in the Construction of Integrated Healthcare Systems" Theme 1 "Development and Social Implementation of Open Medical LLM with Safety and Reliability" Research Group	Apache 2.0
Japanese Dialog Transformer	Dialog	Transformer	Twitter Japanese reply pairs	NTT	Evaluation Licence
Japanese News BART	Business	BART (base)	Japanese business news articles (21M articles)	Stockmark	MIT
AcademicBART	Science	BART (base)	CiNii Japanese Papers	Ehime University AI Lab	Apache 2.0

Models built off non-Japanese LLMs (w/ continual pre-training on Japanese)

General purpose

	Release Year	Base Model	Training Data	Developer	License / Terms of Use
Llama 3.3 Swallow 70B (70B-v0.4, 70B-Instruct-v0.4)	2025	Llama 3.3 (70b)	Pre-training: Wikipedia, DCLM-baseline-1.0, Swallow Corpus Version 2, Cosmopedia, Laboro ParaCorpus, FineMath-4+, Swallow Code Version 0.3 Instruction Tuning: Gemma-2-LMSYS-Chat-1M-Synth, Swallow-Magpie-Ultra-v0.1, Swallow-Gemma-Magpie-v0.1, Swallow-Code-v0.3-Instruct-style	Swallow Project	Llama 3.3 Community License & Gemma Terms of Use
Llama 3.1 Swallow 70B (70B-v0.1, 70B-Instruct-v0.1, 70B-Instruct-v0.3)	2024	Llama 3.1 (70b)	Pre-training: The Stack v2, Wikipedia, DCLM-baseline-1.0, Swallow Corpus Version 2, Cosmopedia, Laboro ParaCorpus Instruction Tuning: lmsys-chat-1m-synth-ja-wo-pii-and-template-instructions, lmsys-chat-1m-synth-en-wo-pii-and-template-instructions, filtered-magpie-ultra-ja, filtered-magpie-ultra-en, gemma-magpie	Swallow Project	Llama 3.1 Community License (Gemma Terms of Use is also applied to the Instruct model)
cyberagent/Llama-3.1-70B-Japanese-Instruct-2407	2024	Llama 3.1 (70b)	undisclosed	CyberAgent	Llama 3.1 Community License
Llama 3 Swallow 70B (70B-v0.1, 70B-Instruct-v0.1)	2024	Llama 3 (70b)	Pre-training: Algebraic Stack, Wikipedia, RefinedWeb, Swallow Corpus, Cosmopedia, Laboro ParaCorpus, OpenWebMath Instruction Tuning: OASST1 ^[6]	Swallow Project	Llama 3 Community License
turing-motors/Llama-3-heron-brain-70B-v0.3	2024	Llama 3 (70b)	additionally trained on Llama 3 Swallow 70B (details undisclosed)	Turing	Llama 3 Community License
Llama 3 Youko 70B (70b, 70b-instruct, 70b-gptq, 70b-instruct-gptq)	2024	Llama 3 (70b)	Pre-training: Wikipedia, Japanese C4, Japanese CC-100, Japanese OSCAR, The Pile, undisclosed dataset (5B tokens) Instruction Tuning: undisclosed datasetト^[7]	rinna	Llama 3 Community License
Swallow 70B (70b-hf, 70b-instruct-hf, 70b-instruct-v0.1, 70b-NVE-hf, 70b-NVE-instruct-hf)	2023	Llama 2 (70b)	Pre-training: Japanese Wikipedia, RefinedWeb, Swallow Corpus, The Pile Instruction Tuning: Dolly Dataset, HH RLHF, OASST1 *v0.1: OASST1, OASST2	Swallow Project	Llama 2 Community License
KARAKURI LM (70b-v0.1, 70b-chat-v0.1)	2024	Llama 2 (70b)	Pre-training: mC4, CC100, OSCAR, RedPajama, undisclosed dataset (16B tokens) SteerLM: OASST2, undisclosed dataset	KARAKURI	Llama 2 Community License^[8]
Japanese Stable LM Beta 70B (base-beta-70b, instruct-beta-70b)	2023	Llama 2 (70b)	Pre-training: Wikipedia, Japanese mC4, Japanese CC-100, Japanese OSCAR, SlimPajama(excluding Books3) (100B tokens) Instruction Tuning: Dolly Dataset, HH RLHF, OASST1	Stability AI	Llama 2 Community License
Swallow-MX 8x7B (8x7b-NVE-v0.1)	2024	Mixtral-8x7B-Instruct-v0.1 (46.7b)	Pre-training: Algebraic Stack, Japanese Wikipedia, RefinedWeb, Swallow Corpus, The Pile, The Vault	Swallow Project	Apache 2.0
KARAKURI LM 8x7B Instruct v0.1 (8x7b-instruct-v0.1)	2024	Mixtral-8x7B-Instruct-v0.1 (46.7b)	trained Swallow-MX 8x7B on the following datasets: Dolly Dataset, OASST2, HelpSteer, glaive-code-assistant-v3, glaive-function-calling-v2, synthetic_text_to_sql, MetaMathQA, orca-math-word-problems-200k, rag-dataset-12000, rag-hallucination-dataset-1000, undisclosed dataset	KARAKURI	Apache 2.0 (?)^[9]
KARAKURI LM 8x7B Chat v0.1 (8x7b-chat-v0.1)	2024	Mixtral-8x7B-Instruct-v0.1 (46.7b)	trained Swallow-MX 8x7B on OASST2, HelpSteer, and undisclosed datasets using SteerLM	KARAKURI	Apache 2.0
ABEJA-Mixtral-8x7B-japanese (8x7B-v0.1-japanese, 8x7B-Instruct-v0.1-japanese, 8x7B-Instruct-v0.1-japanese-alpha, 8x7B-Instruct-v0.1-japanese-alpha-merged)	2024	Mixtral-8x7B-Instruct-v0.1 (46.7b) *The model without "Instruct" in its name is based on Mixtral-8x7B-v0.1	Pre-training: Japanese CC, Redpajama, undisclosed dataset （450B tokens）	ABEJA	Apache 2.0
ELYZA-Thinking-1.0-Qwen-32B (32B)	2025	Qwen 2.5 (32b)	Pre-training + SFT (Reasoning)	ELYZA	Apache 2.0
ELYZA-Shortcut-1.0-Qwen-32B (32B)	2025	Qwen 2.5 (32b)	Pre-training + SFT	ELYZA	Apache 2.0
ABEJA-Qwen2.5-32b-Japanese-v1.0 (v1.0)	2025	Qwen2.5-32B-Instruct (32b)	Continual pre-training + SFT + DPO: ~20,000 synthetic and human-annotated datasets (specialized for extraction and reasoning)	ABEJA	Apache 2.0
Qwen2.5 Bakeneko 32B (qwen2.5-bakeneko-32b, qwen2.5-bakeneko-32b-instruct, deepseek-r1-distill-qwen2.5-bakeneko-32b, qwq-bakeneko-32b, qwen2.5-bakeneko-32b-instruct-v2)	2025	Qwen 2.5 (32b)		rinna	Apache 2.0
ABEJA-QwQ32b-Reasoning-Japanese-v1.0 (v1.0)	2025	Qwen 2.5 (32b)	ABEJA-Qwen2.5-32b-Japanese-v0.1 + Chat Vector (from QwQ 32b) + continual pre-training	ABEJA	Apache 2.0
ABEJA-Qwen2.5-32b-Japanese-v0.1 (32b-Japanese-v0.1)	2025	Qwen 2.5 (32b)	Pre-training: Common Crawl, Cosmopedia, undisclosed dataset （100B tokens） + Chat Vector	ABEJA	Apache 2.0
Gemma-2-Llama Swallow 27B (27b-pt-v0.1, 27b-it-v0.1)	2025	Gemma 2 (27b)	Pre-training: Wikipedia, DCLM-baseline-1.0, Swallow Corpus Version 2, Cosmopedia, Laboro ParaCorpus, FineMath-4+, Swallow Code Version 0.3 Instruction Tuning: Gemma-2-LMSYS-Chat-1M-Synth, Swallow-Magpie-Ultra-v0.1, Swallow-Gemma-Magpie-v0.1	Swallow Project	Llama 3.3 Community License & Gemma Terms of Use
Nekomata 14B (14b, 14b-instruction, 14b-gguf, 14b-instruction-gguf)	2023	Qwen (14b)	Pre-training: Wikipedia, Japanese C4, Japanese CC-100, Japanese OSCAR, The Pile, undisclosed dataset (66B tokens) Instruction Tuning: Dolly Dataset, FLAN, subsets of llm-japanese-dataset	rinna	Tongyi Qianwen LICENSE
Swallow 13B (13b-hf, 13b-instruct-hf, 13b-instruct-v0.1, 13b-NVE-hf)	2023	Llama 2 (13b)	Pre-training: Japanese Wikipedia, RefinedWeb, Swallow Corpus, The Pile Instruction Tuning: Dolly Dataset, HH RLHF, OASST1 *v0.1: OASST1, OASST2	Swallow Project	Llama 2 Community License
LEIA-Swallow-13B (13b)	2024	Llama 2 (13b)	additionally trained Swallow 13B using LEIA	Individual (Ikuya Yamada, Ryokan Ri)	Llama 2 Community License
ELYZA-japanese-Llama-2-13b (13b, 13b-instruct, 13b-fast, 13b-fast-instruct)	2023	Llama 2 (13b)	Pre-training: Japanese Wikipedia, Japanese OSCAR, and other crawled data (18B tokens) Instruction Tuning: undisclosed dataset	ELYZA	Llama 2 Community License
cyberagent/Mistral-Nemo-Japanese-Instruct-2408	2024	Mistral NeMo (12b)	undisclosed	CyberAgent	Apache 2.0
Gemma-2-Llama Swallow 9B (9b-pt-v0.1, 9b-it-v0.1)	2025	Gemma 2 (9b)	Pre-training: Wikipedia, DCLM-baseline-1.0, Swallow Corpus Version 2, Cosmopedia, Laboro ParaCorpus, FineMath-4+, Swallow Code Version 0.3 Instruction Tuning: Gemma-2-LMSYS-Chat-1M-Synth, Swallow-Magpie-Ultra-v0.1, Swallow-Gemma-Magpie-v0.1	Swallow Project	Llama 3.3 Community License & Gemma Terms of Use
Llama 3.1 Swallow 8B (8B-v0.1, 8B-Instruct-v0.1, 8B-v0.2, 8B-Instruct-v0.2, 8B-Instruct-v0.3, 8B-Instruct-v0.5)	2025	Llama 3.1 (8b)	Pre-training: The Stack v2, Wikipedia, DCLM-baseline-1.0, Swallow Corpus Version 2, Cosmopedia, Laboro ParaCorpus Instruction Tuning: lmsys-chat-1m-synth-ja-wo-pii-and-template-instructions, lmsys-chat-1m-synth-en-wo-pii-and-template-instructions, filtered-magpie-ultra-ja, filtered-magpie-ultra-en, gemma-magpie, Gemma-3-LMSYS-Chat-1M-Synth	Swallow Project	Llama 3.1 Community License (Gemma Terms of Use is also applied to the Instruct model)
Llama 3 Swallow 8B (8B-v0.1, 8B-Instruct-v0.1)	2023	Llama 3 (8b)	Pre-training: Algebraic Stack, Wikipedia, RefinedWeb, Swallow Corpus, Cosmopedia, Laboro ParaCorpus, OpenWebMath Instruction Tuning: OASST1 ^[6:1]	Swallow Project	Llama 3 Community License
turing-motors/Llama-3-heron-brain-8B-v0.3	2024	Llama 3 (8b)	additionally trained on Llama 3 Swallow 8B (details undisclosed)	Turing	Llama 3 Community License
Llama 3 Youko 8B (8b, 8b-instruct, 8b-gptq, 8b-instruct-gptq)	2024	Llama 3 (8b)	Pre-training: Wikipedia, Japanese C4, Japanese CC-100, Japanese OSCAR, The Pile, undisclosed dataset (22B tokens) Instruction Tuning^[7:1]: Aya Dataset (Japanese subset), FLAN, Dolly Dataset, HH RLHF, OASST1, OASST2, MetaMathQA, CodeAlpaca Dataset, undisclosed dataset DPO: HelpSteer, HelpSteer2, undisclosed dataset	rinna	Llama 3 Community License
Llama 3 ELYZA JP 8B (8B, 8B-GGUF, 8B-AWQ)	2024	Llama 3 (8b)	undisclosed	ELYZA	Llama 3 Community License
Llama 3 neoAI 8B Chat v0.1 (8B-Chat-v0.1)	2024	Llama 3 (8b)	undisclosed	neoAI	Llama 3 Community License
Llama 3 tedllm (v0)	2024	Llama 3 (8b)	Pre-training: Japanese generic corpus	Tokyo Electron Device	Llama 3 Community License
ELYZA-Shortcut-1.0-Qwen-7B (7B)	2025	Qwen 2.5 (7b)	Pre-training + SFT	ELYZA	Apache 2.0
Swallow 7B (7b-hf, 7b-instruct-hf, 7b-instruct-v0.1, 7b-NVE-hf, 7b-NVE-instruct-hf, 7b-plus-hf)	2023	Llama 2 (7b)	Pre-training: Japanese Wikipedia, RefinedWeb, Swallow Corpus, The Pile Instruction Tuning: Dolly Dataset, HH RLHF, OASST1 *v0.1: OASST1, OASST2	Swallow Project	Llama 2 Community License
LEIA-Swallow-7B (7b)	2024	Llama 2 (7b)	additionally trained Swallow 7B using LEIA	Individual (Ikuya Yamada, Ryokan Ri)	Llama 2 Community License
ELYZA-japanese-Llama-2-7b (7b, 7b-instruct, 7b-fast, 7b-fast-instruct)	2023	Llama 2 (7b)	Pre-training: Japanese Wikipedia, Japanese OSCAR, and other crawled data (18B tokens) Instruction Tuning: undisclosed dataset	ELYZA	Llama 2 Community License
Youri 7B (7b, 7b-instruction, 7b-chat, 7b-gptq, 7b-instruction-gptq, 7b-chat-gptq)	2023	Llama 2 (7b)	Pre-training: Wikipedia, Japanese C4, Japanese CC-100, Japanese OSCAR, The Pile, undisclosed dataset (40B tokens) Instruction Tuning: Dolly Dataset, FLAN, subsets of llm-japanese-dataset	rinna	Llama 2 Community License
houou-7b (instruction-7b-v1, instruction-7b-v2, instruction-7b-v3)	2023	Llama 2 (7b)	Instruction-tuned Youri 7B (base) on ichikara-instruction	MoneyForward	Llama 2 Community License
Japanese Stable LM Beta 7B (base-beta-7b, base-ja_vocab-beta-7b, instruct-beta-7b, instruct-ja_vocab-beta-7b)	2023	Llama 2 (7b)	Pre-training: Wikipedia, Japanese mC4, Japanese CC-100, Japanese OSCAR, SlimPajama(excluding Books3) (100B tokens) Instruction Tuning: Dolly Dataset, HH RLHF, OASST1	Stability AI	Llama 2 Community License
SambaLingo-Japanese (Base, Chat)	2024	Llama 2 (7b)	Pre-training: CulturaX Instruction Tuning: ultrachat_200k DPO: ultrafeedback, cai-conversation-harmless	SambaNova Systems	Llama 2 Community License (?)^[9:1]
blue-lizard (blue-lizard)	2024	Llama 2 (7b)	undisclosed	Deepreneur	Llama 2 Community License
Swallow-MS 7B (7b-v0.1, 7b-instruct-v0.1)	2024	Mistral-7B-v0.1 (7b)	Pre-training: Algebraic Stack, Japanese Wikipedia, RefinedWeb, Swallow Corpus, The Pile Instruction Tuning: Dolly Dataset, OASST1	Swallow Project	Apache 2.0
Rakuten AI 2.0 (8x7B, 8x7B-instruct)	2025	Mistral-7B-v0.1 (7b)		Rakuten	Apache 2.0
RakutenAI-7B (7B, 7B-instruct, 7B-chat)	2024	Mistral-7B-v0.1 (7b)	Pre-training: undisclosed Instruction Tuning: Dolly Dataset, OASST1, datasets converted from the train split of NLU datasets (like jaster), undisclosed dataset	Rakuten	Apache 2.0
Japanese Stable LM Gamma 7B (base-gamma-7b, instruct-gamma-7b)	2023	Mistral-7B-v0.1 (7b)	Pre-training: Wikipedia, Japanese mC4, Japanese CC-100, Japanese OSCAR, SlimPajama(excluding Books3) (100B tokens) Instruction Tuning: Dolly Dataset, HH RLHF, wikinews subset of llm-japanese-dataset	Stability AI	Apache 2.0
ChatNTQ JA 7B (7b-v1.0)	2024	Mistral-7B-v0.1 (7b)	Instruction-tuned Japanese Stable LM Gamma 7B (base) on their own datasets	NTQ Solution	Apache 2.0
Shisa Gamma 7B (7b-v1)	2023	Mistral-7B-v0.1 (7b)	Instruction-tuned Japanese Stable LM Gamma 7B (base) on ultra-orca-boros-en-ja	AUGMXNT	Apache 2.0 (?)^[9:2]
Shisa 7B (base-7b-v1, 7b-v1)	2023	Mistral-7B-v0.1 (7b)	Pre-training: shisa-pretrain-en-ja-v1 (8B tokens) Instruction Tuning & DPO: ultra-orca-boros-en-ja, shisa-en-ja-dpo-v1	AUGMXNT	Apache 2.0 (?)^[9:3]
Karasu (7B, 7B-chat, 7B-chat-plus, 7B-chat-plus-unleashed)	2024	Mistral-7B-v0.1 (7b)	Additionally trained Shisa 7B (base) on Aozora Bunko, Japanese Law Precedent Dataset, Japanese Wikipedia, Japanese domain webscrapes from the Japanese subset of CulturaX, UltraChat 200k (7B tokens) Instruction Tuning: ultra-orca-boros-en-ja-v1, OASST1, ShareGPT, undisclosed dataset	Lightblue	Apache 2.0 (?)^[9:4]
Nekomata 7B (7b, 7b-instruction, 7b-gguf, 7b-instruction-gguf)	2023	Qwen (7b)	Pre-training: Wikipedia, Japanese C4, Japanese CC-100, Japanese OSCAR, The Pile, undisclosed dataset (66B tokens) Instruction Tuning: Dolly Dataset, FLAN, subsets of llm-japanese-dataset	rinna	Tongyi Qianwen LICENSE
lightblue/japanese-mpt-7b	2023	MPT (7b)	Japanese mC4	Lightblue	Apache 2.0
Japanese Stable LM 3B-4E1T (3b-4e1t-base, 3b-4e1t-instruct)	2024	StableLM-3B-4E1T (3b)	Pre-training: Wikipedia, Japanese mC4, Japanese CC-100, Japanese OSCAR, SlimPajama(excluding Books3) (100B tokens) Instruction Tuning: Dolly Dataset, HH RLHF, wikinews subset of llm-japanese-dataset	Stability AI	Apache 2.0
kotomamba-2.8B-CL	2024	mamba-2.8b-slimpj (2.8b)	Japanese Wikipedia, Swallow Corpus, SlimPajama	Kotoba Technologies	Apache 2.0
Gemma-2-Llama Swallow 2B (2b-pt-v0.1, 2b-it-v0.1)	2025	Gemma 2 (2b)	Pre-training: Wikipedia, DCLM-baseline-1.0, Swallow Corpus Version 2, Cosmopedia, Laboro ParaCorpus, FineMath-4+, Swallow Code Version 0.3 Instruction Tuning: Gemma-2-LMSYS-Chat-1M-Synth, Swallow-Magpie-Ultra-v0.1, Swallow-Gemma-Magpie-v0.1	Swallow Project	Llama 3.3 Community License & Gemma Terms of Use
Gemma 2 Baku 2B (2b, 2b-it)	2024	Gemma 2 (2b)	Pre-training: Wikipedia, Japanese C4, Japanese CC-100, Japanese OSCAR, The Pile, undisclosed dataset (80B tokens) OPRO: undisclosed dataset ^[10]	rinna	Gemma Terms of Use
Japanese Stable LM 2 1.6B (base, instruct)	2024	Stable LM 2 1.6B (1.6b)	Pre-training: Wikipedia, CulturaX Instruction Tuning: jaster, ichikara-instruction, alpaca-gpt4-japanese, ultra-orca-boros-en-ja-v1	Stability AI	STABILITY AI NON-COMMERCIAL RESEARCH COMMUNITY LICENSE
TinySwallow-1.5B (1.5B, 1.5B-Instruct, 1.5B-Instruct-q4f32_1-MLC, 1.5B-Insturct-GGUF)	2025	Qwen2.5 (1.5b)	Pre-training: trained using the TAID method (with Qwen2.5 (32b) as the teacher model) Instruction Tuning: Gemma-2-LMSYS-Chat-1M-Synth, swallow-magpie-ultra-v0.1, swallow-gemma-magpie-v0.1	Sakana AI, Swallow Project	Apache 2.0
karasu-1.1B	2023	TinyLlama (1.1b)	Pre-training: Japanese OSCAR, Japanese mC4 (3B tokens)	Lightblue	Apache 2.0

Domain specific

	Domain	Base Model	Developer	License
pfnet/Preferred-MedLLM-Qwen-72B	Medicine	Qwen2.5 (72b)	Preferred Networks	Qwen LICENSE
Llama3-Preferred-MedSwallow-70B (70B)	Medicine	Llama 3 (70b)	Preferred Networks	Llama 3 Community License
AIgroup-CVM-utokyohospital/MedSwallow-70b	Medicine	Llama 2 (70b)	University of Tokyo Hospital Department of Cardiovascular Medicine AI Group	CC BY-NC-SA 4.0
nekomata-14b-pfn-qfin (qfin, qfin-inst-merge)	Finance	Qwen (14b)	Preferred Networks	Tongyi Qianwen LICENSE
Watashiha-Llama-2-13B-Ogiri-sft (sft, sft-neuron)	Oogiri	Llama 2 (13b)	Watashiha	Llama 2 Community License
Llama 3.1 Future Code Ja 8B	Coding	Llama 3.1 (8b)	Future Corp.	Llama 3.1 Community License
Karamaru (Karamaru-v1)	Edo-period Japanese	Llama 3 (8b)	Sakana AI	Llama 3 Community License
JPharmatron (7B-base, 7B)	Pharmaceutical	Qwen2.5 (7b)	EQUES Inc.	CC BY-SA 4.0
ELYZA-japanese-CodeLlama-7b (7b, 7b-instruct)	Coding	Code Llama (7b)	ELYZA	Llama 2 Community License
AIBunCho/japanese-novel-gpt-j-6b	Storytelling	GPT-J (6b)	Individual (Hiroyuki Osone)	CreativeML OpenRAIL-M License
NovelAI/genji-jp	Storytelling	GPT-J (6b)	NovelAI	？

Models built off non-Japanese LLMs (w/ post-training on Japanese)

General purpose

	Base Model	Training Data	Developer	License / Terms of Use
Llama 3.1 Shisa V2 405B (405b)	Llama 3.1 (405b)	High-quality Japanese datasets with SFT/DPO	Shisa.AI	Llama 3.1 Community License
AXCXEPT/EZO-Qwen2.5-72B-Instruct AXCXEPT/EZO-AutoCoTRAG-Qwen2.5-72B-Instruct_q4	Qwen2.5 (72b)		Axcxept	Qwen License
ao-Karasu (72B)	Qwen1.5 (72b)	ultra-orca-boros-en-ja-v1, OASST1, ShareGPT, Japanese technical blogs, News stories, QA site answers, undisclosed dataset	Lightblue	Tongyi Qianwen LICENSE (?)^[9:5]
shisa-ai/shisa-v2-llama3.3-70b	Llama 3.3 (70b)		Shisa.AI	Llama 3.3 Community License
AXCXEPT/Llama-3.1-70B-EZO-1.1-it	Llama 3.1 (70b)		Axcxept	Llama 3.1 Community License
Llama 3 shisa-v1-llama3-70b (70b)	Llama 3 (70b)	ultra-orca-boros-en-ja-v1	Shisa.AI	Llama 3 Community License (?)^[9:6]
AIgroup-CVM-utokyohospital/Llama-2-70b-chat-4bit-japanese	Llama 2 (70b)		University of Tokyo Hospital Department of Cardiovascular Medicine AI Group	Llama 2 Community License
doshisha-mil/llama-2-70b-chat-4bit-japanese-v1	Llama 2 (70b)		Doshisha University Media Informatics Lab	？
cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese	DeepSeek-R1-Distill-Qwen (32b)		CyberAgent	MIT
karakuri-ai/karakuri-lm-32b-thinking-2501-exp	QwQ (32b)		KARAKURI	Apache 2.0
shisa-ai/shisa-v2-qwen2.5-32b	Qwen2.5 (32b)		Shisa.AI	Apache 2.0
AXCXEPT/EZO-Qwen2.5-32B-Instruct AXCXEPT/EZO-AutoCoTRAG-Qwen2.5-32B-Instruct	Qwen2.5 (32b)		Axcxept	Apache 2.0
cyberagent/DeepSeek-R1-Distill-Qwen-14B-Japanese	DeepSeek-R1-Distill-Qwen (14b)		CyberAgent	MIT
shisa-ai/shisa-v2-unphi4-14b	Phi-4 (14b)		Shisa.AI	MIT
EZO-Phi-4 (phi-4-open-R1-Distill-EZOv1, phi-4-deepseek-R1K-RL-EZO)	Phi-4 (14b)		Axcxept	MIT
Qarasu (14B-chat-plus-unleashed)	Qwen (14b)	ultra-orca-boros-en-ja-v1, OASST1, ShareGPT, undisclosed dataset	Lightblue	Tongyi Qianwen LICENSE (?)^[9:7]
Sparticle/llama-2-13b-chat-japanese-lora	Llama 2 (13b)		Sparticle	？
izumi-lab/llama-13b-japanese-lora-v0-1ep	Llama (13b)		University of Tokyo Izumi Lab	？
shisa-ai/shisa-v2-mistral-nemo-12b	Mistral NeMo (12b)		Shisa.AI	Apache 2.0
AXCXEPT/EZO-Common-9B-gemma-2-it	Gemma 2 (9b)		Axcxept	Gemma Terms of Use
AXCXEPT/EZO-Humanities-9B-gemma-2-it	Gemma 2 (9b)		Axcxept	Gemma Terms of Use
AXCXEPT/Qwen3-EZO-8B-beta	Qwen3 (8b)	High-performance reasoning with Deep-Think technique	Axcxept	Apache 2.0
shisa-ai/shisa-v2-llama3.1-8b	Llama 3.1 (8b)		Shisa.AI	Llama 3.1 Community License
AXCXEPT/Llama-3.1-8B-EZO-1.1-it	Llama 3.1 (8b)		Axcxept	Llama 3.1 Community License
Llama 3 Suzume 8B (8B-japanese, 8B-japanese-gguf)	Llama 3 (8b)	megagonlabs/instruction_ja, ShareGPT, undisclosed dataset	Lightblue	Llama 3 Community License (?)^[9:8]
Llama 3 shisa-v1-llama3-8b (8b)	Llama 3 (8b)	ultra-orca-boros-en-ja-v1	Shisa.AI	Llama 3 Community License (?)^[9:9]
AXCXEPT/Llama-3-EZO-8b-Common-it	Llama 3 (8b)		Axcxept	Llama 3 Community License
lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese	DeepSeek-R1-Distill-Qwen (7b)		Lightblue	Apache 2.0
ABEJA-Qwen2.5-7b-Japanese-v0.1 (v0.1)	Qwen 2.5 (7b)		ABEJA	Apache 2.0
shisa-ai/shisa-v2-qwen2.5-7b	Qwen 2.5 (7b)		Shisa.AI	Apache 2.0
Karasu DPO (7B)	Qwen 2.5 (7b)		Lightblue	Apache 2.0
ganchengguang/Yoko-7B-Japanese-v1	Llama 2 (7b)		Yokohama National University Mori Lab	？
Sparticle/llama-2-7b-chat-japanese-lora	Llama 2 (7b)		Sparticle	？
izumi-lab/llama-7b-japanese-lora-v0-5ep	Llama (7b)		University of Tokyo Izumi Lab	？
lightblue/jod	Mistral-7B-SlimOrca (7b)		Lightblue	Apache 2.0
NTQAI/chatntq-7b-jpntuned	RWKV-4 World (7b)		NTQ Solution	？
Borea (Jp, Common, Coding)	Phi-3.5 (3.8b)		Axcxept	MIT
AXCXEPT/EZO-Llama-3.2-3B-Instruct-dpoE	Llama 3.2 (3b)		Axcxept	Llama 3.2 Community License
Gemma-2-JPN (2b-jpn-it)	Gemma 2 (2b)		Google	Gemma Terms of Use
AXCXEPT/EZO-gemma-2-2b-jpn-it	Gemma 2 (2b)		Axcxept	Gemma Terms of Use
AXCXEPT/EZO-Common-T2-2B-gemma-2-it	Gemma 2 (2b)		Axcxept	Gemma Terms of Use

Domain specific

	Domain	Base Model	Developer	License
JMedLoRA (llama2-jmedlora-6.89ep)	Medicine	Llama 2 (70b)	University of Tokyo Hospital Department of Cardiovascular Medicine AI Group	CC BY-NC 4.0

Merged models

	Original Models (Japanese LLMs in bold)	Developer	License
EQUES/MedLLama3-JP-v2	Llama 3 Swallow 8B (Instruct), OpenBioLLM-8B, MMed-Llama 3 8B, Llama 3 ELYZA JP 8B	EQUES	Llama 3 Community License
EvoLLM-JP-A (v1-7B)	Shisa Gamma 7B (v1), Arithmo2 Mistral 7B, Abel 7B 002	Sakana AI	Apache 2.0
EvoLLM-JP (v1-7B, v1-10B)	Shisa Gamma 7B (v1), WizardMath-7B-V1.1, Abel 7B 002	Sakana AI	MICROSOFT RESEARCH LICENSE

API-based models

	Max Context Length	Developer	Platform
PLaMo API	32,768	Preferred Networks	self-owned
Solar mini chat ja (solar-mini-ja-250123, solar-mini-ja-240612)	32,768	Upstage	self-owned
AI Novelist	2,400 ~ 8,192	Bit192	self-owned
LHTM-OPT		alt Inc.	AWS Marketplace
tsuzumi (tsuzumi-7b)		NTT	Azure AI Foundry

Encoder models

General purpose

	Architecture	Max Input Length	Training Data	Developer	License	HuggingFace? ^[11]
ModernBERT-Ja	ModernBERT	8,192	Japanese and English corpora	SB Intuitions	MIT	◯ (30m, 70m, 130m, 310m)
llm-jp-modernbert	ModernBERT	8,192	Japanese subset of llm-jp-corpus-v4 (0.69T tokens)	Research and Development Center for Large Language Models	Apache 2.0	◯
KyotoUniBERT	BERT (base, large)	512	Japanese Wikipedia (18M articles)	Kyoto University Language Media Processing Lab	Apache 2.0	△
TohokuUniversityBERT	BERT (base, large)	512	base (v1): Japanese Wikipedia (17M articles / 2.6GB) base (v2) & large: Japanese Wikipedia 4.0GB base (v3) & large (v2): Japanese Wikipedia (4.9GB), Japanese CC‑100 (74.3GB)	Tohoku University NLP Group	base (v1, v2) & large: CC BY‑SA 3.0 base (v3) & large (v2): Apache 2.0	◯ (base (v1), base (v1, char-level), base (v2), base (v2, char-level), large, large (char-level), base (v3), base (v3, char-level), large (v2), large (v2, char-level))
TohokuNLP BERT-alpha 500M	Llama-based encoder^[12]	4,096 or 8,192	Japanese subset of llm-jp-corpus-v3	Tohoku University NLP Group	Apache 2.0	◯ (sq4096-alpha, sq8192-alpha)
NICT BERT	BERT (base)	512	Japanese Wikipedia	NICT	CC BY 4.0	△
Laboro BERT	BERT (base, large)	512	Japanese Web Corpus (News and blogs, etc) (12GB)	Laboro.AI	CC BY‑NC 4.0	✕
colorfulscoop BERT	BERT (base)	512	Japanese Wikipedia	Colorful Scoop	CC BY‑SA 3.0	◯
UniversityOfTokyoBERT	BERT (small)	512	Japanese Wikipedia (2.9GB)	University of Tokyo Izumi Lab	CC BY‑SA 4.0	◯
chiTra (Sudachi Transformers)	BERT (base)	512	NINJAL Web Japanese Corpus (148GB)	NINJAL, WAP Tokushima Laboratory of AI and NLP	Apache 2.0	△
ACCMS BERT	BERT (base)	512	Japanese Wikipedia (3.3GB)	Kyoto University ACCMS	CC BY‑SA 4.0	◯
HitachiBERT	BERT (base)	512	Japanese Wikipedia, Japanese CC‑100	Hitachi	CC BY‑NC‑SA 4.0	◯^[13]
RetrievaBERT	BERT ^[14]	2,048	Japanese CommonCrawl, RefinedWeb, Chinese Wikipedia, Korean Wikipedia, The Stack	Retrieva	Apache 2.0	◯
Bandai Namco DistilBERT	DistilBERT	512	(Distillation of TohokuUniversityBERT(base))	Bandai Namco Research	MIT	◯
Laboro DistilBERT	DistilBERT	512	(Distillation of Laboro BERT(base))	Laboro.AI	CC BY‑NC 4.0	◯
LINE DistilBERT	DistilBERT	512	(Distillation of LINE internal BERT model)	LINE	Apache 2.0	◯
rinna RoBERTa	RoBERTa (base)	512	Japanese Wikipedia, Japanese CC‑100	rinna	MIT	◯
WasedaRoBERTa	RoBERTa (base, large)	512	Japanese Wikipedia, Japanese CC‑100	Waseda Kawahara Lab	CC BY‑SA 4.0	◯ (base, large, large (seq512))^[15]
InformatixRoBERTa	RoBERTa (base)	512	Japanese Wikipedia, Web Articles (25GB)	Informatix	Apache 2.0	△
KyotoUniversityRoBERTa	RoBERTa (base, large)	512	Japanese Wikipedia, Japanese CC‑100	Kyoto University Language Media Processing Lab	CC BY‑SA 4.0	◯ (base (char-level), large (char-level))
YokohamaNationalRoBERTa	RoBERTa (base)	512	Japanese Wikipedia (3.45GB)	Yokohama National University Mori Lab	Apache 2.0	◯
Megagon Labs RoBERTa	RoBERTa (base)^[16]	1,282	Japanese mC4 (200M sentences)	Megagon Labs (Recruit Co.,Ltd.)	MIT	◯
ACCMS RoBERTa	RoBERTa (base)	512	Japanese Wikipedia (3.3GB) + Japanese CC‑100 (70GB)	Kyoto University ACCMS	CC BY‑SA 4.0	◯
CinnamonELECTRA	ELECTRA (small)	512	Japanese Wikipedia	Cinnamon	Apache 2.0	◯
Megagon Labs ELECTRA	ELECTRA (base)	512	Japanese mC4 (200M sentences)	Megagon Labs (Recruit Co.,Ltd.)	MIT	◯
UniversityOfTokyoELECTRA	ELECTRA (small, base)	512	Japanese Wikipedia (2.9GB)	University of Tokyo Izumi Lab	CC BY‑SA 4.0	◯ (small, base)
JapaneseRoFormer	RoFormer (base)	512	Japanese Wikipedia (3.45GB)	Yokohama National University Mori Lab	Apache 2.0	◯
JapaneseLUKE	LUKE (base, large)	512	Japanese Wikipedia	Studio Ousia	Apache 2.0	◯ (base, large)
KyotoUniversityDeBERTaV2	DeBERTaV2 (tiny, base, large)	512	Japanese Wikipedia, Japanese CC‑100, Japanese OSCAR (171GB)	Kyoto University Language Media Processing Lab	CC BY‑SA 4.0	◯ (tiny, tiny (char-level), base, large)
KyotoUniversityDeBERTaV3	DeBERTaV3 (base)	512	llm-jp-corpus	Kyoto University Language Media Processing Lab	Apache 2.0	◯
UniversityOfTokyoDeBERTaV2	DeBERTaV2 (small, base)	512	Japanese Wikipedia, Japanese Wikinews, Japanese CC-100, Japanese mC4, Japanese OSCAR	University of Tokyo Izumi Lab	CC BY-SA 4.0	◯ (small, base)
GLOBIS DeBERTaV3	DeBERTaV3 (xsmall, base, large)	512	Wikipedia, WikiBooks, Aozora Bunko, Japanese CC-100, Japanese mC4, Japanese OSCAR	GLOBIS	CC BY-SA 4.0	◯ (xsmall, base, large)
JapaneseBigBird	BigBird (base)	4,096	Japanese Wikipedia, Japanese CC‑100, Japanese OSCAR	Waseda Kawahara Lab	CC BY‑SA 4.0	◯
JapaneseLayoutLM	LayoutLM (base)	512	Pre-trained on Japanese Wikipedia, initialized with TohokuUniversityBERT	The Japan Research Institute, Limited	CC BY-SA 3.0	◯

Domain Specific

	Domain	Architecture	Training Data	Developer	License	HuggingFace?
JapaneseBlogELECTRA	Colloquial language	ELECTRA (small)	Japanese Blog Corpus (354M sentences)	Kitami Institute of Technology Masui-Ptaszynski Lab	CC BY‑SA 4.0	◯
JapaneseSpokenLanguageBERT	Spoken language	BERT (base)	Additional training for TohokuUniversityBERT using Corpus of Spontaneous Japanese (CSJ) (In the DAPT model, the diet record is also used)	Retrieva	Apache 2.0	◯
AcademicRoBERTa	Science	RoBERTa (base)	CiNii Japanese Papers (6.3M sentences)	Ehime University AI Lab	Apache 2.0	◯
local-politics-BERT	Politics	BERT (base)	Wikipedia, Minutes of the National Diet, Minutes of the Local Assembly	Japanese Local Assembly Minutes Corpus Project	CC BY-SA 4.0	◯ (SC-min, SC-minwiki, SC-2M-wiki, SC-2M-min, SC-2M-minwiki, FP-min, FP-minwiki) ^[17]
UBKE-LUKE	Economics	LUKE (base)	Japanese Wikipedia, Securities Reports, Economic News Articles	Uzabase	CC BY-NC	◯
JapaneseFinancialBERT	Finance	BERT (small, base)^[18]	Japanese Wikipedia, Japanese Financial Corpus (27M sentences/5.2GB)	University of Tokyo Izumi Lab	CC BY‑SA 4.0	◯ (small, base)
JapaneseFinancialELECTRA	Finance	ELECTRA (small)	Japanese Wikipedia (20M sentences/2.9GB), Japanese Financial Corpus (27M sentences/5.2GB)	University of Tokyo Izumi Lab	CC BY‑SA 4.0	◯
JapaneseNewsBERT	Business	BERT (base)	Japanese Business Articles (3M articles)	Stockmark	CC BY 4.0	△
JapaneseNewsXLNet	Business	XLNet (base)	Japanese Business Articles (3M articles)	Stockmark	？	◯ ※ Unofficial release
JapaneseNewsALBERT	Business	ALBERT (base)	Japanese Business Articles (3M articles)	Stockmark	？	△
MinpakuBERT	Cultural Heritage	BERT (base)	Additional training with National Museum of Ethnology's cultural heritage data on top of Tohoku University BERT	University of Hyogo Ohshima Lab	MIT	◯ (minpaku-v1, minpaku-v3, minpaku-v3-no-additional-token)
JPharmaBERT	Pharmacy	BERT (base, large)	Japanese Pharmaceutical Documents (2B tokens) + PubMed English Abstracts (8B tokens) + Multilingual Pharmaceutical Data (1.2B tokens)	EQUES	Unknown	◯ (base, large)
UTH-BERT	Medicine	BERT (base)	Japanese Medical Records(120M lines)	University of Tokyo Hospital Medical AI Development Course	CC BY‑NC‑SA 4.0	△
medBERTjp	Medicine	BERT (base)	Japanese Wikipedia, Japanese Medical Corpus ("今日の診療プレミアム/Today's Care Premium" Web Version)	Osaka University Hospital Medical Informatics Lab	CC BY‑NC‑SA 4.0	△
JMedRoBERTa	Medicine	RoBERTa (base)	Japanese Medical Papers (11M sentences/1.8GB)	NII Aizawa Lab	CC BY‑NC‑SA 4.0	◯ (ManbyoWordPiece, SentencePiece)^[19]

Sentence and Document Embeddings ^[20]

Bi-Encoders

Single-representation bi-encoders

	Max Context Length	Developer	License
Ruri-v3 (v3-30m, v3-70m, v3-130m, v3-310m)	8,192	Nagoya University Sasano Group	Apache 2.0
PLaMo-Embedding-1B (1b)	4,096	Preferred Networks	Apache 2.0
sbintuitions/sarashina-embedding-v1-1b	8,192	SB Intuitions	Sarashina Model NonCommercial License
AMBER (base, large)	512	Retrieva	Apache 2.0
RoSEtta (base-ja)	1,024	PKSHA Technology	Apache 2.0
GLuCoSE v2 (base-ja-v2)	512	PKSHA Technology	Apache 2.0
Ruri (small, base, large, small-v2, base-v2, large-v2)	512	Nagoya University Sasano Group	Apache 2.0
Japanese SimCSE (unsup-simcse-ja-base, unsup-simcse-ja-large, sup-simcse-ja-base, sup-simcse-ja-large)	512	Nagoya University Sasano Group	CC BY-SA 4.0
GLuCoSE (base-ja)	512	PKSHA Technology	Apache 2.0
colorfulscoop/sbert-base-ja		Colorful Scoop	CC BY‑SA 4.0
MU-Kindai/SBERT-JSNLI-base MU-Kindai/SBERT-JSNLI-large		Kindai University	？
MU-Kindai/Japanese-SimCSE-BERT-base-unsup MU-Kindai/Japanese-SimCSE-BERT-large-unsup MU-Kindai/Japanese-SimCSE-RoBERTa-base-unsup MU-Kindai/Japanese-SimCSE-BERT-base-sup MU-Kindai/Japanese-SimCSE-BERT-large-sup		Kindai University	MIT
pkshatech/simcse-ja-bert-base-clcmlp		PKSHA Technology	CC BY‑SA 4.0
MU-Kindai/Japanese-MixCSE-BERT-base MU-Kindai/Japanese-MixCSE-BERT-large		Kindai University	MIT
MU-Kindai/Japanese-DiffCSE-BERT-base		Kindai University	MIT
bclavie/fio-base-japanese-v0.1		Individual (Benjamin Clavié)
cl-nagoya/shioriha-large-pt		Nagoya University Sasano Group

Multi-representation bi-encoders

	Developer	License
JaColBERTv2.5 (JaColBERTv2.4, JaColBERTv2.5)	Answer.AI	MIT
JaColBERTv2 (JaColBERTv2)	Individual (Benjamin Clavié)	MIT
JaColBERT (JaColBERT)	Individual (Benjamin Clavié)	MIT

Cross-Encoders

	Developer	License
Ruri-v3 Reranker (310m)	Nagoya University Sasano Group	Apache 2.0
Ruri-Reranker (stage1-small, stage1-base, stage1-large, small, base, large)	Nagoya University Sasano Group	Apache 2.0
hotchpotch/japanese-reranker-cross-encoder-xsmall-v1 hotchpotch/japanese-reranker-cross-encoder-small-v1 hotchpotch/japanese-reranker-cross-encoder-base-v1 hotchpotch/japanese-reranker-cross-encoder-large-v1 hotchpotch/japanese-bge-reranker-v2-m3-v1	Individual (Yuichi Tateno)	MIT

Vision-Language Models

Text+Image to Text

Models built from scratch

General purpose

	Year	Architecture	Training Data	Developer	License / Terms of Use
Stockmark-2-VL-100B-beta (100B-beta)	2025	LLaVA-OneVision	3-stage training: alignment pre-training, caption expansion, instruction and reasoning fine-tuning Synthetic data: Generated from Qwen2.5-VL-72B	Stockmark	Qwen License
KARAKURI VL (32b-instruct-2507, 32b-thinking-2507-exp)	2025	Vision-Language (based on Qwen2.5-VL-32B)	Custom dataset specialized for Japanese computer use: Japanese computer operation records, Japanese document image QA, visual information interpretation, Japanese OCR, flowchart comprehension 3-stage training: Supervised Fine-Tuning (SFT) + model merging + reinforcement learning *thinking model shows reasoning process explicitly using Chain of Thought (CoT) approach	KARAKURI	Apache 2.0
Heron-NVILA (1B, 2B, 15B, 33B)	2025	NVILA	3-stage training: Alignment (558k Japanese image-text pairs + 595k LLaVA-Pretrain), Pre-training (MOMIJI 13M, Japanese image-text pairs 6M, Japanese interleaved data 2M, coyo-700m 6M, mmc4-core 4M, Wikipedia-ja, LLaVA-Pretrain-JA, STAIR captions), Supervised fine-tuning (LLaVA-instruct-v1.5-en, LLaVA-instruct-ja, Japanese photos conversation, JA-VG-VQA conversation, SynthDog-ja, AI2D, SynthDog-en, Sherlock)	Turing	Apache 2.0 & OpenAI Terms of Use
Sarashina2-Vision (8b, 14b)	2025	Sarashina2 + Qwen2-VL + 2-layer MLP	3-stage training: Projector warmup (LLaVA-Pretrain 78M English tokens), Vision encoder pre-training (CC3M, CC12M, llm-jp-japanese-image-text-pairs, internal OCR dataset, internal chart caption synthetic dataset 3.8B Japanese + 7.7B English tokens), Visual instruction tuning (Japanese Visual Genome VQA, OCR-VQA, TextVQA, PlotQA, CLEVR translated, DOCCI translated, internal datasets 2.5B Japanese + 1.0B English tokens)	SB Intuitions	MIT
Asagi (2B, 4B, 8B, 14B)	2025	LLaVA	Newly crawled Japanese website images, existing Japanese datasets, Japanese translations of English datasets ~20M samples (data synthesis using English VLM Phi-3.5-vision-instruct and Japanese LLM CALM3-22B-Chat)	University of Tokyo Machine Intelligence Lab.	Apache 2.0
llava-calm2-siglip (llava-calm2-siglip)	2024	LLaVA	coversational data generated from MS-COCO and VisualGenome	CyberAgent	Apache 2.0
LLM-jp-3 VILA 14B (14b)	2024	LLaVA	Japanese image text pairs, LLaVA-Pretrain, Japanese interleaved data, coyo (subset), mmc4-core (subset), llava-instruct-ja, japanese-photos-conv, ja-vg-vqa, synthdog-ja, LLaVA-1.5 instruction data (subset)	Research and Development Center for Large Language Models	Apache 2.0 & OpenAI Terms of Use
Heron (blip-ja-stablelm-base-7b-v0, blip-ja-stablelm-base-7b-v1, blip-ja-stablelm-base-7b-v1-llava-620k, git-ja-stablelm-base-7b-v0, git-ELYZA-fast-7b-v0, git-ja-stablelm-base-7b-v1)	2023	BLIP-2 / GIT	v1: LLaVA-Instruct-150K-JA or LLaVA-Instruct-620K-JA v0: LLaVA-Instruct-150K-JA, Japanese STAIR Captions, Japanese Visual Genome VQA dataset	Turing	CC BY-NC 4.0
Japanese Stable VLM (japanese-stable-vlm)	2023	LLaVA	Japanese CC12M, STAIR Captions, Japanese Visual Genome VQA dataset	Stability AI	STABILITY AI JAPANESE STABLE VLM COMMUNITY LICENSE
Japanese InstructBLIP Alpha (japanese-instructblip-alpha)	2023	InstructBLIP	Japanese CC12M, STAIR Captions, Japanese Visual Genome VQA dataset	Stability AI	JAPANESE STABLELM RESEARCH LICENSE
rinna MiniGPT-4 (bilingual-gpt-neox-4b-minigpt4)	2023	MiniGPT-4	CC12M, COCO 2014, Visual Genome, STAIR Captions, Japanese Visual Genome VQA dataset	rinna	MIT

Domain Specific

	Architecture	Domain	Developer	License
watashiha/Watashiha-Llama-2-13B-Ogiri-sft-vlm	LLaVA	Oogiri	Watashiha	Llama 2 Community License

Models built off non-Japanese VLMs

	Base Model	Training Data	Developer	License
AXCXEPT/EZO-InternVL2-26B	InternVL2	-	Axcxept	MIT

Merged models

	Original Models (Japanese LLMs in bold)	Developer	License
Llama-3-EvoVLM-JP-v2 (v2)	Mantis-8B-SigLIP-Llama-3, Llama-3-ELYZA-JP-8B, Bunny-v1.1-Llama-3-8B-V	Sakana AI	Llama 3 Community License
AXCXEPT/Llama-3-EZO-VLM-1	- (trained from Llama-3-EvoVLM-JP-v2)	Axcxept	Llama 3 Community License
EvoVLM-JP (v1-7B)	Shisa Gamma 7B (v1), LLaVA-1.6-Mistral-7B	Sakana AI	Apache 2.0

Text to Image

General Purpose

	Architecture	Training Data	Developer	License
CommonArt β (commonart-beta)	PixArt-Σ	CommonCatalog-cc-by, Megalith-10M, Smithonian Open Access, ArtBench (CC-0 only)	AI Picasso	Apache 2.0
EvoSDXL-JP (v1)	Stable Diffusion	- (merged from several diffusion models, including Japanese Stable Diffusion XL)	Sakana AI	Apache 2.0^[21]
Japanese Stable Diffusion XL (japanese-stable-diffusion-xl)	Stable Diffusion	undisclosed	Stability AI	STABILITY AI JAPANESE STABLE DIFFUSION XL COMMUNITY LICENSE
TohokuUniversity Stable Diffusion (base, refiner)	Stable Diffusion	WMT2023 Shared Task English-Japanese parallel corpus, about 13 million captions from laion2B-multi	Tohoku University NLP Group	CreativeML OpenRAIL-M License
rinna Stable Diffusion (japanese-stable-diffusion)	Stable Diffusion	LAION-5B Japanese Subset (100M images)	rinna	CreativeML OpenRAIL-M License

Domain Specific

	Architecture	Domain	Developer	License
Evo-Nishikie (v1)	Stable Diffusion (ControlNet)	Ukiyo-e	Sakana AI	Apache 2.0^[21:1]
Evo-Ukiyoe (v1)	Stable Diffusion	Ukiyo-e	Sakana AI	Apache 2.0^[21:2]

Text to Video

	Architecture	Training Data	Developer	License
AIdeaLab VideoJP (AIdeaLab-VideoJP)	CogVideoX	Pixabay, FineVideo	AIdeaLab	Apache 2.0

Others

	Architecture	Training Data	Developer	License
llm-jp-clip (llm-jp-clip-vit-base-patch16, llm-jp-clip-vit-large-patch14)	CLIP	Translation of about 1.5 billion captions from the English subset of ReLAION-5B	Research and Development Center for Large Language Models	Apache 2.0
LY CLIP (clip-japanese-base)	CLIP	CommonCrawl, CC12M, YFCC100M	LY Corp.	Apache 2.0
Recruit CLIP (japanese-clip-vit-b-32-roberta-base)	CLIP	about 120 million captions from laion2B-multi	Recruit Co.,Ltd.	CC BY-4.0
Japanese Stable CLIP (japanese-stable-clip-vit-l-16)	SigLIP	CC12M translated to Japanese, STAIR Captions	Stability AI	STABILITY AI JAPANESE STABLE CLIP COMMUNITY LICENSE
rinna CLIP (japanese-clip-vit-b-16)	CLIP	CC12M translated to Japanese	rinna	Apache 2.0
rinna CLOOB (japanese-cloob-vit-b-16)	CLOOB	CC12M translated to Japanese	rinna	Apache 2.0
HAKUHODO Technologies CLIP (base, deeper, wider)	CLIP	about 120 million captions from laion2B-multi	HAKUHODO Technologies	CC BY-NC-SA 4.0

Speech-Language Models

Automatic Speech Recognition

	Architecture	Training Data	Developer	License
Kotoba-Whisper (v1.0, v1.0-ggml, v1.0-faster, v1.1, bilingual-v1.0, bilingual-v1.0-ggml, bilingual-v1.0-faster, v2.0, v2.0-ggml, v2.0-faster, v2.1, v2.2)	Distil-Whisper	ReazonSpeech	Kotoba Technologies	Apache 2.0
Nue ASR (nue-asr)	Nue ASR (HuBERT + LLM)	ReazonSpeech	rinna	Apache 2.0
ReazonSpeech (espnet-v1, espnet-next, espnet-v2, nemo-v2)	ESPnet (Conformer-Transducer) / NeMo (FastConformer-RNNT)	ReazonSpeech	Reazon Holdings	Apache 2.0

Others

	Architecture	Training Data	Developer	License
J-Moshi (j-moshi, j-moshi-ext)	Transformer-based text and speech foundation model (Moshi)	Speech dialogue corpus (J-CHAT, Japanese Callhome, CSJ, travel agency dialogue corpus, proprietary chat dialogue corpus, proprietary consultation dialogue corpus), text dialogue corpus (Japanese PersonaChat, Japanese EmpatheticDialogues, Japanese daily dialogue corpus, RealPersonaChat)	Nagoya University Higashinaka Lab	CC BY-NC 4.0
Kotoba-Speech (v0.1)	Transformer	undisclosed	Kotoba Technologies	Apache 2.0
Kushinada (base, large)	HuBERT	60k hours of audio extracted from large-scale Japanese TV broadcast audio data	Intelligent Media Processing Research Team, AIST	Apache 2.0
UniversityOfTokyoHuBERT (base-jtube)	HuBERT	JTubeSpeech	University of Tokyo Saruwatari & Takamichi Lab	MIT
rinna HuBERT (base, large)	HuBERT	ReazonSpeech	rinna	Apache 2.0
Izanami (base, large)	wav2vec 2.0	60k hours of audio extracted from large-scale Japanese TV broadcast audio data	Intelligent Media Processing Research Team, AIST	Apache 2.0
Reazon wav2vec 2.0 (base, large)	wav2vec 2.0	ReazonSpeech	Reazon Holdings	Apache 2.0
rinna wav2vec 2.0 (base)	wav2vec 2.0	ReazonSpeech	rinna	Apache 2.0

Music-Language Models

Music-Text Conversion

	Architecture	Training Data	Developer	License
Japanese MULAN (japanese-mulan-base)	MULAN (AST + GLuCoSE)	~20k internal music-text pairs	LY Corporation	Apache 2.0

Evaluation Benchmarks for Japanese LLMs

Hybrid Benchmarks

	Description	Developer
Nejumi LLM Leaderboard3	Evaluates the Japanese language capabilities of LLMs from three perspectives: language understanding ability, application ability, and alignment (including controllability and safety). For more details, see this article.	Weights & Biases
Swallow LLM Leaderboard v2	Conducts a comprehensive evaluation of various LLMs based on three types of tasks: Japanese language understanding and generation tasks, Japanese multi-turn dialogue tasks, and English language understanding and generation tasks. v2 supports reasoning-focused models by adopting zero-shot inference and chain-of-thought prompting, evaluating on more challenging benchmarks (12 total tasks: 6 Japanese, 6 English). Also publishes swallow-evaluation, an evaluation script that integrates and improves existing LLM evaluation tools, plus the newly released swallow-evaluation-instruct for reasoning-type models.	Swallow Project

Traditional Benchmarks based on Natural Language Understanding tasks

	Description	Developer
Open Japanese LLM Leaderboard	Evaluates Japanese language models in 16 different tasks using llm-jp-eval.	LLM-jp, Hugging Face
llm-jp-eval	A tool that evaluates Japanese LLMs automatically across multiple datasets. The complete list of supported datasets can be found here (which also includes tasks such as JNLI and JCommonsenseQA from JGLUE).	LLM-jp
JP Language Model Evaluation Harness	A fork by Stability AI of EleutherAI/lm-evaluation-harness. It is a tool for automatically evaluating Japanese LLMs across multiple datasets. The complete list of supported datasets can be found here (which also includes tasks such as JNLI and JCommonsenseQA from JGLUE). There is a detailed summary of the evaluation results by rinna: [rinna] Benchmark of Stability-AI/lm-evaluation-harness	Stability AI
JGLUE	Japanese version of the GLUE benchmark suite, including the MARC-ja, JCoLA, JSTS, JNLI, JSQuAD, and JCommonsenseQA tasks. JCoLA is by the University of Tokyo's Oseki Lab. See here and here (ja only) for further details about each task.	Waseda University Kawahara Lab and Yahoo
JMMLU	A benchmark constructed as a Japanese version of the MMLU Benchmark, consisting of multiple-choice questions from a wide range of academic fields including natural sciences, humanities, and social sciences. In addition to translating the original MMLU, it features newly added problems based on the unique cultural background of Japan (Japan-specific problems).	Waseda University Kawahara Lab

Benchmarks on open-ended generative tasks

	Description	Developer
Japanese MT-bench	The Japanese version of MT-bench asks about multi-turn conversational ability. It includes 80 questions, 10 each, from 8 categories: Writing, Roleplay, Reasoning, Math, Coding, Extraction, STEM, Humanities. Some questions have been modified to fit with Japanese culture during the production of the Japanese version. It also includes a script that performs a 10-level absolute evaluation by GPT-4.	Stability AI
ELYZA-tasks-100	Ranking based on model responses to 100 complex and diverse tasks, including tasks testing summarization, correction, abstraction, induction, and other skills. Uses humans to score the model responses and then ranks models based on their mean scores.	ELYZA
Preferred Generation Benchmark (pfgen-bench)	A benchmark to measure the Japanese language generation ability of LLMs based on 50 common sense questions unique to the Japanese context. It evaluates along three axes: Fluency, Truthfulness, and Helpfulness. The evaluation is conducted without using LLM-as-a-Judge by calculating n-gram or rule-based metrics.	Preferred Elements (Preferred Networks)
Rakuda Benchmark	Ranking based on model answers to 40 open-ended questions on Japanese geography, history, politics, and society. Uses GPT-4 to judge model outputs pairwise, and then ranks models by fitting a Maximum Likelihood Elo/Bradley-Terry model to GPT-4's preferences.	YuzuAI
Japanese Vicuna QA Benchmark	This is the Japanese version of vicuna-blog-eval, which is the predecessor of MT-Bench. It includes 80 questions on general knowledge, role-playing, common sense, Fermi estimation, counterfactual thinking, coding, mathematics, and writing. It also includes a script for automatic evaluation by GPT-4 (win-rate calculation). The leaderboard can be found here.	Kyoto University Language Media Processing Lab
Tengu-Bench	Includes 120 free-form questions from various categories. Categories of questions: table interpretation, logic puzzles, idea generation, function calling, long document summarization (over a thousand tokens), conversation summarization, long document closed QA (over a thousand tokens), honorifics, project creation, math, translation, extraction, ethical control, cost estimation, Japan, chit-chat, puns, formatting, construction, business, legal judgment, politics, hypothetical questions.	Lightblue
Shaberi	A framework that can collectively evaluate the Japanese MT-bench, Rakuda Benchmark, ELYZA-tasks-100, and Tengu-Bench. There is also a fork by Shisa.AI.	Lightblue

Benchmarks for measuring performance in specific domains

	Description	Developer
Japanese Language Model Financial Evaluation Harness	A benchmark for Japanese LLM in the financial sector. It includes tasks such as sentiment analysis in finance (chabsa), basic knowledge tasks in securities analysis (cma_basics), tasks related to audits in certified public accountant examinations (cpa_audit), multiple choice question tasks in financial planner exams (fp2), and mock exam tasks for securities salespeople exams (security_sales_1). For more details, please see here.	Preferred Networks
pfmt-bench-fin-ja	A benchmark for measuring the generation capabilities of Japanese LLMs in the financial domain.	Preferred Networks
Stockmark Business Questions	The collection includes 50 questions that probe knowledge on topics such as market trends, current affairs, social issues, and business trends.	Stockmark
JMED-LLM	A dataset for evaluating LLMs in the Japanese medical domain. It compiles previously developed Japanese medical language processing tasks for LLM benchmarking.	NAIST Social Computing Lab.
JMedBench	A benchmark for LLMs in the Japanese medical field. It includes 20 datasets in 5 types of tasks: multi-choice question-answering, machine translation, named entity recognition, document classification, and semantic textual similarity (some datasets are borrowed from JMMLU and JMED-LLM). A tool called med-eval is developed to facilitate evaluation on JMedBench.	NII Aizawa Lab
Japanese Medical Language Model Evaluation Harness	A benchmark for evaluating Japanese LLMs in the medical domain in both Japanese and English, executable by a single command.	Individual (Issey Sukeda)
YakugakuQA	A Japanese pharmaceutical domain evaluation dataset based on national pharmacist licensing exams. Tests factual pharmaceutical knowledge.	EQUES Inc.
NayoseQA	A Japanese pharmaceutical domain evaluation dataset for cross-lingual terminology normalization. Tests understanding of synonyms and technical terms.	EQUES Inc.
SogoCheck	A novel task designed to assess consistency reasoning between paired statements. A challenging reasoning task where even GPT-4o performs poorly.	EQUES Inc.
karakuri-bench	A dataset for measuring performance of Japanese LLMs in customer support.	KARAKURI

Benchmarks for measuring factuality and safety

	Description	Developer
JTruthfulQA	The Japanese version of the dataset for evaluating the factuality of LLMs TruthfulQA. It includes questions about superstitions and other beliefs held by some people that are not factual, as well as questions about Japan-specific knowledge, all collected from scratch.	Waseda University Kawahara Lab
JCommonsenseMorality	A dataset on Japanese commonsense morality. Sentences describing actions are labeled with binary values indicating whether they are morally wrong or acceptable.	Hokkaido University Language Media Lab
JBBQ	The Japanese version of the social bias QA dataset BBQ, developed through translation, revision, and addition of questions based on Japanese culture and customs.	University of Tokyo Yanaka Lab

Benchmarks for measuring logical reasoning capabilities

	Description	Developer
JFLD (Japanese Formal Logic Deduction)	A dataset for evaluating deductive reasoning capabilities of Japanese LLMs (the Japanese version of the FLD (Formal Logic Deduction) proposed by the same authors). It is characterized by being composed of counterfactual samples to evaluate apart from the knowledge the LLM possesses.	Hitachi
JHumanEval	A Japanese version of the HumanEval benchmark, which assesses the ability to generate Python code from English instructions. In creating the Japanese version, the text was first machine-translated and then manually corrected.	Japan Women's University Kuramitsu Lab
JMultiPL-E	A dataset for evaluating code generation capabilities across 17 programming languages (C++, C#, Go, Java, JavaScript, PHP, Ruby, Rust, Scala, Swift, TypeScript, etc.) based on OpenAI HumanEval. Measures multilingual code understanding and generation performance.	Tohoku University Natural Language Processing Group

Benchmarks on controlled text generation

	Description	Developer
LCTG Bench	A benchmark for the controllability of Japanese LLMs. It evaluates whether LLMs can adhere to constraints in four aspects: output format, character count, keywords, and forbidden words. The quality of the generated text is also evaluated.	CyberAgent

Benchmarks for embedding models

	Description	Developer
JMTEB	A benchmark developed as the Japanese version of MTEB. It consists of tasks such as document clustering, text classification, sentence similarity, sentence pair labeling prediction, and text extraction (a reranking task was recently added).	SB Intuitions
JQaRA	A dataset for evaluating Japanese document extraction and reranking accuracy. Each of the 1,667 questions is assigned 100 candidate documents, of which at least one can answer the question. The questions are taken from JAQKET, and the candidate documents are sourced from Japanese Wikipedia.	Individual (Yuichi Tateno)
JaCWIR	A dataset created for evaluating document extraction and reranking in domains other than Wikipedia. Each of the 5,000 questions is assigned one Web page that serves as the source of the question and 99 unrelated Web pages.	Individual (Yuichi Tateno)

Benchmarks for vision-language models

	Description	Developer
llm-jp-eval-mm	A tool for evaluating the performance of Japanese VLMs on multiple benchmark tasks	Research and Development Center for Large Language Models
BusinessSlideVQA	A question-answer dataset with 220 questions about complex Japanese business slide images, designed to evaluate document comprehension capabilities.	Stockmark
JMMMU	A benchmark constructed as the Japanese version of MMMU Benchmark. It consists of 720 translated MMMU problems and 600 new problems unique to Japanese culture.	University of Tokyo Aizawa Lab
JDocQA	A question-answer dataset based on Japanese documents (pamphlets, slides, reports, websites), consisting of a total of 11,600 questions. It includes various question formats, including unanswerable questions.	NAIST Watanabe Lab
Heron VLM Leaderboard powered by Nejumi/WandB	Summarizes the evaluation results of Japanese-Heron-Bench and LLaVA-Bench-In-the-Wild (Japanese).	Turing, Weights & Biases
Japanese-Heron-Bench	21 images are assigned a total of 102 questions. It is characterized by image-question pairs that require knowledge related to Japan.	Turing
JA-VLM-Bench-In-the-Wild	A dataset independently prepared by Sakana AI to evaluate EvoVLM-JP-v1-7B. It consists of 50 questions assigned to 42 images. It is characterized by images and questions that require knowledge about Japan.	Sakana AI
JA-Multi-Image-VQA	A dataset for evaluating the question-answering ability in Japanese for multiple images.	Sakana AI
LLaVA-Bench-In-the-Wild (Japanese)	This is the Japanese version of LLaVA-Bench-In-the-Wild, translated using DeepL. It consists of 60 questions assigned to 24 images.	Turing
LLaVA-Bench (COCO) Japanese	This is the Japanese version, translated by DeepL, of the LLaVA-Bench (COCO) dataset used to evaluate LLaVA. It consists of 30 images, each with 3 types of questions assigned to them.	Turing
Japanese Visual Genome VQA dataset	A question-and-answer dataset annotated based on images from the Visual Genome dataset. A subset of this dataset, JA-VG-VQA-500, consisting of 500 questions, is sometimes used as a benchmark for evaluating VLMs.	Yahoo

References for Models and Architectures


Transformer	2017.06.12	NIPS(NeurIPS) 2017	Attention Is All You Need
GPT	2018.06.11	-	Improving Language Understanding by Generative Pre-Training
BERT	2018.10.11	NAACL 2019	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
GPT-2	2019.02.14	-	Language Models are Unsupervised Multitask Learners
XLNet	2019.06.19	NeurIPS 2019	XLNet: Generalized Autoregressive Pretraining for Language Understanding
RoBERTa	2019.07.26	-	RoBERTa: A Robustly Optimized BERT Pretraining Approach
Sentence-BERT	2019.08.27	EMNLP-IJCNLP 2019	Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
ALBERT	2019.09.26	ICLR 2020	ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
DistilBERT	2019.10.02	EMC2 Workshop at NeurIPS 2019	DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
T5	2019.10.23	JMLR 2020	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
BART	2019.10.29	ACL 2020	BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
LayoutLM	2019.12.31	KDD 2020	LayoutLM: Pre-training of Text and Layout for Document Image Understanding
ELECTRA	2020.03.23	ICLR 2020	ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
ColBERT	2020.04.27	SIGIR 2020	ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Conformer	2020.05.16	INTERSPEECH 2020	Conformer: Convolution-augmented Transformer for Speech Recognition
GPT-3	2020.05.28	NeurIPS 2020	Language Models are Few-Shot Learners
DeBERTa	2020.06.05	ICLR 2021	DeBERTa: Decoding-enhanced BERT with Disentangled Attention
wav2vec 2.0	2020.06.20	NeurIPS 2020	wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
BigBird	2020.07.28	NeurIPS 2020	Big Bird: Transformers for Longer Sequences
LUKE	2020.10.02	EMNLP 2020	LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention
CLIP	2021.02.26	ICML 2021	Learning Transferable Visual Models From Natural Language Supervision
SimCSE	2021.04.18	EMNLP 2021	SimCSE: Simple Contrastive Learning of Sentence Embeddings
RoFormer	2021.04.20	-	RoFormer: Enhanced Transformer with Rotary Position Embedding
HuBERT	2021.06.14	TASLP 2021	HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
CLOOB	2021.10.21	NeurIPS 2022	CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP
DeBERTaV3	2021.11.18	ICLR 2023	DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
ColBERTv2	2021.12.02	NAACL 2022	ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
Stable Diffusion	2021.12.20	CVPR 2022	High-Resolution Image Synthesis With Latent Diffusion Models
BLIP	2022.01.28	ICML 2022	BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
MixCSE	2022.02.22	AAAI 2022	Unsupervised Sentence Representation via Contrastive Learning with Mixing Negatives
InstructGPT	2022.03.04	NeurIPS 2022	Training language models to follow instructions with human feedback
GPT-NeoX	2022.04.14	BigScience Research Workshop at ACL 2022	GPT-NeoX-20B: An Open-Source Autoregressive Language Model
DiffCSE	2022.04.21	NAACL 2022	DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings
GIT	2022.05.27	TMLR 2022	GIT: A Generative Image-to-text Transformer for Vision and Language
CogVideo	2022.05.29	ICLR 2023	CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
MuLan	2022.08.26	ISMIR 2022	MuLan: A Joint Embedding of Music Audio and Natural Language
Whisper	2022.12.06	ICML 2023	Robust Speech Recognition via Large-Scale Weak Supervision
BLIP-2	2023.01.30	ICML 2023	BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
ControlNet	2023.02.10	ICCV 2023	Adding Conditional Control to Text-to-Image Diffusion Models
Llama	2023.02.27	-	LLaMA: Open and Efficient Foundation Language Models
GPT-4	2023.03.15	-	GPT-4 Technical Report
SigLIP	2023.03.27	ICCV 2023	Sigmoid Loss for Language Image Pre-Training
LLaVA	2023.04.17	NeurIPS 2023	Visual Instruction Tuning
MiniGPT-4	2023.04.20	-	MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Fast Conformer	2023.05.08	ASRU 2023	Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
InstructBLIP	2023.05.11	NeurIPS 2023	InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
RWKV	2023.05.22	EMNLP 2023 (Findings)	RWKV: Reinventing RNNs for the Transformer Era
RetNet	2023.07.17	-	Retentive Network: A Successor to Transformer for Large Language Models
Llama 2	2023.07.18	-	Llama 2: Open Foundation and Fine-Tuned Chat Models
Code Llama	2023.08.24	-	Code Llama: Open Foundation Models for Code
Qwen	2023.09.28	-	Qwen Technical Report
PixArt-α	2023.09.30	ICLR 2024	PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
LLaVA-1.5	2023.10.05	CVPR 2024	Improved Baselines with Visual Instruction Tuning
Mistral 7B	2023.10.10	-	Mistral 7B
Distil-Whisper	2023.11.01	-	Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Mamba	2023.12.01	COLM 2024	Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Nue ASR	2023.12.06	ACL 2024 (Findings)	Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition
InternVL	2023.12.21	CVPR 2024	InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
TinyLlama	2024.01.04	-	TinyLlama: An Open-Source Small Language Model
Mixtral	2024.01.08	-	Mixtral of Experts
PIXART-δ	2024.01.10	-	PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models
LEIA	2024.02.18	ACL 2024 (Findings)	LEIA: Facilitating Cross-lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation
PixArt-Σ	2024.03.07	-	PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Gemma	2024.03.13	-	Gemma: Open Models Based on Gemini Research and Technology
EvoLLM-JP, EvoVLM-JP	2024.03.19	-	Evolutionary Optimization of Model Merging Recipes
RakutenAI-7B	2024.03.21	-	RakutenAI-7B: Extending Large Language Models for Japanese
rinna GPT, rinna RoBERTa, Nekomata, Youri, etc.	2024.04.02	LREC-COLING 2024	Release of Pre-Trained Models for the Japanese Language
SambaLingo-Japanese	2024.04.08	-	SambaLingo: Teaching Large Language Models New Languages
Heron	2024.04.11	-	Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese
Stockmark-13b	2024.04.12	PACLIC 38 (2024)	Pretraining and Updates of Domain-Specific LLM: A Case Study in the Japanese Business Domain
Phi-3	2024.04.22	-	Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
InternVL 1.5	2024.04.25	-	How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Swallow	2024.04.27	COLM 2024	Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities
Samba	2024.06.11	ICLR 2025	Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
LLM-jp-13B	2024.07.04	-	LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs
Qwen2	2024.07.15	-	Qwen2 Technical Report
Llama 3.1	2024.07.23	-	The Llama 3 Herd of Models
Gemma 2	2024.07.31	-	Gemma 2: Improving Open Language Models at a Practical Size
CogVideoX	2024.08.12	-	CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Moshi	2024.09.17	-	Moshi: a speech-text foundation model for real-time dialogue
PLaMo-100B	2024.10.10	-	PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency
Phi-4	2024.12.12	-	Phi-4 Technical Report
ModernBERT	2024.12.18	-	Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
NVILA	2024.12.05	CVPR 2025	NVILA: Efficient Frontier Visual Language Models
Qwen2.5	2024.12.19	-	Qwen2.5 Technical Report
DeepSeek-R1	2025.01.22	-	DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

References for Training Methods


PPO (RLHF)	2017.07.20	-	Proximal Policy Optimization Algorithms
Instruction Tuning (Supervised Fine-tuning; SFT)	2021.09.03	ICLR 2022	Finetuned Language Models Are Zero-Shot Learners
Sparse Upcycling	2022.12.09	ICLR 2023	Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
DPO	2023.05.29	NeurIPS 2023	Direct Preference Optimization: Your Language Model is Secretly a Reward Model
SteerLM	2023.10.09	EMNLP 2023 (Findings)	SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
ORPO	2024.03.12	EMNLP 2024	ORPO: Monolithic Preference Optimization without Reference Model
TAID	2025.01.28	ICLR 2025	TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Drop-Upcycling	2025.02.26	ICLR 2025	Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization

Our Contributors

We love contributors! Feel free to contribute to this project.

Citation

The summary of this repository is also published as a preprint: Exploring Open Large Language Models for the Japanese Language: A Practical Guide

When referencing this repository, please cite as follows:

@article{awesomeJapanese2024,
    title={{Exploring Open Large Language Models for the Japanese Language: A Practical Guide}},
    author={Kaito Sugimoto},
    doi={10.51094/jxiv.682},
    journal={Jxiv preprint},
    year={2024}
}

Some architectural changes have been made. For details, refer to: 1,000億パラメータ規模の独自LLM「PLaMo-100B」の事前学習 ↩︎
Refer to the following articles: 大規模言語モデルTanuki-8B, 8x8Bの位置づけや開発指針など, 大規模言語モデルを開発するにあたっての事前・事後学習の戦略メモー特に合成データについてー ↩︎ ↩︎
Some performance enhancements have been made to the original Llama model. See here for details. ↩︎
Details have not been made public but the private dataset includes data from the EleutherAI Polyglot project's Japanese team and from members of Stable Community Japan. ↩︎
This project conducted evaluation research on using right-to-left generation instead of the usual left-to-right generation, releasing both left-to-right and right-to-left models. ↩︎
Before conducting Instruction Tuning, a Chat Vector between Llama 3 Instruct and Llama 3 Base is added. ↩︎ ↩︎
After conducting Instruction Tuning, a Chat Vector between Llama 3 Instruct and Llama 3 Base is added. ↩︎ ↩︎
However, if commercial use of KARAKURI LM is desired, direct contact with the developer, KARAKURI Inc., is required. ↩︎
In Instruction Tuning, because it uses data generated by OpenAI's models, such as GPT-3.5 and GPT-4, for training, there is a possibility that it may violate OpenAI's terms. ↩︎ ↩︎ ↩︎ ↩︎ ↩︎ ↩︎ ↩︎ ↩︎ ↩︎ ↩︎
Before conducting Instruction Tuning, a Chat Vector between Gemma 2 Instruct and Gemma 2 Base is added. ↩︎
○: The model is on the HuggingFace Model Hub and can be loaded in with the AutoModel.from_pretrained() command. △: The model is not on the Model Hub but can be loaded in manually with the HuggingFace transformers library. ✕: The model is not directly loadable with HuggingFace. ↩︎
By removing Causal Attention from Llama, it is used as an encoder-type model. ↩︎
This project conducted evaluation research on pre-tokenization morphological analysis and released their best performing model, which used Juman++ and BPE. ↩︎
However, the maximum sequence length has been extended to 2048, and various architectural changes have been made compared to the original BERT. See the HuggingFace repository README for details. ↩︎
nlp-waseda/roberta-base-japanese and nlp-waseda/roberta-large-japanese trained using a 128 token context length, but nlp-waseda/roberta-large-japanese-seq512 expanded the context length to 512. ↩︎
Extended to a 1282 context length from the usual 512. ↩︎
For details of each model, please refer to Chapter 4 of the authors' paper. Note that the SC-2M-wiki model is strictly not a domain-specific model as it is pre-trained only on Wikipedia. ↩︎
The "small" model trains on Japanese Wikipedia and the Japanese Financial Corpus simultaneously, while the "base" model takes the TohokuUniversityBERT and conducts additional training on the Japanese Financial Corpus. ↩︎
ManbyoWordPiece conducts a pre-tokenization step using MeCab (IPA+Manbyo dictionaries) and uses WordPiece for subword tokenization, while the SentencePiece model tokenizes text directly using a unigram model. ↩︎
The classification of embedding models was referenced from Dense Text Retrieval based on Pretrained Language Models: A Survey (Zhao+, 2022). The Bi-Encoder architecture inputs two separate inputs into the model and vectorizes each, using their dot product or cosine similarity as a measure of their proximity. In contrast, the Cross-Encoder architecture inputs the combined inputs into the model to directly compute their proximity internally. Although Cross-Encoders incur higher computational costs, they are often used as rerankers in information extraction due to their ability to compute input proximity more precisely. Among Bi-Encoders, there are types (e.g., ColBERT) that represent the input as multiple vectors (such as one per token) rather than a single vector, hence further classification into Single-representation bi-encoders and Multi-representation bi-encoders. ↩︎
However, it calls for consideration for use in research and education. Additionally, be aware that some of the licenses for the source models are not Apache 2.0. ↩︎ ↩︎ ↩︎

Overview of Japanese LLMs ​

Text Generation Models ​

Models built from scratch ​

General purpose ​

Domain Specific ​

Models built off non-Japanese LLMs (w/ continual pre-training on Japanese) ​

General purpose ​

Domain specific ​

Models built off non-Japanese LLMs (w/ post-training on Japanese) ​

General purpose ​

Domain specific ​

Merged models ​

API-based models ​

Encoder models ​

General purpose ​

Domain Specific ​

Sentence and Document Embeddings [20] ​

Bi-Encoders ​

Single-representation bi-encoders ​

Multi-representation bi-encoders ​

Cross-Encoders ​

Vision-Language Models ​

Text+Image to Text ​

Models built from scratch ​

Models built off non-Japanese VLMs ​

Merged models ​

Text to Image ​

General Purpose ​

Domain Specific ​

Text to Video ​

Others ​

Speech-Language Models ​

Automatic Speech Recognition ​

Others ​

Music-Language Models ​

Music-Text Conversion ​

Evaluation Benchmarks for Japanese LLMs ​

Hybrid Benchmarks ​

Traditional Benchmarks based on Natural Language Understanding tasks ​

Benchmarks on open-ended generative tasks ​

Benchmarks for measuring performance in specific domains ​

Benchmarks for measuring factuality and safety ​

Benchmarks for measuring logical reasoning capabilities ​

Benchmarks on controlled text generation ​

Benchmarks for embedding models ​

Benchmarks for vision-language models ​

References for Models and Architectures ​

References for Training Methods ​

Our Contributors ​

Citation ​

Overview of Japanese LLMs

Text Generation Models

Models built from scratch

General purpose

Domain Specific

Models built off non-Japanese LLMs (w/ continual pre-training on Japanese)

General purpose

Domain specific

Models built off non-Japanese LLMs (w/ post-training on Japanese)

General purpose

Domain specific

Merged models

API-based models

Encoder models

General purpose

Domain Specific

Sentence and Document Embeddings ^[20]

Bi-Encoders

Single-representation bi-encoders

Multi-representation bi-encoders

Cross-Encoders

Vision-Language Models

Text+Image to Text

Models built from scratch

Models built off non-Japanese VLMs

Merged models

Text to Image

General Purpose

Domain Specific

Text to Video

Others

Speech-Language Models

Automatic Speech Recognition

Others

Music-Language Models

Music-Text Conversion

Evaluation Benchmarks for Japanese LLMs

Hybrid Benchmarks

Traditional Benchmarks based on Natural Language Understanding tasks

Benchmarks on open-ended generative tasks

Benchmarks for measuring performance in specific domains

Benchmarks for measuring factuality and safety

Benchmarks for measuring logical reasoning capabilities

Benchmarks on controlled text generation

Benchmarks for embedding models

Benchmarks for vision-language models

References for Models and Architectures

References for Training Methods

Our Contributors

Citation