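Column definitions
  • M: Number of model parameters
  • T: Number of pre-training tokens
  • zh: Whether the model supports Chinese well
  • code: Whether the model supports code
  • S: Context length in tokens (1K = 1024)
  • L: Number of layers, excluding embedding layers
  • D: Token embedding dimension
  • F: Hidden dimension d_ff of the feed-forward network (FFN)
  • H: Number of heads in multi-head attention
  • G: Number of key/value heads in Grouped Query Attention (GQA); G = H means plain multi-head attention
  • V: Vocabulary size
  • Type: Model type (Dense/MoE/Mamba/...)
  • Tie: Whether the input and output embeddings are tied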
LLM           M          T        zh     code   S     L    D      F      H    G   V       Tie    Type   Date
GPT-1         117M       800M(w)  False  False  512   12   768    3072   12   12  40478   True   Dense  2018-06-11
GPT-2         124M       10B      False  False  1K    12   768    3072   12   12  50257   True   Dense  2019-02-14
GPT-2         355M       10B      False  False  1K    24   1024   4096   16   16  50257   True   Dense  2019-02-14
GPT-2         774M       10B      False  False  1K    36   1280   5120   20   20  50257   True   Dense  2019-02-14
GPT-2         1.5B       10B      False  False  1K    48   1600   6400   25   25  50257   True   Dense  2019-02-14
GPT-3         125M       300B     False  False  2K    12   768    3072   12   12  50257   True   Dense  2020-05-28
GPT-3         350M       300B     False  False  2K    24   1024   4096   16   16  50257   ?True  Dense  2020-05-28
GPT-3         760M       300B     False  False  2K    24   1536   6144   16   16  50257   ?True  Dense  2020-05-28
GPT-3         1.3B       300B     False  False  2K    24   2048   8192   24   24  50257   ?True  Dense  2020-05-28
GPT-3         2.7B       300B     False  False  2K    32   2560   10240  32   32  50257   ?True  Dense  2020-05-28
GPT-3         6.7B       300B     False  False  2K    32   4096   16384  32   32  50257   ?True  Dense  2020-05-28
GPT-3         13B        300B     False  False  2K    40   5120   20480  40   40  50257   ?True  Dense  2020-05-28
GPT-3         175B       300B     False  False  2K    96   12288  49152  96   96  50257   ?True  Dense  2020-05-28
Llama 1       7B         1T       False  False  2K    32   4096   11008  32   32  32000   False  Dense  2023-02-24
Llama 1       13B        1T       False  True   2K    40   5120   13824  40   40  32000   False  Dense  2023-02-24
Llama 1       33B        1.4T     False  True   2K    60   6656   17920  52   52  32000   False  Dense  2023-02-24
Llama 1       65B        1.4T     False  True   2K    80   8192   22016  64   64  32000   False  Dense  2023-02-24
Llama 2       7B         2T       False  True   4K    32   4096   11008  32   32  32000   False  Dense  2023-07-18
Llama 2       13B        2T       False  True   4K    40   5120   13824  40   40  32000   False  Dense  2023-07-18
Llama 2       70B        2T       False  True   4K    80   8192   28672  64   8   32000   False  Dense  2023-07-18
Llama 3       8B         15T      False  True   8K    32   4096   14336  32   8   128256  False  Dense  2024-04-18
Llama 3       70B        15T      False  True   8K    80   8192   28672  64   8   128256  False  Dense  2024-04-18
Llama 3.1     8B         15T      False  True   128K  32   4096   14336  32   8   128256  False  Dense  2024-07-23
Llama 3.1     70B        15T      False  True   128K  80   8192   28672  64   8   128256  False  Dense  2024-07-23
Llama 3.1     405B       15T      False  True   128K  126  16384  53248  128  8   128256  False  Dense  2024-07-23
Llama 3.2     1B         15T      False  True   128K  16   2048   8192   32   8   128256  True   Dense  2024-09-25
Llama 3.2     3B         15T      False  True   128K  28   3072   8192   24   8   128256  True   Dense  2024-09-25
Llama 3.3     70B        15T      False  True   128K  80   8192   28672  64   8   128256  False  Dense  2024-12-06
Qwen 1        1.8B       2.2T     True   True   8K    24   2048   11008  16   16  151936  False  Dense  2023-11-30
Qwen 1        7B         2.4T     True   True   8K    32   4096   22016  32   32  151936  False  Dense  2023-11-30
Qwen 1        14B        3T       True   True   2K    40   5120   27392  40   40  152064  False  Dense  2023-11-30
Qwen 1        72B        3T       True   True   32K   80   8192   49152  64   64  152064  False  Dense  2023-11-30
Qwen1.5       0.5B       2.4T     True   True   32K   24   1024   2816   16   16  151936  True   Dense  2024-02-04
Qwen1.5       1.8B       2.4T     True   True   32K   24   2048   5504   16   16  151936  False  Dense  2024-02-04
Qwen1.5       4B         2.4T     True   True   32K   40   2560   6912   20   20  151936  False  Dense  2024-02-04
Qwen1.5       7B         4T       True   True   32K   32   4096   11008  32   32  151936  False  Dense  2024-02-04
Qwen1.5       14B        4T       True   True   32K   40   5120   13696  40   40  152064  False  Dense  2024-02-04
Qwen1.5       32B        ?T       True   True   32K   64   5120   27392  40   8   152064  False  Dense  2024-02-04
Qwen1.5       72B        3T       True   True   32K   80   8192   24576  64   64  152064  False  Dense  2024-02-04
Qwen1.5       110B       ?T       True   True   32K   80   8192   49152  64   8   152064  False  Dense  2024-02-04
Qwen1.5       MoE-A2.7B  ?T       True   True   32K   24   2048   5632   16   16  151936  False  MoE    2024-02-04
Qwen2         0.5B       12T      True   True   128K  24   896    4864   14   2   151646  True   Dense  2024-06-07
Qwen2         1.5B       7T       True   True   128K  28   1536   8960   12   2   151646  True   Dense  2024-06-07
Qwen2         7B         7T       True   True   128K  28   3584   18944  28   4   151646  False  Dense  2024-06-07
Qwen2         72B        7T       True   True   128K  80   8192   29568  64   8   151646  False  Dense  2024-06-07
Qwen2         57B-A14B   4.5T     True   True   128K  28   3584   2560   28   4   151646  False  MoE    2024-06-07
Qwen2.5       0.5B       ?18T     True   True   32K   24   896    4864   14   2   151936  True   Dense  2024-09-15
Qwen2.5       1.5B       ?18T     True   True   32K   28   1536   8960   12   2   151936  True   Dense  2024-09-15
Qwen2.5       3B         ?18T     True   True   32K   36   2048   11008  16   2   151936  True   Dense  2024-09-15
Qwen2.5       7B         ?18T     True   True   128K  28   3584   18944  28   4   151936  False  Dense  2024-09-15
Qwen2.5       14B        ?18T     True   True   128K  48   5120   13824  40   8   151936  False  Dense  2024-09-15
Qwen2.5       32B        ?18T     True   True   128K  64   5120   27648  40   8   152064  False  Dense  2024-09-15
Qwen2.5       72B        ?18T     True   True   128K  80   8192   29568  64   8   152064  False  Dense  2024-09-15
DeepSeek LLM  7B         2T       True   True   4K    30   4096   11008  32   32  102400  False  Dense  2023-11-29
DeepSeek LLM  67B        2T       True   True   4K    95   8192   22016  64   8   102400  False  Dense  2023-11-29
DeepSeekMoE   16B        2T       True   True   4K    28   2048   10944  16   16  102400  False  MoE    2024-01-08
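
The columns L, D, F, H, G, V and Tie are enough to roughly reproduce the M column for the dense models. The sketch below is one way to do that, not a formula taken from any of the model reports. The names `ModelShape` and `estimate_params` are hypothetical. It assumes the head dimension is D / H and that the FFN has three weight matrices (a gated, SwiGLU-style FFN as in Llama); set `gated_ffn=False` for a two-matrix FFN as in the GPT rows. Biases, normalization weights and learned position embeddings are ignored, so the result is an estimate. It does not apply to MoE rows, where the FFN weights are spread across multiple experts.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """One table row, keyed by the table's column letters (hypothetical helper)."""
    L: int      # number of layers
    D: int      # token embedding dimension
    F: int      # FFN hidden dimension
    H: int      # attention heads
    G: int      # key/value heads (G == H means plain multi-head attention)
    V: int      # vocabulary size
    tie: bool   # input and output embeddings share weights
    gated_ffn: bool = True  # assumption: 3 FFN matrices (SwiGLU-style); False means 2


def estimate_params(m: ModelShape) -> int:
    """Rough weight count; ignores biases, norm weights and position embeddings."""
    head_dim = m.D // m.H                  # assumption: D is split evenly across heads
    attn = 2 * m.D * m.D                   # query and output projections
    attn += 2 * m.D * m.G * head_dim       # key and value projections (smaller under GQA)
    ffn = (3 if m.gated_ffn else 2) * m.D * m.F
    emb = m.V * m.D * (1 if m.tie else 2)  # tied embeddings are counted once
    return m.L * (attn + ffn) + emb


# Values copied from the Llama 2 7B and Llama 3 8B rows of the table.
llama2_7b = ModelShape(L=32, D=4096, F=11008, H=32, G=32, V=32000, tie=False)
llama3_8b = ModelShape(L=32, D=4096, F=14336, H=32, G=8, V=128256, tie=False)

print(f"{estimate_params(llama2_7b) / 1e9:.2f}B")  # 6.74B
print(f"{estimate_params(llama3_8b) / 1e9:.2f}B")  # 8.03B
```

Both estimates land close to the nominal sizes in the M column. The Llama 3 8B row also shows the effect of G: with 8 key/value heads instead of 32, the key and value projections are a quarter of the size they would be under plain multi-head attention.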