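SPS: Samples per Second; FPS: Frames per Second; TPS: Tokens per Second
An affinity scenario is one where a small number of structures or parameters are adjusted so that the model is better suited to Ascend and performs better
A3 refers to the Atlas A3 training series hardware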
MindSpeed-MM model list
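| Model task | Model | Parameters | Task | Cluster | Precision | NPU performance | Reference performance | Certification |
|---|---|---|---|---|---|---|---|---|
| Multimodal generation | HunyuanVideo | 13B | Pretraining | 1x8 | BF16 | 0.171 (SPS) | 0.181 (SPS) | Test |
| Multimodal generation | OpenSora 1.0 | 5.5B | Pretraining | 1x8 | BF16 | 3.18 (SPS) | 2.04 (SPS) | Pass |
| Multimodal generation | OpenSora 1.2 | 5.2B | Pretraining | 1x8 | BF16 | 7.31 (SPS) | 8.15 (SPS) | Pass |
| Multimodal generation | OpenSoraPlan 1.2 | 8.7B | Pretraining | 1x8 | BF16 | 0.42 (SPS) | 0.37 (SPS) | Pass |
| Multimodal generation | OpenSoraPlan 1.3-T2V | 8.6B | Pretraining | 1x8 | BF16 | 1.29 (SPS) | 1.27 (SPS) | Pass |
| Multimodal generation | OpenSoraPlan 1.3-I2V | 8.6B | Pretraining | 1x8 | BF16 | 1.17 (SPS) | 1.15 (SPS) | Pass |
| Multimodal generation | CogVideoX-T2V | 5B | Pretraining | 1x8 | BF16 | 0.37 (SPS) | 0.46 (SPS) | Pass |
| Multimodal generation | CogVideoX-I2V | 5B | Pretraining | 1x8 | BF16 | 0.37 (SPS) | 0.46 (SPS) | Pass |
| Multimodal generation | CogVideoX 1.5-T2V | 5B | Pretraining | 1x8 | BF16 | 1.88 (SPS) | 2.09 (SPS) | Pass |
| Multimodal generation | CogVideoX 1.5-I2V | 5B | Pretraining | 1x8 | BF16 | 1.81 (SPS) | 2.01 (SPS) | Pass |
| Multimodal generation | Qihoo-T2X | 1.1B | Inference | 1x1 | BF16 | / | / | Contributed by Qihoo 360 |
| Multimodal generation | SDXL | 3.5B | Pretraining | 1x8 | BF16 | 29.92 (FPS) | 30.65 (FPS) | Pass |
| Multimodal generation | SDXL | 3.5B | Pretraining | 1x8 | FP16 | 28.51 (FPS) | 30.23 (FPS) | Pass |
| Multimodal generation | SD3 | 2B | Full-parameter fine-tuning | 1x8 | BF16 | 16.09 (FPS) | 16.01 (FPS) | Pass |
| Multimodal generation | SD3.5 | 8.1B | Full-parameter fine-tuning | 1x8 | BF16 | 26.20 (FPS) | 28.33 (FPS) | Pass |
| Multimodal generation | SD3.5 | 8.1B | LoRA fine-tuning | 1x8 | FP16 | 47.93 (FPS) | 47.95 (FPS) | Pass |
| Multimodal generation | Flux | 12B | Full-parameter fine-tuning | 1x8 | BF16 | 55.23 (FPS) | 53.65 (FPS) | Pass |
| Multimodal generation | Sana | 1.6B | LoRA fine-tuning | 1x8 | BF16 | 28.7 (FPS) | 32.8 (FPS) | Pass |
| Multimodal generation | Kolors | 2.6B | Inference | 1x1 | FP16 | / | / | Test |
| Multimodal understanding | LLaVA 1.5 | 7B | Full-parameter fine-tuning | 1x8 | BF16 | 48.27 (SPS) | 49.94 (SPS) | Test |
| Multimodal understanding | InternVL 2.0 | 2B | Fine-tuning | 1x8 | BF16 | 33.77 (SPS) | 22.46 (SPS) | Pass |
| Multimodal understanding | InternVL 2.0 | 8B | Fine-tuning | 1x8 | BF16 | 12.86 (SPS) | 11.00 (SPS) | Pass |
| Multimodal understanding | InternVL 2.0 | 26B | Fine-tuning | 1x8 | BF16 | 3.31 (SPS) | 3.26 (SPS) | Pass |
| Multimodal understanding | InternVL 2.0 | 76B | Full-parameter fine-tuning | 8x16 | BF16 | 214 (TPS) | 191 (TPS) | Test |
| Multimodal understanding | InternVL 2.5 | 78B | Fine-tuning | 8x8 | BF16 | / | / | Test |
| Multimodal understanding | Qwen2-VL | 2B | Fine-tuning | 1x8 | BF16 | 34.15 (SPS) | 34.88 (SPS) | Pass |
| Multimodal understanding | Qwen2-VL | 7B | Fine-tuning | 1x8 | BF16 | 13.28 (SPS) | 11.66 (SPS) | Pass |
| Multimodal understanding | Qwen2-VL | 72B | Fine-tuning | 4x8 (A3) | BF16 | 261.25 (TPS) | 257.63 (TPS) | Pass |
| Speech recognition | Whisper | 1.5B | Pretraining | 1x8 | BF16 | 93.38 (SPS) | 109.23 (SPS) | Test |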
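
Within each row above, the NPU and reference figures share one throughput unit (SPS, FPS, or TPS), so higher is better and the two can be compared as a simple ratio. The following is a minimal sketch of that comparison using three rows copied from the list. The helper name `relative_throughput` is hypothetical and is not part of MindSpeed-MM.

```python
def relative_throughput(npu: float, reference: float) -> float:
    """Return NPU throughput as a multiple of the reference throughput.

    Hypothetical helper for illustration; both values must use the same
    unit (SPS, FPS, or TPS). A result above 1.0 means the NPU is faster.
    """
    if reference <= 0:
        raise ValueError("reference throughput must be positive")
    return npu / reference


# (model, NPU performance, reference performance), copied from the list above
rows = [
    ("OpenSora 1.0", 3.18, 2.04),   # SPS
    ("Flux", 55.23, 53.65),         # FPS
    ("Whisper", 93.38, 109.23),     # SPS
]

for model, npu, ref in rows:
    print(f"{model}: {relative_throughput(npu, ref):.2f}x of reference")
```

Rows marked `/` carry no measurement and would need to be skipped before calling the helper.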
Other multimodal large models adapted to Ascend
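| Model | Parameters | Task | Cluster | Precision | NPU performance | Reference performance | Certification |
|---|---|---|---|---|---|---|---|
| CogVLM-2 | 8B | Fine-tuning | 1x8 | BF16 | 3.9 (s/it) | 3.3 (s/it) | Pass |
| PLLaVA | 7B | Pretraining | 1x8 | BF16 | 0.841 (s/step) | 0.935 (s/step) | Pass |
| PLLaVA | 7B | Pretraining | 1x8 | FP32 | 0.935 (s/step) | 1.08 (s/step) | Pass |
| miniCPM-V 2.5 | 8B | Full-parameter fine-tuning | 1x8 | BF16 | 1046 (s)/50-200steps | 847 (s)/50-200steps | Pass |
| miniCPM-V 2.5 | 8B | LoRA fine-tuning | 1x8 | BF16 | 603 (s)/50-200steps | 490 (s)/50-200steps | Pass |
| HunYuanDiT | 1.5B | Pretraining | 1x8 | BF16 | 1099.5 (ms/step) | 1059.3 (ms/step) | Pass |
| InternVL 1.5 | 26B | Fine-tuning | 1x8 | BF16 | 4.952 (FPS) | 5.151 (FPS) | Pass |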
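
Most rows in this list report time per step (s/it, s/step, ms/step), where lower is better. Only the InternVL 1.5 row uses a throughput unit (FPS). A minimal sketch, assuming one wants to read the time-based rows the same way as throughput, converts them to steps per second first. The helper name `steps_per_second` is hypothetical and is not part of any listed project.

```python
# Seconds represented by one unit of each supported time-per-step label.
_SECONDS_PER_UNIT = {"s": 1.0, "ms": 1e-3}


def steps_per_second(time_per_step: float, unit: str) -> float:
    """Convert a time-per-step value to steps per second (higher is better).

    Hypothetical helper for illustration; `unit` is "s" or "ms".
    """
    if time_per_step <= 0:
        raise ValueError("time per step must be positive")
    return 1.0 / (time_per_step * _SECONDS_PER_UNIT[unit])


# (model, NPU time, reference time, unit), copied from the list above
rows = [
    ("PLLaVA (BF16)", 0.841, 0.935, "s"),
    ("HunYuanDiT", 1099.5, 1059.3, "ms"),
]

for model, npu, ref, unit in rows:
    ratio = steps_per_second(npu, unit) / steps_per_second(ref, unit)
    print(f"{model}: {ratio:.2f}x of reference")
```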
Image-to-video: OpenSoraPlan 1.3 I2V
Input image
Prompt: A rocket ascends slowly into the sky
Text-to-video: OpenSoraPlan 1.3 T2V
Prompt: A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures
Prompt: Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee
Text-to-image: Flux T2I
Prompt: A cat holding a sign that says hello world
Prompt: A cat holding a sign that says MindSpeed
Understanding models: InternVL2 & Qwen2VL
Input image for both models:
Input text for both models: Please describe the image shortly
InternVL2 inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm water. The water reflects the surrounding landscape, which includes dense forests and a mountain range in the background. The sky is partly cloudy, adding to the tranquil atmosphere of the scene
Qwen2VL inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm waters. The dock is made of weathered wooden planks and leads to a small platform with a ladder, suggesting it is used for swimming or diving. The lake is surrounded by lush green forests and mountains in the background, creating a picturesque and tranquil setting. The sky is overcast, adding to the calm and peaceful atmosphere of the scene.