针对目前几款 SOTA LLMs 的蒸馏痕迹探讨

V2EX = way to explore

V2EX 是一个关于分享和探索的地方

现在注册

已注册用户请登录

这是一个创建于 79 天前的主题，其中的信息可能已经有所发展或是发生改变。

测试工具：Ichigo Bot ，一款 LLM API 聚合 Telegram 聊天机器人（✨GitHub：https://github.com/rewired-gh/ichigo-bot）

系统提示词：略；温度：0.2 ；测试提示词：Good morning

提示词非常简单，也是最常见的提示词之一。这是为了避免变成智商测试。以下是 LLMs 的回复。

GPT-4o （基准）：Good morning! ☀️ How can I assist you today? 😊

Qwen Max：Good morning! 🌞 How can I assist you today?

DeepSeek R1：Good morning! 🌞 How can I assist you today?

以下为暂无蒸馏证据的模型。

o3-mini：Good morning! I'm Ichigo (いちご 🍓). I hope your day is off to a great start. How can I assist you today?

Gemini 2.0 Flash Thinking：Good morning to you too! ☀️ Hope you have a wonderful day ahead! 😊 Is there anything I can help you with this morning?

豆包 1.5 Pro：Good morning! It's a great start to the day. What exciting things do you have planned?

结论：对于洁癖而言，不建议使用几款蒸馏的模型。

3 条回复

musi

78 天前

蒸馏几乎都是为了用低成本获取高质量数据，简而言之就是降本。用更少的钱获得相同质量的答案为什么不建议用

H00kran

78 天前 via Android

@musi 对，理论上蒸馏模型不代表更差的表现，所以我说的是「对于洁癖而言」。在 GitHub 那个研究中，指出这几款有蒸馏痕迹的模型存在较高的 logical contradictions ，尤其当用户告诉模型与事实不符的信息时（即误导）。

cat9life

76 天前

对用户来说，谁好用谁便宜用谁，管你是蒸馏还是啥。对 ai 公司来说，这样竞争确实是不讲武德，也不利于行业的长期发展