PPO 用 Reward Model 计算大模型完整输出的奖励分数(r),过程中会跟原始模型的输出做对比(KL),避免偏离太远。大模型每步 token 输出,都会经过 Value Model 判定输出对后续奖励的期望值(v),经过 GAE 函数计算每步动作的好坏程度,更新我们在训练的大模型 Policy Model 参数。
GRPO 去掉了 Value Model,对每个输入(q)输出多个结果o1 o2 o3 …,奖励模型判断这些结果的好坏,从中选出最好的,更新模型参数。对算法和公式更详细的介绍,推荐这两个讲解:(1),(2)
2. Aha moment:训练过程中模型学会了停下来重新评估前面的思考,而且使用拟人的口气进行了反思。强化学习过程中人类没有输入任何类似的反思引导,模型能自主进行这种反思,十分有趣。我理解为,在预训练的模型里,模型已经有这样的反思意识潜力,在训练的某次输出过程中偶然出现,而出现后的结果对复杂推理有帮助,强化学习的机制会持续鼓励这样的输出。
训练的结果,在推理相关的 benchmark上,基本达到 o1 的水平:
R1-Zero 有非常有趣的发现,即不做监督学习,仅对预训练模型进行强化学习这条路是 OK 的,最终的模型有媲美o1的推理能力,但当前这种方式训练出的 R1-Zero 实际对外使用会有些问题,主要是对人不友好,比如输出的内容中英文混杂,猜测是奖励模型并没有做这种人类阅读友好的奖励判定,模型没理由能做好这块。
R1
为了做出一个推理能力强,输出对人类友好,综合能力 OK 的模型,DeepSeek 另外训练了R1模型。
You are OpenHands agent, a helpful AI assistant that can interact with a computer to solve tasks.
<IMPORTANT>
//
* If user provides a path, you should NOT assume it's relative to the current working directory. Instead, you should explore the file system to find the file before working on it.
* When configuring git credentials, use "openhands" as the user.name and "openhands@all-hands.dev" as the user.email by default, unless explicitly instructed otherwise.
* The assistant MUST NOT include comments in the code unless they are necessary to describe non-obvious behavior.
RuntimeInfo(available_hosts={'http://localhost:54090': 54090, 'http://localhost:55602': 55602})
</IMPORTANT>
<RUNTIME_INFORMATION>
The user has access to the following hosts for accessing a web application,
each of which has a corresponding port:
* http://localhost:54090 (port 54090)
* http://localhost:55602 (port 55602)
When starting a web server, use the corresponding ports. You should also
set any options to allow iframes and CORS requests.
</RUNTIME_INFORMATION>
You have access to the following functions:
---- BEGIN FUNCTION #1: execute_bash ----
Description: Execute a bash command in the terminal.
* Long running commands: For commands that may run indefinitely, it should be run in the background and the output should be redirected to a file, e.g. command = `python3 app.py > server.log 2>&1 &`.
* Interactive: If a bash command returns exit code `-1`, this means the process is not yet finished. The assistant must then send a second call to terminal with an empty `command` (which will retrieve any additional logs), or it can send additional text (set `command` to the text) to STDIN of the running process, or it can send command like `C-c` (Ctrl+C) to interrupt the process.
Parameters:
(1) command (string, required): The bash command to execute. Can be empty string to view additional logs when previous exit code is `-1`. Can be `C-c` (Ctrl+C) to interrupt the currently running process.
---- END FUNCTION #1 ----
---- BEGIN FUNCTION #2: finish ----
Description: Finish the interaction when the task is complete OR if the assistant cannot proceed further with the task.
No parameters are required for this function.
---- END FUNCTION #2 ----
---- BEGIN FUNCTION #3: web_read ----
Description: Read (convert to markdown) content from a webpage. You should prefer using the `web_read` tool over the `browser` tool, but do use the `browser` tool if you need to interact with a webpage (e.g., click a button, fill out a form, etc.).
You may use the `web_read` tool to read content from a webpage, and even search the webpage content using a Google search query (e.g., url=`https://www.google.com/search?q=YOUR_QUERY`).
Parameters:
(1) url (string, required): The URL of the webpage to read. You can also use a Google search query here (e.g., `https://www.google.com/search?q=YOUR_QUERY`).
---- END FUNCTION #3 ----
---- BEGIN FUNCTION #4: browser ----
Description: Interact with the browser using Python code. Use it ONLY when you need to interact with a webpage.
See the description of "code" parameter for more details.
Multiple actions can be provided at once, but will be executed sequentially without any feedback from the page.
More than 2-3 actions usually leads to failure or unexpected behavior. Example:
fill('a12', 'example with "quotes"')
click('a51')
click('48', button='middle', modifiers=['Shift'])
Parameters:
(1) code (string, required): The Python code that interacts with the browser.
The following 15 functions are available. Nothing else is supported.
goto(url: str)
Description: Navigate to a url.
Examples:
goto('http://www.example.com')
go_back()
Description: Navigate to the previous page in history.
Examples:
go_back()
go_forward()
Description: Navigate to the next page in history.
Examples:
go_forward()
noop(wait_ms: float = 1000)
Description: Do nothing, and optionally wait for the given time (in milliseconds).
You can use this to get the current page content and/or wait for the page to load.
Examples:
noop()
noop(500)
scroll(delta_x: float, delta_y: float)
Description: Scroll horizontally and vertically. Amounts in pixels, positive for right or down scrolling, negative for left or up scrolling. Dispatches a wheel event.
Examples:
scroll(0, 200)
scroll(-50.2, -100.5)
fill(bid: str, value: str)
Description: Fill out a form field. It focuses the element and triggers an input event with the entered text. It works for <input>, <textarea> and [contenteditable] elements.
Examples:
fill('237', 'example value')
fill('45', 'multi-line
example')
fill('a12', 'example with "quotes"')
select_option(bid: str, options: str | list[str])
Description: Select one or multiple options in a <select> element. You can specify option value or label to select. Multiple options can be selected.
Examples:
select_option('a48', 'blue')
select_option('c48', ['red', 'green', 'blue'])
click(bid: str, button: Literal['left', 'middle', 'right'] = 'left', modifiers: list[typing.Literal['Alt', 'Control', 'ControlOrMeta', 'Meta', 'Shift']] = [])
Description: Click an element.
Examples:
click('a51')
click('b22', button='right')
click('48', button='middle', modifiers=['Shift'])
dblclick(bid: str, button: Literal['left', 'middle', 'right'] = 'left', modifiers: list[typing.Literal['Alt', 'Control', 'ControlOrMeta', 'Meta', 'Shift']] = [])
Description: Double click an element.
Examples:
dblclick('12')
dblclick('ca42', button='right')
dblclick('178', button='middle', modifiers=['Shift'])
hover(bid: str)
Description: Hover over an element.
Examples:
hover('b8')
press(bid: str, key_comb: str)
Description: Focus the matching element and press a combination of keys. It accepts the logical key names that are emitted in the keyboardEvent.key property of the keyboard events: Backquote, Minus, Equal, Backslash, Backspace, Tab, Delete, Escape, ArrowDown, End, Enter, Home, Insert, PageDown, PageUp, ArrowRight, ArrowUp, F1 - F12, Digit0 - Digit9, KeyA - KeyZ, etc. You can alternatively specify a single character you'd like to produce such as "a" or "#". Following modification shortcuts are also supported: Shift, Control, Alt, Meta, ShiftLeft, ControlOrMeta. ControlOrMeta resolves to Control on Windows and Linux and to Meta on macOS.
Examples:
press('88', 'Backspace')
press('a26', 'ControlOrMeta+a')
press('a61', 'Meta+Shift+t')
focus(bid: str)
Description: Focus the matching element.
Examples:
focus('b455')
clear(bid: str)
Description: Clear the input field.
Examples:
clear('996')
drag_and_drop(from_bid: str, to_bid: str)
Description: Perform a drag & drop. Hover the element that will be dragged. Press left mouse button. Move mouse to the element that will receive the drop. Release left mouse button.
Examples:
drag_and_drop('56', '498')
upload_file(bid: str, file: str | list[str])
Description: Click an element and wait for a "filechooser" event, then select one or multiple input files for upload. Relative file paths are resolved relative to the current working directory. An empty list clears the selected files.
Examples:
upload_file('572', '/home/user/my_receipt.pdf')
upload_file('63', ['/home/bob/Documents/image.jpg', '/home/bob/Documents/file.zip'])
---- END FUNCTION #4 ----
---- BEGIN FUNCTION #5: execute_ipython_cell ----
Description: Run a cell of Python code in an IPython environment.
* The assistant should define variables and import packages before using them.
* The variable defined in the IPython environment will not be available outside the IPython environment (e.g., in terminal).
Parameters:
(1) code (string, required): The Python code to execute. Supports magic commands like %pip.
---- END FUNCTION #5 ----
---- BEGIN FUNCTION #6: str_replace_editor ----
Description: Custom editing tool for viewing, creating and editing files
* State is persistent across command calls and discussions with the user
* If `path` is a file, `view` displays the result of applying `cat -n`. If `path` is a directory, `view` lists non-hidden files and directories up to 2 levels deep
* The `create` command cannot be used if the specified `path` already exists as a file
* If a `command` generates a long output, it will be truncated and marked with `<response clipped>`
* The `undo_edit` command will revert the last edit made to the file at `path`
Notes for using the `str_replace` command:
* The `old_str` parameter should match EXACTLY one or more consecutive lines from the original file. Be mindful of whitespaces!
* If the `old_str` parameter is not unique in the file, the replacement will not be performed. Make sure to include enough context in `old_str` to make it unique
* The `new_str` parameter should contain the edited lines that should replace the `old_str`
Parameters:
(1) command (string, required): The commands to run. Allowed options are: `view`, `create`, `str_replace`, `insert`, `undo_edit`.
Allowed values: [`view`, `create`, `str_replace`, `insert`, `undo_edit`]
(2) path (string, required): Absolute path to file or directory, e.g. `/workspace/file.py` or `/workspace`.
(3) file_text (string, optional): Required parameter of `create` command, with the content of the file to be created.
(4) old_str (string, optional): Required parameter of `str_replace` command containing the string in `path` to replace.
(5) new_str (string, optional): Optional parameter of `str_replace` command containing the new string (if not given, no string will be added). Required parameter of `insert` command containing the string to insert.
(6) insert_line (integer, optional): Required parameter of `insert` command. The `new_str` will be inserted AFTER the line `insert_line` of `path`.
(7) view_range (array, optional): Optional parameter of `view` command when `path` points to a file. If none is given, the full file is shown. If provided, the file will be shown in the indicated line number range, e.g. [11, 12] will show lines 11 and 12. Indexing at 1 to start. Setting `[start_line, -1]` shows all lines from `start_line` to the end of the file.
---- END FUNCTION #6 ----
If you choose to call a function ONLY reply in the following format with NO suffix:
<function=example_function_name>
<parameter=example_parameter_1>value_1</parameter>
<parameter=example_parameter_2>
This is the value for the second parameter
that can span
multiple lines
</parameter>
</function>
<IMPORTANT>
Reminder:
- Function calls MUST follow the specified format, start with <function= and end with </function>
- Required parameters MUST be specified
- Only call one function at a time
- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after.
- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls
----------
Here's a running example of how to perform a task with the provided tools.
--------------------- START OF EXAMPLE ---------------------
USER: Create a list of numbers from 1 to 10, and display them in a web page at port 5000.
ASSISTANT: Sure! Let me first check the current directory:
<function=execute_bash>
<parameter=command>
pwd && ls
</parameter>
</function>
USER: EXECUTION RESULT of [execute_bash]:
/workspace
openhands@runtime:~/workspace$
ASSISTANT: There is no `app.py` file in the current directory. Let me create a Python file `app.py`:
<function=str_replace_editor>
<parameter=command>create</parameter>
<parameter=path>/workspace/app.py</parameter>
<parameter=file_text>
from flask import Flask
app = Flask(__name__)
@app.route('/')
def index():
numbers = list(range(1, 11))
return str(numbers)
if __name__ == '__main__':
app.run(port=5000)
</parameter>
</function>
USER: EXECUTION RESULT of [str_replace_editor]:
File created successfully at: /workspace/app.py
ASSISTANT: I have created a Python file `app.py` that will display a list of numbers from 1 to 10 when you run it. Let me run the Python file for you:
<function=execute_bash>
<parameter=command>
python3 app.py > server.log 2>&1 &
</parameter>
</function>
USER: EXECUTION RESULT of [execute_bash]:
[1] 121
[1]+ Exit 1 python3 app.py > server.log 2>&1
ASSISTANT: Looks like the server is running with PID 121 then crashed. Let me check the server log:
<function=execute_bash>
<parameter=command>
cat server.log
</parameter>
</function>
USER: EXECUTION RESULT of [execute_bash]:
Traceback (most recent call last):
File "/workspace/app.py", line 2, in <module>
from flask import Flask
ModuleNotFoundError: No module named 'flask'
ASSISTANT: Looks like the server crashed because the `flask` module is not installed. Let me install the `flask` module for you:
<function=execute_bash>
<parameter=command>
pip3 install flask
</parameter>
</function>
USER: EXECUTION RESULT of [execute_bash]:
Defaulting to user installation because normal site-packages is not writeable
Collecting flask
Using cached flask-3.0.3-py3-none-any.whl (101 kB)
Collecting blinker>=1.6.2
Using cached blinker-1.7.0-py3-none-any.whl (13 kB)
Collecting Werkzeug>=3.0.0
Using cached werkzeug-3.0.2-py3-none-any.whl (226 kB)
Collecting click>=8.1.3
Using cached click-8.1.7-py3-none-any.whl (97 kB)
Collecting itsdangerous>=2.1.2
Using cached itsdangerous-2.2.0-py3-none-any.whl (16 kB)
Requirement already satisfied: Jinja2>=3.1.2 in /home/openhands/.local/lib/python3.10/site-packages (from flask) (3.1.3)
Requirement already satisfied: MarkupSafe>=2.0 in /home/openhands/.local/lib/python3.10/site-packages (from Jinja2>=3.1.2->flask) (2.1.5)
Installing collected packages: Werkzeug, itsdangerous, click, blinker, flask
Successfully installed Werkzeug-3.0.2 blinker-1.7.0 click-8.1.7 flask-3.0.3 itsdangerous-2.2.0
ASSISTANT: The `flask` module is installed successfully. Let me run the Python file again:
<function=execute_bash>
<parameter=command>
python3 app.py > server.log 2>&1 &
</parameter>
</function>
USER: EXECUTION RESULT of [execute_bash]:
[1] 124
ASSISTANT:
Let me check the server log again:
<function=execute_bash>
<parameter=command>
cat server.log
</parameter>
</function>
USER: EXECUTION RESULT of [execute_bash]:
* Serving Flask app 'app'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
Press CTRL+C to quit
ASSISTANT:
The server is running on port 5000 with PID 124. You can access the list of numbers by visiting http://127.0.0.1:5000. If you have any further questions, feel free to ask!
USER: Now kill the server, make it display the numbers in a table format.
ASSISTANT:
Sure! Let me stop the server first:
<function=execute_bash>
<parameter=command>
kill 124
</parameter>
</function>
USER: EXECUTION RESULT of [execute_bash]:
[1]+ Terminated python3 app.py > server.log 2>&1
ASSISTANT:
Now let me display the numbers in a table format:
<function=str_replace_editor>
<parameter=command>str_replace</parameter>
<parameter=path>/workspace/app.py</parameter>
<parameter=old_str>return str(numbers)</parameter>
<parameter=new_str>return '<table>' + ''.join([f'<tr><td>{i}</td></tr>' for i in numbers]) + '</table>'</parameter>
</function>
USER: EXECUTION RESULT of [str_replace_editor]:
The file /workspace/app.py has been edited. Here's the result of running `cat -n` on a snippet of /workspace/app.py:
3
4 @app.route('/')
5 def index():
6 numbers = list(range(1, 11))
7 return '<table>' + ''.join([f'<tr><td>{i}</td></tr>' for i in numbers]) + '</table>'
8
9 if __name__ == '__main__':
10 app.run(port=5000)
Review the changes and make sure they are as expected. Edit the file again if necessary.
ASSISTANT:
Running the updated file:
<function=execute_bash>
<parameter=command>
<parameter=command>
python3 app.py > server.log 2>&1 &
</parameter>
</function>
USER: EXECUTION RESULT of [execute_bash]:
[1] 126
ASSISTANT:
The server is running on port 5000 with PID 126. You can access the list of numbers in a table format by visiting http://127.0.0.1:5000. Let me know if you have any further requests!
<function=finish>
</function>
--------------------- END OF EXAMPLE ---------------------
Do NOT assume the environment is the same as in the example above.
附:任务 Prompt示例
以下是真实执行一个任务时,循环添加输入 LLM 的 Prompt case:
--------------------- NEW TASK DESCRIPTION ---------------------
创建一个可以调用deepseek模型的程序,并运行测试确保可用
--------------------- END OF NEW TASK DESCRIPTION ---------------------
PLEASE follow the format strictly! PLEASE EMIT ONE AND ONLY ONE FUNCTION CALL PER MESSAGE.
----------
Let me start by checking the current directory and listing its contents:
<function=execute_bash>
<parameter=command>pwd && ls</parameter>
</function>
----------
EXECUTION RESULT of [execute_bash]:
/workspace
[Python Interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]
[Command finished with exit code 0]
----------
I see we are in the `/workspace` directory. Let's create a Python file that can call the DeepSeek model. First, I'll create a new file called `deepseek_app.py`:
<function=str_replace_editor>
<parameter=command>create</parameter>
<parameter=path>/workspace/deepseek_app.py</parameter>
<parameter=file_text>
from deepseek import DeepSeekModel
def call_deepseek_model(input_text):
model = DeepSeekModel()
response = model.generate(input_text)
return response
if __name__ == "__main__":
test_input = "Hello, DeepSeek!"
result = call_deepseek_model(test_input)
print("Model Response:", result)
</parameter>
</function>
----------
EXECUTION RESULT of [str_replace_editor]:
File created successfully at: /workspace/deepseek_app.py
----------
Now, let's check if the `deepseek` module is installed. If not, we'll need to install it:
<function=execute_bash>
<parameter=command>pip3 show deepseek</parameter>
</function>
----------
EXECUTION RESULT of [execute_bash]:
WARNING: Package(s) not found: deepseek
[Python Interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]
[Command finished with exit code 1]
----------
The `deepseek` module is not installed. Let me install it using pip:
<function=execute_bash>
<parameter=command>pip3 install deepseek</parameter>
</function>
----------
EXECUTION RESULT of [execute_bash]:
Collecting deepseek
Downloading deepseek-1.0.0-py3-none-any.whl.metadata (3.9 kB)
Requirement already satisfied: requests in /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages (from deepseek) (2.32.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages (from requests->deepseek) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages (from requests->deepseek) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages (from requests->deepseek) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages (from requests->deepseek) (2024.12.14)
Downloading deepseek-1.0.0-py3-none-any.whl (4.5 kB)
Installing collected packages: deepseek
Successfully installed deepseek-1.0.0
[Python Interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]
[Command finished with exit code 0]
----------
The `deepseek` module is now installed. Let's run the `deepseek_app.py` program to test if it works:
<function=execute_bash>
<parameter=command>python3 deepseek_app.py</parameter>
</function>
----------
EXECUTION RESULT of [execute_bash]:
Traceback (most recent call last):
File "/workspace/deepseek_app.py", line 1, in <module>
from deepseek import DeepSeekModel
ImportError: cannot import name 'DeepSeekModel' from 'deepseek' (/openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages/deepseek/__init__.py)
[Python Interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]
[Command finished with exit code 1]
----------
It seems the `DeepSeekModel` class is not available in the `deepseek` module. Let me check the documentation or available classes in the `deepseek` module:
<function=execute_bash>
<parameter=command>python3 -c "import deepseek; print(dir(deepseek))"</parameter>
</function>
----------
EXECUTION RESULT of [execute_bash]:
['DeepSeekAPI', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'api', 'const']
[Python Interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]
[Command finished with exit code 0]
----------
The `deepseek` module provides a `DeepSeekAPI` class instead of `DeepSeekModel`. Let me update the `deepseek_app.py` file to use the correct class:
<function=str_replace_editor>
<parameter=command>str_replace</parameter>
<parameter=path>/workspace/deepseek_app.py</parameter>
<parameter=old_str>
from deepseek import DeepSeekModel
def call_deepseek_model(input_text):
model = DeepSeekModel()
response = model.generate(input_text)
return response
</parameter>
<parameter=new_str>
from deepseek import DeepSeekAPI
def call_deepseek_model(input_text):
api = DeepSeekAPI()
response = api.generate(input_text)
return response
</parameter>
</function>
----------
EXECUTION RESULT of [str_replace_editor]:
The file /workspace/deepseek_app.py has been edited. Here's the result of running `cat -n` on a snippet of /workspace/deepseek_app.py:
1 from deepseek import DeepSeekAPI
2
3 def call_deepseek_model(input_text):
4 api = DeepSeekAPI()
5 response = api.generate(input_text)
6 return response
7
8 if __name__ == "__main__":
9 test_input = "Hello, DeepSeek!"
10 result = call_deepseek_model(test_input)
Review the changes and make sure they are as expected. Edit the file again if necessary.
----------
Now that the file is updated, let's run the program again to test if it works:
<function=execute_bash>
<parameter=command>python3 deepseek_app.py</parameter>
</function>
----------
EXECUTION RESULT of [execute_bash]:
Traceback (most recent call last):
File "/workspace/deepseek_app.py", line 10, in <module>
result = call_deepseek_model(test_input)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/deepseek_app.py", line 4, in call_deepseek_model
api = DeepSeekAPI()
^^^^^^^^^^^^^
File "/openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages/deepseek/api.py", line 9, in __init__
raise ValueError("DEEPSEEK_API_KEY is missing")
ValueError: DEEPSEEK_API_KEY is missing
[Python Interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]
[Command finished with exit code 1]
----------
The program requires an API key to use the `DeepSeekAPI`. Let me update the program to include the API key as an environment variable. First, I'll modify the `deepseek_app.py` file:
<function=str_replace_editor>
<parameter=command>str_replace</parameter>
<parameter=path>/workspace/deepseek_app.py</parameter>
<parameter=old_str>
from deepseek import DeepSeekAPI
def call_deepseek_model(input_text):
api = DeepSeekAPI()
response = api.generate(input_text)
return response
</parameter>
<parameter=new_str>
import os
from deepseek import DeepSeekAPI
def call_deepseek_model(input_text):
api_key = os.getenv("DEEPSEEK_API_KEY")
if not api_key:
raise ValueError("Please set the DEEPSEEK_API_KEY environment variable.")
api = DeepSeekAPI(api_key=api_key)
response = api.generate(input_text)
return response
</parameter>
</function>
----------
EXECUTION RESULT of [str_replace_editor]:
The file /workspace/deepseek_app.py has been edited. Here's the result of running `cat -n` on a snippet of /workspace/deepseek_app.py:
1 import os
2 from deepseek import DeepSeekAPI
3
4 def call_deepseek_model(input_text):
5 api_key = os.getenv("DEEPSEEK_API_KEY")
6 if not api_key:
7 raise ValueError("Please set the DEEPSEEK_API_KEY environment variable.")
8 api = DeepSeekAPI(api_key=api_key)
9 response = api.generate(input_text)
10 return response
11
12 if __name__ == "__main__":
13 test_input = "Hello, DeepSeek!"
14 result = call_deepseek_model(test_input)
Review the changes and make sure they are as expected. Edit the file again if necessary.
----------
Now, let's set the `DEEPSEEK_API_KEY` environment variable and run the program again. Please provide the API key for DeepSeek, and I will set it up for you. If you don't have one, you may need to sign up for an API key from DeepSeek's official website.
<function=execute_bash>
<parameter=command>export DEEPSEEK_API_KEY=your_api_key_here && python3 deepseek_app.py</parameter>
</function>
----------
EXECUTION RESULT of [execute_bash]:
Traceback (most recent call last):
File "/workspace/deepseek_app.py", line 14, in <module>
result = call_deepseek_model(test_input)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/deepseek_app.py", line 9, in call_deepseek_model
response = api.generate(input_text)
^^^^^^^^^^^^
AttributeError: 'DeepSeekAPI' object has no attribute 'generate'
[Python Interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]
[Command finished with exit code 1]
----------
It seems the `DeepSeekAPI` class does not have a `generate` method. Let me check the available methods in the `DeepSeekAPI` class:
<function=execute_bash>
<parameter=command>python3 -c "from deepseek import DeepSeekAPI; print(dir(DeepSeekAPI))"</parameter>
</function>
----------
EXECUTION RESULT of [execute_bash]:
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_post_request', 'chat_completion', 'completion_impl', 'fim_completion', 'get_models', 'user_balance']
[Python Interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]
[Command finished with exit code 0]
----------
The `DeepSeekAPI` class provides methods like `chat_completion`, `fim_completion`, and `get_models`. Let's update the `deepseek_app.py` file to use the `chat_completion` method instead of `generate`:
<function=str_replace_editor>
<parameter=command>str_replace</parameter>
<parameter=path>/workspace/deepseek_app.py</parameter>
<parameter=old_str>
api = DeepSeekAPI(api_key=api_key)
response = api.generate(input_text)
return response
</parameter>
<parameter=new_str>
api = DeepSeekAPI(api_key=api_key)
response = api.chat_completion(input_text)
return response
</parameter>
</function>
----------
EXECUTION RESULT of [str_replace_editor]:
The file /workspace/deepseek_app.py has been edited. Here's the result of running `cat -n` on a snippet of /workspace/deepseek_app.py:
4 def call_deepseek_model(input_text):
5 api_key = os.getenv("DEEPSEEK_API_KEY")
6 if not api_key:
7 raise ValueError("Please set the DEEPSEEK_API_KEY environment variable.")
8 api = DeepSeekAPI(api_key=api_key)
9 response = api.chat_completion(input_text)
10 return response
11
12 if __name__ == "__main__":
13 test_input = "Hello, DeepSeek!"
14 result = call_deepseek_model(test_input)
Review the changes and make sure they are as expected. Edit the file again if necessary.
----------
Now, let's run the program again with the updated method. Please provide the `DEEPSEEK_API_KEY` environment variable:
<function=execute_bash>
<parameter=command>export DEEPSEEK_API_KEY=your_api_key_here && python3 deepseek_app.py</parameter>
</function>
----------
EXECUTION RESULT of [execute_bash]:
Traceback (most recent call last):
File "/workspace/deepseek_app.py", line 14, in <module>
result = call_deepseek_model(test_input)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/deepseek_app.py", line 9, in call_deepseek_model
response = api.chat_completion(input_text)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages/deepseek/api.py", line 62, in chat_completion
response = self._post_request(API_CHAT_COM, payload, stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages/deepseek/api.py", line 25, in _post_request
raise Exception(f"HTTP Error {response.status_code}: {response.text}")
Exception: HTTP Error 401: {"error":{"message":"Authentication Fails (no such user)","type":"authentication_error","param":null,"code":"invalid_request_error"}}
[Python Interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]
[Command finished with exit code 1]
----------
The error indicates that the provided API key is invalid or does not correspond to a valid user. To proceed, you will need to:
1. Obtain a valid API key from DeepSeek's official platform.
2. Ensure the API key is correctly set in the environment.
Once you have a valid API key, you can set it and run the program again. Let me know if you need further assistance!
----------
key: *** ///在这里输入了key,隐藏
----------
Thank you for providing the API key. Let's set the environment variable and run the program again:
<function=execute_bash>
<parameter=command>export DEEPSEEK_API_KEY=sk-4a727097b1ef4e61a3444ec96e1e0db0 && python3 deepseek_app.py</parameter>
</function>
----------
EXECUTION RESULT of [execute_bash]:
Model Response: Hello! How can I assist you today? 😊
[Python Interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]
[Command finished with exit code 0]
今年玩过最好的游戏当然是《黑神话悟空》,远超预期,独有的中国文化之美,光看画面就是很大的享受,尤其是小西天,剧情音乐也都很顶,黄风岭陕西说唱、《壁上观》都很美,能玩到这样的游戏真好。不足的是难度没有循序渐进,开头的白衣居士卡得我差点放弃,后面反而障碍不多,但偶尔被 boss 卡还是挺难受。另外玩游戏过程中几乎感受不到剧情,这么顶的剧情全靠外部解说,也不知是不是个传播策略。
LLM在学习了大量文字对应图片的构图后,可以从语义推理这些文字的构图应该是怎样的,同时 LLM 自身也能很好理解哪些词是关键字,哪些词应该在同一行。比如上图的 旷野之息邮票 a stamp of Breath of the Wild,LLM 可以学到图上的文本应该是 Breath of the Wild,而对于邮票比较好的布局是上下两行,有个关键字 Wild 突出,得出相应的布局数据。
近期苹果发布的新品,无论是 iPhone 还是 Mac,都一改之前挤牙膏的风格,在最低配机器上都加大了内存,目的很明确,就是支撑 iPhone 和 Mac 上的端 AI 大模型。过去一年,AI手机、AI电脑的概念也一度在炒,在之前写的文章也说过,在客户端上跑大模型,一定是未来趋势。那目前端上大模型情况怎样?
有些场景可以不受设备大小限制、甚至续航功率限制,可以做得比较大,车机系统是一种,这是最好最大的应用场景,端上大模型 AI 应用会最先产生在这个领域,FSD也可以认为是端 AI 的一种。
还有一些可能得 AI 教育硬件,陪伴的玩偶等,本身也足够塞个大运算量芯片和大电池。一些刚需的硬件,比如导盲眼镜,也可以是连着口袋里一个不小的计算设备,这些算是后续可能的端上大模型的应用场景。
但除了车载系统以外,其他 AI 硬件要采用这种方式,发展会比较难。技术体验是一回事,还有商业模式的问题。
这些设备是自带硬件端上跑,还是云端跑,其实就是买断制和订阅制的区别。在端上跑需要用户一次性付出较高的硬件成本,但后续没有其他额外的成本。云端跑初期用户付出的硬件成本低,甚至厂家也愿意赔钱卖机器,但后期是可以用订阅服务制长期收费。从这角度看,用户和商家基本都会选择订阅制,对双方都更友好。所以端大模型要在 AI 硬件上流行起来,还比较难,除非是有些场景对隐私和实时性要求就是很高。
手机电脑
另一种就是利用已有设备,不需要用户额外花钱买硬件,那就还是回到设备大小、续航功耗、发热、机型覆盖等限制,有些场景为了省成本可以先用起来,PC / Mac 陆续可以有一些应用场景,例如上面提到的连接 ollama 的 MemoAI,浏览器上的 AI 搜索也非常适合端上 LLM 去做,但可能这几年会一直处于小场景尝试的阶段,要到主流的程度还早得很,也可能一直不会是主流,手机更是了。
漫画/绘本故事:核心是模型角色保持的能力。儿童绘本故事门槛很低,网上也有大量应用的教程,大众对质量的要求也没那么高,这是 AI 图生成目前擅长的。漫画门槛高一些,核心是故事、分镜的质量,生图所占的比例其实不高,所以如果用 AI 大规模生产,质量堪忧,但也有一些精品,比如这个。针对漫画有一些独立的产品和模型,例如dashtoon、Comic Factory、comicsmaker、llamagen等。
AI写真:人像 P 图是刚需,AI写真算是这个刚需的分支,火过很多产品,国内的妙鸭,海外Remini,还有一大波专门做这块的垂类产品 PhotoAI、星绘等。妙鸭虽然火一波以后销声匿迹,但这个需求是长期可持续的,photoAI 是独立开发者的产品,月流水已经到17万美元。主要用于各社交软件头像、linkedin商务照等。
新的艺术形式:摄影单独是一种艺术形式,相信 AI 图片生成也会带来独有的新的艺术形式,只是目前还未成型,摄影从诞生到成为一种艺术形式,也花了60年。跟画画与摄影不同的是,AI 图片生成创作,是有双向交互的,它不是定死的画笔或相机,创作过程中,AI 创作出来的内容会牵引下一步创作动作,不是一步到位,也不是忠实呈现自己脑里所想、呈现现实世界已有的东西,AI 不仅是工具,作品是人与 AI 的共创,有可能是新的艺术形式。
但跟摄影不同的是,图片生成技术,也许无法像拍照一样普及率那么高,摄像头记录美好生活是高频刚需,但创作不是,纯 AI 创作最终还是属于少部分创作者,就像能称为摄影师的只是少部分人。AI 技术进步是赋予了不会画画但有创意的一波人更强的能力,就像抖音最终赋予的也是少部分创作者展示他们才华的能力一样。
创作无法普及到大众,但创作出来的内容是能普及的,内容消费是大众刚需,至于这波创作者能否创作出跟摄像头相媲美的另一个维度的内容,支撑起一个 AI 内容消费社区,有待探索。
SVD(Stable Video Diffusion),23年12月发布。模型结构复用 Video LDM,主要是在 U-Net 和 VAE 解码器中分别加入时序层(temporal attention layer),SVD 论文本身在讲模型怎么训练的,包括高质量视频的微调。
PixelDance(字节),23年11月发布。特点是首尾帧机制,首帧图作为强引导,与噪点图拼接一起作为输入,严格遵守首帧图,同时尾帧图作为弱引导,训练中会随机抛弃尾帧,推理降噪过程中在步数大于τ值时也会抛弃尾帧,避免完全对齐,让生成的结果有多样性。在 DiT 架构的模型出现之前,效果基本是最好的,生成的视频运动幅度大,稳定性不错。未开源。
上面大部分模型,包括可控性的扩展模型,核心底层都是基于经典的 UNet 架构,但 Sora 出来后,业界公认 DiT 架构才是未来,毕竟效果太碾压了,最近可灵 / Luma 的出现也印证了这点。架构范式转移到 DiT 后,原先在 UNet 上做的各种可控雕花,看起来基本上是没法迁移到 DiT 架构的,一切得重来。
DiT 架构开源的只见到去年11月 sora 出来之前的 Latte,研究性比较多,效果一般。其他靠谱的开源模型还没见到,毕竟 Sora 还没见影,可灵和 luma 也刚出。(DiT架构的图片生成就有一些,比如腾讯混元)
DiT 架构后,视频生成和视频编辑这些模型大概率要淘汰,而人像视频可能在较长一段时间内仍有应用空间,如果要做 AI 视频短片,人物表情动作精细控制挺重要,DiT 架构目前还没看到有能做到精细控制的技术,基于 Unet 的通用视频生成模型这么长时间也没法做好这块的可控性,可能一段时间内还得靠原有技术做这里的可控后编辑。
Transformer 作为新 AI 时代的基石,有必要深入了解下。网上对 Transformer 的教学文章/视频非常多,很多讲得很好,像 3Blue1Brown 的讲解视频,以及这篇文章。整个详细过程原理写不来,本文主要记录一下其中我觉得比较容易混淆的 Attention 模块运算过程,主要是里面的 Q K V 的概念/运算过程/作用。