Code Interpreter for llm

前言

用过openai 的Assistants ，GPTS等功能的话，应该了解openai 提供了Code Interpreter的功能。可以在python解释器环境下执行代码，对于Assistants而言，Code Interpreter 应该也属于工具的一样，当用户输入自然语言描述的时候将触发工具调用，然后调用Code Interpreter执行代码，并返回输出。

但openai的Code Interpreter 依赖于openai，无法本地部署，所以本文主要目的是创建一个简单易用的代码执行器。

Jupyter kernelgateway

Jupyter KernelGateway 是一个Web服务器，它提供了一种简单的方法来让外部服务通过网路接口与Jupyter的内核进行交互。它允许开发者和系统管理员通过REST API从远程位置执行代码，获取代码执行结果，以及管理Jupyter内核，我们这里主要的Coder server 服务主要通过Jupyter kernelgateway 来实现。


jupyter kernelgateway --KernelGatewayApp.ip=0.0.0.0 \
    --KernelGatewayApp.port=8888 \
    --KernelGatewayApp.auth_token="${TOKEN}" \
    --JupyterApp.answer_yes=true \
    --JupyterWebsocketPersonality.list_kernels=true

构建Docker进行构建，这里的python 版本和


FROM python:3.10.12-slim
RUN mkdir /notebook
RUN apt-get update
RUN apt-get install -y graphviz
COPY ./docker/server.requirement.txt /app/requirements.txt
RUN pip3 install --no-cache-dir -r /app/requirements.txt
WORKDIR /notebook
CMD jupyter kernelgateway --KernelGatewayApp.ip=0.0.0.0 \
    --KernelGatewayApp.port=8888 \
    --KernelGatewayApp.auth_token="${TOKEN}" \
    --JupyterApp.answer_yes=true \
    --JupyterWebsocketPersonality.list_kernels=true

EXPOSE 8888

这里安装的python版本和一些依赖版本，主要参考了下Colab


jupyter_kernel_gateway==3.0.0
ipykernel==5.5.6
matplotlib==3.7.1
matplotlib-inline==0.1.6
matplotlib-venn==0.11.10
numpy==1.25.2
pandas==1.5.3
requests==2.31.0
beautifulsoup4==4.12.3
huggingface-hub==0.20.3
mitmproxy==10.2.4
graphviz==0.20.3
networkx==3.2.1

镜像已经构建好了，也可以直接拉取


docker run --rm -e TOKEN="241b2687-e3f2-43b5-826b-cb91e8be6b08" -p 8888:8888 qingzhaoli/fcoder-server:latest

Coder Client

Jupyter kernelgateway 作为的代码执行期服务构建好后，主要通过客户端请求来执行代码


from fcoder import CoderClient
coder_server_auth_token = "241b2687-e3f2-43b5-826b-cb91e8be6b08"
client = CoderClient(
    server_host="127.0.0.1",
    server_port=8888,
    auth_token=coder_server_auth_token
)
result = client.code_interpreter("print('hello')")

执行结果如下


{
      "status": "ok",
      "output": [
          {
              "text/plain": "hello\n"
          }
      ],
      "error_trace": null,
      "message": ""
  }

也可以用于安装依赖


result = client.code_interpreter("!pip install requests")
------
{
    "status": "ok",
    "output": [
        {
            "text/plain": "Requirement already satisfied: requests in /usr/local/lib/python3.10/site-packages (2.31.0)\r\nRequirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/site-packages (from requests) (3.7)\r\nRequirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/site-packages (from requests) (2.2.1)\r\nRequirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/site-packages (from requests) (3.3.2)\r\nRequirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/site-packages (from requests) (2024.6.2)\r\n"
        },
        {
            "text/plain": "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\r\n\u001b[0m\r\n\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\r\n\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\r\n"
        }
    ],
    "error_trace": null,
    "message": ""
}

Code Interpreter

Code Interpreter 本质上还是工具调用，LLM模型需要根据描述来返回函数调用，code_interpreter 工具函数和Tool schema 如下


def code_interpreter(code: str) -> CoderExecResult:
    """
    Code Interpreter is a code executor. Note that the execution environment of the code is Jupyter and the programming language is Python. Of course, you can also use the magic method % symbol supported by Jupyter and the ! symbol to execute system commands. % represents the line magic command. , such as %run is used to run external Python programs, the ! symbol is used to execute system commands, such as !ls is used to execute system commands, or !pip install requests means calling pip to update dependencies.

    Args:
        code: Python spec code or Linux command (pip3 only), This is the code to execute, do not include redundant comments

    Returns:
        CoderExecResult: Coder exec result

    """
    result = coder_client.code_interpreter(code)
    return result

coder_tool_schema = {
        "type": "function",
        "function": {
            "description": "Code Interpreter is a code executor. Note that the execution environment of the code is Jupyter and the programming language is Python. Of course, you can also use the magic method % symbol supported by Jupyter and the ! symbol to execute system commands. % represents the line magic command. , such as %run is used to run external Python programs, the ! symbol is used to execute system commands, such as !ls is used to execute system commands, or !pip install requests means calling pip to update dependencies.",
            "name": "code_interpreter",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "Python spec code or Linux command (pip3 only), This is the code to execute, do not include redundant comments"
                    }
                },
                "required": [
                    "code"
                ]
            },
            "annotation": "code_interpreter(code: str)"
        }
    }

用户输入，然后LLM 返回Tool call，调用本地coder interpreter执行，并获取返回结果，将结果返回给llm. 让其进行有效分析。


user> Please write a Hello World in Python. Concurrent execution
tool> code_interpreter({"code": "print('Hello World')"})
exec result> Hello World

user> Please analyze the execution results。
assistant> The execution result is straightforward: a Python script that outputs "Hello World" to the console. The code used is a simple print statement, and when executed, it correctly outputs the string. This confirms that the Python environment is functioning correctly and the print statement works as expected.

Here's a breakdown of the process:

1. **Input Code**: The Python code provided was:
   '''
   print('Hello World')
   '''
2. **Execution**: When executed, this code invokes the `print` function, which outputs the string passed to it—in this case, "Hello World"—to the console.

3. **Output**: The result of the execution is the string "Hello World" printed to the console.

This is a standard test to ensure that a Python environment is properly set up and can execute basic code. The successful output "Hello World" indicates that the environment is correctly interpreting and running Python code.

当使用Gpt-4o等模型的时候，也可以将结果图一并返回给多模态模型，让其进行分析。


user> Please draw a normal distribution graph
tool> code_interpreter({
  "code": "import matplotlib.pyplot as plt\nimport numpy as np\n\n# Generate data for a normal distribution\nmu, sigma = 0, 0.1  # mean and standard deviation\ns = np.random.normal(mu, sigma, 1000)\n\n# Create the histogram\ncount, bins, ignored = plt.hist(s, 30, density=True)\n\n# Plot the normal distribution curve\nplt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *\n         np.exp( - (bins - mu)**2 / (2 * sigma**2) ),\n         linewidth=2, color='r')\nplt.title('Normal Distribution')\nplt.xlabel('Value')\nplt.ylabel('Frequency')\nplt.show()"
})
exec result> <Figure size 432x288 with 1 Axes>
user> Please analyze the execution results。
assistant> The execution result is a plot showing a normal distribution graph, which includes a histogram of randomly generated data points that follow a normal distribution, overlaid with a red curve representing the theoretical normal distribution function.

### Breakdown of the Process:

1. **Data Generation**:
   - We generate 1000 data points from a normal distribution using the `numpy` function `np.random.normal`, with a mean (`mu`) of `0` and a standard deviation (`sigma`) of `0.1`.
   '''
   mu, sigma = 0, 0.1  # mean and standard deviation
   s = np.random.normal(mu, sigma, 1000)
   '''

2. **Creating the Histogram**:
   - The histogram of these data points is created using the `plt.hist` function.
   - The `density=True` parameter normalizes the histogram such that the total area under the histogram is equal to 1.
   '''
   count, bins, ignored = plt.hist(s, 30, density=True)
   '''

3. **Plotting the Normal Distribution Curve**:
   - The red curve overlay on the histogram represents the theoretical normal distribution function.
   - This is calculated using the probability density function of the normal distribution:
   '''
   plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) * np.exp( - (bins - mu)**2 / (2 * sigma**2) ), linewidth=2, color='r')
   '''

4. **Adding Labels and Title**:
   - The plot is given a title and axis labels for better readability and context.
   '''
   plt.title('Normal Distribution')
   plt.xlabel('Value')
   plt.ylabel('Frequency')
   '''

### Analysis:
- **Histogram**:
  - The blue bars represent the frequency of data points within each bin, normalized such that the total area under the histogram equals 1.
  - The shape of the histogram approximates the bell shape, which is characteristic of the normal distribution.

- **Normal Distribution Curve (Red Line)**:
  - The red line represents the theoretical normal distribution for the given mean (`mu=0`) and standard deviation (`sigma=0.1`).
  - It follows the equation of the normal distribution's probability density function (PDF).
  - The curve fits well over the histogram, indicating that the generated data points indeed follow a normal distribution.

The plot effectively demonstrates both empirical data (through the histogram) and theoretical distribution (through the red curve), providing a visual confirmation of the normal distribution properties.
user>

参考链接

fcoder
liqzz • Updated Jun 9, 2024