Voicecall Initial

MetaGLM · Nov 5, 2024 · ea96e6d · ea96e6d
1 parent 7f5e743
commit ea96e6d
Show file tree

Hide file tree

Showing 3 changed files with 189 additions and 0 deletions.
diff --git a/vision/data/audio_base64_output.txt b/vision/data/audio_base64_output.txt
diff --git a/vision/data/lecture.wav b/vision/data/lecture.wav
diff --git a/vision/glm-voicecall.ipynb b/vision/glm-voicecall.ipynb
@@ -0,0 +1,188 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# GLM-4-Plus-VideoCall Audio Call\n",
+    "\n",
+    "**This tutorial is available in English and is attached below the Chinese explanation**\n",
+    "\n",
+    "该脚本使用 GLM-4-Plus-VideoCall 输入音频文件，并通过 WebSocket 连接将其发送到服务器转录音频中的人声内容。\n",
+    "\n",
+    "This script uses GLM-4-Plus-VideoCall to input an audio file and send it over a WebSocket connection to a server for a transcription of the human voice in the audio. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Import and Encode Audio File\n",
+    "\n",
+    "首先导入音频处理、编码和 WebSocket 通信所需的库。然后读取 WAV 文件并将其编码为 Base64。WAV 文件以未压缩格式存储音频数据，因此适合编码和发送。\n",
+    "\n",
+    "Start by importing the required libraries for audio handling, encoding, and WebSocket communication. Then read a WAV file and encode it to Base64. WAV files store audio data in an uncompressed format, making it suitable for encoding and sending. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Channels: 2, Sample Width: 2, Frame Rate: 44100\n",
+      "Base64 编码的音频数据已保存到文件：/Users/lezhiliu/Desktop/glm-cookbook/vision/data/audio_base64_output.txt\n"
+     ]
+    }
+   ],
+   "source": [
+    "import websockets\n",
+    "import json\n",
+    "import time\n",
+    "import base64\n",
+    "import wave\n",
+    "\n",
+    "def read_and_encode_audio(file_path):\n",
+    "    with wave.open(file_path, 'rb') as audio_file:\n",
+    "\n",
+    "            channels = audio_file.getnchannels()\n",
+    "            sample_width = audio_file.getsampwidth()\n",
+    "            frame_rate = audio_file.getframerate()\n",
+    "\n",
+    "            print(f\"Channels: {channels}, Sample Width: {sample_width}, Frame Rate: {frame_rate}\")\n",
+    "\n",
+    "            audio_data = audio_file.readframes(audio_file.getnframes())\n",
+    "\n",
+    "            base64_encoded_audio = base64.b64encode(audio_data).decode('utf-8')\n",
+    "\n",
+    "            return base64_encoded_audio\n",
+    "\n",
+    "def save_to_file(data, output_path):\n",
+    "    with open(output_path, 'w') as file:\n",
+    "        file.write(data)\n",
+    "        print(f\"Base64 编码的音频数据已保存到文件：{output_path}\")\n",
+    "\n",
+    "\n",
+    "file_path = \"/Users/lezhiliu/Desktop/glm-cookbook/vision/data/lecture.wav\"\n",
+    "output_file_path = \"/Users/lezhiliu/Desktop/glm-cookbook/vision/data/audio_base64_output.txt\"\n",
+    "\n",
+    "base64_audio = read_and_encode_audio(file_path)\n",
+    "\n",
+    "if base64_audio:\n",
+    "    save_to_file(base64_audio, output_file_path)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Save the Base64 String\n",
+    "进行编码后，将 Base64 字符串保存为外部的 txt 文件。\n",
+    "\n",
+    "After encoding, the Base64 string is saved as an external txt file."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def read_base64_file(file_path):\n",
+    "    with open(file_path, 'r') as file:\n",
+    "            return file.read()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "## 3. Establishing a WebSocket Connection for Audio Transcription\n",
+    "现在，通过 connect_websocket 函数，我们建立了一个与指定 URI 的 WebSocket 连接，该连接带有授权标头，并使用承载令牌进行身份验证。连接后，它首先会收到服务器的初始响应。然后，它从指定路径读取一个 Base64 编码的音频文件，并构建一个包含各种参数的信息：一个客户端时间戳、一个提示（要求服务器转录音频）和一个控制部分，其中指定了文本响应格式、基于服务器的语音活动检测（VAD）、PCM 音频编码、语音类型和速比。发送 JSON 消息后，函数将等待并打印服务器的响应，以确认传输成功。\n",
+    "\n",
+    "Now, with the connect_websocket functio, we established a WebSocket connection to a specified URI with authorization headers, using a bearer token for authentication. Upon connection, it first receives an initial response from the server. Then, it reads a Base64-encoded audio file from the specified path and constructs a message with various parameters: a client timestamp, a prompt (asking the server to transcribe the audio), and a control section specifying text response format, server-based voice activity detection (VAD), audio encoding as PCM, voice type, and speed ratio. After sending this JSON message, the function awaits and prints the server’s response to confirm successful transmission."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "服务器响应： {\"conversation_id\":\"202411051348056326c7bcf9904cf1\",\"created\":1730785685768,\"message\":{\"content\":\"inited\",\"role\":\"assistant\",\"type\":\"event\"}}\n",
+      "消息已发送\n",
+      "服务器响应： {\"conversation_id\":\"202411051348056326c7bcf9904cf1\",\"created\":1730785713543,\"message\":{\"content\":\"很抱歉，\",\"role\":\"assistant\",\"type\":\"text\"},\"message_id\":\"9157393711747323263\"}\n"
+     ]
+    }
+   ],
+   "source": [
+    "async def connect_websocket():\n",
+    "    uri = f\"wss://open.bigmodel.cn/api/paas/ws/chat\"\n",
+    "    extra_headers = {\n",
+    "    \"Authorization\": \"Bearer {0}\".format(\"2c311d005bc6ced65ea28cdddbcbd50d.x5FWOMtkY319r8ns\")\n",
+    "    }\n",
+    "    async with websockets.connect(uri, extra_headers=extra_headers, ping_interval=120, ping_timeout=200 ) as websocket:\n",
+    "        response = await websocket.recv()\n",
+    "        print(f\"服务器响应： {response}\")\n",
+    "\n",
+    "        audio_base = read_base64_file(\"/Users/lezhiliu/Desktop/glm-cookbook/vision/data/audio_base64_output.txt\")\n",
+    "\n",
+    "        message = {\n",
+    "            \"client_timestamp\": time.time(),\n",
+    "            \"system_prompt\": \"请问这位女士在讲什么？请把你听到的每句话都复述出来。\",\n",
+    "            \"chunk_type\": \"append\",\n",
+    "            \"audio_chunk\": audio_base,\n",
+    "            \"control\": {\n",
+    "                \"response_type\": \"text\",\n",
+    "                \"vad_config\": {\n",
+    "                    \"server_vad\": True,\n",
+    "                    \"finish_time\": 60\n",
+    "                },\n",
+    "                \"audio_config\": {\n",
+    "                    \"encoding\": \"pcm\",\n",
+    "                    \"voice_type\": \"NORMAL_FEMALE\",\n",
+    "                    \"speed_ratio\": 1.00\n",
+    "                }\n",
+    "            }\n",
+    "        }\n",
+    "\n",
+    "        await websocket.send(json.dumps(message))\n",
+    "        print(\"消息已发送\")\n",
+    "\n",
+    "        response = await websocket.recv()\n",
+    "        print(f\"服务器响应： {response}\")\n",
+    "\n",
+    "\n",
+    "await connect_websocket()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}