[Bug] With GPT-4o streaming, the webui display speed cannot keep up with the transmission speed #2594
💻 System environment

Windows

📦 Deployment environment

Docker

🌐 Browser

Chrome

🐛 Problem description

With GPT-4o streaming, the display speed cannot keep up with the transmission speed.

GPT-4o's output is extremely fast. The UI's streaming display starts out slow, then suddenly speeds up after a few seconds, presumably because the API transfer has already finished while the display is still catching up. The webui's display speed seems to be a preset value rather than one sampled in real time.

🚦 Expected results

Raise the default values. Two presets need raising:

1. The value for GPT-4o.
2. The maximum speed value. (This value also affects other scenarios: if I switch tabs and then switch back, the displayed answer is stale even though the API finished transferring long ago. In that scenario the current maximum is far from enough.)

Of course, the best solution is real-time sampling, done on the backend.

📷 Steps to reproduce

Answer any question using the GPT-4o API. It is even more obvious for long answers.

📝 Supplementary information

I'm directly connected, without a proxy.
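For illustration, the behavior described above is consistent with a fixed-rate typewriter renderer. A minimal sketch of that pattern (all names here are hypothetical, not LobeChat's actual code):

```ts
// Hypothetical fixed-rate typewriter renderer, for illustration only.
// The SSE stream can fill `buffer` arbitrarily fast, but the display drains
// it at a fixed `charsPerTick`, so fast models like GPT-4o pile up a backlog.
class FixedRateRenderer {
  private buffer = '';
  private shown = 0;

  constructor(
    private charsPerTick: number, // the preset this issue asks to raise
    tickMs: number,
    private onRender: (text: string) => void,
  ) {
    setInterval(() => this.tick(), tickMs);
  }

  push(chunk: string) {
    this.buffer += chunk; // network side: unbounded speed
  }

  private tick() {
    if (this.shown >= this.buffer.length) return;
    this.shown = Math.min(this.shown + this.charsPerTick, this.buffer.length);
    this.onRender(this.buffer.slice(0, this.shown)); // display side: capped speed
  }
}
```

With a fast stream, `buffer.length - shown` only grows during generation, which matches the "display still catching up after the API has finished" symptom.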
👀 @hlc1209 Thank you for raising an issue. We will look into the matter and get back to you as soon as possible.
Thanks for the reply.
I wonder if you have tried outputting the remaining content all at once after the API finishes generating, without auto-scrolling down. How does that feel to use?
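A minimal sketch of that suggestion (hypothetical names, not a committed design): keep the smoothed output while the stream is open, then flush whatever is still buffered in one render when the API signals completion.

```ts
// Hypothetical: once the stream reports 'done', reveal everything that is
// still buffered in a single render instead of continuing to drip it out.
type StreamEvent = { type: 'chunk'; data: string } | { type: 'done' };

function createStreamHandler(onRender: (text: string) => void) {
  let buffer = '';
  return (event: StreamEvent) => {
    if (event.type === 'chunk') {
      buffer += event.data; // the smoothed typewriter keeps draining this
    } else {
      onRender(buffer); // stream closed: render the full text at once
      // ...and, per the suggestion above, suspend auto-scroll here.
    }
  };
}
```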
The case you mentioned actually illustrates my problem very well. The best solution is, of course, real-time streaming similar to the OpenAI app.
This approach is a fairly good fix for #2534.
@hlc1209 Could you record your screen showing the output of OpenAI's GPT-4o? I had previously been thinking about whether automatic dynamic rate adjustment is feasible. The earlier PR #1197 actually raised this point: ideally we achieve smooth output for slow providers. After 1.0 is released, I want to look into whether this part of the implementation can be optimized.
It would also be nice if users could adjust the smoothed output rate during API generation, mainly for the browsing experience on narrow-screen devices.
@sxjeru Let's skip user adjustment; that is too fiddly. Better to think about how to automate it.
I'm lazy hahaha, but I can describe it. To sum up, I think the official app also has a display cache; it just has internal data that lets it make a fairly accurate estimate in most cases.
I also ran into this when I first used the ChatGPT App, so I actually borrowed this implementation logic from it. The essential reason their experience is better is, as you said, that they know how many tokens are output per second. For us it is basically impossible to set a fixed value. (After all, we support a Proxy Url, and we have no idea whether a third-party provider wraps yet another shell around 4o...) So the ultimate solution should be to compute a TPS from the SSE intervals and the output speed, and then implement dynamic rate adjustment.
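A rough sketch of what that could look like: estimate a smoothed rate from the SSE chunk arrival intervals, then let the typewriter reveal text at that estimated pace. The EMA and its constants below are illustrative choices, not a committed design:

```ts
// Hypothetical TPS-style estimator: derive a smoothed characters-per-ms rate
// from SSE chunk arrival intervals, then pace the display at that rate.
class StreamRateEstimator {
  private lastArrival = performance.now();
  private ema = 0; // exponential moving average, in chars per ms

  observe(chunk: string) {
    const now = performance.now();
    const dt = Math.max(now - this.lastArrival, 1); // avoid divide-by-zero
    this.lastArrival = now;
    const instant = chunk.length / dt;
    // 0.3 is an arbitrary smoothing factor: responsive but not jittery
    this.ema = this.ema === 0 ? instant : 0.3 * instant + 0.7 * this.ema;
  }

  /** How many characters the renderer should reveal over the next interval. */
  budget(intervalMs: number): number {
    return Math.max(1, Math.round(this.ema * intervalMs));
  }
}
```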
But to be honest, back when the first-generation GPT-4 (still the largest model to date, and very slow) had just been released, I often ran into the issue mentioned in #1197 on the official app. To sum up, before version 1.0 the most elegant fix would be: …

After 1.0, the best approach is, as you said, dynamic rate adjustment.
I feel we can add this; right now it's a relatively low-cost option and the experience should be acceptable.
Don't forget to increase the maximum output speed after the API transfer is completed 😂
|
|
This feature was implemented in #2223. Do we need to temporarily turn off auto-scrolling until the current message finishes generating?
@sxjeru My suggestion is to temporarily stop auto-scrolling once the API finishes its output; that works better.
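A sketch of that auto-scroll rule (hypothetical helper; the element and flag names are assumptions): follow the output only while the API is still producing and the user is already near the bottom.

```ts
// Hypothetical: keep following the output while streaming, but stop pinning
// the view to the bottom once the API has finished and the UI is merely
// catching up on the buffered remainder.
function maybeAutoScroll(el: HTMLElement, apiDone: boolean) {
  const nearBottom = el.scrollHeight - el.scrollTop - el.clientHeight < 40;
  if (!apiDone && nearBottom) {
    el.scrollTop = el.scrollHeight;
  }
}
```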
I remember running into this problem before when using Anthropic Claude as well.
Could we dynamically adjust based on the backlog of streamed messages received locally but not yet displayed? If the backlog exceeds a threshold, the display speed could be increased by a multiplier or exponentially.
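A sketch of that heuristic; the threshold, base rate, and growth factor below are arbitrary illustrative numbers:

```ts
// Hypothetical backlog-based speedup: the more undisplayed text has piled up,
// the faster the typewriter drains it (doubling per threshold-sized backlog).
function adjustedCharsPerTick(
  base: number,
  backlog: number, // buffered but not yet displayed characters
  threshold = 200,
): number {
  if (backlog <= threshold) return base;
  const multiplier = Math.min(2 ** Math.floor(backlog / threshold), 64); // capped
  return base * multiplier;
}

// e.g. base 4 chars/tick, backlog 650 → floor(650/200) = 3 → 4 * 2^3 = 32 chars/tick
```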
When the API responds very quickly, you still have to wait for the UI to slowly print the output word by word, which really hurts efficiency. I compared the console output with the webui output, and from the start of the response to the end there was as much as a 20s difference.
Is there a plan or schedule for this? The wait right now is torture.
@Loongphy We'll start on these accumulated optimization items after the knowledge base / file upload feature is released.
✅ @hlc1209 This issue is closed. If you have any questions, feel free to comment and reply.
🎉 This issue has been resolved in version 1.15.10 🎉

The release is available on:

Your semantic-release bot 📦🚀
Also, there's no way for output to continue in the background: if you switch to another tab, the page's output stops, and you have to switch back and watch it slowly print. It's torture.
Same with stopping: when you click stop, you still have to wait a long time before it actually stops.
Agreed. Compared with token consumption, what I really can't stand are the two attention-draining tasks of "waiting that requires attention" and "searching". I'd rather the entire reply be handed to me after a few seconds, like in most chat apps.