Description
System Info
Operating System: CentOS Linux release 8.5.2111
Docker Version: Docker version 26.1.3, build b72abbb
Running Xinference with Docker?
- docker
- pip install
- installation from source
Version info
Supervisor Node Version: v1.4.0
Worker-01 Node Version: v1.4.0
Worker-02 Node Version: v1.2.0-cpu
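The version list above shows Worker-02 on a different release than the Supervisor and Worker-01. Whether that mismatch is related to the problem is not established in this report. As a minimal sketch for spotting such differences, the bash helpers below compare version tags after stripping the leading `v` and any suffix such as `-cpu`. `normalize_version` and `versions_match` are hypothetical names introduced for illustration; they are not part of Xinference or Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Xinference or Docker).

# Strip a leading "v" and any "-suffix" (e.g. "-cpu") from a version tag.
normalize_version() {
  local v="${1#v}"
  printf '%s\n' "${v%%-*}"
}

# Succeed only if every given tag normalizes to the same version as the first.
versions_match() {
  local first other
  first="$(normalize_version "$1")"
  shift
  for other in "$@"; do
    [ "$(normalize_version "${other}")" = "${first}" ] || return 1
  done
  return 0
}

# Example with the versions listed in this report:
#   versions_match v1.4.0 v1.4.0 v1.2.0-cpu || echo "node versions differ"
```

Note that the Supervisor and Worker-01 commands use the `latest` image tag, so the version to compare is the one each node actually reports, not the tag.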
The command used to start Xinference
Supervisor node start command:
docker run -d \
  -v xinference_supervisor_.xinference_data:/root/.xinference \
  -v xinference_supervisor_.cache_data:/root/.cache \
  -v /root/xinference_oauth2.json:/root/xinference_oauth2.json \
  -e XINFERENCE_MODEL_SRC=modelscope \
  --network=host \
  --restart=always \
  --name xinference-supervisor \
  --gpus all \
  xprobe/xinference:latest \
  xinference-supervisor -H "${supervisor_host}" --log-level debug --auth-config /root/xinference_oauth2.json
Worker-01 node start command:
docker run -d \
  -v xinference_worker_01_.xinference_data:/root/.xinference \
  -v xinference_worker_01_.cache_data:/root/.cache \
  -e XINFERENCE_MODEL_SRC=modelscope \
  --network host \
  --restart=on-failure \
  --name xinference-worker-01 \
  --gpus all \
  xprobe/xinference:latest \
  xinference-worker -e "http://${supervisor_host}:9997" -H "${supervisor_host}" --log-level debug
Worker-02 node start command:
docker run -d \
  -v xinference_worker_02_.xinference_data:/root/.xinference \
  -v xinference_worker_02_.cache_data:/root/.cache \
  -e XINFERENCE_MODEL_SRC=modelscope \
  --network host \
  --restart=on-failure \
  --name xinference-worker-02 \
  xprobe/xinference:v1.2.0-cpu \
  xinference-worker -e "http://${supervisor_host}:9997" -H "${worker_host}" --log-level debug
Reproduction
Question:
A Worker node that has registered with the cluster successfully is intermittently reported as not being in the cluster when a model is started with that worker's IP specified. In addition, the publish button is clickable but is rendered in a gray color that makes it look disabled. Below are a screenshot of the problem, a screenshot of the cluster information, and the stack output of the Supervisor node. The target Worker node produces no corresponding output.
After the Worker node is restarted, its stack output shows the content in the figure below (the one masked address on the left is the host address of the Supervisor node; the two masked addresses on the right are the host address of the Worker node). It reports that connecting to the Supervisor node failed, even though the Xinference management page shows the connection as successful. It then reports that the Worker node started successfully.
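The restart log described above reports a failed connection to the Supervisor before the Worker reports a successful start. One way to rule out a plain network problem is to check, from the worker host, that the Supervisor's port (9997 in the worker commands above) accepts TCP connections before the Worker is started. The following is a minimal sketch, assuming bash and the coreutils `timeout` command are available. `wait_for_port` is a hypothetical helper; it tests TCP reachability only and says nothing about whether Xinference registration succeeds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a TCP port accepts connections.
# Usage: wait_for_port HOST PORT [ATTEMPTS] [DELAY_SECONDS]
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-10}" delay="${4:-3}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # bash's /dev/tcp pseudo-device opens a TCP connection; the child shell
    # closes it on exit. `timeout` bounds each attempt to 5 seconds.
    if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      echo "reachable: ${host}:${port} (attempt ${i})"
      return 0
    fi
    echo "not reachable yet: ${host}:${port} (attempt ${i}/${attempts})" >&2
    if ((i < attempts)); then
      sleep "${delay}"
    fi
  done
  return 1
}

# Example (assumes supervisor_host is set as in the commands above):
#   wait_for_port "${supervisor_host}" 9997 && echo "start the worker now"
```

If the port is reachable every time while the Worker is still reported as outside the cluster, the cause is more likely in registration or in the front-end display than in basic connectivity.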
Expected behavior
Suggestion: Adjust the display logic of the relevant front-end pages.



