
How to deploy to a C++ project? Thank you! #220

Open
zoufangyu1987 opened this issue Jul 23, 2019 · 89 comments

Comments

@zoufangyu1987

There is already a trained PyTorch model, and the CenterNet code is all Python. My project needs C++. How do I deploy it?
Note: my Python skills are weak.

@kakaluote
kakaluote commented Jul 23, 2019

One option is to run the Python code as a server and have the C++ side send detection requests over a socket.
Still, native C++ inference would be preferable.

@zoufangyu1987
Author

My project contains other modules besides detection, so it has to be C++. Looking forward to guidance!

@xingyizhou
Owner

I honestly have no experience in c++ deployment ...

@zoufangyu1987
Author

@xingyizhou
Thank you anyway.

@Markusgami

Convert the model to Caffe and run it there.

@zoufangyu1987
Author

@Markusgami
That would work. The main task is porting the forward-pass code to C++. Is there any usable C++ code available? Thank you!

@kunyao2015

Following this thread. Any good solutions yet? Use libtorch?

@zoufangyu1987
Author

Waiting online, it's urgent!

@jnulzl
jnulzl commented Jul 29, 2019

@zoufangyu1987 @kunyao2015
Steps:
1. Convert the model to a caffemodel;
2. Implement the pre- and post-processing yourself in C++;
3. Done!
Tested and working. Good luck!

@zoufangyu1987
Author

@jnulzl
Can you share your C++ code? Thank you!

@jnulzl
jnulzl commented Jul 29, 2019

@zoufangyu1987
Sorry, not for now.

@zoufangyu1987
Author

@jnulzl
I want to cry!

@BokyLiu
BokyLiu commented Jul 30, 2019

Convert the model with trace, then deploy with libtorch. Tested and working.

@wangshankun

@zoufangyu1987 @kunyao2015
Steps:
1. Convert the model to a caffemodel;
2. Implement the pre- and post-processing yourself in C++;
3. Done!
Tested and working. Good luck!

What about DCNv2 when converting to Caffe? Caffe doesn't support it natively.

@zoufangyu1987
Author

I have converted the dlav0_34 PyTorch model (no DCN layers) to a caffemodel. Over the past two days I worked through the Python pre- and post-processing code of the CenterNet demo; it is quite involved. Anyone who has this working, please share the C++ code. Many thanks!

@Fighting-JJ

Convert the model with trace, then deploy with libtorch. Tested and working.

Did trace work for you? Which arch did you train?
@BokyLiu

@BokyLiu
BokyLiu commented Aug 8, 2019

Did trace work for you? Which arch did you train?
@BokyLiu

res18.

@Fighting-JJ

@zoufangyu1987 @kunyao2015
Steps:
1. Convert the model to a caffemodel;
2. Implement the pre- and post-processing yourself in C++;
3. Done!
Tested and working. Good luck!

What about DCNv2 when converting to Caffe? Caffe doesn't support it natively.

Have you solved the DCNv2 deployment problem?

@zoufangyu1987
Author

@Fighting-JJ
No. I haven't found code for converting DCNv2 layers from PyTorch to caffemodel, so for now I use dlav0_34 and drop DCN. I verified that the caffemodel's outputs match exactly, but only in Python so far; the C++ side isn't done yet. The workload is heavy and there are many pitfalls. Some people have succeeded but won't share their source, so there's nothing to do but work through it step by step.

@zoufangyu1987
Author

Once I successfully deploy on C++, I will definitely share the source code with everyone.

@zoufangyu1987
Author

I've stripped out PyTorch entirely and got the pipeline running in Python with only NumPy. Next step is C++. I found that NumPy has a C++ counterpart, "NumCpp". It's exhausting; I hope there are fewer pitfalls ahead!
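For reference, the heatmap "pseudo-NMS" step of CenterNet's post-processing (sigmoid, then keep only 3x3 local maxima) can be sketched in plain NumPy. Function names here are illustrative, not from the repo:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def maxpool3x3(hm):
    """3x3 max pool, stride 1, pad 1, on a 2D heatmap (pure NumPy)."""
    h, w = hm.shape
    padded = np.pad(hm, 1, mode="constant", constant_values=-np.inf)
    out = np.empty_like(hm)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + 3, j:j + 3].max()
    return out

def heatmap_nms(hm_logits):
    """Keep only local maxima: zero out every non-peak location."""
    hm = sigmoid(hm_logits)
    keep = (maxpool3x3(hm) == hm).astype(hm.dtype)
    return hm * keep
```

Each surviving nonzero entry is a candidate center; thresholding or top-k selection then yields the detections.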

@Fighting-JJ

I've stripped out PyTorch entirely and got the pipeline running in Python with only NumPy. Next step is C++...

You can use jit.trace to trace the model, then deploy it in C++ with libtorch, which is a C++ library.
After that, only the post-processing is left.
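A minimal sketch of that export step. The model below is a stand-in single conv layer, not an actual CenterNet architecture:

```python
import torch

# Stand-in model; a real export would trace the CenterNet network instead.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, kernel_size=3, padding=1))
model.eval()

example = torch.rand(1, 3, 64, 64)        # dummy input with the deploy-time shape
traced = torch.jit.trace(model, example)  # record the forward pass as TorchScript
traced.save("model.pt")                   # load from C++ with torch::jit::load("model.pt")
```

Note that trace records one concrete execution path, which is why data-dependent control flow (and custom ops like DCNv2, discussed later) can break it.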

@chenjx1005

I've stripped out PyTorch entirely and got the pipeline running in Python with only NumPy. Next step is C++...

There's no need for NumCpp. You can read the image into a cv::Mat with OpenCV in C++ and convert the Mat to a caffe::Blob.

@zoufangyu1987
Author
zoufangyu1987 commented Aug 14, 2019

@chenjx1005
I also think OpenCV alone would be enough, but I've already combined NumCpp with OpenCV. I'll try that first, and drop NumCpp if it doesn't work out.

@hexiangquan

https://github.com/hexiangquan/CenterNetCPP

@zoufangyu1987
Author

@hexiangquan
Deeply grateful!

@zoufangyu1987
Author

The NumCpp-based C++ version now loads the caffemodel successfully, and the results match. Thanks, everyone!

@BokyLiu
BokyLiu commented Aug 20, 2019

The NumCpp-based C++ version now loads the caffemodel successfully, and the results match. Thanks, everyone!

Looking forward to your write-up.

@zoufangyu1987
Author

@BokyLiu
Work has been busy these past few days. As soon as I have some free time I'll organize the whole pipeline and related files and share them.
@hexiangquan above has already shared C++ code for pre-processing, forward, and post-processing. Thanks to him!

@Fighting-JJ

You can use jit.trace to trace the model, then deploy it in C++ with libtorch, which is a C++ library. After that, only the post-processing is left.

Did the dcn_v2 part trace successfully for you? @Fighting-JJ

I didn't use DCN; I used DLA034 or resnet.

@121649982

@zoufangyu1987 If it is possible could you please share the C++ pre-processing and post-processing code? I have managed to make the pt file for the hourglass model and was able to load it in Windows using libtorch. The only things left are the pre-processing and post-processing steps. I was not able to access the baidu link you shared. Is there any other way you can share the code?

How can you convert your model to pt?

@121649982

@ALL Another approach worth trying: call the Python-side PyTorch model directly from C++ (tested and working).
See this blog post: https://blog.csdn.net/u011681952/article/details/92765549

That requires Python, though.

@VishnuPJ
VishnuPJ commented Dec 4, 2019

@zoufangyu1987 If it is possible could you please share the C++ pre-processing and post-processing code? ... How can you convert your model to pt?

This worked for me:
#414 (comment)

@zoufangyu1987
Author

@BokyLiu
@Fighting-JJ
Why does my model use far more GPU memory under libtorch than under PyTorch? PyTorch uses 750 MB while libtorch uses 1300 MB. Why is that?

@121649982

libtorch is poorly optimized and quite slow; I'd recommend TensorRT instead.

@Dantju
Dantju commented Jun 22, 2020

@zoufangyu1987 I cannot find the file for Alg_VIR, so I cannot build the CenterNet C++ project. Can you share this file?

@zoufangyu1987
Author

@Dantju
That file isn't needed. You can remove the included header.

@Dantju
Dantju commented Jun 22, 2020

@zoufangyu1987 But centernet.cpp defines some parameters of that kind.

@zoufangyu1987
Author

@Dantju
Just ignore it; it's leftover from my own project that I didn't delete at the time.

@Dantju
Dantju commented Jun 24, 2020

@zoufangyu1987 The sVIRInput struct is used in the later code, so could you share its header file?

sVIRInput virInput;
//sVIROutput virOutput;
//std::string img_name;
//#undef GPU
// Caffe::set_mode(Caffe::GPU);
// Caffe::SetDevice(0);
//#define GPU

Caffe::set_mode(Caffe::CPU);

const std::string modelPath;
init(modelPath);

#if 01
std::vector<std::string> img_name_path;
std::vector<std::string> Img_info;
cv::Mat imgin;
// process one image at a time
while (getline(filelist, line))
{
    // get the image path and image info
    virInput.vInImg.clear();
    img_name_path.clear();

@Dantju
Dantju commented Jun 24, 2020

@zoufangyu1987 Was this code compiled on Windows? I keep hitting "syntax error: missing ';' before '}'".

@zoufangyu1987
Author

@Dantju
Linux.

@Dantju
Dantju commented Jun 24, 2020

@zoufangyu1987 OK. Then could you provide the header for sVIRInput?

@zoufangyu1987
Author

@Dantju
That's just a struct defined in my own project for loading images. Delete it and create your own image handling instead; a small change is enough. Just read through the code; it's easy to follow.

@Dantju
Dantju commented Jun 24, 2020

Has anyone got this running on Windows? Could you share the code?

@xiaowk5516

Thanks for sharing the code. May I ask whether you optimized it afterwards? How did you do it?

@zoufangyu1987
Author

@xiaowk5516
I didn't optimize the post-processing part; my abilities there are limited.
One sort function is very slow in debug builds, taking 50-70 ms (hardware dependent), but about 10 ms in release mode.
I did optimize processing speed, mainly in the image-processing part, by using the Simd image parallel acceleration library to replace resize, copy, and similar operations. The speed-up is significant, especially in image pre-processing. See: https://www.jianshu.com/p/5b272f108ed2
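On the slow sort: a full sort is not actually needed to pick the top-K detections from the heatmap. A partial selection is linear-time, sorting only the K winners. A NumPy sketch, with an illustrative function name:

```python
import numpy as np

def topk_scores(heat, k):
    """Return the k largest scores (descending) and their flat indices.

    np.argpartition is O(n); a full sort (like the slow sort above)
    is O(n log n), and only the k selected entries get sorted here.
    """
    flat = heat.ravel()
    idx = np.argpartition(flat, -k)[-k:]    # the k largest, unordered
    idx = idx[np.argsort(flat[idx])[::-1]]  # order just those k, descending
    return flat[idx], idx
```

The flat indices can be converted back to (row, col) center coordinates with np.unravel_index.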

@xiaowk5516

I didn't optimize the post-processing part... the Simd image parallel acceleration library... See: https://www.jianshu.com/p/5b272f108ed2

Thanks for your answer. I'll look into it. Thanks again.

@leilaShen

Convert the model with trace, then deploy with libtorch. Tested and working.

I converted the model with trace, but the results produced when calling it from C++ differ from Python; trace still seems inaccurate for the forward pass. script would be the alternative, but it's too hard to use and I couldn't convert the model with it.

@ShihuaiXu

I have shared it here: link: https://pan.baidu.com/s/1m6zdWSKE8soSMwXRbU1aeg extraction code: yntr

Your code uses a C++ NumPy library; can it run on an embedded board?

@LaserLV52

Convert the model with trace, then deploy with libtorch. Tested and working.

Hi, do you still have the code for converting to libtorch? My network uses DLA. Doesn't trace error out when the network contains DCN? How did you solve that?

@zoufangyu1987
Author

@ShihuaiXu
Hi! I haven't tried it on a board; you could give it a try. I used the NumPy-style library back then mainly because libtorch wasn't mature enough and I didn't know it well. It seems libtorch now has equivalents for all those processing functions.

@zoufangyu1987
Author

@LaserLV52
Hi! Here's how I "solved" DCN back then: I simply didn't use the dla_dcn version and used the version without DCN layers, dlav0 I believe. Don't laugh at me, haha.

@LaserLV52

@LaserLV52 Hi! Here's how I "solved" DCN back then: I simply didn't use the dla_dcn version and used the version without DCN layers, dlav0 I believe. Don't laugh at me, haha.

We're all learning from each other, haha. By the way, after switching to the network without DCN layers, did you compare its performance against the DCN version? Is the gap large?

@zoufangyu1987
Author

@LaserLV52
It's slightly worse, but negligibly so. Not a big deal.

@LaserLV52

@LaserLV52 It's slightly worse, but negligibly so. Not a big deal.

OK, I'll give it a try.

@LaserLV52

@LaserLV52 It's slightly worse, but negligibly so. Not a big deal.

I'm back again. I've now trained a model with the dlav0 network, so no DCN. Exporting a TorchScript via trace succeeded, and forward works fine in C++. The output format is also right: the three parts hm, wh, reg. But the result type is torch::jit::IValue, and I don't know how to work with it. I tried output.toTuple()->elements()[0].toTensor() as others suggested, but it throws. How did you solve this?

@zoufangyu1987
Author
zoufangyu1987 commented Feb 8, 2022

@LaserLV52

    auto img_tensor = torch::CPU(torch::kFloat32).tensorFromBlob(floatImg1.data, {1, 608, 608, 3}); // cv::Mat -> tensor, shape 1,608,608,3
    img_tensor = img_tensor.permute({0, 3, 1, 2});  // reorder NHWC -> NCHW: 1,3,608,608
    auto img_var = torch::autograd::make_variable(img_tensor, false);  // no gradient needed
    inputs.clear();
    inputs.emplace_back(img_var.to(at::kCUDA));  // move the preprocessed image to the GPU
    cudaDeviceSynchronize();

    //struct timeval t1_, t2_;
    //double timeuse_;
    //gettimeofday(&t1_, NULL);

    //torch::Tensor output = module->forward(inputs).toTuple()->elements()[0].toTensor();
    c10::intrusive_ptr<c10::ivalue::Tuple> output = module->forward(inputs).toTuple();

    //cudaDeviceSynchronize();
    //gettimeofday(&t2_, NULL);
    //timeuse_ = t2_.tv_sec - t1_.tv_sec + (t2_.tv_usec - t1_.tv_usec)/1000000.0;
    //printf("forward:%f\n", timeuse_);

    torch::Tensor output_c = output->elements()[0].toTensor();
    torch::Tensor output_w = output->elements()[1].toTensor();
    torch::Tensor output_h = output->elements()[2].toTensor();

    // 3x3 max pool (stride 1, pad 1) + sigmoid: keep only local maxima on the heatmap
    torch::Tensor output_maxpool = torch::max_pool2d(output_c, {3,3}, {1,1}, {1,1});
    output_c = torch::sigmoid_(output_c);
    output_maxpool = torch::sigmoid_(output_maxpool);
    torch::Tensor keep = (output_maxpool == output_c).to(torch::kFloat32);
    torch::Tensor heat = output_c * keep;

This is how I did it; give it a try.

@LaserLV52

@LaserLV52 (quoting the libtorch snippet above) This is how I did it; give it a try.

I poked around some more yesterday and it works now: you just have to modify the network's outputs in the Python code. But there's a new problem. When I inspect the network's three outputs in C++, hm, wh, reg, they differ from Python. It turns out the Python pre-processing applies an affine transform and then mean/std normalization, while my C++ only does a resize, so the outputs don't match. You presumably hit this too? How did you implement the affine transform and mean/std normalization in your C++ pre-processing? NumPy's broadcasting makes this easy in Python, but I'm not comfortable in C++ and don't know what to do with the data.
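For the normalization half of that question: the demo scales pixels to [0, 1], subtracts a per-channel mean, divides by a per-channel std, and transposes to NCHW. A NumPy sketch of that step; the mean/std values below mirror the COCO defaults in the repo's dataset files, but double-check against the values your model was trained with:

```python
import numpy as np

# Per-channel statistics; verify against your training config.
MEAN = np.array([0.408, 0.447, 0.470], dtype=np.float32)
STD = np.array([0.289, 0.274, 0.278], dtype=np.float32)

def preprocess(img_u8):
    """uint8 HWC image -> normalized 1,C,H,W float32 tensor."""
    x = img_u8.astype(np.float32) / 255.0
    x = (x - MEAN) / STD            # broadcasts over H and W
    x = x.transpose(2, 0, 1)[None]  # HWC -> 1,C,H,W
    return x
```

The affine transform is a separate geometric step (get_affine_transform in the repo): it maps the image into the network's input resolution while preserving aspect ratio, so replacing it with a plain resize changes the geometry the model sees, which fits the output mismatch described above.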
