Add x86-64-v4 support #145
|
This first needs https://github.com/shinchiro/mpv-winbuild-cmake to support x86-64-v4. |
|
My mpv-winbuild-cmake branch implements generic multi-march variant support. If you want to add x86-64-v4 support to the original mpv-winbuild-cmake, it takes more than simply adding an x86-64-v4 target; otherwise you won't actually get any performance advantage from AVX512. |
Here it is: shinchiro/mpv-winbuild-cmake#790 |
|
@Andarwinux I'm not really familiar with your extra configs, so I just added basic x86-64-v4 code. Maybe you could add more to that config repo? |
The problem with x86-64-v4 is that -march=x86-64-v4 enables only the baseline AVX-512 subset (avx512f plus bw/cd/dq/vl) and a conservative generic scheduling model. For Skylake, AVX-512 will inevitably degrade performance (due to thermal throttling and Downfall mitigations), so you need to limit the preferred vector width to 256b and disable AVX gather instructions, which also eliminates any performance benefit of avx512f. For modern CPUs like icelake, tigerlake, and znver4/5, whose AVX-512 goes well beyond that baseline, -march=x86-64-v4 is insufficient. A reasonable x86-64-v4 build should use icelake or rocketlake as the march but the znver4/5 scheduling model. To achieve this, you need to set GCC_ARCH and M_TUNE separately and override x86_64_LEVEL; -mno-gather may also be necessary. Another issue is the vzeroupper instruction. For x86-64-v4 builds, clang generates heavily 256b/512b-vectorized code, resulting in numerous unnecessary vzeroupper instructions, which can mask the performance benefits of AVX-512. However, mpv's dependencies contain a lot of legacy asm code, and removing vzeroupper without sse2avx processing will result in SSE/AVX transition penalties. Unless all of these issues are addressed, x86-64-v4 is essentially a placebo, but addressing them would require significant changes to mpv-winbuild-cmake. shinchiro seems averse to extensive refactoring, which is why I can no longer contribute to shinchiro/mpv-winbuild-cmake. |
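To make the tuning choices in the comment above concrete, here is a minimal sketch that assembles the compiler flags for each case it describes. The profile names and the `cflags` helper are hypothetical and exist only for illustration. The flag spellings (`-march`, `-mtune`, `-mprefer-vector-width`, `-mno-gather`) are GCC/Clang options whose availability depends on the compiler version. How GCC_ARCH, M_TUNE and x86_64_LEVEL map onto these flags inside mpv-winbuild-cmake is not shown in this thread, so that mapping is an assumption here.

```python
# Hypothetical profiles illustrating the tuning described in the thread.
# Flag availability depends on the GCC/Clang version in use.
PROFILES = {
    # Plain baseline: what "just adding x86-64-v4" gives you.
    "v4-baseline": ["-march=x86-64-v4"],
    # Skylake-style cores: cap the preferred vector width and avoid gathers.
    "v4-skylake-safe": [
        "-march=x86-64-v4",
        "-mprefer-vector-width=256",
        "-mno-gather",
    ],
    # Newer cores: ISA from icelake-client, scheduling model from znver4.
    "v4-modern": ["-march=icelake-client", "-mtune=znver4"],
}


def cflags(profile: str, no_gather: bool = False) -> str:
    """Return the space-separated flag string for a named profile."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    flags = list(PROFILES[profile])  # copy so the table is never mutated
    # The thread says -mno-gather "may also be necessary"; keep it optional.
    if no_gather and "-mno-gather" not in flags:
        flags.append("-mno-gather")
    return " ".join(flags)


if __name__ == "__main__":
    for name in PROFILES:
        print(f"{name}: {cflags(name)}")
```

Keeping the architecture and the tuning model as separate entries mirrors the point made above: the instruction set a build may use and the scheduling model it is optimized for are independent choices.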
Signed-off-by: Shengyu Qu <wiagn233@outlook.com>
|
By the way, you should build LLVM and the toolchain first, before building mpv; otherwise you will waste time building LLVM three times on every rebuild. |
|
What's the difference between stock, v3 and v4 anyway? |
Performance differences. But you usually won't notice any significant difference unless more fine-tuning is done. |
Oh? Thanks :) |