feat: set THP_DISABLE=true in shim, and restore it before starting runc#195
feat: set THP_DISABLE=true in shim, and restore it before starting runc#195fuweid merged 1 commit intocontainerd:mainfrom
Conversation
|
I think this problem is related to specific application scenarios.
|
|
Because some processes in the environment do not show the use of huge pages, but need to use huge pages to improve performance. |
30351c5 to
0e4dae5
Compare
crates/runc/src/lib.rs
Outdated
| /// and some other utilities. | ||
| #[cfg(feature = "async")] | ||
| impl Runc { | ||
| #[cfg(not(target_os = "linux"))] |
There was a problem hiding this comment.
We don't have runc on non Linux environments. But also can apply same suggestion as above to avoid func duplication.
5ac6db6 to
80ee51f
Compare
|
Thanks for your suggestions. I have changed them. |
654bf0f to
7c6d2dd
Compare
crates/runc/src/lib.rs
Outdated
| @@ -368,6 +368,22 @@ pub trait Spawner: Debug { | |||
| impl Runc { | |||
| async fn launch(&self, cmd: Command, combined_output: bool) -> Result<Response> { | |||
There was a problem hiding this comment.
async fn launch(&self, cmd: Command, combined_output: bool) -> Result<Response> {
There are some problems here, the cmd variable needs to be mutable, but if on a non-Linux environment, cmd cannot be mutable. If you don't use a separate function name, the code looks ugly because you need to create a new mut cmd and use a #[cfg] block. Is there a better way?
| let mut vars: Vec<(&str, &str)> = Vec::new(); | ||
| #[cfg(target_os = "linux")] | ||
| let mut thp_disabled = String::new(); | ||
| #[cfg(not(target_os = "linux"))] |
There was a problem hiding this comment.
let disabled = if cfg!(target_os = "linux") {
// Query whether THP is disabled.
if let Ok(x) = prctl::get_thp_disable() {
let _ = prctl::set_thp_disable(true);
true
} else {
false
}
} else {
false
};
let vars = vec![("THP_DISABLED", &disabled.to_string())]?
There was a problem hiding this comment.
since it sounds like #[cfg(target_os = "linux")] is needed, it still might be clearer to do something like:
let thp_disabled = false;
#[cfg(target_os = "linux")]
let thp_disabled = match prctl::get_thp_disable() {
Ok(x) => {
let _ = prctl::set_thp_disable(true);
true
}
Err(_) => false,
};
let vars: Vec<(&str, &str)> = vec![("THP_DISABLED", &disabled.to_string())];There was a problem hiding this comment.
Here we need the return value of the prctl::get_thp_disable function and assign this return value to the variable thp_disable. That is, the x in Ok(x) is needed, besides, x may be true or false. We should not just judge whether it is Ok() or Err(). This value means the state before setting the set_thp_disable, and will be used to set_thp_disable before starting runc later.
There was a problem hiding this comment.
Here we need the return value of the
prctl::get_thp_disablefunction and assign this return value to the variable thp_disable. That is, thexinOk(x)is needed, besides,xmay be true or false. We should not just judge whether it isOk()orErr().
get_thp_disable returns a Result<bool, i32> so we could still return x instead of converting to string here. using the bool makes it clearer in the code than x.tostring and string::new() which is semantically unclear what string:new() is in this case.
There was a problem hiding this comment.
There are actually 3 states here, true, false and error. When the get_thp_disable function returns error, it means that the thp parameter cannot be obtained, if the set_thp_disable is executed at this time, it will cause runc to be unable to recover the value of thp_disable, therefore, thp_disabled parameter needs 3 states, true, false, error, due to the variable life cycle, so string is used here to return.
crates/runc/src/lib.rs
Outdated
| async fn launch(&self, cmd: Command, combined_output: bool) -> Result<Response> { | ||
| debug!("Execute command {:?}", cmd); | ||
| #[cfg(target_os = "linux")] | ||
| let mut cmd = cmd; |
There was a problem hiding this comment.
Since runc is Linux only, we can rewrite this to something like:
async fn launch(&self, mut cmd: Command, combined_output: bool) -> Result<Response> {
debug!("Execute command {:?}", cmd);
if let Ok(thp) = std::env::var("THP_DISABLED") {
if let Ok(thp_disabled) = thp.parse::<bool>() {
unsafe {
cmd.pre_exec(move || {
#[cfg(target_os = "linux")]
if let Err(e) = prctl::set_thp_disable(thp_disabled) {
log::debug!("set_thp_disable err: {}", e);
};
Ok(())
});
}
}
}|
Thanks for your suggestions. |
3e833d4 to
e42fb05
Compare
| let _ = prctl::set_thp_disable(true); | ||
| x.to_string() |
There was a problem hiding this comment.
is it possible that you get Ok(false) and the set it to true and this will return false? The does weren't clear (https://docs.rs/prctl/latest/prctl/fn.get_thp_disable.html)
There was a problem hiding this comment.
Our goal is to set thp disabled = true on the shim side and then restore thp disabled before starting runc.
So we only need to focus on the return value of the function get_thp_disabled, which is Result<bool, i32>.
The return value of the function set_thp_disabled is Result<(), i32>, we don't care if the setting is successful, because even if the setting failed, we should not exit the shim process, therefore, there is no need to pay attention to the set_thp_disabled function's return value.
There was a problem hiding this comment.
The return value of the function
set_thp_disabledisResult<(), i32>, we don't care if the setting is successful, because even if the setting failed, we should not exit the shim process, therefore, there is no need to pay attention to theset_thp_disabledfunction's return value.
could you add a comment in the code that indicates this? I worry about someone doing maitaince long term and wondering why return value and failure case is ignored.
There was a problem hiding this comment.
Thanks for your suggestion, done.
78f505e to
50b1963
Compare
@zzyyzte thanks for your patience. I guess I might be missing something but I don't see how the current configuration configurable? it seems to always try to |
|
Yes, we need to explicitly set |
|
Oh, read @mxpv could you clarify, otherwise looks good. |
|
@zzzzzzzzzy9 could you pls rebase your PR to pick up latest CI changes? |
|
Done. @mxpv |
|
@mxpv May need to merge again? |
|
@zzzzzzzzzy9 would you please remove that merge commit by rebase? It's conflict right now. If you don't mind, I can help to handle this. |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #195 +/- ##
==========================================
- Coverage 37.98% 37.89% -0.10%
==========================================
Files 55 55
Lines 5060 5083 +23
==========================================
+ Hits 1922 1926 +4
- Misses 3138 3157 +19
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
If /sys/kernel/mm/transparent_hugepage/enabled=always, the shim process
will use huge pages, which will consume a lot of memory.
Just like this:
ps -efo pid,rss,comm | grep shim
PID RSS COMMAND
2614 7464 containerd-shim
I don't think shim needs to use huge pages, and if we turn off the huge
pages option, we can save a lot of memory resources.
After we set THP_DISABLE=true:
ps -efo pid,comm,rss
PID COMMAND RSS
1629841 containerd-shim 5648
containerd
|
|--shim1 --start
|
|--shim2 (this shim will on host)
|
|--runc create (when containerd send create request by ttrpc)
|
|--runc init (this is the pid 1 in container)
we should set thp_disabled=1 in shim1 --start, because if we set this
in shim 2, the huge page has been setted while func main() running,
we set thp_disabled cannot change the setted huge pages.
So We need to set thp_disabled=1 in shim1 so that shim2 inherits the
settings of the parent process shim1, and shim2 has closed the
hugepage when it starts.
For runc processes, we need to set thp_disabled='before' in shim2 after
fork() and before execve(). So we use cmd.pre_exec to do this.
If /sys/kernel/mm/transparent_hugepage/enabled=always, the shim process will use huge pages, which will consume a lot of memory.
Just like this:
I don't think shim needs to use huge pages, and if we turn off the huge pages option, we can save a lot of memory resources.
After we set THP_DISABLE=true:
ps -efo pid,rss,comm PID RSS COMMAND 2470 5444 containerd-shim cat /proc/2470/smaps | grep -i hugepages AnonHugePages: 0 kB ...containerd | |--shim1 --start | |--shim2 (this shim will on host) | |--runc create (when containerd send create request by ttrpc) | |--runc init (this is the pid 1 in container)