I have a cluster with 3 master nodes and several data nodes. From time to time, we experience a brief period where certain nodes return 429s due to replica operations:

es_rejected_execution_exception Reason: "rejected execution of primary operation [coordinating_and_primary_bytes=0, replica_bytes=2210080256, all_bytes=2210080256, primary_operation_bytes=30430, max_coordinating_and_primary_bytes=2147483648]"

I've scaled the cluster before, but we continue to see this intermittently, since it happens on random nodes. Is there a way to have the readiness probe fail in cases like this, so that requests stop being sent to a node that is overloaded?

I'm not sure exactly how to find out what is causing the backlog of replica ops.
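For what it's worth, the numbers in that rejection are readable on their own: all_bytes (2210080256, about 2.06 GiB, all of it replica traffic) exceeded max_coordinating_and_primary_bytes (2147483648, exactly 2 GiB), which corresponds to the indexing_pressure.memory.limit setting, 10% of the JVM heap by default. One way to watch this before it turns into 429s is to poll the per-node indexing pressure stats. Below is a minimal sketch, assuming the Python requests library and a reachable endpoint at ES_URL (a placeholder); the 80% threshold is also an assumption to adjust for your cluster:

```python
# A minimal monitoring sketch, not an official tool. It polls each
# node's indexing-pressure stats (GET _nodes/stats, available since
# Elasticsearch 7.9) and flags nodes whose in-flight indexing bytes
# approach the limit reported in the rejection above.
import time

import requests

ES_URL = "http://localhost:9200"   # placeholder endpoint
LIMIT_BYTES = 2_147_483_648        # max_coordinating_and_primary_bytes from the error

def report_indexing_pressure():
    resp = requests.get(
        f"{ES_URL}/_nodes/stats",
        params={"filter_path": "nodes.*.name,nodes.*.indexing_pressure"},
        timeout=10,
    )
    resp.raise_for_status()
    for node in resp.json().get("nodes", {}).values():
        current = node["indexing_pressure"]["memory"]["current"]
        if current["all_in_bytes"] > 0.8 * LIMIT_BYTES:
            print(
                f"{node['name']}: replica_in_bytes={current['replica_in_bytes']} "
                f"all_in_bytes={current['all_in_bytes']} (approaching limit)"
            )

if __name__ == "__main__":
    while True:
        report_indexing_pressure()
        time.sleep(30)
```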
I don't think a readiness probe should be used to deal with performance issues; besides, a failing readiness probe does not mean the Pod cannot be accessed. If the cluster is struggling to handle the load, I would first try to understand why.
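Following up on the "understand why" suggestion: the same stats API keeps cumulative rejection counters per node, which makes it easy to see whether rejections concentrate on particular nodes and whether they hit coordinating, primary, or replica traffic. A rough sketch along the same lines, again assuming the requests library and a placeholder ES_URL:

```python
# A rough diagnostic sketch, not an official tool: print cumulative
# indexing-pressure rejection counters per node so spikes can be
# attributed to coordinating, primary, or replica traffic.
import requests

ES_URL = "http://localhost:9200"   # placeholder endpoint

resp = requests.get(
    f"{ES_URL}/_nodes/stats",
    params={"filter_path": "nodes.*.name,nodes.*.indexing_pressure.memory.total"},
    timeout=10,
)
resp.raise_for_status()

for node in resp.json().get("nodes", {}).values():
    total = node["indexing_pressure"]["memory"]["total"]
    print(
        f"{node['name']}: "
        f"coordinating_rejections={total.get('coordinating_rejections', 0)}, "
        f"primary_rejections={total.get('primary_rejections', 0)}, "
        f"replica_rejections={total.get('replica_rejections', 0)}"
    )
```

If the counters show rejections landing on random data nodes but always on the replica path, a few hot shards are a plausible cause; correlating these numbers with _cat/shards output for the busiest indices may be worth a try.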