Error when run test on 500 pods #408
Comments
I think the issue is curl's maximum argument length. I see the following in the started pod:
Hi @volnyansky, this is certainly a new one 😅 Did the error Out of curiosity, what are you testing that you need such a large test? On solution. That command is just an iterative concatenation: I guess we could just split it into several commands when there are lots of instances. The question is what kind of values for
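As a rough illustration of the "split it into several commands" idea above, the sketch below groups runner addresses into batches of a fixed size and prints one line of URLs per batch. It does not execute anything. The function name `print_url_batches`, the sample addresses, and the port and path (assumed here to be k6's default REST API address, `6565`, and the `/v1/status` endpoint) are illustrative assumptions, not the operator's actual starter code.

```shell
#!/bin/sh
# Sketch only: print runner URLs in batches of at most SIZE per line.
# Each output line could then be handed to its own curl invocation.
# Port 6565 and /v1/status are assumptions (k6's default REST API address).

# usage: print_url_batches SIZE IP...
print_url_batches() (
    size=$1
    shift
    line=
    count=0
    for ip in "$@"; do
        # Append the next URL, space-separated, to the current batch.
        line="${line:+$line }http://${ip}:6565/v1/status"
        count=$((count + 1))
        if [ "$count" -eq "$size" ]; then
            printf '%s\n' "$line"
            line=
            count=0
        fi
    done
    # Flush the last, possibly smaller, batch.
    if [ -n "$line" ]; then
        printf '%s\n' "$line"
    fi
)
```

For example, `print_url_batches 2 10.0.0.1 10.0.0.2 10.0.0.3` prints two lines: the first with two URLs, the second with one. What batch size is safe depends on the limits discussed later in this thread.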
@yorugac I'm running a stress test with a real browser. I need to test not only REST and WebSocket APIs, but also WebRTC, so I can't run thousands of robots in one pod.
Update: it fails on runners too. The strangest thing is that the command line is not too long:
@volnyansky, can you please post the full log from one of those runners?
I'm a bit confused by "real browser" part: do you mean the xk6-browser?
"I'm a bit confused by "real browser" part: do you mean the xk6-browser?"
Yes, it is xk6. Also, I figured out that I need to wait until the services left over from the previous test are deleted. Your code collects IPs from the services list, which can also lead to overflow.
@yorugac I have an idea for a fix: you can store the IPs in environment variable(s) as a list separated by ";". Then you can iterate over this list in the Docker start command:
IFS=';' read -ra ARR <<< "$IPS"
for i in "${ARR[@]}"; do
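The proposal above could look roughly like the following. This is a sketch under assumptions, not the operator's code: the variable name `IPS` comes from the comment, while the function name `print_runner_urls`, the port, and the path are hypothetical. It uses POSIX `sh` field splitting instead of `read -ra` and `<<<`, which are bash extensions, so it would also run in images that only ship `sh`. It prints the URLs rather than calling curl.

```shell
#!/bin/sh
# Sketch only: split a ';'-separated list (as proposed for $IPS) and
# print one URL per address. Port and path are assumptions.

# usage: print_runner_urls "ip1;ip2;..."
print_runner_urls() (
    IFS=';'   # split on ';' only; the subshell body keeps this change local
    set -f    # disable globbing while the list is expanded unquoted
    for ip in $1; do
        # Skip empty fields such as those produced by ';;'.
        if [ -n "$ip" ]; then
            printf 'http://%s:6565/v1/status\n' "$ip"
        fi
    done
)
```

With `IPS='10.0.0.1;10.0.0.2'`, calling `print_runner_urls "$IPS"` prints one URL per line for each of the two addresses. Note that the list still has to fit into the pod's environment, so this changes where the data lives rather than removing the size limit.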
@yorugac I've found a final workaround :) I'm running the test in batches and assigning each batch its own namespace. Your code queries the k8s service list, so it possibly returns all services in the namespace and not only those of the current test run.
@volnyansky, WDYM by batches? You're not running 500 instances anymore?
🤔 we'd still need to send a "start" command with something like cURL though. Could you please clarify a bit? 🙂
@yorugac I need to run more than 500 instances, 5000 actually. So I split one test into several, and I call them batches. But if all these tests run in one namespace, I still get the "argument list too long" error, and if I isolate each test in its own namespace, I don't get the error. I agree that you still need to send curl; I just proposed a more compact way to call it, so as not to reach the ARG_MAX limit that causes the "argument list too long" error.
🤔 It's strange that the namespace is a factor here... If the test is "split" then it's already producing another curl call, even if both tests are in the same namespace. IIUC, the error appears from Well, I think it's still about making batches, as described in this comment. Do you happen to have any estimate on what the value of
@yorugac In my env, ARG_MAX = 131072 bytes.
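For a rough sense of scale, the sketch below estimates how many URL arguments fit under a given limit. It assumes the limit covers arguments and environment data together, which is how ARG_MAX is defined for the exec functions. The function name `max_url_args` and the environment size and per-URL size passed in are hypothetical inputs, not measurements from this cluster.

```shell
#!/bin/sh
# Sketch only: rough upper bound on how many URL arguments fit in one exec.

# usage: max_url_args LIMIT_BYTES ENV_BYTES BYTES_PER_URL
max_url_args() {
    limit=$1
    env_bytes=$2
    per_url=$3
    # Space left after the environment, divided by the size of one argument.
    echo $(( ($limit - $env_bytes) / $per_url ))
}

# On a real runner the inputs could be gathered like this:
#   limit=$(getconf ARG_MAX)     # 131072 in the environment reported above
#   env_bytes=$(env | wc -c)     # approximate size of the environment
```

With the reported 131072 bytes, a hypothetical 4096-byte environment, and 48 bytes per URL, `max_url_args 131072 4096 48` prints `2645`. The larger the environment the pod carries, the fewer arguments fit.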
If I may, I'd kindly point to the discussion about the use of the REST API. It's not only about efficiency, but also about keeping the k6-operator in the loop about the state of the runners...
Brief summary
I'm trying to run the test on 500 pods and get the error:
exec /usr/bin/k6: argument list too long
I found a workaround: batching the test into packages of 300 pods with the same test ID.
k6-operator version or image
0.0.14
Helm chart version (if applicable)
k6-operator-3.6.0
TestRun / PrivateLoadZone YAML
apiVersion: k6.io/v1alpha1
kind: TestRun
metadata:
  name: ${USERNAME}-${SCRIPT}-${BATCH}
  namespace: k6
spec:
  # number of pods to run in parallel
  parallelism: ${BATCH_PODS}
  script:
    configMap:
      name: ${USERNAME}-test-script-${BATCH}
      file: test.tar
  arguments: -o experimental-prometheus-rw --tag testid=${TESTID}
  runner:
    image: 569129334545.dkr.ecr.us-east-1.amazonaws.com/k6-robot-dev:latest
    env:
      - name: K6_PROMETHEUS_RW_SERVER_URL
        value: "http://victoria-metrics-single-server.monitoring.svc.cluster.local:8428/api/v1/write"
      - name: K6_PROMETHEUS_RW_TREND_STATS
        value: "count,sum,min,max,avg,med,p(90),p(95),p(99)"
      - name: K6_BROWSER_ARGS
        value: "window-size=1920x1080,no-sandbox,disable-setuid-sandbox,allow-file-access,use-fake-device-for-media-stream,use-fake-ui-for-media-stream,use-file-for-fake-video-capture=/usr/local/assets/video.mjpeg,use-file-for-fake-audio-capture=/usr/local/assets/audio.wav"
      - name: K6_BROWSER_TIMEOUT
        value: "45s"
      - name: VU_ID_START
        value: "${VU_ID_START}"
    nodeSelector:
      engageli.com/role: k6-load-test
    resources:
      limits:
        cpu: "${CPU}"
        memory: ${MEMORY}Mi
      requests:
        cpu: 100m
        memory: ${MEMORY}Mi
Other environment details (if applicable)
No response
Steps to reproduce the problem
Run the test on 500 pods; the number of VUs doesn't matter.
Expected behaviour
The test runs on the given number of pods.
Actual behaviour
Test crashes