Issues within initalizer error handling if script is incorrect #435

Phil1602 · 2024-07-30T09:58:37Z

Brief summary

We realized, that our TestRuns get stuck without any information/logs printed out if the script itself is incorrect.

Cause

We already had a deeper look and realized that this is likely related to the Log message parsing of the k6 inspect executed within the initalizer here:

k6-operator/pkg/resources/jobs/initializer.go

Line 79 in f75facb

    
           "mkdir -p $(dirname %s) && k6 archive %s -O %s %s 2> /tmp/k6logs && k6 inspect --execution-requirements %s 2> /tmp/k6logs ; ! cat /tmp/k6logs | grep 'level=error'",

When we execute k6 inspect manually inside the container, we get the following error:

/ # k6 inspect --execution-requirements  /test/<REDACTED>.js
ERRO[0009] could not initialize '/test/<REDACTED>.js': could not load JS test 'file:///test/<REDACTED>.js': unknown executor type 'burst'

The log parsing mentioned above, basically does a | grep 'level=error, which does not work for the error message we are facing since the log format seems to be different.

Might be related to: grafana/k6-docs#877

k6-operator version or image

ghcr.io/grafana/k6-operator:controller-v0.0.14

Helm chart version (if applicable)

No response

TestRun / PrivateLoadZone YAML

Since we built a custom k6 image to include the script to be used as localfile, it would need some additional effort to make this available.

However, IMO this is not really related to a specific TestRun.

Other environment details (if applicable)

k6 version: k6 v0.51.0 (go1.22.4, linux/amd64)

Steps to reproduce the problem

Deploy any TestRun resource with a broken JS script

Expected behaviour

Initalizer Pods fails or at least logs the issues which are encountered during k6 inspect
TestRun resource is in Failed state

Actual behaviour

Initalizer Pod runs but does not print out anything
The k6-operator-controller-manager logs errors (see below) about unparsable JSON without any information
TestRun resource stuck in initilization phase

Logs of k6-operator

2024-07-30T09:11:00Z    ERROR   controllers.TestRun     unable to marshal: ``   {"namespace": "loadtesting", "name": "<REDACTED>", "reconcileID": "<REDACTED>", "error": "unexpected end of JSON input"}
github.com/grafana/k6-operator/controllers.inspectTestRun
        /workspace/controllers/common.go:105
github.com/grafana/k6-operator/controllers.RunValidations
        /workspace/controllers/k6_initialize.go:55
github.com/grafana/k6-operator/controllers.(*TestRunReconciler).reconcile
        /workspace/controllers/testrun_controller.go:137
github.com/grafana/k6-operator/controllers.(*TestRunReconciler).Reconcile
        /workspace/controllers/testrun_controller.go:80
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227
2024-07-30T09:11:00Z    ERROR   Reconciler error        {"controller": "testrun", "controllerGroup": "k6.io", "controllerKind": "TestRun", "TestRun": {"name":"<REDACTED>","namespace":"loadtesting"}, "namespace": "loadtesting", "name": "<REDACTED>", "reconcileID": "<REDACTED>", "error": "unexpected end of JSON input"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227

The text was updated successfully, but these errors were encountered:

frittentheke · 2024-08-16T08:39:01Z

This loosely relates to #401 which is about treating initializer errors as error state of the whole TestRun CR.

yorugac · 2024-08-20T12:36:22Z

Hi @Phil1602, as mentioned by @frittentheke, this indeed has been raised and fixed: could you please update k6-operator to the latest version and try again? Thanks!

Also, in general, it is recommended to debug k6 scripts locally before deploying the TestRun 🙂

Phil1602 · 2024-08-20T20:54:15Z

Hi @yorugac,

We are using k6 verification within our pipeline as a step before creating the TestRun in the meantime. Anyways, IMO it would have been still an issue, if a wrong TestRun is not reported as such.

I will try out the latest release v0.0.16 and verify your assumptions! Thanks for the hints!

…alization success 1) Introduce an EmptyDir volume for temporary data This removes the need for any mkdir and avoids writing to the container filesystem, allowing for e.g. `readOnlyRootFilesystem` security contexts ([1]). 2) Remove all output redirection and grepping for error indicating strings in favor of using the exit codes from the commands to indicate success or failure. This approach is much cleaner in regards to any changes to the output of K6 and the vast space of error there is. It also reestablishes the clear contract of the `inspect` command to judge the provided config. `k6 inspect --execution-requirements` ([2,3]) while also printing warnings or the JSON should clearly indicate via `err` if there are issues with the provided tests, which are then converted to a non-zero RC ([4]). [1] https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ [2] https://github.com/grafana/k6/blob/c212bd277aeedbf8395b917df11bd9be4da490a4/cmd/inspect.go#L34 [3] https://github.com/grafana/k6/blob/c212bd277aeedbf8395b917df11bd9be4da490a4/cmd/inspect.go#L64 [4] https://github.com/grafana/k6/blob/master/cmd/root.go#L90-L118 Fixes: grafana#435

frittentheke · 2024-08-22T12:30:53Z

I will try out the latest release v0.0.16 and verify your assumptions! Thanks for the hints!

@yorugac while #401 does indeed treat an error of the Initializer Pod as error of the TestRun CR (

k6-operator/controllers/common.go

Line 59 in d9490de

returnErr = errors.New("initalizer job has failed")

).

The cause of the issue @Phil1602 reported here is with the exit code (leading to the Pod actually failing) though. If you look at

k6-operator/pkg/resources/jobs/initializer.go

Line 79 in d9490de

    
           "mkdir -p $(dirname %s) && k6 archive %s -O %s %s 2> /tmp/k6logs && k6 inspect --execution-requirements %s 2> /tmp/k6logs ; ! cat /tmp/k6logs | grep 'level=error'",

you'll notice that here are multiple commands chained and piped together. While && causes the first command with non-zero exit code to fail (and that code be returned) the second part applying the grep will then actually mask the k6 inspect (the most important bit of this command) - https://github.com/grafana/k6-operator/blob/d9490ded7c3e0cf615e2e9d41e82a842fdae7ac8/pkg/resources/jobs/initializer.go#L79C124-L79C167

I went through the initializer logic some more and just pushed PR #450.
I know this changes a little more than just fixing this issue here. But I strongly believe reducing the interface width (exit code + termination message) allows the Initalizer to really strive and be much more flexible than it is how.

I as a user can then run any image and any (list of) command and the only thing I have to ensure is that a non-zero exit code is used if there is an issue with the test.

By running two commands instead of one (the second being the cat | grep), any failures (non-zero exit code) of the first part (containing `k6 inspect`) will be lost and masked away. By chaining them all with `&&` the first non-zero RC will fail the whole command and return. Fixes: grafana#435

frittentheke · 2024-08-23T08:24:24Z

I pushed a bugfix PR in #453, just fixing the issue reported by @Phil1602

^^ @yorugac

By running two commands instead of one (the second being the cat | grep), any failures (non-zero exit code) of the first part (containing `k6 inspect`) will be lost and masked away. By chaining them all with `&&` the first non-zero RC will fail the whole command and return. Fixes: grafana#435

Phil1602 added the bug Something isn't working label Jul 30, 2024

Phil1602 changed the title ~~Issues within rrror handling if script is incorrect~~ Issues within initalizer error handling if script is incorrect Jul 30, 2024

frittentheke linked a pull request Aug 22, 2024 that will close this issue

Rework Initializer container and handling its verdict #450

Draft

frittentheke linked a pull request Aug 23, 2024 that will close this issue

Combine all initializer commands with && to catch any failing commands #453

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issues within initalizer error handling if script is incorrect #435

Issues within initalizer error handling if script is incorrect #435

Phil1602 commented Jul 30, 2024 •

edited

Loading

frittentheke commented Aug 16, 2024

yorugac commented Aug 20, 2024

Phil1602 commented Aug 20, 2024

frittentheke commented Aug 22, 2024

frittentheke commented Aug 23, 2024 •

edited

Loading

Issues within initalizer error handling if script is incorrect #435

Issues within initalizer error handling if script is incorrect #435

Comments

Phil1602 commented Jul 30, 2024 • edited Loading

Brief summary

Cause

k6-operator version or image

Helm chart version (if applicable)

TestRun / PrivateLoadZone YAML

Other environment details (if applicable)

Steps to reproduce the problem

Expected behaviour

Actual behaviour

frittentheke commented Aug 16, 2024

yorugac commented Aug 20, 2024

Phil1602 commented Aug 20, 2024

frittentheke commented Aug 22, 2024

frittentheke commented Aug 23, 2024 • edited Loading

Phil1602 commented Jul 30, 2024 •

edited

Loading

frittentheke commented Aug 23, 2024 •

edited

Loading