Trap KFP namespace errors and display cause appropriately #1469

Merged
lresende merged 4 commits into elyra-ai:master from namespace-error-handling on Mar 26, 2021

Conversation

kiersten-stokes
Member

Resolves #1467. Now that this has been trapped separately, the double scroll bars in the error details have disappeared and only the single scrollbar remains.

Screenshots/log output

When namespace is required, but no namespace is entered in the runtime config:
Screen Shot 2021-03-23 at 3 42 10 PM

Log Output
Traceback (most recent call last):
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/elyra/pipeline/processor_kfp.py", line 181, in process
    experiment = client.create_experiment(name=experiment_name,
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp/_client.py", line 345, in create_experiment
    experiment = self.get_experiment(experiment_name=name, namespace=namespace)
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp/_client.py", line 449, in get_experiment
    list_experiments_response = self.list_experiments(page_size=100, page_token=next_page_token, namespace=namespace)
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp/_client.py", line 416, in list_experiments
    response = self._experiment_api.list_experiment(
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/api/experiment_service_api.py", line 581, in list_experiment
    return self.list_experiment_with_http_info(**kwargs)  # noqa: E501
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/api/experiment_service_api.py", line 682, in list_experiment_with_http_info
    return self.api_client.call_api(
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 378, in call_api
    return self.__call_api(resource_path, method,
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 202, in __call_api
    raise e
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 195, in __call_api
    response_data = self.request(
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 403, in request
    return self.rest_client.GET(url,
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/rest.py", line 244, in GET
    return self.request("GET", url,
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/rest.py", line 238, in request
    raise ApiException(http_resp=r)
kfp_server_api.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'x-powered-by': 'Express', 'content-type': 'application/json', 'trailer': 'Grpc-Trailer-Content-Type', 'date': 'Tue, 23 Mar 2021 20:42:25 GMT', 'x-envoy-upstream-service-time': '9', 'server': 'istio-envoy', 'transfer-encoding': 'chunked'})
HTTP response body: {"error":"Invalid input error: Invalid resource references for experiment. Namespace is empty.","message":"Invalid input error: Invalid resource references for experiment. Namespace is empty.","code":3,"details":[{"@type":"type.googleapis.com/api.Error","error_message":"Invalid resource references for experiment. Namespace is empty.","error_details":"Invalid input error: Invalid resource references for experiment. Namespace is empty."}]}

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/tornado/web.py", line 1705, in _execute
    result = await result
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/elyra/pipeline/handlers.py", line 89, in post
    response = await PipelineProcessorManager.instance().process(pipeline)
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/elyra/pipeline/processor.py", line 78, in process
    res = await asyncio.get_event_loop().run_in_executor(None, processor.process, pipeline)
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/elyra/pipeline/processor_kfp.py", line 190, in process
    raise RuntimeError(f'Could not create experiment {experiment_name}: {ae.reason} ({ae.status}): ' +
RuntimeError: Could not create experiment tutorial: Bad Request (400): Invalid input error: Invalid resource references for experiment. Namespace is empty.


When namespace entered is incorrect:
Unfortunately, in this case the message in the response body repeats itself. I'm not sure whether we want to parse it further to remove the repetition or just leave it as is.
Screen Shot 2021-03-23 at 3 43 03 PM

Log Output
Traceback (most recent call last):
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/elyra/pipeline/processor_kfp.py", line 181, in process
    experiment = client.create_experiment(name=experiment_name,
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp/_client.py", line 345, in create_experiment
    experiment = self.get_experiment(experiment_name=name, namespace=namespace)
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp/_client.py", line 449, in get_experiment
    list_experiments_response = self.list_experiments(page_size=100, page_token=next_page_token, namespace=namespace)
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp/_client.py", line 416, in list_experiments
    response = self._experiment_api.list_experiment(
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/api/experiment_service_api.py", line 581, in list_experiment
    return self.list_experiment_with_http_info(**kwargs)  # noqa: E501
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/api/experiment_service_api.py", line 682, in list_experiment_with_http_info
    return self.api_client.call_api(
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 378, in call_api
    return self.__call_api(resource_path, method,
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 202, in __call_api
    raise e
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 195, in __call_api
    response_data = self.request(
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 403, in request
    return self.rest_client.GET(url,
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/rest.py", line 244, in GET
    return self.request("GET", url,
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/kfp_server_api/rest.py", line 238, in request
    raise ApiException(http_resp=r)
kfp_server_api.exceptions.ApiException: (409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'x-powered-by': 'Express', 'content-type': 'application/json', 'trailer': 'Grpc-Trailer-Content-Type', 'date': 'Tue, 23 Mar 2021 20:43:15 GMT', 'x-envoy-upstream-service-time': '21', 'server': 'istio-envoy', 'transfer-encoding': 'chunked'})
HTTP response body: {"error":"Failed to authorize with API resource references: BadRequestError: Unauthorized access for anonymous@kubeflow.org to namespace anon: Unauthorized access for anonymous@kubeflow.org to namespace anon","message":"Failed to authorize with API resource references: BadRequestError: Unauthorized access for anonymous@kubeflow.org to namespace anon: Unauthorized access for anonymous@kubeflow.org to namespace anon","code":10,"details":[{"@type":"type.googleapis.com/api.Error","error_message":"Unauthorized access for anonymous@kubeflow.org to namespace anon","error_details":"Failed to authorize with API resource references: BadRequestError: Unauthorized access for anonymous@kubeflow.org to namespace anon: Unauthorized access for anonymous@kubeflow.org to namespace anon"}]}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/tornado/web.py", line 1705, in _execute
    result = await result
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/elyra/pipeline/handlers.py", line 89, in post
    response = await PipelineProcessorManager.instance().process(pipeline)
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/elyra/pipeline/processor.py", line 78, in process
    res = await asyncio.get_event_loop().run_in_executor(None, processor.process, pipeline)
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/kierstenstokes/miniconda3/envs/dev1/lib/python3.9/site-packages/elyra/pipeline/processor_kfp.py", line 190, in process
    raise RuntimeError(f'Could not create experiment {experiment_name}: {ae.reason} ({ae.status}): ' +
RuntimeError: Could not create experiment tutorial: Conflict (409): Failed to authorize with API resource references: BadRequestError: Unauthorized access for anonymous@kubeflow.org to namespace anon: Unauthorized access for anonymous@kubeflow.org to namespace anon
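
One possible way to deal with the repeated message in the incorrect-namespace case: the response body in the log above also carries a shorter `error_message` under `details`, so the repetition might be avoided by preferring that field. This is only a minimal sketch, assuming the body keeps the JSON shape shown in the log. `concise_error` and `dedupe_segments` are hypothetical helper names, not part of Elyra or KFP.

```python
import json


def dedupe_segments(message: str, sep: str = ": ") -> str:
    """Drop repeated colon-separated segments, keeping first occurrences in order."""
    seen = set()
    kept = []
    for segment in message.split(sep):
        key = segment.strip()
        if key and key not in seen:
            seen.add(key)
            kept.append(key)
    return sep.join(kept)


def concise_error(body) -> str:
    """Return a short message from a KFP-style JSON error body (hypothetical helper)."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return str(body)  # not JSON: hand back the raw body unchanged
    if not isinstance(payload, dict):
        return str(body)
    # Assumption: 'details' is a list of objects that may hold 'error_message'.
    for detail in payload.get("details") or []:
        if isinstance(detail, dict) and detail.get("error_message"):
            return detail["error_message"]
    # Fall back to the top-level message with repeated segments removed.
    return dedupe_segments(payload.get("message") or payload.get("error") or str(body))
```

If the `details` entry is missing, the fallback still collapses the doubled "Unauthorized access ..." segment into a single occurrence.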

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

@elyra-bot

elyra-bot bot commented Mar 23, 2021

Thanks for making a pull request to Elyra!

To try out this branch on binder, follow this link: Binder

@ptitzler
Member

Perhaps this would be a good time to revisit the proposal to run https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.client.html#kfp.Client.list_experiments right here https://github.com/elyra-ai/elyra/blob/master/elyra/pipeline/processor_kfp.py#L118 to "verify" that the user-provided namespace value is valid. The process code currently performs quite a bit of processing (compile pipeline, upload pipeline artifacts to COS, and upload pipeline file) that's unnecessary if the namespace value is invalid. With this test in place, we'd fail fast and avoid the processing that won't yield any useful results.

@kiersten-stokes
Member Author

@ptitzler I like that idea a lot. And am I correct to assume that the namespace check would be the sole function of calling list_experiments in this case?

@ptitzler
Member

@ptitzler I like that idea a lot. And am I correct to assume that the namespace check would be the sole function of calling list_experiments in this case?

Yep. If you tune the call to return at most one row, the overhead of determining whether the namespace value is correct should (in the grand scheme of things) be close to zero.
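
For illustration, the fail-fast check discussed above could look roughly like the sketch below. It assumes only that the client exposes `list_experiments(namespace=..., page_size=...)`, as the traceback in the PR description suggests. `validate_namespace`, `NamespaceValidationError`, `process`, and `FakeClient` are hypothetical names used to keep the example runnable without a KFP server; this is not the code in the PR.

```python
class NamespaceValidationError(RuntimeError):
    """Raised when the namespace from the runtime configuration cannot be validated."""


def validate_namespace(client, namespace):
    """Issue a cheap list call so an invalid namespace fails before any expensive work."""
    try:
        # page_size=1 keeps the response to at most one row.
        client.list_experiments(namespace=namespace, page_size=1)
    except Exception as err:  # assumption: with KFP this would be an ApiException
        reason = getattr(err, "reason", None) or type(err).__name__
        status = getattr(err, "status", "unknown")
        raise NamespaceValidationError(
            f"Error validating user namespace '{namespace}': {reason} ({status})"
        ) from err


def process(client, namespace, steps):
    """Run the (expensive) steps only after the namespace check has passed."""
    validate_namespace(client, namespace)
    return [step() for step in steps]


class FakeClient:
    """Stand-in for a KFP client, used only to make the sketch runnable."""

    def __init__(self, valid):
        self.valid = set(valid)
        self.calls = []

    def list_experiments(self, namespace=None, page_size=10):
        self.calls.append((namespace, page_size))
        if namespace not in self.valid:
            raise PermissionError(f"Unauthorized access to namespace {namespace}")
        return []
```

With a check like this placed first, compiling the pipeline and uploading artifacts would never start when the namespace is rejected.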

self.log.error(f'Could not create experiment {experiment_name}: {ae.reason} ({ae.status})')
if ae.body:
    error_msg = json.loads(ae.body)
    raise RuntimeError(f'Could not create experiment {experiment_name}: {ae.reason} ({ae.status}): ' +
Member

@lresende lresende Mar 24, 2021

Is the idea here that "list experiments" will fail to retrieve experiments if the namespace is invalid? In this case, we should raise an error or provide a more specific error message to the user (e.g. Invalid namespace). I think it's a little misleading to say we could not create an experiment before we actually submit it.

Also, are we using experiments in any other place? Otherwise, I believe we should use 'jobs' which is more generic.

Member Author

Yeah I was pondering options on this yesterday. I kept this experiment wording from further down in the file (lines 193-194) where the error came directly from the create_experiment function.

I agree that it's misleading now that we've changed the trapping mechanism, but I hesitated to change it to something like "invalid namespace" because list_experiments could technically fail for other reasons since the exception is originating as a result of a GET request. I haven't encountered any other scenarios, so maybe I'm overthinking it.

Maybe something generic like "Error making request to pipeline server"? Or should I just assume namespace-only errors and go with a specific message like "Invalid namespace in runtime configuration"?

Member

Neither, because the messages should express what we were trying to do: "Error validating user namespace XXX ...". The fact that we are using an experiment related API call to perform the validation is not relevant to the user. We just had to pick one that utilizes the namespace information if it is provided.

Member Author

That is super helpful to my understanding of this, thank you! That's exactly what I was trying to say but I couldn't quite get there -- accurate error messages are an art

@kiersten-stokes
Member Author

New error messages:

Screen Shot 2021-03-24 at 12 25 01 PM

Screen Shot 2021-03-24 at 12 25 18 PM

@ptitzler
Member

New error messages:

  1. There's still something actionable missing. Could you add a short blurb outlining what a user is supposed to do?
  2. Slightly off-topic, but I thought we had already overridden the generic "Error making request" dialog title in favor of one that includes the name of the task that failed? I expected to see something like "Error submitting pipeline" or something like that.

@kiersten-stokes
Member Author

  1. There's still something actionable missing. Could you add a short blurb outlining what a user is supposed to do?

Will do. Unfortunately some error bodies come with their own punctuation and others don't, so I'll just do a quick parse to check/add punctuation if needed. Then direct users to their runtime configuration settings.

  2. Slightly off-topic, but I thought we had already overridden the generic "Error making request" dialog title in favor of one that includes the name of the task that failed? I expected to see something like "Error submitting pipeline" or something like that.

Hmm. I don't remember seeing anything different in the title recently. I like the idea though.
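
As a rough sketch of the punctuation check mentioned above: add a closing period only when the error text lacks terminal punctuation, then append the pointer to the runtime configuration. `finalize_message` and the default hint wording are hypothetical and chosen for illustration only; they are not taken from the merged change.

```python
def finalize_message(
    error_text: str,
    hint: str = "Please check the namespace in your runtime configuration.",
) -> str:
    """Ensure the error text ends with punctuation, then append an actionable hint."""
    text = error_text.strip()
    # Some server messages already end with '.', '!' or '?'; leave those untouched.
    if text and text[-1] not in ".!?":
        text += "."
    # strip() covers the case where the error text is empty.
    return f"{text} {hint}".strip()
```

Both message styles from the logs in the PR description would then end up with exactly one period before the hint.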

@kiersten-stokes
Member Author

Updated screenshots:
Screen Shot 2021-03-24 at 2 26 04 PM

Screen Shot 2021-03-24 at 2 26 43 PM

Member

@kevin-bates kevin-bates left a comment

This looks good. Thank you @kiersten-stokes.

@lresende lresende merged this pull request into elyra-ai:master Mar 26, 2021
lresende pushed a commit that referenced this pull request Mar 26, 2021
Trap KFP namespace errors and display cause appropriately avoiding
double scroll bars in the error dialog box.
@kiersten-stokes kiersten-stokes deleted the namespace-error-handling branch August 20, 2021 18:38
Successfully merging this pull request may close these issues.

Trap and format namespace-related errors during pipeline processing in a more user-friendly way
4 participants