Add support for Python Script node #722

Merged (23 commits) on Sep 23, 2020

lresende
Member

@lresende lresende commented Jul 8, 2020

Note for testing:
Requires elyra-ai/kfp-notebook#36
And the following env variables

KFP_NOTEBOOK_BRANCH=python-script
KFP_NOTEBOOK_ORG=lresende

Todos

  • output of python execution
  • workdir of python execution
  • validate clicking ok, and call other api
  • increment pipeline version

Fixes #187

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

@lresende lresende changed the title Pipeline python script [WIP] Add support for Python Script node Jul 8, 2020
@lresende lresende added the status:Work in Progress label Jul 8, 2020
@lresende lresende force-pushed the pipeline-python-script branch from f60bd79 to da63e40 Compare July 8, 2020 22:29
@lresende lresende force-pushed the pipeline-python-script branch from da63e40 to d41697c Compare September 11, 2020 18:59
@lresende lresende force-pushed the pipeline-python-script branch from b12f9c6 to 71c11f7 Compare September 14, 2020 19:06
@lresende
Member Author

So this is now working with the latest code and both in kfp and local mode.
I modified my test pipeline to use a Python script and it all seems to work, though it could still use some more testing.

image

@lresende lresende marked this pull request as ready for review September 14, 2020 19:09
@lresende lresende changed the title [WIP] Add support for Python Script node Add support for Python Script node Sep 14, 2020
@lresende lresende removed the status:Work in Progress label Sep 14, 2020
@lresende lresende force-pushed the pipeline-python-script branch from daae64c to ebdc2d4 Compare September 14, 2020 19:28
@ptitzler ptitzler self-requested a review September 14, 2020 19:47
@ptitzler
Member

"Open Notebook" -> "Open Python File"
image

@lresende
Member Author

@ptitzler Updated to Open File
image

@ptitzler
Member

ptitzler commented Sep 14, 2020

For notebooks we upload the completed notebooks to the COS bucket. For Python scripts we should probably do "the same" and capture STDOUT and STDERR. The script I used writes to STDOUT, but the output is not logged in the KFP log, nor is a .stdout (or .stderr) file uploaded to COS.

Relevant excerpt from execution log:

Executing Python Script : download_data.py ==> download_data.log
Processing outputs........

Not having access to STDOUT/STDERR is going to be a problem should troubleshooting or validation be performed. For example, in another run the script failed and there's no information available why:

Executing Python Script : download_data.py ==> download_data.log
Unexpected error: <class 'subprocess.CalledProcessError'>
Error details: Command '['python', 'download_data.py']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "bootstrapper.py", line 355, in <module>
    main()
  File "bootstrapper.py", line 349, in main
    file_op.execute()
  File "bootstrapper.py", line 257, in execute
    raise ex
  File "bootstrapper.py", line 244, in execute
    subprocess.check_call(['python', python_script])
  File "/usr/local/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['python', 'download_data.py']' returned non-zero exit status 1.
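
One way to keep a failing script's output available for troubleshooting is to send both streams to the node's log file and defer the failure until after the log could be uploaded. The following is a minimal sketch under that assumption, not the actual bootstrapper.py code; `run_script_with_log` and its parameters are hypothetical names chosen for illustration.

```python
import subprocess
import sys
from pathlib import Path


def run_script_with_log(script: Path, log_path: Path) -> int:
    """Run a Python script, writing its stdout and stderr to log_path.

    Hypothetical helper: it returns the exit code instead of raising,
    so a caller can still upload the log before reporting the failure.
    """
    with open(log_path, "w") as log_file:
        completed = subprocess.run(
            # sys.executable is used here instead of a bare 'python'
            # so the child runs under the same interpreter.
            [sys.executable, str(script)],
            stdout=log_file,
            stderr=subprocess.STDOUT,  # merge stderr into the same log file
            check=False,  # do not raise; the caller inspects the exit code
        )
    return completed.returncode
```

A caller could upload `log_path` first and only then raise if the returned exit code is non-zero, so the reason for a failure is preserved alongside the traceback.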

@ptitzler
Member

ptitzler commented Sep 14, 2020

... Updated to Open File

Confirmed "Open File" fix. Just realized it now always says this, which I guess is fine.

@ptitzler
Member

One more issue. The file browser seems to apply an ipynb filter, which now prevents selection of a Python script.

image

@ptitzler
Member

ptitzler commented Sep 15, 2020

Possibly a related issue here:

image

image

I don't think there is a reason at all to apply a filter (if one was added intentionally) as a notebook or Python script can require any type of file.

@kevin-bates
Member

@lresende, I don't see the increment of PIPELINE_CURRENT_VERSION in constants.ts. Is that what should be incremented?

@ptitzler
Member

Please give it a quick try when you have a chance

Will do today (Monday 9/22)

@kevin-bates
Member

Regarding the version increment, I think this particular increment should be conditional on whether the pipeline contains a python script node or not - as I touched on here.

I really don't think we should unconditionally trigger a migration dialog when there is literally nothing to change. By making this particular increment conditional, older elyra versions can continue working with shared pipelines until those shared pipelines contain a python node - which will trigger a "You're running an older version of Elyra, please upgrade" message. Likewise, users of the current version will continue operating just fine.

This does mean that the version check to determine whether changes are warranted probably needs to use a list of versions, and the check would be: is this pipeline version not in the list of "acceptable" versions? If it is not in the list, prompt the migration dialog and set the pipeline version to the "minimum acceptable" version. Only once a python node is added would the pipeline version then be incremented to the "current" version. Of course, when there are changes that DO warrant a migration, the "acceptable" versions list is reset to contain only the "current" version and remains a single entry until a "conditional" version is introduced again.
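
The "acceptable versions" check described above could be sketched roughly as follows. This is an illustration only: the real constant lives in the TypeScript front end (constants.ts), and `CURRENT_VERSION`, `ACCEPTABLE_VERSIONS`, `resolve_version`, and the action strings are hypothetical names and values, with 2 and 3 taken from the version numbers mentioned in this thread.

```python
from typing import List, Tuple

# Hypothetical values for illustration only.
CURRENT_VERSION = 3
# Versions that can be opened without a migration prompt. The smallest
# entry plays the role of the "minimum acceptable" version.
ACCEPTABLE_VERSIONS: List[int] = [2, 3]


def resolve_version(pipeline_version: int, has_python_node: bool) -> Tuple[int, str]:
    """Return (version_to_save, action) for a pipeline being opened.

    action is "upgrade_elyra" when the pipeline is newer than this
    installation understands, "migrate" when a migration prompt is
    needed, and "none" otherwise.
    """
    if pipeline_version > CURRENT_VERSION:
        return pipeline_version, "upgrade_elyra"

    if pipeline_version in ACCEPTABLE_VERSIONS:
        version, action = pipeline_version, "none"
    else:
        # Outside the list: prompt and move to the minimum acceptable version.
        version, action = min(ACCEPTABLE_VERSIONS), "migrate"

    # Only a pipeline that actually uses the new node type is bumped
    # to the current version.
    if has_python_node:
        version = CURRENT_VERSION
    return version, action
```

Under this sketch, a version 2 pipeline without a Python node stays at version 2 and opens without a prompt, which is the behavior the comment argues for.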

@lresende
Member Author

@lresende, I don't see the increment of PIPELINE_CURRENT_VERSION in constants.ts. Is that what should be incremented?

Forgot to push :)

@lresende
Member Author

For the version increment, we don't have the necessary infrastructure to enable what you are describing at the moment, but it would be something good to have for the future.

@kevin-bates
Member

I'm not seeing the version ever move from 2 to 3. I get prompted to migrate (unconditionally), save the pipeline and zero changes are made to the file (I copied the original and there are no differences after saving after migrating). As a result, I'm prompted to migrate every time I open a pipeline.

Does saving a pipeline only persist changes if there are actual differences (which, in this case, there won't be since migration doesn't do anything. 😄 )?

@lresende
Member Author

This last commit restricts changing a node's associated file to files of the same type (e.g., if you originally created a Python script node, you can only update the file to point to another Python script file).
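
As a rough illustration of this kind of restriction, the check below compares file extensions. It is a sketch that assumes the node type is determined by the file extension, which this thread does not confirm; `is_same_node_type` is a hypothetical name, and the actual check lives in the front-end code.

```python
from pathlib import PurePath


def is_same_node_type(current_file: str, new_file: str) -> bool:
    """Return True when both paths share the same extension.

    Assumption for illustration: a node's type follows from its file
    extension (for example .py for scripts, .ipynb for notebooks).
    """
    current_ext = PurePath(current_file).suffix.lower()
    new_ext = PurePath(new_file).suffix.lower()
    return current_ext == new_ext
```

With this check, swapping `load_data.py` for `download_data.py` would be allowed, while swapping it for a notebook would be rejected.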

Member

@kevin-bates kevin-bates left a comment

Thanks Luciano - this is a nice feature.

@ptitzler
Member

Confirmed local execution works now as expected. Opened #939, which is not specific to this PR.
Verifying KFP execution next.

@ptitzler
Member

Confirmed that KFP execution works but noticed

  • STDERR is not captured in the node's .log file on COS
  • STDERR output is displayed in the JL log out of sequence:
 unpacking Complete.
 Executing Python Script : load_data.py ==> load_data.log
Uploading Python Script execution log back to Object Storage
Uploading file load_data.log as load_data.log to bucket pipeline-artifacts
Processing outputs........
Uploading file data/noaa-weather-data-jfk-airport/jfk_weather.csv as data/noaa-weather-data-jfk-airport/jfk_weather.csv to bucket pipeline-artifacts
Execution and Upload Complete.
Hello STDERR world

The last message was produced while a node was executed.

@ptitzler
Member

Confirmed that an exported pipeline containing a Python node can be successfully uploaded to KFP using the KFP UI and runs there.

LGTM for the PR pending resolution of STDERR behavior.

Member

@ajbozarth ajbozarth left a comment

I did a code review of the front-end code (no local testing or back-end review) and have a handful of questions and code cleanup comments. Nothing is blocking, though, if you'd rather address them in a follow-up PR.

position += 20;
this.setState({ showValidationError: false });
} else {
// handle error
Member

I think you meant to actually handle the error here and not leave it as a comment

@lresende
Member Author

The KFP execution of Python scripts now captures both stdout and stderr into the same log file. Please update/build kfp-notebook in order to get these changes.

@ptitzler
Member

ptitzler commented Sep 22, 2020

I believe somehow a regression was introduced because I am no longer able to drag a Python script onto the canvas.

drag_python

$ git status
On branch pipeline-python-script
Your branch is up to date with 'origin/pipeline-python-script'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   packages/pipeline-editor/src/canvas.ts

no changes added to commit (use "git add" and/or "git commit -a")

$ git pull origin
Already up to date.

@ptitzler
Member

With the latest fix both local and kfp execution of a mixed pipeline works. It does appear as if we are not yet capturing STDOUT and STDERR output in the right sequence for KFP execution.

This code snippet in a Python node:

    sys.stderr.write('Hello STDERR 1')
    sys.stderr.flush()

    # Try to process the URL
    download_from_public_url(dataset_url)
    
    sys.stderr.write('Hello STDERR 2')
    sys.stderr.flush()

produces the following log file content:

Hello STDERR 1Hello STDERR 2Downloading data file https://dax-cdn.cdn.appdomain.cloud/dax-noaa-weather-data-jfk-airport/1.1.4/noaa-weather-data-jfk-airport.tar.gz ...
Saving downloaded file "noaa-weather-data-jfk-airport.tar.gz" as ...
Extracting downloaded file in directory "data" ...
Removing downloaded file ...

Local execution is fine:

Processing Pipeline : ww
Hello STDERR 1Downloading data file https://dax-cdn.cdn.appdomain.cloud/dax-noaa-weather-data-jfk-airport/1.1.4/noaa-weather-data-jfk-airport.tar.gz ...
Saving downloaded file "noaa-weather-data-jfk-airport.tar.gz" as ...
Extracting downloaded file in directory "data" ...
Removing downloaded file ...
Hello STDERR 2

@lresende
Member Author

I believe this might be a limitation on how the subprocess python API redirects stderr to stdout.

Member

@ptitzler ptitzler left a comment

LGTM

@kevin-bates
Member

I believe this might be a limitation on how the subprocess python API redirects stderr to stdout.

Actually, I think this may be because remote execution is using subprocess.check_call(), while the local execution is using subprocess.run() and there's this statement from the docs...

Code needing to capture stdout or stderr should use run() instead:
run(..., check=True)

I believe I mentioned this in an earlier review. It might be worth updating kfp_notebook's python processing to use run() - primarily so the two are also the same, but we may find the two also produce the same behavior.
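
A minimal sketch of the `run(..., check=True)` form with both streams merged might look like the following. It is not the kfp-notebook implementation, and `run_and_capture` is a hypothetical name. The `-u` interpreter flag is an addition of this sketch, not something proposed in the thread: when stdout is redirected to a pipe or file, the child Python process block-buffers it, which can make explicitly flushed stderr output appear ahead of stdout. Whether that buffering explains the ordering reported above is an assumption.

```python
import subprocess
import sys


def run_and_capture(script_path: str, unbuffered: bool = True) -> str:
    """Run a Python script and return its merged stdout/stderr text.

    Raises subprocess.CalledProcessError on a non-zero exit status
    because check=True is passed to subprocess.run().
    """
    cmd = [sys.executable]
    if unbuffered:
        # -u disables stdout/stderr buffering in the child interpreter.
        cmd.append("-u")
    cmd.append(script_path)

    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # both streams share a single pipe
        text=True,  # decode the captured bytes to str
        check=True,
    )
    return completed.stdout
```

Because both streams share one pipe and the child does not buffer, the captured text should follow the order in which the script wrote it. On failure, the captured text is also available on the raised exception's `output` attribute.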

@lresende lresende merged commit 746b347 into elyra-ai:master Sep 23, 2020
@lresende lresende deleted the pipeline-python-script branch September 23, 2020 01:02
@ptitzler
Member

Actually, ...

@kevin-bates @lresende do we need a separate issue to follow up on this?

@kevin-bates
Member

@kevin-bates @lresende do we need a separate issue to follow up on this?

Thanks Patrick. I just created a kfp-notebook issue for this (linked above).

Development

Successfully merging this pull request may close these issues.

Add support for python script as pipeline node type
4 participants