Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve dependency management and archive generation #702

Merged
merged 9 commits into from
Jul 7, 2020

Conversation

kevin-bates
Copy link
Member

@kevin-bates kevin-bates commented Jun 30, 2020

These changes improve how dependencies are handled, generally eliminating the need to include subdirectories when adding dependencies from subdirectories.

  • Allow for '*', '?', and '[a-z]' / '[!a-z]' (sequence) wildcards.
  • Allow for embedded wildcards, including directory names (e.g., inputs/ACS* - to include all directories beginning with ACS in subdirectory inputs)
  • Enforce that all dependencies must be matched (i.e., present in resulting archive)
  • Updated tests to include additional wildcard testing
  • Added timing metrics around pipeline compilation, archive creation, and upload - logged as DEBUG statements
  • Minor refactoring to help with readability
  • Once nodes are updated to not include subdirectories but only include what they need, we see ~70% improvement in the submission time of the COVID pipeline - 80 secs to 25 secs. With no changes to the pipeline, the submission goes from 80 secs to 50 secs.

Behavioral Changes

  1. With these changes, it is no longer the case that selecting 'include subdirectories' with NO files listed as dependencies results in all files being included! Instead, no files will be included in that case. One must add a dependency of '*' to have all files included. If include-subdirectories is also selected, then all files in the folder hierarchy will be added. This treats empty dependency lists the same (and expected) way.
  2. As noted above, unmatched dependencies will now fail a pipeline's submission for run and export.

Fixes #204
Fixes #572

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

Unverified

This commit is not signed, but one or more authors requires that any commit attributed to them is signed.

Unverified

This commit is not signed, but one or more authors requires that any commit attributed to them is signed.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
if not archive:
raise RuntimeError('Internal error creating archive: {}'.format(archive_name))
if require_complete and not include_all:
# convert filenames and matched_filenames to sets and ensure they're the same.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not creating sets for both from the start instead of initiating them as arrays?

Copy link
Member Author

@kevin-bates kevin-bates Jul 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question @karlaspuldaro.
Sets are immutable, so because matched_filenames gets built up during the processing, it can't be a set until we're done. filenames, on the other hand, comes into the method as a list, but I don't see it getting altered. If python sets behave like lists in terms of iterables, then, yeah, filenames should be converted to a set first thing.
The other thing converting a list to a set does is remove duplicates - which I can't think of any reason why we wouldn't also want that. Good idea. I'll look at converting filenames to a set when the method is invoked. Thank you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, I've pushed the update. It wound up simplifying indentation as well. Thank you for the suggestion!
I'm going to mark this conversation as resolved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out I was mistaken - Python sets are mutable - only frozenset is immutable. Thank you to @karlaspuldaro for digging deeper into things! I've just pushed a change that eliminates the need for conversion - thanks again Karla!

@lresende lresende merged commit 69ccd62 into elyra-ai:master Jul 7, 2020
Copy link
Member

@akchinSTC akchinSTC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kevin-bates kevin-bates deleted the improve-archive branch July 7, 2020 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants