
Added untar progress similar to existing unzip #17519

Merged: 5 commits, Jan 17, 2025

Conversation

@perseoGI (Contributor)

Changelog: Feature: Add progress reporting while unpacking tarball files
Docs: omit

Closes #14589

  • Refer to the issue that supports this Pull Request.
  • If the issue has missing info, explain the purpose/use case/pain/need that covers this Pull Request.
  • I've read the Contributing guide.
  • I've followed the PEP8 style guides for Python code.
  • I've opened another PR in the Conan docs repo to the develop branch, documenting this one.

@memsharded (Member) left a comment:

There are different pieces in this PR.
For the untargz, the same approach as used in the FileUploader seems to make sense, but reusing the same code, prepared for a later refactor.

For the zip uncompress, if it can use the same FileProgress class, great; if not, maybe leave that untouched.

Review comments on conan/tools/files/files.py (outdated, resolved)
@memsharded (Member) left a comment:

Having FileProgress wrap the full file object, including the open/close context, is a bit risky, as the file descriptor will not be properly closed. This is one of the reasons the previous wrapper did not inherit from io.FileIO and wrapped only the minimum necessary for the progress.

Review comments on conan/tools/files/files.py (outdated, resolved)
@memsharded memsharded self-assigned this Dec 27, 2024
@perseoGI (Contributor, Author) replied:

> Having FileProgress wrap the full file object, including the open/close context, is a bit risky, as the file descriptor will not be properly closed. This is one of the reasons the previous wrapper did not inherit from io.FileIO and wrapped only the minimum necessary for the progress.

I understand your concern. I tried the original FileProgress approach but wanted to write a more "general" FileProgress wrapper that could work in different scenarios, such as this one:
https://github.com/perseoGI/conan/blob/8fe92be6d52f5fa357fed25d83ddced6b28b8f15/conan/tools/files/files.py?plain=1#L356
The only way I found to get tarfile progress without any impact on the decompression is to create a wrapper around the fileobj class.
Please look at the new changes, which now use a with statement, simplifying the consumer side.

filesize = os.path.getsize(abs_path)
with open(abs_path, mode='rb') as file_handler:
    big_file = filesize > 100000000  # 100 MB
    file_handler = FileProgress(file_handler, filesize) if big_file else file_handler
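As a sketch, the pattern above could feed tarfile directly, passing the (possibly wrapped) handle as fileobj. Here untargz_with_progress and the wrapper parameter are illustrative assumptions, not the PR's actual API:

```python
import os
import tarfile

BIG_FILE_THRESHOLD = 100_000_000  # 100 MB, matching the snippet above

def untargz_with_progress(abs_path, dest, wrapper=None):
    # 'wrapper' stands in for a FileProgress-like factory taking
    # (fileobj, total_size) and returning a file-like object; when it is
    # None or the file is small, the raw handle is used untouched.
    filesize = os.path.getsize(abs_path)
    with open(abs_path, mode="rb") as file_handler:
        if wrapper is not None and filesize > BIG_FILE_THRESHOLD:
            file_handler = wrapper(file_handler, filesize)
        # "r:*" lets tarfile detect the compression from the stream
        with tarfile.open(fileobj=file_handler, mode="r:*") as tar:
            tar.extractall(path=dest)
```

The open() context still owns the descriptor, so the wrapper only ever decorates reads.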
(Member) left a comment:

I'd like to keep this pattern: avoid the overhead of the progress for small files, do it only for big files.

(Member) left a comment:

Also when applied to the unzip

(Contributor, Author) left a comment:

By using a TimedOutput every 10 seconds (see https://github.com/conan-io/conan/pull/17519/files#diff-332104132c8c5a1bf5b07a3835b599373594e2d5a32dfba98955b7c2377ba721R104), it may not be necessary to check whether the file is big.

You could be trying to download a 1 MB file, and if your bandwidth is poor, it could take a while.
Progress bars, or in this case "progress prints", are IMHO useful for users, as they get feedback that something is being done. So the 10-second interval sounds better to me than the file-size check!
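The TimedOutput mentioned here throttles messages by time. A minimal standalone stand-in for the idea (illustrative only, not Conan's actual TimedOutput class) could look like:

```python
import time

class TimedPrinter:
    """Emit a message at most once per `interval` seconds, silently
    dropping the messages in between."""

    def __init__(self, interval, out=print):
        self._interval = interval
        self._out = out
        self._last = 0.0  # epoch of the last emitted message; 0.0 means
                          # the very first info() call always prints

    def info(self, msg):
        now = time.time()
        if now - self._last >= self._interval:
            self._last = now
            self._out(msg)
```

With a 10-second interval, a read loop can call info() on every chunk and still print only occasionally, which is the argument being made above.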

(Member) left a comment:

If downloading a 1 MB file takes a long time, you are not going to be able to use Conan to install dependencies at all, as packages very frequently weigh hundreds of MB.

The issue is not about the output. Even for small files that don't take 10 seconds to download and thus print nothing, there are other risks:

  • If there is some bug or issue, the chances that it bites are much higher. A ton of files, maybe 95% of the files Conan downloads, are under 100 MB. It is better to reduce the executed code to the minimum: the only code guaranteed to have no bugs is the code that is not written (or, in this case, not executed).
  • There will always be some performance penalty. Maybe it is small or negligible in this case, but performance is already becoming an important factor as Conan use cases keep scaling. Using the progress wrapper for all files, irrespective of whether they are small and will never print anything, might still have an impact on file read performance.

I know I am probably being excessively concerned about this, but that is mostly because of previous scars from progress bars and other "beautiful UX nice-to-have" things that end up bringing very hard-to-debug issues. So for me, in this area, the less the better.
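The per-read overhead being discussed is measurable. A rough sketch with timeit (all names are illustrative, and no numbers are claimed since they depend entirely on the machine):

```python
import io
import timeit

DATA = b"x" * (1 << 20)  # 1 MiB of dummy data

def read_plain():
    # Baseline: read the buffer in 64 KiB chunks with no bookkeeping
    buf = io.BytesIO(DATA)
    while buf.read(65536):
        pass

class Counted(io.BytesIO):
    # Wrapped variant: one counter update per read() call, roughly the
    # cost the progress wrapper adds even when nothing is ever printed
    def read(self, size=-1):
        block = super().read(size)
        self.count = getattr(self, "count", 0) + len(block)
        return block

def read_wrapped():
    buf = Counted(DATA)
    while buf.read(65536):
        pass

if __name__ == "__main__":
    # Absolute numbers are machine-dependent; only the ratio is of interest
    print("plain  :", timeit.timeit(read_plain, number=200))
    print("wrapped:", timeit.timeit(read_wrapped, number=200))
```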

(Contributor, Author) left a comment:

Okay, I understand your concerns. I've slightly changed the code to report progress (and only execute that part of the code) if the file size is greater than 100 MB, as it was originally:

def read(self, size: int = -1) -> bytes:
    current_percentage = int(self.tell() * 100.0 / self._total_size) if self._total_size != 0 else 0
    self._t.info(f"{self.msg} {self._filename}: {current_percentage}%")
    return super().read(size)
(Member) left a comment:

It might be better to accumulate the read bytes and avoid the extra self.tell() call
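That suggestion might look like the following sketch, where io.BytesIO only keeps the example self-contained (the PR wraps a real file) and report stands in for the self._t.info sink; all names are assumptions:

```python
import io

class ProgressReader(io.BytesIO):
    """Illustrative variant of the read() above: accumulate the read
    bytes in a counter instead of calling self.tell() on every read."""

    def __init__(self, data, report):
        super().__init__(data)
        self._total_size = len(data)
        self._read_bytes = 0
        self._report = report  # stand-in for self._t.info

    def read(self, size=-1):
        block = super().read(size)
        self._read_bytes += len(block)  # one cheap addition per read
        if self._total_size:
            self._report(self._read_bytes * 100 // self._total_size)
        return block
```

The counter update costs a single integer addition per call, avoiding the extra method dispatch of tell(), which is the point of the review comment.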

@perseoGI (Contributor, Author), Jan 16, 2025:

Yes that is a good one!

(Contributor, Author) left a comment:

Done in the last commit

@memsharded memsharded merged commit 4bb0bcc into conan-io:develop2 Jan 17, 2025
33 checks passed
@memsharded memsharded added this to the 2.12.0 milestone Jan 17, 2025
Merging this pull request may close:
[feature] Report progress of unpacking source files