Cannot stream-write a zipfile #631

@Clockwork-Muse

Description

I'm attempting to stream-write an uncompressed zip file (200 MB+ now, 1 GB+ eventually, mostly ~3 MB images) to avoid writing to disk. Unfortunately, when the zip file is closed, zipfile flushes the underlying stream, and the storage client assumes a flush only happens when the streamed file itself is being closed. That isn't the case here, and the assumption seems wrong in general, since buffers are often flushed when they reach saturation.

Environment details

  • OS type and version: Ubuntu 20.04 (custom devcontainer docker image)
  • Python version: 3.8.10
  • pip version: 21.3
  • google-cloud-storage version: 1.42.3

Code example

import zipfile

from google.cloud.storage import Blob, Client

client = Client("some-account")
blob = Blob.from_string("gs://some-bucket/some-folder/something.zip", client)

with blob.open(mode="wb") as blob_file, \
    zipfile.ZipFile(blob_file, mode="w") as zip_file:
    # Empty/not empty doesn't matter, the same error is generated
    pass

Stack trace

Traceback (most recent call last):
  File "s.py", line 14, in <module>
    pass
  File "/usr/lib/python3.8/zipfile.py", line 1312, in __exit__
    self.close()
  File "/usr/lib/python3.8/zipfile.py", line 1839, in close
    self._write_end_record()
  File "/usr/lib/python3.8/zipfile.py", line 1947, in _write_end_record
    self.fp.flush()
  File "/workspaces/someproject/.pyenv/lib/python3.8/site-packages/google/cloud/storage/fileio.py", line 401, in flush
    raise io.UnsupportedOperation(
io.UnsupportedOperation: Cannot flush without finalizing upload. Use close() instead.
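The failure can be reproduced without Cloud Storage by stubbing a writer whose flush() raises the same way. StrictWriter below is a hypothetical stand-in for the storage client's blob writer, not the real class; it only mimics the behavior shown in the traceback:

```python
import io
import zipfile


class StrictWriter(io.RawIOBase):
    """Hypothetical stand-in for the storage writer: an explicit
    flush() before close() is rejected, mimicking fileio.py."""

    def __init__(self):
        self.sink = bytearray()
        self._closing = False

    def writable(self):
        return True

    def write(self, b):
        self.sink += b
        return len(b)

    def flush(self):
        if self._closing:
            return  # flush-on-close is the only flush the stub tolerates
        raise io.UnsupportedOperation(
            "Cannot flush without finalizing upload. Use close() instead."
        )

    def close(self):
        self._closing = True
        super().close()


try:
    # Same shape as the report: ZipFile.close() calls fp.flush().
    with zipfile.ZipFile(StrictWriter(), mode="w"):
        pass
except io.UnsupportedOperation as exc:
    print(exc)  # -> Cannot flush without finalizing upload. Use close() instead.
```

This confirms the caller is `zipfile.ZipFile._write_end_record`, which unconditionally flushes the file object after writing the end-of-central-directory record.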


Workaround

Insert an io.BufferedWriter:

import io
import zipfile

from google.cloud.storage import Blob, Client

client = Client("some-account")
blob = Blob.from_string("gs://some-bucket/some-folder/something.zip", client)

with blob.open(mode="wb") as blob_file, \
    io.BufferedWriter(blob_file) as binary_file, \
    zipfile.ZipFile(binary_file, mode="w") as zip_file:
    # Empty/not empty doesn't matter; the ValueError below is raised either way
    pass

This results in the file being written to Cloud Storage (and, at least in a simple case, with correctly written contents), but a new error is raised: closing the io.BufferedWriter also closes the underlying blob writer, so the outer with block then calls close() on an already-closed file:

Traceback (most recent call last):
  File "s.py", line 24, in <module>
    pass
  File "/workspaces/someproject/.pyenv/lib/python3.8/site-packages/google/cloud/storage/fileio.py", line 406, in close
    self._checkClosed()  # Raises ValueError if closed.
  File "/workspaces/someproject/.pyenv/lib/python3.8/site-packages/google/cloud/storage/fileio.py", line 413, in _checkClosed
    raise ValueError("I/O operation on closed file.")
ValueError: I/O operation on closed file.
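As an alternative to io.BufferedWriter, a thin wrapper that ignores flush() and leaves closing to the outer with block avoids both errors. NoFlushWriter is a hypothetical helper sketched here, not part of google-cloud-storage; the demo writes to an in-memory sink instead of a real blob:

```python
import io
import zipfile


class NoFlushWriter(io.RawIOBase):
    """Forwards writes to a wrapped file object but turns flush() into
    a no-op, so ZipFile.close() cannot trip the storage client's flush
    restriction. Closing is left to whoever owns the wrapped file."""

    def __init__(self, raw):
        self._raw = raw

    def writable(self):
        return True

    def write(self, b):
        return self._raw.write(b)

    def flush(self):
        pass  # deliberately ignore flush requests from ZipFile


# Demo against an in-memory sink instead of a real blob:
buf = io.BytesIO()
with zipfile.ZipFile(NoFlushWriter(buf), mode="w") as zf:
    zf.writestr("a.txt", "hello")
assert zipfile.ZipFile(io.BytesIO(buf.getvalue())).read("a.txt") == b"hello"
```

With the real client, the wrapper would sit between `blob.open(mode="wb")` and `zipfile.ZipFile`, so only the outer with block closes the blob writer and the double-close ValueError above never occurs. zipfile handles the non-seekable wrapper by falling back to data-descriptor records.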

Metadata

Labels

  • api: storage - Issues related to the googleapis/python-storage API.
  • status: investigating - The issue is under investigation, which is determined to be non-trivial.
