
Extend JSON serialization capabilities #1612

Merged: 14 commits merged on Jun 4, 2024


DarkLight1337 (Contributor) commented May 28, 2024

#247 first introduced JSON serialization of filesystem objects. However, it fails to handle storage_args and storage_options that contain Path objects or other filesystem instances. This can happen in a few cases, such as:

  • When passing a Path instance to construct a DirFileSystem (this is not officially documented, but it works because the initializer converts the Path object to a string)
  • When passing a filesystem to wrapper filesystems such as CachingFileSystem, DirFileSystem and ReferenceFileSystem.

This PR makes it possible to serialize such filesystems.

martindurant (Member) commented:

At a first quick glance, this PR looks great. Can I ask what kind of workflow you have in mind to make use of this feature?

DarkLight1337 (Contributor, Author) commented May 28, 2024

> At a first quick glance, this PR looks great. Can I ask what kind of workflow you have in mind to make use of this feature?

I'm trying to set up llama-index document nodes that reference files, using fsspec to read each node's files on demand instead of copying them into storage. However, the nodes need to be JSON-serializable, so I wrote some code to temporarily patch in this feature. I'm hoping this PR will bring that functionality into the library directly.

martindurant (Member) commented:

Sounds like something Intake could do for you, but I might be slightly biased :)

martindurant (Member) left a review:

Very minor comments

martindurant (Member) left a review:

The whole encoder/decoder classes are quite long now, there's no reason they need to be in the main spec module - can they be pulled into a different file?

Review thread on fsspec/spec.py (outdated diff):

if (obj_cls := self.try_resolve_fs_cls(dct)) is not None:
    return AbstractFileSystem.from_dict(dct)
if (obj_cls := self.try_resolve_path_cls(dct)) is not None:
    return obj_cls(dct["str"])
martindurant (Member):

If both of these fail, you end up with just a plain dictionary?

DarkLight1337 (Contributor, Author):

Yes, that is the default behaviour when decoding a dictionary from JSON.
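A stdlib-only sketch of the decoding side, assuming the same hypothetical `{"cls": ..., "str": ...}` layout for paths (filesystems are left out to keep it short; all names here are illustrative, not fsspec's). The hook rebuilds only objects whose class resolves to a `PurePath` subclass. Anything else, including a dict whose `cls` cannot be imported, is passed to a caller-supplied `object_hook` if there is one and is otherwise returned as the plain dict, which is what `json.loads` does by default.

```python
import contextlib
import importlib
import json
from pathlib import PurePath


def _import_class(fqn: str):
    # "package.module.ClassName" -> the class object
    module_name, _, cls_name = fqn.rpartition(".")
    return getattr(importlib.import_module(module_name), cls_name)


class PathJSONDecoderSketch(json.JSONDecoder):
    """Hypothetical decoder; handles paths only, not filesystems."""

    def __init__(self, *, object_hook=None, **kwargs):
        # Remember a caller-supplied hook so plain dicts still reach it
        self.original_object_hook = object_hook
        super().__init__(object_hook=self.custom_object_hook, **kwargs)

    def try_resolve_path_cls(self, dct):
        if "cls" not in dct or "str" not in dct:
            return None
        with contextlib.suppress(Exception):
            obj_cls = _import_class(dct["cls"])
            # Only accept path classes, never arbitrary importable names
            if isinstance(obj_cls, type) and issubclass(obj_cls, PurePath):
                return obj_cls
        return None

    def custom_object_hook(self, dct):
        if (obj_cls := self.try_resolve_path_cls(dct)) is not None:
            return obj_cls(dct["str"])
        if self.original_object_hook is not None:
            return self.original_object_hook(dct)
        # Nothing matched: keep the plain dict, json's default behaviour
        return dct


text = json.dumps({
    "root": {"cls": "pathlib.PurePosixPath", "str": "/data"},
    "other": {"cls": "no.such.Thing", "str": "x"},
})
decoded = json.loads(text, cls=PathJSONDecoderSketch)
# decoded["root"] is rebuilt as a path; decoded["other"] stays a plain dict
```

Restricting resolution to `PurePath` subclasses is a deliberate choice in this sketch: instantiating arbitrary classes named in untrusted JSON would be risky.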

Review thread on fsspec/spec.py (outdated diff):

try:
    path_cls = _import_class(fqp)
except Exception:
    raise
martindurant (Member):

Why catch Exception only to re-raise it, all within the outer suppress block?

DarkLight1337 (Contributor, Author) commented May 29, 2024:

It's only there to mirror the control flow of the code that decodes the filesystem class. If you think it's too verbose, I can simplify it.
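To illustrate the reviewer's point, here is a hypothetical reconstruction; the function names are made up and only the try/except shape comes from the diff excerpt above. Because the inner `except Exception: raise` re-raises the exception unchanged, the outer `contextlib.suppress(Exception)` behaves exactly as it would without it, so the simpler form is equivalent.

```python
import contextlib
import importlib


def _import_class(fqn: str):
    # "package.module.ClassName" -> the class object
    module_name, _, cls_name = fqn.rpartition(".")
    return getattr(importlib.import_module(module_name), cls_name)


def resolve_cls_verbose(fqp: str):
    # Shape questioned in review: the inner handler re-raises unchanged,
    # so the outer suppress() sees exactly the same exception
    with contextlib.suppress(Exception):
        try:
            path_cls = _import_class(fqp)
        except Exception:
            raise
        return path_cls
    return None


def resolve_cls_simple(fqp: str):
    # Equivalent: let suppress() swallow any import or lookup failure
    with contextlib.suppress(Exception):
        return _import_class(fqp)
    return None
```

Both functions return the resolved class on success and `None` when the module or attribute cannot be found.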

DarkLight1337 (Contributor, Author) commented May 29, 2024

> The whole encoder/decoder classes are quite long now, there's no reason they need to be in the main spec module - can they be pulled into a different file?

Sure, I'll move them to fsspec.json.

DarkLight1337 (Contributor, Author) commented:

I have finished addressing your comments.

martindurant merged commit 447c27d into fsspec:master on Jun 4, 2024 (11 checks passed).
DarkLight1337 deleted the json branch on June 4, 2024 at 02:08.