[FEATURE REQUEST] MMC4 Dataset Preprocess #218

@ElegantLin

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

  1. I think the preprocessing script writes all the examples into a single output shard, which is not reasonable.
  2. Some of the images no longer exist. The preprocessing script should filter these images out.

Describe the workflow you want to enable.
A clear and concise description of what you want to happen.

  1. Split the output into shards so that each shard contains 10k examples.
  2. Check that each image exists before writing it into the JSON file.
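
The two steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual OpenFlamingo preprocessing script: the function names (`filter_missing_images`, `iter_shards`, `write_shards`), the `shard_XXXXX.jsonl` naming, the one-JSON-object-per-line output, and the assumption that each example lists its images under `image_info` with an `image_name` key are all hypothetical choices made for the example.

```python
import json
import os
from typing import Callable, Dict, Iterable, Iterator, List

SHARD_SIZE = 10_000  # examples per shard, as proposed in this issue


def filter_missing_images(example: Dict, image_exists: Callable[[str], bool]) -> Dict:
    """Return a copy of the example that keeps only images that exist.

    Assumes (hypothetically) that images are listed under "image_info",
    each entry carrying an "image_name" key.
    """
    kept = [
        img for img in example.get("image_info", [])
        if image_exists(img["image_name"])
    ]
    return {**example, "image_info": kept}


def iter_shards(examples: Iterable[Dict], shard_size: int = SHARD_SIZE) -> Iterator[List[Dict]]:
    """Group examples into lists of at most shard_size items."""
    shard: List[Dict] = []
    for example in examples:
        shard.append(example)
        if len(shard) == shard_size:
            yield shard
            shard = []
    if shard:  # emit the final, possibly smaller, shard
        yield shard


def write_shards(examples: Iterable[Dict], image_dir: str, out_dir: str,
                 shard_size: int = SHARD_SIZE) -> List[str]:
    """Drop missing images, skip examples left without images, write shards."""
    os.makedirs(out_dir, exist_ok=True)

    def exists(name: str) -> bool:
        return os.path.isfile(os.path.join(image_dir, name))

    cleaned = (filter_missing_images(ex, exists) for ex in examples)
    non_empty = (ex for ex in cleaned if ex["image_info"])

    paths: List[str] = []
    for index, shard in enumerate(iter_shards(non_empty, shard_size)):
        # Hypothetical naming scheme: one JSON object per line in each shard.
        path = os.path.join(out_dir, f"shard_{index:05d}.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            for ex in shard:
                f.write(json.dumps(ex) + "\n")
        paths.append(path)
    return paths
```

Whether an example that loses all of its images should be dropped entirely, as this sketch does, or kept as text only is a design decision for the maintainers.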

Describe your proposed solution.
How do you propose to address this? A high-level description is fine, but detailed suggestions are also welcome.

I can push a PR to fix it.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Are you willing to help implement this feature?
If so, please comment on how long you expect it to take or what kind of support you would require from the OpenFlamingo team!

Yes. I think I can finish it in one week.
