Support video playback from Zimit archives #292

Closed
Jaifroid opened this issue Sep 10, 2022 · 5 comments · Fixed by #293
Comments

@Jaifroid
Member

Jaifroid commented Sep 10, 2022

There are significant challenges for video playback support in the absence of the ability to load the Replay system from the ZIM. For a description of some of these, particularly in relation to YouTube video content, see openzim/zimit#122 (comment), the linked issue openzim/warc2zim#80 and the linked PR openzim/warc2zim#83. The main difficulty is catching POST requests in the Service Worker (we currently filter out anything that is not a GET) and then following the same fuzzy matching rules. Sample rules from the PR:

    {
        "match": re.compile(
            r"//(?:www\.)?youtube(?:-nocookie)?\.com\/(youtubei\/[^?]+).*(videoId[^&]+).*"
        ),
        "replace": r"//youtube.fuzzy.replayweb.page/\1?\2",
    },
    {
        "match": re.compile(r"//(?:www\.)?youtube(?:-nocookie)?\.com/embed/([^?]+).*"),
        "replace": r"//youtube.fuzzy.replayweb.page/embed/\1",
    },
    {
        "match": re.compile(
            r".*(?:gcs-vimeo|vod|vod-progressive)\.akamaized\.net.*?/([\d/]+.mp4)$"
        ),
        "replace": r"vimeo-cdn.fuzzy.replayweb.page/\1",
    },
    {
        "match": re.compile(r".*player.vimeo.com/(video/[\d]+)\?.*"),
        "replace": r"vimeo.fuzzy.replayweb.page/\1",
    },
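
To make the matching step concrete, here is a minimal sketch of how rules of this shape could be applied to a requested URL: the first rule whose pattern matches rewrites the URL into its canonical "fuzzy" form, which can then be looked up in the ZIM. The function name `fuzzy_canonicalize` and the example URLs are hypothetical, and the rule list is only a subset of the sample above. This is not the warc2zim or wabac.js implementation.

```python
import re

# Subset of the sample rules (YouTube player API, YouTube embed, Vimeo player).
FUZZY_RULES = [
    {
        "match": re.compile(
            r"//(?:www\.)?youtube(?:-nocookie)?\.com\/(youtubei\/[^?]+).*(videoId[^&]+).*"
        ),
        "replace": r"//youtube.fuzzy.replayweb.page/\1?\2",
    },
    {
        "match": re.compile(r"//(?:www\.)?youtube(?:-nocookie)?\.com/embed/([^?]+).*"),
        "replace": r"//youtube.fuzzy.replayweb.page/embed/\1",
    },
    {
        "match": re.compile(r".*player.vimeo.com/(video/[\d]+)\?.*"),
        "replace": r"vimeo.fuzzy.replayweb.page/\1",
    },
]


def fuzzy_canonicalize(url):
    """Hypothetical helper: rewrite url with the first matching rule."""
    for rule in FUZZY_RULES:
        if rule["match"].search(url):
            # Only the matched portion is replaced; any prefix (e.g. "https:") is kept.
            return rule["match"].sub(rule["replace"], url, count=1)
    # No rule applies: the URL is looked up unchanged.
    return url
```

With this sketch, two player requests that differ only in volatile parameters (keys, tokens) but share a videoId reduce to the same lookup key.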
@Jaifroid Jaifroid added enhancement experimental Experimental features labels Sep 10, 2022
@Jaifroid Jaifroid self-assigned this Sep 10, 2022
@Jaifroid Jaifroid added this to the Release 2.2.0 milestone Sep 10, 2022
@mossroy

mossroy commented Sep 10, 2022

Has some kind of "contract" been agreed with people working on Zimit, on how videos are implemented in such ZIM files?
If not, I think you'll exhaust yourself running after all the possible content and all the future changes.

@Jaifroid
Member Author

Jaifroid commented Sep 10, 2022

Thank you for raising this @mossroy. I think it would be useful to have a wider conversation about such understandings or guarantees. Having said that, on the Kiwix side @rgaudin has been very generous in responding to any issues I've raised and to any requests for information that he might have. But the Zimit case is a bit specific: the challenges surrounding developing an alternative reader for warc2zim (in the absence of the ability to run the Replay system) relate to the fact that the Web Archive format is developed independently of Kiwix, and specific work on warc2zim was contracted in, AIUI.

Until very recently, YouTube video support was broken in Zimit archives due to a change in the way the embedded JS player provided by Google contacts the YouTube servers (it changed to using POST requests, which greatly complicated getting the video BLOBs). The effort to fix that necessarily entailed a change in format (dubbed "POST request canonicalization"...).
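
As a rough illustration of what such canonicalization might involve, the sketch below folds a JSON POST body into the query string so that the request can be treated like a GET and passed through the same fuzzy rules. Everything here is an assumption for illustration: the helper names `post_to_get_url` and `_leaf_pairs`, the flattening strategy, and the `__wb_method` marker parameter are not taken from this thread, and the actual format is the one defined in the linked warc2zim PR.

```python
import json
from urllib.parse import urlencode


def _leaf_pairs(value, key=""):
    """Yield (key, value) pairs for every leaf of a nested JSON structure."""
    if isinstance(value, dict):
        for child_key, child in value.items():
            yield from _leaf_pairs(child, child_key)
    elif isinstance(value, list):
        for item in value:
            yield from _leaf_pairs(item, key)
    else:
        yield key, str(value)


def post_to_get_url(url, body):
    """Hypothetical helper: append a JSON POST body to url as query parameters."""
    # "__wb_method" is an assumed marker name recording the original HTTP method.
    pairs = [("__wb_method", "POST")] + list(_leaf_pairs(json.loads(body)))
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode(pairs)
```

Once the videoId from the POST body appears in the query string, a rule that captures `videoId[^&]+` can pick it out in the same way as for a GET request.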

My immediate reason for wanting to fix this specific issue (which I think is less complicated than it sounds) at my end is that it's the last major piece of the Zimit puzzle. Nearly all Zimit ZIMs Kiwix publishes can now be read pretty well by KJSWL with the exception of embedded video content from YouTube (mostly) that will shortly be available (in a working format) in such ZIMs.

If we decide to backport Zimit support upstream to Kiwix JS in a less experimental way (ideally by finding ways to use the wabac.js API, currently undocumented), it would certainly be important to get some internal guarantee or "contract".

@mossroy

mossroy commented Sep 10, 2022

OK I understand

@Jaifroid
Member Author

Better list of fuzzy substitution rules and code for the fuzzy matching algorithm:

https://github.com/webrecorder/wabac.js/blob/main/src/fuzzymatcher.js

@Jaifroid
Member Author

See openzim/zimit#122 (comment) for documentation of my investigation of the matching process that is required in order to identify the correct video BLOB from the ZIM, given a specific videoId.

2 participants