Skip to content

Conversation

@kosarko
Copy link
Member

@kosarko kosarko commented May 20, 2025

This contains the changes from lindat-2025.05.15113644568

Run Python import weekly
ZCU-PUB/Fixed encoding of the filename from the URL (dataquest-dev#838) (dataquest-dev#851)
UFAL/Do not mount the Solr configs; copy them each time instead. (dataquest-dev#850)
UFAL/Fix the bulk access (dataquest-dev#852)
UFAL/Do not add dtoken into the URL if it is null (dataquest-dev#860) (dataquest-dev#861)
UFAL/Added spacing between clarin & dspace logo (dataquest-dev#848)
Ufal/seznam license request (dataquest-dev#844)
Merge remote-tracking branch 'dataquest-dev/dtq-dev' into HEAD
Ufal dtq sync 2025 05 14 (dataquest-dev#855)

milanmajchrak and others added 10 commits April 24, 2025 08:46
* Added spacing between clarin & dspace logo

* Spacing is prettier and when clarin logo is removed, dspace logo is centered
* Fix of unwanted Seznam Dataset License request while using other licenses
…ace#851)

* Added an URL serializer to fix encoding of the special characters from the URL e.g., `[`, `(` because the filename wasn't properly parsed

* Added some unit tests for encoding the bitstream filename url
* Do not use the clarin item view box for the bulk access

* Removed unused import

* Clarify the `showClarinViewBox` is boolean

* Use the constant for the hardcoded bulk access list id
* Create Acknowledgment-ReadMe.md

Acnkowledgment of NRP project

(cherry picked from commit ad889b2)

* Video files previews

This uses the thumbnail as poster (if available) and correctly sets the
source of the video

currently only works for anonymously accessible files.

(cherry picked from commit 4832c2f)

* Handle video previews for restricted items

append a shortlived token at the right time (error, seeking, stalled)

(cherry picked from commit 2c12d7d)

* Display only ORIGINAL bitstreams

Thumbnails, when available, should be shown istead of the generic
MIME_TYPE_IMAGE.

Content of the TEXT bundle should not be shown at all this is usually
automatically extracted "text layer" of a PDF, useful for indexing, but
don't want people downloading it.

(cherry picked from commit 3862442)

* fix linter and test errors

(cherry picked from commit f852096)

* Code review follow up

the listOfFiles should really not contain files from "TEXT" or
"THUMBNAIL" bundles.

* code review

unsubscribe error$, seeking$ and $stalled

* code review - thumbnail might be undefined

* code review - consistent formatting
this contains the changes from lindat-2025.05.15113644568
@kosarko kosarko requested a review from Copilot May 20, 2025 11:56
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Sync various updates from lindat-2025.05 including weekly import, filename‐encoding improvements, bulk access tweaks, Solr config copy changes, and UI/logo layout adjustments.

  • Add a custom UrlSerializer and encodeRFC3986URIComponent util to properly handle special characters in bitstream URLs
  • Introduce bulk‐access list ID and clarify view logic for CLARIN vs default list display
  • Update Docker Compose to copy Solr configs on each start and schedule a weekly import

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/app/shared/object-list/object-list.component.ts Define BULK_ACCESS_LIST_ID and showClarinViewBox() for bulk‐access logic
src/app/shared/object-list/object-list.component.html Swap hardcoded context === 'search' checks for showClarinViewBox()
src/app/shared/clarin-shared-util.ts Add encodeRFC3986URIComponent to safely encode filenames
src/app/login-page/login-page.component.html Align logos, add target="_blank" and rel attrs
src/app/item-page/simple/field-components/preview-section/file-description/file-description.component.ts Implement OnDestroy to manage subscriptions and handle missing thumbnails
src/app/item-page/simple/field-components/preview-section/file-description/file-description.component.spec.ts Minor spec formatting update
src/app/item-page/clarin-files-section/clarin-files-section.component.ts Restrict metadata bitstream query to ORIGINAL only
src/app/core/url-serializer/bitstream-url-serializer.ts New BitstreamUrlSerializer to encode filenames in router URLs
src/app/core/url-serializer/bitstream-url-serializer.spec.ts Tests for filename encoding in /bitstream/ URLs
src/app/bitstream-page/legacy-bitstream-url.resolver.ts Apply encodeRFC3986URIComponent to legacy download filenames
src/app/bitstream-page/clarin-zip-download-page/clarin-zip-download-page.component.ts Conditionally append dtoken query param only if present
src/app/bitstream-page/clarin-license-agreement-page/clarin-license-agreement-page.component.ts Subscribe to item$ and load license content only when appropriate
src/app/app.module.ts Provide custom BitstreamUrlSerializer for the router
docker/docker-compose-rest.yml Remove Solr volume mount; copy configsets at startup
.github/workflows/import-weekly.yml Schedule weekly import job every Sunday
.github/disabled-workflows/deploy.yml Remove legacy import schedule from disabled workflow
Comments suppressed due to low confidence (4)

src/app/item-page/simple/field-components/preview-section/file-description/file-description.component.ts:38

  • [nitpick] Property handlers_added uses snake_case; align with TypeScript style by renaming to handlersAdded.
handlers_added = false;

src/app/shared/clarin-shared-util.ts:96

  • This new utility lacks unit tests. Add tests covering spaces, brackets, parentheses, and edge cases to ensure correct encoding.
export function encodeRFC3986URIComponent(uriPart: string) {

src/app/core/url-serializer/bitstream-url-serializer.spec.ts:26

  • Consider adding a test for filenames containing multiple / segments to verify path reconstruction (and catch unintended comma insertion).
it('should encode special characters in the filename in /bitstream/ URLs', () => {

src/app/shared/clarin-shared-util.ts:101

  • The implementation only replaces parentheses. To match the docstring and cover brackets ([]) and spaces, extend the replacement or use a more comprehensive regex/pipeline.
.replace(/[()]/g, c => '%' + c.charCodeAt(0).toString(16).toUpperCase());

loadLicenseContentSeznam() {
this.htmlContentService.getHmtlContentByPathAndLocale(this.LICENSE_PATH_SEZNAM_CZ).then(content => {
this.licenseContentSeznam.next(content);
this.item$.subscribe((item) => {
Copy link

Copilot AI May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subscribing directly without cleanup can cause memory leaks. Consider using take(1) or unsubscribing in ngOnDestroy, or refactor to async pipe.

Suggested change
this.item$.subscribe((item) => {
this.item$.pipe(take(1)).subscribe((item) => {

Copilot uses AI. Check for mistakes.
@kosarko kosarko merged commit d7e8551 into clarin-v7 May 28, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants