feat: page level caching #3158
Conversation
This PR still needs some extra tests for updating an existing db.
Live testing has been done on this PR. I would love some help with E2E testing of updates. If you have the time to show me the ropes, I am all ears. Test scenarios:
To be complete, docs should be in multiple directories so re-ordering has an effect.
Generally, I like the idea.

I have concerns about the solution, though. Removing the drop queries is a breaking change for the full dump and breaks NuxtHub deployment (NuxtHub uses this full dump to refill the database).

What if we keep the drop query and prefix it with a structure checksum, `/* checksum */ DROP ...`, then ignore these lines when the checksum has not changed?

We can add similar prefixes for the insert and update queries. This will simplify hash detection here:

```js
if (unchangedStructure && (sql.startsWith('INSERT ') || sql.startsWith('UPDATE ')) && !indexesToInsert.includes(index)) {
```
The dump can look like:

```sql
/* structure:{HASH} */ DROP TABLE ...
/* structure:{HASH} */ CREATE TABLE ...
/* hash:{ROW_HASH} */ INSERT ...
/* hash:{ROW_HASH} */ UPDATE ...
```

With these comments, the dump stays executable outside of our logic and we can optimize our import.

WDYT?
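A minimal sketch (in TypeScript, with hypothetical helper names — `parseDumpLine` and `shouldExecute` are not the PR's actual API) of how such prefixed dump lines could be parsed and filtered on import, assuming the `/* structure:{HASH} */` and `/* hash:{ROW_HASH} */` prefixes proposed above:

```typescript
interface DumpLine {
  kind: 'structure' | 'row' | 'plain'
  hash?: string
  sql: string
}

// Matches a leading `/* structure:HASH */ ` or `/* hash:HASH */ ` comment.
const LINE_PREFIX = /^\/\* (structure|hash):(\S+) \*\/ /

function parseDumpLine(line: string): DumpLine {
  const m = LINE_PREFIX.exec(line)
  if (!m) return { kind: 'plain', sql: line }
  return {
    kind: m[1] === 'structure' ? 'structure' : 'row',
    hash: m[2],
    sql: line.slice(m[0].length),
  }
}

// Decide whether a parsed line must be executed on import:
// structure lines only run when the structure changed, and row lines
// are skipped when their hash is already in the database.
function shouldExecute(line: DumpLine, structureChanged: boolean, knownHashes: Set<string>): boolean {
  if (line.kind === 'structure') return structureChanged
  if (line.kind === 'row' && !structureChanged) return !knownHashes.has(line.hash!)
  return true
}
```

Because the markers are plain SQL comments, the same dump still runs unmodified against any SQL engine outside this import logic.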
```js
// If the structure has not changed,
// skip any insert/update line whose hash is already in the database.
// If not, since we dropped the table, no record is skipped, insert them all again.
if (unchangedStructure && (sql.startsWith('INSERT ') || sql.startsWith('UPDATE ')) && !indexesToInsert.includes(index)) {
```
The values of `indexesToInsert` do not match the indexes of the queries in the dump, therefore `indexesToInsert.includes(index)` is not a valid condition to check.
The hashes indexed here come directly from the dump:
- They are ordered the same way as the insert queries; I make sure of that at build time.
- The hash table is padded at the beginning with the exact number of lines in the init. If the init has 3 lines, we insert 3 empty strings at the beginning.
- That is why I would have loved a few e2e tests.
- I like your idea of having a hash at the beginning of the line; it makes the code simpler to understand.
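To illustrate the alignment described above, here is a sketch (the helper name is hypothetical) of padding the row-hash list with one empty string per init line, so that its indexes line up with the dump's line indexes:

```typescript
// Pad the row-hash list so hashList[i] corresponds to dump[i].
// If the init section of the dump has `initLineCount` lines, prepend that
// many empty strings; an empty string means "no record hash for this line".
function buildAlignedHashList(initLineCount: number, rowHashes: string[]): string[] {
  return [...new Array<string>(initLineCount).fill(''), ...rowHashes]
}
```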
I have tweaked the system a little. Example of the resulting dump:

```sql
-- ["VwBBpsSOHZ","EO7IcoosJP"]
CREATE TABLE IF NOT EXISTS _content_info (id TEXT PRIMARY KEY, "ready" BOOLEAN, "structureVersion" VARCHAR, "version" VARCHAR, "__hash__" TEXT UNIQUE);
/* starting_init */ INSERT INTO _content_info VALUES ('checksum_content', false, 'fr2JfCq2gl', 'v3.2.3--dE9pgeuPBt', '3hvYqGvzZ8');
DROP TABLE IF EXISTS _content_content;
CREATE TABLE IF NOT EXISTS _content_content (id TEXT PRIMARY KEY, "title" VARCHAR, "body" TEXT, "description" VARCHAR, "extension" VARCHAR, "meta" TEXT, "navigation" TEXT DEFAULT true, "path" VARCHAR, "seo" TEXT DEFAULT '{}', "stem" VARCHAR, "__hash__" TEXT UNIQUE);
/* VwBBpsSOHZ */ INSERT INTO _content_content VALUES ('content/about.md', 'About', '{"type":"minimal","value":[["h1",{"id":"about"},"About"]],"toc":{"title":"","searchDepth":2,"depth":2,"links":[]}}', '', 'md', '{"booleanField":false,"numberField":123,"arrayField":["item3","item4"]}', 'true', '/about', '{"title":"About","description":""}', 'about', 'VwBBpsSOHZ');
/* EO7IcoosJP */ INSERT INTO _content_content VALUES ('content/index.md', 'Home page', '{"type":"minimal","value":[["h1",{"id":"home-page"},"Home page"]],"toc":{"title":"","searchDepth":2,"depth":2,"links":[]}}', '', 'md', '{"booleanField":true,"numberField":1,"arrayField":["item1","item2"]}', 'true', '/', '{"title":"Home page","description":""}', 'index', 'EO7IcoosJP');
/* successful_init */ UPDATE _content_info SET ready = true WHERE id = 'checksum_content';
```

Note: I kept the first line of hashes as an index that I do not need to rebuild as I run through the dump. I will optimize for readability and remove that first line.
In that last commit I changed the format of the dump once more:

```sql
CREATE TABLE IF NOT EXISTS _content_info (id TEXT PRIMARY KEY, "ready" BOOLEAN, "structureVersion" VARCHAR, "version" VARCHAR, "__hash__" TEXT UNIQUE); /* structure */
INSERT INTO _content_info VALUES ('checksum_content', false, 'RYgZDcy6sk', 'v3.2.3--DZOppTJAd2', 'synekhxovA');
DROP TABLE IF EXISTS _content_content; /* structure */
CREATE TABLE IF NOT EXISTS _content_content (id TEXT PRIMARY KEY, "title" VARCHAR, "arrayField" TEXT, "body" TEXT, "booleanField" BOOLEAN, "description" VARCHAR, "extension" VARCHAR, "meta" TEXT, "navigation" TEXT DEFAULT true, "numberField" INT, "path" VARCHAR, "seo" TEXT DEFAULT '{}', "stem" VARCHAR, "__hash__" TEXT UNIQUE); /* structure */
INSERT INTO _content_content VALUES ('content/index.md', 'Home page', '["item1","item2"]', '{"type":"minimal","value":[["h1",{"id":"home-page"},"Home page"]],"toc":{"title":"","searchDepth":2,"depth":2,"links":[]}}', true, '', 'md', '{}', 'true', 1, '/', '{"title":"Home page","description":""}', 'index', 'JApZF0Dncz'); /* checksum: JApZF0Dncz */
UPDATE _content_info SET ready = true WHERE id = 'checksum_content';
```

This feels more readable to both human beings and SQL scripts. All comments are at the end to avoid changing the tests.
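A sketch of how the trailing comments in this format could be classified on import. The regexp shapes are assumptions based on the dump above, not necessarily the PR's exact code:

```typescript
// Trailing-comment markers in the dump format shown above.
const STRUCTURE_REGEXP = /\/\* structure \*\/$/
const CHECKSUM_REGEXP = /\/\* checksum: (\S+) \*\/$/

type LineInfo = { type: 'structure' } | { type: 'record', hash: string } | { type: 'other' }

// Classify one dump line by its trailing comment (if any).
function classifyLine(sql: string): LineInfo {
  if (STRUCTURE_REGEXP.test(sql)) return { type: 'structure' }
  const match = CHECKSUM_REGEXP.exec(sql)
  if (match) return { type: 'record', hash: match[1] }
  return { type: 'other' }
}
```

Anchoring both regexps at the end of the line (`$`) is what makes the format robust: a `/* structure */` string inside row content would not be at the end of the statement.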
farnabaz left a comment:
As the comments moved to the end, we can use `--` comments, which helps us improve performance a bit more.
src/module.ts (outdated)
```diff
  const collectionHash = hash(collection)
  const collectionQueries = generateCollectionTableDefinition(collection, { drop: true })
-   .split('\n')
+   .split('\n').map(q => `${q} /* structure */`)
```
Suggested change:

```diff
- .split('\n').map(q => `${q} /* structure */`)
+ .split('\n').map(q => `${q} -- structure`)
```
As we moved the comments to the end, we can use `--` comments, which helps us improve performance a bit more.
src/module.ts (outdated)
```diff
  list.sort((a, b) => String(a[0]).localeCompare(String(b[0])))

- collectionQueries.push(...list.flatMap(([, sql]) => sql!))
+ collectionQueries.push(...list.flatMap(([, sql, hash]) => sql.map(q => `${q} /* checksum: ${hash} */`)))
```
Suggested change:

```diff
- collectionQueries.push(...list.flatMap(([, sql, hash]) => sql.map(q => `${q} /* checksum: ${hash} */`)))
+ collectionQueries.push(...list.flatMap(([, sql, hash]) => sql.map(q => `${q} -- ${hash}`)))
```
src/module.ts (outdated)
```js
`${generateCollectionTableDefinition(infoCollection, { drop: false })} /* structure */`,
...generateCollectionInsert(infoCollection, { id: `checksum_${collection.name}`, version, structureVersion, ready: false }).queries,
```
Suggested change:

```diff
- `${generateCollectionTableDefinition(infoCollection, { drop: false })} /* structure */`,
- ...generateCollectionInsert(infoCollection, { id: `checksum_${collection.name}`, version, structureVersion, ready: false }).queries,
+ `${generateCollectionTableDefinition(infoCollection, { drop: false })} -- structure`,
+ ...generateCollectionInsert(infoCollection, { id: `checksum_${collection.name}`, version, structureVersion, ready: false }).queries.map(sql => `${sql} -- meta`),
```
```js
const hashListFromTheDump: string[] = dump.map((sql) => {
  return CHECKSUM_REGEXP.exec(sql)?.[1]
}).filter(Boolean) as string[]
```
Suggested change:

```diff
- const hashListFromTheDump: string[] = dump.map((sql) => {
-   return CHECKSUM_REGEXP.exec(sql)?.[1]
- }).filter(Boolean) as string[]
+ const hashListFromTheDump: string[] = dump.map(row => row.split(' -- ').pop())
```
```js
await dump.reduce(async (prev: Promise<void>, sql: string) => {
  await prev

  // If the structure has not changed,
  // skip any insert/update line whose hash is already in the database.
  // If not, since we dropped the table, no record is skipped, insert them all again.
  if (unchangedStructure) {
    // skip any line that is structure related,
    // the structure is unchanged
    if (/\/\* structure \*\/$/.test(sql)) {
      return Promise.resolve()
    }

    // skip any record whose hash is not already in the DB
    const hash = CHECKSUM_REGEXP.exec(sql)?.[1]
    if (hash && !hashesInDb.includes(hash)) {
      return Promise.resolve()
    }
  }
```
Suggested change:

```diff
- await dump.reduce(async (prev: Promise<void>, sql: string) => {
-   await prev
-   // If the structure has not changed,
-   // skip any insert/update line whose hash is already in the database.
-   // If not, since we dropped the table, no record is skipped, insert them all again.
-   if (unchangedStructure) {
-     // skip any line that is structure related,
-     // the structure is unchanged
-     if (/\/\* structure \*\/$/.test(sql)) {
-       return Promise.resolve()
-     }
-     // skip any record whose hash is not already in the DB
-     const hash = CHECKSUM_REGEXP.exec(sql)?.[1]
-     if (hash && !hashesInDb.includes(hash)) {
-       return Promise.resolve()
-     }
-   }
+ await dump.reduce(async (prev: Promise<void>, sql: string, index: number) => {
+   await prev
+   const hash = hashListFromTheDump[index]
+   // If the structure has not changed,
+   // skip any insert/update line whose hash is already in the database.
+   // If not, since we dropped the table, no record is skipped, insert them all again.
+   if (unchangedStructure) {
+     // skip any line that is structure related,
+     // the structure is unchanged
+     if (hash === "structure") {
+       return Promise.resolve()
+     }
+     // skip any record whose hash is not already in the DB
+     if (!hashesInDb.includes(hash)) {
+       return Promise.resolve()
+     }
+   }
```
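For clarity, the skip logic discussed in this suggestion is equivalent to the sequential loop sketched below (`execQuery` is a placeholder for whatever executes a statement against the database, not an actual function of the module). Note that this sketch skips rows whose hash is already in the database, which is the direction the author says the original condition had backwards:

```typescript
async function importDump(
  dump: string[],
  hashListFromTheDump: string[],
  unchangedStructure: boolean,
  hashesInDb: string[],
  execQuery: (sql: string) => Promise<void>,
): Promise<void> {
  for (let index = 0; index < dump.length; index++) {
    const sql = dump[index]
    const hash = hashListFromTheDump[index]
    if (unchangedStructure) {
      // the structure is unchanged: skip DROP/CREATE lines
      if (hash === 'structure') continue
      // skip any record whose hash is already in the DB
      if (hash && hashesInDb.includes(hash)) continue
    }
    // execute everything else sequentially, in dump order
    await execQuery(sql)
  }
}
```

The `reduce`-over-promises pattern in the diff achieves the same sequential execution; a plain loop just makes the ordering guarantee easier to see.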
Thanks @farnabaz, all good suggestions. I took them all. I also found out that the algorithm was backwards:
```js
// in D1, there is a bug where semicolons and comments can't work together
// so we need to split the SQL and remove the comment
// @see https://github.com/cloudflare/workers-sdk/issues/3892
const [statement, hash] = sql.split(' -- ')
```
This will fail if the query contains ` -- `. Imagine if the content contains an SQL snippet.

We can use the `hashListFromTheDump` array to find the line's hash value and remove it via `substring`. Something like:

```js
const hash = hashListFromTheDump[index]
const statement = sql.substring(0, sql.length - (' -- ' + hash).length)
```
Good catch, yeah!
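The difference between the two approaches can be sketched like this (the helper name is hypothetical; the known hash for each line would come from `hashListFromTheDump`):

```typescript
// Strip the trailing ` -- {hash}` suffix using the known hash's length,
// so a literal ' -- ' inside the statement body is left untouched.
function stripTrailingHash(sql: string, hash: string | undefined): string {
  if (!hash) return sql
  return sql.substring(0, sql.length - (' -- ' + hash).length)
}
```

A naive `sql.split(' -- ')` would instead truncate the statement at the first ` -- ` occurring inside the content itself.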
farnabaz left a comment:
LGTM!
Great work @elevatebart
Linked issue
closes #3151
Type of change
Description
This PR moves the caching level to each page/record instead of each collection. This allows for faster iteration.

It adds a `__hash__` field on each content table to store the hash of the current record.

Checklist