feat: page level caching #3158
Conversation
This PR still needs some extra tests for updating an existing db.
Live testing has been done on this PR. I would love some help with E2E testing of updates. If you have the time to show me the ropes, I am all ears. Test scenarios:
To be complete, docs should be in multiple directories so re-ordering has an effect.
Generally, I like the idea.

I have concerns about the solution, though. Removing the drop queries is a breaking change for the full dump and breaks NuxtHub deployment (NuxtHub uses this full dump to refill the database).

What if we keep the drop query and prefix it with a structure checksum, `/* checksum */ DROP ...`, then ignore these lines when the checksum has not changed?

We can add similar prefixes for the insert and update queries. This will simplify hash detection here:

```js
if (unchangedStructure && (sql.startsWith('INSERT ') || sql.startsWith('UPDATE ')) && !indexesToInsert.includes(index)) {
```
The dump can look like:

```sql
/* structure:{HASH} */ DROP TABLE ...
/* structure:{HASH} */ CREATE TABLE ...
/* hash:{ROW_HASH} */ INSERT ...
/* hash:{ROW_HASH} */ UPDATE ...
```

With these comments, the dump stays executable outside of our logic and we can optimize our import.

WDYT?
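A minimal sketch (in TypeScript, with hypothetical helper names — `parseDumpLine` and `shouldExecute` are not the PR's actual API) of how such prefixed dump lines could be parsed and filtered on import, assuming the `/* structure:{HASH} */` and `/* hash:{ROW_HASH} */` prefixes proposed above:

```typescript
interface DumpLine {
  kind: 'structure' | 'row' | 'plain'
  hash?: string
  sql: string
}

// Matches a leading `/* structure:HASH */ ` or `/* hash:HASH */ ` comment.
const LINE_PREFIX = /^\/\* (structure|hash):(\S+) \*\/ /

function parseDumpLine(line: string): DumpLine {
  const m = LINE_PREFIX.exec(line)
  if (!m) return { kind: 'plain', sql: line }
  return {
    kind: m[1] === 'structure' ? 'structure' : 'row',
    hash: m[2],
    sql: line.slice(m[0].length),
  }
}

// Decide whether a parsed line must be executed on import:
// structure lines only run when the structure changed, and row lines
// are skipped when their hash is already in the database.
function shouldExecute(line: DumpLine, structureChanged: boolean, knownHashes: Set<string>): boolean {
  if (line.kind === 'structure') return structureChanged
  if (line.kind === 'row' && !structureChanged) return !knownHashes.has(line.hash!)
  return true
}
```

Because the markers are plain SQL comments, the same dump still runs unmodified against any SQL engine outside this import logic.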
```js
// If the structure has not changed,
// skip any insert/update line whose hash is already in the database.
// If not, since we dropped the table, no record is skipped, insert them all again.
if (unchangedStructure && (sql.startsWith('INSERT ') || sql.startsWith('UPDATE ')) && !indexesToInsert.includes(index)) {
```
The values of `indexesToInsert` do not match the indexes of the queries in the dump, therefore `indexesToInsert.includes(index)` is not a valid condition to check.
The hashes indexed here come directly from the dump:
- They are ordered the same way as the insert queries; I make sure of that at build time.
- The hash table is padded at the beginning with the exact number of lines in the init. If the init has 3 lines, we insert 3 empty strings at the beginning.
- That is why I would have loved a few e2e tests.
- I like your idea of having a hash at the beginning of the line; it makes the code simpler to understand.
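To illustrate the alignment described above, here is a sketch (the helper name is hypothetical) of padding the row-hash list with one empty string per init line, so that its indexes line up with the dump's line indexes:

```typescript
// Pad the row-hash list so hashList[i] corresponds to dump[i].
// If the init section of the dump has `initLineCount` lines, prepend that
// many empty strings; an empty string means "no record hash for this line".
function buildAlignedHashList(initLineCount: number, rowHashes: string[]): string[] {
  return [...new Array<string>(initLineCount).fill(''), ...rowHashes]
}
```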
I have tweaked the system a little. Example of the resulting dump:

```sql
-- ["VwBBpsSOHZ","EO7IcoosJP"]
CREATE TABLE IF NOT EXISTS _content_info (id TEXT PRIMARY KEY, "ready" BOOLEAN, "structureVersion" VARCHAR, "version" VARCHAR, "__hash__" TEXT UNIQUE);
/* starting_init */ INSERT INTO _content_info VALUES ('checksum_content', false, 'fr2JfCq2gl', 'v3.2.3--dE9pgeuPBt', '3hvYqGvzZ8');
DROP TABLE IF EXISTS _content_content;
CREATE TABLE IF NOT EXISTS _content_content (id TEXT PRIMARY KEY, "title" VARCHAR, "body" TEXT, "description" VARCHAR, "extension" VARCHAR, "meta" TEXT, "navigation" TEXT DEFAULT true, "path" VARCHAR, "seo" TEXT DEFAULT '{}', "stem" VARCHAR, "__hash__" TEXT UNIQUE);
/* VwBBpsSOHZ */ INSERT INTO _content_content VALUES ('content/about.md', 'About', '{"type":"minimal","value":[["h1",{"id":"about"},"About"]],"toc":{"title":"","searchDepth":2,"depth":2,"links":[]}}', '', 'md', '{"booleanField":false,"numberField":123,"arrayField":["item3","item4"]}', 'true', '/about', '{"title":"About","description":""}', 'about', 'VwBBpsSOHZ');
/* EO7IcoosJP */ INSERT INTO _content_content VALUES ('content/index.md', 'Home page', '{"type":"minimal","value":[["h1",{"id":"home-page"},"Home page"]],"toc":{"title":"","searchDepth":2,"depth":2,"links":[]}}', '', 'md', '{"booleanField":true,"numberField":1,"arrayField":["item1","item2"]}', 'true', '/', '{"title":"Home page","description":""}', 'index', 'EO7IcoosJP');
/* successful_init */ UPDATE _content_info SET ready = true WHERE id = 'checksum_content';
```

Note: I kept the first line of hashes as an index that I do not need to rebuild as I run through the dump. I will optimize for readability and remove that first line.
In that last commit I changed the format of the dump once more:

```sql
CREATE TABLE IF NOT EXISTS _content_info (id TEXT PRIMARY KEY, "ready" BOOLEAN, "structureVersion" VARCHAR, "version" VARCHAR, "__hash__" TEXT UNIQUE); /* structure */
INSERT INTO _content_info VALUES ('checksum_content', false, 'RYgZDcy6sk', 'v3.2.3--DZOppTJAd2', 'synekhxovA');
DROP TABLE IF EXISTS _content_content; /* structure */
CREATE TABLE IF NOT EXISTS _content_content (id TEXT PRIMARY KEY, "title" VARCHAR, "arrayField" TEXT, "body" TEXT, "booleanField" BOOLEAN, "description" VARCHAR, "extension" VARCHAR, "meta" TEXT, "navigation" TEXT DEFAULT true, "numberField" INT, "path" VARCHAR, "seo" TEXT DEFAULT '{}', "stem" VARCHAR, "__hash__" TEXT UNIQUE); /* structure */
INSERT INTO _content_content VALUES ('content/index.md', 'Home page', '["item1","item2"]', '{"type":"minimal","value":[["h1",{"id":"home-page"},"Home page"]],"toc":{"title":"","searchDepth":2,"depth":2,"links":[]}}', true, '', 'md', '{}', 'true', 1, '/', '{"title":"Home page","description":""}', 'index', 'JApZF0Dncz'); /* checksum: JApZF0Dncz */
UPDATE _content_info SET ready = true WHERE id = 'checksum_content';
```

This feels more readable to both human beings and SQL scripts. All comments are at the end to avoid changing the tests.
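A sketch of how the trailing comments in this format could be classified on import. The regexp shapes are assumptions based on the dump above, not necessarily the PR's exact code:

```typescript
// Trailing-comment markers in the dump format shown above.
const STRUCTURE_REGEXP = /\/\* structure \*\/$/
const CHECKSUM_REGEXP = /\/\* checksum: (\S+) \*\/$/

type LineInfo = { type: 'structure' } | { type: 'record', hash: string } | { type: 'other' }

// Classify one dump line by its trailing comment (if any).
function classifyLine(sql: string): LineInfo {
  if (STRUCTURE_REGEXP.test(sql)) return { type: 'structure' }
  const match = CHECKSUM_REGEXP.exec(sql)
  if (match) return { type: 'record', hash: match[1] }
  return { type: 'other' }
}
```

Anchoring both regexps at the end of the line (`$`) is what makes the format robust: a `/* structure */` string inside row content would not be at the end of the statement.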
farnabaz left a comment:
As the comments moved to the end, we can use `--` comments, which helps us improve performance a bit more.
src/module.ts (outdated)
```diff
  const collectionHash = hash(collection)
  const collectionQueries = generateCollectionTableDefinition(collection, { drop: true })
-   .split('\n')
+   .split('\n').map(q => `${q} /* structure */`)
```
Suggested change:

```diff
- .split('\n').map(q => `${q} /* structure */`)
+ .split('\n').map(q => `${q} -- structure`)
```
As we moved the comments to the end, we can use `--` comments, which helps us improve performance a bit more.
src/module.ts (outdated)
```diff
  list.sort((a, b) => String(a[0]).localeCompare(String(b[0])))

- collectionQueries.push(...list.flatMap(([, sql]) => sql!))
+ collectionQueries.push(...list.flatMap(([, sql, hash]) => sql.map(q => `${q} /* checksum: ${hash} */`)))
```
Suggested change:

```diff
- collectionQueries.push(...list.flatMap(([, sql, hash]) => sql.map(q => `${q} /* checksum: ${hash} */`)))
+ collectionQueries.push(...list.flatMap(([, sql, hash]) => sql.map(q => `${q} -- ${hash}`)))
```
src/module.ts (outdated)
```js
`${generateCollectionTableDefinition(infoCollection, { drop: false })} /* structure */`,
...generateCollectionInsert(infoCollection, { id: `checksum_${collection.name}`, version, structureVersion, ready: false }).queries,
```
Suggested change:

```diff
- `${generateCollectionTableDefinition(infoCollection, { drop: false })} /* structure */`,
- ...generateCollectionInsert(infoCollection, { id: `checksum_${collection.name}`, version, structureVersion, ready: false }).queries,
+ `${generateCollectionTableDefinition(infoCollection, { drop: false })} -- structure`,
+ ...generateCollectionInsert(infoCollection, { id: `checksum_${collection.name}`, version, structureVersion, ready: false }).queries.map(sql => `${sql} -- meta`),
```
```js
const hashListFromTheDump: string[] = dump.map((sql) => {
  return CHECKSUM_REGEXP.exec(sql)?.[1]
}).filter(Boolean) as string[]
```
Suggested change:

```diff
- const hashListFromTheDump: string[] = dump.map((sql) => {
-   return CHECKSUM_REGEXP.exec(sql)?.[1]
- }).filter(Boolean) as string[]
+ const hashListFromTheDump: string[] = dump.map(row => row.split(' -- ').pop())
```
```js
await dump.reduce(async (prev: Promise<void>, sql: string) => {
  await prev

  // If the structure has not changed,
  // skip any insert/update line whose hash is already in the database.
  // If not, since we dropped the table, no record is skipped, insert them all again.
  if (unchangedStructure) {
    // skip any line that is structure related,
    // the structure is unchanged
    if (/\/\* structure \*\/$/.test(sql)) {
      return Promise.resolve()
    }

    // skip any record whose hash is not already in the DB
    const hash = CHECKSUM_REGEXP.exec(sql)?.[1]
    if (hash && !hashesInDb.includes(hash)) {
      return Promise.resolve()
    }
  }
```
Suggested change:

```diff
- await dump.reduce(async (prev: Promise<void>, sql: string) => {
-   await prev
-   // If the structure has not changed,
-   // skip any insert/update line whose hash is already in the database.
-   // If not, since we dropped the table, no record is skipped, insert them all again.
-   if (unchangedStructure) {
-     // skip any line that is structure related,
-     // the structure is unchanged
-     if (/\/\* structure \*\/$/.test(sql)) {
-       return Promise.resolve()
-     }
-     // skip any record whose hash is not already in the DB
-     const hash = CHECKSUM_REGEXP.exec(sql)?.[1]
-     if (hash && !hashesInDb.includes(hash)) {
-       return Promise.resolve()
-     }
-   }
+ await dump.reduce(async (prev: Promise<void>, sql: string, index: number) => {
+   await prev
+   const hash = hashListFromTheDump[index]
+   // If the structure has not changed,
+   // skip any insert/update line whose hash is already in the database.
+   // If not, since we dropped the table, no record is skipped, insert them all again.
+   if (unchangedStructure) {
+     // skip any line that is structure related,
+     // the structure is unchanged
+     if (hash === "structure") {
+       return Promise.resolve()
+     }
+     // skip any record whose hash is not already in the DB
+     if (!hashesInDb.includes(hash)) {
+       return Promise.resolve()
+     }
+   }
```
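For clarity, the skip logic discussed in this suggestion is equivalent to the sequential loop sketched below (`execQuery` is a placeholder for whatever executes a statement against the database, not an actual function of the module). Note that this sketch skips rows whose hash is already in the database, which is the direction the author says the original condition had backwards:

```typescript
async function importDump(
  dump: string[],
  hashListFromTheDump: string[],
  unchangedStructure: boolean,
  hashesInDb: string[],
  execQuery: (sql: string) => Promise<void>,
): Promise<void> {
  for (let index = 0; index < dump.length; index++) {
    const sql = dump[index]
    const hash = hashListFromTheDump[index]
    if (unchangedStructure) {
      // the structure is unchanged: skip DROP/CREATE lines
      if (hash === 'structure') continue
      // skip any record whose hash is already in the DB
      if (hash && hashesInDb.includes(hash)) continue
    }
    // execute everything else sequentially, in dump order
    await execQuery(sql)
  }
}
```

The `reduce`-over-promises pattern in the diff achieves the same sequential execution; a plain loop just makes the ordering guarantee easier to see.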
Thanks @farnabaz, all good suggestions. I took them all. I also found out that the algorithm was backwards:
```js
// in D1, there is a bug where semicolons and comments can't work together
// so we need to split the SQL and remove the comment
// @see https://github.com/cloudflare/workers-sdk/issues/3892
const [statement, hash] = sql.split(' -- ')
```
This will fail if the query contains ` -- `. Imagine if the content contains an SQL snippet.

We can use the `hashListFromTheDump` array to find the line's hash value and remove it via `substring`. Something like:

```js
const hash = hashListFromTheDump[index]
const statement = sql.substring(0, sql.length - (' -- ' + hash).length)
```
Good catch, yeah!
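The difference between the two approaches can be sketched like this (the helper name is hypothetical; the known hash for each line would come from `hashListFromTheDump`):

```typescript
// Strip the trailing ` -- {hash}` suffix using the known hash's length,
// so a literal ' -- ' inside the statement body is left untouched.
function stripTrailingHash(sql: string, hash: string | undefined): string {
  if (!hash) return sql
  return sql.substring(0, sql.length - (' -- ' + hash).length)
}
```

A naive `sql.split(' -- ')` would instead truncate the statement at the first ` -- ` occurring inside the content itself.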
farnabaz left a comment:
LGTM!
Great work @elevatebart
Linked issue
closes #3151
Type of change
Description
This PR moves the caching level to each page/record instead of each collection. This allows for faster iteration.

It adds a `__hash__` field on each content table to store the hash of the current record.

Checklist