Skip to content

feat: make storage purging default, add --resurrect#729

Merged
vladfrangu merged 4 commits intomasterfrom
cleanup-purge-behavior
Jun 12, 2025
Merged

feat: make storage purging default, add --resurrect#729
vladfrangu merged 4 commits intomasterfrom
cleanup-purge-behavior

Conversation

@vladfrangu
Copy link
Member

@vladfrangu vladfrangu commented Jan 15, 2025

Closes #590

BREAKING CHANGE:

Purging in projects using Crawlee for SDK v3+ now purge by default, so using --purge flag is no longer necessary. Use --no-purge to keep the storage folder intact.
The purge-queue, purge-dataset and purge-key-value-store flags have been removed, and the logic of all three was combined into the purge flag.

@vladfrangu vladfrangu requested a review from B4nan as a code owner January 15, 2025 19:12
@github-actions github-actions bot added this to the 106th sprint - Tooling team milestone Jan 15, 2025
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Jan 15, 2025
Comment on lines -199 to -249
// TODO: deprecate these flags
if (this.flags.purgeQueue && !this.flags.purge) {
await purgeDefaultQueue();
info({ message: 'Default local request queue was purged.' });
}

if (this.flags.purgeDataset && !this.flags.purge) {
await purgeDefaultDataset();
info({ message: 'Default local dataset was purged.' });
}

if (this.flags.purgeKeyValueStore && !this.flags.purge) {
await purgeDefaultKeyValueStore();
info({ message: 'Default local key-value store was purged.' });
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's hope this won't be missed by anyone 🙃 worst case we can add it back

btw I hope oclif wont silently ignore undeclared flags?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will throw for unknown flags... Do we want to define them as hidden flags so it doesn't throw for users?

Code - 2025-01-16 at 11 33 22

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's good, if they are not doing anything we need to tell potential users

@netmilk
Copy link
Contributor

netmilk commented Jan 16, 2025

Hi guys, I've just realized that if we want the CLI to behave consistently with how the Apify platform works, we shouldn't purge entirely by default, but instead the CLI should just keep adding local runs for later introspection and comparison. I know you're making it on par with Crawlee's behavior, but I think this will introduce further divergence.

@B4nan
Copy link
Member

B4nan commented Jan 16, 2025

I am not sure about that, regular actor developer runs, runs, and runs. They would end up with a lot of junk locally, I am not sure people would like such a default.

Also, I am not sure if this is a "divergence", we are not doing that now either, the CLI never considered multiple storage folders, neither the actual consumers of the storage folder (crawlee/sdk). We could surely just rename the existing storage folder instead of wiping it, but should that really be a default?

@B4nan B4nan changed the title refactor: cleanup purging behavior for CLI fix: cleanup purging behavior for CLI Jan 16, 2025
@netmilk
Copy link
Contributor

netmilk commented Jan 16, 2025

@B4nan Understood. Divergence from the platform behavior is what I'm concerned about. You run it there and the results always persist by default under a new run. But unifying the behavior of our local tooling is definitely a way to go.

@B4nan
Copy link
Member

B4nan commented Jan 16, 2025

@vladfrangu lets describe the BCs in the PR description

@vladfrangu
Copy link
Member Author

I also found out you can group flags in the help menu. Hows this look? @netmilk @B4nan

Code - 2025-01-16 at 11 51 25

@B4nan
Copy link
Member

B4nan commented Jan 16, 2025

nit: should be Node.js, not node.js

@B4nan
Copy link
Member

B4nan commented Jan 16, 2025

also i double checked with others, python works the same as JS, auto purge for crawlee or SDK (even without crawlee) v2+

@vladfrangu vladfrangu requested a review from B4nan February 4, 2025 00:01
@vladfrangu vladfrangu changed the title fix: cleanup purging behavior for CLI feat: make storage purging default, add --resurrect Jun 12, 2025
@vladfrangu vladfrangu force-pushed the cleanup-purge-behavior branch from d8bd43b to f0b5fe8 Compare June 12, 2025 12:52
@github-actions github-actions bot added the tested Temporary label used only programatically for some analytics. label Jun 12, 2025
@vladfrangu vladfrangu merged commit 8dff93a into master Jun 12, 2025
22 checks passed
@vladfrangu vladfrangu deleted the cleanup-purge-behavior branch June 12, 2025 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Clean up the purging logic for crawlee projects

3 participants