Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batches with invalid contents should be flagged and skipped during aggregation #1270

Open
awrichar opened this issue Apr 12, 2023 · 1 comment

Comments

@awrichar
Copy link
Contributor

Currently if FireFly receives a batch with any malformed contents (such as a mismatched key due to #1175), it writes the batch to the database but none of the messages or data.

return nil, false, nil // This is not retryable. skip this batch

However, the saved batch has no marking to indicate that the contents were invalid. Therefore every time the aggregator rewinds to this batch, it will stall with "message not yet available".

l.Debugf("Message '%s' in batch '%s' is not yet available", msgEntry.ID, manifest.ID)

When batch contents are skipped in this fashion, FireFly should actually mark the batch somehow, and when the aggregator attempts to process a message from that batch, it should simply mark the pin dispatched and move on. This is different from rejecting the message, because the message will never even be inserted to the database - but there's no point in blocking the pin forever, since the message will never be valid.

@awrichar awrichar changed the title Batches with invalid contents should be flagged an skipped during aggregation Batches with invalid contents should be flagged and skipped during aggregation Apr 12, 2023
@awrichar
Copy link
Contributor Author

One follow-on observation to note - we are performing certain batch validation checks only in the handler for shared storage downloads.

Ideally the exact same checks would be performed on a batch 1) before sending it, and 2) immediately after downloading it from either DX or shared storage. Anything we can do to unify those validation paths would help, because even if we fix the root issue noted above, there's a secondary problem in that the sender and the receivers are in different states (sender thinks the batch was ok).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant