FLIP for Handing Messages from Networking Layer to Engines#343
AlexHentschel merged 6 commits into master from
Conversation
Great writeup 👏 . Generally very much in favour of this change. A few suggestions and comments noted below.
Networking Layer Behaviour
Something you point out here is that the application layer is better positioned to manage queued messages. I think we should take the full step and say that the engine must manage message queueing, and that implementations of MessageProcessor must be non-blocking.
This makes the separation of concerns more clear: the networking layer passes messages to engines as quickly as it can, does not store/queue messages, and will drop a message if the MessageProcessor doesn't accept it; the engine is responsible for all queueing, management of the message queue, and message processing. I also think this is necessary to fully solve the problem you point out:
If one of the engines (e.g. Engine C) gets overwhelmed with a large number of messages, this engine will likely delay messages for all other engines, as the workers will soon all be blocked by the overwhelmed engine.
Allowing non-blocking MessageProcessors isn't sufficient to fix this -- we need to enforce that they are non-blocking. Explicit use of channels in the API would make this fairly simple to implement:
// in networking layer
channel := messageProcessor.RecvChan()
select {
case channel <- msg:
case <-time.After(timeout):
	log.Warn().Str("msg_type", msg.Type()).Msg("dropped message")
}
Interface Changes
I think we should remove the network.Engine interface and the SubmitLocal, Process functions that are copied into every engine to satisfy it. These are rarely used and only exist because network.Engine was originally created with more methods than it needed. I think we should replace the instances of in-process engines communicating through these methods with context-specific methods (i.e., rather than doing engine.ProcessLocal(block), do engine.HandleBlock(block)).
Minor suggestion: make MessageProcessor an interface rather than a function type to discourage the use of anonymous functions.
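The two interface suggestions above could be sketched as follows. This is only an illustration of the shape of the change; the type and method names (Block, Message, BlockHandler, HandleBlock) are assumptions for the sketch, not the actual Flow API:

```go
package main

import "fmt"

// Illustrative stand-ins; not the actual Flow types.
type Block struct{ ID string }
type Message struct{ Payload string }

// MessageProcessor as an interface (rather than a function type)
// discourages anonymous functions and lets implementations carry state.
type MessageProcessor interface {
	// Process must be non-blocking: enqueue internally and return.
	Process(originID string, msg Message) error
}

// A context-specific method replaces the generic ProcessLocal:
// in-process callers say what they mean.
type BlockHandler interface {
	HandleBlock(block *Block) error
}

type ingestionEngine struct{}

func (e *ingestionEngine) HandleBlock(block *Block) error {
	fmt.Printf("ingesting block %s\n", block.ID)
	return nil
}

func main() {
	var h BlockHandler = &ingestionEngine{}
	_ = h.HandleBlock(&Block{ID: "0xabc"})
}
```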
Thanks for the write-up. Following on what Jordan mentioned, the network layer will try to push the incoming message to the I can remove the
…Process` methods (from the interface) and replace them with context-specific methods
Thanks @jordanschalm and @vishalchangrani for the great feedback and detailed suggestions. Updated the FLIP and incorporated your suggestions:
Regarding the suggestion to use a channel as part of the API:
I feel that channels are fairly restrictive in such high-level APIs. Some thoughts:
@AlexHentschel - agree with what you say - we would need boilerplate code if we use channels, for sure. However, I would like to suggest that the Thanks for putting this proposal together 👍
💯
Great write up and improvement! 👍 I like the idea to simplify the network interface. Just have some comments:
Some clarification: the idea is to have some bound on the invocation of But I agree with Alex's point that this is restrictive. Given that it's a private API, I'm happy to not enforce the non-blocking requirement the way I suggested. We could detect misbehaving engines with metrics on the timing of
Spoke with @yhassanzadeh13 and @noisygerman and we think it will be best to retain the message queue at the networking layer even after the queuing logic has been added to the individual engines because:
However, once the engines have the queue implemented, the networking queue can:
totally on-board with retaining the queue at the networking level.
Not sure what the utility of more workers would be. When engines are non-blocking (and queueing messages internally), a few workers should suffice to deliver the messages to the engines. I am worried that an excessive number of workers might actually obfuscate problems at the engine level: e.g. if an engine has a bug and blocks a worker, we would only see this very rarely when there are 1000 workers.
FLIP 343 to generalize the API through which the Networking Layer hands messages to Engines