Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design Document for Zero-Copy via Loaned Messages #256

Merged
merged 16 commits into from
Feb 5, 2020
Merged

Conversation

mjcarroll
Copy link
Member

Design document for using loaned messages to provide support for zero-copy middleware implementations.

This is from notes that @Karsten1987 and @wjwwood composed, reformatted and expanded.

Signed-off-by: Michael Carroll <[email protected]>
Signed-off-by: Michael Carroll <[email protected]>
Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added some typo fixes. I haven't reviewed the whole thing yet, particularly the proposed API.

articles/zero_copy.md Outdated Show resolved Hide resolved
articles/zero_copy.md Outdated Show resolved Hide resolved
articles/zero_copy.md Outdated Show resolved Hide resolved
articles/zero_copy.md Outdated Show resolved Hide resolved
articles/zero_copy.md Outdated Show resolved Hide resolved
Co-Authored-By: Chris Lalancette <[email protected]>
@Karsten1987
Copy link
Contributor

fyi @michael-poehnl

articles/zero_copy.md Outdated Show resolved Hide resolved
articles/zero_copy.md Outdated Show resolved Hide resolved
Copy link
Member

@ivanpauno ivanpauno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the idea of loaned messages is the best solution for zero-copy intraprocess communication.
Currently, most of the middleware vendors we're supporting doesn't have a loaned message API.
So, it may be a good idea to also push development to add this in some of them.

Fixes #251.
That issue have some comments in line with this proposal.
Also, there is a similar previous discussion in #239.

rclcpp::Publisher::can_loan_messages()
```

### RCLCPP Subscription
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, that no modification in RCLCPP subscription would be needed.
Maybe we would need to add a new subscription signature more convenient for loaned messages, but not more than that. Probably, similar changes would be needed in rcl/rmw.

articles/zero_copy.md Show resolved Hide resolved
@Karsten1987
Copy link
Contributor

I think we talked about this, but I think we didn't come to an conclusion.

When we talk about taking sequences for messages which can be loaned, this implies that there has to be some sort of adjustment/enhancement on for a rmw_take_loaned_message_sequence function.
Right now I envision something like:

rmw_ret_t
rmw_take_loaned_message_sequence(rmw_message_sequence_t * message_sequence, size_t n);

Where n is the amount of messages being fetched and stored in the message sequence.

If I bring this all the way up to a possible implementation in the rclcpp::Executor::execute_subscription function, this would like as such:

void
Executor::execute_subscription(
  rclcpp::SubscriptionBase::SharedPtr subscription)
{
  rmw_message_info_t message_info;
  message_info.from_intra_process = false;

  // handle serialized messages
  if (subscription->is_serialized()) {
    auto serialized_msg = subscription->create_serialized_message();
    auto ret = rcl_take_serialized_message(
      subscription->get_subscription_handle().get(),
      serialized_msg.get(), &message_info, nullptr);
    if (RCL_RET_OK == ret) {
      auto void_serialized_msg = std::static_pointer_cast<void>(serialized_msg);
      subscription->handle_message(void_serialized_msg, message_info);
    } else if (RCL_RET_SUBSCRIPTION_TAKE_FAILED != ret) {
      RCUTILS_LOG_ERROR_NAMED(
        "rclcpp",
        "take_serialized failed for subscription on topic '%s': %s",
        subscription->get_topic_name(), rcl_get_error_string().str);
      rcl_reset_error();
    }
    subscription->return_serialized_message(serialized_msg);
    return;
  }

  // if a middleware can provide loaned messages, we always take them as such
  if (subscription->can_loan_messages()) {
    size_t n = 1;  // how many messages are we fetch at a time
    // Could be part of the subscription API in case of polling
    // Maybe take a parameter n for reserving space in the loaned message sequence
    auto LoanedMessageSequence lms = subscription->create_loaned_sequence(n);
    auto ret = rcl_take_loaned_message_sequence(lms.get_sequence_handle(), n);
    if (RCL_RET_OK == ret) {
      subscription->handle_message(lms.msg_at(0), lms.msg_info_at(0));
    } else if (RCL_RET_TAKE_FAILED != ret) {
      RCUTILS_LOG_ERROR_NAMED(
        "rclcpp",
        "could not take loaned message sequence on topic '%s': %s",
        subscription->get_topic_name(), rcl_get_error_string().str);
      rcl_reset_error();
    }
    subscription->return_loaned_sequence(lms);
    return;
  }

  // if the middleware can't loan message, we create our own instances and take them
  std::shared_ptr<void> message = subscription->create_message();
  auto ret = rcl_take(
    subscription->get_subscription_handle().get(),
    message.get(), &message_info, nullptr);
  if (RCL_RET_OK == ret) {
    subscription->handle_message(message, message_info);
  } else if (RCL_RET_SUBSCRIPTION_TAKE_FAILED != ret) {
    RCUTILS_LOG_ERROR_NAMED(
      "rclcpp",
      "could not deserialize serialized message on topic '%s': %s",
      subscription->get_topic_name(), rcl_get_error_string().str);
    rcl_reset_error();
  }
  subscription->return_message(message);
}

Looking at the scribble above, you can see an inconsistency between serialized/non-loaned messages and loaned messages. The first two types are always fetch one-by-one, whereas loaned messages are always fetched in a sequence - main reason for this being a strong requirements for RTI Connext Micro in order to enable zero-copy.

The question which arises now is whether any rmw_take_* should be modified in a way that it always takes n instances or if we restrict that to loaned messages exclusively.
If we did change rmw_take to always take multiple, then we have to push this consequently to rmw_serialized_messages as well as ROS messages allocated by the rclcpp::MessageMemoryStrategy and implement C-based container structures for it.

My thinking here is that in a context of an executor, we probably would get around implementing the message sequences for serialized and non-loaned messages as we handle one callback at a time.
But thinking this further and having the ability to use the API outside of the scope of an executor (by polling the waitset or having different implementations of an executor), I personally believe it would be the right thing to do and extend the sequencing also to these types. As the rmw_take_multiple functions can be implemented next to the existing rmw_take, we can still use the latter inside the executor.

Opinions?

In the second case, the memory that is used for the loaned message should be optionally provided by the middleware.
For example, the middleware may use this opportunity to use a preallocated pool of message instances, or it may return instances of messages allocated in shared memory.
However, if the middleware does not have special memory handling or preallocations, it may refuse to loan memory to the client library, and in that case, a user provided allocator should be used.
This allows the user to have control over the allocation of the message when the middleware might otherwise use the standard allocator (new/malloc).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may refuse to loan memory to the client library, and in that case, a user provided allocator should be used.

this makes a lot of sense.

This allows the user to have control over the allocation of the message when the middleware might otherwise use the standard allocator (new/malloc)

This seems to contradict that prior statement, though. Maybe I'm not reading this quite right?

I think this should read something more along the lines of "If the middleware does not have specific protocols for handling preallocations of messages, it should use the rcutils allocator."

I could be way off base in my interpretation, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it could be more clear.

"If the middleware does not have specific protocols for handling preallocations of messages, it should use the rcutils allocator."

I think this is almost right, but I don't think the user is limited to the rcutils allocator. They could provide their own.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the user is limited to the rcutils allocator. They could provide their own.

Yeah, I like that better. 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this need additional clarification? I believe that what is there summarizes the correct behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reading what's in the document, it makes sense to me. I don't have a solid suggestion for changing it.

articles/zero_copy.md Outdated Show resolved Hide resolved

```
void *
rcl_allocate_loaned_message(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has the message already been loaned? If it is planned to be loaned, maybe this should be called rcl_allocate_loaner_message?

@Karsten1987
Copy link
Contributor

connected to ros2/ros2#785

articles/zero_copy.md Outdated Show resolved Hide resolved
@dirk-thomas dirk-thomas added the more-information-needed Further information is required label Sep 30, 2019
Signed-off-by: Michael Carroll <[email protected]>
@mjcarroll mjcarroll marked this pull request as ready for review October 15, 2019 19:49
@wuffle-ros wuffle-ros bot added the in review Waiting for review (Kanban column) label Oct 15, 2019
@dirk-thomas dirk-thomas removed the more-information-needed Further information is required label Oct 15, 2019
articles/zero_copy.md Outdated Show resolved Hide resolved
Co-Authored-By: Alberto Soragna <[email protected]>
articles/zero_copy.md Outdated Show resolved Hide resolved
articles/zero_copy.md Outdated Show resolved Hide resolved
articles/zero_copy.md Outdated Show resolved Hide resolved
In the second case, the memory that is used for the loaned message should be optionally provided by the middleware.
For example, the middleware may use this opportunity to use a preallocated pool of message instances, or it may return instances of messages allocated in shared memory.
However, if the middleware does not have special memory handling or preallocations, it may refuse to loan memory to the client library, and in that case, a user provided allocator should be used.
This allows the user to have control over the allocation of the message when the middleware might otherwise use the standard allocator (new/malloc).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it could be more clear.

"If the middleware does not have specific protocols for handling preallocations of messages, it should use the rcutils allocator."

I think this is almost right, but I don't think the user is limited to the rcutils allocator. They could provide their own.

articles/zero_copy.md Outdated Show resolved Hide resolved
mjcarroll and others added 4 commits January 30, 2020 12:16
Signed-off-by: Michael Carroll <[email protected]>
Signed-off-by: Michael Carroll <[email protected]>
Signed-off-by: Michael Carroll <[email protected]>
Copy link
Member

@jacobperron jacobperron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more minor nitpicks. Besides that there are still some unresolved comments that would be good to address before merging this.

articles/zero_copy.md Outdated Show resolved Hide resolved
articles/zero_copy.md Outdated Show resolved Hide resolved
* The user must be able to get a loaned message from the middleware when calling take.
* The user must be able to get a sequence of loaned messages from the middleware when calling take.
* The loaned message or sequence must be returned by the user.
* In general, users should return loans as soon as feasibly possible, as the underlying mechanism has finite resources to loan messages from. Holding loans too long may cause messages to dropped or publications to stall.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One sentence per line.

@jacobperron
Copy link
Member

The question which arises now is whether any rmw_take_* should be modified in a way that it always takes n instances or if we restrict that to loaned messages exclusively.

It makes sense to me to also extend the "take N" functionality to non-loaned messages, but we probably don't need to include this detail/discussion in this document since it is not exactly related to loaned messages. Probably worth discussing as part of the implementation ticket or a new issue.

@mjcarroll
Copy link
Member Author

Probably worth discussing as part of the implementation ticket or a new issue.

Agreed.

mjcarroll and others added 2 commits January 30, 2020 16:50
Signed-off-by: Michael Carroll <[email protected]>
Copy link
Member

@jacobperron jacobperron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

articles/zero_copy.md Show resolved Hide resolved
@mjcarroll mjcarroll merged commit 0df1296 into gh-pages Feb 5, 2020
@delete-merged-branch delete-merged-branch bot deleted the zero_copy branch February 5, 2020 16:06
@dejanpan dejanpan mentioned this pull request Feb 20, 2020
3 tasks
Karsten1987 added a commit that referenced this pull request Apr 1, 2020
We've noticed that Eclipse Iceoryx is missing. Most likely overseen while reviewing the original PR for this doc: #256 (comment)
Karsten1987 added a commit that referenced this pull request Apr 6, 2020
We've noticed that Eclipse Iceoryx is missing. Most likely overseen while reviewing the original PR for this doc: #256 (comment)
mlanting pushed a commit to mlanting/design that referenced this pull request Aug 17, 2020
* Loaned messages introduction and motivation

Signed-off-by: Michael Carroll <[email protected]>

Co-Authored-By: Jacob Perron <[email protected]>
Co-authored-by: Chris Lalancette <[email protected]>
Co-authored-by: Alberto Soragna <[email protected]>
Co-authored-by: Jacob Perron <[email protected]>
mlanting pushed a commit to mlanting/design that referenced this pull request Aug 17, 2020
We've noticed that Eclipse Iceoryx is missing. Most likely overseen while reviewing the original PR for this doc: ros2#256 (comment)
mlanting pushed a commit to mlanting/design that referenced this pull request Aug 17, 2020
* Loaned messages introduction and motivation

Signed-off-by: Michael Carroll <[email protected]>

Co-Authored-By: Jacob Perron <[email protected]>
Co-authored-by: Chris Lalancette <[email protected]>
Co-authored-by: Alberto Soragna <[email protected]>
Co-authored-by: Jacob Perron <[email protected]>
mlanting pushed a commit to mlanting/design that referenced this pull request Aug 17, 2020
We've noticed that Eclipse Iceoryx is missing. Most likely overseen while reviewing the original PR for this doc: ros2#256 (comment)
@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-galactic-default-middleware-announced/18064/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in review Waiting for review (Kanban column)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants