
Make /tf_static use transient_local durability #160

Merged — sloretz merged 4 commits into ros2 from static_transient_local on Sep 10, 2019
Conversation

@sloretz (Contributor) commented Aug 29, 2019

This makes /tf_static use transient_local durability to get behavior like ROS 1 latching.

The static broadcaster has a queue depth of 1, so only the latest message from a broadcaster is given to late-joining listeners. This matches latching in ROS 1.
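For illustration, a minimal sketch (not code from this PR) of a publisher with these QoS settings; the node name is made up, while /tf_static and tf2_msgs::msg::TFMessage are the actual topic and message type:

#include <memory>

#include <rclcpp/rclcpp.hpp>
#include <tf2_msgs/msg/tf_message.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("static_broadcaster_sketch");

  // KEEP_LAST with depth 1 plus TRANSIENT_LOCAL: the middleware stores only the
  // most recent message and re-delivers it to late-joining subscribers.
  rclcpp::QoS qos(rclcpp::KeepLast(1));
  qos.transient_local();

  auto pub = node->create_publisher<tf2_msgs::msg::TFMessage>("/tf_static", qos);

  tf2_msgs::msg::TFMessage msg;  // a real broadcaster would fill in transforms here
  pub->publish(msg);             // published once; late joiners will still receive it

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}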

@sloretz added the enhancement (New feature or request) label on Aug 29, 2019
@sloretz self-assigned this on Aug 29, 2019
@sloretz (Contributor, Author) commented Aug 29, 2019

CI (testing all packages above tf2_ros)

  • Linux
  • Linux-aarch64
  • macOS
  • Windows

@sloretz added the in review (Waiting for review, Kanban column) label on Aug 29, 2019
@clalancette (Contributor) left a review:

Generally looks OK to me. There is one nit and one documentation thing that we should do.

Review thread on tf2_ros/include/tf2_ros/qos.hpp (outdated, resolved).

Review thread on the following diff lines:
// broadcaster.sendTransform(msg);
// ROS_INFO("Spinning until killed publishing %s to %s", msg.header.frame_id.c_str(), msg.child_frame_id.c_str());
// rclcpp::spin(node);
broadcaster.sendTransform(msg);
@clalancette (Contributor) commented on the diff above:
In general, I think this is the right change to make. However, there is some possibility of downstream breakage if existing code subscribes to the /tf_static topic directly, without using the TransformListener class and without transient_local durability. I'll request that you make a release note for Eloquent that states that we are changing the behavior of the static_transform_publisher program.
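To make the breakage concern concrete, here is a hedged sketch (assumed usage, not code from this PR) of a plain subscription to /tf_static that bypasses TransformListener; it now has to request TRANSIENT_LOCAL itself, otherwise it will not receive static transforms that were published before it started. The node name and log text are invented:

#include <memory>

#include <rclcpp/rclcpp.hpp>
#include <tf2_msgs/msg/tf_message.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("raw_tf_static_listener");

  rclcpp::QoS qos(rclcpp::KeepLast(100));
  qos.transient_local();  // required to receive transforms published before this node started

  auto logger = node->get_logger();
  auto sub = node->create_subscription<tf2_msgs::msg::TFMessage>(
    "/tf_static", qos,
    [logger](tf2_msgs::msg::TFMessage::SharedPtr msg) {
      RCLCPP_INFO(logger, "received %zu static transforms", msg->transforms.size());
    });
  (void)sub;  // keep the subscription alive while spinning

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}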

@sloretz (Contributor, Author) commented Sep 4, 2019

Should StaticQoS be divided into different QoS for the Broadcaster and Listener?

The current state of the PR has both the Broadcaster and the Listener use a depth of 100, which I think means the Listener would receive the last 100 messages from the broadcaster. In ROS 1 only the very latest message was latched, which would be like the Broadcaster having a depth of 1 in ROS 2.

Is it more desirable for the Broadcaster to only offer the latest transform message to late joining subscribers, or for the Broadcaster to be able to send multiple transform messages?

EDIT:

And if it's desirable for the Static Broadcaster to offer multiple messages, why stop at 100? Why not use KEEP_ALL?
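For reference, the options being weighed map roughly onto the following rclcpp history settings (assumed usage for illustration, not code from this PR); for /tf_static each would additionally be marked transient_local:

#include <rclcpp/qos.hpp>

int main()
{
  rclcpp::QoS latest_only{rclcpp::KeepLast(1)};     // latch only the newest message, closest to ROS 1
  rclcpp::QoS last_hundred{rclcpp::KeepLast(100)};  // keep up to 100 messages for late joiners
  rclcpp::QoS keep_everything{rclcpp::KeepAll()};   // unbounded history (the KEEP_ALL option above)

  // Each of these would also request transient_local durability for /tf_static.
  latest_only.transient_local();
  last_hundred.transient_local();
  keep_everything.transient_local();
  return 0;
}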

@clalancette (Contributor) commented:
Should StaticQoS be divided into different QoS for the Broadcaster and Listener?

Hm, this is a good point. Given that this is tf_static, I would think that things published here wouldn't change, and thus a depth of 1 would be appropriate here on both sides. But maybe I'm missing something that happens on tf_static.

@sloretz (Contributor, Author) commented Sep 5, 2019

Given that this is tf_static, I would think that things published here wouldn't change, and thus a depth of 1 would be appropriate here on both sides.

IIUC if the listener's depth is 1 and there are 2 static transform broadcasters, the listener might miss one of the messages if they come in at the same time.

I divided the QoS classes into [Dynamic|Static]BroadcasterQoS and [Dynamic|Static]ListenerQoS. The static broadcaster's depth is 1 to be closer to the behavior in ROS 1.
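A hedged sketch of what the static halves of that split could look like; the class names come from the comment above, but the actual definitions in tf2_ros/include/tf2_ros/qos.hpp may differ:

#include <rclcpp/qos.hpp>

namespace tf2_ros
{
class StaticBroadcasterQoS : public rclcpp::QoS
{
public:
  // Depth 1 plus transient_local: only the latest message is latched, as in ROS 1.
  StaticBroadcasterQoS() : rclcpp::QoS(1) { transient_local(); }
};

class StaticListenerQoS : public rclcpp::QoS
{
public:
  // A larger depth so a listener can keep messages from several static broadcasters.
  StaticListenerQoS() : rclcpp::QoS(100) { transient_local(); }
};
}  // namespace tf2_ros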

@clalancette (Contributor) commented:

IIUC if the listener's depth is 1 and there are 2 static transform broadcasters, the listener might miss one of the messages if they come in at the same time.

To test this, I set up two transient_local publishers with a single transient_local subscriber. I start up the two publishers, then the subscriber, to see what happens. With both Fast-RTPS and CycloneDDS, the subscriber gets both messages, even with a depth of 1. With OpenSplice, I see the behavior you describe, where I only see one of the two. The question is whether this is a bug in OpenSplice, or whether this is something that isn't guaranteed by RTPS. I think we'll need to figure out the answer to that question before making a decision here.

I divided the QoS classes into [Dynamic|Static]BroadcasterQoS and [Dynamic|Static]ListenerQoS. The static broadcaster's depth is 1 to be closer to the behavior in ROS 1.

Yeah, I think this is a good idea regardless.

@sloretz (Contributor, Author) commented Sep 5, 2019

To test this, I set up two transient_local publishers with a single transient_local subscriber. I start up the two publishers, then the subscriber, to see what happens. With both Fast-RTPS and CycloneDDS, the subscriber gets both messages, even with a depth of 1. With OpenSplice, I see the behavior you describe, where I only see one of the two.

I thought a delay between the subscription being created and the subscription taking messages from the middleware would make a difference, but even with that I see the subscription get all messages with a depth of 1 using Fast-RTPS.

spub.py
import sys

import rclpy
import rclpy.node
import rclpy.qos
import std_msgs.msg

rclpy.init()

node = rclpy.node.Node('spub' + sys.argv[1])

qos = rclpy.qos.QoSProfile(
    depth=1,
    durability=rclpy.qos.DurabilityPolicy.TRANSIENT_LOCAL,
    history=rclpy.qos.HistoryPolicy.KEEP_LAST,
)

pub = node.create_publisher(std_msgs.msg.String, "/tf", qos)

pub.publish(std_msgs.msg.String(data='Hello' + sys.argv[1]))

rclpy.spin(node)
rclpy.shutdown()
ssub.py
import sys
import time

import rclpy
import rclpy.node
import rclpy.qos
import std_msgs.msg

rclpy.init()

node = rclpy.node.Node('ssub')

qos = rclpy.qos.QoSProfile(
    depth=1,
    durability=rclpy.qos.DurabilityPolicy.TRANSIENT_LOCAL,
    history=rclpy.qos.HistoryPolicy.KEEP_LAST,
)

def rx_msg(msg):
    node.get_logger().info(repr(msg))

sub = node.create_subscription(std_msgs.msg.String, "/tf", rx_msg, qos)

# Delay between subscriber matching and spinning
time.sleep(10)

rclpy.spin(node)
rclpy.shutdown()
$ python3 spub.py Alice & python3 spub.py Bob & python3 spub.py Trudy & python3 ssub.py 
[1] 29016
[2] 29017
[3] 29018
[INFO] [ssub]: std_msgs.msg.String(data='HelloTrudy')
[INFO] [ssub]: std_msgs.msg.String(data='HelloBob')
[INFO] [ssub]: std_msgs.msg.String(data='HelloAlice')

The question is whether this is a bug in Opensplice, or whether this is something that isn't guaranteed by RTPS. I think we'll need to figure out the answer to that question before making a decision here.

Regardless, a depth of 1 on the publisher seems fine. It looks like there's another DDS QoS setting for RESOURCE_LIMITS with a field max_samples. Maybe the DataReader is "taking" samples from the DataWriter, but then storing up to max_samples locally, and if so maybe rmw_opensplice is setting max_samples to depth while the other two rmw implementations are not?

@wjwwood any ideas why all messages are received with depth of 1 on the subscriber using Fast-RTPS and CycloneDDS?

@wjwwood (Member) left a review:

lgtm in general.

And if it's desirable for the Static Broadcaster to offer multiple messages, why stop at 100? Why not use KEEP_ALL?

KEEP_ALL isn't very nice for real-time systems, so if we can safely get away with a known history depth as the default, I think that would be preferable.

The question is whether this is a bug in Opensplice, or whether this is something that isn't guaranteed by RTPS.

In my opinion, it's just a race condition. If you send two messages which are not keyed, then they're fighting for the same spot in a single "instance" with a history depth of 1. So unless you take the first out before the second arrives, I would expect them to replace one another. I understand that you're trying to eliminate this by sleeping in the subscription, but ultimately I don't know when latched messages are received or sent or anything.

@wjwwood any ideas why all messages are received with depth of 1 on the subscriber using Fast-RTPS and CycloneDDS?

No, but that might be a good question to ask of both the Fast-RTPS team (@richiprosima?) and perhaps @eboasson?

class RCLCPP_PUBLIC DynamicListenerQoS : public rclcpp::QoS
{
public:
  DynamicListenerQoS() : rclcpp::QoS(100) {}
@wjwwood (Member) commented on this excerpt:

For these, I'd recommend offering a single argument for the depth and default it to 100, that way users can customize it if desired when creating the transformer. You could argue they should instead use a rclcpp::QoS directly instead, but having an intermediate class lets us customize other things in the future, e.g. adding transient local in the Static versions.

I'd also recommend having a base class and using that as the type of the argument to the static/dynamic broadcaster constructor. That way it's harder for users to use the wrong kind of QoS. They can still of course change it however they want, but we can document which settings we feel are safe to change in that case, e.g. depth but not durability.
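One possible shape of the depth-argument part of this suggestion (assumed, not necessarily the merged code), keeping the default of 100 from the excerpt above:

#include <cstddef>

#include <rclcpp/qos.hpp>

namespace tf2_ros
{
class DynamicListenerQoS : public rclcpp::QoS
{
public:
  // The history depth is customizable but defaults to 100, so behavior is unchanged by default.
  explicit DynamicListenerQoS(std::size_t depth = 100) : rclcpp::QoS(depth) {}
};
}  // namespace tf2_ros

A shared base class, as suggested, could then serve as the constructor parameter type, making it harder to pass the wrong kind of QoS by accident.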

@sloretz (Contributor, Author) replied:

I added depth with default values in e32710b.

I didn't implement the base class recommendation. I'm assuming users who want to change the QoS settings beyond history depth are already aware of QoS compatibility.

Commits (each Signed-off-by: Shane Loretz <[email protected]>), including: Static Broadcaster QoS depth is 1
@sloretz force-pushed the static_transient_local branch from 43ddb87 to e32710b on September 6, 2019
@eboasson commented Sep 7, 2019

@sloretz @wjwwood this was too intriguing not to have a good look at what's going on here. A QoS setting of KEEP_LAST with depth 1 indeed means there is a single slot for each key value — and so only a single slot if there are no key fields. Later arriving[*] data is supposed to simply push out whatever happens to be present in the history.

OpenSplice and Cyclone DDS both correctly implement this. With the 10s sleep, the historical data is always pushed into the reader history before the take occurs and it only prints whichever message it received last (which can be any of the three). Without the sleep, it depends on timing — when discovery takes place, when the data is actually published, when the data makes it to subscriber process, how quickly it takes that data when it arrives ... That run you mentioned with Cyclone DDS where the subscriber showed all three was simply a matter of lucky timing. It doesn't work out like that for me in any case.

I couldn't resist digging into what is going on with FastRTPS. It turns out that the first sample it receives is accepted as expected. When the second and third arrive, it finds the history depth has been reached, tries to drop an older sample from the same writer (which obviously fails, because this is from a different writer) and refuses to store the sample.

That in turn triggers a sequence of requests for retransmission of the samples that haven't made it into the history. It is very obvious in wireshark. Once the reader takes the sample that is present, the first sample to arrive after that one is accepted, &c.

That process generates a lot of traffic: a sloppily conducted experiment suggests that for this particular test it results in a 10x increase in packets over what the correct behaviour would produce. Not to speak of the other effects on the system.

[*] That's with the "by reception timestamp" destination order setting; there is also a "by source timestamp" setting that takes the source time stamp into account and retains the one with the latest time stamp instead, but that doesn't affect this particular experiment.

@sloretz (Contributor, Author) commented Sep 10, 2019

CI (Testing all packages above tf2_ros)

@dirk-thomas (Member) commented:

@eboasson it would be great if you could report this issue in the FastRTPS repo.

@clalancette (Contributor) commented:

@eboasson Thanks for the detailed digging and explanation of what is going on. With that information, then, the naive thing would be to make the static listener have KEEP_ALL. As @wjwwood suggested, though, that isn't nice for real-time effects, so I think the compromise of having 100 for the static listener makes sense. Thus, I'll approve this PR.

@sloretz (Contributor, Author) commented Sep 10, 2019

CI looks good, merging

@sloretz merged commit 094d363 into ros2 on Sep 10, 2019
The delete-merged-branch bot deleted the static_transient_local branch on September 10, 2019 20:20
@eboasson commented:

@eboasson it would be great if you could report this issue in the FastRTPS repo.

Done

@dirk-thomas (Member) commented; his comment is quoted in full in the reply below.

@sloretz (Contributor, Author) commented Sep 11, 2019

@sloretz This patch might be responsible for the following nightly build failure

08:08:45 transform_listener.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) public: __cdecl tf2_ros::DynamicListenerQoS::DynamicListenerQoS(unsigned __int64)" (__imp_??0DynamicListenerQoS@tf2_ros@@QEAA@_K@Z) referenced in function "public: __cdecl tf2_ros::TransformListener::TransformListener(class tf2::BufferCore &,bool)" (??0TransformListener@tf2_ros@@QEAA@AEAVBufferCore@tf2@@_N@Z) [C:\J\workspace\nightly_win_deb\ws\build\tf2_ros\tf2_ros.vcxproj]
08:08:45 transform_listener.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) public: __cdecl tf2_ros::StaticListenerQoS::StaticListenerQoS(unsigned __int64)" (__imp_??0StaticListenerQoS@tf2_ros@@QEAA@_K@Z) referenced in function "public: __cdecl tf2_ros::TransformListener::TransformListener(class tf2::BufferCore &,bool)" (??0TransformListener@tf2_ros@@QEAA@AEAVBufferCore@tf2@@_N@Z) [C:\J\workspace\nightly_win_deb\ws\build\tf2_ros\tf2_ros.vcxproj]
08:08:45 C:\J\workspace\nightly_win_deb\ws\build\tf2_ros\Debug\tf2_ros.dll : fatal error LNK1120: 2 unresolved externals [C:\J\workspace\nightly_win_deb\ws\build\tf2_ros\tf2_ros.vcxproj]

@dirk-thomas Looks like it. I'm surprised this didn't show up in normal Windows CI. Will make a PR adding visibility macros.
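A hedged sketch of the kind of visibility-macro fix meant here (assumed, not the actual follow-up patch): the QoS classes would be marked with a macro defined by tf2_ros itself, so that their symbols are exported from the tf2_ros DLL on Windows and resolve for code such as transform_listener.cpp. The header name and the TF2_ROS_PUBLIC macro are assumptions:

#include <cstddef>

#include <rclcpp/qos.hpp>
#include <tf2_ros/visibility_control.h>  // assumed to provide TF2_ROS_PUBLIC

namespace tf2_ros
{
// With the package's own export macro, the symbols are exported when building
// tf2_ros and imported by users of the DLL on Windows.
class TF2_ROS_PUBLIC DynamicListenerQoS : public rclcpp::QoS
{
public:
  explicit DynamicListenerQoS(std::size_t depth = 100) : rclcpp::QoS(depth) {}
};
}  // namespace tf2_ros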

sloretz added a commit to ros2/ros2_documentation that referenced this pull request on Nov 15, 2019:

* Note /tf_static uses transient_local in Eloquent

Follow up from ros2/geometry2#160

* Warn about QoS incompatibility
Labels: enhancement (New feature or request), in review (Waiting for review, Kanban column)
5 participants