Make /tf_static use transient_local durability #160
Conversation
Generally looks OK to me. There is one nit and one documentation thing that we should do.
```cpp
// broadcaster.sendTransform(msg);
// ROS_INFO("Spinning until killed publishing %s to %s", msg.header.frame_id.c_str(), msg.child_frame_id.c_str());
// rclcpp::spin(node);
broadcaster.sendTransform(msg);
```
In general, I think this is the right change to make. However, there is some possibility of downstream breakage if existing code subscribes to the /tf_static topic directly, i.e. without using the TransformListener class and without transient_local durability. I'll request that you make a release note for Eloquent stating that we are changing this behavior of the static_transform_publisher program.
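As an aside for anyone reading this thread later (this sketch is not part of the PR): code that subscribes to /tf_static directly, instead of through TransformListener, needs a transient_local QoS profile to keep receiving the latched static transforms after this change. A minimal sketch under that assumption, with a made-up node name and callback:

```cpp
// Sketch only: a hand-rolled /tf_static subscriber. The durability must be
// transient_local so transforms published before this node started are still
// delivered. TransformListener already does the equivalent internally.
#include <memory>

#include <rclcpp/rclcpp.hpp>
#include <tf2_msgs/msg/tf_message.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("static_tf_echo");

  // Keep-last depth of 100 plus transient_local, mirroring the listener side
  // of this PR.
  rclcpp::QoS qos = rclcpp::QoS(100).transient_local();

  auto sub = node->create_subscription<tf2_msgs::msg::TFMessage>(
    "/tf_static", qos,
    [node](tf2_msgs::msg::TFMessage::SharedPtr msg) {
      for (const auto & transform : msg->transforms) {
        RCLCPP_INFO(
          node->get_logger(), "static transform %s -> %s",
          transform.header.frame_id.c_str(), transform.child_frame_id.c_str());
      }
    });

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}
```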
The current state of the PR has both the Broadcaster and the Listener use a depth of 100, which I think means the Listener would receive the last 100 messages from the broadcaster. In ROS 1 only the very latest message was latched, which would be like the Broadcaster having a depth of 1 in ROS 2. Is it more desirable for the Broadcaster to only offer the latest transform message to late joining subscribers, or for the Broadcaster to be able to send multiple transform messages?

EDIT: And if it's desirable for the Static Broadcaster to offer multiple messages, why stop at 100? Why not use KEEP_ALL?
Hm, this is a good point. Given that this is tf_static, I would think that things published here wouldn't change, and thus a depth of 1 would be appropriate here on both sides. But maybe I'm missing something that happens on tf_static.

IIUC if the listener's depth is 1 and there are 2 static transform broadcasters, the listener might miss one of the messages if they come in at the same time. I divided the QoS classes into
To test this, I set up two transient local publishers with a single transient local subscriber. I start up the two publishers, then the subscriber, to see what happens. With both Fast-RTPS and CycloneDDS, the subscriber gets both messages, even with a depth of 1. With Opensplice, I see the behavior you describe where I only see one of the two. The question is whether this is a bug in Opensplice, or whether this is something that isn't guaranteed by RTPS. I think we'll need to figure out the answer to that question before making a decision here.

Yeah, I think this is a good idea regardless.
I thought a delay between the subscription being created and the subscription taking messages from the middleware would make a difference, but even with that I see the subscription get all messages with a depth of 1 using Fast-RTPS.

spub.py

```python
import sys

import rclpy
import rclpy.node
import rclpy.qos
import std_msgs.msg

rclpy.init()
node = rclpy.node.Node('spub' + sys.argv[1])
qos = rclpy.qos.QoSProfile(
    depth=1,
    durability=rclpy.qos.DurabilityPolicy.TRANSIENT_LOCAL,
    history=rclpy.qos.HistoryPolicy.KEEP_LAST,
)
pub = node.create_publisher(std_msgs.msg.String, "/tf", qos)
pub.publish(std_msgs.msg.String(data='Hello' + sys.argv[1]))
rclpy.spin(node)
rclpy.shutdown()
```

ssub.py

```python
import sys
import time

import rclpy
import rclpy.node
import rclpy.qos
import std_msgs.msg

rclpy.init()
node = rclpy.node.Node('ssub')
qos = rclpy.qos.QoSProfile(
    depth=1,
    durability=rclpy.qos.DurabilityPolicy.TRANSIENT_LOCAL,
    history=rclpy.qos.HistoryPolicy.KEEP_LAST,
)


def rx_msg(msg):
    node.get_logger().info(repr(msg))


sub = node.create_subscription(std_msgs.msg.String, "/tf", rx_msg, qos)
# Delay between subscriber matching and spinning
time.sleep(10)
rclpy.spin(node)
rclpy.shutdown()
```

Regardless, a depth of 1 on the publisher seems fine. It looks like there's another DDS QoS setting for

@wjwwood any ideas why all messages are received with depth of 1 on the subscriber using Fast-RTPS and CycloneDDS?
lgtm in general.

> And if it's desirable for the Static Broadcaster to offer multiple messages, why stop at 100? Why not use KEEP_ALL?

KEEP_ALL isn't very nice for real-time systems, so if we can safely get away with a known history depth as the default, I think that would be preferable.

> The question is whether this is a bug in Opensplice, or whether this is something that isn't guaranteed by RTPS.

In my opinion, it's just a race condition. If you send two messages which are not keyed, then they're fighting for the same spot in a single "instance" with a history depth of 1. So unless you take the first out before the second arrives, I would expect them to replace one another. I understand that you're trying to eliminate this by sleeping in the subscription, but ultimately I don't know when latched messages are received or sent or anything.

> @wjwwood any ideas why all messages are received with depth of 1 on the subscriber using Fast-RTPS and CycloneDDS?

No, but that might be a good question to ask of both the Fast-RTPS team (@richiprosima?) and perhaps @eboasson?
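For reference, the tradeoff being weighed here can be spelled out with the rclcpp QoS API. This is only an illustration, not code from the PR:

```cpp
// Illustration of the two history policies discussed above. KEEP_LAST bounds
// the per-writer history to `depth` samples, while KEEP_ALL keeps everything
// subject only to middleware resource limits, which is harder to reason about
// for real-time systems.
#include <rclcpp/qos.hpp>

int main()
{
  // Bounded: at most the 100 most recent samples are retained and replayed
  // to late-joining, transient_local subscribers.
  rclcpp::QoS bounded = rclcpp::QoS(rclcpp::KeepLast(100)).transient_local();

  // Unbounded: every sample is retained until resource limits are hit.
  rclcpp::QoS unbounded = rclcpp::QoS(rclcpp::KeepAll()).transient_local();

  (void)bounded;
  (void)unbounded;
  return 0;
}
```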
tf2_ros/include/tf2_ros/qos.hpp
```cpp
class RCLCPP_PUBLIC DynamicListenerQoS : public rclcpp::QoS
{
public:
  DynamicListenerQoS() : rclcpp::QoS(100) {}
```
For these, I'd recommend offering a single argument for the depth and defaulting it to 100; that way users can customize it if desired when creating the transformer. You could argue they should just use an rclcpp::QoS directly, but having an intermediate class lets us customize other things in the future, e.g. adding transient local in the Static versions.

I'd also recommend having a base class and using that as the type of the argument to the static/dynamic broadcaster constructor. That way it's harder for users to use the wrong kind of QoS. They can still of course change it however they want, but we can document which settings we feel are safe to change in that case, e.g. depth but not durability.
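A sketch of what that suggestion could look like follows. The class names are taken from the diff above, but the exact signatures, the defaults, and the static variant are assumptions, not the committed code:

```cpp
#include <rclcpp/qos.hpp>

// Visibility macros (e.g. RCLCPP_PUBLIC in the diff above) omitted for brevity.

// Listener-side default: keep-last history with a configurable depth that
// defaults to 100.
class DynamicListenerQoS : public rclcpp::QoS
{
public:
  explicit DynamicListenerQoS(size_t depth = 100)
  : rclcpp::QoS(depth) {}
};

// Static-broadcaster-side default: keep only the latest sample and make it
// available to late-joining subscribers, mirroring ROS 1 latching.
class StaticBroadcasterQoS : public rclcpp::QoS
{
public:
  explicit StaticBroadcasterQoS(size_t depth = 1)
  : rclcpp::QoS(depth)
  {
    transient_local();
  }
};
```

A broadcaster or listener constructor could then accept the wrapper type (or a common base class) so that passing an incompatible profile takes deliberate effort.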
I added depth with default values in e32710b.

I didn't implement the base class recommendation. I'm assuming users who want to change the QoS settings beyond history depth are already aware of QoS compatibility.
Static Broadcaster QoS depth is 1 (Signed-off-by: Shane Loretz <[email protected]>)
@sloretz @wjwwood this was too intriguing not to have a good look at what's going on here. A QoS setting of depth 1 on the reader means only the most recent sample is retained in the reader history; OpenSplice and Cyclone DDS both correctly implement this. With the 10s sleep, the historical data is always pushed into the reader history before the take occurs and it only prints whichever message it received last (which can be any of the three). Without the sleep, it depends on timing: when discovery takes place, when the data is actually published, when the data makes it to the subscriber process, how quickly it takes that data when it arrives ... That run you mentioned with Cyclone DDS where the subscriber showed all three was simply a matter of lucky timing. It doesn't work out like that for me in any case.

I couldn't resist digging into what is going on with FastRTPS. It turns out that the first sample it receives is accepted as expected. When the second and third arrive, it finds the history depth has been reached, tries to drop an older sample from the same writer (which obviously fails, because this is from a different writer) and refuses to store the sample. That in turn triggers a sequence of requests for retransmission of the samples that haven't made it into the history. It is very obvious in Wireshark. Once the reader takes the sample that is present, the first sample to arrive after that one is accepted, &c. That process generates a lot of traffic: a sloppily conducted experiment says that for this particular test it results in a 10x increase in packets over what the correct behaviour should be. Not to speak of the other effects on the system.

[*] That's with the "by reception timestamp" destination order setting; there is also a "by source timestamp" setting that takes the source time stamp into account to retain the one with the latest time stamp instead, but that doesn't affect this particular experiment.
CI (Testing all packages above tf2_ros)
@eboasson it would be great if you could report this issue in the FastRTPS repo.
@eboasson Thanks for the detailed digging and explanation of what is going on. With that information, then, the naive thing would be to make the static listener have KEEP_ALL. As @wjwwood suggested, though, that isn't nice for real-time effects, so I think the compromise of having 100 for the static listener makes sense. Thus, I'll approve this PR.
CI looks good, merging
Done
@sloretz This patch might be responsible for the following nightly build failure: https://ci.ros2.org/view/nightly/job/nightly_win_deb/1372/console#console-section-16
@dirk-thomas Looks like it. I'm surprised this didn't show up in normal windows CI. Will make a PR adding visibility macros.
* Note /tf_static uses transient_local in Eloquent. Follow-up from ros2/geometry2#160.
* Warn about QoS incompatibility
This makes /tf_static use transient_local durability to get behavior like ROS 1 latching. The static broadcaster has a queue depth of 1 so only the latest message from a broadcaster is given to late-joining listeners. This matches latching in ROS 1.
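To make that concrete, here is a minimal sketch (not from this PR) using the tf2_ros C++ API; the node and frame names are made up. It relies on the transient_local durability described above, so a listener created after the transform was sent still sees it:

```cpp
#include <memory>

#include <geometry_msgs/msg/transform_stamped.hpp>
#include <rclcpp/rclcpp.hpp>
#include <tf2_ros/buffer.h>
#include <tf2_ros/static_transform_broadcaster.h>
#include <tf2_ros/transform_listener.h>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("static_tf_demo");

  // Publish one static transform. With transient_local durability the last
  // sample stays available to subscribers created later.
  tf2_ros::StaticTransformBroadcaster broadcaster(node);
  geometry_msgs::msg::TransformStamped msg;
  msg.header.stamp = node->now();
  msg.header.frame_id = "map";
  msg.child_frame_id = "base_link";
  msg.transform.rotation.w = 1.0;  // identity rotation
  broadcaster.sendTransform(msg);

  // A buffer/listener created afterwards still receives the transform,
  // because the broadcaster's latest sample is delivered to late joiners.
  tf2_ros::Buffer buffer(node->get_clock());
  tf2_ros::TransformListener listener(buffer, node);

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}
```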