Preprocessing of depth image for model-based inference #44
#25 (comment)
Hi @Trevor-wen, maybe I can help, as I also had to solve some problems when first using FoundationPose on my own objects. 😅

**1. Make sure that your CAD model is scaled in meters as mesh units**

Unlike other methods that use mm as the mesh unit, FoundationPose uses meters. Example if the mesh units are wrong (in mm): 44_wrong_scale.mp4

**2. RGB and depth images must be aligned**

The captured RGB and depth frames must be aligned. How this is done depends on the sensor used. The following Python script is adapted from librealsense and can be used to record aligned and unaligned frames with a RealSense (I used it for the examples): `record_realsense_foundationpose.py`

```python
## License: Apache 2.0. See LICENSE file in root directory.
## Copyright(c) 2017 Intel Corporation. All Rights Reserved.

#####################################################
##              Align Depth to Color               ##
#####################################################

import pyrealsense2 as rs
import numpy as np
import cv2
import time
import os

# Create a pipeline
pipeline = rs.pipeline()

# Create a config and configure the pipeline to stream
# different resolutions of color and depth streams
config = rs.config()

# Get device product line for setting a supporting resolution
pipeline_wrapper = rs.pipeline_wrapper(pipeline)
pipeline_profile = config.resolve(pipeline_wrapper)
device = pipeline_profile.get_device()
device_product_line = str(device.get_info(rs.camera_info.product_line))

found_rgb = False
for s in device.sensors:
    if s.get_info(rs.camera_info.name) == "RGB Camera":
        found_rgb = True
        break
if not found_rgb:
    print("The demo requires a depth camera with a color sensor")
    exit(0)

config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)

if device_product_line == "L500":
    config.enable_stream(rs.stream.color, 960, 540, rs.format.bgr8, 30)
else:
    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)

# Start streaming
profile = pipeline.start(config)

# Getting the depth sensor's depth scale (see rs-align example for explanation)
depth_sensor = profile.get_device().first_depth_sensor()
depth_scale = depth_sensor.get_depth_scale()
print("Depth Scale is: ", depth_scale)

# We will be removing the background of objects more than
# clipping_distance_in_meters meters away
clipping_distance_in_meters = 1  # 1 meter
clipping_distance = clipping_distance_in_meters / depth_scale

# Create an align object
# rs.align allows us to perform alignment of depth frames to other frames
# The "align_to" is the stream type to which we plan to align depth frames.
align_to = rs.stream.color
align = rs.align(align_to)

# Get the absolute paths to the output subfolders
script_dir = os.path.dirname(os.path.abspath(__file__))
subfolder_depth = os.path.join(script_dir, "out/depth")
subfolder_rgb = os.path.join(script_dir, "out/rgb")
subfolder_depth_unaligned = os.path.join(script_dir, "out/depth_unaligned")
subfolder_rgb_unaligned = os.path.join(script_dir, "out/rgb_unaligned")

# Check if the subfolders exist, and create them if they do not
if not os.path.exists(subfolder_depth):
    os.makedirs(subfolder_depth)
if not os.path.exists(subfolder_rgb):
    os.makedirs(subfolder_rgb)
if not os.path.exists(subfolder_depth_unaligned):
    os.makedirs(subfolder_depth_unaligned)
if not os.path.exists(subfolder_rgb_unaligned):
    os.makedirs(subfolder_rgb_unaligned)

# Recording is toggled with the space bar
RecordStream = False

# Streaming loop
try:
    while True:
        # Get frameset of color and depth
        frames = pipeline.wait_for_frames()
        # frames.get_depth_frame() is a 640x480 depth image (see config above)

        # Align the depth frame to the color frame
        aligned_frames = align.process(frames)

        # Get aligned frames
        aligned_depth_frame = (
            aligned_frames.get_depth_frame()
        )  # aligned_depth_frame is a 640x480 depth image
        color_frame = aligned_frames.get_color_frame()
        unaligned_depth_frame = frames.get_depth_frame()
        unaligned_color_frame = frames.get_color_frame()

        # Get intrinsics from aligned_depth_frame
        intrinsics = aligned_depth_frame.profile.as_video_stream_profile().intrinsics

        # Validate that both frames are valid
        if not aligned_depth_frame or not color_frame:
            continue

        depth_image = np.asanyarray(aligned_depth_frame.get_data())
        color_image = np.asanyarray(color_frame.get_data())

        # Remove background - set pixels further than clipping_distance to grey
        grey_color = 153
        depth_image_3d = np.dstack(
            (depth_image, depth_image, depth_image)
        )  # depth image is 1 channel, color is 3 channels
        bg_removed = np.where(
            (depth_image_3d > clipping_distance) | (depth_image_3d <= 0),
            grey_color,
            color_image,
        )

        unaligned_depth_image = np.asanyarray(unaligned_depth_frame.get_data())
        unaligned_rgb_image = np.asanyarray(unaligned_color_frame.get_data())

        # Render images:
        #   depth aligned to color on the left, colorized depth on the right
        depth_colormap = cv2.applyColorMap(
            cv2.convertScaleAbs(depth_image, alpha=0.03), cv2.COLORMAP_JET
        )
        images = np.hstack((color_image, depth_colormap))
        cv2.namedWindow("Align Example", cv2.WINDOW_NORMAL)
        cv2.imshow("Align Example", images)
        key = cv2.waitKey(1)

        # Start saving the frames if space is pressed once, until it is pressed again
        if key & 0xFF == ord(" "):
            if not RecordStream:
                time.sleep(0.2)
                RecordStream = True
                # Save the intrinsics of the aligned stream as a 3x3 matrix
                with open(os.path.join(script_dir, "out/cam_K.txt"), "w") as f:
                    f.write(f"{intrinsics.fx} {0.0} {intrinsics.ppx}\n")
                    f.write(f"{0.0} {intrinsics.fy} {intrinsics.ppy}\n")
                    f.write(f"{0.0} {0.0} {1.0}\n")
                print("Recording started")
            else:
                RecordStream = False
                print("Recording stopped")

        if RecordStream:
            # Use the current time in milliseconds as the frame name
            framename = int(round(time.time() * 1000))
            # Define the paths to the image files within the subfolders
            image_path_depth = os.path.join(subfolder_depth, f"{framename}.png")
            image_path_rgb = os.path.join(subfolder_rgb, f"{framename}.png")
            image_path_depth_unaligned = os.path.join(subfolder_depth_unaligned, f"{framename}.png")
            image_path_rgb_unaligned = os.path.join(subfolder_rgb_unaligned, f"{framename}.png")
            cv2.imwrite(image_path_depth, depth_image)
            cv2.imwrite(image_path_rgb, color_image)
            cv2.imwrite(image_path_depth_unaligned, unaligned_depth_image)
            cv2.imwrite(image_path_rgb_unaligned, unaligned_rgb_image)

        # Press esc or 'q' to close the image window
        if key & 0xFF == ord("q") or key == 27:
            cv2.destroyAllWindows()
            break
finally:
    pipeline.stop()
```

Example if the RGB and depth frames are not aligned properly: 44_unaligned.mp4

**3. Wrong sensor intrinsics**

Make sure that you use the correct intrinsics in the following format (RealSense with pyrealsense2, see code above; a loading sketch follows at the end of this comment):

```
intrinsics.fx  0.0            intrinsics.ppx
0.0            intrinsics.fy  intrinsics.ppy
0.0            0.0            1.0
```

Example of very wrong intrinsics: 44_intrinsics.mp4

**4. Impressive pose estimation when everything is done right**

Example if everything works fine: 44_correct.mp4

Edit: @wenbowen123 was quicker, but maybe it still helps. 😃
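For reference, here is a minimal sketch of reading such a cam_K.txt back into a 3x3 matrix with NumPy (the path follows the recording script above; adjust it for your own layout):

```python
import numpy as np

# cam_K.txt contains three whitespace-separated rows,
# as written by the recording script above.
K = np.loadtxt('out/cam_K.txt').reshape(3, 3)

fx, fy = K[0, 0], K[1, 1]  # focal lengths in pixels
cx, cy = K[0, 2], K[1, 2]  # principal point in pixels
print(f"fx={fx:.2f} fy={fy:.2f} cx={cx:.2f} cy={cy:.2f}")
```

As a quick plausibility check: for a 640x480 stream, the principal point should be near (320, 240).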
@savidini Could you tell me how to modify the units of the YCB-Video objects' CAD models?
@Ethan-Shen-lab you can use software like MeshLab to manually scale down objects. You can also use packages like, for example, trimesh to do this in Python; see the simple example below:

```python
import trimesh

# Load the mesh and convert from millimeters to meters.
mesh = trimesh.load('path_to_your_file.obj')
mesh.apply_scale(0.001)
mesh.export('scaled_down_file.obj')
```

I am not sure what you want to do (…)
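As a quick sanity check for the scale, you can also print the mesh's bounding-box extents (a sketch using trimesh; the 1.0 threshold is just a heuristic):

```python
import trimesh

mesh = trimesh.load('path_to_your_file.obj')

# extents is the size of the axis-aligned bounding box per axis.
# For a hand-sized object in meters this should be roughly 0.05-0.3.
print("extents:", mesh.extents)
if mesh.extents.max() > 1.0:
    print("Mesh is likely in millimeters; consider mesh.apply_scale(0.001).")
```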
Thank you very much! And I have another question: I used the RealSense code you provided to collect image data, but after running run_demo.py, an error like this occurred:
@ethanshenze Example data with the Rubik's Cube used for the last video in my comment above. (Setup: RTX 4090 and Docker with CUDA 12.1, as described in #27)
I really appreciate your help!
@savidini Thanks a lot for the detailed instructions! I will try them as soon as possible. I really appreciate the help!
You can use either MeshLab or Blender to scale down by 0.001 along each axis.
Copy that! I will try it, and thank you very much~
@savidini @wenbowen123 However, after I tried the steps, I am facing an issue where the bounding box is too small and not following the object (banana). Could you help with it and tell me what I could do? Thanks in advance!

More context:

- The screenshot and files in the debug folder are attached below.
- I am using the banana CAD model from the YCBV official website, a link from Bowen's previous repo https://github.com/wenbowen123/iros20-6d-pose-tracking
- I have checked the scene_complete.ply file in a visualizer and it seems fine to me (so I assume the depth images are OK?)
- I have checked the model.obj file and it seems fine to me
@aThinkingNeal can you maybe provide your intrinsics/

44_small.mp4
@savidini Thanks for the advice! I have calibrated the cam_K.txt file and got the bounding box size back to normal. However, the pose estimation seems to be wandering off after the first frame. I am using an apple as the object and put the debug info in the following folder:

The behavior is like the images below: the first frame is fine, then the bounding box starts to drift away, even though the object is not moving:
@aThinkingNeal from the images of your debug output, it looks like there are several "skips" in the images you recorded, i.e. after

Below is a video showing the effect of "skipping" frames, resulting in sudden changes in the tracked object: 44_skip.mp4

Apparently this cannot be handled by FoundationPose's tracking (although the pose will eventually be correct again if enough frames are provided after a skip). This behavior is somewhat different from other methods that do not use tracking, but instead re-run the pose estimation on every frame. If my assumption is correct but you can't avoid these skips in your input, see #37 for running the pose estimation on every frame.
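If you recorded with the script from my earlier comment, the file names are millisecond timestamps, so you can spot skips by checking the gaps between consecutive frames (a hypothetical helper; the 100 ms threshold assumes a 30 FPS stream):

```python
import os

rgb_dir = 'out/rgb'  # folder written by the recording script
timestamps = sorted(
    int(os.path.splitext(f)[0]) for f in os.listdir(rgb_dir) if f.endswith('.png')
)

# At 30 FPS, consecutive frames should be ~33 ms apart; flag anything larger.
max_gap_ms = 100
for prev, curr in zip(timestamps, timestamps[1:]):
    if curr - prev > max_gap_ms:
        print(f"skip of {curr - prev} ms between {prev}.png and {curr}.png")
```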
Your banana model seems to be wrong in scale: it's 2 meters long.
@aThinkingNeal the file names need to be padded with 0s in front to make a fixed number of digits (see our example data).
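For example, a small sketch that renames the millisecond-timestamp files from the recording script above into zero-padded indices (the 6-digit width is an assumption; check the example data for the exact convention). Sorting both folders the same way keeps the RGB/depth pairing intact, since the recording script gives matching frames identical names:

```python
import os

for folder in ('out/rgb', 'out/depth'):
    files = sorted(f for f in os.listdir(folder) if f.endswith('.png'))
    for i, name in enumerate(files):
        # e.g. 1706123456789.png -> 000000.png, 000001.png, ...
        os.rename(os.path.join(folder, name), os.path.join(folder, f'{i:06d}.png'))
```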
@wenbowen123 @savidini Thanks for the help! I think my problem is solved by:
Now I am facing another issue about how to get an accurate custom CAD model, but I will ask it in another issue. Thanks again for your help!
Hello!
Thanks in advance!
Hi, I discovered an interesting project called FoundationPose during my learning process. I saw your presentation video and was very interested in learning how you modified the code. I made some changes and encountered many problems. Can you share your correct code settings for me to refer to? Thank you!
Hello, how did you obtain the mask image of the first frame captured by your camera?
@bingbingshu You can create the mask manually by simply opening the first frame and drawing the shape of the object; everything else should be black. It is also possible to simply add a point inside the object.

Another option is to create the mask using a segmentation model such as SAM or the ISM from SAM6D:

If you know the ground truth, you can also create the masks automatically using BlenderProc, but this is a bit more complicated.

Regarding your previous question, please be more specific about what exactly you need, if it is still relevant.
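For the manual option, a minimal sketch using OpenCV: fill the object's outline with white on a black image (the paths and polygon points are placeholders you would pick from your own first frame):

```python
import cv2
import numpy as np

first_frame = cv2.imread('out/rgb/000000.png')  # placeholder path
mask = np.zeros(first_frame.shape[:2], dtype=np.uint8)

# Placeholder polygon roughly outlining the object; replace with points
# picked from your own first frame (e.g. in an image editor).
polygon = np.array([[300, 200], [380, 210], [390, 300], [310, 310]], dtype=np.int32)
cv2.fillPoly(mask, [polygon], 255)  # object white, everything else black

cv2.imwrite('first_frame_mask.png', mask)
```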
Thank you very much for your guidance. I created a mask for the first frame and have been able to achieve the result you showed me. Thank you again for your patient guidance. I will be learning how to use BundleSDF now.
Hi,
I have prepared a .obj model for a teacup and recorded a sequence of RGB and depth images. All the needed files are organized like the provided demo data. However, the inference for the teacup is totally incorrect.
Maybe there is something wrong with the depth image preprocessing. I have a RealSense D435i camera and have scaled the depth images like the LINEMOD dataset (the value of a pixel equals millimeters in the real world).
Could you specify the preprocessing of the depth images?
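For illustration, the scaling described above might look like this (a sketch, assuming 16-bit PNGs with 1 unit = 1 mm, as in the LINEMOD convention; depth_scale comes from pyrealsense2's first_depth_sensor().get_depth_scale(), typically 0.001 for D4xx cameras):

```python
import numpy as np

def depth_to_mm(raw_depth: np.ndarray, depth_scale: float) -> np.ndarray:
    """Convert raw z16 depth units to 16-bit millimeters for saving as PNG.

    raw_depth:   uint16 array from aligned_depth_frame.get_data()
    depth_scale: meters per depth unit (often 0.001 on a D435i)
    """
    depth_m = raw_depth.astype(np.float32) * depth_scale  # meters
    return np.round(depth_m * 1000.0).astype(np.uint16)   # millimeters
```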