-
Tsinghua University
- Beijing, China
- michaelyuancb.github.io
Highlights
- Pro
Lists (1)
Sort Name ascending (A-Z)
Stars
HoloLens 2 Data Acquisition. Access HoloLens 2 Research Mode, Front Camera, Microphone, Head, Eye, Hand, and External USB-C A/V sensor data.
A system to easily extract ground truth training data for different machine learning tasks from GTAV
Code for the project "MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos"
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
Official Repository of "UniHOI: Learning Fast, Dense and Generalizable 4D Reconstruction for Egocentric Hand Object Interaction Videos"
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
Official Implementation of paper "MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion"
Labeling tool with SAM(segment anything model),supports SAM, SAM2, sam-hq, MobileSAM EdgeSAM etc.交互式半自动图像标注工具
The scanner app acquires RGB-D scans using iPhone LiDAR sensor and ARKit API, stores color, depth and IMU data on local memory and then uploads to PC for processing.
A paper list of my history reading. Robotics, Learning, Vision.
Official PyTorch implementation of the “A Unified Transformer Framework for Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection”. (TMM2023)
TEN Agent is a conversational AI powered by the TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compa…
A model combined 100DoH, Semantic-SAM and EgoHOS for hand-object state classification, detection, segmentation.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
GLM-4 series: Open Multilingual Multimodal Chat LMs | 开源多语言多模态对话模型
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Ace interviews with AI practice. Our agent role-plays personalized interview tailored to your background, listening and replying like a real interviewer. Train across personas for any situation.
Code for RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion [Arxiv 2024]
Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective…
Large Action Model framework to develop AI Web Agents
Building a quick conversation-based search demo with Lepton AI.
Repository for "General Flow as Foundation Affordance for Scalable Robot Learning"