- Vietnam
- UTC +07:00
- https://www.linkedin.com/in/hung-nguyen-4b795893/
Machine Learning
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Fast, visual and customizable LoRA sidebar packed with features for ComfyUI
Official implementation of "3D representation in 512-Byte: Variational tokenizer is the key for autoregressive 3D generation"
Perceptual video quality assessment based on multi-method fusion.
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
Official Implementations for Paper - AniDoc: Animation Creation Made Easier
GaussianSpeech: Audio-Driven Gaussian Avatars
Learning records for building a large language model from scratch
Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world
DeepSeek Coder: Let the Code Write Itself
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
ComfyUI: 116 nodes: Display, manipulate, and edit text, images, videos, LoRAs and more. Manage looping operations, generate randomized content, use logical conditions and work with external AI to…
Official code of "DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation"
✨✨ Latest Papers and Benchmarks in Reasoning with Foundation Models
Mock Interview Simulator with AI-Powered Feedback
[I3D 2024] Deblur-GS: 3D Gaussian Splatting from Camera Motion Blurred Images
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous …
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Cosmos is a world model development platform that consists of world foundation models, tokenizers and a video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…