Stars

Machine Learning

24 repositories

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,756 186 Updated Nov 14, 2024

Fast, visual and customizable LoRA sidebar packed with features for ComfyUI

JavaScript 56 7 Updated Nov 23, 2024

Open-world 3D part segmentation of point clouds

Python 59 Updated Nov 21, 2024
76 Updated Aug 26, 2024
Python 8 1 Updated Dec 13, 2024

Official implementation of "3D representation in 512-Byte: Variational tokenizer is the key for autoregressive 3D generation"

46 Updated Dec 5, 2024

Perceptual video quality assessment based on multi-method fusion.

Python 4,752 756 Updated Jan 9, 2025

Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚

Python 18,790 1,052 Updated Jan 20, 2025

Official Implementations for Paper - AniDoc: Animation Creation Made Easier

Python 454 29 Updated Dec 31, 2024

GaussianSpeech: Audio-Driven Gaussian Avatars

125 4 Updated Dec 2, 2024

Learning records for building a large language model from scratch

Jupyter Notebook 42 Updated Jan 1, 2025

Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources

1 Updated Dec 16, 2024

A Lightweight Recommendation System

Python 7,935 600 Updated Nov 8, 2023

DeepSeek Coder: Let the Code Write Itself

Python 10,281 686 Updated May 21, 2024

Autonomous agents for everyone

TypeScript 12,331 3,531 Updated Jan 21, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 12,918 2,636 Updated Jan 21, 2025

ComfyUI : 116 nodes : Display, manipulate, and edit text, images, videos, loras and more. Manage looping operations, generate randomized content, use logical conditions and work with external AI to…

Python 83 8 Updated Jan 11, 2025

Official code of "DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation"

Python 201 5 Updated Jan 3, 2025

✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models

499 50 Updated Dec 31, 2024

Mock Interview Simulator with AI-Powered Feedback

TypeScript 1,374 234 Updated Dec 23, 2023

[I3D 2024] Deblur-GS: 3D Gaussian Splatting from Camera Motion Blurred Images

Python 367 35 Updated May 16, 2024

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous …

Python 25,015 1,342 Updated Jan 21, 2025

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 17,039 1,218 Updated Jan 20, 2025

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Python 7,127 431 Updated Jan 9, 2025