- Peking University
- Beijing
- charlesCXK.github.io
Stars
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
Janus-Series: Unified Multimodal Understanding and Generation Models
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
This repo contains the code for a 1D tokenizer and generator
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
Schedule-Free Optimization in PyTorch
Annotated version of the Mamba paper
VideoSys: An easy and efficient system for video generation
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
Finetune Llama 3.3, Mistral, Phi-4, Qwen 2.5 & Gemma LLMs 2-5x faster with 70% less memory
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
[ECCV 2024 Oral] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.
Official repo for VGen: a holistic video generation ecosystem for video generation building on diffusion models
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"
Recent LLM-based CV and related works. Welcome to comment/contribute!
Official code for the NeurIPS 2023 paper "Switching Temporary Teachers for Semi-Supervised Semantic Segmentation"
✨✨Latest Advances on Multimodal Large Language Models
The Startup CTO's Handbook, a book covering leadership, management and technical topics for leaders of software engineering teams
[ICCV 2023] Universal Video Segmentation for VSS, VPS and VIS
[ICCV2023] DETR Doesn’t Need Multi-Scale or Locality Design
[ICCV 2023] Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment
Official code for the TMLR paper "Understanding Self-Supervised Pretraining with Part-Aware Representation Learning"
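A collection of key concepts and interview Q&A for computer vision algorithm positions, organized into four main areas: computer vision, machine learning, image processing, and C++ fundamentals. Let's work hard together toward landing offers!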