- Shanghai AI Lab
- Shanghai, China
- https://zhaoshitian.github.io/
Stars
A conversational Q&A agent configuration system, self-hosted deployment solutions, and a convenient all-in-one application SDK, allowing you to create intelligent Q&A bots for your GitHub repositories
Testing baseline LLM performance across various models
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
LeanUniverse: A Library for Consistent and Scalable Lean4 Dataset Management
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
Seamlessly integrate state-of-the-art transformer models into robotics stacks
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Implementation of a JEPA Image World Model, trained on OpenAI's VPT Minecraft contractor dataset.
WorldModel is a MaskGIT model trained on 8x8x8 Minecraft voxel volumes. Beyond generating blocks from scratch, it excels in filling spaces based on neighboring blocks, ensuring seamless integration…
CGL-Dataset v2 for huggingface datasets
Repo of paper "Free Process Rewards without Process Labels"
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
Recipes to train reward models for RLHF.
An MIT-licensed, deployable starter kit for building and customizing your own version of AI town - a virtual town where AI characters live, chat and socialize.
Scalable RL solution for advanced reasoning of language models
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
③[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can efficiently fine-tune to downstream datasets.
[ECCV 2024] Official PyTorch Implementation of A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment
[LMM + AIGC] What do we expect from LMMs as AIGI evaluators and how do they perform?
[ECCV2024] This is an official inference code of the paper "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Mu…
Large Concept Models: Language modeling in a sentence representation space
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE
Official repo and evaluation implementation of VSI-Bench
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.