Stars
This repository contains the source code and documentation for a Signature Verification System Using CNN. The system utilizes Convolutional Neural Networks (CNN) to authenticate handwritten signatu…
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
High-resolution models for human tasks.
Source code for the paper: DeepHSV: User-independent Offline Signature Verification Using Two-Channel CNN
Official Code for ECCV 2024 paper — One-Shot Diffusion Mimicker for Handwritten Text Generation
[NeurIPS 2022 Spotlight] The official GitHub page of "MSDS: A Large-Scale Chinese Signature and Token Digit String Dataset for Handwriting Verification"
Learned representation for Offline Handwritten Signature Verification. Models and code to extract features from signature images.
Signature verification package, for learning representations from signature data, training user-dependent classifiers.
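📣 Commercial-grade open-source automatic speech recognition library: works out of the box, supports all platforms, and recognizes mixed Chinese and English speech. A cross-platform implementation of ASR inference, based on ONNXRuntime and FunASR, that provides a set of easier APIs for calling ASR models.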
Detect and extract table regions from images in various scenarios, and correct perspective and rotation issues.
Inference library for sequence-based table recognition, integrating table recognition algorithms such as PP-Structure and modelscope.
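YOLOv8 license plate detection and recognition for Chinese license plates. Supports 12 types of Chinese license plates, including double-layer plates.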
Layout detection model using PP-PicoDet-LCNet1.0x.
lipsync-mediapipe is a project that learns facial landmark shapes from speech audio using Python 3.8.
Deploys the 3DDFA-V3 3D face reconstruction model with OpenCV. Includes both C++ and Python versions, which run with OpenCV as the only dependency.
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC…
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.
[CVPR 2024] High-Quality Facial Geometry and Appearance Capture at Home.
One minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)