A curated list of awesome projects and papers for AI on Mobile/IoT/Edge devices. The list is continuously updated. Contributions are welcome!
- [ICML'22] POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging. by Patil et al. [paper]
- [NeurIPS'22] On-Device Training Under 256KB Memory. by Ji Lin, Song Han et al. [paper]
- [MobiSys'22] Melon: breaking the memory wall for resource-efficient on-device machine learning. by Qipeng Wang et al. [paper]
- [MobiSys'22] Sage: Memory-efficient DNN Training on Mobile Devices. by In Gim et al. [paper]
- [MobiCom'22] Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading. by Daliang Xu et al. [paper]
- [ICPP'22] Eco-FL: Adaptive Federated Learning with Efficient Edge Collaborative Pipeline Training. by Shengyuan Ye et al. [paper] [code]
- [SEC'21] EDDL: A Distributed Deep Learning System for Resource-limited Edge Computing Environment. by Pengzhan Hao et al. [paper]
- [MobiSys'21 Workshop] Towards Ubiquitous Learning: A First Measurement of On-Device Training Performance. by Dongqi Chai, Mengwei Xu et al. [paper]
- [MobiSys'23] NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors. by USTC & Microsoft. [paper]
- [MobiSys'22] CoDL: efficient CPU-GPU co-execution for deep learning inference on mobile devices. by Fucheng Jia et al. [paper]
- [INFOCOM'22] Distributed Inference with Deep Learning Models across Heterogeneous Edge Devices. by Chenghao Hu et al. [paper]
- [TON'20] CoEdge: Cooperative DNN Inference with Adaptive Workload Partitioning over Heterogeneous Edge Devices. by Liekang Zeng et al. [paper]
- [ICCD'20] A distributed in-situ CNN inference system for IoT applications. by Jiangsu Du et al. [paper]
- [TPDS'20] Model Parallelism Optimization for Distributed Inference via Decoupled CNN Structure. by Jiangsu Du et al. [paper]
- [EuroSys'19] μLayer: Low Latency On-Device Inference Using Cooperative Single-Layer Acceleration and Processor-Friendly Quantization. by Youngsok Kim et al. [paper]
- [TCAD'18] DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters. by Zhuoran Zhao et al. [paper]
- [DATE'17] MoDNN: Local distributed mobile computing system for deep neural network. by Jiachen Mao et al. [paper]
- [MobiSys'21] nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices. by Li Lyna Zhang et al. [paper]
- [MobiSys'22] Band: coordinated multi-DNN inference on heterogeneous mobile processors. by Seoul National University et al. [paper]
- [MobiSys'23] ConvReLU++: Reference-based Lossless Acceleration of Conv-ReLU Operations on Mobile CPU. by Shanghai Jiao Tong University. [paper]
- [ACL'20] MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices. by Zhiqing Sun et al. [paper]
- [ICML'19] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. by Mingxing Tan et al. [paper]
- [CVPR'18] ShuffleNet: An extremely efficient convolutional neural network for mobile devices. by Xiangyu Zhang et al. [paper]
- [CVPR'18] MobileNetV2: Inverted Residuals and Linear Bottlenecks. by Mark Sandler et al. [paper]
- [UbiComp'18] DeepType: On-Device Deep Learning for Input Personalization Service with Minimal Privacy Concern. by Mengwei Xu et al. [paper]
- [arXiv 2018] Federated learning for mobile keyboard prediction. by Google. [paper]
- [CVPR'23 Tutorial] Efficient Neural Networks: From Algorithm Design to Practical Mobile Deployments. by Snap Research. [paper]
- TensorFlow Lite: Deploy machine learning models on mobile and edge devices. by Google. [code]
- TensorFlow.js: A WebGL accelerated JavaScript library for training and deploying ML models. by Google. [code]
- MNN: A Universal and Efficient Inference Engine. by Alibaba. [code]
- TensorRT: A C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators. by NVIDIA. [code]
- TVM: Open deep learning compiler stack for cpu, gpu and specialized accelerators. by Tianqi Chen et al. [code]
- MACE: a deep learning inference framework optimized for mobile heterogeneous computing platforms. by Xiaomi. [code]
- NCNN: a high-performance neural network inference framework optimized for the mobile platform. by Tencent. [code]
- Pipeline Parallelism for PyTorch. by PyTorch. [code]
- A GPipe implementation in PyTorch. by Kakaobrain. [code]
All contributions to this repository are welcome. Open an issue or send a pull request.