Skip to content

reachsak/LLM-XR_Multimodal_AI_Assistant

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Multimodal AI Assistant and Extended Reality (XR) Applications

Project Description

This project explores the integration of Vision Language Model (VLM), Large Language Model (LLM), and extended reality (XR) to create a Multimodal AI assistant with voice chat, Image understanding, and smart building control in immersive environments. This project aims to create an innovative solution for remote facility management and urban infrastructure monitoring. The developed system deploys an LLM-based AI assistant and a digital building twin into an XR environment using Microsoft HoloLens 2. Users can interact with the BIM models and communicate with the Multimodal AI chatbot. Users can also interact with the AI assistant through voice commands to control building facilities. This setup enhances the ability of facility managers and occupants to interact with and control smart buildings remotely. The approach also holds the potential for scaling to multiple buildings or urban infrastructure, enabling immersive, real-time monitoring and management for smart city applications.

### AI Voice Chat and Image Understanding

Video Demo

Demo Video 1

*Click on the image to view Demo Video 1.*

Demo Video 2

*Click on the image to view Demo Video 2.*

Key Features

  • LLM-Based AI Agents: Utilizes advanced language models for intelligent interaction and control.
  • Extended Reality (XR) Integration: Implements XR technologies with Unity 3D to create immersive smart building control applications as well as BIM model manipulation.
  • AI Voice Chat: Enables natural language communication with the smart building system.
  • Image Understanding: Incorporates vision language models for understanding and interpreting visual data.

Requirements

  • Open-source Vision language model and Large Language Model (e.g., MiniCPM V, LLaMA 3)
  • Generative AI inference tool. llama.cpp
  • Unity 3D
  • Microsoft Hololen 2
  • Python 3.10
  • Open-source Text-to-Speech (TTS) model, Whisper
  • Open-source Speech-to-Text (STT) model, Piper

Detailed setup guide

Coming soon.....

License

This project is licensed under the MIT License.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published