Adam Zhao’s Post

Lead of LLM inference optimization | SDM @ Amazon Stores Foundational AI (Rufus)

Check out this AWS blog about how the Rufus team built an LLM application at scale with high availability, high throughput, and low latency. It’s a combination of great technologies across infrastructure, software, and hardware: Amazon ECS as the deployment and serving infrastructure, NVIDIA #Triton as the serving layer, #vLLM as the inference engine, the #Neuron SDK as the inference backend, and #Trainium and #Inferentia chips, developed by #Annapurnalabs, for compute. This was made possible by the Rufus team members and our great Amazon partners from AWS and #Annapurnalabs.
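For the curious, here is a minimal sketch of what offline inference through vLLM's Neuron backend can look like. It assumes a vLLM build with Neuron (transformers-neuronx) support; the model name, sequence limits, and tensor-parallel degree are illustrative placeholders, not the Rufus production configuration.

```python
from vllm import LLM, SamplingParams

# Illustrative prompt and sampling settings (not Rufus's actual values).
prompts = ["What should I look for when buying a tent?"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Placeholder open model; on Neuron, vLLM compiles the model via the
# Neuron SDK and shards it across NeuronCores with tensor parallelism.
llm = LLM(
    model="openlm-research/open_llama_3b",
    max_num_seqs=8,         # max batched sequences
    max_model_len=128,      # max context length for this sketch
    block_size=128,         # KV-cache block size
    device="neuron",        # route execution to the Neuron backend
    tensor_parallel_size=2, # shard across 2 NeuronCores (illustrative)
)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```

In the stack described in the blog, a serving layer such as Triton would sit in front of an engine like this, with ECS handling deployment and scaling across Inferentia/Trainium instances.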

Scaling Rufus, the Amazon generative AI-powered conversational shopping assistant with over 80,000 AWS Inferentia and AWS Trainium chips, for Prime Day | Amazon Web Services

aws.amazon.com
