Check out this AWS blog about how Rufus built an LLM application at scale with high scalability, availability, and throughput at low latency. It combines great technologies across infrastructure, software, and hardware. We use Amazon ECS as the deployment and serving infrastructure, NVIDIA #Triton as the serving layer, #vLLM as the inference engine, the #Neuron SDK as the inference backend, and #Trainium and #Inferentia chips, developed by #Annapurnalabs, for compute. This was made possible by the Rufus team members and our great Amazon partners from AWS and #Annapurnalabs.
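For a concrete sense of how these pieces fit together, here is a minimal, hypothetical sketch of offline inference with vLLM's Neuron backend on a Trainium/Inferentia instance. This is an illustration only, not the Rufus team's code; the model name, parallelism degree, and sampling settings are placeholders.

```python
# Illustrative sketch: vLLM with the AWS Neuron backend (Inferentia/Trainium).
# Assumes a Neuron-enabled vLLM build; all names and settings are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    device="neuron",          # route execution through the Neuron SDK
    tensor_parallel_size=2,   # shard the model across two NeuronCores
    max_num_seqs=8,           # continuous-batching budget
    max_model_len=2048,
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Recommend a waterproof hiking boot."], params)
print(outputs[0].outputs[0].text)
```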
-
Scaling #LLM inference on #EKS with AWS Inferentia and Trainium is now more efficient and reliable, inspired by the Amazon Rufus team's blog post: https://lnkd.in/dMJBhBgV The solution addresses challenges like high demand for fast responses, cost efficiency, and multi-region reliability. It leverages Global Accelerator and an Application Load Balancer for the network design, the Karpenter autoscaler for resource management, and specialized hardware such as AWS Inferentia and Trainium chips, alongside NVIDIA Triton Inference Server and TensorRT-LLM, for inference optimization. Performance enhancements like INT8 quantization, continuous batching and streaming in the vLLM backend, and scaling out on Inferentia and Trainium chips contribute to the solution's success. Check out the detailed insights here: https://lnkd.in/dtQkWKt3
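As a hypothetical illustration of the continuous batching and streaming credited to the vLLM backend, here is a short sketch using vLLM's async engine; the model name, prompt, and engine settings are placeholders, not the article's actual configuration.

```python
# Illustrative sketch: streaming generation with continuous batching via vLLM's
# async engine. Each async update carries the cumulative text generated so far.
import asyncio
from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(model="meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder
)

async def stream_one(prompt: str, request_id: str) -> None:
    params = SamplingParams(temperature=0.7, max_tokens=64)
    async for update in engine.generate(prompt, params, request_id):
        print(update.outputs[0].text)  # cumulative output so far

asyncio.run(stream_one("Suggest a trail-running shoe.", "req-1"))
```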
Scaling LLM Inference on EKS with AWS Inferentia and Trainium
medium.com
-
Tried DeepSeek and it is impressive. I, for one, think it is good this came out. It sets a new, level playing field in #AI, and there is little hope that something as impressive might come out of a European lab (assuming the AI Act doesn't curb it in its development). It's comparable to Google's #MapReduce moment from the early 2000s: something fundamental for cost-effective infra deployment that's *open source*. Ah, and for Amazon Web Services (AWS), a new model added to the library, ready to be deployed via #Bedrock. https://lnkd.in/drKJJ5QT
Deploying DeepSeek-R1 Distill Llama Models on Amazon Bedrock
community.aws
-
DeepSeek AI has been taking the headlines by storm. Want to leverage it on AWS? Try it now, it's easy. Here is how to deploy the DeepSeek-R1 model on Amazon Bedrock: https://lnkd.in/gCfNzWDj #aws #deepseek #builders #amazonbedrock #ai #genai #data
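To give a flavor of what the guide walks through, here is a hypothetical boto3 sketch of starting a Bedrock Custom Model Import job; the bucket, role ARN, and names below are placeholders, not values from the guide.

```python
# Hypothetical sketch: importing model weights (Hugging Face format, staged in S3)
# into Amazon Bedrock via Custom Model Import. All names and ARNs are placeholders.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

job = bedrock.create_model_import_job(
    jobName="deepseek-r1-distill-llama-import",
    importedModelName="deepseek-r1-distill-llama-8b",
    roleArn="arn:aws:iam::123456789012:role/BedrockModelImportRole",
    modelDataSource={
        "s3DataSource": {"s3Uri": "s3://my-model-bucket/deepseek-r1-distill-llama-8b/"}
    },
)
print(job["jobArn"])  # poll get_model_import_job(jobIdentifier=...) until it completes
```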
Deploying DeepSeek-R1 Distill Llama Models on Amazon Bedrock
community.aws
-
In August I finished the book "Deep Learning for Coders" by Jeremy Howard. Near the end of the book I mainly skimmed the sections on RNNs and focused on the areas where I already have some experience: regressors, classifiers, tabular data, general neural networks, CNNs, data processing, and model evaluation techniques. I'll return to RNNs down the line (I've implemented some before, but I'd like to go deeper).
Now, in September, I started "Hands-On Machine Learning" by Aurélien Géron. I'm about a fifth of the way through and the information is sinking in (it's a great book). I'm trying to get as much of it "under my fingers" as I can. Specifically, I'm aiming to be able to go through the end-to-end process of bringing an ML project to life, from idea to production. Once I'm comfortable with that process, I'll dive deeper into improving models once they're live.
This weekend I dockerized my ML environment with TensorFlow. I'm using docker-compose to set up a Jupyter notebook server and an adjacent Python web server that I plan to serve the model from. The container makes it easy to build and deploy locally on my CPU-bound system (M3 MacBook), and later down the line I'll be able to deploy the same container to environments that match my hardware needs.
I'm using Dev Containers with VS Code, which lets me work directly in the container and trigger rebuilds. The filesystem is mounted to my local environment, so changes in the container are reflected in my workspace. With this I'm able to work directly with Jupyter notebooks in my Docker container through VS Code (I'm pretty excited about this, since it's my preferred coding environment). I did have to go through some steps to get the build right, but since all the steps are captured in the Dockerfile, I'll never have to do it again.
I tried leveraging Apple Metal with Docker to squeeze out more local performance, but I hit a few too many hiccups during setup, and I believe the CPU performance I'll get out of the M3 will be sufficient anyway. Ultimately I plan to target NVIDIA GPUs, as they seem to currently have superior driver support, so I'd rather not complicate things by trying to support too many GPU types.
In the future, my notebook server will be deployed in clusters with access to high-performance GPUs. Model builds will occur daily or weekly (depending on cost and the decay rate of model performance), go through a validation process (automated review and manual human review), and finally be deployed and made available through the web server.
I still have a lot to learn about making really effective ML models and ML-specific infrastructure, so I've been making an effort every day to learn something new by scheduling 1-2 hours in the evening to work on ML applications and infrastructure. I'll try to "learn in public" as much as I can; you can get connected with me on X:
Andrew Wilson (@abstructs) on X
x.com
-
🚀 Want to Deploy Advanced AI Without Infrastructure Headaches?
Why DeepSeek-R1 on Amazon EKS Auto Mode is changing the game:
🔹 Open-Source Power – Democratize AI development with accessible, community-driven innovation. How could this transform your projects?
🔹 Smarter Reasoning – Tackle complex tasks (math, logic, coding) using Chain of Thought (CoT) for 23% higher accuracy. Ever struggled with AI model precision?
🔹 Zero-Kubernetes Stress – Auto-scaling infrastructure handled by Amazon EKS – focus on building, not managing. Developers/AI teams: Would seamless scalability accelerate your workflow?
👉 Tutorial Walkthrough: Click on the link 🔗
P.S. Tag a developer who needs this!
💬 Discussion Starter: Which feature excites you most? Open-source flexibility? Enhanced reasoning? Let's debate below! ↓
#AIInnovation #CloudComputing #TechCommunity
Hosting DeepSeek-R1 on Amazon EKS
community.aws
-
Deploying the DeepSeek-R1 Distill Llama models on Amazon Bedrock involves using the Custom Model Import feature, which lets you integrate externally fine-tuned models into the Bedrock environment seamlessly. This process enables you to leverage Bedrock's serverless infrastructure and unified API for efficient model deployment.
Amazon Web Services (AWS) is offering multiple ways for our customers to leverage the DeepSeek AI R1 foundation model, which has garnered significant attention in the AI industry. Our commitment to AI accessibility is reflected in these options:
1. Amazon SageMaker AI now supports running distilled Llama and Qwen DeepSeek models
2. Amazon Bedrock's Custom Model Import feature allows utilization of distilled Llama DeepSeek models
3. DeepSeek models can be trained on Amazon SageMaker AI through Hugging Face integration
At AWS, we recognize that AI requirements vary across use cases. These platforms enable our customers to efficiently evaluate and implement the most suitable AI solutions for their specific needs. As the AI landscape continues to evolve, AWS remains committed to providing flexible, state-of-the-art options. We're dedicated to ensuring our customers stay at the forefront of AI innovation, with the tools and capabilities to drive their businesses forward.
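As one hypothetical illustration of the SageMaker path, here is a minimal sketch of deploying a distilled DeepSeek model to a real-time endpoint through the Hugging Face integration; the model ID, container versions, and instance type are assumptions, not official guidance.

```python
# Hypothetical sketch: deploying a distilled DeepSeek model on Amazon SageMaker
# via the Hugging Face integration. Model ID, versions, and instance type are
# placeholders; size the instance to the model you actually deploy.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # assumes a SageMaker execution role

model = HuggingFaceModel(
    env={"HF_MODEL_ID": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"},
    role=role,
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
)

predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.4xlarge")
print(predictor.predict({"inputs": "Explain chain-of-thought prompting briefly."}))
```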
Deploying DeepSeek-R1 Distill Llama Models on Amazon Bedrock
community.aws
-
🚀 Game-changing news for #AI developers from the floor of #AWSreInvent 2024! #AWS just announced that prompt #caching and intelligent prompt #routing will be supported on #Bedrock, cutting costs and latency without compromising accuracy. This means faster response times and significant savings for businesses using Amazon Bedrock and other AWS gen-AI services. The future of AI deployment is here, let's go build!
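For readers who want a rough sense of what prompt caching looks like in practice, here is a hypothetical sketch using the Bedrock Converse API with a cache point after a long, static prefix; the model ID and prompt text are placeholders, and caching support varies by model.

```python
# Hypothetical sketch: marking a reusable system prefix as a cache point in the
# Bedrock Converse API. Model ID and text are placeholders; prompt caching is
# only honored by models that support it.
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",  # placeholder model ID
    system=[
        {"text": "You are a support assistant. <long, static product manual here>"},
        {"cachePoint": {"type": "default"}},  # cache everything up to this marker
    ],
    messages=[
        {"role": "user", "content": [{"text": "How do I reset my device?"}]},
    ],
)
print(response["output"]["message"]["content"][0]["text"])
```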
AWS now allows prompt caching with 90% cost reduction
venturebeat.com
-
🚀 Deploying DeepSeek-R1 Distill Llama Models on Amazon Bedrock! 🔥
Bringing custom models to Amazon Bedrock just got easier! With DeepSeek-R1 Distill Llama, you can leverage Bedrock's serverless infrastructure and unified API for seamless deployment and inference.
🛠 Key Steps:
✅ Prepare & upload model files in Hugging Face format
✅ Store them in an Amazon S3 bucket
✅ Import your model into Amazon Bedrock
✅ Invoke the model via the Bedrock API
📌 Check out the step-by-step guide: https://lnkd.in/gTE25smm
📌 Learn more about importing custom models into Bedrock: https://lnkd.in/gUQ3swQE
Amazon Web Services (AWS) | AWS AI | AWS Developers
#GenerativeAI #AmazonBedrock #DeepSeek #MachineLearning #AI #LLM
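To round out the last step above, here is a hypothetical sketch of invoking an imported model through the Bedrock runtime; the model ARN is a placeholder (Custom Model Import returns the real one), and the Llama-style request body is an assumption.

```python
# Hypothetical sketch: invoking a custom imported model via the Bedrock runtime.
# The ARN below is a placeholder for the one returned by Custom Model Import,
# and the JSON body assumes a Llama-style "prompt" interface.
import json

import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = runtime.invoke_model(
    modelId="arn:aws:bedrock:us-east-1:123456789012:imported-model/abc123",
    body=json.dumps({
        "prompt": "What is the capital of France?",
        "max_gen_len": 256,
        "temperature": 0.5,
    }),
    contentType="application/json",
)
print(json.loads(response["body"].read()))
```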
Deploying DeepSeek-R1 Distill Llama Models on Amazon Bedrock
community.aws