E-commerce System Design Overview
This overview covers the design of a large-scale ecommerce platform similar to Amazon or Flipkart, focusing on key functional and
non-functional requirements, system architecture, and component interactions. It first defines critical
functionalities including product search, delivery feasibility (serviceability), cart and wishlist management, and
order history viewing. On the non-functional side, it emphasizes low latency, high availability, and
consistency, clarifying the trade-offs among these goals for different services.
The architecture is explained in two main parts: the user-facing interaction (home screen, search function)
and the checkout/order flow. The system integrates various microservices, message queues (Kafka),
databases (MongoDB, MySQL, Cassandra, ElasticSearch), and caching layers (Redis) to ensure scalability
and responsiveness. Backend systems handle supplier integration, item onboarding, search indexing,
inventory management, user servicing, recommendations, and notifications.
The search functionality combines Elasticsearch for text queries with a Serviceability and Turn Around Time
(TAT) service that filters out undeliverable products early to avoid bad user experience. Events like searches,
adding items to cart/wishlist, and orders generate Kafka events feeding analytics and machine
learning-based recommendation engines.
The order flow highlights the importance of atomicity using MySQL for storing orders temporarily with
inventory blocking to prevent overselling. Redis caching helps manage orders that time out during incomplete
payments. Advanced mechanisms handle race conditions between payment success and expiry events.
Long-term order data is archived to Cassandra for efficient historical querying. Notifications and logistics are
decoupled services triggered by order state changes.
Overall, the video thoroughly covers real-world system trade-offs, explains design choices for the various
databases and message workflows, addresses consistency vs. availability challenges, and highlights operational concerns
like reconciliation, scaling, and user experience optimization.
Highlights
🔍 The ecommerce search results show only deliverable products, improving user experience by preventing non-serviceable items from appearing.
📦 Items are onboarded through an Inbound Service and stored in MongoDB for flexibility in handling diverse, unstructured item attributes.
🛒 Cart and Wishlist services are separate microservices with their own databases, enabling independent scaling.
🎯 Inventory blocking and atomic transactions using MySQL prevent overselling during concurrent orders.
⏳ Redis with TTL tracks pending orders during payment, enabling automatic rollback if payment is abandoned.
📊 Kafka streams power real-time analytics and machine learning-based personalized product recommendations.
🗃️ Archival of completed and canceled orders to Cassandra preserves long-term order data with efficient query support.
Key Insights
🔗 Serviceability Early in Search Flow: Integrating serviceability and delivery time estimation early in the
search pipeline prevents users from wasting time on products that cannot be delivered to their location,
reducing cart abandonment and frustration. This approach prioritizes user experience by filtering results
based on geography and product logistic constraints.
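As a rough sketch of that early filtering step (the item IDs, pincode values, and lookup structure here are illustrative, not from the video):

```python
def filter_serviceable(results, pincode, tat_lookup):
    """Keep only products deliverable to the user's pincode,
    attaching the promised turn-around time (TAT) to each."""
    serviceable = []
    for item in results:
        tat_hours = tat_lookup.get((item["id"], pincode))
        if tat_hours is not None:          # None => no route can deliver it
            serviceable.append({**item, "tat_hours": tat_hours})
    return serviceable

# Toy data: a fridge cannot be shipped to the remote pincode 999001.
tat_lookup = {("shirt-1", "999001"): 48, ("shirt-1", "560001"): 12,
              ("fridge-1", "560001"): 24}
hits = [{"id": "shirt-1"}, {"id": "fridge-1"}]
print(filter_serviceable(hits, "999001", tat_lookup))
# Only the shirt survives for the remote pincode, with its TAT attached.
```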
🗂️ Choosing MongoDB for Item Data: Product attributes can be highly heterogeneous (e.g., apparel size vs
TV screen) and frequently changing. Using a NoSQL database like MongoDB allows flexible schema and
efficient querying of semi-structured data. This design choice simplifies item onboarding and management
while supporting varied product lines.
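For instance, the three product types mentioned above could coexist in one collection without any shared schema; the field names below are invented for illustration:

```python
# Heterogeneous item documents, as they might be stored in MongoDB:
items = [
    {"sku": "tshirt-9", "category": "apparel", "size": "L", "color": "red"},
    {"sku": "tv-55", "category": "electronics", "screen_inches": 55, "panel": "LED"},
    {"sku": "bread-4", "category": "grocery", "weight_g": 400, "variety": "wheat"},
]

# A schemaless store lets each document carry only the attributes it needs,
# while queries can still filter on any attribute that is present:
large_apparel = [i for i in items if i.get("size") == "L"]
print(large_apparel[0]["sku"])  # tshirt-9
```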
🚦 Trade-offs in CAP Theorem Across Services: The video highlights how it’s impractical to enforce low
latency, high availability, and strong consistency simultaneously for every component. For example,
payments and inventory require strong consistency (CP), while search prioritizes availability (AP).
Decomposing system requirements this way leads to better overall performance and user experience.
🛠️ Atomic Inventory Blocking with Constraints: Preventing overselling in a concurrent environment is critical.
The system leverages MySQL’s ACID transactions and atomic decrement operations (count cannot go
negative) to reliably block inventory during an order before accepting payment. This ensures fairness and
consistency in competitive purchasing scenarios.
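A minimal sketch of this pattern, using SQLite in place of MySQL. It uses a guarded conditional UPDATE rather than relying on catching the CHECK-constraint violation, but the effect is the same: the count can never go below zero, so only one concurrent buyer can claim the last unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item_id TEXT PRIMARY KEY,"
             " count INTEGER NOT NULL CHECK (count >= 0))")
conn.execute("INSERT INTO inventory VALUES ('tv-55', 1)")  # one TV left
conn.commit()

def try_block(item_id):
    # Atomic conditional decrement: succeeds only while stock remains.
    cur = conn.execute(
        "UPDATE inventory SET count = count - 1 WHERE item_id = ? AND count > 0",
        (item_id,),
    )
    conn.commit()
    return cur.rowcount == 1   # exactly one concurrent buyer wins

print(try_block("tv-55"))  # True  - first buyer blocks the last unit
print(try_block("tv-55"))  # False - second buyer sees it as out of stock
```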
⏳ Order Timeout Handling with Redis TTL and Expiry Callbacks: Payment flows are vulnerable to user
abandonment. The system uses Redis to store order metadata with expiry keys to detect incomplete
payments. Upon TTL expiry, it triggers automatic order cancellation and inventory rollback, which helps
maintain data accuracy and frees locked resources, enhancing system robustness.
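A toy in-memory version of the idea (a real deployment would use Redis key TTLs with expiry callbacks; the five-minute window and order IDs are illustrative):

```python
PAYMENT_WINDOW = 300  # seconds a buyer gets to finish paying

pending = {}  # order_id -> expiry timestamp, standing in for Redis keys with TTL

def start_payment(order_id, now):
    pending[order_id] = now + PAYMENT_WINDOW

def on_payment_result(order_id):
    # Success or failure: delete the key so no expiry callback ever fires.
    pending.pop(order_id, None)

def sweep_expired(now):
    # Stand-in for Redis expiry callbacks: these orders get cancelled
    # and their blocked inventory is rolled back.
    expired = [oid for oid, t in pending.items() if t <= now]
    for oid in expired:
        del pending[oid]
    return expired

start_payment("O1", now=0)      # order created at t=0
start_payment("O2", now=0)
on_payment_result("O1")         # O1 paid at t=60; its key is removed
print(sweep_expired(now=301))   # ['O2'] - the abandoned order times out
```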
📈 Analytics and Recommendations from Event Streams: Kafka captures user behaviors such as searching,
wishlist, cart actions, and orders, feeding these into Spark Streaming and Hadoop for batch and
near-real-time analytics. Machine learning models run on this data enable personalized and category-specific
recommendations, increasing cross-selling potential and user engagement.
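A toy stand-in for one such streaming job, counting the most-acted-on items inside a time window (the event shape and field names are assumptions for illustration):

```python
from collections import Counter

def top_items(events, action, window_start, window_end, k=1):
    # e.g. "the most wishlisted item in the last 30 minutes"
    counts = Counter(
        e["item"] for e in events
        if e["action"] == action and window_start <= e["ts"] < window_end
    )
    return counts.most_common(k)

events = [
    {"ts": 10, "action": "wishlist", "item": "tv-55"},
    {"ts": 20, "action": "wishlist", "item": "tv-55"},
    {"ts": 25, "action": "wishlist", "item": "bread-4"},
    {"ts": 99, "action": "cart", "item": "tv-55"},
]
print(top_items(events, "wishlist", 0, 30))  # [('tv-55', 2)]
```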
🏗️ Hybrid Data Storage for Orders: Active orders reside in MySQL to benefit from strong consistency and
transactional updates, while completed or canceled orders are archived in Cassandra optimized for
read-heavy workloads with wide datasets. This multi-database strategy balances ACID functionality where
needed and scalability for historical data retention and retrieval.
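A sketch of the archival hand-off; the terminal status names are illustrative, and plain dicts stand in for MySQL and a Cassandra table partitioned by user:

```python
from collections import defaultdict

TERMINAL = {"DELIVERED", "CANCELLED"}  # illustrative terminal order states

def archive_orders(active, archive):
    """Move finished orders out of the transactional store (MySQL stand-in)
    into an archive partitioned by user_id (Cassandra stand-in)."""
    for order_id, order in list(active.items()):
        if order["status"] in TERMINAL:
            archive[order["user_id"]].append(active.pop(order_id))
    return archive

active = {"O1": {"user_id": "u1", "status": "DELIVERED"},
          "O2": {"user_id": "u1", "status": "CREATED"}}
archive = defaultdict(list)
archive_orders(active, archive)
print(sorted(active))  # ['O2'] - only in-flight orders stay in the active store
```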
This detailed exploration covers the system design of an ecommerce platform, clarifying why certain
technologies are chosen, how microservices interact, and how critical real-world concerns like inventory
accuracy, delivery, payment uncertainty, and user experience are handled to build a robust and scalable
application.
Reconciliation service:
The reconciliation service in the e-commerce system design plays a crucial role in maintaining data integrity and consistency,
particularly between order records and inventory counts. In a high-throughput environment like Amazon or Flipkart, many transactions occur
simultaneously, and there can be failures or inconsistencies arising due to partial updates, system crashes, or communication delays
between services.
Use case of the reconciliation service:
When a user places an order, the system decrements the inventory stock to “block” the product during the payment process,
ensuring that no other buyer can purchase the same unit concurrently. However, there are scenarios where the entire flow
might not complete successfully—payment failure, lost response from payment gateways, or abrupt user abandonment (for
example, closing the browser window mid-payment). These cases can lead to discrepancies where inventory is reduced but no corresponding completed order exists.
The reconciliation service periodically audits and reconciles such inconsistencies by cross-checking:
1. Order statuses vs Inventory: It verifies whether the inventory counts in the database match the actual orders
recorded. For instance, if an order was canceled or timed out but the inventory decrement was not properly reversed, the service restores the count.
2. Detecting missed or failed updates: It handles edge cases where inventory increments or decrements might have
been missed due to transient failures, ensuring that inventory counts reflect the true state of orders.
3. Ensuring eventual consistency: Since some parts of the system prioritize availability and low latency over
immediate consistency, reconciliation acts as a background corrective mechanism to restore consistent state
across services.
4. Preventing overselling and stock errors: By regularly checking and correcting inventory against orders, the service
helps prevent the system from showing wrong inventory to customers or overselling products.
In summary, the reconciliation service performs a crucial background job that audits orders and inventory periodically to
maintain system correctness and data integrity. This ensures that despite asynchronous operations, partial failures, or
network issues, the inventory reflects the real status of ordered products, reinforces trust in the system, and improves overall user experience.
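The audit described above can be sketched as a drift check; the statuses and numbers below are illustrative:

```python
def inventory_drift(initial_stock, orders, current_count):
    """Compare the stock we *should* have against what the database says.
    An order holds one unit while it is awaiting payment or placed."""
    held = sum(1 for o in orders if o["status"] in ("CREATED", "PLACED"))
    expected = initial_stock - held
    return expected - current_count  # > 0: units to restore; < 0: possible oversell

orders = [
    {"id": "O1", "status": "PLACED"},
    {"id": "O2", "status": "CANCELLED"},  # the rollback was missed for this one
]
# Started with 5 units; the DB says 3, but only O1 should be holding a unit.
print(inventory_drift(5, orders, current_count=3))  # 1 unit to restore
```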
Full Transcript:
Hi everyone! Welcome to CodeKarle. In this video, we'll be looking at another very common System Design Interview problem,
which is to design an ecommerce application, something very similar to Amazon or Flipkart or anything of that sort. Let's look at some
functional and non-functional requirements(NFRs) that this platform should support. So the very first thing is that people should be
able to search for whatever product they want to buy and along with searching we should also be able to tell them whether we can
deliver it to them or not. So let's just say a particular user is in a very remote location where we cannot deliver a particular
product, then we should clearly call it out right on the search page. Why? - because let's say a user has seen a hundred products and
then they go to the checkout flow, add one into the cart, and then we tell them that we can't deliver - that's a very bad user experience. So
at the search page itself, we should be able to tell them that we cannot deliver, or if we are delivering, then by when we should be
able to deliver it to you.
The next thing is there should be a concept of a cart or a wishlist or something of that sort so that people can basically add a
particular item into a cart. The next thing is people should be able to check out which is basically making a payment and then
completing the order. We will not look at how the payment exactly works, like integrating with Payment Gateways and all of that,
but we'll broadly look at how the flow overall works. The next thing is people should be able to view all of their past historical orders
as well.
The ones that have been delivered, the ones that have not been delivered, everything. From the non-functional side, the system
should have a very low latency. Why? - because it will be a bad user experience if it's slow. It should be highly available and it should
be highly consistent. Now high availability, high consistency and low latency all three together sounds a bit too much, so let's break it
down. Not everything needs to have all these three. So some of the components which are mainly dealing with payment and
inventory counting, they need to be highly consistent, at the cost of availability. Certain components, like search and all, they
need to be highly available, maybe at the cost of consistency at times. And overall most of the user facing components should have a
low latency. Now let's start with the overall architecture of the whole system. We will do it in two parts. First, we look at the home
screen and the search screen and then we look at the whole checkout flows. And before we get to the real thing let's look at a bit of
convention to start with. Things in green
are basically the user interfaces. It could be browsers, mobile applications, anything. These two are basically not just load
balancers but also reverse proxies and an authorization and authentication layer in between that will authenticate all the requests that
are coming from outside world into our system. All the things in blue that you see here are basically the services that we have built. It
could be Kafka consumers, could be Spark jobs could be anything. And the things in red are basically the databases or any clusters
could be Kafka Cluster/Hadoop cluster
or any public facing thing that we've used. Now let's look at the user interface what all we'll have. So we mainly have two user
interfaces. One is the home screen, which would be the first screen that a user comes to. It would by default have some
recommendations based on some past record of that user, or in case of a new user, some general recommendations. In case of the search
page, it would just be a text box wherein a user can put in a text and we will give them the search results. With that, let's look at how the
data flow begins. So a company like Amazon
would have various suppliers that they would have integrated with now these suppliers would be managed by various services
on the supplier front, okay. I am abstracting out all of them as something called as an Inbound Service. What that does is basically it
talks to various supplier systems and gets all of the data. Now let's say a new item has come in. Or a supplier is basically onboarding a
new item. That information comes through a lot of services and through the Inbound Service into a Kafka. That is basically the supplier
world coming into the whole
search and user side of things. Now there are multiple consumers on that Kafka, which will process that particular piece of
information to flow into the user world. Let's look at what that is. So the very first thing that will happen is basically something called
as an Item Service. So Item Service has a lot of people talking to it, and if I start drawing all the lines it becomes too cluttered, so I've
not drawn any connection to Item Service. But Item Service will be one of the important things that listen to this Kafka topic, and what it
does is
it basically onboards a new item. Now it is also the source of truth for all items in the ecosystem. So it will provide
various APIs to get an item by item ID, add a new item, remove an item, update details of a particular item and it will also have an
important API to bulk GET a lot of items. So a GET API with a lot of item IDs, and in response it gives details about all of those
items. Now Item Service sits on top of a MongoDB. Why do we need a Mongo here? - So item information is fundamentally
very non structured. Different item types will have different attributes and all of that needs to be stored in a queriable format.
Now if we try to model that in a structured form in a MySQL kind of database, that will not be an optimal choice. So that's why we've
used a Mongo. To take an example, let's say if you look at attributes of some products like a shirt, it would have a size
attribute and it would probably have a color attribute right. Like a large size t-shirt of red color. If you look at something like a
television, it will have
a screen size, saying a 55 inch TV with the led display kind of a thing. There could be other things like a bread, which could
have a type and weight. Like 400 grams of bread which is a wheat bread or a sweet bread or something of that sort. So, it's a
fairly non-structured data and that's the reason I use a Mongo database here. Now, coming to other consumers that will be sitting on
that Kafka. So on this Kafka, there is something called as a Search Consumer. What this does is basically whenever a new item
comes in, this search consumer is
responsible for making sure that item is now available for the users to query on. So it basically reads through all of the items that are coming
in, puts them in a format that the search system understands, and stores them into its
database. Now, search uses a database called Elastic search, which is again a NoSQL database, which is very efficient at doing
text-based queries. Now this whole search will happen either on the product name or product description and maybe it
will have some filters on the product attributes. Those kinds of things are something that Elasticsearch is very good at. Plus in
some cases you might also want to do a fuzzy search, which is also supported very well by Elastic Search. Now on top of this Elastic
Search, there is something called as a Search Service. This Search Service is basically an interface that talks to the front end or any
other component that wants to search anything within the ecosystem. It provides various kinds of APIs to filter products, to search by
a particular
string or anything of that sort. And the contract between Search Service and Search Consumer is fixed so both of them
understand what is the type of data that is stored within Elastic Search. That is the reason consumer is able to write it and the Search
Service is able to search for it. Now, once a user wants to search something, there are two main things around it. One is basically
trying to figure out the right set of items that needs to be displayed but there is also an important aspect that we should not show
items that we cannot deliver to the
customers. So for example, if a customer is staying in a very remote location and we are not able to deliver big sized items
like a refrigerator to that pin code, we should not really show that search result to the user, because otherwise it's a bad
experience that they will see the result but will not be able to order it, right. So search service talks to something called as a
Serviceability and TAT (Turn Around Time) Service. This basically does a lot of things. First of all, it tries to figure out where exactly
the product is. In what all
warehouses. Now given one of those warehouses, or some of those warehouses, it tries to see - do I have a way to deliver
products from this warehouse to the customer's pincode? and if I have, then what kind of products can I carry on that route. Now
certain routes can carry all kinds of products but certain routes might not be able to carry things like big products or other kind of
product, right. So all of that filtering stays within the Serviceability and TAT Service. Now it also does one more thing - it tells
you in how much time I will be able to deliver, okay. So it would be like in number of hours or days or anything of that sort it can
say that probably I will be able to deliver it in 12 hours, or 24 hours, or something of that sort. Now if serviceability says that I cannot
deliver, search will simply filter out those results and return the remaining things. Now Search might talk to
User Service. There's something called as a User Service here. We'll get into that later, but the User Service is basically the source of truth
for the user data, and it (Search) can query that (User Service) to fetch some attributes of the user, probably a default address or
something of that sort, which can be used basically which can be passed as an argument to Serviceability Service to check whether I
can deliver it or not, okay. Now Search Service returns the response to the user which can be rendered and the people can see
whatever you know they want to see. Now each time a Search happens, an event is basically put into Kafka. The reason behind this
is whenever somebody is searching for something they
are basically telling you an intent to buy a kind of product, right? That is a very good source for building a Recommendation.
We'll look at how Recommendation can be built later on, but this is an input into the recommendation engine and we will be using a
Kafka to pass it along. So each search query goes into Kafka, saying this user ID searched for this particular product. Now from the
Search Screen, a user would be able to wishlist a particular product, or add it to cart and buy it, right? All of those
could be done using the Wishlist Service and Cart Service. Wishlist Service is the repository of all the wishlists in the
ecosystem, and the Cart Service is the repository of all the carts in the ecosystem. A cart is basically a shopping bag into which
people put products and then check out. Now both these services are built in exactly the same way (almost). They provide APIs
to add a product into user's cart or wishlist. Get a user's Cart or Wishlist. Or delete a particular item from that.
And they would have a very similar data model, and they are both sitting on their own MySQL databases. Now from a hardware
standpoint, I'd like to keep these two on separate hardware. Just in case, for example, the Wishlist size becomes too big and it needs to
scale out, we can scale that particular cluster accordingly. But otherwise functionally both of them are totally the same. Now each time
a person is putting a product into Wishlist, they are again giving you a signal. Each time they are adding something into their cart,
they are again giving a signal about their preferences, things that they want to buy and all of that, right? All of those
could again be put into Kafka for a very similar kind of analytics. Now let's look at what those analytics would be? - So from this
Kafka there would be something like a Spark Streaming Consumer. One of the very first things that it does is - kind of come up with
some kind of reports on what products people are buying right now. Those would be things like coming up with the report saying
what was the most bought item in the last 30 minutes, or what was the most wishlisted item in the last 30 minutes, or in the
Electronics category, which product is the most sought-after product. So all of those would be inferred by this Spark Streaming.
Other than that it also puts all the data to Hadoop saying this user has liked this product, this user has searched for this product
anything that happens, right. On top of it, we could run various ML Algorithms. I've talked about those in detail in another video(Netflix
Video), but the idea is - given a user and a
certain kinds of products that they like, we could figure out two kinds of information. One is what other products this
user might like okay and the other thing is how similar is this user to other users and based on products that other users have already
purchased, we would recommend a particular product to the user. So all of that is calculated by this Spark Cluster, on top of which
we can run various ML jobs to come up with this data. Once we calculate those recommendations, this Spark Cluster basically talks
to
something called as a Recommendation Service. Which is basically the repository of all the recommendations, and it has
various kinds of recommendations. One is, given a user ID, it will store general recommendations saying what are the most
recommended products for this user. And it will also store the same information for each category. Saying for electronics for this user
these are the recommendations. For the food kind of a thing, for this user, these are the recommendations. So when a person is
actually on the home page, they will first
see just the general recommendations, and if they navigate into a particular category, they'll see the recommendations
specific to that category. Now we have skipped a couple of components. User Service is a very straightforward service - you can
look at any of my other videos that have talked about it in detail (Airbnb Video) - but it's a repository of all the users. It
provides various APIs to get details of a user, update details of a user and all of that. It'll sit on top of a MySQL database and a Redis
Cache on top of it.
Now let's say Search Service wants to get details of a user the way it will work is, it will first query Redis to get the details of the
user. If the user is present, it will return from there. But if the user is not present in Redis it will then query the MySQL Database (one
of the slaves of that MySQL cluster), get the user information, store it in Redis, and return it back. So that's how User Service will
work. Now there are some other components that are here. One is Logistic Service and one is Warehouse service.
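That read path for User Service is the classic cache-aside pattern; a minimal sketch, with plain dicts standing in for Redis and a MySQL replica:

```python
def get_user(user_id, cache, db):
    # Cache-aside read: try Redis first, fall back to a MySQL replica.
    user = cache.get(user_id)
    if user is None:
        user = db[user_id]      # read from a slave of the MySQL cluster
        cache[user_id] = user   # populate the cache for the next caller
    return user

db = {"u1": {"name": "Asha", "default_address": "560001"}}
cache = {}
get_user("u1", cache, db)       # miss: hits the replica, fills the cache
print("u1" in cache)            # True - the next read is served from cache
```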
Normally these two components come in once the order is placed. But in this scenario, this Serviceability Service might query
either of these two services - not at runtime, but beforehand, caching the information - to fetch various attributes. So for example, it might
query this Warehouse Service to get a repository of all the items that are in the warehouse, or it might query Logistic Service to get
details of all the pin codes that exist, or maybe details about the Courier Partners that work in a particular locality. And with
all of that information, this Serviceability Service will basically create a graph kind of a thing, saying what is the shortest path
to go from point A to point B and in how much time can I get there. Now we have not covered this service in detail. I have also made
another video on implementation of Google Maps. This is very similar to that. So if you want to get into details you can look at
that (Google Maps Video), but just to call it out - this doesn't really do any calculation at runtime. It stores all the
information in a cache, and whenever anybody queries, it will query the cache and return the results from the cache itself, and
no runtime calculation because those kind of calculations are fairly slow. And it will pre-calculate all possible requests that can come
to it. Basically, if there are N pincodes and M warehouses, it will do an M x N and calculate all possible combinations of requests that
can come to the service, and store it in a cache. Now let's look at what happens when a User tries to place an order. So the user
interaction to place an order is represented as this User Purchase Flow. This could be mobile apps or web
browsers or anything. Now whenever the user says - I am ready to place an order, take me to the payment screen, which is the last
piece in the app flow then basically the request goes to something called as the Order Taking Service. Think of Order Taking Service
as a part of Order Management System which takes the order. Now Order Management System sits on top of a MySQL database.
Why MYSQL? - because if you
look at an order in table form, it will have a lot of tables, some with order information, some with customer information,
some with item information and there are a lot of updates that happen on to the Order Database, right, for each order. Now we need
to make sure that those updates are atomic and they can form a transaction so as to make sure that there are no partial updates
happening. And because MYSQL provides that out of the box we would leverage that here. So that's why MySQL. Now whenever
this gets called
the very first thing that happens is a record gets created for an order, an order ID gets generated. Now the next thing that we
do is we put an entry into this Redis saying, this order ID was created at some point in time and this record expires at some point.
Why it is being used, we'll come to. But think of it as an entry that goes in something like - order ID "O1" gets placed at 10:00, let's
say, and it expires at 10:05. Something of that sort. Now the record that goes into this MySQL Database has an initial status.
Think of it as a Status of: "CREATED". So basically the record that goes into this table is something like - order O1, was
created at 10:00 with status: "CREATED". You can have a different set of names also, that's fine. The third thing that happens is we
basically call Inventory Service. The idea of calling Inventory Service is we want to block the inventory. So let's say if there were five
units of a television that a user wanted to purchase, we will reduce the inventory count at this point in time, and then send the user to
payment.
Why do we do that? - Let's just say there was just one TV and three users want to come in at the same time. Now we want to
make sure only one of them buys the TV and for two of them it should show that it is out of stock, right? We can easily enforce that,
through this Inventory Service, by having a constraint on to the table which has the 'count' column saying that 'count' cannot go
negative. Now each time you basically want to place that order for a TV - if the count is one - only one of the users will
be able to do that transaction where the
count reduces. For the other two, it'll be a Constraint Violation Exception because the count cannot be negative. So only one of
them will go through with the inventory, right! Now once the inventory is updated then the user is taken to something called as a
Payment Service. Think of it as an abstraction over the whole payment flow where in this service talks to your Payment Gateways
and takes care of all of the interaction with the payment. We will not go into details of how that works, but the interaction with the
Payment Service can lead to two-three
outcomes. One of them is that payment service could come back saying the payment was successful. Or it could say that the
payment failed due to some reason, could be a lot of reasons. Or the person could just close the browser window after going to the
payment gateway and not come back, in which case the Payment Service would not give a response. So there are these three
possibilities. Now let's see what can happen, okay. So one of the very first things that can happen is - let's just say at 10:01, we got a
confirmation from Payment saying payment was successful.
In that case, the order would get confirmed, right. So let's say we call that status "PLACED", or any other
name for that matter. So now once the payment has been successful, we update the status of the order over here in the database
saying that the order is PLACED. Now the work is done. At that point in time, an event would be sent into Kafka, saying an order has
been placed, with so-and-so details, okay! Another option - it (the Payment Service) could come back saying that the payment
failed. Let's say there was no money in
the account or whatever happened. Now once the payment has failed, we need to do a lot of things. First of all, we need to
cancel the order, because the payment has not happened. So let's say the other possible scenario is the order could be
CANCELLED, right? Now, if the payment has failed, we need to increment the inventory count again. So we'll basically again call
Inventory Service and say that - you decremented the quantity for this particular order ID, now let's increment it back. So that's a
rollback transaction
kind of a thing, that is happening here, okay. So that is one more thing we'll do. And also we'll have a Reconciliation Service
that sits on top of this whole system, which reconciles - basically checks every once in a while that if there were ten orders overall, I do
have the correct inventory count at the end - just to handle cases where, due to some reason, we missed an update of the inventory count or the
write failed. So all of those scenarios can be handled by the Reconciliation flow. Now there is one more scenario that can happen.
Payment Service could not at all
come back, right? The user closed the browser window, the flow ends there, the Payment Service doesn't respond back to us. Now
what do we do? We can't keep the inventory blocked. So let's say there was just one television available, this user has gone to the
payment screen, closed the window and gone away. Now we still have that one television physically available in our warehouse but
the database says it's not available. We need to bring it back, right. So for that, the Redis comes in. So at 10:05, what will happen is,
this record in Redis will get expired, right. We'll have an expiry callback on top of Redis, which will
basically be invoked saying that this particular record, whatever you inserted, has got expired. Now do whatever you want. At
that point in time Order Taking Service will catch that event and say that now this particular record has expired, I'll basically follow the
same flow that was followed for payment cancellation - basically I'll time out the payment and mark it CANCELLED, okay. So at that
point
in time again at 10:05, this order would be moved to CANCELLED State in the Database and the inventory would get updated
back again, right. So everything is good, but there are a couple of scenarios here: there could be potential race conditions. What happens if your payment-success event and your expiry fire at the same time? What do you do then? There are two or three scenarios in which this can happen. The first scenario is that payment success comes first and then the expiry fires. That is a scenario that will always happen.
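The expiry-callback path described earlier (time out the payment, cancel the order, release the blocked unit) can be sketched as plain logic; the record shapes are assumptions, and real code would update MySQL and Redis rather than in-memory dicts:

```python
def on_redis_expiry(order, inventory):
    """Invoked when the Redis entry for a pending order expires:
    time out the payment, cancel the order, and release the blocked unit."""
    if order["status"] != "PENDING_PAYMENT":
        return order                   # already PLACED or CANCELLED: nothing to undo
    order["status"] = "CANCELLED"
    inventory[order["item"]] += order["qty"]  # give the television back
    return order
```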
So whenever a payment succeeds, say at 10:01, the expiry at 10:05 would still fire anyway, right? This is bound to happen for every order. One optimization we can do here: each time we get a payment-success or payment-failure event, we delete the entry from Redis, so that its expiry event never comes in. The other scenario is that the expiry comes first and then the payment success. Now, let's
say expiry came at 10:05 and
the payment success came at 10:07. When we got the expiry event, we would have cancelled the order and released the blocked inventory, adding the unit back. But now we learn the payment has actually happened. We could do a couple of things here. We could refund the money to the customer, saying that for whatever reason we are not able to process the order, here's your money back. Alternatively, since we have anyway got the money from the customer, we know what the person was trying to order, and the order failed due to an issue on our side, we could create a new order, attach this payment to it, and put it directly into PLACED status. That is another thing we could do. Assuming that for all orders, as soon as we get a payment-success or payment-failure event, we delete the entry from Redis, these race conditions will not happen too frequently. That also helps keep the memory footprint down: the less data we hold in Redis, the less RAM it consumes, and the more efficient it is, okay.
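A minimal sketch of this race handling, assuming in-memory stand-ins for the order row and the Redis keys (the `policy` switch and field names are hypothetical):

```python
def on_payment_success(order, redis_keys, policy="new_order"):
    """Handle a payment-success event, including the late-arrival race
    where the expiry already cancelled the order."""
    redis_keys.discard(order["id"])    # optimization: suppress the expiry event
    if order["status"] == "PENDING_PAYMENT":
        order["status"] = "PLACED"     # normal path: success before expiry
        return order
    # Race: expiry fired first and cancelled the order.
    if policy == "refund":
        order["refund_issued"] = True  # give the money back
        return order
    # Or re-create the order and attach the captured payment to it.
    new_order = {**order, "id": order["id"] + "-r", "status": "PLACED"}
    return new_order
```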
Now, one thing I want to call out about this Redis expiry: it is not very precise. If a key was supposed to expire at 10:05, we might not get the callback exactly at 10:05; we might get it at 10:06, 10:07, or some time after 10:05. That is because of the way Redis expiry works. Redis doesn't expire all records at exactly their expiry time; it checks keys every once in a while, and whenever it finds some keys that have expired, it expires them then. So the callback might arrive a few minutes late. In this scenario it doesn't really matter, so we should be okay with this kind of approach, but if you are building something much more mission-critical, you might need a different approach. Even so, you should talk this through with your interviewer, and depending on what they say you might change the implementation a bit, for example by implementing a queue and polling it every second or something of that sort. Okay. Now once the payment
has happened, or not happened, or whatever happened, you push all of those events into Kafka anyway. There is one more reason why you want to push these events. Let's say there was one television left and somebody purchased it. Now you want to make sure nobody else can see that television: the inventory count has become zero, so you cannot take another order. You need to make sure it is removed from search. Remember the Search Consumer we looked at in the previous section? That Search Consumer also takes care of inventory changes: as soon as an item goes out of stock, it removes that item from the listing, that is, it deletes the documents pertaining to that particular set of items from the search index. Now there is one problem in what we have decided and built so far. This MySQL can
very quickly become a bottleneck. Why? Because there are a lot of orders. A company like Amazon probably has millions of orders a day, a couple of million at least, and that number can bloat up significantly over a year. Normally you have to keep order data for a couple of years for auditing purposes, so this database is going to grow massively big; we need to do something about it. Remember that the whole point of having MySQL is to take care of atomicity and transactions, basically the ACID properties, for orders that are being updated. For orders that are no longer being updated, we don't really need any of these guarantees. So what we can do is: once an order
reaches a terminal state, let's say DELIVERED or CANCELLED, we can move it to Cassandra. There is something called an Archival Service, a cron-like job that runs at some frequency, every day or every 12 hours, pulls all the orders that have reached a terminal status from MySQL, and puts them into Cassandra. How does it do that? You can see these two services: the Order Processing Service and the Historical Order Service. These two are again components of the larger Order Management System, each doing its own job. Order Processing Service takes care of the whole lifecycle of the order: once the order has been placed, any changes to that order happen through it, and it also provides APIs to fetch order details. Similarly, Historical Order Service provides all kinds of APIs to fetch information about historical orders. The Archival Service queries Order Processing Service to fetch all the orders that have reached a terminal status, calls Historical Order Service to insert all of those entries into its Cassandra, and once it gets a success back, saying the data is inserted into Cassandra and replicated in all the right places, it calls Order Processing Service to delete those orders. If there is an error and something breaks in between, it retries the whole thing; because the flow is idempotent, it doesn't really matter, and we can replay it as many times as we want.
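One run of the archival cron can be sketched like this; keying the inserts by order ID is what makes the replay idempotent (plain dicts stand in for MySQL and Cassandra here, which is an assumption for illustration):

```python
TERMINAL = {"DELIVERED", "CANCELLED"}

def archive_once(live_orders, historical):
    """One run of the archival cron: copy terminal orders to the historical
    store, then delete them from the live store. Inserts are keyed by order
    id, so replaying after a crash is harmless."""
    terminal = [o for o in live_orders.values() if o["status"] in TERMINAL]
    for order in terminal:
        historical[order["id"]] = order   # upsert into the Cassandra-like store
    for order in terminal:                # delete only after all inserts succeeded
        del live_orders[order["id"]]
    return len(terminal)
```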
Cool. Now, once the user has placed the order, all of these things, logistics and so on, can happen behind the scenes. The user can then go and see their past orders, and that is powered by this Orders View. There will be a service here acting as the backend of the Orders View (drawing it would clutter the diagram too much; assume there's a service that powers the Orders View and talks to these two services). It queries Order Processing Service to fetch information about all the live orders, the ones still in transit, and calls Historical Order Service for all the orders that have been completed. It merges the data and returns it for display on the app or website or wherever. Cool. Now why did we use
a Cassandra here? The Historical Order Service serves a very limited set of query patterns. The queries that will run are:
1) get information for a particular order ID,
2) get all the orders of a particular user ID,
3) possibly, get all the orders of a particular seller.
So only a small number of query types will run on this Cassandra, and Cassandra is very good when you have a finite number of queries (by type) over a large set of data. That's the reason we used Cassandra. Now, whenever an order is placed, we want to notify the customer that your order has been
successfully placed and will be delivered in 3 days, or something of that sort. You may have to notify the seller, and you might have to notify somebody else when something happens; say an order is cancelled by the seller, you want to notify the customer. All of those notifications are taken care of by this Notification Service. Think of it as an abstraction over all the various kinds of notifications: SMS, email, and so on. I made another video (on Notification Service design) that talks in detail about how to implement a Notification Service, so you can go have a look at that as well. While a user is placing all these orders and all of those things are happening,
all the events are going into this Kafka, on which we are running a Spark Streaming Consumer. It does a lot of things. One of the very first things it does is publish a report saying which items were ordered the most in the last one hour, or which category generated the most revenue, like electronics or food, so it can publish some of the reporting metrics we want to look at quickly. Other than that, it puts the whole of the data into a Hadoop cluster. Now we have real order information for a user, which is very valuable for recommendations. On top of it, we'll have some Spark jobs running fairly standard ALS-style algorithms, predicting that this particular user, given what he has ordered in the past, will order a certain set of things next. Or: because this customer looks like that customer, and that customer ordered so-and-so product, this customer might also order so-and-so product. There are a lot of models you can run to predict the right set of recommendations for this user. These would then be sent to the Recommendation Service we looked at in the last section, which stores them in its Cassandra, and the next time the user comes in, we show them those recommendations. It would not just have recommendations based on your orders but also on similarity; taking the same example, if you have ordered a whiteboard you will definitely want to order a marker. All of those things would also be taken care of by this.
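As a toy stand-in for those models, a simple item-to-item co-purchase count already captures the whiteboard-and-marker effect; real pipelines would run ALS or similar on Spark over the Hadoop data, so everything here is purely illustrative:

```python
from collections import Counter

def co_purchase_recommendations(order_history, item, k=2):
    """Items most often bought in the same order as `item`:
    a tiny 'customers who bought X also bought Y' recommender."""
    counts = Counter()
    for basket in order_history:       # each basket is one placed order
        if item in basket:
            for other in basket:
                if other != item:
                    counts[other] += 1
    return [i for i, _ in counts.most_common(k)]
```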