Cassandra / Kafka Support in EC2/AWS.
Kafka Training, Kafka Consulting, Kafka Tutorial
Working with Kafka Producers
Kafka Producer Advanced
Working with producers in Java
Details and advanced topics
Objectives Create Producer
Cover advanced topics regarding Java Kafka
Producers
Custom Serializers
Custom Partitioners
Batching
Compression
Retries and Timeouts
Kafka Producer
Kafka client that publishes records to a Kafka cluster
Thread safe
Producer has a pool of buffers that hold to-be-sent records
Background I/O threads turn records into request
bytes and transmit the requests to Kafka
Close the producer so it does not leak resources
Kafka Producer Send, Acks and Buffers
send() method is asynchronous
adds the record to the output buffer and returns right away
buffer is used to batch records for efficient I/O and compression
acks config controls Producer record durability. The "all" setting
ensures full commit of the record, and is the most durable and slowest
setting
Producer can retry failed requests
Producer has buffers of unsent records per topic partition (sized at
[Link])
Kafka Producer: Buffering and batching
Buffered records are available to send immediately, as fast as the broker can
keep up (limited by inflight [Link])
To reduce the request count, set [Link] > 0
the producer waits up to [Link] before sending, or until the batch fills up, whichever comes
first
Under heavy load batches fill before [Link] elapses; under light producer load the delay is used to
increase broker I/O throughput and improve compression
[Link] controls the total memory available to the producer for buffering
If records are sent faster than they can be transmitted to Kafka, this buffer
fills up and additional send calls block. If a call blocks for longer than
[Link], the Producer throws a TimeoutException
Producer Acks
Producer Config property acks
(default all from Kafka 3.0 onward; earlier client versions default to 1)
Number of write acknowledgments the partition leader must receive
before a write request is deemed complete
Controls durability of records the Producer sends
Can be all (-1), none (0), or leader (1)
Acks 0 (NONE)
acks=0
Producer does not wait for any ack from the broker at all
Records added to the socket buffer are considered sent
No guarantee of durability
Record offset returned is set to -1 (unknown)
Records are lost if the leader is down
Use Case: maybe log aggregation
Acks 1 (LEADER)
acks=1
Partition leader writes the record to its local log and responds
without waiting for followers to confirm the write
If the leader fails right after sending the ack, the record could be lost
Followers might not have replicated the record yet
Record loss is rare but possible
Use Case: log aggregation
Acks -1 (ALL)
acks=all or acks=-1
Leader gets write confirmation from the full set of ISRs before
sending the ack to the producer
Guarantees the record will not be lost as long as one ISR remains alive
Strongest available guarantee
Even stronger with broker setting [Link] (specifies
the minimum number of ISRs that must acknowledge a write)
Most use cases will use this and set [Link] > 1
KafkaProducer config Acks
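A minimal sketch of setting the acks level, assuming the standard producer config key `acks` and a hypothetical broker address `localhost:9092`. It only builds the `Properties` object that would normally be handed to the `KafkaProducer` constructor, so it runs with the JDK alone:

```java
import java.util.Properties;

class AcksConfigSketch {
    // Builds producer settings with the requested acks level.
    static Properties producerProps(String acks) {
        Properties props = new Properties();
        // Hypothetical broker address, for illustration only.
        props.put("bootstrap.servers", "localhost:9092");
        // "all" (same as "-1"), "1" (leader only), or "0" (no ack).
        props.put("acks", acks);
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps("all");
        System.out.println("acks=" + props.getProperty("acks"));
    }
}
```

Passing the value as a string keeps the three levels (`"all"`, `"1"`, `"0"`) interchangeable at the call site.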
Producer Buffer Memory Size
Producer config property: [Link]
default 32MB
Total memory (bytes) the producer can use to buffer records
waiting to be sent to the broker
Producer blocks up to [Link] if [Link] is
exceeded
if the producer keeps sending faster than the broker can receive,
an exception is thrown once the block time expires
Batching by Size
Producer config property: [Link]
Default 16K
Producer batches records
fewer requests for multiple records sent to the same partition
Improved I/O throughput and performance on both producer and server
If a record is larger than the batch size, it will not be batched
Producer sends requests containing multiple batches
one batch per partition
A small batch size reduces throughput and performance. If the batch size is too big,
memory allocated for the batch is wasted
Batching by Time and Size - 1
Producer config property: [Link]
Default 0
Producer groups together any records that arrive before
they can be sent into a batch
good if records arrive faster than they can be sent out
Producer can reduce requests count even under
moderate load using [Link]
Batching by Time and Size - 2
[Link] adds a delay to wait for more records to build up so
larger batches are sent
better broker throughput at the cost of producer latency
If the producer accumulates records whose size is [Link] or more for a
partition led by a broker, the batch is sent right away
If the producer has less than [Link] but the [Link] interval has
passed, the records for that partition are sent
Increase to improve throughput of brokers and reduce broker
load (common improvement)
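The two send triggers on this slide (batch full, or wait interval elapsed) can be sketched as one predicate. This only illustrates the rule and is not the producer's internal code; `readyToSend` and its parameter names are hypothetical:

```java
class BatchReadySketch {
    // A partition's batch goes out when it is full or has waited long enough.
    static boolean readyToSend(int batchBytes, int batchSizeBytes,
                               long waitedMs, long lingerMs) {
        boolean full = batchBytes >= batchSizeBytes;
        boolean lingerExpired = waitedMs >= lingerMs;
        return full || lingerExpired;
    }

    public static void main(String[] args) {
        // Small batch, linger not yet elapsed: keep waiting.
        System.out.println(readyToSend(100, 16_384, 5, 10));
        // Full batch: send right away regardless of linger.
        System.out.println(readyToSend(16_384, 16_384, 0, 10));
    }
}
```

With a linger of 0 the second condition is always true, which matches the default behavior of sending as soon as possible.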
Compressing Batches
Producer config property: [Link]
Default none
Producer compresses request data
By default the producer does not compress
Can be set to none, gzip, snappy, or lz4
Compression is by batch
improves with larger batch sizes
End-to-end compression is possible if Broker config [Link] is set to
producer. Compressed data from the producer is written to the log and sent to the consumer by the broker as-is
Batching and Compression Example
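A sketch of a batching and compression configuration, assuming the slides refer to the standard producer keys `batch.size`, `linger.ms`, and `compression.type`. The values (16K, 10 ms, snappy) are illustrative choices, not the author's settings, and the `KafkaProducer` construction is left out so the sketch runs with the JDK alone:

```java
import java.util.Properties;

class BatchingCompressionConfigSketch {
    static Properties batchingProps() {
        Properties props = new Properties();
        // Upper bound, in bytes, for one per-partition batch.
        props.put("batch.size", "16384");
        // Wait up to 10 ms for more records before sending a partial batch.
        props.put("linger.ms", "10");
        // Compress each batch; larger batches usually compress better.
        props.put("compression.type", "snappy");
        return props;
    }

    public static void main(String[] args) {
        batchingProps().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```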
Custom Serializers
You don't have to use the built-in serializers
You can write your own
Just need to be able to convert to/from a byte[]
Serializers work for keys and values
[Link] and [Link]
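A minimal sketch of the object-to-byte[] conversion a custom serializer performs. To stay runnable with the JDK alone it does not implement the Kafka `Serializer` interface; the static `serialize(String topic, StockPrice data)` method only mirrors that interface's method shape. The `StockPrice` stand-in and its JSON layout are assumptions for illustration, and the name is not JSON-escaped:

```java
import java.nio.charset.StandardCharsets;

class StockPriceSerializerSketch {
    // Hypothetical stand-in for the lab's StockPrice domain object.
    static final class StockPrice {
        final String name;
        final int dollars;
        final int cents;

        StockPrice(String name, int dollars, int cents) {
            this.name = name;
            this.dollars = dollars;
            this.cents = cents;
        }

        // Hand-rolled JSON; assumes the name needs no escaping.
        String toJson() {
            return "{\"name\":\"" + name + "\",\"dollars\":" + dollars
                    + ",\"cents\":" + cents + "}";
        }
    }

    // Converts the value to UTF-8 JSON bytes; null stays null.
    static byte[] serialize(String topic, StockPrice data) {
        if (data == null) {
            return null;
        }
        return data.toJson().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("stock-prices", new StockPrice("UBER", 737, 78));
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```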
Custom Serializers Config
Custom Serializer
StockPrice
Broker Follower Write Timeout
Producer config property: [Link]
Default 30 seconds (30,000 ms)
Maximum time the broker waits for confirmation from
followers to meet the Producer acknowledgment
requirements for acks=all
Measure of broker-to-broker latency of a request
30 seconds is high; a long process time is indicative of
problems
Producer Request Timeout
Producer config property: [Link]
Default 30 seconds (30,000 ms)
Maximum time the producer waits for a request to the broker
to complete
Measure of producer-to-broker latency of a request
30 seconds is very high; a long request time is an indicator
that brokers can't handle the load
Producer Retries
Producer config property: retries
Default 0
Retry count if the Producer does not get an ack from the Broker
retries only if the failed send is deemed a transient error (API)
behaves as if your producer code resent the record after a failed attempt
timeouts are retried; [Link] (default 100 ms) sets how long
to wait after a failure before retrying
Retry, Timeout, Back-off Example
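A sketch of a retry, timeout, and back-off configuration, assuming the slides refer to the standard producer keys `retries`, `retry.backoff.ms`, and `request.timeout.ms`. The retry count of 3 is an illustrative choice; the other two values repeat the defaults quoted on the slides. Only the `Properties` object is built, so the sketch runs with the JDK alone:

```java
import java.util.Properties;

class RetryConfigSketch {
    static Properties retryProps() {
        Properties props = new Properties();
        // Resend up to 3 times on transient errors (illustrative value).
        props.put("retries", "3");
        // Wait 100 ms after a failure before retrying.
        props.put("retry.backoff.ms", "100");
        // Give each request up to 30 seconds to complete.
        props.put("request.timeout.ms", "30000");
        return props;
    }

    public static void main(String[] args) {
        retryProps().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```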
Producer Partitioning
Producer config property: [Link]
[Link]
Partitioner class implements the Partitioner interface
Default Partitioner partitions using a hash of the key if the record
has a key
Default Partitioner partitions using round-robin if the record
has no key
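The keyed versus keyless rule described on this slide can be sketched as follows. This is not Kafka's partitioner: `Arrays.hashCode` is swapped in for the murmur2 hash Kafka applies to the key bytes, and the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

class DefaultPartitioningSketch {
    private final AtomicInteger counter = new AtomicInteger();

    int partition(String key, int numPartitions) {
        if (key == null) {
            // No key: spread records over the partitions in turn.
            return Math.floorMod(counter.getAndIncrement(), numPartitions);
        }
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // Stand-in hash; the same key always maps to the same partition.
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }

    public static void main(String[] args) {
        DefaultPartitioningSketch partitioner = new DefaultPartitioningSketch();
        System.out.println("UBER -> " + partitioner.partition("UBER", 3));
        System.out.println("no key -> " + partitioner.partition(null, 3));
        System.out.println("no key -> " + partitioner.partition(null, 3));
    }
}
```

`Math.floorMod` keeps the result non-negative even when the hash is negative.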
Configuring Partitioner
StockPricePartitioner
StockPricePartitioner partition()
StockPricePartitioner
Producer Interception
Producer config property: [Link]
Default empty (you can pass a comma-delimited list)
interceptors implement the ProducerInterceptor
interface
intercept records the producer sends to the broker, and again after acks
you could mutate records
KafkaProducer - Interceptor Config
KafkaProducer ProducerInterceptor
ProducerInterceptor onSend
Output
onSend topic=stock-prices2 key=UBER value=StockPrice{dollars=737, cents=78, name='UBER'} null
ProducerInterceptor onAck
Output
onAck topic=stock-prices2, part=0, offset=18360
ProducerInterceptor the rest
KafkaProducer send() Method
Two forms of send, with callback and with no callback; both return a Future
Asynchronously sends a record to a topic
Callback gets invoked when the send has been acknowledged
send is asynchronous and returns right away, as soon as the record has been added to the
send buffer
Allows sending many records at once without blocking for a response from the Kafka
broker
Result of send is a RecordMetadata
record partition, record offset, record timestamp
Callbacks for records sent to the same partition are executed in order
KafkaProducer send() Exceptions
InterruptException - If the thread is interrupted while
blocked (API)
SerializationException - If key or value are not valid
objects given configured serializers (API)
TimeoutException - If time taken for fetching metadata or
allocating memory exceeds [Link], or getting acks
from the Broker exceeds [Link], etc. (API)
KafkaException - If a Kafka error occurs that is not covered by the public API
(API)
Using send method
KafkaProducer flush() method
flush() method sends all buffered records now (even if
[Link] > 0)
blocks until requests complete
Useful when consuming from some input system and
pushing data into Kafka
flush() ensures all previously sent messages have been
sent
you could mark progress as such at completion of flush
KafkaProducer close()
close() closes the producer
frees resources (threads and buffers) associated with the producer
Two forms of the method
both block until all previously sent requests complete or the duration
passed in as args is exceeded
close with no params is equivalent to close(Long.MAX_VALUE,
[Link]).
If the producer is unable to complete all requests before the timeout
expires, all unsent requests fail, and this method fails
Orderly shutdown using close
Wait for clean close
KafkaProducer partitionsFor() method
partitionsFor(topic) returns metadata for partitions
public List<PartitionInfo> partitionsFor(String topic)
Get partition metadata for the given topic
Producers that do their own partitioning would use this
for custom partitioning
PartitionInfo(String topic, int partition, Node leader,
Node[] replicas, Node[] inSyncReplicas)
Node(int id, String host, int port, optional String rack)
KafkaProducer metrics() method
metrics() method gets a map of metrics
public Map<MetricName, ? extends Metric> metrics()
Get the full set of producer metrics
MetricName(
String name,
String group,
String description,
Map<String, String> tags
)
Metrics [Link]()
Call [Link]()
Prints out metrics to log
Metrics [Link]() output
Metric producer-metrics, record-queue-time-max, 508.0,
The maximum time in ms record batches spent in the record accumulator.
[Link].721 [pool-1-thread-9] INFO [Link] -
Metric producer-node-metrics, request-rate, 0.025031289111389236,
The average number of requests sent per second.
[Link].721 [pool-1-thread-9] INFO [Link] -
Metric producer-metrics, records-per-request-avg, 205.55263157894737,
The average number of records per request.
[Link].722 [pool-1-thread-9] INFO [Link] -
Metric producer-metrics, record-size-avg, 71.02631578947368,
The average record size
[Link].722 [pool-1-thread-9] INFO [Link] -
Metric producer-node-metrics, request-size-max, 56.0,
The maximum size of any request sent in the window.
[Link].723 [pool-1-thread-9] INFO [Link] -
Metric producer-metrics, request-size-max, 12058.0,
The maximum size of any request sent in the window.
[Link].723 [pool-1-thread-9] INFO [Link] -
Metric producer-metrics, compression-rate-avg, 0.41441360272859273,
The average compression rate of record batches.
Metrics via JMX
Lab StockPrice Producer Java Example
StockPrice App to demo Advanced Producer
StockPrice - holds a stock price; has a name, dollars, and cents
StockPriceKafkaProducer - Configures and creates
KafkaProducer<String, StockPrice>, StockSender list, ThreadPool
(ExecutorService), starts StockSender runnable into thread pool
StockAppConstants - holds topic and broker list
StockPriceSerializer - can serialize a StockPrice into byte[]
StockSender - generates somewhat random stock prices for a
given StockPrice name, Runnable, 1 thread per StockSender
Shows using KafkaProducer from many threads
StockPrice domain object
has name, dollars, cents
converts itself to JSON
StockPriceKafkaProducer
Import classes and setup logger
Create createProducer method to create KafkaProducer instance
Create setupBootstrapAndSerializers to initialize bootstrap
servers, client id, key serializer and custom serializer
(StockPriceSerializer)
Write main() method - creates producer, creates StockSender list
passing each instance a producer, creates a thread pool so every
stock sender gets its own thread, runs each StockSender in its
own thread
StockPriceKafkaProducer imports, createProducer
Import classes and set up logger
createProducer is used to create a KafkaProducer
createProducer() calls setupBootstrapAndSerializers()
Configure Producer Bootstrap and Serializer
Create setupBootstrapAndSerializers to initialize bootstrap servers,
client id, key serializer and custom serializer (StockPriceSerializer)
StockPriceSerializer will serialize StockPrice into bytes
[Link]()
main method - creates producer,
create StockSender list passing each instance a producer
creates a thread pool (executorService)
every StockSender runs in its own thread
StockAppConstants
topic name for Producer example
List of bootstrap servers
[Link]
StockPriceSerializer
Converts StockPrice into byte array
StockSender
Generates random stock prices for a given StockPrice
name,
StockSender is Runnable
1 thread per StockSender
Shows using KafkaProducer from many threads
Delays random time between delayMin and delayMax,
then sends random StockPrice between stockPriceHigh
and stockPriceLow
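A sketch of the random delay and random price selection described above. The lab names a `randomIntBetween` helper, but its signature is not shown in the slides, so the `(min, max)` inclusive form here is an assumption, as are the sample ranges and the random cents:

```java
import java.util.Random;

class StockSenderRandomSketch {
    private final Random random = new Random();

    // Returns a value in [min, max], both ends inclusive.
    int randomIntBetween(int min, int max) {
        return min + random.nextInt(max - min + 1);
    }

    public static void main(String[] args) {
        StockSenderRandomSketch sender = new StockSenderRandomSketch();
        int delayMs = sender.randomIntBetween(10, 100);  // delayMin, delayMax
        int dollars = sender.randomIntBetween(50, 150);  // stockPriceLow, stockPriceHigh
        int cents = sender.randomIntBetween(0, 99);
        System.out.println(String.format(
                "wait %d ms, then send price %d.%02d", delayMs, dollars, cents));
    }
}
```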
StockSender imports, Runnable
Imports Kafka Producer, ProducerRecord, RecordMetadata, StockPrice
Implements Runnable, can be submitted to ExecutorService
StockSender constructor
takes a topic, high & low stockPrice, producer, delay min & max
StockSender run()
In loop, creates random record, send record, waits random time
StockSender createRandomRecord
createRandomRecord uses randomIntBetween
creates StockPrice and then wraps StockPrice in ProducerRecord
StockSender displayRecordMetaData
Every 100 records displayRecordMetaData gets called
Prints out record info, and recordMetadata info:
key, JSON value, topic, partition, offset, time
uses Future from call to [Link]()
Run it
Run ZooKeeper
Run three Brokers
run [Link]
Run StockPriceKafkaProducer
Run scripts
run ZooKeeper from ~/kafka-training
use bin/[Link] to create topic
use bin/[Link] to delete topic
use bin/[Link] to run Kafka Broker 0
use bin/[Link] to run Kafka Broker 1
use bin/[Link] to run Kafka Broker 2
Config is under directory called config
[Link] is for Kafka Broker 0
[Link] is for Kafka Broker 1
[Link] is for Kafka Broker 2
Run All 3 Brokers
Run [Link] script
Name of the topic is stock-prices
Three partitions
Replication factor of three
Run StockPriceKafkaProducer
Run StockPriceKafkaProducer from the IDE
You should see log messages from StockSender(s) with
StockPrice name, JSON value, partition, offset, and time
Using flush and close
Lab Adding an orderly shutdown with flush and close
Shutdown Producer nicely
Handle ctrl-C shutdown from Java
Shutdown thread pool and wait
Flush producer to send any outstanding batches if using
batches ([Link]())
Close Producer ([Link]()) and wait
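The shutdown order listed above (stop the thread pool and wait, flush, close) can be sketched with JDK classes only. `ProducerLike` is a hypothetical stand-in for the two producer methods used, and the 10-second wait is an illustrative value:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class OrderlyShutdownSketch {
    // Hypothetical stand-in for the producer's flush() and close().
    interface ProducerLike {
        void flush();
        void close();
    }

    // Stops the sender threads first, then flushes and closes the producer.
    static void shutdown(ExecutorService pool, ProducerLike producer) {
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        producer.flush(); // send any outstanding batches
        producer.close(); // release threads and buffers
    }

    public static void main(String[] args) {
        // Daemon threads let this demo exit on its own; a real app would wait for Ctrl-C.
        ExecutorService pool = Executors.newFixedThreadPool(2, runnable -> {
            Thread thread = new Thread(runnable);
            thread.setDaemon(true);
            return thread;
        });
        ProducerLike producer = new ProducerLike() {
            @Override public void flush() { System.out.println("flushed"); }
            @Override public void close() { System.out.println("closed"); }
        };
        pool.execute(() -> System.out.println("sending records"));
        // The hook runs on Ctrl-C as well as on normal JVM exit.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> shutdown(pool, producer)));
    }
}
```

Stopping the senders before flushing means no new records arrive while the last batches drain.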
Nice Shutdown [Link]()
Restart Producer then shut it down
Add shutdown hook
Start StockPriceKafkaProducer
Now stop it (CTRL-C or hit stop button in IDE)
Lab Configuring Producer Durability
Set default acks to all
Set acks to all (the default from Kafka 3.0 onward; earlier clients default to 1)
This means that all in-sync replicas (ISRs) have to respond for the
producer write to go through
Note Kafka Broker Config
At least this many in-sync replicas (ISRs) have to respond for
producer to get ack
NOTE: We have three brokers in this lab, all three have to be up
Run it. Run Servers. Run Producer. Kill 1 Broker
If not already, startup ZooKeeper
Startup three Kafka brokers
using scripts described earlier
From the IDE run StockPriceKafkaProducer
From the terminal kill one of the Kafka Brokers
Now look at the logs for the StockPriceKafkaProducer, you should see
Caused by:
[Link]
Messages are rejected since there are fewer in-sync replicas than required.
What happens when we shut one down?
Why did the send fail?
ProducerConfig.ACKS_CONFIG (acks config for
producer) was set to all
Expects leader to only give successful ack after all
followers ack the send
Broker Config [Link] set to 3
At least three in-sync replicas must respond before
send is considered successful
Run it. Run Servers. Run Producer. Kill 1 Broker
If not already, startup ZooKeeper
Ensure all three Kafka Brokers are running if not running
Change StockPriceKafkaProducer acks config to 1
[Link](ProducerConfig.ACKS_CONFIG, "1"); (leader
sends ack after write to log)
From the IDE run StockPriceKafkaProducer
From the terminal kill one of the Kafka Brokers
StockPriceKafkaProducer runs normally
What happens when we shut one down with acks 1 this time?
Nothing happens! It continues to work because only the leader has to ack its write
Which type of application would you only want acks set to 1?
Why did the send not fail for acks 1?
ProducerConfig.ACKS_CONFIG (acks config for
producer) was set to 1
Expects leader to only give successful ack after it
writes to its log
Replicas still get replication but leader does not wait
for replication
Broker Config [Link] is still set to 3
This config only gets looked at if acks=all
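The difference between the two lab runs can be sketched as a small decision function. This is a simplified model of the rule stated on these slides, not broker code; `writeAccepted` and its parameters are hypothetical names, and the ISR count is taken to include the leader:

```java
class AckDecisionSketch {
    // With acks=all the write needs at least minInsyncReplicas ISRs;
    // with acks=1 or acks=0 only a live leader is needed.
    static boolean writeAccepted(String acks, int inSyncReplicas, int minInsyncReplicas) {
        boolean requiresAllIsrs = acks.equals("all") || acks.equals("-1");
        if (!requiresAllIsrs) {
            return inSyncReplicas >= 1;
        }
        return inSyncReplicas >= minInsyncReplicas;
    }

    public static void main(String[] args) {
        // Three brokers, minimum of three ISRs, one broker killed.
        System.out.println("acks=all: " + writeAccepted("all", 2, 3));
        System.out.println("acks=1:   " + writeAccepted("1", 2, 3));
    }
}
```

Lowering the minimum to 2 in this model lets the acks=all case tolerate one broker failure.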
Try describe topics before and after
Try this last one again
Stop a server while producer is running
Run [Link] (shown above)
Rerun server you stopped
Run [Link] again
Describe-Topics After Each Stop/Start
All 3 brokers running
1 broker down: Leader 1 has 2 partitions
All 3 brokers running again: look at Leader 1
All 3 brokers running, after a few minutes
Review Question
How would you describe the above?
How many servers are likely running out of the three?
Would the producer still run with acks=all? Why or Why not?
Would the producer still run with acks=1? Why or Why not?
Would the producer still run with acks=0? Why or Why not?
Which broker is the leader of partition 1?
Retry with acks = 0
Run the last example again (servers, and producer)
Run all three brokers then take one away
Then take another broker away
Run describe-topics
Take all of the brokers down and continue to run the producer
What do you think happens?
When you are done, change acks back to acks=all
Using Kafka built-in Producer Metrics
Adding Producer Metrics and Replication Verification
Objectives of lab
Setup Kafka Producer Metrics
Use replication verification command line tool
Change [Link] for broker and observe
metrics and replication verification
Change [Link] for topic and observe
metrics and replication verification
Create Producer Metrics Monitor
Create a class called MetricsProducerReporter that is
Runnable
Pass it a Kafka Producer
Call [Link]() every 10 seconds in a while loop
from run method, and print out the MetricName and
Metric value
Submit MetricsProducerReporter to the ExecutorService
in the main method of StockPriceKafkaProducer
Create MetricsProducerReporter (Runnable)
Implements Runnable takes a producer
Create MetricsProducerReporter (Runnable)
Call [Link]() every 10 seconds in a while loop from run method, and print out
MetricName and Metric value
Create MetricsProducerReporter (Runnable)
Increase thread pool size by 1 to fit metrics reporting
Submit instance of MetricsProducerReporter to the ExecutorService in
the main method of StockPriceKafkaProducer (and pass new
instance a producer)
Run it. Run Servers. Run Producer.
If not already, startup ZooKeeper
Startup three Kafka brokers
using scripts described earlier
From the IDE run StockPriceKafkaProducer (ensure
acks are set to all first)
Observe metrics which print out every ten seconds
Look at the output
[Link].858 [pool-1-thread-1] INFO [Link] -
Metric producer-node-metrics, outgoing-byte-rate, 1.8410309773473144,
[Link].858 [pool-1-thread-1] INFO [Link] -
Metric producer-topic-metrics, record-send-rate, 975.3229844767151,
[Link].858 [pool-1-thread-1] INFO [Link] -
Metric producer-node-metrics, request-rate, 0.040021611670301965,
The average number of requests sent per second.
[Link].858 [pool-1-thread-1] INFO [Link] -
Metric producer-node-metrics, incoming-byte-rate, 7.304382629577747,
Improve output with some Java love
Keep a set of only the metrics we want
Improve output: Filter metrics
Use Java 8 Stream to filter and sort metrics
Get rid of metric values that are NaN, infinite, or 0
Sort the map by converting it to TreeMap<String, MetricPair>
MetricPair is a helper class that has a Metric and a MetricName
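A sketch of the filter-and-sort step using Java 8 streams. To stay runnable with the JDK alone it works on a plain `Map<String, Double>` of metric name to value rather than the lab's `MetricPair` helper, so the types and the sample values here are assumptions:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MetricsFilterSketch {
    // Drops NaN, infinite, and zero values, then sorts by metric name.
    static TreeMap<String, Double> filterAndSort(Map<String, Double> metrics) {
        return metrics.entrySet().stream()
                .filter(entry -> !entry.getValue().isNaN())
                .filter(entry -> !entry.getValue().isInfinite())
                .filter(entry -> entry.getValue() != 0.0)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (first, second) -> first, // keys are unique; keep the first
                        TreeMap::new));
    }

    public static void main(String[] args) {
        Map<String, Double> raw = new HashMap<>();
        raw.put("record-send-rate", 975.32);
        raw.put("request-latency-avg", Double.NaN);
        raw.put("record-error-rate", 0.0);
        raw.put("batch-size-avg", 136.02);
        filterAndSort(raw).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Collecting straight into a `TreeMap` gives the sorted order without a separate sort step.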
Improve output: Pretty Print
Give a nice format so we can read metrics easily
Give some space and some easy indicators to find in log
Pretty Print Metrics Output
Validate Partition Replication
Checks lag every 5 seconds for stock-price
Run it. Run Servers. Run Producer. Kill Brokers
If not already, startup ZooKeeper and three Kafka
brokers
Run StockPriceKafkaProducer (ensure acks are set to all
first)
Start and stop different Kafka Brokers while
StockPriceKafkaProducer runs, observe metrics, observe
changes, Run replication verification in one terminal and
check topics stats in another
Observe Partitions Getting Behind
Recover
Output of 2nd Broker Recovering
Replication Verification
Describe Topics
Run it. Run Servers. Run Producer. Kill Brokers
Stop all Kafka Brokers (Kafka servers)
Change [Link]=3 to [Link]=2
config files for broker are in lab directory under config
Startup ZooKeeper if needed and three Kafka brokers
Run StockPriceKafkaProducer (ensure acks are set to all
first)
Start and stop different Kafka Brokers while
StockPriceKafkaProducer runs,
Observe metrics, observe changes
Run replication verification in one terminal and check
topics stats with [Link] in another
terminal
Expected outcome
Producer will work even if one broker goes down.
Producer will not work if two brokers go down because
[Link]=2, so two in-sync replicas (counting the leader) have to be up
Since the Producer can run with 1 broker down, the replication lag can get really far behind.
When you start up the failed broker, it catches up really fast.
Change [Link] back
Shutdown all brokers
Change back [Link]=3
Broker config for servers
Do this for all of the servers
Start ZooKeeper if needed
Start brokers back up
Modify bin/[Link]
Modify bin/create-[Link]
add --config [Link]=2
Add this as param to [Link]
Run [Link]
Run [Link]
Recreate Topic
Run [Link]
Run [Link]
Run it. Run Servers. Run Producer. Kill Brokers
Stop all Kafka Brokers (Kafka servers)
Startup ZooKeeper if needed and three Kafka brokers
Run StockPriceKafkaProducer (ensure acks are set to all first)
Start and stop different Kafka Brokers while
StockPriceKafkaProducer runs,
Observe metrics, observe changes
Run replication verification in one terminal and check topics
stats with [Link] in another terminal
Expected Results
The [Link] on the Topic config overrides
the [Link] on the Broker config
In this setup, you can survive a single node failure
but not two (output below is recovery)
Lab Batching Records
Objectives
Disable batching and observe metrics
Enable batching and observe metrics
Increase batch size and linger and observe metrics
Run consumer to see batch sizes change
Enable compression, observe results
SimpleStockPriceConsumer
We added a SimpleStockPriceConsumer to consume
StockPrices and display batch lengths for poll()
We won't cover it in detail, just quickly, since this is a
Producer lab, not a Consumer lab. :)
Run this while you are running the
StockPriceKafkaProducer
While you are running SimpleStockPriceConsumer with
various batch and linger configs, observe the Producer
metrics and the StockPriceKafkaProducer output
SimpleStockPriceConsumer
Similar to other Consumer examples so far
Subscribes to stock-prices topic
Has custom deserializer
[Link]
Drains topic; Creates map of current stocks; Calls displayRecordsStatsAndStocks()
[Link](
Prints out size of each partition read and total record count
Prints out each stock at its current price
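The slide's code was an image and did not survive, so the following is a rough sketch of the kind of bookkeeping described: counting records per partition for one poll, totaling them, and keeping the latest price per stock. `StockRecord` is a hypothetical stand-in for a consumed record, not a Kafka class, and the method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PollStatsSketch {
    // Hypothetical stand-in for one consumed record.
    static final class StockRecord {
        final int partition;
        final String name;
        final long priceInCents;

        StockRecord(int partition, String name, long priceInCents) {
            this.partition = partition;
            this.name = name;
            this.priceInCents = priceInCents;
        }
    }

    // How many records of this poll came from each partition.
    static Map<Integer, Integer> countPerPartition(List<StockRecord> polled) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (StockRecord r : polled) {
            counts.merge(r.partition, 1, Integer::sum);
        }
        return counts;
    }

    // Later records overwrite earlier ones, leaving each stock's latest price.
    static Map<String, Long> currentPrices(List<StockRecord> polled) {
        Map<String, Long> prices = new LinkedHashMap<>();
        for (StockRecord r : polled) {
            prices.put(r.name, r.priceInCents);
        }
        return prices;
    }

    public static void main(String[] args) {
        List<StockRecord> polled = new ArrayList<>();
        polled.add(new StockRecord(0, "IBM", 10000L));
        polled.add(new StockRecord(1, "UBER", 73778L));
        polled.add(new StockRecord(0, "IBM", 10150L));

        System.out.println("Records per partition: " + countPerPartition(polled));
        System.out.println("Total records: " + polled.size());
        System.out.println("Current prices: " + currentPrices(polled));
    }
}
```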
StockDeserializer
Disable batching
Start by disabling batching
This turns batching off
Run this
Check Consumer and stats
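The code on this slide was a screenshot. As a minimal sketch of what disabling batching might look like, the settings below use the producer property names `batch.size` and `linger.ms`; the class and method names are invented, and the lab's actual code may set these through `ProducerConfig` constants instead.

```java
import java.util.Properties;

final class NoBatchingSketch {
    // Producer settings that turn batching off.
    static Properties noBatching() {
        Properties props = new Properties();
        props.setProperty("batch.size", "0"); // a batch size of zero disables batching
        props.setProperty("linger.ms", "0");  // do not wait for more records before sending
        return props;
    }

    public static void main(String[] args) {
        System.out.println(noBatching());
    }
}
```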
Metrics No Batch
Records per poll averages around 4
Batch Size is 80
Set batching to 16K
Set the batch size to 16K
Run this
Check Consumer and stats
Set batching to 16K Results
Metrics panels: 16K Batch Size, No Batch
Consumer Records per poll averages around 7.5
Batch Size is now 136.02
59% more batching
Look at the record-send-rate: 200% faster!
Look how much the request queue time shrunk!
Set batching to 16K and linger to 10ms
Set the batch size to 16K and linger to 10ms
Run this
Check Consumer and Stats
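Since the lab repeats this step with several batch size and linger combinations, a small helper can make the variations explicit. This is a sketch only: the helper name and structure are assumptions, while `batch.size` (bytes) and `linger.ms` (milliseconds) are the producer property names.

```java
import java.util.Properties;

final class BatchLingerSketch {
    // Build producer batching settings for one lab run.
    static Properties batching(int batchSizeBytes, long lingerMs) {
        Properties props = new Properties();
        props.setProperty("batch.size", Integer.toString(batchSizeBytes));
        props.setProperty("linger.ms", Long.toString(lingerMs));
        return props;
    }

    public static void main(String[] args) {
        System.out.println("16K / 10ms:  " + batching(16 * 1024, 10));
        System.out.println("64K / 1s:    " + batching(64 * 1024, 1000));
        System.out.println("64K / 100ms: " + batching(64 * 1024, 100));
    }
}
```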
Results batch size 16K linger 10ms
Metrics panels: 16K and 10ms linger, 16K, No Batch
Consumer Records per poll averages around 17
Batch Size is now 796
585% more batching
Look at the record-send-rate: went down but higher than start
Set batching to 64K and linger to 1 second
Set the batch size to 64K and linger to 1 second
Run this
Check Consumer and Stats
Results batch size 64K linger 1s
Metrics panels: 64K/1s, 16K/10ms, 16K, No Batch
Consumer Records per poll averages around 500
Batch Size is now 40K
Look at the network-io-rate
Record Queue Time is very high :(
Set batching to 64K and linger to 100 ms
Set the batch size to 64K and linger to 100ms
Run this
Check Consumer and Stats
Results batch size 64K linger 100ms
Metrics panels: 64K/100ms, 64K/1s, 16K/10ms, 16K
64K batch size with 100ms linger has the highest record-send-rate!
Turn on compression snappy, 50ms, 64K
Enable compression
Set the batch size to 64K and linger to 50ms
Run this
Check Consumer and Stats
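A sketch of the settings for this step, assuming the lab configures the producer through plain properties. `compression.type` is the producer property name and `snappy` is one of its accepted values; the class and method names are illustrative.

```java
import java.util.Properties;

final class CompressionConfigSketch {
    // Snappy compression with a 64K batch size and a 50ms linger.
    static Properties snappy64k50ms() {
        Properties props = new Properties();
        props.setProperty("compression.type", "snappy");
        props.setProperty("batch.size", Integer.toString(64 * 1024));
        props.setProperty("linger.ms", "50");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(snappy64k50ms());
    }
}
```

Compression is applied to whole batches, which is why it is paired here with a larger batch size and a non-zero linger.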
Results Snappy, batch size 64K, linger 50ms
Metrics panels: 64K/100ms, Snappy 64K/50ms
Snappy 64K/50ms has the highest record-send-rate and 1/2 the queue time!
Lab Adding Retries and Timeouts
Objectives
Set up timeouts
Set up retries
Set up retry back off
Set max in-flight messages to 1 so retries don't store
records out of order
Run it. Run Servers. Run Producer. Kill Brokers
Startup ZooKeeper if needed and three Kafka brokers
Modify StockPriceKafkaProducer to configure retry,
timeouts, in-flight message count and retry back off
Run StockPriceKafkaProducer
Start and stop any two different Kafka Brokers while
StockPriceKafkaProducer runs
Notice retry messages in log of StockPriceKafkaProducer
Modify StockPriceKafkaProducer
to configure retry, timeouts, in-flight message count and retry back off
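The slide's code was a screenshot, so here is a hedged sketch of the four settings named above. The property names are the producer's own; the values are placeholders chosen for illustration, not the lab's actual numbers, and the class name is invented.

```java
import java.util.Properties;

final class RetryConfigSketch {
    // Retry, timeout, in-flight and back-off settings (values are assumptions).
    static Properties retriesAndTimeouts() {
        Properties props = new Properties();
        props.setProperty("acks", "all");                                  // lab runs with acks=all
        props.setProperty("retries", "3");                                 // retry failed sends
        props.setProperty("retry.backoff.ms", "1000");                     // wait between retries
        props.setProperty("request.timeout.ms", "15000");                  // per-request timeout
        props.setProperty("max.in.flight.requests.per.connection", "1");   // keep ordering on retry
        return props;
    }

    public static void main(String[] args) {
        retriesAndTimeouts().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```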
Expected output after 2 broker shutdown
Run all.
Kill any two servers.
Look for retry messages.
Restart them and see that it
recovers
Also use replica verification to see
when broker catches up
WARN Inflight Message Count
MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION
"[Link]"
max number of unacknowledged requests client sends on a single
connection before blocking
If >1 and sends fail, then risk message re-ordering on partition during retry attempt
Depends on use, but for StockPrices re-ordering is not good; you should pick retries > 0
or inflight > 1 but not both. Avoid duplicates. :)
June 2017 release might fix this with sequence from producer
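To see why retries combined with more than one in-flight request can reorder records, consider this toy model. It is not Kafka's sender logic: it simply sends up to `maxInFlight` batches at a time, fails the first attempt of one chosen batch, and retries it afterwards. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class InFlightReorderSketch {
    // Returns the order in which batches end up written to the partition log.
    static List<Integer> deliver(int batches, int maxInFlight, int failFirstAttemptOf) {
        Deque<Integer> pending = new ArrayDeque<>();
        for (int i = 1; i <= batches; i++) {
            pending.add(i);
        }
        List<Integer> log = new ArrayList<>();
        boolean alreadyFailed = false;

        while (!pending.isEmpty()) {
            // Send up to maxInFlight batches without waiting for acks.
            List<Integer> inFlight = new ArrayList<>();
            while (inFlight.size() < maxInFlight && !pending.isEmpty()) {
                inFlight.add(pending.poll());
            }
            List<Integer> toRetry = new ArrayList<>();
            for (int batch : inFlight) {
                if (batch == failFirstAttemptOf && !alreadyFailed) {
                    alreadyFailed = true;
                    toRetry.add(batch);   // first attempt fails, must be retried
                } else {
                    log.add(batch);       // written to the log
                }
            }
            // Retries go back to the front, but later in-flight batches are already written.
            for (int i = toRetry.size() - 1; i >= 0; i--) {
                pending.addFirst(toRetry.get(i));
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println("maxInFlight=1: " + deliver(3, 1, 1));
        System.out.println("maxInFlight=2: " + deliver(3, 2, 1));
    }
}
```

With one request in flight the log order stays 1, 2, 3. With two in flight, batch 2 is written while batch 1 is waiting to be retried, so the log ends up as 2, 1, 3.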
Lab Write ProducerInterceptor
Objectives
Set up an interceptor for request sends
Create ProducerInterceptor
Implement onSend
Implement onAcknowledge
Producer Interception
Producer config property: [Link]
default is empty (you can pass a comma-delimited list)
interceptors implement the ProducerInterceptor interface
intercept records before the producer sends them to the broker, and again after acks
you could mutate records
KafkaProducer - Interceptor Config
KafkaProducer ProducerInterceptor
ProducerInterceptor onSend
Output
onSend topic=stock-prices2 key=UBER value=StockPrice{dollars=737, cents=78, name='UBER'} null
ProducerInterceptor onAck
Output
onAck topic=stock-prices2, part=0, offset=18360
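The interceptor code itself was shown as screenshots. The sketch below only illustrates the two hooks and the log formats seen in the outputs above, using a simplified local interface rather than Kafka's `ProducerInterceptor` (whose real callbacks take `ProducerRecord` and `RecordMetadata` objects). Treating the trailing `null` in the onSend output as the record's unassigned partition is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class InterceptorSketch {
    // Simplified local stand-in for the two callbacks; NOT the Kafka interface.
    interface SendAckHooks<K, V> {
        void onSend(String topic, K key, V value, Integer partition);
        void onAck(String topic, int partition, long offset);
    }

    // Collects one log line per callback, in the format shown in the lab output.
    static final class LoggingHooks implements SendAckHooks<String, String> {
        final List<String> lines = new ArrayList<>();

        @Override
        public void onSend(String topic, String key, String value, Integer partition) {
            lines.add(String.format("onSend topic=%s key=%s value=%s %s",
                    topic, key, value, partition));
        }

        @Override
        public void onAck(String topic, int partition, long offset) {
            lines.add(String.format("onAck topic=%s, part=%d, offset=%d",
                    topic, partition, offset));
        }
    }

    public static void main(String[] args) {
        LoggingHooks hooks = new LoggingHooks();
        hooks.onSend("stock-prices2", "UBER",
                "StockPrice{dollars=737, cents=78, name='UBER'}", null);
        hooks.onAck("stock-prices2", 0, 18360L);
        hooks.lines.forEach(System.out::println);
    }
}
```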
ProducerInterceptor the rest
Run it. Run Servers. Run Producer.
Startup ZooKeeper if needed
Start or restart Kafka brokers
Run StockPriceKafkaProducer
Look for log message from ProducerInterceptor
in output
ProducerInterceptor Output
Lab Write Custom Partitioner
Objectives
Create StockPricePartitioner
Implements interface Partitioner
Implement partition() method
Implement configure() method with importantStocks
property
Configure new Partitioner in Producer config with property
ProducerConfig.PARTITIONER_CLASS_CONFIG
Pass config property importantStocks
Producer Partitioning
Producer config property: [Link]
[Link]
Partitioner class implements Partitioner interface
partition() method takes topic, key, value, and cluster
returns partition number for record
StockPricePartitioner
[Link]()
Implement configure() method
with importantStocks property
importantStocks get added to importantStocks HashSet
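A minimal sketch of what the configure() step might do, assuming `importantStocks` is passed as a comma-separated string in the producer config map. The slide's code was an image, so the parsing details and the helper name are assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ImportantStocksConfigSketch {
    // Read the importantStocks property and collect the names into a HashSet.
    static Set<String> parseImportantStocks(Map<String, ?> configs) {
        Set<String> importantStocks = new HashSet<>();
        Object value = configs.get("importantStocks");
        if (value != null) {
            for (String name : value.toString().split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    importantStocks.add(trimmed);
                }
            }
        }
        return importantStocks;
    }

    public static void main(String[] args) {
        Map<String, Object> configs = new HashMap<>();
        configs.put("importantStocks", "IBM, UBER"); // example values
        System.out.println(parseImportantStocks(configs));
    }
}
```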
StockPricePartitioner partition()
IMPORTANT STOCK: If stockName is in the
importantStocks HashSet, then put it in partitionNum =
(partitionCount - 1) (the last partition)
REGULAR STOCK: Otherwise, use the absolute value of the
hash of the stockName modulo (partitionCount - 1) as the
partition to send the record to
partitionNum = abs([Link]()) % (partitionCount - 1)
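The rule above can be sketched as a plain function, independent of the Kafka `Partitioner` interface. The method name and signature are made up for illustration. One small deviation is deliberate: the modulo is taken before the absolute value, which gives the same partition for ordinary hashes but stays non-negative when `hashCode()` returns `Integer.MIN_VALUE`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class StockPartitionSketch {
    // Important stocks get the last partition; all others share the remaining ones.
    static int partitionFor(String stockName, int partitionCount, Set<String> importantStocks) {
        if (partitionCount < 2) {
            throw new IllegalArgumentException("need at least 2 partitions");
        }
        if (importantStocks.contains(stockName)) {
            return partitionCount - 1;
        }
        // abs(hash % n) equals abs(hash) % n except for Integer.MIN_VALUE, where it is still safe.
        return Math.abs(stockName.hashCode() % (partitionCount - 1));
    }

    public static void main(String[] args) {
        Set<String> important = new HashSet<>(Arrays.asList("IBM", "UBER"));
        for (String name : Arrays.asList("IBM", "UBER", "ABC", "XYZ")) {
            System.out.println(name + " -> partition " + partitionFor(name, 3, important));
        }
    }
}
```

With three partitions, the important stocks always land on partition 2 and every other stock lands on partition 0 or 1.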
StockPricePartitioner partition()
Producer Config: Configuring Partitioner
Configure new Partitioner in Producer config with property ProducerConfig.PARTITIONER_CLASS_CONFIG
Pass config property importantStocks
importantStocks are the ones that go into the priority queue
Review of lab work
You implemented custom ProducerSerializer
You tested failover configuring broker/topic [Link], and acks
You implemented batching and compression and used metrics to see how
it was or was not working
You implemented retries and timeouts, and tested that they worked
You set up max inflight messages and retry back off
You implemented a ProducerInterceptor
You implemented a custom partitioner to implement a priority queue for
important stocks