Using Kafka with Spark Streaming
For information on how to configure Spark Streaming to receive data from Kafka, see the Spark Streaming + Kafka Integration Guide.
In CDH 5.7 and higher, the Spark connector to Kafka works only with Cloudera Distribution of Apache Kafka 2.0 and higher.
Validating Kafka Integration with Spark Streaming
To validate your Kafka integration with Spark Streaming, run the KafkaWordCount example.
If you installed Spark using parcels, use the following command:
/opt/cloudera/parcels/CDH/lib/spark/bin/run-example streaming.KafkaWordCount <zkQuorum> <group> <topics> <numThreads>
If you installed Spark using packages, use the following command:
/usr/lib/spark/bin/run-example streaming.KafkaWordCount <zkQuorum> <group> <topics> <numThreads>
Replace the variables as follows:
- <zkQuorum> - ZooKeeper quorum URI used by Kafka (for example, zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181).
- <group> - Consumer group used by the application.
- <topics> - Kafka topic, or comma-separated list of topics, containing the data for the application.
- <numThreads> - Number of consumer threads reading the data. If this is higher than the number of partitions in the Kafka topic, some threads will be idle.
Note: If multiple applications use the same group and topic, each application receives a subset of the data.
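As a rough illustration of the substitutions described above, the following sketch assembles the parcel-based command from shell variables and prints it instead of running it. The group name, topic name, thread count, and partition count are hypothetical placeholders, not values from any real cluster. The ZooKeeper hosts are the example hosts used above.

```shell
#!/bin/sh
# All values below are assumptions for illustration; substitute your own.
ZK_QUORUM="zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181"
GROUP="wordcount-group"   # hypothetical consumer group
TOPICS="test-topic"       # hypothetical topic name
NUM_THREADS=2             # consumer threads to start
PARTITIONS=3              # hypothetical partition count of the topic

# Parcel install location; for a package install use /usr/lib/spark instead.
SPARK_HOME="/opt/cloudera/parcels/CDH/lib/spark"

# Threads beyond the partition count would sit idle, so report how many.
if [ "$NUM_THREADS" -gt "$PARTITIONS" ]; then
  IDLE=$((NUM_THREADS - PARTITIONS))
else
  IDLE=0
fi
echo "Idle consumer threads: $IDLE"

# Build the command and print it (a dry run); nothing is executed here.
CMD="$SPARK_HOME/bin/run-example streaming.KafkaWordCount $ZK_QUORUM $GROUP $TOPICS $NUM_THREADS"
echo "$CMD"
```

Once the printed command looks right, it can be pasted into a shell on a host where Spark is installed.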

Page generated March 7, 2018.