Introduction to Kafka Stream Processing in Python

Let's start with a quick introduction to Apache Kafka for those unfamiliar with the technology.

Kafka, in a nutshell, is an open-source distributed event streaming platform maintained by the Apache Software Foundation. It lets us build and manage real-time data streaming pipelines.

For the purposes of this article, you need to be aware of four main Kafka concepts.

  1. Topics: All Kafka messages pass through topics. A topic is simply a way for us to organize and group a collection of messages. We can have multiple topics, each with a unique name.
  2. Consumers: A consumer is an entity within Kafka (commonly referred to as a subscriber) that is responsible for connecting (or subscribing) to a particular topic to read its messages.
  3. Producers: A producer is an entity within Kafka (commonly referred to as a publisher) that is responsible for writing (or publishing) messages to a particular topic.
  4. Consumer Groups: In Kafka, we can have multiple topics with multiple consumers subscribed to them. We can also have multiple services (i.e. external applications) subscribed to the same topics. To prevent overlaps and data issues, each service can have its own consumer group to keep track of which messages have already been processed (these positions are commonly referred to as offsets).
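The four concepts above can be sketched as a toy in-memory model. This is plain Python, not real Kafka — the `ToyTopic` class is an illustration invented for this article — but it shows how a topic is an append-only log and how each consumer group keeps its own offset:

```python
class ToyTopic:
    """A toy stand-in for a Kafka topic: an append-only message log
    plus a per-consumer-group read offset."""

    def __init__(self, name):
        self.name = name
        self.messages = []   # the append-only log
        self.offsets = {}    # consumer group -> next offset to read

    def produce(self, message):
        """A producer appends a message to the topic."""
        self.messages.append(message)

    def consume(self, group):
        """A consumer in `group` reads everything since the group's
        last committed offset, then commits the new offset."""
        offset = self.offsets.get(group, 0)
        batch = self.messages[offset:]
        self.offsets[group] = len(self.messages)
        return batch


orders = ToyTopic("orders")
orders.produce({"id": 1})
orders.produce({"id": 2})

print(orders.consume("billing-service"))   # both messages
print(orders.consume("billing-service"))   # nothing new -> []
print(orders.consume("shipping-service"))  # independent group still sees both
```

Note how "billing-service" and "shipping-service" each track their own offset, so one service consuming a message does not hide it from the other — that is exactly the role consumer groups play in Kafka.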

TL;DR

  - Apache Kafka is a message-passing system.
  - Messages are organized in topics.
  - Producers send data to topics.
  - Consumers read data from topics.
  - Consumer groups manage a set of consumers.

Setting up a Local Kafka Instance

wurstmeister maintains a really useful repository for running Kafka in Docker containers.

> git clone https://github.com/wurstmeister/kafka-docker.git 
> cd kafka-docker
> docker-compose up -d
# to scale your Kafka cluster we can use:
# docker-compose scale kafka=<NUMBER_OF_KAFKA_BROKERS_YOU_WANT>

If we want our Kafka cluster to be accessible externally (i.e. from our terminal or other services), we need to update the docker-compose.yml file.

Open the file (e.g. vim docker-compose.yml) and update it with the following:

version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper:3.4.6
    ports:
     - "2181:2181"
  kafka:
    build: .
    ports:
     - "9092:9092"
    expose:
     - "9093"
    environment:
      KAFKA_ADVERTISED_LISTENERS: INSIDE://kafka:9093,OUTSIDE://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_LISTENERS: INSIDE://0.0.0.0:9093,OUTSIDE://0.0.0.0:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
     - /var/run/docker.sock:/var/run/docker.sock

and then run docker-compose up -d in a terminal again.

The configuration above exposes an external Kafka listener on localhost:9092, while brokers communicate with each other internally on port 9093.
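Before wiring up any clients, it can help to verify that something is actually accepting connections on that port. A quick standard-library check along these lines works (the host and port match the compose file above):

```python
import socket


def broker_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if broker_reachable("localhost", 9092):
    print("Kafka listener is up on localhost:9092")
else:
    print("Nothing listening on localhost:9092 yet")
```

This only confirms the port is open, not that the broker is healthy, but it catches the most common setup mistakes (container not started, wrong port mapping) early.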

Once the Docker image is built and the containers are up, we should see that the Kafka instance is running.
