Apache Kafka
Before we dive deep into how Kafka works and get our hands messy, here's a little backstory.
Kafka is named after the acclaimed German-language writer Franz Kafka and was created by LinkedIn as a result of the growing need for a fault-tolerant, redundant way to handle their connected systems and ever-growing pool of data.
Briefly put, Apache Kafka is a distributed streaming application.
Let's take a deeper look at what this means:
Distribution: Generally, in software architecture, distribution is a measure of how components in a system are able to perform autonomously. Here are a few defining characteristics of distribution.
Several computational entities, workers, nodes or components working to achieve a singular goal.
Segregation of work between components in the system.
Concurrency of components: Each component performs operations in its own order, independent of the order of operations of other components in the system.
Lack of a global clock: The functions of the components in the system are not synchronised.
Independent failure: If a component in the system fails, then its failure should not affect other components.
Communication between the nodes of the system is achieved through message passing.
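The message-passing idea above can be sketched with Python's standard library. This toy example (not Kafka) shows two components that share no state and communicate only through a queue:

```python
# Two components communicating purely by message passing: the producer
# thread and the consumer thread never touch shared state directly,
# only the queue between them.
import queue
import threading

mailbox = queue.Queue()

def producer():
    for i in range(3):
        mailbox.put(f'message-{i}')
    mailbox.put(None)  # sentinel: no more messages

received = []

def consumer():
    while True:
        msg = mailbox.get()
        if msg is None:
            break
        received.append(msg)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()

print(received)  # ['message-0', 'message-1', 'message-2']
```

Because the queue is the only channel between the two threads, either side can fail or be restarted without corrupting the other's state, which is exactly the independence property described above.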
Streaming: This refers to the real time and continuous nature of the storage, processing and retrieval of Kafka messages.
In this guide, we'll see how Kafka achieves this.
Before we discuss how Kafka works, I think a good place to start would be to explain the context in which Kafka functions.
With increasing frequency, the microservices software architecture is becoming an indispensable paradigm for software engineering and development. As applications scale, that is, as the data they consume, process and output grows, it becomes increasingly important to find fault-tolerant, scalable ways to manage both the systems and the data they handle.
Like the name suggests, a microservice is a piece of software that serves a singular purpose and works with other system components to perform a task.
It is here that Kafka shines. It lends itself well to enabling communication between microservices in a system through the passing of messages between them.
Kafka achieves messaging through a publish and subscribe system facilitated by the following components:
Topics are how Kafka stores and organises messages across its system and are essentially a collection of messages. Visualise them as logs wherein Kafka stores messages. Topics can be replicated (copies are created) and partitioned (divided). The ability to replicate and partition topics is one of the factors that enable Kafka's fault tolerance and scalability.
The producer publishes messages to a Kafka topic.
The consumer subscribes to one or more topics, then reads and processes messages from them.
The broker/server(s) manage the storage of messages in topic(s). Many brokers form a Kafka cluster.
Kafka uses ZooKeeper to provide the brokers with metadata about the processes running in the system and to facilitate health checking and broker leadership election.
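To make the interplay between these components concrete, here is a toy in-memory model. The Topic class and its publish/poll methods are hypothetical illustrations, not the Kafka API; they only show how producers append to a topic's log while each consumer reads from its own position:

```python
# A toy, in-memory model of Kafka's publish/subscribe flow.
# The names (Topic, publish, poll) are hypothetical illustrations,
# not the real Kafka API.

class Topic:
    """An append-only log of messages, as a broker would store them."""
    def __init__(self, name):
        self.name = name
        self.log = []          # messages, indexed by offset
        self.offsets = {}      # consumer name -> next offset to read

    def publish(self, message):
        """A producer appends a message; its offset is its log position."""
        self.log.append(message)
        return len(self.log) - 1

    def poll(self, consumer):
        """A consumer reads every message it has not yet seen."""
        start = self.offsets.get(consumer, 0)
        self.offsets[consumer] = len(self.log)
        return self.log[start:]

topic = Topic('my-kafka-topic')
topic.publish('hello')
topic.publish('world')

print(topic.poll('consumer-a'))   # ['hello', 'world']
topic.publish('again')
print(topic.poll('consumer-a'))   # ['again']
print(topic.poll('consumer-b'))   # ['hello', 'world', 'again']
```

Note how each consumer tracks its own offset: a late subscriber can still replay the whole log, which is the property that makes Kafka's storage feel like a commit log rather than a transient queue.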
You'll need Java for this, so go ahead and download the JDK if you don't already have it installed.
Download the Kafka source files and unzip them to a directory of your choice. As of the writing of this article, the current release version is 0.10.1.0.
Doing this using the tar utility is trivial:

tar -xzf kafka_2.11-0.10.1.0.tgz
You can also download Kafka using Homebrew on macOS, but here, we'll use the method above in order to expose the directory structure of Kafka and give us easy access to its files.
Let's begin by starting up zookeeper by running the following command at the root of the uncompressed folder.
bin/zookeeper-server-start.sh config/zookeeper.properties
The indication that zookeeper has come alive is a stream of information output to your terminal window.
In another shell, let's start a Kafka broker like this:
bin/kafka-server-start.sh config/server.properties
Of note is the fact that we can create multiple Kafka brokers simply by copying the server.properties file and making a few modifications to the values in the following fields, which must be unique to each broker:

broker.id
listeners: The first broker was started at localhost:9092.
log.dirs: The physical location where each broker will store its messages.

In another shell, create a Kafka topic called my-kafka-topic like this.
bin/kafka-topics.sh --create --topic my-kafka-topic --zookeeper localhost:2181 --partitions 1 --replication-factor 1
You should receive confirmation that the topic has been created. A few points of note:

The partitions and replication factor can be changed depending on how many partitions and topic replicas you need. The default number of each is set to 1, but this can be configured when running the bin/kafka-topics.sh file.

If you now look at the zookeeper stream we began earlier, you'll notice that the broker has registered our newly created topic.
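Since our topic has a single partition, every message lands in partition 0. With more partitions, a producer assigns keyed messages to partitions by hashing the key. Here is a rough sketch of the idea; Kafka's real default partitioner uses a murmur2 hash of the key bytes, and zlib.crc32 stands in purely for illustration:

```python
# Sketch of key-based partition assignment. Kafka's default partitioner
# uses a murmur2 hash; Python's zlib.crc32 is used here only as a
# stand-in to illustrate the principle.
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    # All messages with the same key map to the same partition,
    # which preserves per-key ordering.
    return zlib.crc32(key) % num_partitions

num_partitions = 3
for key in [b'user-1', b'user-2', b'user-1']:
    print(key, '->', choose_partition(key, num_partitions))
```

The important property is determinism: the two b'user-1' messages always land in the same partition, so a consumer of that partition sees them in the order they were produced.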
In another shell, let's create a Kafka producer with this:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-kafka-topic
A few points of note:
We call the broker we started, which listens at localhost:9092, because it manages the storage of messages to topics. More information about this port can be found in the config/server.properties file.

We will produce messages to my-kafka-topic, the topic we just created.
生成消息 In yet another shell, run this to start a Kafka consumer:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-kafka-topic --from-beginning
Now the real fun begins.
In the producer stream, type some messages. Press enter after each one and watch what happens in the shell you started the consumer in.
Drumroll please!
The messages from the producer are appearing in the consumer thread. I must admit, the first time I saw this at work, I was quite impressed.
After you catch your breath, let's do a little snooping to find out how Kafka achieves this.
At the root of the downloaded Kafka folder, run:
cd /tmp/kafka-logs/my-kafka-topic-0
cat 00000000000000000000.log
Here, we find our produced and consumed messages. This is why Kafka is often called a distributed commit log. It functions as an immutable record of messages.
Of note is the fact that you can dictate the physical location where the broker saves messages. This setting can be found in the config/server.properties file.
In this section, we'll create an Apache Kafka producer in Python and a Kafka consumer in JavaScript.
We'll need a few things: Node.js and Python's virtualenv.
Confirm you've installed both correctly by running:
node --version && virtualenv --version
You should see similar results
v6.2.2
15.0.3
Next, create the following folder structure.
scotch-kafka
├── producer
│   └── producer.py
└── consumer
    └── consumer.js
Make sure you've started zookeeper and a broker as above.
Make a virtual environment and while inside it, install the Python kafka module by running:
pip install kafka-python
Write the following into the producer.py file:
from kafka import KafkaProducer
import json

# Create an instance of the Kafka producer
producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

# Call the producer.send method with a producer record
for message in range(5):
    producer.send('kafka-python-topic', {'values': message})

# Make sure all buffered messages are actually sent before the script exits
producer.flush()
Let's break down the producer instantiation.
A few items are required for Kafka to create an instance of a producer. These are usually called producer configuration properties.
bootstrap_servers: The list of brokers to connect to.
下给出 Of note is that, Kafka producer instances can only send Producer Record values that match the key and value serialisers types the producer is configured with.
值得注意的是,Kafka生产者实例只能发送与生产者配置了键和值序列化器类型匹配的生产者记录值。
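To see what the configured value_serializer does in isolation, we can call the same lambda outside the producer; every record value must survive this conversion to bytes before it is sent:

```python
import json

# The same serializer we pass to KafkaProducer above:
# dict -> JSON string -> UTF-8 bytes
value_serializer = lambda v: json.dumps(v).encode('utf-8')

payload = value_serializer({'values': 3})
print(payload)        # b'{"values": 3}'
print(type(payload))  # <class 'bytes'>
```

This is why the broker, and the log file we inspected earlier, store plain bytes: serialisation happens entirely on the client side, and the consumer is responsible for decoding them back.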
To send the messages, referred to in Kafka terminology as producer records, we need to call the producer.send function and supply both a topic and a value at minimum.
Some optional values we could provide are:
timestamp: The time at which the producer published the message. This can be configured using the log.message.timestamp.type setting in the server.properties file.
设置来配置此设置。 Note that the topic we're using has the name kafka-python-topic
, so you'll have to create a topic of the same name.
请注意,我们正在使用的主题名称为kafka-python-topic
,因此您必须创建相同名称的主题。
bin/kafka-topics.sh --create --topic kafka-python-topic --zookeeper localhost:2181 --partitions 1 --replication-factor 1
In the scotch-kafka folder, run:
npm install no-kafka
Write the following into the consumer.js file.
const Kafka = require('no-kafka');

// Create an instance of the Kafka consumer
const consumer = new Kafka.SimpleConsumer();

// Log each message's topic, partition, offset and value
var data = function (messageSet, topic, partition) {
  messageSet.forEach(function (m) {
    console.log(topic, partition, m.offset, m.message.value.toString('utf8'));
  });
};

// Subscribe to the Kafka topic
return consumer.init().then(function () {
  return consumer.subscribe('kafka-python-topic', data);
});
In order to create an instance of a consumer, Kafka requires the following:
A list of brokers to connect to. Here, the no-kafka module sets a default address of localhost:9092.
At the root of the project, run the following to start the consumer:
node consumer/consumer.js
and in another shell,
python producer/producer.py
The consumer output should look something like this:
kafka-python-topic 0 216 { "values": 0}
kafka-python-topic 0 217 { "values": 1}
kafka-python-topic 0 218 { "values": 2}
kafka-python-topic 0 219 { "values": 3}
kafka-python-topic 0 220 { "values": 4}
Congratulations, you've just built your first application using Apache Kafka. Not a mean feat at all.
You'll find all the code we've used in this guide above.
I hope this tutorial will act as a foundation from which to learn more about Kafka, its use cases, and the many advantages it has over traditional messaging systems.
Have a minute? Give me some feedback. Drop me a line in the comment box below.