1. Producer workflow architecture

The producer client is driven by two cooperating threads: the main thread and the Sender thread (the sending thread).

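1.1 Main thread:

In the main thread, KafkaProducer creates the message. The message then passes through any configured interceptors, the serializer, and the partitioner, and is finally buffered in the record accumulator (RecordAccumulator, also called the message collector).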

1.2 Sender thread:

The Sender thread fetches messages from the RecordAccumulator and sends them to Kafka.
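The hand-off between the two threads can be sketched with a plain blocking queue. This is only a minimal model under stated assumptions: `TwoThreadProducerSketch`, its queue, and the `POISON_PILL` marker are hypothetical names, messages are plain strings, and the "broker" is just an in-memory list. Kafka's real RecordAccumulator batches records per partition, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of the two-thread design; none of this is Kafka's real code.
class TwoThreadProducerSketch {
    // Marker that tells the sender thread to stop.
    private static final String POISON_PILL = "__STOP__";

    // Stands in for RecordAccumulator: the buffer shared by both threads.
    private final BlockingQueue<String> accumulator = new LinkedBlockingQueue<>();
    // Stands in for the broker: records what the sender "delivered".
    private final List<String> delivered = Collections.synchronizedList(new ArrayList<>());
    private final Thread sender = new Thread(this::drain, "sender-sketch");

    TwoThreadProducerSketch() {
        sender.start();
    }

    // Called on the caller's (main) thread: only buffers, never does I/O.
    void send(String message) {
        accumulator.add(message);
    }

    // Runs on the sender thread: takes buffered messages and "sends" them.
    private void drain() {
        try {
            while (true) {
                String message = accumulator.take();
                if (POISON_PILL.equals(message)) {
                    return;
                }
                delivered.add(message);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Lets the sender finish what is buffered, stops it, and returns what was delivered.
    List<String> close() {
        accumulator.add(POISON_PILL);
        try {
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(delivered);
    }

    public static void main(String[] args) {
        TwoThreadProducerSketch producer = new TwoThreadProducerSketch();
        producer.send("a");
        producer.send("b");
        System.out.println(producer.close());
    }
}
```

The point of the design is that `send()` returns as soon as the message is buffered, while network I/O stays on the background thread.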

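2. Core components of the main thread

2.1. Interceptors:

Producer interceptors can do preparatory work before a message is sent and may modify its content. None of the interceptor methods propagates exceptions to the caller.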

Common interceptor use cases:

1. Filtering out messages that do not satisfy some rule

2. Modifying message content

3. Collecting statistics

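Looking at the send() method of org.apache.kafka.clients.producer.KafkaProducer, interceptors are applied using the chain-of-responsibility pattern, and interception is the first step of sending a message.

To write a custom producer interceptor, implement the org.apache.kafka.clients.producer.ProducerInterceptor interface and its methods: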

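1. onSend(): customizes the message. Avoid modifying the ProducerRecord's topic, key, or partition, because doing so can affect partition calculation and broker-side log compaction.

2. onAcknowledgement():

a. The producer calls the interceptor's onAcknowledgement() when a message has been acknowledged or when sending fails, and it runs before the user-supplied Callback.
b. This method usually runs on the producer's background I/O thread, so keep its logic as light as possible; otherwise it slows down message sending.

3. close(): releases resources when the interceptor is closed.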

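    public interface ProducerInterceptor<K, V> extends Configurable, AutoCloseable {
        // Customize the message here. Avoid changing the ProducerRecord's topic, key,
        // or partition: doing so can affect partition calculation and broker-side log compaction.
        ProducerRecord<K, V> onSend(ProducerRecord<K, V> record);

        // Called when the message is acknowledged or when sending fails,
        // before the user-supplied Callback.
        // Usually runs on the producer's background I/O thread, so keep it lightweight;
        // slow logic here reduces sending throughput.
        // Any exception thrown by this method is ignored by the caller.
        void onAcknowledgement(RecordMetadata metadata, Exception exception);

        // Release resources when the interceptor is closed.
        void close();
    }
    // Exceptions thrown inside an interceptor are logged and not propagated,
    // so they do not affect the main sending flow.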
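The "log and continue" behavior of the interceptor chain can be illustrated without the Kafka client. The sketch below is a simplified model under stated assumptions: `InterceptorChainSketch` is a hypothetical class, messages are plain strings instead of ProducerRecord, and warnings are collected in a list instead of being written to a logger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the interceptor chain; messages are plain Strings here.
class InterceptorChainSketch {
    private final List<UnaryOperator<String>> interceptors;
    // Collected instead of logged so the behavior is easy to inspect.
    final List<String> warnings = new ArrayList<>();

    InterceptorChainSketch(List<UnaryOperator<String>> interceptors) {
        this.interceptors = interceptors;
    }

    // Passes the message through every interceptor in order.
    String onSend(String message) {
        String current = message;
        for (UnaryOperator<String> interceptor : interceptors) {
            try {
                current = interceptor.apply(current);
            } catch (Exception e) {
                // A failing interceptor is recorded and skipped; the chain continues
                // with the last successfully produced message.
                warnings.add("interceptor failed: " + e.getMessage());
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> interceptors = new ArrayList<>();
        interceptors.add(m -> "[v1] " + m);                                  // modifies the content
        interceptors.add(m -> { throw new IllegalStateException("boom"); }); // misbehaves
        interceptors.add(String::trim);                                      // still runs
        InterceptorChainSketch chain = new InterceptorChainSketch(interceptors);
        System.out.println(chain.onSend("hello "));
        System.out.println(chain.warnings);
    }
}
```

Because a failing interceptor is skipped, the interceptors after it receive the output of the last interceptor that succeeded, and the send itself is never aborted by an interceptor error.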

Where the interceptors are invoked

1. onSend() is invoked as the first step of KafkaProducer's send() method:

    public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
        // intercept the record, which can be potentially modified; this method does not throw exceptions
        ProducerRecord<K, V> interceptedRecord = this.interceptors.onSend(record);
        return doSend(interceptedRecord, callback);
    }

2. onAcknowledgement() is invoked in two scenarios:

a. The send completes: the onCompletion() callback method in KafkaProducer

b. The send fails: this.interceptors.onSendError(record, appendCallbacks.topicPartition(), e);

    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (metadata == null) {
            metadata = new RecordMetadata(topicPartition(), -1, -1, RecordBatch.NO_TIMESTAMP, -1, -1);
        }
        // Interceptors are notified first, then the user callback.
        this.interceptors.onAcknowledgement(metadata, exception);
        if (this.userCallback != null)
            this.userCallback.onCompletion(metadata, exception);
    }

The send-failure path is handled in onSendError() of org.apache.kafka.clients.producer.internals.ProducerInterceptors, which also calls the acknowledgement handler.

The onSendError logic:

    public void onSendError(ProducerRecord<K, V> record, TopicPartition interceptTopicPartition, Exception exception) {
        for (ProducerInterceptor<K, V> interceptor : this.interceptors) {
            try {
                if (record == null && interceptTopicPartition == null) {
                    interceptor.onAcknowledgement(null, exception);
                } else {
                    if (interceptTopicPartition == null) {
                        interceptTopicPartition = extractTopicPartition(record);
                    }
                    interceptor.onAcknowledgement(new RecordMetadata(interceptTopicPartition, -1, -1,
                            RecordBatch.NO_TIMESTAMP, -1, -1), exception);
                }
            } catch (Exception e) {
                // do not propagate interceptor exceptions, just log
                log.warn("Error executing interceptor onAcknowledgement callback", e);
            }
        }
    }
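A custom interceptor only takes effect once it is registered in the producer configuration through the `interceptor.classes` setting. The fragment below is a minimal sketch that builds such a configuration with plain `java.util.Properties`; the broker address and the class name `com.example.CountingInterceptor` are placeholders, not real endpoints or classes.

```java
import java.util.Properties;

// Hypothetical helper; only the configuration keys are real Kafka producer settings.
class InterceptorConfigSketch {
    // Builds producer settings that register the given interceptor class(es).
    static Properties producerProps(String bootstrapServers, String interceptorClassNames) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Comma-separated list of ProducerInterceptor implementations, applied in order.
        props.setProperty("interceptor.classes", interceptorClassNames);
        return props;
    }

    public static void main(String[] args) {
        // Both the address and the class name below are placeholders.
        Properties props = producerProps("localhost:9092", "com.example.CountingInterceptor");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting Properties object would be passed to the KafkaProducer constructor.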

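2.2. Serialization:

Why serialization is needed: the Kafka broker accepts data as byte arrays (byte[]), so the producer must use a serializer (Serializer) to convert objects into byte arrays before they can be sent over the network. The consumer reads byte arrays from Kafka and uses a deserializer (Deserializer) to turn them back into objects. The producer's and consumer's serialization rules must therefore match.

Common serializers: org.apache.kafka.common.serialization.Serializer is the parent interface for Kafka serializers. The client ships with StringSerializer (org.apache.kafka.common.serialization.StringSerializer) for strings, as well as serializers for ByteArray, ByteBuffer, Double, Integer, Long, and other types, all of which implement the Serializer interface.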

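The org.apache.kafka.common.serialization.Serializer interface provides three methods:

1. configure(Map<String, ?> configs, boolean isKey)

Configures the serializer. The Map<String, ?> configs parameter holds the key/value configuration entries, and boolean isKey indicates whether the serializer is used for message keys or for message values.

2. serialize()

Contains the actual serialization logic.

3. close()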

    public interface Serializer<T> extends Closeable {
        default void configure(Map<String, ?> configs, boolean isKey) {
            // intentionally left blank
        }

        byte[] serialize(String topic, T data);

        default byte[] serialize(String topic, Headers headers, T data) {
            return serialize(topic, data);
        }

        @Override
        default void close() {
            // intentionally left blank
        }
    }

Let's look at how the client's StringSerializer serializes strings. Its default charset is UTF-8, and a different encoding can be set through configuration. The implementation of serialize(String topic, String data) is simple: it converts the string to byte[] with String.getBytes().

    public class StringSerializer implements Serializer<String> {
        private String encoding = StandardCharsets.UTF_8.name();

        @Override
        public void configure(Map<String, ?> configs, boolean isKey) {
            String propertyName = isKey ? "key.serializer.encoding" : "value.serializer.encoding";
            Object encodingValue = configs.get(propertyName);
            if (encodingValue == null)
                encodingValue = configs.get("serializer.encoding");
            if (encodingValue instanceof String)
                encoding = (String) encodingValue;
        }

        @Override
        public byte[] serialize(String topic, String data) {
            try {
                if (data == null)
                    return null;
                else
                    return data.getBytes(encoding);
            } catch (UnsupportedEncodingException e) {
                throw new SerializationException("Error when serializing string to byte[] due to unsupported encoding " + encoding);
            }
        }
    }
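To see why the producer and consumer must agree on the encoding, here is a small stand-alone sketch of the same getBytes-based conversion and its inverse. `StringCodecSketch` is a hypothetical helper written for illustration, not part of the Kafka client.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that mirrors the null handling and charset use described above.
class StringCodecSketch {
    // Producer side: String to byte[], passing null through unchanged.
    static byte[] serialize(String data, Charset charset) {
        return data == null ? null : data.getBytes(charset);
    }

    // Consumer side: byte[] back to String, using the charset the caller claims was used.
    static String deserialize(byte[] bytes, Charset charset) {
        return bytes == null ? null : new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an accented e
        byte[] bytes = serialize(original, StandardCharsets.UTF_8);

        // Same charset on both sides: the text survives the round trip.
        System.out.println(original.equals(deserialize(bytes, StandardCharsets.UTF_8)));

        // Different charset on the consumer side: the accented character is garbled.
        System.out.println(original.equals(deserialize(bytes, StandardCharsets.ISO_8859_1)));
    }
}
```

The accented character takes two bytes in UTF-8, so decoding those bytes as ISO-8859-1 yields two unrelated characters instead of one.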

The following code is Kafka's own Jackson-based serializer, which converts JSON to byte[].

    package org.apache.kafka.connect.json;

    public class JsonSerializer implements Serializer<JsonNode> {
        private final ObjectMapper objectMapper = new ObjectMapper();

        public JsonSerializer() {
            this(Collections.emptySet(), JsonNodeFactory.withExactBigDecimals(true));
        }

        JsonSerializer(final Set<SerializationFeature> serializationFeatures,
                       final JsonNodeFactory jsonNodeFactory) {
            serializationFeatures.forEach(objectMapper::enable);
            objectMapper.setNodeFactory(jsonNodeFactory);
        }

        @Override
        public byte[] serialize(String topic, JsonNode data) {
            if (data == null)
                return null;
            try {
                return objectMapper.writeValueAsBytes(data);
            } catch (Exception e) {
                throw new SerializationException("Error serializing JSON message", e);
            }
        }
    }

Custom serializers can be built on common serialization tools such as JSON, Protostuff, ProtoBuf, or Thrift to meet business-specific requirements.
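As a rough illustration of what a custom serializer has to do, the sketch below encodes a small value object into a fixed binary layout and decodes it again using only the JDK. The `User` type, the `UserCodecSketch` class, and the byte layout are all assumptions made for this example; in a real producer the same logic would sit inside a class implementing Serializer, with the inverse in a Deserializer on the consumer side.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical value type and codec; the byte layout is an assumption for illustration.
class UserCodecSketch {
    static final class User {
        final int id;
        final String name;

        User(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Layout: 4-byte id, 4-byte name length, then the UTF-8 bytes of the name.
    static byte[] serialize(User user) {
        if (user == null) {
            return null;
        }
        byte[] name = user.name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + 4 + name.length)
                .putInt(user.id)
                .putInt(name.length)
                .put(name)
                .array();
    }

    // Reads the fields back in exactly the order they were written.
    static User deserialize(byte[] bytes) {
        if (bytes == null) {
            return null;
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        int id = buffer.getInt();
        byte[] name = new byte[buffer.getInt()];
        buffer.get(name);
        return new User(id, new String(name, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        User decoded = deserialize(serialize(new User(7, "alice")));
        System.out.println(decoded.id + " " + decoded.name);
    }
}
```

A hand-rolled layout like this is compact but has no schema evolution; that is the main reason to reach for the tools listed above instead.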

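2.3. Partitioner:

The partitioner assigns each message to a partition, so that messages are stored in a distributed way.

1. If the ProducerRecord specifies a target partition, the partitioner is not used. A partition is specified by setting the partition number on the ProducerRecord;

2. If no partition is specified, the partitioner computes the partition number according to its rules, and the message is sent to that partition.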

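Partitioner (org.apache.kafka.clients.producer.Partitioner) is the parent interface for Kafka partitioners. Kafka's default partitioner is DefaultPartitioner (org.apache.kafka.clients.producer.internals.DefaultPartitioner).

DefaultPartitioner defines its main assignment logic in the partition() method. If the key is not null, the partitioner hashes the key (using MurmurHash2, which is fast and has a low collision rate) and derives the partition number from the hash, so messages with the same key are written to the same partition as long as the number of partitions does not change. If the key is null, older clients sent messages round-robin across the topic's available partitions, while the version shown here uses sticky partitioning, which keeps sending to one partition for a while before switching to another.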

    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster,
                         int numPartitions) {
        if (keyBytes == null) {
            // No key: choose the partition with the sticky partitioning strategy
            return stickyPartitionCache.partition(topic, cluster);
        }
        // Hash the key with MurmurHash2, which is fast and has a low collision rate
        return BuiltInPartitioner.partitionForKey(keyBytes, numPartitions);
    }
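The decision order described in this section (explicit partition first, otherwise a hash of the key) can be sketched as follows. This is not Kafka's implementation: `PartitionChoiceSketch` is a hypothetical class, and `Arrays.hashCode` is swapped in for MurmurHash2 purely to keep the example short, so the partition numbers it produces will differ from what a real producer computes. Keyless records are deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the partition decision; the hash is NOT Kafka's MurmurHash2.
class PartitionChoiceSketch {
    // Returns the explicit partition when one is set; otherwise derives one from the key.
    static int choosePartition(Integer explicitPartition, byte[] keyBytes, int numPartitions) {
        if (explicitPartition != null) {
            // The record already names its partition, so no partitioner logic runs.
            return explicitPartition;
        }
        if (keyBytes == null) {
            throw new IllegalArgumentException("keyless records are out of scope for this sketch");
        }
        // Stand-in hash: Kafka uses MurmurHash2 here, not Arrays.hashCode.
        int hash = Arrays.hashCode(keyBytes);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "order-1001".getBytes(StandardCharsets.UTF_8);
        // The same key always maps to the same partition for a fixed partition count.
        System.out.println(choosePartition(null, key, 6) == choosePartition(null, key, 6));
        // An explicit partition bypasses the hash entirely.
        System.out.println(choosePartition(3, key, 6));
    }
}
```

Because the result depends on `numPartitions`, adding partitions to a topic can move an existing key to a different partition.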

However, the newer Kafka client (3.3.1) advises against configuring DefaultPartitioner explicitly. Its Javadoc states:

NOTE this partitioner is deprecated and shouldn't be used.  To use default partitioning logic remove partitioner.class configuration setting.  See KIP-794 for more info.