Bug in TiCDC v7.5.4 related to binary fields #11771
Comments
Which protocol do you use to send data to Kafka? Canal-JSON uses both of these character encodings: https://docs.pingcap.com/tidb/stable/ticdc-canal-json#binary-and-blob-types
I am using the Canal-JSON protocol. The problem is that the same data is decoded using ISO/IEC 8859-1 in v7.5.3, but has to be decoded using UTF-8 in v7.5.4.
Could you provide the data?
create table tab_test3(
I can't reproduce this. Messages encoded through Canal-JSON are the same in these two TiCDC versions:
{"id":0,"database":"test","table":"tab_test3","pkNames":["source_id"],"isDdl":false,"type":"INSERT","es":1732180083113,"ts":1732180103973,"sql":"","sqlType":{"source_id":12,"api_data":2004},"mysqlType":{"source_id":"varchar","api_data":"longblob"},"old":null,"data":[{"source_id":"0e9c92434c54f4a3f2bb860d37835dc4:7ce6bdfc1d5f5f490b6033af263c4a91:0","api_data":""}],"_tidb":{"commitTs":454080615707574279}}
The data is compressed with gzip and stored as binary in a longblob column. On the consumer side, with v7.5.3 the data first has to be converted from ISO-8859-1 to UTF-8 before decompression. With v7.5.4 no such conversion is necessary and the data can be decompressed directly.
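For reference, here is a minimal Python sketch of the consumer-side handling described in this comment. It assumes that each character of the Canal-JSON string stands for exactly one byte of the original blob (the ISO/IEC 8859-1 reading) and that the payload is gzip data. The helper name `decode_blob_column` is hypothetical and not part of TiCDC or any client library.

```python
import gzip
import json

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def decode_blob_column(message: str, column: str) -> list[bytes]:
    """Return the decompressed blob of `column` for each row in a Canal-JSON message.

    Hypothetical helper. Assumption: one character in the JSON string
    corresponds to one byte of the original blob.
    """
    event = json.loads(message)
    results = []
    for row in event.get("data") or []:
        text = row.get(column) or ""
        if text == "":
            # Empty blob: nothing to decompress.
            results.append(b"")
            continue
        # Map each character back to a single byte. This raises
        # UnicodeEncodeError (a ValueError) if any character is above U+00FF,
        # which would mean the one-byte-per-character assumption is wrong.
        raw = text.encode("latin-1")
        if not raw.startswith(GZIP_MAGIC):
            raise ValueError(f"column {column!r} does not look like gzip data")
        results.append(gzip.decompress(raw))
    return results
```

If the magic-byte check fails for messages from one TiCDC version but not the other, comparing the raw `api_data` strings from both versions byte by byte would show where the encodings differ.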
What did you do?
Upgraded TiDB from v7.5.3 to v7.5.4.
What did you expect to see?
The blob field stores data that has been compressed and then converted to binary. This data is sent to Kafka. When consuming the Kafka messages, the field is decoded using ISO/IEC 8859-1 in v7.5.3, but has to be decoded using UTF-8 in v7.5.4. The documentation does not mention this change.
What did you see instead?
Do not upgrade for the time being.
Versions of the cluster
The cluster currently runs v7.5.3, and an upgrade to v7.5.4 is planned.