
Pitfalls encountered when upgrading from HDP 2.4 to HDP 3.1

2022-09-23 10:17:46 Actually I'm real

Our project was recently upgraded from HDP 2.4.0 to HDP 3.0.1. The various pits I ran into are recorded here.

First, the main technologies and versions involved:

Old environment HDP-2.4.0






New Environment HDP-3.0.1

HADOOP 3.1.1

HBase 2.0.0

Storm 1.2.1

ZooKeeper 3.4.6

Kafka 1.1.1

The first type of pit: the component versions bundled with HDP-3.0.1 differ from the versions that the corresponding Maven artifacts depend on.

For example, the Maven dependency I use for Kafka 1.1.1 actually depends on ZooKeeper 3.4.10, which is 4 patch releases ahead of the 3.4.6 that HDP itself integrates.

Fortunately, this is just a small pit. The gap is only 4 patch releases within the same minor version, and it has no effect on my program.
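When the Maven dependency and the cluster disagree on a version like this, it can help to print which jar a class is actually loaded from at runtime. The following is only a minimal sketch in plain Java; `JarLocator` and `locate` are hypothetical names made up for illustration, not part of any HDP component.

```java
import java.security.CodeSource;

// Hypothetical helper: reports where a class was loaded from.
final class JarLocator {

    // Returns the jar or directory a class was loaded from.
    // JDK bootstrap classes have no code source, so a marker is returned for them.
    static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    // Same lookup by class name; a class that cannot be loaded is reported
    // with a marker instead of an exception.
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            return locate(Class.forName(className, false, JarLocator.class.getClassLoader()));
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }
}
```

For example, logging `JarLocator.locate("org.apache.zookeeper.ZooKeeper")` when the topology starts would show whether the 3.4.10 jar pulled in by Maven or the cluster's 3.4.6 jar is the one in use.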


The second type of pit: dependency conflicts between jar packages, mainly Storm-related ones.

Anyone who has used Storm will know the integration dependencies that Storm provides, such as storm-hbase and storm-kafka.

For environmental reasons, I need Storm 1.2.1 to integrate with HBase 2.0. I will go through the Maven dependencies I introduced one by one.

storm-core needs little explanation. Its scope is self-explanatory: a topology running in the Storm cluster definitely does not need to bring its own Storm dependencies.

storm-hbase is quite a pit. It depends on HBase 1.1.0, and I want to use 2.0.0, which is a major version jump and can be fatal. The latest storm-hbase in the Maven repository only supports HBase 1.1.1, and a jar for HBase 2.0 has not come out yet. Because the HBase API is encapsulated inside storm-hbase, I could only think of 2~3 ways to solve this:

1. Exclude the hbase-client dependency from storm-hbase, and then introduce the hbase-client 2.0 package (the Maven configuration will not be written here). This method does not work, or at least it did not pass my tests: after switching to hbase-client 2.0, storm-hbase reports a NoSuchMethod error, which shows that storm-hbase calls functions from the old HBase API. To solve this I would have to either rewrite the storm-hbase source code or override the problematic classes with my own copies.

I did neither. One reason is that I am not sure I am capable of pulling it off, and the other is that I have no time. The project schedule is tight: I was given one week, and there are 6 front-office and back-office projects, all of which have to be upgraded and debugged.

2. Do not use the storm-hbase library at all, and instead use the native hbase-client API for data access inside the Storm classes. In fact, at the very beginning of the project I was going to use the native HBase API, but the project was not done by me. The colleague who did it said: "If you use Storm, then using storm-hbase is definitely better than using the native HBase API yourself." I was speechless at the time. Damn it, now I finally know how to answer him: storm-hbase is provided by Storm, so when HBase is upgraded, storm-hbase may not have a release that matches the new HBase version at all, and if you want to upgrade at that point there is nothing you can do. These are all just complaints. In the end I have to clean up after him, hey~ ╮(╯▽╰)╭. I have not adopted this method, because it would require rewriting the HBase storage code. There is a point worth recording here. HBase storage could obviously have been encapsulated: the HBase put function is itself key-value in form and the data in our Storm topology is a Map, so a single function taking (table name, data) would have wrapped it easily. But with storm-hbase we had to write one class per table, each inheriting HbaseBolt. I asked that colleague at the time whether they could be consolidated into a single Bolt, and he said no. So if I used the second method, I would need to change the Bolt of every table again, and there is no time.

3. This is the method I currently use. Although the HBase cluster is running 2.0.0, I still use the HBase 1.1.0 client and hope it is compatible. It turns out that it is. Fortunately, HBase has good compatibility, but you only find that out by testing.
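The generic (table name, data) wrapper mentioned under method 2 could look roughly like the sketch below. It is plain Java and only an illustration: `GenericHBaseWriter`, `CellSink`, and `write` are hypothetical names, the single column family and the string encoding of values are assumptions, and `CellSink` stands in for the real HBase write. In a real bolt the sink would presumably build a `Put` for the row key, call `Put.addColumn(family, qualifier, value)`, and send it to the named table.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of one writer shared by all tables.
final class GenericHBaseWriter {

    // Hypothetical stand-in for the actual HBase write of a single column value.
    interface CellSink {
        void put(String table, String rowKey, String family, String qualifier, byte[] value);
    }

    // Writes every non-null entry of the map as one column of the given row
    // and returns how many columns were written.
    static int write(CellSink sink, String table, String rowKey,
                     String family, Map<String, Object> data) {
        int written = 0;
        // TreeMap gives a deterministic column order (sorted by qualifier).
        for (Map.Entry<String, Object> entry : new TreeMap<>(data).entrySet()) {
            if (entry.getValue() == null) {
                continue; // assumption: missing values are skipped, not written
            }
            // Assumption: every value is stored as its UTF-8 string form.
            byte[] value = String.valueOf(entry.getValue()).getBytes(StandardCharsets.UTF_8);
            sink.put(table, rowKey, family, entry.getKey(), value);
            written++;
        }
        return written;
    }
}
```

With something like this, the table name becomes a parameter rather than a class, so one Bolt could serve every table.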


The third type of pit: pits in the code. The API definitely changes after a version upgrade, but the project does not necessarily report an error at compile time, and various jar conflicts may only appear after release, as runtime exceptions. Here are a few of the modifications that left an impression on me.

The version of Kafka in HDP3.1 is 2.0.0, but my code uses storm-kafka. If I do not use this package in version 2.1, I need to write a spout myself, which may cause performance problems, and issues such as disconnection and reconnection and cursor synchronization would need to be considered. There is no time for that. But if I exclude the client that storm-kafka depends on and introduce the Kafka 2.0 client instead, storm-kafka reports an error: the kafka.api.OffsetRequest.DefaultClientId() function, which KafkaConfig uses in the storm-kafka source code, has been removed from the Kafka 2.0 client. So if I want to keep using storm-kafka, I have to lower my kafka-client version. Here I use version 1.1.1, in which this class is @deprecated but can still be used.
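Because a removed method like this only fails at runtime, one option is to probe for it by reflection when the topology starts and fail early with a readable message. This is a minimal sketch, not something storm-kafka provides; `ApiProbe` and `hasNoArgMethod` are hypothetical names.

```java
// Hypothetical startup check for an API that may be missing from the client jar.
final class ApiProbe {

    // True if the class can be loaded and has a public no-argument method
    // with the given name; false if either the class or the method is missing.
    static boolean hasNoArgMethod(String className, String methodName) {
        try {
            // initialize=false: only look the class up, do not run static initializers
            Class<?> type = Class.forName(className, false, ApiProbe.class.getClassLoader());
            type.getMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }
}
```

Going by the behaviour described above, `ApiProbe.hasNoArgMethod("kafka.api.OffsetRequest", "DefaultClientId")` would be expected to return true with the 1.1.1 client on the classpath and false with the 2.0 client, though I have only the error reported here to go on.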

Another pit is that in 2.0, storm-kafka cannot create the directory /offsetZkRoot/offsetZkId after consuming. Normally Storm uses this directory to store Kafka-related information. It was created automatically when using 2.4, but is no longer created automatically after the upgrade. I did not find out why. My solution was to use the zookeeper command to create this directory in ZooKeeper manually, after which the program no longer reports that the directory cannot be found. One thing to note: if the directory has only just been created and you start Storm first, you get an error that key.serializer does not have a default value. Later I started the producer first so that the topic had data, and then started Storm, and the error was not reported.
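When creating the directory by hand, keep in mind that a ZooKeeper create fails if the parent znode does not exist, so each level has to be created in order. The sketch below only computes that order in plain Java; `ZkPaths` and `ancestorsAndSelf` are hypothetical names, and /offsetZkRoot/offsetZkId stands for whatever root and id your spout is configured with.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the znodes to create, parents first.
final class ZkPaths {

    // Expands "/a/b/c" into ["/a", "/a/b", "/a/b/c"].
    static List<String> ancestorsAndSelf(String path) {
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("znode path must start with '/': " + path);
        }
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue; // skips the leading slash and any doubled slashes
            }
            current.append('/').append(part);
            result.add(current.toString());
        }
        return result;
    }
}
```

Each returned path would then be created in turn, whether with the zookeeper command as I did or from a client.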

Copyright notice
Author: [Actually I'm real]. Please include the original link when reprinting, thank you.
