

3. Hadoop 3.3.6 HA Cluster Setup

1. Server Planning

Software versions for reference: cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-2.2?hl=zh-cn

Hadoop HA splits into two parts: HDFS HA and YARN HA.

Cluster layout (reconstructed from the configuration files below; every node runs every role):

Host       Roles
hadoop01   NameNode (nn1), DataNode, JournalNode, ZKFC, ResourceManager (rm1), NodeManager, ZooKeeper
hadoop02   NameNode (nn2), DataNode, JournalNode, ZKFC, ResourceManager (rm2), NodeManager, ZooKeeper
hadoop03   NameNode (nn3), DataNode, JournalNode, ZKFC, ResourceManager (rm3), NodeManager, ZooKeeper

2. Prerequisites

2.1 Passwordless SSH login

ssh-keygen -t rsa
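
Generating the key pair is only half of it; the public key still has to land in authorized_keys on every node, including the machine itself. A minimal sketch, assuming the default key path and the hostnames defined in 2.3 below; run it on each of the three machines:

for host in hadoop01 hadoop02 hadoop03; do
  ssh-copy-id "$host"   # appends ~/.ssh/id_rsa.pub to the target's authorized_keys
done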

2.2 JDK installation

2.3 Configure hosts

sudo vim /etc/hosts

10.128.0.18 hadoop01
10.128.0.21 hadoop02
10.128.0.22 hadoop03

The first column is each machine's internal (private) IP address.

3. Hadoop Configuration

Environment variables, for reference, in /etc/profile.d/my_env.sh:

export JAVA_HOME=/opt/apps/jdk
export HADOOP_HOME=/opt/apps/hadoop
export ZK_HOME=/opt/apps/zookeeper
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZK_HOME/bin:$PATH
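
This file has to exist, and be loaded, on all three nodes. A small sketch for pushing it out, assuming the passwordless SSH from 2.1 and sudo rights on each host:

# run on hadoop01 after editing the file locally
for host in hadoop02 hadoop03; do
  scp /etc/profile.d/my_env.sh "$host":/tmp/my_env.sh
  ssh -t "$host" 'sudo mv /tmp/my_env.sh /etc/profile.d/my_env.sh'
done
source /etc/profile.d/my_env.sh   # apply in the current shell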

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Logical NameNode address (the HA nameservice) -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://bits</value>
  </property>
  <!-- Base directory for Hadoop data -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/data/hadoop/data</value>
  </property>
  <!-- Static user for the HDFS web UI: bigdatabit9 -->
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>bigdatabit9</value>
  </property>
  <!-- ZooKeeper quorum the ZKFCs connect to -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
  <!-- Max retries for NN connecting to JN (default 10) -->
  <property>
    <name>ipc.client.connect.max.retries</name>
    <value>20</value>
  </property>
  <!-- Retry interval in ms (default 1 s) -->
  <property>
    <name>ipc.client.connect.retry.interval</name>
    <value>5000</value>
  </property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- NameNode metadata directory -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/name</value>
  </property>
  <!-- DataNode data directory -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/data</value>
  </property>
  <!-- JournalNode edits directory -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>${hadoop.tmp.dir}/jn</value>
  </property>
  <!-- Nameservice ID for the cluster -->
  <property>
    <name>dfs.nameservices</name>
    <value>bits</value>
  </property>
  <!-- NameNodes that make up the nameservice -->
  <property>
    <name>dfs.ha.namenodes.bits</name>
    <value>nn1,nn2,nn3</value>
  </property>
  <!-- NameNode RPC addresses -->
  <property>
    <name>dfs.namenode.rpc-address.bits.nn1</name>
    <value>hadoop01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.bits.nn2</name>
    <value>hadoop02:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.bits.nn3</name>
    <value>hadoop03:8020</value>
  </property>
  <!-- NameNode HTTP addresses -->
  <property>
    <name>dfs.namenode.http-address.bits.nn1</name>
    <value>hadoop01:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.bits.nn2</name>
    <value>hadoop02:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.bits.nn3</name>
    <value>hadoop03:9870</value>
  </property>
  <!-- Enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Shared edits location on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/bits</value>
  </property>
  <!-- Proxy provider clients use to find the active NameNode -->
  <property>
    <name>dfs.client.failover.proxy.provider.bits</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing method: only one NameNode may serve at a time (see section 5) -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
  </property>
</configuration>

yarn-site.xml

<?xml version="1.0"?>
<configuration>
  <!-- Use the MapReduce shuffle auxiliary service -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Enable automatic recovery -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- Cluster ID shared by the three ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster-yarn1</value>
  </property>
  <!-- Logical IDs of the ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2,rm3</value>
  </property>
  <!-- ========== rm1 ========== -->
  <!-- rm1 hostname -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop01</value>
  </property>
  <!-- rm1 web UI address -->
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop01:8088</value>
  </property>
  <!-- rm1 internal RPC address -->
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>hadoop01:8032</value>
  </property>
  <!-- Address AMs use to request resources from rm1 -->
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>hadoop01:8030</value>
  </property>
  <!-- Address NMs connect to on rm1 -->
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>hadoop01:8031</value>
  </property>
  <!-- ========== rm2 ========== -->
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop02</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop02:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>hadoop02:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>hadoop02:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>hadoop02:8031</value>
  </property>
  <!-- ========== rm3 ========== -->
  <property>
    <name>yarn.resourcemanager.hostname.rm3</name>
    <value>hadoop03</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm3</name>
    <value>hadoop03:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm3</name>
    <value>hadoop03:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm3</name>
    <value>hadoop03:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm3</name>
    <value>hadoop03:8031</value>
  </property>
  <!-- ZooKeeper quorum -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
  <!-- Store ResourceManager state in ZooKeeper -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <!-- Environment variables inherited by containers -->
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Run MapReduce jobs on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
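
With all four files in place (and synced to every node), the HA wiring can be sanity-checked before anything is started; hdfs getconf only reads the local configuration:

hdfs getconf -confKey fs.defaultFS            # expect hdfs://bits
hdfs getconf -confKey dfs.ha.namenodes.bits   # expect nn1,nn2,nn3
hdfs getconf -namenodes                       # expect hadoop01 hadoop02 hadoop03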

workers

hadoop01
hadoop02
hadoop03

Note: entries in this file must not have trailing whitespace, and the file must not contain blank lines.
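
A quick way to catch both mistakes, assuming GNU coreutils: cat -A marks every line ending with $, so trailing spaces and blank lines stand out immediately.

cat -A $HADOOP_HOME/etc/hadoop/workers   # a space before '$', or a bare '$' on its own line, is a problem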

4. Starting the Cluster

Make sure the environment variables above are in place on every node (details omitted).

Format the cluster. This is only needed the first time. If you ever need to reformat, delete the data and logs directories on every node in the cluster first.
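
A sketch of that cleanup, using the paths from hadoop.tmp.dir and HADOOP_HOME above; adjust if your layout differs:

# run from any node with passwordless SSH to the others
for host in hadoop01 hadoop02 hadoop03; do
  ssh "$host" 'rm -rf /opt/data/hadoop/data /opt/apps/hadoop/logs'
done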

Start the JournalNodes. On each JournalNode host, run the following command to start the journalnode service:

hdfs --daemon start journalnode

On [nn1], format the NameNode and start it:

hdfs namenode -format
hdfs --daemon start namenode

On [nn2] and [nn3], sync the metadata from nn1:

hdfs namenode -bootstrapStandby

Start [nn2] and [nn3]:

hdfs --daemon start namenode
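
One step this walkthrough leaves implicit: with dfs.ha.automatic-failover.enabled set to true, the standard HDFS HA procedure also initializes the HA state znode in ZooKeeper once, before the ZKFC daemons are first started. On one of the NameNode hosts:

hdfs zkfc -formatZK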

Start HDFS:

sbin/start-dfs.sh
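
Once HDFS is up, stock hdfs haadmin subcommands show which NameNode won the election:

hdfs haadmin -getServiceState nn1   # prints active or standby
hdfs haadmin -getServiceState nn2
hdfs haadmin -getServiceState nn3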

Start YARN. start-yarn.sh must be run on one of the nodes configured as a ResourceManager:

sbin/start-yarn.sh
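
The ResourceManager states can be checked the same way, and the bundled examples jar makes a reasonable smoke test (the jar name below assumes a stock 3.3.6 build; adjust to whatever ships in your distribution):

yarn rmadmin -getServiceState rm1   # prints active or standby
yarn rmadmin -getServiceState rm2
yarn rmadmin -getServiceState rm3
# smoke test: estimate pi with 2 mappers x 10 samples each
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar pi 2 10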

5. Troubleshooting

1. The active NameNode died, automatic failover never completed, and the cluster only recovered after the dead NameNode was brought back up by hand.

<!-- Fencing method -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<!-- SSH private key used by sshfence -->
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/dev/.ssh/id_rsa</value>
</property>

Root cause and fix: with fencing set to sshfence, the newly elected active first SSHes into the old active and kills its NameNode process; this exists to prevent split-brain. But if the whole machine is down, that SSH step can never succeed, so failover stalls there and an operator has to bring the old NameNode back up before the cluster elects a new active on its own. Change the configuration above to:

<!-- Fencing method: guarantees only one NameNode serves at a time -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>

Any other shell command works as well, but then guarding against split-brain is on you.
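
A middle ground worth knowing: dfs.ha.fencing.methods takes a newline-separated list of methods that are tried in order until one succeeds, so sshfence can stay as the first choice with shell(/bin/true) as the fallback for the host-is-dead case. A sketch, reusing the key path from above:

<!-- try SSH fencing first; fall back to /bin/true if the old host is unreachable -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/dev/.ssh/id_rsa</value>
</property>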

