SYMPTOMS: During HDP upgrades, namenode restart would fail leading to upgrade failure. Following errors are usually seen:-
Traceback (most recent call last):File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 42, in get_value_from_jmxreturn data_dict["beans"][0][property] IndexError: list index out of range Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 420, in <module>NameNode().execute() File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in executemethod(env) File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 720, in restartself.start(env, upgrade_type=upgrade_type) File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 101, in startupgrade_suspended=params.upgrade_suspended, env=env) File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunkreturn fn(*args, **kwargs) File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 184, in namenodeif is_this_namenode_active() is False: File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 55, in wrapperreturn function(*args, **kwargs) File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 554, in is_this_namenode_active raise Fail(format("The NameNode {namenode_id} is not listed as Active or Standby, waiting...")) resource_management.core.exceptions.Fail: The NameNode nn1 is not listed as Active or Standby, waiting...
ROOT CAUSE: Starting from Ambari 2.4, when the cluster is large, HDP upgrade fails during namenode restart.This is because, restart command waits for namenode to come out of safemode and if the cluster size is large, namenode takes more time to leave safemode but Ambari marks this action as failure as the namenode didn't leave safemode within the configured timeout in Ambari scripts.The issue has been reported in AMBARI-18786
SOLUTION: Upgrade to Ambari 2.5
WORKAROUND: Increase the timeout for ambari as follows:-
1. Increase the timeout in
/var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py
From this:
@retry(times=5, sleep_time=5, backoff_factor=2, err_class=Fail)
To this:
@retry(times=25, sleep_time=25, backoff_factor=2, err_class=Fail)
2. Restart Ambari server
hi @Gaurav I have similar issue on amabri 2.6.1 post upgrade. My python version is still 2.6 can you help me please? 2018-05-11 17:06:21,660 - Getting jmx metrics from NN failed. URL: http://x.y.z:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 42, in get_value_from_jmx return data_dict["beans"][0][property] IndexError: list index out of range Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 361, in <module> NameNode().execute() File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 375, in execute method(env) File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 978, in restart self.start(env, upgrade_type=upgrade_type) File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 99, in start upgrade_suspended=params.upgrade_suspended, env=env) File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk return fn(*args, **kwargs) File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 203, in namenode if is_this_namenode_active() is False: File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 62, in wrapper return function(*args, **kwargs) File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 611, in is_this_namenode_active raise Fail(format("The NameNode {namenode_id} is not listed as Active or Standby, waiting...")) resource_management.core.exceptions.Fail: The NameNode nn1 is not listed as Active or Standby, waiting...
Scaling the HDFS NameNode (part 2)
Scaling the HDFS NameNode (part 4) - Avoiding Performance Pitfalls
Scaling the HDFS NameNode (part 3) - RPC scalability features
Namenode crashes with SEGFAULT when using JniBasedUnixGroupsMapping
HDFS Balancer (3): Cluster Balancing Algorithm
NodeManager Web UI connection timeouts; always 5 seconds
How to Recover Accidentally deleted file (with -skipTrash) in HDFS ?
Namenode HA : Namenode enters 'SERVICE_NOT_RESPONDING' state and causes frequent flip over
This website uses cookies for analytics, personalisation and advertising. To learn more or change your cookie settings, please read our Cookie Policy. By continuing to browse, you agree to our use of cookies.
HCC Guidelines | HCC FAQs | HCC Privacy Policy | Privacy Policy | Terms of Service
© 2011-2019 Hortonworks Inc. All Rights Reserved.
Hadoop, Falcon, Atlas, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie and the Hadoop elephant logo are trademarks of the Apache Software Foundation.