HCC Hortonworks Community Connection

PySpark Phoenix integration failing in Oozie workflow

Question by Selva prabhu · Feb 08 at 02:58 PM · Tags: hbase, pyspark, apache-phoenix, phoenix4.7, phoenix-spark

I am connecting to and ingesting data into a Phoenix table from PySpark with the code below:

dataframe.write.format("org.apache.phoenix.spark") \
    .mode("overwrite") \
    .option("table", "tablename") \
    .option("zkUrl", "localhost:2181") \
    .save()

When I run this via spark-submit with the command below, it works fine:

spark-submit --master local --deploy-mode client --files /etc/hbase/conf/hbase-site.xml --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar" --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar" sparkPhoenix.py

When I run it from Oozie, I get the error below:

.ConnectionClosingException: Connection to ip-172-31-44-101.us-west-2.compute.internal/172.31.44.101:16020 is closing. Call id=9, waitTime=3 row 'SYSTEM:CATALOG,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-172-31-44-101

Below is the workflow:

<action name="pysparkAction" retry-max="1" retry-interval="1" cred="hbase">
    <spark xmlns="uri:oozie:spark-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>local</master>
        <mode>client</mode>
        <name>Spark Example</name>
        <jar>sparkPhoenix.py</jar>
        <spark-opts>--py-files Leia.zip --files /etc/hbase/conf/hbase-site.xml --conf spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar --conf spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar</spark-opts>
    </spark>
    <ok to="successEmailaction"/>
    <error to="failEmailaction"/>
</action>

With spark-submit I initially hit the same error and fixed it by passing the required jars. In Oozie, even though I pass the same jars, it still throws the error.

2 Replies

Answer by Josh Elser · Feb 08 at 03:25 PM

Do you have security enabled? Clients usually see this error when the server rejects the authenticated RPC.

Turn on DEBUG logging for HBase and look at the RegionServer log on the hostname you have configured. Most of the time, this is the result of an impersonation-related configuration error. The DEBUG message in the RegionServer log will tell you who the "real" user is (the one providing Kerberos credentials) and who they are trying to impersonate (who the real user "says" they are). In your case, "oozie" would be saying that it is "you" (or whichever user you are running this application as).

From this, you can amend your `hadoop.proxyuser...` configuration properties in core-site.xml, restart HBase, and try again.
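
For illustration, the impersonation grant for the "oozie" service user typically looks like the following in core-site.xml. This is only a sketch: the wildcard values are placeholders and should be narrowed to the actual Oozie server host(s) and allowed groups on a real cluster.

<!-- core-site.xml (sketch): allow the "oozie" service user to impersonate end users -->
<property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <!-- Placeholder: restrict to the Oozie server host(s) in production -->
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <!-- Placeholder: restrict to the groups that may be impersonated -->
    <value>*</value>
</property>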


Comment by Selva prabhu · Feb 11 at 10:11 AM

Hi @Josh Elser, thank you so much for the answer. I checked what you said and everything looks fine. I am using the JDBC URL below as the zkUrl when accessing Phoenix. My cluster is Kerberized, so I am passing all the credentials properly, as below:

jdbc:phoenix:ip-node1,ip-node2,ip-node3:2181:/hbase-secure:hbaseuser@HCL.COM:/home/hbaseuser/hbaseuser.keytab

The problem is that when I execute my PySpark code with this JDBC URL using spark-submit, it works fine. If I execute the same code in the Oozie workflow, it throws the exception below because of an HBase connectivity issue:

java.sql.SQLException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Mon Feb 11 07:33:05 UTC 2019, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68427: row 'SYSTEM:CATALOG,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-172-31-44-101.us-west-2.compute.internal,16020,1545291237502, seqNum=0

How can the same code work fine with spark-submit but not in the Oozie workflow? I have copied all the dependency jars into the workflow/lib folder in HDFS. How can I debug this further?

Answer by Selva prabhu · Feb 13 at 02:07 PM

I found that "--files /etc/hbase/conf/hbase-site.xml" does not work when integrated with Oozie. Instead, I pass hbase-site.xml with a <file> tag in the Oozie Spark action, as below, and with that in place it works fine now:

<file>file:///etc/hbase/conf/hbase-site.xml</file>
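
For reference, dropped into the workflow from the question, the corrected action looks roughly like this (a sketch; <spark-opts> keeps the classpath settings from above minus the --files flag, and the placement of <file> inside <spark> follows what worked here):

<action name="pysparkAction" retry-max="1" retry-interval="1" cred="hbase">
    <spark xmlns="uri:oozie:spark-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>local</master>
        <mode>client</mode>
        <name>Spark Example</name>
        <jar>sparkPhoenix.py</jar>
        <spark-opts>--py-files Leia.zip --conf spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar --conf spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.7.0.2.6.3.0-235.jar:/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-client.jar</spark-opts>
        <!-- Ship hbase-site.xml to the containers via the file element -->
        <file>file:///etc/hbase/conf/hbase-site.xml</file>
    </spark>
    <ok to="successEmailaction"/>
    <error to="failEmailaction"/>
</action>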