Hortonworks.com
  • Explore
    • All Tags
    • All Questions
    • All Articles
    • All Ideas
    • All Repos
    • All SKB
    • All Users
    • All Badges
    • Leaderboard
  • Create
    • Ask a question
    • Create Article
    • Post Idea
    • Add Repo
  • Tracks
    • All Tracks
    • Community Help
    • Cloud & Operations
    • CyberSecurity
    • Data Ingestion & Streaming
    • Data Processing
    • Data Science & Advanced Analytics
    • Design & Architecture
    • Governance & Lifecycle
    • Hadoop Core
    • Sandbox & Learning
    • Security
    • Solutions
  • Login
HCC Hortonworks Community Connection
  • Home /
  • Community Help /
avatar image

MSCK REPAIR HIVE EXTERNAL TABLES

Question by Haifa Ben Aouicha Jun 13, 2018 at 08:07 AM Hive

Hello Community,

I have a daily ingestion of data in to HDFS .
From data into HDFS I generate Hive external tables partitioned by date .
My qestion is as follows , should I run MSCK REPAIR TABLE tablename after each data ingestion , in this case I have to run the command each day.
Or running it just one time at the table creation is enough .
Thanks a lot for your answers

Best regards

Comment

People who voted for this

0 Show 0
10 |6000 characters needed characters left characters exceeded
▼
  • Viewable by all users
  • Viewable by moderators
  • Viewable by moderators and the original poster
  • Advanced visibility
Viewable by all users

Up to 5 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

2 Replies

· Add your reply
  • Sort: 
  • Votes
  • Created
  • Oldest
avatar image

Answer by Shardul Sadavarte · Jan 25 at 10:44 PM

What is better choice and why? Repairing a existing table or recreating it ?

Comment

People who voted for this

0 Show 0 · Share
10 |6000 characters needed characters left characters exceeded
▼
  • Viewable by all users
  • Viewable by moderators
  • Viewable by moderators and the original poster
  • Advanced visibility
Viewable by all users

Up to 5 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

avatar image

Answer by Shu · Jun 13, 2018 at 11:06 AM

@Haifa Ben Aouicha

Yes, you need to run msck repair table daily once you have loaded a new partition in HDFS location.

Why we need to run msck Repair table statement everytime after each ingestion?

Hive stores a list of partitions for each table in its metastore. If, however, new partitions are directly added to HDFS , the metastore (and hence Hive) will not be aware of these partitions unless the user runs either of below ways to add the newly add partitions.

1.Adding each partition to the table

hive> alter table <db_name>.<table_name> add partition(`date`='<date_value>') location '<hdfs_location_of the specific partition>';

(or)

2.Run metastore check with repair table option

hive> Msck repair table <db_name>.<table_name>

which will add metadata about partitions to the Hive metastore for partitions for which such metadata doesn't already exist. In other words, it will add any partitions that exist on HDFS but not in metastore to the metastore.

In addition if you are loading dynamic/static partitions to the final table from other temp table with hive statement(like insert into final table partition(..) select * from temp table), then you don't need to do any of the above methods because as you are using hive statement to load a partition then hive will update the metadata of the final table. All the above mentioned ways we have to do if you are directly adding a new directory in hdfs or other ways instead of hive.

Please refer to this link for more details regarding refresh hive metadata.

Comment

People who voted for this

0 Show 0 · Share
10 |6000 characters needed characters left characters exceeded
▼
  • Viewable by all users
  • Viewable by moderators
  • Viewable by moderators and the original poster
  • Advanced visibility
Viewable by all users

Up to 5 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Your answer

Hint: You can notify a user about this post by typing @username

Up to 5 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

48
Followers

Answers Answer & comments

This website uses cookies for analytics, personalisation and advertising. To learn more or change your cookie settings, please read our Cookie Policy. By continuing to browse, you agree to our use of cookies.

HCC Guidelines | HCC FAQs | HCC Privacy Policy | Privacy Policy | Terms of Service

© 2011-2019 Hortonworks Inc. All Rights Reserved.

Hadoop, Falcon, Atlas, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie and the Hadoop elephant logo are trademarks of the Apache Software Foundation.

  • Anonymous
  • Login
  • Create
  • Ask a question
  • Create Article
  • Post Idea
  • Add Repo
  • Create SupportKB
  • Tracks
  • Community Help
  • Cloud & Operations
  • CyberSecurity
  • Data Ingestion & Streaming
  • Data Processing
  • Data Science & Advanced Analytics
  • Design & Architecture
  • Governance & Lifecycle
  • Hadoop Core
  • Sandbox & Learning
  • Security
  • Solutions
  • Explore
  • All Tags
  • All Questions
  • All Articles
  • All Ideas
  • All Repos
  • All SKB
  • All Users
  • Leaderboard
  • All Badges