What is HDFS Ozone?

Article by Ajay · Mar 29, 2018 at 10:22 PM · edited Apr 02, 2018 at 03:36 AM

Ozone is an object store for Hadoop: a redundant, distributed object store built by leveraging primitives already present in HDFS. Below are some key features of Ozone:

  1. A Hadoop-compatible file system, called Ozone File System, that allows programs like Hive or Spark to run against Ozone without any modifications.
  2. Ozone supports RPC and REST APIs for accessing the store (see the curl sketch after this list).
  3. It is built to support billions of keys in a distributed environment.
  4. Ozone can run concurrently with HDFS.
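For a quick sense of the REST interface, the sketch below creates a volume with curl. This is illustrative only: the port (9880), the x-ozone-* headers, and the Date requirement are assumptions taken from this era's Ozone REST documentation and may differ in your build.

  # Hypothetical REST call; header names and port are assumptions
  # based on the Ozone REST docs of this era:
  curl -i -X POST -H "x-ozone-user: bilbo" -H "x-ozone-version: v1" \
    -H "Date: Mon, 26 Jun 2017 04:23:30 GMT" \
    -H "Authorization: OZONE root" http://localhost:9880/volume-of-bilbo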

Like many other object stores, Ozone has a notion of a volume. Only administrators can create volumes. Users create buckets in the volumes, and to store data inside a bucket, users create keys. An example follows below.
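As a sketch of that hierarchy, the session below creates a volume, a bucket, and a key from the command shell. Treat it as a sketch only: the launcher name (oz vs. ozone) and the exact flags changed between builds of this era.

  # Hypothetical shell session; launcher and flag names are assumptions:
  ./bin/ozone oz -createVolume /vol1 -user bilbo -quota 10GB -root   # admin-only
  ./bin/ozone oz -createBucket /vol1/bucket1
  ./bin/ozone oz -putKey /vol1/bucket1/key1 -file /tmp/hello.txt
  ./bin/ozone oz -listKey /vol1/bucket1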

The Ozone File System allows other Hadoop ecosystem applications, such as Hive and Spark, to use Ozone. Once a bucket is created, it is trivial to create an Ozone File System on top of it.
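A minimal sketch of what that looks like, assuming the o3fs scheme and the bucket.volume authority form; both the scheme name and the implementation class name changed across releases, so verify them against your build's docs:

  # In core-site.xml (property names are assumptions for this era):
  #   fs.o3fs.impl = org.apache.hadoop.fs.ozone.OzoneFileSystem
  #   fs.defaultFS = o3fs://bucket1.vol1/
  # Ordinary FileSystem shell commands then work against the bucket:
  ./bin/hdfs dfs -mkdir -p o3fs://bucket1.vol1/data
  ./bin/hdfs dfs -put /tmp/hello.txt o3fs://bucket1.vol1/data/
  ./bin/hdfs dfs -ls o3fs://bucket1.vol1/data/

This is why Hive or Spark jobs can run unmodified: they only see the Hadoop FileSystem API.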

A 10,000-foot view of Ozone

  1. OzoneManager (OM) acts as the namespace manager. All Ozone entities, such as volumes, buckets, and keys, are managed by the OM. The OM talks to an independent block manager, the Storage Container Manager (SCM), to get blocks and passes them on to the Ozone client.
  2. SCM: the Storage Container Manager is the block and cluster manager for Ozone.
  3. Block: blocks are similar to blocks in HDFS; they are replicated blocks of data.

These components map very closely to the existing HDFS NameNode and DataNodes. The most significant difference is the presence of a dedicated block manager, the SCM.

Using Ozone

The easiest way to run Ozone is to try it out using Docker. To build Ozone from source, check out the Hadoop sources from GitHub, switch to the Ozone branch, HDFS-7240, and build it:

git clone https://github.com/apache/hadoop.git && cd hadoop
git checkout HDFS-7240

You can build Ozone by running the following command:

      mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -Pdist -Phdsl -Dtar -DskipShade

The skipShade flag just makes compilation faster and is not strictly required.
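If the build succeeds, the distribution, including the docker-compose files used in the next section, should land under hadoop-dist/target. A quick sanity check (the exact layout depends on the version string of your checkout):

  ls hadoop-dist/target/
  ls hadoop-dist/target/compose/ozone/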

Running Ozone via Docker

This assumes that you have a working Docker setup on the machine. Run the following commands to see Ozone in action.

  • Go to the directory where the docker-compose files exist:
    cd hadoop-dist/target/compose/ozone
  • Start Ozone:
    docker-compose up -d
  • Log into the DataNode container:
    docker exec -it ozone_datanode_1 bash
  • Run the Ozone load generator:
    ./bin/oz freon

Take a look at the OzoneManager UI at http://localhost:9874/ to see all the requests made by Freon.
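If the UI does not come up, the standard docker-compose commands are a good first stop; these are plain docker-compose commands, not Ozone-specific:

  docker-compose ps        # list the Ozone containers and their state
  docker-compose logs -f   # tail logs from all services
  docker-compose down      # tear the cluster down when you are done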

Congratulations on your first Ozone deployment! In the next part of this tutorial, we will cover the oz command shell and look at how to use Ozone to store files.

Tags: hdfs, faq, hadoop-ecosystem, storage

Comment by Tom McCuch · Mar 30, 2018 at 04:19 PM

        @Ajay - Thank you for this article! Can you please re-label this to Apache Hadoop HDFS Ozone, rather than Apache Ozone? The latter is not the proper use of Apache branding. Thanks. Tom

Comment by David Hoyle · Aug 30, 2018 at 12:51 PM

@Ajay, I checked out the HDFS-7240 branch and ran the build command (on Mac OS X). That seemed to work and downloaded a bunch of files, but then the build failed, saying the "hdsl" profile does not exist:

[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12.615 s
[INFO] Finished at: 2018-08-30T07:09:47-04:00
[INFO] Final Memory: 127M/1258M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "hdsl" could not be activated because it does not exist.
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.2.0-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-common
Reply by Sandeep Nemuri to David Hoyle · Aug 30, 2018 at 02:58 PM

        @David Hoyle

        The code structure has changed since this article was written.

1) Check out trunk.

2) brew install protobuf250 (protobuf is needed to build Hadoop).

3) Build using: mvn clean package -Phdds -Pdist -Dtar -DskipShade -DskipTests -Dmaven.javadoc.skip=true

Edit: updated the proto version.

Comment by David Hoyle · Aug 31, 2018 at 01:48 PM

@Sandeep Nemuri helped me work through this. Here are the updated steps:

Clone Hadoop. From the trunk branch, run the following command to install protoc 2.5:

        brew install protobuf250

        Run the following command to create symlinks for protoc 2.5:

        brew link --overwrite --force protobuf250

        You can use the following command to verify that protoc 2.5 has been installed:

        protoc --version

        Use the following command to build ozone:

        mvn clean package -Phdds -Pdist -Dtar -DskipShade -DskipTests -Dmaven.javadoc.skip=true

        Go to the directory that contains the Docker compose files:

        cd <path_to_local_github>/hadoop/hadoop-dist/target/ozone-0.2.1-SNAPSHOT/compose/ozone

        Start ozone:

        docker-compose up -d

        Log in to the DataNode container:

docker exec -it ozone_datanode_1 bash

        Run the ozone load generator:

        bin/ozone freon -validateWrites -numOfVolumes 5 -numOfBuckets 10 -numOfKeys 10

        Now you should be able to see the OzoneManager UI at http://localhost:9874/

Reply by aengineer to David Hoyle · Aug 31, 2018 at 06:44 PM

Thanks for the update. Glad you were able to make it work, and thanks for sharing it with the community.


