LLAP is a set of persistent daemons that execute fragments of Hive queries. Query execution on LLAP is very similar to query execution in Hive without LLAP, except that worker tasks run inside the LLAP daemons rather than in YARN containers.
High-level lifetime of a JDBC query:
| Without LLAP | With LLAP |
| --- | --- |
| Query arrives at HS2; it is parsed and compiled into "tasks" | Query arrives at HS2; it is parsed and compiled into "tasks" |
| Tasks are handed over to the Tez AM (query coordinator) | Tasks are handed over to the Tez AM (query coordinator) |
| Coordinator (AM) asks YARN for containers | Coordinator (AM) locates LLAP instances via ZooKeeper |
| Coordinator (AM) pushes task attempts into containers | Coordinator (AM) pushes task attempts as fragments into LLAP |
| RecordReader used to read data | LLAP IO/cache used to read data, or RecordReader used to read data |
| Hive operators are used to process data | Hive operators are used to process data* |
| Final tasks write out results into HDFS | Final tasks write out results into HDFS |
| HS2 forwards rows to JDBC | HS2 forwards rows to JDBC |

* Sometimes, minor LLAP-specific optimizations are possible, e.g. sharing a hash table for a map join.
Theoretically, a hybrid (LLAP + containers) mode is possible, but it has no advantage in most cases, so it is rarely used (for example, Ambari doesn't expose any knobs to enable it).
In both cases, the query uses a Tez session (a YARN app with a Tez AM serving as the query coordinator). In the container case, the AM starts more containers in the same YARN app; in the LLAP case, LLAP itself runs as an external, shared YARN app, so the Tez session has only one container (the query coordinator).
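To make this concrete, here is a minimal sketch of the session settings that route query fragments to LLAP. The property names are real Hive settings; the values, such as the @llap0 registry name, are illustrative and cluster-specific.

```sql
-- A minimal sketch, assuming a typical HDP-style setup.
SET hive.execution.engine=tez;             -- LLAP runs on top of Tez
SET hive.execution.mode=llap;              -- run worker fragments in LLAP, not containers
SET hive.llap.execution.mode=all;          -- "map"/"auto" correspond to the hybrid mode above
SET hive.llap.daemon.service.hosts=@llap0; -- @<name>: ZooKeeper registry entry the AM uses to find daemons
```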
An LLAP daemon runs work fragments using executors. Each daemon has a number of executors, to run several fragments in parallel, and a local work queue. In the Hive case, fragments are similar to task attempts: mappers and reducers. Executors essentially "replace" containers; each is used by one task at a time, so the sizing should be very similar for both.
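For illustration, here is a hedged sketch of the daemon-sizing keys this maps to. The numbers are illustrative, not recommendations; these settings configure the daemons themselves, so they are normally set in hive-site.xml or via Ambari and take effect at daemon startup, not per session.

```sql
-- Illustrative sizing only; daemon-side settings, read at LLAP startup.
SET hive.llap.daemon.num.executors=12;                   -- parallel fragments per daemon (~ containers per node without LLAP)
SET hive.llap.daemon.memory.per.instance.mb=49152;       -- daemon heap shared by all executors (~ executors x container size)
SET hive.llap.daemon.task.scheduler.wait.queue.size=10;  -- depth of the local work queue
```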
Optionally, fragments may make use of the LLAP cache and IO elevator (background IO threads). In HDP 2.6, this is only supported for the ORC format and isn't supported for most ACID tables. In 3.0, support is added for text, Parquet, and ACID tables. On HDInsight, the text format is also supported in 2.6.
Note that queries can still run in LLAP even if they cannot use the IO layer. Each fragment only uses one IO thread at a time. The cache stores metadata (on heap in 2.6, off heap in 3.0) and encoded data (off heap); an SSD cache option is also added in 3.0 (2.6 on HDInsight).
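A minimal sketch of the corresponding IO/cache knobs, under the same assumptions as above: the values are illustrative, the SSD path is hypothetical, and like the sizing keys these are read at daemon startup rather than per session.

```sql
-- Illustrative values only; the io.* keys configure the daemons, not the session.
SET hive.llap.io.enabled=true;             -- use the IO elevator/cache where the format supports it
SET hive.llap.io.memory.size=32212254720;  -- off-heap cache for encoded data (~30 GB here)
SET hive.llap.io.threadpool.size=10;       -- background IO ("elevator") threads
-- SSD cache option (3.0; 2.6 on HDInsight): back the cache with a memory-mapped file
SET hive.llap.io.allocator.mmap=true;
SET hive.llap.io.allocator.mmap.path=/mnt/ssd/llap;  -- hypothetical local SSD path
```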