HCC Hortonworks Community Connection
Connecting Solr to Spark - Apache Zeppelin Notebook   
Article by Ian B · May 23, 2018 at 03:52 PM · edited · May 23, 2018 at 04:54 PM

This article is designed to extend the great work by @Ali Bajwa: Sample HDF/NiFi flow to Push Tweets into Solr/Banana, HDFS/Hive

I have included the complete notebook on my GitHub site.

Step 1 - Follow Ali's tutorial to establish an Apache Solr collection called "tweets"

Step 2 - Verify the version of Apache Spark being used, and visit the spark-solr connector site. The key is to match the version of Spark to the version of the spark-solr connector. In the example below, the version of Spark is 2.2.0, and the connector version is 3.4.4.

%spark2
sc
sc.version
 
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@617d134a
res1: String = 2.2.0.2.6.4.0-91

Step 3 - Include the spark-solr dependency in Zeppelin. Important note: this must be run before the Spark context has been initialized.

%dep
z.load("com.lucidworks.spark:spark-solr:jar:3.4.4")
//Must be used before SparkInterpreter (%spark2) initialized
//Hint: put this paragraph before any Spark code and restart Zeppelin/Interpreter

Step 4 - Run a Solr query and return the results as a Spark DataFrame. Note: the ZooKeeper host may need to use fully qualified names:

"zkhost" -> "host-1.domain.com:2181,host-2.domain.com:2181,host-3.domain.com:2181/solr",

%spark2
val options = Map(
  "collection" -> "tweets",  // collection names are case sensitive; match the "tweets" collection from Step 1
  "zkhost" -> "localhost:2181/solr"
  // "query" -> "Keyword, 'More Keywords'"
)

val df = spark.read.format("solr").options(options).load
df.cache()
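The same read can be narrowed so that filtering happens on the Solr side rather than in Spark. A hedged sketch, assuming the `query`, `fields`, and `rows` options of the spark-solr connector; the field names `text_t` and `screenName_s` are assumptions based on a typical tweets schema, not confirmed from the article:

```scala
%spark2
// Hypothetical example: push the query down to Solr and project only needed fields.
// "text_t" and "screenName_s" are assumed field names from the tweets collection.
val filteredOptions = Map(
  "collection" -> "tweets",
  "zkhost"     -> "localhost:2181/solr",
  "query"      -> "text_t:hadoop",          // Solr query syntax, filtered server-side
  "fields"     -> "id,text_t,screenName_s", // return only these columns
  "rows"       -> "1000"                    // page size per request, not a result limit
)

val filteredDf = spark.read.format("solr").options(filteredOptions).load
filteredDf.show(5, truncate = false)
```

Pushing the query down keeps the transferred data small, which matters once the collection grows beyond a demo-sized set of tweets.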

Step 5 - Review results of the Solr query

%spark2 
df.count()
df.printSchema()
df.take(1)
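Once the DataFrame is cached, a common next step in Zeppelin is to register it as a temporary view and explore it with Spark SQL. A minimal sketch; the column name `text_t` is an assumption from a typical tweets schema:

```scala
%spark2
// Register the Solr-backed DataFrame so it can be queried with SQL.
df.createOrReplaceTempView("tweets")

// "text_t" is an assumed field name from the tweets collection.
spark.sql("SELECT COUNT(*) AS total FROM tweets").show()
spark.sql("SELECT text_t FROM tweets LIMIT 5").show(false)
```

From here the view is also available to %sql paragraphs, which enables Zeppelin's built-in charting over the Solr results.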
Tags: How-To/Tutorial, NiFi, Solr, Spark, Banana, FAQ, Twitter, Zeppelin-Notebook





© 2011-2019 Hortonworks Inc. All Rights Reserved.
