The goal of this tutorial is to build a live-updating chart of price changes for a few stock symbols, similar to the charts on Google Finance or Yahoo Finance.
Download the latest HDP Sandbox (2.3 as of this writing) here. Import it into VMware or VirtualBox, start the instance, and add a hosts-file entry on your host machine that points to the new instance's IP.
On Mac, edit /etc/hosts; on Windows, edit %SystemRoot%\System32\drivers\etc\hosts as administrator. Add a line similar to the one below:
192.168.56.102 sandbox sandbox.hortonworks.com
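On Mac or Linux, the edit can be scripted so that running it twice does not add a duplicate line. This is a minimal sketch: the function name add_sandbox_host is hypothetical, 192.168.56.102 is only the example IP from above (use the address your VM actually reports), and the hosts-file path is a parameter so you can try it on a copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the sandbox hosts entry only if the name is not mapped yet.
# Arguments: $1 = hosts file path (e.g. /etc/hosts), $2 = the sandbox VM's IP address.
add_sandbox_host() {
  local hosts_file="$1" ip="$2"
  local entry="$ip sandbox sandbox.hortonworks.com"
  # -F: fixed string (the dots are literal); -w: whole-word match only.
  if grep -qwF 'sandbox.hortonworks.com' "$hosts_file"; then
    echo "sandbox.hortonworks.com already present in $hosts_file; leaving it alone" >&2
    return 0
  fi
  printf '%s\n' "$entry" >> "$hosts_file"
}
```

From a root shell (for example after `sudo -s`), run `add_sandbox_host /etc/hosts 192.168.56.102`. If an old entry points at a stale IP, the function leaves it untouched, so fix that line by hand.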
Install NiFi by following the directions here. These are the steps I ran on the Sandbox for NiFi 0.4.1:
# Download and unpack NiFi 0.4.1 into /opt
cd /tmp
wget http://apache.cs.utah.edu/nifi/0.4.1/nifi-0.4.1-bin.zip
cd /opt/
unzip /tmp/nifi-0.4.1-bin.zip
# Run NiFi as a dedicated "nifi" user instead of root
useradd nifi
chown -R nifi:nifi /opt/nifi-0.4.1/
perl -pe 's/run.as=.*/run.as=nifi/' -i /opt/nifi-0.4.1/conf/bootstrap.conf
# Move the web UI off port 8080, which Ambari already uses on the Sandbox
perl -pe 's/nifi.web.http.port=8080/nifi.web.http.port=9090/' -i /opt/nifi-0.4.1/conf/nifi.properties
/opt/nifi-0.4.1/bin/nifi.sh start
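Because the two perl substitutions fail silently if a pattern does not match, it can be worth confirming that they took effect. The sketch below assumes the install path used above; check_nifi_conf is a hypothetical helper, not part of NiFi.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the two config edits made during the NiFi install.
# Argument: NiFi home directory (defaults to the 0.4.1 path used in this tutorial).
check_nifi_conf() {
  local nifi_home="${1:-/opt/nifi-0.4.1}"
  local status=0
  # bootstrap.conf should now run NiFi as the "nifi" user.
  if ! grep -qxF 'run.as=nifi' "$nifi_home/conf/bootstrap.conf"; then
    echo "bootstrap.conf: run.as is not set to nifi" >&2
    status=1
  fi
  # nifi.properties should serve the web UI on port 9090.
  if ! grep -qxF 'nifi.web.http.port=9090' "$nifi_home/conf/nifi.properties"; then
    echo "nifi.properties: nifi.web.http.port is not 9090" >&2
    status=1
  fi
  return "$status"
}
```

For example, `check_nifi_conf && echo "NiFi config OK"`. Once NiFi has started, its UI should be reachable at http://sandbox.hortonworks.com:9090/nifi.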
Download a custom Banana dashboard (it replaces Banana's default dashboard), start Solr, and create a new collection to store stock price changes:
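# Point the Solr scripts at the Sandbox's OpenJDK 7
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
# Replace Banana's default dashboard with the stock-price dashboard
wget https://raw.githubusercontent.com/vzlatkin/Stocks2HBaseAndSolr/master/Solr%20Dashboard.json -O /opt/lucidworks-hdpsearch/solr/server/solr-webapp/webapp/banana/app/dashboards/default.json
# Start Solr in SolrCloud mode (-c) against the Sandbox's ZooKeeper on port 2181
/opt/lucidworks-hdpsearch/solr/bin/solr start -c -z localhost:2181
# Create the "stocks" collection with the schemaless config, 1 shard and replication factor 1
/opt/lucidworks-hdpsearch/solr/bin/solr create -c stocks -d data_driven_schema_configs -s 1 -rf 1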
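Once the NiFi flow is running, a quick way to confirm that documents are reaching the new collection is to ask Solr for a count. This sketch assumes Solr listens on its default port 8983 (the `solr start` command above does not override it); the function names and the SOLR_URL variable are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: Solr on its default port 8983; override SOLR_URL if yours differs.
SOLR_URL="${SOLR_URL:-http://sandbox.hortonworks.com:8983/solr}"

# Hypothetical helper: pull numFound out of a Solr JSON response on stdin.
# sed keeps the sketch dependency-free; jq would be sturdier if it is installed.
extract_num_found() {
  sed -n 's/.*"numFound":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: count all documents in the "stocks" collection.
# rows=0 returns only the count; wt=json is explicit because older Solr
# releases answer in XML by default.
count_stock_docs() {
  curl -s "$SOLR_URL/stocks/select?q=*:*&rows=0&wt=json" | extract_num_found
}
```

Running `count_stock_docs` a few minutes apart while the market is open should show the number growing.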
Solr indexes the data, the Banana UI visualizes it, and HBase keeps a copy for future use: the data in HBase can be analyzed further with Storm or Spark, or serve as the back end of a custom UI. To get the data into these tools, first create the HBase table:
hbase shell
hbase(main):001:0> create 'stocks', 'cf'
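The HBase shell also reads commands from standard input, so the same step can be scripted, along with a later check that rows are arriving. This is a sketch: hbase_stocks_commands is a hypothetical helper that only prints standard HBase shell commands for the "stocks" table and "cf" column family created above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print HBase shell commands for the "stocks" table.
#   setup: create the table with the "cf" column family and describe it
#   check: count rows and show a few, once NiFi has started writing
hbase_stocks_commands() {
  case "$1" in
    setup)
      printf '%s\n' "create 'stocks', 'cf'" "describe 'stocks'" ;;
    check)
      printf '%s\n' "count 'stocks'" "scan 'stocks', {LIMIT => 5}" ;;
    *)
      echo "usage: hbase_stocks_commands setup|check" >&2
      return 1 ;;
  esac
}
```

On the Sandbox, `hbase_stocks_commands setup | hbase shell` replaces the interactive step, and `hbase_stocks_commands check | hbase shell` can be run after the flow has been started.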
Then, in the NiFi UI (http://sandbox.hortonworks.com:9090/nifi), find the template on your local machine and import it:
Drag and drop the template onto the canvas to instantiate it:
Double-click the new process group to open it:
You'll need to enable the shared HBase controller service. To do so, right-click the "Send to HBase" processor, choose "Configure", open the "Properties" tab, and click the "Go to" arrow to reach the controller service. Finally, click the "Enable" button.
Now start all of the processors: hold down the Shift key, select all of the processors on the screen, and then click the Start button:
You should see a flow that looks like the screenshot below.
The flow has so many processors because the response from the Google Finance API must be transformed before it is stored. First, we remove the comment characters '//' from the response. Second, we split the array into individual JSON objects. Third, we extract the relevant attributes. Fourth, the timestamp is labeled as UTC but is actually in US Eastern Time, so we correct it. Finally, we send the information to HBase, Solr, and the NiFi bulletin board for logging.
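To make the first and fourth steps concrete, here is a command-line approximation. It is not what the NiFi template runs; it only illustrates the two transformations. Assumptions: GNU date (as on the Sandbox's Linux), and a timestamp shaped like "2016-01-08T16:00:00Z" whose trailing Z is misleading because the clock is really US Eastern. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Step 1 (hypothetical helper): drop the leading '//' in front of the JSON array.
strip_comment_prefix() {
  sed -e 's#^[[:space:]]*//[[:space:]]*##'
}

# Step 4 (hypothetical helper): read the timestamp as US Eastern time and print real UTC.
# The POSIX TZ string encodes the current US Eastern rules (EST/EDT, DST from the
# second Sunday in March to the first Sunday in November) without needing tzdata.
eastern_to_utc() {
  local ts="${1%Z}"   # remove the misleading "Z" suffix
  ts="${ts/T/ }"      # "2016-01-08T16:00:00" -> "2016-01-08 16:00:00"
  local epoch
  epoch=$(TZ='EST5EDT,M3.2.0,M11.1.0' date -d "$ts" +%s) || return 1
  date -u -d "@$epoch" +%Y-%m-%dT%H:%M:%SZ
}
```

For instance, `eastern_to_utc 2016-01-08T16:00:00Z` prints 2016-01-08T21:00:00Z, since Eastern Time is five hours behind UTC in January (four hours during daylight saving time).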
Now open the Banana UI, which is served by Solr at http://sandbox.hortonworks.com:8983/solr/banana/index.html. If the US stock markets are open (9:30am to 4pm Eastern Time, Monday through Friday), you should see a dashboard similar to the one below.
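If the dashboard is flat, it may simply be outside trading hours. The sketch below checks the 9:30am to 4pm Eastern window on weekdays; us_market_open is a hypothetical helper, it requires GNU date, and it ignores exchange holidays, so it can report "open" on a holiday.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given Unix time (default: now) falls within
# regular US market hours, 9:30am to 4pm Eastern, Monday to Friday. Holidays ignored.
us_market_open() {
  local epoch="${1:-$(date +%s)}"
  local tz='EST5EDT,M3.2.0,M11.1.0'          # US Eastern rules, no tzdata needed
  local dow hhmm
  dow=$(TZ="$tz" date -d "@$epoch" +%u)      # 1 = Monday ... 7 = Sunday
  hhmm=$(TZ="$tz" date -d "@$epoch" +%H%M)   # e.g. 0930 or 1559
  # 10# forces base 10 so values like "0930" are not read as octal.
  [ "$dow" -le 5 ] && [ "$((10#$hhmm))" -ge 930 ] && [ "$((10#$hhmm))" -lt 1600 ]
}
```

Usage: `us_market_open && echo "market open" || echo "market closed"`.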
The full source code is available on GitHub.