Working With S3-Compatible Data Stores (and Handling Single-Source Failure)
After the major S3 outage in my region, I decided I needed an alternative file store. I found a great open source server called Minio, which I run on a mini PC running CentOS 7. The same approach works for connecting to other S3-compatible stores, such as RiakCS and Google Cloud Storage. I like to remain cloud- and location-neutral.
In Apache NiFi, this is easy. Instead of only your regular AWS S3 source and destination, you can have two of each: one for AWS S3 and one for another S3-compatible store. Or you can use the second store as a disaster-recovery backup. Since my Minio box is local, I can keep the data on-site, and attaching a few terabytes of storage to a small Linux box to hold backups is pretty affordable. NiFi's queues also buffer the flow when ingest or egress to one of the stores is slower.
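Outside NiFi, the same two-destination idea can be sketched with the AWS CLI: write each file to AWS S3 and then to the local Minio box, so the second copy survives an S3 outage. This is a minimal sketch, not how NiFi implements it. The put_both function, the "minio" CLI profile and the MINIO_ENDPOINT variable are hypothetical names, and the sketch assumes a bucket with the same name exists on both stores.

```shell
# Sketch: write the same file to AWS S3 and to a Minio (S3-compatible) server,
# so the Minio copy can act as a backup if AWS S3 is unavailable.
# Assumptions (hypothetical, not part of NiFi or Minio):
#   - the default AWS CLI profile holds the AWS S3 credentials
#   - a CLI profile named "minio" holds the Minio access/secret keys
#   - MINIO_ENDPOINT points at the Minio server
MINIO_ENDPOINT="${MINIO_ENDPOINT:-http://192.168.1.155:9000}"

put_both() {
  local file="$1" bucket="$2"
  local key status=0
  key="$(basename "$file")"

  # Primary destination: regular AWS S3.
  if ! aws s3 cp "$file" "s3://${bucket}/${key}"; then
    echo "AWS S3 upload failed: ${file}" >&2
    status=1
  fi

  # Second destination: the local Minio box, attempted even if AWS S3 failed.
  if ! aws --profile minio --endpoint-url "$MINIO_ENDPOINT" \
      s3 cp "$file" "s3://${bucket}/${key}"; then
    echo "Minio upload failed: ${file}" >&2
    status=1
  fi

  # Non-zero if either copy failed, so a caller can retry or alert.
  return "$status"
}

# Usage: put_both tweets.json nifi
```

Unlike a NiFi flow, this has no queue or automatic retry: a failed copy is only reported through the return status, so the caller has to decide whether to try again.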
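# Download the Minio server binary (Linux amd64 build)
wget https://dl.minio.io/server/minio/release/linux-amd64/minio
chmod 755 minio
# Start Minio in the background, storing objects in the ./files directory
nohup ./minio server files &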
That URL is for the Linux amd64 build; find the version that matches your hardware and OS. On startup, Minio reports its endpoint (use this as the Endpoint Override URL in NiFi's S3 processors), access key, secret key and region. Enter this information in Apache NiFi and in any S3-compatible tool, such as the AWS CLI or s3cmd.
S3 Tool Install
pip install awscli
# Enter the Minio access key and secret key at the prompts
aws configure
AWS Access Key ID [****************3P2F]: 45454545zfgfgfgfgfgzgggzggggFFF
AWS Secret Access Key [****************Y3TG]: FFFDFDFDFDF7d8f7d87f8&D*F7d*&F78
Default region name [us-east-1]:
Default output format [None]:
# Sign requests with S3 Signature Version 4
aws configure set default.s3.signature_version s3v4
aws --endpoint-url http://192.168.1.155:9000 s3 ls s3://nifi
2017-03-01 16:17:19 13729 Retry_Count_Loop.xml
2017-03-01 16:19:58 19929 tspann7.jpg
aws --endpoint-url http://192.168.1.155:9000 s3 ls
2017-03-01 11:19:58 nifi
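The ls commands above only test connectivity: the first lists the objects in the nifi bucket, and the second lists all buckets on the Minio server.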
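Since every command against Minio needs the same --endpoint-url flag, a small shell function can save some typing. This is a sketch that assumes the AWS CLI is configured as above; minio_s3 and MINIO_ENDPOINT are hypothetical names, with the endpoint defaulting to the address used in this post.

```shell
# Hypothetical helper: run "aws s3 ..." against the Minio endpoint.
# MINIO_ENDPOINT is an assumed variable; it defaults to the Minio box above.
minio_s3() {
  aws --endpoint-url "${MINIO_ENDPOINT:-http://192.168.1.155:9000}" s3 "$@"
}

# Examples:
#   minio_s3 ls                        # list buckets
#   minio_s3 ls s3://nifi              # list objects in the nifi bucket
#   minio_s3 cp tweets.json s3://nifi/ # upload a file
```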
S3cmd Configuration (~/.s3cfg)
# Setup endpoint
host_base = 192.168.1.155:9000
host_bucket = 192.168.1.155:9000
bucket_location = us-east-1
# The Minio endpoint is plain HTTP (http://192.168.1.155:9000)
use_https = False
# Setup access keys
access_key = DF&D*F&*D&F*&DF&DFDF
secret_key = &d7df7f77DDFdjfiqeworsdfFDr34fd
# Enable S3 v4 signature APIs
signature_v2 = False
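s3cmd reads ~/.s3cfg by default, which may already hold your AWS settings. One option, sketched here under assumptions, is to write the Minio settings to their own file and point s3cmd at it with its -c option. The write_minio_s3cfg function and the ~/.s3cfg-minio path are hypothetical; use_https is set to False because the Minio endpoint in this post is plain HTTP.

```shell
# Hypothetical helper: write an s3cmd config for a Minio endpoint to its own
# file, leaving ~/.s3cfg free for AWS S3 settings.
# Arguments: config file path, host:port, access key, secret key.
write_minio_s3cfg() {
  local cfg="$1" endpoint="$2" access_key="$3" secret_key="$4"
  # Unquoted EOF: only the ${...} placeholders below are substituted.
  cat > "$cfg" <<EOF
# Setup endpoint
host_base = ${endpoint}
host_bucket = ${endpoint}
bucket_location = us-east-1
use_https = False
# Setup access keys
access_key = ${access_key}
secret_key = ${secret_key}
# Enable S3 v4 signature APIs
signature_v2 = False
EOF
  # The file holds secrets, so keep it readable by the current user only.
  chmod 600 "$cfg"
}

# Usage:
#   write_minio_s3cfg ~/.s3cfg-minio 192.168.1.155:9000 "$ACCESS_KEY" "$SECRET_KEY"
#   s3cmd -c ~/.s3cfg-minio ls s3://nifi
```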
After sending Twitter JSON files to S3.