HCC Hortonworks Community Connection
Parse file in NiFi?

Question by Surendra Shringi Jan 12 at 02:31 AM nifi-templates

Hi,

I am getting an error while parsing a CSV file containing JSON in NiFi. My file's columns look like this:

Name : Surendra

Age : 24

Address : {"city":"Chennai","state":"TN","zipcode":"600345"}

The output should look like this:

Name : Surendra

Age : 24

Address_city : Chennai

Address_state : TN

Address_zipcode : 600345

Can anyone help me with this?

4 Replies

Best Answer

Answer by Shu · Jan 13 at 04:59 AM

@Surendra Shringi

We can do this parsing inside NiFi using the SplitText, ExtractText, ReplaceText, and MergeContent processors.

Example:-

Let's consider a CSV file with the following rows:

Surendra,24,"{"city":"Chennai","state":"TN","zipcode":"600345"}"
Surendra,25,"{"city":"Chennai","state":"TN","zipcode":"609345"}"

We need to split this file into individual flowfiles, one record per flowfile. For the split we use the

SplitText:-

processor with the following config:

Line Split Count

1

If our input CSV has 2 lines, SplitText splits it into 2 flowfiles, each containing one line.
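For intuition, the splitting step can be sketched outside NiFi. This is a minimal Python stand-in for SplitText with Line Split Count = 1 (an illustration, not NiFi code), using the two-line sample above:

```python
# Sample input from the answer above; each line is one CSV record.
csv_content = (
    'Surendra,24,"{"city":"Chennai","state":"TN","zipcode":"600345"}"\n'
    'Surendra,25,"{"city":"Chennai","state":"TN","zipcode":"609345"}"\n'
)

# SplitText with Line Split Count = 1: every non-empty line becomes
# its own "flowfile" (represented here as a plain string).
flowfiles = [line for line in csv_content.splitlines() if line]
print(len(flowfiles))  # → 2
```

With a 1000-line input this list would simply contain 1000 single-record strings, which matches the per-line flowfiles SplitText emits.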

Once each record is in its own flowfile, we use the

ExtractText:-

processor to extract the content of the flowfile by adding the following new properties to the processor:

Address_city

"city":"(.*?)"

Address_state

"state":"(.*?)"

Address_zipcode

"zipcode":"(.*?)"

Age

,(.*?),

Name

^(.*?),

This processor matches each regex against the flowfile content and stores the captured group as a flowfile attribute named after the property.

You can build and test these regexes with an online regex tester.

You may need to increase the Maximum Buffer Size value (the default is 1 MB) depending on your flowfile size.
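The same regexes can be sanity-checked outside NiFi. This minimal Python sketch (an illustration, not NiFi code) applies the ExtractText patterns above to one record and collects group 1 of each match, just as ExtractText does for its attributes:

```python
import re

# The same patterns as the ExtractText properties above;
# group 1 of each match becomes a flowfile attribute.
patterns = {
    "Name": r"^(.*?),",
    "Age": r",(.*?),",
    "Address_city": r'"city":"(.*?)"',
    "Address_state": r'"state":"(.*?)"',
    "Address_zipcode": r'"zipcode":"(.*?)"',
}

record = 'Surendra,24,"{"city":"Chennai","state":"TN","zipcode":"600345"}"'

# Extract group 1 of the first match for every pattern.
attributes = {name: re.search(rx, record).group(1)
              for name, rx in patterns.items()}
print(attributes)
# → {'Name': 'Surendra', 'Age': '24', 'Address_city': 'Chennai',
#    'Address_state': 'TN', 'Address_zipcode': '600345'}
```

Note that the lazy `(.*?)` groups stop at the first delimiter, which is what keeps `Age` from swallowing the JSON column.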

ReplaceText Configs:-

In the previous step we extracted the fields from the flowfile content into attributes. In the ReplaceText processor we now build a new CSV record with a comma delimiter (you can use any delimiter you want) by changing the properties below and adding a Replacement Value as follows.

Configs:-

Search Value

(?s)(^.*$)

Replacement Value

${Name},${Age},${Address_city},${Address_state},${Address_zipcode}

Maximum Buffer Size

1 MB

Replacement Strategy

Always Replace

Evaluation Mode

Entire text

So the output of the ReplaceText processor would be

Surendra,24,Chennai,TN,600345
Surendra,25,Chennai,TN,609345

We have now created CSV records without the JSON, but as 2 separate flowfiles (because our input had 2 lines); if your input file has 1000 lines, you will end up with 1000 output CSV files.
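The ReplaceText substitution can be mimicked in plain Python. This sketch (an illustration, not NiFi's actual Expression Language engine) replaces each ${attribute} reference in the Replacement Value with the corresponding flowfile attribute:

```python
import re

# Attributes as extracted in the ExtractText step above.
attributes = {
    "Name": "Surendra",
    "Age": "24",
    "Address_city": "Chennai",
    "Address_state": "TN",
    "Address_zipcode": "600345",
}

# The Replacement Value from the config above; with Evaluation Mode
# "Entire text" the whole flowfile content is replaced by this string.
template = "${Name},${Age},${Address_city},${Address_state},${Address_zipcode}"

# Substitute each ${name} reference with the matching attribute value.
new_content = re.sub(r"\$\{(\w+)\}", lambda m: attributes[m.group(1)], template)
print(new_content)  # → Surendra,24,Chennai,TN,600345
```

Because the Search Value `(?s)(^.*$)` matches the entire content, the old CSV-plus-JSON line is discarded and only the attribute values remain.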

If you don't want 2 output files and instead want them merged into 1 output file, then you need to use the

MergeContent Processor:-

with the below configs.

You need to change the highlighted properties as per your requirements. My configs show a Max Bin Age of 1 min, so the processor waits for 1 minute before merging all the queued flowfiles into 1 file.

Set Delimiter Strategy to Text (the default is Filename) because the content of each flowfile needs to be added as a new line in the merged file; set the Demarcator property to a newline (Shift+Enter in the UI) so each flowfile's content is appended on a new line.

Output:-

One file containing both records:

Surendra,24,Chennai,TN,600345
Surendra,25,Chennai,TN,609345
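The merge itself is simple to picture. A minimal Python stand-in for MergeContent with Delimiter Strategy = Text and a newline Demarcator (an illustration, not NiFi code):

```python
# The two flowfiles produced by the ReplaceText step above.
flowfiles = [
    "Surendra,24,Chennai,TN,600345",
    "Surendra,25,Chennai,TN,609345",
]

# MergeContent with a newline Demarcator effectively joins the queued
# flowfiles' contents with newlines into one output file.
merged = "\n".join(flowfiles)
print(merged)
# → Surendra,24,Chennai,TN,600345
#   Surendra,25,Chennai,TN,609345
```

In NiFi the number of flowfiles merged per bin is governed by the bin properties (e.g. Max Bin Age), so the real processor batches whatever has queued up within that window.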

I highly suggest referring to the link below to get familiar with all the properties of the MergeContent processor:

https://community.hortonworks.com/questions/149047/nifi-how-to-handle-with-mergecontent-processor.html

I'm attaching the template XML to this post; you can save it, import it into NiFi, and adjust it as needed: parse-file-nifi-159780.xml

If this answer helped resolve your issue, click the Accept button below to accept it; that helps community users find solutions for these kinds of errors quickly.


replacetext.png (87.7 kB)
splittext.png (57.4 kB)
extracttext.png (128.3 kB)
mergecontent.png (118.4 kB)
parse-file-nifi-159780.xml (27.4 kB)

Answer by Surendra Shringi · Jan 12 at 04:37 AM

Hi @Shu, my input looks like this (see the attached screenshot); I want to parse this data as described above.

Thanks !


screenshot-from-2018-01-12-100503.png (38.4 kB)
Comment by Surendra Shringi · Jan 12 at 04:46 AM

I want to fetch this data from MySQL, so I created a table named input in MySQL.

And my flow is: ExecuteSQL -> SplitAvro -> ConvertAvroToJson -> EvaluateJsonPath -> UpdateAttribute


Answer by Surendra Shringi · Jan 13 at 03:42 AM

Thanks @Shu for your reply. I am looking for the same output you sent me: I want the output in a CSV file, like Surendra,24,Chennai,TN,600345, and the output will be stored on the local machine.


Answer by Surendra Shringi · Jan 13 at 05:03 AM

Thanks for your overwhelming response; this will help me a great deal.



HCC Guidelines | HCC FAQs | HCC Privacy Policy

Hortonworks - Develops, Distributes and Supports Open Enterprise Hadoop.

© 2011-2017 Hortonworks Inc. All Rights Reserved.
Hadoop, Falcon, Atlas, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie and the Hadoop elephant logo are trademarks of the Apache Software Foundation.
Privacy Policy | Terms of Service
