Integrating Apache NiFi With External APIs
Ayyakutty Ramesh — February 22, 2019 in Big Data
Contents
1 What is NiFi?
2 Connecting Nifi to external API:
3 Implementation:
3.1 1) Type casting:
3.1.1 Challenges Faced:
3.2 2) Handling apostrophe:
3.3 3) Handling a large dataset:
3.4 4) Storing records using a timestamp:
3.5 5) Handling Null values:
3.6 6) Pagination:
What is NiFi?
Apache NiFi is a data logistics platform used for automating the flow of data between disparate data sources and systems, which makes data ingestion fast and secure.
Implementation:
We need to get data from an API and store the necessary columns in a Postgres database. We can get the data by using the InvokeHTTP processor. The resultant data is in JSON format. To split the JSON array into individual records, we use the SplitJson processor. In some cases the resultant data is in nested JSON format; to flatten it into a single JSON file, we use the JoltTransformJSON processor. To evaluate one or more JSONPath expressions, we use the EvaluateJsonPath processor; the results are assigned to flow file attributes. By using EvaluateJsonPath we can filter out the required data from the JSON. Then we use the AttributesToJSON processor to convert the resultant attributes back into JSON format. Finally, we use the ReplaceText processor to build the query and the ExecuteSQL processor to execute it.
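As a minimal sketch of the last two steps, the ReplaceText processor's Replacement Value can build the INSERT statement from the flow file attributes, which ExecuteSQL then runs against Postgres (the table name department is assumed for illustration; the columns match the example in the next section):

INSERT INTO department (dept_name, active)
VALUES ('${dept_name}', ${active})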
1) Type casting:
Challenges Faced:
While using the AttributesToJSON processor to write all the flow file attributes, the resultant JSON values will be of string data type.
Example:
{
"dept_name": "CSE",
"active": "1"
}
In the above example, we need to store the column 'active' as an integer in the Postgres database. To achieve this, we used the UpdateAttribute processor, which supports the NiFi Expression Language: we added a property named 'active' and converted its value to an integer by setting the property value to ${active:toNumber()}.
2) Handling apostrophe:
In our use case, we had to store values containing an apostrophe in the database, but while trying to store them using the ExecuteSQL processor we got an error message: "Invalid string". In order to store the values with an apostrophe, we added a property in the UpdateAttribute processor to replace the apostrophe with an empty character.
Example:
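A minimal sketch of such a property value, assuming the text lives in a flow file attribute named dept_name:

${dept_name:replace("'", "")}

This strips the apostrophe before the value is embedded in the SQL statement. An alternative that preserves the character is to escape it for SQL instead, e.g. ${dept_name:replace("'", "''")}.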
3) Handling a large dataset:
The MergeContent processor can be used for executing batch queries. It reduces the execution time taken when inserting bulk data. By default, it inserts 1000 records in a batch; the number of records to be inserted can also be changed.
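A sketch of the relevant MergeContent properties (the values are illustrative, not the post's exact configuration):

Merge Strategy: Bin-Packing Algorithm
Minimum Number of Entries: 500
Maximum Number of Entries: 500

Raising or lowering the entry counts trades memory use against the number of round trips to the database.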
4) Storing records using a timestamp:
To store recent records, based on the updated date, in the Postgres database, we added a property in the UpdateAttribute processor.
Example:
${updated_at:toDate("yyyy-MM-dd'T'HH:mm:ss'Z'"):format('yyyyMMdd')}
Then pass the condition by which the data needs to be fetched to the RouteOnAttribute processor. Here, we have taken the records that were updated after the given timestamp.
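A sketch of such a RouteOnAttribute condition, assuming the formatted date was stored in an attribute named updated_date and the cutoff in a variable named last_run_date:

${updated_date:toNumber():gt(${last_run_date:toNumber()})}

Flow files matching this condition carry records newer than the cutoff and continue down the flow.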
We can also filter the records based on the timestamp while fetching them from the API. The property above retrieves all the records from the API and then stores only the recent ones based on the timestamp, whereas by defining the timestamp globally we can retrieve just the records that were updated after the given time frame.
To pass a variable as a global, right-click on the process group and, under Variables, add the variable which we want to pass to all the flows.
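As an illustration of the global approach (the query parameter updated_since is hypothetical; the real parameter name depends on the API), the InvokeHTTP URL could reference the variable directly:

https://api.example.com/v2/clients?updated_since=${last_run_date}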
5) Handling Null values:
To handle null records and route them to failure, we added a property in the UpdateAttribute processor. This property checks whether all the columns are empty; if so, the record is not stored and is routed to failure.
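A minimal sketch of such a check (the column names are taken from the earlier example; adjust them to your schema), evaluated in RouteOnAttribute to send empty records to failure:

${dept_name:isEmpty():and(${active:isEmpty()})}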
6) Pagination:
To iterate over multiple pages and retrieve all the records, we had to use the GenerateFlowFile processor and a Set Initial Pagination Parameter processor. In the Set Initial Pagination Parameter processor, add a property as given below.
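A sketch of the initial property, assuming the URL parameter is named page:

page = 1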
This property will be the value of the corresponding parameter in the URL that we have given in the InvokeHTTP processor.
https://api.example.com/v2/clients?page=${page}
Then, in the EvaluateJsonPath processor, add the name of the property under which the total-page information is available in the response. In our use case, the next_page attribute contains the total number of pages.
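A sketch of the EvaluateJsonPath property (the JSONPath $.next_page is an assumption about this API's response shape):

next_page = $.next_page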
In the RouteOnAttribute processor, add the below property so that the flow iterates until the last page is reached.
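A sketch of that routing condition, together with an UpdateAttribute property to advance the counter (the increment step is implied by the loop but not shown in the original post):

Continue relationship: ${page:toNumber():lt(${next_page:toNumber()})}
Advance the page: page = ${page:plus(1)}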
Each time, the page argument in the InvokeHTTP URL will be replaced with the current page number, and this will run till the last page.