Talend Interview Questions
Talend Vs Pentaho
Data Quality (DQ) | Graphical User Interface (GUI) featured by Data Quality | GUI featured by DQ along with additional options
Ans: Talend Open Studio for Data Integration is an open-source data integration product
developed by Talend and designed to combine, convert and update data in various
locations across a business.
Ans: Java
Ans:
ETL: Extract, Transform, and Load (ETL) is a process that involves extracting data from
outside sources, transforming it to fit operational needs (sometimes using staging tables),
then loading it into the end target database or data warehouse. This approach is
reasonable as long as many different databases are involved in your data warehouse
landscape. In this scenario you have to transport data from one place to another anyway,
so it’s a legitimate way to do the transformation work in a separate specialized engine.
ELT: Extract, Load, Transform (ELT) is a process where data is extracted and loaded
into a staging table in the database, transformed where it sits in the database, and then
loaded into the target database or data warehouse.
Ans: This component is used to correct mailing addresses associated with customer
data to ensure a single customer view and better delivery of customer mailings.
Ans: Yes. Change the background color of the Job designer by clicking Preferences on
the Window menu, followed by Talend, Appearance, Designer, and then Colors.
Ans: No, schemas must be defined during design, not run time.
Q10) Can you define a variable that is accessible from multiple Jobs?
Ans: Yes, you can declare a static variable in a routine, and add the setter/getter methods
for this variable in the routine. The variable is then accessible from different Jobs.
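The routine approach above can be sketched as follows. This is a minimal illustration, not generated Talend code: the class name SharedRunInfo and the variable runId are hypothetical, and in a real project the class would be created under Code > Routines rather than compiled on its own. The sketch also assumes that the Jobs run in the same JVM, since a static field is not shared across separate processes.

```java
// Hypothetical routine holding one value that several Jobs can read and write.
class SharedRunInfo {
    // Static field: one copy per JVM, shared by every caller in that JVM.
    private static String runId = "unset";

    // Getter that any Job (for example from a tJava component) could call.
    static synchronized String getRunId() {
        return runId;
    }

    // Setter that any Job could call to publish a new value.
    static synchronized void setRunId(String value) {
        runId = value;
    }

    public static void main(String[] args) {
        // Stand-in for "Job A" writing the value and "Job B" reading it.
        SharedRunInfo.setRunId("batch-42");
        System.out.println(SharedRunInfo.getRunId());
    }
}
```

The accessors are synchronized so that Jobs running in parallel threads see a consistent value.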
Ans: This is not possible; you cannot directly edit the code generated for a Talend Job.
Q13) How can you include your own Java code in a Job?
Ans:
1. Use a tJava, tJavaRow, or tJavaFlex component.
2. Create a routine by right-clicking Routines under Code in the Repository and
then clicking Create routine.
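As a rough illustration of the second method, a routine is an ordinary Java class with static methods. The class name TextRoutines and the method normalizeCode below are hypothetical and only show the shape such a routine might take.

```java
import java.util.Locale;

// Hypothetical user routine: a plain class exposing static helper methods.
class TextRoutines {
    // Trims the input and upper-cases it; a null input becomes an empty string.
    static String normalizeCode(String raw) {
        if (raw == null) {
            return "";
        }
        return raw.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalizeCode("  ab-12 "));
    }
}
```

Inside a tJavaRow component, such a method would typically be called on the row variables, for example output_row.code = TextRoutines.normalizeCode(input_row.code); where code is an assumed column name.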
Ans: No. Secure (or SSH) File Transfer Protocol (SFTP) is not FTP. It was defined as an
extension to SSH and assumes an underlying secure channel. There is no relationship
between FTP and SFTP, so concepts such as “transfer mode” or “current remote
directory” that exist in FTP do not exist in SFTP.
For the same reason, there is no transfer mode option when you select ‘SFTP Support’ on a
tFTPxxx component.
Ans: By default, the date pattern for a column of type Date in a schema is “dd-MM-yyyy”.
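The pattern uses the same letters as Java's SimpleDateFormat, so its effect can be tried outside the Studio. The sketch below is plain Java, not Talend-generated code; the class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class DatePatternDemo {
    // The default pattern quoted above: day, month, four-digit year.
    static final String DEFAULT_PATTERN = "dd-MM-yyyy";

    // Parses text with the pattern and formats it back.
    // Returns null when the text does not match the pattern.
    static String roundTrip(String text) {
        SimpleDateFormat format = new SimpleDateFormat(DEFAULT_PATTERN);
        format.setLenient(false); // reject out-of-range values such as day 32
        try {
            Date parsed = format.parse(text);
            return format.format(parsed);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("25-12-2014")); // matches the pattern
        System.out.println(roundTrip("2014/12/25")); // does not match
    }
}
```

Input that uses another layout has to be read with a matching pattern set on the column, otherwise parsing fails as in the second call.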
Q17) What is the difference between “Insert or Update” and “Update or Insert”?
Ans:
1. Insert or Update: First tries to insert a record, but if a record with a matching primary key already
exists, updates that record instead.
2. Update or Insert: First tries to update a record with a matching primary key, but if none exists,
inserts the record instead.
From a results point of view, there are no differences between the two, nor are there
significant performance differences. In general, choose the action that matches what you
expect to be more common: Insert or Update if you think there are more inserts than
updates, Update or Insert if you think there are more updates than inserts.
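The rule of thumb above can be illustrated with a small simulation. This is only a sketch under stated assumptions: an in-memory map stands in for the database table, each attempted statement is counted once, and none of it reflects how Talend's database output components are actually implemented. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class UpsertOrderDemo {
    // In-memory stand-in for a table keyed by primary key.
    final Map<Integer, String> table = new HashMap<>();
    // Number of simulated statements attempted so far.
    int attempts = 0;

    // Simulated INSERT: fails if the key already exists.
    boolean tryInsert(int id, String value) {
        attempts++;
        if (table.containsKey(id)) return false;
        table.put(id, value);
        return true;
    }

    // Simulated UPDATE: fails if the key does not exist.
    boolean tryUpdate(int id, String value) {
        attempts++;
        if (!table.containsKey(id)) return false;
        table.put(id, value);
        return true;
    }

    void insertOrUpdate(int id, String value) {
        if (!tryInsert(id, value)) tryUpdate(id, value);
    }

    void updateOrInsert(int id, String value) {
        if (!tryUpdate(id, value)) tryInsert(id, value);
    }

    // Loads one existing row, then applies one update and two new rows.
    static UpsertOrderDemo run(boolean insertFirst) {
        UpsertOrderDemo db = new UpsertOrderDemo();
        db.table.put(1, "a");
        Object[][] rows = { {1, "A"}, {2, "b"}, {3, "c"} };
        for (Object[] row : rows) {
            int id = (Integer) row[0];
            String value = (String) row[1];
            if (insertFirst) db.insertOrUpdate(id, value);
            else db.updateOrInsert(id, value);
        }
        return db;
    }

    public static void main(String[] args) {
        UpsertOrderDemo a = run(true);
        UpsertOrderDemo b = run(false);
        System.out.println("same result: " + a.table.equals(b.table));
        System.out.println("attempts: " + a.attempts + " vs " + b.attempts);
    }
}
```

With mostly new rows, the insert-first order needs 4 attempts and the update-first order needs 5, while the final table contents are identical. The counts swap in favour of Update or Insert when most rows already exist.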
Ans:
Built-in: all information is stored locally in the Job. You can enter and edit all information
manually.
Repository: all information is stored in the repository.
You can import read-only information into the Job from the repository. If you want to
modify the information, you must take one of the following actions:
1. Convert the information from Repository to Built-in and then edit the built-in information.
2. Modify the information in the Repository. Once you have made the changes, you are prompted to
update the changes into the Job.
Ans: It depends on how the information is used. Use Built-in for information
that you only use once or very rarely. Use the Repository for information that you want to
use repeatedly in multiple components or Jobs, such as a database connection.
Ans: OnSubjobOK and OnComponentOK are trigger links, which can link to another
subjob.
The main difference between OnSubjobOK and OnComponentOK lies in the execution
order of the linked subjob. With OnSubjobOK, the linked subjob starts only when the
previous subjob completely finishes. With OnComponentOK, the linked subjob starts
when the previous component finishes.
Q21) How can you normalize delimited data in Talend Open Studio?
Ans: By using the tNormalize component.
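To make the idea of normalization concrete, the sketch below shows the kind of transformation involved: one row whose column holds several delimited items becomes one row per item. It is a plain Java approximation with hypothetical names, not the component's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class NormalizeDemo {
    // Each input row is {key, delimitedItems}; each output row is {key, singleItem}.
    static List<String[]> normalize(List<String[]> rows, String itemSeparator) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            // Quote the separator so characters like "|" are not treated as regex syntax.
            for (String item : row[1].split(Pattern.quote(itemSeparator))) {
                out.add(new String[] { row[0], item.trim() });
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> input = new ArrayList<>();
        input.add(new String[] { "1", "red,green" });
        input.add(new String[] { "2", "blue" });
        for (String[] row : normalize(input, ",")) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```

The two input rows produce three output rows: (1, red), (1, green), and (2, blue).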
Ans: tDenormalizeSortedRow combines all sorted input rows into groups. Distinct values
of the denormalized sorted row are joined with item separators. tDenormalizeSortedRow
helps synthesize the sorted input flow to save memory.
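The memory saving comes from the input already being sorted: only the current group has to be held at any time. A minimal sketch of that idea, assuming two-column rows of {key, value} with non-null keys and using hypothetical names, could look like this.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DenormalizeSortedDemo {
    // Input rows must be sorted by key; consecutive rows with the same key form a group.
    static List<String[]> denormalize(List<String[]> sortedRows, String separator) {
        List<String[]> out = new ArrayList<>();
        String currentKey = null;
        // LinkedHashSet keeps distinct values in first-seen order.
        Set<String> values = new LinkedHashSet<>();
        for (String[] row : sortedRows) {
            if (currentKey != null && !currentKey.equals(row[0])) {
                // Key changed: emit the finished group and start a new one.
                out.add(new String[] { currentKey, String.join(separator, values) });
                values.clear();
            }
            currentKey = row[0];
            values.add(row[1]);
        }
        if (currentKey != null) {
            // Emit the last group.
            out.add(new String[] { currentKey, String.join(separator, values) });
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> input = new ArrayList<>();
        input.add(new String[] { "1", "red" });
        input.add(new String[] { "1", "green" });
        input.add(new String[] { "1", "red" });
        input.add(new String[] { "2", "blue" });
        for (String[] row : denormalize(input, ",")) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```

If the input were not sorted, rows of the same key could be split across several output groups, which is why the sorted precondition matters.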
Q25) Which Talend component is used to transform data using built-in .NET
classes?
Ans: tDotNETRow helps you transform data by utilizing custom or built-in
.NET classes.
Ans: tJoin joins two tables by doing an exact match on several columns. It compares
columns from the main flow with reference columns from the lookup flow and outputs the
main flow data and/or the rejected data.
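The matching logic described above can be sketched in plain Java. The example assumes a single key column for brevity (a composite key could be modelled by concatenating the columns) and routes unmatched main rows to a reject list. The names JoinDemo and JoinResult are hypothetical, and this is not the component's generated code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JoinDemo {
    // Holds the two outputs: matched rows and rejected (unmatched) main rows.
    static final class JoinResult {
        final List<String[]> joined = new ArrayList<>();
        final List<String[]> rejected = new ArrayList<>();
    }

    // Main rows are {key, value}; the lookup maps a key to a reference value.
    static JoinResult join(List<String[]> mainRows, Map<String, String> lookup) {
        JoinResult result = new JoinResult();
        for (String[] row : mainRows) {
            String match = lookup.get(row[0]); // exact match on the key column
            if (match != null) {
                result.joined.add(new String[] { row[0], row[1], match });
            } else {
                result.rejected.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> countries = new HashMap<>();
        countries.put("FR", "France");
        countries.put("DE", "Germany");

        List<String[]> cities = new ArrayList<>();
        cities.add(new String[] { "FR", "Paris" });
        cities.add(new String[] { "DE", "Berlin" });
        cities.add(new String[] { "XX", "Nowhere" });

        JoinResult result = join(cities, countries);
        System.out.println("joined: " + result.joined.size());
        System.out.println("rejected: " + result.rejected.size());
    }
}
```

Loading the lookup flow into a hash map first keeps each match a constant-time operation, at the cost of holding the whole lookup in memory.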
Ans: Master data management, through which an organization builds and manages a
single, consistent, accurate view of key enterprise data, has demonstrated substantial
business value including improvements to operational efficiency, marketing effectiveness,
strategic planning, and regulatory compliance. To date, however, MDM has been the
privilege of a relatively small number of large, resource-rich organizations. Thwarted by
the prohibitive costs of proprietary MDM software and the great difficulty of building and
maintaining an in-house MDM solution, most organizations have had to forego MDM
despite its clear value.
Ans: This technical note highlights the important new features and capabilities of version
5.6 of Talend’s comprehensive suite of Platform, Enterprise, and Open Studio solutions.
Ans:
1. Extends its big data leadership position, enabling firms to move beyond batch processing and into
real-time big data by providing technical previews for the Apache Spark, Apache Spark Streaming,
and Apache Storm frameworks.
2. Enhances its support for the Internet of Things (IoT) by introducing support for key IoT protocols
(MQTT, AMQP) to gather and collect information from machines, sensors, or other devices.
3. Improves Big Data performance: MapReduce executes on average 24% faster in v5.6 than in v5.5,
and 53% faster than in v5.4.2, while Big Data profiling performance is typically 20 times faster in v5.6
compared to v5.5.
4. Enables faster updates to MDM data models and provides deeper control of data lineage, more
visibility, and control.
5. Offers further enterprise application connectivity and support by continuing to add to its extensive list
of over 800 connectors and components with enhanced support for enterprise applications such as
SAP BAPI and Tables, Oracle 12 GoldenGate CDC, Microsoft HDInsight, Marketo and Salesforce.com.