BestPractices - 2 - Naming Conventions - Data Quality
Last Updated: June 28, 2015
Description
Naming standards are an important but often overlooked component when considering the success of a project. The
application and enforcement of naming standards not only establishes consistency in the repository but provides for a
developer-friendly environment. In addition, the names of artefacts can convey relevant information for tracking and
management purposes. Choosing and adhering to a common convention is the key.
It is also important to consider the different ways in which artefacts and objects are viewed. For example, the Informatica
Analyst and Developer tools provide a contextual view to the objects in the Model Repository. Other perspectives, such as
published Model Repository views or external spreadsheets for specifications can leave the user less aware of the context
and potentially confused over what artefacts they are viewing.
This document offers standardized naming conventions for various repository objects. Whatever convention is adopted, it
is important to make the selection very early in the development cycle and communicate the convention to project staff
working on the repository. The policy can be enforced by peer review and at test phases by adding processes to check
conventions both to test plans and to test execution documents.
It is important to note that the casing of names matters. Name values are stored in the repositories in the case entered, so names with different casing are not the same value. However, names at the same hierarchy level cannot be differentiated by case alone. For example, it is not possible to have the two projects “DQ_Content” and “DQ_CONTENT” in a single model repository.
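The uniqueness rule this implies can be sketched outside the tools. A minimal Python check (illustrative only, not part of any Informatica API) that flags names at one hierarchy level differing only by case:

```python
def find_case_conflicts(names):
    """Return groups of names that differ only by case.

    Names at the same repository hierarchy level cannot be
    differentiated by case alone, so such groups would collide.
    """
    groups = {}
    for name in names:
        # Bucket names by their case-folded form.
        groups.setdefault(name.lower(), []).append(name)
    return [g for g in groups.values() if len(g) > 1]

# "DQ_Content" and "DQ_CONTENT" cannot coexist in one model repository:
print(find_case_conflicts(["DQ_Content", "DQ_CONTENT", "b_Sales"]))
# [['DQ_Content', 'DQ_CONTENT']]
```

Running such a check before creating projects avoids discovering the collision at creation time.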
To ease promotion across environments, keep connection names the same in the Domains across the migration landscape. The naming convention should be applied across all development, test and production environments, which allows seamless migration between them. For example, suppose a developer creates a mapping with a connection to a data warehouse that shares the same name and database type in every environment, even though the physical database differs in each of the three environments. When the administrator imports the project folder from the development environment into the test environment, the mappings automatically use the existing connection in the test repository. With a consistent naming convention across environments, physical data objects can likewise be migrated from the test to the production model repository without manual intervention.
Note: At the beginning of a project, have the Repository Administrator or Database Administrator (DBA) set up all
connections in all environments based on the guidelines covered in this Best Practice. Then use permission options to
protect these connections so that only specified individuals or groups can modify them. As developers create artifacts for
reuse, they normally need access to database service accounts for sources and targets. It is usually a best practice to
restrict developers from creating their own connections and thus introducing possible variations on connection information
for promotable artifacts across the migration landscape. As row- and column-level data access restrictions may apply to Analysts in Production and across the migration landscape, there may be a need to permit those users to define their own connections via the standard {user}_{database name}_CONN, depending on the database connection type. The database then controls the visibility the user has to the data. This is where the pass-through credentials security option available for some databases may come in handy, particularly if authentication is outside the domain, such as via LDAP or SSO.
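The {user}_{database name}_CONN standard above can be generated consistently with a small helper. This is a sketch: the pattern comes from this Best Practice, but the sanitizing rules and function name are assumptions for illustration.

```python
import re

def analyst_connection_name(user, database):
    """Build a connection name following {user}_{database name}_CONN.

    The pattern is from the Best Practice; replacing unsafe
    characters with underscores is an illustrative assumption.
    """
    def clean(part):
        return re.sub(r"[^A-Za-z0-9]", "_", part.strip())
    return f"{clean(user)}_{clean(database)}_CONN"

print(analyst_connection_name("jdoe", "DW Sales"))  # jdoe_DW_Sales_CONN
```

Generating the name rather than typing it by hand keeps the per-user connections uniform across the landscape.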
The Analyst Tool and its heaviest users require additional consideration. Some, but not all, Model Repository artefacts are visible from within the Analyst Tool. Business users and analysts normally need a more descriptive approach with less abbreviation. Many of the artefacts created by analysts start with the default prefixes, and it is typically best to adopt this approach as a standard.
Artefacts requiring development effort can still be helped by standard abbreviations, as development is impacted by the length and format of names. Below, note that Data Object Name reflects the standard default, but for each of these objects a description can be added in the comments.
The Rule is a bit different as it is a function. If it is a data quality rule, the Description could include a qualifier as to what type it is. Examples include: COM => Completeness, CNF => Conformity, CON => Consistency, ACC => Accuracy, VLD => Validity, DUP => Duplication, INT => Integrity, and TIM => Timeliness. Optionally, the rule name description portion could also contain specific logical entity types such as customer, product, order and service, or specific logical attribute types (e.g., given_name, surname, phone, and email) if applicable. It is also important to consider identifiers linking artifacts to outside specifications for the purpose of traceability.
Rule Rule_[DESCRIPTION]
Scorecard Scorecard_[DESCRIPTION]
Projects
The objects are organized by project in the Analyst Tool. Project names should be meaningful and include the name or
identifier and whether its objects are reusable or not. To accomplish this, give projects a meaningful name based on the
business group or purpose.
Project folders are listed in alphabetical order in many of the object navigation screens. As a result, use prefixes in project names to enforce a sort order so that it is easier to see relevant groups.
Type                  Description
Shared                Shared projects can be read by all users. They should be used to contain common objects for all developers or analysts, such as reusable content or accelerators*.
                      *As a best practice, all out-of-the-box content and accelerators from Informatica should be kept in a single project named “Informatica_DQ_Content”.
Project Specific      Project-specific projects are where all the completed work needs to be kept. These projects are driven by the business and should be created as non-shared. Only developers or users assigned to the project should be granted the rights to access these projects.
                      i.e., b_Project
Developer or Analyst  Each IDQ Developer and Analyst should be given a project to use as a sandbox.
                      i.e., z_Developer_Name
This way the Shared Projects appear at the top, User Projects appear at the bottom and Business Projects appear in the
middle as shown in Fig 1.
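Assuming a plain case-sensitive (ASCII) sort, the prefix scheme produces exactly this grouping, since uppercase-initial shared project names order before lowercase b_ and z_ prefixes. The project names below are invented examples:

```python
projects = ["z_John_Smith", "b_Customer_Mart", "Informatica_DQ_Content",
            "b_Order_Hub", "z_Jane_Doe"]

# Case-sensitive alphabetical order: shared projects first,
# b_ business projects in the middle, z_ sandboxes last.
print(sorted(projects))
# ['Informatica_DQ_Content', 'b_Customer_Mart', 'b_Order_Hub',
#  'z_Jane_Doe', 'z_John_Smith']
```

If a particular navigation screen sorts case-insensitively, an explicit shared prefix (e.g., a_) would be needed to keep shared projects at the top; that variant is not part of this Best Practice.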
Note: To simplify promotion from environment to environment, keep the promotable project names the same across the
migration landscape.
Folders
These names should first follow the naming constraints of the underlying database repository. Either project or usage
prefixes should be assigned. This will designate where or how they are being used. Next, they should contain descriptions
of their content.
Rule Comments
These comments are used to explain the process that the rule carries out. Informatica recommends reviewing the notes
regarding descriptions for the input and output transformation.
Developer Objects
Projects
Folders
Work Area
Individual developer folders or non-production folders should be prefixed with z_ so that they are grouped together and not confused with working production folders; see Fig 2 for an illustration of this.
Transformation                    Data Object Name
Address Validator Transformation  AVA_{DESCRIPTION} that describes the cycle the data is taking through the transformation
Aggregator Transformation         AGG_{FUNCTION} that leverages the expression and/or a name that describes the processing being done
Association Transformation        AST_{FUNCTION} that leverages the expression or a name that describes the processing being done
Case Converter Transformation     CCO_{DESCRIPTION}
Consolidation Transformation      CNS_{DESCRIPTION}
Custom Data Transformation        CD_{DESCRIPTION}
Expression Transformation         EXP_{FUNCTION} that uses the expression and/or a name that describes the processing being done
Filter Transformation             FIL_ or FILT_{FUNCTION} that leverages the expression or a name that describes the processing being done
Java Transformation               JV_{FUNCTION} that describes the expression or the processing carried out
Rank Transformation               RNK_{FUNCTION} that leverages the expression or a name that describes the processing being done
Standardizer Transformation       STD_{DESCRIPTION}
Weighted Average Transformation   WAV_{DESCRIPTION}
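The prefixes above can be captured in a simple lookup for use in peer review or an automated convention check. Only the prefixes are taken from the table; the checker itself is a sketch:

```python
# Prefixes per transformation type, as listed in the table above.
TRANSFORMATION_PREFIXES = {
    "Address Validator": "AVA_",
    "Aggregator": "AGG_",
    "Association": "AST_",
    "Case Converter": "CCO_",
    "Consolidation": "CNS_",
    "Custom Data": "CD_",
    "Expression": "EXP_",
    "Filter": ("FIL_", "FILT_"),  # two accepted variants
    "Java": "JV_",
    "Rank": "RNK_",
    "Standardizer": "STD_",
    "Weighted Average": "WAV_",
}

def follows_convention(transformation_type, name):
    """True if the object name starts with the expected prefix."""
    prefixes = TRANSFORMATION_PREFIXES[transformation_type]
    if isinstance(prefixes, str):
        prefixes = (prefixes,)
    return name.startswith(prefixes)

print(follows_convention("Expression", "EXP_Trim_Names"))  # True
print(follows_convention("Filter", "Remove_Nulls"))        # False
```

A script like this could be run over an exported object list as part of the test-phase convention checks mentioned earlier.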
Rule/Mapplet Descriptions
When a mapplet is validated as a rule, it will be available in the Analyst Tool. For this reason, rules and mapplets should
be given a meaningful description that can be easily interpreted by a Business Analyst or Business user. The descriptions
for the individual transformations are not exposed in the Analyst tool and should be written for a technical developer and
describe what each transformation is doing.
Workflow Objects
Applications
Application names can be created to logically group objects based on effort, subject area, type of object, etc. While it is possible to create applications with the same name in different Projects/Folders, one might overwrite the other on the Data Integration Service when deployed, so it is recommended that application naming conflicts be avoided.
Port Names
Port names should remain the same as the source unless some other action is performed on the port. With respect to port names, keep in mind how autolink works: if a transformation connects to Sources, the same names are needed for autolink to work, and the same is true for Targets. The best approach is to use the port naming conventions so that work is reduced. Unless the contents provided by a port are physically changed, it does not need to be (and should not be) renamed. When the contents are changed, the port should be prefixed with the appropriate name.
For lookup transformations, the source/input port should be prefixed with in_. This allows users to immediately identify
input ports without having to line up the ports with the input checkbox. In other types of transformations, prefix the input
port with in_ if some process is applied in the transformation (and a corresponding output port was created as a result of
the process).
Generated output ports can also be prefixed. Along with selecting the port in the Developer tool and selecting the link path, this helps to trace the port value throughout the mapping as it travels through other transformations. If the intention is to use the autolink feature based on names, then output ports should be left as the name of the target port in the subsequent transformation. For variables inside a transformation, the developer can use the prefix v, var_ or v_ plus a meaningful name.
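A convention check for the in_ and variable prefixes described above could look like the following sketch (the bare v prefix is omitted because it is indistinguishable from ordinary names starting with "v"; treat that as an assumption):

```python
def classify_port(port_name):
    """Classify a port by the naming prefixes described above.

    in_         -> input port feeding some processing
    var_ or v_  -> transformation variable
    Anything else is treated as a pass-through/output name.
    """
    if port_name.startswith("in_"):
        return "input"
    if port_name.startswith(("var_", "v_")):
        return "variable"
    return "pass-through/output"

print(classify_port("in_CUST_NAME"))  # input
print(classify_port("v_row_count"))   # variable
print(classify_port("CUST_NAME"))     # pass-through/output
```

Such a classifier makes it easy to spot, for example, an in_ port with no corresponding output port during review.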
With some exceptions, port standards apply when creating a transformation object. The exceptions are the Read, Lookup
and Write transformation ports since these should be the same as the underlying data structures.
Prefixes on ports are preferable, as they are generally easier to identify as transformed; developers may not need to
expand the columns to see the suffix for longer port names.
Exception Tables
An exception table created by the exception transformation or simply used by that transformation should follow a specific
naming convention, such as that shown in Table 4 below.
Note: The table description along with the prefix must total fewer than 25 characters, because another table is created with the suffix “_ISSUE” and many database management systems do not allow identifiers longer than 30 characters. Column names are also an issue and must be fewer than 19 characters for the exception transformation, as many database management systems do not permit names longer than 30 characters and the exception transformation reserves 11 of those 30 characters for its own needs.
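The length arithmetic in this note can be expressed as a small validation sketch. The under-25 and under-19 limits are taken from the note above; the EX_ prefix and function name are hypothetical examples:

```python
def check_exception_table(table_name, column_names, prefix="EX_"):
    """Sketch of the length checks described in the note above.

    The companion table gets the suffix "_ISSUE", so the prefixed
    table name must stay under 25 characters to fit a 30-character
    identifier limit. The exception transformation reserves 11 of
    those 30 characters, so column names must stay under 19.
    """
    problems = []
    full = prefix + table_name
    if len(full) >= 25:
        problems.append(f"table name '{full}' has {len(full)} chars; must be < 25")
    for col in column_names:
        if len(col) >= 19:
            problems.append(f"column '{col}' has {len(col)} chars; must be < 19")
    return problems

print(check_exception_table("CUSTOMER_ADDR",
                            ["CUST_ID", "A_VERY_LONG_COLUMN_NAME"]))
```

Checking names before table creation avoids late failures on the stricter database platforms.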
Transformation Descriptions
This section defines the standards to be used for transformation descriptions in the Developer Tool.
As part of best practice development, transformation descriptions are expected to cover the following points for each transformation.
• The purpose of the Read Transformations and the data it is intended to select.
• Indicate if any overrides are used and, if so, describe the filters or settings used. Some developers prefer items such as the SQL statement to be included in the description as well. NOTE: Use of SQL overrides is discouraged as it disables the licensable pushdown optimization feature.
Describe the lookup along the lines of the [lookup attribute] obtained from [lookup table name] to retrieve the [lookup
attribute name].
• Lookup attribute – is the name of the column being passed into the lookup and is used as the lookup criteria.
• Lookup table name – is the table on which the lookup is being performed.
• Lookup attribute name – is the name of the attribute being returned from the lookup. If appropriate, specify the
condition when the lookup is actually executed.
Expression transformation descriptions should adhere to the following format: This Expression [explanation of what the
transformation does].
• Expressions can be distinctly different depending on the situation; therefore, the explanation should be specific to
the actions being performed.
• Within each Expression, transformation ports have their own description in the format: This port [explanation of
what the port is used for].
• In mapplets as well as mappings it is a common practice to route the input or output through an expression (e.g.
exp_src_xxx, exp_tgt_yyy) before connecting those ports to other transforms as this minimizes the effort to change
sources or targets later. In some cases for development purposes, the source or target may start as a flat file and
later be changed to a table, logical data objects or web service.
Aggregator transformation descriptions should adhere to the following format: This Aggregator [explanation of what the
transformation does].
• Aggregators can be distinctly different, depending on the situation; therefore the explanation should be specific to
the actions being performed.
• Within each Aggregator, transformation ports have their own description in the format: This port [explanation of
what the port is used for].
Joiner transformation descriptions should adhere to the following format: This Joiner uses [joining field names] from
[joining table names].
• Joining field names are the names of the columns on which the join is done.
• The joining table names are the tables being joined.
Filter transformation descriptions should adhere to the following format: This Filter processes [explanation].
Describe the Update Strategy and whether it is fixed in its function or determined by a calculation.
Describe the port(s) that are being sorted and their sort direction.
Describe the source inputs and indicate what further processing on those inputs (if any) is expected to take place in later
transformations in the mapping.
Describe the function of the Java code, what data is expected as input and what data is generated as output. Also,
indicate whether the Java code determines the object to be an active or passive transformation.
Indicate the columns being used in the rank, the number of records returned from the rank, the rank direction and the
purpose of the transformation.
The address validation transformation descriptions should describe the input type: multiline, discrete or hybrid. It should
also specify the output type.
The association transformation descriptions should list the fields used for association.
The case converter transformation descriptions should explain the main goal of the case conversion.
The classifier transformation descriptions should describe the goal of the classifying operation and which probabilistic
model is used to perform it.
The comparison transformation descriptions should describe the comparison algorithm used to calculate the score output.
The consolidation transformation descriptions should list the association ID used for the consolidation and the different
consolidation type used on the different fields consolidated (most frequent, most non null, etc.).
The data masking transformation descriptions should include input/outputs and data masking types being used.
The decision transformation descriptions should explain the main goal of the algorithm. More descriptive information can
be added at the strategy level.
The exception transformation descriptions should specify the type of exception (duplicate exception or bad record
exception), the exception table name and the list of fields for which exceptions are generated (in case of bad record
exception).
The key generator transformation descriptions should explain the key generation strategies used to generate the groups.
The labeller transformation descriptions should explain what the labeller transformation is actually labelling. It should
specify if this is using probabilistic labelling or traditional reference data labelling.
The match transformation descriptions should list the type of match operation executed (traditional or Identity Match
Option (IMO)), whether it is using a match rule and which group keys are expected.
The merge transformation descriptions should explain the main goal of the merge operation.
The parser transformation descriptions should explain what the parser transformation is actually parsing. They must specify whether it uses probabilistic parsing or traditional token parsing.
The REST Web Service Consumer transformation descriptions should include WSDL information and mapping operation
inputs and outputs.
The sequence generator descriptions should explain any configurations for sequence range or reset values.
The SQL transformation descriptions should explain the purpose of the SQL transformation and list the tables involved in
the SQL query, which fields are used as parameters and the fields that are returned.
The standardizer transformation descriptions should explain the actual goal of the standardization operation, which type of
data is standardized and what is the expected result.
The Web Service Consumer transformation descriptions should include WSDL information and mapping operation inputs
and outputs.
The weighted average transformation description should list the expected input weight and give a goal of the computed
weight given as output.
• Indicate the purpose of the Write Transformations and the data they are intended to load.
• Indicate if any overrides are used and, if so, describe the filters or settings used. Some developers prefer items such as the SQL statement to be included in the description as well. NOTE: Use of SQL overrides is discouraged as it disables the licensable pushdown optimization feature.
• Indicate any update strategy being used for the write target.
The notification task descriptions should describe the type of notification sent to the user and who the audience of the notification is.
The exclusive gateway task description should describe the purpose of the gateway, what sequence flows are generated
and which is the default.
The assignment task descriptions should describe the purpose of the variable assignment operation and which variables
are impacted.
The command task descriptions should specify what actions are performed and which type of commands are executed.
The human task descriptions should specify the exception table, the participant types and if there is any notification
generated.
This section defines the standards to be used for step descriptions in the Developer Tool.
The cluster step descriptions should describe the type of data to be reviewed, whether there is any time out, which types of users are involved and the list of notifications set up.
The exception step descriptions should describe the type of data to be reviewed, whether there is any time out for the step, which types of users are involved and the list of notifications set up.
The review step descriptions should describe the type of data to be reviewed, whether there is any time out for the step, which types of users are involved and the list of notifications set up.
Workflow Comments
Workflow comments should cover a number of areas and provide relevant information about the workflow's purpose and function.
Mapping Comments
Mapping comments should provide relevant information about the mapping's purpose and function. Describe the source data obtained and the data quality rules applied.
Informatica recommends using business terms along with technical details such as table names. This is beneficial when maintenance is required or if issues arise that need to be discussed with business analysts.
Note: There are limitations to how much can be put in a comment, so it is recommended that comments be kept succinct. Comments should also not be used as a replacement for user documentation.