Denodo Data Virtualization Basics
Problem to solve
First you have to stop and understand the problem before starting your project. Let's take the
following scenario:
• You are working in a company that has information about its customers (CRM) stored in a
MySQL database.
• The complete billing information for any customer is exposed in an internal Web Service.
• End users do not want to use different applications to get available information of customers
(CRM, Sales Application, etc.).
• The IT department does not like the idea of creating a specific application for this business
need and it would like to reuse this customer global view, if possible, in any other current
application.
As stated above, end users do not want to use different applications to get all the available
information about customers, so the problem here is to connect disparate data to create a single view of
the customers, and the IT department wants that single view to be reusable.
If you follow a bad design (not reusing components, creating ad-hoc code/applications, etc.) your
systems will grow as in the image below. This is a typical example of exactly what you do not want:
At the end of this Tutorial you will learn that the Denodo Platform provides:
• Easy to generate data services.
• Data Services independent of the physical source(s).
• A single point to control your data sources.
• Short and Agile development cycles.
• Little to no coding!
• Intuitive solutions to simple needs.
• Reusability of your models by all clients.
Before starting this tutorial, make sure you have all the necessary materials by checking
the Installation & Bootstrapping section.
If you have already installed all the components, please go ahead to the First Steps section.
Before you start, be sure you have your development environment set up. You need to:
1. Install Denodo into a directory (avoid using the %Program Files% folder). This directory will be
referred to as <DENODO_HOME> throughout this tutorial.
2. Copy the mysql-connector-java-.jar to <DENODO_HOME>/lib-external/jdbc-drivers/mysql-5
3. Install & configure the database:
1. Install MySQL server.
2. Start MySQL and launch the MySQL Workbench application.
3. Connect to your MySQL server and then open
the <tutorial_directory>/MySQL/schema.sql script by choosing "Open SQL Script" from the
File menu.
4. Once the script has been opened, click on Execute (you can use any other method to
load the database).
5. After doing this, you should see a new database schema called "acme_crm" with three
tables defined (address, client and client_type). Test MySQL by logging in to the
acme_crm database with the credentials: acme_user / acme_user
4. Install a web server:
We are going to use Jetty to run some of the examples in the tutorials:
1. Go to <tutorial_directory>/jetty and run: java -jar start.jar from the command line. If you do
not have a Java Virtual Machine installed on your system you can use the JVM installed
with the Denodo Platform under <DENODO_HOME>/jre/bin.
2. Test the billing Web Service to see if it has been properly deployed, direct the web
Denodo Installation
At this point, you should have already downloaded the Denodo Express installation package from
your user account.
TIP
The installation package is not the only file to download; make sure
you have your personal license too.
The installation package is a .zip file. After decompressing the package you will see the files shown
in the image below:
In the next step you have to select the modules to be installed. This tutorial covers every module so
we suggest installing them all, but you only need Virtual DataPort to get started.
We recommend that you follow the guide for Installing Denodo before
starting this tutorial.
Denodo is a global solution for heterogeneous and dispersed data source integration using a virtual
approach. It can connect to a wide range of data sources like relational databases, web services,
XML documents, flat files, multidimensional databases, JSON sources, etc.
Denodo will create wrappers on top of those data sources to create a common interface to access
them. Then, a user can combine the data coming from different data sources by defining views,
using the Administrator Tool GUI.
The diagram below shows the general architecture of the Denodo Platform:
After the Denodo installation, a desktop icon is generated for Denodo 7.0.
First, we have to double-click on that icon to launch the Denodo Platform Control Center.
The Denodo Administration Tool allows the development and administration of your Data
Virtualization projects. Specifically, you can perform the following tasks:
• Create/Edit/Drop Denodo Virtual Databases.
• Create/Edit/Drop Data Sources.
• Create/Edit/Drop Views.
• Publish Data Services.
• Execute Queries.
• Add Extensions.
• Configure the Cache System.
• Import/Export Metadata.
• Configure the Denodo Server.
At the end of the previous section you launched the Administration Tool. The first screen that the
application shows is a login dialog; the credentials you type here will be used to connect to a running
Denodo server.
We saw in the previous section that we need valid credentials to connect to a Denodo database. In
our case, we used the default admin user to connect to the admin database.
Now, we are going to learn how to create another database.
NOTE
This step is not necessary to complete the tutorial. If you are going
to use the default admin database you can continue to the next
section.
The Denodo server can contain different virtual databases. A virtual database is a schema
comprised of data sources, views, stored procedures, web services, etc. Each virtual database is
independent of the rest of the virtual databases created in the Denodo server (and different users
can have different privileges for each virtual database).
We are going to create a new database called tutorial.
2. In the workspace, you will see the predefined Denodo databases (these databases cannot be
dropped):
○ admin: default database for Denodo Virtual DataPort.
○ itpilot: default database for Denodo ITPilot.
3. You are going to start a new project, so you will need to create a new database (this is a best
practice):
1. Click on the New button.
2. Specify the name of the database: tutorial.
3. Click Ok.
That's all!
Now, you can follow this Tutorial using this new database, so disconnect your current session (File >
Disconnect) and log in again into the tutorial database:
• Login: admin
• Password: admin
• URI: //localhost:9999/tutorial
To switch databases, it is not necessary to disconnect and reconnect. Another option is to
navigate through the Elements Tree and simply select the tutorial database.
Now it is time to create elements in our virtual database. But first, let's stop and think
about good practices for creating elements. One of the best practices is to keep the
elements inside our database well organized.
For that purpose, the Denodo Administration Tool offers the option to organize elements
inside folders in the Elements Tree, making it easier to work with them.
Creating a folder
In the next section you will learn how to create elements inside this folder.
The MySQL database that you installed in the Installation & Bootstrapping section contains the data
of the CRM of a company. This data is split into several tables:
• a table for client data,
• a table for client types (a client can be residential or business)
• and a table for addresses.
The diagram of the organization of this database is the following:
Your goal here is to combine this data using a Data Virtualization approach. This will enable us to
create views that are more meaningful for the consumers of the data without having to modify the
underlying data source (in many real-world scenarios we are not the owners of the data, just
consumers, so changing the data schema will not be possible).
The first step we need to follow to virtualize this relational database is to connect to it using the
Denodo Platform. Connecting to the data source will allow us to introspect it and graphically select
which of its tables are to be virtualized within the Denodo Platform. Once connected, we will create
one base view per table in the CRM.
A base view is a representation, in the Denodo Platform, of existing data in a remote data source.
This base view is only metadata that describes how the information is stored and accessed in the
First, let's create the data source for the CRM database. In this case, we will create a JDBC
connection to MySQL, but other possibilities are Oracle, Microsoft SQL Server, DB2, PostgreSQL,
Hive, Netezza, Teradata, Denodo VDP, etc.
The recommended way to connect to databases when using Denodo is through JDBC (Java
Database Connectivity), so let's start this tutorial by creating a new JDBC
data source to import a table with a primary key.
In the Installation & Bootstrapping section, you installed a MySQL database server and copied its
driver into the Denodo Virtual DataPort extensions folder:
<DENODO_HOME>/lib-external/jdbc-drivers/mysql-5. With this driver added to the Denodo installation,
you are ready to create the JDBC Data Source following these steps:
1. Create two folders nested under the "1 - first steps" folder you have made, one for data
sources called "1 - Data Sources" and another for base views called "2 - Base Views".
2. Right-click on the "1 - Data Sources" folder and select "New > Data source > JDBC".
Save" button.
9. Click the "Create base view" button at the top.
The Administration Tool will show the introspected schema of the relational database:
Later, you will be able to query these base views or combine them with other views.
When the importing process is finished, you will see the new views in the elements tree panel. If you
double-click on the view name, the schema of the base view will be shown in the workspace.
TIP
Data Source child nodes cannot be moved to other folders. They are
added to provide an easy way to see the base views created from a
data source.
Finally, let's move the base views to the folder that we created for them by dragging them to the "2 -
Base Views" folder.
After these steps are completed, we have a virtual representation of our CRM in Denodo. In the next
section, we are going to learn how to query it to see how the data comes in real time from our
MySQL database, and, after that, we will start creating data combinations that will add semantic
value to the client applications that are consuming this data.
Context menu
3. Then click the Execute button at the bottom of the panel to send the current sentence (shown
at the top) to the Denodo server.
VQL shell
2. Right-click on the name of the view and select "VQL Shell > Select ..." (a sentence is created
in the top-right panel).
Show results
button, then select the field, the operator and the value for the filter:
To summarize, now you have learned how to execute queries over Denodo views and how the
server is querying the source database in real-time to return the results. In the next section, we are
going to learn how to create new views using combinations between existing Denodo views.
Now you are ready to explore the capabilities of the Denodo Platform that make Data Virtualization a
very powerful tool. In the previous sections, you have connected to the CRM database and queried
Join operation
Let's see an example of a derived view creation process using the join operation. Before we begin,
let's create a new folder named 3 - Derived Views to stay organized. Then, right-click in the elements
tree and select New > Join. Now, an empty Join View panel will be shown in the Administration
Tool workspace.
To select the views on which the join operation is going to be executed, you have to drag &
drop them from the list of views that appear on the Elements Tree. As input views are added, the
schema of the resulting join view is generated automatically.
In our example, you have to follow these steps:
1. Drag & drop the client, client_type and address views into the workspace.
2. Drag the client.client_id field and connect the arrow to the address.client_fid field to set one of
the join conditions.
3. Drag the client.client_type field and drop the arrow on the client_type.code field.
5. Rename the view "personal_data_crm" by typing in the input box labeled "View name" at the
top.
Once these steps are completed, you will have a derived view (virtual: the data is not stored in Denodo)
that represents the concept of a customer within your organization. This view can be queried in the
same way as the base views (see the previous section). Now that this data
is defined, your client applications can retrieve this information directly from the virtualization
server without having to define the data combination themselves.
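To make the two join conditions concrete, the combination can be sketched outside Denodo. The snippet below uses an in-memory SQLite database purely as a stand-in for the three base views. The sample rows and every column other than client_id, client_fid, client_type and code are assumptions made for illustration, and the statement is plain SQL rather than the VQL that the Administration Tool generates.

```python
import sqlite3


def build_personal_data_crm(conn):
    """Create stand-in tables and a view that mirrors the two join conditions."""
    conn.executescript("""
        CREATE TABLE client (client_id TEXT, name TEXT, surname TEXT, client_type TEXT);
        CREATE TABLE client_type (code TEXT, type_name TEXT);
        CREATE TABLE address (client_fid TEXT, city TEXT);

        -- Hypothetical sample rows, not taken from the tutorial database
        INSERT INTO client VALUES ('C001', 'John', 'Smith', '01');
        INSERT INTO client VALUES ('C002', 'Mary', 'Jones', '02');
        INSERT INTO client_type VALUES ('01', 'residential');
        INSERT INTO client_type VALUES ('02', 'business');
        INSERT INTO address VALUES ('C001', 'Palo Alto');
        INSERT INTO address VALUES ('C002', 'Madrid');

        -- client.client_id = address.client_fid
        -- client.client_type = client_type.code
        CREATE VIEW personal_data_crm AS
        SELECT c.client_id, c.name, c.surname, t.type_name, a.city
        FROM client c
        JOIN address a ON c.client_id = a.client_fid
        JOIN client_type t ON c.client_type = t.code;
    """)


conn = sqlite3.connect(":memory:")
build_personal_data_crm(conn)
rows = conn.execute("SELECT * FROM personal_data_crm ORDER BY client_id").fetchall()
for row in rows:
    print(row)
```

As in Denodo, the view here stores no data of its own: each query against it re-evaluates the joins over the underlying tables.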
The Denodo Administration Tool also provides a full set of relational operations, in addition to the
join, to create new views:
• UNION
• PROJECTION
• SELECTION
• AGGREGATION
• MINUS/INTERSECTION
• FLATTEN
These operations can be used in the same way as the Join operation (Right-click >
New > OPERATION). Try them yourself!
In the previous section, you learned how to access a customer database to get personal and contact
information about the company's customers.
The billing department of the sample company exposes the billing information using an internal web
service that exposes all the open bills for each customer: amount due, due date, etc. In this section
you will combine the unified customer view that you have built with this billing information so you can
have a report that lists the total amount that is due for each of your customers.
List of topics:
• Create a Web Service data source.
• Flatten a hierarchical structure.
• See how derived views are constructed using Tree View.
• Create a join view between heterogeneous sources.
• Create an Aggregation view.
As part of the installation steps, you have deployed a billing web application that exposes several
SOAP web services. The different services are available at http://localhost:8080/billing/services.
Make sure the billing web service is up and running before following the steps of this section
(see the Installation & Bootstrapping section).
These web services have been created by the billing department and expose the information about
the customers' bills using three different operations:
• getBills: returns a list with all the bills in the system.
• getBillByCustomerId: returns a list with all the bills for the specified input customer id.
• getBillByPhoneCenter: returns a list with all the bills for the specified input phone center.
All the different operations will return the billing information using a hierarchical structure: the bills
will be returned as part of a list. For instance, if you invoke the getBillByCustomerId operation
using a customer id as input parameter, you will get a list with all the bills for that customer. For each
item (bill) in the list, you will see: the customer id and ssn, the amount due for that bill, the billing
start date, end date and due date, the phone center that provided the service for that bill, and the bill
id.
3. Click on the
button (you can leave the default values for the remaining options).
Now you have to click on Create base view button. You will see on the screen a list with the
different operations available in the web service and the option of creating a new base view for each
one of them.
In this case, you are interested in the getBillByCustomerId operation, so click on the Create Base
View link associated with this operation.
Click on
In the previous section, you have created a base view on top of a SOAP web service data source.
This web service returns a hierarchical structure that includes a return element of type array.
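What flattening does to such a structure can be sketched in plain Python. The nested record below is a made-up response: the bill_id values and amounts are assumptions, and the helper name flatten is hypothetical. The idea is only that one record holding an array becomes one row per array item, with the outer fields repeated on each row.

```python
def flatten(record, array_field):
    """Return one flat row per item of record[array_field]."""
    outer = {key: value for key, value in record.items() if key != array_field}
    rows = []
    for item in record[array_field]:
        row = dict(outer)   # repeat the outer fields on every row
        row.update(item)    # add the fields of this array item
        rows.append(row)
    return rows


# Hypothetical response for one customer: a "return" array with two bills
response = {
    "customer_id": "C001",
    "return": [
        {"bill_id": "B1", "amount_due": "25.50"},
        {"bill_id": "B2", "amount_due": "10.00"},
    ],
}

flat_rows = flatten(response, "return")
for row in flat_rows:
    print(row)
```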
In the previous section, you saw how to create new derived views by combining other views. Once
you start combining views the complexity of the new views will grow and it will be useful to visualize
how those views are built. For this, you have the Tree View functionality.
In the Simple Derived Views section, you created the personal_data_crm view using two join
operations over several base views. To see the Tree View you have to right-click on the view name
and select Tree View (see image below).
At the beginning of the tutorial, you saw how to create new views using the join operation, but the
views involved in the joins all came from the same data source. In this section, you will see
how to create a new join view using the exact same procedure with views coming from two different,
heterogeneous data sources.
To create the derived view you can follow these steps:
1. Right-click on the personal_data_crm view and select New > Join.
2. Drag & drop the billing_information view to the Join View wizard Model tab.
3. We will use client_id = customer_id as the join condition so drag & drop a line from
the client_id field in the personal_data_crm view to the customer_id field in
the billing_information view.
• click
Save.
Now, if you execute the new view, you will get the billing information for the different clients.
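Conceptually, this view pairs each CRM row with the bills whose customer_id equals its client_id, no matter where each side comes from. A minimal sketch in Python, assuming both sides have already been fetched as lists of dictionaries (the sample values and the function name are hypothetical, and Denodo may well choose a different execution strategy internally):

```python
from collections import defaultdict


def join_on_customer(crm_rows, billing_rows):
    """Inner join on client_id = customer_id using a lookup table."""
    bills_by_customer = defaultdict(list)
    for bill in billing_rows:
        bills_by_customer[bill["customer_id"]].append(bill)

    joined = []
    for client in crm_rows:
        # Clients without bills produce no output row (inner join)
        for bill in bills_by_customer.get(client["client_id"], []):
            joined.append({**client, **bill})
    return joined


# Rows as they might come from the database side
crm_rows = [
    {"client_id": "C001", "name": "John", "surname": "Smith"},
    {"client_id": "C002", "name": "Mary", "surname": "Jones"},
]
# Rows as they might come from the web service side
billing_rows = [
    {"customer_id": "C001", "amount_due": "25.50"},
    {"customer_id": "C001", "amount_due": "10.00"},
]

client_with_bills = join_on_customer(crm_rows, billing_rows)
for row in client_with_bills:
    print(row)
```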
In the previous section, you created a view that obtains the billing information for all the
customers in your database. This view has one record for each bill, but you want to calculate
the total amount due by each customer instead of listing the separate bills.
To do so, you can create a new view that aggregates (group by) the different customers using
the customer_id to compute the total amount.
To create the new aggregation view you can follow these steps:
SUM(CAST('float', client_with_bills.amount_due))
NOTE
Since the amount_due field comes as text from the Web service,
you will have to cast the field to a numeric value using
the CAST function.
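The effect of grouping by customer and summing the cast amounts can be sketched in Python. This is only an illustration of the arithmetic, with made-up rows and a hypothetical function name; in Denodo the aggregation is defined on the view itself.

```python
from collections import defaultdict


def total_amount_by_client(rows):
    """Group rows by client_id and sum amount_due after casting text to float."""
    totals = defaultdict(float)
    for row in rows:
        # amount_due arrives as text, so it is cast before being added
        totals[row["client_id"]] += float(row["amount_due"])
    return dict(totals)


# Hypothetical rows of the joined view: one record per bill
bill_rows = [
    {"client_id": "C001", "amount_due": "25.50"},
    {"client_id": "C001", "amount_due": "10.00"},
    {"client_id": "C002", "amount_due": "7.25"},
]

totals = total_amount_by_client(bill_rows)
print(totals)
```

Without the cast, adding the values would concatenate or fail rather than sum, which is why the expression wraps the field in CAST before applying SUM.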
Now, if you execute the new view, the result will contain one record per customer with the total billing
balance:
In previous sections, you created several views with Denodo to allow client applications to retrieve
the information directly from the Denodo server. In particular, you have created the
view 'amount_due_by_client' which combines data from several sources and exposes the
information about the billing balance of a company's clients.
You already know how to execute queries over that view, but now it's time to connect to the Denodo
server from your external applications. Denodo Platform is based on a client-server architecture,
where clients issue requests to the server. These requests can be sent using one of the following
interfaces:
• JDBC: Denodo provides its own JDBC driver.
• ODBC: Denodo provides an ODBC interface (requires the installation of additional
components).
• ADO.NET: Denodo is compatible with the Npgsql ADO.NET provider for PostgreSQL.
• RESTful Web service (XML, JSON, HTML outputs): useful for applications that cannot use
the JDBC or ODBC interfaces to connect to Denodo.
JDBC (Java DataBase Connectivity) is a Java data access technology from Oracle Corporation.
JDBC provides an API for the Java programming language for database-independent connectivity,
and it is based on the use of drivers for each database. A client application requires separate
drivers, usually vendor supplied, to connect to different types of databases.
Denodo includes a JDBC driver jar file named denodo-vdp-jdbcdriver.jar, and it is located under
the <DENODO_HOME>/tools/client-drivers/jdbc/ directory.
In this section, you are going to see how to access the Denodo server using a JDBC client. This
information is valid for any Java-based application. For the example, we will use DBVisualizer (a
generic database management tool for developers) but feel free to use any other JDBC client.
The first thing that you have to do when connecting using JDBC is to add Denodo's JDBC driver
to the client application.
To use the JDBC driver in your client, you have to add the .jar file to the classpath of your
application.
In DBVisualizer, you have to go to Tools > Driver Manager... and in the Driver Manager window go
to Driver > Create Driver and then browse to the VDP's driver file. Use the following driver settings
and close the window to save the configuration:
Now that you have added the driver, you can configure a connection to your Denodo virtual
database. Go to Database > Create Database Connection and use the following settings for the
connection:
• Driver (JDBC): Denodo 7.0
• Database URL: jdbc:vdb://localhost:9999/tutorial
• Database Userid: admin
• Database Password: admin
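The Database URL above follows the pattern jdbc:vdb://host:port/database. If you need to generate it for several virtual databases, a small helper can assemble and check it. This is a sketch with hypothetical function names; it only handles the simple form shown in this tutorial, without any extra connection parameters.

```python
PREFIX = "jdbc:vdb://"


def denodo_jdbc_url(host, port, database):
    """Build a URL of the form used in this tutorial."""
    return f"{PREFIX}{host}:{port}/{database}"


def parse_denodo_jdbc_url(url):
    """Split a simple Denodo JDBC URL into (host, port, database)."""
    if not url.startswith(PREFIX):
        raise ValueError(f"not a Denodo JDBC URL: {url}")
    authority, _, database = url[len(PREFIX):].partition("/")
    host, _, port = authority.partition(":")
    return host, int(port), database


url = denodo_jdbc_url("localhost", 9999, "tutorial")
print(url)
print(parse_denodo_jdbc_url(url))
```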
Click on the Connect button, and you will establish a connection to the tutorial database. In the left
ODBC (Open DataBase Connectivity) is a standard to access databases originally developed by
Microsoft. ODBC provides an API to make the code independent of database systems and operating
systems.
Denodo provides an ODBC interface, but it requires the installation of the ODBC driver. Like any
other ODBC driver, you have to install it on the machine where the client application is running.
In this section you will learn how to access the Denodo server using an ODBC client. This
information is also valid for any other ODBC connection. For the example, we will use MS Excel but
feel free to use any other ODBC client.
The first thing that we have to do when connecting using ODBC is to install the Denodo ODBC
driver. Denodo Platform 7.0 includes an ODBC driver named DenodoODBC and it is located under
the <DENODO_HOME>\tools\client-drivers\odbc directory. Extract the folders in this directory and run
the programs inside to install the drivers. Once this is complete, restart your Virtual DataPort Server
from the Denodo Control Panel.
NOTE
Select the 32-bit or 64-bit version depending on the client that will use
it. For example, clients such as old MS Excel versions can only use the 32-bit
ODBC driver, even when running on a 64-bit operating system.
Once you've installed the ODBC driver you will need to add a new user data source:
1. Go to Control Panel > Administrative Tools > Data Sources (ODBC).
2. Select Add User DSN or Add System DSN. The difference is that "User DSN" can only be used
by the current user and "System DSN" can be used by all the users of the system.
3. Select the DenodoODBC ANSI or Unicode driver, and click on the Finish button.
Finally, click the Ok button and then click the Save button to finish.
Now your environment is ready to connect to Denodo using ODBC (remember that the
previous steps are only valid to connect to the "tutorial" virtual database, so if you want to connect to
another database you will have to create a new DSN).
For an example of an ODBC client application you can use the well-known Microsoft Excel. You will
only have to select this DSN as a data provider to import the customer data into the spreadsheet.
5. Click the > button in the middle and you should see name, surname, client_id,
and total_amount appear underneath "Columns in your query:"
6. Click on Next (three times) and then Finish.
And voilà! The results from Denodo are populated into the MS Excel spreadsheet!
RESTful Web services apply the ideas of the web to data delivery by providing scalable, flexible and
stateless access to data assets based on well-known protocols and formats like HTTP, HTML, XML
and JSON. In addition to traditional SQL access methods such as JDBC or ODBC, all the
views in Denodo can be accessed using the RESTful interface.
The Denodo RESTful Web service is an HTTP service deployed by default in the
URL http://localhost:9090/denodo-restfulws that exposes resources like databases and views in the
following standard representation formats:
• XML
• JSON
• XHTML (more user-friendly)
This Web service allows Denodo to work inside applications following the REST service architecture
style and provides support for linked data in the enterprise deployment (see next section for more
information about this).
When accessed from a browser, the Denodo RESTful endpoint will look like this:
Representations
We mentioned at the beginning of this section that Denodo supports three representation formats
(XML, JSON and XHTML). You saw in the previous examples the XHTML format using a browser.
How can you get the response in the other formats? The answer is easy, by adding a query
parameter to the URL:
• JSON output: you have to add $format=json. For example,
http://localhost:9090/denodo-restfulws/tutorial/views/client?client_id=C005&$format=json
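If you call the RESTful Web service from a script, the URL can be assembled programmatically. The following sketch builds the address shown above with Python's standard library. The helper name view_url is hypothetical, and the base URL is the default one mentioned in this section; fetching the result (for example with urllib.request) needs a running Denodo server, so the sketch stops at building the URL.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9090/denodo-restfulws"


def view_url(database, view, filters=None, output_format=None):
    """Build the URL of a view, with optional field filters and $format."""
    params = dict(filters or {})
    if output_format:
        params["$format"] = output_format
    url = f"{BASE_URL}/{database}/views/{view}"
    if params:
        # safe="$" keeps the "$" of "$format" readable instead of "%24"
        url += "?" + urlencode(params, safe="$")
    return url


json_url = view_url("tutorial", "client", {"client_id": "C005"}, "json")
print(json_url)
```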
In the previous section, you learned how to issue queries from the RESTful interface of Denodo.
Now you will see how to enable linked data using a new Denodo element: associations.
Associations in Denodo
Associations represent a relationship between elements of two Denodo views. The concept is very
similar to the Primary Key / Foreign Key restrictions in relational databases.
Based on the definition of the associations, the Denodo RESTful Web service will show links that will
allow you to traverse the associations. Let's see how it works with an example, by using the views
created in previous sections of the tutorial.
2. Drag & drop the client and address views involved in the association into the workspace.
3. Link the client_id and client_fid fields that map the association.
4. Go to the Output tab and give a name for the association: client_address.
5. You have to provide names for each of the endpoints (Role name fields).
○ Endpoint 'client': address.
○ Endpoint 'address': belongs_to_client.
Save.
Now it's time to return to the RESTful Web service and get the results of
the client view: http://localhost:9090/denodo-restfulws/tutorial/views/client (see the previous
section for more information about how to query using the RESTful Web Service).
As you can see in the screenshot below, a new column with a link is added to the output table and
the text of the link is the Role name configured in the association for the endpoint. In this example, if
you click on the address link for the customer John Smith, Denodo will follow the association and display
his address.
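The navigation that the association enables can be pictured with a small Python model. This is only a conceptual sketch with made-up rows and hypothetical function names: each role name becomes a function that goes from a row on one side to the matching rows on the other side, using the client_id and client_fid fields linked in the association.

```python
# Hypothetical rows standing in for the client and address views
clients = [{"client_id": "C001", "name": "John", "surname": "Smith"}]
addresses = [
    {"client_fid": "C001", "city": "Palo Alto"},
    {"client_fid": "C002", "city": "Madrid"},
]


def follow_address(client):
    """Role 'address': from a client row to its address rows."""
    return [a for a in addresses if a["client_fid"] == client["client_id"]]


def follow_belongs_to_client(address):
    """Role 'belongs_to_client': from an address row back to its client."""
    return [c for c in clients if c["client_id"] == address["client_fid"]]


print(follow_address(clients[0]))
print(follow_belongs_to_client(addresses[0]))
```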
Data Virtualization software accesses and extracts information from target sources at runtime and
combines them in real-time to get the results. As you know, no local copy of the data will be
available within Denodo.
With this in mind, it is clear that some of the more traditional performance optimization practices
used in database and data warehouse implementations, such as index construction, fall outside
the scope of a real-time Data Virtualization framework. Strategies such as caching, however, can
help to improve the performance of real-time source access and data combination.
The Denodo advanced cache system is based on a relational database (traditional or in-memory
database).
Denodo is an important component of any data management infrastructure, but not the only one.
When measuring performance, it is important to make sure which of the elements are bottlenecks.
For example, a data source might be returning data in a slow fashion; in some cases you will be able
to increase the performance by adding a new index to that source. If these actions cannot be
performed, you can configure an intelligent caching system in Denodo to speed up your queries.
1. Some data sources might be slow and you want to speed up your queries.
2. You want to reduce the workload on the data sources.
3. Pre-computed transformations are done in the Denodo layer, so they do not need to be
recomputed every time.
4. You want to delegate some queries with data coming from several different data sources.
5. Data sources may be temporarily unavailable (especially when they are external sources).
As you already know, Denodo includes a module to store local copies of the data as required. This
cache will use a Relational Database accessible through JDBC protocol (MySQL, Microsoft SQL
Server, Oracle, DB2, Netezza, Oracle TimesTen, etc).
The cache system can be enabled at server level or at database level. For this tutorial we
will configure the cache at server level. Let's see how to configure the Denodo server to use the cache:
1. Log-in using a global administrator user (for example, the default admin).
2. In the Menu Bar, go to Administration > Server configuration and then click Cache.
4. The default Embedded Derby server is the Database adapter that will be used in this tutorial.
In the previous section, you activated the cache module in your Denodo server. Now, you need to
configure your views to make use of the cache.
For example, let's activate the cache in the client_with_bills view. We have two main reasons to
select this view as a cached view:
• It queries two data sources (a MySQL database and a SOAP Web Service).
• Web Service response times are usually worse than those of traditional databases.
Save.
Now test if the cache works as expected by performing the following test:
1. Open Tools > VQL Shell.
2. Execute the following query (make sure you have the tutorial database selected from the drop-
down Database menu):
SELECT * FROM client_with_bills TRACE
3. After the execution, click on the Execution Trace button (above the results) to see the query
execution plan. You can see that the data comes directly from the different data sources.
In the next section, you will learn more about the available cache modes. In the above example,
In the previous section, you activated the cache in one of your views. Now, it's time to learn more
about how the Denodo Cache works. Denodo has the following cache modes:
• Partial:
The first time a query over the view is executed, the cache table will be populated with the
tuples in the output from the data source. At runtime, when a user queries the view, the Denodo
server checks if the cache contains the data required to answer the query. If it does not have
this data, Denodo will query the data source and populate the cache with that output.
When the Time To Live (TTL) of the data has passed, the cache system will invalidate the
cached data of the view so the next query will hit the data source.
This mode supports the following options:
○ With explicit loads: if this option is selected, the cache has to be loaded explicitly.
○ Match exact queries only: if this option is selected, the cache stores the result of each
query. Then, if the same query is executed, and the entries of this query in cache have
not expired (TTL), the data returned to the client is retrieved from the cache.
• Full:
The data of the view is always retrieved from the cache engine instead of from the source. This
mode always requires explicit cache loads.
The main benefit of this mode over the partial cache is that complex operations (joins, unions,
group by…) involving several views (even from different data sources) can be delegated to the
cache database. Therefore, the performance of these operations is significantly improved.
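The behavior of the Partial mode with the "Match exact queries only" option can be illustrated with a small in-memory model. This is a conceptual sketch, not Denodo's implementation: the class and function names are hypothetical, the cache is a Python dictionary instead of a relational database, and the clock is injected so that the expiry of the TTL can be simulated.

```python
import time


class ExactQueryCache:
    """Stores the result of each query text until its Time To Live expires."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch      # function that queries the data source
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}       # query text -> (expiry time, rows)

    def query(self, query):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and entry[0] > now:
            return entry[1], "cache"
        # Missing or expired entry: hit the source and repopulate the cache
        rows = self._fetch(query)
        self._entries[query] = (now + self._ttl, rows)
        return rows, "source"


source_calls = []


def fake_source(query):
    """Stand-in for the real data sources; records every call."""
    source_calls.append(query)
    return [("C001", 35.5)]


now = [0.0]  # simulated clock, in seconds
cache = ExactQueryCache(fake_source, ttl_seconds=60, clock=lambda: now[0])

first = cache.query("SELECT * FROM client_with_bills")   # populates the cache
second = cache.query("SELECT * FROM client_with_bills")  # answered from the cache
now[0] = 61.0                                             # the TTL has passed
third = cache.query("SELECT * FROM client_with_bills")   # hits the source again
print(first[1], second[1], third[1])
```

In Full mode the lookup would never fall back to the source: the cache is filled by an explicit load and every query is answered from it.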
Cache examples
Let's see how the cache works using the following example (make sure you have
the tutorial database selected from the drop-down Database menu):
TIP
In Full mode, you have to explicitly load the cache (this action can be
scheduled):
SELECT * FROM <view> CONTEXT('cache_preload'='true')
Congratulations! You have completed the Denodo Basics Tutorial.
This is just the first step in understanding the Denodo Data Virtualization software. Now, you are
prepared to go even further with the rest of our tutorials and become a Master Data Ninja!