
Best Practices and Sizing Guide for Smart Data Integration
When Used in SAP Data Warehouse Cloud

Updated: June 10, 2022


TABLE OF CONTENTS
THE BASICS
   Components
DEPLOYMENT GUIDELINES
   Data Provisioning Agent Deployment
   Data Source Location
   Source System Operating System
IMPLEMENTATION GUIDELINES
   Partitioning
   Filtering and Projection
   Connection (Remote Source) properties
   Multiple Connections / Remote Sources
OPERATIONAL GUIDELINES
   Suspend replication
   Exceptions
SIZING – DATA PROVISIONING AGENT
   Assumptions
   Categorization of Replication Tables
   Template-Based Sizing Approach
   Additional Insights
SOURCE IMPACT
SAP DATA WAREHOUSE CLOUD / SAP HANA CLOUD IMPACT

This guide is intended for implementers of real-time replication using SAP HANA smart data integration within SAP Data Warehouse Cloud and discusses best practices that help achieve stable and performant replication.

In SAP Data Warehouse Cloud, SAP HANA smart data integration based federation and replication
technology is used when federating or replicating data with Remote Tables. In such a case, a Connection
in SAP Data Warehouse Cloud is established based on an SAP HANA Remote Source that itself uses one of
the available SAP HANA smart data integration Adapters.

The relevant connection types in SAP Data Warehouse Cloud using SAP HANA smart data integration
technology are:

SAP Data Warehouse Cloud Connection Type                              SAP HANA smart data integration Adapter
Cloud Data Integration                                                CloudDataIntegrationAdapter
Generic JDBC                                                          CamelJdbcAdapter
Generic OData                                                         ODataAdapter
Microsoft SQL Server                                                  MssqlLogReaderAdapter
Oracle                                                                OracleLogReaderAdapter
SAP ABAP                                                              ABAPAdapter
SAP BW                                                                ABAPAdapter
SAP ECC                                                               ABAPAdapter
SAP Fieldglass                                                        CloudDataIntegrationAdapter
SAP HANA with Category "On-Premise" using a Data Provisioning Agent   HanaAdapter
SAP Marketing Cloud                                                   CloudDataIntegrationAdapter
SAP S/4HANA Cloud                                                     CloudDataIntegrationAdapter
SAP S/4HANA On-Premise                                                ABAPAdapter
SAP SuccessFactors for Analytical Dashboards                          ODataAdapter

THE BASICS
SAP HANA smart data integration implements a generic framework to connect to remote sources (SAP Data Warehouse Cloud: Connections), browse their metadata, and federate or move data into SAP HANA on-premise or SAP HANA Cloud / SAP Data Warehouse Cloud. It also adds the functions necessary to move change data in near real time.

For browsing metadata, federating queries, and moving data, it extends SAP HANA smart data access. The data provisioning framework provides adapters to connect to remote sources. The adapters provided by SAP are written in C++ or Java:
- The C++ adapters are hosted by the SAP HANA Data Provisioning Server (aka DP server). In SAP Data Warehouse Cloud this applies to the "Generic OData" and "SAP SuccessFactors for Analytical Dashboards" connection types.
- The Java adapters are hosted by the Data Provisioning Agent (aka DP agent), which runs outside of the SAP Data Warehouse Cloud / SAP HANA landscape. In SAP Data Warehouse Cloud, this applies to all other connection types mentioned above. See the relevant documentation on Preparing Data Provisioning Agent Connectivity in SAP Data Warehouse Cloud.

For real-time change data capture (CDC), SAP HANA smart data integration adapters can use different CDC technologies (SAP Data Warehouse Cloud uses only the trigger-based CDC option) and send the changes to the SAP HANA DP server for further processing.

Data provisioning has two steps:

• Materialization (aka initial load), which is a one-time step.
o This corresponds to the "Enable Real-Time Access" option for SAP Data Warehouse Cloud Remote Tables.
• Capture/apply of change data, which is an ongoing step.
o This corresponds to ongoing real-time replication for SAP Data Warehouse Cloud Remote Tables.

The materialization step is handled by the SAP HANA task framework, which executes SQL on virtual tables. The requests to SAP HANA smart data integration adapters are routed automatically via the SAP HANA DP server. The change data is captured and pushed by the adapter to the SAP HANA DP server, where the changes are processed and applied to the target table. The change data capture is expressed using an object called a remote subscription.

In SAP Data Warehouse Cloud, all these objects (SAP HANA virtual table, target table for replication, SAP HANA task, and SAP HANA remote subscription) are generated based on SAP Data Warehouse Cloud Remote Tables and the corresponding user interactions in SAP Data Warehouse Cloud; they are not visible to end users. All available modeling and administration actions are performed from the Remote Table Editor or the Remote Table Monitor.

Components
• Data Provisioning Agent (DPAgent) – Hosts the adapters used to connect to the data source and sends data to SAP HANA.
• Data Provisioning Server (DPServer) – The following are the critical subcomponents:
o Receiver – Receives data from the agent and stores it in persistent storage (disk)
o Distributor – Reads data from the receiver's queue and distributes it to the relevant subscriptions
o Applier – Receives data from the distributor and applies the appropriate DML (insert, update, delete, upsert) to the target
Remote sources (SAP Data Warehouse Cloud: Connections) define the connection to the source; virtual tables and remote subscriptions define the change data that the agent/adapter will send. An instance of receiver, distributor, and applier is started for each remote source. The change data sent by the agent is applied in a transactionally consistent manner, serially and in the order it was received. The change data received and stored in persistent storage by the receiver is tied to the remote source. A conceptual sketch of this pipeline follows.
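
The sketch below models the per-remote-source receiver, distributor, and applier chain described above: changes arrive from the agent, are queued in arrival order, and are applied serially per subscription. All class and field names are illustrative assumptions; the actual DP server internals are not exposed as an API.

```python
# Conceptual sketch only: one receiver -> distributor -> applier chain per
# remote source, applying changes serially in arrival order.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Change:
    subscription: str  # remote subscription this row belongs to
    dml: str           # 'insert' | 'update' | 'delete' | 'upsert'
    row: dict

@dataclass
class RemoteSourcePipeline:
    # Stands in for the receiver's persistent (disk) queue.
    queue: deque = field(default_factory=deque)

    def receive(self, change: Change) -> None:
        """Receiver: persist change data pushed by the agent."""
        self.queue.append(change)

    def drain(self, targets: dict) -> None:
        """Distributor + applier: apply serially, preserving arrival order."""
        while self.queue:
            c = self.queue.popleft()
            targets.setdefault(c.subscription, []).append((c.dml, c.row))

pipe = RemoteSourcePipeline()
pipe.receive(Change("SUB_ORDERS", "insert", {"id": 1}))
targets: dict = {}
pipe.drain(targets)
print(targets)  # {'SUB_ORDERS': [('insert', {'id': 1})]}
```

Because each remote source gets its own pipeline instance and applies serially, adding remote sources is the only way to gain apply parallelism; this motivates the distribution guidelines later in this guide.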

DEPLOYMENT GUIDELINES

Data Provisioning Agent Deployment


This section provides guidelines for deploying the Data Provisioning Agent in your landscape. See also
preparing the Data Provisioning Agent Connectivity in SAP Data Warehouse Cloud.

Data Source Location

We generally recommend that you install the Data Provisioning Agent close to the data sources. If you have
data sources scattered across multiple geographical locations separated by distance and network latency,
you can deploy multiple Data Provisioning Agents. Install at least one in each geographical location.

The best performance is achieved when the agent is installed directly on the same server as the data source. However, various operational and IT policy reasons may prevent you from installing the agent directly on the data source servers. In these situations, we recommend that you install the Data Provisioning Agent on a supported virtual machine close to the source. Installing the agents on the same machine as the SAP HANA server is NOT recommended.

When there is a firewall between the Data Provisioning Agent and the Data Provisioning Server in SAP
HANA, the connection is automatically configured to use JDBC/HTTPS mode. When using JDBC/HTTPS
mode, Gzip compression is used by default.

When the Data Provisioning Agent connects to the SAP HANA server over TCP/IP, compression is not enabled by default, because the network latency is assumed to be negligible due to geographic proximity. If the TCP/IP connection introduces significant network latency, configure compression via the Data Provisioning Agent's command-line configuration tool (in dpagentconfig.ini, set framework.compressData to 3 for TCP connections to enable compression), as sketched below.
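
A minimal sketch of that configuration change follows. The property name and value come from this guide; the assumptions are that dpagentconfig.ini is a flat key=value file and that the install path below is hypothetical. Prefer the agent's own configuration tool where available.

```python
# Sketch: set framework.compressData=3 in dpagentconfig.ini to enable
# compression on TCP connections, per the paragraph above.
from pathlib import Path

def set_property(ini_path: Path, key: str, value: str) -> None:
    lines = ini_path.read_text().splitlines()
    # Replace the key if present, otherwise append it.
    for i, line in enumerate(lines):
        if line.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            break
    else:
        lines.append(f"{key}={value}")
    ini_path.write_text("\n".join(lines) + "\n")

# Hypothetical install path; adjust to your DP Agent installation.
set_property(Path("/usr/sap/dataprovagent/dpagentconfig.ini"),
             "framework.compressData", "3")
```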

Source System Operating System

The Data Provisioning Agent can be installed on various versions of Windows and Linux. The operating
system required for the Data Provisioning Agent may depend on the operating system of the data source
system, where applicable.

For the latest complete information about operating system support for the Data Provisioning Agent and data
sources, refer to the Product Availability Matrix (PAM).

IMPLEMENTATION GUIDELINES

Partitioning
To improve performance for large source objects, consider partitioning the target replication table and the
underlying task of a Remote Table. Task partitioning allows SAP HANA to read, process and commit the
partitioned virtual table input sources in parallel.

For more details, please refer to Creating Partitions for Your Remote Tables.

Filtering and Projection


Within the Remote Table Editor, you can remove unnecessary columns and create filters to reduce the volume of data that is loaded into your remote table.

For more details, please refer to Restrict Remote Table Data Loads.

Connection (Remote Source) properties


For some SAP Data Warehouse Cloud connection types, advanced properties can be set to optimize the performance of real-time replication. This section lists only some important examples. For more details, please refer to Create a Connection.
For SAP Data Warehouse Cloud connections to source databases like SAP HANA, Oracle, and Microsoft SQL Server, knowing the data-change behavior of tables allows fine-tuning of trigger-based real-time (delta) replication:
• If there are only a few primary-key columns and many non-key columns, consider setting "Triggers Record Primary Keys Only" or "Triggers Record ROWID Pseudo Column Only" (Oracle) to true. This can improve the DML performance in the source database significantly. If primary-key values are not updated, as is common for ABAP-managed tables, this can be combined with "Triggers Capture Before and After Images" set to false. If records are deleted from the source and filters are defined on non-key columns in SAP Data Warehouse Cloud, make sure to set "Triggers Record Full Before Images" to true.
• If many records are inserted, updated, and deleted, consider setting "Transmit Data in Compact Mode" and "Enable Transaction Merge" to speed up the transfer of data changes and their application to the target table in SAP Data Warehouse Cloud. Note that this compromises transactional consistency among remote tables.
• If records are inserted, updated, and deleted in bulk, consider setting "Enable Statement-level Triggers" (SAP HANA) or "Compound Triggers" (Oracle).
Since these advanced properties apply to connections, consider grouping remote tables according to their data-change behavior and creating multiple connections for the same source if needed. These recommendations are summarized as data in the sketch below.
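
The mapping below restates the recommendations above as one entry per data-change profile. Property names are quoted from this section; the grouping itself is an illustrative summary, not an SAP API.

```python
# Illustrative summary (not an API): advanced connection properties grouped
# by the data-change profile of the remote tables assigned to a connection.
PROFILE_SETTINGS = {
    "few key columns, many non-key columns": {
        "Triggers Record Primary Keys Only": True,          # or the ROWID variant on Oracle
        "Triggers Capture Before and After Images": False,  # only if key values are never updated
        "Triggers Record Full Before Images": True,         # only if deletes must satisfy non-key filters
    },
    "high churn (many inserts, updates, deletes)": {
        "Transmit Data in Compact Mode": True,
        "Enable Transaction Merge": True,  # trades cross-table transactional consistency for speed
    },
    "bulk DML": {
        "Enable Statement-level Triggers": True,  # SAP HANA; use "Compound Triggers" on Oracle
    },
}
```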

Multiple Connections / Remote Sources


We recommend using multiple connections (remote sources) to the same source so that the real-time replication load of Remote Tables with larger data volumes is distributed across those connections (remote sources). For more details, see the section "Categorization of Replication Tables" and specifically the "SAP Data Warehouse Cloud / SAP HANA Cloud target (for replication only)" information in the "Template-Based Sizing Approach".

Think of a remote source as a pipe. The Data Provisioning Agent pushes change data into this pipe, and the server then applies the data serially, in the order it was received, and in a transactionally consistent manner. With more remote sources, you get better parallelism and faster replication.

Source tables should then be selected for replication from one of the remote sources, making sure that a source table is not replicated in another remote source. In SAP Data Warehouse Cloud, this means using several connections pointing to the same source, with each source entity being replicated by one Remote Table in one (and only one) of these connections.

Consider the following criteria for deciding how to distribute the intended set of source tables across the remote sources:
• Size of transactions: large or small transactions depending on the amount of change data per transaction. Large transactions are more efficient to replicate, since such transactions can be batched up for the apply process on the target.
• Type of change data: insert, update, delete, upsert. Consecutive inserts/upserts or consecutive deletes can be batched up for the applier process and are therefore faster. Updated rows require a two-step process, first deleting the old row and then inserting the new row, and are hence much slower.
• Row size: the overall size of each row in the table.
• LOBs: tables containing LOB columns typically require multiple trips to the source to retrieve the LOB data. This overhead slows down replication.
• Criticality of a set of tables: group critical tables into one remote source.

Note that the full table size does not matter for real-time replication. The main consideration for performance is the volume of (unfiltered) changed data per day and the type of changes.

Guidelines for distribution of tables across remote sources

• Keep the change volume per remote source below 75-100 million rows per day (80% insert/upsert, 10% delete, 10% update rows for midsized transactions).
• Put tables that have updates or contain LOBs into separate remote sources so that they do not severely slow down replication for the other remote sources.

A sketch that applies these guidelines follows.
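
The sketch below is one possible way to mechanize these two guidelines: isolate update- or LOB-heavy tables, then bin the rest first-fit-decreasing under the daily change budget. It is an illustration under stated assumptions (table names and the chosen budget value), not an SAP tool.

```python
# Sketch: distribute tables across remote sources per the guidelines above.
from dataclasses import dataclass

@dataclass
class Table:
    name: str
    daily_changes: int          # unfiltered changed rows per day
    update_heavy: bool = False
    has_lobs: bool = False

MAX_DAILY_CHANGES = 75_000_000  # lower end of the 75-100 million/day guideline

def distribute(tables: list[Table]) -> list[list[Table]]:
    # Guideline 2: isolate update- or LOB-heavy tables in their own remote sources.
    isolated = [[t] for t in tables if t.update_heavy or t.has_lobs]
    regular: list[list[Table]] = []
    loads: list[int] = []
    rest = sorted((t for t in tables if not (t.update_heavy or t.has_lobs)),
                  key=lambda t: t.daily_changes, reverse=True)
    for t in rest:
        # Guideline 1, first-fit decreasing: reuse a source while under budget.
        for i, load in enumerate(loads):
            if load + t.daily_changes <= MAX_DAILY_CHANGES:
                regular[i].append(t)
                loads[i] += t.daily_changes
                break
        else:
            regular.append([t])
            loads.append(t.daily_changes)
    return regular + isolated

# Hypothetical figures: 1,000,000 changes/hour is ~24M/day.
sources = distribute([Table("BBBB", 24_000_000),
                      Table("EEEE", 4_800_000, has_lobs=True)])
print([[t.name for t in s] for s in sources])  # [['BBBB'], ['EEEE']]
```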

OPERATIONAL GUIDELINES

Suspend replication
For planned downtimes/maintenance, we recommend suspending replication prior to the maintenance window and then resuming replication for normal operations.

In SAP Data Warehouse Cloud, you can perform this when Pausing and Restarting Real-Time Replication.

Exceptions
Real-time replication exceptions for Remote Tables are displayed on connection level in the Remote Table Monitor in SAP Data Warehouse Cloud. They need to be processed with a "Retry" action that replays the failed transaction. It is important to process exceptions as soon as possible.

Any replication exception causes the replication to stop for all subscriptions attached to that connection
(remote source). For trigger-based replication as used in SAP Data Warehouse Cloud, replication stoppage
means that the change data keeps accumulating in the source system.

In SAP Data Warehouse Cloud, you can perform this by Resuming Real-Time Replication After a Fail.

SIZING – DATA PROVISIONING AGENT

Assumptions
The SAP HANA smart data integration Data Provisioning Agent sizing approach for initial load and real-time
replication focuses on the simplest use case where data from one source SAP HANA system is replicated to
a single SAP HANA target system via the SAP HANA adapter without any complex data transformation.
Other variants, such as replicating data from multiple different source systems, utilizing similar log-based adapters, or loading to multiple SAP HANA targets, can be calculated based on this sizing information. You can therefore extrapolate the requirements of the single SAP HANA smart data integration configuration to calculate the overall expected capacity as described in this document. More advanced variants, such as adapters that require more complex processing, may require additional information to determine proper sizing.

Categorization of Replication Tables
As input for sizing SAP HANA smart data integration for an SAP HANA scenario, you need to analyze the tables that will be federated and/or replicated and classify them into categories. Determine the following information for all tables (or at least for the most frequently modified, i.e. inserted, updated, and deleted, tables):
a) The weighted average number of table columns (one value)
b) The weighted average record length (one value)
Based on the analysis, determine the appropriate category for the volume of data: small (S), medium (M), large (L), or extra-large (XL).

                          Up to 150 columns   151 to 250 columns   More than 250 columns
< 1500 bytes per record   S                   M                    L
> 1500 bytes per record   M                   L                    XL
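
The matrix transcribes directly into a small helper, shown below as a sketch. The guide does not specify which row a record length of exactly 1500 bytes falls into, so treating it as the "< 1500" row is an assumption here.

```python
# Direct transcription of the category matrix above.
def categorize(columns: int, bytes_per_record: float) -> str:
    if columns <= 150:
        col_band = 0
    elif columns <= 250:
        col_band = 1
    else:
        col_band = 2
    small_record = bytes_per_record <= 1500  # boundary treatment is an assumption
    return (["S", "M", "L"] if small_record else ["M", "L", "XL"])[col_band]

print(categorize(111, 1350))  # 'S', matching table BBBB in the example below
```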

Example: weighted categorization of replication-relevant tables in the relevant system, based on their characteristics and modification rate:

Table Name   # Columns   Length   Category   Modification rate (per hour)   % of all modifications
AAAA         41          510      S          50,000                         3.5%
BBBB         111         1350     S          1,000,000                      66%
CCCC         136         2130     M          50,000                         3.5%
DDDD         180         2170     L          200,000                        13.5%
EEEE         312         3250     XL         200,000                        13.5%
Total                                        1,500,000                      100%

Weighted avg. number of columns (WC): 146
Weighted avg. bytes per record (WL): 1715
Weighted category: M

Weighted average number of columns =
(3.5*41 + 66*111 + 3.5*136 + 13.5*180 + 13.5*312) / 100 = (143.5 + 7326 + 476 + 2430 + 4212) / 100 = 14587.5 / 100 = ~146 columns

Weighted average number of bytes per record =
(3.5*510 + 66*1350 + 3.5*2130 + 13.5*2170 + 13.5*3250) / 100 = (1785 + 89100 + 7455 + 29295 + 43875) / 100 = 171510 / 100 = ~1715 bytes
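
The worked example above can be reproduced in a few lines, reusing categorize() from the earlier sketch. All figures come straight from the example table; only the tuple layout is an assumption.

```python
# Reproduces the weighted-average calculation from the example above.
tables = [  # (columns, bytes_per_record, share_of_all_modifications_percent)
    (41,  510,  3.5),   # AAAA
    (111, 1350, 66.0),  # BBBB
    (136, 2130, 3.5),   # CCCC
    (180, 2170, 13.5),  # DDDD
    (312, 3250, 13.5),  # EEEE
]
wc = sum(c * p for c, _, p in tables) / 100  # weighted avg. columns, ~146
wl = sum(b * p for _, b, p in tables) / 100  # weighted avg. bytes, ~1715
print(round(wc), round(wl), categorize(round(wc), wl))  # 146 1715 M
```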

Template-Based Sizing Approach

To provide a sizing estimate, SAP offers a simplified approach with three different SAP HANA smart data integration scenarios:

SMALL

Use Case: A small scenario with:
• One source system
• Up to 40 tables
• A weighted table size category of S-M
• Initial load of tables balanced based on SAP HANA target capacity
• Modification rate less than 1,500,000/hour
The example above fits here.

DPAgent Server:
• Hardware: 8-16 CPU cores, 16-32 GB of main memory, 2-3x disk space based on main memory
• DPAgent ini updates: increase Xmx to 8192m or higher; increase Xms to the same or a similar number
• Ensure 6-8 GB of free RAM availability for the OS and JVM variations*, above and beyond the Xmx setting

SAP Data Warehouse Cloud / SAP HANA Cloud target (for replication only):
• Single remote source / connection
• ~1 additional CPU core
• <1 GB memory (not including memory growth over time as data volume increases)

MEDIUM

Use Case: A midrange scenario with:
• Approximately 1-3 different source systems
• And/or up to 100 tables in total
• A weighted table size category of M-L
• Initial load of tables done sequentially across sources and balanced based on SAP HANA target capacity
• Modification rate less than 5,000,000/hour

DPAgent Server:
• Hardware: 16-32 CPU cores, 32-64 GB of main memory, 2-3x disk space based on main memory
• DPAgent ini updates: increase Xmx to 16384m or higher; increase Xms to the same or a similar number
• Ensure 8-12 GB of free RAM availability for the OS and JVM variations*, above and beyond the Xmx setting

SAP Data Warehouse Cloud / SAP HANA Cloud target (for replication only):
• Separate remote source(s) / connections for tables with a high-volume modification rate
• ~2-4 additional CPU cores
• 1-2 GB memory (not including memory growth over time as data volume increases)

LARGE

Use Case: An upper mid-range scenario with:
• Up to 6 different source systems
• And/or up to 300 tables in total
• A weighted table size category of M-XL
• Initial load of tables done sequentially across sources and balanced based on SAP HANA target capacity
• Modification rate less than 10,000,000/hour

DPAgent Server:
• Hardware: 32-64 CPU cores, 64-96 GB of main memory, 2-3x disk space based on main memory
• DPAgent ini updates: increase Xmx to 32768m or up to 65536m; increase Xms to the same or a similar number
• Ensure 12-24 GB of free RAM availability for the OS and JVM variations*, above and beyond the Xmx setting

SAP Data Warehouse Cloud / SAP HANA Cloud target (for replication only):
• Separate remote source(s) / connections for tables with a high-volume modification rate
• ~4-8 additional CPU cores
• 2-4 GB memory (not including memory growth over time as data volume increases)
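
As a small illustration, the heap guidance in the templates above translates into JVM options roughly as follows. The baseline Xmx values come from the templates; setting Xms equal to Xmx follows the "same or similar number" advice; the flag syntax is standard JVM, not SAP-specific, and templates only give lower bounds.

```python
# Sketch: baseline DPAgent JVM heap options per sizing template.
BASELINE_XMX = {"SMALL": "8192m", "MEDIUM": "16384m", "LARGE": "32768m"}

def dpagent_heap_options(template: str) -> list[str]:
    xmx = BASELINE_XMX[template.upper()]
    return [f"-Xmx{xmx}", f"-Xms{xmx}"]  # Xms set equal to Xmx per the guide

print(dpagent_heap_options("medium"))  # ['-Xmx16384m', '-Xms16384m']
```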

Additional Insights
Initial load sizing is largely dependent on the SAP Data Warehouse Cloud / SAP HANA Cloud target capacity. SAP Note 2688382 provides additional information for an SAP HANA on-premise context, certain aspects of which can serve as useful background when configuring your SAP Data Warehouse Cloud tenant.

As noted in the template section above, for source tables with a high-volume modification rate (>5M/Hr), it is
recommended to create separate connections / remote sources to ensure near real-time data replication.

If your use case exceeds the parameters in the Large scenario above, either by number of source systems,
number of tables, number of tables to federate in parallel or modification rate volume, it may be necessary to
deploy multiple DPAgent instances, replicating the sizing guidelines referenced above for each DPAgent.

These sizing configurations are based upon the default DPAgent configuration settings. If modifications are
made to the configuration settings, the sizing considerations within this document could be impacted.

* Java processes can consume more than the Xmx setting for a variety of reasons. The JVM overhead can range from just a few percent to over one hundred percent above the Xmx setting. We do not recommend setting the Xmx value larger than 128 GB.

SOURCE IMPACT
This chapter will provide additional input on the impact that SAP HANA smart data integration based Remote
Table Replication will have on the source system. Compared to data federation in SAP Data Warehouse
Cloud or frequent data snapshots, real-time replication requires only a few additional resources.

When you enable real-time access for a remote table in SAP Data Warehouse Cloud, the resource
consumption for the initial load is like a snapshot load and the impact can be mitigated by the same
measures: partitioning, filtering, and projection. During the actual real-time replication, the resource
consumption depends on the data-change volume and the change-data-capture technique. SAP Data
Warehouse Cloud makes use of trigger-based replication through database adapters and API-based
replication for ABAP-based source systems:
• Triggers update generated shadow tables and a so-called "trigger queue" in the source database. Execution of these triggers slows down booking transactions. This impact can be mitigated by fine-tuning the advanced connection properties in SAP Data Warehouse Cloud to the update behavior in the source database (see the chapter "Connection (Remote Source) properties"). Multiple connections to the same source database can also help to ensure that data changes are fetched fast enough to avoid congestion.
• In ABAP-based source systems, the ODP API uses a set of techniques to capture data changes: ABAP background jobs that extract data changes through a so-called "extractor", database triggers, or similar hooks into booking transactions. Data changes are compressed and queued in a so-called "delta queue", where they are kept for a retention time of 24 hours after being fetched. For further details, refer to Operational Data Provisioning.

For both trigger-based replication and API-based replication, creating a dedicated user for replicating data
into SAP Data Warehouse Cloud helps to control the resource consumption in the source system.

SAP DATA WAREHOUSE CLOUD / SAP HANA CLOUD IMPACT

Please find all relevant information for calculating the number of capacity units required and for
configuring the size of your SAP Data Warehouse Cloud Tenant in our community:
https://community.sap.com/topics/data-warehouse-cloud

This chapter will provide additional input on the impact that SAP HANA smart data integration based Remote
Table Replication will have on capacity required for your SAP Data Warehouse Cloud (SAP HANA Cloud)
tenant.

In addition to guidance provided under the above link, SAP HANA smart data integration replication has
additional overhead. We recommend sizing your compute resources (memory) such that after all planned
source tables are replicated, there is at least 40% available free memory of the Global Allocation Limit (GAL).
Due consideration should be made for future data growth.

The real-time portion of replication is not expected to utilize large amounts of memory unless extremely large transactions are the dominant type of transaction. The main consideration for the real-time portion is the amount of disk space available, since pending unapplied transactions are stored temporarily in the persistent store.

During the initial load of a Remote Table, the additional memory utilized is at least 3 times the raw uncompressed data size of the partition, as estimated in the sketch below. Another factor that influences memory utilization is the partitioning that may be defined for the table.
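
The back-of-envelope sketch below applies that 3x rule. Approximating the raw uncompressed size as rows times average record length is an assumption of this sketch, not a figure from the guide.

```python
# Lower-bound estimate of extra memory during the initial load of one
# partition, per the "at least 3x raw uncompressed size" guideline above.
def initial_load_memory_floor_gb(rows: int, avg_record_bytes: int,
                                 partitions: int = 1) -> float:
    raw_partition_bytes = rows * avg_record_bytes / partitions  # assumption
    return 3 * raw_partition_bytes / 1024**3

# e.g. 100M rows at ~1715 bytes/record, loaded in 4 partitions:
print(f"{initial_load_memory_floor_gb(100_000_000, 1715, 4):.0f} GB per partition")
```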
The following are observed memory utilizations for the typical SAP tables MARM (30 columns), MBEW (110 columns), and EKPO (345 columns):

MARM (30 columns):
#Partitions   FetchSize   DPAgent Memory (MB)   DPServer Memory (MB)   Indexserver Memory (MB)
1             10,000      12785                 113                    50267
2             10,000      55889                 1298                   53320
4             10,000      66090                 1129                   61305
8             10,000      69689                 692                    47494

MBEW (110 columns):
#Partitions   FetchSize   DPAgent Memory (MB)   DPServer Memory (MB)   Indexserver Memory (MB)
1             10,000      17106                 1725                   120266
2             10,000      113866                520                    151722
4             10,000      93860                 1784                   179093
8             10,000      84819                 897                    178194

EKPO (345 columns):
#Partitions   FetchSize   DPAgent Memory (MB)   DPServer Memory (MB)   Indexserver Memory (MB)
1             10,000      17200                 421                    600022
2             10,000      117325                1277                   439042
4             10,000      141287                1430                   447224
8             10,000      148981                1870                   420313

www.sap.com/contactsap

© 2022 SAP SE or an SAP affiliate company. All rights reserved.


No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.

The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.
National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable
for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements
accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality
mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platform directions and functionality are
all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation
to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are
cautioned not to place undue reliance on these forward-looking statements, and they should not be relied upon in making purchasing decisions.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other
countries. All other product and service names mentioned are the trademarks of their respective companies. See www.sap.com/trademark for additional trademark information and notices.
