
High Performance Computing in CST Studio Suite: Felix Wolfheimer


High Performance Computing in CST STUDIO SUITE
Felix Wolfheimer

CST – COMPUTER SIMULATION TECHNOLOGY | www.cst.com


GPU Computing Performance
GPU computing performance has been improved for CST STUDIO SUITE 2014 as CPU and GPU resources are used in parallel.

[Figure: Speedup of Solver Loop. Speedup (0 to 18) versus number of GPUs (Tesla K40, 0 to 4) for CST STUDIO SUITE 2013 and CST STUDIO SUITE 2014, with the CPU and GPU contributions marked.]

Promo offer for EUC participants: 25% discount for K40 cards

Benchmark performed on a system equipped with dual Xeon E5-2630 v2 (Ivy Bridge EP) processors and four Tesla K40 cards. The model has 80 million mesh cells.



Typical GPU System Configurations
Entry level: Workstation with 1 GPU card
• Available "off the shelf"
• Good acceleration for smaller models
• Limited model size (depends on available GPU memory and features used)

Professional level: Workstation/server with multiple internal or external GPU cards
• Many configurations available
• Good acceleration for medium size and large models
• Limited model size (depends on available GPU memory and features used)

Enterprise level: Cluster system with high-speed interconnect
• High flexibility: Can handle extremely large models using MPI Computing and also a lot of parallel simulation tasks using Distributed Computing (DC)
• Administrative overhead
• Higher price

CST engineers are available to discuss with you which configuration makes sense for your applications and usage scenario.



MPI Computing — Area of Application
MPI Computing is a way to handle very large models efficiently.
Some application examples for MPI Computing:
• Electrically very large structures (e.g. RCS calculation, lightning strike)
• Extremely complex structures (e.g. SI simulation for a full package)

MPI Computing — Working Principle
[Figure: The CST STUDIO SUITE® Frontend connects to the MPI Client Nodes, which are linked by a high speed/low latency interconnection network (optional). The domain decomposition, with its subdomain boundaries, is shown in mesh view.]

• Based on a domain decomposition of the simulation domain.
• Each cluster computer works on its part of the domain.
• Automatic load balancing ensures an equal distribution of the workload.
• It works cross-platform on Windows and Linux systems.
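
The working principle listed above can be illustrated with a minimal sketch of a one-dimensional domain decomposition. This is not CST's algorithm: the function name `decompose`, the per-node `speeds` weights, and the slab-wise split along a single axis are assumptions chosen only to show how a workload can be divided so that each node gets a share matching its capacity.

```python
def decompose(n_cells, speeds):
    """Split n_cells mesh layers into contiguous slabs, one per node.

    Hypothetical helper: each node receives a share proportional to its
    relative speed, so that all nodes need about the same time per step
    (a very simple form of load balancing).
    """
    total = sum(speeds)
    bounds = [0]
    acc = 0.0
    for s in speeds:
        acc += s
        # Cumulative rounding keeps the slabs contiguous and gap-free.
        bounds.append(round(n_cells * acc / total))
    return [(bounds[i], bounds[i + 1]) for i in range(len(speeds))]


# Three nodes, the third assumed to be twice as fast as the others.
slabs = decompose(1000, [1.0, 1.0, 2.0])
print(slabs)
```

The interior slab limits play the role of the subdomain boundaries: neighbouring nodes would have to exchange field values across them in every step, which is why a low latency interconnect helps.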
MPI Matrix Computation
The performance of the matrix computation step has been improved significantly in version 2014 of CST STUDIO SUITE.

Performance Results (for two cluster nodes):*

Model        Matrix Comp.    Matrix Comp.    Speedup            Speedup
             Time/s (2013)   Time/s (2014)   (Matrix Comp.)**   (Total Sim.)**
340M cells   10,301          1,217           8.46               2.63
47M cells    12,921          4,018           3.22               1.85

Matrix computation is single-threaded in case of MPI up to version 2013. Version 2014 uses all available CPU cores on all cluster nodes.

* System configuration: Compute nodes are equipped with dual eight-core Xeon E5-2650 processors, 4x K20 GPUs, and InfiniBand FDR interconnect.
** Speedup between versions 2013 and 2014 of CST STUDIO SUITE.
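
As a quick check of the arithmetic, the matrix computation speedups follow directly from the 2013 and 2014 times listed above. The sketch below only divides the two times; the dictionary layout and the function name `speedup` are illustrative choices, and the pairing of each time with a model size follows the order in which the values appear on the slide.

```python
# Matrix computation times in seconds: (version 2013, version 2014).
times = {
    "340M cells": (10301, 1217),
    "47M cells": (12921, 4018),
}


def speedup(t_old, t_new):
    """Speedup of the new version relative to the old one."""
    return t_old / t_new


matrix_speedup = {model: round(speedup(*t), 2) for model, t in times.items()}
print(matrix_speedup)
```

The total simulation speedup is smaller than the matrix computation speedup because the matrix computation is only one part of the overall run.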



MPI Calculation Example
2 GHz blade antenna positioned on aircraft

Frequency: 2 GHz
Size: 17.4 x 4.5 x 16.2 m
Electrical size: 116 x 30 x 108 λ
Volume: 375,840 λ³
Mesh: 660 million cells

4 node MPI cluster
4 Tesla K20 GPUs on each node
Total of 16 GPUs with 6 GB RAM each, at 60% memory usage
Total memory: < 100 GB
Broadband calculation time: ~ 4 h
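
The electrical size quoted above can be reproduced from the physical dimensions and the frequency. The following sketch assumes free-space wavelength; the helper name `size_in_wavelengths` is hypothetical, and the last line is only a rough average that pretends the cells are spread uniformly, which the actual mesh need not be.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s


def size_in_wavelengths(dims_m, freq_hz):
    """Convert physical dimensions in metres to free-space wavelengths."""
    lam = C0 / freq_hz
    return [d / lam for d in dims_m]


dims_lambda = [round(x) for x in size_in_wavelengths((17.4, 4.5, 16.2), 2e9)]
volume_lambda3 = dims_lambda[0] * dims_lambda[1] * dims_lambda[2]

# Average number of cells per wavelength if the 660 million cells were
# distributed uniformly over the volume (an assumption for illustration).
cells_per_lambda = (660e6 / volume_lambda3) ** (1 / 3)

print(dims_lambda, volume_lambda3, round(cells_per_lambda, 1))
```

At 2 GHz the wavelength is about 0.15 m, so each metre of aircraft corresponds to roughly 6.7 wavelengths.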
Sub-Volume Monitors
Sub-volume monitors record field data only in a region of interest, which reduces the amount of data. This is especially important for large models which have hundreds of millions of mesh cells.

[Figure: Field data is only stored in the sub-volume defined by the box.]
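
To give a feeling for the data reduction, the sketch below compares the size of one field snapshot on a full grid with that on a small box. All numbers are hypothetical: the grid dimensions, the box size, and the storage format (three field components per cell, 4 bytes each) are assumptions for illustration, not values taken from CST STUDIO SUITE.

```python
def monitor_bytes(nx, ny, nz, components=3, bytes_per_value=4):
    """Rough size in bytes of one field snapshot on an nx x ny x nz block.

    Assumed for illustration: three field components per cell, each
    stored as a 4-byte single-precision value.
    """
    return nx * ny * nz * components * bytes_per_value


full = monitor_bytes(1000, 600, 1100)  # hypothetical grid, 660 million cells
sub = monitor_bytes(100, 100, 100)     # hypothetical region of interest
reduction = full / sub

print(f"full: {full / 1e9:.2f} GB, sub-volume: {sub / 1e6:.0f} MB, factor {reduction:.0f}")
```

The saving scales with the ratio of the cell counts, so it applies to every recorded frequency or time sample alike.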



Distributed Computing
[Figure: CST STUDIO SUITE® Frontend, DC Main Controller, and DC Solver Servers, linked by "connects to" arrows.]

“Jobs” could be:
• port excitations*
• frequency points*
• parameter variations
• optimization iterations
* 2 in parallel included with standard license



• Model has 16 ports
• Only 8 ports need to be computed if defining symmetry conditions
• Distribute the 8 simulation runs to different solver servers with GPU acceleration
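
The example above can be sketched as a simple job distribution. How the DC Main Controller actually schedules jobs is not described here, so the round-robin assignment, the function name `distribute`, and the assumption that all runs take equally long are illustrative only.

```python
def distribute(jobs, n_servers):
    """Assign jobs to solver servers in round-robin order."""
    queues = [[] for _ in range(n_servers)]
    for i, job in enumerate(jobs):
        queues[i % n_servers].append(job)
    return queues


# The 8 port excitations that remain after applying symmetry conditions.
ports = [f"port{p}" for p in range(1, 9)]

queues = distribute(ports, 4)
# Number of sequential runs on the busiest server.
rounds = max(len(q) for q in queues)

print(queues, rounds)
```

With four solver servers each one handles two excitations, so under the equal-duration assumption the eight runs take about as long as two runs on a single server.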
DC Simulation Time Improvement
[Figure: Speedup (total time), 0 to 30, versus number of DC Solver Servers (1, 2, 4, 8), for CPU and for 1 GPU (Tesla 20) per node.]

Dual Intel Xeon X5675 CPUs (3.06 GHz), fastest memory configuration, 1 Tesla 20 GPU per node, 1 Gb Ethernet interconnect, 40 million mesh cells

DC Main Controller
The DC Main Controller gives you a complete overview of what is happening on your cluster.

[Figure: DC Main Controller window with Job Status and Machine Status panels.]

Essential resources (RAM usage and disk space) are monitored as well in the 2014 version.



GPU Assignment
Users who have smaller jobs can start multiple solver servers and assign each GPU to a separate server. This allows for a more efficient use of multi-GPU hardware.
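
The general idea of giving each process its own GPU can be sketched outside of CST STUDIO SUITE. The snippet uses `CUDA_VISIBLE_DEVICES`, the standard CUDA environment variable that restricts which devices a process can see. Whether and how CST solver servers use this variable is not stated in these slides, so treat this as a generic illustration and use the GPU assignment settings of the product itself in practice. The function name `server_environments` is hypothetical.

```python
import os


def server_environments(n_gpus):
    """Build one environment per solver process, each seeing a single GPU.

    Assumption for this sketch: each process is restricted to one device
    through the CUDA_VISIBLE_DEVICES environment variable.
    """
    envs = []
    for gpu_id in range(n_gpus):
        env = dict(os.environ)  # copy, so the parent environment is untouched
        env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
        envs.append(env)
    return envs


envs = server_environments(4)
visible = [e["CUDA_VISIBLE_DEVICES"] for e in envs]
print(visible)
```

Each environment could then be passed to a process launcher such as `subprocess.Popen(..., env=env)`, so that four small jobs run side by side instead of one job occupying all four cards.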



Supported Acceleration Methods
Acceleration methods supported by the solvers of CST STUDIO SUITE.
[Table: Solver versus Multithreading, GPU Computing, Distributed Computing, and MPI Computing. The entries are not preserved; one entry carries the note "on one GPU card".]

Most other solvers support Multithreading and Distributed Computing for parameter sweeps and optimization.

Choose the Right Acceleration Method
Solver                        Model Size                            Number of Simulations  Acceleration Technique
Transient                     below memory limit of GPU hardware    low                    GPU Computing
Transient                     below memory limit of GPU hardware    medium/high            GPU Computing on a DC Cluster (Distributed Excitations)
Transient                     above memory limit of GPU hardware    -                      MPI or combined MPI+GPU Computing
Frequency Domain              can be handled by a single machine    medium/high            Distributed Computing (Distributed Frequency Points)
Integral Equation             can't be handled by a single machine  -                      MPI Computing
Integral Equation             can be handled by a single machine    medium/high            Distributed Computing (Distributed Frequency Points)
Parameter Sweep/Optimization  n/a                                   medium/high            Distributed Computing
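
The selection rules above can be written down as a small lookup function. This is only a transcription of the table into code: the function name `choose_acceleration`, its argument names, and the short solver keys are hypothetical, and combinations the table does not list return `None` rather than a guess.

```python
def choose_acceleration(solver, fits, many_simulations):
    """Look up the recommended technique from the table above.

    solver: "transient", "frequency domain", "integral equation",
            or "sweep" (parameter sweep/optimization)
    fits:   True if the model is below the GPU memory limit (transient)
            or can be handled by a single machine (other solvers)
    many_simulations: True for a medium/high number of simulations
    Returns None for combinations the table does not list.
    """
    if solver == "transient":
        if not fits:
            return "MPI or combined MPI+GPU Computing"
        if many_simulations:
            return "GPU Computing on a DC Cluster (Distributed Excitations)"
        return "GPU Computing"
    if solver == "integral equation" and not fits:
        return "MPI Computing"
    if solver in ("frequency domain", "integral equation") and fits and many_simulations:
        return "Distributed Computing (Distributed Frequency Points)"
    if solver == "sweep" and many_simulations:
        return "Distributed Computing"
    return None


print(choose_acceleration("transient", fits=True, many_simulations=True))
```

For example, a transient model that fits into GPU memory but has many port excitations maps to GPU Computing on a DC cluster, while the same solver with a model above the GPU memory limit maps to MPI.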



HPC in the Cloud
CST is working together with HPC hardware and service providers to enable easy access to large computing power for challenging simulations which can't be run on in-house hardware.
Users rent a CST license for the resources they need and pay the HPC provider for the required hardware.

[Figure: CST license + HPC system provider]

Currently supported providers hosting CST STUDIO SUITE: [provider logos not preserved]

More information can be found in the HPC section of our website:
https://www.cst.com/Products/HPC/Cloud-Computing

HPC Hardware Design Process
A general hardware recommendation is available on our website which helps you to configure standard systems (e.g. workstations) for CST STUDIO SUITE.
For HPC systems (multi-GPU systems, clusters) our hardware experts are available to guide you through the whole process of system design and benchmarking to ensure that your new system is compatible with CST STUDIO SUITE and delivers the expected performance.

HPC System Design Process
1. Personal contact with CST engineers to design solution.
2. Benchmarking of designed computing solution in the hardware test center of the preferred vendor.
3. Buy the machine if it fulfills your expectations.
