SGI Technology Guide for CD-adapco STAR-CCM+ Analysts: March, 2014
Abstract
STAR-CCM+ is a process-oriented CAE software package used to solve multi-disciplinary problems within a single integrated environment. STAR-CCM+ is designed to utilize hardware resources as effectively as possible, independent of physical location and local computer resources. In recent years, CFD software in general has been strongly affected by changes in its compute environment. These changes were driven by hardware and software features such as the introduction of cores and sockets as processor components, along with memory speed, I/O sub-systems, interconnect fabrics, and communications software such as InfiniBand and MPI.
This SGI technology guide provides an analysis of the parallel performance of two STAR-CCM+ solvers, namely the Segregated and the Coupled solvers, using two Intel x86-64 architectural features, Intel Hyper-Threading Technology and Intel Turbo Boost Technology, on SGI computer systems running the Intel Xeon processor E5-2600/E5-4600 product families (code-named Sandy Bridge) and the Intel Xeon processor E5-2600 v2 product family (code-named Ivy Bridge). This work is based on SGI hardware architectures, specifically the SGI ICE X system and the SGI Rackable Standard Depth C2112-4RP4 cluster solutions, as well as the SGI UV 2000 shared memory system. The main objective is to provide STAR-CCM+ users with a qualitative understanding of the benefits gained from these two features when executing on SGI hardware platforms.
1.1
Figure 1: Overhead View of SGI Rackable Server with the Top Cover Removed
1.2
1.3 SGI UV 2000
The SGI UV 2000 is a scalable, cache-coherent shared memory architecture. The SGI UV 2 product family can scale a single system image (SSI) to a maximum of 2,048 cores (4,096 threads) thanks to its SGI NUMAflex, blade-based architecture. The SGI UV 2 supports the Intel Xeon processor E5-4600 and the latest Intel Xeon processor E5-4600 v2 product family. The system can run unmodified versions of Linux such as SUSE Linux Enterprise Server and Red Hat Enterprise Linux. The SGI UV also supports scalable graphics accelerator cards, including NVIDIA Quadro, the NVIDIA Tesla K20 GPU computing accelerator, and Intel Xeon Phi. Job memory is allocated independently of core allocation, for maximum flexibility in multi-user, heterogeneous workload environments. Whereas on a cluster problems have to be decomposed and many nodes must be available, the SGI UV can run a large-memory problem on any number of cores, subject only to application license availability, with far less concern that the job will be killed for lack of memory resources (Fig. 3).
Figure 3: SGI UV 2000 with door open
1.4 Other Options
2.0 STAR-CCM+ Overview
STAR-CCM+ includes an extensive range of validated physical models that provide the user with a toolset capable of tackling the most complex multi-disciplinary engineering problems. The software is deployed as a client, which handles the user interface and visualization, and a server, which performs the compute operations. The client/server approach is designed to facilitate easy collaboration across organizations; simulations can be accessed independently of physical location and local computer resources. STAR-CCM+ recently
became the first commercial Computational Fluid Dynamics (CFD) package to mesh and solve a problem with over one billion cells. Much more than just a CFD solver, STAR-CCM+ is an entire engineering process for solving problems involving flow (of fluids or solids), heat transfer, and stress. It provides a suite of integrated components that combine to produce a powerful package that can address a wide variety of modeling needs. These components are:
3D-CAD Modeler
The STAR-CCM+ 3D-CAD modeler is a feature-based parametric solid modeler within STAR-CCM+ that allows geometry to be built from scratch. The 3D-CAD models can subsequently be converted to geometry parts for meshing and solving. A major feature of 3D-CAD is design parameters, which allow you to modify models from outside the 3D-CAD environment. These allow you to solve for a particular geometry, change the size of one or more components, and quickly rerun the case.
CAD Embedding
STAR-CCM+ simulations can be set up, run and post-processed from within popular CAD and PLM environments such as SolidWorks, CATIA V5, Pro/ENGINEER, SpaceClaim, and NX. STAR-CCM+'s unique approach gets you from CAD model to an accurate CFD solution quickly and reliably. CFD results are linked directly to the CAD geometry (a process called associativity). The STAR-CCM+ CAD clients have bi-directional associativity, so that geometry transferred across may be modified directly in STAR-CCM+ with the underlying CAD model updated accordingly.
Surface Preparation Tools
At the heart of STAR-CCM+ is an automated process that links a powerful surface wrapper to CD-adapco's
unique meshing technology. The surface wrapper significantly reduces the number of hours spent on
surface clean-up and, for problems that involve large assemblies of complex geometry parts, reduces the
entire meshing process to hours instead of days.
The surface wrapper works by shrink-wrapping a high-quality triangulated surface mesh onto any geometrical
model, closing holes in the geometry and joining disconnected and overlapping surfaces, providing a single
manifold surface that can be used to automatically generate a computational mesh without user intervention.
STAR-CCM+ also includes a comprehensive set of surface-repair tools that allow users to interactively
enhance the quality of imported or wrapped surfaces, offering the choice of a completely automatic repair,
user control, or a combination of both.
Automatic Meshing Technology
Advanced automatic meshing technology generates either polyhedral or predominantly hexahedral control
volumes at the touch of a button, offering a combination of speed, control, and accuracy. For problems
involving multiple frames of reference, fluid-structure interaction and conjugate heat transfer, STAR-CCM+
can automatically create conformal meshes across multiple physical domains.
An important part of mesh generation for accurate CFD simulation is the near-wall region, or extrusion-layer
mesh. STAR-CCM+ automatically produces a high-quality extrusion layer mesh on all walls in the domain.
In addition, you can control the position, size, growth-rate, and number of cell layers in the extrusion-layer mesh.
Physics Models
STAR-CCM+ includes an extensive range of validated physical models that provide the user with a toolset
capable of tackling the most complex multi-disciplinary engineering problems.
Time
Steady-state, unsteady implicit/explicit, harmonic balance
Flow
Coupled/segregated flow and energy
Motion
Stationary, moving reference frame, rigid body motion, mesh morphing, large displacement solid stress,
overset meshes
Dynamic Fluid Body Interaction (DFBI)
Fluid-induced motion in 6 degrees of freedom or less, catenary and linear spring couplings
Material
Single, multiphase and multi-component fluids, solids
Regime
Inviscid, laminar, turbulent (RANS, LES, DES), laminar-turbulent transition modeling; incompressible through to hypersonic; non-Newtonian flows
Sensitivity analysis
Adjoint solver with cost functions for pressure drop, uniformity, force, moment, tumble and swirl.
Sensitivities with respect to position and flow variables.
Multi-Domain
Porous media (volumetric and baffle), fan and heat exchanger models
Heat Transfer and Conjugate Heat Transfer
Conducting solid shells, solar, multi-band and specular thermal radiation (discrete ordinates or surface-to-surface), convection, conjugate heat transfer
Multi-component Multiphase
Free surface (VOF) with boiling, cavitation, evaporation & condensation, melting & solidification; Eulerian multiphase with boiling, gas dissolution, population balance, granular flow; Lagrangian multiphase with droplet breakup, collision, evaporation, erosion and wall interaction; discrete element modeling (DEM) with composite and clumped particles, non-spherical contact, particle deformation and breakup; fluid film with droplet stripping, melting & solidification, evaporation & boiling; dispersed multiphase for soiling and icing analysis
Multi-Discipline
Finite Volume stress (small and large displacements, contacts), fluid structure interaction, electromagnetic
field, Joule heating, electro-deposition coating, electrochemistry, casting
Combustion and Chemical Reaction
PPDF, CFM, PCFM, EBU, progress variable model (PVM), thickened flame model, soot moments, emissions, and DARS-CFD complex chemistry coupling; interphase reactions for Eulerian multiphase
Aeroacoustic Analysis
Fast Fourier transform (FFT) spectral analysis, broadband noise sources, Ffowcs Williams-Hawkings (FWH) sound propagation model, wave number analysis
Post-processing
STAR-CCM+ has a comprehensive suite of post-processing tools designed to enable you to obtain maximum value and understanding from your CFD simulation. This includes scalar and vector scenes, streamlines, scene animation, numerical reporting, data plotting, import and export of table data, and spectral analysis of acoustic data.
CAE Integration
Several third-party analysis packages can be coupled with STAR-CCM+ to further extend the range of
possible simulations you can do. Co-simulation is possible using Abaqus, GT-Power, WAVE, RELAP5-3D,
AMESIM and OLGA, and file-based coupling is possible for other tools such as Radtherm, NASTRAN
and ANSYS.
3.0
3.1
# Tail of the STAR-CCM+ launch script: the preceding lines, which set CASE, NPROC,
# MPITYPE, CCMVER and MACHINE and handle the preceding MPI branch of the if block,
# are assumed but not shown here.
>& ${NPROC}core_${CASE}_${MPITYPE}_${CCMVER}_${MACHINE}.log
else if ( $MPITYPE == MPT ) then
    # Pin MPI ranks to cores 0-19 on every host (SGI MPT CPU placement)
    setenv MPI_DSM_CPULIST 0-19:allhosts
    setenv MPTVER 2.09-ga
    # SGI_MPI_HOME is a STAR-CCM+ environment variable that contains the path to
    # the SGI MPT installation directory
    setenv SGI_MPI_HOME /sw/sdev/mpt-x86_64/${MPTVER}
    # Use the environment variable below for UV only
    #setenv MPI_SHARED_NEIGHBORHOOD HOST
    # Use both InfiniBand rails and enable verbose MPT output
    setenv MPI_IB_RAILS 2
    setenv MPI_VERBOSE 1
    #
    # Launch the STAR-CCM+ server in batch mode under PBS using SGI MPT
    (time starccm+ -power -np ${NPROC} \
        -mpidriver sgi \
        -batchsystem pbs \
        -rsh ssh \
        -batch Benchmark25.java \
        ./${CASE}.sim ) \
        >& ${NPROC}core_${CASE}_${MPITYPE}${MPTVER}_${CCMVER}_${MACHINE}.log
else
    echo Unknown MPI type $MPITYPE
endif
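A script along these lines would typically be submitted to the batch scheduler rather than run interactively; under PBS (which the -batchsystem pbs option assumes), the scheduler allocates the requested nodes and starccm+ then launches its parallel server processes across them via ssh.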
3.2
virtual cores, thus executing 12 threads. Thus, for example, a compute node based on the Intel Xeon 10-core 3.0 GHz E5-2690 v2 has a total of 40 virtual cores, allowing one to execute 40 threads. In practice, an executing thread may occasionally be idle while waiting for data from main memory or for the completion of an I/O or system operation, and a processor may stall due to a cache miss, branch misprediction or data dependency. Hyper-Threading allows another thread to execute concurrently on the same core, taking advantage of such idle periods.
Thus, in this guide we use the following definitions and metrics based on the two features above:
Sn denotes the elapsed time for an n-thread-per-node job in standard mode, with neither HTT nor Turbo Boost enabled. Each node is configured with n physical cores.
H2n denotes the elapsed time for a 2n-thread-per-node job in HTT mode of operation, where each node is configured with n physical and n hyper-threaded cores.
Tn denotes the elapsed time for an n-thread-per-node job in Turbo Boost mode with HTT disabled. Each node is configured with n physical cores.
C2n denotes the elapsed time for a 2n-thread-per-node job running in combined Turbo Boost and HTT mode of operation. Each node is configured with n physical and n hyper-threaded cores.
%H2n denotes the percentage gain of H2n relative to Sn.
%Tn denotes the percentage gain of Tn relative to Sn.
%C2n denotes the percentage gain of C2n relative to Sn.
For more details on the above definitions and metrics see [6].
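For reference, one natural way to express these metrics, assuming each percentage gain is computed relative to the standard elapsed time Sn (an assumption consistent with the definitions above), is:
%H2n = 100 × (Sn - H2n) / Sn
%Tn = 100 × (Sn - Tn) / Sn
%C2n = 100 × (Sn - C2n) / Sn
so that a positive value indicates a reduction in elapsed time compared to the standard mode of operation.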
4.0 Benchmark Examples
LeMansCar17m: 17m cells, turbulent flow, 500 iterations using Segregated and Coupled solvers, Fig 4.
Large Classified Model: Very large model, turbulent flow, 11 iterations using Segregated and Coupled Solvers
Figures 6, 7 and 8 present benchmark results for the LeMansCar17m model on SGI Sandy Bridge based
hardware, namely SGI Rackable C2112-4TY14, SGI UV 2000 and SGI ICE X respectively.
Figure 6: LeMansCar17m Segregated and Coupled solver average elapsed times per iteration Sn and
the corresponding percentage gains %H2n, %Tn and %C2n on SGI Rackable C2112-4TY14 with
Sandy Bridge E5-2670 @ 2.60GHz, n=16.
Figure 7: LeMansCar17m Segregated and Coupled solver average elapsed times per iteration Sn and the
corresponding percentage gains %H2n, %Tn and %C2n on UV 2000 with Sandy Bridge E5-4600 @ 2.60GHz, n=16.
4.1
Figure 8: LeMansCar17m Segregated and Coupled solver average elapsed times per iteration Sn and the
corresponding percentage gains %H2n on SGI ICE X with Sandy Bridge E5-2690 @ 2.90GHz, n=16.
Figures 9, 10 and 11 present benchmark results for the LeMansCar17m model on SGI Ivy Bridge based
hardware, namely SGI Rackable C2112-4TY14, SGI UV 2000 and SGI ICE X respectively.
Figure 9: LeMansCar17m Segregated and Coupled solver average elapsed times per iteration Sn and
the corresponding percentage gains %H2n, %Tn and %C2n on SGI Rackable C2112-4TY14 with Ivy Bridge
E5-2697v2 @ 2.70GHz FDR cluster, n=24.
Figure 10: LeMansCar17m Segregated and Coupled solver average elapsed times per iteration Sn and the
corresponding percentage gains %H2n, %Tn and %C2n on UV 2000 with Ivy Bridge E5-4650 v2 @ 2.40GHz, n=20.
Figure 11: LeMansCar17m Segregated and Coupled solver average elapsed times per iteration Sn and
the corresponding percentage gains %H2n on SGI ICE X with Ivy Bridge E5-2690v2 @ 3.0GHz cluster, n=20.
Figures 12, 13 and 14 present benchmark results for the Large Classified model on SGI Sandy Bridge
based hardware, namely SGI Rackable C2112-4TY14, SGI UV 2000 and SGI ICE X respectively.
Figure 12: Large Classified Segregated and Coupled solver average elapsed times per iteration Sn and
the corresponding percentage gains %H2n, %Tn and %C2n on SGI Rackable C2112-4TY14 with
Sandy Bridge E5-2670 @ 2.60GHz, n=16.
Figure 13: Large Classified Segregated and Coupled solver average elapsed times per iteration Sn and
the corresponding percentage gains %H2n, %Tn and %C2n on SGI UV 2000 with Sandy Bridge
E5-4600 @ 2.60GHz, n=16.
4.2
Figure 14: Large Classified Segregated and Coupled solver average elapsed times per iteration Sn and the
corresponding percentage gains %H2n on SGI ICE X with Sandy Bridge E5-2690 @ 2.90GHz cluster, n=16.
Figures 15,16 and 17 present benchmark results for the Large Classified model on SGI Ivy Bridge based
hardware, namely SGI Rackable C2112-4TY14, SGI UV 2000 and SGI ICE X respectively.
Figure 15: Large Classified Segregated and Coupled solver average elapsed times per iteration Sn and
the corresponding percentage gains %H2n, %Tn and %C2n on SGI Rackable C2112-4TY14 with Ivy Bridge
E5-2697 v2 @ 2.7GHz FDR, n=24.
Figure 16: Large Classified Segregated and Coupled solver average elapsed times per iteration Sn
and the corresponding percentage gains %H2n, %Tn and %C2n on SGI UV 2000 with
Ivy Bridge E5-4600 v2 @ 2.4GHz, n=20.
Figure 17: Large Classified Segregated and Coupled solver average elapsed times per iteration Sn and
the corresponding percentage gains %H2n on SGI ICE X with Ivy Bridge E5-2690v2 @ 2.90GHz cluster, n=20.
4.3 Comparisons
Figures 18 (a, b, c and d) indicate that the %H2n values in the case of ICE X Ivy Bridge, n=20, drop to negative values faster than in the case of ICE X Sandy Bridge, n=16 (for both Segregated and Coupled solvers), due to the larger number of cores per socket. Similar observations apply also to the SGI Rackable and UV 2000.
Figure 18a: LeMansCar %H2n on ICE-X, Segregated Solver
Figure 18b: LeMansCar %H2n on ICE-X, Coupled Solver
Figure 18c: Large Classified %H2n on ICE-X, Segregated Solver
Figure 18d: Large Classified %H2n on ICE-X, Coupled Solver
Figures 19 (a, b, c and d) present plots of the Hyper-threading gain, %H2n, for four models of different sizes, generated on SGI ICE X Sandy Bridge (n=16) and Ivy Bridge (n=20) using the Segregated and Coupled solvers. The four models are the Mercedes A Class 5M cell model, the LeMansCar 17M cell model, the LeMansCar94M cell model refined from the 17M model, and the Large Classified model. These figures describe the trends of Hyper-threading gain as a function of model size. Comparing the trends for the four cases, Hyper-threading gains are lowest for the A Class model, which is the smallest model (approximately 5M cells). However, for the other three models, namely the LeMansCar17M, LeMansCar94M and the Large Classified, the corresponding gains follow similar trends despite the significant differences in the sizes of the three models. This indicates that Hyper-threading gains are limited by compute node resources such as memory bandwidth and cache sizes.
Figures 19a-19d: Hyper-threading gain %H2n versus number of compute nodes for the Aclass5M, LeMansCar17M, LeMansCar94M and Large Classified models on SGI ICE X (Sandy Bridge and Ivy Bridge, Segregated and Coupled solvers).
In fact, the Large Classified model gains appear to be slightly below those of the two LeMansCar models. This observation indicates that extremely large models can place heavier demands on the Hyper-threading resources of the compute node. Thus Hyper-threading gain values have a limit beyond which there are no further gains, irrespective of how large the model may be.
A further comparison may be made of the effect of FDR versus QDR interconnects on the performance of the Segregated and Coupled solvers. Figures 20a and 20b present plots of the percentage gain of Sn FDR over Sn QDR on the SGI Rackable cluster with Intel E5-2697 v2, n=24, for both solvers in the case of the LeMansCar17m and the Large Classified models respectively.
Figures 20a and 20b: FDR-to-QDR percentage gain (%Gain) for the LeMansCar17m and Large Classified models, Segregated and Coupled solvers, on the SGI Rackable E5-2697 v2 cluster.
The plots show that FDR-to-QDR percentage gains are higher for the Segregated solver than for the Coupled solver for both models. The gains were observed to be approximately 6-8% and 2-3% for the Segregated and Coupled solvers, respectively, for tests involving 32 to 64 nodes of the SGI Rackable E5-2697 v2 cluster. Note that at the time of writing, this cluster was configured with a maximum of 64 nodes; we therefore expect these gains to be relatively higher for tests using larger numbers of nodes, for example 96, 128 and 256 nodes.
5.0
For the Hyper-threading feature, the experiments in this paper have shown that for STAR-CCM+ this feature has a useful effect on all the above hardware platforms up to a limited number of compute nodes per experiment. The experiments have also shown that Hyper-threading gains appear to be larger for the Coupled solver than for the Segregated solver, although this may require more observations to validate properly on a wider scale. In fact, we have observed some models which benefited from Hyper-threading only up to 4 or 6 nodes. Hyper-threading percentage gains tend to drop gradually as the number of compute nodes increases, down to a threshold beyond which the gains fall significantly into negative values. This drop also depends on the number of cores per socket: a larger number of cores per socket accelerates the decline in Hyper-threading percentage gains. Thus Hyper-threading gains tend to drop at a faster rate on Ivy Bridge processors than on Sandy Bridge, due to the larger number of cores per socket. Overall, our experiments have shown that the Hyper-threading feature may be impaired by the following:
Hyper-threading will no longer gain performance once the model's parallel scalability has reached its threshold for that domain decomposition, e.g. if a model normally does not scale beyond 128 cores, then executing it as a Hyper-threaded task on 128 cores (or more) will not result in any significant gain.
Hyper-threading will not gain performance if the data access requirements of the hyper-threaded threads saturate the memory bandwidth of the compute node. This may even result in performance degradation.
Experiments have also shown that the application may benefit from the combined use of the Hyper-threading and Turbo Boost features over a range of compute node counts. The two input model experiments have shown that the combined two-feature effect can provide significant performance gains for up to 16 compute nodes per test in the case of the Segregated solver and up to 32 nodes in the case of the Coupled solver. Note that a combined gain may be considered significant when each of the two features yields a positive performance gain for a test. More importantly, the combined gain of the two features can be approximated from the standard execution time Sn and the individual gains of the two features using the equation:
C2n ≈ Sn - ΔTn - ΔH2n
where
ΔTn = Sn - Tn and ΔH2n = Sn - H2n
This makes it possible to estimate C2n, as an approximation, without having to run the corresponding test, simply by knowing Sn, Tn and H2n. Correspondingly, one can decide whether a C2n test will give a positive result based on the criterion that a C2n test is worth performing if both ΔTn and ΔH2n are positive.
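As a purely illustrative example with hypothetical timings: if Sn = 100 s, Tn = 92 s and H2n = 95 s, then ΔTn = 8 s and ΔH2n = 5 s, both positive, so a combined test is worth running and its elapsed time can be estimated as C2n ≈ 100 - 8 - 5 = 87 s, i.e. roughly a 13% gain over the standard mode.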
A useful approach for gaining the maximum benefit from the Hyper-threading and Turbo Boost features is to apply the following algorithm. Given an input model, and based on a relatively small number of iterations/time-steps, execute the model using the following steps (a sketch of this procedure is given after the list):
1. Start with N nodes.
2. Run the model to obtain the Sn, Tn and H2n values.
3. If ΔTn > 0 and ΔH2n > 0, then increase N and repeat step 2.
4. Else if ΔTn < 0 or ΔH2n < 0, then stop iterating and use the previous iteration's values of Sn, Tn and H2n to calculate the corresponding C2n value using the equations above.
The above algorithm enables users to approximately determine the optimal number of cores and nodes corresponding to the resulting value of C2n, and then to run the full simulation in a C2n mode of operation for optimal parallel performance.
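A minimal sketch of this procedure in Python is given below. It assumes a user-supplied helper run_case(nodes, turbo, htt) that launches the STAR-CCM+ benchmark with the requested node count and feature settings and returns the elapsed time per iteration; the helper name and the doubling policy for increasing N are illustrative assumptions, not part of the STAR-CCM+ or SGI tooling.

# Sketch of the Turbo Boost / Hyper-Threading selection procedure described above.
# run_case(nodes, turbo, htt) is a hypothetical helper that runs the benchmark on
# 'nodes' compute nodes with the requested settings and returns the elapsed time
# per iteration in seconds.

def find_combined_mode(run_case, start_nodes):
    """Grow the node count while both Turbo Boost and Hyper-Threading show a gain,
    then report the last profitable node count and its estimated combined time."""
    nodes = start_nodes
    best = None
    while True:
        s_n  = run_case(nodes, turbo=False, htt=False)  # standard mode, n threads/node
        t_n  = run_case(nodes, turbo=True,  htt=False)  # Turbo Boost only
        h_2n = run_case(nodes, turbo=False, htt=True)   # Hyper-Threading only, 2n threads/node
        delta_t = s_n - t_n     # gain from Turbo Boost
        delta_h = s_n - h_2n    # gain from Hyper-Threading
        if delta_t > 0 and delta_h > 0:
            # Both features pay off: estimate C2n ~ Sn - dTn - dH2n, then try more nodes
            # (here the node count is doubled, as one possible policy for "increase N").
            best = {"nodes": nodes, "Sn": s_n, "C2n_estimate": s_n - delta_t - delta_h}
            nodes *= 2
        else:
            # One of the features no longer helps: keep the previous iteration's result.
            return best

For example, find_combined_mode(run_case, 4) stops at the largest tested node count for which both individual gains remain positive and returns the estimated combined-mode elapsed time for that count; the full simulation can then be run in the combined mode on that node count.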
6.0 References
[1] Platform Manager on SGI Altix ICE Systems Quick Reference Guide, Chapter 1: SGI Altix ICE 8200 Series System Overview (document number 007-5450-002).
[2] SGI ICE X System Hardware User Guide (document number 007-5806-001, published 2012-03-28).
[3] SGI ICE X Installation and Configuration Guide (document number 007-5917-002, published 2013-11-20).
[4] Technical Advances in the SGI UV Architecture, SGI white paper, June 2012.
[5] SGI Rackable C2112-4TY14 System User's Guide, 2012. http://techpubs.engr.sgi.com/library/manuals/5000/007-5685-002/pdf/007-5685-002.pdf
[6] A. Jassim, STAR-CCM+ Using Parallel Measurements from Intel Sandy Bridge / Ivy Bridge x86-64 based HPC Clusters: A Performance Analysis, 2014.