High Performance Computing – Lecture 1

This document provides an overview of a lecture on high performance computing. It covers key topics such as parallel programming, multi-core processors, shared- and distributed-memory architectures, and HPC ecosystem technologies, and it lists selected learning outcomes, including understanding the latest developments in HPC and its programming paradigms and using technologies and tools to handle the complexity of parallelism.


High Performance Computing

ADVANCED SCIENTIFIC COMPUTING

Prof. Dr.-Ing. Morris Riedel


Adjunct Associated Professor
School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
Research Group Leader, Juelich Supercomputing Centre, Forschungszentrum Juelich, Germany

@MorrisRiedel


LECTURE 1

High Performance Computing


September 5, 2019
Room V02-258
Review of Practical Lecture 0.2 – Short Intro to C Programming & Scheduling

 Many C/C++ programs are used in HPC today
 Multi-user HPC usage needs scheduling [3] LLview
 Many multi-physics application challenges on different scales & levels of granularity drive the need for HPC [1] Terrestrial Systems SimLab [2] NEST Web page

Lecture 1 – High Performance Computing 2 / 50


Outline of the Course

1. High Performance Computing
2. Parallel Programming with MPI
3. Parallelization Fundamentals
4. Advanced MPI Techniques
5. Parallel Algorithms & Data Structures
6. Parallel Programming with OpenMP
7. Graphical Processing Units (GPUs)
8. Parallel & Scalable Machine & Deep Learning
9. Debugging & Profiling & Performance Toolsets
10. Hybrid Programming & Patterns
11. Scientific Visualization & Scalable Infrastructures
12. Terrestrial Systems & Climate
13. Systems Biology & Bioinformatics
14. Molecular Systems & Libraries
15. Computational Fluid Dynamics & Finite Elements
16. Epilogue

+ additional practical lectures & webinars for our hands-on assignments in context

 Practical Topics
 Theoretical / Conceptual Topics
Lecture 1 – High Performance Computing 3 / 50
Outline

 High Performance Computing (HPC) Basics


 Four basic building blocks of HPC
 TOP500 & Performance Benchmarks
 Multi-core CPU Processors
 Shared Memory & Distributed Memory Architectures
 Hybrid Architectures & Programming

 HPC Ecosystem Technologies


 HPC System Software Environment – Revisited
 System Architectures & Network Topologies
 Many-core GPUs & Supercomputing Co-Design
 Relationships to Big Data & Machine/Deep Learning
 Data Access & Large-scale Infrastructures

Lecture 1 – High Performance Computing 4 / 50


Selected Learning Outcomes

 Students understand…
 Latest developments in parallel processing & high performance computing (HPC)
 How to create and use high-performance clusters
 What scalable networks & data-intensive workloads are
 The importance of domain decomposition
 Complex aspects of parallel programming
 HPC environment tools that support programming or analyzing behaviour
 Different abstractions of parallel computing on various levels
 Foundations and approaches of scientific domain-specific applications
 Students are able to…
 Program and use HPC programming paradigms
 Take advantage of innovative scientific computing simulations & technology
 Work with technologies and tools to handle parallelism complexity
Lecture 1 – High Performance Computing 5 / 50
High Performance Computing (HPC) Basics

Lecture 1 – High Performance Computing 6 / 50


What is High Performance Computing?

 Wikipedia: ‘redirects from HPC to Supercomputer’


 Interesting – gives us already a hint what it is generally about

 A supercomputer is a computer at the frontline of contemporary processing capacity – particularly speed of calculation

[4] Wikipedia ‘Supercomputer’ Online

 HPC includes work on ‘four basic building blocks’ in this course


 Theory (numerical laws, physical models, speed-up performance, etc.)
 Technology (multi-core, supercomputers, networks, storages, etc.)
 Architecture (shared-memory, distributed-memory, interconnects, etc.)
 Software (libraries, schedulers, monitoring, applications, etc.)
[5] Introduction to High Performance Computing for Scientists and Engineers

Lecture 1 – High Performance Computing 7 / 50


Understanding High Performance Computing (HPC) – Revisited

 High Performance Computing (HPC) is based on computing resources that enable the efficient use of parallel computing techniques through specific support with dedicated hardware such as high-performance CPU/core interconnections.
(HPC: network interconnection important)

 High Throughput Computing (HTC) is based on commonly available computing resources such as commodity PCs and small clusters that enable the execution of 'farming jobs' without providing a high-performance interconnection between the CPUs/cores.
(HTC: network interconnection less important)

 The complementary Cloud Computing & Big Data – Parallel Machine & Deep Learning course focuses on High Throughput Computing
Lecture 1 – High Performance Computing 8 / 50
Parallel Computing

 All modern supercomputers depend heavily on parallelism


 Parallelism can be achieved with many different approaches

 We speak of parallel computing whenever a number of 'compute elements' (e.g. cores) solve a problem in a cooperative way
[5] Introduction to High Performance Computing for Scientists and Engineers

 Often known as 'parallel processing' of some problem space
 Tackle problems in parallel to enable the 'best performance' possible
 Includes not only parallel computing, but also parallel input/output (I/O)
 'The measure of speed' in High Performance Computing matters
 Common measure for parallel computers established by the TOP500 list
 Based on a benchmark for ranking the best 500 computers worldwide [6] TOP500 Supercomputing Sites

 Lecture 3 will give in-depth details on parallelization fundamentals & performance term relationships & theoretical considerations
Lecture 1 – High Performance Computing 9 / 50
TOP 500 List (June 2019)

(Figure: TOP500 list, June 2019; annotations: power challenge, EU #1) [6] TOP500 Supercomputing Sites

Lecture 1 – High Performance Computing 10 / 50


LINPACK Benchmarks and Alternatives

 TOP500 ranking is based on the LINPACK benchmark


[7] LINPACK Benchmark implementation
 LINPACK solves a dense system of linear equations of unspecified size

 LINPACK covers only a single architectural aspect (‘critics exist’)


 Measures ‘peak performance’: All involved ‘supercomputer elements’ operate on maximum performance
 Available through a wide variety of ‘open source implementations’
 Success via ‘simplicity & ease of use’ thus used for over two decades
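As a rough, hypothetical illustration (not part of the original slides) of the 'peak performance' that measured LINPACK results are compared against, the theoretical peak of a system can be estimated as

\[ R_{peak} = n_{nodes} \times n_{cores/node} \times f_{clock} \times n_{FLOP/cycle} \]

For example, an assumed node with 2 sockets of 24 cores each, clocked at 2.4 GHz and performing 16 double-precision FLOP per core and cycle, yields 2 x 24 x 2.4 GHz x 16 ≈ 1.84 TFLOP/s per node; the measured LINPACK value is typically only a fraction of this theoretical peak.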

 Realistic applications benchmark suites might be alternatives


 HPC Challenge benchmarks (includes 7 tests) [8] HPC Challenge Benchmark Suite

 JUBE benchmark suite (based on real applications) [9] JUBE Benchmark Suite

Lecture 1 – High Performance Computing 11 / 50


Multi-core CPU Processors

 Significant advances in CPU (or microprocessor chips)


 Multi-core architecture with dual, quad, six, or n processing cores
 Processing cores are all on one chip
 Multi-core CPU chip architecture
 Hierarchy of caches (on/off chip)
 L1 cache is private to each core; on-chip
 L2 cache is shared; on-chip [10] Distributed & Cloud Computing Book
 L3 cache or dynamic random access memory (DRAM); off-chip


 Clock rates for single processors increased from 10 MHz (Intel 286) to 4 GHz (Pentium 4) in 30 years
 Pushing clock rates beyond roughly 5 GHz has unfortunately reached a limit due to power consumption and heat dissipation
 Multi-core CPU chips have quad, six, or n processing cores on one chip and use cache hierarchies

Lecture 1 – High Performance Computing 12 / 50


Dominant Architectures of HPC Systems

 Traditionally two dominant types of architectures (both also considered as 'programming models'):
 Shared-Memory Computers  shared-memory parallelization with OpenMP
 Distributed-Memory Computers  distributed-memory parallel programming with the Message Passing Interface (MPI) standard
 Often hierarchical (hybrid) systems of both in practice
 Dominance in the last couple of years in the community of X86-based commodity clusters running the Linux OS on Intel/AMD processors
 More recently
 Accelerators play a significant role (e.g., many-core chips)
 Emerging computing models getting relevant for HPC: e.g., quantum devices, neuromorphic devices

Lecture 1 – High Performance Computing 13 / 50


Shared-Memory Computers

 A shared-memory parallel computer is a system in which a number of CPUs work on a common, shared physical address space
[5] Introduction to High Performance Computing for Scientists and Engineers

 Two varieties of shared-memory systems:
1. Uniform Memory Access (UMA)
2. Cache-coherent Non-uniform Memory Access (ccNUMA)

 The problem of 'cache coherence' (in UMA/ccNUMA)
 Different CPUs use caches and may 'modify the same cache values'
 Consistency between cached data and data in memory must be guaranteed
 'Cache coherence protocols' ensure a consistent view of memory

Lecture 1 – High Performance Computing 14 / 50


Shared-Memory with UMA

 UMA systems use a 'flat memory model': latencies and bandwidth are the same for all processors and all memory locations
 Also called Symmetric Multiprocessing (SMP)
[5] Introduction to High Performance Computing for Scientists and Engineers

 Selected Features
 Socket is a physical package (with multiple cores), typically a replaceable component
 Two dual-core chips (2 cores/socket)
 P = Processor core
 L1D = Level 1 Cache – Data (fastest)
 L2 = Level 2 Cache (fast)
 Memory = main memory (slow)
 Chipset = enforces cache coherence and mediates connections to memory

Lecture 1 – High Performance Computing 15 / 50


Shared-Memory with ccNUMA

 ccNUMA systems logically share memory that is physically distributed (similar to distributed-memory systems)
 Network logic makes the aggregated memory appear as one single address space
[5] Introduction to High Performance Computing for Scientists and Engineers

 Selected Features
 Eight cores (4 cores/socket); L3 = Level 3 Cache
 Memory interface = establishes a coherent link to enable one 'logical' single address space of 'physically distributed memory'

Lecture 1 – High Performance Computing 16 / 50


Programming with Shared Memory using OpenMP

 Shared-memory programming enables immediate access to all data from all processors without explicit communication
 OpenMP is the dominant shared-memory programming standard today (v3)
 OpenMP is a set of compiler directives to 'mark parallel regions'
[11] OpenMP API Specification

 Features
 Bindings are defined for the C, C++, and Fortran languages
 Threads TX are 'lightweight processes' that mutually access shared data (a minimal example follows below)

 Lecture 6 will give in-depth details on the shared-memory programming model with OpenMP and using its compiler directives
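To illustrate what such compiler directives look like, the following is a minimal OpenMP sketch in C (a hypothetical example, not taken from the lecture materials); it assumes a compiler with OpenMP support, e.g. gcc -fopenmp:

#include <stdio.h>
#include <omp.h>

#define N 1000000

static double a[N];   /* array shared by all threads */

int main(void)
{
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = 1.0;

    /* The directive marks a parallel region: loop iterations are divided among the
       threads, and reduction(+:sum) combines their private partial sums at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.1f, computed with up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}

The number of threads is typically controlled via the OMP_NUM_THREADS environment variable.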
Lecture 1 – High Performance Computing 17 / 50
Distributed-Memory Computers

 A distributed-memory parallel computer establishes a 'system view' where no process can access another process' memory directly
[5] Introduction to High Performance Computing for Scientists and Engineers

 Features
 Processors communicate via Network Interfaces (NI)
 NI mediates the connection to a communication network
 This setup is rarely used in its pure form today  it rather serves as a programming model view
Lecture 1 – High Performance Computing 18 / 50
Programming with Distributed Memory using MPI

 Distributed-memory programming enables explicit message passing as communication between processors
 The Message Passing Interface (MPI) is the dominant distributed-memory programming standard today (available in many different versions)
 MPI is a standard defined and developed by the MPI Forum
[12] MPI Standard

 Features
 No remote memory access on distributed-memory systems
 Requires 'sending messages' back and forth between processes PX (see the sketch below)
 Many free Message Passing Interface (MPI) libraries available
 Programming is tedious & complicated, but the most flexible method

 Lecture 2 & 4 will give in-depth details on the distributed-memory programming model with the Message Passing Interface (MPI)
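To illustrate the message-passing style, the following is a minimal MPI sketch in C (a hypothetical example, not taken from the lecture materials) in which two processes exchange a single message; it assumes an MPI library such as OpenMPI, compiled with mpicc and launched e.g. with mpirun -np 2:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank identifies this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    if (rank == 0 && size > 1) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);            /* send to rank 1 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);             /* receive from rank 0 */
    }

    MPI_Finalize();
    return 0;
}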
Lecture 1 – High Performance Computing 19 / 50
MPI Standard – GNU OpenMPI Implementation Example – Revisited

 Message Passing Interface (MPI)


 A standardized and portable message-passing standard
 Designed to support different HPC architectures
 A wide variety of MPI implementations exist
 Standard defines the syntax and semantics
of a core of library routines used in C, C++ & Fortran [12] MPI Standard

 OpenMPI Implementation
 Open source license based on the BSD license
 Full MPI (version 3) standards conformance [13] OpenMPI Web page
 Developed & maintained by a consortium of
academic, research, & industry partners
 Typically available as modules on HPC systems and used with mpicc compiler
 Often built with the GNU compiler set and/or Intel compilers

 Lecture 2 will provide a full introduction and many more examples of the Message Passing Interface (MPI) for parallel programming
Lecture 1 – High Performance Computing 20 / 50
Hierarchical Hybrid Computers

 A hierarchical hybrid parallel computer is neither a purely shared-memory nor a purely distributed-memory type system, but a mixture of both
 Large-scale 'hybrid' parallel computers today have shared-memory building blocks interconnected with a fast network
[5] Introduction to High Performance Computing for Scientists and Engineers

 Features
 Shared-memory nodes (here ccNUMA) with local NIs
 NI mediates connections to other remote 'SMP nodes'

Lecture 1 – High Performance Computing 21 / 50


Programming Hybrid Systems & Patterns

 Hybrid systems programming uses MPI for explicit internode communication and OpenMP for parallelization within the node (a minimal sketch follows below)
 Parallel programming is often supported by using 'patterns' such as stencil methods in order to apply functions to the domain decomposition

 Experience from HPC Practice
 Most parallel applications still take no notice of the hardware structure
 Use of pure MPI for parallelization remains the dominant programming model
 Historical reason: old supercomputers were all of the distributed-memory type
 Use of accelerators is significantly increasing in practice today
 Challenges with the 'mapping problem'
 Performance of hybrid (as well as pure MPI) codes depends crucially on factors not directly connected to the programming model
 It largely depends on the association of threads and processes to cores
 Patterns (e.g., stencil methods) support the parallel programming
 Lecture 10 will provide insights into hybrid programming models and introduces selected patterns used in parallel programming
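To illustrate the combination, the following is a minimal hybrid MPI + OpenMP sketch in C (a hypothetical example, not taken from the lecture materials): MPI handles communication between nodes, while OpenMP threads parallelize within each node; it assumes compilation with mpicc -fopenmp:

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int rank, provided;

    /* Request a thread support level suitable for OpenMP regions between MPI calls */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        /* Each MPI process (typically one per node or socket) spawns several threads */
        printf("MPI rank %d, OpenMP thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}

How the MPI processes and OpenMP threads are pinned to sockets and cores (the 'mapping problem' mentioned above) strongly affects the achieved performance.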
Lecture 1 – High Performance Computing 22 / 50
[Video] Juelich – Supercomputer Upgrade

Lecture 1 – High Performance Computing 23 / 50


HPC Ecosystem Technologies

Lecture 1 – High Performance Computing 24 / 50


HPC System Software Environment – Revisited (cf. Practical Lecture 0.2)

 Operating System
 Former times often 'proprietary OS', nowadays often (reduced) 'Linux'
 Scheduling Systems
 Manage concurrent access of users on supercomputers
 Different scheduling algorithms can be used with different 'batch queues'
 Examples: SLURM @ JÖTUNN Cluster, LoadLeveler @ JUQUEEN, etc.
 Monitoring Systems
 Monitor and test the status of the system ('system health checks/heartbeat')
 Enable a view of the usage of the system per node/rack ('system load')
 Examples: LLview, INCA, Ganglia @ JÖTUNN Cluster, etc.
 Performance Analysis Systems
 Measure the performance of an application and recommend improvements (e.g. Scalasca, Vampir, etc.)

 HPC systems and supercomputers typically provide a software environment that supports the processing of parallel and scalable applications
 Monitoring systems offer a comprehensive view of the current status of an HPC system or supercomputer
 Scheduling systems enable a method by which user processes are given access to processors

 Lecture 9 will offer more insights into performance analysis systems with debugging, profiling, and HPC performance toolsets
Lecture 1 – High Performance Computing 25 / 50
Scheduling vs. Emerging Interactive Supercomputing Approaches

 JupyterHub is a multi-user version of the notebook designed for companies, classrooms and research labs
[20] A. Lintermann & M. Riedel et al., 'Enabling Interactive Supercomputing at JSC – Lessons Learned'  [21] A. Streit & M. Riedel et al., 'UNICORE 6 – Recent and Future Advancements'  [22] Project Jupyter Web page

Lecture 1 – High Performance Computing 26 / 50


Modular Supercomputer JUWELS – Revisited

Lecture 1 – High Performance Computing 27 / 50


HPC System Architectures

 HPC systems are very complex 'machines' with many elements
 CPUs & multi-cores with 'multi-threading' capabilities
 Data access levels & different levels of caches
 Network topologies & various interconnects
 Architecture Impacts
 Vendor designs, e.g., IBM Bluegene/Q
 Infrastructure, e.g., cooling & power lines in the computing hall

 HPC faced a significant change in practice with respect to performance increase after years
 Getting more speed for free by waiting for new CPU generations does not work any more
 Multi-core processors emerge that require using those multiple resources efficiently in parallel
 Many-core processors emerge that are used to accelerate certain parts of computing applications

Lecture 1 – High Performance Computing 28 / 50


Example: IBM BlueGene Architecture Evolution

 BlueGene/P

 BlueGene/Q

Lecture 1 – High Performance Computing 29 / 50


Network Topologies

 Large-scale HPC systems have special network setups
 Dedicated I/O nodes, fast interconnects, e.g. InfiniBand (IB)
 Different network topologies, e.g. tree, 5D torus network, mesh, etc. (raise challenges in task mappings and communication patterns)
(network interconnection important)
[5] Introduction to High Performance Computing for Scientists and Engineers (Source: IBM)

Lecture 1 – High Performance Computing 30 / 50


HPC System Architecture & Continuous Developments

 Increasing number of other 'new' emerging system architectures
 HPC is at the cutting edge of computing and integrates new hardware developments as a continuous activity
 General Purpose Computation on Graphics Processing Units (GPGPUs/GPUs)
 Use of GPUs for computing instead of for computer graphics
 Programming models are OpenCL and NVIDIA CUDA
 Getting more and more adopted in many application fields
 Field Programmable Gate Arrays (FPGAs)
 Integrated circuits designed to be configured by a user after shipping
 Enable updates of functionality and reconfigurable 'wired' interconnects
 Cell processors
 Enable the combination of general-purpose cores with co-processing elements that accelerate dedicated forms of computations

 Artificial Intelligence with methods from machine learning and deep learning influences HPC system architectures today
 Complement the initial focus on compute-intensive applications with data-intensive application co-design activities
Lecture 1 – High Performance Computing 31 / 50
Many-core GPGPUs

 Use of very many simple cores
 High throughput computing-oriented architecture
 Use massive parallelism by executing a lot of concurrent threads slowly
 Handle an ever increasing amount of multiple instruction threads
 CPUs instead typically execute a single long thread as fast as possible [10] Distributed & Cloud Computing Book
 Many-core GPUs are used in large clusters and within massively parallel supercomputers today

 A Graphics Processing Unit (GPU) is great for data parallelism and task parallelism
 Compared to multi-core CPUs, GPUs consist of a many-core architecture with hundreds to even thousands of very simple cores executing threads rather slowly
 Named General-Purpose Computing on GPUs (GPGPU)
 Different programming models emerge

Lecture 1 – High Performance Computing 32 / 50


GPU Acceleration

 GPU accelerator architecture example (e.g. NVIDIA card)
 GPUs can have 128 cores on one single GPU chip
 Each core can work with eight threads of instructions
 The GPU is able to concurrently execute 128 * 8 = 1024 threads
 Interaction – and thus the major (bandwidth) bottleneck – between CPU and GPU is via memory interactions [10] Distributed & Cloud Computing Book
 E.g. applications that use matrix–vector/matrix multiplication (e.g. deep learning algorithms)

 GPU acceleration means that GPUs accelerate computing due to massive parallelism with thousands of threads compared to only a few threads used by conventional CPUs
 GPUs are designed to compute large numbers of floating point operations in parallel

 Lecture 10 will introduce the programming of accelerators with different approaches and their key benefits for applications
Lecture 1 – High Performance Computing 33 / 50
NVIDIA Fermi GPU Example

[10] Distributed & Cloud Computing Book

Lecture 1 – High Performance Computing 34 / 50


DEEP Learning takes advantage of Many-Core Technologies

 Innovation via specific layers and architecture types

[26] A. Rosebrock

[25] Neural Network 3D Simulation

 Lecture 8 will provide more details about parallel & scalable machine & deep learning algorithms and how many-core HPC is used
Lecture 1 – High Performance Computing 35 / 50
Deep Learning Application Example – Using High Performance Computing

 Using Convolutional Neural Networks (CNNs) with hyperspectral remote sensing image data [27] J. Lange and M. Riedel et al., IGARSS Conference, 2018
 Find hyperparameters & joint 'new-old' modeling & transfer learning given rare labeled/annotated data in science (e.g. 36,000 images vs. 14,197,122 ImageNet images)
[28] G. Cavallaro, M. Riedel et al., IGARSS 2019

 Lecture 8 will provide more details about parallel & scalable machine & deep learning algorithms and remote sensing applications
Lecture 1 – High Performance Computing 36 / 50
HPC Relationship to ‘Big Data‘ in Machine & Deep Learning

(Figure: model performance/accuracy vs. dataset volume – with growing dataset volumes towards 'big data', large deep learning networks outperform medium deep learning networks, small neural networks, and traditional learning models such as SVMs and random forests; on 'small datasets', manual feature engineering changes the ordering; training time drives the use of High Performance Computing & Cloud Computing, e.g. JURECA; typical tools include MatLab, statistical computing with R, scikit-learn, Weka, and Octave) [15] www.big-data.tips

Lecture 1 – High Performance Computing 37 / 50


Deep Learning Application Example – Using Cloud Computing

 Performing parallel computing with Apache Spark across different worker nodes
[23] J. Haut, G. Cavallaro and M. Riedel et al., IEEE Transactions on Geoscience and Remote Sensing, 2019
 Using Autoencoder deep neural networks with Cloud computing
[24] Apache Spark Web page

 The complementary Cloud Computing & Big Data – Parallel Machine & Deep Learning Course teaches Apache Spark Approaches
Lecture 1 – High Performance Computing 38 / 50
HPC Relationship to ‘Big Data‘ in Simulation Sciences

[14] F. Berman: Maximising the Potential of Research Data

Lecture 1 – High Performance Computing 39 / 50


Data Access & Challenges

 P = Processor core elements
 Compute: floating points or integers
 Arithmetic units (compute operations)
 Registers (feed those units with operands)

 'Data access' for application/levels – faster but more expensive towards the core, cheaper but too slow towards storage
 Registers: 'accessed w/o any delay'
 L1D = Level 1 Cache – Data (fastest, normal)
 L2 = Level 2 Cache (fast, often)
 L3 = Level 3 Cache (still fast, less often)
 Main memory (slow, but larger in size)
 Storage media like harddisks, tapes, etc. (too slow to be used in direct computing)

 The DRAM gap is the large discrepancy between main memory and cache bandwidths
[5] Introduction to High Performance Computing for Scientists and Engineers

Lecture 1 – High Performance Computing 40 / 50


Big Data Drives Data-Intensive HPC Architecture Designs

 More recently, system architectures are influenced by 'big data'
 CPU speed has surpassed the IO capabilities of existing HPC resources
 Scalable I/O gets more and more important in application scalability
 Requirements for Hierarchical Storage Management ('Tiers')
 Mass storage devices (tertiary storage) too slow to enable active processing of 'big data'
 Increase in simulation time/granularity means TBs are equally important as FLOP/s
 Tapes are cheap but slowly accessible; direct access to compute nodes is needed
 Drive new 'tier-based' designs
[16] A. Szalay et al., 'GrayWulf: Scalable Clustered Architecture for Data Intensive Computing'

Lecture 1 – High Performance Computing 41 / 50


Application Co-Design of HPC Architectures – Modular Supercomputing Example

 The modular supercomputing architecture (MSA) enables a flexible HPC system design, co-designed according to the needs of different application workloads [17] DEEP Projects Web Page

Lecture 1 – High Performance Computing 42 / 50


New HPC Architectures – Modular Supercomputing Architecture Example

(Figure: Modular Supercomputing Architecture example – modules built from general-purpose CPUs, many-core CPUs, FPGAs, memory (MEM) and NVRAM, labelled CN, DN, BN, NAM, and GCE, combined according to a possible application workload) [17] DEEP Projects Web Page
Lecture 1 – High Performance Computing 43 / 50
Large-scale Computing Infrastructures

 Large computing systems are often embedded in infrastructures
 Grid computing for distributed data storage and processing via middleware
 The success of Grid computing was notably acknowledged by Prof. Rolf-Dieter Heuer, CERN Director General, in the context of the Higgs Boson discovery:
 'Results today only possible due to extraordinary performance of Accelerators – Experiments – Grid computing' [18] Grid Computing Video
 Other large-scale distributed infrastructures exist
 Partnership for Advanced Computing in Europe (PRACE)  EU HPC
 Extreme Science and Engineering Discovery Environment (XSEDE)  US HPC

 Lecture 11 will give in-depth details on scalable approaches in large-scale HPC infrastructures and how to use them with middleware
Lecture 1 – High Performance Computing 44 / 50
[Video] PRACE – Introduction to Supercomputing

[19] PRACE – Introduction to Supercomputing

Lecture 1 – High Performance Computing 45 / 50


Lecture Bibliography

Lecture 1 – High Performance Computing 46 / 50


Lecture Bibliography (1)

 [1] Terrestrial Systems Simulation Lab, Online:


http://www.hpsc-terrsys.de/hpsc-terrsys/EN/Home/home_node.html
 [2] Nest:: The Neural Simulation Technology Initiative, Online:
https://www.nest-simulator.org/
 [3] T. Bauer, ‘System Monitoring and Job Reports with LLView‘, Online:
https://www.fz-juelich.de/SharedDocs/Downloads/IAS/JSC/EN/slides/supercomputer-ressources-2018-11/12b-sc-llview.pdf?__blob=publicationFile
 [4] Wikipedia ‘Supercomputer’, Online:
http://en.wikipedia.org/wiki/Supercomputer
 [5] Introduction to High Performance Computing for Scientists and Engineers, Georg Hager & Gerhard Wellein, Chapman & Hall/CRC Computational Science,
ISBN 143981192X, English, ~330 pages, 2010, Online:
http://www.amazon.de/Introduction-Performance-Computing-Scientists-Computational/dp/143981192X
 [6] TOP500 Supercomputing Sites, Online:
http://www.top500.org/
 [7] LINPACK Benchmark, Online:
http://www.netlib.org/benchmark/hpl/
 [8] HPC Challenge Benchmark Suite, Online:
http://icl.cs.utk.edu/hpcc/
 [9] JUBE Benchmark Suite, Online:
http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/JUBE/_node.html
 [10] K. Hwang, G. C. Fox, J. J. Dongarra, ‘Distributed and Cloud Computing’, Book, Online:
http://store.elsevier.com/product.jsp?locale=en_EU&isbn=9780128002049
 [11] The OpenMP API specification for parallel programming, Online:
http://openmp.org/wp/openmp-specifications/
Lecture 1 – High Performance Computing 47 / 50
Lecture Bibliography (2)

 [12] The MPI Standard, Online:


http://www.mpi-forum.org/docs/
 [13] OpenMPI Web page, Online:
https://www.open-mpi.org/
 [14] Fran Berman, ‘Maximising the Potential of Research Data’
 [15] Big Data Tips – Big Data Mining & Machine Learning, Online:
http://www.big-data.tips/
 [16] A. Szalay et al., ‘GrayWulf: Scalable Clustered Architecture for Data Intensive Computing’, Proceedings of the 42nd Hawaii International Conference on
System Sciences – 2009
 [17] DEEP Projects Web page, Online:
http://www.deep-projects.eu/
 [18] How EMI Contributed to the Higgs Boson Discovery, YouTube Video, Online:
http://www.youtube.com/watch?v=FgcoLUys3RY&list=UUz8n-tukF1S7fql19KOAAhw
 [19] PRACE – Introduction to Supercomputing, Online:
https://www.youtube.com/watch?v=D94FJx9vxFA
 [20] A. Lintermann & M. Riedel et al., ‘Enabling Interactive Supercomputing at JSC – Lessons Learned’, ISC 2019, Frankfurt, Germany, Online:
https://www.researchgate.net/publication/330621591_Enabling_Interactive_Supercomputing_at_JSC_Lessons_Learned_ISC_High_Performance_2018_Intern
ational_Workshops_FrankfurtMain_Germany_June_28_2018_Revised_Selected_Papers
 [21] A. Streit & M. Riedel et al., ‘UNICORE 6 – Recent and Future Advancements ’, Online:
https://www.researchgate.net/publication/225005053_UNICORE_6_-_recent_and_future_advancements
 [22] Project Jupyter Web page, Online:
https://jupyter.org/hub

Lecture 1 – High Performance Computing 48 / 50


Lecture Bibliography (3)

 [23] J. Haut, G. Cavallaro and M. Riedel et al., IEEE Transactions on Geoscience and Remote Sensing, 2019, Online:
https://www.researchgate.net/publication/335181248_Cloud_Deep_Networks_for_Hyperspectral_Image_Analysis
 [24] Apache Spark, Online:
https://spark.apache.org/
 [25] YouTube Video, ‘Neural Network 3D Simulation‘, Online:
https://www.youtube.com/watch?v=3JQ3hYko51Y
 [26] A. Rosebrock, ‘Get off the deep learning bandwagon and get some perspective‘, Online:
http://www.pyimagesearch.com/2014/06/09/get-deep-learning-bandwagon-get-perspective/
 [27] J. Lange, G. Cavallaro, M. Goetz, E. Erlingsson, M. Riedel, ‘The Influence of Sampling Methods on Pixel-Wise Hyperspectral Image Classification with 3D
Convolutional Neural Networks’, Proceedings of the IGARSS 2018 Conference, Online:
https://www.researchgate.net/publication/328991957_The_Influence_of_Sampling_Methods_on_Pixel-
Wise_Hyperspectral_Image_Classification_with_3D_Convolutional_Neural_Networks
 [28] G. Cavallaro, Y. Bazi, F. Melgani, M. Riedel, ‘Multi-Scale Convolutional SVM Networks for Multi-Class Classification Problems of Remote Sensing Images’,
Proceedings of the IGARSS 2019 Conference, to appear

Lecture 1 – High Performance Computing 49 / 50


Lecture 1 – High Performance Computing 50 / 50
