High Performance Computing (HPC) is an area of computer science that deals with the parallel computation of complex problems. While some consider it a fringe technology, it is used heavily in oil and gas, financial services, manufacturing, health sciences, and academia.
This blog post is part 1 of a multipart introduction to HPC on Azure. It defines the terminology and technologies used throughout traditional HPC. Later posts will drill down into how to do HPC on Microsoft Azure.
HPC problems can generally be distilled down to two problem types: Tightly Coupled and Loosely Coupled (or Embarrassingly Parallel). Tightly Coupled problems are computations that require communication between the parallel processes. For example, suppose a computation can be split into 10 parallel processes (a matrix calculation, for instance), and each process needs information from its neighbors' results. A technology exists to let these processes share and pass their partial results around: the "Message Passing Interface," or MPI. There are MPI bindings for many languages (C++ and Fortran being the most common). Because the results must be shared, these computations are "Tightly Coupled."
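To make the neighbor-communication idea concrete, here is a small serial sketch of a 1D stencil update split across "ranks." Each rank owns only a slice of the data, so before updating its boundary cells it must fetch one "halo" cell from each neighboring rank; in a real MPI program that fetch would be a message (e.g. a send/receive pair), but this simulation just indexes the shared list. The function name and data are illustrative, not from any library.

```python
# Why tightly coupled problems need neighbor communication:
# a 1D stencil (each cell becomes the mean of itself and its two
# neighbors), computed per-rank with an explicit halo exchange.

def stencil_step(values, num_ranks):
    """One stencil update, with the data split evenly across ranks."""
    n = len(values)
    chunk = n // num_ranks
    new_values = values[:]              # global endpoints keep their value
    for rank in range(num_ranks):
        lo, hi = rank * chunk, (rank + 1) * chunk
        # Halo exchange: one cell from each neighboring rank.
        # In MPI this would be a message; here it is a list lookup.
        left_halo = values[lo - 1] if lo > 0 else None
        right_halo = values[hi] if hi < n else None
        for i in range(lo, hi):
            left = values[i - 1] if i > lo else left_halo
            right = values[i + 1] if i < hi - 1 else right_halo
            if left is None or right is None:
                continue                # global boundary: leave unchanged
            new_values[i] = (left + values[i] + right) / 3.0
    return new_values

data = [0.0, 0.0, 9.0, 0.0, 0.0, 0.0]
print(stencil_step(data, num_ranks=2))
```

Note that ranks 0 and 1 each need one value owned by the other before they can update their edge cells; that dependency is exactly what makes the problem tightly coupled.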
Loosely Coupled or Embarrassingly Parallel problems are parallel computations that do not require any interaction between the parallel processes. A good example is a financial services risk model that must be run over thousands of different securities. Each computation stands on its own and does not rely on any of the others.
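Because the tasks are independent, this kind of workload can simply be fanned out across processes with no message passing at all. Here is a minimal sketch using Python's standard `multiprocessing.Pool`; the `compute_risk` function and the security data are hypothetical stand-ins for a real risk model.

```python
# Sketch of an embarrassingly parallel workload: each security is
# evaluated independently, so no inter-process communication is needed.
from multiprocessing import Pool

def compute_risk(security):
    """Hypothetical stand-in for a per-security risk model."""
    ticker, exposure, volatility = security
    return ticker, exposure * volatility   # toy risk number

securities = [("AAA", 1_000_000, 0.25),
              ("BBB", 400_000, 0.5),
              ("CCC", 2_000_000, 0.125)]

if __name__ == "__main__":
    # One task per security; the pool schedules them across processes.
    with Pool(processes=3) as pool:
        results = pool.map(compute_risk, securities)
    print(dict(results))
```

On a cluster, the same pattern scales by handing each security (or batch of securities) to a separate compute node, since no node ever needs another node's result.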
In later posts I will walk through an example of each of these types in more detail. For now, it is enough to understand that there are two types of problems.
So tightly coupled applications need to communicate between parallel computations using MPI. The point of HPC is not only to take problems that would take days, months, or years to compute serially, but to make them as efficient as possible. Traditional networking technology is not fast enough to pass these intermediate values between compute processes. InfiniBand (IB) is a networking standard that provides the very high throughput and low latency these workloads need.
To conclude this post, here are some terms used in HPC that you can refer back to as we dive into HPC on Azure.
- Cluster – A group of computing resources available for HPC processing. For MPI jobs, the nodes sit on their own InfiniBand network, with the head node attached to both the corporate or enterprise network and the InfiniBand network.
- Message Passing Interface – MPI is a standardized message-passing system designed to run on a variety of parallel computing architectures. Many implementations are available: MPICH is an open-source version, and Intel MPI and Microsoft MPI are two others.
- InfiniBand – IB is a computer networking standard used in HPC. It provides high-throughput, low-latency messaging between compute nodes in tightly coupled HPC problems.
- Tightly Coupled – Problems that require the use of MPI to solve in parallel.
- Loosely Coupled – Problems that can be solved in parallel independently of the other compute processes. Also called Embarrassingly Parallel or Parametric Sweep.
- Scheduler – A piece of software that submits HPC jobs to the compute nodes, keeps track of the computations, and informs the user when the job is complete.
- Head Node – The computer (physical or virtual) that kicks off the computation. It usually runs the scheduler software.
- Compute Node – A computer (physical or virtual) that runs a portion of the compute job. There can be anywhere from two to hundreds of thousands of compute nodes.
- Lustre – An open-source parallel distributed file system used for HPC.
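To see how a few of these terms fit together, here is a minimal batch script for the open-source Slurm scheduler (one common scheduler; the job name, resource counts, and executable are hypothetical). The user submits this from the head node; the scheduler reserves compute nodes and launches one MPI process per task.

```shell
#!/bin/bash
# Hypothetical Slurm batch script: the scheduler reads the #SBATCH
# directives, reserves the requested compute nodes, and runs the job.
#SBATCH --job-name=risk-model
#SBATCH --nodes=4              # number of compute nodes to reserve
#SBATCH --ntasks-per-node=16   # MPI processes per node
#SBATCH --time=01:00:00        # wall-clock time limit

# srun launches one MPI rank per task across the reserved nodes.
srun ./my_mpi_app              # hypothetical MPI executable
```

The script would typically be submitted with `sbatch`, and the scheduler notifies the user when the job completes.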
Later blog posts will tie all this together and show how HPC is being done today on Microsoft Azure in both IaaS and PaaS instances — so make sure you subscribe so you don’t miss them!