An Introduction to Parallelism, OpenMP, and Memory Access on macOS M1/M2 with Visual Studio Code

Etietop Abraham
6 min read · Mar 15, 2023

In this tutorial, we will explore the concept of parallelism in computing and learn how to use the OpenMP library to parallelize a simple C program on macOS with M1/M2 processors.


We will cover the following:

  1. Introduction to parallelism
  2. Computational complexity of parallelized transformer networks
  3. Flynn’s Taxonomy and its relationship to parallelization
  4. Systematization by access to memory (shared, distributed, and CUDA)
  5. OpenMP basics
  6. Installing OpenMP on macOS with M1/M2 processors
  7. Adding Homebrew to PATH
  8. Updating the LDFLAGS & CPPFLAGS environment variables
  9. Configuring Visual Studio Code for OpenMP
  10. A simple OpenMP example
  11. Compiling and running the OpenMP program

1. Introduction to Parallelism

Parallelism in computing refers to the execution of multiple tasks concurrently, using multiple processing units or cores. Parallelization can help speed up computationally intensive tasks by dividing the workload among multiple processors, reducing overall execution time.
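
The benefit is bounded by the fraction of the program that can actually run in parallel. A classic way to quantify this is Amdahl's law: if a fraction p of the work is parallelizable across n processors, the maximum speedup is

speedup = 1 / ((1 - p) + p / n)

For example, with p = 0.9 and n = 8 cores, the speedup is at most 1 / (0.1 + 0.9/8) ≈ 4.7, not 8, which is why identifying the serial portion of a workload matters as much as adding cores.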

2. Computational Complexity of Parallelized Transformer Networks

Transformer networks can be parallelized effectively due to their self-attention mechanism and layer-wise structure. The computational complexity of transformers is mainly determined by the matrix multiplications in the self-attention mechanism. By parallelizing these operations across multiple GPUs or other hardware accelerators, the overall training and inference times can be significantly reduced.
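
Concretely, for a sequence of length n and model dimension d, self-attention costs on the order of n² · d operations, because every token attends to every other token. That quadratic term is what makes parallel hardware so valuable for long sequences, and the work splits cleanly: each attention head can be computed independently, and the underlying matrix multiplications decompose into independent blocks.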

3. Flynn’s Taxonomy and Its Relationship to Parallelization

Flynn’s Taxonomy is a classification system for computer architectures based on the number of instruction streams and data streams. It defines four categories:

  1. SISD (Single Instruction, Single Data): A single instruction operates on a single data stream, e.g., traditional sequential CPUs.
  2. SIMD (Single Instruction, Multiple Data): A single instruction operates on multiple data streams simultaneously, e.g., vector processors and GPU shader cores.
  3. MISD (Multiple Instruction, Single Data): Multiple instructions operate on a single data stream, e.g., fault-tolerant systems.
  4. MIMD (Multiple Instruction, Multiple Data): Multiple instructions operate on multiple data streams, e.g., multi-core CPUs and distributed systems.

Parallelization techniques like those used in transformer networks primarily focus on the SIMD and MIMD categories, allowing developers to parallelize code across multiple cores or processors.
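
To make the SIMD/MIMD distinction concrete, here is a minimal C sketch using OpenMP directives (introduced properly in section 5). The #pragma omp simd directive, available since OpenMP 4.0, asks the compiler to vectorize a loop (SIMD), while #pragma omp parallel for distributes iterations across independent threads (MIMD):

#include <stdio.h>

#define N 1000

int main() {
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = 2.0f * i;
    }

    // SIMD: one instruction stream applied to many data elements at once
    #pragma omp simd
    for (int i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }

    // MIMD: multiple threads, each with its own instruction stream,
    // split the iterations among themselves
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }

    printf("c[%d] = %.1f\n", N - 1, c[N - 1]);
    return 0;
}

This compiles with the same clang command shown in section 11.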

4. Systematization by Access to Memory (Shared, Distributed, and CUDA)

There are three main ways to organize memory access in parallel computing systems:

  1. Shared memory: All processing units have access to a single shared memory space. Communication between processing units is done through memory reads and writes. OpenMP is an example of a shared memory programming model.
  2. Distributed memory: Each processing unit has its own local memory, and communication between processing units is done via message passing. MPI (Message Passing Interface) is an example of a distributed memory programming model (a minimal sketch follows this list).
  3. CUDA: Compute Unified Device Architecture (CUDA) is a parallel computing platform and programming model developed by NVIDIA. It allows developers to use NVIDIA GPUs for general-purpose computing tasks. CUDA combines elements of both shared and distributed memory models and provides additional abstractions for GPU programming.
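
To illustrate the distributed-memory model, here is a minimal MPI sketch in C. It assumes an MPI implementation such as Open MPI is installed (e.g., via brew install open-mpi) and is compiled with mpicc. Note how data moves between processes only through explicit messages, in contrast to OpenMP's shared variables:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 42;
        // Rank 0 sends a value to rank 1: communication is explicit
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}

Run it with at least two processes, e.g., mpirun -np 2 ./a.out.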

Now let’s get to the fun part!


5. OpenMP Basics

OpenMP (Open Multi-Processing) is an API for writing parallel applications in C, C++, and Fortran. It is designed for shared-memory multi-processor systems and provides a simple and flexible way to parallelize code using compiler directives, library routines, and environment variables.

Basic Constructs of OpenMP

Here are some of the basic constructs of OpenMP:

  1. Directives: OpenMP uses directives, which are typically written as pragmas in C and C++. These pragmas provide instructions to the compiler to parallelize specific parts of the code.

Example: Parallelizing a simple for loop:

#pragma omp parallel for
for (int i = 0; i < N; i++) {
    // Perform some operation
}

2. Parallel Regions: A parallel region is a block of code that can be executed by multiple threads concurrently. You can create a parallel region using the #pragma omp parallel directive.

#pragma omp parallel
{
    // Code inside this block will be executed in parallel by multiple threads
}

3. Thread Management: OpenMP provides various mechanisms for managing and synchronizing threads, such as specifying the number of threads and setting thread affinity.

Example: Setting the number of threads:

omp_set_num_threads(4); // Set the number of threads to 4
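
As a minimal sketch of these routines in action, the following program requests four threads and has each one report its ID. You can also override the thread count at run time with the OMP_NUM_THREADS environment variable (e.g., OMP_NUM_THREADS=2 ./a.out):

#include <stdio.h>
#include <omp.h>

int main() {
    omp_set_num_threads(4); // Request 4 threads for the next parallel region

    #pragma omp parallel
    {
        // omp_get_thread_num() returns this thread's ID;
        // omp_get_num_threads() returns the size of the team
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}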

4. Synchronization: OpenMP provides synchronization constructs like barriers, critical sections, and atomic operations to ensure the proper execution order of operations within parallel regions.

Example: Using a critical section to protect a shared resource:

#pragma omp critical
{
    // Code inside this block will be executed by one thread at a time
}
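
The list above also mentions barriers and atomic operations; here is a minimal sketch combining the two. An atomic update is cheaper than a critical section for a simple read-modify-write, and the barrier guarantees all increments have finished before the value is printed:

#include <stdio.h>
#include <omp.h>

int main() {
    int counter = 0;

    #pragma omp parallel
    {
        // Atomic update: lighter-weight than a critical section
        // for a single read-modify-write operation
        #pragma omp atomic
        counter++;

        // All threads wait here until every increment has completed
        #pragma omp barrier

        // Exactly one thread prints the final value
        #pragma omp single
        printf("counter = %d\n", counter);
    }
    return 0;
}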

5. Reduction: Reduction is a common operation in parallel computing where the result of an operation is accumulated across multiple threads. OpenMP provides a reduction clause to simplify the process.

Example: Summing the elements of an array using reduction:

int sum = 0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < N; i++) {
    sum += array[i];
}
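
Without the reduction clause, the concurrent sum += array[i] updates would race with each other and produce an unpredictable result; reduction(+:sum) gives each thread a private copy of sum and combines the copies at the end of the loop.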

6. Installing OpenMP on macOS with M1/M2 Processors


To use OpenMP on macOS with M1/M2 processors, you need to install the LLVM OpenMP runtime library (libomp) using Homebrew, a package manager for macOS:

  1. Install Homebrew by following the instructions on the official Homebrew website.
  2. Install libomp by running the following command in the terminal:
brew install libomp
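
You can verify the installation with brew info libomp, which reports the installed version and its location (typically /opt/homebrew/opt/libomp on Apple Silicon Macs).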

7. Adding Homebrew to PATH

After installing Homebrew, you need to add it to your PATH to access the installed tools easily. Run the following commands in your terminal:

echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

These commands add the Homebrew bin directory to your PATH, making it easier to use the installed tools directly from the terminal.

8. Updating the LDFLAGS & CPPFLAGS Environment Variables

When you install libomp, you may receive a message suggesting you update your LDFLAGS and CPPFLAGS environment variables. This is necessary for the compiler to find the OpenMP library and include files. Add the following lines to your shell configuration file (e.g., .zshrc or .bashrc):

export LDFLAGS="-L/opt/homebrew/opt/libomp/lib"
export CPPFLAGS="-I/opt/homebrew/opt/libomp/include"

After adding these lines, restart your terminal or run source ~/.zshrc or source ~/.bashrc to apply the changes.
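
Note that clang does not read LDFLAGS and CPPFLAGS on its own; these variables are conventions honored by build systems such as make and autoconf. When invoking clang directly, pass the flags explicitly (as we do in section 11) or expand the variables on the command line, e.g., clang -Xpreprocessor -fopenmp $CPPFLAGS $LDFLAGS -lomp test.c -o test.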

9. Configuring Visual Studio Code for OpenMP

To use OpenMP with Visual Studio Code, you need to update the include path settings:

  1. Open Visual Studio Code and go to Settings.
  2. Search for “c_cpp” in the search bar.
  3. Under “C/C++: IntelliSense Configurations”, click on “Edit in settings.json”.
  4. Add the following lines to the settings.json file:
"C_Cpp.default.includePath": [
"/opt/homebrew/opt/libomp/include"
]

5. Save the file and restart Visual Studio Code.
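
Note that this setting only affects IntelliSense (code navigation and error squiggles); it does not change how your program is compiled, which is handled by the clang command in section 11.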

10. A Simple OpenMP Example

Here’s a simple example of an OpenMP program in C that demonstrates the basics of parallelization using OpenMP:

#include <stdio.h>
#include <omp.h>

int main() {
    int i, sum = 0;
    int array_size = 100;
    int array[array_size];

    // Initialize the array with values
    for (i = 0; i < array_size; i++) {
        array[i] = i + 1;
    }

    // Calculate the sum using OpenMP
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < array_size; i++) {
        sum += array[i];
    }

    printf("The sum of the array elements is: %d\n", sum);
    return 0;
}

This example initializes an array and calculates the sum of its elements using a parallelized for loop with a reduction operation.

11. Compiling and Running the OpenMP Program

To compile and run the OpenMP program, follow these steps:

  1. Save the example code in a file called test.c.
  2. Open a terminal and navigate to the directory where test.c is saved.
  3. Compile the program using the following command (the -I and -L flags point clang at the Homebrew libomp headers and library):
clang -Xpreprocessor -fopenmp -I/opt/homebrew/opt/libomp/include -L/opt/homebrew/opt/libomp/lib -lomp test.c -o your_output_executable

4. Run the compiled program with the following command:

./your_output_executable

The program should print "The sum of the array elements is: 5050" (the sum of the integers 1 through 100), demonstrating that the OpenMP parallelization is working correctly.

Thank you for your attention.
