An Introduction to Parallelism, OpenMP, and Memory Access on macOS M1/M2 with Visual Studio Code

Etietop Abraham
6 min read · Mar 15, 2023

In this tutorial, we will explore the concept of parallelism in computing and learn how to use the OpenMP library to parallelize a simple C program on macOS with M1/M2 processors.


We will cover the following:

  1. Introduction to parallelism
  2. Computational complexity of parallelized transformer networks
  3. Flynn’s Taxonomy and its relationship to parallelization
  4. Systematization by access to memory (shared, distributed, and CUDA)
  5. OpenMP basics
  6. Installing OpenMP on macOS with M1/M2 processors
  7. Adding Homebrew to PATH
  8. Updating the LDFLAGS & CPPFLAGS environment variables
  9. Configuring Visual Studio Code for OpenMP
  10. A simple OpenMP example
  11. Compiling and running the OpenMP program

1. Introduction to Parallelism

Parallelism in computing refers to the execution of multiple tasks concurrently, using multiple processing units or cores. Parallelization can help speed up computationally intensive tasks by dividing the workload among multiple processors, reducing overall execution time.
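
The benefit is bounded by the fraction of the program that can actually run in parallel. A classic way to quantify this is Amdahl's law: if a fraction p of the work is parallelizable across n processors, the maximum speedup is

speedup = 1 / ((1 - p) + p / n)

For example, with p = 0.9 and n = 8 cores, the speedup is at most 1 / (0.1 + 0.9/8) ≈ 4.7, not 8, which is why identifying the serial portion of a workload matters as much as adding cores.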

2. Computational Complexity of Parallelized Transformer Networks

Transformer networks can be parallelized effectively due to their self-attention mechanism and layer-wise structure. The computational complexity of transformers is mainly determined by the matrix multiplications in the self-attention mechanism. By parallelizing these operations across multiple GPUs or other hardware accelerators, the overall training and inference times can be significantly reduced.
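
Concretely, for a sequence of length n and model dimension d, self-attention costs on the order of n² · d operations, because every token attends to every other token. That quadratic term is what makes parallel hardware so valuable for long sequences, and the work splits cleanly: each attention head can be computed independently, and the underlying matrix multiplications decompose into independent blocks.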

3. Flynn’s Taxonomy and Its Relationship to Parallelization

Flynn’s Taxonomy is a classification system for computer architectures based on the number of instruction streams and data streams. It defines four categories:

  1. SISD (Single Instruction, Single Data): A single instruction operates on a single data stream, e.g., traditional sequential CPUs.
  2. SIMD (Single Instruction, Multiple Data): A single instruction operates on multiple data streams simultaneously, e.g., vector processors and GPU shader cores.
  3. MISD (Multiple Instruction, Single Data): Multiple instructions operate on a single data stream, e.g., fault-tolerant systems.
  4. MIMD (Multiple Instruction, Multiple Data): Multiple instructions operate on multiple data streams, e.g., multi-core CPUs and distributed systems.

Parallelization techniques like those used in transformer networks primarily focus on the SIMD and MIMD categories, allowing developers to parallelize code across multiple cores or processors.
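
To make the SIMD/MIMD distinction concrete, here is a minimal C sketch using OpenMP directives (introduced properly in section 5). The #pragma omp simd directive, available since OpenMP 4.0, asks the compiler to vectorize a loop (SIMD), while #pragma omp parallel for distributes iterations across independent threads (MIMD):

#include <stdio.h>

#define N 1000

int main() {
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = 2.0f * i;
    }

    // SIMD: one instruction stream applied to many data elements at once
    #pragma omp simd
    for (int i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }

    // MIMD: multiple threads, each with its own instruction stream,
    // split the iterations among themselves
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }

    printf("c[%d] = %.1f\n", N - 1, c[N - 1]);
    return 0;
}

This compiles with the same clang command shown in section 11.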

4. Systematization by Access to Memory (Shared, Distributed, and CUDA)

There are three main ways to organize memory access in parallel computing systems:

  1. Shared memory: All processing units have access to a single shared memory space. Communication between processing units is done through memory reads and writes. OpenMP is an example of a shared memory programming model.
  2. Distributed memory: Each processing unit has its own local memory, and communication between processing units is done via message passing. MPI (Message Passing Interface) is an example of a distributed memory programming model (a minimal sketch follows this list).
  3. CUDA: Compute Unified Device Architecture (CUDA) is a parallel computing platform and programming model developed by NVIDIA. It allows developers to use NVIDIA GPUs for general-purpose computing tasks. CUDA combines elements of both shared and distributed memory models and provides additional abstractions for GPU programming.
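
To illustrate the distributed-memory model, here is a minimal MPI sketch in C. It assumes an MPI implementation such as Open MPI is installed (e.g., via brew install open-mpi) and is compiled with mpicc. Note how data moves between processes only through explicit messages, in contrast to OpenMP's shared variables:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 42;
        // Rank 0 sends a value to rank 1: communication is explicit
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}

Run it with at least two processes, e.g., mpirun -np 2 ./a.out.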

Now let’s get to the fun part!


5. OpenMP Basics

OpenMP (Open Multi-Processing) is an API for writing parallel applications in C, C++, and Fortran. It is designed for shared-memory multi-processor systems and provides a simple and flexible way to parallelize code using compiler directives, library routines, and environment variables.

Basic Constructs of OpenMP

Here are some of the basic constructs of OpenMP:

  1. Directives: OpenMP uses directives, which are typically written as pragmas in C and C++. These pragmas provide instructions to the compiler to parallelize specific parts of the code.

Example: Parallelizing a simple for loop:

#pragma omp parallel for
for (int i = 0; i < N; i++) {
    // Perform some operation
}

2. Parallel Regions: A parallel region is a block of code that can be executed by multiple threads concurrently. You can create a parallel region using the #pragma omp parallel directive.

#pragma omp parallel
{
    // Code inside this block will be executed in parallel by multiple threads
}

3. Thread Management: OpenMP provides various mechanisms for managing and synchronizing threads, such as specifying the number of threads and setting thread affinity.

Example: Setting the number of threads:

omp_set_num_threads(4); // Set the number of threads to 4
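
As a minimal sketch of these routines in action, the following program requests four threads and has each one report its ID. You can also override the thread count at run time with the OMP_NUM_THREADS environment variable (e.g., OMP_NUM_THREADS=2 ./a.out):

#include <stdio.h>
#include <omp.h>

int main() {
    omp_set_num_threads(4); // Request 4 threads for the next parallel region

    #pragma omp parallel
    {
        // omp_get_thread_num() returns this thread's ID;
        // omp_get_num_threads() returns the size of the team
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}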

4. Synchronization: OpenMP provides synchronization constructs like barriers, critical sections, and atomic operations to ensure the proper execution order of operations within parallel regions.

Example: Using a critical section to protect a shared resource:

#pragma omp critical
{
    // Code inside this block will be executed by one thread at a time
}
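
The list above also mentions barriers and atomic operations; here is a minimal sketch combining the two. An atomic update is cheaper than a critical section for a simple read-modify-write, and the barrier guarantees all increments have finished before the value is printed:

#include <stdio.h>
#include <omp.h>

int main() {
    int counter = 0;

    #pragma omp parallel
    {
        // Atomic update: lighter-weight than a critical section
        // for a single read-modify-write operation
        #pragma omp atomic
        counter++;

        // All threads wait here until every increment has completed
        #pragma omp barrier

        // Exactly one thread prints the final value
        #pragma omp single
        printf("counter = %d\n", counter);
    }
    return 0;
}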

5. Reduction: Reduction is a common operation in parallel computing where the result of an operation is accumulated across multiple threads. OpenMP provides a reduction clause to simplify the process.

Example: Summing the elements of an array using reduction:

int sum = 0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < N; i++) {
    sum += array[i];
}
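
Without the reduction clause, the concurrent sum += array[i] updates would race with each other and produce an unpredictable result; reduction(+:sum) gives each thread a private copy of sum and combines the copies at the end of the loop.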

6. Installing OpenMP on macOS with M1/M2 Processors


To use OpenMP on macOS with M1/M2 processors, you need to install the LLVM OpenMP runtime library (libomp) using Homebrew, a package manager for macOS:

  1. Install Homebrew by following the instructions on the official Homebrew website.
  2. Install libomp by running the following command in the terminal:
brew install libomp
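
You can verify the installation with brew info libomp, which reports the installed version and its location (typically /opt/homebrew/opt/libomp on Apple Silicon Macs).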

7. Adding Homebrew to PATH

After installing Homebrew, you need to add it to your PATH to access the installed tools easily. Run the following commands in your terminal:

echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

These commands add the Homebrew bin directory to your PATH, making it easier to use the installed tools directly from the terminal.

8. Updating the LDFLAGS & CPPFLAGS Environment Variables

When you install libomp, you may receive a message suggesting you update your LDFLAGS and CPPFLAGS environment variables. This is necessary for the compiler to find the OpenMP library and include files. Add the following lines to your shell configuration file (e.g., .zshrc or .bashrc):

export LDFLAGS="-L/opt/homebrew/opt/libomp/lib"
export CPPFLAGS="-I/opt/homebrew/opt/libomp/include"

After adding these lines, restart your terminal or run source ~/.zshrc or source ~/.bashrc to apply the changes.
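
Note that clang does not read LDFLAGS and CPPFLAGS on its own; these variables are conventions honored by build systems such as make and autoconf. When invoking clang directly, pass the flags explicitly (as we do in section 11) or expand the variables on the command line, e.g., clang -Xpreprocessor -fopenmp $CPPFLAGS $LDFLAGS -lomp test.c -o test.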

9. Configuring Visual Studio Code for OpenMP

To use OpenMP with Visual Studio Code, you need to update the include path settings:

  1. Open Visual Studio Code and go to Settings.
  2. Search for “c_cpp” in the search bar.
  3. Under “C/C++: IntelliSense Configurations”, click on “Edit in settings.json”.
  4. Add the following lines to the settings.json file:
"C_Cpp.default.includePath": [
"/opt/homebrew/opt/libomp/include"
]

5. Save the file and restart Visual Studio Code.
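
Note that this setting only affects IntelliSense (code navigation and error squiggles); it does not change how your program is compiled, which is handled by the clang command in section 11.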

10. A Simple OpenMP Example

Here’s a simple example of an OpenMP program in C that demonstrates the basics of parallelization using OpenMP:

#include <stdio.h>
#include <omp.h>

int main() {
    int i, sum = 0;
    int array_size = 100;
    int array[array_size];

    // Initialize the array with values
    for (i = 0; i < array_size; i++) {
        array[i] = i + 1;
    }

    // Calculate the sum using OpenMP
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < array_size; i++) {
        sum += array[i];
    }

    printf("The sum of the array elements is: %d\n", sum);
    return 0;
}

This example initializes an array and calculates the sum of its elements using a parallelized for loop with a reduction operation.

11. Compiling and Running the OpenMP Program

To compile and run the OpenMP program, follow these steps:

  1. Save the example code in a file called test.c.
  2. Open a terminal and navigate to the directory where test.c is saved.
  3. Compile the program using the following command (the -I and -L flags point clang at the Homebrew libomp headers and library):
clang -Xpreprocessor -fopenmp -I/opt/homebrew/opt/libomp/include -L/opt/homebrew/opt/libomp/lib -lomp test.c -o your_output_executable

4. Run the compiled program with the following command:

./your_output_executable

The program should print "The sum of the array elements is: 5050" (the sum of the integers 1 through 100), demonstrating that the OpenMP parallelization is working correctly.

Thank you for your attention.
