Integer Performance Comparison for C++, C#, Delphi



In this post, I have compared three versions (from the same codebase) using Delphi to count the number of primes between 1 to N. Actually, this is quite a simple example that can be used to benchmark the integer computation for modern programming languages.

In order to make relative fair comparisons, the IsPrime function is implemented exactly the same despite the fact that it may be implemented using a different algorithm e.g. Fermat Prime Testing algorithm.

C# .NET Counting Primes

C# generates the CIL code, which is then JIT. So the performance of C# application is quite close to the native groups (C++, Delphi etc).

Serial Version

One of the advantages using C# is that we can use LINQ (or in functional programming style):

1
2
3
public static int Serial(int start, int end) =>
            Enumerable.Range(start, end - start)
                      .Count(IsPrime);
public static int Serial(int start, int end) =>
            Enumerable.Range(start, end - start)
                      .Count(IsPrime);

Parallel Version

Suprisingly, we just need to add .AsParallel() to the above C# code and that makes our parallel version:

1
2
3
4
public static int Parallel(int start, int end) =>
            Enumerable.Range(start, end - start)
                      .AsParallel()
                      .Count(IsPrime);
public static int Parallel(int start, int end) =>
            Enumerable.Range(start, end - start)
                      .AsParallel()
                      .Count(IsPrime);

It is the same as:

1
2
3
4
5
public static int Parallel(int start, int end) =>
            Enumerable.Range(start, end - start)
                      .AsParallel()
                      .Where(IsPrime)
                      .Count();
public static int Parallel(int start, int end) =>
            Enumerable.Range(start, end - start)
                      .AsParallel()
                      .Where(IsPrime)
                      .Count();

C# LINQ makes writting parallel code a lot easier. Assume that in the future, you want to ship your code to other platforms e.g. GPU cores, you don’t need to change a single line of code!

Delphi Counting Primes

We ‘ve listed the two versions in this post. However the parallel version is not optimial: it recursively splits the range into two halves which causes performance overhead. To make a fair comparison to the above C# (AsParallel) version, we rewrite the delphi code:

function Test2: integer;
var
  s: integer;
begin
  s := 0;
  TParallel.&For(1, MAXN, procedure(i: integer)
    begin
      if (IsPrime(i) = 1) then
      begin // memory barrier
        AtomicIncrement(s);
      end
    end);
  Result := s;
end;

Please note that, we use the AtomicIncrement to ensure the correct counting but this is the memory barrier. If we get rid of the conditional branch check, it will be slightly faster.

function Test3: integer;
var
  s: integer;
begin
  s := 0;
  TParallel.&For(1, MAXN, procedure(i: integer)
    begin // memory barrier
      AtomicIncrement(s, IsPrime(i));
    end);
  Result := s;
end;

The cost of increasing the chance of memory barrier is compensated by removing a branch check.

Visual Studio C++ Counting Primes

In Visual Studio C++, we can use the concurrency to implement the Parallel For:

1
2
3
4
5
6
7
8
9
int Parallel1(int n) {
    volatile long s = 0;
    Concurrency::parallel_for(0, n, [&](int i) {
        if (IsPrime(i)) {
            InterlockedIncrement(&s);
        }
    });
    return s;
}
int Parallel1(int n) {
	volatile long s = 0;
	Concurrency::parallel_for(0, n, [&](int i) {
		if (IsPrime(i)) {
			InterlockedIncrement(&s);
		}
	});
	return s;
}

Similarly, the following is slightly a bit faster:

1
2
3
4
5
6
7
int Parallel2(int n) {
    volatile long s = 0;
    Concurrency::parallel_for(0, n, [&](int i) {
        InterlockedAdd(&s, IsPrime(i));
    });
    return s;
}
int Parallel2(int n) {
	volatile long s = 0;
	Concurrency::parallel_for(0, n, [&](int i) {
		InterlockedAdd(&s, IsPrime(i));
	});
	return s;
}

MinGW OpenMP C++ Counting Primes

We use the MINGW C++ with OpenMP support:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
int Parallel1(int n) {
    volatile long s = 0;
 
    #pragma omp parallel for
    for (int i = 1; i <= n; ++i) {
        if (IsPrime(i)) {
            InterlockedIncrement(&s);
        }
    }
    return s;
}
 
int Parallel2(int n) {
    volatile long s = 0;
 
    #pragma omp parallel for
    for (int i = 1; i <= n; ++i) {
        __sync_fetch_and_add(&s, IsPrime(i));
    }
    return s;
}
int Parallel1(int n) {
	volatile long s = 0;

	#pragma omp parallel for
	for (int i = 1; i <= n; ++i) {
		if (IsPrime(i)) {
			InterlockedIncrement(&s);
		}
	}
	return s;
}

int Parallel2(int n) {
	volatile long s = 0;

	#pragma omp parallel for
	for (int i = 1; i <= n; ++i) {
		__sync_fetch_and_add(&s, IsPrime(i));
	}
	return s;
}

Integer Performance Comparison for C++, C#, Delphi

With the above code (all can be found in github), all compiled to RELEASE version, and TickCount is recorded on Windows 10.

The hardware specs: Intel i7-6700 CPU (3.4 GHz) and 32 GB DDR4 2133MHz.

integer-performance-comparison-for-c-c-delphi Integer Performance Comparison for C++, C#, Delphi C# delphi integer

TickCount (the Shorter, the Faster)

There are no winners here

and it is not about the language war!

  • It is amazing that the C# .NET code is generally quite highly performed.
  • Delphi Integer Win64 Code performs the fastest. But other rivals are very close (marginal differences)
  • Visual Studio C++ compiler produces the fastest Win32 code.
  • Delphi Win64 compiler produces the fastest Win64 code.
  • In order to make a direct comparison (same PC), the testing for Linux64 bit code generated by Delphi Tokyo compiler is done at Linux Sub System on Windows. It is slow but may be totally a different story if you deploy the binaries directly on a truly Ubuntu Linux server (but in this case, you have to make sure the Linux Server and Windows PC are exactly the same specs)

I haven’t be able to get MingGW 64-bit compiler to work with OpenMP, but I guess the results will be similar. Modern compilers are in generally doing really good jobs so the code performance is often not the first concern when choosing the language for your application but e.g. Productivity, Cross Platform.

It might be interesting to add Java for comparison, and I may update this post soon.

Related Delphi for Linux 64-bit Posts

–EOF (Coding For Speed) —

GD Star Rating
loading...
1144 words Last Post: Performance Comparison between PHP7.0.11 and PHP5.6.23 on Two VPS
Next Post: Is There Such Thing as High-Quality Code?

The Permanent URL is: Integer Performance Comparison for C++, C#, Delphi (AMP Version)

5 Comments

    • Mushi Mage

Leave a Reply