Sunday, 26 April 2015

Relative speed of Raspberry Pi, Pi 2, and desktop PC (x86 and AMD64)

The last two posts described my attempts to build a benchmark program from the Quake 2 source code, to complement a similar program I made from the Doom source code. The main issues have been related to floating-point numbers; much of the effort has gone into discovering why the rendered output from Quake 2 was slightly different with different GCC configurations, different CPUs and different math libraries.

However, I have now completed the work, and uploaded it here. The floating-point problems are resolved by (1) forcing the use of SSE on x86 platforms, and (2) bundling a math library with the benchmark. I chose openlibm for this purpose. The approach has the additional benefit that SSE will be used within the math library. By contrast, the math libraries shipped with 32-bit x86 Linux typically use the x87 FPU, with all of its associated issues, such as intermediate results being held at 80-bit extended precision.

The benchmark allows me to make a speed comparison of a few different platforms that I have access to. None of these results are particularly surprising, but I think it is interesting to quantify the difference between these platforms.

Of course, this isn't SPEC or EEMBC. The experiments aren't rigorous. These figures are only a rough guide to the relative speed of those platforms when running integer-only code (Doom) or mixed integer/floating-point code (Quake 2). Here's the data:

Arch   Model          GCC version  Absolute time (s)      Relative speed (RPi = 1.0)
                                   Quake 2     Doom       Quake 2     Doom
                                   FP/Integer  Integer    FP/Integer  Integer
ARMv6  RPi            4.6.3        483.2       217.1      1.0         1.0
ARMv7  RPi 2          4.6.3        218.0       77.3       2.2         2.8
x86    Core2 E8600    4.7.2        38.8        10.3       12.5        21.1
x64    Core2 E8600    4.7.2        30.0        9.4        16.1        23.1
x86    Core i3 3220   4.7.4        25.4        7.5        19.0        28.9
x64    Core i3 3220   4.1.2        20.9        6.9        23.1        31.5

Both benchmarks operate as follows: the game is started in a non-interactive "headless" mode in which the graphical output is rendered to a memory buffer only. The game plays through a demo file that completes the game: Doom Done Quick, or Quake 2 Done Quick 2. These demo files would take about 20 minutes to play in real time; in headless mode, they complete in under a minute on the PC platforms. The benchmark renders every frame at a fixed frame rate (12.5 fps for Quake 2, 35 fps for Doom). We check for correct rendering by computing a CRC-32 of the data in the buffer. This check does not affect benchmark timing, because it is enabled only while testing the benchmark itself.
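The structure described above can be sketched roughly as follows. This is a Python illustration only, not the benchmark itself (which is built from the games' C sources): `render_frame`, `run_benchmark`, the 320x200 buffer size and the `verify` flag are all hypothetical names and values invented for the example. The only parts taken from the description are the CRC-32 over the buffer contents and the fact that the check is switched off during timed runs.

```python
import time
import zlib

# Assumed buffer size for illustration: 320x200 pixels, one byte per pixel.
WIDTH, HEIGHT = 320, 200


def render_frame(frame_no: int, buf: bytearray) -> None:
    """Hypothetical stand-in for the game renderer.

    It writes a deterministic pattern so that two runs produce the same buffer.
    """
    value = frame_no & 0xFF
    for i in range(0, len(buf), 64):
        buf[i] = (value + i) & 0xFF


def run_benchmark(num_frames: int, verify: bool = False):
    """Render num_frames frames to memory and return (elapsed_seconds, crc).

    The running CRC-32 is only computed when verify is True, so a timed run
    (verify=False) does not pay for the check; crc is None in that case.
    """
    buf = bytearray(WIDTH * HEIGHT)
    crc = 0
    start = time.perf_counter()
    for frame_no in range(num_frames):
        render_frame(frame_no, buf)
        if verify:
            # Fold this frame's buffer into the running CRC-32.
            crc = zlib.crc32(buf, crc)
    elapsed = time.perf_counter() - start
    return elapsed, (crc if verify else None)


if __name__ == "__main__":
    elapsed, _ = run_benchmark(1000)               # timed run, no checking
    _, crc = run_benchmark(1000, verify=True)      # test run, checking enabled
    print(f"elapsed: {elapsed:.3f} s, crc: {crc:08x}")
```

In a test run, the final CRC would be compared against a known-good value recorded on a reference platform; any difference in rendering on another CPU or compiler then shows up as a mismatch.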

The output time is shown in seconds. We are looking at four different systems. Of these, the original Raspberry Pi (RPi) is slowest, taking 483.2s to run Quake 2 and 217.1s to run Doom. These timings are used as a baseline for relative comparisons.
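The relative figures are simply the RPi time divided by each platform's time, so a larger number means a faster platform. A minimal sketch of that arithmetic, using the absolute times from the table (the names `TIMES` and `relative_speed` are made up for this example):

```python
# Absolute times in seconds, copied from the table: (Quake 2, Doom).
TIMES = {
    ("ARMv6", "RPi"): (483.2, 217.1),
    ("ARMv7", "RPi 2"): (218.0, 77.3),
    ("x86", "Core2 E8600"): (38.8, 10.3),
    ("x64", "Core2 E8600"): (30.0, 9.4),
    ("x86", "Core i3 3220"): (25.4, 7.5),
    ("x64", "Core i3 3220"): (20.9, 6.9),
}

BASELINE = TIMES[("ARMv6", "RPi")]


def relative_speed(platform):
    """Return (Quake 2, Doom) speed relative to the RPi, rounded to 1 decimal."""
    quake2, doom = TIMES[platform]
    return (round(BASELINE[0] / quake2, 1), round(BASELINE[1] / doom, 1))


if __name__ == "__main__":
    for platform in TIMES:
        arch, model = platform
        q2, doom = relative_speed(platform)
        print(f"{arch:6} {model:13} Quake 2: {q2:5.1f}  Doom: {doom:5.1f}")
```

Running this reproduces the two relative columns of the table.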

Here is the relative data for Quake 2 (with RPi = 1.0):

For this mixed floating-point/integer benchmark, the RPi 2 is roughly twice as fast as the original RPi (2.2 times). An older desktop platform (Core2 E8600) is 16 times faster if running AMD64 code. A recent desktop (i3 3220), also running AMD64 code, is 23 times faster.

In general, AMD64 builds are faster than x86 builds - approximately 25% faster for FP/Integer code, and 10% faster for integer-only code.
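These percentages can be checked against the table by comparing the x86 and AMD64 times on the same machine. A short sketch, with `percent_faster` being a helper name invented here:

```python
from statistics import mean


def percent_faster(x86_time: float, amd64_time: float) -> float:
    """How much faster the AMD64 build is, as a percentage of its own speed.

    A build that finishes in half the time counts as 100% faster.
    """
    return (x86_time / amd64_time - 1.0) * 100.0


# (x86 seconds, AMD64 seconds) per machine, taken from the table.
QUAKE2 = {"Core2 E8600": (38.8, 30.0), "Core i3 3220": (25.4, 20.9)}
DOOM = {"Core2 E8600": (10.3, 9.4), "Core i3 3220": (7.5, 6.9)}

quake2_gain = {m: percent_faster(*t) for m, t in QUAKE2.items()}
doom_gain = {m: percent_faster(*t) for m, t in DOOM.items()}

if __name__ == "__main__":
    for name, gains in (("Quake 2", quake2_gain), ("Doom", doom_gain)):
        for machine, gain in gains.items():
            print(f"{name:8} {machine:13} AMD64 is {gain:.0f}% faster")
        print(f"{name:8} average: {mean(gains.values()):.0f}%")
```

With the figures from the table this gives about 29% and 22% for Quake 2, and about 10% and 9% for Doom, which is where the rough 25% and 10% come from.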

The data for Doom shows similar results for integer-only code:

Here, the RPi 2 is nearly three times as fast as the RPi. Running AMD64 code, the desktop platforms are 23 times faster (E8600) and nearly 32 times faster (i3 3220). These are the reasons why you might be willing to put up with a computer that requires cooling hardware, uses more than a hundred watts of power, and costs ten times as much as an RPi.

Though I did expect a gap between the RPi 2 and the desktop PCs, I was surprised by its size. I had hoped that my aged Core 2 desktop would be closer to the RPi 2, but it really isn't. The gap is noticeable when you use the RPi 2 for anything CPU-intensive, such as browsing a JavaScript-heavy site like Gmail or GitHub, encoding an MP3 file, or compiling a program. This is unfortunate. In principle the RPi 2 with Raspbian does most of the things I need; in reality, it is just a bit too slow. The idea of a very power-efficient PC appeals to me, but not enough to put up with this disadvantage.