The simulation steps in my game Steam Revolution consist of a sequence of loops over all trains or industries. Each loop is extremely short, typically under 10 µs. Off-the-shelf parallel libraries have too much overhead and actually made my game run slower than single-threaded, so I made custom parallelization primitives that have as little as one eighth of the overhead of OpenMP in some circumstances.
I give an overview of important CPU architecture details that affect the implementation.
I describe how I implemented atomics, spin locks, and a parallel for primitive.
I then show synthetic benchmarks in which my implementation beats OpenMP and PPL.
This video is about computer architecture and how to maximize a computer's potential. I describe primitives that could be used in any program; how I actually use them to parallelize my game will be shown in a follow-up video.
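For readers who want a rough picture of the three primitives before watching, here is a minimal sketch in standard C++. It is not the code from the video: the names `SpinLock` and `parallel_for`, the chunked work-claiming scheme, and the choice of memory orderings are all assumptions made for illustration. In particular, this sketch starts fresh threads on every call, which costs far more than the sub-10 µs loops discussed here; a low-overhead version would presumably keep worker threads alive between calls.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Illustrative spin lock (hypothetical name): test-and-test-and-set on an atomic flag.
class SpinLock {
public:
    void lock() {
        for (;;) {
            // Try to take the lock. Acquire ordering makes the previous
            // holder's writes visible to the new holder.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // While the lock is held, spin on a plain load so waiting
            // threads do not keep writing to the shared cache line.
            while (locked_.load(std::memory_order_relaxed)) {}
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Illustrative parallel for (hypothetical signature): every thread claims
// chunks of the index range [0, count) from one shared atomic counter.
template <typename Body>
void parallel_for(std::size_t count, std::size_t chunk, unsigned workers, Body body) {
    if (chunk == 0) chunk = 1;
    std::atomic<std::size_t> next{0};

    auto run = [&] {
        for (;;) {
            // Atomically reserve the next chunk of indices.
            std::size_t begin = next.fetch_add(chunk, std::memory_order_relaxed);
            if (begin >= count) return;
            std::size_t end = begin + std::min(chunk, count - begin);
            for (std::size_t i = begin; i < end; ++i) body(i);
        }
    };

    // Assumption for simplicity: spawn threads per call instead of reusing a pool.
    std::vector<std::thread> threads;
    for (unsigned t = 1; t < workers; ++t) threads.emplace_back(run);
    run();  // the calling thread takes part in the work too
    for (auto& th : threads) th.join();  // join also publishes the workers' writes
}
```

A call such as `parallel_for(trains.size(), 16, 4, [&](std::size_t i) { /* update train i */ });` would process a hypothetical `trains` array in chunks of 16 on four threads. The chunk size trades load balance against contention on the shared counter.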
My game is similar in concept to OpenTTD. The difference is that my game's focus is more on optimizing static levels for good scores rather than showing how a transportation empire progresses over time. In this series I document my progress building a game from scratch with no pre-made engine; one could call it handmade.
Timestamps:
0:00:00 - Introduction
0:01:18 - CPU Architecture
0:29:22 - Atomic Integer
0:39:28 - Spin Lock
0:46:25 - Parallel For API
1:01:12 - Parallel For Implementation
1:11:11 - Benchmarks
1:22:22 - Conclusion