Even now most games are designed for a single CPU core. They are threaded but they are not true parallel programs.

Supercomputers use processor cores by the thousands with parallel programming. Game engines, however, are more complex and it may take some time for developers to figure out what can be done.

DX12 and AMD Mantle are efforts to bring parallel programming to games. Right now games are using the vector capability of a video card, so when they are rendering a scene the video card does all the heavy lifting.

Windows has been multithreaded for decades. The CPU is not saturated, so playing an MP3 uses few resources, not unlike a word processor. The Windows scheduler switches between threads over 100 times per second, so the system appears to be multitasking.

Starting around 2005 the first dual-core desktop processors became available. The advantage is obvious: two threads can run at the same time. More recently CPUs have added more and more cores.

Windows XP and above are designed to take better advantage of multicore processors. The scheduler can run a thread on any available core, so separate processes and threads execute at the same time on different cores. Over time Windows has become better at allocating tasks to all the available cores.

A game can take advantage of more cores. For example, the AI handler can run on a separate thread from the UI thread. The threads generally do not do the same work. This is the same idea as asynchronous processing with modern graphics cards.
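The idea of keeping the AI apart from the UI thread can be sketched in a few lines of standard C++. This is only a minimal illustration, not how any particular engine does it: `GameState`, `ai_loop` and `run_game` are hypothetical names, and the loop bodies are stand-ins for real AI and rendering work.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical game state shared between the two threads.
struct GameState {
    std::atomic<int> ai_ticks{0};     // work done so far by the AI thread
    std::atomic<bool> running{true};  // cleared by the main thread to stop the AI
};

// AI loop: runs on its own thread, so the OS is free to place it on another core.
void ai_loop(GameState& state) {
    while (state.running.load()) {
        state.ai_ticks.fetch_add(1);  // stand-in for pathfinding or decision code
        std::this_thread::yield();
    }
}

// Main (UI/render) loop: runs the requested number of frames while the AI
// thread works alongside it. Returns the number of frames rendered.
int run_game(GameState& state, int frames) {
    assert(frames >= 0);
    std::thread ai(ai_loop, std::ref(state));
    int rendered = 0;
    for (int i = 0; i < frames; ++i) {
        ++rendered;                   // stand-in for input handling and drawing
    }
    state.running.store(false);       // tell the AI thread to finish
    ai.join();                        // wait for it before returning
    return rendered;
}
```

Calling `run_game(state, 100)` renders 100 stand-in frames. How many AI ticks happen in that time depends on the scheduler and the number of free cores, which is the point: the two loops do not wait on each other.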

In Windows many threads may be waiting for an event. A waiting thread is no longer scheduled by the OS and uses no CPU time. The actual waiting is done by Windows itself: it simply removes the thread from the list of running threads and puts it in a waiting list instead. When the event occurs the thread is moved back to the ready list and resumes.
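A rough sketch of a thread that sleeps until an event fires is shown here. It uses the portable `std::condition_variable` from standard C++ rather than the native Windows event objects, so treat it as an analogy. The `Event` class and `run_handler_once` are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot event, similar in spirit to a Windows event object.
class Event {
public:
    // Block the calling thread until signal() has been called.
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return signaled_; });
    }
    // Mark the event as fired and wake every waiting thread.
    void signal() {
        {
            std::lock_guard<std::mutex> lock(m_);
            signaled_ = true;
        }
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Starts a handler thread that blocks on the event, then fires the event.
// Returns the value the handler read after it woke up.
int run_handler_once(int payload) {
    Event ev;
    int shared = 0;
    int received = 0;
    std::thread handler([&] {
        ev.wait();           // the thread is parked here until the event fires
        received = shared;   // runs only after signal()
    });
    shared = payload;        // prepare the data before signalling
    ev.signal();
    handler.join();
    assert(received == payload);
    return received;
}
```

While the handler is inside `wait()` it is not on the run list, so it costs nothing until `signal()` is called.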

Our old AMD Phenom II X4 965 Black Edition runs at 3.4 GHz. It can easily be overclocked to a higher speed. With Windows 8 and above the kernel is fully capable of allocating cores as needed for tasks. More recent processors are somewhat faster and use somewhat less power, but CPU power management has not changed much, which negates the advantages.

Right now the workload for games is mostly focused on the raw processing power of the video card. Recent cards have moved into the TFLOPS realm. Over time the incremental improvements in video cards have added up to a stupendous gain. The video card is basically a secondary processor that signals when it’s done.

The idea of a coprocessor goes back to the original 8088 and the 8087 floating point chip. Eventually the separate device moved into the CPU, but today SSE has largely replaced the x87 FPU, which is obsolete. SSE is a newer design with new registers that are better suited to floating point work. As an example, 64-bit Windows programs use SSE2 for floating point and normally do not use the old FPU at all. The FPU remains only to support legacy programs. SSE is materially faster. SSE can benefit financial programs like spreadsheets by calculating several cells at once. SSE first appeared with the Pentium III processors in 1999 and has been expanded ever since.
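To make "several cells at once" concrete, here is a small sketch that adds two columns of numbers four at a time with the SSE intrinsics from `<xmmintrin.h>`. The function name `add_columns` and the `HAVE_SSE` macro are made up for this example. Single-precision floats are used to keep it short, although a real spreadsheet would more likely use doubles. On a compiler or CPU without SSE the code falls back to a plain loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics
#define HAVE_SSE 1
#endif

// Adds two equal-length columns of "cells" element by element.
std::vector<float> add_columns(const std::vector<float>& a,
                               const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    std::size_t i = 0;
#ifdef HAVE_SSE
    // Four floats per step: unaligned load, packed add, unaligned store.
    for (; i + 4 <= a.size(); i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&out[i], _mm_add_ps(va, vb));
    }
#endif
    // Scalar loop for the leftover cells (or for everything when SSE is absent).
    for (; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

With six cells per column, the first four are handled by one packed add and the last two by the scalar loop. The old x87 FPU would need one add per cell.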

So considering the hardware seen in a typical PC, asynchronous processing has been the way since the get-go. Video cards are more thread friendly, with segmented designs that offer more thread options. Vista and above use the GPU to handle fonts on the screen and so on. This is not demanding and integrated graphics are quite capable. Integrated graphics have improved over time, primarily for the benefit of the Windows Desktop Window Manager and Aero.

When Batman was released in 2010, it was discovered that PhysX was still using the old FPU instead of SSE, which is dramatically faster. This meant the game ran very poorly and slowly. Even in 2012 when Borderlands 2 came out, PhysX was still using the slower FPU code. Visual Studio 2005 and above all support SSE for 32-bit programs, suggesting NVIDIA was not concerned with PhysX performance. The OpenCL-based Bullet physics library will run on any Intel CPU, any AMD CPU or GPU and any NVIDIA GPU. Unfortunately few games use this package yet and NVIDIA has been slow to support OpenCL as well.