
Multi-core systems vs. multi-CPU systems

Can you get better performance from a dual-core system or a dual-processor system? This article evaluates the two configurations and determines that multi-core chips tend to be more efficient than multi-CPU systems.

Multi-core chips tend to be more efficient than multi-CPU systems. Another benefit of multi-core systems is that having only a single CPU keeps system board prices low, since there is only a need for one CPU socket and the corresponding hardware that facilitates its use.

Today nearly every computer on the market contains a processor with at least two cores, and every server sold today has at least two (some have four). The longstanding question of whether you should buy a dual-core system is no longer an issue, because practically every modern system supports two or more cores. But what is the difference between a dual-core system and a system with two physical processors? Which configuration offers better performance?

Before I can explain these differences, you should understand that having two processors does not double a machine's performance. There are two main reasons for this.

  1. In order for a machine to benefit from multiple processors (or multiple cores for that matter), the machine must be running multiple threads. This isn't a problem; most of the Windows operating systems in use today are multi-threaded. And even though most applications are not multi-threaded, users of dual-core or dual-CPU systems could still see a considerable performance boost if they multitask.
  2. Multi-CPU systems carry inherent overhead: some of each CPU's power is lost to scheduling tasks. This is necessary because the computer must have some way of deciding which thread will run on each CPU.
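Point 1 above can be sketched in a few lines of Python. The function names here are invented for this illustration, and note one caveat: CPython's global interpreter lock prevents pure-Python threads from truly running in parallel, so a real speedup on a multi-core machine requires processes or GIL-releasing code, but the structure of the work-splitting is the same either way.

```python
# Illustrative sketch: a CPU-bound job only benefits from extra cores
# (or extra CPUs) when it is split into multiple threads of execution
# that the scheduler can place on different cores.
from concurrent.futures import ThreadPoolExecutor

def count_primes(lo, hi):
    """Count primes in [lo, hi) by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def count_primes_split(limit, workers=2):
    # One chunk per worker; on a dual-core or dual-CPU machine the OS is
    # free to run each chunk on a different core.
    step = limit // workers
    bounds = [(i * step, limit if i == workers - 1 else (i + 1) * step)
              for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda b: count_primes(*b), bounds))

print(count_primes_split(10_000))  # 1229, same answer as a single thread
```

A single-threaded application, by contrast, occupies at most one core no matter how many are available, which is why multitasking users see the benefit even when individual applications do not.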

Even more performance is lost to communications occurring between processors. For multiple processors to work cooperatively, the two processors must be able to communicate with each other. Imagine what would happen if two processors tried to apply conflicting changes to the same memory location at the same time. To prevent this type of catastrophe, a processor must verify that the contents of a memory location that it is about to manipulate are up to date in the system's memory, and are not presently residing in another processor's cache.
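The same hazard exists in software whenever two threads share a memory location. This minimal Python sketch (the names are illustrative, not from the article) shows the kind of coordination required; the lock plays a role loosely analogous to a core gaining exclusive ownership of a cache line before writing:

```python
# Two threads updating the same location must coordinate, or updates
# can be lost. The lock serializes the read-modify-write sequence.
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:          # acquire exclusive access before modifying
            counter += 1

threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 200000 -- guaranteed only because of the lock
```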

To perform this verification, systems with multiple processors use a protocol for communications between processors. Several protocols exist; two of the more common are MESI (Modified, Exclusive, Shared, Invalid) and MOESI (Modified, Owner, Exclusive, Shared, Invalid).

Which protocol is actually in use is irrelevant. What is important is that when a CPU wants to execute an instruction, it must issue a request to the other CPUs and wait for a response. If the target memory area is not in use, the CPU can execute the instruction. Otherwise, it must wait for the memory area to become available.
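As an illustration only, here is a toy model of the MESI states for a single cache line shared by two caches. It omits write-backs, bus timing, and many real-world transitions, and the function names are invented for this sketch; it is not a description of any particular CPU's implementation.

```python
# Toy MESI state machine for one cache line shared by two caches.
M, E, S, I = "Modified", "Exclusive", "Shared", "Invalid"

def access(states, who, op):
    """Apply a read or write by cache `who`; the other cache snoops the bus."""
    other = 1 - who
    if op == "read":
        if states[who] == I:
            if states[other] in (M, E, S):
                # Another cache holds the line: both end up Shared
                # (a Modified line would first be written back).
                states[other] = S
                states[who] = S
            else:
                states[who] = E  # sole copy: Exclusive
        # Reads from M/E/S hit locally with no state change.
    elif op == "write":
        states[other] = I        # snooped write invalidates the peer's copy
        states[who] = M
    return states

line = [I, I]
access(line, 0, "read")    # cache 0: Exclusive (sole copy)
access(line, 1, "read")    # both caches: Shared
access(line, 1, "write")   # cache 1: Modified, cache 0: Invalid
print(line)                # ['Invalid', 'Modified']
```

The invariant the protocol protects is visible even in this toy: a Modified line never coexists with a valid copy elsewhere, which is exactly the check a CPU performs before executing an instruction against that memory.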

Birth of dual-core processors

This verification process, while necessary, is extremely inefficient. In fact, Intel and AMD both realized that the communications between processors were the most inefficient aspect of multi-CPU computing. Both companies decided to eliminate most of the delay by incorporating two CPUs on the same die (the same chip). And so dual-core CPUs were born.

As in a system with multiple CPUs, communication between cores is still necessary. However, having both cores existing on the same die eliminates most of the lag time associated with communications between processors.

Although both Intel and AMD took the same basic approach to creating dual-core processors, the similarities end there.

Intel's design was more evolutionary than revolutionary, since it essentially stuck two Pentium 4 processors on a single piece of silicon. AMD designed its Athlon 64 X2 chip from the ground up. In doing so, the company realized that while the largest inefficiency in multi-CPU systems was the communications between processors, there was also a large delay associated with the processor communicating with the memory bus. As a way of reducing this delay, AMD integrated the memory controller, traditionally part of the Northbridge chipset, into its processors.

About the author: Brien M. Posey, MCSE, is a Microsoft Most Valuable Professional for his work with Windows 2000 Server, Exchange Server and IIS. He has served as CIO for a nationwide chain of hospitals and was once in charge of IT security for Fort Knox. He writes regularly for TechTarget sites.

This was last published in June 2007

Join the conversation


1.) Original P4/Xeons had 24 execution units for one core. Then Intel gives us dual core (whoop) with 10 execution units per core; then quad core with 4 e.u. per core.
2.) Memory coherency was handled by cache controllers. Fact since multi-386 Sequent minicomputers.
3.) Dual Pentium CPUs were L1 cache only.
4.) Wide (64-bit) data buses are problematic as they get faster. Syncing data transfers and problems like bit-jitter (faster data changes travel slower on the wire) require shorter data paths (on chip) or serial (hypertransport, pcie, sata) data paths. C.f. Pentium II off die cache vs Celeron on-chip cache.
5.) Even with the speed disadvantage, there are motherboards with 10 discrete AMD multicore CPUs and 8 discrete multicore Xeon CPUs.
6.) Unless this is an ancient article, there should be mention of multicore GPUs as well as HSA.