Symmetric multiprocessing

In computing, symmetric multiprocessing or SMP involves a multiprocessor computer hardware architecture where two or more identical processors are connected to a single shared main memory and are controlled by a single OS instance. Most common multiprocessor systems today use an SMP architecture. In the case of multi-core processors, the SMP architecture applies to the cores, treating them as separate processors. Processors may be interconnected using buses, crossbar switches or on-chip mesh networks. The bottleneck in the scalability of SMP using buses or crossbar switches is the bandwidth and power consumption of the interconnect among the various processors, the memory, and the disk arrays. Mesh architectures avoid these bottlenecks, and provide nearly linear scalability to much higher processor counts.

SMP systems allow any processor to work on any task no matter where the data for that task are located in memory, provided that each task in the system is not in execution on two or more processors at the same time; with proper operating system support, SMP systems can easily move tasks between processors to balance the workload efficiently.
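The idea that any processor may pick up any task, while no task runs on two processors at once, can be sketched with a single shared run queue. The following is a minimal illustration only, not how any particular operating system scheduler is implemented: threads stand in for processors, and the names `run_on_shared_queue`, `tasks` and `num_workers` are hypothetical. In CPython the global interpreter lock prevents CPU-bound threads from running truly in parallel, so the sketch shows the dispatch pattern rather than a speedup.

```python
import queue
import threading


def run_on_shared_queue(tasks, num_workers=4):
    """Run callables from one shared queue; any worker may take any task."""
    work = queue.Queue()
    for index, task in enumerate(tasks):
        work.put((index, task))

    results = [None] * len(tasks)
    handled_by = [None] * len(tasks)  # which worker ran each task

    def worker(worker_id):
        while True:
            try:
                # Each queue entry is handed to exactly one worker,
                # so a task never executes on two workers at once.
                index, task = work.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            results[index] = task()
            handled_by[index] = worker_id

    threads = [
        threading.Thread(target=worker, args=(worker_id,))
        for worker_id in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, handled_by


if __name__ == "__main__":
    squares, owners = run_on_shared_queue([lambda n=n: n * n for n in range(8)])
    print(squares)  # results are in task order
    print(owners)   # worker ids vary from run to run
```

Because idle workers simply take the next queued task, the load balances itself without any task being tied to a particular worker.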

Alternatives


SMP represents one of the earliest styles of multiprocessor machine architectures, typically used for building smaller computers with up to 8 processors. Larger computer systems might use newer architectures such as NUMA (Non-Uniform Memory Access), which dedicates different memory banks to different processors. In a NUMA architecture, processors may access local memory quickly and remote memory more slowly. This can dramatically improve memory throughput as long as the data is localized to specific processes (and thus processors). On the downside, NUMA makes moving data from one processor to another, as in workload balancing, more expensive. The benefits of NUMA are limited to particular workloads, notably on servers where the data is often associated strongly with certain tasks or users.
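The trade-off between local and remote memory in a NUMA system can be made concrete with a simple weighted average. The sketch below is a toy model under stated assumptions: the function name `average_access_ns` and the latency figures of 100 ns (local) and 300 ns (remote) are hypothetical placeholders, not measurements of any real machine.

```python
def average_access_ns(local_fraction, local_ns=100.0, remote_ns=300.0):
    """Expected memory access time when a fraction of accesses is local.

    The default latencies are made-up illustrative values.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    for fraction in (1.0, 0.9, 0.5, 0.0):
        print(fraction, average_access_ns(fraction))
```

Under this model, a workload whose data stays local sees the fast latency, while one whose data has been scattered by workload balancing drifts toward the remote latency.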

Other systems include asymmetric multiprocessing (ASMP), which uses separate specialized processors for specific tasks (which increases complexity), and computer clustered multiprocessing (such as Beowulf), in which not all memory is available to all processors.

Examples of ASMP include many media processor chips that pair a relatively slow base processor with a number of hardware accelerator cores. High-powered 3D chipsets in modern video cards could be considered a form of asymmetric multiprocessing. Clustering techniques are used fairly extensively to build very large supercomputers. In this discussion, a single processor is denoted as a uniprocessor (UN).

Advantages and disadvantages
SMP has many uses in science, industry, and business, which often use custom-programmed software for multithreaded (multitasked) processing. However, most consumer products such as word processors and computer games are written in such a manner that they cannot gain large benefits from concurrent systems. For games this is usually because writing a program to increase performance on SMP systems can produce a performance loss on uniprocessor systems. Multi-core chips are becoming more common in new computers, however, and the balance between installed uni- and multi-core computers may change in the coming years.

Uniprocessor and SMP systems require different programming methods to achieve maximum performance. Therefore, two separate versions of the same program may have to be maintained, one for each. Programs running on SMP systems may experience a performance increase even when they have been written for uniprocessor systems. This is because hardware interrupts, which usually suspend program execution while the kernel handles them, can execute on an idle processor instead. The effect in most applications (e.g. games) is not so much a performance increase as the appearance that the program is running much more smoothly. In some applications, particularly compilers and some distributed computing projects, one will see an improvement by a factor of (nearly) the number of additional processors.

In situations where more than one program executes at the same time, an SMP system will have considerably better performance than a uni-processor because different programs can run on different CPUs simultaneously.

Systems programmers must build support for SMP into the operating system: otherwise, the additional processors remain idle and the system functions as a uniprocessor system.

In cases where an SMP environment processes many jobs, administrators often experience a loss of hardware efficiency. Software programs have been developed to schedule jobs so that the processor utilization reaches its maximum potential. Good software packages can achieve this maximum potential by scheduling each CPU separately, as well as being able to integrate multiple SMP machines and clusters.
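As a rough illustration of how a job scheduler can raise processor utilization, the sketch below assigns each job to whichever CPU currently has the least work, taking the longest jobs first. This greedy heuristic is chosen here for illustration only; it is not claimed to be the method used by any particular scheduling package, and the names `schedule_jobs`, `job_times` and `num_cpus` are hypothetical.

```python
import heapq


def schedule_jobs(job_times, num_cpus):
    """Greedily place jobs (longest first) on the least-loaded CPU.

    Returns (assignment, makespan): a dict mapping CPU id to the list of
    job indices it runs, and the finishing time of the busiest CPU.
    """
    # Min-heap of (current load, cpu id); the least-loaded CPU is on top.
    loads = [(0, cpu) for cpu in range(num_cpus)]
    heapq.heapify(loads)
    assignment = {cpu: [] for cpu in range(num_cpus)}

    longest_first = sorted(
        enumerate(job_times), key=lambda item: item[1], reverse=True
    )
    for job_id, duration in longest_first:
        load, cpu = heapq.heappop(loads)
        assignment[cpu].append(job_id)
        heapq.heappush(loads, (load + duration, cpu))

    makespan = max(load for load, _ in loads)
    return assignment, makespan


if __name__ == "__main__":
    times = [4, 3, 3, 2, 2, 2]
    plan, finish = schedule_jobs(times, num_cpus=2)
    utilization = sum(times) / (2 * finish)
    print(plan, finish, utilization)
```

Utilization here is total job time divided by the number of CPUs times the makespan; the closer the per-CPU loads are to each other, the closer that ratio is to 1.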

Access to RAM is serialized; this and cache coherency issues cause performance to lag slightly behind the number of additional processors in the system.
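One common way to reason about this shortfall is Amdahl's law, a standard model that the text above does not itself name and that is brought in here purely for illustration. In the sketch below, `serial_fraction` is a hypothetical parameter standing in for the share of run time that cannot proceed in parallel, for example because of serialized memory access or coherency traffic.

```python
def amdahl_speedup(num_processors, serial_fraction):
    """Ideal speedup under Amdahl's law for a given serial fraction."""
    if num_processors < 1:
        raise ValueError("num_processors must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_processors)


if __name__ == "__main__":
    for cpus in (1, 2, 4, 8):
        print(cpus, round(amdahl_speedup(cpus, serial_fraction=0.1), 2))
```

With an assumed serial fraction of 10%, four processors give a speedup of about 3.08 rather than 4, and the gap widens as processors are added.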

Entry-level systems
Before about 2006, entry-level servers and workstations with two processors dominated the SMP market. With the introduction of dual-core devices, SMP is found in most new desktop machines and in many laptop machines. The most popular entry-level SMP systems use the x86 instruction set architecture and are based on Intel’s Xeon, Pentium D, Core Duo, and Core 2 Duo processors or AMD’s Athlon64 X2, Quad FX or Opteron 200 and 2000 series processors. Servers use those processors and other readily available non-x86 processor choices, including the Sun Microsystems UltraSPARC, Fujitsu SPARC64 III and later, SGI MIPS, Intel Itanium, Hewlett-Packard PA-RISC, DEC Alpha (Digital Equipment Corporation was acquired by Compaq, which later merged with Hewlett-Packard), IBM POWER and Apple Computer PowerPC (specifically the G4 and G5 series, as well as the earlier PowerPC 604 and 604e series) processors. In all cases, these systems are available in uniprocessor versions as well.

Earlier SMP systems used motherboards that have two or more CPU sockets. Microprocessor manufacturers later introduced CPU devices with two or more processors in one device; for example, the POWER, UltraSPARC, Opteron, Athlon, Core 2, and Xeon all have multi-core variants. Athlon and Core 2 Duo multiprocessors are socket-compatible with uniprocessor variants, so an expensive dual-socket motherboard is no longer needed to implement an entry-level SMP machine. Dual-socket Opteron designs are technically ccNUMA designs, though they can be programmed as SMP for a slight loss in performance.

Mid-level systems
The Burroughs B5500 first implemented SMP in 1961. It was implemented later on other mainframes. Mid-level servers, using between four and eight processors, can be found using the Intel Xeon MP, AMD Opteron 800 and 8000 series and the above-mentioned UltraSPARC, SPARC64, MIPS, Itanium, PA-RISC, Alpha and POWER processors. High-end systems, with sixteen or more processors, are also available with all of the above processors.

Sequent Computer Systems built large SMP machines using Intel 80386 (and later 80486) processors. Some smaller 80486 systems existed, but the major x86 SMP market began with the Intel Pentium technology supporting up to two processors. The Intel Pentium Pro expanded SMP support with up to four processors natively. Later, the Intel Pentium II and Intel Pentium III processors allowed dual-CPU systems, except for the respective Celerons. These were followed by the Intel Pentium II Xeon and Intel Pentium III Xeon processors, which could be used with up to four processors in a system natively. In 2001 AMD released the Athlon MP, or MultiProcessor CPU, together with the 760MP motherboard chipset as its first offering in the dual-processor marketplace. Although several much larger systems were built, they were all limited by the physical memory addressing limit of 64 GiB. With the introduction of 64-bit memory addressing on the AMD64 Opteron in 2003 and Intel 64 (EM64T) Xeon in 2005, systems are able to address much larger amounts of memory; their addressing limit of 16 EiB is not expected to be reached in the foreseeable future.
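The two limits quoted above follow directly from the width of the address. The short calculation below assumes that the 64 GiB figure corresponds to 36-bit physical addresses and the 16 EiB figure to full 64-bit addresses; the helper name `addressable_bytes` is hypothetical.

```python
GIB = 2 ** 30  # bytes in one gibibyte
EIB = 2 ** 60  # bytes in one exbibyte


def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


if __name__ == "__main__":
    print(addressable_bytes(36) // GIB, "GiB")  # 36-bit addresses
    print(addressable_bytes(64) // EIB, "EiB")  # 64-bit addresses
```

Each additional address bit doubles the addressable memory, so moving from 36 to 64 bits multiplies the limit by 2^28.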

Operating systems running on SMP computers

 * BeOS and derivatives
 * BSD descendants:
   * DragonFly BSD
   * FreeBSD
   * NetBSD
   * OpenBSD
 * Burroughs (Unisys) MCP (1961-present)
 * IBM i5/OS
 * INTEGRITY
 * LynxOS
 * LabVIEW Real-Time Module (version 8.5 or later)
 * Linux-based systems
 * Mac OS (7.5.5 to 9.2.2) and Mac OS X
 * Microsoft Windows NT
 * Nucleus RTOS
 * OpenVMS (since VMS 5.0)
 * OS/2 (since 2.11)
 * OSE real-time operating system (OSE5)
 * PikeOS real-time operating system for embedded systems
 * Plan 9
 * QNX real-time operating system (2000-present)
 * SkyOS
 * Syllable
 * Tandem Computers:
   * Guardian
   * NonStop Kernel
   * T/TOS (Tandem Operating System)
 * TOPS-10 operating system for the PDP-10 36-bit architecture (true SMP since version 7.01)
 * UNIVAC EXEC 8 (1964-present)
 * Unix-like systems such as BSD derivatives, Sun Solaris, AIX, HP-UX, IRIX, DYNIX/PTX 4.4.2
 * VxWorks
 * z/OS
 * z/TPF
 * z/VM
 * z/VSE

SMP-capable processors

 * Advanced Micro Devices (AMD)
   * Athlon MP
   * Athlon 64 X2
   * Turion 64 X2
   * AMD Opteron
   * Phenom
   * Phenom II
   * Athlon II
 * Azul Systems
   * Vega 1
   * Vega 2
 * DEC Alpha
 * Hewlett-Packard
   * HP PA-RISC
 * International Business Machines (IBM)
   * AIM Apple/IBM/Motorola PowerPC 601, 604, 604e
   * IBM POWER
   * System z
 * Inmos
   * INMOS transputers: T400, T800 and T9000
 * Intel
   * Intel 486/DX
   * Intel OverDrive Processor, Socket 7; Intel OverDrive Processor, Socket 8
   * Intel Pentium Pro; Intel Pentium II; Intel Pentium III
   * Intel Pentium D
   * Intel Core; Intel Pentium Dual-Core; Intel Core 2
   * Intel Core i7
   * Intel Core i5
   * Intel Xeon
   * Intel Itanium; Intel Itanium 2
   * glueless up to four processors (max. 16 in IA-32 compatibility mode)
 * Motorola/Freescale/Apple
   * AIM Apple/IBM/Motorola PowerPC 601, 604, 604e
 * Sun Microsystems
   * UltraSPARC; the Sparkle SPARC derivative was multiprocessor-capable in 1991
 * SGI/MIPS
   * MIPS; MIPS64
 * Raza Microelectronics
   * XLR
 * Tilera
   * TILE64 - 64 cores
   * TILEPro - 36 and 64 cores
   * TILE-Gx - 16, 36, 64 and 100 cores
 * Cavium Networks
   * Octeon-I, II: up to 32 MIPS64 cores (SoC)