x86-64 is an extension of the x86 instruction set. It supports vastly larger virtual and physical address spaces than are possible on x86, thereby allowing programmers to conveniently work with much larger data sets. x86-64 also provides 64-bit general purpose registers and numerous other enhancements. The original specification was created by AMD, and has been implemented by AMD, Intel, VIA, and others. It is fully backwards compatible with 32-bit code. Because the full 32-bit instruction set remains implemented in hardware without any intervening emulation, existing 32-bit x86 executables run with no compatibility or performance penalties, although existing applications that are recoded to take advantage of new features of the processor design may see significant performance increases.
AMD later introduced AMD64 as the architecture's marketing name; Intel used the names IA-32e and EM64T before finally settling on the name Intel 64 for their implementation. x86-64 is still used by many in the industry as a vendor-neutral term, as is x64.
The x86-64 specification is distinct from the Intel Itanium (formerly IA-64) architecture, which is not compatible on the native instruction set level with either the x86 or x86-64 architectures.
The primary defining characteristic of AMD64 is the availability of 64-bit general-purpose processor registers, i.e. rax, rbx etc., 64-bit integer arithmetic and logical operations, and 64-bit virtual addresses. The designers took the opportunity to make other improvements as well. The most significant changes include:
- 64-bit integer capability: All general-purpose registers (GPRs) are expanded from 32 bits to 64 bits, and all arithmetic and logical operations, memory-to-register and register-to-memory operations, etc., can now operate directly on 64-bit integers. Pushes and pops on the stack are always in 8-byte strides, and pointers are 8 bytes wide.
- Additional registers: In addition to increasing the size of the general-purpose registers, the number of named general-purpose registers is increased from eight (i.e. eax, ebx, ecx, edx, ebp, esp, esi, edi) in x86 to 16 (i.e. rax, rbx, rcx, rdx, rbp, rsp, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15). It is therefore possible to keep more local variables in registers rather than on the stack, and to let registers hold frequently accessed constants; arguments for small and fast subroutines may also be passed in registers to a greater extent. However, AMD64 still has fewer registers than many common RISC ISAs (which typically have 32–64 registers) or VLIW-like machines such as the IA-64 (which has 128 registers); note, however, that because of register renaming the number of physical registers is often much larger than the number of registers exposed by the instruction set.
- Additional XMM (SSE) registers: Similarly, the number of 128-bit XMM registers (used for Streaming SIMD instructions) is also increased from 8 to 16.
- Larger virtual address space: The AMD64 architecture defines a 64-bit virtual address format, of which the low-order 48 bits are used in current implementations. This allows up to 256 TB (248 bytes) of virtual address space. The architecture definition allows this limit to be raised in future implementations to the full 64 bits, extending the virtual address space to 16 EB (264 bytes). This is compared to just 4 GB (232 bytes) for the x86. This means that very large files can be operated on by mapping the entire file into the process' address space (which is often much faster than working with file read/write calls), rather than having to map regions of the file into and out of the address space.
- Larger physical address space: The original implementation of the AMD64 architecture implemented 40-bit physical addresses and so could address up to 1 TB (240 bytes) of RAM. and therefore can address up to 256 TB of RAM. The architecture permits extending this to 52 bits in the future (limited by the page table entry format); this would allow addressing of up to 4 PB of RAM. For comparison, x86 processors are limited to 64 GB of RAM in Physical Address Extension (PAE) mode, or 4 GB of RAM without PAE mode.
- Larger physical address space in legacy mode: When operating in legacy mode the AMD64 architecture supports Physical Address Extension (PAE) mode, as do most current x86 processors, but AMD64 extends PAE from 36 bits to an architectural limit of 52 bits of physical address. Any implementation therefore allows the same physical address limit as under long mode.
- Instruction pointer relative data access: Instructions can now reference data relative to the instruction pointer (RIP register). This makes position independent code, as is often used in shared libraries and code loaded at run time, more efficient.
- SSE instructions: The original AMD64 architecture adopted Intel's SSE and SSE2 as core instructions. SSE3 instructions were added in April 2005. SSE2 is an alternative to the x87 instruction set's IEEE 80-bit precision with the choice of either IEEE 32-bit or 64-bit floating-point mathematics. This provides floating-point operations compatible with many other modern CPUs. The SSE and SSE2 instructions have also been extended to operate on the eight new XMM registers. SSE and SSE2 are available in 32-bit mode in modern x86 processors; however, if they're used in 32-bit programs, those programs will only work on systems with processors that have the feature. This is not an issue in 64-bit programs, as all AMD64 processors have SSE and SSE2, so using SSE and SSE2 instructions instead of x87 instructions does not reduce the set of machines on which x86-64 programs can be run. SSE and SSE2 are generally faster than, and duplicate most of the features of the traditional x87 instructions, MMX, and 3DNow!.
- No-Execute bit: The "NX" bit (bit 63 of the page table entry) allows the operating system to specify which pages of virtual address space can contain executable code and which cannot. An attempt to execute code from a page tagged "no execute" will result in a memory access violation, similar to an attempt to write to a read-only page. This should make it more difficult for malicious code to take control of the system via "buffer overrun" or "unchecked buffer" attacks. A similar feature has been available on x86 processors since the 80286 as an attribute of segment descriptors; however, this works only on an entire segment at a time. Segmented addressing has long been considered an obsolete mode of operation, and all current PC operating systems in effect bypass it, setting all segments to a base address of 0 and (in their 32 bit implementation) a size of 4 GB. AMD was the first x86-family vendor to implement no-execute in linear addressing mode. The feature is also available in legacy mode on AMD64 processors, and recent Intel x86 processors, when PAE is used.
- Removal of older features: A number of "system programming" features of the x86 architecture are not used in modern operating systems and are not available on AMD64 in long (64-bit and compatibility) mode. These include segmented addressing (although the FS and GS segments are retained in vestigial form for use as extra base pointers to operating system structures), the task state switch mechanism, and Virtual 8086 mode. These features do of course remain fully implemented in "legacy mode," thus permitting these processors to run 32-bit and 16-bit operating systems without modification.
Virtual address space details
Operating system limits
The operating system can also limit the virtual address space. Details, where applicable, are given in the "Operating system compatibility and characteristics" section.
Physical address space details
Current AMD64 implementations support a physical address space of up to 248 bytes of RAM, or 256 TB,. A larger amount of installed RAM allows the operating system to keep more of the workload's pageable data and code in RAM, which can improve performance, though various workloads will have different points of diminishing returns.
The upper limit on RAM that can be used in a given x86-64 system depends on a variety of factors and can be far less than that implemented by the processor. For example, as of June 2010, there are no known motherboards for x86-64 processors that support 256 TB of RAM. The operating system may place additional limits on the amount of RAM that is usable or supported. Details on this point are given in the "Operating system compatibility and characteristics" section of this article.
|Operating mode||Operating system required||Compiled-application rebuild required||Default address size||Default operand size||Register extensions||Typical GPR width|
|Long mode||64-bit mode||OS with 64-bit support, or bootloader for 64-bit OS||Yes||64||32||Yes||64|
|Legacy mode||Protected mode||Legacy 16-bit or 32-bit OS; or bootloader for 16, 32, or 64-bit OS||No||32||32||No||32|
|Virtual 8086 mode||Legacy 16-bit or 32-bit OS||16||16||16|
|Real mode||Legacy 16-bit OS; or bootloader for 16, 32, or 64 bit OS|
The architecture's intended primary mode of operation; it is a combination of the processor's native 64-bit mode and a combined 32-bit and 16-bit compatibility mode. It is used by 64-bit operating systems. Under a 64-bit operating system, 64-bit programs run under 64-bit mode, and 32-bit and 16-bit protected mode applications (that do not need to use either real mode or virtual 8086 mode in order to execute at any time) run under compatibility mode. Real-mode programs and programs that use virtual 8086 mode at any time cannot be run in long mode unless they are emulated.
Since the basic instruction set is the same, there is almost no performance penalty for executing protected mode x86 code. This is unlike Intel's IA-64, where differences in the underlying ISA means that running 32-bit code must be done either in emulation of x86 (making the process slower) or with a dedicated x86 core. However, on the x86-64 platform, many x86 applications could benefit from a 64-bit recompile, due to the additional registers in 64-bit code and guaranteed SSE2-based FPU support, which a compiler can use for optimization. However, applications that regularly handle integers wider than 32 bits, such as cryptographic algorithms, will need a rewrite of the code handling the huge integers in order to take advantage of the 64-bit registers.
The mode used by 16-bit (protected mode or real mode) and 32-bit operating systems. In this mode, the processor acts just like an x86 processor, and only 16-bit or 32-bit code can be executed. Legacy mode allows for a maximum of 32 bit virtual addressing which limits the virtual address space to 4 GB. 64-bit programs cannot be run from legacy mode.
Operating system compatibility and characteristics
The following operating systems and releases support the x86-64 architecture in long mode.
It may also be possible to enter long mode with a DOS extender similar to DOS/4GW, but more complex since x86-64 lacks virtual 8086 mode. DOS itself is not aware of that, and no benefits should be expected unless running DOS in an emulation with an adequate virtualization driver backend, for example: the mass storage interface.
x86-64 editions of Microsoft Windows client and server, Windows XP Professional x64 Edition and Windows Server 2003 x64 Edition were released in March 2005. Internally they are actually the same build (5.2.3790.1830 SP1), as they share the same source base and operating system binaries, so even system updates are released in unified packages, much in the manner as Windows 2000 Professional and Server editions for x86. Windows Vista, which also has many different editions, was released in January 2007. Windows for x86-64 has the following characteristics:
- 8 TB of "user mode" virtual address space per process. A 64-bit program can use all of this, subject of course to backing store limits on the system, and provided it is linked with the "large address aware" option. This is a 4096-fold increase over the default 2 GB user-mode virtual address space offered by 32-bit Windows.
- 8 TB of kernel mode virtual address space for the operating system. As with the user mode address space,, this is a 4096-fold increase over 32-bit Windows versions. The increased space is primarily of benefit to the file system cache and kernel mode "heaps" (non-paged pool and paged pool). Windows only uses a total of 16 TB out of the 256 TB implemented by the processors because early AMD64 processors lacked a CMPXCHG16B instruction.
- Ability to run existing 32-bit applications (.exe's) and dynamic link libraries (.dll's). Furthermore, a 32-bit program, if it was linked with the "large address aware" option, can use up to 4 GB of virtual address space in 64-bit Windows, instead of the default 2 GB (optional 3 GB with /3GB boot option and "large address aware" link option) offered by 32-bit Windows. Unlike the use of the /3GB boot option on x86, this does not reduce the kernel mode virtual address space available to the operating system. 32-bit applications can therefore benefit from running on x64 Windows even if they are not recompiled for the x64 processor.
- Both 32- and 64-bit applications, if not linked with "large address aware," are limited to 2 GB of virtual address space.
- Ability to use up to 128 GB (Windows XP/Vista), 192 GB (Windows 7), 1 TB (Windows Server 2003), or 2 TB (Windows Server 2008) of random access memory (RAM).
- LLP64 data model: "int" and "long" types are 32 bits wide, long long is 64 bits, while pointers and types derived from pointers are 64 bits wide.
- Kernel mode device drivers must be 64-bit versions; there is no way to run 32-bit kernel-mode executables within the 64-bit operating system. User mode device drivers can be either 32-bit or 64-bit.
- 16-bit Windows (Win16) and DOS applications will not run on x86-64 versions of Windows due to removal of virtual DOS machine subsystem (NTVDM).
- Full implementation of the NX (No Execute) page protection feature. This is also implemented on recent 32-bit versions of Windows when they are started in PAE mode.
- Instead of FS segment descriptor on x86 versions of the Windows NT family, GS segment descriptor is used to point to two operating system defined structures: Thread Information Block (NT_TIB) in user mode and Processor Control Region (KPCR) in kernel mode. Thus, for example, in user mode GS:0 is the address of the first member of the Thread Information Block. Maintaining this convention made the x86-64 port easier, but required AMD to retain the function of the FS and GS segments in long mode — even though segmented addressing per se is not really used by any modern operating system.
- Early reports claimed that the operating system scheduler would not save and restore the x87 FPU machine state across thread context switches. Observed behavior shows that this is not the case: the x87 state is saved and restored, except for kernel-mode-only threads (a limitation that exists in the 32-bit version as well). The most recent documentation available from Microsoft states that the x87/MMX/3DNow! instructions may be used in long mode, but that they are deprecated and may cause compatibility problems in the future.
- Some components like Microsoft Jet Database Engine and Data Access Objects will not be ported to 64-bit architectures such as x86-64 and IA-64.
- Microsoft Visual Studio can compile native applications to target only the x86-64 architecture, that will run only on 64-bit Microsoft Windows, or the x86 architecture which will run as a 32-bit application on 32-bit Microsoft Windows or 64-bit Microsoft Windows in WoW64 emulation mode. Managed applications can be compiled either in x86, x86-64 or AnyCPU modes. While the first two modes behave like their x86 or x86-64 native code counterparts respectively, when using the AnyCPU mode, the applications runs as a 32-bit application in 32-bit Microsoft Windows or as a 64-bit application in 64-bit Microsoft Windows.
Industry naming conventions
Since AMD64 and Intel 64 are substantially similar, many software and hardware products use one vendor-neutral term to indicate their compatibility with both implementations. AMD's original designation for this processor architecture, "x86-64", is still sometimes used for this purpose, as is the variant "x86_64". Other companies, such as Microsoft and Sun Microsystems, use the contraction "x64" in marketing material.
Many operating systems and products, especially those that introduced x86-64 support prior to Intel's entry into the market, use the term "AMD64" or "amd64" to refer to both AMD64 and Intel 64. x86-64 versions of Windows use the AMD64 moniker internally to designate various components which use or are compatible with this architecture. For example, the system folder on a Windows x86-64 Edition installation CD-ROM is named "AMD64", in contrast to "i386" in 32-bit versions.
- AMD's free technical documentation for the AMD64 architecture
- AMD's AMD64 documentation on CD-ROM (U.S. and Canada only) and downloadable PDF format
- AMD64 Technology: Overview of the AMD64 Architecture (PDF)
- x86-64: Extending the x86 architecture to 64-bits - technical talk by the architect of AMD64 (video archive), and second talk by the same speaker (video archive)
- AMD's "Enhanced Virus Protection"
- Intel tweaks EM64T for full AMD64 compatibility
- Analyst: Intel Reverse-Engineered AMD64
- Early report of differences between Intel IA32e and AMD64
- Porting to 64-bit GNU/Linux Systems, by Andreas Jaeger from GCC Summit 2003 . An excellent paper explaining almost all practical aspects for a transition from 32-bit to 64-bit.
- Tech Report article: 64-bit computing in theory and practice
- Intel 64 Architecture
- Intel® Software Network, 64 bits"
- TurboIRC.COM tutorial of entering the protected and the long mode the raw way from DOS
- Optimization of 64-bit programs
- Seven Steps of Migrating a Program to a 64-bit System
- Memory Limits for Windows Releases
- Intel® Software Network, Forum "64-Bit Programming"