FMA instruction set

Article Id: WHEBN0023095212
Reproduction Date:

Title: FMA instruction set  
Author: World Heritage Encyclopedia
Language: English
Subject: XOP instruction set, F16C, Bit Manipulation Instruction Sets, Advanced Vector Extensions, Advanced Synchronization Facility
Publisher: World Heritage Encyclopedia

The FMA instruction set is an extension to the 128- and 256-bit Streaming SIMD Extensions instructions in the x86 microprocessor instruction set to perform fused multiply–add (FMA) operations.[1] There are two variants, FMA3 and FMA4, which differ in the number of operands each instruction takes.

New instructions

FMA3 and FMA4 instructions have almost identical functionality but are not compatible. Both contain fused multiply–add (FMA) instructions for floating-point scalar and SIMD operations, but FMA3 instructions have three operands while FMA4 instructions have four. The FMA operation has the form d = round(a × b + c), where the round function rounds the result to fit the destination register when the exact result has more significant bits than the destination can hold. The product a × b is not rounded separately before the addition.
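The effect of rounding only once can be illustrated without FMA hardware. The following is a minimal Python sketch, not an implementation of the instructions: the helper names `fma_emulated` and `mul_then_add` are hypothetical, and the fused result is emulated with exact rational arithmetic followed by one conversion to a double. It handles finite inputs only.

```python
from fractions import Fraction

def fma_emulated(a, b, c):
    """Emulate d = round(a*b + c): exact arithmetic, then one rounding."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no rounding yet
    return float(exact)  # single rounding to the nearest double

def mul_then_add(a, b, c):
    """Separate multiply and add: the product is rounded before the add."""
    return a * b + c

# Both inputs are exactly representable as doubles.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fma_emulated(a, b, c)    # exact product is 1 - 2**-60, so d = -2**-60
unfused = mul_then_add(a, b, c)  # product rounds to 1.0 first, so d = 0.0
print(fused, unfused)
```

With these inputs the separately rounded version loses the whole result, while the single-rounding version keeps the small difference.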

The 4-operand form (FMA4) allows a, b, c and d to be four different registers, while the 3-operand form (FMA3) requires that d be the same register as a, b or c. The 3-operand form makes the code shorter and the hardware implementation slightly simpler while the 4-operand form provides more programming flexibility.
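The trade-off between the two forms can be sketched with a toy register file. This is an illustration under stated assumptions: the register names `r0` to `r3` and the functions `fma4` and `fma3_231` are hypothetical, only operand usage is modelled, and the values are chosen so that rounding plays no role.

```python
def fma4(regs, dest, a, b, c):
    """4-operand form: dest may differ from all three sources."""
    regs[dest] = regs[a] * regs[b] + regs[c]

def fma3_231(regs, dest, s1, s2):
    """3-operand form: dest is also the addend, so it is overwritten."""
    regs[dest] = regs[s1] * regs[s2] + regs[dest]

start = {"r0": 0.0, "r1": 2.0, "r2": 3.0, "r3": 4.0}

# 4-operand form: r0 = r1*r2 + r3, all sources stay intact.
regs4 = dict(start)
fma4(regs4, "r0", "r1", "r2", "r3")

# 3-operand form with all sources preserved: one extra copy is needed.
regs3 = dict(start)
regs3["r0"] = regs3["r3"]            # extra register move
fma3_231(regs3, "r0", "r1", "r2")    # r0 = r1*r2 + r0

# 3-operand form as an accumulator: no copy, but r3 is overwritten.
acc = dict(start)
fma3_231(acc, "r3", "r1", "r2")      # r3 = r1*r2 + r3

print(regs4, regs3, acc)
```

When the addend is an accumulator that is not needed afterwards, as in a running sum of products, the 3-operand form costs nothing extra; the additional move only appears when all three inputs must survive.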

See XOP instruction set for more discussion of compatibility issues between Intel and AMD.

FMA3 instruction set

CPUs with FMA3

  • Intel
    • Intel introduced FMA3 support with the Haswell processor in June 2013.[12]
  • AMD
    • AMD introduced FMA3 support in processors starting with the Piledriver architecture for compatibility reasons.[2][3] The second-generation APU processors based on "Trinity" (32 nm), which support FMA3 instructions, were launched on May 15, 2012. The second-generation Bulldozer processors with Piledriver cores, which support FMA3 instructions, were launched on October 23, 2012.

Excerpt from FMA3

Mnemonic (AT&T)   Operands              Operation
VFMADD132PDy      ymm, ymm, ymm/m256    $0 = $0×$2 + $1
VFMADD132PDx      xmm, xmm, xmm/m128    $0 = $0×$2 + $1
VFMADD132SD       xmm, xmm, xmm/m64     $0 = $0×$2 + $1
VFMADD132SS       xmm, xmm, xmm/m32     $0 = $0×$2 + $1
VFMADD213PDy      ymm, ymm, ymm/m256    $0 = $1×$0 + $2
VFMADD213PDx      xmm, xmm, xmm/m128    $0 = $1×$0 + $2
VFMADD213SD       xmm, xmm, xmm/m64     $0 = $1×$0 + $2
VFMADD213SS       xmm, xmm, xmm/m32     $0 = $1×$0 + $2
VFMADD231PDy      ymm, ymm, ymm/m256    $0 = $1×$2 + $0
VFMADD231PDx      xmm, xmm, xmm/m128    $0 = $1×$2 + $0
VFMADD231SD       xmm, xmm, xmm/m64     $0 = $1×$2 + $0
VFMADD231SS       xmm, xmm, xmm/m32     $0 = $1×$2 + $0
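The three digits in each FMA3 mnemonic encode which operands are multiplied and which is added. The sketch below decodes them, assuming the digits are 1-based operand positions while the table's $0 to $2 notation is 0-based; the function name `vfmadd` is hypothetical and the model ignores rounding and vector lanes.

```python
def vfmadd(order, ops):
    """Apply VFMADD<order> to three operand values.

    order: "132", "213" or "231". The first two digits name the operands
    that are multiplied, the third names the operand that is added.
    Returns the operand values after the instruction; $0 is the destination.
    """
    if order not in ("132", "213", "231"):
        raise ValueError("unknown operand order: " + order)
    x, y, z = (int(ch) - 1 for ch in order)  # convert to $0..$2 indices
    result = ops[x] * ops[y] + ops[z]
    return [result, ops[1], ops[2]]

ops = [2.0, 3.0, 5.0]                # values of $0, $1, $2
print(vfmadd("132", ops)[0])         # $0×$2 + $1 = 2*5 + 3
print(vfmadd("213", ops)[0])         # $1×$0 + $2 = 3*2 + 5
print(vfmadd("231", ops)[0])         # $1×$2 + $0 = 3*5 + 2
```

Having all three orderings lets the programmer choose which input is overwritten by the result, which partly offsets the lack of a separate destination operand.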

FMA4 instruction set

CPUs with FMA4

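  • AMD
    • AMD introduced FMA4 support with the Bulldozer processor in October 2011; Piledriver processors support it as well.[10]
  • Intel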
    • It is uncertain whether future Intel processors will support FMA4, due to Intel's announced change to FMA3.

Excerpt from FMA4

Mnemonic (AT&T) Operands Operation
VFMADDPDx         xmm, xmm, xmm/m128, xmm/m128    $0 = $1×$2 + $3
VFMADDPDy         ymm, ymm, ymm/m256, ymm/m256    $0 = $1×$2 + $3
VFMADDPSx         xmm, xmm, xmm/m128, xmm/m128    $0 = $1×$2 + $3
VFMADDPSy         ymm, ymm, ymm/m256, ymm/m256    $0 = $1×$2 + $3
VFMADDSD          xmm, xmm, xmm/m64, xmm/m64      $0 = $1×$2 + $3
VFMADDSS          xmm, xmm, xmm/m32, xmm/m32      $0 = $1×$2 + $3
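The mnemonic suffixes determine how many elements one instruction processes. The following sketch assumes the usual x86 meaning of the suffixes (PD and PS for packed double and single precision, SD and SS for scalar) and 128-bit xmm or 256-bit ymm registers; `lane_count` and `vfmadd_packed` are hypothetical helpers, and the handling of unused upper elements in the scalar forms is left out.

```python
ELEMENT_BITS = {"PD": 64, "PS": 32, "SD": 64, "SS": 32}

def lane_count(kind, register_bits):
    """Number of elements computed by one instruction of the given kind."""
    if kind in ("SD", "SS"):
        return 1                      # scalar forms compute one element
    return register_bits // ELEMENT_BITS[kind]

def vfmadd_packed(a, b, c):
    """Element-wise $1×$2 + $3 over equally long lists of lane values."""
    return [x * y + z for x, y, z in zip(a, b, c)]

print(lane_count("PD", 128), lane_count("PD", 256))  # xmm and ymm doubles
print(lane_count("PS", 128), lane_count("PS", 256))  # xmm and ymm singles

# One VFMADDPDy-style operation on four double-precision lanes.
print(vfmadd_packed([1.0, 2.0, 3.0, 4.0], [10.0] * 4, [0.5] * 4))
```

On hardware each lane is computed with a single rounding; the values here are chosen so that the plain Python arithmetic gives the same results.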


History

The incompatibility between Intel's FMA3 and AMD's FMA4 is due to both companies changing plans without coordinating coding details with each other. AMD changed their plans from FMA3 to FMA4 while Intel changed their plans from FMA4 to FMA3 at almost the same time. The history can be summarized as follows:

  • August 2007: AMD announces the SSE5 instruction set, which includes 3-operand FMA instructions. A new coding scheme (DREX) is introduced to allow instructions to have three operands.[6]
  • April 2008: Intel announces their AVX and FMA instruction sets, including 4-operand FMA instructions. The coding of these instructions uses the new VEX coding scheme, which is more flexible than AMD's DREX scheme.[7]
  • December 2008: Intel changes the specification for their FMA instructions from 4-operand to 3-operand instructions. The VEX coding scheme is still used.[8]
  • May 2009: AMD changes the specification of their FMA instructions from the 3-operand DREX form to the 4-operand VEX form, compatible with the April 2008 Intel specification rather than the December 2008 Intel specification.[9]
  • October 2011: AMD Bulldozer processor supports FMA4.[10]
  • January 2012: AMD announces FMA3 support in future processors codenamed Trinity and Vishera; they are based on the Piledriver architecture.[11]
  • May 2012: AMD Piledriver processor supports both FMA3 and FMA4.[10]
  • June 2013: Intel Haswell processor supports FMA3.[12]

At the time of writing, it is uncertain whether the 3-operand VEX-coded form (FMA3) or the 4-operand form (FMA4) will become the dominant standard.

Compiler and assembler support

Compilers and assemblers provide different levels of support for FMA3 and FMA4:

  • GCC supports FMA4 with -mfma4 since version 4.5.0[13] and FMA3 with -mfma since version 4.7.0.
  • Microsoft Visual C++ 2010 SP1 supports FMA4 instructions.[14]
  • Microsoft Visual C++ 2012 supports FMA3 instructions (if the processor also supports the AVX2 instruction set extension).
  • PathScale supports FMA4 with -mfma.[15]
  • LLVM 3.1 adds FMA4 support.[16]
  • Open64 5.0 adds "limited support".
  • Intel compilers support only FMA3 instructions.[13]
  • NASM supports FMA3 instructions since version 2.03 and FMA4 instructions since 2.06.
  • Yasm supports FMA3 instructions since version 0.8.0 and FMA4 instructions since version 1.0.0.
  • FASM supports both FMA3 and FMA4 instructions.


References

  1. Woltmann, George (Prime95). "Intel AVX and GIMPS". Great Internet Mersenne Prime Search (GIMPS) project. Retrieved 27 July 2011. Quote: "FMA3 and FMA4 are not instruction sets, they are individual instructions -- fused multiply add. They could be quite useful depending on how Intel and AMD implement them".
  2. Christie, Dave. "Striking a balance". AMD Developer blogs. May 7, 2009. Retrieved 8 May 2009.
  3. Maffeo, Robin. "AMD and the Visual Studio 11 Beta". AMD. Retrieved 19 April 2012.
  4. "AMD64 Architecture Programmer's Manual Volume 6: 128-Bit and 256-Bit XOP, FMA4 and CVT16 Instructions".
  5. "New "Bulldozer" and "Piledriver" Instructions: A step forward for high performance software development".
  6. "128-Bit SSE5 Instruction Set".
  7. "Intel Advanced Vector Extensions Programming Reference".
  8. "Intel Advanced Vector Extensions Programming Reference".
  9. Christie, Dave. "Striking a balance". AMD Developer blogs. May 7, 2009. Retrieved 8 May 2009.
  10. "New Bulldozer and Piledriver Instructions". AMD. Retrieved 25 July 2013.
  11. "Software Optimization Guide for AMD Family 15h Processors". AMD. Retrieved 19 April 2012.
  12. "Intel Architecture Instruction Set Extensions Programming Reference". Intel. Retrieved 25 July 2013.
  13. Latif, Lawrence (Nov 14, 2011). "AMD Bulldozer only FMA4 and XOP instructions are supported by GCC Intel still mute". The Inquirer.
  14. "FMA4 Intrinsics Added for Visual Studio 2010 SP1".
  15. "EKOPath man doc".
  16. "LLVM 3.1 Release Notes".

This article was sourced under the Creative Commons Attribution-ShareAlike License; additional terms may apply.