This is a quick reference for Intel's Streaming SIMD Extensions. Feel free to make additions or corrections!
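The vector types are named with the same convention as Factor's SIMD library: the element type followed by the lane count. For example, float-4 holds four single-precision floats and ushort-8 holds eight unsigned 16-bit integers.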
The number after each instruction is the SSE version that introduced it: 1 = SSE, 2 = SSE2, 3 = SSE3, 3.3 = SSSE3, 4.1 = SSE4.1.
operation | char-16 | uchar-16 | short-8 | ushort-8 | int-4 | uint-4 | longlong-2 | ulonglong-2 | float-4 | double-2 |
move* | MOVDQ[AU] 2 | MOVDQ[AU] 2 | MOVDQ[AU] 2 | MOVDQ[AU] 2 | MOVDQ[AU] 2 | MOVDQ[AU] 2 | MOVDQ[AU] 2 | MOVDQ[AU] 2 | MOV[AU]PS 1 | MOV[AU]PD 2 |
add | PADDB 2 | PADDB 2 | PADDW 2 | PADDW 2 | PADDD 2 | PADDD 2 | PADDQ 2 | PADDQ 2 | ADDPS 1 | ADDPD 2 |
subtract | PSUBB 2 | PSUBB 2 | PSUBW 2 | PSUBW 2 | PSUBD 2 | PSUBD 2 | PSUBQ 2 | PSUBQ 2 | SUBPS 1 | SUBPD 2 |
saturated add | PADDSB 2 | PADDUSB 2 | PADDSW 2 | PADDUSW 2 | | | | | | |
saturated subtract | PSUBSB 2 | PSUBUSB 2 | PSUBSW 2 | PSUBUSW 2 | | | | | | |
add-subtract | | | | | | | | | ADDSUBPS 3 | ADDSUBPD 3 |
horizontal add | | | PHADDW 3.3 | PHADDW 3.3 | PHADDD 3.3 | PHADDD 3.3 | | | HADDPS 3 | HADDPD 3 |
multiply | | | PMULLW 2 | PMULLW 2 | PMULLD 4.1 | PMULLD 4.1 | | | MULPS 1 | MULPD 2 |
divide | | | | | | | | | DIVPS 1 | DIVPD 2 |
absolute value | PABSB 3.3 | | PABSW 3.3 | | PABSD 3.3 | | | | | |
minimum | PMINSB 4.1 | PMINUB 2 | PMINSW 2 | PMINUW 4.1 | PMINSD 4.1 | PMINUD 4.1 | | | MINPS 1 | MINPD 2 |
maximum | PMAXSB 4.1 | PMAXUB 2 | PMAXSW 2 | PMAXUW 4.1 | PMAXSD 4.1 | PMAXUD 4.1 | | | MAXPS 1 | MAXPD 2 |
approx reciprocal | | | | | | | | | RCPPS 1 | |
square root | | | | | | | | | SQRTPS 1 | SQRTPD 2 |
comparison | PCMPxxB† 2 | PCMPxxB† 2 | PCMPxxW† 2 | PCMPxxW† 2 | PCMPxxD† 2 | PCMPxxD† 2 | | | CMPxxxPS‡ 1 | CMPxxxPD‡ 2 |
bitwise and | PAND 2 | PAND 2 | PAND 2 | PAND 2 | PAND 2 | PAND 2 | PAND 2 | PAND 2 | ANDPS 1 | ANDPD 2 |
bitwise and-not | PANDN 2 | PANDN 2 | PANDN 2 | PANDN 2 | PANDN 2 | PANDN 2 | PANDN 2 | PANDN 2 | ANDNPS 1 | ANDNPD 2 |
bitwise or | POR 2 | POR 2 | POR 2 | POR 2 | POR 2 | POR 2 | POR 2 | POR 2 | ORPS 1 | ORPD 2 |
bitwise xor | PXOR 2 | PXOR 2 | PXOR 2 | PXOR 2 | PXOR 2 | PXOR 2 | PXOR 2 | PXOR 2 | XORPS 1 | XORPD 2 |
move mask | PMOVMSKB 2 | PMOVMSKB 2 | PMOVMSKB 2 | PMOVMSKB 2 | PMOVMSKB 2 | PMOVMSKB 2 | PMOVMSKB 2 | PMOVMSKB 2 | MOVMSKPS 1 | MOVMSKPD 2 |
shift left | | | PSLLW 2 | PSLLW 2 | PSLLD 2 | PSLLD 2 | PSLLQ 2 | PSLLQ 2 | | |
shift right | | | PSRAW 2 | PSRLW 2 | PSRAD 2 | PSRLD 2 | | PSRLQ 2 | | |
unpack low | PUNPCKLBW 2 | PUNPCKLBW 2 | PUNPCKLWD 2 | PUNPCKLWD 2 | PUNPCKLDQ 2 | PUNPCKLDQ 2 | PUNPCKLQDQ 2 | PUNPCKLQDQ 2 | UNPCKLPS 1 | UNPCKLPD 2 |
unpack high | PUNPCKHBW 2 | PUNPCKHBW 2 | PUNPCKHWD 2 | PUNPCKHWD 2 | PUNPCKHDQ 2 | PUNPCKHDQ 2 | PUNPCKHQDQ 2 | PUNPCKHQDQ 2 | UNPCKHPS 1 | UNPCKHPD 2 |
static shuffle§ | | | PSHUF[HL]W‖ 2 | PSHUF[HL]W‖ 2 | PSHUFD 2 | PSHUFD 2 | PSHUFD 2 | PSHUFD 2 | SHUFPS¶ 1 | SHUFPD¶ 2 |
variable shuffle | PSHUFB 3.3 | PSHUFB 3.3 | PSHUFB 3.3 | PSHUFB 3.3 | PSHUFB 3.3 | PSHUFB 3.3 | PSHUFB 3.3 | PSHUFB 3.3 | | |
static blend | | | PBLENDW 4.1 | PBLENDW 4.1 | PBLENDW 4.1 | PBLENDW 4.1 | PBLENDW 4.1 | PBLENDW 4.1 | BLENDPS 4.1 | BLENDPD 4.1 |
variable blend# | PBLENDVB 4.1 | PBLENDVB 4.1 | PBLENDVB 4.1 | PBLENDVB 4.1 | PBLENDVB 4.1 | PBLENDVB 4.1 | PBLENDVB 4.1 | PBLENDVB 4.1 | BLENDVPS 4.1 | BLENDVPD 4.1 |
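Notes:
* [AU] stands for A (aligned) or U (unaligned). The aligned forms MOVDQA, MOVAPS, and MOVAPD fault on memory operands that are not 16-byte aligned; the U forms accept any address.
† xx is EQ (equal) or GT (signed greater than). For unsigned element types, flip the sign bit of both operands (for example with PXOR) before using PCMPGT.
‡ xxx is one of EQ, LT, LE, UNORD, NEQ, NLT, NLE, or ORD.
§ The element order is fixed by an immediate operand.
‖ PSHUFLW shuffles the four words in the low 64 bits and PSHUFHW the four words in the high 64 bits; the other half is copied through unchanged.
¶ The low half of the result is selected from the destination operand and the high half from the source operand; pass the same register twice to shuffle a single vector.
# The non-VEX forms use XMM0 as an implicit mask: each result element comes from the source operand where the top bit of the matching mask element is set, and from the destination otherwise.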
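The comparison, bitwise, and move-mask rows are usually combined. The following is a minimal sketch in C with the SSE2 intrinsics from <emmintrin.h>, assuming an x86-64 compiler such as GCC, Clang, or MSVC; every function name is illustrative rather than part of any library. It shows an unsigned byte comparison built on the signed PCMPGTB, a PMOVMSKB bit mask, and a PAND/PANDN/POR select that stands in for PMAXSD on hardware without SSE4.1.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* All function names below are illustrative, not library functions. */

/* Unsigned uchar-16 greater-than. PCMPGTB compares signed bytes, so XOR
   both operands with 0x80 first to map 0..255 onto -128..127. */
static __m128i cmpgt_epu8(__m128i a, __m128i b)
{
    const __m128i bias = _mm_set1_epi8((char)0x80);
    return _mm_cmpgt_epi8(_mm_xor_si128(a, bias), _mm_xor_si128(b, bias));
}

/* PMOVMSKB packs the top bit of each byte into an int: bit i is set when
   byte lane i of the comparison result is all ones. */
static int cmpgt_epu8_mask(__m128i a, __m128i b)
{
    return _mm_movemask_epi8(cmpgt_epu8(a, b));
}

/* Per-lane select from PAND, PANDN and POR: lanes of a where mask is all
   ones, lanes of b where it is all zeros (PBLENDVB does this on SSE4.1). */
static __m128i select_si128(__m128i mask, __m128i a, __m128i b)
{
    return _mm_or_si128(_mm_and_si128(mask, a),     /* PAND: mask & a  */
                        _mm_andnot_si128(mask, b)); /* PANDN: ~mask & b */
}

/* Signed int-4 maximum without SSE4.1's PMAXSD: PCMPGTD builds the mask. */
static __m128i max_epi32_sse2(__m128i a, __m128i b)
{
    return select_si128(_mm_cmpgt_epi32(a, b), a, b);
}
```

Comparing bytes 200 and 100 this way reports 200 as larger, whereas a plain PCMPGTB would read 200 as -56 and report the opposite.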
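Building an int-4 (or uint-4) from four scalars held in the low lanes of xmm0 through xmm3 (comments list elements from high to low):
    punpckldq xmm0, xmm1   ; xmm0 => ? ? 1 0
    punpckldq xmm2, xmm3   ; xmm2 => ? ? 3 2
    punpcklqdq xmm0, xmm2  ; xmm0 => 3 2 1 0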
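The same sequence can be written with C intrinsics. This is a sketch assuming <emmintrin.h> (SSE2); int4_from_scalars is an illustrative name. MOVD zeroes the upper lanes, so the "?" lanes in the assembly comments are actually zero here.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* int4_from_scalars is an illustrative name, not a library function.
   Returns {e0, e1, e2, e3} with e0 in lane 0. */
static __m128i int4_from_scalars(int e0, int e1, int e2, int e3)
{
    __m128i x0 = _mm_cvtsi32_si128(e0);      /* MOVD: e0 in lane 0 */
    __m128i x1 = _mm_cvtsi32_si128(e1);
    __m128i x2 = _mm_cvtsi32_si128(e2);
    __m128i x3 = _mm_cvtsi32_si128(e3);
    __m128i lo = _mm_unpacklo_epi32(x0, x1); /* PUNPCKLDQ:  lanes e0 e1 . . */
    __m128i hi = _mm_unpacklo_epi32(x2, x3); /* PUNPCKLDQ:  lanes e2 e3 . . */
    return _mm_unpacklo_epi64(lo, hi);       /* PUNPCKLQDQ: lanes e0 e1 e2 e3 */
}
```

In ordinary C code _mm_setr_epi32(e0, e1, e2, e3) expresses the same thing and leaves the instruction choice to the compiler.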
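Building a float-4 from four scalars (dst starts out holding the first scalar; src3 is overwritten):
    unpcklps dst, src2
    unpcklps src3, src4
    movlhps dst, src3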
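A C transcription of the float-4 construction, offered as a sketch assuming <xmmintrin.h> (SSE); float4_from_scalars is an illustrative name. _mm_movelh_ps(a, b) returns the low halves of a and b, which matches MOVLHPS.

```c
#include <xmmintrin.h> /* SSE intrinsics */

/* float4_from_scalars is an illustrative name, not a library function.
   Returns {s1, s2, s3, s4} with s1 in lane 0. */
static __m128 float4_from_scalars(float s1, float s2, float s3, float s4)
{
    __m128 dst  = _mm_set_ss(s1);       /* s1 in lane 0 */
    __m128 src2 = _mm_set_ss(s2);
    __m128 src3 = _mm_set_ss(s3);
    __m128 src4 = _mm_set_ss(s4);
    dst  = _mm_unpacklo_ps(dst, src2);  /* UNPCKLPS: lanes s1 s2 . . */
    src3 = _mm_unpacklo_ps(src3, src4); /* UNPCKLPS: lanes s3 s4 . . */
    return _mm_movelh_ps(dst, src3);    /* MOVLHPS:  lanes s1 s2 s3 s4 */
}
```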
Broadcasting a scalar to all four lanes of a float-4:
    movss dst, src
    shufps dst, dst, 0x0
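The broadcast in C intrinsics, as a sketch assuming <xmmintrin.h>; float4_broadcast is an illustrative name. An immediate of 0 makes SHUFPS select element 0 for every result lane.

```c
#include <xmmintrin.h> /* SSE intrinsics */

/* float4_broadcast is an illustrative name, not a library function. */
static __m128 float4_broadcast(float s)
{
    __m128 v = _mm_set_ss(s);          /* MOVSS: s in lane 0 */
    return _mm_shuffle_ps(v, v, 0x00); /* SHUFPS dst, dst, 0x0 */
}
```

_mm_set1_ps(s) requests the same result and lets the compiler pick the sequence.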
Horizontal sum of the float-4 in xmm0 (every lane of the result holds the sum; xmm1 is clobbered):
    movaps xmm1, xmm0
    shufps xmm0, xmm1, 0xb1
    addps xmm0, xmm1
    movaps xmm1, xmm0
    shufps xmm0, xmm0, 0x0a
    addps xmm0, xmm1
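The horizontal sum maps one-to-one onto intrinsics. The following sketch assumes <xmmintrin.h> (SSE), uses the illustrative name float4_hsum, and lists lanes starting from lane 0 in its comments. Immediate 0xb1 swaps neighbouring pairs, and 0x0a moves the upper pair sum into the low lanes.

```c
#include <xmmintrin.h> /* SSE intrinsics */

/* float4_hsum is an illustrative name, not a library function.
   Input lanes a0 a1 a2 a3; returns a0 + a1 + a2 + a3. */
static float float4_hsum(__m128 x0)
{
    __m128 x1 = x0;                    /* MOVAPS xmm1, xmm0 */
    x0 = _mm_shuffle_ps(x0, x1, 0xb1); /* lanes a1 a0 a3 a2 */
    x0 = _mm_add_ps(x0, x1);           /* a0+a1 a0+a1 a2+a3 a2+a3 */
    x1 = x0;                           /* MOVAPS xmm1, xmm0 */
    x0 = _mm_shuffle_ps(x0, x0, 0x0a); /* a2+a3 a2+a3 a0+a1 a0+a1 */
    x0 = _mm_add_ps(x0, x1);           /* total in every lane */
    return _mm_cvtss_f32(x0);          /* read lane 0 */
}
```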
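Single-instruction float-4 shuffles. The order column gives the source element for each result lane, starting from lane 0; movsldup and movshdup require SSE3:
order | code |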
0 0 2 2 | movsldup dst, src |
1 1 3 3 | movshdup dst, src |
0 1 0 1 | movlhps dst, dst |
2 3 2 3 | movhlps dst, dst |
0 0 1 1 | unpcklps dst, dst |
2 2 3 3 | unpckhps dst, dst |
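Each SSE1 row of this table can be checked against the equivalent SHUFPS immediate. This sketch assumes <xmmintrin.h> and leaves out the SSE3 rows (movsldup, movshdup) so it builds without extra compiler flags. All function names are illustrative. _MM_SHUFFLE(d, c, b, a) names the source element for lanes 3, 2, 1, 0 in that order.

```c
#include <xmmintrin.h> /* SSE intrinsics, _MM_SHUFFLE */

/* Illustrative names; each returns v reordered as in the table. */
static __m128 shuf_0101(__m128 v) { return _mm_movelh_ps(v, v); }   /* MOVLHPS  */
static __m128 shuf_2323(__m128 v) { return _mm_movehl_ps(v, v); }   /* MOVHLPS  */
static __m128 shuf_0011(__m128 v) { return _mm_unpacklo_ps(v, v); } /* UNPCKLPS */
static __m128 shuf_2233(__m128 v) { return _mm_unpackhi_ps(v, v); } /* UNPCKHPS */

/* Illustrative helper: 1 if all four lanes compare equal. */
static int same_lanes(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}
```

For example, order 0 0 1 1 corresponds to _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 0, 0)). The dedicated instructions encode more compactly because they need no immediate byte.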
Building a double-2 from two scalars:
    movsd dst, src1
    unpcklpd dst, src2
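The double-2 construction in C intrinsics, offered as a sketch assuming <emmintrin.h> (SSE2). The name double2_from_scalars is illustrative. _mm_unpacklo_pd(a, b) returns {a0, b0}, matching UNPCKLPD.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* double2_from_scalars is an illustrative name, not a library function.
   Returns {d0, d1} with d0 in lane 0. */
static __m128d double2_from_scalars(double d0, double d1)
{
    __m128d dst = _mm_set_sd(d0);                /* MOVSD: d0 in lane 0 */
    return _mm_unpacklo_pd(dst, _mm_set_sd(d1)); /* UNPCKLPD: d0 d1 */
}
```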
Broadcasting a scalar to both lanes of a double-2 (SSE3):
    movddup dst, src
Horizontal sum of the double-2 in xmm0 (only the low lane holds the sum; xmm1 is clobbered):
    movapd xmm1, xmm0
    unpckhpd xmm1, xmm1
    addsd xmm0, xmm1
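The double-2 horizontal sum as C intrinsics, sketched under the assumption of <emmintrin.h> (SSE2), with the illustrative name double2_hsum. The MOVAPD copy is implicit because the intrinsic returns a new value.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* double2_hsum is an illustrative name, not a library function.
   Input lanes a0 a1; returns a0 + a1. */
static double double2_hsum(__m128d x0)
{
    __m128d x1 = _mm_unpackhi_pd(x0, x0); /* MOVAPD + UNPCKHPD: a1 a1 */
    x0 = _mm_add_sd(x0, x1);              /* ADDSD: low lane a0 + a1 */
    return _mm_cvtsd_f64(x0);             /* read lane 0 */
}
```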
Single-instruction double-2 shuffles (movddup requires SSE3):
order | code |
0 0 | unpcklpd dst, dst or movddup dst, src |
1 1 | unpckhpd dst, dst |
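The two double-2 rows can be checked against SHUFPD in the same way. This is a sketch assuming <emmintrin.h> (SSE2), and all function names are illustrative. In the SHUFPD immediate, bit 0 picks which lane of the first operand goes to result lane 0, and bit 1 picks which lane of the second operand goes to result lane 1.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Illustrative names; each returns v reordered as in the table. */
static __m128d dup_low(__m128d v)  { return _mm_unpacklo_pd(v, v); } /* 0 0 */
static __m128d dup_high(__m128d v) { return _mm_unpackhi_pd(v, v); } /* 1 1 */

/* Illustrative helper: 1 if both lanes compare equal. */
static int same_lanes_pd(__m128d a, __m128d b)
{
    return _mm_movemask_pd(_mm_cmpeq_pd(a, b)) == 0x3;
}
```

Order 0 0 corresponds to _mm_shuffle_pd(v, v, 0) and order 1 1 to _mm_shuffle_pd(v, v, 3).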
For full details, consult Intel's or AMD's instruction set reference documentation.
This revision created on Mon, 28 Sep 2009 18:39:01 by jckarter