SSE

This is a quick reference for Intel's Streaming SIMD Extensions. Feel free to make additions or corrections!

Vector types

The vector types here are named with the same convention as in Factor's SIMD library: the element type, followed by the number of elements that fit in one 128-bit register.

char-16
uchar-16
short-8
ushort-8
int-4
uint-4
float-4
double-2

Instruction set

The number next to each instruction is the earliest SSE version that provides it for 128-bit XMM registers:

1: SSE
2: SSE2
3: SSE3
3.3: SSSE3
4.1: SSE4.1
4.2: SSE4.2

	char-16	uchar-16	short-8	ushort-8	int-4	uint-4	float-4	double-2
move	MOVDQ[AU] 2	MOVDQ[AU] 2	MOVDQ[AU] 2	MOVDQ[AU] 2	MOVDQ[AU] 2	MOVDQ[AU] 2	MOV[AU]PS 1	MOV[AU]PD 2
add	PADDB 2	PADDB 2	PADDW 2	PADDW 2	PADDD 2	PADDD 2	ADDPS 1	ADDPD 2
subtract	PSUBB 2	PSUBB 2	PSUBW 2	PSUBW 2	PSUBD 2	PSUBD 2	SUBPS 1	SUBPD 2
add with saturation	PADDSB 2	PADDUSB 2	PADDSW 2	PADDUSW 2
subtract with saturation	PSUBSB 2	PSUBUSB 2	PSUBSW 2	PSUBUSW 2
add-subtract							ADDSUBPS 3	ADDSUBPD 3
horizontal add			PHADDW 3.3	PHADDW 3.3	PHADDD 3.3	PHADDD 3.3	HADDPS 3	HADDPD 3
multiply			PMULLW 2	PMULLW 2	PMULLD 4.1	PMULLD 4.1	MULPS 1	MULPD 2
divide							DIVPS 1	DIVPD 2
absolute value	PABSB 3.3		PABSW 3.3		PABSD 3.3
minimum	PMINSB 4.1	PMINUB 2	PMINSW 2	PMINUW 4.1	PMINSD 4.1	PMINUD 4.1	MINPS 1	MINPD 2
maximum	PMAXSB 4.1	PMAXUB 2	PMAXSW 2	PMAXUW 4.1	PMAXSD 4.1	PMAXUD 4.1	MAXPS 1	MAXPD 2
approximate reciprocal							RCPPS 1
square root							SQRTPS 1	SQRTPD 2
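
To make the difference between the "add" and "add with saturation" rows concrete, here is a small sketch in C using compiler intrinsics from <emmintrin.h>. The helper names add_wrap_u8 and add_sat_u8 are hypothetical; _mm_add_epi8 and _mm_adds_epu8 are the intrinsics that normally compile to PADDB and PADDUSB.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: add 16 unsigned bytes with wraparound (PADDB). */
static void add_wrap_u8(const uint8_t *a, const uint8_t *b, uint8_t *out) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi8(va, vb));
}

/* Hypothetical helper: add 16 unsigned bytes, clamping at 255 (PADDUSB). */
static void add_sat_u8(const uint8_t *a, const uint8_t *b, uint8_t *out) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epu8(va, vb));
}
```

With inputs 250 and 10 in the same lane, the wrapping version yields 4 and the saturating version yields 255.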

Notes:

There are many more instructions that do not fit in this grid, but these are the most important ones to know.
Every move instruction has an aligned (A) and an unaligned (U) form. The aligned form can be faster, particularly on older processors, but it raises a fault if the address is not a multiple of 16 bytes.
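
The aligned and unaligned forms are exposed in C as separate intrinsics. The sketch below, with a hypothetical helper named load_float4, checks the address and picks _mm_load_ps (MOVAPS) or _mm_loadu_ps (MOVUPS) accordingly. It is only meant to illustrate the 16-byte rule, not to suggest that a runtime check is the preferred approach.

```c
#include <xmmintrin.h>  /* SSE intrinsics */
#include <stdint.h>

/* Hypothetical helper: use the aligned load only when p is a multiple of 16. */
static __m128 load_float4(const float *p) {
    if (((uintptr_t)p & 15) == 0)
        return _mm_load_ps(p);   /* MOVAPS: requires 16-byte alignment */
    return _mm_loadu_ps(p);      /* MOVUPS: works at any address */
}
```

Code that cannot guarantee alignment often just uses the unaligned intrinsic unconditionally.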

Idioms

int-4

Select nth component
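
One possible way to do this with SSE2, sketched in C intrinsics: shuffle the wanted lane into position 0 with PSHUFD, then move it to a general register with MOVD. The function name select_int4 is hypothetical. The switch is needed because the shuffle control must be a compile-time constant.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: return component n (0..3) of an int-4 vector. */
static int select_int4(__m128i v, int n) {
    switch (n & 3) {
    case 0:  return _mm_cvtsi128_si32(v);                            /* MOVD */
    case 1:  return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0x55));   /* lane 1 everywhere */
    case 2:  return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xAA));   /* lane 2 everywhere */
    default: return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, 0xFF));   /* lane 3 everywhere */
    }
}
```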

Gather four integers into a vector
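
A sketch of one way to build the vector from four scalars using only SSE2: move each integer into the low lane of its own register (MOVD), interleave pairs with PUNPCKLDQ, then join the two halves with PUNPCKLQDQ. The name gather_int4 is hypothetical; in C, _mm_setr_epi32 does the same job and lets the compiler choose the instructions.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: build the vector (a, b, c, d), with a in lane 0. */
static __m128i gather_int4(int a, int b, int c, int d) {
    __m128i ab = _mm_unpacklo_epi32(_mm_cvtsi32_si128(a),
                                    _mm_cvtsi32_si128(b));  /* a b 0 0 */
    __m128i cd = _mm_unpacklo_epi32(_mm_cvtsi32_si128(c),
                                    _mm_cvtsi32_si128(d));  /* c d 0 0 */
    return _mm_unpacklo_epi64(ab, cd);                      /* a b c d */
}
```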

float-4

Select nth component
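
A minimal sketch in C intrinsics, assuming the goal is to get lane n as a scalar float: SHUFPS copies the wanted lane into position 0, and the low lane is then read out. The function name select_float4 is hypothetical. Only SSE is required.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: return component n (0..3) of a float-4 vector. */
static float select_float4(__m128 v, int n) {
    switch (n & 3) {
    case 0:  return _mm_cvtss_f32(v);
    case 1:  return _mm_cvtss_f32(_mm_shuffle_ps(v, v, 0x55));  /* lane 1 everywhere */
    case 2:  return _mm_cvtss_f32(_mm_shuffle_ps(v, v, 0xAA));  /* lane 2 everywhere */
    default: return _mm_cvtss_f32(_mm_shuffle_ps(v, v, 0xFF));  /* lane 3 everywhere */
    }
}
```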

Gather four floats into a vector
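
One way to assemble the vector with SSE instructions only, sketched in C intrinsics: place each scalar in the low lane of a register (MOVSS), interleave pairs with UNPCKLPS, and combine the two pairs with MOVLHPS. The name gather_float4 is hypothetical; _mm_setr_ps is the ready-made equivalent.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: build the vector (a, b, c, d), with a in lane 0. */
static __m128 gather_float4(float a, float b, float c, float d) {
    __m128 ab = _mm_unpacklo_ps(_mm_set_ss(a), _mm_set_ss(b));  /* a b 0 0 */
    __m128 cd = _mm_unpacklo_ps(_mm_set_ss(c), _mm_set_ss(d));  /* c d 0 0 */
    return _mm_movelh_ps(ab, cd);                               /* a b c d */
}
```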

Broadcast float into four components
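
A short sketch in C intrinsics: load the scalar into lane 0 and replicate it with SHUFPS using a control byte of 0. The name broadcast_float4 is hypothetical; _mm_set1_ps expresses the same operation directly.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: return the vector (x, x, x, x). */
static __m128 broadcast_float4(float x) {
    __m128 s = _mm_set_ss(x);            /* x 0 0 0 */
    return _mm_shuffle_ps(s, s, 0x00);   /* lane 0 copied to all four lanes */
}
```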

Absolute value
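
There is no dedicated absolute-value instruction for floats, but clearing the sign bit of each lane has the same effect. The sketch below, with the hypothetical name abs_float4, does this with ANDNPS against a vector of -0.0f, which has only the sign bit set.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: clear the sign bit of every lane. */
static __m128 abs_float4(__m128 v) {
    __m128 sign_mask = _mm_set1_ps(-0.0f);   /* 0x80000000 in each lane */
    return _mm_andnot_ps(sign_mask, v);      /* (~sign_mask) & v */
}
```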

Horizontal add with SSE2
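
A sketch that assumes the goal is to reproduce the result of the SSE3 HADDPS instruction, (a0+a1, a2+a3, b0+b1, b2+b3), without SSE3. Two SHUFPS operations separate the even and odd lanes, and one ADDPS sums them. The name hadd_float4 is hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: pairwise sums of a and b, as HADDPS would produce. */
static __m128 hadd_float4(__m128 a, __m128 b) {
    __m128 even = _mm_shuffle_ps(a, b, _MM_SHUFFLE(2, 0, 2, 0));  /* a0 a2 b0 b2 */
    __m128 odd  = _mm_shuffle_ps(a, b, _MM_SHUFFLE(3, 1, 3, 1));  /* a1 a3 b1 b3 */
    return _mm_add_ps(even, odd);
}
```

Calling it twice, first as hadd_float4(v, v) and then on that result, leaves the sum of all four components of v in every lane.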

double-2

Select nth component
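
With only two lanes, the selection is simple. A sketch in C intrinsics, using the hypothetical name select_double2: lane 0 is read directly, and lane 1 is first moved down with UNPCKHPD.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: return component n (0 or 1) of a double-2 vector. */
static double select_double2(__m128d v, int n) {
    if ((n & 1) == 0)
        return _mm_cvtsd_f64(v);                    /* low lane */
    return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v));    /* high lane moved to low */
}
```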

Gather two doubles into a vector
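
A sketch of one way to do this in C intrinsics: put each scalar in the low lane of its own register (MOVSD) and interleave the low lanes with UNPCKLPD. The name gather_double2 is hypothetical; _mm_setr_pd is the ready-made equivalent.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: build the vector (a, b), with a in lane 0. */
static __m128d gather_double2(double a, double b) {
    return _mm_unpacklo_pd(_mm_set_sd(a), _mm_set_sd(b));  /* a b */
}
```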

Broadcast double into two components
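
A minimal sketch in C intrinsics, with the hypothetical name broadcast_double2: load the scalar into lane 0, then duplicate it by unpacking the register with itself (UNPCKLPD). In C, _mm_set1_pd expresses the same operation directly.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: return the vector (x, x). */
static __m128d broadcast_double2(double x) {
    __m128d s = _mm_set_sd(x);      /* x 0 */
    return _mm_unpacklo_pd(s, s);   /* x x */
}
```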

Absolute value
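
As with floats, clearing the sign bit of each lane gives the absolute value. The sketch below, with the hypothetical name abs_double2, uses ANDNPD against a vector of -0.0, whose only set bit is the sign bit.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: clear the sign bit of both lanes. */
static __m128d abs_double2(__m128d v) {
    __m128d sign_mask = _mm_set1_pd(-0.0);   /* only bit 63 set in each lane */
    return _mm_andnot_pd(sign_mask, v);      /* (~sign_mask) & v */
}
```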

Horizontal add with SSE2
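
A sketch that assumes the goal is to reproduce the result of the SSE3 HADDPD instruction, (a0+a1, b0+b1), using SSE2 only. UNPCKLPD collects the low lanes, UNPCKHPD collects the high lanes, and ADDPD sums them. The name hadd_double2 is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: pairwise sums of a and b, as HADDPD would produce. */
static __m128d hadd_double2(__m128d a, __m128d b) {
    __m128d lo = _mm_unpacklo_pd(a, b);   /* a0 b0 */
    __m128d hi = _mm_unpackhi_pd(a, b);   /* a1 b1 */
    return _mm_add_pd(lo, hi);            /* a0+a1, b0+b1 */
}
```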

References

For full details, consult Intel's or AMD's instruction set reference documentation.

This revision created on Wed, 23 Sep 2009 01:38:41 by jckarter
