# SSE

This is a quick reference for Intel's Streaming SIMD Extensions. Feel free to make additions or corrections!

# Vector types

The vector types here are named with the same convention as in Factor's SIMD library: the element type, followed by the number of elements. Every type fills one 128-bit XMM register:

- char-16
- uchar-16
- short-8
- ushort-8
- int-4
- uint-4
- float-4
- double-2

# Instruction set

The number next to each instruction is the SSE version that introduced it for 128-bit registers; 3.3 denotes SSSE3 and 4.1 denotes SSE4.1. An empty cell means no instruction is listed for that type:

| operation | char-16 | uchar-16 | short-8 | ushort-8 | int-4 | uint-4 | float-4 | double-2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| move | MOVDQ[AU] 2 | MOVDQ[AU] 2 | MOVDQ[AU] 2 | MOVDQ[AU] 2 | MOVDQ[AU] 2 | MOVDQ[AU] 2 | MOV[AU]PS 1 | MOV[AU]PD 2 |
| add | PADDB 2 | PADDB 2 | PADDW 2 | PADDW 2 | PADDD 2 | PADDD 2 | ADDPS 1 | ADDPD 2 |
| subtract | PSUBB 2 | PSUBB 2 | PSUBW 2 | PSUBW 2 | PSUBD 2 | PSUBD 2 | SUBPS 1 | SUBPD 2 |
| add with saturation | PADDSB 2 | PADDUSB 2 | PADDSW 2 | PADDUSW 2 | | | | |
| subtract with saturation | PSUBSB 2 | PSUBUSB 2 | PSUBSW 2 | PSUBUSW 2 | | | | |
| add-subtract | | | | | | | ADDSUBPS 3 | ADDSUBPD 3 |
| horizontal add | | | PHADDW 3.3 | PHADDW 3.3 | PHADDD 3.3 | PHADDD 3.3 | HADDPS 3 | HADDPD 3 |
| multiply | | | PMULLW 2 | PMULLW 2 | PMULLD 4.1 | PMULLD 4.1 | MULPS 1 | MULPD 2 |
| divide | | | | | | | DIVPS 1 | DIVPD 2 |
| absolute value | PABSB 3.3 | | PABSW 3.3 | | PABSD 3.3 | | | |
| minimum | | PMINUB 2 | PMINSW 2 | | | | MINPS 1 | MINPD 2 |
| maximum | | PMAXUB 2 | PMAXSW 2 | | | | MAXPS 1 | MAXPD 2 |
| approximate reciprocal | | | | | | | RCPPS 1 | |
| square root | | | | | | | SQRTPS 1 | SQRTPD 2 |

Notes:

- There are many more instructions that do not fit in this grid, but these are the most important ones to know.
- Every move instruction has an aligned (A) and an unaligned (U) form, written `[AU]` in the grid. The aligned form is generally faster, but it faults if the memory address is not a multiple of 16 bytes.
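
As a rough illustration of the aligned/unaligned distinction, the C sketch below picks between the two load intrinsics at run time. The helper name `load_float4` is hypothetical, and the example assumes a compiler that provides `<xmmintrin.h>`; real code would normally guarantee alignment up front rather than test for it on every load.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: load four floats from p, using the aligned
   form only when p is a multiple of 16 bytes. */
static __m128 load_float4(const float *p)
{
    if (((uintptr_t)p & 15) == 0)
        return _mm_load_ps(p);   /* MOVAPS: faults on a misaligned address */
    return _mm_loadu_ps(p);      /* MOVUPS: accepts any address */
}
```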

# Idioms

## int-4

### Select nth component
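
One possible approach, sketched in C intrinsics: shuffle the wanted lane down to position 0 with PSHUFD, then move the low 32 bits to a general register with MOVD. The name `int4_select` is an assumption made for this example, not something from Factor's library.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: return lane n (0..3) of an int-4 vector.
   PSHUFD takes a compile-time immediate, hence the switch. */
static int int4_select(__m128i v, int n)
{
    switch (n & 3) {
    case 1:  v = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 1, 1, 1)); break;
    case 2:  v = _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 2, 2, 2)); break;
    case 3:  v = _mm_shuffle_epi32(v, _MM_SHUFFLE(3, 3, 3, 3)); break;
    default: break;  /* lane 0 is already in place */
    }
    return _mm_cvtsi128_si32(v);  /* MOVD: low 32 bits to a scalar */
}
```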

### Gather four integers into a vector
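
A minimal sketch, assuming the four integers start out as scalars: move each into its own register with MOVD, then interleave with PUNPCKLDQ and PUNPCKLQDQ. `int4_gather` is a hypothetical name.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: build the vector [a b c d] from four scalars. */
static __m128i int4_gather(int a, int b, int c, int d)
{
    __m128i va = _mm_cvtsi32_si128(a);        /* MOVD: [a 0 0 0] */
    __m128i vb = _mm_cvtsi32_si128(b);
    __m128i vc = _mm_cvtsi32_si128(c);
    __m128i vd = _mm_cvtsi32_si128(d);
    __m128i ab = _mm_unpacklo_epi32(va, vb);  /* PUNPCKLDQ: [a b 0 0] */
    __m128i cd = _mm_unpacklo_epi32(vc, vd);  /* PUNPCKLDQ: [c d 0 0] */
    return _mm_unpacklo_epi64(ab, cd);        /* PUNPCKLQDQ: [a b c d] */
}
```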

## float-4

### Select nth component
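
For floats, the same shuffle-then-extract pattern can be written with SHUFPS and MOVSS, which only need SSE1. The sketch below uses a hypothetical helper name, `float4_select`.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: return lane n (0..3) of a float-4 vector.
   SHUFPS takes a compile-time immediate, hence the switch. */
static float float4_select(__m128 v, int n)
{
    switch (n & 3) {
    case 1:  v = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1)); break;
    case 2:  v = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2)); break;
    case 3:  v = _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3)); break;
    default: break;  /* lane 0 is already in place */
    }
    return _mm_cvtss_f32(v);  /* read the low lane as a scalar */
}
```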

### Gather four floats into a vector
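
One way to do this, assuming the inputs are four scalar floats, is to pair them up with UNPCKLPS and join the pairs with MOVLHPS. `float4_gather` is a name invented for this sketch.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: build the vector [a b c d] from four scalars. */
static __m128 float4_gather(float a, float b, float c, float d)
{
    __m128 ab = _mm_unpacklo_ps(_mm_set_ss(a), _mm_set_ss(b));  /* UNPCKLPS: [a b 0 0] */
    __m128 cd = _mm_unpacklo_ps(_mm_set_ss(c), _mm_set_ss(d));  /* UNPCKLPS: [c d 0 0] */
    return _mm_movelh_ps(ab, cd);                               /* MOVLHPS: [a b c d] */
}
```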

### Broadcast float into four components
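
A common way to broadcast is to place the scalar in lane 0 and replicate it with SHUFPS using an all-zero selector. The helper name `float4_broadcast` below is hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: return [x x x x]. */
static __m128 float4_broadcast(float x)
{
    __m128 v = _mm_set_ss(x);                              /* [x 0 0 0] */
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));  /* SHUFPS: copy lane 0 everywhere */
}
```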

### Absolute value
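
There is no packed float absolute-value instruction in the grid, but clearing the sign bit of each lane has the same effect. The sketch below does that with ANDNPS and a vector of `-0.0f`, whose only set bit is the sign bit; `float4_abs` is a hypothetical name.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: clear the sign bit of every lane. */
static __m128 float4_abs(__m128 v)
{
    __m128 sign = _mm_set1_ps(-0.0f);  /* 0x80000000 in each lane */
    return _mm_andnot_ps(sign, v);     /* ANDNPS: (~sign) & v */
}
```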

### Horizontal add with SSE2
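
Assuming "horizontal add" here means the HADDPS result `[a0+a1, a2+a3, b0+b1, b2+b3]`, it can be emulated without SSE3 by separating the even and odd lanes with two SHUFPS instructions and adding them. `float4_hadd` is a hypothetical name.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: same result layout as HADDPS, without SSE3. */
static __m128 float4_hadd(__m128 a, __m128 b)
{
    __m128 even = _mm_shuffle_ps(a, b, _MM_SHUFFLE(2, 0, 2, 0));  /* [a0 a2 b0 b2] */
    __m128 odd  = _mm_shuffle_ps(a, b, _MM_SHUFFLE(3, 1, 3, 1));  /* [a1 a3 b1 b3] */
    return _mm_add_ps(even, odd);  /* [a0+a1 a2+a3 b0+b1 b2+b3] */
}
```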

## double-2

### Select nth component
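
With only two lanes, selection reduces to an optional UNPCKHPD that copies the high lane down, followed by a scalar read of lane 0. A sketch, with the hypothetical name `double2_select`:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: return lane n (0 or 1) of a double-2 vector. */
static double double2_select(__m128d v, int n)
{
    if (n & 1)
        v = _mm_unpackhi_pd(v, v);  /* UNPCKHPD: [v1 v1] */
    return _mm_cvtsd_f64(v);        /* read the low lane as a scalar */
}
```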

### Gather two doubles into a vector
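
Assuming the two doubles are scalars, a single UNPCKLPD is enough to combine them. `double2_gather` is a name made up for this example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: build the vector [a b] from two scalars. */
static __m128d double2_gather(double a, double b)
{
    return _mm_unpacklo_pd(_mm_set_sd(a), _mm_set_sd(b));  /* UNPCKLPD: [a b] */
}
```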

### Broadcast double into two components
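
On SSE2, unpacking a register with itself duplicates the low lane; SSE3 adds MOVDDUP for the same purpose. A minimal sketch with the hypothetical name `double2_broadcast`:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: return [x x]. */
static __m128d double2_broadcast(double x)
{
    __m128d v = _mm_set_sd(x);     /* [x 0] */
    return _mm_unpacklo_pd(v, v);  /* UNPCKLPD: [x x] */
}
```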

### Absolute value
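
As with floats, the absolute value can be obtained by masking off the sign bit, here with ANDNPD and a vector of `-0.0`. `double2_abs` is a hypothetical name.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: clear the sign bit of both lanes. */
static __m128d double2_abs(__m128d v)
{
    __m128d sign = _mm_set1_pd(-0.0);  /* only the sign bit set in each lane */
    return _mm_andnot_pd(sign, v);     /* ANDNPD: (~sign) & v */
}
```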

### Horizontal add with SSE2
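
Assuming the goal is the HADDPD result `[a0+a1, b0+b1]`, an SSE2-only version can collect the low lanes and the high lanes with UNPCKLPD and UNPCKHPD and add them. `double2_hadd` is a hypothetical name.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: same result layout as HADDPD, without SSE3. */
static __m128d double2_hadd(__m128d a, __m128d b)
{
    __m128d lo = _mm_unpacklo_pd(a, b);  /* UNPCKLPD: [a0 b0] */
    __m128d hi = _mm_unpackhi_pd(a, b);  /* UNPCKHPD: [a1 b1] */
    return _mm_add_pd(lo, hi);           /* [a0+a1 b0+b1] */
}
```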

# References

For full details, consult Intel's or AMD's instruction set reference documentation.

*This revision created on Wed, 23 Sep 2009 01:38:41 by jckarter*