VPGATHERDQ/VPGATHERQQ — Gather Packed Qword Values Using Signed Dword/Qword Indices

Opcode/Instruction Op/En 64/32 -bit Mode CPUID Feature Flag Description
VEX.DDS.128.66.0F38.W1 90 /r VPGATHERDQ xmm1, vm32x, xmm2 RMV V/V AVX2 Using dword indices specified in vm32x, gather qword values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1.
VEX.DDS.128.66.0F38.W1 91 /r VPGATHERQQ xmm1, vm64x, xmm2 RMV V/V AVX2 Using qword indices specified in vm64x, gather qword values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1.
VEX.DDS.256.66.0F38.W1 90 /r VPGATHERDQ ymm1, vm32x, ymm2 RMV V/V AVX2 Using dword indices specified in vm32x, gather qword values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1.
VEX.DDS.256.66.0F38.W1 91 /r VPGATHERQQ ymm1, vm64y, ymm2 RMV V/V AVX2 Using qword indices specified in vm64y, gather qword values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1.

Instruction Operand Encoding

Op/En Operand 1 Operand 2 Operand 3 Operand 4
A ModRM:reg (r,w) BaseReg (R): VSIB:base, VectorReg(R): VSIB:index VEX.vvvv (r, w) NA

Description

The instruction conditionally loads up to 2 or 4 qword values from memory addresses specified by the memory operand (the second operand) and using qword indices. The memory operand uses the VSIB form of the SIB byte to specify a general purpose register operand as the common base, a vector register for an array of indices relative to the base and a constant scale factor. The mask operand (the third operand) specifies the conditional load operation from each memory address and the corresponding update of each data element of the destination operand (the first operand). Conditionality is specified by the most significant bit of each data element of the mask register. If an element’s mask bit is not set, the corresponding element of the destination register is left unchanged. The width of data element in the destination register and mask register are identical. The entire mask register will be set to zero by this instruction unless the instruction causes an exception. Using dword indices in the lower half of the mask register, the instruction conditionally loads up to 2 or 4 qword values from the VSIB addressing memory operand, and updates the destination register. This instruction can be suspended by an exception if at least one element is already gathered (i.e., if the exception

is triggered by an element other than the rightmost one with its mask bit set). register and the mask operand are partially updated; those elements that have been gathered are placed into the destination register and have their mask bits set to zero. ered elements, they will be delivered in lieu of the exception; in this case, EFLAG.RF is set to one so an instruction breakpoint is not re-triggered when the instruction is continued. If the data size and index size are different, part of the destination register and part of the mask register do not correspond to any elements being gathered. of those registers even if the instruction triggers an exception, and even if the instruction triggers the exception before gathering any elements. When this happens, the destination If any traps or interrupts are pending from already gathIt may do this to one or both
VEX.128 version: The instruction will gather two qword values. vector index register are used. For dword indices, only the lower two indices in the
VEX.256 version: The instruction will gather four qword values. For dword indices, only the lower four indices in

the vector index register are used. Note that:

Operation

DEST ← SRC1;
BASE_ADDR: base register encoded in VSIB addressing;
VINDEX: the vector index register encoded by VSIB addressing;
SCALE: scale factor encoded by SIB:[7:6];
DISP: optional 1, 4 byte displacement;
MASK ← SRC3;
VPGATHERDQ (VEX.128 version)
FOR j← 0 to 1
  i ← j * 64;
  IF MASK[63+i] THEN
     MASK[i +63:i] ← 0xFFFFFFFF_FFFFFFFF; // extend from most significant bit
  ELSE
     MASK[i +63:i] ← 0;
  FI;
ENDFOR
FOR j← 0 to 1
  k ← j * 32;
  i ← j * 64;
  DATA_ADDR ← BASE_ADDR + (SignExtend(VINDEX[k+31:k])*SCALE + DISP;
  IF MASK[63+i] THEN
     DEST[i +63:i] ← FETCH_64BITS(DATA_ADDR); // a fault exits the instruction
  FI;
  MASK[i +63:i] ← 0;
ENDFOR
MASK[VLMAX-1:128] ← 0;
DEST[VLMAX-1:128] ← 0;
(non-masked elements of the mask register have the content of respective element
VPGATHERQQ (VEX.128 version)
FOR j← 0 to 1
  i ← j * 64;
  IF MASK[63+i] THEN
     MASK[i +63:i] ← 0xFFFFFFFF_FFFFFFFF; // extend from most significant bit
  ELSE
     MASK[i +63:i] ← 0;
  FI;
ENDFOR
FOR j← 0 to 1
  i ←j * 64;
  DATA_ADDR ← BASE_ADDR + (SignExtend(VINDEX1[i+63:i])*SCALE + DISP;
  IF MASK[63+i] THEN
     DEST[i +63:i] ← FETCH_64BITS(DATA_ADDR); // a fault exits the instruction
  FI;
  MASK[i +63:i] ← 0;
ENDFOR
MASK[VLMAX-1:128] ← 0;
DEST[VLMAX-1:128] ← 0;
(non-masked elements of the mask register have the content of respective element
VPGATHERQQ (VEX.256 version)
FOR j← 0 to 3
  i ← j * 64;
  IF MASK[63+i] THEN
     MASK[i +63:i] ← 0xFFFFFFFF_FFFFFFFF; // extend from most significant bit
  ELSE
     MASK[i +63:i] ← 0;
  FI;
ENDFOR
FOR j← 0 to 3
  i ← j * 64;
  DATA_ADDR ← BASE_ADDR + (SignExtend(VINDEX1[i+63:i])*SCALE + DISP;
  IF MASK[63+i] THEN
     DEST[i +63:i] ← FETCH_64BITS(DATA_ADDR); // a fault exits the instruction
  FI;
  MASK[i +63:i] ← 0;
ENDFOR
(non-masked elements of the mask register have the content of respective element
VPGATHERDQ (VEX.256 version)
FOR j← 0 to 3
  i ← j * 64;
  IF MASK[63+i] THEN
     MASK[i +63:i] ← 0xFFFFFFFF_FFFFFFFF; // extend from most significant bit
  ELSE
     MASK[i +63:i] ← 0;
  FI;
ENDFOR
FOR j← 0 to 3
  k ← j * 32;
  i ← j * 64;
  DATA_ADDR ← BASE_ADDR + (SignExtend(VINDEX1[k+31:k])*SCALE + DISP;
  IF MASK[63+i] THEN
     DEST[i +63:i] ← FETCH_64BITS(DATA_ADDR); // a fault exits the instruction
  FI;
  MASK[i +63:i] ← 0;
ENDFOR
(non-masked elements of the mask register have the content of respective element

Intel C/C++ Compiler Intrinsic Equivalent

VPGATHERDQ: __m128i _mm_i32gather_epi64 (int64 const * base, __m128i index, const int scale);

VPGATHERDQ: __m128i _mm_mask_i32gather_epi64 (__m128i src, int64 const * base, __m128i index, __m128i mask, const int scale);

VPGATHERDQ: __m256i _mm256_i32gather_epi64 ( int64 const * base, __m128i index, const int scale);

VPGATHERDQ: __m256i _mm256_mask_i32gather_epi64 (__m256i src, int64 const * base, __m128i index, __m256i mask, const int scale);

VPGATHERQQ: __m128i _mm_i64gather_epi64 (int64 const * base, __m128i index, const int scale);

VPGATHERQQ: __m128i _mm_mask_i64gather_epi64 (__m128i src, int64 const * base, __m128i index, __m128i mask, const int scale);

VPGATHERQQ: __m256i _mm256_i64gather_epi64 (int64 const * base, __m256i index, const int scale);

VPGATHERQQ: __m256i _mm256_mask_i64gather_epi64 (__m256i src, int64 const * base, __m256i index, __m256i mask, const int scale);

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 12