SIMD
 
Single Instruction Multiple Data
 
SIMD stands for Single Instruction, Multiple Data: each machine instruction operates on more than one piece of data at once.
 
The data are transformed in parallel instead of sequentially. This parallel processing requires that the data be packed into vectors.
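A minimal sketch of the sequential alternative, in plain C: the loop below performs four scalar additions, one per pass. A packed-add SIMD instruction (for example PADDD, which the sample program later on this page uses) computes the same four sums with a single instruction. The function name add4_sequential is hypothetical, chosen only for this illustration.

```c
/* Sequential (scalar) processing: four separate additions, one per pass.
   A single SIMD packed-add instruction computes all four sums at once.
   add4_sequential is a hypothetical name used only for this illustration. */
void add4_sequential( const unsigned int a[4], const unsigned int b[4],
                      unsigned int out[4] )
{
    for ( int i = 0; i < 4; i++ )
        out[i] = a[i] + b[i];   /* one scalar addition per iteration */
}
```

With a = { 1, 2, 3, 4 } and b = { 5, 6, 7, 8 }, out becomes 6 8 10 12 after four trips through the loop.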
 
A single value, such as one number, is a scalar.
 
A vector is an ordered set of scalars, such as a set of numbers.
 
For example, here is a single number:   5
That is a scalar.
 
“The instruction stream is defined as the sequence of instructions performed by the processing unit. The data stream is defined as the data traffic exchanged between the memory and the processing unit.”
—  El-Rewini & Abd-El-Barr¹

“The vector operand is an ordered set of scalars that is generally located on a vector register.”
—  Lastovetsky²
 
Here is a set of numbers:   5   18   3   2
That is a vector of four scalars.
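As a sketch of how this vector looks to a SIMD processor, assuming an x86 compiler that provides the SSE2 intrinsics header emmintrin.h (GCC, Clang and Microsoft Visual C++ all do): the four scalars 5 18 3 2 are packed into one 128-bit vector of type __m128i, then copied back to memory so each scalar can be inspected. The helper name make_example_vector is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86/x64 compiler */

/* Packs the four scalars 5 18 3 2 into one 128-bit vector, then stores the
   vector back to memory. make_example_vector is a hypothetical helper.
   _mm_setr_epi32 takes its arguments in memory order: element 0 first. */
void make_example_vector( int out[4] )
{
    __m128i v = _mm_setr_epi32( 5, 18, 3, 2 );   /* one vector, four scalars */
    _mm_storeu_si128( (__m128i *)out, v );       /* unaligned 16-byte store */
}
```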
 
A vector can also consist of just one scalar. Some SIMD instructions operate on a vector that holds only one scalar: the scalar forms, such as ADDSS and ADDSD, operate only on the lowest element of the register. You could call that scalar processing, but from the SIMD point of view it is still vector processing, a special case in which the vector holds a single scalar.
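The difference between the packed and scalar forms can be sketched with SSE intrinsics, assuming an x86 compiler that provides xmmintrin.h. _mm_add_ps corresponds to ADDPS and adds all four single-precision elements; _mm_add_ss corresponds to ADDSS and adds only element 0, copying elements 1 to 3 of the first operand through unchanged. The wrapper names add_packed and add_scalar are hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86/x64 compiler */

/* Packed form (ADDPS): all four single-precision elements are added.
   add_packed is a hypothetical wrapper for this illustration. */
void add_packed( const float a[4], const float b[4], float out[4] )
{
    __m128 r = _mm_add_ps( _mm_loadu_ps( a ), _mm_loadu_ps( b ) );
    _mm_storeu_ps( out, r );
}

/* Scalar form (ADDSS): only element 0 is added; elements 1-3 of the
   result are copied from the first operand. add_scalar is hypothetical. */
void add_scalar( const float a[4], const float b[4], float out[4] )
{
    __m128 r = _mm_add_ss( _mm_loadu_ps( a ), _mm_loadu_ps( b ) );
    _mm_storeu_ps( out, r );
}
```

With a = { 1, 2, 3, 4 } and b = { 10, 20, 30, 40 }, add_packed yields 11 22 33 44, while add_scalar yields 11 2 3 4.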
 
To use SIMD, you write programs for the computer's instruction-set architecture (ISA):
 
“The defining trait of general-purpose processors is a high degree of programmability via a general instruction-set architecture (ISA)…The ISA is a contract between hardware and software” [ 3 ]
 
Because the ISA is a contract that every compatible processor must honor, SIMD programming for the ISA is much more portable than other kinds of programming, including non-ISA SIMD programming and CISC programming.
 
“This is in contrast to domain-specific processors such as network processors and graphics processors” [ 4 ]
 
“unlike vector processors, superscalar processors allow more sophisticated mixtures of operations to efficiently load their pipelined units than just basic array operations. But normally such mixtures are rather specific for each superscalar processor and therefore are not portable.” [ 5 ]
References:
 
1.   H. El-Rewini & M. Abd-El-Barr, Advanced Computer Architecture and Parallel Processing, p. 4.
 
2.   A.L. Lastovetsky, Parallel Computing on Heterogeneous Networks, p. 15.
 
3.   Rotenberg & Anantaraman, Architecture of Embedded Microprocessors, Ch. 4 in Multiprocessor Systems-on-Chips, Jerraya & Wolf, eds., pp. 83-84.
 
4.   Ibid.
 
5.   Lastovetsky, op. cit., p. 22.
 

 
Getting Started with SIMD
 
Advanced Micro Devices (AMD) processors support on-chip SIMD processing in all processor modes, so SIMD programs also run under 32-bit operating systems such as Windows XP. You can get started by writing SIMD programs for computers that use AMD processors. The complete programming manuals for the ISA of AMD-based computers are available as a free download from the AMD web site:
 
http://developer.Amd.com/devguides.jsp#Manuals
 
Read those manuals even if you are not using an AMD-based system. Of particular interest is the AMD64 Architecture Programmer's Manual, Volume 4: 128-Bit Media Instructions. Volumes 1 and 3 are also useful.
 
Volume 1 gives an overview of SIMD programming in Section 4, “128-Bit Media and Scientific Programming”. The vectors handled by the 128-bit media instructions are 128 bits (16 bytes) wide and are held in the XMM registers.
 
Volume 3 covers the general-purpose instructions, and Volume 4 describes each 128-bit media (SIMD) instruction in detail.
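Because each XMM vector is 16 bytes wide, the number of scalars it holds depends on the element size: sixteen bytes, eight 16-bit integers, four 32-bit integers or single-precision floats, or two 64-bit integers or double-precision floats. A small sketch that computes this from sizeof, assuming an x86 compiler with emmintrin.h; lanes_per_vector is a hypothetical helper:

```c
#include <stddef.h>
#include <emmintrin.h>  /* __m128i, __m128d; also brings in __m128 */

/* Number of elements of the given size (in bytes) that fit in one
   128-bit (16-byte) vector. lanes_per_vector is a hypothetical helper. */
size_t lanes_per_vector( size_t element_size )
{
    return sizeof( __m128i ) / element_size;
}
```

For example, lanes_per_vector( sizeof( float ) ) is 4 and lanes_per_vector( sizeof( double ) ) is 2.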
 
The following example program is for Microsoft Visual C++. Its _asm inline-assembly block is supported only when compiling for 32-bit x86:
 
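#include <stdio.h>

typedef struct tEIGHT_UINTS {
    unsigned int u[4];   // first vector:  1 2 3 4
    unsigned int v[4];   // second vector: 5 6 7 8
} EIGHT_UINTS;

int main()
{
    EIGHT_UINTS a = { 1, 2, 3, 4, 5, 6, 7, 8 };
    _asm {
        lea    esi, a                       ; esi = address of a
        movdqu xmm0, xmmword ptr [esi]      ; xmm0 = a.u (unaligned 16-byte load)
        movdqu xmm1, xmmword ptr [esi+16]   ; xmm1 = a.v
        paddd  xmm0, xmm1                   ; four 32-bit additions at once
        movdqu xmmword ptr [esi], xmm0      ; a.u = xmm0
    }
    for ( int i = 0; i < 4; i++ )
        printf( " %u", a.u[i] );
    return 0;
}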
 
You could also use compiler intrinsics instead of inline assembly. Do not go on to write other SIMD programs until a program like this sample displays the correct answer. For this example the correct answer is 6 8 10 12, the sums 1+5, 2+6, 3+7 and 4+8.
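A sketch of the same computation written with SSE2 intrinsics instead of inline assembly, assuming an x86 or x64 compiler that provides emmintrin.h. Unlike _asm, these intrinsics also compile for 64-bit targets. Each intrinsic is commented with the instruction it corresponds to; the function name add_eight_uints is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86/x64 compiler */

typedef struct tEIGHT_UINTS {
    unsigned int u[4];
    unsigned int v[4];
} EIGHT_UINTS;

/* a->u[i] += a->v[i] for i = 0..3, using one PADDD instead of four
   scalar additions. add_eight_uints is a hypothetical name. */
void add_eight_uints( EIGHT_UINTS *a )
{
    __m128i x = _mm_loadu_si128( (const __m128i *)a->u );  /* movdqu xmm0 */
    __m128i y = _mm_loadu_si128( (const __m128i *)a->v );  /* movdqu xmm1 */
    x = _mm_add_epi32( x, y );                              /* paddd */
    _mm_storeu_si128( (__m128i *)a->u, x );                 /* movdqu store */
}
```

Calling add_eight_uints( &a ) on a = { 1, 2, 3, 4, 5, 6, 7, 8 } leaves 6 8 10 12 in a.u, the same answer as the assembly version.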
 

 
Microsoft Packet Order
 
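In assembly language, the packed elements of a register appear in Microsoft (backwards) packet order: most significant element first. Because x86 processors are little-endian, the element at the lowest memory address is loaded into the least significant bits of the register, so an array stored in memory as 1 2 3 4 is written 4 3 2 1 when shown as register contents. A right shift, toward the least significant bits, therefore moves elements toward the start of the array in memory. The following example shows how packed elements are shifted in assembly language: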
 
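#include <stdio.h>

typedef struct tFOUR_UINTS {
    unsigned int u[4];
} FOUR_UINTS;

int main()
{
    FOUR_UINTS a = { 1, 2, 3, 4 };
    _asm {
        movdqu xmm0, xmmword ptr a  ; xmm0 = 4 3 2 1
        psrldq xmm0, 8              ; shift right 8 bytes: xmm0 = 0 0 4 3
        movdqu xmmword ptr a, xmm0
    }
    for ( int i = 0; i < 4; i++ )
        printf( " %u", a.u[i] );
    return 0;
}
// Output:
// 3 4 0 0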
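The same ordering shows up in the SSE2 intrinsics, sketched below under the assumption of an x86 compiler with emmintrin.h. _mm_set_epi32 takes its arguments most significant element first, matching the register comments in the assembly example, while _mm_setr_epi32 takes them in memory order; both calls build the same vector. _mm_srli_si128 is the intrinsic for PSRLDQ. The function name show_order is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86/x64 compiler */

/* Builds the vector 1 2 3 4 (memory order) two ways, then shifts it right
   by 8 bytes as PSRLDQ does. show_order is a hypothetical name. */
void show_order( unsigned int from_set[4], unsigned int from_setr[4],
                 unsigned int shifted[4] )
{
    __m128i a = _mm_set_epi32( 4, 3, 2, 1 );    /* listed high to low */
    __m128i b = _mm_setr_epi32( 1, 2, 3, 4 );   /* listed in memory order */
    _mm_storeu_si128( (__m128i *)from_set, a );
    _mm_storeu_si128( (__m128i *)from_setr, b );
    /* Shift right 8 bytes: register becomes 0 0 4 3 (high to low),
       which is 3 4 0 0 in memory order. */
    _mm_storeu_si128( (__m128i *)shifted, _mm_srli_si128( a, 8 ) );
}
```

After show_order runs, from_set and from_setr both hold 1 2 3 4, and shifted holds 3 4 0 0, matching the output of the assembly example.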
 
 
Copyright © 2008 by 3D Software. All rights reserved.
3D Software, P.O. Box 221190, Sacramento CA 95822 USA
www.3DSoftware.com
Sunday, 06-Jul-2008 15:30:45 GMT