
SIMD

Single Instruction Multiple Data
SIMD stands for Single Instruction, Multiple Data: each machine instruction operates on more than one piece of data at once.
 
The data are transformed in parallel, instead of sequentially. This parallel processing requires that the data be in vectors.
 
A scalar is a single thing, such as a single number. A vector is an ordered group of scalars, such as a group of numbers.
 
For example, here is a single number:   5
That is a scalar.
 
Here is a set of numbers:   5   18   3   2
That is a vector of four scalars.
 
A vector can consist of only one scalar: some SIMD instructions operate on a data vector that contains a single scalar. You could call that scalar processing, but from the SIMD point of view it is vector processing, the special case in which the vector has only one scalar in it.
 
For SIMD processing, you simply write programs for the computer's ISA:
 
“The defining trait of general-purpose processors is a high degree of programmability via a general instruction-set architecture (ISA)…The ISA is a contract between hardware and software” [ 1 ]
 
Because the ISA is a published contract, SIMD code written against it runs on every processor that implements that ISA. That makes it more portable than code written for hardware that is programmed outside a general-purpose ISA.
 
“This is in contrast to domain-specific processors such as network processors and graphics processors” [ 2 ]
 
“unlike vector processors, superscalar processors allow more sophisticated mixtures of operations to efficiently load their pipelined units than just basic array operations. But normally such mixtures are rather specific for each superscalar processor and therefore are not portable.” [ 3 ]

References:
 
1.   Rotenberg & Anantaraman, Architecture of Embedded Microprocessors, Ch. 4 in Multiprocessor Systems-on-Chips, Jerraya & Wolf eds., pp. 83-84.
 
2.   Ibid.
 
3.   A.L. Lastovetsky, Parallel Computing on Heterogeneous Networks, p. 22.
 

 
Getting Started with SIMD
Advanced Micro Devices (AMD) processors support on-chip SIMD processing in all processor modes. You can get started by writing SIMD programs for computers that use AMD processors. The complete programming manuals for the ISA of AMD-based computers are available as a free download from the AMD web site:
 
http://developer.Amd.com/devguides.jsp#Manuals
 
Read those manuals even if you are not using an AMD-based system. Of particular interest is the Architecture Programmer's Manual Volume 4: 128-Bit Media Instructions. Also of interest are volumes 1 and 3.
 
Volume 1 gives an overview of SIMD programming in Section 4: “128-Bit Media and Scientific Programming”. Each vector processed by the 128-bit media instructions is 128 bits wide (16 bytes).
 
Volume 3 covers general purpose programming. Volume 4 describes every SIMD instruction.
 
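The following is an example program for C/C++. It uses Microsoft-style inline assembly (_asm), so it assumes a 32-bit x86 build with a compiler that accepts that syntax:

#include <stdio.h>

typedef struct tEIGHT_UINTS {
    unsigned int u[4];
    unsigned int v[4];
} EIGHT_UINTS;

int main( void )
{
    EIGHT_UINTS a = { { 1, 2, 3, 4 }, { 5, 6, 7, 8 } };
    _asm {
        lea esi, a                         ; esi = address of a
        movdqu xmm0, xmmword ptr [esi]     ; xmm0 = a.u
        movdqu xmm1, xmmword ptr [esi+16]  ; xmm1 = a.v
        paddd  xmm0, xmm1                  ; add four 32-bit integers at once
        movdqu xmmword ptr [esi], xmm0     ; a.u = the four sums
    }
    for ( int i = 0; i < 4; i++ )
        printf( " %u", a.u[i] );           // %u because the elements are unsigned
    return 0;
}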
 
You could also use intrinsics instead of assembler. Do not proceed to write other SIMD programs until you get a program like this sample program to display the correct answer. The correct answer for this example is ( 6 8 10 12 ).
 

 
Microsoft Packet Order
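In assembly language listings, the contents of a register are written with the highest-order element on the left, which is the reverse of the order in which the elements sit in memory, because x86 processors store data in little-endian order. This page calls that Microsoft (backwards) packet ordering. The following example shows how packed elements are shifted in assembly language under that convention: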
 
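#include <stdio.h>
typedef struct tFOUR_UINTS {
    unsigned int u[4];
} FOUR_UINTS;

int main( void )
{
    FOUR_UINTS a = { { 1, 2, 3, 4 } };
    _asm {
        movdqu xmm0, a  ; xmm0 = 4 3 2 1
        psrldq xmm0, 8  ; xmm0 = 0 0 4 3 (shifted right by 8 bytes)
        movdqu a, xmm0  ; a.u  = 3 4 0 0 in memory order
    }
    for ( int i = 0; i < 4; i++ )
        printf( " %u", a.u[i] );
    return 0;
}
// Output:
// 3 4 0 0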
 
 
Copyright © 2010 by 3D Software. All rights reserved.
3D Software, P.O. Box 221190, Sacramento CA 95822 USA
Saturday, 31-Jul-2010 16:12:03 GMT