
Architecture Innovations

This article is part of our series on MPSoCs and General Computing.
 
Wrapping RISC Microcode
As mentioned earlier, microcode should be invisible to programmers after the computer is manufactured. Programmers will use each CPU's macro instruction set or higher level languages.
 
If customers were to use microcode directly, CPU manufacturers would be under pressure to make it easier to use. But making microcode easier to use makes it less machine-efficient. Such easier-to-use microcode is said to be vertical: it is oriented toward corresponding to instructions in the macro instruction set.
 
It is better to have horizontal microcode, which is oriented toward generating signals at the machine level.
 
“Horizontal microcode allows the hardware to run faster, but is more difficult to program.” [ 1 ]
References:
 
1.   Douglas E. Comer, Essentials of Computer Architecture (Pearson, 2005), pp. 53, 100.
 
There has been interest in customizing the macro instruction set of CPUs. For example, suppose one customer buys a CPU for video processing, while another customer buys the same CPU for text processing (pattern matching). The CPU manufacturer can implement two different macro instruction sets with different underlying microcode: a different ISA for each of the two otherwise identical CPUs.
 
Changing the macro instruction set of a CPU requires that the microcode of the CPU be changed. A new way of doing this is emerging. CPU companies like Tensilica are developing programming interfaces that can be used to create new macro instructions for their models of CPUs. Such an interface, while affecting but not exposing microcode, can go even deeper and alter the physical design of the actual CPU. This is done inexpensively just before the CPU is manufactured.
 
A computer manufacturer can order CPU chips tailored to work with memory blocks of a certain size and with other specific components. Efficiency increases sharply, but costs remain low.
 
For now, we will call this type of wrapper a “Processor Definition Wrapper” (PDW).
 

 
Registers
Registers are high-speed memory blocks located right on the CPU chip (said to be “on-chip”). A simple example of how registers are used is adding two numbers. First the numbers are loaded into registers, then an ADD instruction is issued, and the sum is put in another register. That sum is then moved to other memory as needed.
 
Sets of registers are called “register files.” A register file simply consists of equal-size registers placed one right after the other, like records in a file.
 
Registers are used by the macro instruction set to hold data operands. You can actually have more than one set of registers.
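
The load, add, and store sequence described above can be sketched as a small software model in C. This is only an illustration, not how any particular CPU is built: the names RegFile, load, add, store, and add_two_numbers, the count of 16 registers, and the 32-bit register width are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 16   /* assumed register count for this sketch */

/* A register file: equal-size registers laid out one after another. */
typedef struct {
    uint32_t r[NUM_REGS];
} RegFile;

/* LOAD: copy a word from memory into register rd. */
void load(RegFile *rf, int rd, const uint32_t *mem, int addr) {
    assert(rd >= 0 && rd < NUM_REGS);
    rf->r[rd] = mem[addr];
}

/* ADD: rd = rs1 + rs2, operating on registers only. */
void add(RegFile *rf, int rd, int rs1, int rs2) {
    assert(rd >= 0 && rd < NUM_REGS);
    assert(rs1 >= 0 && rs1 < NUM_REGS);
    assert(rs2 >= 0 && rs2 < NUM_REGS);
    rf->r[rd] = rf->r[rs1] + rf->r[rs2];
}

/* STORE: copy register rs back out to memory. */
void store(const RegFile *rf, int rs, uint32_t *mem, int addr) {
    assert(rs >= 0 && rs < NUM_REGS);
    mem[addr] = rf->r[rs];
}

/* Add mem[0] and mem[1], leaving the sum in mem[2]. */
void add_two_numbers(uint32_t *mem) {
    RegFile rf = {{0}};
    load(&rf, 0, mem, 0);   /* first operand  -> register 0 */
    load(&rf, 1, mem, 1);   /* second operand -> register 1 */
    add(&rf, 2, 0, 1);      /* sum            -> register 2 */
    store(&rf, 2, mem, 2);  /* move the sum to other memory */
}
```

The arithmetic itself touches only registers; memory is involved only in the load and store steps, which is why keeping operands in registers is fast.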
 
Using Tensilica's PDW language, which is called Tensilica Instruction Extension (TIE), a set of registers is simply defined with the following statement: [ 1 ]
 
regfile   MyLongInt   16   128   1
 
That defines a set of 16 registers, with each register 128 bits wide, to hold a data type we are inventing called “MyLongInt.” Another line (not shown here – see the book) creates a macro instruction that adds numbers of that data type using those registers.
1.   Chris Rowen, “Performance and Flexibility for MPSoC Design,” in Jerraya & Wolf (Eds.) Multiprocessor Systems-On-Chips (MPSoCs) (2005), p. 134.
 
When you declare a variable with that data type (MyLongInt) in your application software, the C compiler automatically stores that variable in one of those 16 registers. You can also access the custom macro instruction directly in your C source code, as if the macro instruction were a subroutine (C function).
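
To give a feel for what such application code might look like, here is a plain C sketch that emulates a 128-bit add in software. It is not output from Tensilica's tools: the struct layout, the function name mylongint_add, and the demo function are hypothetical stand-ins for the type and instruction that the processor generator would provide.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for the 128-bit MyLongInt register type. */
typedef struct {
    uint64_t lo;   /* bits 0..63   */
    uint64_t hi;   /* bits 64..127 */
} MyLongInt;

/* Hypothetical name: emulates what a custom 128-bit add
   instruction would compute in a single step. */
MyLongInt mylongint_add(MyLongInt a, MyLongInt b) {
    MyLongInt sum;
    sum.lo = a.lo + b.lo;
    /* Unsigned wraparound in the low half means a carry occurred. */
    uint64_t carry = (sum.lo < a.lo) ? 1u : 0u;
    sum.hi = a.hi + b.hi + carry;   /* result wraps modulo 2^128 */
    return sum;
}

/* Usage: the operation reads like an ordinary C function call. */
void mylongint_demo(void) {
    MyLongInt a = { UINT64_MAX, 0 };   /* 2^64 - 1 */
    MyLongInt b = { 1, 0 };
    MyLongInt c = mylongint_add(a, b);
    assert(c.lo == 0 && c.hi == 1);    /* 2^64: the carry moved into hi */
}
```

On a processor with the custom instruction, the body of the function would collapse to one macro instruction operating on the 128-bit registers, while the calling code would look much the same.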
 
Those 16 registers are automatically created on the CPU chip when the chip is manufactured. This will be in addition to any other sets of registers (register files) that are defined for that CPU.
 

 
On-Chip Memory
On-chip memory can be accessed by the CPU much faster than other memory, and can take the form of registers, caches, or scratch pad memory. On-chip memory should be organized in medium-sized blocks: neither a single large monolithic block nor a large number of tiny blocks. [ 1 ]  (To prevent the latter problem, a processor definition wrapper could combine small defined blocks.)
 
Scratch pad memory (SPM) is for miscellaneous use, and one brand of processors calls it tightly coupled memory (TCM). [ 2 ]  SPM can be Static RAM or less expensive Dynamic RAM. Even if it is DRAM, it is faster and more power-efficient than off-chip DRAM. [ 3 ]
 
An important use of scratch pad memory is in combination with a cache. The cache is a separate memory block. When the CPU needs data, it looks for the data in the cache and SPM. The off-chip memory is accessed only if the data is not found in the cache or SPM. [ 4 ]
 
That way, if you want something to always be cached, simply put it in scratch pad memory. For example, if you use a matrix of numbers to perform a noticeable operation regularly (but not often enough for the matrix to stay in the cache), you can put that matrix in the SPM to ensure it will always be on-chip for quicker access (until you remove it from the SPM).
1.   Mahmut Kandemir and Nikil Dutt, “Memory Systems and Compiler Support for MPSoC Architectures,” in MPSoCs, p. 265.
 
2.   Eric Rotenberg and Aravindh Anantaraman, “Architecture of Embedded Microprocessors,” in MPSoCs, pp. 101, 106.
 
3.   Kandemir & Dutt, p. 254.
 
4.   Kandemir & Dutt, p. 255.
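
The lookup order described above for the cache and scratch pad memory can be sketched as a toy simulation in C. Everything specific here is an assumption made for illustration: the SPM address range, the tiny direct-mapped cache with one word per line, the word-indexed addresses, and the names MemSystem and mem_read. Real hardware may also check the cache and SPM in parallel rather than one after the other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPM_BASE    0x1000u  /* assumed address range mapped to the SPM */
#define SPM_WORDS   64u
#define CACHE_LINES 8u       /* tiny direct-mapped cache, one word per line */
#define MEM_WORDS   8192u    /* size of the simulated address space */

typedef enum { FROM_SPM, FROM_CACHE, FROM_OFF_CHIP } Source;

typedef struct {
    uint32_t spm[SPM_WORDS];
    uint32_t off_chip[MEM_WORDS];
    struct { bool valid; uint32_t tag; uint32_t data; } cache[CACHE_LINES];
} MemSystem;

/* Read one word; *src reports which memory supplied it. */
uint32_t mem_read(MemSystem *m, uint32_t addr, Source *src) {
    assert(addr < MEM_WORDS);

    /* 1. Addresses in the SPM range are always served on-chip. */
    if (addr >= SPM_BASE && addr < SPM_BASE + SPM_WORDS) {
        *src = FROM_SPM;
        return m->spm[addr - SPM_BASE];
    }

    /* 2. Otherwise look in the cache. */
    uint32_t line = addr % CACHE_LINES;
    if (m->cache[line].valid && m->cache[line].tag == addr) {
        *src = FROM_CACHE;
        return m->cache[line].data;
    }

    /* 3. Miss: go off-chip and fill the line, evicting its old contents. */
    *src = FROM_OFF_CHIP;
    m->cache[line].valid = true;
    m->cache[line].tag = addr;
    m->cache[line].data = m->off_chip[addr];
    return m->cache[line].data;
}
```

In this model, two addresses that map to the same cache line keep evicting each other, while anything placed in the SPM range is never evicted. That is the sense in which data in scratch pad memory is "always cached."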
 
 
Consider a classic problem in graphics programming: speed optimization of trigonometric functions. To speed up trigonometric calculations in low- to medium-precision applications, you can precalculate sines and cosines and store those values in a lookup table. In game programming:
 
“A 256-entry floating-point table takes 1K, which should easily stay within cache for the duration of your inner loops … Larger lookup tables increase the accuracy of your results, but will hurt cache performance.” [ 1 ]
 
This problem is solved by using scratch pad memory: you could use more entries in the lookup table, yet ensure the table is not evicted from on-chip memory just when it is needed.
1.   Yossarian King, “Floating-Point Tricks,” Game Programming Gems 2, p. 175.
 

 
 
 
Copyright © 2010 by 3D Software. All rights reserved.
3D Software, P.O. Box 221190, Sacramento CA 95822 USA
Thursday, 11-Mar-2010 01:58:03 GMT