Architecture Innovations
This article is part of our series on MPSoCs and General Computing.
Wrapping RISC Microcode
As mentioned earlier, microcode should be invisible to programmers once the computer is manufactured. Programmers use each CPU's macro instruction set or higher-level languages. If customers were to use microcode, CPU manufacturers would be under pressure to make microcode easier to use, but making microcode easier to use makes it less machine-efficient. Such microcode is said to be vertical: it is oriented toward corresponding to instructions in the macro instruction set. It is better to have horizontal microcode, which is oriented toward generating signals at the machine level.
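The difference between the two styles can be sketched in C. The signal names and the tiny operation set below are invented for illustration and do not correspond to any real CPU. A horizontal microword carries one bit per control signal, while a vertical microword carries a compact operation code that must be decoded before it can drive the same signals.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical control signals for a toy datapath (not from any real CPU). */
enum {
    SIG_REG_READ  = 1u << 0,  /* drive a register onto the bus */
    SIG_ALU_ADD   = 1u << 1,  /* ALU performs an addition      */
    SIG_REG_WRITE = 1u << 2,  /* latch the bus into a register */
    SIG_MEM_READ  = 1u << 3   /* start a memory read           */
};

/* Horizontal microword: each bit is a control signal, so no decoding is needed. */
typedef uint32_t HorizontalWord;

/* Vertical microword: a short encoded operation, closer to a macro instruction. */
typedef enum { UOP_NOP, UOP_ADD, UOP_LOAD, UOP_COUNT } VerticalOp;

/* The extra decode step that vertical microcode requires. */
static HorizontalWord decode_vertical(VerticalOp op)
{
    static const HorizontalWord table[UOP_COUNT] = {
        [UOP_NOP]  = 0,
        [UOP_ADD]  = SIG_REG_READ | SIG_ALU_ADD | SIG_REG_WRITE,
        [UOP_LOAD] = SIG_MEM_READ | SIG_REG_WRITE
    };
    assert(op < UOP_COUNT);
    return table[op];
}
```

In this model the horizontal form spends more bits per microinstruction but can assert any combination of signals in one step. The vertical form is shorter and easier to read, yet it pays for the decode and can express only the combinations listed in its table.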
There has been interest in customizing the macro instruction set of CPUs. For example, if a CPU is purchased for video processing, while the same CPU is also purchased separately for text processing (pattern matching), the CPU manufacturer can implement two different macro instruction sets with different underlying microcode (a different ISA) for each of the two otherwise identical CPUs. Changing the macro instruction set of a CPU requires changing the microcode of the CPU, and a new way of doing this is emerging. CPU companies such as Tensilica are developing programming interfaces that can be used to create new macro instructions for their models of CPUs. Such an interface affects microcode without exposing it, and it can go even deeper and alter the physical design of the actual CPU. This is done inexpensively just before the CPU is manufactured. A computer manufacturer can order CPU chips tailored to work with memory blocks of a certain size and with other specific components. Efficiency increases sharply, but costs remain low. For now, we will call this type of wrapper a Processor Definition Wrapper (PDW).
Registers
Registers are high-speed memory blocks located right on the CPU chip (said to be on-chip). Sets of registers are called register files. A register file simply consists of equal-size registers one right after the other, like records in a file.

Registers are used by the macro instruction set to hold data operands. A CPU can have more than one set of registers. Using Tensilica's PDW language, which is called Tensilica Instruction Extension (TIE), a set of registers is defined with a single statement. The statement in this example defines a set of 16 registers, each 128 bits wide, to hold a data type we are inventing called MyLongInt. Another line (not shown here; see the book) creates a macro instruction that adds numbers of that data type using those registers.
When you declare a variable with that data type (MyLongInt) in your application software, the C compiler automatically stores that variable in one of those 16 registers. You can also access the custom macro instruction directly in your C source code, as if the macro instruction were a subroutine. Those 16 registers are automatically created on the CPU chip when the chip is manufactured. They are in addition to any other sets of registers (register files) that are defined for that CPU.
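What this looks like from the application side can be approximated in portable C. The sketch below is a software model only. MyLongInt_add and regfile_add are hypothetical names standing in for the custom instruction, and the struct mimics a 128-bit register with two 64-bit halves. The headers and intrinsics that a real TIE toolchain generates are not shown.

```c
#include <assert.h>
#include <stdint.h>

#define MYLONGINT_REGS 16   /* registers in the file, as in the text */

/* Software model of one 128-bit register: two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} MyLongInt;

/* Model of the register file: equal-size registers laid out back to back. */
static MyLongInt regfile[MYLONGINT_REGS];

/* Hypothetical stand-in for the custom add instruction. On the customized
   CPU this would be a single instruction; here it is ordinary C. */
static MyLongInt MyLongInt_add(MyLongInt a, MyLongInt b)
{
    MyLongInt r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* carry out of the low half */
    return r;
}

/* Add registers src1 and src2 and store the result in register dst. */
static void regfile_add(unsigned dst, unsigned src1, unsigned src2)
{
    assert(dst < MYLONGINT_REGS && src1 < MYLONGINT_REGS && src2 < MYLONGINT_REGS);
    regfile[dst] = MyLongInt_add(regfile[src1], regfile[src2]);
}
```

Calling MyLongInt_add from application code reads like an ordinary subroutine call, which is the programming model the text describes. On the customized CPU the compiler, not the programmer, would decide which of the 16 registers holds each operand.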
On-Chip Memory
On-chip memory has much faster CPU access than other memory, and can be registers, caches, or scratch pad memory. On-chip memory should be organized in medium-sized blocks: neither a single large monolithic block nor a large number of tiny blocks. Scratch pad memory (SPM) is for miscellaneous use, and one brand of processors calls it tightly coupled memory (TCM). An important use of scratch pad memory is in combination with a cache. The cache is a separate memory block. When the CPU needs data, it looks for the data in the cache and the SPM. The off-chip memory is accessed only if the data is not found in either. That way, if you want something to always be cached, simply put it in scratch pad memory. For example, if you are using a matrix of numbers to perform a noticeable operation regularly (but not often enough for the matrix to stay in the cache), you can put that matrix in the SPM to ensure it will always be on-chip for quicker access (until you remove it from the SPM).
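The lookup order described above can be modeled with a small simulation. All sizes, addresses, and names in this sketch are illustrative assumptions: the scratch pad is treated as a fixed address window, and the cache is a tiny direct-mapped one with one word per line. Real hardware differs in many details.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All sizes and addresses are illustrative assumptions. */
#define SPM_BASE    0x1000u  /* first word address served by the scratch pad */
#define SPM_WORDS   64u
#define CACHE_LINES 8u       /* tiny direct-mapped cache, one word per line  */
#define MEM_WORDS   8192u    /* size of the simulated off-chip memory        */

static uint32_t off_chip[MEM_WORDS];
static uint32_t spm[SPM_WORDS];
static struct { bool valid; uint32_t addr; uint32_t data; } cache[CACHE_LINES];
static unsigned off_chip_reads;  /* counts the slow accesses */

/* Read one word: scratch pad first, then cache, then off-chip memory. */
static uint32_t load_word(uint32_t addr)
{
    assert(addr < MEM_WORDS);

    /* Scratch pad: a fixed address window that is never evicted. */
    if (addr >= SPM_BASE && addr < SPM_BASE + SPM_WORDS)
        return spm[addr - SPM_BASE];

    /* Cache hit: the line holds this exact address. */
    unsigned line = addr % CACHE_LINES;
    if (cache[line].valid && cache[line].addr == addr)
        return cache[line].data;

    /* Miss: go off-chip and replace whatever the line held before. */
    off_chip_reads++;
    cache[line].valid = true;
    cache[line].addr  = addr;
    cache[line].data  = off_chip[addr];
    return cache[line].data;
}
```

In this model two addresses that map to the same cache line keep evicting each other, so every alternating access goes off-chip. A word inside the scratch pad window never causes an off-chip read, however rarely it is touched.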
Consider a classic problem in graphics programming: speed optimization of trigonometric functions. To speed up trigonometric calculations for low to medium precision applications, you can precalculate sines and cosines and store those values in a lookup table. In game programming, a larger and more precise table is at greater risk of being evicted from the cache between uses. This problem is solved by using scratch pad memory: you can use more entries in the lookup table, yet be assured that it is not thrashed (swapped out just when it is needed).
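A minimal sketch of such a lookup table follows. The table size and all function names are assumptions made for illustration. The table is filled with a short Taylor series so that the sketch needs no math library; sin() from math.h would serve equally well. Placing the table in scratch pad memory is normally done through a toolchain-specific linker section, and that step is left out here.

```c
#include <assert.h>

#define SINE_ENTRIES 1024  /* illustrative size; more entries give finer steps */

/* On a real target this table would be placed in scratch pad memory. */
static float sine_table[SINE_ENTRIES];

/* Slow reference sine (Taylor series), used only to fill the table. */
static double slow_sin(double x)
{
    double term = x, sum = x;
    for (int n = 1; n <= 20; n++) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

/* Fill the table once, at start-up: entry i holds sin(2*pi*i / SINE_ENTRIES). */
static void sine_table_init(void)
{
    const double two_pi = 6.283185307179586;
    /* The index wrap below relies on the size being a power of two. */
    assert((SINE_ENTRIES & (SINE_ENTRIES - 1)) == 0);
    for (int i = 0; i < SINE_ENTRIES; i++)
        sine_table[i] = (float)slow_sin(two_pi * i / SINE_ENTRIES);
}

/* Sine of an angle given as a table index; indices wrap around the circle. */
static float fast_sin(unsigned index)
{
    return sine_table[index & (SINE_ENTRIES - 1)];
}

/* Cosine is sine shifted by a quarter turn. */
static float fast_cos(unsigned index)
{
    return fast_sin(index + SINE_ENTRIES / 4);
}
```

After one call to sine_table_init(), each fast_sin or fast_cos call costs a single memory read. That read is only fast if the table is on-chip, which is why a larger table benefits from living in scratch pad memory rather than competing for cache lines.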
Copyright © 2008 by 3D Software. All rights reserved. 3D Software, P.O. Box 221190, Sacramento CA 95822 USA www.3DSoftware.com
| Thursday, 28-Aug-2008 04:18:35 GMT |