Load Store Continued

The LS (Load Store) story has a few more twists which I did not mention in the previous post, simply because it had already become far too long.

Stack Memory?

RISC machines come with a generous supply of registers, which means most local variables live in registers themselves and stack memory is hardly used. Example code illustrating this phenomenon is given below.

Consider the below given C code:

int add_on(int a, int b)
{
    int i = 0;
    for (i = 0; i < 10; i++)
        a = a + b;
    return a;
}

The corresponding Blackfin assembly code does not use a single stack variable. The register P0 and the loop counter register (LC0) together take care of the purpose of the stack variable “i”.

P0 = 10;
/* Initialize the zero-overhead loop registers */
LSETUP(Loop_Starts, Loop_Ends) LC0 = P0;
/* Single-instruction loop body: start and end labels coincide */
Loop_Starts:
Loop_Ends:
    R0 = R0 + R1;
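As a quick sanity check on what the loop actually computes: adding b to a ten times is simply a + 10*b, which can be verified with a host-side build of the same function.

```c
/* Same loop as above, compiled on the host: adds b to a ten
 * times, so the result is a + 10*b. */
int add_on(int a, int b)
{
    int i;
    for (i = 0; i < 10; i++)
        a = a + b;
    return a;
}
```

On Blackfin the compiler can map this straight onto LSETUP/LC0 with no stack traffic at all, which is exactly the point of the assembly listing above.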

Zero What! Overhead Loop?

Compilers also reduce the overhead of context saving between function calls. How? Ideally, between calls, not all the registers need to be pushed onto the stack; only those registers which are actually used inside a particular function are saved and restored, because the rest remain untouched anyway. The costly affair of stack memory usage and slow memory access is thus minimized. The result is lean stack usage. I have seen this feature on Blackfin compilers, but yet to see it on StarCore!

Bogus Store

Within the load-store architecture, a store follows a load most of the time. But even when the code shows a store written before another memory read, we really cannot guarantee the sequence in which they will execute. Let me quote the lucid Blackfin HRM.

“The relaxation of synchronization between memory access instructions
and their surrounding instructions is referred to as weak ordering of loads
and stores. Weak ordering implies that the timing of the actual completion
of the memory operations—even the order in which these events
occur—may not align with how they appear in the sequence of the program
source code.”

“Because of weak ordering, the memory system is allowed to prioritize reads over writes. In this case, a write that is queued anywhere in the pipeline, but not completed, may be deferred by a subsequent read operation, and the read is allowed to be completed before the write. Reads are prioritized over writes because the read operation has a dependent operation waiting on its completion, whereas the processor considers the write operation complete, and the write does not stall the pipeline if it takes more cycles to propagate the value out to memory. This behavior could cause a read that occurs in the program source code after a write in the program flow to actually return its value before the write has been completed. This ordering provides significant performance advantages in the operation of most memory instructions. However, it can cause side effects that the programmer must be aware of to avoid improper system operation.”

When we get optimizations, we also get trade-offs. But let us keep the price limited to what we pay for in die space, and not pay again by writing buggy code and learning the hard way.

The side effect mentioned above comes into the picture when we are configuring registers, or any such memory, where a read after a write gives a different result than a read before the write. Consider the following sequence:

1. Write Register 1

2. Read Register 2

What if the value of Register 2 depends on the write to Register 1? In such situations we need to ensure strict ordering of reads and writes by using the SYNC instructions (CSYNC and SSYNC).
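The shape of that fix can be sketched in portable C. The register names and the device model below are purely hypothetical stand-ins (not real Blackfin MMRs), and a C11 sequentially consistent fence plays the role that SSYNC/CSYNC would play on the actual hardware.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for two memory-mapped device registers;
 * on real hardware, REG2's value depends on what was written
 * to REG1. */
static volatile int REG1;
static volatile int REG2;

/* Toy model of the hardware side-effect: the device latches REG1
 * into REG2 once the write has actually completed. */
static void device_reacts(void) { REG2 = REG1; }

int configure_and_read(int value)
{
    REG1 = value;                 /* 1. Write Register 1 */

    /* On Blackfin, SSYNC (or CSYNC) goes here to force the write
     * to complete before the next access; on a hosted C11
     * compiler, a sequentially consistent fence is the closest
     * analogue. */
    atomic_thread_fence(memory_order_seq_cst);

    device_reacts();              /* hardware responds to the write */
    return REG2;                  /* 2. Read Register 2 */
}
```

Without the barrier between the two accesses, weak ordering permits the read to complete before the write, and Register 2 would report stale state.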

