DMA Architecture

DMA hardware is integral to most ARM SoC designs. They are extensively used for driving various peripheral controllers related to flash memory, LCD displays, USB, SD etc. Here we elaborate the functioning of a typical ARM SoC DMA controller and briefly delve into its internal mechanisms. The intent is to explain the underlying architecture of a typical DMA module and then eventually tie those elementary structural features to ARM’s PrimeCell® DMA Controller, which initially might come across as a more sophisticated hardware.

Direct memory access is simply meant to enable the processor to offload memory transfer operations. In this manner a system will be able to dedicate its valuable computational resources towards more apt alternate purposes. So, in fact we can state that a DMA controller is like a processor module which is capable of only executing load-store operations. While a full blown pipelined processor core possesses several other computational features and hence it should rarely be employed for rudimentary data transfers.

Eventually DMA is about transferring data across different hardware modules. Depending on the use case, the DMA source and destination might vary. And depending on the hardware the specifics of the bus protocol employed for load-store also varies. But just like a processor core, a DMA controller core will also be compatible with the hardware bus protocol features related to burst mode, flow control etc. Otherwise it simply won’t be able to move data around.


Fig 1 : DMA v/s Processor


So, a DMA controller should be simply interpreted as a form of rudimentary logical block meant to implement data transfers. Similar to a processor core, this IP block also plugs into the larger data bus matrix involving SRAM and other peripherals. While a processor executes its object code from instruction memory, a DMA controller executes its operations by loading transfer configurations from a RAM memory segment. Such a configuration is usually termed as a descriptor. This recognizable structure would essentially describe all the details regarding the source/destination, burst size, length of the transfer etc. Eventually, such a configuration table would have to be loaded into the RAM by the DMA driver software running on the main processor. And usually the memory location of this descriptor table would be communicated to the DMA controller via some shared register.

Fig 2 : Descriptor

DMA controller is eventually a co-processor assisting the main core software with various complex use cases involving heavy data transfers. For example, populating an LCD image bitmap or transferring contents to an SD card are all DMA intensive features. Such a controller would usually support multiple DMA channels enabling simultaneous transfers. In this particular case, a DMA controller with more than two channels should be able to simultaneously drive an LCD and an SD Card.

An interrupt signal would be usually employed to synchronize these DMA transfers with the main processor operations. So, once the requested data transfer is completed, DMA hardware would simply assert an interrupt, which would be acknowledged by the processor.

Fig 3 : DMA Interrupt


Usually, the contents of a transfer descriptor would resemble the DMA controller register structures. Essentially, these memory descriptor contents are blindly loaded into its actual hardware registers by the DMA controller itself. For example, a descriptor would include entries for the source and destination address which would basically correspond to the actual DMA controller source and destination address registers. Such a DMA controller block would also involve a separate core logic which reads these registers and then implements the actual memory transfers. Illustrated by the Fig 4 below.

Fig 4 : Generic DMA architecture

The transfer descriptor structures written to the memory by the processor would be loaded into the controller registers by the DMA hardware and eventually parsed and executed by its core load/store transfer logic. Undoubtedly, any errors within these memory resident transfer descriptors would also be propagated to the core transfer logic via these registers. Essentially, such a register set is meant to communicate all the relevant transfer configuration details to the DMA load/store engine. It clarifies all the details like source/destination, burst size, transfer length etc, but the actual circuit implementing the load-store loop would be already present within this DMA core.

Every hardware module has a primary interfacing mechanism, simple peripherals use registers and complex processors employ binary object codes. A typical DMA controller interfacing mechanism as elaborated above involves descriptor table reflecting its register set and transfer capabilities. PrimeCell is a DMA IP block from ARM, but unlike a typical DMA controller, its primary interfacing mechanism is not register set or descriptor tables. But it’s an instruction set, very much resembling a simple processor.

Fig 5 : PrimeCell

PrimeCell integrates a rudimentary processor core which understands a limited instruction set, mostly meant for implementing load store operations and other synchronization mechanisms related to data bus and processor interrupts. So, here, instead of a descriptor table, we have an object code describing the transfer. Essentially a binary code implementing memcpy, but written using assembly instruction code understood by the PrimeCell DMA processor. Obviously, such an instruction code would also specify the source/destination address, burst size etc. It’s an actual binary object code.

Fig 6 : PrimeCell Architecture

This DMA processor core is also multi-threaded with one transfer channel being associated with each thread. These threads possess the same scheduling priority,  and the system implements a round-robin scheme. The internal FIFO is utilized as a buffer during DMA transfers and it’s shared across all the channels. Each channel is allowed access to the whole FIFO memory, but of course, at any instant, FIFO utilization across all the running channels can never exceed its total size. Such a flexible shared memory design adds to software complexity but it can also lead to a more productive hardware resource allocation pattern.

Finally, PrimeCell IP block also includes an instruction cache to optimize object code access. Also provides scope for instruction abort handling and watchdog exceptions to avoid lockups. In other words, the module almost resembles a full blown multi tasking processor. But do note that, such transfer related or hardware related exception triggers are present in a typical DMA controller also. So, seems like, with PrimeCell IP, a software engineer is being exposed to the underlying machinery which already exist in other typical DMA hardwares but is eventually hidden behind the veil of register set abstractions. As always, nothing is without a trade-off, the more complicated aspect of managing this module is definitely the object code generation. Which makes a crude compiler out of an otherwise relatively simple DMA driver.