Real-Time DMAS Beamforming

Oversimplifying, my contribution to this work was an algebraic rearrangement. False modesty aside, that rearrangement is what makes a real-time implementation of the Delay Multiply and Sum (DMAS) beamforming algorithm possible. The intuition came while I was attending the "Sensors and Transducers" course during my Master's. As a project for the class, I was implementing the classic Delay-and-Sum (DAS) beamforming algorithm on an NVIDIA Jetson TK1 board. During a seminar, an invited researcher mentioned their novel DMAS algorithm for medical ultrasound imaging. Apparently, the algorithm gave better contrast and resolution, but its computational complexity stood in the way of a real-time implementation. Intrigued (and hoping to please my professors at the final exam), I wondered how to integrate this new algorithm into my DAS GPU implementation.

Simplifying DMAS

The DAS algorithm is embarrassingly parallel: the signals from the N channels are delayed independently, and a sum reduction is then applied. We can't say the same for DMAS. DMAS adds an all-vs-all multiplication stage (a tensor product) before the sum reduction. Excluding self-multiplications and counting each pair once, this stage consists of $\frac{N^2-N}{2}$ multiplications, making the algorithm quadratic in the number of channels. That was the bottleneck. Moreover, the flow chart became much harder to parallelize... Then, the eureka moment: where had I heard "the sum of all products of each term with each other"? Of course, polynomial multiplication.
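To make the cost difference concrete, here is a minimal Python sketch of the two reductions for a single output sample. It assumes the per-channel delays have already been applied, and it covers only the multiply-and-sum stage as described above; any other processing a complete beamformer performs is left out. The function names (`das_sample`, `dmas_sample_naive`, `pair_count`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def das_sample(delayed):
    """DAS for one output sample: sum the already-delayed channel samples."""
    return sum(delayed)


def dmas_sample_naive(delayed):
    """Naive pairwise stage: sum of s_i * s_j over every pair with i < j.

    The number of products grows quadratically with the channel count.
    """
    return sum(s_i * s_j for s_i, s_j in combinations(delayed, 2))


def pair_count(n_channels):
    """Number of distinct channel pairs, (N^2 - N) / 2."""
    return (n_channels * n_channels - n_channels) // 2


if __name__ == "__main__":
    delayed = [1.0, 2.0, 3.0, 4.0]      # hypothetical delayed samples, one per channel
    print(das_sample(delayed))          # 10.0
    print(dmas_sample_naive(delayed))   # 35.0
    print(pair_count(len(delayed)))     # 6
    print(pair_count(64))               # 2016
```

With 64 channels the naive stage already needs 2016 products per output sample, against 64 additions for DAS.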

For example, given three signals $a$, $b$, and $c$, the sum of all pairwise products is $a \cdot b + a \cdot c + b \cdot c = \frac{1}{2}[(a + b + c)^2 - (a^2 + b^2 + c^2)]$. (The example is a bit deceiving, since three signals give only three products; try it with all the letters from a to z, which give 325.) Hence, instead of a sum reduction over $\frac{N^2-N}{2}$ products, the algorithm reduces to two sum reductions of N elements each: one over the signals and one over their squares. Moreover, it is again embarrassingly parallel, since there is no interaction between channels apart from the two final reductions.
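The identity is easy to check numerically. The sketch below is an illustration under the same simplifying assumptions as the text (one output sample, delays already applied), not the published implementation; the function names and the sample values are made up for the example. It compares the brute-force sum over all pairs with the rearranged form, which needs only a running sum and a running sum of squares.

```python
from itertools import combinations
import math


def pairwise_sum_naive(samples):
    """Reference: sum of a * b over all pairs, (N^2 - N) / 2 products."""
    return sum(a * b for a, b in combinations(samples, 2))


def pairwise_sum_fast(samples):
    """Rearranged form: 0.5 * ((sum of s)^2 - sum of s^2).

    A single pass over the N samples keeps two accumulators,
    which correspond to the two sum reductions described in the text.
    """
    total = 0.0      # running sum of the samples
    total_sq = 0.0   # running sum of the squared samples
    for s in samples:
        total += s
        total_sq += s * s
    return 0.5 * (total * total - total_sq)


if __name__ == "__main__":
    samples = [0.5, -1.25, 2.0, 3.0, -0.75]   # hypothetical delayed samples
    print(pairwise_sum_naive(samples))        # -1.5625
    print(pairwise_sum_fast(samples))         # -1.5625
    assert math.isclose(pairwise_sum_naive(samples), pairwise_sum_fast(samples))
```

One practical caveat: the rearranged form subtracts two potentially large numbers, so in floating-point arithmetic it can lose some precision compared with the brute-force sum when the result is small relative to the two accumulators.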

I brought the intuition to my professors during the exam. I got the maximum mark and, a few years later, a paper: they used the idea to propose a real-time FPGA/DSP implementation. Quite a good outcome for an exam.