Speed Up Calculation with eMAC and MAC in ColdFire
ColdFire integrates the enhanced multiply-accumulate (eMAC) unit as a fundamental function block. Many DSP and cryptography algorithms rely heavily on add and multiply operations, which the MAC and eMAC units can accelerate. Before reading this article, please keep in mind that ColdFire is not a general-purpose DSP: a DSP has not only a MAC unit but also special addressing modes and instructions. ColdFire's eMAC can speed up these operations, but it cannot replace a DSP.
Differences between MAC and eMAC
The MAC unit has only one accumulator and is optimized for 16x16 multiplication. Although its input and output operands are 32 bits long, the firmware can use only the upper 16 bits of the operands to perform the multiplication. The eMAC unit has four accumulators. It is optimized for 32x32 multiplication with a 48-bit data path, so optimized code uses all 32 bits of the operands and the precision of computations increases. The eMAC has both integer and fractional modes, which need fewer MAC instructions and give suitable precision; the fractional mode is absent in the MAC unit. The eMAC is therefore faster and more precise than the MAC. Digital audio (music) is normally sampled at 16 bits, but processing 16-bit samples requires at least a 24-bit MAC. So if you are designing a digital music product such as an MP3 player, an eMAC-enabled ColdFire should be your choice. The 16-bit MAC can be used in other applications such as servo control, image compression, and voice processing. Please read the ColdFire Reference Manual for more details about the improvements in the eMAC.
To help its customers use the eMAC, Freescale offers software libraries, application notes, and user manuals. After reading this documentation, you can learn how to use the eMAC to speed up your algorithms.
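To illustrate why the accumulator width matters, here is a minimal sketch in portable C that models a multiply-accumulate loop over 16-bit samples. It is only an arithmetic model under my own assumptions, not eMAC code: the function name mac16 is hypothetical, and the 64-bit accumulator simply stands in for a hardware accumulator that is wider than the operands.

```c
#include <stdint.h>
#include <stddef.h>

/* Portable model of a multiply-accumulate loop over 16-bit samples.
 * Each 16x16 product needs up to 32 bits, and the running sum needs
 * extra headroom on top of that, so the accumulator here is 64 bits.
 * Assumption: this models the arithmetic only, not the MAC/eMAC hardware. */
int64_t mac16(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        /* one multiply-accumulate step: widen first, then multiply */
        acc += (int32_t)x[i] * (int32_t)h[i];
    }
    return acc;
}
```

With only four full-scale samples the sum already exceeds the 32-bit signed range, which is the kind of headroom problem a wider accumulator avoids.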
Mathematical Operations and DSP Processing
Before reading the following documents, please note that some ColdFire parts have only the MAC unit, which does not support fractional mode. These documents therefore use fixed-point types that emulate fractional values, such as FRAC16/32/64; the exact types may depend on the document. Please read each document carefully before using its code.
Freescale's DSPLIB is a library of digital signal processing functions designed to work with the eMAC and MAC units in ColdFire processors. It includes three of the most common DSP functions: the fast Fourier transform (FFT), the finite impulse response (FIR) filter, and the infinite impulse response (IIR) filter. The library supports different configurations, such as eMAC vs. MAC and 16-bit vs. 32-bit operations. Before integrating the library into your project, please review your DSP textbook on FFT, FIR, and IIR. DSPLIB covers the important aspects, and it becomes more complete if you also use the software library from CFLMOPM (ColdFire's Library of Macros for Optimization, Programmer's Manual).
Freescale's CFLMOPM is the programmer's manual for its eMAC/MAC optimization macro library, which offers many optimized operations written in ColdFire macro assembly. Although CodeWarrior offers optimized C libraries for ColdFire, the library in the manual was hand-optimized in assembly language for the best performance. It covers 1D array operations, 2D array operations, DSP algorithms, and mathematical functions, and includes sample projects for CodeWarrior. Wind River and GCC are also mentioned, but I could not locate the library code for them. I checked the code: the macro assembly should be invoked as inline assembly rather than as a function library, because it does not follow the calling convention. That means you have to pay attention to register assignment. The index of the manual is not user friendly, so I list the library contents here.
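As a rough illustration of what a fractional fixed-point type looks like when the hardware has no fractional mode, here is a minimal sketch of a 16-bit fractional (Q15) type in C. The names q15_t, q15_from_double and q15_mul are hypothetical and are not the FRAC16 definitions from the Freescale documents; they only show the general idea of scaling by 2^15, rounding, and saturating.

```c
#include <stdint.h>

/* Hypothetical Q15 type: 1 sign bit and 15 fraction bits, range [-1, 1).
 * Illustration only; not the FRAC16 definition from the Freescale docs. */
typedef int16_t q15_t;

/* Convert a real value to Q15, rounding to nearest and saturating. */
q15_t q15_from_double(double v)
{
    double scaled = v * 32768.0;
    if (scaled >= 32767.0) return 32767;     /* clamp at the largest Q15 value */
    if (scaled <= -32768.0) return -32768;   /* clamp at -1.0 */
    return (q15_t)(long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Multiply two Q15 values: the 32-bit product is in Q30, so it is
 * rounded and shifted back to Q15.
 * Assumes an arithmetic right shift for negative values, which
 * mainstream compilers provide but the C standard does not guarantee. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;     /* Q30 product */
    p = (p + (1 << 14)) >> 15;               /* round, rescale to Q15 */
    if (p > 32767) p = 32767;                /* only (-1) * (-1) saturates */
    return (q15_t)p;
}
```

For example, q15_mul(q15_from_double(0.5), q15_from_double(0.5)) yields 8192, which is 0.25 in Q15.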
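For readers who want a quick refresher before opening the library, here is a minimal sketch of a direct-form FIR filter on 16-bit fractional samples. It is my own plain C illustration, not DSPLIB code: the function name fir_q15, its parameters, and the Q15 scaling are assumptions, and samples before the start of the input are treated as zero.

```c
#include <stdint.h>
#include <stddef.h>

/* Direct-form FIR: y[i] = sum over k of h[k] * x[i - k], with x[j] = 0
 * for j < 0. Samples and coefficients are Q15 (value = integer / 32768).
 * Hypothetical helper, not a DSPLIB function. Assumes an arithmetic
 * right shift for negative values. */
void fir_q15(const int16_t *x, size_t n,
             const int16_t *h, size_t taps, int16_t *y)
{
    for (size_t i = 0; i < n; i++) {
        int64_t acc = 0;                        /* wide accumulator */
        for (size_t k = 0; k < taps && k <= i; k++) {
            acc += (int32_t)h[k] * (int32_t)x[i - k];
        }
        acc = (acc + (1 << 14)) >> 15;          /* round, rescale Q30 -> Q15 */
        if (acc > 32767) acc = 32767;           /* saturate to 16 bits */
        if (acc < -32768) acc = -32768;
        y[i] = (int16_t)acc;
    }
}
```

The inner loop is a pure multiply-accumulate chain, which is why this kind of filter maps well onto a MAC unit.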
1D Array Operations
- The sum of array elements of unsigned/signed values.
- The elementwise sum of two vector arrays with unsigned/signed values.
- The elementwise sum of two vector arrays with unsigned/signed values, and store the results to a third vector with unsigned/signed values.
- The elementwise sum of a vector array of unsigned/signed values with a scalar unsigned/signed value.
- The product of the vector array of unsigned/signed values.
- The multiplication of two vector arrays of unsigned/signed values.
- The multiplication of two vector arrays of unsigned/signed values, and store the results to a third vector with unsigned/signed values.
- The multiplication of one vector array by scalar unsigned/signed value.
- The search for the maximum element in a 1D array of signed/unsigned integer values.
- The search for the minimum element in a 1D array of signed/unsigned integer values.
- Cast/convert an array of word data elements to an array of long data elements, which is useful when a programmer wants to use the library with word data element arrays.
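To make a few of the 1D operations above concrete, here is a minimal C sketch of the word-to-long conversion followed by a sum and a maximum search. These are plain C stand-ins with hypothetical names (widen_words, sum_long, max_long), not the assembly macros from the manual, and they assume "word" means 16 bits and "long" means 32 bits.

```c
#include <stdint.h>
#include <stddef.h>

/* Convert an array of 16-bit words to 32-bit longs (sign-extending). */
void widen_words(const int16_t *src, int32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = (int32_t)src[i];
    }
}

/* Sum of all elements, accumulated in 64 bits to avoid overflow. */
int64_t sum_long(const int32_t *a, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += a[i];
    }
    return acc;
}

/* Maximum element; the caller must pass n >= 1. */
int32_t max_long(const int32_t *a, size_t n)
{
    int32_t best = a[0];
    for (size_t i = 1; i < n; i++) {
        if (a[i] > best) best = a[i];
    }
    return best;
}
```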
2D Array Operations
- The sum of the 2D array elements of unsigned/signed values.
- The elementwise sum of two 2D arrays with unsigned/signed values.
- The elementwise sum of two 2D arrays with unsigned/signed values, and store the results in a third 2D array of unsigned/signed values.
- The elementwise sum of 2D array of unsigned/signed values with a scalar unsigned/signed value.
- The multiplication of two 2D arrays of unsigned/signed values.
- The multiplication of two 2D arrays of unsigned/signed values, and store the results in a third 2D array of unsigned/signed values.
- The multiplication of one 2D array by scalar unsigned/signed value.
- The search for the maximum element in a 2D array of signed/unsigned integer values.
- The search for the minimum element in a 2D array of signed/unsigned integer values.
- 2D version for casting word data into long data elements before entering the library.
DSP Operations
- The dot product of two vector arrays with unsigned/signed values.
- The reverse dot product of two vector arrays with unsigned/signed values.
- The product of two matrices with unsigned/signed values.
- The convolution using array of samples and array of coefficients (array elements in FRAC32 type).
- A calculation of the first differences on input fractional operands, commonly known as discrete derivation.
- A calculation of the running sum of the input fractional operands, commonly known as discrete integration.
- A single pole low-pass filter (LPF).
- A single pole high-pass filter (HPF).
- A four stage low-pass filter with five coefficients.
- A band-pass filter (BPF) with five coefficients.
- A band-reject filter (BRF) with five coefficients.
- The moving average filter.
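Three of the DSP operations listed above are simple enough to sketch in a few lines of C: the first difference, the running sum, and a single-pole low-pass filter. The code below is my own illustration, not the macro library: the function names are hypothetical, the low-pass filter uses the common form y[i] = y[i-1] + a * (x[i] - y[i-1]) with a Q15 coefficient, and the caller is assumed to keep the integer inputs small enough that the 32-bit sums do not overflow.

```c
#include <stdint.h>
#include <stddef.h>

/* First difference (discrete derivative): y[i] = x[i] - x[i-1], x[-1] = 0. */
void first_difference(const int32_t *x, int32_t *y, size_t n)
{
    int32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        y[i] = x[i] - prev;
        prev = x[i];
    }
}

/* Running sum (discrete integration): y[i] = y[i-1] + x[i], y[-1] = 0. */
void running_sum(const int32_t *x, int32_t *y, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += x[i];
        y[i] = acc;
    }
}

/* Single-pole low-pass filter with a Q15 coefficient a_q15 in [0, 32767]:
 * the state moves a fraction a of the way toward each new sample.
 * Assumes an arithmetic right shift for negative values. */
void lowpass_single_pole(const int16_t *x, int16_t *y, size_t n, int16_t a_q15)
{
    int32_t state = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t diff = (int64_t)x[i] - state;
        state += (int32_t)((diff * a_q15) >> 15);
        y[i] = (int16_t)state;
    }
}
```

Note that the running sum undoes the first difference, and each loop body is again an add or multiply-accumulate step.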
Mathematical Functions
- SIN: performs arithmetic on the angle parameter to reduce it to the range [0...PI/4], then calls the SIN_F/COS_F macro to compute the sine function.
- COS: performs arithmetic on the angle parameter to reduce it to the range [0...PI/4], then calls the SIN_F/COS_F macro to compute the cosine function.
- SIN_F: computes the sine of an angle within the reduced range. The computation uses a Taylor series of 6 terms.
- COS_F: computes the cosine of an angle within the reduced range. The computation uses a Taylor series of 7 terms.
- MUL: computes the product of two fixed-point numbers of the FIXED64 type.
As the lists show, the DSP processing functions in the macro library can be integrated into DSPLIB to form a more complete DSP library.
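The range reduction plus Taylor series approach described for SIN, SIN_F and COS_F can be sketched in C as follows. This is a floating-point model written for illustration only; the library macros work on fixed-point types, and every name here (PI_D, sin_taylor, cos_taylor, sin_reduced) is hypothetical. The term counts of 6 and 7 follow the description above.

```c
/* Floating-point model of the approach; all names are hypothetical. */
static const double PI_D = 3.14159265358979323846;

/* Sine by Taylor series: x - x^3/3! + x^5/5! - ... using `terms` summands. */
static double sin_taylor(double x, int terms)
{
    double term = x, sum = x;
    for (int k = 1; k < terms; k++) {
        term *= -x * x / (double)((2 * k) * (2 * k + 1));
        sum += term;
    }
    return sum;
}

/* Cosine by Taylor series: 1 - x^2/2! + x^4/4! - ... using `terms` summands. */
static double cos_taylor(double x, int terms)
{
    double term = 1.0, sum = 1.0;
    for (int k = 1; k < terms; k++) {
        term *= -x * x / (double)((2 * k - 1) * (2 * k));
        sum += term;
    }
    return sum;
}

/* Reduce the angle to [0, PI/4], then pick the sine or cosine series. */
double sin_reduced(double x)
{
    double sign = 1.0;
    long turns = (long)(x / (2.0 * PI_D));
    x -= (double)turns * 2.0 * PI_D;            /* now in (-2*PI, 2*PI) */
    if (x < 0.0) x += 2.0 * PI_D;               /* now in [0, 2*PI] */
    if (x >= PI_D) { x -= PI_D; sign = -1.0; }  /* sin(x + PI) = -sin(x) */
    if (x > PI_D / 2.0) x = PI_D - x;           /* sin(PI - x) = sin(x) */
    if (x > PI_D / 4.0)                         /* sin(x) = cos(PI/2 - x) */
        return sign * cos_taylor(PI_D / 2.0 - x, 7);
    return sign * sin_taylor(x, 6);
}
```

Keeping the argument below PI/4 is what lets such a short series stay accurate, and each series term is one more multiply and add.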
Cryptography
In general, cryptography is a branch of mathematics, so some cryptographic algorithms can be accelerated by the MAC and eMAC. Additionally, Freescale offers a Cryptographic Acceleration Unit (CAU) and a Random Number Generator (RNG) in its ColdFire product lines. The CAU is a ColdFire coprocessor that implements a set of specialized operations in hardware to increase the throughput of software-based encryption and message digest functions. The latest released ColdFire v2 MCU, the Kirin3 MCF52259, has both the eMAC and the CAU/RNG.
Popular cryptographic algorithms include AES, DES, MD5, SHA-1, RSA, and some proprietary ones. They are used in identification, authentication, authorization, digital signatures, encryption, decryption, secure payment, and SSL/IPSec/VPN, which is a big topic. The CAU is an instruction-level ColdFire coprocessor. It implements a set of 22 coprocessor commands that operate on a register file of eight 32-bit registers. You can implement the innermost round functions with the coprocessor instructions and the higher-level functions in software with the standard ColdFire instructions. For CAU-enabled parts, Freescale offers a CAU software library supporting AES, DES, MD5, and SHA-1, which can be integrated directly into a CodeWarrior (CW) ColdFire project.
Because cryptography involves many mathematical operations in addition to logical operations, Freescale offers an application note on using the ColdFire eMAC to improve RSA performance. The principle described in that document also applies to other modular exponentiation calculations and PKI cryptographic algorithms, including RSA and ElGamal.
In fact, cryptography is an interesting topic. I read a lot of documents on it in my previous job in identification (RFID and smart cards). Sometimes we must find a way to speed up algorithms like RSA; sometimes we do the opposite and slow an algorithm down on purpose to even out the silicon's power consumption, so that ripples in the power consumption do not give an outside attacker a reference. In any case, a hardware accelerator is an important criterion for a candidate silicon. If a microcontroller can only implement the algorithm in firmware, it is too easy to hack.
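To show where the multiplications in RSA come from, here is a minimal sketch of modular exponentiation by square-and-multiply in C. It is a toy version under my own assumptions, not the method from the application note: the function name modexp is hypothetical, and operands are limited to 32 bits so that every product fits in 64 bits, whereas real RSA uses multi-precision numbers of 1024 bits or more.

```c
#include <stdint.h>

/* Compute (base ^ exp) mod `mod` by binary square-and-multiply.
 * Toy 32-bit version: both factors of every product are below 2^32,
 * so the 64-bit intermediate cannot overflow. Returns 0 if mod is 0,
 * since the result is undefined in that case. */
uint32_t modexp(uint32_t base, uint32_t exp, uint32_t mod)
{
    if (mod == 0) return 0;
    uint64_t result = 1 % mod;
    uint64_t b = base % mod;
    while (exp > 0) {
        if (exp & 1u) {
            result = (result * b) % mod;   /* multiply step */
        }
        b = (b * b) % mod;                 /* square step */
        exp >>= 1;
    }
    return (uint32_t)result;
}
```

With the small textbook key n = 3233, e = 17, d = 2753, calling modexp(modexp(65, 17, 3233), 2753, 3233) returns the original message 65. With real key sizes, each multiply and square step becomes a long chain of word-sized multiply-accumulate operations, which is the part a MAC unit can help with.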
References
- Digital Signal Processing Software Libraries Using the ColdFire eMAC and MAC (DSPLIBUM v1.2)
- Library of Macros for Optimization Using eMAC and MAC, Programmer's Manual (CFLMOPM v1)
- Using the ColdFire EMAC Unit to Improve RSA Performance (AN3038)
- ColdFire Cryptographic Accelerator Unit (CAU) Software Library
- allankliu's blog