Turkish Online Journal of Qualitative Inquiry (TOJQI) Volume 12, Issue 5, July, 2021: 4397 - 4404

**Research Article** 

### RNS System with Improved Reverse Conversion Process for High-Performance DSP Applications

T.R.Dinesh Kumar<sup>1</sup>, Dr.M.Anto Bennet<sup>2</sup>, R.Sowmiya<sup>3</sup>, S.Saranya<sup>4</sup>, R.Yuvasree<sup>5</sup>

#### ABSTRACT

Digital filtering is the core of digital signal processing, since it underlies the solution of most practical problems in this area: noise reduction, amplification and suppression of frequencies, interpolation, decimation, equalization and many others. In recent years, DSP applications have dealt with samples of large bit widths for improved precision, which requires a large amount of hardware resources for computation. Irrespective of the arithmetic model used for data computation, a tradeoff always exists with word length. To retain performance in terms of speed, it is essential to base the arithmetic computations on modular fields. In addition, this work proposes decomposed LUT based residue generation for the RNS MAC core, which involves simpler arithmetic operations and improves path delay optimization. Compared with all other functions in RNS, the reverse conversion process directly influences overall RNS system performance in terms of power consumption overhead, performance measures and path delay. Here, reverse conversion is implemented using a decomposed LUT, which produces a high performance FIR design.

*KEYWORDS*: RNS, FIR design, MAC system, residue generation, FPGA

#### **1. INTRODUCTION**

With the advent of high speed communication in 5G devices, high performance arithmetic modelling is at the heart of next generation applications. Optimized unified multiplication and accumulation units can be used in many applications, such as finite impulse response (FIR) filtering [1], fast Fourier transform (FFT) computation [2] and wavelet transforms [3]. Parallel prefix accumulation is widely preferred in many digital signal processing (DSP) and wireless communication devices for improved system performance. It is also used to build extended arithmetic units such as multiplication and division units. To reduce path propagation delay and design complexity overhead in the adder unit, many works have introduced various prefix topologies. In most cases, notable performance degradation occurs with increasing input bit width of the adder unit. In general, many DSP applications need to accommodate a large number of multiply and accumulate (MAC) operations, and the hardware required increases accordingly [4].

<sup>1</sup>Assistant Professor, Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai-62.

<sup>2</sup>Professor, Department of ECE, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai-62.

<sup>3,4,5</sup>UG Students, Department of ECE, Vel Tech High Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai-62.

Corresponding Author Email: drmantobenet@veltech.edu.in

The major objective of a high performance FIR MAC core [5] is to evaluate, develop, and analyze the transformation models used in each stage of data propagation in order to achieve higher throughput with an improved hardware utilization rate. Among the many high performance MAC models, the RNS core is the most promising method used in communication systems and has proven to be well suited for next generation DSP systems. It can support high bit rate FIR systems and also offers improved memory efficiency.

### 2. RELATED WORKS

This section covers the advantages of existing RNS FIR cores and their implications for FPGA implementation in detail. In general, the working principle and parametric measures of a MAC core depend largely on the input length, the number of taps and the various transformation functions involved in the FIR core. In related DSP applications, these cores are applicable only with some memory optimization models because of these parametric constraints.

Many previous works focused only on FIR filter design optimization using various multiplication methods, such as Booth [6] and Vedic [7] multipliers for high performance, and canonical signed digit (CSD) formulation of filter coefficients for low complexity. However, the major issue with these models is that as complexity is reduced, performance also degrades significantly, and vice versa. Therefore, a unified model is required to narrow this performance gap. To accomplish this, many recent works have investigated prefix accumulation and distributed arithmetic (DA) for complexity reduction, high throughput and low power FIR design.

Researchers have also investigated RNS for various DSP applications [8], since FIR design based on residue arithmetic offers both high speed and reduced computational complexity overhead. The FPGA architecture provides additional benefits in RNS systems because of its resource availability. The DA model proposed in [9] for 2-D block FIR filters uses reconfigurable DA look-up tables (DA-LUTs) for a low complexity multiply and accumulate block. During processing, the most common elements are shared among DA-LUTs at various stages, which also reduces the hardware complexity of the DA-LUTs. In [10], an improved distributed arithmetic (DA) algorithm is used to design high-order FIR filters with low path delay and computational complexity overhead. By decomposing LUTs into several smaller look-up tables, memory requirements are reduced, and parallel processing is incorporated to improve system performance. The binary serial implementation of an RNS system introduced in [11] for adaptive FIR filters eliminates complex scaling. In this design, each LUT module supports a wide range of parameter configurations to accommodate different numbers of taps and coefficient word lengths.
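The memory saving from decomposing a DA look-up table can be illustrated with a small sketch. This is not the design of [9] or [10]; it is a minimal bit-serial DA inner product under assumed conditions (unsigned samples, integer coefficients), and the names `build_da_lut`, `da_inner_product`, `lut_entries`, `COEFFS` and `SAMPLES` are hypothetical.

```python
def build_da_lut(coeffs):
    """Entry at address a holds the sum of the coefficients whose bit is set in a."""
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << len(coeffs))
    ]


def da_inner_product(coeffs, samples, bits, group):
    """Bit-serial DA inner product of unsigned samples, one LUT per tap group."""
    luts = [build_da_lut(coeffs[i:i + group]) for i in range(0, len(coeffs), group)]
    acc = 0
    for b in range(bits):  # process one bit plane per iteration
        partial = 0
        for g, lut in enumerate(luts):
            chunk = samples[g * group:(g + 1) * group]
            # The b-th bits of the samples in this group form the LUT address.
            addr = sum(((x >> b) & 1) << i for i, x in enumerate(chunk))
            partial += lut[addr]
        acc += partial << b  # weight the bit plane by 2**b
    return acc


def lut_entries(num_taps, group):
    """Total number of LUT words when taps are split into groups of size `group`."""
    full, rest = divmod(num_taps, group)
    return full * (1 << group) + ((1 << rest) if rest else 0)


COEFFS = [3, -1, 4, 2, 5, -6, 7, 1]         # hypothetical 8-tap coefficients
SAMPLES = [10, 200, 33, 7, 255, 0, 91, 18]  # hypothetical unsigned 8-bit samples

if __name__ == "__main__":
    print(da_inner_product(COEFFS, SAMPLES, bits=8, group=4))
    print(lut_entries(8, 8), lut_entries(8, 4))
```

For these eight taps, a single LUT needs 256 words, while two 4-input LUTs need 32 words in total, at the cost of one extra addition per bit plane.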

In [12], a binary coded format is used to compute the residues, and thermometer code encoded residues are used to compute the modular inner products [13-17]. This distributed arithmetic involves no carry propagation in accumulation and uses pre-computed LUT blocks to attain maximum operating speed and the least possible hardware complexity overhead in FIR filter design.



Fig. 1. Proposed RNS FIR MAC model

## 3. RESIDUE NUMBER SYSTEM (RNS)

Among the various methodologies investigated for high performance FIR MAC, RNS is considered a prominent methodology for next generation DSP applications and has proved to be a potential alternative to all other existing MAC core systems. In real time DSP applications, FIR filters with various numbers of taps are preferred because of their good tradeoff between performance and computational complexity overhead, as shown in Figure 1. During cipher conversion, the input text is arranged as a 4x4 matrix, which is known as a state matrix or state array.

### Hardware efficient RNS methodology

A hardware-efficient RNS system consists of several parallel processing units with effective representations of moduli subsets. However, a tradeoff always exists between the dynamic range of the moduli set and the memory efficiency of reverse conversion, since this efficiency is directly proportional to the number of parallel units.
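As a rough illustration of the parallel channels described above, the following sketch converts operands to residues and performs a multiply-accumulate independently in each channel. The moduli set `(7, 8, 9)` and all function names are assumptions chosen for illustration; the paper does not state its moduli set.

```python
from math import gcd, prod

MODULI = (7, 8, 9)  # hypothetical pairwise-coprime moduli set


def is_pairwise_coprime(moduli):
    """RNS requires every pair of moduli to share no common factor."""
    return all(gcd(a, b) == 1 for i, a in enumerate(moduli) for b in moduli[i + 1:])


def to_rns(x, moduli):
    """Forward conversion: one residue per channel."""
    return tuple(x % m for m in moduli)


def rns_mac(acc, a, b, moduli):
    """acc + a*b computed channel by channel, with no carries between channels."""
    return tuple((r + p * q) % m for r, p, q, m in zip(acc, a, b, moduli))


def channel_bits(moduli):
    """Word length needed to store a residue in each channel."""
    return [(m - 1).bit_length() for m in moduli]


if __name__ == "__main__":
    dynamic_range = prod(MODULI)  # results are unique only below this value
    acc = to_rns(0, MODULI)
    for a, b in [(3, 5), (7, 11), (13, 17)]:
        acc = rns_mac(acc, to_rns(a, MODULI), to_rns(b, MODULI), MODULI)
    print(dynamic_range, channel_bits(MODULI), acc)
```

Each channel works on a word of only a few bits, but the accumulated result is meaningful only while it stays below the product of the moduli, which is the dynamic range tradeoff mentioned above.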

### High performance FIR design model

The design complexity and performance of any FIR filter design are directly related to the filter order, and in most cases they are dominated by the filter length. In the FIR filter, MAC units are used in each processing step and some appropriate arithmetic

Though RNS is a numerically advanced arithmetic core, a significant performance tradeoff always occurs in many real time DSP applications because of its complex computations. Optimization is therefore essential for implementing RNS in DSP and wireless systems.

### **3.1 Performance Measures**

The attainable performance of the RNS algorithm depends largely on the residue size and the memory elements involved in reverse conversion. However, performance enhancement through RNS comes with significant computational time and memory overhead. Different algorithms make different complexity tradeoffs to meet desired performance levels. In the proposed RNS system, a decomposed LUT model is used to achieve high performance in FIR design.
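The paper does not spell out how the reverse conversion LUT is decomposed. One common approach, sketched here purely as an assumption, uses the Chinese Remainder Theorem with one small table per channel instead of a single ROM addressed by all residues at once. The moduli set and the names `build_channel_luts` and `reverse_convert` are hypothetical.

```python
from math import prod

MODULI = (7, 8, 9)  # hypothetical pairwise-coprime moduli set


def build_channel_luts(moduli):
    """One table per channel: entry r holds (r * weight_i) mod M (CRT term)."""
    M = prod(moduli)
    luts = []
    for m in moduli:
        Mi = M // m
        weight = Mi * pow(Mi, -1, m)  # pow(x, -1, m) is the modular inverse
        luts.append([(r * weight) % M for r in range(m)])
    return luts, M


def reverse_convert(residues, luts, M):
    """Look up one CRT term per channel, then add the terms modulo M."""
    return sum(lut[r] for lut, r in zip(luts, residues)) % M


if __name__ == "__main__":
    luts, M = build_channel_luts(MODULI)
    decomposed_words = sum(len(lut) for lut in luts)
    print(M, decomposed_words)  # words in one full ROM vs. all channel LUTs
    print(reverse_convert((313 % 7, 313 % 8, 313 % 9), luts, M))
```

With this set, a single ROM covering every residue combination would need 504 words, while the per-channel tables need 24 words plus a modulo-M adder. Whether this matches the authors' FSM based decomposition is not stated in the text.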

## 4. EXPERIMENTAL RESULTS

### **4.1 Simulation Results**

To validate the importance of decomposing the memory unit in the reverse conversion process and to verify its impact on the RNS system, the FIR convolution process is simulated using appropriate test inputs at various stages of data propagation, as shown in Figure 2; the associated modulo conversion is shown in Figure 3. The potential benefits of modulo residue conversion and its efficiency in the RNS core design are also confirmed by the simulation results.
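A software analogue of this check can be sketched as follows: the same FIR convolution is computed directly and through residue channels followed by reverse conversion, and the two outputs are compared. The moduli set, coefficients and samples are hypothetical test values, not the inputs used in the paper's simulation, and the sketch assumes unsigned data whose outputs stay below the dynamic range.

```python
from math import prod

MODULI = (31, 32, 33)                      # hypothetical set, dynamic range 32736
COEFFS = [1, 2, 3, 4]                      # hypothetical unsigned coefficients
SAMPLES = [12, 255, 0, 73, 128, 9, 200, 41]  # hypothetical unsigned 8-bit samples


def fir_direct(coeffs, samples):
    """Reference FIR output using ordinary integer arithmetic."""
    return [
        sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(samples))
    ]


def fir_rns(coeffs, samples, moduli):
    """FIR computed per residue channel, then reverse converted with the CRT."""
    M = prod(moduli)
    channels = []
    for m in moduli:
        c_res = [c % m for c in coeffs]   # forward conversion of coefficients
        x_res = [x % m for x in samples]  # forward conversion of samples
        y = []
        for n in range(len(samples)):
            acc = 0
            for k, c in enumerate(c_res):
                if n - k >= 0:
                    acc = (acc + c * x_res[n - k]) % m  # modular MAC
            y.append(acc)
        channels.append(y)
    weights = [(M // m) * pow(M // m, -1, m) for m in moduli]
    return [
        sum(w * ch[n] for w, ch in zip(weights, channels)) % M
        for n in range(len(samples))
    ]


if __name__ == "__main__":
    print(fir_direct(COEFFS, SAMPLES) == fir_rns(COEFFS, SAMPLES, MODULI))
```

The comparison holds only while every output sample is smaller than the product of the moduli; signed data would need an additional range mapping that is not shown here.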



Fig. 2. Simulated RNS FIR convolution



Fig. 3. Modulo conversion for residue generation

### **4.2 Hardware Synthesis Results**

This section compares the performance metrics of the proposed RNS FIR against the conventional ROM based RNS model and validates them in terms of both performance and complexity tradeoff. The proposed RNS FIR MAC core is modeled in Verilog HDL and synthesized using the QUARTUS II FPGA EDA synthesizer for state-of-the-art comparison. The resulting RNS core achieves a flexible tradeoff with the least possible design complexity and tolerable path delay. Moreover, the decomposed LUT model for reverse conversion minimizes memory space requirements and supports path delay optimization. This memory optimized RNS FIR core jointly optimizes computational complexity and energy through the decomposed reverse conversion model.

Table 1. Comparative analysis of various FIR MAC unit designs with QUARTUS II hardware synthesis using the CYCLONE III family (EP3C16F484C6)

| Multiplier model             | Design complexity (LEs) | Speed (MHz) | Power dissipation (mW) |
|------------------------------|-------------------------|-------------|------------------------|
| Conventional RNS FIR         | 3016                    | 46.02       | 82.97                  |
| FSM based decomposed RNS FIR | 2878                    | 67.47       | 76.23                  |





### **4.3 Performance Comparison Report**

The FPGA hardware synthesis tool was used to generate the power utilization report, and the experimental results are listed in Table 1. From the logic element utilization summary, the proposed decomposed LUT based RNS FIR design offers about 5% area efficiency over the conventional ROM based RNS approach (2878 versus 3016 LEs) and so reduces hardware complexity. The energy efficiency of the memory optimized reverse conversion model is also significant, as shown in Figure 5 from the FPGA hardware synthesis results.
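The relative improvements implied by Table 1 can be recomputed with a few lines. The figures are taken directly from the table; the dictionary keys and helper names are only illustrative.

```python
# Values copied from Table 1 (Cyclone III, EP3C16F484C6).
CONVENTIONAL = {"les": 3016, "speed_mhz": 46.02, "power_mw": 82.97}
PROPOSED = {"les": 2878, "speed_mhz": 67.47, "power_mw": 76.23}


def percent_reduction(old, new):
    """Relative decrease from old to new, in percent."""
    return 100.0 * (old - new) / old


def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return 100.0 * (new - old) / old


area_saving = percent_reduction(CONVENTIONAL["les"], PROPOSED["les"])
speed_gain = percent_increase(CONVENTIONAL["speed_mhz"], PROPOSED["speed_mhz"])
power_saving = percent_reduction(CONVENTIONAL["power_mw"], PROPOSED["power_mw"])

if __name__ == "__main__":
    print(f"area saving:  {area_saving:.1f}%")
    print(f"speed gain:   {speed_gain:.1f}%")
    print(f"power saving: {power_saving:.1f}%")
```

On these numbers the logic element count falls by roughly 4.6%, the maximum clock frequency rises by roughly 46.6%, and power dissipation falls by roughly 8.1%.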



**Fig. 5.** Power dissipation report.

#### CONCLUSION

In this work, an FPGA implementation of a high speed FIR design driven by RNS residue arithmetic is carried out and its performance metrics are analyzed. It is also demonstrated that the decomposed LUT based reverse conversion introduced for RNS shows superior system performance. We propose hierarchical accumulation of FIR coefficients along with an FIR sequence extraction model that provides both energy efficiency and path delay optimization. Compared to the conventional RNS FIR design, this technique offers a low cost RNS based algorithm with maximum memory depth. The results show that the proposed decomposed LUT based RNS gives better hardware complexity and power optimization with considerable delay improvement.

## REFERENCES

- 1. Egila, Mohamed G., Magdy A. El-Moursy, Adel E. El-Hennawy, Hamed A. El-Simary, and Amal Zaki. "FPGA-based electrocardiography (ECG) signal analysis system using least-square linear phase finite impulse response (FIR) filter." Journal of Electrical Systems and Information Technology 3, no. 3 (2016): 513-526.
- 2. Highlander, Tyler, and Andres Rodriguez. "Very efficient training of convolutional neural networks using fast Fourier transform and overlap-and-add." arXiv preprint arXiv:1601.06815 (2016).
- 3. Daubechies, Ingrid, and Stephane Maes. "A nonlinear squeezing of the continuous wavelet transform based on auditory nerve models." In Wavelets in medicine and biology, pp. 527-546. Routledge, 2017.
- 4. Sen, Avisek, Partha Mitra, and Debarshi Datta. "Low power MAC unit for DSP processor." International Journal of Recent Technology and Engineering (IJRTE) 1, no. 6 (2013): 93-95.
- 5. Kobayashi, Ryohei, Yuma Oobata, Norihisa Fujita, Yoshiki Yamaguchi, and Taisuke Boku. "OpenCL-ready high speed FPGA network for reconfigurable high performance computing." In Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 192-201. 2018.
- 6. Dr. Anto Bennet, M, Sankar Babu G, Natarajan S, "Reverse Room Techniques for Irreversible Data Hiding", Journal of Chemical and Pharmaceutical Sciences 08(03): 469-475, September 2015.
- 7. Dr. Anto Bennet, M, Sankaranarayanan S, Sankar Babu G, "Performance & Analysis of Effective Iris Recognition System Using Independent Component Analysis", Journal of Chemical and Pharmaceutical Sciences 08(03): 571-576, August 2015.
- 8. Dr. Anto Bennet, M, Suresh R, Mohamed Sulaiman S, "Performance & analysis of automated removal of head movement artifacts in EEG using brain computer interface", Journal of Chemical and Pharmaceutical Research 07(08): 291-299, August 2015.
- 9. Anto Bennet, M & Jacob Raglend, "Performance Analysis Of Filtering Schedule Using Deblocking Filter For The Reduction Of Block Artifacts From MPEQ Compressed Document Images", Journal of Computer Science, vol. 8, no. 9, pp. 1447-1454, 2012.
- 10. Anto Bennet, M & Jacob Raglend, "Performance Analysis of Block Artifact Reduction Scheme Using Pseudo Random Noise Mask Filtering", European Journal of Scientific Research, vol. 66, no. 1, pp. 120-129, 2011.
- 11. Jyothi, Grande Naga, Kishore Sanapala, and A. Vijayalakshmi. "ASIC implementation of distributed arithmetic based FIR filter using RNS for high speed DSP systems." International Journal of Speech Technology (2020): 1-6.
- 12. Safari, Azadeh, Cheecottu Vayalil Niras, and Yinan Kong. "Power-performance enhancement of two-dimensional RNS-based DWT image processor using static voltage scaling." Integration 53 (2016): 145-156.
- 13. Murugan, S., Jeyalaksshmi, S., Mahalakshmi, B., Suseendran, G., Jabeen, T. N., & Manikandan, R. (2020). Comparison of ACO and PSO algorithm using energy consumption and load balancing in emerging MANET and VANET infrastructure. *Journal of Critical Reviews*, 7(9), 2020.
- 14. Sampathkumar, A., Murugan, S., Sivaram, M., Sharma, V., Venkatachalam, K., & Kalimuthu, M. (2020). Advanced Energy Management System for Smart City Application Using the IoT. In *Internet of Things in Smart Technologies for Sustainable Urban Development* (pp. 185-194). Springer, Cham.
- 15. S. Kanaga Suba Raja, A. Sathya, S. Karthikeyan, T. Janane (2021) 'Multi cloud-based secure privacy preservation of hospital data in cloud computing', International Journal of Cloud Computing (Inderscience Enterprises Ltd), ISSN 2043-9989, Volume 10, Issue 1/2, pp. 101-111. https://doi.org/10.1504/IJCC.2021.10036376
- 16. V. Balaji, S. Kanaga Suba Raja, S. Aparna, J. Haritha, M. Lakshmi Kanya (2019) 'Smart Assistance for Asperger Syndrome using Raspberry Pi', International Journal of Innovative Technology and Exploring Engineering, ISSN: 2278-3075, Volume 9, Issue 1, pp. 2680-2683.
- 17. JA ALzubi, B Bharathikannan, S Tanwar, R Manikandan, A Khanna, Applied Soft Computing 80, 579-591