Turkish Online Journal of Qualitative Inquiry (TOJQI) Volume 12, Issue 10, October 2021: 849-860

# High Speed Systolic Architecture 2-D Forward Lifting DWT IC for Analysis of Covid X-Ray Images

# Dr. L. Malliga<sup>1</sup>, S. Karthick<sup>2</sup>, Dr. Seetharam Khetavath<sup>3</sup>, R. Ragumadhavan<sup>4</sup>

<sup>1</sup>Professor, Department of Electronics and Communication Engineering, Malla Reddy Engineering College for Women (Autonomous), Maisammaguda, Medchal (M), Hyderabad, Telangana - 500 100. Email: dr.malligalakshmanan18@gmail.com
<sup>2</sup>PG Scholar, Arignar Anna Institute of Science and Technology Sriperumbudur, Pennalur EB, Sriperumbudur, Tamil Nadu 602105. Email: 251287karthick@gmail.com
<sup>3</sup>Professor & Head, Department of Electronics and Communication Engineering, Chaitanya (Deemed to be University), Warangal, Telangana - 506001. Email: seetharamkhetavath@gmail.com
<sup>4</sup>Assistant Professor, PSNA College of Engineering and Technology, Dindigul - 624622, Tamil Nadu. Email: raguece85@gmail.com

## Abstract

This paper presents the design and implementation of a high-speed systolic architecture for a 2-D forward lifting-based DWT structure for the analysis of Covid X-ray images. A detailed analysis of DWT architectures for image processing applications, both convolution-based and lifting-based, is carried out. The analysis shows that the systolic architecture is faster than the other architectures considered. The architecture exploits a high degree of parallelism, achieves very high throughput, and is compact, robust and efficient. Real-time images are used for evaluation. The proposed design is implemented on an FPGA in VHDL using the Quartus II tool.

Keywords: Systolic, DWT, Lifting, convolution, Covid, x-ray images

## I. INTRODUCTION

The discrete wavelet transform (DWT) has many advantages over other transforms and is the transform used for image compression in the JPEG2000 standard. Through sub-band decomposition it avoids the blocking artefacts that occur with the DCT, and it achieves a higher compression ratio along with multi-resolution capability. Implementing the DWT with traditional methods has disadvantages in image/video applications, notably high computational complexity and a large memory requirement for storing image coefficients.

Compared with the convolution-based DWT, the lifting scheme is advantageous. The properties of biorthogonality and regularity are determined by linear relationships between the filter bank coefficients. The lifting scheme does not depend on the Fourier transform of the wavelets, so wavelets can be constructed on arbitrary lattices in the spatial domain. Such wavelets are employed as a multi-resolution analysis tool for signals or images in texture classification [4] and for compression in real-time multimedia applications. In a VLSI implementation of the lifting-based DWT, data movement and data transfer are the primary factors determining efficiency. Systolic arrays are regular computational structures with simple data routing and control, which makes them effective for VLSI implementation when the filter length is large. They are therefore a suitable architectural choice for constructing wavelet lifting schemes.

The aim of this work is to propose a high-speed systolic VLSI architecture that implements the lifting-based DWT effectively for the given image coefficients. Systolic arrays for the 1-D and 2-D lifting algorithm are designed in VHDL.

## II. LITERATURE SURVEY

High-efficiency architectures for the 2-D lifting-based forward and inverse discrete wavelet transform were introduced in [1]. The design uses a horizontal filter (HF) and a vertical filter (VF). The biorthogonal 5/3 wavelet low-pass filter coefficients are quantized at the start and end states before the high-speed computation hardware is applied, and all arithmetic operations are carried out with a small number of shifts and additions. A high-speed systolic VLSI architecture for the 2-D forward lifting-based DWT was proposed in [2]. It uses row and column processors to evaluate the inputs, and the experiment was carried out on an FPGA. For a block-based lossy 9/7 lifting wavelet filter, the design operated at 260 MHz with a processing time of 0.246 µs.

A fast parallel VLSI architecture for the lifting-based 2-D discrete wavelet transform was introduced in [3]. It uses four-way data scheduling. The design was described in VHDL; the row and column processors use only about 30% of the internal memory, and a memory figure of $N^2/4 + a$ is reported. A detailed survey of VLSI architectures for efficient hardware implementation of the lifting-based DWT was presented in [4]. It covers architectures that use parallel processing, and the results are evaluated in terms of hardware and timing complexity for the given input and the limits on the number of decomposition levels.

An area- and power-efficient architecture for high-throughput implementation of the lifting 2-D DWT was proposed in [5]. It uses systolic arrays, requires a memory of (4N + 8P) words and processes P samples in every cycle. For a block size of 8 and an image of size 512 × 512, it requires 1.5N fewer memory words, 28% less area and 27% less energy, and its delay product is 35% lower. A memory-efficient modular VLSI architecture for high-throughput and low-latency implementation of the multilevel lifting 2-D DWT was proposed in [6]. The structure requires no line buffer and uses a cascaded structure to increase the hardware utilization efficiency (HUE). The results are evaluated in terms of *L* pyramid algorithm (PA) units and one recursive pyramid algorithm (RPA) unit. It has 17% lower SDP than other methods, up to 12.2 times higher throughput and 52% lower power per output (PPO).

An efficient multi-input/multi-output VLSI architecture (MIMOA) was developed for the 2-D lifting-based DWT [7]. The results are evaluated with respect to the image dimensions; for an $N \times N$ image the minimum computation period is $N^2/M$. An efficient folded architecture (EFA) for the lifting-based discrete wavelet transform was presented in [8]. The EFA achieves higher gain with a minimum number of registers and uses a parallel and pipelined approach to overcome the disadvantages of the optimized architecture (OA). A hardware-efficient systolic modular design for the 2-D DWT was proposed in [9]. It consists of a two-step process: 1) processing along the columns and 2) processing along the rows. The systolic modular design is built without a memory chip. The method computes faster because it processes the data with less hardware and in a shorter time.

Lifting-based discrete wavelet transform (DWT) architectures were surveyed in [10]. The aim of the lifting scheme is a simpler filtering stage for the given inputs, and the outputs are evaluated in terms of hardware and timing complexity. An efficient architecture for a modified lifting-based discrete wavelet transform was proposed in [11]. The architecture contains a transpose block and temporal memory. It operates in a *Z*-type fashion that reduces the buffering needed for the DWT; the design was described in VHDL and synthesized with a Cadence tool in 90 nm technology. A memory-efficient architecture for the 2-D lifting-based discrete wavelet transform (LDWT) was introduced for Motion JPEG2000 [12]. It uses row and column processing to analyse the signal, together with a special interlaced read scan architecture (IRSA) and a pipelined, parallel structure to reach the maximum processing speed.

A memory-efficient high-speed VLSI implementation of the multi-level discrete wavelet transform was presented in [13]. It uses three stages: 1) a dual data scanning approach, 2) row and column transformation and 3) analysis and processing of the memory unit. A memory-efficient high-throughput architecture for the lifting-based multi-level 2-D DWT has also been proposed. It uses parallel stripe scanning and aims to reduce the on-chip memory size. The results, obtained in 90 nm CMOS, report figures of up to 60% for delay and 97% for throughput. A 2-D DWT system architecture for image compression was developed in [15]. It adapts to the given input and uses a minimal number of components and memory storage. With eight parallel DWT engines, the architecture processes $2048 \times 1536$ video at 138 frames per second. A VLSI architecture for the lifting-based forward and inverse wavelet transform was presented in [14]. It supports seven filter pairs and is built from digital circuits such as adders and multipliers. The architecture was implemented in VHDL; in 0.18 µm technology its area is 2.8 mm square and its operating frequency is 200 MHz.

## III. BACKGROUND METHODOLOGY

#### A. Basics of Lifting Scheme

The lifting scheme is a flexible tool that is well suited to the construction of second-generation wavelets. It consists of three major operations: split, predict and update. Figure 1 shows the lifting scheme of a wavelet filter for the computation of a 1-D signal. The three major steps of the lifting-based DWT are as follows.



Fig. 1 Block diagram of the forward lifting scheme



Fig. 2 Block diagram of the inverse lifting scheme

Split step: The signal is split into even and odd samples, so that the high correlation between adjacent pixels can be exploited in the following predict step. Each pair of input samples x(n) is split into even coefficients x(2n) and odd coefficients x(2n+1).

Predict step: The even samples are multiplied by a predict factor and the results are added to the odd samples to generate the detail coefficients ($d_j$). The detail coefficients correspond to the high-pass filter output.

Update step: The detail coefficients obtained from the predict step are multiplied by an update factor and the results are added to the even samples to obtain the coarse coefficients ($s_j$). The coarse coefficients correspond to the low-pass filter output.

The inverse transform is obtained by changing the signs of the predict and update steps and applying the above operations in reverse order, as shown in Fig. 2. The lifting-based inverse transform (IDWT) is therefore easy to implement, and the same resources can be reused for the forward and inverse DWT, which allows a general programmable architecture to be defined.
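The split, predict and update steps, and their inversion by sign change and reverse order, can be illustrated with a minimal Python sketch. The function names and the predict/update factors `p` and `u` are hypothetical and chosen only for illustration; the sketch uses a single-neighbour predictor and assumes an even-length input.

```python
def lifting_forward(x, p, u):
    """One generic lifting stage on an even-length sequence."""
    even, odd = x[0::2], x[1::2]                    # split step
    d = [o + p * e for o, e in zip(odd, even)]      # predict: detail (high-pass)
    s = [e + u * di for e, di in zip(even, d)]      # update: coarse (low-pass)
    return s, d


def lifting_inverse(s, d, p, u):
    """Undo the stage: same operations, opposite signs, reverse order."""
    even = [si - u * di for si, di in zip(s, d)]    # undo update
    odd = [di - p * e for di, e in zip(d, even)]    # undo predict
    x = [0.0] * (len(even) + len(odd))
    x[0::2], x[1::2] = even, odd                    # merge even and odd samples
    return x


# Illustrative factors only (Haar-like): d = odd - even, s = average.
s, d = lifting_forward([2, 4, 6, 8], p=-1.0, u=0.5)
```

With these illustrative factors the coarse output is the pairwise average and the detail output is the pairwise difference. Calling `lifting_inverse(s, d, -1.0, 0.5)` restores the input exactly, which shows why the same hardware resources can serve both directions.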

## IV. SYSTOLIC ARCHITECTURE

A systolic array is composed of closely coupled processing elements, each of which may include an adder, a multiplier or a shifter. The processing elements are also called cells or nodes. Every node receives data from its neighbouring node, processes it independently, generates a partial result, stores it locally and passes it downstream. This operation is often likened to the human brain or the human circulatory system. The main advantage of the proposed adaptive filter design comes from its handling of operand data and its calculation of partial results.

The problem addressed here requires parallel processing and high acceleration of the computations. A basic block diagram of the systolic architecture is shown in Fig. 3.



Fig.3. Block diagram of the systolic architecture to perform array multiplication.

Figure 3 shows a network of processing elements (PEs) that carries out array multiplication. The modularity and regularity of the array simplify its VLSI implementation. Parallel array multipliers are widely used in DSP applications and are implemented with a large number of multipliers and adders; the multipliers account for most of the power consumption. The coefficient updates of the systolic array do not incur any adaptation delay, so the array can be used in the design of adaptive filters or other variable step-size strategies. Here the main goal is to design an adaptive filter using a systolic array in a VLSI architecture. The inner product output is the convolution sum of the tap-input vector with the tap-weight vector, and it is calculated by matrix-by-vector multiplication using systolic processing elements. The design is derived from a dependence graph, in contrast to conventional methods.
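The matrix-by-vector multiplication on a linear systolic array can be illustrated with a small cycle-by-cycle simulation. This is a sketch under assumptions and not the design of Fig. 3: the function name `systolic_matvec` is hypothetical, each PE is assumed to keep one accumulator, and the input operands are assumed to move one PE downstream per clock cycle.

```python
def systolic_matvec(A, x):
    """Simulate a linear systolic array computing y = A * x.

    PE i keeps the partial result y[i]; operand x[j] reaches PE i at
    cycle i + j, so the whole product takes len(x) + len(A) - 1 cycles.
    """
    m, n = len(A), len(x)
    acc = [0] * m            # partial result stored inside each PE
    pipe = [None] * m        # operand currently held by each PE
    for t in range(n + m - 1):
        # Shift operands one PE downstream; feed x[t] (or a bubble) into PE 0.
        pipe = [x[t] if t < n else None] + pipe[:-1]
        for i in range(m):
            if pipe[i] is not None:
                j = t - i                     # index of the operand now in PE i
                acc[i] += A[i][j] * pipe[i]   # multiply-accumulate in the PE
    return acc


y = systolic_matvec([[1, 2], [3, 4]], [5, 6])
```

Every PE performs at most one multiply-accumulate per cycle and communicates only with its neighbour, which is the regularity that makes such arrays attractive for VLSI.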



Fig. 4. Representation of the systolic array for 9/7 lifting



Fig.5. Pipelined architecture for the 9/7 lifting DWT

The image coefficients are read in as a block (group) of pixel coefficients, and the lifting steps P1, U1, P2 and U2 are calculated by the processing elements. On every clock pulse, processing element 1 (PE1) takes a pair of odd and even coefficients and calculates the first predict stage (P1). The result is sent to the next processing element, PE2, which calculates the first update stage (U1) after a delay of one clock pulse. The second predict stage (P2) is calculated by PE3, which receives its coefficients from PE2. Similarly, the second update stage (U2), which is the fourth lifting stage, is calculated by PE4. This yields the final first-level high-pass and low-pass coefficients, and the idea can be extended to N-dimensional input image coefficients. The arrays are arranged to achieve pipelining with effective hardware usage and speed. The required filter output is obtained by an L-level decomposition of an N-dimensional image.
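The four lifting stages P1, U1, P2 and U2 can be sketched in software as follows. The coefficient values are the commonly published 9/7 lifting constants and are an assumption here, since the text does not list them. The boundary mirroring, the final scaling by `K` and all function names are likewise illustrative choices, and the input length is assumed to be even.

```python
# Commonly published 9/7 lifting constants (assumed, not taken from the paper).
ALPHA = -1.586134342    # predict stage 1 (P1)
BETA = -0.05298011854   # update stage 1 (U1)
GAMMA = 0.8829110762    # predict stage 2 (P2)
DELTA = 0.4435068522    # update stage 2 (U2)
K = 1.149604398         # output scaling (one of several conventions)


def _predict(s, d, c):
    # d[n] += c * (s[n] + s[n+1]); s is mirrored at the right edge.
    n = len(d)
    return [d[i] + c * (s[i] + s[min(i + 1, n - 1)]) for i in range(n)]


def _update(s, d, c):
    # s[n] += c * (d[n-1] + d[n]); d is mirrored at the left edge.
    return [s[i] + c * (d[max(i - 1, 0)] + d[i]) for i in range(len(s))]


def dwt97_forward(x):
    s, d = list(x[0::2]), list(x[1::2])   # split into even and odd samples
    d = _predict(s, d, ALPHA)             # PE1: P1
    s = _update(s, d, BETA)               # PE2: U1
    d = _predict(s, d, GAMMA)             # PE3: P2
    s = _update(s, d, DELTA)              # PE4: U2
    return [v * K for v in s], [v / K for v in d]   # low-pass, high-pass


def dwt97_inverse(low, high):
    s, d = [v / K for v in low], [v * K for v in high]
    s = _update(s, d, -DELTA)             # undo U2
    d = _predict(s, d, -GAMMA)            # undo P2
    s = _update(s, d, -BETA)              # undo U1
    d = _predict(s, d, -ALPHA)            # undo P1
    x = [0.0] * (2 * len(s))
    x[0::2], x[1::2] = s, d               # merge even and odd samples
    return x


low, high = dwt97_forward([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
```

Each stage depends only on the output of the previous stage and on neighbouring samples, so the four function calls map naturally onto four pipelined processing elements.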

#### A.Systolic Array Representation for the 2-D 9/7 Lifting DWT

In this paper a 2-D systolic array for the lifting scheme is proposed for Covid images, as given in Fig.9. The row processor contains $M \times N/2$ processing elements organized in a systolic manner. Column processing exploits the cyclic symmetry property of the input image coefficients: the last row is overlapped with the first row in the systolic array at every level of decomposition, which results in effective hardware utilization.
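The row-then-column organisation can be sketched for one decomposition level as follows. This is an illustrative sketch only: a Haar-style lifting pair stands in for the 9/7 stages to keep the example short, the function names are hypothetical, and the sub-band naming (LL, LH, HL, HH) follows one common convention.

```python
def lift_1d(x):
    """Haar-style lifting used as a stand-in for the 9/7 steps."""
    even, odd = x[0::2], x[1::2]
    high = [o - e for o, e in zip(odd, even)]       # predict
    low = [e + h / 2 for e, h in zip(even, high)]   # update
    return low, high


def transpose(m):
    return [list(col) for col in zip(*m)]


def row_pass(img):
    """Apply the 1-D transform to every row; return (low, high) matrices."""
    pairs = [lift_1d(row) for row in img]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def dwt2d_level(img):
    # Row processing: two intermediate M x N/2 matrices.
    L, H = row_pass(img)
    # Column processing: run the row pass on the transposed intermediates.
    LL, LH = (transpose(b) for b in row_pass(transpose(L)))
    HL, HH = (transpose(b) for b in row_pass(transpose(H)))
    return LL, LH, HL, HH


LL, LH, HL, HH = dwt2d_level([[5, 5, 5, 5] for _ in range(4)])
```

For an $M \times N$ input, each of the four outputs has size $M/2 \times N/2$; a constant image leaves all of its energy in LL, and the three detail sub-bands are zero.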



Fig. 6. Block diagram of the 2-D Systolic architecture



Fig.7. Representation of the 2-D systolic array

Fig. 7 gives the architecture of the 2-D systolic array for the lifting DWT. The image coefficients in the first block are given to the processing elements in the first row. The input image coefficients are sent for systolic row processing, which produces intermediate matrices of size $M \times N/2$ that are stored in an intermediate buffer. Here M and N are the dimensions of the image. Column processing is then performed on the intermediate coefficients, with the PEs mapped and arranged to form a 2-D array. The input block of image coefficients uses the cyclic symmetry property $\{x(0)_p, x(1)_p, x(2)_p, \ldots, x(N-1)_p, \ldots, x(N-1)_p, x(N-2)_p, \ldots\}$, where p represents the block size. For the given input coefficients, the last row is overlapped with the first row in the same systolic array processing element. For an image with block size $P = M \times N$ and $M = N$, a pair of parallel structures can be obtained with a factor of $N^2/2$. This decreases the average computing time and the number of processing elements employed in the systolic array. Computing the $j$th-level DWT requires $M \times N / 2^{2(j-1)}$ samples per clock cycle, where $1 \le j \le L$ and $L = [\log_4 (M \times N)]$. Every row of the matrix $A_{j-1}$ is overlapped and supplied to PE-$j$ in $2^{j-1} \times 4$ cycles.
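The per-level quantities can be checked with a short worked example. The sketch assumes that the sample count at level $j$ is $M \times N / 2^{2(j-1)}$, that the number of levels is $L = \lfloor \log_4 (M \times N) \rfloor$, and that rows are supplied to PE-$j$ in $2^{j-1} \times 4$ cycles; the function names are hypothetical.

```python
def num_levels(M, N):
    """L = floor(log4(M * N)), computed with integer arithmetic."""
    levels, size = 0, M * N
    while size >= 4:
        size //= 4
        levels += 1
    return levels


def samples_at_level(M, N, j):
    """Samples handled at level j: M * N / 2^(2(j-1))."""
    return (M * N) // (4 ** (j - 1))


def cycles_at_level(j):
    """Cycles in which rows are supplied to PE-j: 2^(j-1) * 4."""
    return (2 ** (j - 1)) * 4


# The 8 x 8 coefficient sample used in the text.
L = num_levels(8, 8)
table = [(j, samples_at_level(8, 8, j), cycles_at_level(j)) for j in range(1, L + 1)]
```

For the $8 \times 8$ sample this gives three levels with 64, 16 and 4 samples, so each level processes a quarter of the data of the level before it.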

| Clock cycle | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Input data | X01 | X02 | X03 | X04 | X05 | X06 | X07 | X08 | X11 | X12 |
| Predict stage 1 (P1) | - | D(0) | - | D(1) | - | D(2) | | D(3) | - | D(4) |
| Update stage 1 (U1) | | S(0) | | S(1) | - | S(2) | | S(3) | - | S(4) |
| Predict stage 2 (P2) | | | | HP(0) | HP(1) | | | | | |
| Update stage 2 (U2) | | | | LP(0) | LP(1) | | | | | |

#### TABLE I. Data flow of the systolic array architecture

The images for analysis are obtained from MATLAB and loaded into the systolic VHDL file. An image of size $128 \times 128$ is considered, as given in Fig.8. The architecture is generic for any square 2-D input image. A sample of $8 \times 8$ coefficients with a block of size P = 4 is fed in parallel to the systolic array, i.e. an image coefficient sample with M = N = 8 and block size P = 4.



Fig.8. Sample image

| Architectures | DWT scheme | Adders | Multipliers | Frequency (MHz) |
|---|---|---|---|---|
| Mohanty and Meher [6] | Lifting | 8P | 4.5P | 112.892 |
| Salehi and Amirfattahi [18] | Lifting | - | - | 97 |
| Tian et al. [7] | Lifting | 8P | 6P | 64.25 |
| Cheng and Parhi [17] | FIR | 16 | 12 | 58.73 |
| Meher et al. [9] | Convolution | 4(*K* - 1) | 4*K* | 230.3 |
| Proposed 2-D systolic architecture | Lifting | 8N | 4N | 260.01 |

The given image of size N × N (with M = N) undergoes L-level decomposition. Level 1 operates on all $N^2$ pixels of the original image, and every subsequent level operates on the approximation produced by the previous level. Table I shows the first-level decomposed image coefficients after they pass through the PEs in parallel for the blocks P1, P2, P3 and P4. To calculate one lifting step, a processing element in the systolic array needs two samples. Four pipeline stages are needed for an effective implementation of the 9/7 lifting filter. The typical flow diagram of the block-based systolic array is shown in Fig.13. Here ROW and COLUMN denote the current row and column for the given level j.

| FPGA: Altera Cyclone II EP2C35F672C6 | One-dimensional systolic array | Two-dimensional systolic array |
|---|---|---|
| Total logic elements | 248/33,216 (<1%) | 31/33,216 (<1%) |
| Total registers | 200 | 200 |
| Total pins | 241/475 (50%) | 281/475 (59%) |
| Total memory bits | 0/483,840 (0%) | 0/483,840 (0%) |

#### TABLE II. SYNTHESIS REPORT OF THE SYSTOLIC ARCHITECTURE

## V. RESULTS AND DISCUSSION

#### A. Systolic Array Representation for the 2-D 9/7 Lifting DWT

The systolic array architecture is designed in VHDL to analyse the performance of the 1-D and 2-D systolic arrays. An Altera Cyclone II EP2C35F672C6 FPGA is used to implement the architecture. The performance of the proposed architecture is analysed in terms of the number of multipliers, number of adders, storage size, computing time, control complexity and hardware utilization. The computing time is normalized to the internal clock rate. Each pair of input coefficient blocks leads to the generation of two output coefficient blocks after a delay of about four clock cycles. Table III shows that the hardware implementation of the lifting scheme is suited to real-time high-speed multimedia applications.

## COMPARISON OF ARCHITECTURES

The 2-D lifting systolic architecture is compared with various existing architectures. The parameters compared are hardware complexity (the number of multipliers, adders and registers), DWT scheme and operating frequency. The comparison is given in Table III, which shows that the proposed design achieves a higher speed than the other existing architectures.

#### TABLE III. Comparison of the performance analysis of various architectures

#### B. FPGA Implementation

To implement the proposed systolic array architecture, a synthesizable hardware description model is created for a block size of P = 4 for the given coefficients. The Quartus II IDE and an Altera Cyclone II EP2C35F672C6 FPGA are used for the implementation. The input image coefficients are represented in integer data format to eliminate finite word length effects. The EFA [8] and the multi-input/multi-output architecture (MIMOA) [7] are redesigned with the same input specifications and conditions in order to validate the results of the proposed method. Table IV presents the FPGA implementation results in terms of dedicated logic registers, combinational LUTs, critical path and operating frequency for these architectures. The results show that the critical path of the folded architecture is long, because of the latency incurred when the output coefficients are fed back to the input. By using two inputs and two outputs per cycle, the MIMOA architecture reduces the critical path and the area used in the logic implementation. The proposed systolic array architecture achieves high speed because of its regular data flow scheduling, which makes it appropriate for high-speed multimedia applications.

| Architecture | Combinational LUTs | Dedicated logic registers | Critical path latency (ns) | Operating frequency (MHz) |
|---|---|---|---|---|
| Folded [8] | 438 | 170 | 8.359 | 119.63 |
| MIMOA [7] | 235 | 235 | 5.763 | 173.52 |
| Systolic 2D [2] | 248 | 200 | 3.846 | 260.06 |
| Proposed systolic | 286 | 260 | 3.675 | 260.06 |

#### TABLE IV. FPGA IMPLEMENTATION RESULTS OF THE VARIOUS ARCHITECTURES
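The relation between the critical path latency and the operating frequency columns can be checked with a one-line calculation. The sketch assumes that the maximum operating frequency is simply the reciprocal of the critical path latency, $f_{max} = 1/t_{cp}$; the function name is hypothetical.

```python
def fmax_mhz(critical_path_ns):
    """Maximum clock frequency in MHz for a critical path given in ns."""
    return 1000.0 / critical_path_ns


# Critical path latencies of the folded and MIMOA rows of Table IV.
folded = round(fmax_mhz(8.359), 2)   # 119.63
mimoa = round(fmax_mhz(5.763), 2)    # 173.52
```

Under this assumption the folded and MIMOA rows reproduce the tabulated frequencies of 119.63 MHz and 173.52 MHz, and a critical path of 3.846 ns corresponds to about 260.01 MHz, the value quoted in Table III.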

## VI. CONCLUSION

This paper presented the implementation of a lifting-based DWT structure using a high-speed systolic architecture for 2-D image analysis of Covid patients. The investigation covered convolution-based and lifting-based architectures and their performance in image processing applications. The results show that the systolic architecture outperforms the other architectures in speed. The systolic architecture with parallel processing is found to be efficient, and it is implemented in VHDL using the Quartus II tool.

## References

- [1] Sung, Tze-Yun, "High-Efficient Architectures for 2-D Lifting-Based Forward and Inverse Discrete Wavelet Transform," Proceedings of the 9th Joint International Conference on Information Sciences (JCIS-06),2006,doi:10.2991/jcis.2006.200.
- [2] N.Usha Bhanu, A.Chilambuchelvan, "High-Speed Systolic VLSI Architecture for 2-D Forward Lifting-Based DWT," Arabian Journal of Science and Engineering, vol.39, pp. 6125–6135, 2014, doi:10.1007/s13369-014-1208-2.
- [3] Jong Woog Kim and Jong Wha Chong, "A fast parallel VLSI architecture for lifting based 2-D discrete wavelet transform," 30th Annual Conference of IEEE Industrial Electronics Society, 2004, IECON 2004, Busan, South Korea, pp.1258-1262 Vol.2,2004, doi: 10.1109/IECON.2004.1431756.
- [4] N. Usha Bhanu, A. Chilambuchelvan, "A Detailed survey on VLSI architectures for lifting based DWT for efficient hardware implementation," International journal of VLSI Design & Communication System(VLSICS), vol.3(2), pp.143–164,2012,doi: 10.5121/vlsic.2012.3213.
- [5] B.K. Mohanty, A.Mahajan, P.K Meher, "Area and power efficient architecture for high throughput implementation of lifting 2D DWT,"IEEE Transaction on Circuits and System II Express Briefs, vol.59(7), pp.434–438 2012,doi:10.1109/TCSII.2012.2200169.

- [6] B.K. Mohanty, P.K Meher, "Memory efficient modular VLSI architecture for high-throughput and low-latency implementation of multilevel lifting 2-D DWT," IEEE Transaction on Signal Processing,vol.59(5), pp.2072–2084,2011,doi: 10.1109/TSP.2011.2109953.
- [7] X.Tian, L.Wu, Y.H.Tan, J.W Tian, "Efficient multi-input/multi-output VLSI architecture for two dimensional lifting based DWT," IEEE Transactions on Computers, vol.60(8), pp.1207–1211,2011,doi:10.15242/IE.E0814532.
- [8] G.Shi, W.Liu, L.Zhang, "An Efficient folded architecture for lifting based discrete wavelet transform," IEEE Transaction Circuits and System II Express Briefs, vol.56(4), pp.290–294, doi:10.1109/TCSII.2012.2184369.
- [9] P.K Meher, B.K Mohanty, J.C Patra, "Hardware efficient systolic like modular design for 2D DWT,"IEEE Transaction Circuits and System II Express Briefs, vol.55(2), pp.151–155,2008, doi:10.1109/TCSII.2007.911801.
- [10] T.Acharya and C.Chakrabarti, "A survey on lifting-based discrete wavelet transforms architectures," Journal on VLSI for Signal Processing, image and video technology, vol.42, pp.321– 339, 2006, doi:10.1007/s11266-006-4191-3.
- [11] R.Pinto, K.Shama, "An Efficient Architecture for Modified Lifting-Based Discrete Wavelet Transform," Sensing and Imaging, vol.21, pp.53, 2020, doi:10.1007/s11220-020-00317-z.
- [12] C.H Hsia, W.H Li, J.S Chiang, "Memory-efficient architecture of 2-D lifting-based discrete wavelet transform," Journal of the Chinese Institute of Engineers,vol.34(5),pp.629–643,2011.
- [13] Y.Zhang, H.Cao, H.Jiang and B.Li, "Memory-efficient high-speed VLSI implementation of multilevel discrete wavelet transform," Journal of Visual Communication and Image Representation, vol.38, pp.297–306,2016,doi:10.1016/j.jvcir.2016.03.014.
- [14] K. Andra, C. Chakrabarti, T. Acharya, "A VLSI architecture for lifting based forward and reverse wavelet transform," IEEE Transaction on Signal Processing, vol.50(4), pp.966–977, 2002, doi:10.1109/78.992147.
- [15] B.H Ang, U.U Sheikh, MN Marsono, "2-D DWT system architecture for image compression," Journal of Signal Processing System, vol.78, pp.131–137,2015.
- [16] K.Andra, C. Chakrabarti, T.Acharya, "A VLSI architecture for lifting-based forward and inverse wavelet transform," IEEE Transaction on Signal Processing, vol.50(4), pp.966– 977,2002,doi:10.1109/78.992147.
- [17] Cheng, C and Parhi, K.K, "High-speed VLSI implementation of 2-D discrete wavelet transform," IEEE Trans. Signal Process, vol.56(1), pp.393–403, 2008.
- [18] Salehi, S.A and Amirfattahi, R., "A block based 2D discrete wavelet transform structure with new scan method for overlapped sections," In: Proceedings of the First Middle East Conference on Biomedical Engineering, pp. 126–129, Sharjah 2011.