• <tr id="yyy80"></tr>
  • <sup id="yyy80"></sup>
  • <tfoot id="yyy80"><noscript id="yyy80"></noscript></tfoot>
  • 99热精品在线国产_美女午夜性视频免费_国产精品国产高清国产av_av欧美777_自拍偷自拍亚洲精品老妇_亚洲熟女精品中文字幕_www日本黄色视频网_国产精品野战在线观看 ?

    FPGA Implementation of Elliptic-Curve Diffie Hellman Protocol

    2022-11-10 02:32:20SikandarZulqarnainKhanSajjadShaukatJamalAsherSajidandMuhammadRashid
    Computers Materials&Continua 2022年10期

    Sikandar Zulqarnain Khan,Sajjad Shaukat Jamal,Asher Sajid and Muhammad Rashid

    1Department of Aeronautical Engineering,Estonian Aviation Academy,Tartu,61707,Estonia

    2Department of Mathematics,College of Science,King Khalid University,Abha,61413,Saudi Arabia

    3Department of Electrical Engineering,Bahria University,Islamabad,44000,Pakistan

    4Department of Computer Engineering,Umm Al-Qura University,Makkah,21955,Saudi Arabia

    Abstract:This paper presents an efficient crypto processor architecture for key agreement using ECDH (Elliptic-curve Diffie Hellman) protocol overThe composition of our key-agreement architecture is expressed in consisting of the following:(i)Elliptic-curve Point Multiplication architecture for public key generation(DESIGN-I)and(ii)integration of DESIGN-I with two additional routing multiplexers and a controller for shared key generation(DESIGN-II).The arithmetic operators used in DESIGN-I and DESIGNII contain an adder,squarer,a multiplier and inversion.A simple shift and add multiplication method is employed to retain lower hardware resources.Moreover,an essential inversion operation is operated using the Itoh-Tsujii algorithm with similar hardware resources of used squarer and multiplier units.The proposed architecture is implemented in a Verilog HDL.The implementation results are given on a Xilinx Virtex-7 FPGA(field-programmable gate array) device.For DESIGN-I and DESIGN-II over GF(2163),(i) the utilized Slices are 3983 and 4037,(ii)the time to compute one public key and a shared secret is 553.7 μs and 1170.7 μs and(iii)the consumed power is 29 μW and 57 μW.Consequently,the achieved area optimized and power reduced results show that the proposed ECDH architecture is a suitable alternative(to generate a shared secret) for the applications that require low hardware resources and power consumption.

    Keywords:Elliptic curve cryptography;point multiplication;key-agreement;diffie hellman;area optimized;architecture;FPGA

    1 Introduction

    In recent years,demand for the secure transmission of reliable information over any network is increasing dramatically.Various applications such as healthcare[1],automotive[2,3],aviation[4,5]smart cards[6,7],Internet of Things (IoT)[8-12],and several others require the secure exchange of sensitive information on insecure public channels.The cryptographic algorithms(both symmetric and asymmetric) are available in the literature to provide several security services like a public key exchange,signature generation/verification,authentication and encryption/decryption.Elliptic-curve cryptography (ECC) is a form of asymmetric cryptography,which provides an equivalent security level with a smaller key size in respect to other public-key cryptosystems such as RSA (Rivest-Shamir-Adleman)[13,14].Moreover,many organizations such as IEEE (Institute of Electrical and Electronics Engineering),NIST(National Institute of Standards and Technology),ANSI(American National Standards Institute) and SEC (Standards for Efficient Cryptography) accepted ECC as standardization.

    The security strength of ECC relies on solving the discrete logarithms problem[15].It comprises four layers of a model[13].The highest layer is the protocol that defines a set of rules to perform encryption/decryption,signature generation/verification and public key exchange.The Elliptic-curve point multiplication(ECPM)is the most critical operation.Its computation depends on the execution of point addition(PA)and double(PD)operations.Arithmetic operations(such as addition,multiplication,square and inversion)are needed to perform point addition and double computations.

    There are different settings in the literature to implement ECC protocols.These settings indicate choices for (i) two different basis (polynomial and Gaussian normal) for the representation of the initial &final point and (ii) availability for two different coordinate systems (affine and projective).The polynomial basis is considered in this paper as it is more beneficial to acquire faster modular multiplications while a Gaussian normal basis is valuable where frequent squares are needed to compute[13,16].Moreover,a projective coordinate system is selected because it requires minimum inversion operations when compared to an affine coordinate system[15].

    1.1 Related Work

    Several hardware architectures are published in the state-of-the-art to accelerate the performance of ECC operations[16-24].Mostly the hardware implementations are considered for the optimizations of point multiplication(PM)operation in terms of various parameters such as frequency,area,power and throughput[16-21].

    In[16],low-cost and high-speed PM architectures are described for Binary Edward Curves(BEC).Low cost means lower hardware resource utilization.The use of a pipelined digit serial multiplier results in low area utilization.The high-speed is attained with the employment of three pipelined multipliers.OverGF(2163)andGF(2233),the synthesis results are given on Virtex-5 FPGA device.In[17],an efficient PM architecture using Lopez Dahab algorithm is presented overGF(2m)withm=163.Towards clock cycles reduction,a hybrid Karatsuba multiplier is utilized.The reduced hardware resources are achieved with the utilization of multiplier and squarer units for the computation of modular inversion using the Itoh-Tsujii algorithm.The implementation results are reported on Xilinx Virtex-7 device,where the architecture takes 25.3 μs for one PM computation and utilizes only the 3657 slices.Another area optimized architecture overGF(2163)for PM computation is described in[18]where bit/digit serial multipliers are utilized.On Virtex-5 FPGA,their architecture requires 0.11msfor one PM operation and utilizes 473 slices.

    In[19],a high-speed and low-latency ECC processor implementation overGF(2m)on Virtex-4,Virtex-5 and Virtex-7 FPGA devices is presented.They reported implementation results using single and three modular multipliers.Their singular multiplier-based ECC processor provides the highest speed on Virtex4,Virtex5 and Virtex7 FPGA devices.Finally,on same FPGA devices,the use of three multipliers results in a decrease in the number of clock cycles.A dual binary field PM implementation of ECC overGF(2283)andGF(2571)on Virtex-5 and Virtex-7 FPGA devices is described in[20].A Karatsuba multiplication method is employed to reduce the utilization of required clock cycles.

    In[21],a high-performance hardware architecture of ECPM overGF(2m)field is presented.Their design is consists of a Montgomery ladder method and employs a polynomial basis for finite field arithmetic operations.A single Karatsuba multiplier is used to improve the performance of the PM architecture.Other operators(adder and squarer)with the consideration of a small number of hardware resources are utilized in parallel with the multiplier.These architectural employments ultimately reduce the clock cycles and lead to improving performance.ForGF(2163)on Virtex-4 FPGA,their design utilizes 10417 slices and achieves an operational clock frequency of 121MHz.

    There are very few hardware designs where key-agreement using Elliptic-curve Diffie Hellman is considered for the optimizations on FPGA[22-24].A hardware/software implementation of an ECDH protocol is described in[22].The objective was to reduce the hardware resources and processing power for area-constrained applications.The implementation of their architecture was realized on a Virtex-5 FPGA device with the integration of a MicroBlaze softcore processor.A high-level synthesis(HLS)method is used in[23]to offload the ECDH operations on FPGA for the area and power reductions.Recently in[24],for wireless sensor nodes,a low-cost ECC hardware accelerator architecture is presented on FPGA.Their accelerator architecture was validated using a hardware/software codesign of the ECDH scheme(integrated into the MicroZed FPGA board).The ECDH protocol is operated in 4.1msfor one shared key generation.

    The related implementations,reported in[16-24],shows that the several modern applications such as healthcare[1],automobile[2,3],smart cards[6,7]and IoT[8-10]demand the key agreement architectures.Although,some limited designs are available[22-24],we believe further improvements in area and power can be achieved with use of different settings and strategies.Consequently,the aim of this work is to provide the FPGA implementation of ECDH protocol with different settings and optimization strategies to achieve low area and power values.

    1.2 Contributions and Settings

    Our contributions and employed strategies to reduce area and power values are as follows:

    ? Elliptic-curve PM design(DESIGN-I):With the attention of flexible input/output interfaces,we have presented a low-area crypto processor architecture for Elliptic-curve PM computation overGF(2163).The flexible input/output interfaces determine that for generating coordinates of a public key,users can takexandycoordinates of an initial point on ECC either by using the constant(inside the crypto core)or user-dependent values(from the outside to the crypto core).

    ? Key-agreement architecture (DESIGN-II):Generation of a shared key (or) key agreement between two different users/parties using the integration of DESIGN-I with two additional routing multiplexers and an efficient controller.

    ? Dedicated controllers:For various control activities,we have used two dedicated finite state machine(FSM)based controllers.

    ? Employed strategies to achieve low area and power:To achieve the low area and power values,a schoolbook multiplication method is used in the proposed crypto processor architectures(DESIGN-I and DESIGN-II).Moreover,similar hardware resources of squarer and multiplier units are used to compute the essential inversion operation.To further reduce the hardware resources,we set a design goal from balanced to area reduction during the logic synthesis in the Vivado IDE tool.

    1.3 Critical Findings and Significance of This Work

    The RTL (register transfer level) implementations of proposed crypto processor designs(DESIGN-I and DESIGN-II) are implemented in a Verilog (HDL).The results are reported on Xilinx Virtex-7(xc7vx690t-3ffg1930)FPGA device.For DESIGN-I,the utilized Slices,LUTs(lookup-tables)and FFs(flip flops)overGF(2163)are 3983,10693 and 2389.Similarly,the consumed Slices,LUTs and FFs for DESIGN-II are 4037,11713 and 2427.The required clock cycles for DESIGN-I and DESIGN-II are 163902(to generate one public key)and 327804(to generate one shared secret),respectively.The maximum clock frequency for DESIGN-I and DESIGN-II is 296 and 280MHz,respectively.The time to operate one public key using DESIGN-I is 553.7μs.Similarly,the time to generate one shared secret using the ECDH scheme is 1170.7μs.The DESIGN-I takes 29μWto generate one public key (or to operate one PM computation) while DESIGN-II consumes 57μWfor one shared secret generation.The achieved area optimized and power reduced results reveal that the DESIGN-II is an appropriate choice (to generate a shared secret) for the applications that demand lower hardware resources without considering the timing characteristics(such as clock cycles,throughput and latency).

    The structure of this paper is organized as:Section 2 presents the essential information for the computation of the ECDH scheme and the PM operation,respectively.The proposed crypto processor architecture for generating a public key(DESIGN-I)and shared secret(DESIGN-II)is provided in Section 3.The implementation results are presented in Section 4.Finally,Section 5 concludes the paper.

    2 Mathematical Background

    The mathematical structure of the ECDH protocol is described in Section 2.1.Moreover,the sequence of instructions for the computation of Elliptic-curve PM and the corresponding point addition and double formulas are provided in Section 2.2.

    2.1 ECDH Protocol

    A complete layout of the ECDH protocol for key agreement(or shared key generation)between two distinct users is illustrated in Fig.1.

    As shown in Fig.1,we have two distinct users,i.e.,UA,andUB.TheUBwill generate his/her public keyQBby taking a base-pointGon the defined curve and a private keydB.AdBis anm-bit random number(or)a scalar multiplier.Based on these parameters(G&dB),the public key will be generated(i.e.,QB=dB×G).Similarly,UAwill repeat the exact same process to generate his/her public keyQAusing his/her private keydAand a base pointG.The resulting public key will beQA=dA×G.Prior to generate the shared key,both users(UA&UB)will exchange their public keys to each other.Then,theUAandUBwill generate their shared keys usingSKA=dA×QBandSKB=dB×QA,respectively.

    2.2 Elliptic-curve Point Multiplication

    The PM operation is the computation ofd-1 times the sum of a base (or) initial pointG,as presented in Eq.(1):

    In Eq.(1),Gis the initial point on the curve,Qis the resultant point anddis a scalar multiplier(or a private key of the corresponding user).There are several algorithms in the literature such as Double and Add,Montgomery ladder,Lopez Dahab,and many more to operate the ECPM operation in ECC.A comprehensive comparison over various ECPM algorithms as hardware accelerators on FPGA and ASIC(application-specific integrated circuits)platforms is shown in[25].Consequently,we have used the following Montgomery (Algorithm 1) algorithm because (inherently) it counters simple power analysis(SPA)attacks.It is important to note that we are implementing only the SPA attacks at the algorithmic level without realizing on a design level.

    Algorithm 1:Montgomery ECPM Algorithm[13]images/BZ_1850_1360_1781_1386_1807.pngimages/BZ_1850_267_1904_335_1943.pngimages/BZ_1850_499_2150_524_2176.pngimages/BZ_1850_504_2401_529_2427.pngimages/BZ_1850_988_2398_1013_2424.pngimages/BZ_1850_1470_2150_1495_2176.pngimages/BZ_1850_1485_2401_1510_2427.pngimages/BZ_1850_1924_2398_1949_2424.pngimages/BZ_1850_585_2641_610_2667.pngimages/BZ_1850_761_2641_786_2667.pngimages/BZ_1850_1020_2641_1045_2667.pngimages/BZ_1850_1220_2641_1245_2667.pngimages/BZ_1850_1330_2641_1355_2667.pngimages/BZ_1850_1983_2641_2009_2667.png

    The inputs to Algorithm 1 are initial pointGand a scalar multiplierd.A sequencedn-1,...,d1,d0determines the bits of the scalar multiplier in terms of 0′sand 1′s.The output from Algorithm 1 is thexandycoordinates of resultant pointQ.The PA and PD operations in Algorithm 1 shows the number of instructions for point addition and double computations,respectively.

    3 Proposed Key-Agreement Architecture

    The proposed crypto processor architecture for ECDH is shown in Fig.2.It consists of three blocks,i.e.,(i) an ECDH controller,(ii) two routing multiplexers (M1andM2) and an ECPM(DESIGN-I)unit.Description of these units are as follows:

    3.1 ECDH Controller

    The ECDH controller is responsible for generating control signals for the routing multiplexersM1andM2,as shown with red color dotted lines in Fig.2.Moreover,it is responsible for communicating with the ECPM controller for the public and shared key generations.It takesd(anm-bit secret key or a scalar multiplier given in Algorithm 1),QSSxp(x-coordinate of an initial pointG(or)public key of another user for the shared secret generation),QSSyp(y-coordinate of an initial pointG(or)public key of another user for the shared secret generation)andST(start signal)as an input.TheRxpandRypdetermines thexandycoordinates of either(i)our generated public key forUAorUBor(ii)shared secret forUAorUBas an output.TheDPMandDSSare the one-bit done signals.These signals become true when the corresponding public key or shared secret generation is completed by the ECPM design.Theclkandrstsignals are input to the proposed ECDH crypto processor architecture and these are not shown in Fig.2.

    The ECDH controller is implemented by adopting a finite state machine(FSM)model.It contains three states,i.e.,IDLE,PKG and SKG.As the name implies,IDLE is the initial state of the crypto processor that determines none of the operations is under computation.The PKG shows that the crypto processor is processing a public key.Finally,the SKG means that the processor architecture is computing a shared key.A two-bit input signal “select” from outside to ECDH controller,not shown in Fig.2,identifies either the generation of a public key (or) shared key.Whenever the twobitselectsignal is 00,it means NOP (meaning with no operation and processor will reside in IDLE state).For 01 and 10 cases,the ECDH controller passes theSTsignal to the ECPM controller for the generation of corresponding public and shared keys.Once the corresponding public or shared key is generated,the ECPM controller acknowledges the ECDH controller with its associated done signal(DPMwhen coordinates of a public key is generated andDSSwhen coordinates of a shared key is generated).Based on the two-bitselectsignal,the ECDH controller generates the control signals for the routing multiplexersM1andM2to select the appropriate inputs(either coordinates ofGor a public key of another user for computation of shared key)for DESIGN-I.

    3.2 Routing Multiplexers(M1 and M2)

    The purpose ofM1andM2multiplexers is to select thexandycoordinates of either base pointG(denoted withxpandypin Fig.2)or public key(shown withQSSxpandQSSypin Fig.2)of another user.As shown in Fig.1,the ECDH scheme requires two PM computations.One for public key generation and another for a shared secret generation.Our architecture selects thexandycoordinates ofGto generate the public key when the control signals forM1andM2are 0.Similarly,it selects thexandycoordinates of public key of another user to generate the shared key when the control signals toM1andM2are 1.

    3.3 ECPM Unit(DESIGN-I)

    The ECPM unit consists of (i) an array of registers,(ii) three routing multiplexers,(iii) adder,squarer,multiplier,polynomial reduction &inversion units and (iv) an ECPM controller.The description of these units are given in the next subsequent sections:

    3.3.1 Register Array

    The register array in the proposed crypto processor architecture is an 8×msize of register file.A variablemdenotes the width of each register while a numerical value 8 shows the total number of register addresses.As a fact,the used register array is liable for preserving the intermediate and the final results during and after the implementation of the Algorithm 1.The inside to register array unit,shown in Fig.2,comprises two 8×1 multiplexers that are responsible to fetch/reading data from the used register array.Moreover,it includes a 1×8 demultiplexer to modify the contents of each register address.The corresponding control signals for reading and write operations are generated from the ECPM controller(not shown in Fig.2).

    3.3.2 Three Routing Multiplexers(M3,M4and M5)

    The three routing multiplexers in the proposed architecture includeM3(3×1),M4(2×1)andM5(3×1).TheM3multiplexer is used to choose an ECC parameter(i.e.,xcoord,ycoordand a curve constantb).These parameters are needed during the implementation of Algorithm 1.For our implementations,there are two choices to takexcoordandycoordof input parameters.The first choice is from the outside to our design usingM1andM2multiplexers.Another choice is the use of parameters as constants inside the design.For these two choices,we are using always a constant value for parameterb(shown in Fig.2).For instance,this is the limitation of our proposed architecture.Similarly,theM4multiplexer drives the output ofM3and an operand from the register array to arithmetic operators.Finally,M5multiplexer is responsible for selecting an output from arithmetic operators for the register file to perform written back.Noted that,in this paper,thexcoord,ycoordand constantbparameters are selected from NIST(National Institute of Standards and Technology)standardized document[26].

    3.3.3 Adder,Squarer,Multiplier,Reduction and Inversion Units

    The arithmetic operators,i.e.,addition (ADDER),squaring (SQUARE) and multiplication(MULT),perform the PM operation of the ECPM unit.The implementation of a polynomial addition inGF(2m)is rather simple than the prime field.Therefore,the ADDER unit comprises an array of bitwise Exclusive (OR) gates.The polynomialsDT1andDT2determine the inputs while anm-bit polynomialAoutis output.The implementation of the SQUARE unit is achieved by placing a 0-bit after each successive data bit.It takes anm-bit operand either from the register array or routing multiplexers to perform polynomial squaring.A 2×m-1 bit polynomial(Sout)is generated as an output.

    The efficiency of a polynomial multiplier specifies the performance of an ECC-based crypto processor architecture.Based on the available literature,there are four possibilities to implement a multiplier circuit.These possibilities are (i) bit-serial,(ii) digit-serial,(iii) bit-parallel and (iv) digitparallel.Each aforementioned architectural choice has certain differences.When we consider only the computational cost in terms of clock cycles,the bit-serial multipliers requiremcycles.On the other hand,one clock cycle is needed to implement the bit/digit parallel multiplication approaches.The digit-serial multipliers demandcycles.A variableashows the operand size andbpresents the length of an incorporated digit.The aforesaid discussion reveals that each multiplication style has its pros and cons as the computational costs are different.Despite the computational costs,the bitserial multiplication approaches are practical in scenarios where demand for lower hardware resources and power consumption are required.The specific examples include the RFID (radio frequency identification networks) and WSNs (wireless sensor networks) related applications.The bit/digitparallel multipliers provide high-performance(s).The digit-serial multipliers are useful in applications that require lower hardware area and lower computation time.Consequently,a bit-serial shift and add multiplication method is utilized in our MULT unit to perform twombit polynomial multiplications.It takes twoDT1andDT2polynomials as input and yields in 2 ×m- 1 bit polynomial (Mout) as output.In the employed shift and add multiplication approach,the multiplicand(DT1in our work)is addedDT2times to itself,whereDT2is the multiplier.It takesmcycles to produce one polynomial multiplication.

    The size of resultant polynomials generated after MULT and SQUARE units are 2×m-1 bit(not shown in Fig.2).Therefore,the reduction from 2×m-1 tombit is essential.The polynomial reduction,in this work,is achieved by using a NIST-specified algorithm(we refer interested readers to Algorithm 2.41 of[15]).

    As shown in Algorithm 1,the reconversion from projective to affine requires two polynomial inversion computations.There are several inversion methods in the literature to perform the multiplicative inverse of the polynomial(s).Therefore,the Itoh Tsujii algorithm is extensively employed in the state-of-the-art because it needs multiplications and square operations for computation[13,17,18].In this paper,we have (also) utilized the implementation resources of our employed MULT and SQUARE units to perform multiplicative inverse over polynomials using the Itoh Tsujii algorithm.OverGF(2163),the cost for execution of nine polynomial multiplications andm-1 squares determine the computational cost of an Itoh Tsujii algorithm[27].As expressed before,each polynomial multiplication using shift and add method needsmclock cycles when the inputs to our MULT arem-bits.Therefore,the execution of nine multiplications needs 9×mcycles.Them-1 cycles are needed for the squaring as one square takes only the one clock cycle.Thus,the computational cost to perform one inversion is(9×m)+(m-1).

    3.3.4 ECPM Controller

    The FSM-based ECPM controller is responsible to perform the PM operation of Eq.(1).It generates the control signals to implement the Algorithm 1.The generated control signals for the routing multiplexers and register array are shown with blue color dotted lines in Fig.2.The details of the ECPM controller is as follows:

    ? The proposed crypto processor will remain in state 0(IDLE state).

    ? It starts the computation once it receives theSTsignal as 1 from the ECDH controller.After theSTsignal becomes 1,the control signals for projective to affine conversions are generated which takes six clock cycles(for six states).For PM computation in projective coordinates,the point addition and double instructions of Algorithm 1 are operated in 14 states as these functions contain seven instructions(from 1 to 7)for each point addition and double operation.Thus,a total of fourteen instructions are involved.During the execution of each state,the value for an inspected key bit,i.e.,di,is checked.When the inspected value fordiis 1,the corresponding instructions ofifstatement will be executed.If not,theelsestatement is operated.In short,these fourteen states will continue to repeat as far as the condition forloopinstruction of Algorithm 1 eventually be true.

    ? Under different conditions,when theloopinstruction of Algorithm 1 becomes true,the processor switches control from PM to projective to affine conversion states.

    3.4 Total Clock Cycles(TCC)Calculation for DESIGN-I and DESIGN-II

    For the proposed DESIGN-I,Eq.(2)is used to calculate the total clock cycles.Similarly,the total cycles for DESIGN-II are calculated using Eq.(3).

    In Eq.(2),6 determines the clock cycles for the affine to projective conversion.Them-1(6×m+8)shows the clock cycles for the PM computation in projective coordinates.More precisely,6 ×mare the clock cycles for modular multiplication instructions in point addition and double operations,wheremis the clock cycles for one modular shift and add multiplication method.A numerical value 8 shows the clock cycles for the modular addition and square operations in point addition and double instructions(total 8).Hence,the addition of modular multiplications(total six instructions),modular addition(total three instructions)and modular square(total five instructions)becomes the total of fourteen instructions in point addition and double operations.ForGF(2163),the proposed DESIGN-I takes 6 clock cycles for affine to projective conversion,159732 cycles for the PM computation in projective coordinates and 4164 cycles for projective to affine conversions.In projective to affine conversion,eachinversionrequires 1629 clock cycles and additional 906 cycles are needed to compute the remaining operations of projective to affine conversions.Consequently,the proposed DESIGN-I takes 163902 clock cycles for a public key generation(or for one PM computation).

    As shown in Fig.1,the ECDH scheme consists of dual PM computation (one for a public key generation and another for shared key).Therefore,Eq.(3) confirms that our DESIGN-II needs 163902+163902 (327804) clock cycles for one shared key generation between two distinct users (as we mentionedUAandUBin Section 2.1).

    4 Implementation Results and Comparisons

    The details for the implementation results and comparison to state-of-the-art is given in Section 4.1 and Section 4.2,respectively.

    4.1 Implementation Results

    Our DESIGN-I and DESIGN-II overGF(2163)is implemented in Verilog HDL using the Vivado IDE tool.The implementation results are reported after running the post-place-and-route design process.A Xilinx Virtex-7 (xc7vx690t-3ffg1930) FPGA device[28]is used for logic synthesis and routing purposes.Before presenting our implementation results,we have reported area utilization of building blocks of DESIGN-I and DESIGN-II in Section 4.1.1.The details for implementation results of our DESIGN-I and DESIGN-II are described in Section 4.1.2.The area comparison of DESIGNI and DESIGN-II with the sum of Slices and LUTs (look-up-tables) of building blocks are given in Section 4.1.3.Finally,the comparison between DESIGN-I and DESIGN-II is presented in Section 4.1.4.

    4.1.1 Area Utilization of Building Blocks of DESIGN-I and DESIGN-II

    The area breakdown for building blocks in DESIGN-I and DESIGN-II are shown in Tab.1.Column one provides the name of the building-blocks of our proposed design.From columns two to four,the area utilization is presented in terms of FPGA LUTs,Slices and FFs(flip-flops).

    Table 1:Area utilizations of building blocks of our proposed DESIGN-I and DESIGN-II over GF(2163)

    As shown in Tab.1,the critical building blocks of DESIGN-I and DESIGN-II over Virtex-7 FPGA utilizes 77,38,4429 and 652 FPGA Slices for ADDER,SQUARE(including NIST Reduction),MULT (including NIST Reduction) and register array.Similarly,for identical building blocks,the LUTs utilization is 82,39,10531 and 1956.The sum of resources for building blocks is 5196(Slices),12608 (LUTs) and 1304 (FFs).Tab.1 reveals that the MULT unit (even with simple shift and add architecture) utilizes more hardware resources as compared to other building blocks.Moreover,the implementation results show that the modular square operation being much simpler in theGF(2m)field than with the array of bitwise Exclusive(OR)gates(in our implemented ADDER unit).

    Concerning the sum of resources of building blocks of our DESIGN-I and DESIGN-II,apart from the routing multiplexers,the reported Slices (5196) and LUTs (12608) values in Tab.1 are still enough in large.Many area-constrained applications such as smart cards,IoT,automotive and several others demand lower hardware resources.In this regard,we have set design goals from balanced to area reduction in the Vivado IDE tool to minimize the hardware resources of our DESIGN-I and DESIGNII implementations.Therefore,a description of the results is provided in the following section.

    4.1.2 Implementation Results for DESIGN-I and DESIGN-II

    The detailed implementation results of our proposed DESIGN-I and DESIGN-II are shown in Tab.2.Column one provides the reference design.The area information in Slices,LUTS and FFs is given in columns two,three and four.Columns five and six provide the total clock cycles and operational clock frequency (inMHz),respectively.Latency is the time for computation of one required cryptographic operation and therefore,the latency (inμs) for one public key and a shared secret generation is given in columns seven to eight,respectively.Finally,the last column presents the total power consumption of our proposed DESIGN-I and DESIGN-II.Note that the reported values for area and clock frequency in Tab.2 are obtained from the Vivado IDE tool.The total clock cycles are calculated from Eqs.(2) and (3).For both DESIGN-I and DESIGN-II,the latency for one PM computation is calculated using Eq.(4).Similarly,we have used the same Eq.(4)or the calculation of ECDH latency of our DESIGN-II.We clarify that our DESIGN-I does not support ECDH operation,hence its latency is denoted by‘-’in Tab.2.

    Table 2:Implementation results for DESIGN-I and DESIGN-II after post-place-and-route over GF(2163)on Xilinx Virtex-7(xc7vx690t-3ffg1930)FPGA device

    As shown in Tab.2,the utilized hardware resources of DESIGN-I are 3983(Slices),10639(LUTs)and 2389 (FFs).Similarly,the utilized resources of DESIGN-II are 4037 (Slices),11713 (LUTs)and 2427 (FFs).The achieved clock frequency for DESIGN-I and DESIGN-II are 296MHzand 280MHz,respectively.The required clock cycle(for DESIGN-I)to generate one public key is 163902.Similarly,the clock cycle(for DESIGN-II)to generate one shared secret is 327804.Both DESIGN-I and DESIGN-II take the 553.7μsto generate one public key,while the DESIGN-II requires 1170.7μsto generate one shared key.As shown in the last column of Tab.2,the power consumption of our DESIGN-I and DESIGN-II is 29μWand 57μW,respectively.

    4.1.3 Area Comparison of DESIGN-I and DESIGN-II with the Sum of Slices and LUTs of Building Blocks

    The total sum of Slices and LUTs of building blocks is 1.30 (ratio of 5196 with 3983) and 1.18(ratio of 12608 with 10639)times higher than the Slices and LUTs of our DESIGN-I.Similarly,when the area resources of DESIGN-II are considered for comparison,the total sum of Slices and LUTs of building blocks is 1.28(ratio of 5196 with 4037)and 1.07(ratio of 12608 with 11713)times higher.This indicates that the use of different optimization strategies for synthesis of DESIGN-I and DESIGN-II results in lower resources.The reason is that we set the area optimization as synthesis goal in a Vivado IDE tool.

    4.1.4 Comparison Between DESIGN-I and DESIGN-II

    As far as the utilized area is concerned,DESIGN-I takes only the 3983,10639 and 2389 Slices,LUTs and FFs.The reported FPGA area values in terms of Slices,LUTs and FFs for DESIGN-II are 4037,11713 and 2427 that are comparatively 1.01(ratio of 4037 with 3983),1.10(ratio of 11713 with 10639) and 1.01 (ratio of 2427 with 2389) times higher than our DESIGN-I.The reason for higher hardware resources in DESIGN-II is the use of two additional routing multiplexers (M1andM2)and an ECDH controller to generate the shared key.On the other hand,the DESIGN-I is only responsible for the computation of a public key(or scalar multiplication).The clock cycles requirement of our DESIGN-I and DESIGN-II is 163902 and 327804 for public key and a shared secret generation.Our DESIGN-I can operate on a maximum clock frequency of 296MHzwhile DESIGN-II achieves 280MHz.For DESIGN-I and DESIGN-II,the time(or latency)required to compute one public key is 553.7μs.Time to perform a shared secret with our DESIGN-II is twice the time for a public key as in the ECDH scheme,the PM(or scalar multiplication)is essential to compute two times(one for public key and another for shared secret).On Virtex-7 FPGA,DESIGN-I and DESIGN-II consumes 29μWand 57μWfor generating the public key and a shared secret,respectively.

    4.2 Comparison with State-of-the-Art

    The comparison with the most compatible state-of-the-art architectures,published in[13,16-19,22,24],is given in Tab.3.The column one in Tab.3 provides the reference design.The implemented PM algorithms and key-agreement protocols are presented in column two.The implementation device for logic synthesis is provided in column three.Column four presents the utilized FPGA slices.Finally,the last two columns (five and six) provide the clock frequency (inMHz) and latency (inμs)parameters,respectively.In Tab.3,a representation‘-’means that the relevant information is not given in the reference design.

    Table 3:Comparison with recent state-of-the-art PM and ECDH architectures over GF(2163)

    4.2.1 Comparison of our DESIGN-I with[16]and[19]

    As compared to the dedicated PM architecture of[16]on Virtex-5 FPGA,our DESIGN-I utilizes 1.09(3126 over 3122)times lower FPGA Slices.Moreover,in terms of clock frequency,our DESIGNI is 1.04 (ratio of 301 over 288) times faster than[16].On the other hand,the time required for one PM computation or public key generation in[16]is 22.2(ratio of 544.5 over 24.5)times lower than our DESIGN-I.There is always a tradeoff between area and performance(latency)parameters.On Virtex-7 FPGA,a high-speed and latency optimized architecture of[19]for PM computation utilizes 4150 FPGA Slices which is comparatively 1.04(ratio of 4150 over 3983)times higher than our DESIGN-I.Moreover,their architecture results in higher clock frequency(352MHz)when compared to our design(where we achieved 296MHz).Similar to[16],the PM architecture of[19]requires lower computational time as compared to our DESIGN-I.It is important to note that the main objective of this work was to succeed the minimum hardware resources(FPGA Slices,LUTs and FFs)without concentrating on other design parameters such as clock frequency,power,and latency.

    4.2.2 Design Tradeoffs Between[13,17,18]and our DESIGN-I

    As shown in Tab.3,the dedicated PM architectures (reported in[13,17,18]) over Virtex-5 and Virtex-7 FPGA devices show that there is always a tradeoff between several design parameters when implementing the cryptographic algorithms as a hardware accelerator.In the text that follows,a tradeoff by providing the comparison of dedicated PM designs (published in[13,17,18]) with our notion of a flexible crypto processor (DESIGN-I) is described.As reported earlier in this paper,flexibility determines the selection of ECC parameters according to the user’s needs(readers can turn to multiplexerM3in Fig.2).

    On Virtex-7 FPGA,a 2-stage pipelined PM architecture of[13]utilizes 1.80 (ratio of 3983 over 2207) times lower Slices as compared to our DESIGN-I.The reason for the utilization of lower hardware resources is the use of only one multiplier architecture for both square and polynomial multiplication computations.The square operations are computed by providing similar inputs to the multiplier unit.Additionally,the use of a singular NIST reduction unit also leads to a decrease in hardware resources.Comparatively,we are using a separate MULT and SQUARE unit to operate polynomial multiplications and squaring operations,respectively.In addition,we have two NIST reduction units connected one after each MULT and SQUARE unit in our designs.These architectural differences result in an increase in hardware resources in our work.Due to pipelining in[13],the achieved clock frequency is a bit higher than our crypto processor design.Furthermore,the time to operate one PM computation is lower than our DESIGN-I.In[17],3657 FPGA Slices is used on Virtex-7 FPGA for the PM computation overGF(2163).This is comparatively 1.08(ratio of 3983 over 3657)times lower than our DESIGN-I.The clock frequency of their architecture is not presented.Thus,this comparison is not possible to provide.The calculated latency value for[17]is lower than our proposed DESIGN-I as this is a dedicated PM architecture,where the coordinates of the initial point on ECC was applied as a 163-bit constants(inside the core).In our DESIGN-I(shown in Fig.2),a routing multiplexerM3takes coordinates of an initial point on ECC as input from outside to DESIGNI.Therefore,for each public key generation,users can choose different ECC basepoints according to their needs when initiating a setup for public key generation.

    As compared to the PM architecture of[18]on Virtex-5 FPGA,our DESIGN-I consumes 6.6(ratio of 3126 to 473)times higher FPGA Slices.The reason is the employment of two NIST reduction units,one after in each SQUARE and MULT unit in our architecture.Moreover,a segmented digit serial multiplier architecture is used in[18]while in our design,a shift and add multiplication method is used for polynomial multiplication.Similar to the area(FPGA slices),the dedicated PM architecture of[18]achieves higher clock frequency and requires lower computational time as compared to our DESIGN-I.One of the reasons for this is to consider flexibility when selecting different curve parameters as inputs to our DESIGN-I using routing multiplexersM1,M2andM3.An additional reason is the use of a higher number of input/output interfaces when taking coordinates of a public key as input from outside and producing either public key or shared secret as an output.On the other hand,the dedicated architecture of[18]uses the ECC parameters as constants inside their design.

    Remarks:We are not asserting that our DESIGN-I outperforms (in terms of hardware area) in all state-of-the-art PM implementations.There is always a tradeoff when considering several design parameters concurrently for optimizations.Therefore,based on the comparisons presented above,we have noted that the hardware accelerators of ECC for only optimized for PM computation can result in little unfair measurements when scaling the same PM architecture to any ECC protocol (such as ECDH in this work).Therefore,a hardware architecture with the considerations of all four layers could be more beneficial.

    4.2.3 Comparison of our DESIGN-II(ECDH)with[22]and[24]

    The accelerator architectures of[22,24]for ECDH scheme are presented overGF(2233)while our ECDH design is provided overGF(2163).Although their architectures consider 1.42(ratio of 233 over 163)times higher key length when compared to our DESIGN-II.On the other hand,our DESIGN-II is 2.312(ratio of 289 over 125)and 4.48(ratio of 280 over 62.5)times faster as compared to[22]and[24]over Virtex-5 and Virtex-7 FPGA devices,respectively.When comparing the hardware resources,our DESIGN-II utilizes 2.99 (ratio of 35102 over 11713) times lower FPGA LUTs as compared to[22].The FPGA Slices used in[24]is 2.23(ratio of 4037 over 1809)times lower than our DESIGN-II.Apart from hardware resources (FPGA Slices and LUTs) and clock frequency,the time to generate one public key and a shared secret are much lower than the ECDH accelerator architectures of[22]and[24],respectively.In detail,a coprocessor architecture of[22]requires 8.88msto generate one shared secret while our DESIGN-II takes only the 1.13ms,which is comparatively 7.85 (ratio of 8.88 over 1.13) times lower.Similarly,the architecture of[24]on Virtex-7 FPGA requires 4.13msto generate one public key while our DESIGN-II takes only the 0.58ms,which is comparatively 7.12(ratio of 4.13 over 0.58)times lower.

    5 Conclusions

    This paper has presented a key-agreement architecture for the ECDH scheme overGF(2163)with the consideration of low hardware resources.Towards hardware resource reduction:(i)a simple shift and add multiplier is used and (ii) a polynomial inversion is computed using the Itoh-Tsujii algorithm with similar hardware resources of squarer and multiplier units.The proposed architecture is implemented in a Verilog HDL using Vivado IDE.The implementation results after the post-placeand-route are given on a Xilinx Virtex-7 FPGA.The utilized Slices overGF(2163)for one shared key generation is 4037.The clock cycles to compute one public key and a shared secret are 163902 and 327804.The time to operate one public key and a shared secret is 553.7msand 1170.7ms,respectively.The consumed power is 29μW(for one public key calculation)and 57μW(for singular shared secret computation).We have concluded that the hardware accelerators of ECC for only optimized for PM computation can result in little unfair measurements when scaling same architectures to any ECC protocol(such as ECDH in this work).Subsequently,a hardware architecture with the considerations of all four layers could be more beneficial to achieve better performance(s).

    Acknowledgement:We extend our gratitude to the Estonian Aviation Academy,Tartu,Estonia for supporting and funding this research work.

    Funding Statement:We acknowledge the support of Deanship of Scientific Research at King Khalid University for funding this work under grant number R.G.P.1/399/42.

    Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.

    亚洲欧美精品自产自拍| 两个人看的免费小视频| 男男h啪啪无遮挡| 热re99久久精品国产66热6| 少妇精品久久久久久久| av.在线天堂| 国产野战对白在线观看| 免费大片黄手机在线观看| 日韩av不卡免费在线播放| 亚洲成国产人片在线观看| 91精品伊人久久大香线蕉| 国产精品国产三级国产专区5o| 亚洲精品国产av成人精品| 久久精品久久久久久久性| 久久婷婷青草| 大话2 男鬼变身卡| 国产国语露脸激情在线看| 天堂8中文在线网| 国产精品免费视频内射| 美女中出高潮动态图| 五月开心婷婷网| av网站免费在线观看视频| 国产女主播在线喷水免费视频网站| 在线观看一区二区三区激情| 免费看av在线观看网站| 青春草国产在线视频| 久久国内精品自在自线图片| 日日爽夜夜爽网站| 男女高潮啪啪啪动态图| 日本猛色少妇xxxxx猛交久久| 在线观看免费日韩欧美大片| 黄片无遮挡物在线观看| 亚洲天堂av无毛| 亚洲国产av新网站| 国产黄频视频在线观看| 亚洲激情五月婷婷啪啪| av福利片在线| 一级片'在线观看视频| 午夜免费鲁丝| 91精品国产国语对白视频| 毛片一级片免费看久久久久| 好男人视频免费观看在线| 日韩中文字幕欧美一区二区 | 亚洲精品国产av蜜桃| 亚洲三区欧美一区| 午夜精品国产一区二区电影| 黄网站色视频无遮挡免费观看| 在线 av 中文字幕| 国产精品熟女久久久久浪| 亚洲国产最新在线播放| 亚洲av免费高清在线观看| 最近的中文字幕免费完整| 久久久a久久爽久久v久久| a级片在线免费高清观看视频| 免费观看在线日韩| 桃花免费在线播放| 五月天丁香电影| 亚洲第一av免费看| 男女啪啪激烈高潮av片| 欧美精品av麻豆av| 亚洲国产欧美日韩在线播放| 在线天堂中文资源库| 日本色播在线视频| 亚洲国产欧美网| 国产欧美日韩一区二区三区在线| 尾随美女入室| 婷婷色综合大香蕉| 国精品久久久久久国模美| 999精品在线视频| 人妻一区二区av| 成人手机av| 久久综合国产亚洲精品| 欧美亚洲日本最大视频资源| 成人亚洲精品一区在线观看| 一本—道久久a久久精品蜜桃钙片| 国产精品女同一区二区软件| 妹子高潮喷水视频| videosex国产| 爱豆传媒免费全集在线观看| 最近2019中文字幕mv第一页| 亚洲,欧美精品.| 亚洲精品,欧美精品| 美女xxoo啪啪120秒动态图| 国产亚洲一区二区精品| 夫妻性生交免费视频一级片| 日韩制服骚丝袜av| 久久久久网色| 久久久精品免费免费高清| 成人毛片a级毛片在线播放| 丰满乱子伦码专区| 亚洲欧美精品综合一区二区三区 | 另类亚洲欧美激情| 日韩一区二区视频免费看| 欧美+日韩+精品| 欧美人与善性xxx| 国产精品二区激情视频| 久久久久国产网址| 人妻一区二区av| 欧美激情高清一区二区三区 | 中文字幕制服av| 纯流量卡能插随身wifi吗| 99热全是精品| 午夜老司机福利剧场| 久久精品久久久久久久性| 少妇被粗大的猛进出69影院| 成人毛片a级毛片在线播放| 亚洲一级一片aⅴ在线观看| 久久久久久久久免费视频了| 男人爽女人下面视频在线观看| a级毛片黄视频| 少妇人妻精品综合一区二区| 午夜福利视频在线观看免费| 伊人久久国产一区二区| 18禁观看日本| 精品99又大又爽又粗少妇毛片| 老女人水多毛片| 制服丝袜香蕉在线| 美女主播在线视频| 黄色毛片三级朝国网站| 欧美97在线视频| 伊人亚洲综合成人网| 成人漫画全彩无遮挡| 亚洲av综合色区一区| www.av在线官网国产| 少妇被粗大猛烈的视频| 丰满少妇做爰视频| 久久精品熟女亚洲av麻豆精品| 国产成人精品久久二区二区91 | 成人国语在线视频| 水蜜桃什么品种好| 欧美日本中文国产一区发布| 亚洲一级一片aⅴ在线观看| 男人爽女人下面视频在线观看| 一区二区日韩欧美中文字幕| 欧美日韩视频精品一区| 亚洲成人手机| 天天影视国产精品| 母亲3免费完整高清在线观看 | 免费少妇av软件| 精品福利永久在线观看| xxxhd国产人妻xxx| 涩涩av久久男人的天堂| 免费人妻精品一区二区三区视频| 高清不卡的av网站| 97在线人人人人妻| 这个男人来自地球电影免费观看 | 国产老妇伦熟女老妇高清| 国产不卡av网站在线观看| 在线观看免费日韩欧美大片| 国产精品久久久久成人av| 巨乳人妻的诱惑在线观看| 日韩av在线免费看完整版不卡| www.精华液| 下体分泌物呈黄色| 亚洲精品久久成人aⅴ小说| 街头女战士在线观看网站| 美女xxoo啪啪120秒动态图| 中文字幕亚洲精品专区| 成年美女黄网站色视频大全免费| 美女国产视频在线观看| 青春草国产在线视频| 日本黄色日本黄色录像| 伊人亚洲综合成人网| 一级,二级,三级黄色视频| 国产极品天堂在线| 日本vs欧美在线观看视频| 丰满乱子伦码专区| av国产久精品久网站免费入址| 亚洲欧洲日产国产| 久久国产亚洲av麻豆专区| 高清欧美精品videossex| 麻豆精品久久久久久蜜桃| 国产亚洲av片在线观看秒播厂| 精品人妻在线不人妻| 交换朋友夫妻互换小说| 91精品国产国语对白视频| 免费少妇av软件| 91国产中文字幕| 亚洲国产最新在线播放| 99国产精品免费福利视频| 国产麻豆69| 777久久人妻少妇嫩草av网站| 欧美激情 高清一区二区三区| 中文字幕人妻熟女乱码| 又黄又粗又硬又大视频| 午夜免费男女啪啪视频观看| 极品人妻少妇av视频| 国产成人aa在线观看| 国产一区二区三区综合在线观看| 亚洲欧洲国产日韩| 侵犯人妻中文字幕一二三四区| kizo精华| 欧美日韩精品成人综合77777| 天堂8中文在线网| 丁香六月天网| 亚洲天堂av无毛| 亚洲四区av| 国产不卡av网站在线观看| 人人妻人人澡人人看| 97人妻天天添夜夜摸| 中文字幕人妻丝袜一区二区 | 中文精品一卡2卡3卡4更新| a级毛片黄视频| 老汉色av国产亚洲站长工具| 有码 亚洲区| 欧美日韩亚洲高清精品| 汤姆久久久久久久影院中文字幕| 国产1区2区3区精品| 国产精品麻豆人妻色哟哟久久| 伊人亚洲综合成人网| 精品第一国产精品| 久久国产精品男人的天堂亚洲| 精品人妻偷拍中文字幕| 在线观看免费高清a一片| 国产日韩欧美在线精品| 99热国产这里只有精品6| 女性被躁到高潮视频| 欧美黄色片欧美黄色片| √禁漫天堂资源中文www| 国产欧美日韩一区二区三区在线| 天天躁夜夜躁狠狠久久av| 在线观看免费高清a一片| 久久精品aⅴ一区二区三区四区 | 青春草国产在线视频| 叶爱在线成人免费视频播放| 亚洲欧洲精品一区二区精品久久久 | 亚洲综合色惰| 性色avwww在线观看| 国产无遮挡羞羞视频在线观看| 99久久人妻综合| 久久精品熟女亚洲av麻豆精品| 欧美+日韩+精品| 国产在线免费精品| 亚洲av日韩在线播放| 99热国产这里只有精品6| 欧美国产精品一级二级三级| 国产精品三级大全| 精品少妇久久久久久888优播| 日日摸夜夜添夜夜爱| 女性被躁到高潮视频| 丝袜在线中文字幕| 久久亚洲国产成人精品v| 精品国产乱码久久久久久男人| 亚洲一区二区三区欧美精品| 天美传媒精品一区二区| 日日啪夜夜爽| 国产一区二区三区综合在线观看| 欧美xxⅹ黑人| 最近最新中文字幕免费大全7| 9色porny在线观看| 晚上一个人看的免费电影| 久久av网站| 免费在线观看黄色视频的| 婷婷色综合www| 国产日韩欧美在线精品| 母亲3免费完整高清在线观看 | 人人澡人人妻人| 一级黄片播放器| 免费看不卡的av| 18在线观看网站| 黑人猛操日本美女一级片| 最近的中文字幕免费完整| av视频免费观看在线观看| 2022亚洲国产成人精品| 国产成人精品婷婷| 大片免费播放器 马上看| 亚洲伊人色综图| 日韩熟女老妇一区二区性免费视频| 亚洲精品日韩在线中文字幕| 成人国语在线视频| 亚洲欧美一区二区三区久久| 亚洲国产精品国产精品| 永久免费av网站大全| 亚洲国产欧美网| 亚洲第一青青草原| 日韩中文字幕视频在线看片| 中文字幕亚洲精品专区| 一级,二级,三级黄色视频| 嫩草影院入口| 国产精品不卡视频一区二区| 久久久国产一区二区| 97在线人人人人妻| 极品人妻少妇av视频| 欧美亚洲日本最大视频资源| 青草久久国产| 韩国高清视频一区二区三区| 日韩av在线免费看完整版不卡| 久久精品国产综合久久久| 91精品三级在线观看| 国产激情久久老熟女| 日韩制服丝袜自拍偷拍| 制服诱惑二区| 亚洲三区欧美一区| 精品人妻偷拍中文字幕| 日本欧美国产在线视频| 最近中文字幕2019免费版| 黄色怎么调成土黄色| 久久久精品免费免费高清| 精品一区二区三区四区五区乱码 | 日韩不卡一区二区三区视频在线| 少妇 在线观看| 久久人妻熟女aⅴ| 亚洲国产欧美网| 久久鲁丝午夜福利片| √禁漫天堂资源中文www| 久久久国产欧美日韩av| 久久99一区二区三区| 最近最新中文字幕免费大全7| 有码 亚洲区| 国产精品一二三区在线看| 秋霞伦理黄片| 91午夜精品亚洲一区二区三区| 久久久久久久久久久久大奶| 亚洲伊人久久精品综合| 天堂8中文在线网| 久久人人97超碰香蕉20202| 久久精品人人爽人人爽视色| 成人国产av品久久久| 欧美日韩成人在线一区二区| 国精品久久久久久国模美| 国产精品麻豆人妻色哟哟久久| 亚洲av电影在线进入| 久久久久视频综合| 97人妻天天添夜夜摸| 在线精品无人区一区二区三| 亚洲综合色惰| 下体分泌物呈黄色| 18+在线观看网站| av免费观看日本| 国产精品一二三区在线看| 亚洲激情五月婷婷啪啪| 精品少妇一区二区三区视频日本电影 | 成年女人在线观看亚洲视频| 大码成人一级视频| 一级爰片在线观看| 又粗又硬又长又爽又黄的视频| 在线亚洲精品国产二区图片欧美| 免费高清在线观看视频在线观看| 欧美精品亚洲一区二区| 边亲边吃奶的免费视频| 人妻少妇偷人精品九色| 男女边摸边吃奶| 9热在线视频观看99| 黄频高清免费视频| 中文乱码字字幕精品一区二区三区| 在线天堂中文资源库| 日韩成人av中文字幕在线观看| 国产精品香港三级国产av潘金莲 | 日韩伦理黄色片| 九色亚洲精品在线播放| 这个男人来自地球电影免费观看 | 天天影视国产精品| 大片电影免费在线观看免费| 国产成人精品一,二区| 久久精品久久久久久噜噜老黄| 在线免费观看不下载黄p国产| 国产又色又爽无遮挡免| 国产精品亚洲av一区麻豆 | 国产精品久久久久久av不卡| 欧美97在线视频| 香蕉丝袜av| 精品少妇久久久久久888优播| 男人爽女人下面视频在线观看| 成人午夜精彩视频在线观看| 国产麻豆69| 一级a爱视频在线免费观看| 久久久久久久久久久免费av| 国产精品三级大全| 高清在线视频一区二区三区| 日韩一本色道免费dvd| 91精品伊人久久大香线蕉| 赤兔流量卡办理| 亚洲av.av天堂| 天堂俺去俺来也www色官网| 九草在线视频观看| 在线观看三级黄色| 成年av动漫网址| 香蕉国产在线看| 亚洲欧美日韩另类电影网站| 精品国产露脸久久av麻豆| 久久久久人妻精品一区果冻| 久久精品国产a三级三级三级| 两个人免费观看高清视频| 婷婷成人精品国产| 看十八女毛片水多多多| 国产熟女午夜一区二区三区| 日韩三级伦理在线观看| 中文字幕av电影在线播放| av在线老鸭窝| 亚洲精品日韩在线中文字幕| 中国国产av一级| 999精品在线视频| 日本-黄色视频高清免费观看| 亚洲精品,欧美精品| 91在线精品国自产拍蜜月| 午夜精品国产一区二区电影| 一级毛片黄色毛片免费观看视频| 男女高潮啪啪啪动态图| 老汉色∧v一级毛片| 岛国毛片在线播放| 亚洲综合色惰| 香蕉精品网在线| 大话2 男鬼变身卡| 看免费av毛片| 精品国产一区二区三区四区第35| 国产女主播在线喷水免费视频网站| 观看美女的网站| 成年动漫av网址| 亚洲色图 男人天堂 中文字幕| 亚洲精品av麻豆狂野| 国产亚洲av片在线观看秒播厂| 久久精品亚洲av国产电影网| 午夜老司机福利剧场| 热99久久久久精品小说推荐| 少妇被粗大猛烈的视频| 亚洲成人手机| 亚洲,欧美,日韩| 中文精品一卡2卡3卡4更新| 九九爱精品视频在线观看| 91午夜精品亚洲一区二区三区| 国精品久久久久久国模美| h视频一区二区三区| 日本vs欧美在线观看视频| 免费看av在线观看网站| 免费观看在线日韩| 啦啦啦视频在线资源免费观看| 日韩视频在线欧美| 国产精品秋霞免费鲁丝片| 麻豆精品久久久久久蜜桃| 久久99一区二区三区| 久久午夜福利片| 伦精品一区二区三区| 各种免费的搞黄视频| 久久久久久久国产电影| 国产成人91sexporn| 午夜福利视频在线观看免费| 熟女少妇亚洲综合色aaa.| 蜜桃在线观看..| 免费观看性生交大片5| 国产白丝娇喘喷水9色精品| 久久午夜福利片| 久久精品国产亚洲av高清一级| 久久久久网色| 在线观看免费日韩欧美大片| 韩国精品一区二区三区| 国产成人精品婷婷| 深夜精品福利| 99久久综合免费| 中文字幕另类日韩欧美亚洲嫩草| 亚洲av免费高清在线观看| 男女边摸边吃奶| 老汉色∧v一级毛片| 亚洲国产av影院在线观看| 波多野结衣av一区二区av| 一边亲一边摸免费视频| 18禁国产床啪视频网站| 亚洲成人手机| 天天操日日干夜夜撸| 国产黄色视频一区二区在线观看| 男女免费视频国产| 秋霞在线观看毛片| 最新中文字幕久久久久| 欧美在线黄色| 一本一本久久a久久精品综合妖精 国产伦在线观看视频一区 | 丰满乱子伦码专区| 日韩在线高清观看一区二区三区| 亚洲伊人久久精品综合| 久久国产亚洲av麻豆专区| 亚洲在久久综合| 日韩制服骚丝袜av| 亚洲欧美一区二区三区国产| 色网站视频免费| 在线观看美女被高潮喷水网站| 中文字幕制服av| 男女边摸边吃奶| 日日摸夜夜添夜夜爱| 高清在线视频一区二区三区| 在线观看免费日韩欧美大片| 成年人午夜在线观看视频| www.av在线官网国产| 久久99热这里只频精品6学生| 久久狼人影院| xxx大片免费视频| 女人高潮潮喷娇喘18禁视频| 26uuu在线亚洲综合色| 亚洲av综合色区一区| 日产精品乱码卡一卡2卡三| 少妇人妻精品综合一区二区| 日韩一本色道免费dvd| 人妻少妇偷人精品九色| 久久精品国产亚洲av高清一级| 91在线精品国自产拍蜜月| 伦精品一区二区三区| 国产日韩一区二区三区精品不卡| 久久久久久久大尺度免费视频| 日韩一本色道免费dvd| 亚洲国产精品成人久久小说| 亚洲一码二码三码区别大吗| 免费在线观看视频国产中文字幕亚洲 | 丰满乱子伦码专区| 看十八女毛片水多多多| 下体分泌物呈黄色| 日韩伦理黄色片| 亚洲国产看品久久| 亚洲,一卡二卡三卡| 香蕉丝袜av| 人妻系列 视频| 菩萨蛮人人尽说江南好唐韦庄| 极品少妇高潮喷水抽搐| √禁漫天堂资源中文www| 亚洲成av片中文字幕在线观看 | 国产伦理片在线播放av一区| 啦啦啦视频在线资源免费观看| 精品国产乱码久久久久久男人| 黄色视频在线播放观看不卡| 亚洲熟女精品中文字幕| 国产野战对白在线观看| 精品第一国产精品| 日本免费在线观看一区| 妹子高潮喷水视频| 大码成人一级视频| 久久鲁丝午夜福利片| 欧美国产精品va在线观看不卡| 亚洲精品aⅴ在线观看| 亚洲精品在线美女| 日韩三级伦理在线观看| 亚洲成av片中文字幕在线观看 | 在线天堂中文资源库| 色婷婷久久久亚洲欧美| 最黄视频免费看| 国产精品香港三级国产av潘金莲 | 国产精品欧美亚洲77777| 在线精品无人区一区二区三| 老司机亚洲免费影院| 中文欧美无线码| 一个人免费看片子| 9色porny在线观看| 日本-黄色视频高清免费观看| 一级a爱视频在线免费观看| 国产精品偷伦视频观看了| 极品少妇高潮喷水抽搐| 久久人人爽av亚洲精品天堂| 美女脱内裤让男人舔精品视频| 国产精品99久久99久久久不卡 | 亚洲,一卡二卡三卡| 边亲边吃奶的免费视频| 熟妇人妻不卡中文字幕| 国产激情久久老熟女| 久久久久久久久久久免费av| 精品久久久精品久久久| 亚洲av国产av综合av卡| 久久精品国产a三级三级三级| 少妇的丰满在线观看| 黄片小视频在线播放| 亚洲国产欧美网| 超色免费av| 亚洲伊人色综图| 亚洲国产精品一区三区| 亚洲欧美精品自产自拍| 亚洲四区av| 亚洲av国产av综合av卡| 韩国av在线不卡| 久久久亚洲精品成人影院| 久久韩国三级中文字幕| 亚洲人成77777在线视频| 国精品久久久久久国模美| 日韩在线高清观看一区二区三区| 一本色道久久久久久精品综合| 韩国高清视频一区二区三区| 汤姆久久久久久久影院中文字幕| 热99国产精品久久久久久7| 国产av精品麻豆| 国产精品久久久久久av不卡| 国产精品一区二区在线观看99| 国产精品欧美亚洲77777| 久久婷婷青草| 国产成人精品婷婷| 国产精品久久久久久久久免| 2022亚洲国产成人精品| 亚洲av综合色区一区| 国产成人精品久久久久久| 又黄又粗又硬又大视频| 亚洲欧美精品自产自拍| 亚洲欧美一区二区三区国产| 免费人妻精品一区二区三区视频| 国产 精品1| 2022亚洲国产成人精品| 18禁动态无遮挡网站| 中国三级夫妇交换| 日韩一区二区视频免费看| 黑人欧美特级aaaaaa片| 日韩熟女老妇一区二区性免费视频| 国产免费视频播放在线视频| 免费女性裸体啪啪无遮挡网站| 欧美国产精品va在线观看不卡| 男女下面插进去视频免费观看| 亚洲综合精品二区| 男女免费视频国产| 欧美日韩精品网址| 国产熟女午夜一区二区三区| 一级爰片在线观看| 黑人欧美特级aaaaaa片| 熟女电影av网| 欧美日韩av久久| 精品人妻偷拍中文字幕| 国产精品久久久久久精品古装| 哪个播放器可以免费观看大片| 啦啦啦啦在线视频资源| av福利片在线| 97在线人人人人妻| 国产精品三级大全| 国产一区二区三区综合在线观看| 美女福利国产在线| 天天操日日干夜夜撸| 蜜桃国产av成人99|