OregonStateUniversity,USA,范德堡大学美国排名多少

美国 4
AnAlgorithmandHardwareArchitectureforIntegratedModularDivisionandMultiplicationinGF(p)andGF(2n) Lo’aiA.TawalbehandAlexandreF.TencaSchoolofElectricalEngineeringandComputerScience, OregonStateUniversity,USAemail:tawalbeh,tenca@ece.orst.edu Abstract ThispaperpresentsanalgorithmandarchitecturethatintegratesmodulardivisionandmultiplicationinbothGF(p)andGF(2n)fields(Unified).ThealgorithmisbasedontheExtendedBinaryGCDalgorithmformodulardivisionandontheMontgomery’smethodformodularmultiplication.Forthedivisionoperation,theproposedalgorithmusesacountertokeeptrackofthedifferencebetweentwofieldelementsandthiswayeliminatetheneedparisonswhichareusuallyexpensiveandtime-consuming.Theproposedarchitectureefficientlysupportsalltheoperationsinthealgorithmandusescarry-saveunifiedaddersforreducedcriticalpathdelay,makingtheproposedarchitecturefasterthanotherpreviouslyproposeddesigns.ExperimentalresultsusingsynthesisforAMI0.5µmCMOStechnologyareshownparedwithotherdividersandmultipliers.
1.Introduction Giventheincreasingdemandformunications,cryptographicalgorithmswillbeembeddedinalmosteveryapplicationinvolvingexchangeofinformation. Modulararithmeticoperationssuchasdivisionandmultiplicationoverfinitefields(mostlyGF(p)andGF(2n)),arewidelyusedinseveralcryptographicapplications.Ingeneral,public-keycryptographicalgorithmsrequiresahugeamountputations,therefore,thereisanincreasingdemandfordedicatedhardwaretoelerateputations.AnalgorithmandhardwareimplementationthatisableputemodulardivisionandmultiplicationinbothGF(p)andGF(2n)isdefinitelyadvantageous. Themodularmultiplicationoperationwithalargemodulusisveryimportantinmanypublic-keycryptosystems.Montgomerymultiplicationalgorithm[11]isanefficientwayputemodularmultiplication.Authorsin[5]proposedaunifiedalgorithmputeMontgomerymultiplicationinbothGF(p)andGF(2n).Ontheotherhand,modulardivisionandmodularinversearetimeconsumingoperationsandtheycannotbepletelyinpracticalapplications.Forinstance,theyareusedindecryptionofElGamal[7]public-keycryptosystem.Researchershaveclaimedthatagaininperformancecanbeobtainedwhentheyareimplementedinhardware[1,6]. TheUnifiedmodularDivision/MultiplicationAlgorithm(UDMA)proposedinthisputesmodulardivisionandMontgomerymultiplicationinbothGF(p)andGF(2n)fields.TheUDMAisbasedontheExtendedBinaryGCDalgorithm[2,12]for modulardivisionandontheMontgomery’smethodformodularmultiplication.Forthedivisionoperation,theproposedalgorithmusesacountertokeeptrackofthedifferencebetweenintegersinGF(p)orpolynomialdegreesinGF(2n),reducingplexityofeachalgorithmiterationbyeliminatingtheneedparisonsoffieldelements.Othermodulardivision(inverse)algorithmshaveintegerandpolynomialparisons[6,8,9]aspartoftheircontrolflow. Thehardwarearchitecturefortheunifieddivider/multiplierthatimplementstheUDMAefficientlysupportsallputationsinthealgorithmandusesCarrySaverepresentationsothatalladditionsandsubtractionsaredoneinconstanttime,independentoftheoperandsprecision. ThefollowingSectionpresentssomegeneralconceptsandmathematicalnotationusedinthispaper.TheUnifiedmodularDivision/MultiplicationAlgorithm(UDMA)isproposedinSection3.TheanizationofthehardwaredesignthatimplementstheUDMAanditsmainfunctionalblocksareshownanddescribedindetailinSection4.ExperimentalresultsareshowninSection5,followedbyconclusionsinSection6.
2.GeneralConceptsandNotation TheelementY(x)∈GF(2n)isapolynomialofdegreelessthannandhascoefficients inGF
(2).Inthispaper,weconsiderpolynomialbasisfortheelementsofGF(2n).On theotherhand,theelementY∈GF(p)isanintegerintheset{1,...,p−1},wherepisa n-bitprimemodulusintherange2n−1X,
Y, p and Field. The computation
Z = XY mod p is performed when Field = GF(p), and Z(x) = X(x)Y(x) modp(x)isdonewhenField=GF(2n). In bothcases,Y=
0.ThemodularinversecanbeobtainedbyforcingX=
1.Noticethat thedivisionorinversetakesthesameamountoftime. •UnifiedMontgomeryModularMultiplicationAlgorithm:Montgomeryintroducedan efficientalgorithmtocalculatemodularmultiplication[11].AMontgomeryalgorithm tohandlemultiplicationinGF(p)andGF(2n)finitefieldswasintroducedin[5].The Montgomerymultiplicationalgorithmgeneratestheproductoftwon-bitintegersXand Yinmodulopas:MM(
X,Y)=XYr−1modp,wherer=2nfor2n−13.UnifiedAlgorithmforModularDivisionandMontgomeryModularMultiplication. TheUnifiedmodularDivision/MultiplicationAlgorithm(UDMA)isstatedasfollows: Function:ModularDivisionandMultiplicationinGF(p)andGF(2n) Inputs:0≤XZ =XY2−n mod p when Op=mult,
Z = XY mod p when Op=div. Algorithm: C=
Y. IFOp=multTHEN /*MultiplicationMode*/ D=
0,U=
0,W=
X,δ=n ELSE /*DivisionMode*/ D=p,U=
X,W=
0,δ=
0 ENDIF; WHILE[(C=0ANDOp=div)OR(δ=0ANDOp=mult)] IFc0=0THEN C:=C1 δ:=δ−
1 /*IntegerOperation*/ ELSE k=
1 IF(Op=div)THEN IFδ<0THENC⇔
D,U⇔
W,δ:=−δENDIF;/*Swapping*/ IF((C+D)mod4=0ANDField=GF(p))THENk=−
1 ELSEδ:=δ−1ENDIF; ELSE /*Op=mult*/ δ:=δ−
1 ENDIF; C:=(C+k∗D)
1,U:=(U+k∗W) ENDIF; U:=(U+u0∗p)
1 ENDWHILE; IFOp=divTHENZ:=
W ELSEZ:=
U ENDIF; Tothebestofourknowledge,thisisthefirstalgorithmthatintegratesputationofmodulardivisionandMontgomerymodularmultiplicationinbothGF(p)andGF(2n).TheUDMAmodeofoperationiscontrolledbyinputOp(divormult),andthefinitefieldiscontrolledbytheinputField(GF(p)orGF(2n)).Forsimplicity,thepolynomialsX(x),Y(x),andp(x)aredenotedasX,Y,andp,respectively,whichcorrespondtothebit-vectorrepresentationofthesepolynomials. Mostofthearithmeticoperationsinthealgorithmmontobothmodesofoperation.Theinitializationofvariablesdependsontheoperation.Foragivenfield,alltheadditions/subtractionsaredoneinthefield,besidesthearithmeticoperationsonδ(subtractionsandchangeofsign)whicharealwaysintegeroperations. ThealgorithmintegratestheExtendedBinaryGCDalgorithmandtheMontgomerymultiplicationalgorithmanditwasverifiedusingMaple.puteMontgomerymultiplicationusingann-bitmodulusp,UDMAperformsniterations.Thecounterδisinitializedwithvaluen,andineachiterationitisdecrementedbyone.Thevariablesusedinthealgorithmareinitializedas:C=
Y,D=
0,U=0,andW=
X.Theresultisready(Z=U)whenδ=
0.ThepartialproductUisreducedmodpineachiteration.Inbothfields,whilemultiplying,additionisusedintheoperationsthatupdateCandD(k=1).Theoperatorindicatesa1-bitrightshiftoperation. TheputesmodulardivisionusingthesamestructureusedbytheExtendedBinaryGCDAlgorithmformodulardivision[2].Thevariablesareinitializedas:C=
Y,D=p,U=
X,W=0,andδ=
0.IfthedivisionputedinGF(p),UDMAteststheleastsignificanttwobitsofCandD((C+D)mod4=0)toconditionallysubtractsCfromD(setk=−1).Otherwise,CisalwaysaddedtoDinbothfields.ThedivisionpletedwhenC=0,andthefinalresultisavailableinW.MoredetailsabouttheoperationoftheUDMAarepresentedinthefollowingSection.
4.OverallHardwareSystemArchitecture Figure1showslevelarchitectureoftheUnifiedModularDivider/Multiplier(callitUMDM)thatimplementstheUDMA.ThemainfunctionalblocksareRegisterFile(RF),Datapath,andControl.Moredetailsabouttheseblocksareprovidednext. Op Field n Control 333 n input(
X,Y,P) Load 2vectors2n indstsrc1src2out1 RegisterFile(RF)
Y (2vectors) 2nA out2(2vectors)B 2n Load UMDMDatapath Sum/Carry (2vectors)2n Figure1.Toplevelhardwarearchitectureoftheunifiedmodulardivider/multiplier(UMDM). 4.1.RegisterFile Theregisterfilehasfiveregisters(R1toR5).SinceputationsaredoneinCarry-Saveform,eachintermediatevariable(
C,U,
D,W)isrepresentedintwovectors(sum,carry).So,theregistersinsidetheregisterfilearedesignedtostoretwon-bitvectors.Inotherwords,theithregisterRiisrepresentedasRi=(sum,carry)=(Ris,Ric). Theregisterfilehasoneinput,andtwooutputports.TheControlblockprovidestheregisterfilewiththesignalsnecessarytoperformreading/writingoperations.The3-bitsignaldstdeterminesthedestinationregistertobewritten.Thesignalssrc1,src2(3-bitseach),specifytheregisterstobereadatoutputportsout1,out2,respectively. 4.2.UDMADatapath Then-bitdatapathimplementingtheUDMAisshowninFigure2.Each“whileiteration”ofthealgorithmisimplementedin1clockcycleformultiplicationmode,and in3clockcyclesfordivisionifCisodd,and2clockcyclesifCiseven,asexplainedlater. As Ac Bs Bc LoadYshiftY FSEL a b c NEGCin UnifiedCarry-SaveAdder1withComplement(UCSA1)
Y LSBitofU(u0) complementer Yshifter (c0) control FSEL AND UnifiedCarry-SaveAdder2(UCSA2) Cin result_shiftershshift Sum Carry Nsel_zero Figure2.Unifieddatapathofthemodulardivider/multiplier(UMDMdatapath). Theproposeddatapathhastwoinputsrepresentedincarry-saveformasA=(As,Ac)andB=(Bs,Bc)whichreceivetheirvaluesfromtheregisterfileports(out1,out2),respectively.Theponentsofthedatapatharetwo(3-2)UnifiedCarry-SaveAdders(UCSAs)whicharesimilarplexitytofull-adders[5].TheunifiedadderscanperformbitadditionwithorwithoutcarrydependingontheinputFSEL(FieldSelect). Theunifiedaddermaybeusedtoimplementaredundantornon-redundantadder.Theuseofnon-redundantformoftheoperandsandresultsreducestheregisterareabutincreasestheadditiontime(becauseofcarrypropagation).WedecidedtouseCarrySaveadderstomaketheadditiontimeconstantandindependentoftheoperand’sprecision. 4.2.1.UnifiedCarry-SaveAdders:ThefirstadderinthedatapathisaUnifiedCarry-SaveAdderplement(UCSA1).Figure4.ashowsthebitslicediagramforthisadderandFigure4.bshowstheconnectionofnslicestoformann-bitadder.TheUCSA1outputsare:(sum,carry)=a+b+cwhenNEG=Cin=0,and(sum,carry)=a+b−cwhenNEG=Cin=
1.AdditionandsubtractioninGF(2n)arethesame. NEGcbaFSEL (a)bitslice sumcarry an-1-
1 FSEL a1b1c1 FSEL a0 FSEL b0c0 NEG UCSA11-bit UCSA11-bit carryn sumn-1carryn-
1 sum1carry1 (b)n-bitadder UCSA11-bit Cinsum0carry0 Figure3.Unifiedcarry-saveadderplement(UCSA1)for1-bitandn-bitprecision. AswecanseefromFigure3,thechangeofsignoperationinsidetheUCSA1isdoneinparallelwiththeadditionoperation,andso,itdoesnotaddtothecriticalpath. TheANDgateshownatofUCSA2inFigure2isusedtoselectbetweenthevalue0orthemoduluspdependingwetherUisevenorodd(thisisdeterminedbytestingu0),respectively.TheinputsignalselzerowhenassertedforcesthethirdinputoftheUCSA2tozero. ThedelayoftheUMDMdatapath(tdatapath)isdeterminedbythedelayofthetwoUCSAs,thedelayoftheANDgatebetweenthem,andthedelayoftheresultshifterwhichhasadelayof2-inputmultiplexer(tMUXtXOR).ByintegratingtheANDgatewiththesecondadder(shownindashedboxinFigure2),itsdelaywillnotaddtothepathdelay.KnowingthateachUCSAhasadelayofafulladder(tFA=2tXOR),weget: tdatapath=tUSCA1+tUCSA2+tresultshifter=4tXOR+tMUX=5tXOR TheYshiftershowninFigure2isashiftregisterusedtoimplementtheoperation(C>>1)onlyinthemultiplicationmode.ItisloadedwiththeinputY(multiplier)whenlaodY=
1.WhenshiftY=
1,Cisshiftedrightby1-bit(rememberthatC=Y).TheleastsignificantbitoftheshiftedCgoestothecontrolsectiontobetested(c0=0). Thedatapathoutputs(Sum,Carry)areshiftedright1-bitbycorrectwiringusingtheresultshifterattheoutputoftheUCSA2.Thismoduleisimplementedastwo2-inputmultiplexerswithselectlineshift.Whenshift=1,theshiftedoutputsareselected. 4.3.ControlBlock ThissectiondescribestheControlblockandthesystemoperationduringdivisionandmultiplicationmodesandthedesigntechniquesusedtoimplementtheproposedUMDM. WhenLoad=1,theregistersinsidetheregisterfileareinitialized(bytheuser)withtheinputsX,Y,andpdependingontheoperationtobeperformedbythealgorithm. Whilethealgorithmisindivisionmode,thetestC=0mustbedone.ThevectorCisinCarrySaveform.SimulationresultsshowthatC=4or−4attheendputation.Ifmoreiterationsthanrequiredareperformed,theresultisstillcorrect.IntheproposedUMDMdesign,thetestC=0isreplacedbytestingthebitvectorCforspecificvalues(4or-4),andnoneedpareallthebitsofCwithzero. 4.3.1.MultiplicationMode:Theproposedmultiplier/dividerperforms1iterationofthealgorithmineachclockcycleputingMontgomerymultiplicationinbothfields.Onlythreeregistersareused.Table1showstheloadingphaseformultiplicationanddivision(initializationoftheintermediatevariables)asdescribedbythealgorithm. Table2showstheoperationoftheunifieddivider/mulitplierwhenperformingMontgomerymultiplicationinGF(p)orGF(2n).Itshowsthemaincontrolsignalsappliedtotheregisterfileandthedatapathputemultiplication.ThevaluesofthesignalswhicharenotmentionedintheTableareequaltozero.DependingonC(evenorodd),adifferentsetofsignalsareused.TheTablealsoshowstheinterpretationofthesesignalsonthedatapath,andthecorrespondingoperationintheUDMA.Whenc0=1,twoadditions(U:=(U+k∗W)andU:=(U+u0∗p)>>1)areperformedbythedatapath. ThevariableCisshiftedrightonebitusingtheYshifterineverycycle(shiftY←1).Thecounterδisinitializedwithntoindicatethenumberofiterations.Sinceδisdecre- Table1.Loadphaseformultiplicationanddivision Multiplication Division Variables/parametersLoadingtheRFrepresentedbyRR←(Rs,Rc) Variables/parametersLoadingtheRFrepresentedbyRR←(Rs,Rc) Wandp R1←(X,p)
W R1←(X,0)
U R2←(0,0)
U R2←(0,0) p R3←(0,p)
D R3←(p,0) Variable LoadingtheShifter
C R4←(Y,0)
C YShifter←(Y) p R5←(0,p) Table2.TheoperationoftheUMDMduringMontgomerymultiplication ControlSignals c0=0(then)DatapathInterpretation ControlSignals c0=1(else)DatapathInterpretation src1←(010)src2←(011)dst←(010)shiftY←1shift←1decdelta←
1 (As,Ac)←(Us,Uc)(Bs,Bc)←(0,p)storetheresultinUC>>1theresult>>1δ=δ−
1 src1←(010)src2←(001)dst←(010)shiftY←1shift←1decdelta←
1 (As,Ac)←(Us,Uc)(Bs,Bc)←(X,p)storetheresultinUC>>1theresult>>1δ=δ−
1 Computes: U:=(U+u0∗p)>>1U:=(U+k∗W),U:=(U+u0∗p)>>
1 mentedbyoneineachiteration(decdelta=1),putationendswhenδ=0,andtheresultcanbereadformR2. 4.3.2.DivisionMode:EachmodulardivisioniterationintheproposedUMDMarchitecturerequires2clockcyclesifCisevenand3clockcyclesifCisoddinbothfields.TheinitializationofvariableswasshowninTable1. LetusassumethatthealgorithmputingmodulardivisioninGF(p).Tables3and4showthecontrolsignalsduringdivisionwhenCisevenandodd,respectively.Table3showswhenδispositive,theexpressionU:=(U+u0∗p)>>1putedinthefirstclockcycle.Cisshiftedrightby1-bit(C>>1)inthesecondcycleinordertoenablethetestforc0=0whichdecidesonthecontrolsignalsofthenextiteration. Table3.TheUMDMoperationputingdivisionwhenc0=
0 c0=0(then) State1:Original State2:Swapped ClockControl Datapath Control Datapath CycleSignals Interpretation Signals Interpretation cycle1cycle2 src1←(001)src2←(101)dst←(001)shift←1src1←(100)selzero←1dst←(100)shift←1decdelta←
1 (As,Ac)←(Us,Uc)(Bs,Bc)←(0,p)storeresultinR1(U)result>>
1 U:=(U+u0∗p)(As,Ac)←(Cs,Cc)(Bs,Bc)←(0,0)storeresultinR4(C)result>>1δ=δ−
1 C:=(C)>> src1←(010)(As,Ac)← src2←(101)(Bs,Bc)← dst←(010)storeresult shift←
1 result>> >>1puted src1←(011)selzero←1dst←(011)shift←1decdelta←11puted (As,Ac)←(Bs,Bc)←storeresultresult>>δ=δ−
1 (Ws,Wc)(0,p)inR2(W)1(Ds,Dc)(0,0)inR3(D)
1 ClockCyclecycle1cycle2cycle3 Table4.TheUMDMoperationduringdivisionwhenc0=
1 c0=1(else) State1:Original State2:Swapped Control Datapath Control Datapath Signals Interpretation Signals Interpretation src1
←(001)(As,Ac)←(Us,Uc)src1←(010)(As,Ac)←(Ws,Wc) src2←(010)(Bs,Bc)←(Ws,Wc)src2←(001)(Bs,Bc)←(Us,Uc) dst←(001)storeresultinR1(U)dst←(010)storeresultinR2(W) shift←
0 resultisnotshiftedshift←
0 resultisnotshifted decdelta←∗,*=1whenTEST=0,otherwise*=
0.(δ=δ−1) U:=(U+k∗W)puted src1←(001)(As,Ac)←(Us,Uc)src1←(010)(As,Ac)←(Ws,Wc) src2←(101)(Bs,Bc)←(0,p) src2←(101)(Bs,Bc)←(0,p) dst←(001)storeresultinR1(U)dst←(010)storeresultinR2(W) shift←
1 result>>
1 shift←
1 result>>
1 negatedelta←∗,*=1whenδ<0,otherwise*=
0.(δ=−δ) U:=(U+u0∗p)>>1puted src1←(100)(As,Ac)←(Cs,Cc)src1←(011)(As,Ac)←(Ds,Dc) src2←(011)(Bs,Bc)←(Ds,Dc)src2←(100)(Bs,Bc)←(Cs,Cc) dst←(100)storeresultinR4(C)dst←(011)storeresultinR3(D) shift←
1 result>>
1 shift←
1 result>>
1 C:=(C+k∗D)>>1puted TheControlduringdivisionhastwomajor”states”:Original(notswapped)andSwapped.Thecontrolgoesbackandforthbetweenthesestatesonlywhenc0=1andδ<
0.Thesignofδismadepositive(δ=−δ)whenthisconditionhappens(realizedbythecontrolsignalnegatedeltaasshowninTable4).WhenthecontrolstateisOriginal,thevariablesW,
U,D,CarereadfromtheregistersR1,R2,R3,R4,respectively.Also,theupdatedvaluesofthesevariablesarewrittentotheaboveregistersinthesameorder.WhenthecontrolstateisSwapped,thesourcesforthevariablesWandUarereversed(WisreadfromR2andUfromR1).SamethingappliesforCandD(CisreadfromR3andDfromR4).Thesameapproachisusedforwritingtothesevariables.Thisway,theswapoperationisrealizedbyreadingfrom/writingtothecorrectregister. Thevalueofk∈{-1,1}isdeterminedbythecontrolsectiondependingontheresultofthetest((C+D)mod4=0ANDField=GF(p)),whichisdenotedby(TEST)inTable4.Inthecasek=−1,negativeDandnegativeWareobtainedbysettingN=
1.SincetheTESTisdoneontheleasttwosignificantbitsofCandD,itcanbeimplementedusingatwo-levelwork. IfthealgorithmputingthemodulardivisioninGF(2n),thesameproceduredescribedaboveisfollowed,exceptthattheTESTisnotapplicable(Field=GF(2n)).Forbothfields,putationpletedwhenC=0,andtheresultisZ=
W. 5.ExperimentalResultsandComparisons Thissectionhastwocategoriesofexperimentalresults:(a)numberofiterationsobtainedfromaMaplemodeland(b)thecriticalpathdelayresultsobtainedbysynthesisoftheVHDLdescriptionofthealgorithm. 5.1.TheNumberofIterations Maplewasusedtodescribetheproposedalgorithm(Alg1=UDMA)andtheunifiedMontgomeryinversealgorithmpresentedin[6](Alg2).Atleast100randomsampleswereusedtoverifyeachalgorithmoperationandobtainstatistics. Theotherdivision(inversion)algorithmparedistheunifiedMontgomeryinversealgorithmpresentedin[6](Alg2).Forconsistency,nomultiple-wordcalculationisconsideredhere.Forann−bitinputY,putesZ=Y−12k,wheren≤k≤2nisthenumberofalgorithmiterations.AcorrectionstepisneededtogettheinverseintheMontgomerydomain(Y−12n)orintheintegerdomain(Y−1).Therefore,thetotalnumberofiterationsrequiredputetheinverseinMontgomerydomainis2k−n.putethemodularinverseintheintegerdomainitneeds2kiterations. parisonswereusedinAlg2parethesizeofthebitvectorsthatrepresentelementsinthefield(thesamewayitwasdonein[1,8,9])insteadofthecounter(δ)usedinouralgorithm.parisonlimitsafasthardwareimplementation. Figure4showsthenumberofiterationsasafunctionofoperandsizerequiredbyAlg1(UDMA)andAlg2putetheintegermodularinverse(division)inGF(p)andGF(2n).Thesizeoftheoperandsisintherange(160to512)bits.Fromfigure4,wecanseethatAlg1executesinabout25%lessiterationsthanAlg2putingtheinverseintheintegerdomainforGF(p).ForGF(2n),Alg1hasabout40%lessiterationsthanAlg2putingtheinverseintheintegerdomain.NoticethatthenumberofiterationsforAlg1inbothfieldsincreaselinearlywiththeoperandprecision. Iterations 18001600140012001000 800600400200 00 100 200 300 400 Operandsize(bits) Alg1GF(p)Alg2GF(p)Alg1GF(2^n)Alg2GF(2^n) 500 600 Figure4.ThenumberofiterationsasafunctionofoperandsizerequiredbyAlg1(UDMA)andAlg2(presentedin[6])putethemodularinverseinGF(p)andGF(2n)
J.Goodmanetal.presentedin[8]apublic-keyprocessorthatimplementsoperationsrequiredforEllipticCurveCryptography(ECC)includingmodularinverseinGF(2n)only.Itisstatedin[8]thattheinversioninGF(2n)takesonaverage3.3cyclesforeachbit.Alg1needsamaximumof2iterations/bitandonaverage1.5iterations/bitputethemodularinverseinGF(2n).TheUMDMarchitectureperformseachiterationofAlg1in2.5clockcyclesonaverage.Therefore,theGF(2n)inversionbyAlg1takesonaverage1.5×2.5=3.75cyclesforeachbit.Butontheotherhand,thecriticalpathdelay(clockperiod)ofthedatapathoftheprocessorpresentedin[8](callittGµP)isthedelayof2fulladders(tFA)and2levelsofmuxesandanANDgate (tGµP=2tFA+2tmux+tAND),paredtoacriticalpathdelayofonly2fulladdersand1levelofmuxesfortheUMDMdatapathproposedinthiswork(tUMDM=2tFA+tmux).Thecriticalpathdelaywillaffecttheputationaltimeaswillbeseenlater. putesMontgomerymodularmultiplicationinncycles,whichisconsistentwithseveralotherdesigns[5,8,11]. 5.2.SynthesisResults TheexperimentaldatapresentedinthissectionweregeneratedusingMentorGraphicsCADtools.ThetargettechnologywassettoAMI05fastauto(0.5µmCMOSwithhierarchypreserved)providedintheASICDesignKitfromthepany[3]. TheUMDMarchitecturewasdescribedinVHDLandsimulatedinModelSimforfunctionalcorrectness.ItwassynthesizedusingLeonardotoolforthementionedtechnology. Figure5showsthecriticalpathdelays(innano-seconds)oftheUMDMfortheprecisionrange(160-512)bits.TheFigureshowsthatthedelayincreaseswiththenumberofbitsanditsaturatesathigherprecision.Thisindicatesthatthecriticalpathdelay(clockperiod)oftheUMDMbecameindependentoftheoperandssizeathighprecisions. 13 Time(nanoseconds) 120 100 200 300 400 500 600 Operandsize(bits) Figure5.ThecriticalpathdelayoftheUMDMinnano-seconds(operandsizefrom160-512bits). Thepublic-keyprocessorpresentedin[8](GµP)runsatclockrateof50MHz(clockperiod=20nsec),anditisconsideredagoodrepresentativeofthisclassofhardwaredesigns.Thedivider/multiplierproposedinthisworkhasamaximumclockperiodof12.8nsecat512-bitoperandsize,whichisabout1.6timesfasterthanGµ
P.Also,asmentionedbefore,theGµPtakes3.3cycles/bittoperformamodularinverseinGF(2n)andthisdesigntakes3.75cycles/bit.LetthetotalputationtimeofagivendesignTdesign: Tdesign=(cycles/bit)∗n∗clockperiod wherenistheoperandsizeinbits.Atn=512-bits,theputationtimeof GµP(TGµP)is:TGµP=3.3∗512∗20×10−9=33.79µsecandtheputationtimeforthisdesign(TUMDM)is:TUMDM=3.75∗512∗12.8×10−9=24.57µsec.By comparingTGµPandTUMDM,wefindthattheUMDMproposedinthisputes division in 33.79−24.5733.79 = 27.3% less total computation time than the Gµ
P. Ontheotherside,theareafortheUMDMdesignwasextractedfromtheexperimental dataas:AUMDM=236.12∗n+180=O(n)gates. Thearchitecturepresentedin[4]isdedicatedtoGF(2n)only,anditusesparisonsofthefieldpolynomials.ThedesignhasanplexityofO(n)too.
6.Conclusion InthisworkwepresentaUnifiedmodularDivision/MultiplicationAlgorithm(UDMA)anditscorrespondingarchitecture.ThealgorithmisabasedontheBinaryGCDalgorithmformodulardivisioninbothfields,andontheMontgomerymodularmultiplicationalgorithm.TheputesthedivisionineitherGF(p)orGF(2n)fieldsinanefficientwayparedwithotheralgorithms.Itusesacountertokeeptrackofthedifferencebetweenthevaluesofelementsinthefield,reducingplexityofeachiterationandmakingitfasterthantheiterationofotheralgorithmsthatuseparisons.paredwithMontgomerymultipliers,theproposedalgorithmhasthesamenumberofiterationsplexity. TheUnifiedDivider/Multipliermodulethatimplementsthealgorithmefficientlysupportsallitsoperationsandusescarry-saveunifiedaddersforreducedcriticalpathdelay. Experimentalresultsshowthatputationtimeoftheproposedsolutionpetitivewithotherdedicated(limited)solutionsandthecostinareapaidtohavetheintegrationofdivisionandmultiplicationisnothigh.Multiplicationisdonealmostasfastasinadedicatedmultiplierwiththeaddedfunctionalityofdivisionandtheflexibilitytochoosetherequiredfinitefield. AcknowledgementsThisworkwassupportedbytheNSFCAREERgrantCCR0093434andbytheJordanUniversityofScienceandTechnology(JUST). References [1]
A.A.Gutub,
A.F.Tenca,
E.SavasandC.K.Koc.Scalableandunifiedhardwareputemontgomeryinverseingf(p)andgf(2n).InProc.CHES2002,pages484–499. [2]
A.F.TencaandL.A.Tawalbeh.AnAlgorithmforUnifiedModularDivisioninGF(p)andGF(2n)SuitableforCryptographicHardware.IEEElectronicsLetters,40
(5):304–306,March2004. [3]ASICDesignKit.MentorGraphicsCo./partners/hep/AsicDesignKit/dsheet/ami05databook.html. [4]
A.D.DaneshbehandM.A.Hasan.AUnidirectionalBitSerialArchitectureforDouble-BasesDivisionoverGF(2m).InIEEE16thSymposiumonComputerArithmetic. [5]
E.Savas,
A.F.TencaandC¸.K.Ko¸c.AscalableandunifiedmultiplierarchitectureforfinitefieldsGF(p)andGF(2m).InProc.CHES2000,pages281–296. [6]
E.SavasandC.K.Koc.Architecturesforunifiedfieldinversionwithapplicationsinellipticcurvecryptography.InThe9thIEEEinternationalconferenceonElectronic,Circuitsandsystems,2002. [7]
T.ElGamal.Apublickeycryptosystemandsignitureschemebasedondiscretelogarithms.IEEETrans.-InformationTheory,IT-31
(4):469–472,July1998. [8]
J.GoodmanandA.P.Chandrakasan.Anenergy-efficientreconfigurablepublic-keycryptographyprocessor.IEEEJournalofsolid-statecircuits,36(11):1808–1820,November2001. [9]
J.Wolkerstorfer.Dual-fieldarithmeticunitforgf(p)andgf(2n).InCHES2002,pages500–514.[10]
D.Knuth.TheArtofComputerProgramming,Volume2,SeminumericalAlgorithms.3rdedition, 1998.[11]
P.L.Montgomery.Modularmultiplicationwithouttrialdivision.MathematicsofComputation, 44(170):519–521,April1985.[12]
N.Takagi.AVLSIAlgorithmforModularDivisionBasedontheBinaryGCDalgorithm.IEICE Trans.fundamentals,E81-A
(5):724–728,May1998.

标签: #多少钱 #要多 #多少钱 #多少钱 #需要多少钱 #概要 #域名 #长沙