Warning: closedir() expects parameter 1 to be resource, null given in /home/aware/public_html/etc/alpha/index.php on line 23
.aware alpha zine - ECDLP Quick & Dirty
.aware eZine Alpha - Overground Hacking
α
Ω


[=========================================================================]
[-------------------------[ ECDLP Quick & Dirty ]-------------------------]
[=========================================================================]


       _==####==_
     .############.
   .################.
  .##################.__ __ __ __ _ _ __ __ 
  ##############/¯_`|#\ V  V // _` | '_/ -_)
  ##############\__,|# \_/\_/ \__,_|_| \___|
  ###########*¯*######                                     
  *#########{   }####* 
   ##########._.#####
    ################  (C) 2006 by .aware computer security research group
     *############*
       ¯==####==¯         Web: http://awarenetwork.org
                       Author: rattle@awarenetwork.org



------------------------------------------------[ Table of Contents ]-----

    i.  Abstract
   ii.  Preface
  iii.  Mathematical Background
   iv.  Implementation Issues - Fields
    v.  Implementation Issues - Curves
   vi.  Implementation
  vii.  Conclusion
 viii.  References
  

-[ i ]----------------------------------------------------[ Abstract ]-----


 We aim to implement a very small and versatile assymetric key exchange
 algorithm for x86 platforms, based on the Elliptic Curve Discrete
 Logarithm Problem (ECDLP). This algorithm allows 2 parties to exchange a
 key of sufficient length for subsequently encrypting traffic with a 
 symmetric cipher.

 NOTE: This paper might offend the mathematically schooled among you
 because I was not exactly precise, let alone thorough. 


-[ ii ]----------------------------------------------------[ Preface ]-----


                 "I encrypt all my data with one-time pad's that consist
                  of an electrometre measuring me urinating into the
                  toilet. I'm secure."
                                                               - sirukin


 It has come to my attention that modern cryptography, while being on
 everyone's favourite feature list, is hardly found explained in papers
 with the due clarity it requires. More accurately, most articles give
 you a rough idea of how certain cryptographic schemes work and,
 succeed in completely failing to give you enough insight (or tools) to
 implement an actually secure, working algorithm. In particular, I am 
 referring to Elliptic Curve Cryptography (ECC), which I will be
 revolving around for the rest of the night.

 If everyone has such a compulsive thirst for knowledge - why the utter
 lack of enlightenment? Pretty simple. Just like the less sophisticated
 asymmetric schemes (like RSA), the basic idea of ECC can be wrapped up
 in a couple of fancy sketches and some first semester algebra. However,
 once you delve deeper into the topic, you will find that it is
 headblowingly complicated and not nearly as easy as they told you.

 Don't be intimidated, though. I will work around most problems of ECC
 with dirty tricks and hardcoded parameters. What we will get, though,
 is a working and secure ECDLP key exchange algorithm implemented in C
 for x86 machines. Mac users, rejoice. 



-[ iii ]-----------------------------------[ Mathematical Background ]-----


                   "Well, just because you don't understand it, does not
                    mean it's complicated."
                                                  - My Algebra Professor


 (1) Fields. A field is a mathematical structure F which behaves similar to
 the real or the rational numbers. For in time, I recommend you think of
 these structures as we analyze them in a more abstract fashion. Elements
 of a field can be added and multiplied. Each element a has a uniquely
 defined additive inverse (-a) such that a+(-a)=0. Also, each element a!=0
 has a multiplicative inverse (1/a) such that a * (1/a) = 1. Also, you have
 a*b=b*a, a*(b*c)=(a*b)*c and (a+b)*c = ac+bc. All this is obvious for real
 or rational numbers - but there are much more interesting structures that
 obey the same laws. As you might have already concluded, each field
 contains the Elements 0 and 1, the neutral Elements of Addition and
 Multiplication respectively. (Neutral because a+0=a, a*1=a.)

 Well, since it seems to be the smallest possible one, how about the field
 F2 = { 0, 1 }  with


   Addition:         Multiplication:
 
    + | 0 | 1         * | 0 | 1
   ---+---+---       ---+---+--- 
    0 | 0 | 1         0 | 0 | 0 
   ---+---+---       ---+---+--- 
    1 | 1 | 0         1 | 0 | 1
      '   '             '   '

 The addition is a simple XOR operation, the multiplication works just 
 as though 1 and 0 were integer numbers. This means, you use + and * on
 0 and 1 as though they were integer numbers, but the result of your 
 operation is the parity of the calculated number. Since there aren't that
 many scenarios, you can easily convince yourself that you can operate
 inside F2 just like you can operate on real and rational numbers.

 While it's an interesting and entertaining example, let's try to
 generalize the concept. For any prime p, let F(p) = { 0, 1, ..., p-1 } be
 the field where every operation is performed modulo p. This means, for
 any a,b in F(p), you have
 
    add(a,b) :=  (a + b) % p
    mul(a,b) :=  (a * b) % p

 It can be shown that the resulting structure is a field. Why primes? I 
 will not go into detail here, but p being a prime is necessary for the 
 existence of a  multiplicative inverse for every a != 0. For instance, if
 we'd try p = 6, you had mul(2,3) = (2*3)%6 = 0, and consequentially, 
 neither 2 or 3 would have a multiplicative inverse. However, for p=7:

    mul(1,1)               =  1 % 7 = 1
    mul(2,4) = (2 * 4) % 7 =  8 % 7 = 1
    mul(3,5) = (3 * 5) % 7 = 15 % 7 = 1
    mul(6,6) = (6 * 6) % 7 = 36 % 7 = 1

 These fields are known as prime fields. It can be shown that every finite
 field (a field with finitely many elements) has p to the power of m
 elements, where p is a prime and m a positive integer number. We have
 covered the fields with p elements already, what do the others look like?

 Consider polynomials with coefficients in F(p). As you might know,
 polynomials can be added and multiplied. Most axioms of a field are 
 already given, but we are lacking a multiplicative inverse for every
 polynomial of degree 1 or more. In fact, the polynomials over F(p)
 are, as a structure, very similar to the integer numbers - there's a
 polynomial division with rest, just like you have a division with rest for
 integers. Accordingly, you will also find prime polynomials. You will
 have to believe me when I tell you the following two things (or read a big
 book on algebra):

  - For any m, there exists a prime polynomial of degree m over F(p).
  - Performing every operation in F(p) modulo a prime polynomial of
    degree m yields a field structure with p to the power of m 
    elements.

 We are particularly excited about polynomials over F2, because they can
 easily be represented as a bitstring (the coefficients of the
 polynomial are either 1 or 0). We will refer to this structure as F(2,m)
 where (m+1) is the degree of the prime polynomial we use for reduction.

 I cannot venture too deep into the maths here. For now, I just want 
 you to know and trust me that F(2,m) works just like real or rational
 numbers - except for nasty things like a + a = (1 + 1) * a = 0 * a = 0.
  
 
 (2) Elliptic curves. What is an elliptic curve? An elliptic curve, for
 us, will be the set of all points (x,y) from any field which satisfy an 
 equation of the form


                       y² + xy = x³ + ax² + b                          (*)


 Note that this is not the general definition of an elliptic curve. Indeed,
 this is a "non-supersingular" curve. However, frankly, that's all we are 
 interested in.
 We will write, for given coefficients a and b:


  E(F) = { (x,y) : x in F, y in F, y² + xy = x³ + ax² + b } + { @ }


 In words: E(F) is the set of all tuples (x,y) with x and y in F, such that 
 x and y satisfy the curve equation (*). We also say that x=infinity and
 y=infinity satisfy the curve equation and call the point @ the "point at
 infinity". 

 The sketch below shows how these kinds of curves look over the real
 numbers:

                                                          
                             |    .
                             |     
                             |   .
                             |  .
                             | .
                             |. 
               . . .        .|
             .       . .. .  |
            .                |
           .                 |
       ----------------------+-----------------
           .                 |
            .                |
             .       . .. .  |
               . . .        .|
                             |.
                             | .
                             |  .
                             |   .
                             |     
                             |    .


 There is something particularly interesting about these elliptic curves.
 Provided that P and Q are points on the curve, draw a line through P and
 Q. It will intersect the curve in a third point R'. The sum of
 P and Q is then defined as the negative of R', which is simply the 
 reflection of R' about the x-axis. To add a point P to itself, you draw
 the tangent instead. 



                             |       /
                             |      .
                             |     /. R'
                             |    /  
                             |   / . 
                             |  /    
                             | /  .  
                             |/      
                             /   .    
                            /|  .    
                           / | .    
                          /  |.     
               . . .     /  .|      
             .       . .. .  |       
            .          / Q   |       
           .          /      |       
       --------------/-------+-----------------
           .        /        |       
            .      /         |
             .    /  . .. .  |
               . . .        .|
                / P          |.
               /             | .
              /              |  .
             /               |   .
            /                |     
           /                 |    .
          /                  |    
         /                   |     .
        /                    |      
       /                     |      . 
      /                      |      X  -R' = (P + Q)




 To be precise, for P = (x,y) we set -P = (x,-y). Set P + (-P) := @. 

 As an interesting result, E(F) together with this addition law is an
 abelian group (that means, adding these points technically works like 
 like adding integer numbers) with @ serving as the neutral element
 (the "zero"). 

 How can we put this structure to cryptographic use? First of all, we
 consider elliptic curves over finite fields. Also, define a function


    f(n,P) = nP = P + P + ... + P

                  <-- n times -->


 For very large n, this function is still easy to calculate for a given
 point P. However - given Q and P, it is very hard to find an integer 
 Number n such that Q = f(n,P). This is called the Elliptic Curve Discrete
 Logarithm Problem. Now, due to the abelian structure, you have 


    f(n*m,P) = f(n,f(m,P)) = f(m,f(n,P))


 for any two integers n and m. Let me show you how we can exchange keys
 with this little trick.


    (1) Alice and Bob agree on a Point P in E(F).
    (2) Alice secretly chooses a big random number n and calculates nP.
        Bob also chooses big random number m and calculates mP.
    (3) Alice sends Bob (nP), Bob sends Alice (mP).
    (4) Alice calculates n(mP) = nmP =: Q.
        Bob also calculates m(nP) = nmP = Q.
    (5) Bob and Alice use Q as the key for encrypting traffic.

 
 What does Eve know? She knows P, (nP) and (mP). To get Q, she'd need the
 product nm. However, this would only be possible by solving the ECDLP for
 (nP) and (mP) (or solving a similarly hard problem).

 And that's that, folks. Let's get down to business!



-[ iv ]-----------------------------[ Implementation Issues - Fields ]-----


  "If I recally correctly, the compiler does issue optimizer notes for
   this, it's just that they don't particularly stand out from the other
   notes so you might not realize that these particular optimization
   notes are OPTIMIZATION PROBLEMS WHICH WILL BURN ALL YOUR CPU CYCLES
   IN THE DEEPEST PITS OF COMPUTATIONAL HELL."

      - William Newman on run-time casting with SBCL (#lisp 2003/03/29)


 
 First of all, the ECDLP is hard only if the two following rather strong 
 conditions are met:


   - The number of elements in your finite Field is larger than at least
     2 to the power of 128. 256 bits is better, 512 scares the NSA.     
   - The number of points on your elliptic curve is almost prime, which
     means that it is of the form |E(F)| = h * p, where p is prime and
     h very small.


 This leads to the following problems:


  (A) Implement sufficiently fast finite field arithmetics for fields
      with approximately 1.34e+154 elements (that's around 512 bits).
   
  (B) Find a curve which has appropriate order.

 
 Problem (A) is our biggest problem, as (B) has been solved by government
 agencies and smart mathematicians around the world. So we're left with
 the task of implementing F(2,m) for m ~= 512. 

 Polynomials will be implemented as bitstrings of length m, stored in a
 little endian array of T = (m/32) machine words (32 bits). 


 (1) Shifting / Multiplication with x. Because we store polynomials as 
 bitstrings, shifting is a cheap linear operation. Only consider shifts
 of l<32 (for shifts larger than our wordsize of 32, we can shift
 word-wise first in linear time). We now have to shift and rotate through
 the carry bit at most 31*(m/32) ~ m times. 


 (2) Addition. Adding two polynomials in F(2,m) is trivially linear. 
 It is implemented as T XOR operations.


 (3) Multiplication. Multiplication might seem more complicated, but is
 easily implemented in O(m²). It works like this:

 --------------------------------------------------------------
   Bitwise Comb Polynomial Multiplication Algorithm
 --------------------------------------------------------------

   Input:   Polynomials p(x), q(x) as bitstrings
   Output:  c(x) = p(x) * q(x)

   SET c(x) = 0
   FOR i=0 TO m DO:
       IF bit i of p(x) is set:
           c(x) := c(x) + (q(x) << i)
   RETURN c(x)

 --------------------------------------------------------------

 You can achieve better complexity with a divide-and-conquer approach as
 explained in [K1]. However - for m as small as approximately 512 bit,
 benchmarks for bitwise comb turn out to be faster than benchmarks for
 divide&conquer, on several systems. Thus, for our purposes, the 
 algorithm is completely sufficient, if not optimal.


 (4) Squaring. Calculating p(x)² can be implemented as a very fast 
 linear operation due to the following observation which only  
 applies to fields where 1+1=0:

   (a + b + c)² = (a + b + c) (a + b + c)
     = a² + ab + ac + ba + b² + bc + ca + cb + c²
     = a² + b² + c² + (1+1)ab + (1+1)ac + (1+1)bc
     = a² + b² + c²

 Thus, given a polynomial p(x) as bitstring, we can calculate p(x)² by
 simply inserting zeros between all bits. For instance,

   (1100111)² = (1010000010101)

 With a wee bit of precomputation, this is easily done in 4T = m/8 
 steps.

 
 (5) Reduction. The algorithms for squaring and multiplication yield 
 polynomials of degree > m. These polynomials need to be reduced modulo
 a prime polynomial f(x) of degree m+1. For now, assume we have such an
 algorithm.

 
 (6) Inversion. Last, but definitely not least, we need a "division"
 algorithm. This means, for a given polynomial p(x), find the (uniquely
 defined) multiplicative inverse polynomial q(x), such that:

    p(x) * q(x) = 1 mod f(x)

 where f(x) is our reduction polynomial. To find the Inverse, we use
 the polynomial version of euclid's extended algorithm:
  
 --------------------------------------------------------------
   Euclid's Extended Algorithm (Polynomial)
 --------------------------------------------------------------
 
   Input:  Polynomials p(x) and f(x)
   Output: Polynomials a(x),b(x) and d(x) such that  
           a(x) * p(x) + b(x) * f(x) = d(x)

   SET a(x) = 1, a´(x) = 0,
       b(x) = 0, b´(x) = 1
   SET u(x) = p(x),
       d(x) = f(x)

   WHILE u(x) != 0 DO:
       SET j := deg(u(x)) - deg(d(x))
       IF j < 0:
           u <-> d
           a <-> a´
           b <-> b´
           j := -j
       u(x) := u(x) + ( d(x)<<j)
       a(x) := a(x) + (a´(x)<<j)
       b(x) := b(x) + (b´(x)<<j)

   RETURN a(x),b(x),d(x)
 
 --------------------------------------------------------------

 where "a <-> b" means "exchange a and b". 

 In this particular case, we always have d(x) = 1 because f(x) is prime,
 and we don't care about b(x). 


 The only thing that is left to be done is reduction. This is the only 
 algorithm that depends on the form of the reduction polynomial f(x). 
 This is also the only algorithm that requires a lot of work and 
 hand-tuned optimization. You can skip the rest of this chapter if you
 are not interested in this kind of stuff.

 We will now try to find a reduction polynomial of degree > 512 which
 is particularly suited for reduction. 

 On the other hand, why try to find one if one has already been found?
 The [F1] standard suggests reduction modulo the pentanomial

 --------------------------------------------------------------
   Pentanomial f571(x)
 --------------------------------------------------------------

   f(x)  =   x**571  +  x**10  +  x**5  +  x**2  +  1

         ~  (1<<571) + (1<<10) + (1<<5) + (1<<2) +  1
 
   = 0x 08000000 00000000 00000000 00000000 00000000 00000000
        00000000 00000000 00000000 00000000 00000000 00000000
        00000000 00000000 00000000 00000000 00000000 00000425

 --------------------------------------------------------------


 Write f(x) = h(x) + r(x). where h(x)= (1<<571). Also assume we have a
 polynomial p(x) of degree at most 2m = 1140. Write 

    p(x) = b(x)*h(x) + a(x).

 with polynomials b(x),a(x) of degree m. Because 

    b(x)h(x) = b(x)f(x) + b(x)r(x)

 We get that b(x)h(x) = b(x)r(x) mod f(x), thus we have

    p(x) = b(x)r(x) + a(x) mod f(x).


 Now we're getting down to the hard part. We need to devise a fast
 reduction algorithm. Assume that the bitstring of P is stored as an array
 of unsigned little endian machine words

   
    p(x) = P[2T], P[2T-1], ..., P[T], P[T-1], ..., P[0]


 In this case, our polynomial b(x) starts at bitoffset 26 of P[T-1], 
 because we assume a(x) to be a 570 bit polynomial which does not need to
 be reduced. Consider:

 
 <---- P[2T] --->     <--- P[T] ---> <------------ P[T-1] --------------->

 {1151} .. {1120} ... {608} .. {576} {575}{574}{573}{572}{571} {570} .....

 <---------------------------- B ----------------------------> <--- A --->


 This means we have to understand B[i] == (P[i+T]<<5) ^ (P[i+T-1]>>27).
 Also, obviously A[i] == P[i].


 Now, let's forget this for a while and think about how b(x)r(x) is 
 calculated. We have r(x) = (1<<10)+(1<<5)+(1<<2)+1 = 0x425. We get 

     B[i]*R = B[i] ^ (B[i] << 10) ^ (B[i] <<  5) ^ (B[i] << 2)
 
 and a (maximum 10-bit) carry:

     C(B,i) := (B[i] >> 22) ^ (B[i] >> 27) ^ (B[i] >> 30)

 Thus, in one step of our algorithm, we will have to do

 -------------------------------------------------------------------------

 FOR i = 1, ..., T-1 DO

     P[i]   := A[i]   ^ B[i] ^ (B[i] << 10) ^ (B[i] <<  5) ^ (B[i] << 2)
     P[i+1] := A[i+1] ^ C(B,i)

 -------------------------------------------------------------------------

 Consider what we had before and we get 

 P[i] 
  =   ((P[i+T]<<5) ^ (P[i+T-1]>>27))         ^   
    ( ((P[i+T]<<5) ^ (P[i+T-1]>>27)) <<  5 ) ^
    ( ((P[i+T]<<5) ^ (P[i+T-1]>>27)) <<  2 ) ^
    ( ((P[i+T]<<5) ^ (P[i+T-1]>>27)) << 10 ) ^ P[i]

  = P[i] ^ (P[i+T]<<5) ^ (P[i+T]<<7) ^ (P[i+T]<<10) ^ (P[i+T]<<15)

         ^ (P[i+T-1]>>27)      ^ (P[i+T-1]>>27)<<2
         ^ (P[i+T-1]>>27)<<5   ^ (P[i+T-1]>>27)<<10                    (*)


 P[i+1]
  = 
     ( ((P[i+T]<<5) ^ (P[i+T-1]>>27)) >> 22) ^ 
     ( ((P[i+T]<<5) ^ (P[i+T-1]>>27)) >> 27) ^
     ( ((P[i+T]<<5) ^ (P[i+T-1]>>27)) >> 30) ^ P[i+1]

  = P[i+1] ^ (P[i+T]<<5)>>22 ^ (P[i+T]<<5)>>27 ^ (P[i+T]<<5)>>30  


 however, one step later, P[i+1] will be XOR'd with (*). We have,
 if we look at a bitwise representation of X:

   Bits, numbered from A to 6:          Variable:    Label:

   ABCDEFGHIJKLMNOPQRSTUVWXYZ123456     (X)
 ^ 0000000000000000000000ABCDE00000     (X>>27)<<5   (A1)
 ^ 000000000000000000000000000FGHIJ     (X<<5)>>27   (A2)
 ^ 0000000000000000000000FGHIJKLMNO     (X<<5)>>22   (B1)
 ^ 000000000000000000000000000000FG     (X<<5)>>30   (C1)
 ^ 000000000000000000000000000ABCDE     (X>>27)      (D)
 ^ 0000000000000000000000000ABCDE00     (X>>27)<<2   (C2)
 ^ 00000000000000000ABCDE0000000000     (X>>27)<<10  (B2)

              /\                           /\         /\
              ||                           ||         ||
              \/                           \/         \/

   ABCDEFGHIJKLMNOPQRSTUVWXYZ123456     (X)
 ^ 0000000000000000000000ABCDEFGHIJ     (X>>22)       (A) = (A1)^(A2)
 ^ 000000000000000000000000000ABCDE     (X>>27)       (D)
 ^ 0000000000000000000000000ABCDEFG     (X>>25)       (C) = (C1)^(C2)
 ^ 00000000000000000ABCDEFGHIJKLMNO     (X>>17)       (B) = (B1)^(B2)
   

 Hence, we can rewrite our Algorithm to:

 -------------------------------------------------------------------------

 FOR i = T-1, ..., 0 DO
     SET X  := P[i+T]
     P[i]   := P[i]   ^ (X<<5) ^ (X<<7) ^ (X<<10) ^ (X<<15)
     P[i+1] := P[i+1] ^ (X>>17) ^ (X>>22) ^ (X>>25) ^ (X>>27)

 -------------------------------------------------------------------------
 

 The Polynomial we get after reduction will, of course, still have degree
 > 570. However, since deg(C(B,T)) <= 10 and deg(r(x)) <= 10, we have
 deg(C(B,T)r(x)) <= 20, which fits into one single machine word. Hence, we
 should be able to complete the reduction with one more operation (and a
 few shifts). Without proof I will tell you that we can complete the
 reduction algorithm to:

 -------------------------------------------------------------------------
   Polynomial Reduction Algorithm Modulo f571 
 -------------------------------------------------------------------------

   Input:  Polynomial p(x) of degree 1140 or less, stored as 
           an array of 2T machinewords.
   Output: p(x) mod f571(x)

   FOR i = T-1, ..., 0 DO
       SET X  := P[i+T]
       P[i]   := P[i]   ^ (X<<5) ^ (X<<7) ^ (X<<10) ^ (X<<15)
       P[i+1] := P[i+1] ^ (X>>17) ^ (X>>22) ^ (X>>25) ^ (X>>27)

   SET X  := P[T-1] >> 27
   P[0]   := P[0] ^ X ^ (X<<2) ^ (X<<5) ^ (X<<10)
   P[T-1] := P[T-1] & 0x07ffffff
 
   RETURN P[T-1],...,P[0]

 -------------------------------------------------------------------------
 
 We're done! We have each and every function to implement a fast 570 bit
 binary field. 




-[ iv ]-----------------------------[ Implementation Issues - Curves ]-----


                        "Premature Optimization is the root of all evil."
                                                             - Tony Hoare


 Now that we have a field to work with, we need to sort out some final 
 issues with the curve implementation. As I told you before, the ECDLP is
 hard only if the curve does not have a smooth order. At this point we 
 shortcut a huge problem of elliptic curve cryptography by choosing a 
 hardcoded (Koblitz) curve over F(2,570) which has almost prime order. 
 That curve is  


                           E0:  y² + xy = x³ + 1 


 This is a non-supersingular Koblitz curve over F2. Fancy words, huh? The
 laws for point addition work as follows:


  1. P + @ = @ + P = P for all P in E(F(2,m))

  2. The negative of  P = (x,y) is  -P = (x, x + y). Also, -@ = @ and 
     of course, P + (-P) = @.

  3. Point addition: Let P=(x1,y1) and Q=(x2,y2), where P is neither Q nor
     -Q. Then we have P + Q = (x3,y3), where


         x3 = l² + l + x1 + x2                   (y1 + y2)
                                       with l := ---------
         y3 = l(x1+x3) + x3 + y1                 (x1 + x2)
 

  4. Point doubling. Let P = (x, y) and P is not -P. Then we will write
     P+P = 2P = (a, b), where


         a = l² + l = x² + (1/x²)                (x + y)
                                       with l := -------
         b = l(x+a) + a + y                         x



 But wait! You said in (iii.) that -(x,y) = (x,-y)! Yea. I lied. That law
 in (iii) is not the law for non-supersingular curves over F2, but for
 curves over the real numbers. You see, in F(2,570), many things are very 
 different from other fields because we have 1+1=0. Simple things like 
 dividing by 2 turn out to be real explosive. Hence, most algorithms and
 laws have to be tweaked a bit.

 Know that Koblitz curves have special traits which gave birth to a lot of
 literally insane optimization algorithms. Also know that I did not
 implement any of these. It might be a topic for later papers. Instead, 
 the above laws have been implemented naively. What requires a wee bit of
 attention is the point multiplication algorithm. First of all, point 
 multiplication is defined as the n-fold addition of a point P. Here, n 
 is any integer number. We will use a standard square & multiply (double
 & add) algorithm to implement the point multiplication using point
 addition. Hence, we will view the integer number as a bitstring and
 consequentially, we can use the same data type for 576-bit polynomials and
 576-bit integer numbers.

 ---------------------------------------------------------
  Double & Add Point Multiplication Algorithm
 ---------------------------------------------------------

  Input:   Point P, Polynomial k(x) over Z
  Output:  k(2)*P
 
  SET Q := P

  FOR i = deg(k), ..., 0 DO:
      Q := Q+Q
      IF bit i of k(x) is set THEN:
          Q := Q+P

  RETURN Q

 ---------------------------------------------------------
 
 Our work here is almost done. The only remaining problem is - how do we
 find a point on the curve? (1,1) and (1,0) are trivial solutions, but
 they are of such small order that they are not really suitable for key 
 exchange via ECDLP. On the other hand, |E(F)| ~= q, where q = |F| (there
 are approximately as many points on the curve as there are elements in
 the underlying field). Since we have q² possible point combinations, we 
 have a probability of 1/q ~= 2.59e-172 that a random point actually
 satisfies the curve equation. 

 We will work around this problem the same way we worked around most
 problems so far - we won't solve it. The following point is, according to
 [F1], a generator point of the koblitz curve we chose in little Endian
 notation:

 PGEN = {
   0xA01C8972,0xE2945283,0x4DCA88C7,0x988B4717,0x494776FB,0xBBD1BA39,
   0xB4CEB08C,0x47DA304D,0x93B205E6,0x43709584,0x01841CA4,0x60248048,
   0x0012D5D4,0xAC9CA297,0xF8103FE4,0x82189631,0x59923FBC,0x026EB7A8,

   0x3EF1C7A3,0x01CD4C14,0x591984F6,0x320430C8,0x7BA7AF1B,0xB620B01A,
   0xF772AEDC,0x4FBEBBB9,0xAC44AEA7,0x9D4979C0,0x006D8A2C,0xFFC61EFC,
   0x9F307A54,0x4DD58CEC,0x3BCA9531,0x4F4AEADE,0x7F4FBF37,0x0349DC80   }
 
 What is a generator point? A point P of E(F) is called a generator point
 if and only if for every Q in E(F), there is an integer number n such 
 that nP = Q. In plain English, this point generates the entire curve.
 
 In step (1) of the key exchange, explained in the last paragraph of
 chapter (iii), we will always choose the point PGEN. Now, having cheated
 our way around the more cumbersome problems of ECC, we shall have some
 fun.




-[ vi ]---------------------------------------------[ Implementation ]-----


              "I considered life, I implemented life, now, life is boring."
                                                                 - thE AWKz


 In the Appendix, you will find the following files:


   binfields.h   curve.h
   binfields.c   curve.c

 
 The binfields module implements polynomial extension fields modulo the 
 prime polynomial presented in (iv). All operations are implemented as
 described in (iv) and provide sufficiently fast arithmetics. The curve
 module implements the koblitz curve E0 as described in (iv).

 Note that the curve module has not been optimized for speed. I have
 tested the Module, and for a full-blown key exchange, the algorithm takes
 around 3-4 seconds, which means an effort of max. 2 seconds for each
 party. This is reasonably fast and definitely fast enough for any 
 real life application. On a slow box, this might grow to time spans 
 of 10 seconds, but I still think it is within acceptable range.

 Now, how does the Module work? Basically, include curve.h and then,
 all you need are the following functions:

  word* pointMul(word* R, word* k, word* P);
  word* pointGen();
  word* polyRand();


 polyRand() generates a random polynomial in F(2,570). For the sake of
 point Multiplication, it can also be interpreted as a 570-bit integer
 number. pointGen() gives you a hardcoded generator point of E0. 

 For key exchange, each side performs the following operation (after
 initializing the random number generator):

 --------------------------------------------------------------

   word* n = polyRand();
   word* P = pointGen();

   pointMul(P,n,P); // P := nP
   sendToBob(P);    // send (nP) to bob
   recvFromBob(P);  // get (mP) from bob
   pointMul(P,n,P); // calculate Q = nmP = n(mP)
   free(n);
   
   /* use the 36-word array P or any substring of it
      for symmetric encryption.                      */

 --------------------------------------------------------------

 And voila, you're good to go.
  
  
  

-[ vii ]------------------------------------------------[ Conclusion ]-----
 

                        "I know that you believe you understand what you
                         think I said, but I'm not sure you realize that
                         what you heard is not what I meant."
                                                     - Robert McCloskey


 We have implemented exactly what we wanted - a quick and dirty ECDLP key
 exchange algorithm. It is far from being a comprehensive and complete 
 ECC library - it is limited. On the other hand, it is small and easily 
 incorporated with applications you might already have. Any kind of remote
 networking software can now be equipped with full-blown assymetric
 encryption. I can imagine a whole lot of ways to put this option to good
 use. 

 And now, enjoy the code.



-[ viii ]-----------------------------------------------[ References ]-----


 [K1] Karatsuba - Ofman Multiplication using Divide & Conquer

 http://mathworld.wolfram.com/KaratsubaMultiplication.html


 [F1] Federal Information Processing Standards Publication
      FIPS-186-2 - Digital Signature Standard 

 http://csrc.nist.gov/publications/fips/fips186-2/fips186-2-change1.pdf
 

 



[=========================================================================]
[------------------------------[ APPENDIX ]-------------------------------]
[=========================================================================]


 All files have been compiled using Microsoft Visual Studio 6 on Windows.
 Since there is a lot of inline Assembler, you might have to hand-tweak
 this code for other platforms.



-------------------------------<binfields.h>-------------------------------


#ifndef _BINFIELDS__H
#define _BINFIELDS__H

typedef unsigned char  byte;
typedef unsigned long  word;

#define SIZE_BITS   0x0000023a
#define SIZE_WORDS  0x00000012
#define SIZE_BYTES  0x00000048
#define EMPTY_MASK  0x07ffffff
#define SIZE_BITS2  0x00000474
#define SIZE_WORDS2 0x00000024
#define SIZE_BYTES2 0x00000090

void  __fastcall   lshift(word* a, word n);
void  __fastcall   rshift(word* a, word n);

int                deg(word *a);

int   __fastcall   polyCmp(word *a, word *b);
int   __fastcall   polyIsZero(word *a);

word* __fastcall   polyAdd(word* c, word* a, word* b);
word* __fastcall   polyAddTo(word* a, word* b);

word*              polySqr(word* c, word* a);
word*              polyMul(word* c, word *a, word *b);
word*              polyDiv(word* c, word *a, word *b);
word*              polyInv(word *c, word *a);


word*              polyGen();
word*              polyRand();
word*              polyDup(word* a);

#endif


-------------------------------<binfields.c>-------------------------------


#include <memory.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include "binfields.h"

// Precomputed Array. For each byte a1,a2,...,a8, this 
// array contains the halfword a1,0,a2,0,a3,0,...,0,a8.
// This is used for fast squaring.

const static unsigned short POW[0x100] = {
    0x0000,0x0001,0x0004,0x0005,0x0010,0x0011,0x0014,0x0015,
    0x0040,0x0041,0x0044,0x0045,0x0050,0x0051,0x0054,0x0055,
    0x0100,0x0101,0x0104,0x0105,0x0110,0x0111,0x0114,0x0115,
    0x0140,0x0141,0x0144,0x0145,0x0150,0x0151,0x0154,0x0155,
    0x0400,0x0401,0x0404,0x0405,0x0410,0x0411,0x0414,0x0415,
    0x0440,0x0441,0x0444,0x0445,0x0450,0x0451,0x0454,0x0455,
    0x0500,0x0501,0x0504,0x0505,0x0510,0x0511,0x0514,0x0515,
    0x0540,0x0541,0x0544,0x0545,0x0550,0x0551,0x0554,0x0555,
    0x1000,0x1001,0x1004,0x1005,0x1010,0x1011,0x1014,0x1015,
    0x1040,0x1041,0x1044,0x1045,0x1050,0x1051,0x1054,0x1055,
    0x1100,0x1101,0x1104,0x1105,0x1110,0x1111,0x1114,0x1115,
    0x1140,0x1141,0x1144,0x1145,0x1150,0x1151,0x1154,0x1155,
    0x1400,0x1401,0x1404,0x1405,0x1410,0x1411,0x1414,0x1415,
    0x1440,0x1441,0x1444,0x1445,0x1450,0x1451,0x1454,0x1455,
    0x1500,0x1501,0x1504,0x1505,0x1510,0x1511,0x1514,0x1515,
    0x1540,0x1541,0x1544,0x1545,0x1550,0x1551,0x1554,0x1555,
    0x4000,0x4001,0x4004,0x4005,0x4010,0x4011,0x4014,0x4015,
    0x4040,0x4041,0x4044,0x4045,0x4050,0x4051,0x4054,0x4055,
    0x4100,0x4101,0x4104,0x4105,0x4110,0x4111,0x4114,0x4115,
    0x4140,0x4141,0x4144,0x4145,0x4150,0x4151,0x4154,0x4155,
    0x4400,0x4401,0x4404,0x4405,0x4410,0x4411,0x4414,0x4415,
    0x4440,0x4441,0x4444,0x4445,0x4450,0x4451,0x4454,0x4455,
    0x4500,0x4501,0x4504,0x4505,0x4510,0x4511,0x4514,0x4515,
    0x4540,0x4541,0x4544,0x4545,0x4550,0x4551,0x4554,0x4555,
    0x5000,0x5001,0x5004,0x5005,0x5010,0x5011,0x5014,0x5015,
    0x5040,0x5041,0x5044,0x5045,0x5050,0x5051,0x5054,0x5055,
    0x5100,0x5101,0x5104,0x5105,0x5110,0x5111,0x5114,0x5115,
    0x5140,0x5141,0x5144,0x5145,0x5150,0x5151,0x5154,0x5155,
    0x5400,0x5401,0x5404,0x5405,0x5410,0x5411,0x5414,0x5415,
    0x5440,0x5441,0x5444,0x5445,0x5450,0x5451,0x5454,0x5455,
    0x5500,0x5501,0x5504,0x5505,0x5510,0x5511,0x5514,0x5515,
    0x5540,0x5541,0x5544,0x5545,0x5550,0x5551,0x5554,0x5555 };


void __fastcall lshift2(word* a, word n);


word* __fastcall __reduce(word* c, word* temp) {
    register int i;
    register word T;
    
    if (!deg(&c[SIZE_WORDS]) && deg(c)<=SIZE_BITS)
        goto __reduce_done;

    for (i=SIZE_WORDS2-1;i>=SIZE_WORDS;i--) {
     T = c[i];
        c[i-SIZE_WORDS]   ^= (T<<5)  ^ (T<<7)  ^ (T<<10) ^ (T<<15);
        c[i-SIZE_WORDS+1] ^= (T>>27) ^ (T>>25) ^ (T>>22) ^ (T>>17);
    }
    T = c[SIZE_WORDS-1] >> 27;
    c[0] = c[0] ^ T ^ (T<<2) ^ (T<<5) ^ (T<<10);
    c[SIZE_WORDS-1] = c[SIZE_WORDS-1] & EMPTY_MASK;
__reduce_done:
    return memcpy(temp, c, SIZE_BYTES);
} 

word* polyMul(word* r, word* a, word* b) {
    register int k,j;
    word c[SIZE_WORDS2] = {0};

    for (k=31;k>=0;k--) {
        for (j=0;j<SIZE_WORDS;j++)
            if ((a[j]>>k)&1) 
                polyAddTo(&c[j],b);
        if (k) lshift2(c,1);
    }

    return __reduce(c,r);
}

word* polySqr(word* r, word *a) {
    register word i;
    word c[SIZE_WORDS2] = {0};
    for (i=0;i<SIZE_WORDS;i++) {
        c[2*i]    = POW[(a[i]>>0x00) & 0xFF];
        c[2*i]   += POW[(a[i]>>0x08) & 0xFF] << 0x10;
        c[2*i+1]  = POW[(a[i]>>0x10) & 0xFF];
        c[2*i+1] += POW[(a[i]>>0x18) & 0xFF] << 0x10;
    }
    return __reduce(c,r);
}

word* polyDiv(word *c, word *a, word *b) {
    word t[SIZE_WORDS];
    return polyMul(c,a,polyInv(t,b));
}

word* polyInv(word *r, word* a) {

    register word t;
    register int j;

    word x[5*SIZE_WORDS] = {0},
    *v = &x[1*SIZE_WORDS],
    *u = &x[2*SIZE_WORDS],
    *g = &x[3*SIZE_WORDS],
    *f = &x[4*SIZE_WORDS];

    memcpy(u,a,SIZE_BYTES);

    v[0] = 0x425; 
    v[SIZE_WORDS-1] = 0x08000000;
    g[0] = 1;

inv_loop:
    if (u[0]==1 || u[0]==0) {
        for (j=1;j<SIZE_WORDS;j++)
            if (u[j]) goto inv_run;
        goto inv_done;
    }
inv_run:
    if ((j = deg(u)-deg(v)) < 0) {
        t = (word) v; v = u; u = (word*) t; // v <-> u
        t = (word) g; g = f; f = (word*) t; // g <-> f
        j = -j;    }

    memcpy(x,v,SIZE_BYTES);
    lshift(x,j); 
    polyAddTo(u,x);  // u = u + v>>j

    memcpy(x,f,SIZE_BYTES); 
    lshift(x,j); 
    polyAddTo(g,x);  // g = g + f>>j

    goto inv_loop;
inv_done:

    memcpy(r, g,SIZE_BYTES);
    return r;
}

__declspec(naked) word* __fastcall polyAddTo(word* a, word* b) {
    __asm {
        mov  edi, ecx
        mov  ecx, SIZE_WORDS
    _add_to_loop:
        mov  eax, [edx+4*ecx-4]
        xor  [edi+4*ecx-4], eax
        loop _add_to_loop
        mov  ecx, edi
    ret
} }


word* __fastcall polyAdd(word* c, word* a, word* b) {
    __asm {
        mov  ebx,ecx
        mov  esi,b
        mov  ecx, SIZE_WORDS
    _add_loop:
        mov  eax, [edx+4*ecx-4]
        xor  eax, [esi+4*ecx-4]
        mov  [ebx+4*ecx-4], eax
    loop _add_loop
} }


_declspec(naked) void __fastcall lshift(word* a, word n) { 
    __asm {
        test edx,edx
        jnz  _lshift_nonzeroshift
        ret
_lshift_nonzeroshift:
        mov  edi,ecx
        mov  esi,edi
        mov  eax,edx
        shr  eax,3
        test eax,eax
        je   _lshift_go
             mov  ecx, SIZE_BYTES
             sub  ecx, eax
             js   _lshift_zero
             add  edi, eax
        _lshift_preloop:
             dec  ecx
             mov  bl,  byte ptr[esi+ecx]
             mov  byte ptr [edi+ecx], bl
             test ecx, ecx
             jne  _lshift_preloop
             mov  ecx, eax
        _lshift_zloop:
             mov  byte ptr [esi+ecx-1],0
             loop _lshift_zloop
;;             add  esi, eax
             shl  eax, 3
             sub  edx, eax
             jz   _lshift_done
_lshift_go:
        mov  ecx,SIZE_WORDS
        mov  edi,esi
        clc
_lshift_subloop:
        rcl  dword ptr [edi], 1
        inc  edi
        inc  edi
        inc  edi
        inc  edi
        loop _lshift_subloop
        dec  edx
        jne  _lshift_go
_lshift_done:
        ret
_lshift_zero:
        mov  ecx,SIZE_BYTES
_lshift_zloop2:
        mov  byte ptr [esi+eax],0
        loop _lshift_zloop2
        ret
} }



_declspec(naked) void __fastcall lshift2(word* a, word n) { 
    __asm {
        test edx,edx
        jnz  _lshift2_nonzeroshift
        ret
_lshift2_nonzeroshift:
        mov  edi,ecx
        mov  esi,edi
        mov  eax,edx
        shr  eax,3
        test eax,eax
        je   _lshift2_go
             mov  ecx, SIZE_BYTES2
             sub  ecx, eax
             js   _lshift2_zero
             add  edi, eax
        _lshift2_preloop:
             dec  ecx
             mov  bl,  byte ptr[esi+ecx]
             mov  byte ptr [edi+ecx], bl
             test ecx, ecx
             jne  _lshift2_preloop
             mov  ecx, eax
        _lshift2_zloop:
             mov  byte ptr [esi+ecx-1],0
             loop _lshift2_zloop
;;             add  esi, eax
             shl  eax, 3
             sub  edx, eax
             jz   _lshift2_done
_lshift2_go:
        mov  edi, esi
        mov  ecx, SIZE_WORDS2
        clc
_lshift2_subloop:
        rcl  dword ptr [edi],1
        inc  edi
        inc  edi
        inc  edi
        inc  edi
        loop _lshift2_subloop
        dec  edx
        jne  _lshift2_go
_lshift2_done:
        ret
_lshift2_zero:
        mov  ecx,SIZE_BYTES2
_lshift2_zloop2:
        mov  byte ptr [esi+eax],0
        loop _lshift2_zloop2
        ret
} }


_declspec(naked) void __fastcall rshift(word* a, word n) { 
    __asm {
        mov  edi,ecx
        mov  esi,edi
        mov  eax,edx
        shr  eax,3
        test eax,eax
        je   _rshift_go
             push eax
             mov  ebx, SIZE_BYTES
             sub  ebx, eax
             js   _rshift_zero
             xor  ecx, ecx
             add  edi, eax
        _rshift_preloop:
             mov  al,  byte ptr[edi+ecx]
             mov  byte ptr [esi+ecx], al
             inc  ecx
             cmp  ecx, ebx
             jl   _rshift_preloop
             mov  edi,esi
             add  edi,ebx
             pop  ecx
             mov  eax,ecx
        _rshift_zloop:
             mov  byte ptr [edi+ecx-1],0
             loop _rshift_zloop
             shl  eax, 3
             sub  edx, eax
             jz   _rshift_done
_rshift_go:
        mov  ecx,SIZE_WORDS
        mov  edi,esi
        sub  edi,4
        shr  dword ptr [edi+4*ecx],1
        dec  ecx
_rshift_subloop:
        rcr  dword ptr [edi+4*ecx], 1
        loop _rshift_subloop
        dec  edx
        jne  _rshift_go
_rshift_done:
        ret
_rshift_zero:
        mov  ecx,SIZE_BYTES
_rshift_zloop2:
        mov  byte ptr [esi+eax],0
        loop _rshift_zloop2
        ret
} }



word* polyGen()  {
    return calloc(SIZE_WORDS,sizeof(word));
} 

word* polyRand() {
    register int i;
    register word* p = calloc(SIZE_WORDS,sizeof(word));
    
    for (i=SIZE_WORDS-1;i>=0;i--) 
        p[i] = (rand()<<16) | rand();
    p[SIZE_WORDS-1] &= EMPTY_MASK;
    return p;
}



int deg(word *a) {
    __asm {
        mov edx, a
        mov ecx, SIZE_WORDS
_deg_loop:
        mov ebx, [edx+4*ecx-4]
        test ebx,ebx
        jnz _deg_done
        loop _deg_loop
        xor eax,eax
        jmp _deg_zero
_deg_done:
        finit
        push 1
        fild dword ptr [esp]
        mov dword ptr [esp],0
        push ebx
        fild qword ptr [esp]
        fyl2x

        fstcw word ptr [esp]  ; round down
        fstcw word ptr [esp+2]
        and word ptr[esp],0xF3FF
        or word ptr[esp], 0x400
        fldcw [esp]
        frndint
        fldcw [esp+2]

        fistp qword ptr [esp]
        pop ebx
        pop eax
        mov eax,32
        dec ecx
        mul ecx
        add eax,ebx
        inc eax
_deg_zero:

    }
}

__declspec(naked) int __fastcall polyCmp(word* a, word* b) {
    __asm {
        mov edi, ecx            ; edi=a/edx=b
        xor eax, eax
        mov ecx, SIZE_WORDS
__loop_cmp:
        mov  ebx, [edi+ecx*4-4] ; ebx = a[n]
        mov  esi, [edx+ecx*4-4]    ; esi = b[n]
        test ebx,ebx
        jz   __a_n_zero
        test esi,esi
        jnz  __b_n_notzero
        inc  eax
        ret
__b_n_notzero:
        cmp  ebx, esi
        jz   __next_cmp
        jl   __lesser_cmp
        inc  eax
        ret
    __lesser_cmp:
        dec  eax
        ret
    __next_cmp:
        loop __loop_cmp
        ret
__a_n_zero:
        test esi,esi
        jz   __next_cmp
        dec  eax
        ret
} }

__declspec(naked) int __fastcall polyIsZero(word* a) {
    __asm {
        mov edi, ecx            ; edi=a/edx=b
        xor eax, eax
        mov ecx, SIZE_WORDS
__iszero_loop:
        cmp  [edi+ecx*4-4],0
        jne  __is_not_zero
        loop __iszero_loop
        inc  eax
__is_not_zero:
        ret
} }

word* dup(word* a) {
    register word* b = polyGen();
    memcpy(b,a,SIZE_BYTES);
    return b;
}


---------------------------------<curve.h>---------------------------------
 
#ifndef _CURVE_H
#define _CURVE_H

#include "binfields.h"

// We will use, by default, the Koblitz Curve
// y2 + xy = x3 + 1

word*  pointMul(word* R, word* k, word* P);

word*  pointAdd(word* R, word* P, word* Q);
word*  pointDbl(word* R, word* P);
word*  pointNeg(word* P);
word*  pointZero(word* P);

int    isZero(word* P);
int    isValidPoint(word* P);
int    pointCmp(word* P, word* Q);


word*  pointDup(word* P);
word*  pointGen();
word*  pointGenZero();

#endif

---------------------------------<curve.c>---------------------------------

#include <stdlib.h>
#include <memory.h>
#include "curve.h"

#define Y(__P) (&__P[SIZE_WORDS])
#define X(__P) __P


word PGEN[]= {
    0xA01C8972,0xE2945283,0x4DCA88C7,0x988B4717,0x494776FB,0xBBD1BA39,
    0xB4CEB08C,0x47DA304D,0x93B205E6,0x43709584,0x01841CA4,0x60248048,
    0x0012D5D4,0xAC9CA297,0xF8103FE4,0x82189631,0x59923FBC,0x026EB7A8,
    0x3EF1C7A3,0x01CD4C14,0x591984F6,0x320430C8,0x7BA7AF1B,0xB620B01A,
    0xF772AEDC,0x4FBEBBB9,0xAC44AEA7,0x9D4979C0,0x006D8A2C,0xFFC61EFC,
    0x9F307A54,0x4DD58CEC,0x3BCA9531,0x4F4AEADE,0x7F4FBF37,0x0349DC80
};

int isValidPoint(word* P) {

    // y2+xy=x3+1
    
    word t[SIZE_WORDS] = {0},
         l[SIZE_WORDS] = {0};

    polyAdd(t,X(P),Y(P));
    polyMul(t,t,Y(P));  // left side

    polySqr(l,X(P));
    polyMul(l,l,X(P));
    l[0] ^= 1;          // right side

    return !polyCmp(t,l);
}

word* pointDup(word *P) {
    return memcpy(malloc(SIZE_WORDS2*sizeof(word)),P,SIZE_BYTES2);
}
    
int pointCmp(word* P, word* Q) {
    switch (polyCmp(X(P),X(Q))) {
    case 1: return 1;
    case -1: return -1;
    default: return polyCmp(Y(P),Y(Q));
} }

word* pointGen() { 
    word* P = malloc(SIZE_WORDS2 * sizeof(word)); 
    return memcpy(P,PGEN,SIZE_BYTES2);
}

word* pointGenZero() {
    word* pt = calloc(SIZE_WORDS2, sizeof(word));
    pt[SIZE_WORDS-1] = 1 << 31;
    return pt;
}

int isZero(word* P) {
    return (P[SIZE_WORDS-1]>>31);
}

word* pointZero(word* P) {
    memset(P,0,SIZE_BYTES2);
    P[SIZE_WORDS-1] = 1 << 31;
    return P;
}

word* pointDbl(word* R, word* P) {

    word t1[SIZE_WORDS] = {0};
    word t2[SIZE_WORDS];

    if (isZero(P)) 
        return (R==P)?R:memcpy(R,P,SIZE_BYTES2);

    else if (!polyCmp(t1,P)) return pointZero(R); 


    polyDiv(Y(R),Y(P),X(P));
    polyAddTo(Y(R),X(P));
    R[SIZE_WORDS] ^= 1;

    polySqr(t2,X(P));      // t2 = x2
    polyInv(t1,t2);
    polyAddTo(t1,t2);      // t1 = x'
    polyMul(Y(R),Y(R),t1);
    polyAddTo(Y(R),t2);
    
    return memcpy(R,t1,SIZE_BYTES);
}

    

word* pointNeg(word* P) {
    if (isZero(P)) return P;
    polyAddTo(Y(P),X(P));
    return P;
}


word* pointAdd(word* R, word* P, word* Q) {

    word l[SIZE_WORDS],
         x[SIZE_WORDS],
         y[SIZE_WORDS];

    if (isZero(Q)) {
        if (R==P) return R;
        else return memcpy(R,P,SIZE_BYTES2);
    } else if (isZero(P)) {
        if (R==Q) return R;
        else return memcpy(R,Q,SIZE_BYTES2);
    } else {

        if (!pointCmp(P,Q)) {
            return pointDbl(R,P);
        } else {
            word* N = pointNeg(pointDup(Q));

            if (!pointCmp(P,N)) {
                pointZero(R);
                free(N);
                return R;
            } else free(N);

            polyAdd(l, Y(P), Y(Q));
            polyAdd(x, X(P), X(Q));
            polyDiv(l, l, x);
            polyAddTo(x,polySqr(y,l));
            polyAddTo(x,l);

            polyAdd(y, x, X(P));
            polyMul(y, y, l);
            polyAddTo(y, x);
            polyAdd(Y(R), y, Y(P));

            return memcpy(X(R),x,SIZE_BYTES);
        }
    }
}


word* pointSub(word* R, word* P, word* Q) {
    word* N = pointNeg(pointDup(Q));
    pointAdd(R,P,N);
    free(N);
    return R;
}


word* pointMul(word* R, word* k, word* P) {

    register int i;
    word* Q = pointDup(P);

    for (i=deg(k)-1;i>=0;i--) {
        pointDbl(Q,Q);
        if ((k[i/32]>>(i%32))&1) 
            pointAdd(Q, Q,P);
    }
    
    memcpy(R,Q,SIZE_BYTES2);
    free(Q);
    return R;
}


--------------------------------[   EOF   ]--------------------------------
This page is part of the .aware network. Content and design © 2004 - 2013 .aware network
Due to certified insanity, we are not responsible for anything, period.