Sunday, November 19, 2017

Crypto Digest Collisions


Summary

Proof-of-work is a critical component of the various crypto coin variants in use today.  In a nutshell, the blockchain algorithms that form the basis of crypto coins require participants to complete a provable amount of computation before a transaction can be added to the chain.  Partial hash collision is one such proof of work.

The post covers a toy implementation of partial hash collision.

Background

The basis for this proof-of-work method is hash digests.  An input string is processed through a cryptographically-strong function that outputs a fixed-length and unique hash value for that given input string.  Given a strong hash algorithm, it should be computationally unfeasible to find two input strings that create the same hash digest - also known as collision resistance.

Collisions on the full bit-length of a hash may be unfeasible, but collisions on a subset of the bits can be accomplished.  There are no known short-cuts to finding a collision but trial and error - submitting a different input string to a hash algorithm, over and over.  The typical partial hash collision method takes a string as input, concatenates a number to that string, then tests the high-order bits of the resulting hash.  If the necessary number of high-order bits are all 0, a solution has been found.

Example:  Say a given hash algorithm outputs a 10-bit hash and we want to find string with given input that results in the first 4 bits all equal to 0.  There are 2**10 total unique hashes available in this algorithm.  There are 2**6 hashes where the first 4 bits are 0.  In general, we would have to try 2**4 (16) inputs to the hash algorithm to find a solution.  Real world implementations deal with 160/256 bit hashes and collision lengths of >20 bits.

Implementation

Single Process

  1. totalBits = 256
  2. collisionBits = 23
  3. mask = (1<<totalBits) - (1<<totalBits-collisionBits)
  4. textInput = None
  5.  
  6. def singleProcessHash(text):
  7. count = 0
  8. ti = time.time()
  9. while True:
  10. msg = text + ':' + str(count)
  11. dig = hashlib.sha256(msg.encode(encoding='utf_8', errors='strict')).hexdigest()
  12. res = int(dig,16) & mask
  13. if res == 0:
  14. break
  15. count+=1
  16. print('Single-process run time: {:0.4f}'.format(time.time()-ti))
  17. print ('Message: ', msg)
  18. return msg
Lines 1-2:  Total bit length of the hash and the number of required high-order bits to be set to 0.  SHA-256 outputs a 256-bit hash.  For this example, the 23 high-order bits have to be 0 for a valid solution string.
Line 3:  Constucts a 256-bit mask with the 23 high-order bits set to 1.  All other bits are 0.
Lines 10-16: Brute-force loop that successively concats a number to the input text, hashes it thru SHA-256, and then does a bit-wise AND against the mask to test if the 23 high-order bits are all 0.  If the result isn't a partial collision, increment the counter and continue to loop.

Multiple Processes

  1. def init(text):
  2. global textInput
  3. textInput = text
  4. def findCollision(arg):
  5. global textInput
  6. msg = textInput + ':' + str(arg)
  7. hsh = hashlib.sha256(msg.encode(encoding='utf_8', errors='strict')).hexdigest()
  8. res = int(hsh,16) & mask
  9. if res == 0:
  10. return msg
  11.  
  12. def multiProcessHash(text):
  13. mult = 1
  14. numValues = 10**6
  15. chunkSize = math.ceil(numValues/mp.cpu_count())
  16. found = False
  17. msg = None
  18. ti = time.time()
  19. pool = mp.Pool(initializer=init, initargs=[text])
  20. while not found:
  21. sample = range(numValues*(mult-1), numValues*mult)
  22. mult += 1
  23. results = pool.imap_unordered(findCollision, sample, chunkSize)
  24. for msg in results:
  25. if msg:
  26. pool.terminate()
  27. pool.join()
  28. found = True
  29. break
  30.  
  31. print('Multi-process run time: {:0.4f}'.format(time.time()-ti))
  32. print('Message: ', msg)
  33. return msg
Lines 1-3:  Init function setting the text argument for each of the worker processes in the pool. Unfortunate aspect of Python, but use of globals is the most straight-forward way to accomplish this.
Lines 5-11:  Collision test function performed by each of the worker processes.
Line 15:  Variable to set up 1M values to be sent to the pool per iteration.
Line 16:  Chunk up those 1M test values evenly amongst the worker processes.  One worker per CPU.
Lines 21-31:  Set up the worker pool and loop through successive 1M value batches of samples until a partial collision is found.

Results

This was tested on a quad-core machine with hyper-threading (8 logical CPU's).
  1. if __name__ == '__main__':
  2. singleProcessHash(string.ascii_uppercase)
  3. print()
  4. multiProcessHash(string.ascii_uppercase)
  1. Single-process run time: 18.9084
  2. Message: ABCDEFGHIJKLMNOPQRSTUVWXYZ:7201566
  3.  
  4. Multi-process run time: 6.7886
  5. Message: ABCDEFGHIJKLMNOPQRSTUVWXYZ:7201566


Copyright ©1993-2024 Joey E Whelan, All rights reserved.