Sunday, November 19, 2017

Crypto Digest Collisions


Summary

Proof-of-work is a critical component of the various crypto coin variants in use today.  In a nutshell, the blockchain algorithms that form the basis of crypto coins require participants to complete a provable amount of computation before a transaction can be added to the chain.  Partial hash collision is one such proof of work.

The post covers a toy implementation of partial hash collision.

Background

The basis for this proof-of-work method is hash digests.  An input string is processed through a cryptographically-strong function that outputs a fixed-length and unique hash value for that given input string.  Given a strong hash algorithm, it should be computationally unfeasible to find two input strings that create the same hash digest - also known as collision resistance.

Collisions on the full bit-length of a hash may be unfeasible, but collisions on a subset of the bits can be accomplished.  There are no known short-cuts to finding a collision but trial and error - submitting a different input string to a hash algorithm, over and over.  The typical partial hash collision method takes a string as input, concatenates a number to that string, then tests the high-order bits of the resulting hash.  If the necessary number of high-order bits are all 0, a solution has been found.

Example:  Say a given hash algorithm outputs a 10-bit hash and we want to find string with given input that results in the first 4 bits all equal to 0.  There are 2**10 total unique hashes available in this algorithm.  There are 2**6 hashes where the first 4 bits are 0.  In general, we would have to try 2**4 (16) inputs to the hash algorithm to find a solution.  Real world implementations deal with 160/256 bit hashes and collision lengths of >20 bits.

Implementation

Single Process

totalBits = 256  
collisionBits = 23 
mask = (1<<totalBits) - (1<<totalBits-collisionBits)
textInput = None

def singleProcessHash(text):
    count = 0    
    ti = time.time()
    
    while True:
        msg = text + ':' + str(count)
        dig = hashlib.sha256(msg.encode(encoding='utf_8', errors='strict')).hexdigest()
        res = int(dig,16) & mask
        if res == 0:
            break   
        count+=1
            
    print('Single-process run time: {:0.4f}'.format(time.time()-ti))
    print ('Message: ', msg)
    return msg
Lines 1-2:  Total bit length of the hash and the number of required high-order bits to be set to 0.  SHA-256 outputs a 256-bit hash.  For this example, the 23 high-order bits have to be 0 for a valid solution string.
Line 3:  Constucts a 256-bit mask with the 23 high-order bits set to 1.  All other bits are 0.
Lines 10-16: Brute-force loop that successively concats a number to the input text, hashes it thru SHA-256, and then does a bit-wise AND against the mask to test if the 23 high-order bits are all 0.  If the result isn't a partial collision, increment the counter and continue to loop.

Multiple Processes

def init(text):
    global textInput
    textInput = text
    
def findCollision(arg):
    global textInput
    msg = textInput + ':' + str(arg)
    hsh = hashlib.sha256(msg.encode(encoding='utf_8', errors='strict')).hexdigest()
    res = int(hsh,16) & mask
    if res == 0:
        return msg

def multiProcessHash(text):
    mult = 1
    numValues = 10**6
    chunkSize = math.ceil(numValues/mp.cpu_count())
    found = False
    msg = None
    ti = time.time()
   
    pool = mp.Pool(initializer=init, initargs=[text])
    while not found: 
        sample = range(numValues*(mult-1), numValues*mult)
        mult += 1
        results = pool.imap_unordered(findCollision, sample, chunkSize)
        for msg in results:
            if msg:
                pool.terminate()
                pool.join()
                found = True
                break

    print('Multi-process run time: {:0.4f}'.format(time.time()-ti))
    print('Message: ', msg)
    return msg
Lines 1-3:  Init function setting the text argument for each of the worker processes in the pool. Unfortunate aspect of Python, but use of globals is the most straight-forward way to accomplish this.
Lines 5-11:  Collision test function performed by each of the worker processes.
Line 15:  Variable to set up 1M values to be sent to the pool per iteration.
Line 16:  Chunk up those 1M test values evenly amongst the worker processes.  One worker per CPU.
Lines 21-31:  Set up the worker pool and loop through successive 1M value batches of samples until a partial collision is found.

Results

This was tested on a quad-core machine with hyper-threading (8 logical CPU's).
if __name__ == '__main__':
    singleProcessHash(string.ascii_uppercase)
    print()
    multiProcessHash(string.ascii_uppercase)  
Single-process run time: 18.9084
Message:  ABCDEFGHIJKLMNOPQRSTUVWXYZ:7201566

Multi-process run time: 6.7886
Message:  ABCDEFGHIJKLMNOPQRSTUVWXYZ:7201566


Copyright ©1993-2024 Joey E Whelan, All rights reserved.