Summary
Proof-of-work is a critical component of the various crypto coin variants in use today. In a nutshell, the blockchain algorithms that form the basis of crypto coins require participants to complete a provable amount of computation before a transaction can be added to the chain. Partial hash collision is one such proof of work.
The post covers a toy implementation of partial hash collision.
Line 3: Constucts a 256-bit mask with the 23 high-order bits set to 1. All other bits are 0.
Lines 10-16: Brute-force loop that successively concats a number to the input text, hashes it thru SHA-256, and then does a bit-wise AND against the mask to test if the 23 high-order bits are all 0. If the result isn't a partial collision, increment the counter and continue to loop.
Lines 5-11: Collision test function performed by each of the worker processes.
Line 15: Variable to set up 1M values to be sent to the pool per iteration.
Line 16: Chunk up those 1M test values evenly amongst the worker processes. One worker per CPU.
Lines 21-31: Set up the worker pool and loop through successive 1M value batches of samples until a partial collision is found.
This was tested on a quad-core machine with hyper-threading (8 logical CPU's).The post covers a toy implementation of partial hash collision.
Background
The basis for this proof-of-work method is hash digests. An input string is processed through a cryptographically-strong function that outputs a fixed-length and unique hash value for that given input string. Given a strong hash algorithm, it should be computationally unfeasible to find two input strings that create the same hash digest - also known as collision resistance.
Collisions on the full bit-length of a hash may be unfeasible, but collisions on a subset of the bits can be accomplished. There are no known short-cuts to finding a collision but trial and error - submitting a different input string to a hash algorithm, over and over. The typical partial hash collision method takes a string as input, concatenates a number to that string, then tests the high-order bits of the resulting hash. If the necessary number of high-order bits are all 0, a solution has been found.
Example: Say a given hash algorithm outputs a 10-bit hash and we want to find string with given input that results in the first 4 bits all equal to 0. There are 2**10 total unique hashes available in this algorithm. There are 2**6 hashes where the first 4 bits are 0. In general, we would have to try 2**4 (16) inputs to the hash algorithm to find a solution. Real world implementations deal with 160/256 bit hashes and collision lengths of >20 bits.
Implementation
Single Process
totalBits = 256 collisionBits = 23 mask = (1<<totalBits) - (1<<totalBits-collisionBits) textInput = None def singleProcessHash(text): count = 0 ti = time.time() while True: msg = text + ':' + str(count) dig = hashlib.sha256(msg.encode(encoding='utf_8', errors='strict')).hexdigest() res = int(dig,16) & mask if res == 0: break count+=1 print('Single-process run time: {:0.4f}'.format(time.time()-ti)) print ('Message: ', msg) return msgLines 1-2: Total bit length of the hash and the number of required high-order bits to be set to 0. SHA-256 outputs a 256-bit hash. For this example, the 23 high-order bits have to be 0 for a valid solution string.
Line 3: Constucts a 256-bit mask with the 23 high-order bits set to 1. All other bits are 0.
Lines 10-16: Brute-force loop that successively concats a number to the input text, hashes it thru SHA-256, and then does a bit-wise AND against the mask to test if the 23 high-order bits are all 0. If the result isn't a partial collision, increment the counter and continue to loop.
Multiple Processes
def init(text): global textInput textInput = text def findCollision(arg): global textInput msg = textInput + ':' + str(arg) hsh = hashlib.sha256(msg.encode(encoding='utf_8', errors='strict')).hexdigest() res = int(hsh,16) & mask if res == 0: return msg def multiProcessHash(text): mult = 1 numValues = 10**6 chunkSize = math.ceil(numValues/mp.cpu_count()) found = False msg = None ti = time.time() pool = mp.Pool(initializer=init, initargs=[text]) while not found: sample = range(numValues*(mult-1), numValues*mult) mult += 1 results = pool.imap_unordered(findCollision, sample, chunkSize) for msg in results: if msg: pool.terminate() pool.join() found = True break print('Multi-process run time: {:0.4f}'.format(time.time()-ti)) print('Message: ', msg) return msgLines 1-3: Init function setting the text argument for each of the worker processes in the pool. Unfortunate aspect of Python, but use of globals is the most straight-forward way to accomplish this.
Lines 5-11: Collision test function performed by each of the worker processes.
Line 15: Variable to set up 1M values to be sent to the pool per iteration.
Line 16: Chunk up those 1M test values evenly amongst the worker processes. One worker per CPU.
Lines 21-31: Set up the worker pool and loop through successive 1M value batches of samples until a partial collision is found.
Results
if __name__ == '__main__': singleProcessHash(string.ascii_uppercase) print() multiProcessHash(string.ascii_uppercase)
Single-process run time: 18.9084 Message: ABCDEFGHIJKLMNOPQRSTUVWXYZ:7201566 Multi-process run time: 6.7886 Message: ABCDEFGHIJKLMNOPQRSTUVWXYZ:7201566
Copyright ©1993-2024 Joey E Whelan, All rights reserved.