Blockchain technology has undoubtedly been one of the most revolutionary technologies of the 21st century, and that doesn’t seem that’s going to change anytime soon. While the mathematical concepts that make blockchain feasible existed long before, the perfect use of blockchain has finally been found in the form of cryptocurrencies like Ethereum and Bitcoin. With blockchain technology continuing to forge a prominent role in our lives, one aspect of it that requires further explanation is hashing.
In layman’s terms, hashing takes various input data and uses it to perform a complex mathematical equation, the answer to which becomes a fixed-size output data. The output (also called a digest) is a fixed length whereas the size of the input data (also called a message or string) is unimportant.
As it pertains to Bitcoin and other cryptocurrencies, the input data serve as transactions that are put into an algorithm (in the case of Bitcoin, it’s the SHA-256 hashing algorithm) that works to create the fixed-length output.
Whether you use a single letter or millions of letters as your input data, hashing algorithms will always give you a fixed-size output. When using blockchain technology, the input will be the complete transaction and the output will be the state of that particular blockchain after going through the SHA-256 algorithm. Essentially, hashing allows any piece of data of any size to be stored in a simple and effective manner. This helps various parties like miners and networks to agree on the current state of the chain. Thus, hashing is one of the two primary cryptographic concepts that support blockchain technology. The other one is the digital signature.
At its most fundamental level, Bitcoin blockchain is composed of blocks that function as data storage devices that are filled with various transactions. It’s important to note that blockchain includes the so-called genesis block, which is the original block containing the first transaction ever made via Bitcoin. These transactions were used inside the genesis block as the inputs for the SHA-256 algorithm. Keep in mind that every time there is a new block, the output of the previous block’s hash will be used along with its transactions as one of the inputs for generating the output of the new block. This output will then be used for the next block, and so on. This is how we get the name blockchain, as each hash helps to form a serious of blocks, also known as block chains.
This system of chains ensures that tampering with past transactions is impossible. If one transaction is changed, even a small change, it alters the inputs to the blocks and the new hash output will be different from the original. As long as everyone agrees with the SHA-256 algorithm, the system is tamper-proof.
The use of hash functions in cryptography requires specific dimensions, as they require certain design qualities to make them secure. As is the case with digital signatures, e-commerce protocols, message authentication codes (MACs), cryptographic hash functions can be quite useful for providing security and privacy.
Cryptographic Hash Functions Properties
A cryptographic cash function should be able to quickly convert an input to output. Even if a more complicated hash function will ultimately be safer, efficiency is also important.
Every time you parse a specific input through the hash function, the result will be the same. If you somehow get different hashes each time, tracking the input will prove impossible.
3. Collision Resistance
Finding different inputs that have the same outputs should be virtually impossible, a fact that is vital to digital safety. The odds of doing this, while possible, should be astronomically long. Such an occurrence is called a cryptographic hall collision. Having strong collision resistance, which is essential, is based on two other types of resistance.
It’s vital that the hash function only moves in one direction. It should be next to impossible for a player to learn an input based on the output. Functions that don’t have resistance against this will be vulnerable to preimage attacks. It’s only through the “brute-force method” that one should be able to learn the original input.
Brute-force refers to choosing a random input, hashing it, and comparing it to the output until finding a match. There are three possible results to the brute-force method. The first is getting a match on the first try, which is almost impossible from a mathematical perspective. The second is finding the output after trying every possible input. The third scenario is finding the output somewhere between the first and last possible guess, which is also mathematically improbable.
Second Pre-Image Resistance
It should also be difficult to find two inputs that have the same has output. Functions without viable resistance to this are vulnerable to second pre-image attacks.
It should go without saying that collision resistance is important in cryptography. At times, people create what they think are collision resistant functions but are proven not to be at a later date. The birthday paradox is an intriguing problem related to the concept of collision resistance.
First, consider that the chances of two strangers having the same birthday is 0.27%. Of course, is there are 366 strangers in the same place, the probability of two of those people having the same birthday is 100%. However, if it’s just you and another stranger with the odds at 0.27%, a third person entering the room does not cause the odds to double as you might think. This is essentially the birthday paradox.
It would be too difficult to calculate the probability of a shared birthday among a group of people. Instead, we calculate the probability that no one shares a birthday. The formula for this is:
P (match) = 1 – P (no match), with P referring to probability.
The probability of a person in an “n” group of people not sharing a birthday is calculated with this formula:
This number is then subtracted from one, giving us the probability that “n” number of people share a birthday.
The birthday paradox essentially compares individual probabilities against one another. Obviously, the more people in a room, the more likely two people are to share a birthday. There is an exponential increase in probability rather than a linear growth.
The birthday paradox is utilized in a cryptographic attack called the birthday attack in which brute force is used to find two inputs with the same output. Essentially, it tries to create a collision because the birthday paradox helps to make the brute force method more effective at creating a collision than you would think. Of course, the odds are still astronomical. A Bitcoin attacker would have a one in a trillion chance of getting one satoshi from a successful collision. Even though the SHA-256 algorithm is yet to have its collision resistance broken, there are plans in place to switch to a SHA-3 series of algorithms.
Hash functions must be able to hide information about their input to make it difficult to determine the original message based on the output. Even if a malicious can tell if the input was odd or even, the hash will not be as secure as it should be.
The final output of every hash function needs to be randomly distributed. For every “Y” output, if “k” comes from a distribution with high min-entropy, it’s impossible to have an input “x” such that H(k | x) = Y
High min-entropy refers to the distribution from which the value is chosen is so widely distributed that picking a random number has little impact on probability. For instance, a number between one and 10 has a low min-entropy distribution whereas a number between one and 2^256 has a high min-entropy distribution. The “|” refers to concatenation, which refers to adding two strings together.
For example, if you have a “Y” output and guess a random value for “k” from a wide distribution, it’ll be virtually impossible to find a value for “X” in the sense that running the concatenation for k and X through a hash will generate Y. This is a concept that we see with Bitcoin mining.
The final output should be akin to a series of coin flips, although the hash algorithm ensures there will always be a fixed output for each individual input. The best algorithms ensure the output distribution makes it impossible for malicious players to find patterns that help them figure out the original input.
Hashing Data Structures
Data structures are a way to store data. Understanding blockchain requires an understanding of two elements of data structure: pointers and linked lists.
Pointers are programming variables that store the address of other variables. Typically, variables store data, regardless of the programming language. Rather than actually storing numeric values, pointers will help to store the address of other variables. As their name implies, pointers lead you to the location of other variables.
Linked Lists are akin to a chain of blocks in that each clock has data and is inherently linked to the next one. The pointer is the part of the data structure that knows the address of the next block in the chain. It’s also important to note that the last block has a null pointer, which holds no value until the next block in the chain is created.
As you may be able to tell, a blockchain consists of linked blocks that have a hash pointer, a distinct type of pointer, leading to a previous block to create a chain. Hash pointers provide the address of the previous block after going through a hash algorithm, making it a secure link between the two blocks.
Of course, the genesis block has no pointer going into it. There is only a pointer that links it to the second block in the chain. This particular pointer can only be created with hashing data from the genesis block.
Such a system can block an attack if an attacker attempts to change the data on a chain because any change in input data causes a change in output. Once data is changed to any particular block, the entire chain is altered, alerting everyone involved that the original chain is no longer intact. This is why blockchains can’t be altered. If each block in a chain couldn’t be identified, tracking them would be impossible. The block header is how the identification process is conducted.
Each block header contains the following:
- Chain version numbers (to track upgrades and protocol changes)
- Current timestamp of the block in UNIX
- Difficulty target of the block (number of zeroes needed to meet the level of proof of work needed to maintain a 10-minute block creation time)
- Hash pointer
- Nonce (the value that miners use to generate a correct hash)
- Merkle root hash of a particular block’s transactions
Each part is critical to block creation. The nonce, in particular, is important because miners alter it in order to try different permutations, and the more permutations, the tougher it is to mind.
Meanwhile, at the bottom of the Merkle tree are leaf nodes, which are built upon by non-leaf nodes. A non-leaf node is the hash of a child node, while the root node sits at the top of the tree. As it pertains to cryptocurrencies, the Merkle tree solves the problem of the amount of storage space and time that would be required to store every transaction on the blockchain. Tracing the hashes of a certain block down the Merkle tree can help lead us to where a certain piece of data should be located.
In Relation To Bitcoin
To put it simply, mining Bitcoins is the search for new blocks that you can add to the blockchain. Merkle architecture is the basis of every blockchain, as individual blocks serve as the leaf, non-leaf, and child nodes. It’s the job of miners to make sure the chain keeps growing. Miners offer their computational power to the Bitcoin network, and their power is red to hash data, helping add to the total hash rate of the Bitcoin blockchain. Hash rate refers to the speed of miners to conduct hashing operations. The higher the hash rate, the more miners are involved, increasing the amount of work miners have at their disposal.
Miners use computational work to solve the challenge string, which is essentially a series of numbers that always starts with a multitude of zeros. More zeros imply the mining project will be more difficult. In order to solve the challenge string, the miner must find the response or proof string. Doing this requires investing computational power to hash the transactions to create a new block before adding a random value via the nonce to create a possible solution. The response string is put into a hash function to see if the output is equal to or less than the challenge string. If not, the nonce will undergo millions of changes until the requirements are met. When that finally happens, the block can be added to the blockchain.
If we apply the previous formula: H(k | x) = Y to Bitcoin, K is the nonce, x is the hash of the block, and Y is the difficulty target. The nonce selection process is random and based on the brute force method. Thus, mining software continually generates random strings until the solution is found.
The more zeros in an output, the more work required to find the solution. For instance, an output with 40 zeros requires 2 to the 40 power possible solutions (more or less). Next, the blockchain can run a hash function to check the validity of the proof. If it checks out, the block can be added to the blockchain. Currently, the difficulty is at 10 minuter per block added, which ensures a low probability of collision and limits the number of orphan blocks, which are blocks that don’t get added to the blockchain because the miner fails to find the solution. Of course, the difficulty gets readjusted every 2016 blocks.
Hashing is what allows the proof-of-work algorithm to function, making it vital to the Bitcoin network. It’s an efficient system that provides both security and collision resistance. Without it, the blockchain wouldn’t exist, at least not in the same form we know it today.