Mastering Hash Functions in C: SHA-256 and MD5 Demystified

Learn how to implement cryptographic hash functions in C with this step-by-step tutorial. Explore the inner workings of MD5 and SHA-256 and see how to code them from scratch.

Mastering Hash Functions in C: SHA-256 and MD5 Demystified
Photo by Muhammad Zaqy Al Fattah / Unsplash
Step-by-Step Guide to Implementing SHA-256 and MD5 in C

Are you curious about the cryptography that keeps your online transactions and communications secure? Want to learn how to code your cryptographic hash functions in C? Look no further! This tutorial will dive into the inner workings of two of the most widely used algorithms: MD5 and SHA-256.

Not only will you learn how these algorithms work, but you’ll also have the opportunity to implement them yourself. Plus, a completed implementation of these concepts is available in the accompanying GitHub repository, so you can see how it’s done and start experimenting on your own. So join us and demystify the world of encryption!

GitHub - jterrazz/42-ssl-md5: 🔒 OpenSSL implementation in C. Supports md5, sha1, sha256, sha224, sha512 and sha384 algorithms. A medium article is available in description.
🔒 OpenSSL implementation in C. Supports md5, sha1, sha256, sha224, sha512 and sha384 algorithms. A medium article is available in description. - GitHub - jterrazz/42-ssl-md5: 🔒 OpenSSL implementat…

Cryptographic functions

Cryptographic functions serve a variety of important purposes, including:

  • Ensuring the integrity of messages or files
  • Verifying passwords without storing them in clear text
  • Creating “proof of work” for use in cryptocurrencies like Bitcoin and Ethereum
  • Generating and verifying digital signatures

A typical cryptographic function takes a message of arbitrary size for input and produces a fixed-length hash. For example, MD5 produces 128-bit hashes, while SHA-256 produces 256-bit hashes. When represented in hexadecimal, MD5 hashes are 16 characters long, and SHA-256 hashes are 32 characters long (since two letters represent one byte, and one byte equals 8 bits).

md5 hash translation

Cryptographic hash functions have several fundamental properties:

  • Determinism: The same input always produces the same output.
  • Speed: The process should be able to compute hashes for any input size quickly.
  • One-wayness: It should be infeasible to determine the input given the hash.
  • Collision resistance: A cryptographic hash function is considered collision resistant if it is computationally impossible to find two distinct inputs that produce the same hash output.
  • Avalanche effect: Even a tiny change in the input should result in a significantly different hash.

These properties ensure that hash functions can be used effectively for cryptographic purposes.

md5

While the MD5 algorithm has historical interest, it is no longer considered secure for cryptographic purposes due to known vulnerabilities that allow multiple inputs to produce the same hash output. However, it can still be used for non-cryptographic purposes, such as checking the integrity of a downloaded file against unintentional corruption. It is important to note that MD5 should not be used for security-critical applications.

sha256

The SHA-2 family of cryptographic hash functions, which includes SHA-256, was developed as an improvement over the SHA-1 and SHA-0 functions. Both of which have roots in the MD5 algorithm. As such, there are many similarities between the various SHA functions.

SHA-256 is considered a secure algorithm because it is computationally infeasible to determine the input given the hash output. However, it is not recommended for storing passwords in databases due to its relatively low cost of computation, which makes it vulnerable to brute-force and rainbow table attacks.

The algorithms

Step 1 — Formatting the input

The shuffling functions used in cryptographic hash algorithms have specific requirements for the input message. In particular, the message must be divided into chunks of 512 bits, and the resulting formatted message must meet particular criteria.

The formatted message size must be a multiple of 512 bits

To ensure that the formatted message has the correct size, the following steps are taken:

  • A “1” bit is appended to the end of the message, and enough “0” bits are added to make the length of the formatted message a multiple of (512–64) bits.
  • The size of the original message is stored in a 64-bit format in the remaining space at the end of the formatted message.

Then, the formatted message length can be calculated using the following formula: aligned = (nb + (X-1)) & ~(X-1) This formula aligns a number ‘nb’ to a multiple of ‘X’ bytes.

The build_msg function may also have an ‘is_little_endian’ parameter, which indicates the endianness of the input message. Endianness refers to the order in which bits are stored in memory and can be either little-endian or big-endian. You can learn more about endianness in this article:

Little & Big Endian
Little and big endian are two ways of storing multibyte data-types ( int, float, etc). In little endian machines, last byte of binary…

To convert a 64-bit number from one endianness to the other, you can use the following function:

The shuffling algorithm works by iteratively inverting small groups of two-byte neighbors, starting with the smallest group and progressively expanding to larger groups.

bswap(uint64_t)

Step 2 — Shuffling the data

The shuffling steps for cryptographic hash algorithms differ slightly depending on whether you use MD5 or SHA-256. However, in both cases, the message is divided into 512-bit chunks.

The algorithm uses 32-bit buffers to hold data (4 for MD5 and 8 for SHA-256). For each chunk, the algorithm performs mathematical operations on these buffers, which modify their state. While the details of these operations can be complex, you can learn more about them in the following articles or by exploring the implementation in my project: SHA-256 and MD5.

MD5 - Wikipedia
SHA-2 - Wikipedia

Last step — Build the hash from buffers

After the shuffling step, the algorithm returns 4 (or 8 for SHA-256) buffers that can be used to create a readable hash. To obtain the hash, these buffers are concatenated using their hexadecimal representation.

One final issue to consider when using the MD5 algorithm: the buffers are in little-endian format, meaning that the bits must be inverted for 32-bit numbers to print them in the correct order.

Helpers

Do you need help implementing SHA-256? Check out this step-by-step guide for a helpful breakdown of the process.

SHA-256
Start Here; user interface put your original message to be hashed in cell A2up to 55 printable ASCII charactersplease avoid objectionable strings,SHA256(A2)123,A665A45920422F9D417E4867EFDC4FB8A04A1F3FFF1FA07E998E86F7F7A27AE3cell A2 has length 3I have enabled the comments feature. If you ri…

I hope this tutorial has given you a solid understanding of cryptographic functions. If you’d like to see a complete implementation of these concepts, be sure to check out my code on GitHub. And don’t hesitate to reach out if you have any questions or need further guidance. Happy coding!


Tired of supporting big corporations every time you shop online? Switch to open.mt, the decentralized marketplace that uses blockchain technology to enable peer-to-peer commerce with no intermediaries. Not only can you get better prices and faster transactions, but you can also support your local merchants and keep your community thriving.

Follow us for updates on our progress, and be the first to join the open.mt community when we launch. I appreciate your support, and we can’t wait to welcome you to our open market!

Open Market Technologies
Support your community and shop locally with open.mt! Our decentralized marketplace uses blockchain tech for P2P…