If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

Main content
Current time:0:00Total duration:10:14

Video transcript

cryptographic hash functions are basically fundamental building blocks that are used within many cryptographic algorithms and protocols and they have a number of very important applications in the context of information security as a whole now some of the more common algorithms in this category that are known as cryptographic hash functions include things like md5 and also it has some predecessors like MV 4 and others as well as algorithms like like sha-256 and actually sha-256 is preceded by other algorithms like sha-1 and and so on and also there are some algorithms that maybe you've heard of or maybe that are a bit lesser-known but things like ripe MD and Blake and Skye and others now cryptographic hash functions are basically used as critical building blocks in many applications and in really the first motivating application the first historical application of these types of hash functions was in the context of what's known as a as a digital signature digital signatures are used in so many different cryptographic applications today they're they're the cornerstone of many e-commerce protocols they are used in doing things like Bitcoin generation and so on cryptographic hash functions are also used in things like message authentication protocols in pseudo random number generation and password security even encryption to some degree in fact aside from their use of digital signatures these hash functions are also used in other places in the Bitcoin protocol as well so first of all let me talk about what a cryptographic hash function actually is and of course as the name implies the the first thing it is it's a hash function and by hash function I basically mean that it will take input it's a mathematical function or transformation if you will that takes a particular input and we call this input a message okay and the message can be of arbitrary length and the hash function basically applies a mathematical transformation maybe a set of mathematical transformations to this input to produce a single output and we typically call this output a die although sometimes you will see the output referred to as a as a tag or as a hash or and the fingerprint but digest is sort of the most common nomenclature in fact md5 which was one of the one of the earlier hash functions stands for message digest 5 ok and MD 4th messages are just for and so on and so forth ok now the message as I mentioned briefly can be of arbitrary size it can be as long as you want or as short as you want but the output the size of the digest or the size of the tag is going to be fixed in length and for example in the context of a hash function like let's say sha-256 the digest will actually be exactly 256 bits in length ok so it's going to have a fixed length output but an arbitrary length input ok and the other thing I want to point out about these cryptographic hash functions that the the function here is a deterministic function and by that I mean that the output will always be the same for a given input so if you have a given input you're going to see the exact same output you won't have a situation in which maybe a given input will have two different possible outputs it's going to be consistent ok now traditional traditional hash functions that have been used in computer science for quite some time and they're used in many computing applications for example you may have heard of something like a hash one can use and something called the hash table but the kinds of hash functions that are used in hash tables and I want to stress this they aren't necessarily the same as cryptographic hash functions ok the qualifier cryptographic here is very very important and it usually means or implies that that hash function has to have a certain set of critical design goals and maybe some particular properties in mind that make it suitable for use in applications that use cryptography or in cryptographic applications areas where perhaps security or privacy or confidentiality or authentication is a serious concern ok so first and foremost it may be in describing some of these properties is that and maybe this almost goes without saying one of the first properties you want have a cryptographic hash function is that it should be computationally computationally efficient and by that I mean that it shouldn't take a lot of time to compute the output from a given input if you're you're given a message and you want to apply this set of transformations to that message to get a digest that set of transformations should not take a long time to implement on a computer to be very fast or relatively fast and it almost goes without saying but I think it is important to stress and point it out because I've seen people come up with grossly inefficient hash functions sometimes and those would not be considered suitable in the context of one typical cryptographic hash functions are used for cryptographic applications okay and the second property you typically need in the context and this is especially in the context of digital signing is that you want it to be the case that it's hard to find two inputs that actually map to the same output I mean two distinct inputs whose corresponding digest is identical and this property is typically referred to as collision collision resistance okay that means it's hard to find a colliding pair of input so in other words if you have two inputs and let's say you have some message and one you have a message m2 okay their output under the application of the hash function should not be the same you won't you won't ever have to be the same that the you won't ever have to be the case rather that the output of m1 and m2 under an application of the hash function is is the same it should they should never be the same should always be different okay now I should take a step back here and point out that of course in practice given that messages can be of arbitrary size and given that the input or the output rather is a fixed size it's not mathematically possible to guarantee that the output will always be different for for two distinct messages but what you typically want is not that the outputs are are necessarily different but that it's hard to find two distinct messages that produce the same output we know that they exist by virtue of the fact that there are many messages that can be hashed and only a finite and small number a relatively small number compared to the number of messages a smaller number of possible digest values but outside from from that consideration it should be hard it should take a long time and by long I mean an astronomically long time to find two distinct messages that would result in the same output under the application of the hash function okay now the third thing I want to point out is that in many cases you might want also in the context of a hash function for the hash function to be able to hide information about the inputs in other words given the given the output it should be hard to glean anything useful or interesting about the input okay anything any relevant detail in it and I don't just mean the whole input but maybe even something as simple as you know was the input an even number or an odd number I mean that should be the kind of thing that should be hard to infer when you see the output even something as simple as the the evenness or oddness of the input okay now a fourth property I want to point out is that you typically want the output to be well distributed in other words the the the output you know should look should look random in other words it should look like like a set of coin flips took place not that there was a predictable way in which the output was created okay and really what that means is that and you can think of it maybe more conceptually as it's almost as if you flipped a bunch of coins to get to the output it should look just that random okay and so what you can really think of cryptographic hash functions as is perhaps maybe the mathematical equivalent for mathematical analog of a meat grinder of sorta can take inputs and apply these these mathematical transformations to them such that the output looks for all intents and purposes completely random and unrelated to the original input okay I do want to make a few quick remarks about these particular properties and first of all they are interrelated for example if you have a situation where the outputs let's say really appear to bear no relationship to the input and the outputs also look random then then that will in fact give you two large agree a lot of the collision resistance properties because just the fact that you can't predict the output of the fact that it hides all this information implies that it's going to be hard to find two inputs that are distinct that map to the same output and so sometimes you get one property in exchange for the others now the second thing I want to point out is that typically these properties in practice or maybe even in interline mathematics are things that you you hope for but you can't always guarantee that they'll they'll always hold it and it may be entirely possible that you could design a hash function that you think is completely collision resistant but someone might come along a year from now and come up with a more clever way for finding a collision maybe they'll find a clever shortcut that does not involve doing a brute-force search of any sort okay and it turns out to cryptographers for better or worse currently do not have any mathematical technique they haven't developed techniques for being able to work around some of these limitations and so we often do take it on face value that these schemes are secure based on on how long they've been around okay I also want to point out the last thing I want to point out is that this treatment that I've given is not meant to be mathematically rigorous by by any stretch of the imagination there are far more precise ways to describe these underlying properties but my hope instead is that this video gives you maybe a bit of a flavor for what is required of a cryptographic hash function without maybe getting bogged down in some of the the mathematical minutiae and formalism