Wednesday 25 November 2015

TANDQ 13: Pass It On

In 2014-2015 I wrote an education column called "There Are No Dumb Questions" for the website "MuseHack". As that site has evolved, I have decided to republish those columns here (updating the index page as I go) every Wednesday. This thirteenth column originally appeared on Thursday, March 26, 2015.

Why must my password include a capital letter?


Because the needs of the many outweigh the needs of the few. This being the one year anniversary of my column, I’ve decided to take a look at some rather simple mathematics that is often taken for granted: that of passwords. Then again, as geeks, writers, et cetera, maybe you have a very good grasp of the subject, along with how long it takes your computer hacker character to crack a code. (Maybe you can even teach me a thing or two in the comments!) I’ll endeavour to be entertaining regardless.

Here are the basics. There are 26 letters on an English keyboard (I don’t know enough to comment on, say, Japanese). Let’s say that your password has to be exactly 10 characters long (makes it easy!). With 26 choices for each entry, the result is 26^10 or 141,167,095,653,376 possible passwords. Now, what if we include capital letters as options? This doubles the total character set, so 52 choices for each entry, resulting in 52^10 or... well, over 144 quadrillion possible passwords. We’ve multiplied our previous answer by 2^10. But wait, what if we FORCED at least one capital letter instead (required, no option)? Well, this is going to reduce the total. It only makes sense. When you add a restriction to something, the total will decrease.
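
To check those counts, here is a minimal Python sketch. The variable names are my own for illustration; the only inputs are the 26-letter alphabet and the 10-character length used above.

```python
# Count fixed-length passwords for the two character sets discussed above.
LENGTH = 10

lowercase_only = 26 ** LENGTH   # 26 choices in each of 10 positions
mixed_case = 52 ** LENGTH       # capitals optional: 52 choices per position

print(f"{lowercase_only:,}")    # lowercase-only total
print(f"{mixed_case:,}")        # mixed-case total

# Doubling the character set multiplies the total by 2 for every position.
growth_factor = mixed_case // lowercase_only
print(growth_factor)
```

Python integers have no fixed size limit, so the 18-digit total is computed exactly rather than rounded.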

In this case, with one (or more) of the characters having ONLY the 26 uppercase options, we can effectively remove every password that is all lowercase - in other words, the 26^10 options we had to start. They’re no longer valid. Granted, when you remove 141 trillion from over 144 quadrillion, you still have 144 quadrillion… but the restriction DID make your password a bit easier to guess. What if your password can be any number of characters? That’s harder. What if it must be at least 8 characters? Somewhat easier again - don’t try guessing a shorter password. (What if it’s a maximum number of characters instead? Then it could be that you’re watching Sherlock, Season 2.) The natural question at this point is: Why force conditions that ultimately decrease total options? It’s a pretty good question.
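
The subtraction argument can be sanity-checked with a short sketch. The function names here are hypothetical, and the brute-force check is only practical on a toy alphabet, so it runs on two lowercase and two uppercase letters.

```python
from itertools import product
import string

def count_with_capital(lower, upper, length):
    """All mixed-case passwords minus the all-lowercase ones."""
    return (len(lower) + len(upper)) ** length - len(lower) ** length

def brute_force_count(lower, upper, length):
    """Enumerate every candidate and keep those with at least one capital."""
    alphabet = lower + upper
    return sum(
        1
        for candidate in product(alphabet, repeat=length)
        if any(ch in upper for ch in candidate)
    )

# Toy check: 4 letters, length 3, so 4^3 - 2^3 candidates survive.
toy_formula = count_with_capital("ab", "AB", 3)
toy_enumerated = brute_force_count("ab", "AB", 3)
print(toy_formula, toy_enumerated)

# Full-size case from the column: 52^10 - 26^10.
forced_capital = count_with_capital(
    string.ascii_lowercase, string.ascii_uppercase, 10
)
print(f"{forced_capital:,}")
```

The forced-capital rule removes exactly one password in every 1,024 here, which is why the total still reads as 144 quadrillion.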

Predictable Entropy


Before we get into that, a word about password entropy. (I am now contractually obligated to point out this XKCD comic. There’s an in-depth analysis of the mathematics behind it in my ‘further viewing’ links below.) The short version: Entropy is built on the total number of possible resultant states. In terms of a string of characters, this gives: (total_characters)^(length), the way we had 26^10, above. Computers work in binary, so take log base 2, giving: (length)*log_2 (total_characters) as the binary size of the message, aka bits of entropy. You’ll notice that length is the big multiplier. Yes, log base 2 of 26 is less than log base 2 of 52, but adding two more (lowercase) characters is almost equivalent. (12*log_2(26) is about 56.4 and 10*log_2(52) is about 57.0.)
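
Here is that formula as a small Python sketch. The function name `entropy_bits` is my own, and it assumes every character is chosen uniformly at random, which is the same assumption the formula makes.

```python
import math

def entropy_bits(length, total_characters):
    """Bits of entropy for a random string: length * log2(total_characters)."""
    return length * math.log2(total_characters)

twelve_lowercase = entropy_bits(12, 26)   # 12 lowercase characters
ten_mixed_case = entropy_bits(10, 52)     # 10 mixed-case characters

print(round(twelve_lowercase, 1))
print(round(ten_mixed_case, 1))
```

The two results land within a bit of each other, which is the "length is the big multiplier" point in numbers.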

So, how many bits do we need for a good password? Well, this website link says 72 bits of entropy/security is strong for short term, but 80 is better for long term use (supported elsewhere, as it means 2^80 passwords would need to be tried). How do we get there? With about 94 characters on the keyboard, we’ll need 80 = (length)*log_2 (94), so a length of 13 characters. (PIN numbers, I’m looking at you.) Here’s the interesting thing. This entropy can be similarly achieved by selecting a sequence of random words, known to many as a “passphrase”. Instead of a keyboard, let’s assume a dictionary/vocabulary of 1,000 words. Solving 80 = length*log_2 (1000) means a length of about 8 words (repetition allowed; strictly, 8 words give roughly 79.7 bits, so a ninth word is needed to clear 80). If this doesn’t seem to buy us much, try plugging in the ACTUAL size of your vocabulary to the equation - the number of necessary words will only decrease. (Unless you know fewer than 1,000 words.)
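
Solving for length can be sketched the same way. The helper name `min_length` is hypothetical, and the 4-digit PIN at the end is my own illustrative example of a short, small-alphabet secret. The helper rounds up strictly, so the 1,000-word case comes out one word higher than the rounded figure: 8 words sit just under 80 bits.

```python
import math

def min_length(target_bits, choices):
    """Smallest whole number of symbols reaching target_bits of entropy."""
    return math.ceil(target_bits / math.log2(choices))

keyboard_chars = min_length(80, 94)       # ~94 typeable characters
passphrase_words = min_length(80, 1000)   # 1,000-word vocabulary
eight_word_bits = 8 * math.log2(1000)     # what 8 words actually provide
pin_bits = 4 * math.log2(10)              # a 4-digit PIN, for comparison

print(keyboard_chars, passphrase_words)
print(round(eight_word_bits, 1), round(pin_bits, 1))
```

Swapping in a larger vocabulary size for `choices` shows the required word count shrinking, as described above.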

The caveat to using a “passphrase” is that it does need to be RANDOM. The second word shouldn’t in any way be determined by the first. Humans are not good with random - we will pick our birthdate, our mother’s maiden name, and something off The List of 2014’s Worst Passwords... all in lowercase. Unless, that is, we are forced away from that inclination using (surprise!) some sort of restriction. So even though there are a few of us who can follow the logic of “length over character use”, for the good of the many who would use their password length to expand on 123456, we must succumb to including at least one upper case character, et cetera, et cetera. It’s not all bad - throwing in a symbol does increase the complexity of a passphrase too.
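
One way to take the human out of the choice is to let the computer pick. This is a rough sketch using Python’s standard `secrets` module; the eight-word list is a made-up stand-in (a real vocabulary would need 1,000 words or more), and the function names are my own.

```python
import math
import secrets

# Hypothetical toy vocabulary, for illustration only. Far too small for real use.
WORDS = ["apple", "river", "candle", "orbit", "maple", "tunnel", "piano", "frost"]

def random_passphrase(word_count, words=WORDS):
    """Pick each word independently and uniformly; repetition is allowed."""
    return " ".join(secrets.choice(words) for _ in range(word_count))

def passphrase_bits(word_count, vocabulary_size):
    """Entropy of a passphrase whose words are chosen uniformly at random."""
    return word_count * math.log2(vocabulary_size)

phrase = random_passphrase(8)
toy_bits = passphrase_bits(8, len(WORDS))

print(phrase)
print(toy_bits)  # only 24 bits with this 8-word toy list
```

Because each word is drawn independently, the second word really isn’t determined by the first, which is the property the entropy formula relies on.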

Of course, all of this assumes your hacker is running some brute force algorithm, rather than being a bit more ambitious, and attempting to steal an entire password file off your network. There’s not much an individual user can do there (aside from constantly changing their password, and I pretend that’s why my work account forces me to do this) but logically the system itself has security measures in place. For instance, cryptographic hash functions (a nice little application of high school mathematics). Good enough - until we hit something like 2014’s Heartbleed bug, also an XKCD comic. Or until the character in your novel decides to use telekinesis to figure out people’s passwords. But at that point, you might as well call in Sherlock to get his opinion.
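
For the curious, here is roughly what storing a hash instead of the password might look like. This is a sketch under my own assumptions, not a description of any particular system: it uses PBKDF2 from Python’s standard `hashlib`, and the salt size, iteration count, and function names are chosen for illustration only.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, for illustration only

def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("123456", salt, stored))
```

The file then holds only the salt and the digest, so a stolen copy still leaves the thief guessing passwords one at a time.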

For further viewing:

1. Strength/Entropy: Characters vs. Words

2. The math behind passwords

3. TeachNomination: Password Math (Video)

Got an idea or a question for a future TANDQ column? Let me know in the comments, or through email!
