Jan 18, 2008

Obfuscating IDs in URLs with Rails

There are a number of valid reasons to obfuscate IDs in URLs but it's not a replacement for authentication!

My approach to this problem is to allow the database to assign a normal serial ID, which means I don't have to maintain any extra state or use special database features. Then after it's created I save the obfuscated value of the id which I can use on later lookups.

I use Knuth's integer hash because it ensures there will be no collisions and the full range of values will be used.

In this example I'm using a signed 32 bit integer for the hashed_id because it's supported by all common Rails databases. You could however adjust MAXID to any number of bits you want but then you may have to deal with storage and conversion issues.

Model:

class Part < ActiveRecord::Base
PRIME = 2654435761
MAXID = 2**31-1
def after_create
self.hashed_id = (self.id * PRIME & MAXID)
self.save
end
end

Controller:
class PartsController < ApplicationController
def show
@part = Part.find_by_hashed_id(params[:id])
end
end

9 comments:

Bryan Tomlin said...

You really saved my life with this one. Well, that may be a little bit of an exaggeration.

I have to admit though, I don't understand what exactly is happening here. I guess it's probably the bitwise and (&) that I don't understand.

Please explain a little so I can know exactly what is happening and why there will never be any collisions.

Thanks!

Unknown said...

It's pulled directly from Knuth's book and is a standard method for creating a integer hash but....

The product of the prime and serial value have no common divisors ( with any of the other serial values ) so the results will walk through each possible values exactly once.

The best way to see it would be to set the prime to a low value like 3 or 7 and walk through all the 4 bit values in a loop.

You will notice it doesn't matter what you set the prime to, even if you use a really big one and a small 4 bit serial value, it will still cover each value exactly one time. Some prime values will provide a more random looking mixing for different size serial values though. Some will even walk through linearly or exactly backwards etc..

Anonymous said...

I am having the same issue and just read about your approach, which I find to be extremely elegant. However, I don't think this algorithm is designed as a secure hash and I am worried that it might be too easy to break. My goals for obscuring my identifiers are to keep people from being able to spider my site and reach all of the records on it, and to keep them from being able to measure how quickly records are being created. If someone figures out how this code works and undermine both of these goals.

In this algorithm with the prime you selected, if an object's hashed ID is less than 1640531534 (MAXID - PRIME & MAXID) then the next hashed ID will increase by 506952113 (which is PRIME & MAXID). The rest of the time the next hashed ID will decrease by 1640531535. You can see that by running the following code:

PRIME = 2654435761
MAXID = 2**31-1

for i in 1..100
puts (i * PRIME & MAXID) > 1640531534
puts ((i+1) * PRIME & MAXID) - (i * PRIME & MAXID)
end

If someone creates two items in rapid succession and compares the IDs, it won't be too hard for them to figure out the pattern, which would allow them to both crawl my site and measure how quickly records are being created. So I have decided that this approach is not secure enough for me. I am going to look into either adding a salt value to the hash, or just using an encryption algorithm.

If you are less paranoid than I am this may not bother you. But I though I should flag it just in case.

Unknown said...

aaron: correct this aproach is not secure. It's simply obscures the id from casual observation. If you want a secure ID I'd suggest encrypting the ID.

Anonymous said...

Thank you for the nice idea.
How do you handle the controller or the view when you try to update the content via form_for. I did not manage to accomplish this.

Best Ralf

Scott said...

With a little tweaking, you can make the hash trivial to invert so you don't have to store the hashed key at all.

Change PRIME to something less than MAXID (and for best-looking results, something that's not too close to a simple fraction of MAXID).

Then, calculate PRIME_INVERSE, a number such that (PRIME * PRIME_INVERSE) & MAXID == 1. This only has to be done once, offline. See http://en.wikipedia.org/wiki/Modular_multiplicative_inverse for how to do it. (Note that we're assuming MAXID is a power of 2 minus 1, and that the modulus we're effectively using is actually MAXID+1, i.e. the power of 2 itself.)

For example, you could have
MAXID = 2**31-1
PRIME = 1580030173
PRIME_INVERSE = 59260789 # (calculated from MAXID and PRIME offline)

You can verify for yourself that
(PRIME * PRIME_INVERSE) & MAXID == 1 .

Then:
def obfuscate_id(x)
(x * PRIME) & MAXID
end
def deobfuscate_id(x)
(x * PRIME_INVERSE) & MAXID
end

deobfuscate_id(obfuscate_id(x)) is now x for all x <= MAXID.

First 20 obfuscated id's, just to make sure they look OK:

1580030173
1012576698
445123223
2025153396
1457699921
890246446
322792971
1902823144
1335369669
767916194
200462719
1780492892
1213039417
645585942
78132467
1658162640
1090709165
523255690
2103285863

Again, the obfuscation isn't hard for anyone clueful to hack with a small amount of effort, but it's at least good enough that Joe User won't be able to invert it by eyeball.

Anonymous said...

Hi! Charles Doody . payday loans

Anonymous said...

Scott's point is good, but when we all use the same PRIME and PRIME_INVERSE, that's not secret anymore. I wrote about a way to make it really secret very easily, with detailed steps. It even (and that's the best part) allows to solve Aaron's problem of getting consecutive IDs disclosed, without becoming a slow algorithm.

nathan said...

Interesting way to get this done. I like it.

I ended up solving it in a different way and published it as a rails plugin called obfuscate_id:

https://github.com/namick/obfuscate_id