[ansel] Store images with extension?

Alan Garrison alang at cronosys.com
Tue Sep 7 14:14:41 PDT 2004


Chuck Hagenbuch wrote:
> Quoting Alan Garrison <alang at cronosys.com>:
> 
>> Just a thought, how about something like this:
>> (file md5sum)_(original filename)
>> e.g.,
>> 2c7f55d3aae7037c094b116d7b257b89_image2.jpg
>>
>> Or perhaps
>> (uploader user-id)_(file md5sum)_(original filename)
>> e.g.,
>> joebob_2c7f55d3aae7037c094b116d7b257b89_image2.jpg
>>
>> or even
>> (uploader user_id)_(upload timestamp)_(original filename)
>> e.g.,
>> joebob_10298309230_image2.jpg
> 
> 
> Prone to race conditions. In extreme cases, yes, but still worse than the
> current hash.

Sure, just doing an "md5(microtime())", while basically nice and 
random-y, can be a little ugly if you are doing eyeball comparisons of 
different values.  I'm thinking that if you had to do some manual 
filesystem maintenance on one or more particular files, finding names 
by raw md5 hashes would be tedious.  Granted, my thinking here is a 
bit filesystem-centric.
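
A minimal sketch of the (uploader user id)_(upload timestamp)_(original 
filename) idea, in Python purely for illustration.  The function names, 
the allowed character set and the sample timestamp are all assumptions 
made for this example; none of this is Ansel code.

```python
import re
import time


def safe_component(text):
    """Replace anything outside a conservative character set with '-'."""
    # '_' is deliberately left out of the allowed set so that it stays
    # free to act as the field separator in the final name.
    return re.sub(r"[^A-Za-z0-9.-]", "-", text)


def stored_name(user_id, original_name, timestamp=None):
    """Build '(user id)_(upload timestamp)_(original filename)'."""
    if timestamp is None:
        timestamp = int(time.time())  # whole seconds since the epoch
    parts = [
        safe_component(user_id),
        str(int(timestamp)),
        safe_component(original_name),
    ]
    return "_".join(parts)


# Arbitrary timestamp chosen for the example.
print(stored_name("joebob", "image2.jpg", timestamp=1094591681))
```

Note that with a whole-second timestamp, the same user uploading the 
same filename twice within one second still produces the same name, 
which is the race condition mentioned above.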

>> md5'ing the file may be useful if you are looking for the same exact
>> image with possibly a different name, though some may consider this too
>> much overhead.  The user+timestamp+name should probably negate any
>> chance of a dupe name, should be trivial to calculate on the fly, and it
>> still keeps the basic filename structure (just tacking on stuff to the
>> front).  Of course you'll have to properly escape the user id and 
>> filename.
> 
> 
> These are all very long, introduce the escaping issue you mentioned
> above, and also I'm not sure why any of them are better than the
> current hashing method, which has the advantage of storing by image
> id? I guess I see that they let you include the original image name,
> but a lot of other stuff too. And then if you edit the image, would
> you change the md5?

I have no idea if Ansel wants to go down this path; I was just wondering 
if there was any need to store a hash of the file contents for 
dupe searching.  Perhaps it is way overkill, particularly when hashing 
some huge image (a Hubble telescope image, say).
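
For reference, a rough sketch of what content-hash dupe searching could 
look like, again in Python for illustration only.  The function names 
and the 1 MiB chunk size are assumptions for this example.  Reading in 
chunks keeps memory use flat for a huge image, but every byte still has 
to be read and hashed, so the overhead concern stands.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths by content hash; keep only hashes seen more than once."""
    by_hash = {}
    for path in paths:
        by_hash.setdefault(file_md5(path), []).append(path)
    return {h: group for h, group in by_hash.items() if len(group) > 1}
```

Calling find_duplicates() on a list of file paths returns a dict that 
maps each repeated md5 to the paths sharing it, regardless of filename.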

> And thus update it everywhere? Or
> would you leave it calculated from data that no longer exists?

Well, presumably you'd update just that one instance's name.  Again, 
probably overkill.

Just my $0.01.

> -chuck
> 
> -- 
> "Regard my poor demoralized mule!" - Juan Valdez


-- 
Alan Garrison
Cronosys, LLC <http://www.cronosys.com>


