[gollem] Fwd: [PEAR-DEV] File_Repository Class Proposal
Jan Schneider
jan@horde.org
Thu, 19 Sep 2002 20:33:44 +0200
Perhaps something for the VFS_File driver?
----- Forwarded message from mikemc-php@contactdesigns.com -----
Date: Thu, 19 Sep 2002 11:15:49 -0700
From: Mike McCallister <mikemc-php@contactdesigns.com>
Reply-To: Mike McCallister <mikemc-php@contactdesigns.com>
Subject: [PEAR-DEV] File_Repository Class Proposal
To: pear-dev@lists.php.net
Greetings,
I constantly feel guilty for using PEAR code all the time without having
contributed more than bugfixes. So here is a File_Repository class you
guys can have if anyone is interested (if not, you can't blame me for
not trying ;). I'd have to Pearify it of course and add PEAR error
handling and clean it up a little (although it is reasonably clean).
Currently, it does not autogrow the repository - you have to initialize it
first. Anyways, let me know if there is any interest. Here are the
docs from the top of the class file (should give you an idea what it does):
* This class is designed to deal with a large store of files that need to be
* accessed quickly. The majority of modern filesystems do not deal well with
* accessing files in a directory where there are many files (say 5000+). Why?
* Well, basically, when the OS wants to get a file handle, it will ask the
* directory which inodes that file lives on. When a directory has inode
* information on MANY files, it can take longer (sometimes much longer) to
* find out inode information for a single file. In reality there is a bit
* more to it than that, but that is the basic idea. Some filesystems (e.g.
* ReiserFS, http://www.reiserfs.org/) don't have this problem as they use
* more advanced algorithms for looking up inode data (usually some form of
* binary tree). So if you are running one of these cool filesystems, this
* class will do you NO GOOD WHATSOEVER. This class is used to make file
* access quick on a filesystem regardless of what kind of filesystem it is.
* How does it do this? Quite simply, it creates a directory hierarchy
* (usually with a depth of 2). Each level has 62 directories in it
* (A-Za-z0-9). Therefore a 2 level hierarchy has 62^2 or 3844 directories.
* Since file access is still reasonably fast on directories with fewer than,
* say, 500 files, a two level directory hierarchy (aka repository) can store
* 500*3844 or 1,922,000 files and still have speedy file access. I don't
* recommend a three level repository EVER (62^3 or 238,328 directories) - if
* you have this many files, you NEED to change your filesystem. Of course,
* you can still use this class on the nifty filesystems as a means to
* organize the files - this can be a good thing since it can be very
* annoying to "ls" in a directory only to have 2 million files returned ;)
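
The directory and capacity figures above can be checked with a few lines of PHP. This is only a sketch of the arithmetic: repository_capacity() and its parameters are hypothetical names, not part of the proposed class, and the 500 files per directory is the rule of thumb from the text.

```php
<?php
// Hypothetical helper (not part of the proposed class): computes how many
// leaf directories a repository of a given depth has, and how many files
// it holds if each leaf directory is kept to $filesPerDir entries.
function repository_capacity(int $levels, int $filesPerDir = 500, int $fanout = 62): array
{
    // 62 names per level (A-Z, a-z, 0-9), so fanout^levels leaf directories.
    $leafDirs = $fanout ** $levels;

    return array(
        'directories' => $leafDirs,
        'files'       => $leafDirs * $filesPerDir,
    );
}

$twoLevel = repository_capacity(2);
printf("%d directories, %d files\n", $twoLevel['directories'], $twoLevel['files']);
```

For comparison, repository_capacity(3) gives 62^3 = 238,328 leaf directories.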
*
* So what does this class do for you? Well, first it can create the
* repository for you, which is good because creating 3844 directories by
* hand could really suck. Next, it gives you three important methods for
* accessing/updating files in the repository: open(), store(), and
* retrieve(). These methods abstract away the fact that there is a directory
* hierarchy. In other words, by using them, it is as if you were
* manipulating files in a single directory.
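
Creating the hierarchy could look roughly like the sketch below. The class file itself is not shown in this message, so repository_init(), REPO_ALPHABET and the 0755 mode are assumptions made for illustration and not the class's actual API. Note also that on a case-insensitive filesystem "A" and "a" name the same directory, so fewer distinct directories would exist there.

```php
<?php
// Hypothetical sketch, not the proposal's code: builds every leaf directory
// of an N-level hierarchy under $root, one directory per alphabet character
// at each level.
const REPO_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

function repository_init(string $root, int $levels = 2, string $alphabet = REPO_ALPHABET): int
{
    // Expand the list of paths one level at a time.
    $paths = array($root);
    for ($i = 0; $i < $levels; $i++) {
        $next = array();
        foreach ($paths as $path) {
            foreach (str_split($alphabet) as $char) {
                $next[] = $path . '/' . $char;
            }
        }
        $paths = $next;
    }

    foreach ($paths as $leaf) {
        // The recursive flag also creates the intermediate level(s).
        if (!is_dir($leaf) && !mkdir($leaf, 0755, true) && !is_dir($leaf)) {
            throw new RuntimeException("Could not create directory: $leaf");
        }
    }

    // Number of leaf paths ensured (alphabet length to the power of $levels).
    return count($paths);
}
```

Called as repository_init('/path/to/dir') with the defaults, this ensures the 3844 leaf directories of a two level repository.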
*
* This class was written specifically for two primary uses (although others
* exist wherever you have the need to store A LOT of files): a more robust
* session file storage and retrieval system for sites with medium to high
* traffic AND storing files associated with database records. WHY store
* files associated with database records when you can just store them as a
* BLOB type with that record? First, the filesystem is a much more
* convenient way to store files (i.e. you don't need SQL to access a file).
* Second, on servers that are running many databases that get accessed
* frequently (e.g. our web servers), returning BLOBs can wipe out your
* query/index memory cache, thereby negatively affecting other databases on
* the same server.
*
* It is important to understand how we store/organize files in the
* repository. It has one key strength and one key weakness (with a
* workaround). By default, it will store a file based on each beginning
* character up to the max levels (2). So a file named "test.txt" would be
* stored in /root/t/e. This is a good thing because it is easy for
* developers to track down which directories contain files since it makes
* intuitive sense. This way of organizing files also has a weakness -
* storage location is ONLY based on the first N characters. So if each file
* always started with "prefix", all files would end up in /root/p/r and
* therefore there would be no advantage to using this class. There are two
* ways around this problem. The first is to set auto_md5 to TRUE. This will
* prefix each filename with a 32 character MD5 digest, for example:
* dd18bf3a8e0a2a3e53e2661c7fb53534_test.txt. The digest is sufficiently
* random that you will get an even spread over the repository, and each
* digest is unique to the filename and will always be the same for the same
* filename. While this will give you the best spread, it is not convenient
* to look files up since you have to compute the digest to figure out what
* the first two characters are. The other way is to set prefix_seed to
* something other than 0. In the case of all files starting with "prefix", a
* file called "prefix_test.txt" would be stored as if it were test.txt by
* setting prefix_seed to 7. Of course, this is really only useful when all
* files have a common prefix. The MD5 method is the best way to store the
* files as it is not affected by common names, prefixes or patterns in
* filenames.
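
The filename-to-directory mapping described above might be sketched as follows. This is not the author's code: repository_path() is a hypothetical function, treating auto_md5 and prefix_seed as alternatives is an assumption, and so is rejecting names whose directory characters fall outside A-Za-z0-9 (the text does not say what the class does with them).

```php
<?php
// Hypothetical sketch of the mapping, not the actual class code.
// Option names (levels, auto_md5, prefix_seed) follow the text above.
function repository_path(string $root, string $filename, array $opts = array()): string
{
    $levels     = $opts['levels'] ?? 2;
    $autoMd5    = $opts['auto_md5'] ?? false;
    $prefixSeed = $opts['prefix_seed'] ?? 0;

    if ($autoMd5) {
        // Stored name gets a 32 character digest prefix; directories are
        // taken from the first characters of the digest.
        $stored = md5($filename) . '_' . $filename;
        $key    = substr($stored, 0, $levels);
    } else {
        // Skip prefix_seed leading characters before picking directories.
        $stored = $filename;
        $key    = (string) substr($filename, $prefixSeed, $levels);
    }

    // Assumption: each directory character must be one of A-Za-z0-9.
    if (strlen($key) < $levels || !preg_match('/^[A-Za-z0-9]+$/', $key)) {
        throw new InvalidArgumentException("Cannot derive a directory for: $filename");
    }

    return $root . '/' . implode('/', str_split($key)) . '/' . $stored;
}

echo repository_path('/root', 'test.txt'), "\n";
echo repository_path('/root', 'prefix_test.txt', array('prefix_seed' => 7)), "\n";
```

One thing worth keeping in mind with the digest approach: an MD5 hex digest only contains the characters 0-9 and a-f, so in a sketch like this the digest-prefixed names land in at most 16 of the 62 directories at each level.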
*
* When you specify root during object initialization it MUST NOT HAVE A
* TRAILING dir_delim and it MUST BE ABSOLUTE. Otherwise, this class will not
* function correctly.
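
A caller could guard against a bad root before constructing the object. The check below is a sketch only: is_valid_root() is a hypothetical helper, and "absolute" is assumed to mean "starts with dir_delim", which fits Unix-style paths but not Windows drive letters.

```php
<?php
// Hypothetical helper, not part of the proposed class: returns true when
// $root starts with the directory delimiter (assumed meaning of "absolute")
// and does not end with it.
function is_valid_root(string $root, string $dirDelim = '/'): bool
{
    $len = strlen($dirDelim);

    $isAbsolute  = substr($root, 0, $len) === $dirDelim;
    $hasTrailing = substr($root, -$len) === $dirDelim;

    return $isAbsolute && !$hasTrailing;
}

var_dump(is_valid_root('/path/to/dir'));
var_dump(is_valid_root('/path/to/dir/'));
```

Under these assumptions a bare "/" is rejected as well, since it both starts and ends with the delimiter.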
*
* Usage:
*
* $rep = new CDS_File_Repository(array('root' => '/path/to/dir'));
* $data = $rep->retrieve(array('filename' => 'test.txt'));
----- End of forwarded message -----
Jan.
--
http://www.horde.org - The Horde Project
http://www.ammma.de - discover your knowledge
http://www.tip4all.de - your private betting pool