Thoughts and ideas for improving the design of the GM cache
List of GM interactions with the cache (and how these are implemented in the idea described below)
- When the uploader/downloader finishes, claims on the cached input files used by the job are released
  - Remove hard links and delete job dir
- The main daemon process periodically runs cache cleanup
  - Use the cleanbyage script. The cleanup is independent of any uploader/downloader processes or jobs
- The main daemon process periodically runs cache registration (registering cache entries in the indexing system)
  - Register as part of the download process
- Before data transfer, the mover looks up the cache to check if the file is already there. If the file is already cached it checks permissions, and if the file needs to be downloaded it locks the destination file
  - Use the hash to do a direct lookup. Lock using a .lock file
- After data transfer, the mover checks the downloaded file and deals with errors
  - Do a checksum and size check
- The mover links or copies the cached file to the session dir
  - If the file is to be linked from the session dir, a hard link is created in the per-job dir in the cache, and the session dir then links to that hard link
- The mover also sets the creation and valid-till dates
  - The .meta file contains the valid-till date; the creation date is the creation date of the cached file
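The locking and post-transfer check from the list above could look roughly like the following Python sketch. The function names, the lock file contents (the owner's pid) and the choice of MD5 as the default checksum are assumptions made for illustration; the notes only say "lock using .lock file" and "do checksum and size check".

```python
import hashlib
import os


def try_lock(data_path):
    """Try to take the download lock for a cache entry.

    Returns True if this process created the .lock file, False if the lock
    file already exists. O_CREAT | O_EXCL makes the creation atomic on a
    local file system.
    """
    lock_path = data_path + ".lock"
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(str(os.getpid()))  # assumed content: pid of the lock owner
    return True


def release_lock(data_path):
    """Remove the .lock file belonging to a cache entry."""
    os.remove(data_path + ".lock")


def verify_download(data_path, expected_size, expected_checksum, algorithm="md5"):
    """Check the size first (cheap), then the checksum of the downloaded file."""
    if os.path.getsize(data_path) != expected_size:
        return False
    digest = hashlib.new(algorithm)
    with open(data_path, "rb") as data_file:
        # Read in 1 MiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: data_file.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_checksum
```

A stale lock left behind by a crashed mover would still need separate handling, for example by checking whether the recorded pid is alive.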
Implementation ideas
The current system uses a few 'central' flat files, and some per-file files to store metadata on the cached files. With large caches this causes problems like bug 1036.
One idea is to use a method like Apache's mod_disk_cache, which derives cached file names from a hash of the URL. This eliminates the need for a central metadata store to look up cached file names from URLs.
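As a minimal sketch of the hash-based naming, assuming an MD5 digest encoded with URL-safe base64 and split into two directory levels (the exact hash and alphabet used by mod_disk_cache are not reproduced here, and `cache_paths` is a hypothetical helper):

```python
import base64
import hashlib
import os


def cache_paths(cache_root, url):
    """Map a URL to the .data and .meta paths of its cache entry."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    # 16 bytes become 22 base64 characters once the "==" padding is dropped.
    name = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    # The first two characters become two directory levels to keep directories small.
    base = os.path.join(cache_root, name[0], name[1], name[2:])
    return base + ".data", base + ".meta"
```

Because the path is computed directly from the URL, a lookup is a single existence check on the .data file rather than a search through a central index.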
Maswan's on-disk layout idea
Use a single mod_disk_cache-like file tree for the main data store, with the root dir readable only by root, then hard-link out to per-job directories (readable only by the mapped user). The symlink in the session directory points into this per-job directory. In the per-job directories the individual files must be owned by root and not writeable by anyone else, so that files are immutable (very important since jobs will share files), but they must be world-readable.
lost+found/
cache/a/H/Iey@psrk9BDfOPA2H0sw.data (the actual file)
cache/a/H/Iey@psrk9BDfOPA2H0sw.meta (assorted metadata, url, timestamp, etc)
[...]
cache/Z/z/ZXhfEFtJlss50nAea7pg.data
cache/Z/z/ZXhfEFtJlss50nAea7pg.meta
job/63651205714667703007022/aHIey@psrk9BDfOPA2H0sw (hardlink to the cache file)
[...]
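The linking step for this layout might be sketched as follows. `link_into_session` is a hypothetical helper; ownership and permission changes (root-owned files, job dir owned by the mapped user) are left out because they need root privileges, so only the job dir mode is set here.

```python
import os


def link_into_session(data_path, job_dir, session_path):
    """Hard-link a cached .data file into the per-job dir, then symlink it from the session dir."""
    os.makedirs(job_dir, mode=0o700, exist_ok=True)
    # Rebuild the full hash name from the two directory levels plus the file stem,
    # e.g. cache/a/H/Iey....data becomes aHIey...
    head, stem = os.path.split(data_path[: -len(".data")])
    head, second = os.path.split(head)
    first = os.path.basename(head)
    job_file = os.path.join(job_dir, first + second + stem)
    os.link(data_path, job_file)        # only works within one file system
    os.symlink(job_file, session_path)  # the job sees the file through this symlink
    return job_file
```

Since the job dir holds its own hard link, deleting the entry under cache/ later does not remove the data the job is reading.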
One thing about this layout is that files do not need to be "claimed" beyond having their hard link created. If the main cache entry is deleted, the per-job files are not affected. One just has to make sure that stale job directories get deleted, and then cache cleanup can be done in a strict LRU fashion. If the cache directory is mounted with atime, this is really simple to get right. Otherwise one probably has to guess using mtime or something similar. The cleanup can be done with the cleanbyage script, which can scan a hierarchy of directories and delete the oldest files to free up a certain amount of space.
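A strict LRU cleanup over the cache tree could be sketched like this. It is not the actual cleanbyage script, only an illustration of the idea under the assumption that atime is maintained; `clean_by_age` and its parameters are hypothetical.

```python
import os


def clean_by_age(cache_root, bytes_to_free):
    """Delete the least recently accessed .data files (and their .meta files)
    until at least bytes_to_free bytes worth of entries have been removed."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for filename in filenames:
            if filename.endswith(".data"):
                path = os.path.join(dirpath, filename)
                info = os.stat(path)
                entries.append((info.st_atime, info.st_size, path))
    entries.sort()  # oldest access time first
    freed = 0
    removed = []
    for _atime, size, path in entries:
        if freed >= bytes_to_free:
            break
        os.remove(path)
        meta_path = path[: -len(".data")] + ".meta"
        if os.path.exists(meta_path):
            os.remove(meta_path)
        freed += size
        removed.append(path)
    return removed
```

Note that the disk blocks of an entry are only really released once no per-job hard link to it remains, so the byte count here is an upper bound on the space gained.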
Splitting the cache over file systems
To have the cache split over multiple file systems requires that hard links be created within each file system. In the GM configuration one should be able to specify the size limit of the cache on each file system, or specify a percentage share per file system. The cache manager can implement this as splitting the cache based on the initial letter of the URL hash (assuming that this is an even distribution). Then for example all hashes beginning [a-z] are stored in /fs1/dir1/ and all those beginning [A-Z] go to /fs2/dir2/. All parts of the GM involved with the cache must know and understand this configuration.
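The first-letter split could be expressed as a small lookup like the one below. The table mirrors the /fs1/dir1 and /fs2/dir2 example; the fallback directory for hashes that start with a digit or another character is an assumption, since the example only covers letters.

```python
import string

# Hypothetical configuration: leading characters of the hash -> cache directory.
CACHE_DIRS = [
    (set(string.ascii_lowercase), "/fs1/dir1"),
    (set(string.ascii_uppercase), "/fs2/dir2"),
]
DEFAULT_DIR = "/fs1/dir1"  # assumed fallback for digits and other characters


def cache_dir_for(hash_name):
    """Return the cache directory responsible for a given URL hash."""
    first = hash_name[0]
    for letters, directory in CACHE_DIRS:
        if first in letters:
            return directory
    return DEFAULT_DIR
```

Every GM component that touches the cache would have to use the same mapping, otherwise a file could be looked up on one file system and stored on another.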
Auth caching
Currently each job does a call-out to the storage element to see if the current set of credentials is allowed to read the file. This is a fine default behaviour, but some special cases probably occur often enough that the system would benefit greatly from a quicker decision. Three suggestions:
- The same DN that originally cached the file should always be able to read it (after all, the user might as well have downloaded it to a personal laptop at the time).
- Do a VOMS-based auth decision in the cache handler. "Any atlas user can read any atlas data."
- A plain auth decision cache with a configurable expiry (default, on the same order of magnitude as CRL lookups?).
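The last suggestion, a plain decision cache with expiry, might be sketched as below. The class name, the (DN, URL) key and the one-hour default are assumptions for illustration; the notes do not fix an expiry value.

```python
import time


class AuthDecisionCache:
    """Remember allow/deny decisions per (DN, URL) for a limited time."""

    def __init__(self, expiry_seconds=3600, clock=time.monotonic):
        self.expiry_seconds = expiry_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._decisions = {}

    def store(self, dn, url, allowed):
        """Record the storage element's answer together with its expiry time."""
        self._decisions[(dn, url)] = (allowed, self.clock() + self.expiry_seconds)

    def lookup(self, dn, url):
        """Return True/False for a fresh decision, or None if the storage element must be asked."""
        entry = self._decisions.get((dn, url))
        if entry is None:
            return None
        allowed, expires_at = entry
        if self.clock() >= expires_at:
            del self._decisions[(dn, url)]
            return None
        return allowed
```

A caller would try `lookup` first and fall back to the call-out only on `None`, storing the result afterwards.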