This document provides a step-by-step guide to configuring an NDGF Tier 1 dCache pool. It is assumed that you have rudimentary knowledge of installing and operating dCache (i.e. you have tried to set up a minimal local dCache installation).
Versions
All hosts forming a dCache installation must run the same release series of dCache. Currently NDGF uses dCache 1.8. Please contact NDGF about the correct version to use.
Java
dCache 1.8 requires at least Java 5. NDGF recommends using the most recent Java 6, since we have observed a number of bugs in Java 5 that affect dCache performance. We recommend using the 64-bit version, if possible.
You can find Sun Java 6 here: http://java.sun.com/javase/downloads/
Note that dCache needs a full JDK, not just the JRE.
User account
We recommend running an NDGF Tier-1 dCache pool as a non-root user. Since configuration files, program files and PID files are intermixed in a default dCache installation, it is best to have all dCache-related files owned by the dCache user. In particular, the pool directory needs to be writable by the dCache user.
You have the choice of using the RPM or the tarball distribution of dCache. In any case, you will need to change the ownership of all files to match the UID under which you want to execute the pool.
In the instructions below, $USER is to be replaced with the account name you have chosen to run dCache as, e.g. 'pool'.
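As a quick sanity check after changing ownership, something like the following sketch can list the paths under the installation directory that are not owned by the chosen account. It is only an illustration: the function name is hypothetical, and it assumes you pass the numeric UID of $USER.

```python
import os


def files_not_owned_by(root, uid):
    """Return paths under root whose owner is not uid.

    Checks every directory visited by os.walk and every file in it.
    Symbolic links are examined themselves (lstat), not their targets,
    and symlinked directories are not descended into.
    """
    offenders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            if os.lstat(path).st_uid != uid:
                offenders.append(path)
    return offenders
```

The UID for an account name such as 'pool' can be looked up with pwd.getpwnam("pool").pw_uid; an empty result means every checked path already belongs to that user.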
Host certificate
The srmcopy method used by some users requires host certificates on the pool nodes. These need to be readable by the user account on the pool node. For a dedicated pool server, changing the ownership (chown) of the host certificate and key to that user is sufficient.
Installation directory
dCache installs into /opt/d-cache/. If for some reason you choose to install into a different directory (e.g. because you have a separate local dCache installation on the same host), then you have to update the path in a number of installation scripts and configuration files.
One option is to install dcache-server with:
The files in which you need to change the path are:
The following command should be enough to do this:
In case you have several dCache installations on the same host, it is recommended to keep them separated from each other, i.e., do not try to share files (e.g. jar files) between the installations. The dCache batch scripts contain many assumptions about the file system layout, and playing with file placement is problematic at best. Keeping files separate will also make it easier to upgrade dCache.
Below $PREFIX is to be replaced by the path to your d-cache installation.
Logging
dCache logs to /var/log/ by default. When installing a pool as a non-root user, you either have to allow that user to write to /var/log/, or change the directory to which dCache writes log files (see instructions for doing this below).
In either case, it is important to configure logrotate to rotate the dCache log files. Something like this will do:
Notice the copytruncate. dCache does not let go of its log file (it doesn't react to kill -HUP), so logrotate must copy and truncate the log file rather than renaming it.
Pool cells and dCache domains
A pool is essentially a directory made available to dCache for storing data files. In order for dCache to access this directory, a pool cell needs to run on a host having access to this directory. There will be exactly one pool cell per pool directory.
You have the option to run several pool cells within the same Java virtual machine instance; or to use dCache terminology, you can run several cells within the same domain. The benefit is that the cells can share the garbage collector and the Java heap, thus more efficiently using available resources. The downside of running all cells in the same domain is that a restart of a pool will force you to restart all pools within the same domain. For this reason we recommend against running several production pools within the same domain. It is however perfectly fine to run several domains on the same host.
Naming Conventions
Pool names are to be generated from your DNS domain name by replacing dots with underscores and appending a three-digit numerical suffix. dCache domain names are generated in a similar fashion, but using the FQDN of the host running the dCache domain as a starting point and ending with the suffix Domain. Dots are again replaced by underscores, and in case of multiple domains on the same host, each domain gets a numerical suffix. Drop the numerical suffix if only a single dCache domain is hosted on this particular host. If you run more than 9 dCache domains on a single host, then use the same number of digits for all domains.
Notice: dCache domains are named after the FQDN of the host running the domain. Pools are named after the DNS domain name of your site. You have to ensure uniqueness of the pool name by using a unique numerical suffix for your site.
Below $NAME refers to the FQDN with dots replaced by underscores. $HOSTNAME refers to the part of the FQDN before the first dot.
Example
For the hosts francis.grid.aau.dk and orval.grid.aau.dk, pools will be named grid_aau_dk_001, grid_aau_dk_002, grid_aau_dk_003, etc. Domains on francis.grid.aau.dk will be named francis_grid_aau_dk_1Domain, francis_grid_aau_dk_2Domain, etc.
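The naming rules can be sketched in a few lines of Python. This is only an illustration of the convention; the function names are hypothetical, and the form used for a host with a single domain (no numerical suffix, "Domain" appended directly) is our reading of the rule above.

```python
from typing import Optional


def pool_name(site_domain: str, index: int) -> str:
    """Pool name: site DNS domain, dots -> underscores, 3-digit suffix."""
    return f"{site_domain.replace('.', '_')}_{index:03d}"


def domain_name(fqdn: str, index: Optional[int] = None, width: int = 1) -> str:
    """dCache domain name: host FQDN, dots -> underscores, ends in 'Domain'.

    index is omitted when the host runs a single dCache domain.
    width is the number of digits; use 2 if a host runs more than 9 domains.
    """
    base = fqdn.replace(".", "_")
    if index is None:
        return f"{base}Domain"
    return f"{base}_{index:0{width}d}Domain"


print(pool_name("grid.aau.dk", 1))
print(domain_name("francis.grid.aau.dk", 1))
```

Note that the pool name is derived from the site domain (grid.aau.dk), while the dCache domain name is derived from the full host name (francis.grid.aau.dk).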
Rationale
- Pools and domains must have unique names throughout the entire NDGF dCache installation. Using the DNS as the basis for generating names ensures uniqueness across multiple organizations.
- Dots are replaced by underscores, because dots in names break some of the shell scripts in dCache.
- Using the DNS domain name (rather than the FQDN of the host) as the basis for pool names allows you to transparently move pools between hosts.
- Using the FQDN as the basis for dCache domain names allows us to identify the current host of a particular pool.
Tape pool exception
Tape pools are named after their hsminstance, and instead of a 3-digit numerical suffix, the suffix begins with either r or w for a read or write pool, then 2 digits.
Example: atlas_nsc_liu_se_w01 is a write pool for the "atlas.nsc.liu.se" hsminstance. csc_fi_r03 is a read pool for the csc.fi hsminstance.
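The tape pool variant of the convention can be sketched the same way. The helper name is hypothetical; it only encodes the rule stated above (hsminstance with underscores, then r or w, then two digits).

```python
def tape_pool_name(hsm_instance: str, mode: str, index: int) -> str:
    """Tape pool name: hsminstance, dots -> underscores, r/w + 2 digits."""
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' (read pool) or 'w' (write pool)")
    return f"{hsm_instance.replace('.', '_')}_{mode}{index:02d}"


print(tape_pool_name("atlas.nsc.liu.se", "w", 1))
print(tape_pool_name("csc.fi", "r", 3))
```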
File system
dCache 1.8: Please note that the control/ directory has an alternative that should be significantly faster, see the section on Berkeley DB metadata repository. For those installations, please ignore the talk about the control/ directory in this section.
The dCache community recommends XFS as the best file system for dCache pools. Other possible candidates are ZFS and GPFS. dCache pools contain two directories: The control/ directory is used for storing metadata and the data/ directory contains the data files. The control/ directory will contain twice as many files as the data/ directory, however the control files are very small (often below 500 bytes). All files in the control directory are read into memory when a pool starts.
GPFS has been reported to be very slow for reading the large number of control files when a pool is started, and we therefore recommend putting the control directory on a different type of file system, say XFS. Even when using XFS for both directories, it may be wise to put the data/ directory and the control/ directory on different file systems and tune the block size for large and small files, respectively.
Java versions prior to release 6 did not have a mechanism to query the amount of free space on disk. Therefore dCache relies on your figures for the amount of disk space allocated for the pool. This excludes the amount allocated for metadata, i.e. everything in the control directory and overhead in the file system. It is hard to give definitive answers on the number of files to expect on a pool, but a 10 TB pool should expect at least 500k files in the control/ directory. With a block size of 1 kB, this would amount to 500 MB + inodes. Also remember to allocate enough inodes (for file systems that preallocate inodes).
Since all files in the control/ directory are read during pool start, we recommend mounting the control/ directory with the noatime option.
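The space estimate above is simple arithmetic, and the following sketch lets you redo it for other block sizes. It assumes every control file occupies a whole number of blocks and uses 500 bytes as the typical file size mentioned earlier; the function name is hypothetical, and inode overhead is not included.

```python
import math


def control_dir_bytes(n_files: int, block_size: int, typical_file_size: int = 500) -> int:
    """Rough disk usage of the control/ directory, excluding inodes.

    Each file is assumed to occupy at least one block, rounded up to
    a whole number of blocks.
    """
    blocks_per_file = max(1, math.ceil(typical_file_size / block_size))
    return n_files * blocks_per_file * block_size


# 500k control files on a file system with 1000-byte blocks
print(control_dir_bytes(500_000, 1000) / 1e6, "MB")
```

With 1024-byte blocks the same call gives 512,000,000 bytes, so the 500 MB figure should be read as an order of magnitude rather than an exact requirement.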
Berkeley DB metadata repository
Starting with dCache 1.8 there is an alternative metadata repository instead of the control/ directory with two tiny files per file stored in the pool. We recommend this for all pools, since it greatly reduces pool startup time. To activate this, please enable it in dCacheSetup before installation:
metaDataRepository=org.dcache.pool.repository.meta.db.BerkeleyDBMetaDataRepository
Monitoring
To ease debugging and day to day operation of dCache, we prefer to have rudimentary monitoring of all pools. Ganglia excels at monitoring large heterogeneous collections of hosts with minimal overhead. We have therefore chosen Ganglia as our preferred monitoring solution for dCache pools.
Ganglia consists of three components:
- The monitoring daemon, gmond
- The meta daemon, gmetad
- The web interface
It is only necessary to run the monitoring daemon on a dCache pool. Both versions 2.5 and 3.0 are supported.
An instance of the monitoring daemon must run on each host which is to be monitored. Inside a site, instances of the monitoring daemon can optionally exchange data via UDP multicasting, thus ensuring that all hosts have the same data. This data is periodically queried by the meta daemon running in Ørestaden, Copenhagen. Since all instances of the monitoring daemon at a site have the same information, the meta daemon only needs to talk to one of them (although allowing the meta daemon to talk to all of them has the benefit that it can fall back on another instance if one becomes unavailable).
The configuration file for the monitoring daemon is quite different between version 2.5 and 3.0. In both cases, you should configure chaperon.ndgf.org as a trusted_host and give your site ("cluster" in Ganglia terminology) a human readable name.
Remember to open your firewall such that chaperon.ndgf.org can talk to the monitoring daemon on ports 8649 and 8650 (the latter is only needed for Ganglia 3.0), unless you explicitly configured Ganglia to operate on other ports.
Note: even if you only run a single gmond on a single host, the UDP receive and send channels have to be set up. It seems that the daemon consists of two separate parts (collecting and publishing) that communicate only through UDP unicast or multicast. You can check that data collection works properly by connecting with telnet to port 8649 (the default) and checking that the required attributes are present.
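The telnet check can also be scripted. The sketch below assumes that gmond answers a TCP connection on port 8649 with an XML dump containing HOST and METRIC elements with NAME attributes. Which metrics count as "required" is not specified here, so the required set is a parameter you supply, and the function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML dump gmond sends on connect; returns raw bytes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def missing_metrics(xml_data, required):
    """Map each host with missing metrics to the set of missing names."""
    root = ET.fromstring(xml_data)
    result = {}
    for host in root.iter("HOST"):
        present = {m.get("NAME") for m in host.iter("METRIC")}
        absent = set(required) - present
        if absent:
            result[host.get("NAME")] = absent
    return result
```

For example, missing_metrics(fetch_gmond_xml(), {"load_one", "mem_free"}) returns an empty dictionary when every reported host publishes both metrics; the two metric names are only examples.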
Monitoring multiple hosts
If you want to monitor multiple local hosts, run the Ganglia monitoring daemon on each of them. By default, they will locate each other by UDP multicasting on channel 239.2.11.71 and port 8649. This channel is administratively scoped and will not leave your site; if you have not configured multicast routing, it will not even leave the subnet. If you have configured a firewall on the host, make sure to open it for this internal communication. If you have multiple NICs, then make sure that multicast traffic is routed on the correct NIC (i.e. add a route for 224.0.0.0/4 to that NIC). Only one host needs to communicate with chaperon.ndgf.org though.
Version 2.5
As a bare minimum, set the name property and the trusted_hosts property.
Version 3.0
The equivalent settings for 3.0.
You may use
to generate a template configuration file mirroring the default 2.5 settings.
Firewall configuration
- Allow all outbound connections (TCP and UDP)
- Allow inbound TCP connections for the range 20000:25000. These will be used for data transfers.
- Allow inbound TCP connections from chaperon.ndgf.org on port 8649 (used by Ganglia).
Operational procedures
We collect procedures for operating a pool at Operation-Procedures Pools. Please study and follow them. Notice that they are updated with new procedures whenever the need arises.
Step by step instructions for pool installation
Below, only $USER, $PREFIX, $NAME, and $HOSTNAME should be substituted. All other strings are to be taken literally.
- Create a user account for $USER and become $USER
- Install in $PREFIX (unpack rpm or deb and move the directories) and let $USER own $PREFIX and all files below that directory.
- If $PREFIX does not equal /opt/d-cache/, then please follow the instructions above for installing in a non-default directory.
- In $PREFIX/etc/ copy node_config.template to node_config and edit the following lines:
- In $PREFIX copy etc/dCacheSetup.template to config/dCacheSetup and edit the following lines:
- Create $PREFIX/log (you may put the log somewhere else, but make sure $USER can write to the log area; it should even be possible to reconfigure dCache such that syslog is used for logging)
- Go to $PREFIX/config and run
- Run $PREFIX/install/install.sh
- Add some pools by following the instructions below.
- Send the IP addresses of your pools to support@ndgf.org. We need to open our firewall to allow communication with the pool.
- Wait for reply.
- Run $PREFIX/bin/dcache start (as $USER)
- If successful (check log file), create a wrapper script which runs dcache-pool start as $USER during boot and dcache-pool stop during shutdown. Example script:
The "ulimit -n 32000" is needed because some clients use parallel transfers, and can easily eat 20 file descriptors per file transfer. Do not copy the init script from $PREFIX to /etc/init.d/, as you may break dCache during upgrades if you forget to copy the new version.
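To see what the file descriptor limit means in practice, the following sketch divides a limit by the 20 descriptors per transfer quoted above. It is a rough budget only: the function names and the reserved parameter (descriptors set aside for logs, sockets to other cells and so on) are assumptions for illustration.

```python
import resource

FDS_PER_TRANSFER = 20  # rough figure quoted in the text above


def max_parallel_transfers(fd_limit, fds_per_transfer=FDS_PER_TRANSFER, reserved=0):
    """Upper bound on concurrent transfers for a given descriptor limit."""
    return max(0, (fd_limit - reserved) // fds_per_transfer)


def current_soft_fd_limit():
    """Soft limit on open files for this process (what 'ulimit -n' sets)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


print(max_parallel_transfers(32000))
```

With the common default limit of 1024 descriptors the same calculation leaves room for only about 51 such transfers, which is why the limit is raised in the wrapper script.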
Step by step instructions for pool upgrade
- Make sure dCache is shut down
- Install (or unpack and copy) the RPM
- If $PREFIX does not equal /opt/d-cache/, then please follow the instructions above for installing in a non-default directory.
- Check/reset file ownership
- Run $PREFIX/install/install.sh
You should now be able to start dCache.
For the upgrade from 1.7 to 1.8, please see the dedicated page on: http://wiki.ndgf.org/display/ndgfwiki/dCache+1.8+pool+upgrade
Creating new pools
Note: The old technique of using a pool_path file for creating directories during dCache installation is deprecated. We recommend that you use the new technique described here.
Starting with version 1.8.0-13, dCache ships with a simple node administration script, dcache. Besides serving as an init script, it allows you to define new pools in a two-step process: First the pool directory is created and afterwards this directory is added as a pool cell to a dCache domain.
Creating a new pool directory is as simple as running
This will create a new directory /q/pool1/, with the control/ and data/ sub-directories and a setup file. The pool is defined to have a size of 2000 gibibytes (GiB).
To make the newly created pool directory available to dCache, we need to add a pool cell to a dCache domain. Please follow the naming conventions described above. Assuming we have settled on the pool name grid_aau_dk_001 and the domain name orval_grid_aau_dk_1Domain, we use the command
This defines a new pool cell grid_aau_dk_001 in orval_grid_aau_dk_1Domain for the pool in /q/pool1/. Please remember to use the --fqdn option for all NDGF pools.
We can see all configured pools by running