Operation-Procedures Clusters


Added by Wikiuser -, last edited by Wikiuser - on Jun 30, 2007


Cluster downtime

When a Tier-1 or other large cluster in NDGF has downtime, announce it to atlas-users (and possibly to alice-users as well, depending on the resource).

For ATLAS, a cluster currently counts as large enough to require a broadcast if it regularly runs 20 or more ATLAS jobs.

To help minimise the impact, block new jobs 12-24h ahead of a scheduled downtime: set allownew="no" in the grid-manager block of /etc/arc.conf, then restart the grid-manager service.
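
As a rough illustration, the change described above might look like the following fragment of /etc/arc.conf. This is only a sketch: it assumes the block header is written as [grid-manager] and that the other options already in that block stay untouched. The command for restarting the grid-manager service depends on the installation and is not shown.

```
[grid-manager]
# Assumption: the existing options in this block are left as they are.
# Block new job submissions ahead of the scheduled downtime.
allownew="no"
```

Once the downtime is over, the same option would presumably be set back to allownew="yes" (or removed), followed by another restart of the grid-manager service so that new jobs are accepted again.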