The Caltech/CERN/HP Joint Project
The Computing Technical Proposals of CERN's CMS and ATLAS Experiments [1,2] describe the unprecedented challenge of computing at the LHC. Its scale is well illustrated by the billions of highly complex interactions that the detectors must record for physics analysis each year. This amounts to an expected accumulation of several PetaBytes of event data per year, starting in 2005 at LHC turn-on and continuing for many years thereafter. During this time, the recorded data will be analysed by thousands of physicists at their home institutes around the world.
The computing models proposed by the LHC experiments favour storing and accessing the event data through large Object Database Management Systems (ODBMS) located at CERN, with replicas at so-called Regional Centres. Regional Centres are special computing centres that serve groups of national collaborating institutes and enjoy very high bandwidth connectivity to CERN, where the raw experimental data are accumulated.
The models represent an unprecedented challenge for data storage, access and networking technology. The challenge extends to the entire Object Oriented software development task, which must ensure that the software correctly implements and exploits the known relationships between the HEP objects stored in the ODBMS.
Effectively meeting this challenge must be a primary goal for the LHC experiments and therefore for CERN itself. In this document, we present detailed plans for a project which addresses the data storage, access and networking problems implicit in the LHC computing models. The project makes use of pre-funded hardware, financial support from Hewlett Packard, and participation by experts from CERN's IT Division.
The Project's purpose is to construct a large-scale prototype of an LHC Computing Centre. The prototype envisaged will employ a latest generation multi-node SMP server on which will be installed a large scale ODBMS, the High Performance Storage Management System (HPSS), and high bandwidth Wide Area Networks (WANs). In time, a complete LHC software environment will also be installed and then evaluated, and production simulation work carried out. To address the networking aspects of the challenge, tests of the ODBMS across the WANs will be made, so providing better predictions of how Central and remote computing centres might interoperate on a global scale.
Recent agreements between Caltech/JPL, NASA and Hewlett Packard (HP) have secured project funding for the installation of a major computing facility in the form of next-generation shared memory SMPs from HP. The initial machine (to be installed in June '97) will have 256 nodes, with a possible extension to 384 or 512 nodes by '98, and then a definite replacement of the CPUs with "Merced" chips in '99. The initial configuration will have roughly 40 times the CPU power of CERN's CSF, of which 10-20% will be available to HEP.
The Table below shows the timescales for the installation of hardware at Caltech/JPL and the San Diego Supercomputer Centre.
GOALS AND PHASES
The Project goals are subdivided into the following phases:
The installation and major users of the machine will be established in mid-'97, the machine being operational by June or July. It is important that the Project establishes itself promptly as one of the major users, to ensure that it obtains the share of the machine required to make it a success. A start in June comes sufficiently long after the first results from the DEC/RD45 Joint Project (expected end March '97) for those results to be fully digested, allowing us to repeat the tests with confidence. The networking aspects of the Project invite its candidacy as an Internet II project: these projects will be chosen this year, and early visibility is obviously an advantage.
Clearly, the goals and phases of the Project, and their component work items, should be reviewed at each stage in the light of progress made.
COMPLEMENTARITY WITH CERN/RD45
The RD45 project is currently focussing on the use of distributed Object Databases. These databases would be distributed across multiple servers in both the local and wide area, but would appear to the user as a single logical database. Critical resources - both system and user - would be replicated using database services. Each replica of a given database is called an image. Important issues, which can be tested in this project, include the scalability of the current Objectivity/DB architecture to large numbers of autonomous partitions and database images. The SPP architecture can be partitioned into multiple logical partitions, allowing the above scenarios to be tested in a well-controlled environment. In addition, the use of wide-area replication, using high-bandwidth networks, which are highly unlikely to be available at CERN on the time-scales mentioned, could be tested. Finally, the project will provide a test-bed that is much more conveniently located for Objectivity engineers, should special developments be required for an efficient Objectivity/DB-HPSS interface.
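The notion of database "images" described above can be illustrated with a small sketch. This is a hedged toy model, not the Objectivity/DB API: the site names and the write-propagation scheme are invented for illustration, and the real replication services are far richer.

```python
# Toy model of wide-area database replication: each logical database
# has several replicas ("images") at different sites. Reads are served
# locally; a write is propagated to every image so they stay consistent.
# Site names and the update scheme are hypothetical.

class ReplicatedDatabase:
    def __init__(self, name, sites):
        self.name = name
        self.images = {site: {} for site in sites}  # one image per site

    def write(self, key, value):
        # A replication service would propagate the update to every image.
        for image in self.images.values():
            image[key] = value

    def read(self, key, site):
        # Reads are satisfied from the local (nearest) image.
        return self.images[site][key]

db = ReplicatedDatabase("events", ["CERN", "Caltech", "SDSC"])
db.write("run42/event7", {"tag": "dimuon"})
print(db.read("run42/event7", "Caltech"))  # {'tag': 'dimuon'}
```

The interesting questions for the Project are precisely the ones this toy hides: how such propagation scales over high-bandwidth WANs, and how many autonomous partitions and images the architecture supports.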
In addition to the above areas, the Project has other features of direct importance to the LHC Experiments and CERN:
Firstly, it provides the LHC Experiments with access to a significant amount of computing capacity that would not otherwise be available. Since CERN is between experiment generations, this is an unusual opportunity. Mature experiments (such as those at LEP) have already installed the computing capacity they needed, and are using it. However, the LHC Experiments already have massive needs for capacity for simulation work. Their total demand for '97 approaches what is currently installed and allocated for the LEP experiments.
Secondly, the project is pre-funded, and the equipment is due for installation in the near term. There is a complementary installation at the nearby San Diego Supercomputer Centre, which will be built up to a similar total power. This will allow head-to-head comparisons to be made between the two systems. The SDSC already has experience with using a large database system (DB2 from IBM) with HPSS, and has plans to work in the area of distributed OO database systems in the future. Once the Caltech and SDSC machines are networked together, we can investigate the heterogeneous aspects of the LHC Computing models, and perhaps experimentally model the diverse interactions between the principal (CERN) Centre, and the remote centres.
The Project makes use of pre-funded computing hardware. The other costs include funding for Project coordination, travel money for experts from CERN visiting Caltech, travel money to attend review meetings at CERN, and to present the Project to the LHC Experiments, displacement costs, software licensing costs and logistics support at Caltech. These costs are detailed as follows:
The Project participants are CERN, Caltech and HP. The Project will be led jointly by Julian Bunn, representing CERN as a Visiting Associate in the Physics, Mathematics and Astronomy Division of Caltech, and Harvey Newman, both as the Chair of the CMS Software and Computing Board (SCB), and as the Caltech faculty member in charge of the Caltech HEP Group in CMS.
CERN's participation includes time from experts in the RD45 project and in the PDP Group on short visits to Caltech to work on the Objectivity and HPSS parts of the Project in collaboration with the HEP and CACR teams. In addition, the strong interest shown by CMS in the Project leads us to expect programming contributions from individuals in the collaboration, in particular in the area of the future GEANT4 CMSIM. We expect assistance with the Project from two part-time CMS physicists, probably located geographically near Caltech (e.g. at San Diego, or at UC Davis).
The Caltech HEP Group will cover the following aspects of the Project. Two members of staff will be provided: as well as assisting with the Objectivity and HPSS work, one will undertake the porting of CMSIM, and the other will manage the installation and production running of CMSIM. The Caltech Centre for Advanced Computing Research (CACR), under Paul Messina, will provide the hardware and software environment for the Project, and assist in development work such as the porting of HPSS to the SPP2000. Specifically, the Caltech work will comprise:
IMPACT ON CTP REQUIREMENTS FOR THE ODBMS
The project will test the ODBMS against the general requirements as laid out in the CMS CTP, namely:
The table below details the Project work items and their sequence, which is synchronised with the equipment installations in the table above.
These sections provide some background information for those unfamiliar with some of the HEP terminology used in the body of the Project proposal.
BACKGROUND A: THE LHC EVENT
The fundamental object of interest to the physicist is the "event". An event occurs when the two counter-rotating particle beams of the LHC machine collide in the centre of one of the LHC detectors. The collision produces sprays of energy (jets) in the form of other particles, which travel outwards from the collision point through the detector. As these particles travel through the detector, their passage is signalled and digitised in more than ten million (multiplexed) channels. The digitised signals are filtered, discriminated, compressed and eventually stored in memory, on magnetic disk, or on magnetic tape.
At the LHC, events are expected to occur at a rate of around 800 million per second. Careful filtering is applied at a very early stage of digitising so that only events of interest are recorded (this process is called "triggering"). The expected "interesting" event rate is around 100 events per second, and the goal of the triggering process is to ensure that it is these 100 events that are fully recorded by the detector. Each recorded event is unique and comprises about 1 Mbyte of digitised electronic channel information from the detector. The event data rate during operation of the LHC machine is thus 100 Mbytes/sec for each of the LHC experiments.
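The data rates quoted above can be checked with a back-of-envelope calculation. The figures are those given in the text; the nominal 10^7 seconds of running per year is an assumption typical for accelerator live time, not a number from this proposal.

```python
# Back-of-envelope check of the LHC data rates quoted above.
# Assumed figures: 100 events/s recorded, ~1 Mbyte per event, and a
# nominal ~1e7 seconds of machine operation per year (an assumption).

EVENT_RATE_HZ = 100          # "interesting" events recorded per second
EVENT_SIZE_MB = 1            # Mbytes of digitised channel data per event
SECONDS_PER_YEAR = 1e7       # assumed accelerator live time per year

rate_mb_per_s = EVENT_RATE_HZ * EVENT_SIZE_MB
yearly_pb = rate_mb_per_s * SECONDS_PER_YEAR / 1e9   # Mbytes -> PetaBytes

print(f"{rate_mb_per_s} Mbytes/sec")   # 100 Mbytes/sec per experiment
print(f"{yearly_pb} PB/year")          # ~1 PB/year per experiment
```

At roughly a PetaByte per experiment per year, this is consistent with the "several PetaBytes" accumulation cited at the start of the proposal.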
Because of the very high rate of operation of the machine and detectors there are, at any given moment, several events propagating through the detector, like waves rippling across a pond. Thus the digitisings from the detector will contain information from around 25 different interactions. Only one of these interactions will have caused the triggering logic to read out the event, but the data from the remaining background (or "minimum bias") events will also be recorded. The picture shows a simulation of a single interaction, for clarity. To disentangle the interesting event from the background events is a difficult and challenging pattern recognition problem. To compound this difficulty, the intensity of neutron radiation inside the LHC detectors gives rise to a background "noise" that is due to a steady rate of atomic reactions. This noise is also digitised along with the interesting event data.
Not all events are of the same type. The triggering attempts to select events that fall into any one of about twenty different and broad categories. An event of one type will generally be followed by an event of a completely different type, and the order in which the events occurred, and are recorded, has no physics significance. In traditional HEP experiments the events are stored on sequential access devices like magnetic tapes, and retrieving a particular class of event is consequently a difficult access problem.
BACKGROUND B: THE LHC DISTRIBUTED COMPUTING MODEL
The goal of the distributed LHC computing model is to realise a high degree of location-independence for physicists wishing to analyse LHC event data, within the practical bounds of network throughput at a feasible cost. The desire is that the computing resources, which are necessarily distributed amongst the collaborating institutes, are put to best use, and that new resources acquired by the institutes during the course of the construction and running phase of the experiment can be integrated and profited from by the collaborations.
The computing models described in the CMS and ATLAS CTPs favour a handful of Regional Centres that enjoy high bandwidth connectivity to CERN, and which act as the collaborations' computing centres for local institutes. The Regional Centres are expected to house substantial compute resources and support infrastructure. They will typically offer a replica of the OO event database, the master of which is populated at CERN directly from the DAQ systems in the experiments.
By using an OO database, we can store the events as objects and access them with powerful queries that operate on "collections" of objects. We can assign events carrying a particular tag to a collection, and thus avoid the costly overhead of sequentially testing each event to see whether it is of the desired type. Moreover, we can define relations between event objects and other event objects of the same type, and then navigate through all these events with great ease. In summary, the OO database approach is very promising because it gives us the ability to handle the expected complexity of LHC data.
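The advantage of tagged collections over a sequential scan can be sketched as follows. This is an illustrative toy, not the Objectivity/DB interface: the class names, the "dimuon" trigger tag and the store layout are invented for the example.

```python
# Illustrative sketch (not an ODBMS API): a store that maintains per-tag
# "collections" of events can answer "give me all events of type X"
# by direct lookup, instead of testing every stored event in turn.

class Event:
    def __init__(self, event_id, tag):
        self.event_id = event_id
        self.tag = tag          # trigger category, e.g. "dimuon" (invented)

class EventStore:
    """Toy store keeping per-tag collections alongside the full set."""
    def __init__(self):
        self.all_events = []
        self.collections = {}   # tag -> list of member events

    def add(self, event):
        self.all_events.append(event)
        self.collections.setdefault(event.tag, []).append(event)

    def by_tag(self, tag):
        # Direct lookup of the collection: no per-event test needed.
        return self.collections.get(tag, [])

store = EventStore()
for i in range(1000):
    store.add(Event(i, "dimuon" if i % 20 == 0 else "minbias"))

dimuons = store.by_tag("dimuon")
print(len(dimuons))  # 50
```

The contrast with tape-based storage is plain: there, selecting the same 50 events would mean reading and testing all 1000 sequentially.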
The federated database is a collection of databases whose members may be located in different places on the network, and which helps address the scaling issues posed by the massive amounts of LHC data to be stored. The federated database features a distributed lock manager that locks database pages while they are updated. Each object in the federated database has a data member indicating the database page to which it belongs. When a page of the database is locked, all objects in that page are locked against update.
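The page-granularity locking just described can be made concrete with a small sketch: each object carries a reference to its owning page, and updating any object acquires that page's lock, serialising updates to every object on the same page. All names here are hypothetical; this is not the Objectivity lock manager.

```python
# Hedged sketch of page-level locking: objects record which page they
# live on, and locking a page locks every object on it against update.
# Class names and structure are invented for illustration.

import threading

class Page:
    def __init__(self, page_id):
        self.page_id = page_id
        self.lock = threading.Lock()
        self.objects = []

class DBObject:
    def __init__(self, value, page):
        self.value = value
        self.page = page        # data member pointing at the owning page
        page.objects.append(self)

def update(obj, new_value):
    # Locking the page serialises updates to ALL objects on that page,
    # including objects other than the one being changed.
    with obj.page.lock:
        obj.value = new_value

page = Page(0)
a = DBObject(1, page)
b = DBObject(2, page)   # shares a's page, hence a's lock
update(a, 10)
update(b, 20)
print(a.value, b.value)  # 10 20
```

One consequence worth noting: two updates to unrelated objects contend with each other whenever the objects happen to share a page, which is one reason clustering of objects onto pages matters.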
One can envisage keeping the whole of a collection of event objects online, given enough memory. With TeraByte memories, this becomes a possibility for LHC event samples.
BACKGROUND C: THE HIGH PERFORMANCE STORAGE SUBSYSTEM (HPSS)
The role of HPSS (High Performance Storage Subsystem) in the project is as a manager for the physical storage used by the ODBMS. In an optimised and correctly sized system, one would hope to avoid access to tertiary storage as much as possible. However, the PetaBytes of event data must be permanently stored on tertiary devices, and HPSS is the enabling technology for this task. It must sit beneath the ODBMS and be integrated with it seamlessly. The database, which is active in memory and on disk, will nevertheless sometimes require access to a page that has been migrated to tertiary storage. How HPSS interoperates with the ODBMS to retrieve such a page (by recalling the data from offline media) requires investigation.