16 February 2007
29 May 2002
(ITR NSF 01-149)
CMS Analysis: an Interactive Grid-Enabled Environment (CAIGEE)
Harvey Newman (PI), California Institute of Technology
James Branson (Co-PI), University of California, San Diego
Submitted to the 2002 NSF Information and Technology Research Program
Proposal #6116240
(Program Office: Mathematical and Physical Sciences 47.049)
The major high energy physics experiments now underway, and especially the
Large Hadron Collider (LHC) experiments now in preparation, present new
challenges in Petabyte-scale data processing, storage and access, as well as
multi-Gigabit/sec networking.
The Compact Muon Solenoid (CMS)
experiment is taking a leading role among the LHC experiments in helping to
define the Grid architecture, building Grid software components, integrating
the Grid software with the experiment's software framework, and beginning to
apply Grid tools and services to meet the experiment's key milestones in
software and computing, through the GriPhyN [1], PPDG [2] and European Data Grid
projects [3,4]. Within the past year, prototype Tier 2
centers have been installed at Caltech, San Diego and Florida [5], and have
entered production status, in concert with the Tier 1 prototype center at
Fermilab, thereby providing a substantial portion of the simulated and reconstructed
events used by the worldwide CMS collaboration.
Until now, the Grid architecture
being developed [6,7,8,10,11] has focused on sets of files and on the
relatively well-ordered large-scale production environment. Considerable effort
is already being devoted to preparation of Grid middleware and services (this
work being done largely in the context of the PPDG, GriPhyN, EU DataGrid and
LHC Computing Grid projects). The problem of how processed object collections,
processing and data handling resources, and ultimately physics results may be obtained
efficiently by global physics collaborations has yet to be tackled head-on.
Developing Grid-based tools to aid in solving this problem within the next two
to three years, and hence beginning now to understand the new concepts and
foundations of the (future) solution, is essential if the LHC experiments are
to be ready for the start of LHC running.
The current view of CMS’s
computing and software model is well developed, and is based on use of the Grid
to leverage and exploit a set of computing resources that are distributed
around the globe at the collaborating institutes. CMS has developed analysis
environment prototypes based on modern software tools, chosen from both inside
and outside High Energy Physics. These are aimed at providing an excellent
capability to perform all the standard data analysis tasks, but assume full
access to the data, very significant local computing resources, and a full
local installation of the CMS software. With these prototypes, a large number
of physicists are already engaged in detailed physics simulations of the
detector, and are attempting to analyze large quantities of simulated data.
The advent of Grid computing,
the size of the US-based collaboration, and the expected scarcity of resources
lead to a pressing need for software systems that manage resources, reduce
duplication of effort, and aid physicists who need data, computing resources,
and software installations, but who cannot have all they require locally
installed.
We thus propose to develop an
interactive Grid-enabled analysis environment for physicists
working on the CMS experiment. The environment will be lightweight yet highly
functional, and make use of existing and future CMS analysis tools as plug-in
components. It will consist of tools and utilities that expose the Grid system
functions, parameters and behavior at selectable levels of detail and
complexity. The Grid will be exposed in this way by making use of Web Services,
which will be accessed using standard Web protocols. A physicist will be able
to interact with the Grid to request a collection of analysis objects[*], to monitor the
preparation and production of the collection, and to provide "hints"
or control parameters for the individual processes. The Grid-enabled analysis
environment will provide various types of feedback to the physicist, such as
time to completion of a task, evaluation of the task complexity, diagnostics
generated at the different stages of processing, real-time maps of the global
system, and so on. We believe that only by exposing this complexity can an
intelligent user learn what is reasonable in the highly constrained global
system we expect to have. We expect the analysis environment we create to have
immediate and long-term benefit to the CMS collaboration.
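
The interaction pattern described above (request a collection, monitor its preparation, supply hints, receive feedback at a chosen level of detail) can be illustrated with a minimal sketch. This is not the proposed system or any existing CMS or Grid API: the class MockGridService, its methods, the "priority" hint and the DetailLevel enum are all hypothetical names, and an in-memory simulation stands in for the Web Services that would be accessed over standard Web protocols.

```python
from dataclasses import dataclass, field
from enum import Enum


class DetailLevel(Enum):
    """Hypothetical levels of detail at which Grid state is exposed."""
    SUMMARY = 1
    DIAGNOSTIC = 2


@dataclass
class CollectionRequest:
    """A physicist's request for a collection of analysis objects (hypothetical)."""
    dataset: str
    n_objects: int                                   # assumed to be positive
    hints: dict = field(default_factory=dict)        # user-supplied control parameters
    done: int = 0                                    # objects prepared so far
    diagnostics: list = field(default_factory=list)  # messages from processing stages


class MockGridService:
    """In-memory stand-in for a Web Service front end to the Grid (an assumption)."""

    def __init__(self, objects_per_step=100):
        self.objects_per_step = objects_per_step
        self.requests = {}
        self._next_id = 1

    def _rate(self, req):
        # The hypothetical "priority" hint scales the simulated production rate.
        return self.objects_per_step * req.hints.get("priority", 1)

    def submit(self, dataset, n_objects, **hints):
        """Register a request and return its identifier."""
        request_id = self._next_id
        self._next_id += 1
        self.requests[request_id] = CollectionRequest(dataset, n_objects, dict(hints))
        return request_id

    def hint(self, request_id, **hints):
        """Update control parameters of a request that is already registered."""
        self.requests[request_id].hints.update(hints)

    def step(self):
        """Advance every unfinished request by one simulated time step."""
        for rid, req in self.requests.items():
            if req.done >= req.n_objects:
                continue
            req.done = min(req.n_objects, req.done + self._rate(req))
            req.diagnostics.append(
                f"request {rid}: {req.done}/{req.n_objects} objects ready"
            )

    def status(self, request_id, level=DetailLevel.SUMMARY):
        """Report progress; diagnostics are included only at DIAGNOSTIC level."""
        req = self.requests[request_id]
        remaining = req.n_objects - req.done
        report = {
            "fraction_done": req.done / req.n_objects,
            "steps_to_completion": -(-remaining // self._rate(req)),  # ceiling division
        }
        if level is DetailLevel.DIAGNOSTIC:
            report["diagnostics"] = list(req.diagnostics)
        return report


# Example session: submit, monitor, then raise the priority and monitor again.
grid = MockGridService(objects_per_step=100)
rid = grid.submit("simulated-events", n_objects=1000)
grid.step()
print(grid.status(rid))
grid.hint(rid, priority=3)
grid.step()
print(grid.status(rid, DetailLevel.DIAGNOSTIC))
```

In this sketch the estimated time to completion is recomputed from the current hints on every status call, so the effect of a hint is visible immediately. That is the kind of feedback loop through which a user could learn what is reasonable to ask of a constrained system.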
C.1.b. Problem Statement: The Development of Grid Technology for Analysis
C.2.a. The Grid-Enabled Physics Analysis Desktop
C.3. Existing Activities Synergistic with this Proposal
C.4. Relationship to Other Projects
C.7.a. Advancing Knowledge in Computer Science and Computational Science Disciplines
C.7.b. Educational Merit: Advancing Discovery and Access for Minority Students