Final Version v2.1

16 February 2007

29 May 2002

 

 

(ITR NSF 01-149)

 

 

CMS Analysis:

an Interactive Grid-Enabled Environment

(CAIGEE)

 

 

 

Harvey Newman (PI), California Institute of Technology

James Branson (Co-PI), University of California, San Diego

 

 

 

 

Submitted to the 2002 NSF Information Technology Research (ITR) Program

Proposal #6116240

(Program Office: Mathematical and Physical Sciences, 47.049)


A.   Project Summary

 

The major high energy physics experiments now underway, and especially the Large Hadron Collider (LHC) experiments now in preparation, present new challenges in Petabyte-scale data processing, storage and access, as well as in multi-Gigabit/sec networking.

Through the GriPhyN [1], PPDG [2] and European Data Grid [3,4] projects, the Compact Muon Solenoid (CMS) experiment is taking a leading role among the LHC experiments in helping to define the Grid architecture, building Grid software components, integrating the Grid software with the experiment's software framework, and beginning to apply Grid tools and services to meet the experiment's key software and computing milestones. Within the past year, prototype Tier 2 centers have been installed at Caltech, San Diego and Florida [5] and have entered production status; working in concert with the prototype Tier 1 center at Fermilab, they now provide a substantial portion of the simulated and reconstructed events used by the worldwide CMS collaboration.

Until now, the Grid architecture being developed [6,7,8,10,11] has focused on sets of files and on the relatively well-ordered environment of large-scale production. Considerable effort is already being devoted to Grid middleware and services, largely in the context of the PPDG, GriPhyN, EU DataGrid and LHC Computing Grid projects. The problem of how global physics collaborations can efficiently obtain processed object collections, processing and data-handling resources, and ultimately physics results has yet to be tackled head-on. If the LHC experiments are to be ready for the start of LHC running, Grid-based tools that help solve this problem must be developed within the next two to three years, and the new concepts and foundations of that future solution must therefore begin to be understood now.

CMS's computing and software model is well developed, and is based on using the Grid to leverage and exploit a set of computing resources distributed around the globe at the collaborating institutes. CMS has developed analysis environment prototypes based on modern software tools, chosen from both inside and outside High Energy Physics. These prototypes are designed to support all the standard data analysis tasks, but they assume full access to the data, very significant local computing resources, and a full local installation of the CMS software. With these prototypes, a large number of physicists are already engaged in detailed physics simulations of the detector and are attempting to analyze large quantities of simulated data.

The advent of Grid computing, the size of the US-based collaboration, and the expected scarcity of resources together create a pressing need for software systems that manage resources, reduce duplication of effort, and aid physicists who need data, computing resources and software installations but cannot have everything they require installed locally.

We therefore propose to develop an interactive Grid-enabled analysis environment for physicists working on the CMS experiment. The environment will be lightweight yet highly functional, and will use existing and future CMS analysis tools as plug-in components. It will consist of tools and utilities that expose the functions, parameters and behavior of the Grid system at selectable levels of detail and complexity. The Grid will be exposed in this way through Web Services, accessed using standard Web protocols. A physicist will be able to interact with the Grid to request a collection of analysis objects[*], to monitor the preparation and production of the collection, and to provide "hints" or control parameters for the individual processes. The Grid-enabled analysis environment will provide various types of feedback to the physicist, such as the time to completion of a task, an evaluation of the task's complexity, diagnostics generated at the different stages of processing, and real-time maps of the global system. We believe that only by exposing this complexity can an intelligent user learn what is reasonable in the highly constrained global system we expect to have. We expect the analysis environment we create to bring both immediate and long-term benefits to the CMS collaboration.
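
To make the request, monitor and hint interaction concrete, the following Python sketch exposes a toy analysis service over XML-RPC, used here as one example of a standard Web protocol. It is a minimal illustration under stated assumptions, not the CAIGEE design: the service class, the methods `request_collection`, `set_hint` and `status`, the priority names, the dataset names and the processing rates are all hypothetical, and the time-to-completion estimate is simply the number of requested events divided by an assumed rate.

```python
"""Hypothetical sketch: a Grid analysis service exposed over XML-RPC (HTTP)."""
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Assumed processing rates (events per second) for each priority hint.
RATES = {"normal": 100.0, "high": 400.0}


class AnalysisService:
    """Toy stand-in for a Grid analysis web service (hypothetical API)."""

    def __init__(self):
        self._tasks = {}  # task id -> request parameters

    def request_collection(self, dataset, n_events):
        """Register a request for a collection of analysis objects."""
        task_id = len(self._tasks) + 1
        self._tasks[task_id] = {"dataset": dataset, "n_events": n_events,
                                "priority": "normal"}
        return task_id

    def set_hint(self, task_id, priority):
        """Accept a user 'hint' that changes how the task is scheduled."""
        if priority not in RATES:
            raise ValueError(f"unknown priority {priority!r}")
        self._tasks[task_id]["priority"] = priority
        return True

    def status(self, task_id):
        """Return feedback for the physicist, e.g. estimated time to completion."""
        task = self._tasks[task_id]
        eta = task["n_events"] / RATES[task["priority"]]
        return {"dataset": task["dataset"], "priority": task["priority"],
                "eta_seconds": eta}


def start_server():
    """Serve the toy service over HTTP on a free local port (port 0)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_instance(AnalysisService())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    grid = xmlrpc.client.ServerProxy(f"http://{host}:{port}/")
    task = grid.request_collection("jets_sample", 36000)
    print(grid.status(task)["eta_seconds"])  # 360.0
    grid.set_hint(task, "high")               # physicist supplies a hint
    print(grid.status(task)["eta_seconds"])  # 90.0
    server.shutdown()
    server.server_close()
```

The point of the sketch is that a hint is applied on the service side and is reflected immediately in the feedback the client receives, so the physicist can see the effect of a control parameter without any local installation beyond a Web-protocol client.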


B.   Table of Contents

A.   Project Summary

B.   Table of Contents

C.   Project Description

C.1.   Introduction

C.1.a.   Current Status in CMS

C.1.b.   Problem Statement: The Development of Grid Technology for Analysis

C.2.   Architecture

C.2.a.   The Grid-Enabled Physics Analysis Desktop

C.2.b.   The Web Services

C.3.   Existing Activities Synergistic with this Proposal

C.4.   Relationship to Other Projects

C.5.   Complementary Proposals

C.6.   Schedules and Milestones

C.7.   Outreach and Education

C.7.a.   Advancing Knowledge in Computer Science and Computational Science Disciplines

C.7.b.   Educational Merit: Advancing Discovery and Access for Minority Students

C.7.c.   Geographical Access

C.8.   Results from Prior NSF Support