Partner Organizations: University of California-Davis: Collaborative Research
Activities and findings:
Research and Education Activities:

CAIGEE Activities
=================

Introduction
------------

Work on the Grid Analysis Environment (GAE) is now gathering momentum, and its importance is hard to overstate. While the utility of and need for Grids have been proven in the LHC experiments' production environments, their significance and critical role in physics analysis have yet to be realized. The principal goals of the CAIGEE project are to address the use of the Grid for physics analysis at the LHC, and in particular for the CMS experiment.

In the first year of the CAIGEE project, our collaboration has played a leading role in the prototyping and development of a Grid-based global system for LHC/CMS analysis. We have been the first of the LHC experiments' groups to demonstrate, with real-world examples, the benefit that the integration of traditional and newer analysis tools brings. The CAIGEE work is the acid test of the utility of Grid systems for science.

The GAE will be used by a large, diverse community, and will need to support hundreds to thousands of analysis tasks with widely varying requirements. It will need to employ priority schemes, and robust authentication and security mechanisms. Most challenging of all, it will need to operate well in what we expect to be a severely resource-limited global computing system. We therefore believe that the GAE is the key to the success or failure of the Grid for physics: it is where the critical physics analysis gets done, where the Grid's end-to-end services are exposed to a very demanding clientele, and where the physicists themselves have to learn how to collaborate across large distances on challenging analysis topics.

In CAIGEE, we are building an environment that consists of tools and utilities that integrate with existing analysis software, and which expose the Grid system's functions, parameters and behavior at selectable levels of detail and complexity. This is achieved by the use of Web Services, which are accessed using standard Web protocols. A physicist is thus able to interact with the Grid to request a collection of analysis objects, to monitor the preparation and production of the collection, and to provide 'hints' or control parameters for the individual processes. The GAE needs to provide various types of feedback to the physicist through a set of 'Grid Views' (under development), such as the time to completion of a task, an evaluation of the task's complexity, diagnostics generated at the different stages of processing, real-time maps of the global system, and so on. These are areas we are working on at present. We believe that only by exposing this complexity at the outset can an intelligent user learn to develop reasonable work strategies for dealing with the highly constrained global system we expect to have for the LHC computing tasks. This work, which includes the development of Clarens and SOCATS as described below, is already being adopted within CMS, and is generating considerable interest from the other LHC experiments.

Web Services
------------

One of the key aspects of our work on developing a distributed physics analysis environment for CMS is the use of Web Services. Since the start of the CAIGEE project, we have made rapid and sustained progress in showing the feasibility of using Web Services for physics analysis data access.
For example, different types of data, ranging from detailed event objects stored in Objectivity ORCA databases to Tag objects stored in Objectivity Tag databases, have been converted into prototypical Web Services. We have developed a set of tools that allow lightweight access to detailed event objects through Web Services. These Web Services provide access to data ranging in granularity from the Federation metadata down to the hits and tracks of individual events. The data accessed in this way can then be used by a variety of tools and software programs with relative ease. In 2002 we successfully provided distributed access, via a Web Service, to FNAL's JetMET Ntuple files, produced from the analysis of the High Level Trigger (HLT) data simulated and reconstructed in preparation for a CMS High Level Trigger design milestone. This Web Service was implemented using both an SQLServer database backend running under Windows and an Oracle9i database backend running under Linux. The user's view of the interface was identical, demonstrating the ease with which we were able to hide the heterogeneity and details of the databases at different sites.

Clarens
-------

The Clarens development began in 2001 as a remote analysis server system. With the advent of the CAIGEE proposal it was refocused into a Grid-enabled Web Services layer, with client and server functionality to support remote analysis by end-user physicists. Clarens is currently deployed at CMS sites in the US, at CERN, and in Pakistan. We consider Clarens to be one of the main deliverables of the CAIGEE project.

In detail, the Clarens server architecture was changed from a compiled CGI executable to an interpreted Python framework running inside the Apache HTTP server, improving transaction throughput by a factor of ten. A Public Key Infrastructure (PKI) security implementation was developed to authenticate clients using certificates issued by Certificate Authorities (CAs). All client/server communication still takes place over commodity HTTP/HTTPS protocols, with authentication done at the application level. Authorization of web service requests is achieved using a hierarchical system of access control lists (ACLs) for users and groups forming part of a so-called Virtual Organization (VO). As a side effect, Clarens offers a distributed VO management system that delegates administrative authorization tasks away from a central all-powerful administrator, as is appropriate for a global physics collaboration.

Server-side applications made available through Clarens include the obligatory file access methods, proxy certificate escrow, access to RDBMS data through SOCATS, SDSC Storage Resource Broker (SRB) access, VO administration, and shell command execution. Users of Clarens-enabled servers are able to deploy their own web services without system administrator involvement. All method documentation and APIs are discoverable through a standard interface. Access to web service methods is controlled individually through the ACL system mentioned above. The services described above are available from within Python scripts and C++, as well as from standalone applications and web browser-based applets using Java. A ROOT-based client was used to demonstrate distributed analysis of CMS JetMET data at the Supercomputing 2002 conference in Baltimore, MD. Clarens was also selected to be part of the first CMS data challenge (DC1) in 2004.
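To make the interaction model concrete, the following is a minimal sketch of how a Python client might invoke a Clarens-style web service over commodity HTTP. The server URL and the file.ls method name are illustrative placeholders rather than the actual Clarens API; system.listMethods is the standard XML-RPC introspection call, consistent with the discoverable method documentation described above.

    # Minimal sketch of a Clarens-style web service call over commodity
    # HTTP.  The server URL and the 'file.ls' method are placeholders,
    # not the real Clarens API; 'system.listMethods' is standard XML-RPC
    # introspection.
    import xmlrpc.client

    # In production, the connection would be made over HTTPS with the
    # PKI client authentication described above.
    server = xmlrpc.client.ServerProxy("http://clarens.example.org:8080/clarens")

    # Discover which web service methods the server exposes.
    print(server.system.listMethods())

    # Invoke a hypothetical file-listing service; access is subject to
    # the per-method ACLs described above.
    for entry in server.file.ls("/store/jetmet"):
        print(entry)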
Monitoring
----------

A critical component of the CAIGEE architecture is the set of monitoring services that not only allow the end user to understand how the global Grid Analysis system is functioning and to influence it, but also support decision making and workflow management that take into account the extreme resource constraints prevailing. As part of CAIGEE we have been developing the MonALISA monitoring system, which consists of a network-robust set of software agents whose purpose is to monitor and control sets of resources. We have successfully deployed MonALISA as part of the reflector control software in the Virtual Room Videoconferencing System (VRVS). MonALISA is the first of a new type of component in the services-based architecture of which CAIGEE is an initial prototype instantiation. In fact, the concept of a services-based architecture is also gaining currency in the Globus arena, where the OGSA architecture explicitly calls for distributed, authenticated Web-based services (albeit in a rather more static sense).

GroupMan
--------

The GroupMan application was developed in response to a need for more user-friendly administration of current-generation LDAP-based virtual organizations. GroupMan can be used to populate the LDAP server with the required data structures and with certificates downloaded from Certificate Authorities (CAs). Certificates may also be imported from certificate files in the case of CAs that do not offer certificate downloads. These certificates can then be used to create and manage groups of users through a platform-independent graphical user interface written in Python. The VO data is stored in such a way that it can be extracted using standard Grid-based tools to produce the so-called 'gridmap' files used by the Globus toolkit. These files map certificate subjects, which identify individuals or systems, to host system usernames, thereby providing a coarse-grained authorization mechanism.

SOCATS
------

In CAIGEE, we are nearing completion of SOCATS (STL Optimized Caching and Transport System), a general-purpose tool for delivering large SQL result sets in a binary-optimized form. The main purpose of SOCATS is to deliver results from heterogeneous relational databases to C++ clients, in the form of binary-optimized STL vectors and maps. The data returned from the SOCATS server to the client will be described through the standard Web Services Description Language (WSDL), but the data itself will be delivered in binary form. This saves the overhead of parsing large numbers of XML tags for large datasets. It also reduces latency in WAN environments, since large batches of rows that efficiently fill the network pipe are transferred together. We intend to use Clarens as the RPC (Remote Procedure Call) layer for SOCATS.
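As a toy illustration of why binary delivery beats per-row XML for bulk data, the sketch below (in Python for brevity; this is not SOCATS' actual wire format) packs a batch of hypothetical (run, event, Et) rows into a single compact byte buffer and unpacks it on the client side, with a fixed 16-byte record replacing the XML tags that would otherwise wrap every field.

    # Toy illustration of binary row-batching (not SOCATS' actual wire
    # format): rows are packed into one compact buffer instead of XML.
    import struct

    ROW = struct.Struct("<iid")  # run (int32), event (int32), Et (float64)

    def pack_batch(rows):
        """Serialize a list of (run, event, et) tuples into one buffer."""
        return struct.pack("<i", len(rows)) + b"".join(ROW.pack(*r) for r in rows)

    def unpack_batch(buf):
        """Recover the rows; no XML parsing is needed on the client."""
        (n,) = struct.unpack_from("<i", buf, 0)
        return [ROW.unpack_from(buf, 4 + i * ROW.size) for i in range(n)]

    batch = pack_batch([(4250, 17, 55.2), (4250, 18, 103.7)])
    assert unpack_batch(batch) == [(4250, 17, 55.2), (4250, 18, 103.7)]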
CAIGEE Tool Prototypes
----------------------

We have been working on developing various tools as candidates for inclusion in CAIGEE.

A: Analysis Station Setup

One example is the construction of a four-screen desktop analysis station that works off a single graphics card. The 4-way graphics card used allows an affordable setup to be built that offers enough screen space and pixels for most or all of:

1. Traditional analysis tools (e.g. a ROOT session)
2. Software development windows: code, debug, execution, etc.
3. Event displays (IGUANA)
4. 'Grid Views': monitoring information displays or processed monitoring results
5. Persistent collaboration: VRVS session(s); VNC sharing of others' desktops, etc.
6. Online event or detector monitoring information from CMS (possibly more shared desktops)
7. Web browsers, email, etc.

Our prototype works on a desktop with four 20" displays for a cost of about $6k-$8k. One can imagine variations that work in small and large meeting rooms, using different displays. The cost of such a setup is certain to fall in the future.

B: Handheld Analysis Client

Another example tool is a handheld analysis client, targeted at the Pocket PC platform. We have been working closely with our colleagues at NUST in Pakistan on a prototype implementation, which is now in an alpha release. The analysis software is JAS (Java Analysis Studio), which was successfully ported to the Pocket PC by NUST. This software runs on the handheld device and communicates as a client with a Clarens server. This enables the JAS client to fetch histograms and other data from the server, and to manipulate and render them on the Pocket PC using the stylus. Connectivity with the Clarens server is over standard TCP/IP sockets, and Grid authentication is being built in. Either wireless or wired network connections to the Pocket PC are possible using an appropriate Compact Flash format network card.

Activities Specific to UCSD
---------------------------

UCSD worked on developing prototype Grid 'home directories' for distributed analysis users. The Storage Resource Broker (SRB), developed at the San Diego Supercomputer Center (SDSC), was deployed to provide user space across Grid resources. A lightweight infrastructure was developed to allow users to submit CMS analysis jobs to remote Grid-enabled computing resources and return the results by registering the output files in SRB. Analysis jobs were submitted through the Globus gatekeeper by authenticated users. Analysis users could discover the datasets available locally at a site, customize CMS analysis applications, submit a processing request, and view the job output. The infrastructure was tested between CERN and UCSD. SRB relies on a central metadata catalog, which for these tests was located in San Diego. This introduced a latency for users at CERN viewing output files and listing directories, but it was found to be tolerable.
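The sketch below outlines the submit-and-register flow just described, assuming the Globus Toolkit's globus-job-run command and the SRB Scommand Sput are installed. The gatekeeper contact, job script and SRB collection names are hypothetical placeholders, and exact command arguments vary between installations, so this is an outline of the flow rather than a working recipe.

    # Sketch of the UCSD submit-and-register flow.  The gatekeeper
    # contact, script name and SRB collection are hypothetical
    # placeholders; adjust for the local installation.
    import subprocess

    GATEKEEPER = "gatekeeper.example.edu/jobmanager"  # placeholder contact

    # 1. Submit the CMS analysis job through the Globus gatekeeper as an
    #    authenticated user (authentication via the user's Grid proxy).
    subprocess.run(["globus-job-run", GATEKEEPER, "/bin/sh", "run_analysis.sh"],
                   check=True)

    # 2. Register the output file into SRB so that it appears in the
    #    user's Grid 'home directory' and is visible at remote sites.
    subprocess.run(["Sput", "analysis_output.root", "/caigee/home/user/results"],
                   check=True)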
Activities Specific to UC Riverside and UC Davis
------------------------------------------------

The partners have installed CAIGEE components, and are purchasing high-end graphics workstations with attached RAID arrays, using CAIGEE funds, in order to participate further in the development of client Grid-based analysis tools.

Planning
--------

We will continue the project according to the plan described in the proposal. In particular, we will continue the development of CAIGEE demonstrations together with UCSD, UCR and UCD in this (FY03) fiscal year.

Outreach Activities:

A distributed real-time physics analysis of simulated CMS Jet and Missing ET data was demonstrated at the Supercomputing 2002 (SC2002) conference, as part of a wider Grid-based CMS data production run and demonstration being conducted at the time. This demonstration was shown to interested attendees who passed by the Center for Advanced Computing Research (CACR) booth at SC2002. It involved practical use of several of the CAIGEE architectural components developed under the auspices of this project. In detail, the demonstration used distributed databases at Caltech, CERN and UCSD to distribute selection queries via the Clarens server. These queries created virtual data collections at the sites, which were subsequently moved across the WAN using a specially enhanced TCP/IP stack (FAST TCP), and rendered in real time on the analysis client workstation in Baltimore. A similar demonstration, also using software components developed for CAIGEE, was shown at the Internet2 Fall Meeting in September 2002 in Amsterdam. The purpose of this demonstration was to preview two aspects of Grid-enabled analysis of general interest to the LHC: distributed data access with local processing, and distributed processing with local access to results.
Journal Publications:
Other Specific Products:
http://pcbunn.cacr.caltech.edu/GAE/GAE.htm
This site provides a general overview and links to work on the Grid Analysis Environment for the LHC Experiments, and in particular draws heavily on material proposed in, and developed in the context of, the CAIGEE proposal (Award 0218937), which is the topic of this Report.
Contributions:
Contributions within Discipline:
The CAIGEE project is playing a leading role in the prototyping and development of a Grid-based global system for LHC analysis. We were the first in the LHC community to demonstrate, with real-world examples, the benefit that the integration of traditional and newer analysis tools brings. In particular, the CAIGEE architecture is among the most advanced and widely recognized in the field.
The CAIGEE architecture and tools rely fundamentally on ubiquitous, performant wide area networks. Our involvement in several other DoE- and NSF-funded projects focused on networking, such as 'Eurolink', has allowed us to influence, and in some cases drive, the requirements for WAN development. To this end, we have set several speed records using TCP (the Internet2 Land Speed Records) in pursuit of fast solutions to distributed LHC data analysis. The ability to transfer analysis data across the WAN at rates of several Gigabits/sec significantly expands the capabilities of the global Grid analysis system.
Special Requirements for Annual Project Report:
Unobligated funds: less than 20 percent of current funds