ModNet: The number of nodes at each institute must be specified, along with which parameter is held fixed: network bandwidth or number of CPUs. A node can be configured to simulate the Exemplar, with high bandwidth among its 256 CPUs.
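A minimal sketch of such a simulation specification, purely as an illustration; the field names, site names, and numbers below are assumptions, not ModNet's actual input format:

    # Hypothetical ModNet-style specification; names and values are illustrative only.
    from dataclasses import dataclass

    @dataclass
    class SiteSpec:
        name: str
        nodes: int               # number of nodes at this institute
        cpus_per_node: int
        bandwidth_mbps: float    # link bandwidth for this site

    fixed_parameter = "bandwidth"    # the study holds either bandwidth or CPU count fixed

    sites = [
        SiteSpec("institute_A", nodes=8, cpus_per_node=2, bandwidth_mbps=155.0),
        # A single node standing in for the Exemplar: 256 CPUs with high
        # internal bandwidth.
        SiteSpec("exemplar", nodes=1, cpus_per_node=256, bandwidth_mbps=10000.0),
    ]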
NILE: Analysis tasks for the next generation. Tasks need to "check in" with a snapshot of their current state. One task is to actually build the executable that will run. Sub-jobs run in a 20-minute time slice. Jobs can be scheduled on an idle machine that might not have any of the data local.
The federation consists of a metadata database and 5000 databases containing the data, with 100 million objects; a collection of databases corresponds to a "cluster". The databases are distributed across 20 disks on 6 nodes. Each database has a container that encapsulates the legacy event records; another contains metadata on each event record. Each metadata tag is about 250 bytes per event, and the event size is 5 kBytes (cf. 100 bytes for 100 kBytes). The event data comprise the number of tracks, thrust, total visible energy, etc.: it is reconstructed data. There is a bit array, used like a mask, with which the Tau group flags features. Each database is 20 MBytes. Each also has run objects, with a one-to-many association to the tag data for the events in that run; there are many run objects in the database. Quantities such as the beam energy and magnetic field are also stored.
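A minimal sketch of the per-database layout just described; the class and attribute names are illustrative assumptions rather than the actual NILE/Objectivity schema:

    # Illustrative data model only; names are assumptions, not the real schema.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class EventTag:                 # ~250 bytes of metadata per event
        event_number: int
        n_tracks: int
        thrust: float
        visible_energy: float
        feature_mask: int           # bit array used e.g. by the Tau group

    @dataclass
    class EventRecord:              # encapsulated legacy record, ~5 kBytes compressed
        event_number: int
        payload: bytes

    @dataclass
    class RunInfo:                  # one run -> many event tags
        run_number: int
        beam_energy: float
        magnetic_field: float
        tags: List[EventTag] = field(default_factory=list)

    @dataclass
    class Database:                 # one ~20-MByte Objectivity database
        events: List[EventRecord] = field(default_factory=list)  # legacy-record container
        runs: List[RunInfo] = field(default_factory=list)        # run objects + tag container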
A "FileAtom" is an
encapsulation of a database. It gives the location, physics info like the
luminosity, first run number last number etc.. FileAtoms are meta-data in the
database. Within the atom are data-items, which point to individual databases.
There are 5000 databases, with potentially 5000 locks: there is a lockserver
running on one of the nodes, but there has been no problem with locking.
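A sketch of what a FileAtom and its data-items might carry, with field names assumed for illustration:

    # Illustrative FileAtom layout; field names are assumptions.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class DataItem:
        database_id: int            # points to an individual database
        host: str                   # where that database lives
        path: str

    @dataclass
    class FileAtom:
        location: str               # location of the encapsulated database
        luminosity: float
        first_run: int
        last_run: int
        items: List[DataItem]       # data-items pointing to databases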
The Cluster object contains a set of DataObjects, with member functions such as "copyto" and "moveto". These are management tools and are in fact rarely used; when they are used, they take care of moving databases to the best places in the system.
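A minimal sketch of such management operations, assuming they simply copy or relocate a database between hosts; the real implementations are not described here:

    # Sketch only; not the actual Cluster/DataObject implementation.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class DataObject:
        db_id: int
        host: str

        def copyto(self, target_host: str) -> None:
            # Placeholder for replicating the ~20-MByte database file.
            print(f"copy database {self.db_id} from {self.host} to {target_host}")

        def moveto(self, target_host: str) -> None:
            self.copyto(target_host)
            self.host = target_host       # record the new location

    @dataclass
    class Cluster:
        data_objects: List[DataObject] = field(default_factory=list)

    cluster = Cluster([DataObject(7, "node2")])
    cluster.data_objects[0].moveto("node5")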
A query has a form that is resolved into a collection of FileAtoms.
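A sketch of that resolution step, assuming for illustration that a query is just a run range and that FileAtoms carry first and last run numbers:

    # Sketch: resolve a run-range query to the FileAtoms whose run ranges overlap it.
    # The query form (a run range) is an assumption for illustration.
    from typing import List, NamedTuple

    class FileAtom(NamedTuple):
        first_run: int
        last_run: int
        location: str

    def resolve(first: int, last: int, atoms: List[FileAtom]) -> List[FileAtom]:
        return [a for a in atoms if a.first_run <= last and a.last_run >= first]

    atoms = [FileAtom(1000, 1099, "node1:/disk3/db0042"),
             FileAtom(1100, 1199, "node4:/disk7/db0043")]
    print(resolve(1050, 1120, atoms))   # both atoms overlap the requested runs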
It takes three days to populate the databases with the legacy data, at 1 MByte/second. Indexes are created in a second pass, which is very slow; the indexes are on run number and event number. The next version will build indexes on physics quantities and allow user selections on, e.g., energy level.
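A sketch of the two kinds of index: the existing run/event-number index built in the second pass, and the planned index on a physics quantity. The dictionary-based indexes below are illustrative, not Objectivity's indexing mechanism:

    # Illustration of a second-pass index build over the tag data.
    from collections import defaultdict

    tags = [  # (run, event, visible_energy) drawn from the tag container
        (1000, 1, 9.8), (1000, 2, 10.2), (1001, 1, 10.4),
    ]

    # Current indexes: run number and event number.
    by_run_event = {(run, evt): i for i, (run, evt, _) in enumerate(tags)}

    # Planned: an index on a physics quantity, allowing user selections such as
    # "all events with visible energy above 10 GeV".
    by_energy = defaultdict(list)
    for i, (_, _, energy) in enumerate(tags):
        by_energy[round(energy)].append(i)

    selected = [i for i, (_, _, e) in enumerate(tags) if e > 10.0]
    print(selected)                       # indexes of the events passing the cut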
The Objy page size is set to the maximum, 65 kBytes, which gives about a 12% overhead. There are a couple of scalability problems. One is with the data location manager, a replicated object that can sit on more than one node; the replicas all share the same information, which is reported to a master that ensures consistency. This is a bottleneck, partly due to the overhead of the communication system, ISIS. The other scalability problem is with event selection: random versus sequential event access. Despite this, the system has never been seen to bog down; the maximum number of concurrent users is about 6. The scheduling algorithm is primitive: we suggest looking at LSF.
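A rough sketch of the replicated location-manager pattern described above; ISIS messaging is replaced by plain method calls, and the structure is an assumption:

    # Rough sketch: replicas forward updates through one consistency master.
    from typing import Dict

    class Master:
        def __init__(self) -> None:
            self.locations: Dict[int, str] = {}      # database id -> node

        def report(self, db_id: int, node: str) -> Dict[int, str]:
            self.locations[db_id] = node             # master serializes all updates
            return dict(self.locations)              # consistent view sent back

    class LocationManagerReplica:
        def __init__(self, master: Master) -> None:
            self.master = master
            self.view: Dict[int, str] = {}

        def record(self, db_id: int, node: str) -> None:
            # Every replica reports through the one master: the bottleneck noted above.
            self.view = self.master.report(db_id, node)

    master = Master()
    replicas = [LocationManagerReplica(master) for _ in range(3)]
    replicas[0].record(42, "node3")
    print(replicas[0].view)                          # {42: 'node3'}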
The Resource Reporter collects statistics on the loads on the systems. This information is available to the Site Manager, whose job is to allocate jobs to the nodes. Jobs are allocated via proxy processes, one per node, which advertise what resources (e.g. databases, files) are available on their node.
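A minimal sketch of this allocation step, assuming the Site Manager simply picks the least-loaded node that advertises the required database (the actual policy, noted above as primitive, is not specified):

    # Sketch of Site Manager placement; the policy shown is an assumption.
    from dataclasses import dataclass
    from typing import List, Optional, Set

    @dataclass
    class NodeProxy:
        name: str
        load: float                  # load statistics from the Resource Reporter
        databases: Set[int]          # resources advertised by this node's proxy

    def allocate(job_db: int, proxies: List[NodeProxy]) -> Optional[NodeProxy]:
        # Prefer nodes advertising the needed database; otherwise fall back to
        # any node, since a job may run where none of its data is local.
        with_data = [p for p in proxies if job_db in p.databases]
        candidates = with_data or proxies
        return min(candidates, key=lambda p: p.load, default=None)

    proxies = [NodeProxy("node1", 0.9, {1, 2}), NodeProxy("node2", 0.1, {3})]
    print(allocate(2, proxies).name)                 # node1 is the only node holding db 2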
The tag can be scanned at a maximum of 20000 events per second; the limit at this speed is the Objectivity overhead.
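At about 250 bytes per tag this is roughly 5 MBytes/second of tag data, and a full scan of the tags for all 100 million events would take about 100,000,000 / 20,000 = 5000 seconds, i.e. something like an hour and a half.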
Each event record has 492 attributes. The 5 kBytes is the compressed record size; the actual size is much larger. Each analysis job uses only about 50 attributes.
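That is only about 50 / 492, roughly 10%, of the attributes per job, which is presumably part of the motivation for keeping the compact 250-byte tag separate from the full event record.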