C        Project Description

C.1     Introduction

Vision: We propose to develop and deploy the Physics Lambda-based Network System (PLaNetS) to drive a new round of discoveries at the frontiers of data-intensive science. PLaNetS will enable Terabyte and multi-Terabyte “data transactions” between sites to complete in minutes to hours, rather than hours to days, and will significantly improve the overall working efficiency of the network resources. PLaNetS will build on recent progress in networking to develop a core suite of high performance end-to-end data transfer tools and applications, enhanced by real-time network and end-system monitoring and management services, components of which have been developed and proven in sustained field-trials over the last four years. These will be integrated to form a new paradigm of network operations and management including (1) Queues for tasks (transfers) of different lengths and levels of priority, coupled to dynamic (real or virtual) path-construction services for the most demanding, high-priority tasks, leveraging the work of the DOE-funded OSCARS [u19], TeraPaths [a14] and LambdaStation [a10] projects, (2) A task "director" aided by end-system agents to partition the work among foreground, real-time-background and queued transfers, (3) End-to-end monitoring, network path and topology discovery, and path performance estimation and tracking services, based on the MonALISA [u10] and the Clarens [a3] frameworks, as well as IEPM monitoring services [a9], and (4) Policy-based network path-request and utilization services incorporating the OSG infrastructures for authentication, authorization and accounting.

Relationship to OSG and UltraLight: PLaNetS will amplify and broaden the capabilities of OSG, a uniquely capable national computational facility supporting simulation and experimental research. We will augment the OSG software stack with UltraLight’s real-time services that estimate, monitor and track performance, and build managed data channels or complete network paths for high priority data transfer tasks as needed, to optimize the overall throughput among the grid sites.

PLaNetS will exploit the recent breakthrough progress in several network-related areas of information technology to harness the full capabilities of long range networks, in partnership with UltraLight. UltraLight is a state-of-the-art facility based on a hybrid optical packet- and circuit-based dynamic network infrastructure with more than twenty 10 Gbps links, interconnecting the major high energy physics labs (Fermilab, BNL, SLAC) in the US, CERN in Europe and KEK in Japan, and key university sites in the US, the UK, Korea and Latin America (Caltech, Florida, Michigan, FIU, Manchester in the UK, UERJ in Rio, UNESP in Sao Paulo, and KNU in Korea). Several novel methods for real-time network and end-system monitoring and management services have been developed and proven in sustained field-trials over the last four years, including (1) new fair-sharing, stable, high performance TCP-based network protocols (FAST, MaxNet), (2) tuned Linux kernels and network interface settings capable of sustained data transport approaching 10 Gigabits/second (Gbps) for individual streams, and (3) global-scale agent-based systems, exemplified by Caltech’s MonALISA system, that autonomously monitor and help manage major research networks, hundreds of grid clusters and other distributed systems around the clock.

Target Applications: In partnership with the Open Science Grid (OSG) [u20], UltraLight [a16], [a15] and DISUN [u5], we will deliver these capabilities to the high energy physics (HEP), gravity-wave physics, astrophysics and radio-astronomy communities by closely coupling PLaNetS to the Grid-based physics production and analysis systems under development in ATLAS [u1] and CMS [u29]. Physicists will use the testbed to exploit the powerful new network monitoring, profiling and real-time operations support systems, helping them to meet near-term Grid analysis milestones and greatly improve the performance observed in data challenges. Working in partnership with NSF’s DISUN project, the Large Hadron Collider (LHC) [u16] groups operating Tier1 and Tier2 centers will be able to greatly enhance their ability to transport data among their sites in a strategically managed, fair-shared and policy-driven manner, thereby greatly increasing their ability to harness their computing and storage resources for data analysis and scientific discovery. Astrophysicists will enhance their ability to locate, extract and, if needed, distribute and further process massive datasets. Radio astronomers will enhance the sensitivity and discovery reach of their explorations, acquiring, processing and correlating data at burst-rates several orders of magnitude higher than previously attainable.

As PLaNetS (2006-2011) covers the full ramp up period of the LHC to design luminosity, during which time network technologies and applications will continue to advance, it is essential that PLaNetS also incorporate the next step in networking. The UltraLight testbed, which currently includes more than 20 national, transoceanic and metropolitan-area links at 10 Gbps - and is continuing to evolve - is planned to be enhanced within the limits of available funding by a transition of Caltech’s external network connections to support up to 32 10 Gbps wavelengths using a software reconfigurable optical multiplexer (ROADM) on the dark fiber connecting the campus to the CENIC [u2] and UltraLight points of presence in Los Angeles. This will enable multi-10Gbps wavelength use by our target projects, and allow the first at-scale network tests guiding the design and modes of use of networks and associated systems supporting the next round of frontier science projects starting in 2011-2013, including the Super-LHC, Advanced LIGO, the Large Synoptic Survey Telescope[u15], and next-generation eVLBI experiments. These goals are coincident with those of the GENI framework now being prepared at NSF.

The combined PLaNetS and OSG system will initially be adapted to meet the specific needs of physicists in the US LHC, NVO, LIGO and eVLBI programs, but the PLaNetS tools and services also will be packaged for more general use. By integrating these tools in the Virtual Data Toolkit (VDT)[u26], we will also benefit astrophysicists in the Sloan Digital Sky Survey [u22] and Dark Energy Survey[u4], as well as bioinformatics and genetics application communities such as GADU/GNARE[a18].

Education, Outreach and Broad Impact:  We have designed a unique set of activities to broaden the impact of PLaNetS, targeting both the current and next generation of scientists. Tutorial workshops and summer research projects will be offered to students who will evolve into the next generation of scientists, providing them with access to the leading edge of the scientific frontier. The data intensive research paradigm shift resulting from PLaNetS requires additional mini-workshops for practicing researchers to allow them to take full advantage of the network infrastructure. Professional workshops, based on student workshops, will be offered at a variety of collaborative venues.

Our education and outreach program utilizes methods established and refined during the iVDGL, CHEPREO, and UltraLight projects. It provides direct and significant support for E&O activities including: interactive workshops, application development, experiment participation, infrastructure deployment, and internships at participating institutions. We will reach a variety of students at our collaborating institutes including a significant number of students from traditionally underrepresented groups and minorities as well as students from our collaborating international institutions. We will also invite a limited number of students from our LHC collaborating institutions that are not part of the PLaNetS collaboration.

Funding for five students is provided in the PLaNetS budget; additional students will be encouraged to attend with funding provided by their institutions. Two student research lines are also provided, with additional funded students provided by REU programs at the participating institutions. Travel funds for the leaders of both student and professional workshops are provided by their institutions.

PLaNetS, through its groundbreaking level of network capability, the scope of its international testbed and partnerships, the unique nature of the real-time systems aimed at data intensive science (building on the systems that are beginning to be deployed now in UltraLight), will provide vital input for NSF’s GENI initiative [u9] by the time it begins in 2009. PLaNetS’ end-to-end managed network paradigm, its ability to field real-time autonomous network systems on a global scale begun in the UltraLight project while isolating the hard technical and policy issues, and its long-term development program driven by the ongoing mission to serve a growing international community in support of their science, as well as research and education, will help shape the worldview of networks over the next few years. PLaNetS will thus influence GENI’s concept and design, especially as it relates to large-scale information usage by the scientific community, and in daily life.

C.2     Motivating Applications

C.2.1    LHC: Physics and Computing Challenges.

High Energy Physics experiments are breaking new ground in the understanding of the unification of forces, the origin and stability of matter, as well as structures and symmetries that govern the nature of matter in our universe. To improve our understanding of the fundamental constituents of matter, and the nature of space-time itself, researchers work to isolate and observe rare events, predicted by a variety of new physics theories which go beyond our current understanding.  Even by utilizing highly-processed analysis object data, the size of an LHC dataset suitable for discovering such rare events is expected to be at the terabyte level with similar amounts of Monte Carlo simulated data required for hypothesis testing.  Further, assuming a typical LHC data taking period (“Run”) corresponds to approximately 1-3 hours of stable online operations, the size of such a canonical RAW dataset is also expected to be about a terabyte.  Hence, physicists performing searches for possible new physics discoveries as well as physicists conducting detector calibration and systematic studies, vital for establishing the early presence of possible new physics signals, will frequently request terabyte sized dataset “chunks” for their work.

Over time, total HEP data volumes to be processed, analyzed and shared are expected to rise from the multi-Petabyte (10^15 Byte) to the Exabyte (10^18 Byte) range within the next 10-15 years, and the corresponding network speed requirements on each of the major links used in this field are expected to rise from the current 10 Gigabit/sec (Gbps) to the Terabit/sec (Tbps) range during this period, as summarized in the following roadmap of major HENP network links [a19].

Table 1: Bandwidth Roadmap (in Gbps) for Major HENP Network Links

Year | Production         | Experimental        | Remarks
2001 | 0.155              | 0.622 – 2.5         | SONET/SDH
2002 | 0.622              | 2.5                 | SONET/SDH; DWDM; GigE Integr.
2003 | 2.5                | 10                  | DWDM; 1 & 10 GigE Integration
2005 | 10                 | 2-4×10              | λ Switch, λ Provisioning
2007 | 2–4×10             | ~10×10 (and 40)     | 1st Gen. λ Grids
2009 | ~10×10 (or 1–2×40) | ~5×40 (or 20–50×10) | 40 Gbps λ Switching
2011 | ~5×40 (or ~20×10)  | ~5×40 (or 100×10)   | 2nd Gen. λ Grids, Terabit networks
2013 | ~Terabit           | ~Multi-Terabit      | ~Fill one fiber

The HEP community leads science in its pioneering efforts to develop globally-connected, grid-enabled, data-intensive systems. Its efforts have led the LHC experiments to adopt the Data Grid Hierarchy of 5 “Tiers” of globally distributed computing and storage resources [a20]. Data at the experiment are stored at the rate of 200-1500 Mbytes/sec throughout the year, resulting in Petabytes per year of stored and processed binary data that are accessed and processed repeatedly by worldwide collaborators.
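The step from a storage rate of 200-1500 Mbytes/sec to "Petabytes per year" can be checked with a short calculation. The sketch below is illustrative only: the function name and the `duty_cycle` parameter are our own, 1 Mbyte is taken as 10^6 bytes, and the default of continuous recording follows the "throughout the year" wording above, so the results should be read as upper bounds.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s


def petabytes_per_year(rate_mbytes_per_sec: float, duty_cycle: float = 1.0) -> float:
    """Stored volume in Petabytes (10**15 bytes) for a given storage rate.

    duty_cycle is a hypothetical knob (fraction of the year spent recording);
    1.0 means data are written continuously.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    total_bytes = rate_mbytes_per_sec * 1e6 * SECONDS_PER_YEAR * duty_cycle
    return total_bytes / 1e15


if __name__ == "__main__":
    for rate in (200, 1500):
        print(f"{rate} Mbytes/sec -> {petabytes_per_year(rate):.1f} PB/year")
```

Under these assumptions the quoted range corresponds to roughly 6 to 47 Petabytes per year.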

Processing and analyzing the data requires the coordinated use of the entire ensemble of Tier-N facilities. The relatively few large Tier-0 and Tier-1 facilities are best suited for the high priority large-scale tasks of systematic data processing, archiving and distribution. Moving down the hierarchy to the smaller and more numerous Tier-2 and Tier-3 facilities, individuals and small groups have greater control over how these resources are allocated to small and medium-sized tasks of special interest to them. Data flow among the Tiers will therefore be more dynamic and opportunistic, as thousands of physicists vie for shares of more local and more remote facilities of different sizes, for a wide variety of tasks of differing global and local priority, with different requirements in turnaround times (from seconds to hours), computational requirements (from processor-seconds to many processor-decades) and data volumes.

Data rates and network bandwidth estimates [a1] between the Tier-N sites were initially based on a conservative baseline formulated using an evolutionary view of network technologies. However, more recent estimates indicate that HEP network demands will reach multiple 10 Gbps links within the next two to three years during the time when the LHC begins operation, followed by a need for scheduled and dynamic use of 10 × 10 (or 1-2 × 40) Gbps wavelengths as the LHC transitions to full luminosity running.

C.2.2    Initial and Advanced LIGO: Physics and Computing Challenges. 

The Laser Interferometer Gravitational wave Observatory (LIGO) community is beginning to integrate its data analysis efforts into OSG in an effort to efficiently utilize “friendly computational cycles.” One of the greatest challenges associated with conducting LIGO data analysis on OSG is to efficiently move datasets that are typically on the order of one TB in size from LIGO’s data grid, which houses the full LIGO dataset, onto the storage resources associated with OSG compute resources. PLaNetS, in partnership with the OSG, will allow LIGO to address current network limitations and exploit the full potential of the computational resources available on the OSG, thereby promoting the most efficient analysis of gravitational wave data.

LIGO data movement among its Observatories, Tier-1 and -2 centers, international gravitational wave partners and onto OSG storage resources will benefit from the use of PLaNetS’ new transparent transport layer tools and services for monitoring and tracking network performance.

In the future, Advanced LIGO, with its increased sensitivity, will likely experience a tenfold increase in data volume. Binary inspiral waveforms will increase in length by nearly two orders of magnitude.  Typical data analysis of Advanced LIGO data will likely require data transfers of several terabytes to efficiently utilize local computational resources on the grid. LIGO’s partnership with PLaNetS will provide opportunity for testing and design guidance prior to the production-level science start-up of Advanced LIGO. 

C.2.3    eVLBI: Astronomy and Computing Challenges

The Very-Long Baseline Interferometer (VLBI) is one of the most powerful techniques for studying objects in the universe at ultra-high resolutions, combining simultaneously acquired data from a global array of up to ~20 radio telescopes to create a single coherent instrument. VLBI data are currently collected at rates of up to ~1 Gbps/telescope, as incompressible data recorded on magnetic disks that are shipped to a central site for correlation processing. Since the sensitivity of the observations increases as the square root of the data rate, there are large advantages in moving to multi-Gbps data rates.  Within the next three years data rates are projected to increase to ~16 Gbps/telescope for some experiments, with ultimate goals of ~100 Gbps/telescope.  Recording and shipping of physical media at these higher rates quickly becomes uneconomical, making transfer to the correlator by high-speed networks (dubbed ‘e-VLBI’) very attractive.  To achieve this, data would be transferred in real-time to the correlator, requiring only relatively small electronic buffers at the telescopes and the correlator, though temporary buffering on physical media at the telescope, the correlator, or both is sometimes practical. In addition, high-speed networks would support future ‘distributed correlation’, in which Grid computing resources are dynamically gathered and used to spread the correlator processing over hundreds or thousands of geographically distributed resources, enabling new and better science at lower cost.
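The square-root scaling quoted above can be made concrete with a short calculation. This is an illustrative sketch only: the function name `sensitivity_gain` is our own, and it assumes that all other observing parameters (integration time, antenna performance) are held fixed.

```python
import math


def sensitivity_gain(old_rate_gbps: float, new_rate_gbps: float) -> float:
    """Relative sensitivity improvement, assuming sensitivity ~ sqrt(data rate)."""
    if old_rate_gbps <= 0 or new_rate_gbps <= 0:
        raise ValueError("data rates must be positive")
    return math.sqrt(new_rate_gbps / old_rate_gbps)


if __name__ == "__main__":
    # Compare today's ~1 Gbps/telescope with the projected rates.
    for rate in (16, 100):
        print(f"1 -> {rate} Gbps/telescope: factor {sensitivity_gain(1, rate):.1f}")
```

Under this assumption, moving from ~1 to ~16 Gbps/telescope improves sensitivity by a factor of 4, and reaching ~100 Gbps/telescope improves it by a factor of 10.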

C.2.4    NVO: Astrophysics and Computing Challenges.

The next decade will witness the completion of several new and massive surveys of the Universe.  These surveys span the whole electromagnetic spectrum from X-rays (ROSAT, Chandra, and XMM satellites) through optical and ultraviolet (SDSS, GALEX, LSST surveys) to measurements of the cosmic microwave background and radio (WMAP and PLANCK satellites).  It is only when these datasets are combined – collating data from several different surveys or matching simulations to observations – that the full scientific potential is realized; the scientific returns from the total will far exceed those from any one individual component.  The Palomar-Quest sky survey produces 50 Gbyte each clear night, and newer surveys will be coming online in the next few years (Pan-STARRS[u21], LSST[u15]) that are expected to raise this to tens of terabytes per night.

With the advent of event-based astronomy, the demands on the computing system grow, as the astronomers want to be notified of changes in the sky within minutes of a Gamma-Ray Burst or supernova, meaning that the pipeline must be able to meet these real-time requirements.  Other pipelines build derivative products that can be used for mining. The Hyperatlas project [a21] is an infrastructure to deliver image data in uniform projections and wide mosaics, so that images from different times or wavelengths can be jointly mined.

C.3     Need for Incorporating the Networking as an Active Element

Grid systems so far have treated the network as a passive and largely featureless substrate for data transport, in spite of the fact that wide area network bandwidths have grown approximately two orders of magnitude faster than processor speeds over the past two decades.  As the designers of HEP online data acquisition systems have learned, successful development and operation of a distributed system requires treating the network as an active element, and an important resource, similar to computing and storage facilities, whose use is to be monitored, tracked and optimized in real time. This is the central theme of the PLaNetS proposal.

The HEP community has become a principal driver, architect and co-developer of advanced networking infrastructure and new tools and techniques for end-to-end data transmission. For example, within the past year teams from Caltech, CERN, Michigan, Florida, FNAL, SLAC, and others demonstrated sustained transfers of LHC Monte Carlo physics data across the UltraLight testbed with throughputs of over 100 Gigabits/sec (Gbps), peaking to 150 Gbps, resulting in a total of 0.5 Petabytes transported during a 24 hour period [u23].  Such milestones clearly reveal that we are pushing the capabilities of networks that are based on statically routed and switched paths. It is now generally understood that in the longer term “intelligent photonics” (the ability to use wavelengths dynamically and to construct and tear down wavelength paths rapidly and on demand through cost-effective wavelength routing) are a natural match to the peer-to-peer interactions required to meet the needs of leading-edge, data-intensive science. The integration of intelligent photonic switching with advanced protocols is an effective basis for efficient use of network infrastructures, wavelength by wavelength, and holds the promise of bringing future Terabit networks within the reach, technically and financially, of scientists in all world regions.

Integrating these new capabilities into already complex computing models requires a sustained effort and close coordination with the LHC collaborations, many of whose members are either participating or partnering in this proposal.  Applications will need to be adapted and instrumented while Data Grid scheduling [a23] and management middleware must be written to take full advantage of the new infrastructure.  Feedback between applications and infrastructure will be critical to implementing an efficient, effective system. In the following we describe some of the features that are required to develop such a system.

C.3.1    Terabyte Size Transactions

The driving forces behind much of the LHC, LIGO, NVO and eVLBI network bandwidth requirements are (1) that the “small” requests for data samples will often exceed a Terabyte (even in the early years of LHC operation), and could easily reach 10-100 Terabytes (in the years following), and (2) that the number of requests from, for example, the global HEP, LIGO, and eVLBI communities is expected to reach hundreds per day. This leads to the need to support Terabyte-scale transactions, where the data is transferred in minutes rather than many hours, so that many transactions per day can be completed. The likelihood of such a short transaction failing to complete is much smaller than in the case of many long transactions sharing the available network capacity for many hours. Taking the typical time to complete a transaction as 10-15 minutes, a one Terabyte transaction will use a 10 Gbps link fully, and a 100 Terabyte transaction (e.g. in 2010 or 2015) would fully occupy a link of ~1 Tbps.
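The arithmetic behind these figures can be reproduced with a few lines. The sketch below is illustrative: the function names are our own, 1 Terabyte is taken as 10^12 bytes, and protocol overhead and competing traffic are ignored, so the times are lower bounds.

```python
def transfer_time_minutes(size_terabytes: float, link_gbps: float) -> float:
    """Minutes needed to move size_terabytes over a fully used link of link_gbps."""
    if link_gbps <= 0:
        raise ValueError("link speed must be positive")
    bits = size_terabytes * 1e12 * 8
    return bits / (link_gbps * 1e9) / 60.0


def required_gbps(size_terabytes: float, minutes: float) -> float:
    """Link speed in Gbps needed to move size_terabytes within the given minutes."""
    if minutes <= 0:
        raise ValueError("duration must be positive")
    bits = size_terabytes * 1e12 * 8
    return bits / (minutes * 60.0) / 1e9


if __name__ == "__main__":
    print(f"1 TB at 10 Gbps:    {transfer_time_minutes(1, 10):.1f} min")
    print(f"100 TB at 1 Tbps:   {transfer_time_minutes(100, 1000):.1f} min")
    print(f"100 TB in 15 min:   {required_gbps(100, 15):.0f} Gbps")
```

A one Terabyte transfer at 10 Gbps takes 800 seconds (about 13 minutes), which falls in the 10-15 minute window; a 100 Terabyte transfer needs a link of roughly 1 Tbps to finish in the same time.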

PLaNetS will enable these modes of scientific analysis and discovery by providing the necessary system of services which facilitate frequent terabyte scale transactions requested by multiple users, completing the transaction in minutes rather than hours (or days) for efficient search optimization and systematic understanding.

 

C.3.2    Sustained Production Flows

One of the highest-priority bandwidth uses for the LHC is the sustained production flow of data collected by the experiment’s online system, which are stored at the Tier-0 and then distributed to the Tier-1’s at a rate of 200-1500 Mbytes/sec throughout the year.  In addition, most of the production of Monte Carlo simulated LHC data will be performed at the Tier-2 facilities and will be distributed back to the Tier-1’s, the Tier-0 as well as other Tier-2’s at aggregate rates of perhaps up to 50-100 Mbytes/sec.

PLaNetS will enhance these modes of high-priority, sustained scientific data flows by providing the necessary system of services for differential, policy-based, network resource management which is able to discriminate between high-priority sustained versus lower priority peaky flows.

C.3.3    Burst Streaming

An interesting and important network requirement of both eVLBI and NVO involves burst data rates (used in real-time gathering and analysis of the data) which, in the case of eVLBI, are projected to increase to ~16 Gbps/telescope within a few years, with ultimate goals of ~100 Gbps/telescope.  To achieve this, attendant high-level services are required to provide managed ‘on-demand’ dedicated paths from telescopes to correlator to allow full real-time correlation.

PLaNetS will enable these modes of burst scientific data flows for real-time data analysis by providing the necessary system of services which facilitate rapid policy based network resource re-prioritization.

C.4     The PLaNetS Managed, Integrated System

The PLaNetS collaboration will meet these goals by creating a cohesive system composed of a configurable, agile network, intelligent middleware and integrated applications, while also working with the developing Grid infrastructure from projects such as OSG, EGEE[u7], LCG [u17] and similar efforts.

The PLaNetS system, with its integration of network monitoring and dynamic provisioning through the use of intelligent end-to-end middleware (or “global services”) will transparently support efficient Terabyte-scale data transport, and both small and large real-time flows for the LHC, eVLBI, and Astrophysics collaborations. By maintaining a real-time global view of the state of the system, from the network infrastructure to the Grid middleware to the end-to-end managed services and the physicists' applications, and by applying self-learning algorithms that optimize the workflow while aiming to match the collaborations' policies for coordinated (network, data and compute) resource usage, it will enable stable and effective use of a petabyte-scale globally distributed system for the first time.

By tracking the system state and the data flows associated with various classes of work, with varying priorities and network requirements (throughput; latency and jitter for real-time streams), the monitoring system and global services will be able to learn (supervised, and later autonomously) how to resolve or mitigate network bottlenecks or other resource scheduling conflicts in the face of massive demands for large but limited resources.

Figure 1 shows a high level view of how PLaNetS services will be embedded within the e-science infrastructure. PLaNetS will build on the successful work of the UltraLight project, which deployed an ultra-scale hybrid network, basic network services and end-to-end network monitoring. PLaNetS will provide interfaces and functionalities for physics applications to effectively interact with the physics application-level services domain.  (Physics) application frameworks will be augmented to interact with a new class of high-level global services that in turn interact with the data storage and data access layers. Low-level UltraLight and OSG Services provide hints to the high-level PLaNetS services which allow optimization of data access and throughput, enabling the effective use of caching, pre-fetching, and offering opportunities for local and global system optimizations. PLaNetS adds a completely new dimension to these interactions by interfacing the applications to the novel managed networking services. This allows PLaNetS to extend the advanced planning and optimization behavior into the networking and data access layers, allowing a whole new class of advanced system behaviors and functionalities.

Figure 2: PLaNetS Architecture: Global Services provide policy based management for multiple individual data transfers.

Figure 2 shows a multi-usage view of how PLaNetS services will be utilized to support data transfers. At any given time in the distributed system there will be multiple transfers in progress, all competing for the same (limited) storage and network resources. Within the PLaNetS computing model these transfers take place through interaction with policy based, secure PLaNetS services that provide a quality of service and, based on the priority of the transfer, a higher (guaranteed) throughput. The PLaNetS services collaborate, not to provide an optimized throughput for a single transfer, but to provide a fair sharing of the available (network) resources at any given time for all transfers. End-to-end monitoring provides feedback on the state of the network and the individual transfers to the global PLaNetS services, which utilize this feedback to collaboratively optimize the transfers. This feedback loop of continuous monitoring and adaptation of the global PLaNetS services is important to optimize the (network) resource usage needed by multiple transfers. The PLaNetS computing model will not force applications to utilize its advanced services to use the network resources, but not doing so will result in fewer guarantees on the quality of service for a particular transfer, and thus weaken the competitiveness of US based physics groups through the delay of potential scientific discoveries.
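One way to picture "fair sharing with higher throughput for higher priority" is weighted max-min fairness, a standard allocation technique. The sketch below is our illustration and not a description of the PLaNetS services themselves: the `Transfer` class, the priority weights and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    name: str
    demand_gbps: float  # rate the transfer could use if unconstrained
    weight: float       # hypothetical priority weight (larger = higher priority)


def weighted_fair_share(transfers, capacity_gbps):
    """Weighted max-min fair allocation of one shared link.

    Transfers whose demand fits within their weighted share are fully
    satisfied; the leftover capacity is re-divided among the rest.
    """
    if any(t.weight <= 0 for t in transfers):
        raise ValueError("weights must be positive")
    allocation = {t.name: 0.0 for t in transfers}
    active = list(transfers)
    remaining = float(capacity_gbps)
    while active and remaining > 1e-12:
        per_weight = remaining / sum(t.weight for t in active)
        satisfied = [t for t in active if t.demand_gbps <= per_weight * t.weight]
        if not satisfied:
            # Everyone left wants more than their share: split by weight.
            for t in active:
                allocation[t.name] = per_weight * t.weight
            break
        for t in satisfied:
            allocation[t.name] = t.demand_gbps
            remaining -= t.demand_gbps
        active = [t for t in active if t not in satisfied]
    return allocation


if __name__ == "__main__":
    demo = [
        Transfer("high-priority", demand_gbps=8.0, weight=3.0),
        Transfer("bulk", demand_gbps=6.0, weight=1.0),
        Transfer("small", demand_gbps=1.0, weight=1.0),
    ]
    print(weighted_fair_share(demo, capacity_gbps=10.0))
```

In this example the small transfer is fully served, and the remaining 9 Gbps are split 3:1 between the high-priority and bulk transfers, so no single transfer is optimized at the expense of the others.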

A crucial requirement for developing the PLaNetS intelligent infrastructure is the ability to allow high-level applications or service layers to effectively understand the current state of the underlying (network) infrastructure. Having a global view of the entire system will lead to optimized decision making and throughput.  Another feature of the PLaNetS system will be graceful degradation or enhancement of performance in response to hardware failures, congestion, or the availability of new resources. The goal of the proposed global services will be to provide the users and administrators with automated decision making to support a variety of connections with best effort or guaranteed service, including pricing mechanisms that will allow the administrator to charge differential pricing (effectively prioritizing different applications in cases of contention in the underlying routes and resources).  The system will also provide feedback on the current measures of quality and performance being made available, so that suitable fine tuning can be performed in the application if it is developed to be network aware. With the above goals in mind, our proposed PLaNetS global services have the following features:

·      Queues for tasks (transfers) of different lengths and levels of priority,

·      A best effort (lowest common denominator) service for transfers that do not interface to the network management services, with a specified (typically small) share of the bandwidth,

·      Higher levels of service via a “path” construction service, where the “paths” can have one of a few different priority levels as well as varying sizes (bandwidths),

·      A subsystem that prioritizes tasks based on their bandwidth requirements; e.g., short tasks may execute in real time while large tasks are run in the background. Note that end-to-end paths could be constructed in a real sense (layer 1 “lightpaths”) or virtually (MPLS tunnels augmented by QoS attributes) or combinations thereof.
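The queueing and prioritization features listed above can be sketched as a small priority queue. This is a minimal illustration under stated assumptions, not the planned implementation: the `TaskDirector` class, the 0.1 Terabyte cutoff between "short" and "large" tasks, and the convention that priority 0 is the highest are all hypothetical.

```python
import heapq
import itertools

SHORT_TASK_TB = 0.1  # hypothetical cutoff between "short" and "large" tasks


class TaskDirector:
    """Orders transfers by priority, then size, then arrival order."""

    def __init__(self, short_task_tb: float = SHORT_TASK_TB):
        self._short_task_tb = short_task_tb
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker keeps FIFO order

    def classify(self, size_tb: float) -> str:
        """Short tasks run in real time; large tasks run in the background."""
        return "real-time" if size_tb <= self._short_task_tb else "background"

    def submit(self, name: str, size_tb: float, priority: int) -> None:
        """Queue a transfer; priority 0 is the highest (an assumed convention)."""
        heapq.heappush(self._heap, (priority, size_tb, next(self._arrival), name))

    def next_task(self):
        """Return (name, service class) of the next transfer, or None if idle."""
        if not self._heap:
            return None
        _priority, size_tb, _order, name = heapq.heappop(self._heap)
        return name, self.classify(size_tb)


if __name__ == "__main__":
    director = TaskDirector()
    director.submit("calibration-sample", size_tb=0.05, priority=1)
    director.submit("discovery-skim", size_tb=1.2, priority=0)
    director.submit("monte-carlo-batch", size_tb=5.0, priority=1)
    task = director.next_task()
    while task is not None:
        print(task)
        task = director.next_task()
```

Within one priority level the smaller transfer is released first, which mirrors the idea that short tasks should not wait behind long ones.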

Figure 3: Interaction between PLaNetS services and (storage) components.

Figure 3 shows an overview of several of the proposed global PLaNetS services in a scenario. The individual services are described in more detail in the remainder of this section.  Shown is a generic PLaNetS aware File Transfer Service (FTS) which intends to move a 1.2TB file from Site A to Site B.  The FTS primarily interacts with an End-Host Agent (EHA) which is responsible for transparently interacting with the complex array of services necessary to optimize this transfer in the context of all ongoing and scheduled network usage, tracking the progress of the transfer and dealing with faults or preemption as required.  A sample of the types of queries and service interactions is also shown (green thin lines).

C.4.1    End-Host Agent

One of the shortcomings of many network monitoring systems is that they neglect monitoring what is arguably one of the most problem-laden sections of the network: the end-systems. We propose to address this within PLaNetS by integrating and evolving the LISA agent (part of the current MonALISA release) into an End-Host Agent (EHA). LISA is already being used with VRVS to provide a short list of candidate-best servers and then select the best connection based on performance, server load (doing load balancing), etc. One of the End-Host Agent’s tasks will be to monitor the end system, profiling the client's state (CPU, memory, interrupts, disk-usage) correlated with achieved network performance.  This will be critical for quickly diagnosing the correct location of any performance problem within the PLaNetS fabric. As the End-Host Agent develops it will become the central point of contact for each host’s applications that need to utilize the network.  The EHA will transparently negotiate with the other PLaNetS services to set up and optimize effective use of the network.  In addition it will actively track network connections and respond to failures, errors and preemption to ensure task completion.
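To illustrate what "profiling the client's state correlated with achieved network performance" could mean in practice, the sketch below computes a Pearson correlation between each host metric and the observed throughput and reports the metric most strongly tied to it. This is our own simplified illustration and not the LISA or MonALISA interface: the function names and the sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def most_correlated_metric(metrics, throughput_gbps):
    """Return (metric name, r) for the metric with the largest |r|."""
    scores = {name: pearson(values, throughput_gbps) for name, values in metrics.items()}
    name = max(scores, key=lambda key: abs(scores[key]))
    return name, scores[name]


if __name__ == "__main__":
    # Hypothetical samples taken at five points during one transfer.
    samples = {
        "cpu_percent": [20, 40, 60, 80, 95],
        "disk_percent": [50, 52, 49, 51, 50],
    }
    throughput = [9.5, 9.0, 7.0, 5.0, 3.0]
    print(most_correlated_metric(samples, throughput))
```

A strongly negative correlation with CPU load, as in this made-up example, would point to the end host rather than the network path as the place to look first. Correlation alone does not establish the cause, so such a result is a diagnostic hint only.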

C.4.2    Path Discovery Services

When initiating a network connection between two endpoints, we need to understand what possibilities exist. PLaNetS will rely on a set of Path Discovery Services (PDS) that provide a comprehensive view of the options available in the network. Specifically, the PDS will report whether options such as dynamic virtual pipes or optical circuits exist partially or end-to-end along the path. Examples of targeted capabilities for resource discovery are: (1) Determine which options exist between two locations in the network. (2) List components in the path that are “manageable”. (3) Given two replicas of a data source, “discover” (in conjunction with monitoring) the estimated bandwidth and reliability of each to a given destination. (4) Locate network resources and services that have agreements with a given VO.
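Capability (3) above could be used along the following lines. This Python sketch ranks candidate replicas by estimated bandwidth weighted by reliability. The `ReplicaPath` record, its field names and the scoring rule (bandwidth multiplied by reliability) are hypothetical choices for illustration, not a defined PDS interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaPath:
    """Hypothetical PDS answer for one replica-to-destination path."""
    source_site: str
    est_bandwidth_gbps: float  # estimate supplied by monitoring
    reliability: float         # assumed fraction in [0, 1]


def expected_rate(path):
    """Assumed score: estimated bandwidth discounted by reliability."""
    return path.est_bandwidth_gbps * path.reliability


def rank_replicas(paths):
    """Return the candidate paths ordered from best to worst expected rate."""
    return sorted(paths, key=expected_rate, reverse=True)


candidates = [
    ReplicaPath("Site A", est_bandwidth_gbps=10.0, reliability=0.5),
    ReplicaPath("Site C", est_bandwidth_gbps=8.0, reliability=0.9),
]
best = rank_replicas(candidates)[0]
```

Here the nominally slower but more reliable replica ranks first; a production service would likely also weigh policy and scheduled use.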

C.4.3    Network Request Services

A managed network requires the means to define and enforce policy as well as to allocate and schedule limited resources within the network. A set of Network Request Services (NRS) needs to be developed, implemented and deployed to reach the PLaNetS network vision. In many cases the PLaNetS collaboration will rely upon external projects (OSG, EGEE, etc.) to deliver appropriate core services that can be adopted or adapted for PLaNetS use. Examples of needed capabilities for NRS are: (1) Negotiation, classification and queuing of requests (assign service/priority). (2) Policy Description Language: provide networks (local, regional, national and international) with the means to define their capabilities (prioritized flows, minimized latency, virtual dynamic point-to-point connections, light-path construction) and to specify the conditions under which users or applications can utilize those capabilities. (3) Policy Implementation and Enforcement: given an appropriate policy description language, we need to implement a means of enforcement that integrates AAA and monitoring information with the resultant end-to-end set of policies along the path(s). (4) Co-Scheduling of requests: network request services, modulated by policy and scheduled use, must be able to coordinate with other scheduled resources such as storage and access to compute cycles. To achieve these capabilities, we will coordinate with and work within any future technical group devoted to OSG resource management, and work towards the development of eventual OSG Resource Request Services.
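Capability (1), classification and queuing of requests, can be sketched as a priority queue keyed on service class. The class names below follow the foreground, real-time-background and queued transfer categories named in the Introduction; the `RequestQueue` class, its methods and the numeric priorities are assumptions for illustration and do not describe an existing NRS implementation.

```python
import heapq
from itertools import count

# Assumed service classes; a lower number is served earlier.
SERVICE_CLASSES = {"foreground": 0, "realtime-background": 1, "queued": 2}


class RequestQueue:
    """Hypothetical queue: strict priority between classes, FIFO within a class."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def submit(self, request_id, service_class):
        """Classify a request and enqueue it; raises KeyError for unknown classes."""
        priority = SERVICE_CLASSES[service_class]
        heapq.heappush(self._heap, (priority, next(self._arrival), request_id))

    def next_request(self):
        """Return the next request to serve, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

Strict priority is the simplest possible discipline; a policy-driven service would more plausibly combine it with quotas or aging so that queued transfers are not starved.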

C.4.4    Network Path Services

Part of our effort will be devoted to Network Path Services, broadly categorized into two areas: Construction and Management. Network Path (Virtual or Real) “Construction” Services (PCS) are required once a “path” resource is discovered and allocated. Routers and switches may need to have their configurations dynamically altered to enable the requested path. Optical Control Planes must be implemented to allow the highest degree of interoperability between various optical networks. For many of these services we hope to adopt the work of projects such as TeraPaths [a14], LambdaStation [a10], OSCARS [u19], Dragon [u6], Cheetah [u3] and HOPI [u13] to meet PLaNetS needs. The second area of focus will be Network Path Management Services (NPMS), which will: (1) Provide real-time status, including time-to-completion estimation. (2) Provide fault handling, including soft or partial faults. (3) Provide redirection and preemption notification. Additional path information may also be available, including summaries of historical monitoring data as well as policy and management information. Examples are round-trip times, recent bandwidth usage, packet loss statistics, reliability measures, and current and future scheduling information.
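The time-to-completion estimate in item (1) can be illustrated with a simple calculation: remaining bytes divided by the recently observed transfer rate. The Python sketch below also flags a possible soft fault when the observed rate falls below a fraction of the allocated rate. The function names, the 50% threshold and this definition of a soft fault are assumptions for illustration only.

```python
def seconds_to_completion(total_bytes, transferred_bytes, recent_bytes_per_s):
    """Estimate the remaining transfer time in seconds.

    Returns 0.0 if the transfer is complete and None if it is stalled
    (no recent progress), since no finite estimate exists in that case.
    """
    remaining = total_bytes - transferred_bytes
    if remaining <= 0:
        return 0.0
    if recent_bytes_per_s <= 0:
        return None
    return remaining / recent_bytes_per_s


def is_soft_fault(recent_bytes_per_s, allocated_bytes_per_s, threshold=0.5):
    """Assumed rule: the path is up but delivers less than `threshold`
    of its allocated rate."""
    return recent_bytes_per_s < threshold * allocated_bytes_per_s


# The 1.2 TB file from the Figure 3 scenario, at a sustained 10 Gb/s
# (1.25e9 bytes per second), with nothing transferred yet.
eta = seconds_to_completion(1_200_000_000_000, 0, 1_250_000_000)
```

Under these assumed numbers the estimate comes to 960 seconds, or 16 minutes. A deployed service would presumably smooth the rate over a time window and fold in scheduling information rather than rely on a single recent sample.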

PLaNetS will rely upon UltraLight for monitoring services that will be utilized by some of the global services previously described.

We believe that the deployment and success of PLaNetS, with the key features described above, will be crucial to the success of the global scientific collaborations in which we work. This is particularly true during peak periods of scientific opportunity: when a new accelerator such as the LHC first makes a new energy range available for exploration; when the accelerator luminosity begins to rise rapidly after the early days; when an upgrade to the accelerator or detector gives physicists unprecedented capability to measure and identify new physics signals above the backgrounds. In response to the renewed opportunities for discovery and the competitive pressure among experiments, physicists' demands on resources tend to rise rapidly, and the potential for substantial oversubscription of resources and for bottlenecks throughout the system increases.

In the extreme, the system may undergo a “meltdown” as a result of an inability to cope with the resource-demand conflicts, and to exert reasonable policies for network usage and data placement across the entire system. By construction, the potential for such a meltdown will be maximal during times when rapid turnaround in analyzing and understanding the physics results is most crucial.

C.4.5