New Vision for an All Optical Network

White paper for the Workshop on New Visions for Large-Scale Networks

Submitted by V. Anantharam, C. Chang-Hasnain, J. Kahn, D. Messerschmitt, D. Tse, J. Walrand, and P. Varaiya
EECS Department, University of California, Berkeley

Optical packet switching needs to be fundamentally different in conception because of the absence of a viable analog of the electronic packet buffer. We propose a vision for an all-optical network based on deflection routing, erasure codes at the packet level, and the building of applications on novel transport-level primitives. Research is necessary both at the physical layer, to design and develop the required optical switches, and at the systems level, to define the necessary transport primitives and to develop the coding and routing mechanisms that will realize them.

Contact information: Venkat Anantharam or Connie Chang-Hasnain, 231 Cory Hall, EECS Department, University of California, Berkeley CA 94720.
ananth@eecs.berkeley.edu, 510-643-8435 (VA)
cch@eecs.berkeley.edu, 510-642-4315 (CCH)

Optical packet switching needs to be fundamentally different in conception because of the absence of a viable analog of the electronic packet buffer. We envision research into a novel paradigm for the design of an all-optical network. The research addresses both systems issues and physical-layer issues. The physical-layer research is geared toward the design of faster optical switches with as much switching flexibility (defined below) as feasible. The systems research is geared toward incorporating ideas from coding theory into the networking arena and toward building applications on novel transport-layer primitives. We first describe the idea in general terms, then the physical-layer challenge, and subsequently the systems-level challenge.

We would like to create an optical packet switch without an optical buffer.
Since switching necessarily involves contention, this would seem to be an impossibility. To see why it is not, consider a simple problem: the design of a 2x2 packet switch. Imagine that time is slotted and that in each time slot one packet may arrive at each input, destined for either one of the outputs. If the two packets arriving in a slot are both destined for the same output, one of them must lose. But suppose the very same two packets show up again in the next time slot; the packet that lost can now win. Both packets have then got through to their desired output. More precisely, one copy of each packet has got through; the other copy has been misdirected and will eventually have to be taken out of the network at subsequent switches. This can be done, for instance, by giving each switch dead output directions that effectively bury packets.

One feature that emerges from this discussion is redundancy. Packets need to be replicated, and the guarantee that the network provides at the application layer is that applications will work as long as a reasonable fraction of the packets involved get through in the desired direction. This is part of the systems challenge and is discussed in that section. It is a challenge at both the application layer and the transport layer: one must define not only what kinds of transport functionality it is appropriate to treat as fundamental, but also how such functionality is to be realized. Note that the way in which packet redundancy is introduced will need to be much more sophisticated than mere packet replication.
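The two-by-two contention example above can be sketched as a slotted-time simulation. The arrival behavior (independent, uniformly random output requests) is an assumption made for illustration only:

```python
import random

# Slotted-time sketch of a bufferless 2x2 switch. When both inputs
# want the same output in a slot, one wins and the other is deflected
# to the wrong output, to be discarded downstream (a "dead output").

def switch_slot(requests, rng):
    """requests[i] is the desired output (0 or 1) of input i.
    Returns (granted_inputs, deflected_inputs)."""
    if requests[0] == requests[1]:        # contention for one output
        winner = rng.randrange(2)
        return [winner], [1 - winner]
    return [0, 1], []                     # distinct outputs: both pass

rng = random.Random(1)
slots, deflections = 10_000, 0
for _ in range(slots):
    requests = [rng.randrange(2), rng.randrange(2)]
    _, deflected = switch_slot(requests, rng)
    deflections += len(deflected)
print(f"deflections per slot: {deflections / slots:.2f}")
```

With uniform independent requests, the two inputs contend in half the slots, so on average about half a packet per slot is deflected and must be absorbed by redundancy elsewhere.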
Replication alone will not suffice: first, traffic streams encounter multiple switches between the source and the destination, and second, while bandwidth on the links is very large, it might still pose a bottleneck for bandwidth-hungry applications.

The second feature that emerges from the discussion is the need for packet-by-packet optical switching. At the current level of technology this is not yet feasible, and the physical-layer challenge is to get as close to this goal as possible. This has two aspects. One is to switch groups of packets that are as small as possible; this relates to the speed of the switching (for instance, the rate at which mirror angles can be changed in a MEMS-mirror-based switch) relative to the packet rate. The second is the topological flexibility of the switch: which switching patterns one can move to from a given switching pattern. This is constrained by the technology, so it is a physical-layer issue, discussed in more detail in the next section.

The physical layer challenge

In spite of all the recent, greatly publicized research and product development activity, there has been no successful demonstration of optical switches that have reasonable dimension (i.e., larger than 32x32) and meet all the necessary performance, physical-size, and power requirements. The performance requirements include, at a minimum, channel crosstalk, insertion loss, and reliability. Most optical switching fabrics suffer from scalability and/or crosstalk problems. The control electronics for the optical switches, requiring multiple digital feedback loops for each physical switch, may be larger and consume more power than the electronic switching fabric that the optics are intended to replace. In addition to these issues, all of the all-optical switches are too slow for all-optical packet-switched networks.
Even though a single switch made of LiNbO3 or a semiconductor optical amplifier (SOA) can latch within several nanoseconds, the speed drops drastically as the switch-matrix dimension increases. A simple calculation shows that the speed is on the microsecond scale for a 16x16 matrix. The MEMS, liquid-crystal, and thermal ink-jet based technologies are nearly three orders of magnitude slower.

In this program, we propose to examine this problem by first separating an optical switch into two major blocks, a channel (wavelength) switch and a fiber (space) switch, and determining the application requirements for each. With this separation, the total hardware requirement for the switch fabric may be drastically reduced, potentially resolving the scalability problem. Crosstalk and speed considerations will be examined separately for metro and WAN applications, where the distance requirements and physical architectures are very different. By focusing on the metro application, where the traffic is more bursty and LAN-like and where optical packet switching can therefore have a tremendous impact, we believe the performance requirements can be greatly relaxed. A similar idea of focusing on a smaller network for all-optical packet switching has also recently been discussed in the Optical Internetworking Forum.

Finally, we address the speed of the optical switches. We propose to develop nanosecond tunable VCSEL array technology with a tuning range covering the full C- and L-bands. These devices can perform cost-effective channel switching within a given fiber. Although this form of wavelength switching is optoelectronic rather than all-optical in the strictest sense, it is generally believed that its performance and reliability can be far better. Presently, the tuning and latching speed of commercial tunable lasers with a wide tuning range is around 1-10 milliseconds.
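To put these timescales side by side, here is a back-of-envelope sketch. The line rate, packet size, and per-crosspoint latch time are our assumptions for illustration, not measured figures; the serial-reconfiguration estimate is one reading of the "simple calculation" above:

```python
# Back-of-envelope comparison of switching timescales.
# All device numbers below are assumptions for illustration only.

line_rate_bps = 10e9                         # assumed 10 Gb/s line rate
packet_us = 1500 * 8 / line_rate_bps * 1e6   # 1500-byte packet duration

crosspoints = 16 * 16                        # naive 16x16 crossbar
latch_ns = 5                                 # assumed per-crosspoint latch time
matrix_us = crosspoints * latch_ns / 1e3     # serial reconfiguration

laser_us = 1_000                             # 1 ms tuning (low end of 1-10 ms)

print(f"packet duration:       {packet_us:.2f} us")
print(f"16x16 serial reconfig: {matrix_us:.2f} us")
print(f"tunable-laser switch:  {laser_us:.0f} us "
      f"(~{laser_us / packet_us:.0f} packet times)")
```

Under these assumptions a millisecond-class tunable laser sits idle for hundreds of packet times per retune, which is why nanosecond tuning and latching are the targets of the work described next.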
In this program, we propose to explore nanosecond tuning and latching designs, including a proposal for nanosecond wavelength locking. In addition, we plan to develop a space-based switch with nanosecond latching time using either SOA or semiconductor electro-absorptive device technologies.

The systems level challenge

In our view this challenge comprises two interrelated parts: (1) defining appropriate transport-layer primitives that facilitate the building of applications over this radically different network, and (2) realizing these primitives in the architecture. The issues related to the former will become clearer after a discussion of the latter, so we discuss the latter first.

Realizing transport layer primitives

As currently conceived, the network supports two basic kinds of point-to-point transport primitives: unreliable transport (UDP) and reliable end-to-end transport (TCP). In addition, there is a strong push to build in QoS-related primitives, primarily using MPLS (multiprotocol label switching), a link-layer technology for setting up label-switched paths, and DiffServ, which defines the per-hop behavior of packets through a DS field in the packet. Further, some multicast primitives are available.

To succinctly illustrate the challenges involved in realizing transport-layer primitives in the new all-optical network we conceive of, let us focus on a basic example: achieving reliable end-to-end transport. As described in the introduction, packets in our all-optical network will not be guaranteed to leave a switch in the direction in which they would like to go. We propose to overcome this problem by introducing redundancy at the level of packets. The controlled use of redundancy to overcome erasures of symbols is the province of coding theory, in which a deep and extensive body of research has been built up over the past few decades.
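The basic primitive from this body of work can be shown with a toy code: k data symbols are encoded into n > k coded symbols, and any k received symbols suffice to recover the data. This sketch uses Lagrange interpolation over the prime field GF(257); it is an illustration of the idea, not a scheme we would deploy:

```python
# Toy erasure code over GF(257): k data symbols become n coded symbols,
# and any k of them suffice to recover the data.

P = 257  # prime modulus; data symbols are byte values 0..255

def encode(data, n):
    """Evaluate the polynomial with coefficients `data` at x = 1..n."""
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(data)) % P)
            for x in range(1, n + 1)]

def decode(received, k):
    """Recover the k coefficients from any k (x, y) pairs by Lagrange
    interpolation carried out in coefficient form."""
    pts = received[:k]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(pts):
        basis = [1]   # coefficients of prod_{m != j} (x - x_m)
        denom = 1
        for m, (xm, _) in enumerate(pts):
            if m == j:
                continue
            new = [0] * (len(basis) + 1)
            for d, b in enumerate(basis):       # multiply basis by (x - xm)
                new[d] = (new[d] - xm * b) % P
                new[d + 1] = (new[d + 1] + b) % P
            basis = new
            denom = denom * (xj - xm) % P
        scale = yj * pow(denom, P - 2, P) % P   # yj / denom mod P
        for d in range(k):
            coeffs[d] = (coeffs[d] + scale * basis[d]) % P
    return coeffs

data = [10, 20, 30]                          # k = 3 data symbols
coded = encode(data, n=5)                    # n = 5 tolerates 2 erasures
survivors = [coded[0], coded[2], coded[4]]   # any 3 survivors suffice
print(decode(survivors, k=3))                # -> [10, 20, 30]
```

The point of the example is that which symbols arrive does not matter, only how many; that is what makes such codes a candidate for absorbing misdirected packets.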
In brief, coding for erasures permits the reliable transport of a fixed number of symbols (in our case, the packets representing a session) by transmitting a different, larger number of symbols, with schemes by which the desired information can be recovered as long as a sufficient fraction of the actually transmitted symbols is received. Coding to combat erasures has begun to make its appearance as a practical technique in the networking arena. The primary driver behind this development, which is associated with the concept of tornado codes and is being commercialized by a company called Digital Fountain, is multicast streaming, where acknowledgement-based transport protocols like TCP lead to ack overload.

Our conception of the use of erasure-correcting codes in the novel network is fundamentally different and poses several significant new challenges. The applications we envision supporting are not just streaming applications but also interactive applications and applications with stringent real-time constraints that cannot use buffers to smooth out jitter, as many streaming applications do. This means that small bursts of packets will need to find their way through the network from source to destination in real time. Such a burst of packets will typically encounter several switches along its route, and the packets that comprise it face the possibility of misdirection at each switch. Further, the switching speed of the switches is expected to be slow relative to the packet rate (part of the physical-layer challenge is to mitigate this problem to the extent possible), so misdirections will also occur in bursts. How does one code to ensure that the packet burst nevertheless gets through reliably? This is a central problem we propose to tackle.

It should be noted that deflection routing has been proposed before in the context of interconnection networks.
In interconnection networks it is possible to reroute deflected packets by injecting them into the network once again, giving them a second chance to go the correct way, as it were. One does not expect this to be feasible in a large-scale network. On the other hand, there is the possibility of exploiting spatial redundancy: packets could be sent along multiple paths from the source to the destination, and recovery of the intended packet burst could be attempted from the aggregate of all the packets received along all the different paths. This possibility will play an important role in our research.

Defining appropriate transport layer primitives

A fundamental issue that arises in any such new network is: what are the appropriate primitives on which to build applications? In the network we envision, the deflection of packets might be viewed not just as a disadvantage but as a potential advantage for realizing multiagent primitives. One of the most important emergent uses of the network is in facilitating collaborative activities among small (or occasionally large) groups of end users. This is radically different from the unicast and multicast applications that have dominated so far. Such applications naturally need the ability to work with primitives which ensure that information is replicated at certain well-defined subsets of hosts. Deflection in an all-optical network along the lines we propose gives a natural mechanism for establishing such consistency among the participating hosts.

We propose to systematically study the question of the best transport-level primitives in a deflection- and erasure-coding-based all-optical network. The research approach we will follow is based on a compositional view of the function of a network, as opposed to a decompositional view, i.e.
we will work from the bottom up to define mutually compatible basic objects from which larger-scale applications with the desired semantic characteristics can easily be created simply by picking and interconnecting the appropriate components.
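As a final illustration of how the pieces fit together, the following sketch combines k-of-n erasure coding with multipath spreading over a deflecting network. The path count, per-path deflection probabilities, and code parameters are all assumptions for illustration, not design choices:

```python
import random

# End-to-end sketch: spread n coded packets round-robin over several
# paths; each path deflects (erases) a packet independently with an
# assumed probability. The burst is recoverable if at least k of the
# n coded packets survive, as with the erasure code sketched earlier.

def burst_delivered(k, n, p_loss, rng):
    """p_loss[j] is the assumed deflection rate on path j."""
    survivors = sum(1 for i in range(n)
                    if rng.random() >= p_loss[i % len(p_loss)])
    return survivors >= k

rng = random.Random(0)
p_loss = [0.05, 0.20, 0.10]     # assumed deflection rate on each path
trials = 10_000
ok = sum(burst_delivered(k=8, n=12, p_loss=p_loss, rng=rng)
         for _ in range(trials))
print(f"bursts recovered: {ok / trials:.1%}")
```

Even with one poor path in the mix, the aggregate over all paths recovers the vast majority of bursts, which is the spatial-redundancy effect the proposal aims to exploit.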