Redar: A Remote Desktop Architecture for the Distributed Virtual Personal Computing

Yuedong Zhang1, Zhenhua Song2, Dingju Zhu1, Zhuan Chen1, Yuzhong Sun1
1 National Research Center for Intelligent Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences
2 Department of Computer Science, University of Science and Technology of China
{ydzhang, songzhenhua, zhudingju, chenzhuan}@ncic.ac.cn, yuzhongsun@ict.ac.cn

Abstract

Popular computing technologies such as ubiquitous computing, grid computing and thin-client computing are moving people toward a much more distributed and pervasive computing environment. Building on these technologies, a distributed virtual personal computing (DVPC) paradigm is coming into being. One of the fundamental challenges in DVPC design is a desktop system that is virtually integrated but physically distributed. We propose Redar, a remote desktop architecture for DVPC. Redar integrates user interfaces from diverse service nodes into one virtual desktop and presents that virtual desktop to an ultra-thin user client. The user interfaces currently supported by Redar include application GUIs as well as mobile storage interfaces. The key components of Redar are the GUI merger, the virtual desktop manager, the ultra-thin client and the transport protocols. We have implemented Redar in a DVPC prototype system. In our evaluation, Redar shows low display latency and overhead, reasonable storage I/O performance, and good scalability and robustness.

1. Introduction

Personal computing has moved into a much more distributed and pervasive environment over the last decade, and this trend will continue in the coming decades. Grid computing [1], regarded as one of the major technology trends, advocates coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations. Ubiquitous computing [2], also called pervasive computing, lets computers, intelligent terminals and digital devices merge into people's everyday lives. Thin-client computing [3] separates processing logic from GUI display, providing a way for terminals or devices with restricted capacity to borrow power from higher-performance computers across the network.

Based on the technologies described above, we propose a distributed virtual personal computing (DVPC) paradigm. Because of the Internet, a modern PC's capacity now comes largely from beyond the physical machine itself; what a person really needs is not a physical PC but a networked personal computing environment. Moreover, this networked personal computing environment can be accessed not only through fixed machines on the desk but also through cell phones, PDAs and other portable devices. Such a personal computing paradigm, composed of various user terminals and distributed computing resources, is what we call distributed virtual personal computing (DVPC).

There are many challenges in DVPC design, one of which is the user interface system. In this paper we introduce Redar, a remote desktop architecture for DVPC. Redar provides a virtually integrated but physically distributed desktop environment for thin-client users. It integrates application interfaces from diverse service nodes into one virtual desktop environment and presents the virtual desktop to the ultra-thin user client. Redar provides both application GUIs and some local device interfaces on the user terminal.
We have prototyped Redar in a DVPC testbed, and the Redar prototype shows low display latency and overhead, reasonable storage I/O performance, and good scalability and robustness on the testbed.

In the remainder of this paper we describe our work in more detail. As background, section 2 introduces the motivation, concepts and prototype of the DVPC paradigm. Section 3 describes the design and implementation of the Redar architecture. Section 4 presents experimental results and evaluates Redar's performance. Section 5 examines related work. Finally, section 6 concludes the paper and introduces our future work.

2. Distributed Virtual Personal Computing

Currently, personal computing is no longer confined to physical PCs. On one side, a PC is no longer an isolated machine, and its capacity extends beyond the machine itself; on the other side, people's computing interfaces include not only desktop and laptop machines but also portable devices like cell phones and PDAs. What users really need for personal computing are not physical machines or devices but personal computing environments and the services those environments can provide; the physical machines and devices are mainly interfaces to the personal computing environment.

Based on this observation, we propose a distributed virtual personal computing (DVPC) paradigm. DVPC provides virtual personal spaces to users in a distributed environment. A virtual personal space consists of virtual storage, virtual devices, virtual desktops and applications. A user can access his virtual personal space from anywhere with a network connection, and the user terminal can be a PC, a thin client or a portable device. Through virtualization, DVPC enables large-scale resource sharing among many users and improves resource utilization, goals also advocated by grid computing, utility computing, ubiquitous computing and thin-client computing.

We have developed a DVPC prototype in our lab. The prototype system targets office applications for members of our research group. Figure 1 shows the architecture of the prototype system, which is a simplified implementation of the DVPC paradigm. We take the prototype as a testbed for the Redar research, and all the experiments presented in this paper were done on this testbed.

Figure 1. DVPC Prototype

3. Redar Architecture

Redar is a user interface system designed for DVPC. It provides an integrated, desktop-like GUI environment to users by merging each user's application interfaces from diverse service nodes into a unified virtual desktop and delivering the virtual desktop to the user terminal. The application interfaces here denote both the application GUIs and the device interfaces needed by the applications.

Figure 2. Redar Architecture

The Redar architecture is shown in Figure 2. The main components of Redar are the ultra-thin client (UTC), the desktop integrator and the protocols. The UTC consists only of the peripheral devices needed for the user interface, and it is much thinner than ordinary thin-client systems. The desktop integrator integrates a user's applications and other resources into a virtual desktop environment, and it is composed of a framebuffer-based GUI merger and a desktop manager.
Three sets of protocols work in Redar: the terminal protocol transports data between the desktop integrator and the ultra-thin client; the GUI protocol transports data between the applications and the GUI integrator; and the storage protocol transports data between the file server and the desktop manager. In addition to the three components above, GUI grabbers run on the application nodes, and the user's virtual storage space resides on the file server.

The system works as follows. The user's applications run on the service nodes, and their interfaces are captured by the GUI grabbers and delivered to the GUI merger by the GUI protocol. The GUI merger merges the user's GUIs into the virtual framebuffer, and the virtual framebuffer is mirrored to the ultra-thin client by the terminal protocol. User inputs travel the reverse path. The user has his own virtual storage space on the file server, and his local storage is mounted onto that virtual storage space. Storage data are transported between the ultra-thin client and the file server by the storage protocol. Starting and stopping applications, as well as mounting and unmounting local storage devices, are all controlled by the desktop manager.

3.1. Remote Display Protocols

The GUI protocol and the terminal protocol in Redar are both remote display protocols. Generally speaking, there are two sorts of remote display protocols. The first sort is the high-level remote display protocol, which delivers high-level commands and leaves most of the rendering work to the client side; the typical example is X11 [4]. A high-level protocol has higher encoding efficiency and consumes less bandwidth, but it is usually platform dependent and places a heavy burden on the client side. The second sort is the low-level protocol, which delivers low-level data such as bitmap data, framebuffer information and input events; the typical system is VNC [5]. This approach is platform independent and leaves little rendering work to the client side, but it consumes more bandwidth.

For the terminal protocol, we employ our gDevice [6] design, which is a low-level protocol similar to RFB [7]. The reasons are: 1) in the GUI merger, the display information has already been turned into framebuffer images; 2) a low-level remote display protocol enables an ultra-thin client, because it leaves little rendering work to the terminal; 3) according to our experiments and others' research [8], low-level remote display protocols perform better on resource-constrained or portable devices.

For the GUI protocol, we employ the high-level X11 protocol, which transports high-level commands. The reasons are: 1) within the service area, it does not matter which side renders the display image, because either way the rendering load stays on the service nodes; 2) high-level display commands reduce network traffic; 3) it is easy to perform window merging on top of the X11 protocol.

3.2. Framebuffer Based GUI Merger

The GUI merger is the core of Redar, and its central part is a virtual framebuffer. A framebuffer is an abstraction of the display device and represents the video memory from a high-level view. For example, for a display of 1024×768 pixels with 24-bit color depth, the framebuffer is represented as a 1024×768×3 ≈ 2.3 MB memory space. To save system memory, there is no real framebuffer in the GUI merger, only a virtual memory address space corresponding to the framebuffer.
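As a rough illustration of the sizing and the "virtual memory address space" idea above, the following C sketch reserves an anonymous, demand-paged mapping of the stated size. It is a minimal sketch under our own assumptions, not code from the Redar implementation; the macro and variable names are illustrative.

```c
/* Illustrative sketch only: reserves an address range for a virtual
 * framebuffer of the size discussed above (1024 x 768 pixels, 24-bit
 * color, roughly 2.3 million bytes).  Not actual Redar code.          */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define FB_WIDTH           1024
#define FB_HEIGHT          768
#define FB_BYTES_PER_PIXEL 3      /* 24-bit color depth */

int main(void)
{
    size_t fb_size = (size_t)FB_WIDTH * FB_HEIGHT * FB_BYTES_PER_PIXEL;

    /* Anonymous, demand-paged mapping: address space is reserved, but
     * physical memory is committed only for regions actually written,
     * which matches the "virtual framebuffer" idea described above.   */
    unsigned char *vfb = mmap(NULL, fb_size, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (vfb == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }

    printf("virtual framebuffer: %zu bytes (about %.1f MB)\n",
           fb_size, fb_size / 1e6);

    munmap(vfb, fb_size);
    return EXIT_SUCCESS;
}
```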
The GUI integration is carried out by the inputer of the virtual framebuffer, which writes the display information received from the different applications into the framebuffer. The framebuffer outputer acts as a GUI server for the user terminal and delivers framebuffer updates to the terminal. In the implementation of the GUI integrator, we use an X server as the inputer of the virtual framebuffer and a modified RFB server as the outputer. The X server regards the virtual framebuffer as an X display and writes the GUIs of several X clients into the virtual framebuffer to perform the GUI integration. Because the framebuffer is virtual, when the X server writes data into it, the data are redirected to the modified RFB server immediately.

3.3. Mobile Storage Support

Local mobile storage is often neglected by earlier thin-client systems, but it is important in a modern distributed computing environment. Redar supports two kinds of storage devices: read-only optical discs, such as CD-ROM and DVD, and USB storage, including USB-attached digital devices such as MP3 players and digital cameras. The local mobile storage is mounted onto the virtual storage space as a network block device (NBD) [9], and the data are transported by the gDevice protocol.

3.4. Virtual Desktop Manager

The virtual desktop manager (VDM) is the base application for a user, and it contains icons, menus, task bars and window managers. The VDM acts as both a DVPC access interface for the user and a privilege control tool for the DVPC system. The fundamental differences between our VDM and ordinary desktop managers are: 1) besides acting as an interface and tool for users, the VDM also acts as a gatekeeper for the DVPC environment; 2) behind the VDM lies a distributed computing environment rather than a single machine, so many working mechanisms in the VDM are more complex than in ordinary desktop managers.

The virtual desktop manager is implemented on the GNOME [10] system, a widely used desktop environment on the X11 platform. We made the following modifications to GNOME version 2.0:

1) The mechanism behind icons and menu items is changed. Usually, an icon or menu item on the desktop maps to an executable file or script; when the icon or menu item is clicked, the mapped file is executed. We modify this mechanism: when an icon or menu item is clicked, a request is sent to the scheduler; the scheduler invokes the corresponding application or virtual device on a remote node for the user; and the application's GUI is pushed to the user's GUI merger through an X connection (a minimal sketch of this launch path follows this list).

2) Icons and menus are configured according to the user's privileges. To prevent invalid operations, a user's GNOME session initializes its icons and menus according to the user's privileges.
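As referenced in modification 1 above, the sketch below outlines how a desktop launch could be redirected to a scheduler instead of a local exec. The scheduler address, port, one-line request format and function names are hypothetical placeholders for illustration; the paper does not specify the actual Redar/DVPC scheduler interface.

```c
/* Hypothetical sketch of the modified icon handler described in item 1:
 * instead of exec'ing a local binary, the click sends a launch request
 * to the DVPC scheduler over TCP.  Address, port and request format are
 * illustrative placeholders, not the actual Redar/DVPC protocol.        */
#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define SCHEDULER_ADDR "10.0.0.1"   /* placeholder scheduler node */
#define SCHEDULER_PORT 7000         /* placeholder service port   */

/* Called in place of the usual exec when an icon or menu item is clicked. */
static int request_remote_launch(const char *user, const char *app_id)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(SCHEDULER_PORT);
    inet_pton(AF_INET, SCHEDULER_ADDR, &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    /* One-line request; the scheduler is expected to start the application
     * on a service node and push its GUI to this user's GUI merger over X. */
    char req[256];
    int n = snprintf(req, sizeof(req), "LAUNCH %s %s\n", user, app_id);
    ssize_t sent = write(fd, req, (size_t)n);

    close(fd);
    return sent == n ? 0 : -1;
}

int main(void)
{
    /* The user name and application id are illustrative values only. */
    if (request_remote_launch("alice", "abiword") != 0)
        fprintf(stderr, "launch request failed\n");
    return 0;
}
```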
3.5. The Ultra-thin-client

The Redar terminal is designed as an ultra-thin client (UTC); that is, the Redar terminal is much thinner than ordinary thin-client systems. We turn the terminal into an "ultra-thin" client through the following measures:

1) The terminal is designed as a set of network-attached human interface devices, consisting only of the peripherals needed for user I/O and a network component to perform the communication. The design is close to the network computer predicted in 1995 [11], and we have done substantial work on network-attached peripherals [12].

2) Little rendering work is left to the terminal. Because we implement a virtual framebuffer in the GUI merger, almost all of the rendering work is done by the inputer of the GUI merger, so little rendering work remains for the terminal. This rendering strategy both fits the terminal design described above and lowers the processing load on the terminals.

The UTC design greatly lowers the complexity and cost of the terminal system, and it is particularly well suited to low-cost thin-client systems and resource-constrained portable devices.

We implement a UTC with KVM (keyboard, video and mouse) devices, a CD-ROM drive and a USB storage interface, as shown in figure 3. We take the Linux framebuffer device [13] as the video device; it represents the frame buffer of the video hardware and allows application software to access the graphics hardware through a well-defined interface. Our UTC implementation is much simpler than ordinary RFB clients, which are typically high-level applications running on the X Window System.

Figure 3. The UTC Implementation
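Since the UTC drives its video output through the Linux framebuffer device just mentioned, the following minimal C sketch shows the generic /dev/fb0 interface such a client relies on: query the display geometry, map the framebuffer, and write pixels into it. This is the standard Linux framebuffer API, not the actual UTC source, and the solid-color fill is purely a placeholder.

```c
/* Minimal sketch of the Linux framebuffer device interface the UTC uses:
 * query the display geometry, map the framebuffer, fill it with a solid
 * color.  Generic /dev/fb0 usage, not the actual UTC code.              */
#include <fcntl.h>
#include <linux/fb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/fb0", O_RDWR);
    if (fd < 0) {
        perror("open /dev/fb0");
        return EXIT_FAILURE;
    }

    struct fb_var_screeninfo vinfo;
    struct fb_fix_screeninfo finfo;
    if (ioctl(fd, FBIOGET_VSCREENINFO, &vinfo) < 0 ||
        ioctl(fd, FBIOGET_FSCREENINFO, &finfo) < 0) {
        perror("ioctl");
        close(fd);
        return EXIT_FAILURE;
    }
    printf("display: %ux%u, %u bpp\n",
           vinfo.xres, vinfo.yres, vinfo.bits_per_pixel);

    /* Map the whole framebuffer; in a terminal like the UTC, screen updates
     * received from the GUI merger would simply be copied into this region. */
    size_t fb_size = (size_t)finfo.line_length * vinfo.yres;
    unsigned char *fb = mmap(NULL, fb_size, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);
    if (fb == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return EXIT_FAILURE;
    }

    memset(fb, 0x80, fb_size);   /* illustrative test pattern: mid-grey */

    munmap(fb, fb_size);
    close(fd);
    return EXIT_SUCCESS;
}
```

In the real terminal, the pixel data written here would come from framebuffer updates delivered over the terminal protocol rather than from a test fill.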
4. Experiments and Evaluations

The topology of the Redar testbed is shown in figure 4. The testbed consists of four servers and several clients. Three of the servers act as service nodes, and the fourth acts as a benchmark server. The computers are located on two 100 Mbps Ethernets connected by a gateway. The configuration of the computers in the testbed is shown in table 1.

Figure 4. Experimental Testbed

Table 1. Testbed Configuration
                    CPU                  Memory   OS
  Server 1          Intel P3 1 GHz ×2    1 GB     Fedora Core 4, Linux kernel 2.6
  Server 2          AMD Opteron 265 ×2   8 GB     Fedora Core 4, Linux kernel 2.6
  Server 3          Intel P4 2.8 GHz     512 MB   Fedora Core 4, Linux kernel 2.6
  Benchmark Server  Intel P3 800 MHz     256 MB   MS Windows
  Clients           Intel P3 800 MHz     256 MB   Redhat 9.0, Linux kernel 2.4

Because the DVPC prototype system targets office applications in our lab, we use typical word-processing programs, network tools and development tools to evaluate the system. In addition, although streaming media is not a usual office application, it is commonly used to benchmark remote display systems, so we include it in our experiments. For the web browsing and streaming media tests, we use the well-known i-Bench benchmark suite [14]. System overhead is measured with trace probes inserted into the system, which periodically record CPU and memory utilization.

4.1. Remote Display Performance

For remote display performance, we measure application startup speed, web page open latency, streaming media packet loss rate, and media playback lag. We compare Redar's performance with local display and with an earlier DVPC remote display system. The earlier system is based purely on the X11 protocol, with application launch and close implemented on the terminal and access control implemented on the gateway. The local performance is measured directly on Server 1.

Figure 5. Application Startup Speed

Figure 5 shows the startup speed of typical applications in our system. Redar starts applications much faster than the pure X11 system and is almost as fast as the local display.

Table 2. Web: Average Page Latency
                             Local   Redar   Pure X11
  Average Page Latency (s)   0.39    0.51    1.10

The average page latency measured by i-Bench 3.0 is shown in table 2. Redar is slightly slower than the local display, but more than twice as fast as the pure X11 system.

Table 3. Stream Media: Packets Missing Rate
                        Local   Redar   X11
  Packets Missing (%)   7       12      38

The packet loss rate during streaming media display, measured by i-Bench 3.0, is shown in table 3. The local display misses 7% of the media packets, and Redar misses 12%. Pure X11 misses 38%, and the video stutters severely on the client, giving users a very poor experience.

We measure the playback lag of streaming media by playing a 29-second media clip of 360×240 pixels with RealPlayer and recording the running time, both at the original size and at full-screen size. As shown in figure 6, playback on the Redar client lags slightly, but the lag is hardly perceptible. On the pure X11 client, however, playback lags by about 20% at the original size and takes more than twice as long at full-screen size.

Figure 6. Stream Media: Display Lag

X11 performs worse than Redar on our testbed because the synchronization operations in the X protocol cause high latency and considerable extra traffic. The X11 protocol has a built-in synchronization mechanism that ensures almost no data is lost in an X session; however, this mechanism throttles data transport in the session, especially when the data size is large and the number of network hops increases.

4.2. I/O Performance of UTC Storage

Table 4. I/O Performance of UTC Storage
                  Maximum I/O (KB/s)   Average I/O (KB/s)   Share of Device Bandwidth (%)
  USB Reading     184                  137.9                62
  USB Writing     228                  147.5                67.3
  CDROM Reading   85.3                 51.2                 100

Redar provides a CD-ROM drive and a USB storage interface on the UTC, and table 4 shows the I/O performance of the UTC storage devices. The USB reading speed reaches 184 KB/s at maximum and 137.9 KB/s on average, which is 62% of the total device I/O bandwidth; the USB writing speed reaches 228 KB/s at maximum and 147.5 KB/s on average, which is 67.3% of the total device I/O bandwidth; the CD-ROM reading speed reaches 85.3 KB/s at maximum and 51.2 KB/s on average, which is the full bandwidth the device can provide.

4.3. Overhead

Redar's overhead comes mainly from the display system; the storage subsystem introduces little overhead, usually less than 0.5% of the system's CPU and memory, so we evaluate only the display overhead. The display overhead is measured on both the server side and the client side.

On the server side, we measure the CPU and memory overhead of the GUI merger in three typical cases: standby, i-Bench execution and streaming media playback. The experiment shown in figure 8 was run on server 1, which has the weakest processing capacity of the three service nodes. As shown in figure 8, the memory overhead in the three cases is smooth and stable at about 1.3%, which affects the system little (because it is hard to show memory in "%" in the figure, we use "‰" there). The CPU overhead varies greatly across the cases. When the client is on standby, the CPU overhead is about 1%. When i-Bench runs, the average CPU overhead is about 5%, with a peak of 12%. The heaviest CPU overhead comes from streaming media display, with an average of about 15% and a peak of 20%. Thus a Redar GUI merger imposes 1% to 15% CPU overhead on server 1, with an average below 5% and a peak of about 20%.
The CPU overhead of the GUI merger places a small burden on server 1, but on a stronger server, such as server 2, the overhead is negligible.

Figure 8. Overhead on the Server Node

On the client side, we compare the CPU and memory consumption of the Redar UTC and an X-based VNC viewer in three typical cases: standby, web browsing and streaming media playback. The experimental results are shown in figure 7. Compared with the VNC viewer, our UTC saves about 75% of the memory, but its CPU consumption is slightly higher. The UTC saves memory by loading less software and reducing data copy operations: the software suite needed for the UTC is considerably smaller than that of the VNC viewer, which runs on Xlib, and data transport and processing in the UTC is simpler, so less memory is needed for buffering or caching data. The reason our UTC consumes more CPU cycles is that the VNC viewer runs on Xlib, which can use the video-card acceleration provided by the device driver beneath it, whereas our UTC is based on the Linux framebuffer driver, which provides no hardware acceleration, so all the rendering work is done by the CPU.

Figure 7. Overhead on the Client

4.4. Robustness and Scalability

The robustness and scalability of Redar are measured both in everyday use and with benchmarks. About fifteen members of our group log on to the DVPC prototype system in their daily work, and Redar and the prototype system have proved very robust. Furthermore, to test scalability quantitatively, we run a stress test on the weakest node in the testbed, i.e. server 1. The test is performed by starting several GUI mergers on the same server node and running an i-Bench on each client. The curves in figure 9 show the CPU and memory overhead on server 1 as the number of GUI integrators running on it increases. The memory overhead increases linearly with the load, and because each GUI merger adds less than 1.3% memory overhead, memory is not a bottleneck for the system's scalability. The CPU overhead hardly increases once there are more than five GUI mergers, so the CPU share available to each GUI merger decreases; at the same time, the average page latency measured by i-Bench increases when there are more than five GUI integrators. Hence the system's scalability is mainly limited by CPU overhead. However, a user's daily workload is much lighter than the i-Bench load, so server 1 can easily serve five users on average, and the other two servers in our system can serve more.

Figure 9. Scalability Test

5. Related Works

A number of point-to-point remote display systems have been developed. X11 was originally developed to display GUIs on UNIX-like operating systems and is now a well-rounded protocol. RDP [15] and ICA [16] are remote display systems designed for the Microsoft Windows platform. Because they use high-level, platform-dependent commands, the clients of RDP or ICA systems must run an MS Windows operating system or special terminal software, so the clients are comparatively "fat". VNC takes a low-level approach and uses a single encoding mechanism, providing a simple and portable solution that is operating system independent; but because most VNC viewers are designed as high-level applications, the VNC client cannot be made very thin without modification. The main limitation of these point-to-point remote display technologies is that they do not handle the problems brought by a distributed environment, so they do not fit our DVPC by default.

There are some recently developed systems similar to DVPC. HP SoftUDC [17] is a software-based, cost-effective and flexible solution to the quest for utility computing. The main difference between SoftUDC and DVPC is that SoftUDC mainly addresses the problems of isolation and migration based on virtualization, whereas DVPC mainly addresses the problem of integration based on virtualization. Stanford Collective [18] is a system that delivers managed desktops to personal computer (PC) users. The main difference between Stanford Collective and DVPC is that Collective's main motivation is better security and lower management cost, and its server side is a centralized architecture. HP Interactive Grid [19] is a grid computing architecture designed to support graphical interactive sessions.
The fundamental difference between our DVPC and HP Interactive Grid is that we focus mainly on the distributed nature of the future personal computing environment and address the challenges it brings, such as GUI merging and virtual user space construction. Moreover, that system only provides remote KVM devices; no other devices, such as USB storage or CD-ROM, are provided.

6. Conclusion and Future Work

We have introduced Redar, a remote desktop architecture for distributed virtual personal computing (DVPC). DVPC is a computing paradigm proposed to accommodate the increasingly distributed and pervasive personal computing environment. The main contributions of Redar are: a remote desktop architecture for DVPC, which provides an integrated, window-style desktop environment on top of a distributed computing environment; a framebuffer-based GUI merging mechanism, which is well suited to GUI merging in a distributed computing environment; a virtual desktop manager, which acts as both a resource manager and a security control tool for the virtual desktop; and a UTC system, which brings lower CPU and memory utilization than former designs. According to our experiments, the Redar architecture improves resource utilization and load balance in a distributed personal computing environment, and it performs well in display latency, overhead, robustness and system scalability. Compared with other similar systems, Redar shows advantages in scalability, efficiency, flexibility and resource utilization.

The research on Redar is still at an initial stage, and much work remains. First, more kinds of devices should be supported on the client side. A personal computing environment consists not only of GUIs and local storage interfaces but also of audio devices, printing services and so on; in the next stage, more kinds of devices should be supported on the user terminals. Second, a more flexible rendering strategy is needed. The current Redar implementation adopts a purely server-side GUI rendering strategy; for more capable clients, this strategy wastes client processing capacity and consumes more network bandwidth, so a more flexible rendering strategy that automatically balances the load between server and client should be developed. Third, more techniques are needed to guarantee the system's performance in a more complex network environment. The experiments in this paper were performed in a stable network environment.
When the system is ported to a more complex environment, its performance, including latency, overhead, robustness and scalability, will be significantly affected, so additional technical measures must be developed to guarantee the system's performance in such environments.

References

[1] I. Foster, C. Kesselman and S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," International Journal of High Performance Computing Applications, vol. 15, pp. 200-222, 2001.
[2] R. Want, T. Pering, G. Borriello and K. I. Farkas, "Disappearing Hardware," IEEE Pervasive Computing, January-March 2002.
[3] S. A. Wheeler, "Thin-client/Server Architectures," http://www.espipd.com/, November 2000.
[4] R. W. Scheifler and J. Gettys, "The X Window System," ACM Transactions on Graphics, 5(2):79-106, April 1986.
[5] T. Richardson, Q. Stafford-Fraser, K. R. Wood and A. Hopper, "Virtual Network Computing," IEEE Internet Computing, 2(1):33-38, 1998.
[6] Y. Zhang, Y. Yang, J. Fan and J. Ma, "gDevice: A Protocol for the Grid-Enabling of the Computer Peripherals," Journal of Computer Research and Development, 42(6), 2005.
[7] T. Richardson, "The RFB Protocol," AT&T Labs Cambridge whitepaper, March 2005.
[8] S. J. Yang, J. Nieh, S. Krishnappa, A. Mohla and M. Sajjadpour, "Web Browsing Performance of Wireless Thin-Client Computing," Proceedings of the Twelfth International Conference on World Wide Web, 2003.
[9] P. T. Ares, "The Network Block Device," http://www2.linuxjournal.com/article/3778, May 2000.
[10] The GNOME Project, http://www.gnome.org/.
[11] R. W. Brodersen, "The Network Computer and Its Future," IEEE Solid-State Circuits Conference, San Francisco, February 1997.
[12] Y. Zhang, Y. Yang, Y. Sun and J. Fan, "Network-attached Smart Peripheral for Loosely Coupled Grid Computer," Proceedings of the 8th International Conference on High Performance Computing in Asia Pacific Region, 2005.
[13] A. Buell, "Framebuffer HOWTO," http://www.faqs.org/docs/, February 2000.
[14] i-Bench version 1.5, http://etestinglabs.com/benchmarks/i-bench/i-bench.asp.
[15] B. C. Cumberland, G. Carius and A. Muir, "Microsoft Windows NT Server 4.0, Terminal Server Edition: Technical Reference," Microsoft Press, Redmond, WA, August 1999.
[16] Citrix White Paper, "Citrix MetaFrame 1.8 Backgrounder," Citrix Systems, June 1998.
[17] M. Kallahalla, M. Uysal, R. Swaminathan, D. Lowell, M. Wray, T. Christian, N. Edwards, C. Dalton and F. Gittler, "SoftUDC: A Software-Based Data Center for Utility Computing," IEEE Computer, November 2004.
[18] R. Chandra, N. Zeldovich, C. Sapuntzakis and M. S. Lam, "The Collective: A Cache-Based System Management Architecture," Proceedings of the 2nd Symposium on Networked Systems Design and Implementation, May 2005.
[19] V. Talwar, S. Basu and R. Kumar, "An Environment for Enabling Interactive Grids," Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing, 2003.