Bruce Campbell

Industrial Engineering PhD Qualifying Exam

January 2003

 

Building a Collaboration Machine

Industrial engineers are often well schooled in the consideration of human factors in an industrial setting. The broad field of human factors includes considerations of human physiology, psychology, ergonomics, and human-machine interfaces. The advent of rapidly growing communication infrastructures gives industrial organizations the opportunity to expand geographically while still maintaining the sense of a connected workspace. Although typical computer-mediated communications today consist of e-mail, videoconferencing, teleconferencing, and perhaps streaming broadcasts and real-time text chat, technology under development in the world’s research laboratories suggests remote computer-mediated communications should transform industrial organizations in the near future. No show-stopping technological obstacle remains that has not already been solved in some laboratory somewhere. It is time for engineers to put together prototype systems that demonstrate the leapfrog possibilities of remote presence over available communication channels. But how does an industrial engineer coordinate a project aiming to accomplish just that?

 

As an example, consider an industrial engineer who has been commissioned to develop a machine that transports the senses between Seattle, Australia, and India. The machine will be used to provide a remarkable new way to communicate and collaborate across vast distances. The project engineer’s task is to take her understanding of how humans process information and systematically develop a machine, along with evaluation protocols that can be used to characterize and assess its fidelity for human collaboration among any two to ten people simultaneously. A budget of US $2,000,000 and a two-year timeframe force the engineer to be creative yet keep a sense of urgency in order to mitigate the risk of project cancellation.

 

Today’s collaborative technology projects often dictate other budgets besides financial ones. For example, bandwidth is often a precious resource that requires wise budgeting. Visual content requires polygon and texture budgets so that graphics subsystems can provide a satisfactory visual experience. Typical technological budgets for an industrial solution might include 100 Mbit/s of communications bandwidth, a million actively presented polygons, and 16 MB of texture memory for visual display.
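One way to make such budgets concrete is to treat each one as a hard constraint that every scene must satisfy. The following sketch is illustrative only; the class and method names are hypothetical, though the numeric caps are the budgets named above.

```python
# Hypothetical budget tracker for a collaboration machine scene.
# The caps are the budgets stated above; all names are illustrative.

class SceneBudget:
    """Hard caps a scene must fit within for a satisfactory experience."""
    MAX_BANDWIDTH_BPS = 100_000_000   # 100 Mbit/s communications budget
    MAX_POLYGONS = 1_000_000          # actively presented polygons
    MAX_TEXTURE_BYTES = 16 * 2**20    # 16 MB of texture memory

    def fits(self, polygons: int, texture_bytes: int, stream_bps: int) -> bool:
        # A scene is acceptable only if every budget holds simultaneously.
        return (polygons <= self.MAX_POLYGONS
                and texture_bytes <= self.MAX_TEXTURE_BYTES
                and stream_bps <= self.MAX_BANDWIDTH_BPS)

budget = SceneBudget()
print(budget.fits(polygons=750_000, texture_bytes=12 * 2**20,
                  stream_bps=80_000_000))   # within all budgets
print(budget.fits(polygons=1_200_000, texture_bytes=12 * 2**20,
                  stream_bps=80_000_000))   # over the polygon cap
```

A check like this lets the graphics and networking subgroups veto content early rather than discover overruns during integration.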

 

The project engineer considering such an endeavor would be wise to consider the perspective of a software engineer planning a large software project. Software engineers have learned how to modularize big problems in order to focus on the pieces while considering the whole solution. Modularization affords working in an iterative fashion, in which continuous progress keeps the project lead and everyone working on the project motivated throughout the project timeline. The tested tenets of RADical programming (RAD stands for Rapid Application Development), taught with great success by professors such as James Collofello at Arizona State University, provide a blueprint for building a successful iterative process [1].

 

The Iterative Development Process

For an iterative process to succeed, the necessary components of the design must be documented in architectural discussions. The system architecture must be the first priority, as it provides the overall blueprint each subgroup and iteration works toward fulfilling. The architecture identifies the methods for combining system modules appropriately. Once the interfacing architecture is in place, each module can be worked on independently to build better and better components. The first iteration of a collaboration machine might yield a device that looked like an Xbox with Xbox Live capability yet provided a better, true 3-D experience. It would add a basic haptic device (and any other desired components not currently in the Xbox platform) as a placeholder to put emphasis on sharing touch in the solution.

 

Modular design allows different groups to make progress at different rates while still letting their most recent work be immediately useful. The whole planet marvels at the rapid development of the World Wide Web, which has succeeded because Web browser developers can work independently of Web server developers, who can work independently of data definition language developers. The World Wide Web Consortium (W3C) architects the Application Programming Interfaces (APIs) between components, and each component group builds the best components that comply with those APIs [2]. Computing hardware componentization is likewise improving faster than ever now that interfaces like USB and FireWire ports have been successfully defined.
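The interface-first discipline described above can be sketched as a frozen contract that every subsystem team implements independently. This is a minimal illustration, not the project’s actual API; the class names and method signatures are assumptions.

```python
# Sketch of an interface-first module contract (all names hypothetical).
# Each subsystem team implements the ABC; the system composes modules
# without caring how any one of them works internally.
from abc import ABC, abstractmethod

class SensoryModule(ABC):
    """Frozen API every subsystem (vision, sound, haptics) must honor."""

    @abstractmethod
    def encode(self, local_state: dict) -> bytes:
        """Serialize this sense's local state for the messaging layer."""

    @abstractmethod
    def render(self, remote_payload: bytes) -> None:
        """Present a remote collaborator's state on local peripherals."""

class StubHaptics(SensoryModule):
    # A placeholder implementation, like the basic haptic device in the
    # first iteration: it satisfies the API so other teams can proceed.
    def encode(self, local_state):
        return repr(sorted(local_state.items())).encode()

    def render(self, remote_payload):
        pass  # a real module would drive the haptic actuator here

module = StubHaptics()
print(module.encode({"force": 0.5}))
```

Because the contract is fixed, the haptics team can later swap in a far better implementation without any change to the teams consuming it.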

 

Because the collaboration would require sharing meaningful data across large geographic distances, the project lead would spend a significant amount of time at the networking layer building the simplest stack of communication protocols that could deliver useful streams between collaborators. Given a satisfactory stream between users, the rest would entail building a superior client system that maximizes bandwidth to the brain and best represents each collaborator’s intent.
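The "simplest protocol that could deliver useful streams" might be no more than length-prefixed frames tagged with a channel identifier. The sketch below is an assumption for illustration, not a real wire format; the channel numbers are arbitrary.

```python
# A minimal length-prefixed framing for collaborator streams.
# The header layout and channel IDs are illustrative assumptions.
import struct

HEADER = struct.Struct("!BI")  # 1-byte channel id, 4-byte payload length

def frame(channel: int, payload: bytes) -> bytes:
    """Wrap one payload for transmission on the stream."""
    return HEADER.pack(channel, len(payload)) + payload

def unframe(data: bytes):
    """Yield (channel, payload) pairs back out of a received byte stream."""
    offset = 0
    while offset + HEADER.size <= len(data):
        channel, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        yield channel, data[offset:offset + length]
        offset += length

stream = frame(1, b"hello") + frame(3, b"haptic-sample")
print(list(unframe(stream)))  # [(1, b'hello'), (3, b'haptic-sample')]
```

Keeping the framing this thin leaves all the intelligence at the endpoints, consistent with the end-to-end strategy discussed later.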

 

As the project lead takes inventory of her own skill set, she would need to involve others with expertise in the different modules’ functions. She would be wise to delegate one person to be in charge of each module in the machine’s design. She would spend a great deal of time ascertaining that each of the project module leads knew the overall system APIs by heart, just as a football team knows its playbook. Key delegations of responsibility for a collaboration machine would include one person each for the client machine architecture, messaging architecture, vision system architecture, sound system architecture, haptics system architecture, application software, and system usability. These domain experts would have the following project responsibilities:

 

 

 

 

 

 

 

 

Human Considerations

A successful collaboration machine design would demand world-class human-machine interface research. The perspective of the machine from the human’s point of view would be quite different from the representation of that point of view in the machine itself. Such a situation is necessitated today by the fact that the two entities (human and computing machine) are so different architecturally. The computing machine’s perspective is understood because it was engineered and built by human designers from the bottom up. The more difficult problem to research is how the human would thrive using the computing machine as a facilitating agent.

 

From the perspective of a human participating in a collaborative session, the system would strive to maximize bandwidth to the brain by matching system capabilities and information flow to the physiology and neural processing of the human user. Of course, the most important aspects of human information processing would vary depending on the task at hand. If collaborators were reviewing a written document, for example, the system would facilitate symbol processing and text clarity. If collaborators were instead sharing an architectural walkthrough of a proposed structure, such as a light rail boxcar, the system would facilitate a psychologically accurate visual fidelity of the space and match the 3-D environment to the visual system of each user.

Much has been written about the discomforts of text chat during collaborative sessions where participants focus discussion on a 3-D visual scene. For example, an in-depth review of this text chat problem led to the justification of the OnLive! Traveler approach of eliminating text chat and adding a voice channel between participants [3]. And yet a voice communication channel requires significantly more bandwidth than a text-based channel. A text chat channel may very well be appropriate when collaborators are focusing on shared sound content, such as a piece of music or a sound recording of the Costa Rican rainforest. In that case, it makes more sense to allocate available bandwidth to the sound content channel.

 

Any effective collaboration machine must consider both the physiology and the psychology of each human participant. Across all our senses, research continues to make headway in understanding what each physical sensory system is capable of representing in signals to the brain. Enough has been documented to date to facilitate optimal peripheral designs for a $2,000,000 budget machine. But cyberspace engineers are also becoming more aware of the mental focus applied to the signals our sensory systems make available to the brain. This emergent research, facilitated by advances in PET scanning and related technologies, strongly suggests that human mental processes use only a subset of sensory signals to build a real-time mental model of the surrounding environment. Our brain has evolved to filter, suppress, amplify, and manipulate signals in order to focus our actions and conscious thought on a limited number of items at a time. And yet these items become fully formed as organized mental objects. This mental process was first described elegantly as Gestalt theory in 1924 [4].

 

So, it would follow, an ideal collaboration machine would be able to extract the sensory signals that most mattered in the current mental model of any collaborator who wished to communicate thought to his or her peers. Emphasis on intent is already a serious research agenda in considering the visual, aural, and haptic device user experience. For example, researchers at Cornell University have developed visual perceptual metrics from studying mental model formulation and have most recently applied their metrics to avoid computing global illumination beyond what a human observer processes mentally [5]. Researchers at the University of Cambridge have developed a perception model for the prediction of aural thresholds, loudness, and partial loudness that helps them improve cochlear implants [6]. Popular algorithm libraries include optimizers, such as dynamic level-of-detail polygon simplification for 3-D models, that provide useful system features a machine could use to vary presentation based on intent. The difficult problem is extracting the participant’s intent and then mapping it to a multimodal sensory presentation system with limited bandwidth.

 

Cyberspace researchers debate at length the benefit of 3-D presentation versus the 2-D presentation of many existing computer-mediated collaboration systems. Microsoft Research’s Social Computing Group discovered that 3-D solutions built with their development platform did not outperform 2-D presentations in over half of the solutions they prototyped for usability studies [7]. Perhaps the problem lies in the fact that their platform lacks the presentation and interaction devices to really align the 3-D presentation with the human physical and psychological systems. A true 3-D visual presentation would better match the accommodation and vergence requirements of our human visual system for viewing objects in depth. Current presentation display systems, like a typical home or work monitor, do not even attempt to accommodate these seemingly important physiological processes of the human eye [8].

 

The human brain allocates seventy percent of all receptors, over forty percent of the cortex, and some four billion neurons to the processing of our visual system, which suggests that making a symbiotic device for it is a complex matter [9]. Visual scientists have come to recognize that the human vision receptor field functions like two separate visual systems. The focal visual system gathers detailed information, requiring rapid eye fixations to position the fovea, while the ambient visual system only suggests changes in the periphery that may merit exploration [10]. Matching a physical display to the human visual processing system thus requires a serious amount of focus, especially when so many collaborative tasks have a visual component.

 

The human aural system is similarly sophisticated when considering a 3-D experience. Our human system relies heavily on localizing the sounds our ears receive. As Durand Begault of NASA’s Ames Research Center suggests, the best way to envision better aural systems of the future is to better marry issues of psychoacoustics and engineering, where psychoacoustics means the consideration of listeners’ subjective interpretation of the objective characteristics of sound [11]. A significant part of the engineering effort includes consideration of the unique ear print of each collaboration participant, as studies have found a properly calibrated sound device makes a huge difference in perceptual fidelity.

Three-dimensional haptic devices in collaborative systems are a more recent addition. An added difficulty comes from the fact that rendering visual images and sound prints is a one-way street, while haptic devices are two-way [12]. And, “there are about 2,000 receptors in each of our fingertips whose only role is to gauge qualities like texture, shape and the ability to cause friction,” says Mandayam Srinivasan, Ph.D., director of MIT’s Laboratory for Human and Machine Haptics. “There may be even more sensors for gauging warmth or coolness, and for detecting mechanical, chemical or heat stimuli” [13]. Any haptic device that interfaces multiple human participants in a collaborative session must take input from users as well as deliver output through the same mechanism. Yet having a shared sense of touch would be critical for collaborative sessions where participants are concerned with material texture and consistency in design discussions. As with the human visual and aural systems, a significant part of the engineering effort would include consideration of each participant’s unique touch print.

Given a two-year timeline, human smell and taste information processing would most likely have to be ignored. Smell and taste are uniquely personal experiences, and even though the University of Rome has developed an electronic nose and Alpha MOS America of Hillsborough, NJ has claimed the first electronic tongue, both report results that suggest a sufficiently general device is far off in the future.

 

Performance Requirements

The machine would attempt the best possible mapping of physiological and psychological sensory experience and share it via its messaging architecture over a reliable network. The visual system would have a minimum requirement of fifteen visual frames per second and yet attempt, via an engineered solution, to stay stable close to thirty frames per second. The aural system would only transmit (and thus receive) sounds between the frequencies of 100 and 10,000 hertz and cut off any sound below 10 decibels. Due to the sensitive requirements of touch, the haptic system would need to sample a thousand impulses a second and consider responding to force, vibration, and temperature (with temperature perhaps the least necessary in a general-purpose collaboration machine).
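These performance floors are concrete enough to capture directly in code. The constants below restate the thresholds from the text; the function and constant names are illustrative, and the audio gate is a deliberately simplified stand-in for a real signal-processing chain.

```python
# The performance floor above as explicit constants, plus a simplified
# gate that drops sound outside the transmitted range. Names are
# illustrative; the thresholds are the ones stated in the text.
MIN_FPS, TARGET_FPS = 15, 30      # visual frame rate floor and target
AUDIO_BAND_HZ = (100, 10_000)     # transmitted audio frequency band
AUDIO_FLOOR_DB = 10               # sounds quieter than this are cut
HAPTIC_SAMPLE_HZ = 1_000          # haptic impulses sampled per second

def transmit_sound(freq_hz: float, level_db: float) -> bool:
    """True only for sounds the aural subsystem would send."""
    low, high = AUDIO_BAND_HZ
    return low <= freq_hz <= high and level_db >= AUDIO_FLOOR_DB

print(transmit_sound(440, 60))   # concert A at conversational level: sent
print(transmit_sound(50, 60))    # below 100 Hz: cut
print(transmit_sound(440, 5))    # below 10 dB: cut
```

Stating the requirements as shared constants also gives every subsystem team one unambiguous source for the numbers they must engineer against.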

 

The intent and mental model processing components of the machine would be developed to help optimize use of the network given its 100 Mbit/s bandwidth limitation. Messaging would need to be engineered to share visual, voice, sound, haptic, and data channels based on a sophisticated priority algorithm that could also pack the available networking packets efficiently given their fixed size. The same models could help in making the trade-offs necessary to keep each sensory interface running smoothly at its respective refresh rate (e.g., thirty frames per second for visuals).
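A first cut at such a priority algorithm might simply fill fixed-size packets greedily, highest-priority channel updates first. The sketch below is an assumption for illustration: the packet size, channel names, and priorities are hypothetical, and each update is assumed to fit within one packet.

```python
# Sketch of the priority-driven packing described above: higher-
# priority channel updates are placed into fixed-size packets first.
# Packet size, channels, and priorities are illustrative assumptions;
# each update is assumed to fit in a single packet.
PACKET_PAYLOAD_BYTES = 1200  # assumed fixed usable packet size

def pack_updates(updates):
    """updates: list of (priority, channel, payload_bytes).
    Returns a list of packets, each a list of (channel, payload) whose
    sizes sum to at most PACKET_PAYLOAD_BYTES. Greedy by priority."""
    packets, current, used = [], [], 0
    for priority, channel, payload in sorted(updates, key=lambda u: -u[0]):
        size = len(payload)
        if used + size > PACKET_PAYLOAD_BYTES and current:
            packets.append(current)       # flush the full packet
            current, used = [], 0
        current.append((channel, payload))
        used += size
    if current:
        packets.append(current)
    return packets

updates = [(1, "texture", b"x" * 900), (3, "voice", b"v" * 400),
           (2, "haptic", b"h" * 200)]
packets = pack_updates(updates)
print([[c for c, _ in p] for p in packets])  # [['voice', 'haptic'], ['texture']]
```

Voice and haptic updates ride in the first packet while the lower-priority texture update waits for the next, which is exactly the behavior a latency-sensitive session wants.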

 

A collaboration machine would take advantage of thousands of available and emerging algorithms that get the most out of the computing device, co-processors, and peripherals. The list of useful algorithms machine engineers could consider would include dead reckoning, level-of-detail, edge emphasis shading, and mip-mapping. And that list only includes popular algorithms associated with the visual display system; sound and haptic algorithms emerge daily. In reality, the list of candidate algorithms is growing faster than any team could evaluate them. Performance considerations would have to be coordinated with considerations of usability, as usability results might suggest where to make hard trade-offs in algorithm use.
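Dead reckoning, the first algorithm on that list, is simple enough to show in a few lines: between network updates, each client extrapolates a remote object’s position from its last known position and velocity instead of waiting for fresh data. The function name here is illustrative.

```python
# Dead reckoning: linear extrapolation of a remote object's position
# between network updates, so the display never stalls waiting for data.
def dead_reckon(last_pos, velocity, dt):
    """Extrapolate position; last_pos and velocity are (x, y, z) tuples,
    dt is the seconds elapsed since the last received update."""
    return tuple(p + v * dt for p, v in zip(last_pos, velocity))

# The last update said the object was at the origin moving 2 m/s along x;
# 0.5 s later, render it where it probably is now.
print(dead_reckon((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), 0.5))  # (1.0, 0.0, 0.0)
```

When the next real update arrives, the client corrects any drift, trading a small positional error for a display that stays smooth under latency.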

 

Usability Considerations

When planning for usability issues in the design of a collaboration machine, an industrial engineer would be able to rely on the comprehensive literature of previously published results regarding systems attempting to provide a strong sense of immersion. Much literature is available on the usability of the visual display, audio, and haptics subsystems. The literature review would allow the project to stay on track with initial designs that allowed for verification of all the interfaces between system modules. Well before the time came for performing any experiments with physical collaboration machines, a simulator could model system throughput in software based on data recorded in collaborative sessions between co-located participants who did not use any technology to accomplish a series of useful collaborative tasks (building a three-dimensional puzzle, for example). The simulation would let an engineer know the theoretical best experience a user of the machine could have given the expected number of participants, the peripheral devices, the computing machine specification, the messaging design, and the bandwidth of the current system design.
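A toy version of that throughput simulator could start from per-sense data rates and ask whether a session of a given size fits in the network budget. Every rate in this sketch is a hypothetical placeholder, not a measurement from recorded sessions; only the 100 Mbit/s budget comes from the text.

```python
# Toy throughput simulator: given assumed per-sense data rates, estimate
# whether an n-participant session fits the network budget. The channel
# rates are hypothetical placeholders, not recorded measurements.
CHANNEL_BPS = {"visual": 20_000_000, "voice": 64_000,
               "haptic": 500_000, "data": 100_000}
BANDWIDTH_BPS = 100_000_000  # the project's 100 Mbit/s budget

def session_fits(participants: int) -> bool:
    """Crude model: each participant streams every channel to the session."""
    demand = participants * sum(CHANNEL_BPS.values())
    return demand <= BANDWIDTH_BPS

print(session_fits(2))   # a two-way session fits comfortably
print(session_fits(10))  # a ten-way session exceeds the budget
```

Even this crude model immediately shows where the intent engine must earn its keep: at ten participants, some channels have to be prioritized, degraded, or dropped.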

 

Perhaps with six months left in the project, or sooner if at least two physical machines were far enough along to provide a representative experience, a usability engineer could organize some overall usability studies. The usability team would have only worked with individual devices up to that point, confirming the literature as they progressed.

 

Whole-system experiments would involve two groups of users. One group would perform collaborative tasks together in one location without the use of technology. The other would perform the same tasks distributed such that they could have no influence on each other outside of the use of the machine. Usability tests would run multiple samples of the tasks with different network lags representing distances from a few miles, for users in the same city, up to ten thousand miles for messages between participants.

 

The main hypothesis being tested would be that collaborative machine users could do the tasks as well as the control group. That hypothesis would be most cleanly tested via the experiments outlined above. One side hypothesis would be that the collaborative machine users could focus on the task better than the control group. A test engineer would test that by recording eye and hand movements and comparing them to the eye and hand movements of the co-located group. A related goal would be to verify immersion results found and published by others in the field; those experiment designs would follow the ones published by the original researchers.

 

The development team would use the results of usability studies to fine-tune the software and hardware involved in the collaborative machine system architecture. The iterative project management approach ingrained in all development-stage participants would allow changes to be made in parallel with planned tasks that had yet to be accomplished.

 

Other Possible Technologies

An industrial engineer would be very interested in a display device that allowed for a wide field of view and true 3-D pixel rendering. The visual display might take advantage of a modified visualization engine that made OpenGL rendering calls (in order to keep up with published OpenGL solutions and the evolving specification) to a display capable of interpreting them. The system would employ the best 3-D sound system (hardware and software) the sound budget would allow, an extensible 3-D haptics engine, multicast networking protocols that managed channels wisely, and an intelligent packet streaming technology that would determine and organize the bits that actually streamed over the network.

 

Any collaboration machine solution would be prudent to incorporate the e2e (end-to-end) strategy of building the intelligence at the client and not within the network. Such a strategy provides flexibility for the iterative approach, an approach that anticipates unforeseen future opportunities in the use of any designed system.

 

Emergent Behaviors

A collaboration machine that aligned its behavior with the participant’s physiology and psychology exactly as the physical world does would create a mental flow similar to that experienced when not using the machine. As a result, people might find their mental processes accepting a focus on the new geography of the digital cyberspace instead of the physical geography located just outside their sensory machine. They might completely lose track of the time of day and their current location on Earth.

 

The biofeedback cycle of participants communicating with the intent engine might lead to observed patterns where participants behaved in ways that were most consistently interpreted by the intent engine according to their liking. Participants’ actions might become more dramatic as they learned to draw attention to their conscious intent more explicitly. This could become analogous to the unique emergent behavior of using the mouse to drive a Web browser (a behavior where users are concerned about where the mouse pointer is at all times for efficient processing of their next desired action). As a result, participants might become aware of their own mind and body to the point of noticing aspects of behavior and memetic activity that they had never noticed before.

 

Network latency tends to create emergent behaviors as participants anticipate that their peers do not receive their transmissions immediately. Voice communication with latency would likely yield the same emergent behaviors as when old phone switching systems created delays in inter-country phone calls. Token passing might emerge among conscientious participants as they became hesitant to interrupt others in voice and action. Perhaps latency would be overcome by the project’s end, yet participants would still be anticipating lags.

 

When the collaboration machine’s behavior did not align with the participant’s physiology and psychology exactly, physiological changes might take place that attempted to retrain the brain to be a more effective user of the machine. The new behaviors that emerged would depend on the misalignment, and likely on the participant’s willingness to have the system change their physiology (their attitude toward the machine when starting its use).

 

A Representative Budget

In order to emphasize the relative importance of the ideas proposed in this paper, a representative budget is provided below. A typical technology project budget consists of staff, benefits, travel, hardware, supplies, and miscellaneous. This budget was prepared based on University of Washington guidelines for a project to be funded from external funds (for example, via a grant from the National Science Foundation). If the $2,000,000 were being provided for a commercial venture, the overhead would represent the costs of providing a place to work and staff functions such as payroll, accounting, human resources, etc. The budget prepared below uses salary amounts for actual University of Washington employees and students involved in similar research roles as the line items suggest.

 

 

 

                                           2 YEAR CUMULATIVE

SALARIES
  Senior Personnel
    Director/Project Manager (3 mo/yr)              $135,920
  Staff
    Senior Programmer (3 mo/yr)                     $101,286
    Network/Messaging Specialist (3 mo/yr)          $105,336
    Visual Display Specialist (2.5 mo/yr)            $84,406
    Computer Engineer (3 mo/yr)                      $78,657
    3-D Sound Engineer (2 mo/yr)                     $64,278
    Haptics Specialist (2 mo/yr)                     $64,278
    Evaluation Manager (2 mo/yr)                     $64,928
    5 Research Associates (3 mo/yr)                  $85,950
    Program Coordinator - Admin (2 mo/yr)            $34,016
  Students
    Graduate Student (2 mo/yr)                       $18,144
    2 Undergraduate Students (part-time)             $20,000
  Total Salaries                                    $857,199

BENEFITS
  Professional Staff                                $200,668
  Graduate Student(s)                                 $2,123
  Undergraduate Students                              $1,940
  Total Benefits                                    $204,731

SALARIES AND BENEFITS                             $1,061,930

TRAVEL                                               $15,000

OTHER
  Hardware                                          $225,000
  Materials and Supplies                              $9,000
  Services (copying, printing, long-distance phone)   $2,500
  Grad Op Fee                                         $8,840
  Total Other Costs                                 $245,340

TOTAL DIRECT COSTS                                $1,322,270

INDIRECT COSTS (51.6% MTDC)                         $677,730

TOTAL                                             $2,000,000

 

 

References:

 

[1] Collofello, J. et al, RADical Programming using software engineering techniques in computer science, http://www.hayden.edu/NECC/NECChandout.pdf (accessed January 20, 2003).

 

[2] The W3C, About the World Wide Web Consortium (W3C), http://www.w3.org/Consortium/ (accessed January 20, 2003).

 

[3] DiPaola S. and Collins, D., A 3D Virtual Environment for Social Telepresence, In Western Computer Graphics Symposium Proceedings, 2002.

 

[4] Ellis W., Source Book of Gestalt Psychology, New York: Harcourt, Brace and Co, 1938 (an English translation of Max Wertheimer’s Über Gestalttheorie, 1924).

 

[5] Pattanaik, S., Ferwerda, J., Fairchild M., and Greenberg D., A Multiscale model of Adaptation and Spatial Vision for Realistic Image Display, In Proceedings of SIGGRAPH ’98, 1998.

 

[6] Moore B., Glasberg B., and Baer T., A Model for the Prediction of Thresholds, Loudness, and Partial Loudness, Journal of Audio Engineering Society, vol. 45, No. 4, pp. 224-240, April 1997.

 

[7] Regan, T., Virtual Worlds Platform, http://research.microsoft.com/scg/vworlds/vworlds.htm (accessed January 18, 2003).

 

[8] Seibel, E., True 3D Displays, http://www.hitl.washington.edu/research/true3d/ (accessed January 17, 2003).

 

[9] Ware, C., What can Perception Tell us about Visual Languages and the Display of Information Structures?, Lecture at the University of Memphis, May 2001.

 

[10] Furness, T., Harnessing Virtual Space. In Proceedings of SID International Symposium Digest of Technical Papers, pp. 4-7, 1988.

 

[11] Begault, Durand R., 3-D Sound for Virtual Reality and Multimedia, Academic Press, Cambridge, MA, 1994.

 

[12] Berkelman, P., Hollis, R.,  and Baraff, D., Interaction with a Realtime Dynamic Environment Simulation using a Magnetic Levitation Haptic Interface Device, In Proceedings of IEEE International Conference on Robotics and Automation, pp. 3261 – 3266, May 1999.

 

[13] Delta Phi Omega, Delta Phi Omega Information, http://www.deltaphiomega.org/fyi.asp (Accessed January 18, 2003).