Thursday, March 25, 2010

Core infrastructure for secure natural communications

Unlike the other key constructs, designing the core infrastructure is not a fundamental research issue; rather, it requires effective systems engineering. Given the design parameters that can be foreseen for secure natural communications, what is the best network design, how much capacity is needed, and where should concentrators be placed?

Today’s technology suggests the following general design for the core infrastructure. We imagine a world which is rich in sensors for both input (e.g. camera, speech) and output (e.g. displays). These are small and cost-effective, hence they can be ubiquitous. They are in every room of every building and pervasive in outdoor malls and even on roadways. They are equipped with GPS so that we know their location, and they can communicate.

Achieving natural communications and connecting these pervasive sensors requires untethered devices with extensive coverage by wireless networks. But wireless cannot solve it all. Since maximum bandwidth is achieved through fiber optics, we assume an extensive high-bandwidth fiber backbone.

Key systems engineering issues

Input sensor design. We imagine that people speak naturally and have their speech, and ultimately their gestures, sensed by the infrastructure and operated upon. How will the physical sensors be designed? To get reasonably noise-free reception, do they need to be worn on clothing very close to the speaker? Can those be made natural enough, sufficiently low-cost, and powerful enough in a small footprint? Or will rooms be designed with acoustic systems and cameras to pick up sounds and motion?
Output sensor design. Players not only emit bits, but they also receive them. What is the overall design of the output system? Are there screens everywhere that people gravitate to? How much is visual and how much is spoken? What is the trade-off between personal devices and devices that are built into the infrastructure?
Local wireless network. Today there are wireless networks with different design points for the home, office, mall, and airport. If we had a uniform assumption about the desirable network infrastructure: a high bandwidth fiber backbone and ample wireless networks to reach into that backbone, what would the wireless piece look like? What would it cost to configure? Do we have the right protocols designed for that set of assumptions? Spectrum?
Spectrum. How should spectrum be allocated to balance all of the wireless needs between local area, metropolitan, and wide area?
Wide area backbone network design. Similar to the wireless network observation. We have fiber networks today. Let’s create a specification to understand the needs of tomorrow. What would represent sufficient bandwidth for foreseeable needs? How would we build such a network? Cost?
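
The "sufficient bandwidth" question above can be framed as simple arithmetic once the design parameters are fixed. The following is only a rough sketch: the function name, the headroom factor, and every input value are hypothetical, chosen to make the arithmetic easy to follow rather than to describe any real deployment.

```python
def aggregate_demand_bps(sensor_count, bitrate_bps, active_fraction, headroom=2.0):
    """Rough offered load on a backbone link, in bits per second.

    headroom is an assumed safety multiplier for growth and bursts.
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return sensor_count * bitrate_bps * active_fraction * headroom


# Hypothetical inputs: 1,000 speech sensors at 64 kbit/s, a quarter of them active.
demand = aggregate_demand_bps(sensor_count=1_000, bitrate_bps=64_000, active_fraction=0.25)
print(f"{demand / 1e6:.1f} Mbit/s")  # prints "32.0 Mbit/s"
```

A real specification would replace these placeholders with measured sensor densities and traffic mixes, which is exactly the systems engineering work described above.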

While core network design is principally about systems engineering, improvements in core communications research will also improve the core network. Here are some of the core research areas motivated by the desire to create a core infrastructure for secure natural communications.

Sample core infrastructure research issues

Fiber componentry. Getting more bits per second through a fiber link.
Fiber network architecture. Designing the layout of fiber multiplexers and processors to improve network bandwidth.
Nanotechnology. Continued miniaturization is critical for sensor technology.
Wireless contention algorithms. Better signal processing to get more bandwidth out of limited spectrum.
Multi-media. How to utilize network bandwidth to address quality of service needs for disparate traffic patterns such as voice, data, and video.
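
One simple way to picture the quality of service question in the multi-media item is a strict-priority scheduler, in which delay-sensitive traffic is always sent first. This is a minimal sketch and not a proposed design: the class ordering (voice, then video, then data) and the `PriorityScheduler` name are assumptions made for illustration.

```python
import heapq
from itertools import count

# Assumed priority order: a lower number is served first.
PRIORITY = {"voice": 0, "video": 1, "data": 2}


class PriorityScheduler:
    """Strict-priority packet queue, FIFO within each traffic class."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def enqueue(self, traffic_class, packet):
        heapq.heappush(self._heap, (PRIORITY[traffic_class], next(self._seq), packet))

    def dequeue(self):
        """Return the next packet to send, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, packet = heapq.heappop(self._heap)
        return packet


sched = PriorityScheduler()
sched.enqueue("data", "file chunk")
sched.enqueue("voice", "speech frame")
sched.enqueue("video", "video frame")
order = [sched.dequeue() for _ in range(3)]
print(order)  # ['speech frame', 'video frame', 'file chunk']
```

Strict priority can starve the lowest class under heavy load, so weighted schemes are a common alternative; which trade-off suits disparate traffic patterns is part of the research question.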

Wednesday, March 17, 2010

The decode engine

Continuing the inventory of research problems in secure natural communications.

The player experience has two components. Players physically send and receive bits of information, and the semantics are understood naturally. The decode engine is the proxy for the player that allows the information infrastructure to interpret the user’s desire and allow for natural behavior.

The level of sophistication in the decode engine depends on the depth and breadth of capability. At the most basic level, we use today’s keyboard, display, and coded commands, and the decode engine barely exists. But this is not “natural”. So the decode engine must become sophisticated enough to enable natural behavior at both the semantic and physical layers.

Research problems related to the decode engine

Core speech recognition algorithms. The most natural way that we communicate is by speaking. Speech recognition has made enormous progress, but reaching 100% accuracy is a very hard problem, so there will always be room for improvement.

Networked speech recognition. There are efficiencies to performing some speech recognition locally, at the device that picks up the sound. This is especially the case if the microphone is in a room and attached to a processing chip. However, if the microphone is tiny, such as a wearable, then there must be a way to find a processor and stream the bits to it.
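
The local-versus-networked choice described here can be sketched as a small placement decision. Everything in this example is hypothetical: the `Device` type, the node names, and the rule of picking the processor with the lowest round-trip time are assumptions made for illustration, not an established protocol.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    has_local_processor: bool


def choose_recognizer(device, remote_processors):
    """Return where this device's audio should be recognized.

    remote_processors maps a processor name to its round-trip time in ms.
    """
    if device.has_local_processor:
        return "local"
    if not remote_processors:
        raise RuntimeError(f"no processor available for {device.name}")
    # Stream to the reachable processor with the lowest round-trip time.
    return min(remote_processors, key=remote_processors.get)


room_mic = Device("room microphone", has_local_processor=True)
wearable = Device("wearable microphone", has_local_processor=False)
nearby = {"hallway-node": 12.0, "building-server": 4.5}  # hypothetical names and latencies

print(choose_recognizer(room_mic, nearby))  # local
print(choose_recognizer(wearable, nearby))  # building-server
```

How a tiny wearable discovers candidate processors in the first place is left open here, and is the harder part of the problem.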

Handwriting recognition. Less generally used than speech recognition, but a similar application and problem.

Gesture recognition. For some, gestures enhance the content of the message. For others, such as the speech impaired, the gesture language is the language.

Natural language recognition. Another hard problem! A natural way to communicate is to express in a sentence – rather than a structured form – what function one wants to achieve.

Language translation. The Internet and World Wide Web are global. We need translation capability that provides the full power of the web to speakers of any language.

Searching unstructured data. We now know that search is a fundamental piece of the infrastructure.

Query by image content. There are images on the web. These should be searchable.

Query by video content. There is video on the web. This should be searchable.

Separating commands from data. As we communicate naturally we provide bits – either in text form, or speech, gesture, etc. Some of these bits are the content and some are the command structure. For text, we know how to separate the two. For more natural forms such as speech, we need to find approaches to make this separation as natural as possible.
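
To make the separation problem concrete, here is a deliberately naive sketch that works on already-transcribed speech and treats a leading marker word as the sign of a command. The marker word `computer` and the function name are hypothetical, chosen only for illustration.

```python
WAKE_WORD = "computer"  # hypothetical marker word, not a standard


def classify_utterance(utterance, wake_word=WAKE_WORD):
    """Label a transcribed utterance as a command or as content.

    An utterance counts as a command only if it starts with the wake word.
    """
    words = utterance.strip().split()
    if words and words[0].strip(",.:!?").lower() == wake_word:
        return ("command", " ".join(words[1:]))
    return ("content", utterance.strip())


print(classify_utterance("Computer, call the front desk"))
# ('command', 'call the front desk')
print(classify_utterance("I will be ten minutes late"))
# ('content', 'I will be ten minutes late')
```

Requiring an explicit marker word is exactly the kind of unnatural structure this research problem aims to remove; the sketch shows the baseline that more natural approaches would need to improve upon.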

Wednesday, March 10, 2010

The player's physical experience

I apologize that this week’s posting was a bit delayed. I became somewhat busy with a new position. Check out http://www.w3.org/2010/03/ceo-pr.html for details.

We have been wandering through the 5-6 key constructs for secure natural communications. One of my key objectives is to enumerate research areas that require more effort to achieve our goal. Comments are invited on current status and research progress.

If the player’s semantic experience is one of higher-level semantics of unlimited breadth, the physical experience deals with a more specific set of modes of physical interaction that a user might have with the infrastructure. For each mode, the challenge is the same. The player has some means of physically expressing themselves. What is most “natural” depends on the player and depends on the application. But in any case, there are only a limited number of methods that are used.

Research problems related to the player’s physical experience

Sensors. To obtain a greater degree of naturalness there will be many core sensors in the infrastructure. These will range from traffic sensors on highways to cameras to speech-sensitive devices. Research is required to reduce their cost, shrink their size, and blend them into the environment.
Enabling everything for secure natural communications. Further reductions in the cost of RFID tags and GPS will bring more items into the intelligent network and make their locations trackable.
Speech sensors. An important set of sensors. Microphones are available as part of the infrastructure so that people’s speech can be the primary means of conveying their intentions. Improved noise and echo cancellation for the overall system is required.
Wearables. A key design point for small sensors, particularly for speech capture, is the wearable sensor. This requires research in nanotechnology to better embed this capability into garments.
Video, gesture capture. This will enable a richer interpretation of the user’s intent, and a richer understanding of what is going on at target locations.
Other methods. These would include different forms of keyboard, signing (for the speech impaired), etc.

Tuesday, March 2, 2010

Inventory of areas related to the player's semantic experience

Review of purpose of this blog

I’ve described the concept of Secure Natural Communications, why it is broadly achievable, and the benefits. I’ve described 5-6 key constructs: the player’s semantic experience, the player’s physical experience, the decode engine, core infrastructure, information intelligence, and advanced applications.

To move forward requires an inventory of the technical areas that will require more development to perfect secure natural communications. We rely on numerous technologies – all of them are good enough to get started; most of them would benefit from further enhancement. So after we inventory these areas, we will need to assess our current level of capability in these areas and then create a roadmap to improve capability.

Today’s blog begins the inventory.

Technologies related to a player’s semantic experience

Today’s technology already has many of the components needed to provide a natural semantic interface for the player. Each will be enhanced further and customized to provide secure natural communications. Here are some of the areas that require attention.
Networked consumer devices. There has already been tremendous progress in making consumer devices for music, video, telephony, and computing more universal in capability and more networked. Increasingly, their user interfaces will adapt to the interfaces that people find most natural. This progress will continue. More focus will be necessary on standardization and integration.
Multiple communication threads. Inherent in the notion of group communications is that a person will be involved in multiple communications threads simultaneously. For the player’s semantic experience the infrastructure must be able to surface these multiple threads to people irrespective of how they interact with the infrastructure. We have some technologies to do this for a person sitting at their display. More challenging will be if they are using shared video screens in the infrastructure, or if they are trying to communicate naturally without devices.
Higher-level semantics. This is too large an area to deal with comprehensively in this paper, but ultimately it is the broadest part of secure natural communications. This paper limits its focus to infrastructure common to all applications – not the semantics of each application. However, each application area requires its own standardization on semantics. Example areas are:
o Non-communications infrastructure intensive. Examples include numerical applications, devtest, ERP, and database.
o Base-level communications infrastructure. Examples include Web, file, and transaction processing.
o Applications that require their own infrastructure. This is the most interesting area for higher-level semantics. There are some applications that are so communications intensive that they require their own sophisticated semantic infrastructure to be provided by an overall infrastructure. These would include social networking, collaboration, and virtual desktop.