Wednesday, March 17, 2010

The decode engine

Continuing the inventory of research problems in secure natural communications.

The player experience has two components: players physically send and receive bits of information, and the semantics of those bits are understood naturally. The decode engine is the player's proxy. It lets the information infrastructure interpret what the user wants, and so allows for natural behavior.

The level of sophistication in the decode engine depends on the depth and breadth of capability. At the most basic level, we use today’s keyboard, display, and coded commands, and the decode engine barely exists. But this is not “natural”, so the decode engine must become more sophisticated to enable natural behavior at both the semantic and physical layers.

Research problems related to the decode engine

Core speech recognition algorithms. The most natural way that we communicate is by speaking. Speech recognition has made enormous progress, but reaching 100% accuracy is a very hard problem, so there will always be room for improvement.

Networked speech recognition. There are efficiencies to performing some speech recognition locally, at the device that is picking up the sound. This is especially the case if the microphone is in a room and attached to a processing chip. However, if the microphone is tiny (a wearable, for example), then there must be a way to find a processor and stream the bits to it.

Handwriting recognition. Less generally used than speech recognition, but a similar application and a similar problem.

Gesture recognition. For some, gestures enhance the content of the message. For others, such as people with speech impairments, the gesture language is the language.

Natural language recognition. Another hard problem! A natural way to communicate is to express in a sentence – rather than a structured form – what function one wants to achieve.

Language translation. The Internet and World Wide Web are global. We need translation capability that provides the full power of the web to speakers of any language.

Searching unstructured data. We now know that search is a fundamental piece of the infrastructure.

Query by image content. There are images on the web. These should be searchable.

Query by video content. There is video on the web. This should be searchable.

Separating commands from data. As we communicate naturally we provide bits – either in text form, or speech, gesture, etc. Some of these bits are the content and some are the command structure. For text, we know how to separate the two. For more natural forms such as speech, we need to find approaches to make this separation as natural as possible.
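As a rough illustration of the problem, here is a minimal Python sketch that splits a transcribed utterance into command and data segments using a fixed vocabulary of spoken command phrases. The phrase list, the action names, and the `split_commands` function are all hypothetical, chosen only for this example; this is one simple convention, not a proposed solution.

```python
# Hypothetical spoken-command vocabulary. The phrases and action names are
# illustrative assumptions, not taken from any real dictation product.
COMMANDS = {
    ("new", "line"): "INSERT_NEWLINE",
    ("save", "file"): "SAVE",
}


def split_commands(transcript):
    """Split a transcript into ('data', text) and ('command', action) segments."""
    words = transcript.lower().split()
    segments = []
    data = []  # words of the data segment currently being collected
    i = 0
    while i < len(words):
        matched = None
        for phrase, action in COMMANDS.items():
            if tuple(words[i:i + len(phrase)]) == phrase:
                matched = (phrase, action)
                break
        if matched:
            # Flush any pending data before recording the command.
            if data:
                segments.append(("data", " ".join(data)))
                data = []
            segments.append(("command", matched[1]))
            i += len(matched[0])
        else:
            data.append(words[i])
            i += 1
    if data:
        segments.append(("data", " ".join(data)))
    return segments


if __name__ == "__main__":
    for kind, value in split_commands("dear bob new line see you soon save file"):
        print(kind, value)
```

The weakness of this convention is easy to see: the speaker can no longer say "new line" as content, because the phrase is always taken as a command. Text handles this with escaping; finding an equally workable but natural equivalent for speech is the open problem.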

1 comment:

  1. Multilingual Search Engine:
    http://www.drdobbs.com/architecture-and-design/223900083

    Separating commands from data:
    "For more natural forms such as speech, we need to find approaches to make this separation as natural as possible. "
    To solve this problem consider the following:
    1) When I use Notepad, I hit Ctrl+S to save the file. How does the person using the keyboard know this?
    2) The "\n" newline character. It's not natural.
    3) Traffic light (Red and green)
    4) Gestures by policeman
    5) Language itself: We have given special meanings to symbols and attained some sort of standardization for this.
    6) If you are thinking of truths like 2 fingers + 2 fingers = 4 fingers, then it is quite possible for someone to look at the system a bit differently. One could count the space between two fingers as one unit, so the spaces in his hand would allow him to count to 4. Now 2 fingers (1 space) + 2 fingers (1 space) = 3 fingers (2 spaces).

    So in my opinion it's just a matter of standardization. When people get used to it, it will automatically become natural.

    Now one can argue that we cannot have "\n" for Chinese. But it's actually being handled.
