Voice Recognition Unit

Special thanks to Zoinkity for sharing his research.

Note: I strongly recommend getting a more "professional" microphone. There are several options for under $20. The Samson R10S registered my words (adult male) about half the time. At this point I'm not sure whether the problem is the VRU or the microphone.

The Voice Recognition Unit may very well be the most complicated accessory available. It adds at least 5 new Joybus commands, some with sub-commands. Decoding all of these bits will take some time, but I have some good data and I welcome feedback, including from anyone who can challenge or disprove my assumptions.

Note that some documentation lists the Tx length excluding the command byte. I specify the Tx length including the command byte, because that is the value written to PIF RAM.
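
To make the length convention concrete, here is a minimal Python sketch of how one Joybus command could be laid out as a PIF RAM block. The function name `build_pif_block` is hypothetical, and filling the response area with 0xFF placeholder bytes is an assumption. The command bytes and the 1-byte response come from the 0x0D exchange in the initialization sequence on this page.

```python
def build_pif_block(command: int, params: bytes, rx_len: int) -> bytes:
    """Lay out one command as: Tx length, Rx length, Tx bytes, Rx placeholders."""
    tx = bytes([command]) + bytes(params)
    tx_len = len(tx)  # counts the command byte, matching the convention used here
    # Assumption: the response area is pre-filled with 0xFF until the device answers.
    return bytes([tx_len, rx_len]) + tx + b"\xFF" * rx_len


block = build_pif_block(0x0D, bytes([0x1E, 0x0C]), rx_len=1)
print(" ".join(f"{b:02X}" for b in block))  # 03 01 0D 1E 0C FF
```

Documentation that excludes the command byte would describe the same exchange as Tx 2 rather than Tx 3.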

At this time I have not identified any difference between the VRS and the VRU in their responses, although the console seems able to tell them apart.

Supported Commands

0xFF

The 0xFF command is the Joybus reset command. See 0x00 for the rest of the details of the 0xFF response.

0x00

The VRU responds to the 0x00 command with 3 bytes, either:

  • 0x00 0x01 0x00
  • 0x00 0x01 0x01

A last byte of 0x01 means that the VRU is fully initialized and can enable the microphone and attempt a word recognition. During initialization this byte changes between 0x00 and 0x01; see the 0x0B command for additional comments.
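
A minimal sketch of how an emulator or tool might interpret this response. The function name `vru_is_ready` is hypothetical, and rejecting any response that does not start with 0x00 0x01 is an assumption based only on the two responses listed above.

```python
def vru_is_ready(response: bytes) -> bool:
    """Return True when the 3-byte 0x00/0xFF response reports a fully initialized VRU."""
    if len(response) != 3:
        raise ValueError("expected a 3-byte response")
    if response[0] != 0x00 or response[1] != 0x01:
        # Assumption: the first two bytes are always 0x00 0x01 for a VRU.
        raise ValueError("response does not look like a VRU")
    return response[2] == 0x01


print(vru_is_ready(bytes([0x00, 0x01, 0x00])))  # False: still initializing
print(vru_is_ready(bytes([0x00, 0x01, 0x01])))  # True: ready for recognition
```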

0x01

In my captured communication the 0x01 command was sent to the VRU multiple times, and the VRU never responded to it. I point this out because some developers expect the VRU to respond with an error indication, but it literally does not respond.

0x09

0x0A

I have some theories that the 0x0A command is used to load the word list into the VRU, but I cannot yet confirm this or the format of the data.

0x0B

This command is still theory, but it appears several times in my captures, including as the last step of initialization and again immediately after some other commands. So far the parameter data has always been 0x00. My theory is that this is the dictionary mask command and that 0x00 means no word masking.

0x0C

I've found at least 2 'usages' of the 0x0C command. There are 6 parameter byte positions and only 2 of them carry identified data, so there are probably other unidentified usages.

This 'zero' command occurs during initialization, so it may serve a different purpose than the following command, whose usage I'm much clearer on. At this point I'm assuming that there is a 0x00 sub-command that takes 0x01 as a data parameter.

This command is the primary command of the Clear Dictionary process. The value 0x34 tells the VRU to expect 52 words for the Voice Recognition Dictionary. The 0x02 is probably the sub-command to clear the dictionary.

0x0D

This is the command I have the most hard facts about, though there are still some assumptions and unknowns.

First a couple of examples:

This gets a little more interesting when viewed as a bit pattern:

After running this with multiple values, it seemed to break down into a specific pattern:

For example:

The sub commands identified so far are:

1100 0000 = Analog Gain, Values 000 = 0 and 100 = 1

0001 0000 = Digital Gain, Values 000 = 0, 001 = 4, 111 = 7

Unidentified 0x0D sub commands

1111 0000 = ??, Value 000 = 0, occurs during init

0111 0000 = ??, Value 011 = 3, occurs during init

0100 0000 = ??, Value 000 = 0, occurs during init

1011 0000 = ??, Value 010 = 2, occurs during init

0001 1000 = ??, Value 000 = 0, occurs during init

0000 0000 = ??, Value 000 = 0, occurs after Push To Talk has been released. Maybe query how many matches
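
The bit pattern can be sketched in Python. This assumes the two parameter bytes form one 16-bit word laid out as a 3-bit value (top), an 8-bit sub-command (middle), and a 5-bit tail (bottom, the suspected CRC). That layout is an assumption: it reproduces the five init-time sub-commands and values listed above when applied to the 0x0D payloads from the initialization sequence on this page, but it is not confirmed. The function name `split_0d_payload` is hypothetical.

```python
def split_0d_payload(byte1: int, byte2: int):
    """Split a 0x0D payload into (sub_command, value, tail) under the assumed layout."""
    word = (byte1 << 8) | byte2
    value = (word >> 13) & 0x07        # top 3 bits
    sub_command = (word >> 5) & 0xFF   # next 8 bits
    tail = word & 0x1F                 # low 5 bits, suspected CRC
    return sub_command, value, tail


# 0x0D payloads taken from the initialization sequence.
for b1, b2 in [(0x1E, 0x0C), (0x6E, 0x07), (0x08, 0x0E), (0x56, 0x18), (0x03, 0x0F)]:
    sub, value, tail = split_0d_payload(b1, b2)
    print(f"0x{b1:02X} 0x{b2:02X}: sub={sub:08b} value={value:03b} tail={tail:05b}")
```

The value field is printed as raw bits on purpose, since the numeric meaning of the value bits is still uncertain.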

Theory

One of these 0x0D sub commands that occur during init may be able to modify the pitch used to detect voice, thereby letting an adult play the games.

Still to be confirmed: the last 5 bits appear to be a CRC. If so, it would be identical to the address CRC used for Controller Paks, and it sits in the same position in the Tx bytes. However, even longer VRU commands don't seem to use a CRC check.
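
One way to test the CRC theory is to compute the Controller Pak style address CRC over the top 11 bits of each 0x0D payload and compare it with the low 5 bits that were actually sent. The sketch below assumes a CRC-5 with polynomial x^5 + x^4 + x^2 + 1 (0x15 without the leading term), which is the polynomial usually given for the Controller Pak address CRC. The function name `address_crc5` is hypothetical, and the payloads are the ones from the initialization sequence on this page.

```python
def address_crc5(word: int) -> int:
    """CRC-5 (polynomial 0x15) over bits 15..5 of a 16-bit word; bits 4..0 are ignored."""
    crc = 0
    weight = 0x15  # contribution of bit 5, i.e. x^5 reduced by the polynomial
    for bit in range(5, 16):
        if word & (1 << bit):
            crc ^= weight
        # Advance to the contribution of the next higher bit.
        weight <<= 1
        if weight & 0x20:
            weight ^= 0x35
    return crc


for word in (0x1E0C, 0x6E07, 0x080E, 0x5618, 0x030F):
    sent = word & 0x1F
    computed = address_crc5(word)
    print(f"0x{word:04X}: sent={sent:05b} computed={computed:05b} match={sent == computed}")
```

For these five payloads the computed value equals the low 5 bits on the wire, which supports the theory for 0x0D at least. It says nothing about the longer commands.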

The following is the initialization sequence for the VRU. Other 0x00 and 0x01 commands go across the wire, but I don't believe they are part of this process. The console sends the commands, and the VRU responses are shown as nested steps.

  1. One ms (1/1000th of a second) LOW (unsure whether this is Console or VRU driven)
    1. US delays 50 ms, JP delays 0.5 ms.
  2. 0x0D 0x1E 0x0C
    1. 0x00
  3. 0x0D 0x6E 0x07
    1. 0x00
  4. 0x0D 0x08 0x0E
    1. 0x00
  5. 0x0D 0x56 0x18
    1. 0x00
  6. 0x0D 0x03 0x0F
    1. 0x00
  7. 0x0C 0x00 0x00 0x00 0x00 0x01 0x00
    1. 0x97
  8. 0x0B 0x00 0x00
    1. 0x00 0x00 0x00
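
The command steps above can be written down as data and replayed against any transport. This is a minimal sketch: `run_init` and the `exchange(tx, rx_len)` callable are hypothetical names, the line-LOW pulse and delay in step 1 are not modeled, and the expected responses are simply the ones captured above.

```python
from typing import Callable, List, Tuple

# (bytes sent by the console, response captured from the VRU), steps 2 to 8.
INIT_SEQUENCE: List[Tuple[bytes, bytes]] = [
    (bytes([0x0D, 0x1E, 0x0C]), bytes([0x00])),
    (bytes([0x0D, 0x6E, 0x07]), bytes([0x00])),
    (bytes([0x0D, 0x08, 0x0E]), bytes([0x00])),
    (bytes([0x0D, 0x56, 0x18]), bytes([0x00])),
    (bytes([0x0D, 0x03, 0x0F]), bytes([0x00])),
    (bytes([0x0C, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00]), bytes([0x97])),
    (bytes([0x0B, 0x00, 0x00]), bytes([0x00, 0x00, 0x00])),
]


def run_init(exchange: Callable[[bytes, int], bytes]) -> int:
    """Send each init command and compare the reply; return the number of steps completed."""
    for step, (tx, expected) in enumerate(INIT_SEQUENCE, start=2):
        rx = exchange(tx, len(expected))
        if rx != expected:
            raise RuntimeError(f"step {step}: expected {expected.hex()}, got {rx.hex()}")
    return len(INIT_SEQUENCE)


# Stand-in transport that answers exactly like the capture, for demonstration only.
captured = dict(INIT_SEQUENCE)
print(run_init(lambda tx, rx_len: captured[tx]))  # 7
```

Keeping the sequence as a table makes it easy to swap in a different capture, for example the JP timing variant, without touching the replay logic.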



VRU Reference:

https://pastebin.com/ajLzRLze by Zoinkity - USA VRU Sound Indices (BEST source to Emulate)

https://pastebin.com/TcsfwpSM by Zoinkity - VRU Commands

https://pastebin.com/ajLzRLze by Zoinkity - Pronunciation of US VRU

https://pastebin.com/6UiErk5h by Zoinkity - VRU Commands and Communication

https://pastebin.com/5Fr9G36N by Zoinkity - VRU US Raw Capture

https://pastebin.com/rNe7CQNf by Zoinkity - VRU JP Raw Capture

https://pastebin.com/JWwSVUS7 by Zoinkity - Hey You Pikachu Recognized Words

https://pastebin.com/kzMQBeTq by Zoinkity - Recognized words in Pikachu Genki de Chu

https://pastebin.com/z7qbnviN by Zoinkity - Densha de Go 64 VRU recognized words

Zoinkity - pastebin site : https://pastebin.com/u/Zoinkity

VRS Reference: https://tcrf.net/Densha_de_GO!_64/en#VRS_Test

VRU Emulation Forum post: http://www.emutalk.net/threads/55279-Hey-You!-Pikachu-Possible-HLE-Implementation