Q & A

About the regulations

Q: Can I modify the CSV file of the provided tourist spot database?
A: Yes, as long as the modification can be automated (i.e., your system can still respond even when unknown tourist spot information is given in the preliminary and final rounds).

Q: Is it possible to use sensors other than the ones provided?
A: The regulations allow participants to prepare their own recognition programs, so it is fine to use sensors other than the provided camera and microphone. However, you must bring such equipment to the preliminary and final round venues yourself.

Q: The regulations say that the conversation ends automatically after 5 minutes. Do we need to make the robot say something like "5 minutes have passed, please go home"?
A: After 5 minutes and 30 seconds have passed since the start of the dialogue (the extra 30 seconds is a margin in case the start-up process takes time), the participants' program should make the robot end the dialogue (e.g., "I'm sorry to interrupt, but it is time to finish the guidance."). A sample program that detects when the time counted by the robot system has exceeded 5 minutes and 30 seconds is provided.
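
As a rough illustration of one way to implement such a check yourself (this is not the provided sample program; dialogue_step and send_utterance below are hypothetical placeholders for your own dialogue logic and robot control), the elapsed time can simply be tracked locally:

import time

DIALOGUE_LIMIT_SEC = 5 * 60 + 30  # 5 minutes plus the 30-second margin

def run_dialogue(dialogue_step, send_utterance):
    """Run the dialogue loop and close it once the time limit is reached.

    dialogue_step and send_utterance are placeholders for the participant's
    own dialogue management and robot-control functions.
    """
    start = time.monotonic()
    while time.monotonic() - start < DIALOGUE_LIMIT_SEC:
        dialogue_step()  # one turn of the participant's dialogue logic
    send_utterance("I'm sorry to interrupt, but it is time to finish the guidance.")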

Q: Can I access the Internet during the actual rounds to search for tourist spot information and transit information?
A: Yes, provided your system is developed so that it can also handle unknown tourist spot information in the preliminary and final rounds. The organizer will prepare the Internet connection (a mobile Wi-Fi router is assumed) required to use the cloud speech recognition service.

Q: Is it okay to search for facilities around the destination during the preliminary and final rounds?
A: Yes, as long as your system is developed to handle unknown facilities.

Q: Does the recommendation need to take regular closing days into account? For example, the robot could give negative information about the non-recommended candidate by saying "it is not open today". Is that okay?
A: We will ask customers to talk with the robot with the mindset of "deciding where to go on a future vacation", not "deciding where to go today". Therefore, it is not necessary to consider whether a tourist spot is closed on the day of the preliminary or final round.

Q: Is it necessary to change the response depending on the weather (e.g., recommending indoor facilities when it rains)?
A: As above, there is no need to consider the weather on the day of the preliminary or final round.

Q: Is it okay for the system to decide the recommended location during the dialogue?
A: No. The recommended location is decided before the dialogue starts.

Q: How is the system's recommended location determined?
A: One of the two sightseeing spots selected by the customer is randomly chosen as the recommended spot.

Q: How do I receive information on the two tourist destinations selected by the customer?
A: The two locations selected by the customer, and the recommended location randomly chosen from them, are recorded on the server that hosts the robot system. The participant's program accesses that server immediately before the dialogue starts to obtain this information. A sample program for this purpose is provided to participants.
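
As a very rough sketch of this flow (the actual interface and response format are defined by the provided sample program; the URL and JSON keys below are hypothetical placeholders):

import json
import urllib.request

# Hypothetical endpoint; the real address and response format come with
# the provided sample program.
SERVER_URL = "http://example.local:8000/session_info"

def fetch_session_info():
    """Fetch the customer's two selected spots and the randomly chosen recommendation."""
    with urllib.request.urlopen(SERVER_URL) as resp:
        info = json.loads(resp.read().decode("utf-8"))
    selected_ids = info["selected_spot_ids"]      # assumed key: the two spot IDs
    recommended_id = info["recommended_spot_id"]  # assumed key: the recommended spot
    return selected_ids, recommended_id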

Q: During the dialogue, the monitor shows images of the two tourist spots selected by the customer. Can the robot know which image is displayed on the left and which on the right?
A: Before the dialogue starts, the participants' program acquires the two tourist spot IDs (in JSON format) selected by the customer from the server. The spot with the first ID is displayed on the left side of the screen, and the spot with the second ID on the right side.
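
For illustration only (the ID list is the one obtained from the server as sketched above), the left/right mapping simply follows the order of the IDs:

def left_right_mapping(spot_ids):
    """Map the two selected tourist spot IDs to their screen positions.

    The first ID is shown on the left of the monitor, the second on the right.
    """
    return {"left": spot_ids[0], "right": spot_ids[1]}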

Q: Is it okay to improve the system on the day of the preliminary round? For example, after watching a customer I might decide it is better to correct the robot's utterances or change how the robot moves. In that case, is it okay to modify the program during the preliminary round?
A: Fixing a fatal bug (e.g., the program crashes) is allowed, but other improvements are not.

About the customers

Q: Children are often interested and bring their parents along. Wouldn't it be a problem unless you decide in advance whether children accompanied by their parents can join as customers in the preliminary round? If they join, someone will need to keep waiting children quiet, and since the customer does not wear a headset, telling a child near the waiting seats to be quiet may affect speech recognition.
A: If the child can wait quietly (the staff will check), we will ask them to join.

Q: What happens when a family comes? Will younger children be with the customer?
A: Dialogues with the robot are held one person at a time. If the child can wait quietly (the staff will check), we will ask them to join.

Q: Regarding the participation of younger people, should we assume the participation of kindergarten and elementary school children?
A: We assume it will be difficult for elementary school students to talk with the robot in this task, so participation will be limited to junior high school students and older. Please assume that customers are junior high school students or older.

Q: How many people will be customers at one time? (If a family of three comes, do we deal with all three?)
A: Even if multiple people come at once, only one person talks with the robot at a time.

Q: (Regarding the above question) Does that mean members of the same family can talk with the robot in a row (for example, mother -> child 1 -> child 2)? In that case, does the previous dialogue influence the subsequent one? For example, could there be carry-over such as the customer saying, "Didn't you recommend this to my mother a moment ago?"
A: Yes, members of the same family may talk with the robot in a row. However, each customer's dialogue is independent of the previous content. The staff will tell customers that each person's dialogue is independent (even if the previous customer was a family member or friend, the content of the previous dialogue with the robot is not carried over to the next dialogue).

Q: Is the dialogue between the robot and the customer visible to other people? Some people may find it difficult to talk while being watched. Also, a customer who has trouble answering during the dialogue may look to acquaintances for help. From the viewpoint of experimental design, it is desirable for the subject to be isolated from outsiders.
A: Customers interact with the robot knowing that they are being watched by others. Some people may find this difficult, but the situation is similar in actual stores. Since this competition aims to test the dialogue systems empirically, the dialogue takes place where the customer can be seen by those waiting.

About the evaluation method by customers

Q: Is it possible for one customer to talk with the robots of multiple teams?
A: A customer is not allowed to interact with the same team's robot multiple times. Talking with multiple teams is not restricted, so some customers may talk with the robots of multiple teams.

Q: When a customer interacts with the robots of multiple teams, are the tourist destination candidates selected the first time also used in the dialogues with the second and subsequent teams?
A: Each time, we ask the customer to list the tourist destination candidates they want to visit, so some people may change their choices the second time. We do not ask customers to select the same candidate spots for the second and subsequent teams as those selected for the first team.

Q: If participating teams bring their own customers, I feel they could control the customers' behavior. Is that okay?
A: Participating teams are not allowed to bring their own customers.

Q: The evaluation uses not only questionnaire results such as satisfaction, but also whether the tourist spot that the system randomly selected as its recommendation was chosen. Since the dialogue time is as short as 5 minutes, won't the customer's original preference bias the selection?
A: As you point out, the customer's original preference may affect the selection. As a countermeasure, we will collect evaluations from a reasonably large number of customers so that the influence of this preference bias is reduced.

Q: How much information does the customer see when choosing the two tourist destinations?
A: The customer sees only the names of the six tourist spots and one photo of each, and selects two of them.

Q: How much is explained to those who want to talk with the robot? Do you explain that the evaluation will be higher if the tourist spot recommended by the robot matches the spot the customer decides to go to?
A: We will not explain that point. The staff will ask those who wish to talk with the robot to select two desired sightseeing spots, decide which spot to visit while consulting with the robot, and answer the impression evaluation questionnaire after the dialogue.

About operation

Q: Is there an opportunity for team building among individual participants? Also, is it possible to apply individually and then gather members from other applicants to form a team?
A: We will set up a Slack workspace where teams can exchange information. You can also apply individually and then gather members to form a team.

Q: Is it possible to change the team name after applying individually and later switching to participation by multiple people?
A: Yes, it is possible.

Q: How many people are expected to participate in one team?
A: A one-person team is fine. Depending on the complexity of the system to be developed, a team may well have two to three people or more.

Q: How many teams do you expect to participate in the final round?
A: We expect about 30 to 40% of all teams to advance to the final round. This may increase depending on the number of participating teams and the results of the preliminary round.

Q: Can I have someone who has not participated in the competition try the system under development?
A: Yes, that is fine.

Q: Participants will present their development details at the final round, but can teams that lost in the preliminary round continue development until the final round?
A: Yes, that is fine.

About preparation for participation

Q: Do I need two computers, one Windows and one Linux?
A: Linux is required for image processing such as facial expression recognition of the dialogue partner. The other programs can run on a single Windows PC. Since a dialogue system can be built with only speech recognition and robot control, one Windows PC is sufficient if you do not use image recognition.

Q: When will the programs be distributed after applying for participation?
A: After applying, please submit the pledge regarding the use of the software and the tourist spot data. We will send you the programs as soon as we receive the pledge.

About participation in the preliminary and final rounds

Q: Regarding the hardware setup, are microphones not provided, so that participating teams need to bring their own? How should the microphone be set up at the counter and connected to the participant's PC?
A: The organizer provides a microphone for the speech recognition system, and the connection between that microphone and the PC running the speech recognition system is already set up. However, the regulations allow participants to prepare their own speech recognition system; if you do so, bring your own microphone and a PC to run it, and set the microphone up at the venue.

About the robot system

Q: Is it possible to develop everything in Python only?
A: Yes. Since recognition results are obtained from the provided programs and control commands are sent to the robot via socket communication, you may develop your program in any OS and language.
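
As a minimal sketch of this style of interaction (the host and port are placeholders; the actual protocol and command set are documented with the provided programs), a command is just a text line sent over a TCP socket:

import socket

ROBOT_HOST = "127.0.0.1"  # placeholder: address of the robot control server
ROBOT_PORT = 10000        # placeholder: port of the robot control server

def send_command(command: str) -> None:
    """Send one text control command to the robot over a TCP socket."""
    with socket.create_connection((ROBOT_HOST, ROBOT_PORT)) as sock:
        sock.sendall((command + "\n").encode("utf-8"))

# Example using the gesture command format described later in this Q&A.
send_command("playmotion nod")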

Q: In the demo video, the android sometimes blinks in a way that looks a little scary. Is it possible to adjust this?
A: It cannot be adjusted by the user, but it will be adjusted by the organizer. However, due to hardware restrictions, there may be an upper limit to the blinking speed, and with the current settings the eyelids do not close completely, which may make the eyes appear to stare.

Q: You said that the hands of the android "Erica" can be moved. Are hand movements out of scope in the competition?
A: The competition uses "Android I". This android's hand movements cannot be controlled, so hand movements are out of scope.

Q: What can be controlled when developing this dialogue android?
A: You can control the utterance content, voice parameters, facial expressions, line of sight, and neck movements.

Q: Could the robot's own utterances be mistakenly recognized?
A: At the preliminary and final round venues, we will use a close-talking microphone or a microphone array so that the robot's own utterances are not mistakenly recognized.

Q: Is there a simulator?
A: Yes. We provide software that lets you easily check the android's behavior with CG.

Q: Can you recognize the Kansai dialect?
A: The provided program uses Google's speech recognition, which seems to recognize the Kansai dialect to some extent.

Q: With speech recognition, do I have to detect the start and end of utterances in my own program?
A: Yes. Once the speech recognition program starts recognizing, it sends interim results (interimresult: ***) sequentially. After it judges that the utterance has ended, it sends the final result (result: ***). Receive all of these and determine the beginning and end of the utterance yourself.
<Example: Results are obtained in order from the top>
interimresult: 今日
interimresult: 今日は
interimresult: 今日は
result: こんにちは
confidence: 0.8813719153404236
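
As a minimal sketch of how a client might consume such lines (how the lines arrive, e.g. over a socket, depends on the provided program; the parsing below relies only on the prefixes shown above):

def read_recognition_results(lines):
    """Yield (final_text, confidence) pairs from speech recognition output lines.

    lines is any iterable of text lines such as the example above; the first
    "interimresult:" line marks the start of an utterance and the "result:"
    line marks its end.
    """
    final_text = None
    for line in lines:
        line = line.strip()
        if line.startswith("interimresult:"):
            partial = line[len("interimresult:"):].strip()
            print("utterance in progress:", partial)
        elif line.startswith("result:"):
            final_text = line[len("result:"):].strip()  # the utterance has ended
        elif line.startswith("confidence:") and final_text is not None:
            confidence = float(line[len("confidence:"):].strip())
            yield final_text, confidence
            final_text = None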

Q: What is the recognition time delay (when it works)?
A: It depends on the length of the utterance. For short utterances (less than 1 second), the delay is about 300 milliseconds; for longer utterances, it is about 500 milliseconds.

Q: What value is returned when speech recognition fails?
A: It depends on how you define a speech recognition failure.
--A confidence value is included in the final speech recognition result (the closer the value is to 1, the higher the reliability; the closer to 0, the lower).
--If no speech is recognized at all, no output is produced.
--If the final result is not output: when the voice is too quiet or too short, the final result may not be sent after the interim results. If no final result arrives within a few seconds of receiving an interim result, the user should judge that speech recognition has failed (a rough sketch of this timeout check follows).
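
In the sketch, the 3-second threshold is an assumed concrete value for "a few seconds":

import time

RESULT_TIMEOUT_SEC = 3.0  # assumed concrete value for "a few seconds"

def final_result_timed_out(last_interim_time: float) -> bool:
    """Return True if no final result has arrived within the timeout after the
    last interim result, i.e. the recognition should be treated as failed.

    last_interim_time is the time.monotonic() timestamp recorded when the most
    recent interim result was received.
    """
    return time.monotonic() - last_interim_time > RESULT_TIMEOUT_SEC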

Q: Is unnatural Japanese automatically corrected?
A: Google's speech recognition seems to correct some unnatural Japanese.

Q: Is it possible to get N-best for the voice recognition result?
A: Not with the distributed software. However, since participants are allowed to prepare their own recognition programs, they may use their own speech recognition program that can obtain N-best results.

Q: When using the position of the customer's face, is it possible to measure the face position in world coordinates?
A: The distributed software does not include a function to obtain the face position in world coordinates. The distributed sample software assumes that the customer's head is 1.5 m in front of the robot at a height of 1.2 m. Participants may install their own sensors, such as a Kinect, that can measure three-dimensional positions and prepare their own recognition system.

Q: Is the blink duration built in, without user control? It felt a little long in the demo video.
A: The user cannot adjust the blink duration, but the organizer will adjust it. However, due to hardware restrictions, there is an upper limit to the blink speed.

Q: Is it possible to control how the eyes are turned downward?
A: The control method depends on the way the eyes are turned downward, but it can be adjusted. The openness of the upper eyelids can be adjusted through the facial expression, and the direction of the eyeballs can be adjusted through the position of the line of sight.

Q: How much can you adjust the speed of speech? For example, is it possible to speak almost twice as fast?
A: It is possible. However, if the speech is too fast, the movement of the android's lips may not keep up.

Q: Apart from blinking and lip movement during speech, does the android not move at all unless operated by a program?
A: In addition to blinking and lip movements, the head automatically moves in accordance with the utterance when the android speaks, and unconscious human-like movements such as posture shifts and fluctuations according to the emotional state are also performed. Therefore, basically, just by setting the emotional state and letting the android speak, the system automatically moves the android in a human-like way.

Q: Is it possible to perform head movements such as nodding and tilting the head? If so, what kind of command does the dialogue control issue?
A: Nodding and head-tilting movements are implemented by playing back predefined motions, so the speed and size of the movements are fixed. For the command, specify the gesture name, for example "playmotion nod \n".
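
As a small illustration (only "nod" is confirmed in the answer above; any other gesture name would be hypothetical), the command is just a text line terminated by a newline, ready to send over the control socket:

def gesture_command(name: str) -> bytes:
    """Build a playmotion command line for the given predefined gesture name."""
    return f"playmotion {name}\n".encode("utf-8")

# gesture_command("nod") -> b"playmotion nod\n"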

Q: You said that the task starts when the customer is sitting in a chair. Does that mean it is necessary to estimate from sensor information whether the customer is sitting?
A: No; it is assumed that a participant confirms that the customer is sitting in the chair and then starts the participant's program, so estimating this from sensor information is not necessary. You may, however, develop a program that estimates it from sensor information and starts automatically.

Q: How many facial expressions are defined for the robot?
A: Currently, there are fullsmile, bad, angry, and MoodBasedFACS.
-- If you specify fullsmile, bad, or angry, the android shows a fixed facial expression regardless of its emotional state.
-- If you specify MoodBasedFACS, the facial expression gradually changes to match the emotional state (Arousal, Valence, Dominance). After specifying MoodBasedFACS, you can produce an expression better suited to the situation by changing the emotional state (see the reference). A placeholder sketch of the two styles follows.
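
Purely for illustration, the command strings below are invented placeholders; the real command syntax is given in the provided documentation:

def expression_commands(use_mood_based: bool):
    """Return a list of (placeholder) command strings for facial expression control."""
    if not use_mood_based:
        # Fixed expression, independent of the emotional state.
        return ["setexpression fullsmile"]  # placeholder syntax
    # Mood-based expression: the face gradually follows the emotional state.
    return [
        "setexpression MoodBasedFACS",                       # placeholder syntax
        "setemotion arousal=0.6 valence=0.8 dominance=0.5",  # placeholder syntax
    ]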

Q: Does the provided software include a program that displays objects such as desks in the simulator?
A: Yes, it is included. Start MiracleHuman.exe, select the robot, and then run the CreateDeskMonitorHuman.bat batch file. Alternatively, use Miracleforrobotcom.bat, which starts MiracleHuman.exe, waits a few seconds, and then executes CreateDeskMonitorHuman.bat; in this case, select the robot within those few seconds.