SoooS: Multilingual Multimedia Message Board

Hannes Leipold *                                       Karan Singla*

As we continue to work, travel, and share information with each other across geographic and cultural boundaries, our conversations are becoming more multilingual. However, according to X, 95% of people in the world don't speak more than two languages. Social platforms such as Facebook, TikTok, and Reddit exist, but they all offer limited support for multilingual conversation.


We propose SoooS: a multilingual multimedia message board. This message board can be seen as a new form of Reddit that is free and open source. Here, user A, who speaks only language A, can send video, audio, or text messages to user B, who speaks only language B.


It all started with a course project that Hannes and I did together at USC, where we prototyped this core functionality. Below is the video.



Demo

The Bridge API enables video-to-video translation. Four services have to run sequentially (A -> B -> C -> D) to create a translated video. However, the backend can also run an individual service, depending on the user's request.


Git repository:


https://github.com/ksingla025/V2V_translate



The figure above gives a high-level overview of this API. As discussed above, the backend has access to a machine learning pipeline API that can translate a video from one language to another. However, the backend calls only the services it needs. For example, one user enters text and wants to see that text message translated into another language, while another wants a spoken video message rendered in another language. The Bridge API enables four major services:


A. ASR(): Sends an audio file to the Automatic Speech Recognition server and returns a string with the recognized text.

B. Translate(): Translates a string from a source language to a target language using the Machine Translation server.

C. TTS(): Converts an input text string into a speech utterance in a user-specified target language using the Text-to-Speech server.

D. audio_video_overlap(): Uses the original audio, the translated audio, and the original video to create a new translated video message.
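
To make the flow concrete, here is a minimal Python sketch of how a backend could chain the four services, or call just one of them. It is an illustration under stated assumptions, not the code in the repository: the Services container, the function signatures, and the fake_* stand-ins are all hypothetical, and the real services talk to ASR, MT, and TTS servers rather than returning canned values.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical container for the four services described above.
# The signatures are assumptions made for this sketch.
@dataclass
class Services:
    asr: Callable[[bytes], str]                      # A: audio -> recognized text
    translate: Callable[[str, str, str], str]        # B: text, source, target -> text
    tts: Callable[[str, str], bytes]                 # C: text, target -> synthesized audio
    overlap: Callable[[bytes, bytes, bytes], bytes]  # D: audio, new audio, video -> new video


def translate_text(s: Services, text: str, src: str, tgt: str) -> str:
    """Text-only request: the backend calls service B alone."""
    return s.translate(text, src, tgt)


def translate_video(s: Services, audio: bytes, video: bytes,
                    src: str, tgt: str) -> bytes:
    """Full request: run A -> B -> C -> D in sequence."""
    text = s.asr(audio)
    translated = s.translate(text, src, tgt)
    new_audio = s.tts(translated, tgt)
    return s.overlap(audio, new_audio, video)


# Fake services that only record the order in which they are called.
calls: List[str] = []


def fake_asr(audio: bytes) -> str:
    calls.append("A")
    return "hello"


def fake_translate(text: str, src: str, tgt: str) -> str:
    calls.append("B")
    return f"[{src}->{tgt}] {text}"


def fake_tts(text: str, tgt: str) -> bytes:
    calls.append("C")
    return text.encode("utf-8")


def fake_overlap(orig_audio: bytes, new_audio: bytes, video: bytes) -> bytes:
    calls.append("D")
    return video + b"|" + new_audio


fake = Services(fake_asr, fake_translate, fake_tts, fake_overlap)

if __name__ == "__main__":
    print(translate_text(fake, "hello", "en", "de"))
    print(translate_video(fake, b"audio", b"video", "en", "de"))
    print(calls)
```

Passing the services in as plain callables keeps the sequencing logic separate from the servers behind each step, so a single service can be swapped or called on its own.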


Great news! As an extension, Hannes submitted a proposal (with Karan as mentor) to GSoC, where we continue to develop this project as part of Red Hen Lab.

Check out Hannes's GSoC blog post on this:

https://hannes-leipold1.medium.com/gsoc-utilizing-speech-to-speech-translation-to-facilitate-multilingual-file-storage-and-networking-ef2fe9435934