Text-To-Speech & Timeline & NET119 & Machine Translation

Objective

Create a synthetic voice for a target language using Festival.

In speech, a short stretch of signal (some 10 ms) is treated as periodic, so the window size is usually 40 ms, with an overlap of 8 ms and a seek range of 15 ms.
Vocoder parameters are typically extracted every 5 ms.
In short: periodicity is assumed within a 40 ms window, and parameters are extracted every 5 ms.
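
As a rough illustration of the numbers above, the sketch below converts the 40 ms window and the 5 ms extraction interval into sample counts and counts how many full windows fit in one second of audio. The 16 kHz sampling rate and all variable names are assumptions made for illustration; the notes do not state a sampling rate.

```shell
# Assumption: 16 kHz sampling rate (not stated in the notes).
sample_rate=16000
window_ms=40   # analysis window length, from the notes
shift_ms=5     # parameter extraction interval, from the notes

# Convert milliseconds to samples.
window_samples=$(( sample_rate * window_ms / 1000 ))
shift_samples=$(( sample_rate * shift_ms / 1000 ))

# Number of full windows that fit in one second of signal.
signal_samples=$sample_rate
num_frames=$(( (signal_samples - window_samples) / shift_samples + 1 ))

echo "window: $window_samples samples"
echo "shift: $shift_samples samples"
echo "frames in 1 s: $num_frames"
```

With these assumed values the window is 640 samples and the shift is 80 samples, giving 193 full windows per second.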

Timeline

http://transfer.arcadia.co.jp/openupload/www/index.php?action=d&step=2

NET119

NET119 introduction: https://www.youtube.com/watch?v=PYb4p2wqP00

Playlist: https://www.youtube.com/playlist?list=PLD78C4Y4xMTfqC411fB4w_GKJDQXW58d8

Machine Translation

https://qiita.com/R-Yoshi/items/9a809c0a03e02874fabb
http://www.statmt.org/moses/
http://www.statmt.org/moses/?n=Moses.Baseline
- ssh vang.ext; cat /proc/meminfo: 32GB

Developing with Docker (Ubuntu 18.04 based, on top of Ubuntu 18.04 (VM))

This part corresponds to "Getting Started - Moses Installation", done here with Docker as described in Prepare-docker-environment-for-developing-statistical-machine-translation.
Prepare the git repository for the Dockerfile, then put the corpus (training data) and the training result (the desired translation model, fr2en, etc.) in the sharedwks directory:
- git remote show origin
- mkdir ~/arcadia/smtdevenv; cd ~/arcadia/smtdevenv
- git clone https://github.com/ivansetiawantky/smtdevenv.git .
- mkdir ~/arcadia/smtdevenv/sharedwks
- docker search ivansetiawantky
- sudo apt-get install jq (command-line JSON processor)

- docker pull ivansetiawantky/smtdevenv:latest ; docker images -a ; docker inspect ivansetiawantky/smtdevenv:latest | jq '.[0].Config.Labels'
- docker container ls -a
- Shift + F2: create a new terminal session in the VM with a horizontal split.
- docker run --rm -ti --name mysmtdev -v $HOME/arcadia/smtdevenv/sharedwks:/home/smtdev/sharedwks ivansetiawantky/smtdevenv:latest byobu new
- Ctrl+P then Ctrl+Q: detach from the container
- docker container attach mysmtdev
- Confirm that /home/smtdev/sharedwks is shared.
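
One way to carry out the confirmation above is to create a marker file on one side of the bind mount and look for it on the other. The sketch below is a minimal helper for that. The function name make_marker and the marker file naming are hypothetical; only the two directory paths come from the notes.

```shell
# Create an empty, uniquely named marker file in the given directory
# and print its name. (make_marker is a hypothetical helper.)
make_marker() {
    dir=$1
    name="share-check-$$"          # $$ is the PID of the current shell
    : > "$dir/$name" || return 1   # fail if the directory is not writable
    printf '%s\n' "$name"
}

# On the host:
#   marker=$(make_marker "$HOME/arcadia/smtdevenv/sharedwks")
# Inside the container, the same file should then be visible:
#   ls -l "/home/smtdev/sharedwks/$marker"
```

If the file shows up inside the container, the volume is shared as intended; the marker can be deleted afterwards from either side.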
Inside the container:
- cd /home/smtdev/localwks/sample-models/
  - Must be executed in this directory, because moses.ini specifies the paths of the phrase table and the language model relative to this directory.
- /home/smtdev/localwks/mosesdecoder/bin/moses -f phrase-model/moses.ini < phrase-model/in > out2
- diff out* : no differences, the outputs are identical.
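
The manual diff in the last step can also be scripted. The following sketch wraps the comparison in a small function built on cmp -s, which exits with status 0 only when two files are byte-identical. The function name same_output is hypothetical, and out and out2 are assumed to be the two decoder outputs being compared.

```shell
# Report whether two files are byte-identical.
# Returns 0 when identical, 1 otherwise. (same_output is a hypothetical helper.)
same_output() {
    if cmp -s "$1" "$2"; then
        echo "identical: $1 $2"
    else
        echo "different: $1 $2"
        return 1
    fi
}

# Inside /home/smtdev/localwks/sample-models, after decoding:
#   same_output out out2
```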

Building a custom SMT system with a freely available dictionary

This part corresponds to "Getting Started - Baseline System", which shows an example of preparing a parallel corpus for SMT.
It also corresponds to the section "Processing the Japanese-English parallel corpus used for training" (学習に用いる日英パラレルコーパスの加工) in https://qiita.com/R-Yoshi/items/9a809c0a03e02874fabb

First, try the Getting Started - Baseline System (fr2en)

This part follows "Getting Started - Baseline System", which shows how to build a machine translation system (mosesdecoder) from a fr→en (fr2en) parallel corpus.

Minimum software requirements:

- GIZA++ for word-aligning the parallel corpus
- IRSTLM, SRILM, or KenLM for (target) language model estimation. KenLM is the default and works well for 3-gram models.

The base directory is $HOME in the Baseline System tutorial, but here we use the base directory described below.

The corpus and moses.ini (the translation model produced by training) must be located outside the container, i.e., in the volume shared by the host and the container (needs confirmation). This keeps the proprietary corpus out of the publicly available Docker image.