2/10/19
- Slowly reading through the paper (https://hal.archives-ouvertes.fr/hal-01578292/document), and found an equation that I think will be helpful when creating a cost function for our model
- Similarity between 2 chords A and B: S(A, B)
- Make each chord into a 12-dimensional vector where index 0 is C and index 11 is B (and all notes in between)
- If the note is being played, put a 1 in the vector, otherwise put a 0
- According to the paper, the optimal value for the parameter C in the equation (not the note C) is 10 (can experiment to determine the best value for us)
- This is mainly used for aligning the scores (Needleman-Wunsch Algorithm for optimal alignment)
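A rough sketch of the two ideas above, for my own reference: building the 12-dimensional chord vector and running a Needleman-Wunsch global alignment over two chord sequences. The scoring function `placeholder_similarity` and the `gap` penalty are my own stand-ins and are NOT the paper's S(A, B) equation or its parameter C; all function names here are hypothetical.

```python
def pitch_class_vector(midi_pitches):
    """Return a 12-dim 0/1 vector: index 0 is C, index 11 is B."""
    vec = [0] * 12
    for pitch in midi_pitches:
        vec[pitch % 12] = 1  # fold every octave onto one pitch class
    return vec


def placeholder_similarity(a, b):
    """Hypothetical stand-in for S(A, B), not the paper's equation.

    Scores shared pitch classes minus differing ones.
    """
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return shared - differing


def needleman_wunsch(seq_a, seq_b, score, gap=-1):
    """Global alignment of two sequences.

    Returns (total_score, pairs) where pairs holds (i, j) index pairs
    and None marks a gap in one of the sequences.
    """
    n, m = len(seq_a), len(seq_b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * gap
    for j in range(1, m + 1):
        table[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = max(
                table[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),
                table[i - 1][j] + gap,   # gap in seq_b
                table[i][j - 1] + gap,   # gap in seq_a
            )
    # Trace back from the bottom-right corner to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and table[i][j]
                == table[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1])):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return table[n][m], pairs


# Toy example: piano plays C major then G major; orchestra adds F major between.
c_major = pitch_class_vector([60, 64, 67])
f_major = pitch_class_vector([65, 69, 72])
g_major = pitch_class_vector([67, 71, 74])
total, alignment = needleman_wunsch(
    [c_major, g_major], [c_major, f_major, g_major], placeholder_similarity
)
```

In the toy example the extra orchestral chord ends up aligned against a gap, which is the behavior we want when one score has events the other lacks.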
2/14/19
- I found the MIDI parsing and score alignment algorithm written by the researchers. The main algorithm is written in C and is highly optimized, meaning that the alignment of scores runs quickly.
- I modified their code to run on Python 3, and changed some of the internal utils to better fit our project and the further modifications we will need to make to their code in the future
- Their algorithms are most likely going to remain untouched, as they are far faster than anything we could write. However, a lot of the package structure and their way of running the code do not fit our needs well, so that will have to be changed more in the future.
2/19/19
- I have been working through the MIDI parser that was written. It fits every aspect that we need. How the parser works:
- Each quantum of time (as determined by the user) contains a 128-dimensional vector representing all the notes being played at that moment. I'm not sure which range of indices the piano falls in, but each step in index represents a half-step change in pitch (up or down respectively)
- The quantization of the parser determines how many subdivisions a quarter note should be split into
- (From what I understand) the 16-bit integer value at each index of the vector represents the volume of the note (which I believe is combined with velocity from the MIDI)
- Nice visual representation files can be generated for these vectors
- In order to view the HTML, make sure to use Firefox, because Chrome's cross-site scripting (XSS) auditor won't allow d3.js to load otherwise.
- An image of a slice of a piano score with quantization 8 is shown below
- I edited the parser to make their code more reusable for us, and to allow the code to run on any machine (all that has to be done is to compile the Cython file that contains the alignment code)
- Note on the alignment: The algorithm was able to align all the files in the bouline directory. However, the program segfaults when I try to do the same for the spotify folder. I am not sure whether this is just a simple permission issue, or whether something else is wrong
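To make the parser description concrete, here is a minimal sketch of how a quantized piano roll like this could be built. This is not the researchers' parser: the input format (a list of `(pitch, start, duration, velocity)` tuples with times in quarter notes) and the function name `piano_roll` are assumptions, and it stores plain MIDI velocity rather than whatever volume/velocity combination their code uses.

```python
def piano_roll(notes, quantization):
    """Build a piano roll from note events.

    notes: list of (pitch, start, duration, velocity), with start and
           duration measured in quarter notes (assumed input format).
    quantization: number of subdivisions per quarter note.
    Returns a list of frames; each frame is a 128-length list where
    index = MIDI pitch and value = velocity (0 means silent).
    """
    if not notes:
        return []
    end = max(start + duration for _, start, duration, _ in notes)
    n_frames = int(round(end * quantization))
    roll = [[0] * 128 for _ in range(n_frames)]
    for pitch, start, duration, velocity in notes:
        first = int(round(start * quantization))
        last = int(round((start + duration) * quantization))
        for t in range(first, last):
            roll[t][pitch] = velocity
    return roll


# One quarter note of middle C (60), with an E (64) entering on the off-beat.
example_notes = [(60, 0.0, 1.0, 80), (64, 0.5, 0.5, 60)]
example_roll = piano_roll(example_notes, quantization=8)
```

With quantization 8, the single quarter note becomes 8 frames, and the E only appears in the last 4 of them.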
3/17
- Worked on actually getting RNN to work
- Want to do an Encoder-Decoder model, but it is very difficult considering the complexity of the data
- Made some changes so that the RNN would work
- Modified the data so that the orchestration was a single 9000-dimensional vector
- The RNN is able to output an orchestration, but many melodic features are missing, and it is very clustered
- The rhythm transfers from the
4/15
- New idea for how to make the RNN
- Have there be multiple networks, with a network for each instrument in the orchestration
- After training, the new model has a much cleaner output than the old model
- I haven't made the classifier to determine which instruments are needed in the final orchestration, but that shouldn't be too difficult to implement
- I still want to create an encoder-decoder RNN as I think that would be the best way to "translate" the piano to the other instruments
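A structural sketch of the one-network-per-instrument idea, only to show the shape of the setup. Everything here is hypothetical: the instrument list, the layer sizes, and the `TinyRNN` class, which is a bare Elman RNN with random, untrained weights written in plain Python. It is not our actual model and says nothing about output quality.

```python
import math
import random


class TinyRNN:
    """Minimal Elman RNN with random, untrained weights (illustration only)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_in = matrix(n_hidden, n_in)       # input -> hidden
        self.w_rec = matrix(n_hidden, n_hidden)  # hidden -> hidden
        self.w_out = matrix(n_out, n_hidden)     # hidden -> output
        self.n_hidden = n_hidden

    def forward(self, frames):
        """Map a sequence of input frames to a sequence of output frames."""
        hidden = [0.0] * self.n_hidden
        outputs = []
        for frame in frames:
            hidden = [
                math.tanh(
                    sum(w * x for w, x in zip(self.w_in[k], frame))
                    + sum(w * h for w, h in zip(self.w_rec[k], hidden))
                )
                for k in range(self.n_hidden)
            ]
            # Sigmoid output: one "note on" score in (0, 1) per pitch.
            outputs.append([
                1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
                for row in self.w_out
            ])
        return outputs


# Hypothetical instrument list; one independent network per instrument.
INSTRUMENTS = ["violin", "cello", "flute"]
models = {name: TinyRNN(128, 16, 128, seed=i)
          for i, name in enumerate(INSTRUMENTS)}


def orchestrate(piano_frames):
    """Run the same piano frames through every instrument's network."""
    return {name: model.forward(piano_frames) for name, model in models.items()}


# Four frames of piano roll with middle C sounding in the first frame.
piano_frames = [[0] * 128 for _ in range(4)]
piano_frames[0][60] = 1
result = orchestrate(piano_frames)
```

Each network only has to predict one instrument's 128-dimensional roll instead of one huge combined vector, which may be part of why the output looks cleaner. The missing classifier would then decide which entries of `result` to keep.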