This project is largely based on the research done at the University of Taiwan, seen here.
Timeline:
- August 31: Brainstormed and decided on the DJ Project, whose goal is to create a 4-to-8-bar region of generated music between two songs to foster a smooth transition from one song to the next, as a professional DJ or audio engineer would do.
- September 7: Outlined and mapped out the project, to be built in Python 3.6. There are three main parts:
- Sound Select: The DL algorithm selects samples to use in the generation. It does this by mathematically predicting what the sound will sound like using the Fourier transform, Parseval's theorem, frequency, energy, and so forth (see the Sound Select sketch after this list). It can also generate simple synths by stacking and filtering waves.
- Rhythm Select: This part of the program largely focuses on graph theory. Using a simulated annealing (SA) algorithm, Shimi searches for a few paths with optimal scores according to her scoring protocols, which are facilitated by rule-based music theory programs, human-trained ML algorithms, and neural networks (see the annealing sketch after this list).
- Harmony Select: The last section of the DL algorithm focuses largely on dynamics and balance. Using MIR techniques from the Python library Librosa, Shimi will be able to beat-stretch rhythms and modulate keys in order to create a few musical phrases that smoothly transition one song into the next (see the Librosa sketch after this list). It will also decide which frequency ranges it wants to bring out within the selections and which it wants to compress or reduce (e.g., bass for hip-hop).
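For Sound Select, a minimal sketch of the spectral side of sample selection, assuming mono-ish .wav samples read with SciPy; the energy check is just Parseval's theorem, the spectral centroid stands in for predicting how a sample will sound, and the toy synth stacks a few harmonics. None of this is the project's actual code.

```python
import numpy as np
from scipy.io import wavfile

def describe_sample(path):
    """Rough spectral summary of one .wav sample."""
    sr, x = wavfile.read(path)
    x = x.astype(np.float64)
    if x.ndim > 1:                 # fold stereo down to mono for analysis
        x = x.mean(axis=1)

    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    power = np.abs(X) ** 2

    # Parseval's theorem says the signal's energy can be measured in either the
    # time domain or the frequency domain; here it is taken from the time domain.
    energy = np.sum(x ** 2)

    # Spectral centroid: a crude proxy for how "bright" the sample will sound.
    centroid_hz = np.sum(freqs * power) / np.sum(power)
    return {"sample_rate": sr, "energy": energy, "centroid_hz": centroid_hz}

def simple_synth(freq_hz, sr=44100, seconds=1.0, partials=(1, 2, 3)):
    """Toy additive synth: stack a few harmonics of a base frequency."""
    t = np.arange(int(sr * seconds)) / sr
    wave = sum(np.sin(2 * np.pi * freq_hz * p * t) / p for p in partials)
    return wave / np.max(np.abs(wave))
```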
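For Rhythm Select, a toy sketch of the simulated-annealing search over candidate paths. The `score_path` function is a hypothetical placeholder for Shimi's scoring protocols (music theory rules and trained models), which are not shown here.

```python
import math
import random

def score_path(path):
    """Placeholder scorer: hypothetically, prefer 'smoother' orderings of events."""
    return -sum(abs(a - b) for a, b in zip(path, path[1:]))

def anneal(events, steps=10000, temp=1.0, cooling=0.999):
    """Simulated annealing over orderings of rhythm events (paths through the graph)."""
    path, best = list(events), list(events)
    for _ in range(steps):
        # Propose a neighboring path by swapping two events.
        candidate = list(path)
        i, j = random.sample(range(len(candidate)), 2)
        candidate[i], candidate[j] = candidate[j], candidate[i]

        delta = score_path(candidate) - score_path(path)
        # Always accept improvements; accept worse paths with probability exp(delta / temp).
        if delta > 0 or random.random() < math.exp(delta / temp):
            path = candidate
        if score_path(path) > score_path(best):
            best = list(path)
        temp *= cooling
    return best

# Example: order eight rhythm "events" (here just numbers) into a smooth path.
print(anneal([3, 7, 1, 8, 2, 6, 4, 5]))
```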
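For Harmony Select, a short sketch of the Librosa calls that would do the beat-stretching and key modulation described above; the file names and the two-semitone shift are placeholders, not values used in the project.

```python
import librosa

# Outgoing song and incoming song (placeholder file names).
y_a, sr = librosa.load("song_a.wav", sr=None)
y_b, _ = librosa.load("song_b.wav", sr=sr)

# Beat-stretch: match song B's tempo to song A's before the transition.
tempo_a, _ = librosa.beat.beat_track(y=y_a, sr=sr)
tempo_b, _ = librosa.beat.beat_track(y=y_b, sr=sr)
y_b = librosa.effects.time_stretch(y_b, rate=float(tempo_a) / float(tempo_b))

# Key modulation: shift song B by a chosen number of semitones (two is arbitrary here).
y_b = librosa.effects.pitch_shift(y_b, sr=sr, n_steps=2)
```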
- September 14: Focused on the deep learning algorithm, reading many papers in the hope of replicating their code to build a working DL algorithm. Created a Python Pitch object, which should be one of the contributions to the Sound Select focus area.
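The Pitch object itself is not reproduced in this log; a minimal sketch of what such a class might hold (the fields and the equal-temperament math here are assumptions, not the actual implementation):

```python
class Pitch:
    """Hypothetical pitch container: a MIDI number plus its name and frequency."""
    NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

    def __init__(self, midi_number):
        self.midi = midi_number
        self.name = self.NAMES[midi_number % 12] + str(midi_number // 12 - 1)
        # Equal-temperament frequency relative to A4 (MIDI note 69, 440 Hz).
        self.frequency = 440.0 * 2 ** ((midi_number - 69) / 12)

    def __repr__(self):
        return "Pitch({}, {:.2f} Hz)".format(self.name, self.frequency)

print(Pitch(60))   # Pitch(C4, 261.63 Hz)
```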
- September 21: Looked into rule-based systems and began programming MIR and rule-based systems into the existing code base, and tried (unsuccessfully) to integrate Shimi's motor functions into the software that I am creating. Since the DL is still very complex and difficult to figure out from an existing code base, the rule-based system is a much easier start. In particular, I looked into Shimi's rating protocol, which finds chord progressions and chord tones that are "stronger" than others, and began coding a music-centric SA tool.
- Goal for Oct 5: Integrate, fully use, and understand Librosa and all of its tools, since the music theory is at the core of this project. Also, use this data on two songs to produce output with PyAudio (see the sketch below).
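A minimal sketch of that October 5 goal, assuming two placeholder .wav files: Librosa extracts tempo and beat frames, and PyAudio pushes the audio to the default output device.

```python
import librosa
import numpy as np
import pyaudio

def play(y, sr):
    """Send a mono float signal to the default output device via PyAudio."""
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paFloat32, channels=1, rate=int(sr), output=True)
    stream.write(y.astype(np.float32).tobytes())
    stream.stop_stream()
    stream.close()
    pa.terminate()

for path in ("song_a.wav", "song_b.wav"):          # placeholder file names
    y, sr = librosa.load(path, sr=None, mono=True)
    tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
    print(path, "~", float(tempo), "BPM,", len(beats), "beat frames")
    play(y, sr)
```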
- October 5: Continued work from September 28 on the Python code. In the meantime, I set up a library of percussion samples for future use in Sound Select. Python can read the samples, but it cannot yet output them.
- October 11: Reached out to and met with Dr. John Barry of the School of ECE's digital signal processing group. His recommendation, based on the signal mathematics of the project, is to look into the "stereo" aspect of the .wav sound samples: by iteratively going through the panning, it should be possible to isolate particular sound samples, which could be very valuable in music generation (see the sketch below). He also recommended I look into speech processing and that area of DSP to find more specific answers.
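A small sketch of the stereo idea, assuming a 16-bit stereo .wav with a placeholder name; splitting the file into mid (center-panned) and side (hard-panned) signals is one simple first pass at isolating particular elements by panning.

```python
import numpy as np
from scipy.io import wavfile

sr, x = wavfile.read("stereo_sample.wav")          # placeholder file name
x = x.astype(np.float64) / 32768.0                 # assumes 16-bit PCM stereo

left, right = x[:, 0], x[:, 1]
mid = (left + right) / 2.0    # center-panned material (vocals, kick, bass)
side = (left - right) / 2.0   # hard-panned material (wide synths, reverb tails)

wavfile.write("mid.wav", sr, (mid * 32767).astype(np.int16))
wavfile.write("side.wav", sr, (side * 32767).astype(np.int16))
```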
- October 12: Move away from generation techniques and toward fading in and fading out. Use deep learning to look at spectrogram images to find the optimal point to fade in / out (see the spectrogram sketch below).
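A sketch of producing the spectrogram image the network would look at, using Librosa and Matplotlib; the file name and figure settings are placeholders rather than the project's actual pipeline.

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("song_a.wav", sr=None, mono=True)   # placeholder file name

# Mel-scaled power spectrogram in decibels: the image the network would examine.
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)

plt.figure(figsize=(4, 4))
librosa.display.specshow(S_db, sr=sr)
plt.axis("off")
plt.savefig("song_a_spectrogram.png", bbox_inches="tight", pad_inches=0)
plt.close()
```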
- October 19: The MATLAB script looks through a folder for every .wav file, then compares each one to every other .wav file. For n songs, this gives a total of n(n-1) transitions. Once MATLAB gathers the data, it should be moved into Python to run through the neural network (a Python equivalent of the pairwise enumeration is sketched below).
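The MATLAB script is not reproduced here; a Python equivalent of the pairwise enumeration shows where the n(n-1) count comes from (ordered pairs, since the A-to-B and B-to-A transitions are different). The folder name is a placeholder.

```python
import itertools
from pathlib import Path

songs = sorted(Path("samples").glob("*.wav"))      # placeholder folder name

# Every ordered pair of distinct songs is one candidate transition: n * (n - 1) total.
transitions = list(itertools.permutations(songs, 2))
print(len(songs), "songs ->", len(transitions), "transitions")

for outgoing, incoming in transitions:
    pass   # hand each (outgoing, incoming) pair to the spectrogram / NN pipeline
```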
- October 26: Spectrogram data is fuzzy. Working on debugging the spectrogram pictures. Researching convolutional neural networks to find one that is suitable; the original Taiwanese CNN does not seem fit for this purpose, and its code is much more difficult to work with than some of the other open-source CNNs. Began the 3/4 presentation.
- November 2: The MATLAB script is mostly operational; the only thing left is to go through and individually label the data points. For the 3/4 presentation, I will use 8 samples, for a total of 56 data entries as the ground truth (n(n-1) with n = 8). For the end-of-semester project, I hope to use around 500 samples for a more accurate transition. Began work on a better demo that uses and integrates deep learning; for now, it is not necessary to beatmatch, as beatmatching may disturb the spectrogram analysis.
- November 6: The MATLAB script is in the clear. Labeling the dataset after this Wednesday. Decided to build the CNN with the open-source library Keras (a minimal sketch is below).
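A minimal sketch of the kind of small Keras CNN this could start from; the 128x128 grayscale input, the layer sizes, and the single good-fade-point / bad-fade-point output are illustrative assumptions, not the final architecture.

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Grayscale spectrogram patches of an assumed 128x128 size, one binary label each
# (e.g., "good place to start the fade" vs. "not a good place").
model = Sequential([
    Conv2D(16, (3, 3), activation="relu", input_shape=(128, 128, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(32, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation="relu"),
    Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

# Training is then one call on the labeled spectrograms:
# model.fit(x_train, y_train, epochs=10, validation_split=0.2)
```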
- November 9: 3/4 presentation. Work on a bigger dataset; accuracy is an issue!
- November 16-December 6: The last part of the semester is largely the same. I spent time researching convolutional neural networks, understanding what every operation in the one I designed does. In particular, I looked at a few lectures from Stanford and Georgia Tech about deep learning. Other than that, I spent time labeling data for the CNN. The goal for the end of the semester is around 500 labeled spectrograms.