I developed a new hyperbolic knowledge graph embedding model based on orthogonal projections. Hyperbolic embeddings are well suited to representing hierarchical data, while orthogonal projections are well suited to representing relations that are not one-to-one; I combined these two techniques to build the new model. It outperformed previous hyperbolic models on link prediction tasks by 10%. In addition, the Lorentz (hyperboloid) model represented the data better than the Poincaré ball model because its orthogonal projection has a simpler closed form, which introduced fewer rounding errors. Part of this work was done during my internship at the Samsung Advanced Institute of Technology.
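The closed-form projection credited above for the Lorentz model's smaller rounding error can be illustrated in a few lines. The sketch below is a generic projection onto a geodesic hyperplane of the hyperboloid, not the relation-specific projection of the model itself. It assumes curvature -1, NumPy, and a unit spacelike normal `n`; the helper names (`lorentz_inner`, `lift_to_hyperboloid`, `lorentz_distance`, `project_to_hyperplane`) are hypothetical.

```python
import numpy as np


def lorentz_inner(x: np.ndarray, y: np.ndarray) -> float:
    """Minkowski inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return float(-x[0] * y[0] + np.dot(x[1:], y[1:]))


def lift_to_hyperboloid(v: np.ndarray) -> np.ndarray:
    """Place a Euclidean point v on the upper sheet <x, x>_L = -1, x0 > 0."""
    return np.concatenate(([np.sqrt(1.0 + np.dot(v, v))], v))


def lorentz_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Geodesic distance arccosh(-<x, y>_L); the argument is clipped at 1 to absorb round-off."""
    return float(np.arccosh(max(1.0, -lorentz_inner(x, y))))


def project_to_hyperplane(x: np.ndarray, n: np.ndarray) -> np.ndarray:
    """Closest point to x on the geodesic hyperplane {z : <z, n>_L = 0}.

    n must be a unit spacelike normal (<n, n>_L = 1). Both steps are closed-form:
    remove the Lorentz-orthogonal component of x along n, then rescale.
    """
    c = lorentz_inner(x, n)
    y = x - c * n                     # <y, n>_L = 0 and <y, y>_L = -(1 + c^2)
    return y / np.sqrt(1.0 + c * c)   # rescale back onto <p, p>_L = -1


# Usage with made-up values: a point in 3D hyperbolic space and a tilted hyperplane.
x = lift_to_hyperboloid(np.array([0.3, -1.2, 0.5]))
raw = np.array([0.5, 1.0, 0.5, 0.5])              # spacelike: <raw, raw>_L = 1.25
n = raw / np.sqrt(lorentz_inner(raw, raw))        # normalize to <n, n>_L = 1
p = project_to_hyperplane(x, n)
print(lorentz_inner(p, p), lorentz_inner(p, n))   # approximately -1 and 0
```

Because the projection only removes one component and rescales, no iterative solver or boundary clipping is needed, and the distance from `x` to its projection equals arcsinh(|<x, n>_L|).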
Recently, I developed a hyperbolic embedding of temporal knowledge graphs (TKGs), i.e., knowledge graphs that change over time, using a hyperbolic version of recurrent neural networks (RNNs). This research investigates the advantages of hyperbolic space over Euclidean space in representing the chronological and hierarchical properties of data. Our model outperformed previous TKG models by 8.5%. For more details, see https://arxiv.org/abs/2209.05635. This work was accepted for presentation at the 4th Conference on Automated Knowledge Base Construction (AKBC), 2022.
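For readers unfamiliar with hyperbolic RNNs, the sketch below shows one common construction: the Möbius-operation RNN cell of Ganea, Bécigneul and Hofmann (2018) in the Poincaré ball with curvature -1. It is an illustrative assumption, not the architecture of the paper linked above; the function names, dimensions, and random weights are hypothetical.

```python
import numpy as np

EPS = 1e-7  # guards divisions by zero and keeps norms strictly below 1


def mobius_add(x, y):
    """Mobius addition x (+) y in the Poincare ball of curvature -1."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    return num / (1 + 2 * xy + x2 * y2)


def expmap0(v):
    """Exponential map at the origin: tangent vector -> point in the ball."""
    norm = max(np.linalg.norm(v), EPS)
    return np.tanh(norm) * v / norm


def logmap0(y):
    """Logarithmic map at the origin: point in the ball -> tangent vector."""
    norm = min(max(np.linalg.norm(y), EPS), 1 - EPS)
    return np.arctanh(norm) * y / norm


def mobius_matvec(m, x):
    """Mobius matrix-vector product M (x) x = exp0(M log0(x))."""
    return expmap0(m @ logmap0(x))


def hyperbolic_rnn_step(h, x, w, u, b):
    """h_t = phi((W (x) h_{t-1}) (+) (U (x) x_t) (+) b), with phi = tanh applied in the tangent space at 0."""
    pre = mobius_add(mobius_add(mobius_matvec(w, h), mobius_matvec(u, x)), b)
    return expmap0(np.tanh(logmap0(pre)))


# Usage: run the cell over a short random sequence (hypothetical sizes and weights).
rng = np.random.default_rng(0)
d_h, d_x = 4, 3
w = 0.5 * rng.standard_normal((d_h, d_h))
u = 0.5 * rng.standard_normal((d_h, d_x))
b = expmap0(0.1 * rng.standard_normal(d_h))       # the bias is itself a point in the ball
h = np.zeros(d_h)
for _ in range(5):
    x_t = expmap0(0.3 * rng.standard_normal(d_x))  # inputs mapped into the ball first
    h = hyperbolic_rnn_step(h, x_t, w, u, b)
print(h, np.linalg.norm(h))
```

Each Euclidean operation of a vanilla RNN (matrix product, addition, nonlinearity) is replaced by its hyperbolic counterpart, so the hidden state stays inside the unit ball.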
My doctoral thesis advances the volume formula for three-dimensional truncated hyperbolic tetrahedra using a geometric approach. The existing volume formula for a hyperbolic tetrahedron takes its six dihedral angles as variables. However, the formula lacks a geometric justification, because it was derived from the Volume Conjecture, a concept from quantum topology and algebra. Recent studies showed that the volume formula for hyperbolic tetrahedra can be extended to a broader class of geometric objects called truncated hyperbolic tetrahedra. However, this extension holds only in specific cases. To bridge this gap, I introduced a new geometric viewpoint, treating a truncated tetrahedron as the intersection of a tetrahedron in Lorentz space and its dual, to develop a more general volume function. For more details, see https://digitallibrary.usc.edu/Share/4ra674f01end78i1h4m74ty8mh786o48.
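The Lorentz-space viewpoint starts from a standard object: the Gram matrix G of a tetrahedron, whose off-diagonal entries are -cos of the dihedral angles between faces, i.e. the Minkowski inner products of the four unit face normals. The sketch below assumes NumPy and a hypothetical 4x4 array `angles`, where `angles[i, j]` is the dihedral angle between faces i and j (face i lies opposite vertex i). It classifies each vertex as finite, ideal, or hyperideal (to be truncated) according to whether the 3x3 principal minor of G is positive, zero, or negative. Equivalently, the vertex link is a spherical, Euclidean, or hyperbolic triangle. As a sanity check, it also evaluates Milnor's formula for the regular ideal tetrahedron. It does not implement the thesis's volume function, and it uses det G < 0 only as a necessary condition.

```python
import numpy as np


def gram_matrix(angles: np.ndarray) -> np.ndarray:
    """G[i, j] = -cos(angles[i, j]) off the diagonal and 1 on the diagonal."""
    g = -np.cos(angles)
    np.fill_diagonal(g, 1.0)
    return g


def vertex_types(angles: np.ndarray, tol: float = 1e-9) -> list[str]:
    """Classify each vertex by the sign of the 3x3 principal minor of G that omits it."""
    g = gram_matrix(angles)
    if np.linalg.det(g) >= 0:  # a hyperbolic tetrahedron needs det G < 0
        raise ValueError("det G >= 0: not a hyperbolic tetrahedron")
    types = []
    for i in range(4):
        keep = [j for j in range(4) if j != i]  # the three faces meeting at vertex i
        minor = np.linalg.det(g[np.ix_(keep, keep)])
        if minor > tol:
            types.append("finite")
        elif minor < -tol:
            types.append("hyperideal (truncated)")
        else:
            types.append("ideal")
    return types


def lobachevsky(theta: float, terms: int = 200_000) -> float:
    """Lobachevsky function via its Fourier series 0.5 * sum_{n>=1} sin(2 n theta) / n^2."""
    n = np.arange(1, terms + 1)
    return float(0.5 * np.sum(np.sin(2 * n * theta) / n**2))


def uniform_angles(theta: float) -> np.ndarray:
    """Hypothetical helper: a tetrahedron whose six dihedral angles all equal theta."""
    return np.full((4, 4), theta)  # the diagonal is ignored by gram_matrix


print(vertex_types(uniform_angles(np.radians(65))))  # all "finite"
print(vertex_types(uniform_angles(np.pi / 3)))       # all "ideal"
print(vertex_types(uniform_angles(np.pi / 6)))       # all "hyperideal (truncated)"
print(3 * lobachevsky(np.pi / 3))  # Milnor: regular ideal tetrahedron, approx. 1.01494
```

The ideal case sits exactly on the boundary between the other two, which is why a tolerance is used when testing the sign of the minor.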
Electric scooter usage grew sharply in cities around the world in the two years after the first electric scooter sharing service launched in September 2017. Motivated by this growth, I investigated whether electric scooter sharing services have a significant impact on transit ridership. Eleven months of data were collected from the government of Louisville, KY, the only city in the United States that publicly shares its electric scooter usage data. To test the relation between electric scooter usage and transit ridership, I compared parametric models (linear models with various regularizations) and non-parametric models (a regression tree and a random forest). The random forest fitted better than the linear models, with a lower residual standard error (14%), indicating a non-linear relation. For the details, see here.
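A minimal scikit-learn sketch of this kind of comparison follows. It runs on synthetic stand-in data, not the Louisville data, so its numbers say nothing about the actual study. The predictors (daily scooter trips, temperature), hyperparameters, and the helper name `compare_models` are assumptions. The degrees-of-freedom correction in the residual standard error is defined for linear models, so the sketch instead uses held-out root-mean-squared error as a common yardstick for all five models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor


def compare_models(x, y, seed=0):
    """Fit parametric and non-parametric regressors; return held-out RMSE per model."""
    x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=seed)
    models = {
        # Parametric: linear models with no, L2, and L1 regularization (standardized features).
        "ols": make_pipeline(StandardScaler(), LinearRegression()),
        "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        "lasso": make_pipeline(StandardScaler(), Lasso(alpha=1.0)),
        # Non-parametric: a single regression tree and a random forest.
        "tree": DecisionTreeRegressor(max_depth=5, random_state=seed),
        "forest": RandomForestRegressor(n_estimators=200, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(x_tr, y_tr)
        scores[name] = float(np.sqrt(mean_squared_error(y_te, model.predict(x_te))))
    return scores


# Synthetic stand-in data (NOT the Louisville data): about eleven months of daily records.
rng = np.random.default_rng(0)
n_days = 330
scooter_trips = rng.uniform(0, 3000, n_days)
temperature = rng.uniform(-5, 35, n_days)
x = np.column_stack([scooter_trips, temperature])
# Ridership depends non-linearly on both predictors, plus noise.
y = (20000 - 1500 * np.tanh(scooter_trips / 1000)
     + 4 * (temperature - 15) ** 2 + rng.normal(0, 300, n_days))

scores = compare_models(x, y)
for name, rmse in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:>6}: held-out RMSE = {rmse:.1f}")
```

Standardizing features inside a pipeline keeps the ridge and lasso penalties comparable across predictors measured on different scales; the tree-based models do not need it.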
Music generation is a well-known application of generative adversarial networks (GANs). Previous work by Olof Mogren proposed a GAN-based algorithm for continuous sequential data such as music. However, the performance of this music-generating algorithm was found to vary with the training set: songs generated by a model trained on music with multiple instruments playing at once, such as a piano concerto, were less similar to the original classical music than songs generated by a model trained on music with a single melody line, such as a piano sonata. As part of the course project, students evaluated how similar these newly generated songs are to existing classical music. For more information, see here.
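The text does not specify how similarity was measured. As one simple, hypothetical possibility, the sketch below compares the pitch-class histograms of a generated piece and a reference piece, each given as a list of MIDI pitch numbers, using cosine similarity. All names and example pitches are illustrative.

```python
from collections import Counter
from math import sqrt


def pitch_class_histogram(midi_pitches):
    """Count how often each of the 12 pitch classes (C, C#, ..., B) occurs."""
    counts = Counter(p % 12 for p in midi_pitches)
    return [counts.get(pc, 0) for pc in range(12)]


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pitch_similarity(generated, reference):
    """Similarity in [0, 1] between the pitch-class distributions of two pieces."""
    return cosine_similarity(pitch_class_histogram(generated), pitch_class_histogram(reference))


c_major = [60, 62, 64, 65, 67, 69, 71, 72]
print(pitch_similarity(c_major, [p + 12 for p in c_major]))  # same scale an octave up: ~1.0
print(pitch_similarity([61, 63, 66, 68, 70], c_major))       # black keys only: 0.0
```

This measure ignores rhythm, note order, and octave, so it can serve only as a coarse first check alongside listening tests or sequence-level metrics.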
As a researcher on the Knowledge Learning team of the Machine Learning Lab, I carried out a project to embed knowledge graphs (KGs) in hyperbolic space. While working with various hyperbolic KG embedding models, I found that most distance-based models are limited in embedding n-to-n relations. I proposed a new model that uses orthogonal projection to overcome this weakness of previous hyperbolic embedding models and better represent n-to-n relations.
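Why projection helps with n-to-n relations is easiest to see in the Euclidean case. The sketch below contrasts a TransE-style translation score with a TransH-style score (Wang et al., 2014) that first projects entities onto a relation-specific hyperplane. These are standard Euclidean models swapped in for illustration, not the hyperbolic model described here, and the vectors are made up.

```python
import numpy as np


def transe_score(h, r, t):
    """TransE-style distance ||h + r - t|| (lower means a more plausible triple)."""
    return float(np.linalg.norm(h + r - t))


def project(v, w):
    """Orthogonal projection of v onto the hyperplane with unit normal w."""
    return v - np.dot(w, v) * w


def transh_score(h, d, t, w):
    """TransH-style distance: translate by d after projecting h and t onto the hyperplane."""
    return float(np.linalg.norm(project(h, w) + d - project(t, w)))


# A one-to-many relation: one head entity, two different tail entities.
h = np.array([0.0, 0.0, 0.0])
t1 = np.array([1.0, 0.0, 0.5])
t2 = np.array([1.0, 0.0, -0.5])
w = np.array([0.0, 0.0, 1.0])  # unit normal of the relation's hyperplane
r = np.array([1.0, 0.0, 0.0])  # translation vector

print(transe_score(h, r, t1), transe_score(h, r, t2))        # 0.5 0.5
print(transh_score(h, r, t1, w), transh_score(h, r, t2, w))  # 0.0 0.0
```

By the triangle inequality, the two TransE distances sum to at least ||t1 - t2||, so both tails can be exact matches only if t1 equals t2. After projection, the tails differ only along `w`, which the score ignores, so both triples fit perfectly.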
I worked as a data scientist on the Digital Innovation team, where my primary role was to find better algorithms for predicting the debt of each branch using statistical and machine learning techniques. I designed a classification algorithm with high recall to detect as many potential high-risk groups as possible. Using multi-layer RNN-based networks, I improved the algorithm's efficiency from 75% to 91%.
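One common way to obtain a high-recall classifier is to keep the model's risk scores and pick the decision threshold that reaches a target recall. This approach is an assumption, since the original does not describe the exact method. The sketch below uses made-up scores and labels; `threshold_for_recall` and `recall_precision` are hypothetical helpers, and the scores could come from any model, such as an RNN-based network.

```python
import math


def threshold_for_recall(scores, labels, target_recall):
    """Highest threshold whose positive predictions (score >= threshold) reach target recall."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive example")
    # Number of positives that must be caught; the small epsilon guards against
    # float artifacts such as 0.7 * 10 == 7.000000000000001.
    needed = max(1, math.ceil(target_recall * len(positives) - 1e-9))
    return positives[min(needed, len(positives)) - 1]


def recall_precision(scores, labels, threshold):
    """Recall and precision when every item with score >= threshold is flagged as high-risk."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return tp / (tp + fn), (tp / (tp + fp) if tp + fp else 0.0)


# Made-up risk scores from some classifier (label 1 = high-risk branch).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 1, 0, 1, 0, 0]
t = threshold_for_recall(scores, labels, 0.75)
print(t, recall_precision(scores, labels, t))  # 0.6 (0.75, 0.75)
```

Raising the target recall lowers the threshold and usually costs precision. In a risk-screening setting, a missed high-risk branch is typically costlier than a false alarm, which motivates fixing recall first.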