Today, workloads in data centers and high-performance computing are increasingly dominated by data and graph analytics applications aided by artificial intelligence (AI) and machine learning (ML). Typically, these data-intensive applications are handled by GPU-based artificial neural networks (ANNs) built on a traditional von Neumann computing architecture. A majority of the energy consumption in these ANNs comes from multiply-and-accumulate (MAC) operations and data movement, and the von Neumann bottleneck fundamentally limits the speed and energy efficiency of data transfer between memory and the processor. Optical neural networks (ONNs) offer a path around these limits, but three main challenges currently prevent ONNs from achieving performance competitive with their digital counterparts: scalability, the lack of a device platform that can integrate all of the necessary photonic devices, and the absence of on-chip optical memory.

My research addresses new ways to create scalable, ultra-energy-efficient photonic devices with co-integrated non-volatile optical memory [1]–[7]. Tensor-train decomposition enables 79× fewer components while maintaining >95% accuracy on MNIST classification tasks [8] (illustrated in the sketch below). Our heterogeneous III-V/Si platform provides a way to instantiate all of the needed functionalities, such as optical neurons, electrical neurons, light emitters, memory, and synaptic interconnections, on silicon. Optical weighting is performed with our III-V/Si MOSCAP phase shifters, which yield >1,000,000× lower power consumption than standard approaches [1], [9], [10]. Recently, we have also discovered a way to co-integrate memristive or charge-trap memory with the photonic MAC process, thus mitigating the von Neumann bottleneck and reducing power consumption and latency when running computational tasks [1]–[7].

I helped lead the proposal for this work, which is actively funded by the DARPA Microsystems Technology Office. I plan to maintain active collaborations with DARPA and HPE for follow-up funding, design, and fabrication runs.
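To make the parameter-count reduction from tensor-train (TT) decomposition concrete, the minimal sketch below factors a hypothetical 784 × 1024 dense layer (MNIST-sized input) into two TT cores and compares parameter counts. The mode shapes, TT rank, and resulting ~56× compression are illustrative assumptions only; they do not reproduce the specific 79× configuration reported in [8].

```python
import numpy as np

# Hypothetical dense layer: a 784 x 1024 weight matrix viewed as a 4-way
# tensor with input modes (28, 28) and output modes (32, 32), factored into
# two tensor-train (TT) cores. Shapes and rank are assumptions, not the
# configuration used in [8].
in_modes = (28, 28)    # 28 * 28 = 784  (MNIST input)
out_modes = (32, 32)   # 32 * 32 = 1024 (hidden width)
tt_rank = 8            # assumed internal TT rank

# Dense parameter count: the full weight matrix.
dense_params = int(np.prod(in_modes) * np.prod(out_modes))

# TT parameter count: one core per (input mode, output mode) pair, each of
# shape (rank_left, in_mode, out_mode, rank_right); boundary ranks are 1.
ranks = (1, tt_rank, 1)
tt_params = sum(
    ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
    for k in range(len(in_modes))
)

print(f"dense parameters : {dense_params}")                    # 802816
print(f"TT parameters    : {tt_params}")                       # 14336
print(f"compression      : {dense_params / tt_params:.0f}x")   # ~56x
```

Larger compression factors, such as the 79× in [8], follow from the same counting argument with different mode factorizations and ranks; the trade-off is that lower ranks shrink the component count but constrain the expressiveness of the reconstructed weight matrix.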