The SARS-CoV-2 pandemic has created a global race for a cure. One approach focuses on designing a novel variant of the human angiotensin-converting enzyme 2 (ACE2) that binds more tightly to the SARS-CoV-2 spike protein and diverts it from human cells. In this paper, we present a protein design framework formulated as a reinforcement learning task. It combines a fast, biologically grounded reward function with a sequential action-space formulation to generate candidate proteins efficiently. Using full-scale molecular dynamics simulations, we confirm that the resulting protein complexes are more stable than the native human ACE2/SARS-CoV-2 complex. Our results suggest that combining novel protein design methods with modern reinforcement learning principles is a viable path toward a COVID-19 cure and could also accelerate the design of analogous therapeutics targeting other diseases.