Ultrasound computed tomography (USCT) is an emerging imaging modality that holds great promise for breast imaging. Full-waveform inversion (FWI)-based image reconstruction methods incorporate accurate wave physics to produce high-spatial-resolution quantitative images of the speed of sound or other acoustic properties of breast tissue. However, FWI reconstruction is computationally expensive, which limits its use in clinical settings. This contribution investigates the use of a convolutional neural network (CNN) to learn a mapping from USCT data to speed-of-sound estimates. The CNN was trained in a supervised manner using a large set of anatomically and physiologically realistic numerical breast phantoms (NBPs) and the corresponding simulated USCT measurements. Once trained, the CNN can be evaluated for real-time image reconstruction from USCT data. The performance of the proposed method was assessed and compared against FWI using a hold-out sample of 41 NBPs and corresponding USCT images. Accuracy was measured using relative mean square error (RMSE) and the structural similarity index measure (SSIM). This numerical experiment demonstrates that a supervised learning model can achieve accuracy comparable to FWI while significantly reducing computational time and memory requirements.
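The two accuracy metrics named above can be illustrated with a minimal sketch. The exact definitions used in this work are not stated here, so the code assumes that the relative mean square error is the squared error normalized by the squared norm of the reference image, and it uses a simplified single-window (global) form of SSIM with the conventional constants K1 = 0.01 and K2 = 0.03 rather than the usual locally windowed version. The function names `relative_mse` and `global_ssim`, the flattened-list image representation, and the sample values are hypothetical and chosen for illustration only.

```python
from statistics import fmean


def relative_mse(estimate, reference):
    """Squared error normalized by the squared norm of the reference.

    Assumed definition: sum((x_hat - x)^2) / sum(x^2).
    """
    num = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    den = sum(r ** 2 for r in reference)
    return num / den


def global_ssim(estimate, reference, data_range):
    """Single-window SSIM computed over the whole image.

    Simplification: standard SSIM averages this quantity over local
    windows; here one window covers all pixels.
    """
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    n = len(reference)
    mu_x, mu_y = fmean(estimate), fmean(reference)
    var_x = sum((e - mu_x) ** 2 for e in estimate) / n
    var_y = sum((r - mu_y) ** 2 for r in reference) / n
    cov = sum((e - mu_x) * (r - mu_y) for e, r in zip(estimate, reference)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


# Hypothetical flattened speed-of-sound maps in m/s (illustrative values only).
reference = [1400.0, 1500.0, 1600.0, 1500.0]
estimate = [1410.0, 1500.0, 1590.0, 1500.0]

rel_err = relative_mse(estimate, reference)
ssim_value = global_ssim(estimate, reference, data_range=200.0)
```

A perfect reconstruction gives a relative error of 0 and an SSIM of 1, so lower error and higher SSIM both indicate closer agreement with the reference phantom. For full-size images, an array library and a windowed SSIM implementation would be the more typical choice.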