Recent years have seen growing interest in the design and development of Knowledge-Aware Recommender Systems (KARSs). This is mainly due to their ability to encode and exploit several data sources, both structured (such as knowledge graphs) and unstructured (such as plain text). Many state-of-the-art KARSs now rely on deep learning, which enables them to exploit large amounts of information, including knowledge graphs (KGs), user reviews, plain text, and multimedia content (pictures, audio, videos). In my Ph.D., I will follow this research trend and study techniques for designing KARSs that leverage representations learnt from multi-modal information sources, in order to provide users with fair, accurate, and explainable recommendations.