In this paper, we present a strategy to provide users with knowledge-aware recommendations based on the combination of graph neural networks and sentence encoders. In particular, our approach relies on the intuition that different data sources (i.e., structured data available in a knowledge graph and unstructured data, such as textual content) provide complementary information and can equally contribute to learning an accurate item representation. Accordingly, we first exploited graph neural networks to encode both collaborative features, such as the interactions between users and items, and structured properties of the items. Next, we used a transformer-based sentence encoder to learn a representation of the textual content describing the items. Finally, these embeddings are combined through a deep neural network in which both self-attention and cross-attention mechanisms learn the relationships between the initial embeddings and further refine the item representation. The network outputs a prediction of each user's interest in the items, which is used to build a top-k recommendation list. In the experimental evaluation, we assessed our approach on two datasets, and the results show that it outperforms several competitive baselines.
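To make the fusion step concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it assumes pre-computed graph-based and text-based item embeddings of equal dimensionality, refines them with self-attention and cross-attention, and scores each user-item pair with a small feed-forward head. All module names and dimensions here are illustrative assumptions.

```python
# Hypothetical sketch of attention-based embedding fusion for scoring
# user-item pairs; not the paper's actual architecture or hyperparameters.
import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, user_emb, graph_emb, text_emb):
        # Treat the two item views (graph-based, text-based) as a
        # length-2 sequence so self-attention can relate them.
        views = torch.stack([graph_emb, text_emb], dim=1)   # (B, 2, d)
        refined, _ = self.self_attn(views, views, views)
        # Cross-attention: the refined views attend back to the
        # original embeddings to further adjust the representation.
        crossed, _ = self.cross_attn(refined, views, views)
        item_emb = crossed.mean(dim=1)                       # (B, d)
        # Predicted interest score for each (user, item) pair.
        scores = self.scorer(torch.cat([user_emb, item_emb], dim=-1))
        return scores.squeeze(-1)                            # (B,)

fusion = AttentiveFusion(dim=64)
user = torch.randn(8, 64)    # one user embedding per pair in the batch
graph = torch.randn(8, 64)   # item embedding from the graph neural network
text = torch.randn(8, 64)    # item embedding from the sentence encoder
scores = fusion(user, graph, text)
```

In practice, the resulting scores would be sorted per user to produce the top-k recommendation list described above.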