A PageRank-based collaborative filtering recommendation approach in digital libraries
In the current era of big data, the explosive growth of digital resources in Digital Libraries (DLs) has led to a serious information overload problem. This trend demands personalized recommendation approaches that provide DL users with digital resources specific to their individual needs. Most recommender systems employ Collaborative Filtering (CF) technology, which simulates the word-of-mouth effect, to recommend highly personalized items to users by aggregating and evaluating the usage data of other similar users or items. However, some issues still impede the deployment of CF techniques in large-scale DL systems:
• Data sparsity. A DL user is typically reluctant to rate a digital resource after downloading or borrowing it, because reading the resource, e.g., a professional book, often takes so long that the user forgets to rate it. Although the DL user's interests can also be inferred from the user's behaviors, e.g., downloading, commenting on or clicking a digital resource, the number of digital resources grows exponentially while the number of DL users grows only linearly. As a result, the historical usage data, which are needed to identify similar users or digital resources during CF, become ever sparser.
• Cold start. Owing to the dynamic nature of the DL's social network, many newly joined DL users and newly published digital resources have no historical usage data that can be used to identify similar users or digital resources during CF.
The above limitations call for an approach that relies not only on the analysis of explicit user ratings or user behaviors, but also on an additional source of knowledge (e.g., a social network) as a complement to the sparse historical usage data. Therefore, in this paper we present a personalized digital resource recommendation approach that combines PageRank and CF techniques in a unified framework. It recommends the right digital resources to an active user by generating and analyzing a social network of both user relationships and resource relationships from historical usage data. Our novel idea is to adapt the personalized PageRank algorithm to propagate the importance of an influential DL user or digital resource along the associative links connecting the active user and his/her initially preferred resources, aiming to alleviate the issues of unstable and sparse historical usage data that hinder the use of traditional CF techniques in DLs.
We further evaluate the performance of the proposed methodology in a case study that compares it with the traditional CF technique operating on the same historical usage data from a DL.
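The propagation idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes a simplified undirected graph that links each user directly to the resources he/she has used (the paper builds user-user and resource-resource relationships), and the names personalized_pagerank, recommend, the damping factor of 0.85 and the toy usage data are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration personalized PageRank on an undirected graph.

    edges: iterable of (node_a, node_b) pairs.
    seeds: nodes that receive the restart probability (assumed to appear in edges).
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = list(neighbors)

    # The restart vector is uniform over the seed nodes and zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)

    for _ in range(iterations):
        # Each node keeps a (1 - damping) share of its restart mass ...
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        # ... and passes the damped part of its rank equally to its neighbors.
        for n in nodes:
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                new_rank[m] += share
        rank = new_rank
    return rank


def recommend(usage, active_user, top_k=3):
    """Rank resources the active user has not used yet (illustrative only).

    usage: list of (user_id, resource_id) pairs; the two ID sets are assumed disjoint.
    """
    preferred = {r for u, r in usage if u == active_user}
    # Seeds: the active user and his/her initially preferred resources.
    seeds = {active_user} | preferred
    scores = personalized_pagerank(usage, seeds)
    candidates = {r for _, r in usage} - preferred
    return sorted(candidates, key=lambda r: scores[r], reverse=True)[:top_k]


# Hypothetical usage records (user, resource), e.g. taken from download logs.
usage = [
    ("u1", "r1"), ("u1", "r2"),
    ("u2", "r1"), ("u2", "r2"), ("u2", "r3"),
    ("u3", "r3"), ("u3", "r4"),
]
recommendations = recommend(usage, "u1")
print(recommendations)
```

In this toy data, r3 ranks above r4 for user u1 because r3 is reached through u2, who shares two resources with u1, whereas r4 is reachable only through the more distant u3. Importance thus flows along indirect links even though u1 and u3 have no resource in common, which is the case where neighborhood-based CF on sparse data finds no overlap.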
- 1) Computer science; AI; robotics
- 3) Experimental epistemology; constructivism; philosophy of science