"he change in the min_k parameter from 1 to 3 in the predictions has a significant impact on how estimated ratings are computed and subsequently affects the performance of the recommendation system. Let's delve into this transition and its implications.\n",
"The change in the min_k parameter from 1 to 3 in the predictions has a significant impact on how estimated ratings are computed and subsequently affects the performance of the recommendation system. Let's delve into this transition and its implications.\n",
"\n",
"\n",
"Initially, with min_k = 1, predictions are generated even if only a single similar user (neighbor) has rated a particular item. This approach can lead to predictions that might not accurately represent the item's true rating, especially if the rating from the sole available neighbor is an outlier or not representative of the broader user preferences.\n",
"Initially, with min_k = 1, predictions are generated even if only a single similar user (neighbor) has rated a particular item. This approach can lead to predictions that might not accurately represent the item's true rating, especially if the rating from the sole available neighbor is an outlier or not representative of the broader user preferences.\n",
"\n",
"\n",
...
@@ -262,7 +270,7 @@
...
@@ -262,7 +270,7 @@
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 5,
"execution_count": 95,
"id": "cc806424",
"id": "cc806424",
"metadata": {},
"metadata": {},
"outputs": [
"outputs": [
...
@@ -443,7 +451,7 @@
...
@@ -443,7 +451,7 @@
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 6,
"execution_count": 96,
"id": "d03ed9eb",
"id": "d03ed9eb",
"metadata": {},
"metadata": {},
"outputs": [
"outputs": [
...
@@ -476,6 +484,7 @@
...
@@ -476,6 +484,7 @@
" self.k = k\n",
" self.k = k\n",
" self.min_k = min_k\n",
" self.min_k = min_k\n",
" self.sim_options = sim_options\n",
" self.sim_options = sim_options\n",
" \n",
"\n",
"\n",
" def fit(self, trainset):\n",
" def fit(self, trainset):\n",
" \"\"\"\n",
" \"\"\"\n",
...
@@ -508,10 +517,10 @@
...
@@ -508,10 +517,10 @@
"\n",
"\n",
" # Step 1: Create the peer group of user u for item i\n",
" # Step 1: Create the peer group of user u for item i\n",
" peer_group = []\n",
" peer_group = []\n",
" for j, rating in enumerate(self.trainset.ir[i]):\n",
" for neighbor_inner_id, rating in enumerate(self.trainset.ir[i]):\n",
" if rating is not None:\n",
" if rating is not None:\n",
" similarity = self.sim[u, j] # Similarity between user u and user j for item i\n",
" similarity = self.sim[u, neighbor_inner_id] # Similarity between user u and user j for item i\n",
" # Step 2: Pick up the top neighbors efficiently\n",
" # Step 2: Pick up the top neighbors efficiently\n",
" k_neighbors = heapq.nlargest(self.min_k, peer_group, key=lambda x: x[1]) # Top k neighbors based on similarity\n",
" k_neighbors = heapq.nlargest(self.min_k, peer_group, key=lambda x: x[1]) # Top k neighbors based on similarity\n",
...
@@ -608,7 +617,7 @@
...
@@ -608,7 +617,7 @@
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 7,
"execution_count": 97,
"id": "7a9147ea",
"id": "7a9147ea",
"metadata": {},
"metadata": {},
"outputs": [
"outputs": [
...
@@ -719,7 +728,7 @@
...
@@ -719,7 +728,7 @@
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 8,
"execution_count": 98,
"id": "be53ae27",
"id": "be53ae27",
"metadata": {},
"metadata": {},
"outputs": [
"outputs": [
...
@@ -775,7 +784,7 @@
...
@@ -775,7 +784,7 @@
},
},
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": 9,
"execution_count": 99,
"id": "c20d8e19",
"id": "c20d8e19",
"metadata": {},
"metadata": {},
"outputs": [
"outputs": [
...
...
%% Cell type:markdown id:f4a8f664 tags:
# Custom User-based Model
This notebook aims to create a UserBased class that inherits from the AlgoBase class (surprise package) and can be customized with various similarity metrics, peer groups, and score aggregation functions.
%% Cell type:code id:00d1b249 tags:
``` python
# reloads modules automatically before entering the execution of code
```
...
Changing the min_k parameter from 1 to 3 has a significant impact on how estimated ratings are computed and, in turn, on the performance of the recommendation system.
With min_k = 1, a prediction is generated even if only a single similar user (neighbor) has rated the item. Such a prediction may not reflect the item's true rating, especially if the sole neighbor's rating is an outlier or is unrepresentative of broader user preferences.
For example, consider User 11's predictions for items 1214 and 364. With min_k = 1, the predictions were 3.604 and 2.492, respectively. With min_k = 3, both predictions become 3.166: the first moves down and the second moves up. The identical value for both items suggests that neither item had at least 3 usable neighbors, so the model fell back to a default estimate (such as the user's mean rating) rather than a neighbor-based one.
Similarly, for User 23 and items 1997 and 2700, moving from min_k = 1 to min_k = 3 changes the predictions from 2.782 and 2.349 to 2.5625 for both items. Here too, the initial predictions appear to have rested on too few neighbors to satisfy the stricter threshold.
The rationale lies in the nature of the min_k parameter. With min_k = 3, at least 3 similar users must have rated an item before a neighbor-based prediction is made. The aim is to provide more stable predictions by relying on a broader consensus among users with similar preferences.
By enforcing a higher min_k, the system estimates ratings more cautiously, particularly for items with sparse or potentially biased rating data. This helps mitigate the impact of outliers and sparse data, leading to more consistent predictions overall.
In summary, raising min_k from 1 to 3 shifts the system towards more conservative estimates of item ratings. It aims to improve the robustness of the predictions by requiring a broader consensus among similar users before a neighbor-based prediction is made.
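The effect of min_k described above can be illustrated with a minimal sketch. It is not the notebook's implementation: the function `predict_rating`, its `fallback` argument, and the sample data are hypothetical, and a similarity-weighted average is assumed as the aggregation function.

``` python
def predict_rating(neighbors, k, min_k, fallback):
    """Estimate a rating from (similarity, rating) pairs.

    neighbors: (similarity, rating) pairs for users who rated the item.
    k: maximum number of neighbors to use.
    min_k: minimum number of usable neighbors required.
    fallback: estimate returned when fewer than min_k neighbors are usable
              (assumption: for example the user's mean rating).
    Returns (estimate, actual_k).
    """
    # Keep only positively similar neighbors, most similar first, at most k of them.
    usable = sorted((n for n in neighbors if n[0] > 0), reverse=True)[:k]
    actual_k = len(usable)
    if actual_k < min_k:
        # Not enough neighbors: do not trust them, use the fallback instead.
        return fallback, actual_k
    total_sim = sum(sim for sim, _ in usable)
    estimate = sum(sim * rating for sim, rating in usable) / total_sim
    return estimate, actual_k


# Hypothetical data: two neighbors rated the item; the fallback value is 3.5.
neighbors = [(0.5, 4.0), (0.25, 1.0)]
est_1, k_1 = predict_rating(neighbors, k=3, min_k=1, fallback=3.5)
est_3, k_3 = predict_rating(neighbors, k=3, min_k=3, fallback=3.5)
print(est_1, k_1)  # neighbor-based estimate
print(est_3, k_3)  # fallback estimate, same actual_k
```

With min_k = 3, every item that has fewer than three usable neighbors receives the same fallback value, which is the pattern seen for Users 11 and 23.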
%% Cell type:code id:cc806424 tags:
``` python
def analyse_min_support(knn_model, testset):
    # Reset min_k to 2
    knn_model.min_k = 2
    # Vary min_support from 1 to 3 and observe actual_k
    for min_support in range(1, 4):
        knn_model.sim_options['min_support'] = min_support
        predictions_min_support = knn_model.test(testset[:30])  # Take the first 30 predictions for display
        print(f"\nPredictions with min_support = {min_support}:")
    similarity_matrix = knn_model.sim  # Similarity matrix of knn_model
    print("\nSimilarity Matrix:")
    return similarity_matrix


# Call the function and print the analysis
result = analyse_min_support(knn_model, testset)
print(result)
```
%% Output
Predictions with min_support = 1:
User: 11, Item: 1214, Actual_k: 1
User: 11, Item: 364, Actual_k: 2
User: 11, Item: 4308, Actual_k: 1
User: 11, Item: 527, Actual_k: 2
User: 13, Item: 1997, Actual_k: 0
User: 13, Item: 4993, Actual_k: 1
User: 13, Item: 2700, Actual_k: 0
User: 13, Item: 1721, Actual_k: 1
User: 13, Item: 527, Actual_k: 1
User: 17, Item: 2028, Actual_k: 2
User: 17, Item: 4993, Actual_k: 2
User: 17, Item: 1214, Actual_k: 1
User: 17, Item: 4308, Actual_k: 1
User: 19, Item: 1997, Actual_k: 0
User: 19, Item: 2028, Actual_k: 0
User: 19, Item: 4993, Actual_k: 0
User: 19, Item: 5952, Actual_k: 0
User: 19, Item: 2700, Actual_k: 0
User: 19, Item: 1721, Actual_k: 0
User: 19, Item: 1214, Actual_k: 0
User: 19, Item: 364, Actual_k: 0
User: 23, Item: 1997, Actual_k: 2
User: 23, Item: 2700, Actual_k: 2
User: 27, Item: 1997, Actual_k: 0
User: 27, Item: 2028, Actual_k: 1
User: 27, Item: 5952, Actual_k: 1
User: 27, Item: 2700, Actual_k: 0
User: 27, Item: 1721, Actual_k: 1
User: 27, Item: 364, Actual_k: 1
User: 27, Item: 4308, Actual_k: 1
Predictions with min_support = 2:
User: 11, Item: 1214, Actual_k: 1
User: 11, Item: 364, Actual_k: 2
User: 11, Item: 4308, Actual_k: 1
User: 11, Item: 527, Actual_k: 2
User: 13, Item: 1997, Actual_k: 0
User: 13, Item: 4993, Actual_k: 1
User: 13, Item: 2700, Actual_k: 0
User: 13, Item: 1721, Actual_k: 1
User: 13, Item: 527, Actual_k: 1
User: 17, Item: 2028, Actual_k: 2
User: 17, Item: 4993, Actual_k: 2
User: 17, Item: 1214, Actual_k: 1
User: 17, Item: 4308, Actual_k: 1
User: 19, Item: 1997, Actual_k: 0
User: 19, Item: 2028, Actual_k: 0
User: 19, Item: 4993, Actual_k: 0
User: 19, Item: 5952, Actual_k: 0
User: 19, Item: 2700, Actual_k: 0
User: 19, Item: 1721, Actual_k: 0
User: 19, Item: 1214, Actual_k: 0
User: 19, Item: 364, Actual_k: 0
User: 23, Item: 1997, Actual_k: 2
User: 23, Item: 2700, Actual_k: 2
User: 27, Item: 1997, Actual_k: 0
User: 27, Item: 2028, Actual_k: 1
User: 27, Item: 5952, Actual_k: 1
User: 27, Item: 2700, Actual_k: 0
User: 27, Item: 1721, Actual_k: 1
User: 27, Item: 364, Actual_k: 1
User: 27, Item: 4308, Actual_k: 1
Predictions with min_support = 3:
User: 11, Item: 1214, Actual_k: 1
User: 11, Item: 364, Actual_k: 2
User: 11, Item: 4308, Actual_k: 1
User: 11, Item: 527, Actual_k: 2
User: 13, Item: 1997, Actual_k: 0
User: 13, Item: 4993, Actual_k: 1
User: 13, Item: 2700, Actual_k: 0
User: 13, Item: 1721, Actual_k: 1
User: 13, Item: 527, Actual_k: 1
User: 17, Item: 2028, Actual_k: 2
User: 17, Item: 4993, Actual_k: 2
User: 17, Item: 1214, Actual_k: 1
User: 17, Item: 4308, Actual_k: 1
User: 19, Item: 1997, Actual_k: 0
User: 19, Item: 2028, Actual_k: 0
User: 19, Item: 4993, Actual_k: 0
User: 19, Item: 5952, Actual_k: 0
User: 19, Item: 2700, Actual_k: 0
User: 19, Item: 1721, Actual_k: 0
User: 19, Item: 1214, Actual_k: 0
User: 19, Item: 364, Actual_k: 0
User: 23, Item: 1997, Actual_k: 2
User: 23, Item: 2700, Actual_k: 2
User: 27, Item: 1997, Actual_k: 0
User: 27, Item: 2028, Actual_k: 1
User: 27, Item: 5952, Actual_k: 1
User: 27, Item: 2700, Actual_k: 0
User: 27, Item: 1721, Actual_k: 1
User: 27, Item: 364, Actual_k: 1
User: 27, Item: 4308, Actual_k: 1
Similarity Matrix:
[[1. 0. 0.24615385 0. 0.43243243 0. ]
[0. 1. 0. 0. 0.17094017 0. ]
[0.24615385 0. 1. 0. 0.53333333 0. ]
[0. 0. 0. 1. 0. 0. ]
[0.43243243 0.17094017 0.53333333 0. 1. 0.25 ]
[0. 0. 0. 0. 0.25 1. ]]
%% Cell type:markdown id:9fcc525d tags:
Predictions with min_support = 1:
The actual_k values vary across predictions. For User 11 and Item 1214, actual_k is 1, meaning that only one neighbor was used for the estimate. For User 11 and Item 364, actual_k is 2, meaning that two neighbors were used.
Predictions with min_support = 2 and min_support = 3:
Raising min_support to 2 or 3 leaves the actual_k values unchanged compared with min_support = 1: the number of neighbors used in each estimate is the same in all three runs. A likely reason is that the similarity matrix is computed when the model is fitted, so changing sim_options['min_support'] afterwards has no effect until the model is refitted.
Understanding actual_k:
actual_k is the number of neighbors (similar users) actually used to estimate the rating of a specific item for a given user. A higher actual_k means that more neighbors contributed to the prediction, which tends to make the estimate more robust.
The similarity matrix gives the similarity between every pair of users. Each entry is the similarity score between two users, and higher values indicate greater similarity. The entries on the main diagonal are 1 because each user is maximally similar to themselves.
The similarity matrix is central to the recommendation process: it identifies the users most similar to a given user, so that their ratings can be weighted accordingly to produce personalized predictions.
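As an illustration of how such a matrix is used to pick neighbors, the sketch below selects the most similar users from the similarity matrix printed in the output above. The helper `top_neighbors` is hypothetical, and the row and column indices are assumed to be inner user ids, whose mapping to raw ids such as 11 or 13 is not shown here.

``` python
import heapq

# Similarity matrix copied from the printed output (indices are assumed inner user ids).
sim = [
    [1.0, 0.0, 0.24615385, 0.0, 0.43243243, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.17094017, 0.0],
    [0.24615385, 0.0, 1.0, 0.0, 0.53333333, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.43243243, 0.17094017, 0.53333333, 0.0, 1.0, 0.25],
    [0.0, 0.0, 0.0, 0.0, 0.25, 1.0],
]


def top_neighbors(sim, u, k):
    """Return up to k (neighbor, similarity) pairs for user u, most similar first."""
    # Exclude the user themselves and anyone with zero similarity.
    candidates = [(v, s) for v, s in enumerate(sim[u]) if v != u and s > 0]
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


print(top_neighbors(sim, 4, 2))  # the two users most similar to user 4
print(top_neighbors(sim, 3, 2))  # user 3 has no positively similar user
```

A user whose row contains only zeros off the diagonal has no usable neighbors, so every prediction for that user would have an actual_k of 0.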
In summary, parameters like min_support control which neighbors are eligible to contribute to a prediction, while the similarity matrix captures the user similarities on which collaborative filtering-based recommendation systems depend.
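To make the role of min_support concrete, here is a minimal sketch of a similarity function that applies the threshold while the similarity is being computed. It assumes a mean squared difference (MSD) similarity of the form 1 / (msd + 1); the notebook does not state which measure it uses, and the function name and sample ratings are hypothetical.

``` python
def msd_similarity(ratings_u, ratings_v, min_support=1):
    """MSD-based similarity between two users given as {item: rating} dicts.

    Returns 0.0 when the users share fewer than min_support items.
    """
    common = ratings_u.keys() & ratings_v.keys()
    if not common or len(common) < min_support:
        return 0.0
    # Mean squared difference over the items both users rated.
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)


# Hypothetical users who have two rated items in common.
user_a = {"item_1": 4.0, "item_2": 3.0, "item_3": 5.0}
user_b = {"item_1": 5.0, "item_2": 3.0}

sim_support_1 = msd_similarity(user_a, user_b, min_support=1)
sim_support_3 = msd_similarity(user_a, user_b, min_support=3)
print(sim_support_1)  # positive: two common items are enough
print(sim_support_3)  # 0.0: fewer than three common items
```

Because the threshold acts when similarities are computed, it only changes predictions if the similarity matrix is rebuilt with the new value.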
%% Cell type:markdown id:2dd01f5b tags:
# 3. Implement and explore a customizable user-based algorithm
Create a self-made user-based algorithm whose similarity metric, peer group calculation, and aggregation function can be customized.