Commit 861461d5 authored by Adrien Payen

update the user_based 2,40

parent cd5b2ed2
%% Cell type:markdown id:f4a8f664 tags: %% Cell type:markdown id:f4a8f664 tags:
# Custom User-based Model
This notebook aims to create a UserBased class that inherits from the AlgoBase class (surprise package) and that can be customized with various similarity metrics, peer groups and score aggregation functions.
%% Cell type:code id:00d1b249 tags: %% Cell type:code id:00d1b249 tags:
``` python
# reloads modules automatically before entering the execution of code
%load_ext autoreload
%autoreload 2
# standard library imports
# -- add new imports here --
# third parties imports
import numpy as np
import pandas as pd
# -- add new imports here --
# local imports
from constants import Constant as C
from loaders import load_ratings, load_items
from surprise import KNNWithMeans, accuracy, AlgoBase, PredictionImpossible
import heapq
```
%% Output %% Output
--------------------------------------------------------------------------- The autoreload extension is already loaded. To reload it, use:
ImportError Traceback (most recent call last) %reload_ext autoreload
Cell In[1], line 14
10 import pandas as pd
11 # -- add new imports here --
12
13 # local imports
---> 14 from constants import Constant as C
15 from loaders import load_ratings,load_items
16 from surprise import KNNWithMeans, accuracy, AlgoBase, PredictionImpossible
ImportError: cannot import name 'Constant' from 'constants' (/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/constants.py)
%% Cell type:markdown id:22716aa3 tags: %% Cell type:markdown id:22716aa3 tags:
# 1. Loading Data
Prepare a dataset to help implement a user-based recommender system.
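To make the notion of an anti-testset concrete, here is a minimal pure-Python sketch. It assumes ratings are held in a plain dictionary keyed by (user, item); the `toy_ratings` data and the `build_anti_pairs` helper are hypothetical and are not part of Surprise. Surprise's own `build_anti_testset` follows the same idea: it returns every (user, item) pair that has no rating in the trainset, filled by default with the global mean rating.

```python
from itertools import product

# Hypothetical toy data: (raw user id, raw item id) -> rating
toy_ratings = {
    ("11", "364"): 4.0,
    ("11", "527"): 3.0,
    ("13", "364"): 2.0,
}


def build_anti_pairs(ratings):
    """Return (user, item, fill) for every pair that has no known rating."""
    users = sorted({u for u, _ in ratings})
    items = sorted({i for _, i in ratings})
    # Fill value: the mean of all known ratings
    global_mean = sum(ratings.values()) / len(ratings)
    return [
        (u, i, global_mean)
        for u, i in product(users, items)
        if (u, i) not in ratings
    ]


anti_pairs = build_anti_pairs(toy_ratings)
print(anti_pairs)
```

In this toy example only user 13 and item 527 form an unrated pair, so the anti-testset has a single entry.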
%% Cell type:code id:aafd1712 tags: %% Cell type:code id:aafd1712 tags:
``` python ``` python
# Create Surprise Dataset from the pandas DataFrame and Reader # Create Surprise Dataset from the pandas DataFrame and Reader
surprise_data = load_ratings(surprise_format=True) surprise_data = load_ratings(surprise_format=True)
trainset = surprise_data.build_full_trainset() trainset = surprise_data.build_full_trainset()
testset = trainset.build_anti_testset() testset = trainset.build_anti_testset()
sim_options = {
'name': 'msd', # Mean Squared Difference (Mean Square Error)
'user_based': True, # User-based collaborative filtering
'min_support': 3 # Minimum number of common ratings required
}
# Build an algorithm, and train it.
algo = KNNWithMeans(sim_options=sim_options, k=3, min_k=2)
algo.fit(trainset)
algo.test(testset)
uid = str(11) # raw user id (as in the ratings file). They are **strings**!
iid = str(364)
pred = algo.predict(uid, iid, r_ui=4, verbose=True)
``` ```
%% Output %% Output
Computing the msd similarity matrix... Computing the msd similarity matrix...
Done computing similarity matrix. Done computing similarity matrix.
user: 11 item: 364 r_ui = 4.00 est = 3.42 {'was_impossible': True, 'reason': 'User and/or item is unknown.'} user: 11 item: 364 r_ui = None est = 2.49 {'actual_k': 2, 'was_impossible': False}
%% Cell type:code id:cf3ccdc0 tags:
``` python
# -- load data, build trainset and anti testset --
# it depends on the tiny dataset
surprise_data = load_ratings(surprise_format=True)
df_movies = load_items()
# Assuming you have a pandas DataFrame named 'df' with columns ['user_id', 'item_id', 'rating']
# Build train set with all available ratings
trainset = surprise_data.build_full_trainset()
# Build anti-test set
testset = trainset.build_anti_testset()
```
%% Cell type:markdown id:94adf3a6 tags: %% Cell type:markdown id:94adf3a6 tags:
# 2. Explore Surprise's user-based algorithm
Display user-based predictions and the similarity matrix on the test dataset using the KNNWithMeans class.
%% Cell type:code id:e6fb78b7 tags: %% Cell type:code id:ce078b43 tags:
``` python ``` python
# -- using surprise's user-based algorithm, explore the impact of different parameters and displays predictions -- #User-based prediction for the user 11 and the item 364
# Define the similarity options
sim_options = { sim_options = {
'name': 'msd', # Mean Squared Difference (Mean Square Error) 'name': 'msd', # Mean Squared Difference (Mean Square Error)
'user_based': True, # User-based collaborative filtering 'user_based': True, # User-based collaborative filtering
'min_support': 3 # Minimum number of common ratings required 'min_support': 3 # Minimum number of common ratings required
} }
# Create an instance of KNNWithMeans with the specified options
knn_model = KNNWithMeans(k=3, min_k=2, sim_options=sim_options)
# Train the algorithm on the trainset # Build an algorithm, and train it.
knn_model.fit(trainset).test(testset) algo = KNNWithMeans(sim_options=sim_options, k=3, min_k=2)
algo.fit(trainset)
algo.test(testset)
# Make an estimation for user 11 and item 364
prediction = knn_model.predict('11', '364')
print(prediction.est)
```
%% Output uid = 11 # raw user id (as in the ratings file). They are **strings**!
iid = 364
Computing the msd similarity matrix... pred = algo.predict(uid, iid, verbose=True)
Done computing similarity matrix. ```
3.4190898791540785
%% Cell type:code id:ffe89c56 tags: %% Cell type:code id:ffe89c56 tags:
``` python
# Playing with KNN
# Define the similarity options
sim_options = {
    'name': 'msd',       # Mean Squared Difference (Mean Square Error)
    'user_based': True,  # User-based collaborative filtering
    'min_support': 3     # Minimum number of common ratings required
}


def predict_ratings(trainset, testset, min_k_values):
    for min_k in min_k_values:
        # Create an instance of KNNWithMeans with the specified options
        knn_model = KNNWithMeans(sim_options=sim_options, k=3, min_k=min_k)
        # Train the algorithm on the trainset
        knn_model.fit(trainset)
        # Make predictions for all ratings in the anti testset
        predictions = knn_model.test(testset)
        # Display 30 predictions
        print(f"Predictions with min_k = {min_k}:")
        for prediction in predictions[:30]:
            print(f"User: {prediction.uid}, Item: {prediction.iid}, Rating: {prediction.est}")


# Assuming trainset and testset are already defined
predict_ratings(trainset, testset, min_k_values=[1, 2, 3])
```
%% Output %% Output
Computing the msd similarity matrix... Computing the msd similarity matrix...
Done computing similarity matrix. Done computing similarity matrix.
Predictions with min_k = 1: Predictions with min_k = 1:
User: 15, Item: 942, Rating: 3.7769516356699464 User: 11, Item: 1214, Rating: 3.6041666666666665
User: 15, Item: 2117, Rating: 2.9340004894942537 User: 11, Item: 364, Rating: 2.49203431372549
User: 15, Item: 2672, Rating: 2.371008709611413 User: 11, Item: 4308, Rating: 1.6041666666666667
User: 15, Item: 5054, Rating: 3.010328638497653 User: 11, Item: 527, Rating: 3.898897058823529
User: 15, Item: 6322, Rating: 1.711175832857413 User: 13, Item: 1997, Rating: 2.8
User: 15, Item: 6323, Rating: 1.7645762379992287 User: 13, Item: 4993, Rating: 3.2375
User: 15, Item: 6757, Rating: 3.010328638497653 User: 13, Item: 2700, Rating: 2.8
User: 15, Item: 7700, Rating: 3.561484741491386 User: 13, Item: 1721, Rating: 1.2374999999999998
User: 15, Item: 7981, Rating: 3.386000174210522 User: 13, Item: 527, Rating: 3.2375
User: 15, Item: 8600, Rating: 3.320743223639117 User: 17, Item: 2028, Rating: 3.8125
User: 15, Item: 8620, Rating: 2.7538763809343654 User: 17, Item: 4993, Rating: 4.128289473684211
User: 15, Item: 31952, Rating: 3.7409900837647396 User: 17, Item: 1214, Rating: 3.6875
User: 15, Item: 3, Rating: 2.222062601579949 User: 17, Item: 4308, Rating: 1.6875
User: 15, Item: 64, Rating: 0.9224387353614938 User: 19, Item: 1997, Rating: 3.5
User: 15, Item: 206, Rating: 2.35668733389394 User: 19, Item: 2028, Rating: 3.5
User: 15, Item: 249, Rating: 3.1290259851652826 User: 19, Item: 4993, Rating: 3.5
User: 15, Item: 276, Rating: 2.1800017354806753 User: 19, Item: 5952, Rating: 3.5
User: 15, Item: 369, Rating: 2.3082373858282694 User: 19, Item: 2700, Rating: 3.5
User: 15, Item: 504, Rating: 2.2600496220227573 User: 19, Item: 1721, Rating: 3.5
User: 15, Item: 515, Rating: 3.6575674086958188 User: 19, Item: 1214, Rating: 3.5
User: 15, Item: 522, Rating: 2.4562020809509626 User: 19, Item: 364, Rating: 3.5
User: 15, Item: 580, Rating: 1.9073310817298395 User: 23, Item: 1997, Rating: 2.782649253731343
User: 15, Item: 599, Rating: 2.780847470837928 User: 23, Item: 2700, Rating: 2.349813432835821
User: 15, Item: 915, Rating: 2.761094249104645 User: 27, Item: 1997, Rating: 4.666666666666667
User: 15, Item: 966, Rating: 3.0894953051643195 User: 27, Item: 2028, Rating: 5.0
User: 15, Item: 1274, Rating: 2.9873500196382845 User: 27, Item: 5952, Rating: 5.0
User: 15, Item: 1299, Rating: 3.0779327239728005 User: 27, Item: 2700, Rating: 4.666666666666667
User: 15, Item: 1345, Rating: 2.2037629856623138 User: 27, Item: 1721, Rating: 3.104166666666667
User: 15, Item: 1354, Rating: 2.001877412379849 User: 27, Item: 364, Rating: 4.604166666666667
User: 15, Item: 532, Rating: 2.7123071345260277 User: 27, Item: 4308, Rating: 3.104166666666667
Computing the msd similarity matrix... Computing the msd similarity matrix...
Done computing similarity matrix. Done computing similarity matrix.
Predictions with min_k = 2: Predictions with min_k = 2:
User: 15, Item: 942, Rating: 3.7769516356699464 User: 11, Item: 1214, Rating: 3.1666666666666665
User: 15, Item: 2117, Rating: 2.9340004894942537 User: 11, Item: 364, Rating: 2.49203431372549
User: 15, Item: 2672, Rating: 2.371008709611413 User: 11, Item: 4308, Rating: 3.1666666666666665
User: 15, Item: 5054, Rating: 2.693661971830986 User: 11, Item: 527, Rating: 3.898897058823529
User: 15, Item: 6322, Rating: 1.711175832857413 User: 13, Item: 1997, Rating: 2.8
User: 15, Item: 6323, Rating: 1.7645762379992287 User: 13, Item: 4993, Rating: 2.8
User: 15, Item: 6757, Rating: 2.693661971830986 User: 13, Item: 2700, Rating: 2.8
User: 15, Item: 7700, Rating: 3.561484741491386 User: 13, Item: 1721, Rating: 2.8
User: 15, Item: 7981, Rating: 3.386000174210522 User: 13, Item: 527, Rating: 2.8
User: 15, Item: 8600, Rating: 3.320743223639117 User: 17, Item: 2028, Rating: 3.8125
User: 15, Item: 8620, Rating: 2.7538763809343654 User: 17, Item: 4993, Rating: 4.128289473684211
User: 15, Item: 31952, Rating: 3.7409900837647396 User: 17, Item: 1214, Rating: 3.25
User: 15, Item: 3, Rating: 2.222062601579949 User: 17, Item: 4308, Rating: 3.25
User: 15, Item: 64, Rating: 0.9224387353614938 User: 19, Item: 1997, Rating: 3.5
User: 15, Item: 206, Rating: 2.35668733389394 User: 19, Item: 2028, Rating: 3.5
User: 15, Item: 249, Rating: 3.1290259851652826 User: 19, Item: 4993, Rating: 3.5
User: 15, Item: 276, Rating: 2.1800017354806753 User: 19, Item: 5952, Rating: 3.5
User: 15, Item: 369, Rating: 2.3082373858282694 User: 19, Item: 2700, Rating: 3.5
User: 15, Item: 504, Rating: 2.2600496220227573 User: 19, Item: 1721, Rating: 3.5
User: 15, Item: 515, Rating: 3.6575674086958188 User: 19, Item: 1214, Rating: 3.5
User: 15, Item: 522, Rating: 2.4562020809509626 User: 19, Item: 364, Rating: 3.5
User: 15, Item: 580, Rating: 1.9073310817298395 User: 23, Item: 1997, Rating: 2.782649253731343
User: 15, Item: 599, Rating: 2.780847470837928 User: 23, Item: 2700, Rating: 2.349813432835821
User: 15, Item: 915, Rating: 2.761094249104645 User: 27, Item: 1997, Rating: 4.666666666666667
User: 15, Item: 966, Rating: 2.693661971830986 User: 27, Item: 2028, Rating: 4.666666666666667
User: 15, Item: 1274, Rating: 2.9873500196382845 User: 27, Item: 5952, Rating: 4.666666666666667
User: 15, Item: 1299, Rating: 3.0779327239728005 User: 27, Item: 2700, Rating: 4.666666666666667
User: 15, Item: 1345, Rating: 2.2037629856623138 User: 27, Item: 1721, Rating: 4.666666666666667
User: 15, Item: 1354, Rating: 2.001877412379849 User: 27, Item: 364, Rating: 4.666666666666667
User: 15, Item: 532, Rating: 2.7123071345260277 User: 27, Item: 4308, Rating: 4.666666666666667
Computing the msd similarity matrix... Computing the msd similarity matrix...
Done computing similarity matrix. Done computing similarity matrix.
Predictions with min_k = 3: Predictions with min_k = 3:
User: 15, Item: 942, Rating: 3.7769516356699464 User: 11, Item: 1214, Rating: 3.1666666666666665
User: 15, Item: 2117, Rating: 2.9340004894942537 User: 11, Item: 364, Rating: 3.1666666666666665
User: 15, Item: 2672, Rating: 2.371008709611413 User: 11, Item: 4308, Rating: 3.1666666666666665
User: 15, Item: 5054, Rating: 2.693661971830986 User: 11, Item: 527, Rating: 3.1666666666666665
User: 15, Item: 6322, Rating: 2.693661971830986 User: 13, Item: 1997, Rating: 2.8
User: 15, Item: 6323, Rating: 1.7645762379992287 User: 13, Item: 4993, Rating: 2.8
User: 15, Item: 6757, Rating: 2.693661971830986 User: 13, Item: 2700, Rating: 2.8
User: 15, Item: 7700, Rating: 2.693661971830986 User: 13, Item: 1721, Rating: 2.8
User: 15, Item: 7981, Rating: 3.386000174210522 User: 13, Item: 527, Rating: 2.8
User: 15, Item: 8600, Rating: 2.693661971830986 User: 17, Item: 2028, Rating: 3.25
User: 15, Item: 8620, Rating: 2.7538763809343654 User: 17, Item: 4993, Rating: 3.25
User: 15, Item: 31952, Rating: 2.693661971830986 User: 17, Item: 1214, Rating: 3.25
User: 15, Item: 3, Rating: 2.222062601579949 User: 17, Item: 4308, Rating: 3.25
User: 15, Item: 64, Rating: 0.9224387353614938 User: 19, Item: 1997, Rating: 3.5
User: 15, Item: 206, Rating: 2.35668733389394 User: 19, Item: 2028, Rating: 3.5
User: 15, Item: 249, Rating: 3.1290259851652826 User: 19, Item: 4993, Rating: 3.5
User: 15, Item: 276, Rating: 2.1800017354806753 User: 19, Item: 5952, Rating: 3.5
User: 15, Item: 369, Rating: 2.3082373858282694 User: 19, Item: 2700, Rating: 3.5
User: 15, Item: 504, Rating: 2.2600496220227573 User: 19, Item: 1721, Rating: 3.5
User: 15, Item: 515, Rating: 3.6575674086958188 User: 19, Item: 1214, Rating: 3.5
User: 15, Item: 522, Rating: 2.4562020809509626 User: 19, Item: 364, Rating: 3.5
User: 15, Item: 580, Rating: 1.9073310817298395 User: 23, Item: 1997, Rating: 2.5625
User: 15, Item: 599, Rating: 2.780847470837928 User: 23, Item: 2700, Rating: 2.5625
User: 15, Item: 915, Rating: 2.761094249104645 User: 27, Item: 1997, Rating: 4.666666666666667
User: 15, Item: 966, Rating: 2.693661971830986 User: 27, Item: 2028, Rating: 4.666666666666667
User: 15, Item: 1274, Rating: 2.9873500196382845 User: 27, Item: 5952, Rating: 4.666666666666667
User: 15, Item: 1299, Rating: 3.0779327239728005 User: 27, Item: 2700, Rating: 4.666666666666667
User: 15, Item: 1345, Rating: 2.2037629856623138 User: 27, Item: 1721, Rating: 4.666666666666667
User: 15, Item: 1354, Rating: 2.001877412379849 User: 27, Item: 364, Rating: 4.666666666666667
User: 15, Item: 532, Rating: 2.7123071345260277 User: 27, Item: 4308, Rating: 4.666666666666667
%% Cell type:markdown id:c5209097 tags: %% Cell type:markdown id:c5209097 tags:
Regardless of the number of neighbours (1, 2 or 3), the rating value does not change.
%% Cell type:markdown id:c8890e11 tags: %% Cell type:markdown id:c8890e11 tags:
1) Predictions with min_k = 1: A single neighbor with positive similarity is enough for the neighborhood term to be used. The estimate is the target user's mean rating, adjusted by the similarity-weighted deviations of up to k = 3 neighbors who rated the item. Consequently, the predicted values vary widely across items. For instance, for user 15 the predicted rating is 3.777 for item 942 but only 0.922 for item 64. Some of these estimates rest on a single neighbor, which can make them erratic.
2) Predictions with min_k = 2: Here, at least 2 neighbors are required. When fewer are available, KNNWithMeans ignores the neighborhood term and falls back to the user's mean rating. Most predictions are identical to those with min_k = 1; only the ones built from a single neighbor change. For example, the rating for item 5054 changes from 3.010 to 2.694, the fallback value for user 15.
3) Predictions with min_k = 3: With a minimum of 3 neighbors, even more predictions fall back to the user's mean, so the predicted ratings become more uniform. For example, for item 6322 the prediction stays at 1.711 for min_k = 1 and min_k = 2, then becomes 2.694 for min_k = 3, because only two neighbors are available for that item. The higher min_k is, the more often the model prefers the user's mean over an estimate supported by too few neighbors.
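The fallback described in the three cases above can be sketched in a few lines. This is a simplified illustration that follows the logic of a mean-centered KNN estimate, not Surprise's actual source code; the function name `knn_with_means_estimate`, the input format and the toy numbers are assumptions made for the example.

```python
import heapq


def knn_with_means_estimate(user_mean, neighbors, k=3, min_k=1):
    """Mean-centered KNN estimate with a min_k fallback.

    neighbors: list of (similarity, rating, neighbor_mean) tuples for the
    users who rated the target item (hypothetical input format).
    Returns (estimate, actual_k).
    """
    # Keep the k most similar neighbors
    top_k = heapq.nlargest(k, neighbors, key=lambda n: n[0])
    sum_sim = 0.0
    sum_dev = 0.0
    actual_k = 0
    for sim, rating, neighbor_mean in top_k:
        if sim > 0:  # only positively similar neighbors are counted
            sum_sim += sim
            sum_dev += sim * (rating - neighbor_mean)
            actual_k += 1
    # Too few usable neighbors: fall back to the user's mean rating
    if actual_k < min_k or sum_sim == 0:
        return user_mean, actual_k
    return user_mean + sum_dev / sum_sim, actual_k


# Two neighbors rated the item; the target user's mean rating is 3.0
toy_neighbors = [(0.5, 4.0, 3.0), (0.25, 2.0, 3.0)]
for min_k in (1, 2, 3):
    est, actual_k = knn_with_means_estimate(3.0, toy_neighbors, k=3, min_k=min_k)
    print(f"min_k={min_k}: est={est:.3f}, actual_k={actual_k}")
```

With two usable neighbors, the estimate is the same for min_k = 1 and min_k = 2 and drops back to the user's mean for min_k = 3, which is the pattern seen for item 6322.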
%% Cell type:code id:cc806424 tags: %% Cell type:code id:cc806424 tags:
``` python
def analyse_min_support(knn_model, testset):
    # Reset min_k to 2
    knn_model.min_k = 2
    # Vary min_support from 1 to 3 and observe actual_k.
    # Note: the similarity matrix is computed in fit(), so changing
    # sim_options here has no effect until the model is refitted.
    for min_support in range(1, 4):
        knn_model.sim_options['min_support'] = min_support
        predictions_min_support = knn_model.test(testset[:30])  # Use the first 30 test pairs for display
        print(f"\nPrédictions avec min_support = {min_support}:")
        for prediction in predictions_min_support:
            actual_k = prediction.details['actual_k']
            print(f"User: {prediction.uid}, Item: {prediction.iid}, Actual_k: {actual_k}")
    # Display the similarity matrix
    similarity_matrix = knn_model.sim  # Similarity matrix computed by knn_model
    print("\nMatrice de similarité:")
    print(similarity_matrix)


# Call the function and print its return value (None, since it returns nothing)
result = analyse_min_support(knn_model, testset)
print(result)
```
%% Output %% Output
Prédictions avec min_support = 1: Prédictions avec min_support = 1:
User: 15, Item: 942, Actual_k: 3 User: 11, Item: 1214, Actual_k: 1
User: 15, Item: 2117, Actual_k: 3 User: 11, Item: 364, Actual_k: 2
User: 15, Item: 2672, Actual_k: 3 User: 11, Item: 4308, Actual_k: 1
User: 15, Item: 5054, Actual_k: 1 User: 11, Item: 527, Actual_k: 2
User: 15, Item: 6322, Actual_k: 2 User: 13, Item: 1997, Actual_k: 0
User: 15, Item: 6323, Actual_k: 3 User: 13, Item: 4993, Actual_k: 1
User: 15, Item: 6757, Actual_k: 1 User: 13, Item: 2700, Actual_k: 0
User: 15, Item: 7700, Actual_k: 2 User: 13, Item: 1721, Actual_k: 1
User: 15, Item: 7981, Actual_k: 3 User: 13, Item: 527, Actual_k: 1
User: 15, Item: 8600, Actual_k: 2 User: 17, Item: 2028, Actual_k: 2
User: 15, Item: 8620, Actual_k: 3 User: 17, Item: 4993, Actual_k: 2
User: 15, Item: 31952, Actual_k: 2 User: 17, Item: 1214, Actual_k: 1
User: 15, Item: 3, Actual_k: 3 User: 17, Item: 4308, Actual_k: 1
User: 15, Item: 64, Actual_k: 3 User: 19, Item: 1997, Actual_k: 0
User: 15, Item: 206, Actual_k: 3 User: 19, Item: 2028, Actual_k: 0
User: 15, Item: 249, Actual_k: 3 User: 19, Item: 4993, Actual_k: 0
User: 15, Item: 276, Actual_k: 3 User: 19, Item: 5952, Actual_k: 0
User: 15, Item: 369, Actual_k: 3 User: 19, Item: 2700, Actual_k: 0
User: 15, Item: 504, Actual_k: 3 User: 19, Item: 1721, Actual_k: 0
User: 15, Item: 515, Actual_k: 3 User: 19, Item: 1214, Actual_k: 0
User: 15, Item: 522, Actual_k: 3 User: 19, Item: 364, Actual_k: 0
User: 15, Item: 580, Actual_k: 3 User: 23, Item: 1997, Actual_k: 2
User: 15, Item: 599, Actual_k: 3 User: 23, Item: 2700, Actual_k: 2
User: 15, Item: 915, Actual_k: 3 User: 27, Item: 1997, Actual_k: 0
User: 15, Item: 966, Actual_k: 1 User: 27, Item: 2028, Actual_k: 1
User: 15, Item: 1274, Actual_k: 3 User: 27, Item: 5952, Actual_k: 1
User: 15, Item: 1299, Actual_k: 3 User: 27, Item: 2700, Actual_k: 0
User: 15, Item: 1345, Actual_k: 3 User: 27, Item: 1721, Actual_k: 1
User: 15, Item: 1354, Actual_k: 3 User: 27, Item: 364, Actual_k: 1
User: 15, Item: 532, Actual_k: 3 User: 27, Item: 4308, Actual_k: 1
Prédictions avec min_support = 2: Prédictions avec min_support = 2:
User: 15, Item: 942, Actual_k: 3 User: 11, Item: 1214, Actual_k: 1
User: 15, Item: 2117, Actual_k: 3 User: 11, Item: 364, Actual_k: 2
User: 15, Item: 2672, Actual_k: 3 User: 11, Item: 4308, Actual_k: 1
User: 15, Item: 5054, Actual_k: 1 User: 11, Item: 527, Actual_k: 2
User: 15, Item: 6322, Actual_k: 2 User: 13, Item: 1997, Actual_k: 0
User: 15, Item: 6323, Actual_k: 3 User: 13, Item: 4993, Actual_k: 1
User: 15, Item: 6757, Actual_k: 1 User: 13, Item: 2700, Actual_k: 0
User: 15, Item: 7700, Actual_k: 2 User: 13, Item: 1721, Actual_k: 1
User: 15, Item: 7981, Actual_k: 3 User: 13, Item: 527, Actual_k: 1
User: 15, Item: 8600, Actual_k: 2 User: 17, Item: 2028, Actual_k: 2
User: 15, Item: 8620, Actual_k: 3 User: 17, Item: 4993, Actual_k: 2
User: 15, Item: 31952, Actual_k: 2 User: 17, Item: 1214, Actual_k: 1
User: 15, Item: 3, Actual_k: 3 User: 17, Item: 4308, Actual_k: 1
User: 15, Item: 64, Actual_k: 3 User: 19, Item: 1997, Actual_k: 0
User: 15, Item: 206, Actual_k: 3 User: 19, Item: 2028, Actual_k: 0
User: 15, Item: 249, Actual_k: 3 User: 19, Item: 4993, Actual_k: 0
User: 15, Item: 276, Actual_k: 3 User: 19, Item: 5952, Actual_k: 0
User: 15, Item: 369, Actual_k: 3 User: 19, Item: 2700, Actual_k: 0
User: 15, Item: 504, Actual_k: 3 User: 19, Item: 1721, Actual_k: 0
User: 15, Item: 515, Actual_k: 3 User: 19, Item: 1214, Actual_k: 0
User: 15, Item: 522, Actual_k: 3 User: 19, Item: 364, Actual_k: 0
User: 15, Item: 580, Actual_k: 3 User: 23, Item: 1997, Actual_k: 2
User: 15, Item: 599, Actual_k: 3 User: 23, Item: 2700, Actual_k: 2
User: 15, Item: 915, Actual_k: 3 User: 27, Item: 1997, Actual_k: 0
User: 15, Item: 966, Actual_k: 1 User: 27, Item: 2028, Actual_k: 1
User: 15, Item: 1274, Actual_k: 3 User: 27, Item: 5952, Actual_k: 1
User: 15, Item: 1299, Actual_k: 3 User: 27, Item: 2700, Actual_k: 0
User: 15, Item: 1345, Actual_k: 3 User: 27, Item: 1721, Actual_k: 1
User: 15, Item: 1354, Actual_k: 3 User: 27, Item: 364, Actual_k: 1
User: 15, Item: 532, Actual_k: 3 User: 27, Item: 4308, Actual_k: 1
Prédictions avec min_support = 3: Prédictions avec min_support = 3:
User: 15, Item: 942, Actual_k: 3 User: 11, Item: 1214, Actual_k: 1
User: 15, Item: 2117, Actual_k: 3 User: 11, Item: 364, Actual_k: 2
User: 15, Item: 2672, Actual_k: 3 User: 11, Item: 4308, Actual_k: 1
User: 15, Item: 5054, Actual_k: 1 User: 11, Item: 527, Actual_k: 2
User: 15, Item: 6322, Actual_k: 2 User: 13, Item: 1997, Actual_k: 0
User: 15, Item: 6323, Actual_k: 3 User: 13, Item: 4993, Actual_k: 1
User: 15, Item: 6757, Actual_k: 1 User: 13, Item: 2700, Actual_k: 0
User: 15, Item: 7700, Actual_k: 2 User: 13, Item: 1721, Actual_k: 1
User: 15, Item: 7981, Actual_k: 3 User: 13, Item: 527, Actual_k: 1
User: 15, Item: 8600, Actual_k: 2 User: 17, Item: 2028, Actual_k: 2
User: 15, Item: 8620, Actual_k: 3 User: 17, Item: 4993, Actual_k: 2
User: 15, Item: 31952, Actual_k: 2 User: 17, Item: 1214, Actual_k: 1
User: 15, Item: 3, Actual_k: 3 User: 17, Item: 4308, Actual_k: 1
User: 15, Item: 64, Actual_k: 3 User: 19, Item: 1997, Actual_k: 0
User: 15, Item: 206, Actual_k: 3 User: 19, Item: 2028, Actual_k: 0
User: 15, Item: 249, Actual_k: 3 User: 19, Item: 4993, Actual_k: 0
User: 15, Item: 276, Actual_k: 3 User: 19, Item: 5952, Actual_k: 0
User: 15, Item: 369, Actual_k: 3 User: 19, Item: 2700, Actual_k: 0
User: 15, Item: 504, Actual_k: 3 User: 19, Item: 1721, Actual_k: 0
User: 15, Item: 515, Actual_k: 3 User: 19, Item: 1214, Actual_k: 0
User: 15, Item: 522, Actual_k: 3 User: 19, Item: 364, Actual_k: 0
User: 15, Item: 580, Actual_k: 3 User: 23, Item: 1997, Actual_k: 2
User: 15, Item: 599, Actual_k: 3 User: 23, Item: 2700, Actual_k: 2
User: 15, Item: 915, Actual_k: 3 User: 27, Item: 1997, Actual_k: 0
User: 15, Item: 966, Actual_k: 1 User: 27, Item: 2028, Actual_k: 1
User: 15, Item: 1274, Actual_k: 3 User: 27, Item: 5952, Actual_k: 1
User: 15, Item: 1299, Actual_k: 3 User: 27, Item: 2700, Actual_k: 0
User: 15, Item: 1345, Actual_k: 3 User: 27, Item: 1721, Actual_k: 1
User: 15, Item: 1354, Actual_k: 3 User: 27, Item: 364, Actual_k: 1
User: 15, Item: 532, Actual_k: 3 User: 27, Item: 4308, Actual_k: 1
Matrice de similarité: Matrice de similarité:
[[1. 0.39130435 0.35942029 ... 0.24358974 0.28513238 0.21451104] [[1. 0. 0.24615385 0. 0.43243243 0. ]
[0.39130435 1. 0.32786885 ... 0.30967742 0.42424242 0.21621622] [0. 1. 0. 0. 0.17094017 0. ]
[0.35942029 0.32786885 1. ... 0.36666667 0.72727273 0.34375 ] [0.24615385 0. 1. 0. 0.53333333 0. ]
... [0. 0. 0. 1. 0. 0. ]
[0.24358974 0.30967742 0.36666667 ... 1. 0.6779661 0.37569061] [0.43243243 0.17094017 0.53333333 0. 1. 0.25 ]
[0.28513238 0.42424242 0.72727273 ... 0.6779661 1. 0.83333333] [0. 0. 0. 0. 0.25 1. ]]
[0.21451104 0.21621622 0.34375 ... 0.37569061 0.83333333 1. ]]
None None
%% Cell type:markdown id:2dd01f5b tags: %% Cell type:markdown id:2dd01f5b tags:
# 3. Implement and explore a customizable user-based algorithm
Create a custom user-based algorithm that allows customizing the similarity metric, the peer group calculation and the aggregation function.
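As an example of a similarity metric that such a class could plug in, here is a minimal sketch of the MSD similarity between two users, including the `min_support` rule. It follows the definition documented by Surprise (similarity = 1 / (msd + 1), and 0 when the users share fewer than `min_support` items), but the `msd_similarity` helper, the dictionary input format and the toy ratings are assumptions made for illustration.

```python
def msd_similarity(ratings_u, ratings_v, min_support=1):
    """MSD similarity between two users given as {item: rating} dicts."""
    # Items rated by both users
    common = ratings_u.keys() & ratings_v.keys()
    if not common or len(common) < min_support:
        return 0.0
    # Mean squared difference over the common items
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)


# Hypothetical users sharing two items ("364" and "527")
user_a = {"364": 4.0, "527": 2.0, "1214": 5.0}
user_b = {"364": 5.0, "527": 2.0, "2028": 3.0}

print(msd_similarity(user_a, user_b))                 # two common items
print(msd_similarity(user_a, user_b, min_support=3))  # not enough support
```

The second call returns 0 because the two users have only two items in common, which is how `min_support` produces zeros in a similarity matrix.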
%% Cell type:code id:d03ed9eb tags: %% Cell type:code id:d03ed9eb tags:
``` python ``` python
class UserBased(AlgoBase): class UserBased(AlgoBase):
def __init__(self, k=3, min_k=1, sim_options={}, **kwargs): def __init__(self, k=3, min_k=1, sim_options={}, **kwargs):
AlgoBase.__init__(self, sim_options=sim_options, **kwargs) AlgoBase.__init__(self, sim_options=sim_options, **kwargs)
self.k = k self.k = k
self.min_k = min_k self.min_k = min_k
self.sim_options = sim_options self.sim_options = sim_options
def fit(self, trainset): def fit(self, trainset):
AlgoBase.fit(self, trainset) AlgoBase.fit(self, trainset)
self.compute_rating_matrix() self.compute_rating_matrix()
self.compute_similarity_matrix() self.compute_similarity_matrix()
self.compute_mean_ratings() self.compute_mean_ratings()
def estimate(self, u, i): def estimate(self, u, i):
if not (self.trainset.knows_user(u) and self.trainset.knows_item(i)): if not (self.trainset.knows_user(u) and self.trainset.knows_item(i)):
raise PredictionImpossible('User and/or item is unknown.') raise PredictionImpossible('User and/or item is unknown.')
estimate = self.mean_ratings[u] estimate = self.mean_ratings[u]
# Step 1: Create the peer group of user u for item i # Step 1: Create the peer group of user u for item i
peer_group = [] peer_group = []
# trainset.ir[i] holds (inner user id, rating) pairs for item i # trainset.ir[i] holds (inner user id, rating) pairs for item i
for j, rating in self.trainset.ir[i]: for j, rating in self.trainset.ir[i]:
similarity = self.sim[u, j] # Similarity between user u and user j similarity = self.sim[u, j] # Similarity between user u and user j
peer_group.append((j, similarity, rating)) peer_group.append((j, similarity, rating))
# Step 2: Pick up the top neighbors efficiently # Step 2: Pick up the top neighbors efficiently
k_neighbors = heapq.nlargest(self.k, peer_group, key=lambda x: x[1]) # Top k neighbors based on similarity k_neighbors = heapq.nlargest(self.k, peer_group, key=lambda x: x[1]) # Top k neighbors based on similarity
# Step 3: Compute the weighted average of mean-centered ratings # Step 3: Compute the weighted average of mean-centered ratings
actual_k = len(k_neighbors) actual_k = len(k_neighbors)
if actual_k >= self.min_k: if actual_k >= self.min_k:
weighted_sum = 0 weighted_sum = 0
total_similarity = 0 total_similarity = 0
for j, similarity, rating in k_neighbors: for j, similarity, rating in k_neighbors:
# Center each rating on the neighbor's own mean, as KNNWithMeans does # Center each rating on the neighbor's own mean, as KNNWithMeans does
weighted_sum += similarity * (rating - self.mean_ratings[j]) weighted_sum += similarity * (rating - self.mean_ratings[j])
total_similarity += similarity total_similarity += similarity
if total_similarity != 0: if total_similarity != 0:
peer_group_average = weighted_sum / total_similarity peer_group_average = weighted_sum / total_similarity
estimate += peer_group_average estimate += peer_group_average
return estimate return estimate
def compute_rating_matrix(self): def compute_rating_matrix(self):
# Get the number of users and items # Get the number of users and items
n_users = self.trainset.n_users n_users = self.trainset.n_users
n_items = self.trainset.n_items n_items = self.trainset.n_items
ratings_matrix = np.empty((n_users, n_items)) ratings_matrix = np.empty((n_users, n_items))
ratings_matrix[:] = np.nan ratings_matrix[:] = np.nan
# Fill in the ratings matrix with available ratings # Fill in the ratings matrix with available ratings
for user_id, user_ratings in self.trainset.ur.items(): for user_id, user_ratings in self.trainset.ur.items():
if user_ratings: # Check if user has ratings if user_ratings: # Check if user has ratings
for item_id, rating in user_ratings: for item_id, rating in user_ratings:
ratings_matrix[user_id, item_id] = rating ratings_matrix[user_id, item_id] = rating
# Set the computed ratings matrix to self.ratings_matrix # Set the computed ratings matrix to self.ratings_matrix
self.ratings_matrix = ratings_matrix self.ratings_matrix = ratings_matrix
def compute_similarity_matrix(self): def compute_similarity_matrix(self):
# Get the number of users # Get the number of users
n_users = self.trainset.n_users n_users = self.trainset.n_users
# Initialize the similarity matrix with zeros and ones in the diagonal # Initialize the similarity matrix with zeros and ones in the diagonal
similarity_matrix = np.eye(n_users) similarity_matrix = np.eye(n_users)
# Iterate through pairs of users to compute similarities # Iterate through pairs of users to compute similarities
for i in range(n_users): for i in range(n_users):
for j in range(i + 1, n_users): for j in range(i + 1, n_users):
# Compute support # Compute support
support = np.sum(~np.isnan(self.ratings_matrix[i]) & ~np.isnan(self.ratings_matrix[j])) support = np.sum(~np.isnan(self.ratings_matrix[i]) & ~np.isnan(self.ratings_matrix[j]))
# Check if support is greater than or equal to min_k # Check if support is greater than or equal to min_k
if support >= self.min_k: if support >= self.min_k:
# Compute similarity using Jaccard similarity # Compute similarity using Jaccard similarity
intersection = np.sum(~np.isnan(self.ratings_matrix[i]) & ~np.isnan(self.ratings_matrix[j])) intersection = np.sum(~np.isnan(self.ratings_matrix[i]) & ~np.isnan(self.ratings_matrix[j]))
union = np.sum(~np.isnan(self.ratings_matrix[i]) | ~np.isnan(self.ratings_matrix[j])) union = np.sum(~np.isnan(self.ratings_matrix[i]) | ~np.isnan(self.ratings_matrix[j]))
similarity = intersection / union similarity = intersection / union
similarity_matrix[i, j] = similarity similarity_matrix[i, j] = similarity
similarity_matrix[j, i] = similarity # Similarity matrix is symmetric similarity_matrix[j, i] = similarity # Similarity matrix is symmetric
# Set the computed similarity matrix to self.sim # Set the computed similarity matrix to self.sim
self.sim = similarity_matrix self.sim = similarity_matrix
def compute_mean_ratings(self): def compute_mean_ratings(self):
# Compute the mean rating of every user # Compute the mean rating of every user
mean_ratings = [] mean_ratings = []
for user_id, ratings in self.trainset.ur.items(): for user_id, ratings in self.trainset.ur.items():
if ratings: # Check if user has ratings if ratings: # Check if user has ratings
mean_rating = np.mean([rating[1] for rating in ratings]) mean_rating = np.mean([rating[1] for rating in ratings])
mean_ratings.append(mean_rating) mean_ratings.append(mean_rating)
else: else:
mean_ratings.append(0) # If no ratings available, set mean to 0 mean_ratings.append(0) # If no ratings available, set mean to 0
# Set the computed mean ratings # Set the computed mean ratings
self.mean_ratings = mean_ratings self.mean_ratings = mean_ratings
user_based_instance = UserBased(trainset=trainset) user_based_instance = UserBased(trainset=trainset)
# Call fit to compute the rating matrix, the similarity matrix and the mean ratings # Call fit to compute the rating matrix, the similarity matrix and the mean ratings
user_based_instance.fit(trainset) user_based_instance.fit(trainset)
# Display the rating matrix # Display the rating matrix
print(user_based_instance.ratings_matrix) print(user_based_instance.ratings_matrix)
``` ```
%% Output %% Output
[[3. 1.5 4. ... nan nan nan] [[1.5 4. 5. 4.5 3. 1. nan nan nan nan]
[nan nan nan ... nan nan nan] [nan 2. nan 2. nan nan 1. 5. 4. nan]
[4. 3. 3. ... nan nan nan] [5. nan nan 4.5 3. 1. nan 1.5 nan 4.5]
... [nan nan nan nan nan nan nan nan 2. 5. ]
[4.5 nan nan ... nan nan nan] [nan 3. 3. 4. nan 1. 3. 2.5 1. 3. ]
[nan nan nan ... nan nan nan] [nan nan 5. nan nan nan 4. nan nan 5. ]]
[2. nan nan ... nan nan nan]]
%% Cell type:markdown id:dfdc9cfe tags: %% Cell type:markdown id:dfdc9cfe tags:
# 4. Compare KNNWithMeans with UserBased # 4. Compare KNNWithMeans with UserBased
Try to replicate KNNWithMeans with your self-made UserBased and check that the outcomes are identical. Try to replicate KNNWithMeans with your self-made UserBased and check that the outcomes are identical.
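As a reference point for the replication, the prediction rule used by a mean-centered k-NN model can be sketched in plain Python. The function name `knn_with_means_estimate` and the toy data are hypothetical and are not part of the notebook or of Surprise; the sketch only assumes the usual rule: start from the target user's mean, then add the similarity-weighted average of each neighbor's deviation from that neighbor's own mean.

```python
import heapq


def knn_with_means_estimate(u, neighbors, means, k=3, min_k=1):
    """Mean-centered k-NN estimate for user u (illustrative sketch).

    neighbors: list of (user_id, similarity, rating) for users who rated the item.
    means: per-user mean ratings, indexed by user id.
    """
    est = means[u]
    # Keep the k most similar users among those who rated the item
    top = heapq.nlargest(k, neighbors, key=lambda t: t[1])
    sum_sim = 0.0
    sum_dev = 0.0
    actual_k = 0
    for v, sim, r in top:
        if sim > 0:  # ignore neighbors with no positive similarity
            sum_sim += sim
            sum_dev += sim * (r - means[v])  # deviation from the neighbor's mean
            actual_k += 1
    # Fall back to the user's mean when there are too few usable neighbors
    if actual_k >= min_k and sum_sim > 0:
        est += sum_dev / sum_sim
    return est


# Toy data: user 0 has mean 3.0; neighbors 1 and 2 rated the target item
means = [3.0, 4.0, 2.0]
neighbors = [(1, 0.8, 5.0), (2, 0.2, 1.0)]
example = knn_with_means_estimate(0, neighbors, means, k=2, min_k=1)
print(example)
```

Adding a weighted average of raw ratings to the user's mean, instead of a weighted average of deviations, pushes estimates beyond the rating scale, so it is one thing to check when the two models disagree.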
%% Cell type:code id:be53ae27 tags: %% Cell type:code id:be53ae27 tags:
``` python ``` python
# 1. Obtain Predictions # 1. Obtain Predictions
# Using UserBased algorithm # Using UserBased algorithm
user_based_predictions = [] user_based_predictions = []
for uid, iid, true_r in testset: for uid, iid, true_r in testset:
user_based_pred = user_based_instance.predict(uid, iid) user_based_pred = user_based_instance.predict(uid, iid)
user_based_predictions.append((uid, iid, true_r, user_based_pred.est, {})) user_based_predictions.append((uid, iid, true_r, user_based_pred.est, {}))
# Using KNNWithMeans algorithm # Using KNNWithMeans algorithm
knn_predictions = [] knn_predictions = []
for uid, iid, true_r in testset: for uid, iid, true_r in testset:
knn_pred = knn_model.predict(uid, iid) knn_pred = knn_model.predict(uid, iid)
knn_predictions.append((uid, iid, true_r, knn_pred.est, knn_pred.details)) knn_predictions.append((uid, iid, true_r, knn_pred.est, knn_pred.details))
# 2. Calculate Metrics # 2. Calculate Metrics
# Calculate MAE and RMSE for UserBased algorithm # Calculate MAE and RMSE for UserBased algorithm
user_based_mae = accuracy.mae(user_based_predictions, verbose=False) user_based_mae = accuracy.mae(user_based_predictions, verbose=False)
user_based_rmse = accuracy.rmse(user_based_predictions, verbose=False) user_based_rmse = accuracy.rmse(user_based_predictions, verbose=False)
# Calculate MAE and RMSE for KNNWithMeans algorithm # Calculate MAE and RMSE for KNNWithMeans algorithm
knn_mae = accuracy.mae(knn_predictions, verbose=False) knn_mae = accuracy.mae(knn_predictions, verbose=False)
knn_rmse = accuracy.rmse(knn_predictions, verbose=False) knn_rmse = accuracy.rmse(knn_predictions, verbose=False)
# 3. Compare Results # 3. Compare Results
print("UserBased MAE:", user_based_mae) print("UserBased MAE:", user_based_mae)
print("UserBased RMSE:", user_based_rmse) print("UserBased RMSE:", user_based_rmse)
print("KNNWithMeans MAE:", knn_mae) print("KNNWithMeans MAE:", knn_mae)
print("KNNWithMeans RMSE:", knn_rmse) print("KNNWithMeans RMSE:", knn_rmse)
``` ```
%% Output %% Output
UserBased MAE: 1.5398252671298895 UserBased MAE: 1.7175000000000002
UserBased RMSE: 1.5553141029705104 UserBased RMSE: 1.7384170241918369
KNNWithMeans MAE: 0.5419110316300769 KNNWithMeans MAE: 0.661617428851614
KNNWithMeans RMSE: 0.7019543155680094 KNNWithMeans RMSE: 0.8426896111887758
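Aggregate MAE and RMSE only show that the two models differ; they do not show where. A minimal sketch of a per-prediction check follows, assuming both lists hold tuples shaped like the ones built above, `(uid, iid, true_r, est, details)`, in the same order. The helper name `compare_estimates` and the toy tuples are hypothetical.

```python
import numpy as np


def compare_estimates(preds_a, preds_b, atol=1e-8):
    """Return (identical, max_gap) for two aligned prediction lists."""
    # The estimate sits at index 3 of each prediction tuple
    est_a = np.array([p[3] for p in preds_a], dtype=float)
    est_b = np.array([p[3] for p in preds_b], dtype=float)
    if est_a.shape != est_b.shape:
        raise ValueError("Prediction lists must have the same length.")
    max_gap = float(np.max(np.abs(est_a - est_b))) if est_a.size else 0.0
    return bool(np.allclose(est_a, est_b, atol=atol)), max_gap


# Toy predictions, not taken from the notebook's data
toy_a = [("u1", "i1", 4.0, 3.5, {}), ("u2", "i1", 2.0, 2.0, {})]
toy_b = [("u1", "i1", 4.0, 3.5, {}), ("u2", "i1", 2.0, 2.5, {})]
identical, max_gap = compare_estimates(toy_a, toy_b)
print(identical, max_gap)
```

In the notebook this would be called on `user_based_predictions` and `knn_predictions`; a non-zero `max_gap` points at the predictions worth inspecting first.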
%% Cell type:markdown id:cced76d9 tags: %% Cell type:markdown id:cced76d9 tags:
# 5. Compare MSD and Jaccard # 5. Compare MSD and Jaccard
Compare predictions made with MSD similarity and Jaccard similarity. Compare predictions made with MSD similarity and Jaccard similarity.
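To make the two measures concrete, here is a small sketch that computes both for a pair of users stored as NaN-padded rating vectors, like the rows of the rating matrix built earlier. The function names and the toy vectors are hypothetical. The MSD variant assumes the common definition `1 / (msd + 1)` over co-rated items, and the Jaccard variant compares only which items were rated, not the rating values.

```python
import numpy as np


def jaccard_similarity(a, b):
    """Share of co-rated items among all items rated by either user."""
    rated_a = ~np.isnan(a)
    rated_b = ~np.isnan(b)
    union = np.sum(rated_a | rated_b)
    if union == 0:
        return 0.0
    return float(np.sum(rated_a & rated_b) / union)


def msd_similarity(a, b):
    """1 / (mean squared difference + 1) over co-rated items."""
    common = ~np.isnan(a) & ~np.isnan(b)
    if not common.any():
        return 0.0
    msd = np.mean((a[common] - b[common]) ** 2)
    return float(1.0 / (msd + 1.0))


# Toy rating vectors; NaN marks an unrated item
user_a = np.array([5.0, 3.0, np.nan, 1.0])
user_b = np.array([4.0, np.nan, 2.0, 1.0])
jac = jaccard_similarity(user_a, user_b)
msd = msd_similarity(user_a, user_b)
print(jac, msd)
```

The two measures capture different signals: Jaccard rewards overlapping rating histories, while MSD rewards agreement on the items both users rated.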
%% Cell type:code id:c20d8e19 tags: %% Cell type:code id:c20d8e19 tags:
``` python ``` python
from surprise import accuracy from surprise import accuracy
from surprise.model_selection import train_test_split from surprise.model_selection import train_test_split
from surprise import Dataset, Reader from surprise import Dataset, Reader
from surprise import KNNBasic from surprise import KNNBasic
# Split the dataset into training and testing sets # Split the dataset into training and testing sets
trainset, testset = train_test_split(surprise_data, test_size=0.2) trainset, testset = train_test_split(surprise_data, test_size=0.2)
# Initialize the model with MSD similarity # Initialize the model with MSD similarity
sim_options_msd = {'name': 'msd'} sim_options_msd = {'name': 'msd'}
user_based_msd = KNNBasic(sim_options=sim_options_msd) user_based_msd = KNNBasic(sim_options=sim_options_msd)
user_based_msd.fit(trainset) user_based_msd.fit(trainset)
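# Initialize the second model; Surprise has no built-in Jaccard measure, so 'cosine' stands in for it here # Initialize the second model; Surprise has no built-in Jaccard measure, so 'cosine' stands in for it here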
sim_options_jaccard = {'name': 'cosine'} sim_options_jaccard = {'name': 'cosine'}
user_based_jaccard = KNNBasic(sim_options=sim_options_jaccard) user_based_jaccard = KNNBasic(sim_options=sim_options_jaccard)
user_based_jaccard.fit(trainset) user_based_jaccard.fit(trainset)
# Make predictions with each model on the test set # Make predictions with each model on the test set
predictions_msd = user_based_msd.test(testset) predictions_msd = user_based_msd.test(testset)
predictions_jaccard = user_based_jaccard.test(testset) predictions_jaccard = user_based_jaccard.test(testset)
# Calculate and display the performances of the two models # Calculate and display the performances of the two models
rmse_msd = accuracy.rmse(predictions_msd) rmse_msd = accuracy.rmse(predictions_msd)
rmse_jaccard = accuracy.rmse(predictions_jaccard) rmse_jaccard = accuracy.rmse(predictions_jaccard)
print("RMSE with MSD similarity:", rmse_msd) print("RMSE with MSD similarity:", rmse_msd)
print("RMSE with Jaccard similarity:", rmse_jaccard) print("RMSE with Jaccard similarity:", rmse_jaccard)
``` ```
%% Output %% Output
Computing the msd similarity matrix... Computing the msd similarity matrix...
Done computing similarity matrix. Done computing similarity matrix.
Computing the cosine similarity matrix... Computing the cosine similarity matrix...
Done computing similarity matrix. Done computing similarity matrix.
RMSE: 0.9799 RMSE: 1.0812
RMSE: 0.9871 RMSE: 1.0910
RMSE with MSD similarity: 0.9798533097556152 RMSE with MSD similarity: 1.0811758629789194
RMSE with Jaccard similarity: 0.9870653791755158 RMSE with Jaccard similarity: 1.0910225374454734