Collaborative filtering¶
A distinction is often made between two forms of data collection for recommendation systems. Explicit feedback relies on the user giving explicit signals about their preferences, e.g. review ratings, whereas implicit feedback refers to non-explicit signals of preference, e.g. user watch time. Traditionally, recommender systems can be split into three types:
Collaborative filtering (CF): CF produces recommendations based on the knowledge of users’ attitudes towards items, that is, it uses the “wisdom of the crowd” to recommend items.
Content-based (CB): CB recommender systems focus on the attributes of the items to recommend other items similar to what the user likes, based on their previous actions or explicit feedback.
Hybrid recommendation systems: Hybrid methods are a combination of CB and CF methods.
In many applications, content-based features are not easy to extract, so collaborative filtering approaches are preferred. For this reason, we will explore only collaborative filtering methods from now on.
CF methods typically fall into three types: memory-based, model-based and, more recently, deep-learning-based (Su & Khoshgoftaar, 2009; He et al., 2017). Neighbourhood-based CF and item-based/user-based top-N recommendation are typical examples of memory-based systems, which use user rating data to compute the similarity between users or items. As mentioned previously, common model-based approaches include Bayesian networks, latent semantic models and Markov decision processes. In this investigation, we will use a weighted matrix factorization approach. Later on, we will generalize the matrix factorization algorithm via a non-linear neural architecture (a softmax model).
However, our approaches have a number of limitations, such as the inability to model the order of interactions. For instance, Markov chain algorithms (Rendle et al., 2010) can encode not only the same information as traditional CF methods but also the order in which users interacted with the items. Furthermore, the sparsity of the frequency matrix (described later on) makes computations prohibitively expensive in real-world settings without some optimization.
Setup¶
The next few code cells detail the initial preparatory steps needed for the development of our collaborative filtering models, namely: importing the required libraries; rescaling the IDs of users and artists; constructing an indicator variable for the presence of a user-artist interaction; and finding the most frequently assigned tag of an artist.
from __future__ import print_function
import numpy as np
import pandas as pd
import collections
from IPython import display
from matplotlib import pyplot as plt
import sklearn
import sklearn.manifold
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
tf.logging.set_verbosity(tf.logging.ERROR)
# Add some convenience functions to Pandas DataFrame.
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.3f}'.format
# Install Altair and activate its colab renderer.
print("Installing Altair...")
!pip install git+git://github.com/altair-viz/altair.git
import altair as alt
alt.data_transformers.enable('default', max_rows=None)
alt.renderers.enable('colab')
print("Done installing Altair.")
Installing Altair...
Collecting git+git://github.com/altair-viz/altair.git
Cloning git://github.com/altair-viz/altair.git to /tmp/pip-req-build-6xycq1vs
Running command git clone --filter=blob:none -q git://github.com/altair-viz/altair.git /tmp/pip-req-build-6xycq1vs
Resolved git://github.com/altair-viz/altair.git to commit a987d04e276106f62d4247ea48a1fcead2d06636
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: jsonschema<4.0,>=3.0 in /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages (from altair==4.2.0.dev0) (3.2.0)
Collecting toolz
Downloading toolz-0.11.2-py3-none-any.whl (55 kB)
Requirement already satisfied: jinja2 in /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages (from altair==4.2.0.dev0) (3.0.3)
Requirement already satisfied: entrypoints in /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages (from altair==4.2.0.dev0) (0.3)
Requirement already satisfied: pandas>=0.18 in /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages (from altair==4.2.0.dev0) (1.3.4)
Requirement already satisfied: numpy in /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages (from altair==4.2.0.dev0) (1.21.4)
Requirement already satisfied: pyrsistent>=0.14.0 in /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages (from jsonschema<4.0,>=3.0->altair==4.2.0.dev0) (0.18.0)
Requirement already satisfied: importlib-metadata in /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages (from jsonschema<4.0,>=3.0->altair==4.2.0.dev0) (4.8.2)
Requirement already satisfied: setuptools in /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages (from jsonschema<4.0,>=3.0->altair==4.2.0.dev0) (47.1.0)
Requirement already satisfied: attrs>=17.4.0 in /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages (from jsonschema<4.0,>=3.0->altair==4.2.0.dev0) (21.2.0)
Requirement already satisfied: six>=1.11.0 in /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages (from jsonschema<4.0,>=3.0->altair==4.2.0.dev0) (1.16.0)
Requirement already satisfied: python-dateutil>=2.7.3 in /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages (from pandas>=0.18->altair==4.2.0.dev0) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages (from pandas>=0.18->altair==4.2.0.dev0) (2021.3)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages (from jinja2->altair==4.2.0.dev0) (2.0.1)
Requirement already satisfied: zipp>=0.5 in /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages (from importlib-metadata->jsonschema<4.0,>=3.0->altair==4.2.0.dev0) (3.6.0)
Requirement already satisfied: typing-extensions>=3.6.4 in /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages (from importlib-metadata->jsonschema<4.0,>=3.0->altair==4.2.0.dev0) (4.0.0)
Building wheels for collected packages: altair
  Building wheel for altair (pyproject.toml) ... done
  Created wheel for altair: filename=altair-4.2.0.dev0-py3-none-any.whl size=812168 sha256=a871318be16a9414a4766ed7a398bb21fe3e2f9d9dffab320bae42401a925f69
Stored in directory: /tmp/pip-ephem-wheel-cache-2s42_sy3/wheels/06/13/e0/5bd72c969fe3954ee1561739e5c58e2ddfe5c10fcdffb12faa
Successfully built altair
Installing collected packages: toolz, altair
Successfully installed altair-4.2.0.dev0 toolz-0.11.2
Done installing Altair.
# NEEDED FOR GOOGLE COLAB
# from google.colab import auth
#from google.colab import drive
# import gspread
# from oauth2client.client import GoogleCredentials
# drive.mount('/content/drive/')
# os.chdir("/content/drive/My Drive/DCU/fouth_year/advanced_machine_learning/music-recommodation-system")
Helper functions
def calculate_sparsity(M):
    """
    Computes the fill ratio (fraction of observed entries) of the frequency matrix.
    """
    # Number of possible user-artist interactions in the matrix
    matrix_size = M['userID'].nunique() * M['artistID'].nunique()
    num_plays = len(M['weight'])  # Number of observed interactions
    sparsity = float(num_plays) / matrix_size
    return sparsity
def build_music_sparse_tensor(music_df):
    """
    Args:
      music_df: a pd.DataFrame with `userID`, `artistID` and `weight` columns.
    Returns:
      a tf.SparseTensor representing the feedback matrix.
    """
    indices = music_df[['userID', 'artistID']].values
    values = music_df['weight'].values
    return tf.SparseTensor(
        indices=indices,
        values=values,
        dense_shape=[num_users, num_artist])
def preproces_ids(music_df):
    """
    Args:
      music_df: a pd.DataFrame with `userID`, `artistID` and `weight` columns.
    Returns:
      a pd.DataFrame where userIDs and artistIDs now start at 0
      and end at m-1 and n-1 (defined above), respectively, and
      two dictionaries preserving the original ids.
    """
    unique_user_ids_list = sorted(music_df['userID'].unique())
    print(unique_user_ids_list[0])  # smallest original userID, for verification
    unique_user_ids = dict(zip(range(len(unique_user_ids_list)), unique_user_ids_list))
    unique_user_ids_switched = dict(zip(unique_user_ids_list, range(len(unique_user_ids_list))))
    unique_artist_ids_list = sorted(music_df['artistID'].unique())
    unique_artist_ids = dict(zip(range(len(unique_artist_ids_list)), unique_artist_ids_list))
    unique_artist_ids_switched = dict(zip(unique_artist_ids_list, range(len(unique_artist_ids_list))))
    music_df['userID'] = music_df['userID'].map(unique_user_ids_switched)
    music_df['artistID'] = music_df['artistID'].map(unique_artist_ids_switched)
    return music_df, unique_user_ids, unique_artist_ids
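The remapping idea behind `preproces_ids` can be illustrated on a toy set of IDs (the values below are made up for illustration): arbitrary, non-contiguous IDs are mapped onto 0..n-1, with a reverse lookup kept for later analysis.

```python
# Toy illustration of the ID-remapping idea: map arbitrary IDs onto 0..n-1
# and keep a reverse lookup to recover the originals.
raw_user_ids = [2, 7, 42, 100]          # hypothetical original userIDs
old_to_new = {old: new for new, old in enumerate(sorted(raw_user_ids))}
new_to_old = {new: old for old, new in old_to_new.items()}

remapped = [old_to_new[u] for u in raw_user_ids]
print(remapped)       # [0, 1, 2, 3]
print(new_to_old[0])  # 2 (the smallest original ID maps back from 0)
```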
def split_dataframe(df, holdout_fraction=0.1):
    """Splits a DataFrame into training and test sets.
    Args:
      df: a dataframe.
      holdout_fraction: fraction of dataframe rows to use in the test set.
    Returns:
      train: dataframe for training
      test: dataframe for testing
    """
    test = df.sample(frac=holdout_fraction, replace=False)
    train = df[~df.index.isin(test.index)]
    return train, test
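The holdout split above can be exercised on a toy frame (the column values here are illustrative): test rows are sampled without replacement, and the remainder forms the training set, so the two subsets are disjoint by construction.

```python
import pandas as pd

# Minimal check of the holdout-split idea used by split_dataframe.
df = pd.DataFrame({'userID': range(100), 'weight': [1] * 100})
test = df.sample(frac=0.1, replace=False, random_state=0)
train = df[~df.index.isin(test.index)]

print(len(train), len(test))  # 90 10
```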
Traditional recommender system development relies on explicit feedback, and many models treat the task as a regression problem. For instance, the input to the model would be a matrix \(F\) whose entry \(F_{ij}\) denotes user \(i\)'s preference for item \(j\) on a scale. In the classic movie ratings example, this preference would be users giving a 1-to-5 star rating to different movies.
This dataset contains implicit feedback: that is, observed logs of user interactions with items, in this instance users' listening counts for artists. However, implicit feedback does not signal negativity in the same way a 1-star rating would. In our data, a user could listen to songs of an artist only a few times, but that does not necessarily mean that the particular user has an aversion to that artist, e.g. the plays could come from a playlist curated by another user. Therefore, we decide to construct a binary matrix, which has a value of one wherever an interaction is observed (i.e. a listening count has been logged between an artist and a user). Note, a 0 is not used to describe unobserved artist-user interactions. This is for optimization reasons, explained below.
user_artists = pd.read_csv('data/user_artists.dat', sep='\t')
user_artists['weight'] = 1
artists = pd.read_csv('data/artists.dat', sep='\t')
artists.rename({'id':'artistID'}, inplace=True, axis=1)
user_taggedartists = pd.read_csv(r'data/user_taggedartists-timestamps.dat', sep='\t')
user_taggedartists_years = pd.read_csv(r'data/user_taggedartists.dat', sep='\t')
tags = pd.read_csv(open('data/tags.dat', errors='replace'), sep='\t')
user_taggedartists = pd.merge(user_taggedartists, tags, on=['tagID'])
num_users = user_artists.userID.nunique()
num_artist = artists.artistID.nunique()
collab_filter_df = user_artists
Here, we calculate the top 10 tags by popularity. Then, we assign one to an artist if the artist has a top-10 tag; if an artist's tags are not in the top 10, we input 'N/A'. Note, the next cell can take several minutes to compute.
top_10_tags = user_taggedartists['tagValue'].value_counts().index[0:10]
user_taggedartists['top10TagValue'] = None
for index, row in user_taggedartists.iterrows():
    if row['tagValue'] in top_10_tags:
        user_taggedartists.iloc[index, -1] = row['tagValue']
user_taggedartists.fillna('N/A', inplace=True)
artists = pd.merge(user_taggedartists, artists, on=['artistID'], how='right')[['artistID','name','top10TagValue','tagValue']].fillna('N/A')
artists = artists.groupby(['artistID','name','top10TagValue']).agg(lambda x: x.value_counts().index[0]).reset_index()
artists = artists.drop_duplicates(subset=['artistID'])
assert artists.artistID.nunique() == num_artist
artists.rename({'tagValue':'mostCommonGenre'},axis=1, inplace=True)
We require two matrices or embeddings to compute a similarity measure (one for queries and one for items), but how do we get these two embeddings?
Matrix Factorisation¶
Figure 2: Data flow chart
First, we need to construct the feedback matrix \(F \in \mathbb{R}^{m \times n}\), where \(m\) is the number of users and \(n\) is the number of artists. The goal is to generate two lower-dimensional matrices \(U_{mp}\) and \(V_{np}\) (with \(p \ll m\) and \(p \ll n\)), representing latent user and artist components, so that:

\[ F \approx UV^\top \]
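A low-rank approximation of this kind can be sketched with NumPy's SVD on a toy matrix (not the real feedback data): keeping the top \(p\) singular values and splitting them between the two factors yields \(U\) and \(V\) with \(F \approx UV^\top\).

```python
import numpy as np

# Toy binary feedback matrix: m=4 users, n=5 artists.
F = np.array([[1., 1., 0., 0., 1.],
              [1., 0., 0., 1., 1.],
              [0., 0., 1., 1., 0.],
              [0., 1., 1., 0., 0.]])
p = 2  # embedding dimension

# Truncated SVD gives the best rank-p approximation in the least-squares sense.
s_u, s, s_vt = np.linalg.svd(F, full_matrices=False)
U = s_u[:, :p] * np.sqrt(s[:p])        # (m, p) user factors
V = s_vt[:p, :].T * np.sqrt(s[:p])     # (n, p) artist factors
approx = U @ V.T                       # rank-p reconstruction of F
print(np.round(approx, 2))
```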
First, we attempt to build the frequency matrix for both training and testing data. tf.SparseTensor is used for an efficient representation. A tensor is represented by three separate arguments, namely indices, values and dense_shape, where a value \(A_{ij} = a\) is encoded by setting indices[k] = [i, j] and values[k] = a. The last argument, dense_shape, specifies the shape of the full underlying matrix. Note, as the indices argument holds row and column indices, some pre-processing needs to be performed on artist and user IDs: the IDs should start at 0 and end at \(m-1\) and \(n-1\) for users and artists respectively. Presently, userIDs start at 2. Two dictionaries, orginal_user_ids and orginal_artist_ids, will preserve the original ids for analysis purposes later on. Assertions and print statements are used to ensure the validity of the transformations.
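The (indices, values, dense_shape) encoding can be sketched without TensorFlow; reconstructing the dense matrix from the three arguments (toy numbers below) shows how they fit together.

```python
import numpy as np

# COO-style encoding, as used by tf.SparseTensor: an entry A[i, j] = a is
# stored as indices[k] = [i, j] and values[k] = a.
indices = [[0, 1], [1, 0], [2, 3]]   # (row, col) of each observed entry
values = [1, 1, 1]                   # binary "interaction observed" flags
dense_shape = (3, 4)                 # full matrix: 3 users x 4 artists

A = np.zeros(dense_shape)            # materialise the dense matrix
for (i, j), a in zip(indices, values):
    A[i, j] = a
print(A)
```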
colab_filter_df, orginal_user_ids, orginal_artist_ids = preproces_ids(collab_filter_df)
2
colab_filter_df.describe()
|       | userID    | artistID  | weight    |
|-------|-----------|-----------|-----------|
| count | 92834.000 | 92834.000 | 92834.000 |
| mean  | 944.222   | 3235.737  | 1.000     |
| std   | 546.751   | 4197.217  | 0.000     |
| min   | 0.000     | 0.000     | 1.000     |
| 25%   | 470.000   | 430.000   | 1.000     |
| 50%   | 944.000   | 1237.000  | 1.000     |
| 75%   | 1416.000  | 4266.000  | 1.000     |
| max   | 1891.000  | 17631.000 | 1.000     |
Next, we calculate the number of unique users and artists, and the sparsity of our proposed frequency matrix (reported as the fraction of entries that are observed), before splitting into training and test subsets. Quite a sparse matrix indeed!
print(f'Number of unique users: {collab_filter_df["userID"].nunique()}')
print(f'Number of unique artists: {collab_filter_df["artistID"].nunique()}')
print(f'Sparsity of our frequency matrix: {calculate_sparsity(collab_filter_df)}')
Number of unique users: 1892
Number of unique artists: 17632
Sparsity of our frequency matrix: 0.002782815119924182
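As a sanity check, the printed figure is simply the number of observed interactions divided by the number of possible user-artist pairs:

```python
# Density of the feedback matrix, from the counts printed above.
num_interactions = 92834
num_users, num_artists = 1892, 17632
density = num_interactions / (num_users * num_artists)
print(density)  # ~0.00278: only ~0.3% of user-artist pairs are observed
```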
collab_filter_df.to_csv('data/test_user_artists.csv',index=False)
frequency_m_train, frequency_m_test = split_dataframe(colab_filter_df)
frequency_m_train_tensor = build_music_sparse_tensor(frequency_m_train)
frequency_m_test_tensor = build_music_sparse_tensor(frequency_m_test)
assert num_users == frequency_m_train_tensor.shape.as_list()[0]
assert num_artist == frequency_m_train_tensor.shape.as_list()[1]
assert num_users == frequency_m_test_tensor.shape.as_list()[0]
assert num_artist == frequency_m_test_tensor.shape.as_list()[1]
Training a Matrix factorization model¶
Per the definition above, \(UV^\top\) approximates \(F\). The mean squared error is used to measure this approximation error. In the notation below, \(k\) represents the set of observed listening counts, and \(K\) the number of observed listening counts:

\[ \text{MSE} = \frac{1}{K} \sum_{(i,j) \in k} \left( F_{ij} - u_i^\top v_j \right)^2 \]

where \(u_i\) and \(v_j\) denote the \(i\)-th row of \(U\) and the \(j\)-th row of \(V\) respectively.
However, rather than computing the full prediction matrix \(UV^\top\) and then gathering the entries corresponding to the observed listening counts, we only gather the embeddings of the observed pairs and compute their dot products. Thereby, we reduce the complexity from \(O(mn)\) to \(O(Kp)\), where \(p\) is the embedding dimension. Stochastic gradient descent (SGD) is used to minimize the loss (objective) function. The SGD algorithm cycles through the observed binary listening indicators and calculates the prediction

\[ \hat{F}_{ij} = u_i^\top v_j. \]

Then it updates the user and artist embeddings as follows:

\[ u_i \leftarrow u_i + \alpha \left( F_{ij} - \hat{F}_{ij} \right) v_j, \qquad v_j \leftarrow v_j + \alpha \left( F_{ij} - \hat{F}_{ij} \right) u_i, \]

where \(\alpha\) denotes the learning rate. The algorithm continues until convergence.
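The prediction and update steps can be sketched in NumPy on a toy, fully observed binary matrix (the matrix, embedding dimension, learning rate and sweep count here are illustrative, not the model trained below):

```python
import numpy as np

# Toy SGD matrix factorization: 3x3 binary feedback, rank-2 embeddings.
rng = np.random.default_rng(0)
F = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])
observed = [(i, j) for i in range(3) for j in range(3)]
p, alpha = 2, 0.1                          # embedding dim, learning rate
U = rng.normal(scale=0.1, size=(3, p))     # user embeddings
V = rng.normal(scale=0.1, size=(3, p))     # artist embeddings

for _ in range(500):                       # SGD sweeps over observed entries
    for i, j in observed:
        err = F[i, j] - U[i] @ V[j]        # F_ij - u_i . v_j
        # Simultaneous update of u_i and v_j with the pre-update values.
        U[i], V[j] = U[i] + alpha * err * V[j], V[j] + alpha * err * U[i]

mse = np.mean([(F[i, j] - U[i] @ V[j]) ** 2 for i, j in observed])
print(mse)
```

Since this toy matrix has rank 3, a rank-2 factorization cannot drive the error to zero; SGD settles near the best rank-2 approximation instead.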
Other matrix factorization algorithms are also commonly used, such as Alternating Least Squares (Takács and Tikk, 2012). A modified version of the aforementioned algorithm, known as Weighted Alternating Least Squares (WALS), is slower than SGD but can be parallelised. For the purposes of this investigation, we are not particularly concerned with training times/latency requirements, so we proceed with SGD.
We also decide to add regularization to our model to avoid overfitting. Overfitting occurs when the model fits the training dataset too well and does not generalize to unseen or future data. In the context of artist recommendation, fitting the observed listening counts often emphasizes learning high similarity (between artists with many listeners), but a good embedding representation also requires learning low similarity (between artists with few listeners).
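On toy embeddings the three components of the regularized objective can be written out directly (the coefficients, shapes and observed pairs below are illustrative): the observed MSE, an L2 penalty on the embeddings, and a "gravity" term that pushes all predictions, observed or not, toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)
U = rng.normal(size=(4, 3))     # toy user embeddings (N=4, p=3)
V = rng.normal(size=(6, 3))     # toy artist embeddings (M=6, p=3)
observed = [(0, 1), (1, 2), (3, 5)]
F_obs = np.ones(len(observed))  # observed binary interactions

# 1) MSE over observed pairs only.
pred = np.array([U[i] @ V[j] for i, j in observed])
observed_loss = np.mean((F_obs - pred) ** 2)

# 2) L2 regularization of the embeddings.
reg_loss = 0.1 * ((U ** 2).sum() / len(U) + (V ** 2).sum() / len(V))

# 3) Gravity term: the mean of ALL squared predictions, computed cheaply as
#    1/(N*M) * sum((U^T U) * (V^T V)) without forming U V^T.
gravity = 1.0 / (len(U) * len(V)) * ((U.T @ U) * (V.T @ V)).sum()

total = observed_loss + reg_loss + gravity
print(total)
```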
First, we define the two classes (train_matrix_norm and build_matrix_norm). The build_matrix_norm class computes the necessary pre-processing steps before we train the model, such as specifying the loss metric to optimise, the loss components (e.g. gravity loss for the regularized model) and the initial artist and user embeddings. The train_matrix_norm class simply trains the models and outputs figures detailing the loss metrics and components. The methods build_vanilla() and build_reg_model() compute the necessary pre-processing steps for the non-regularized and regularized models respectively.
### Training a Matrix Factorization model
class train_matrix_norm(object):
    """Simple class that represents a matrix normalisation model"""

    def __init__(self, embedding_vars, loss, metrics=None):
        """Initializes a matrix normalisation model.
        Args:
          embedding_vars: A dictionary of tf.Variables.
          loss: A float Tensor. The loss to optimize.
          metrics: optional list of dictionaries of Tensors. The metrics in each
            dictionary will be plotted in a separate figure during training.
        """
        self._embedding_vars = embedding_vars
        self._loss = loss
        self._metrics = metrics
        self._embeddings = {k: None for k in embedding_vars}
        self._session = None

    @property
    def embeddings(self):
        """The embeddings dictionary."""
        return self._embeddings

    def train(self, num_iterations=100, learning_rate=1.0, plot_results=True,
              optimizer=tf.train.GradientDescentOptimizer):
        """Trains the model.
        Args:
          num_iterations: number of iterations to run.
          learning_rate: optimizer learning rate.
          plot_results: whether to plot the results at the end of training.
          optimizer: the optimizer to use. Defaults to SGD.
        Returns:
          The metrics dictionary evaluated at the last iteration.
        """
        with self._loss.graph.as_default():
            opt = optimizer(learning_rate)
            train_op = opt.minimize(self._loss)
            local_init_op = tf.group(
                tf.variables_initializer(opt.variables()),
                tf.local_variables_initializer())
            if self._session is None:
                self._session = tf.Session()
                with self._session.as_default():
                    self._session.run(tf.global_variables_initializer())
                    self._session.run(tf.tables_initializer())
                    tf.train.start_queue_runners()

        with self._session.as_default():
            local_init_op.run()
            iterations = []
            metrics = self._metrics or ({},)
            metrics_vals = [collections.defaultdict(list) for _ in metrics]

            # Train and append results.
            for i in range(num_iterations + 1):
                _, results = self._session.run((train_op, metrics))
                if (i % 10 == 0) or i == num_iterations:
                    print("\r iteration %d: " % i + ", ".join(
                        ["%s=%f" % (k, v) for r in results for k, v in r.items()]),
                        end='')
                    iterations.append(i)
                    for metric_val, result in zip(metrics_vals, results):
                        for k, v in result.items():
                            metric_val[k].append(v)

            for k, v in self._embedding_vars.items():
                self._embeddings[k] = v.eval()

            if plot_results:
                # Plot the metrics.
                num_subplots = len(metrics) + 1
                fig = plt.figure()
                fig.set_size_inches(num_subplots * 10, 8)
                for i, metric_vals in enumerate(metrics_vals):
                    ax = fig.add_subplot(1, num_subplots, i + 1)
                    for k, v in metric_vals.items():
                        ax.plot(iterations, v, label=k)
                    ax.set_xlim([1, num_iterations])
                    ax.legend()
            return results
class build_matrix_norm(object):
    """Simple namespace class that builds matrix normalisation models.

    The static methods build_vanilla() and build_reg_model() return a
    train_matrix_norm object for the non-regularized and regularized
    models respectively.
    """

    @staticmethod
    def sparse_mean_square_error(sparse_listens, user_embeddings, artist_embeddings):
        """
        Args:
          sparse_listens: A SparseTensor interaction matrix, of dense_shape [N, M]
          user_embeddings: A dense Tensor U of shape [N, k] where k is the embedding
            dimension, such that U_i is the embedding of user i.
          artist_embeddings: A dense Tensor V of shape [M, k] where k is the embedding
            dimension, such that V_j is the embedding of artist j.
        Returns:
          A scalar Tensor representing the MSE between the true interactions and the
          model's predictions.
        """
        predictions = tf.gather_nd(
            tf.matmul(user_embeddings, artist_embeddings, transpose_b=True),
            sparse_listens.indices)
        loss = tf.losses.mean_squared_error(sparse_listens.values, predictions)
        return loss

    @staticmethod
    def gravity(U, V):
        """Creates a gravity loss given two embedding matrices."""
        return 1. / (U.shape[0].value * V.shape[0].value) * tf.reduce_sum(
            tf.matmul(U, U, transpose_a=True) * tf.matmul(V, V, transpose_a=True))

    @staticmethod
    def build_vanilla(embedding_dim=3, init_stddev=1.):
        """Performs the necessary preprocessing steps for the non-regularized model."""
        # Initialize the embeddings using a normal distribution.
        U = tf.Variable(tf.random.normal(
            [frequency_m_train_tensor.dense_shape[0], embedding_dim], stddev=init_stddev))
        V = tf.Variable(tf.random.normal(
            [frequency_m_train_tensor.dense_shape[1], embedding_dim], stddev=init_stddev))
        embeddings = {"userID": U, "artistID": V}
        error_train = build_matrix_norm.sparse_mean_square_error(frequency_m_train_tensor, U, V)
        error_test = build_matrix_norm.sparse_mean_square_error(frequency_m_test_tensor, U, V)
        metrics = {
            'train_error': error_train,
            'test_error': error_test
        }
        return train_matrix_norm(embeddings, error_train, [metrics])

    @staticmethod
    def build_reg_model(embedding_dim=3, regularization_coeff=.1, gravity_coeff=1.,
                        init_stddev=0.1):
        """Performs the necessary preprocessing steps for the regularized model."""
        # Initialize the embeddings using a normal distribution.
        U = tf.Variable(tf.random.normal(
            [frequency_m_train_tensor.dense_shape[0], embedding_dim], stddev=init_stddev))
        V = tf.Variable(tf.random.normal(
            [frequency_m_train_tensor.dense_shape[1], embedding_dim], stddev=init_stddev))
        embeddings = {"userID": U, "artistID": V}
        error_train = build_matrix_norm.sparse_mean_square_error(frequency_m_train_tensor, U, V)
        error_test = build_matrix_norm.sparse_mean_square_error(frequency_m_test_tensor, U, V)
        gravity_loss = gravity_coeff * build_matrix_norm.gravity(U, V)
        regularization_loss = regularization_coeff * (
            tf.reduce_sum(U * U) / U.shape[0].value + tf.reduce_sum(V * V) / V.shape[0].value)
        total_loss = error_train + regularization_loss + gravity_loss
        losses = {
            'train_error_observed': error_train,
            'test_error_observed': error_test,
        }
        loss_components = {
            'observed_loss': error_train,
            'regularization_loss': regularization_loss,
            'gravity_loss': gravity_loss,
        }
        return train_matrix_norm(embeddings, total_loss, [losses, loss_components])
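The gather-then-MSE trick in sparse_mean_square_error (taking dot products only for the observed pairs instead of forming the full \(UV^\top\)) can be checked in NumPy; the shapes and index pairs below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
U = rng.normal(size=(5, 3))              # toy user embeddings
V = rng.normal(size=(7, 3))              # toy artist embeddings
indices = [(0, 2), (1, 1), (4, 6)]       # observed (user, artist) pairs
values = np.ones(len(indices))           # binary interactions

# O(Kp): dot products for the observed pairs only.
preds = np.array([U[i] @ V[j] for i, j in indices])
mse_sparse = np.mean((values - preds) ** 2)

# O(NMp): full prediction matrix, then gather -- same result.
full = U @ V.T
mse_full = np.mean((values - full[tuple(zip(*indices))]) ** 2)
print(mse_sparse)
```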
Vanilla Model (non-regularized)¶
vanilla_model = build_matrix_norm.build_vanilla(embedding_dim=35,init_stddev=.05)
vanilla_model.train(num_iterations=2000, learning_rate=20.)
iteration 0: train_error=1.000074, test_error=0.999733
iteration 10: train_error=0.998375, test_error=0.999718
iteration 20: train_error=0.996581, test_error=0.999620
iteration 30: train_error=0.994563, test_error=0.999332
iteration 40: train_error=0.992106, test_error=0.998661
iteration 50: train_error=0.988830, test_error=0.997254
iteration 60: train_error=0.984052, test_error=0.994469
iteration 70: train_error=0.976595, test_error=0.989169
iteration 80: train_error=0.964604, test_error=0.979546
iteration 90: train_error=0.945735, test_error=0.963248
iteration 100: train_error=0.918293, test_error=0.938453
iteration 110: train_error=0.883174, test_error=0.905822
iteration 120: train_error=0.843895, test_error=0.868776
iteration 130: train_error=0.803251, test_error=0.830353
iteration 140: train_error=0.761605, test_error=0.791162
iteration 150: train_error=0.718925, test_error=0.751109
iteration 160: train_error=0.676227, test_error=0.710972
iteration 170: train_error=0.635018, test_error=0.672069
iteration 180: train_error=0.596397, test_error=0.635441
iteration 190: train_error=0.560808, test_error=0.601587
iteration 200: train_error=0.528255, test_error=0.570609
iteration 210: train_error=0.498537, test_error=0.542411
iteration 220: train_error=0.471389, test_error=0.516801
iteration 230: train_error=0.446538, test_error=0.493547
iteration 240: train_error=0.423733, test_error=0.472404
iteration 250: train_error=0.402746, test_error=0.453140
iteration 260: train_error=0.383374, test_error=0.435541
iteration 270: train_error=0.365437, test_error=0.419417
iteration 280: train_error=0.348775, test_error=0.404602
iteration 290: train_error=0.333250, test_error=0.390952
iteration 300: train_error=0.318742, test_error=0.378340
iteration 310: train_error=0.305145, test_error=0.366659
iteration 320: train_error=0.292369, test_error=0.355813
iteration 330: train_error=0.280337, test_error=0.345719
iteration 340: train_error=0.268982, test_error=0.336304
iteration 350: train_error=0.258244, test_error=0.327505
iteration 360: train_error=0.248072, test_error=0.319267
iteration 370: train_error=0.238423, test_error=0.311541
iteration 380: train_error=0.229254, test_error=0.304284
iteration 390: train_error=0.220532, test_error=0.297457
iteration 400: train_error=0.212224, test_error=0.291028
iteration 410: train_error=0.204301, test_error=0.284966
iteration 420: train_error=0.196736, test_error=0.279243
iteration 430: train_error=0.189506, test_error=0.273834
iteration 440: train_error=0.182588, test_error=0.268718
iteration 450: train_error=0.175961, test_error=0.263873
iteration 460: train_error=0.169608, test_error=0.259281
iteration 470: train_error=0.163511, test_error=0.254924
iteration 480: train_error=0.157654, test_error=0.250786
iteration 490: train_error=0.152024, test_error=0.246853
iteration 500: train_error=0.146606, test_error=0.243112
iteration 510: train_error=0.141389, test_error=0.239549
iteration 520: train_error=0.136363, test_error=0.236155
iteration 530: train_error=0.131516, test_error=0.232917
iteration 540: train_error=0.126841, test_error=0.229827
iteration 550: train_error=0.122328, test_error=0.226876
iteration 560: train_error=0.117970, test_error=0.224055
iteration 570: train_error=0.113761, test_error=0.221358
iteration 580: train_error=0.109694, test_error=0.218776
iteration 590: train_error=0.105762, test_error=0.216304
iteration 600: train_error=0.101962, test_error=0.213937
iteration 610: train_error=0.098288, test_error=0.211667
iteration 620: train_error=0.094735, test_error=0.209491
iteration 630: train_error=0.091300, test_error=0.207403
iteration 640: train_error=0.087979, test_error=0.205399
iteration 650: train_error=0.084767, test_error=0.203475
iteration 660: train_error=0.081663, test_error=0.201628
iteration 670: train_error=0.078662, test_error=0.199853
iteration 680: train_error=0.075762, test_error=0.198147
iteration 690: train_error=0.072960, test_error=0.196507
iteration 700: train_error=0.070253, test_error=0.194931
iteration 710: train_error=0.067639, test_error=0.193414
iteration 720: train_error=0.065115, test_error=0.191956
iteration 730: train_error=0.062680, test_error=0.190552
iteration 740: train_error=0.060329, test_error=0.189202
iteration 750: train_error=0.058063, test_error=0.187902
iteration 760: train_error=0.055877, test_error=0.186650
iteration 770: train_error=0.053770, test_error=0.185445
...
iteration 1990: train_error=0.001347, test_error=0.150898
iteration 2000: train_error=0.001321, test_error=0.150861
[{'train_error': 0.0013214338, 'test_error': 0.15086083}]

Regularized model¶
reg_model = build_matrix_norm.build_reg_model(regularization_coeff=0.1, gravity_coeff=1.0, embedding_dim=35, init_stddev=0.05)
reg_model.train(num_iterations=2000, learning_rate=20.)
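The three loss columns reported below can be sketched as follows. This is a minimal NumPy illustration, not the training implementation itself: it assumes user embeddings `U` (n × d), item embeddings `V` (m × d), an observed ratings matrix and a 0/1 mask of which entries are observed. The "gravity" term penalises the mean squared prediction over *all* user-item pairs, which discourages large predictions for unobserved entries.

```python
import numpy as np

def total_loss(U, V, A_observed, mask, regularization_coeff=0.1, gravity_coeff=1.0):
    """Sketch of the regularized objective: observed MSE, plus an L2
    penalty on the embeddings, plus a 'gravity' term over all pairs."""
    n, m = mask.shape
    preds = U @ V.T
    # Observed loss: mean squared error over the rated entries only.
    observed_loss = np.sum(((A_observed - preds) * mask) ** 2) / mask.sum()
    # Regularization: average squared norm of user and item embeddings.
    regularization_loss = np.sum(U ** 2) / n + np.sum(V ** 2) / m
    # Gravity: mean squared prediction over all n*m pairs. Using
    # ||U V^T||_F^2 = trace((U^T U)(V^T V)) avoids forming the full matrix.
    gravity_loss = np.sum((U.T @ U) * (V.T @ V)) / (n * m)
    return (observed_loss
            + regularization_coeff * regularization_loss
            + gravity_coeff * gravity_loss)
```

With both coefficients set to zero this reduces to the unregularized observed loss trained above; the log below reports the three components separately.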
iteration 0: train_error_observed=1.000350, test_error_observed=1.000153, observed_loss=1.000350, regularization_loss=0.017503, gravity_loss=0.000219
iteration 10: train_error_observed=0.998725, test_error_observed=1.000160, observed_loss=0.998725, regularization_loss=0.017096, gravity_loss=0.000209
...
iteration 1870: train_error_observed=0.096744, test_error_observed=0.243546, observed_loss=0.096744, regularization_loss=0.113940, gravity_loss=0.048639
iteration 1880: train_error_observed=0.096341, test_error_observed=0.243478, observed_loss=0.096341, regularization_loss=0.114104, gravity_loss=0.048565
iteration 1890: train_error_observed=0.095943, test_error_observed=0.243411, observed_loss=0.095943, regularization_loss=0.114266, gravity_loss=0.048491
iteration 1900: train_error_observed=0.095550, test_error_observed=0.243345, observed_loss=0.095550, regularization_loss=0.114427, gravity_loss=0.048417
iteration 1910: train_error_observed=0.095160, test_error_observed=0.243280, observed_loss=0.095160, regularization_loss=0.114587, gravity_loss=0.048345
iteration 1920: train_error_observed=0.094775, test_error_observed=0.243217, observed_loss=0.094775, regularization_loss=0.114745, gravity_loss=0.048272
iteration 1930: train_error_observed=0.094394, test_error_observed=0.243155, observed_loss=0.094394, regularization_loss=0.114901, gravity_loss=0.048200
iteration 1940: train_error_observed=0.094017, test_error_observed=0.243094, observed_loss=0.094017, regularization_loss=0.115057, gravity_loss=0.048129
iteration 1950: train_error_observed=0.093645, test_error_observed=0.243035, observed_loss=0.093645, regularization_loss=0.115210, gravity_loss=0.048058
iteration 1960: train_error_observed=0.093276, test_error_observed=0.242976, observed_loss=0.093276, regularization_loss=0.115363, gravity_loss=0.047988
iteration 1970: train_error_observed=0.092911, test_error_observed=0.242919, observed_loss=0.092911, regularization_loss=0.115514, gravity_loss=0.047918
iteration 1980: train_error_observed=0.092550, test_error_observed=0.242863, observed_loss=0.092550, regularization_loss=0.115664, gravity_loss=0.047848
iteration 1990: train_error_observed=0.092193, test_error_observed=0.242808, observed_loss=0.092193, regularization_loss=0.115812, gravity_loss=0.047779
iteration 2000: train_error_observed=0.091840, test_error_observed=0.242754, observed_loss=0.091840, regularization_loss=0.115960, gravity_loss=0.047711
[{'train_error_observed': 0.0918399, 'test_error_observed': 0.24275383},
{'observed_loss': 0.0918399,
'regularization_loss': 0.11595964,
'gravity_loss': 0.04771059}]

In both models, we observe a steep decrease in both train and test error as training progresses, although the regularized model has a higher MSE on both the training and the test set. It must be noted that the quality of the recommendations improves when regularization is added, as demonstrated when the artist_neighbors()
function is used (detailed below).
In addition, we observe in the final evaluation section that the performance of the model improves when regularization
is added. The test error decreases similarly to the train error, although it plateaus around the 1000-epoch mark. As
expected, the additional loss generated by the regularization functions increases over the epochs. For context, we added the
following regularization terms to our model.
Regularization of the model parameters. This is a common \(\ell_2\) regularization term on the embedding matrices, given by \(r(U, V) = \frac{1}{N} \sum_i \|U_i\|^2 + \frac{1}{M}\sum_j \|V_j\|^2\).
A global prior that pushes the prediction of any pair towards zero, called the gravity term. This is given by \(g(U, V) = \frac{1}{MN} \sum_{i = 1}^N \sum_{j = 1}^M \langle U_i, V_j \rangle^2\)
These terms modify the “global” loss (that is, the sum of the network loss and the regularization loss) in order to drive the optimization algorithm in desired directions, e.g. to prevent overfitting.
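As a sketch, the two terms can be computed directly with NumPy (the function name and the explicit Gram-matrix computation are illustrative; in the actual model these terms are folded into the training loss with tunable coefficients):

```python
import numpy as np

def regularization_terms(U, V):
    """Illustrative computation of the two extra loss terms.

    U: [N, k] user embedding matrix; V: [M, k] item embedding matrix.
    Returns (l2_term, gravity_term).
    """
    N, M = U.shape[0], V.shape[0]
    # r(U, V): mean squared L2 norm of the rows of each embedding matrix.
    l2_term = (U ** 2).sum() / N + (V ** 2).sum() / M
    # g(U, V): mean squared dot product over all (user, item) pairs,
    # computed via the Gram matrix U V^T.
    gravity_term = ((U @ V.T) ** 2).sum() / (N * M)
    return l2_term, gravity_term
```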
Evaluating the embeddings¶
We will use two similarity measures to inspect the robustness of our system:
Dot product: score of artist j \(\langle u, V_j \rangle\).
Cosine angle: score of artist j \(\frac{\langle u, V_j \rangle}{\|u\|\|V_j\|}\).
DOT = 'dot'
COSINE = 'cosine'

def compute_scores(query_embedding, item_embeddings, measure=DOT):
    """Computes the scores of the candidates given a query.
    Args:
      query_embedding: a vector of shape [k], representing the query embedding.
      item_embeddings: a matrix of shape [N, k], such that row i is the embedding
        of item i.
      measure: a string specifying the similarity measure to be used. Can be
        either DOT or COSINE.
    Returns:
      scores: a vector of shape [N], such that scores[i] is the score of item i.
    """
    u = query_embedding
    V = item_embeddings
    if measure == COSINE:
        V = V / np.linalg.norm(V, axis=1, keepdims=True)
        u = u / np.linalg.norm(u)
    scores = u.dot(V.T)
    return scores
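To illustrate why the choice of measure matters, here is a small self-contained example (toy embeddings, not taken from the model) in which the two measures rank items differently: the high-norm item wins under the dot product, while the perfectly aligned low-norm item wins under cosine similarity.

```python
import numpy as np

u = np.array([1.0, 0.0])             # query embedding
V = np.array([[3.0, 0.1],            # item 0: large norm, nearly aligned
              [0.5, 0.0]])           # item 1: small norm, exactly aligned

dot = V @ u                          # [3.0, 0.5] -> item 0 ranked first
Vn = V / np.linalg.norm(V, axis=1, keepdims=True)
cos = Vn @ (u / np.linalg.norm(u))   # item 1 scores exactly 1.0 -> ranked first

print(dot.argmax(), cos.argmax())    # 0 1
```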
def user_recommendations(model, user_id, k=15, measure=DOT, exclude_rated=False):
    scores = compute_scores(
        model.embeddings["userID"][user_id], model.embeddings["artistID"], measure)
    df = pd.DataFrame({
        'score': list(scores),
        'name': artists.sort_values('artistID', ascending=True)['name'],
        'most assigned tag': artists.sort_values('artistID', ascending=True)['mostCommonGenre']
    })
    return df.sort_values(['score'], ascending=False).head(k)
def artist_neighbors(model, title_substring, measure=DOT, k=6):
    # Search for artist ids that match the given substring.
    inv_artist_id_mapping = {v: k for k, v in orginal_artist_ids.items()}
    ids = artists[artists['name'].str.contains(title_substring)].artistID.values
    titles = artists[artists.artistID.isin(ids)]['name'].values
    if len(titles) == 0:
        raise ValueError("Found no artists with name %s" % title_substring)
    print("Nearest neighbors of : %s." % titles[0])
    if len(titles) > 1:
        print("[Found more than one matching artist. Other candidates: {}]".format(
            ", ".join(titles[1:])))
    # Map the original artist id to the internal (remapped) embedding index.
    artist_id_mapped = inv_artist_id_mapping[ids[0]]
    scores = compute_scores(
        model.embeddings["artistID"][artist_id_mapped], model.embeddings["artistID"],
        measure)
    score_key = measure + ' score'
    df = pd.DataFrame({
        score_key: list(scores),
        'name': artists.sort_values('artistID', ascending=True)['name'],
        'most assigned tag': artists.sort_values('artistID', ascending=True)['mostCommonGenre']
    })
    return df.sort_values([score_key], ascending=False).head(k)
Here, we find the most similar artists to the band The Cure. We also include the most assigned tag associated with each artist. The recommendations are consistent with our domain knowledge of bands similar to The Cure.
artist_neighbors(vanilla_model, "The Cure", DOT)
Nearest neighbors of : The Cure.
| | dot score | name | most assigned tag |
|---|---|---|---|
| 9437 | 0.541 | The Cure | chillout |
| 86790 | 0.527 | Yellowcard | rock |
| 115825 | 0.526 | Shiny Toy Guns | electronic |
| 49282 | 0.525 | Johnny Cash | new wave |
| 11826 | 0.525 | Jamiroquai | chillout |
| 97327 | 0.524 | Scooter | electronic |
artist_neighbors(vanilla_model, "The Cure", COSINE)
Nearest neighbors of : The Cure.
| | cosine score | name | most assigned tag |
|---|---|---|---|
| 9437 | 1.000 | The Cure | chillout |
| 10850 | 0.967 | Placebo | chillout |
| 14553 | 0.957 | Arctic Monkeys | chillout |
| 8273 | 0.956 | Radiohead | chillout |
| 4936 | 0.952 | Depeche Mode | chillout |
| 31876 | 0.946 | Sigur Rós | chillout |
artist_neighbors(reg_model, "The Cure", DOT)
Nearest neighbors of : The Cure.
| | dot score | name | most assigned tag |
|---|---|---|---|
| 16680 | 3.214 | The Beatles | chillout |
| 12363 | 3.212 | Muse | chillout |
| 3259 | 3.176 | Coldplay | chillout |
| 9437 | 3.164 | The Cure | chillout |
| 18364 | 3.157 | Nirvana | pop |
| 17472 | 3.116 | The Killers | chillout |
artist_neighbors(reg_model, "The Cure", COSINE)
Nearest neighbors of : The Cure.
| | cosine score | name | most assigned tag |
|---|---|---|---|
| 9437 | 1.000 | The Cure | chillout |
| 32942 | 0.965 | The Smiths | groove |
| 38968 | 0.962 | U2 | electronic |
| 10850 | 0.958 | Placebo | chillout |
| 4936 | 0.958 | Depeche Mode | chillout |
| 43413 | 0.955 | David Bowie | chillout |
We observe that the dot product tends to recommend more popular artists such as Nirvana and The Beatles, whereas cosine similarity recommends more obscure artists. This is likely because the norm of an embedding in matrix factorization is often correlated with popularity. The regularized model seems to output better recommendations, as the variation in the most-assigned-tag attribute is smaller than in the vanilla model. In addition, Marilyn Manson was recommended by the vanilla model in our initial run; we would argue these artists are highly dissimilar! However, this observation is subject to change between runs, as we initialize the embeddings with a random Gaussian generator.
def artist_embedding_norm(models):
    """Visualizes the norm and number of ratings of the artist embeddings.
    Args:
      models: a single train_matrix_norm object, or a list of them.
    """
    if not isinstance(models, list):
        models = [models]
    df = pd.DataFrame({
        'name': artists.sort_values('artistID', ascending=True)['name'].values,
        'number of user-artist interactions': user_artists[['artistID', 'userID']].sort_values(
            'artistID', ascending=True).groupby('artistID').count()['userID'].values,
    })
    charts = []
    brush = alt.selection_interval()
    for i, model in enumerate(models):
        norm_key = 'norm' + str(i)
        df[norm_key] = np.linalg.norm(model.embeddings["artistID"], axis=1)
        nearest = alt.selection(
            type='single', encodings=['x', 'y'], on='mouseover', nearest=True,
            empty='none')
        base = alt.Chart().mark_circle().encode(
            x='number of user-artist interactions',
            y=norm_key,
            color=alt.condition(brush, alt.value('#4c78a8'), alt.value('lightgray'))
        ).properties(
            selection=nearest).add_selection(brush)
        text = alt.Chart().mark_text(align='center', dx=5, dy=-5).encode(
            x='number of user-artist interactions', y=norm_key,
            text=alt.condition(nearest, 'name', alt.value('')))
        charts.append(alt.layer(base, text))
    return alt.hconcat(*charts, data=df)
artist_embedding_norm(reg_model)
def visualize_movie_embeddings(data, x, y):
    genre_filter = alt.selection_multi(fields=['top10TagValue'])
    genre_chart = alt.Chart().mark_bar().encode(
        x="count()",
        y=alt.Y('top10TagValue'),
        color=alt.condition(
            genre_filter,
            alt.Color("top10TagValue:N"),
            alt.value('lightgray'))
    ).properties(height=300, selection=genre_filter)
    nearest = alt.selection(
        type='single', encodings=['x', 'y'], on='mouseover', nearest=True,
        empty='none')
    base = alt.Chart().mark_circle().encode(
        x=x,
        y=y,
        color=alt.condition(genre_filter, "top10TagValue", alt.value("whitesmoke")),
    ).properties(
        width=600,
        height=600,
        selection=nearest)
    text = alt.Chart().mark_text(align='left', dx=5, dy=-5).encode(
        x=x,
        y=y,
        text=alt.condition(nearest, 'name', alt.value('')))
    return alt.hconcat(alt.layer(base, text), genre_chart, data=data)
def tsne_movie_embeddings(model):
    """Visualizes the artist embeddings, projected using t-SNE with the cosine metric.
    Args:
      model: A MFModel object.
    """
    tsne = sklearn.manifold.TSNE(
        n_components=2, perplexity=40, metric='cosine', early_exaggeration=10.0,
        init='pca', verbose=True, n_iter=400)
    print('Running t-SNE...')
    V_proj = tsne.fit_transform(model.embeddings["artistID"])
    artists.loc[:, 'x'] = V_proj[:, 0]
    artists.loc[:, 'y'] = V_proj[:, 1]
    return visualize_movie_embeddings(artists, 'x', 'y')
T-distributed stochastic neighbour embedding (t-SNE) is a dimensionality-reduction algorithm useful for visualizing high-dimensional data. We use this algorithm to visualize the embeddings of the regularized model. Due to the large number of user-submitted tags, we decided to colour-code the top 15 tags, with the rest labelled as ‘N/A’. Although the sea of orange, indicating ‘N/A’, makes it difficult to interpret these results, the regularized model seems to adequately cluster artists of a similar genre.
tsne_movie_embeddings(reg_model)
Running t-SNE...
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 17632 samples in 0.001s...
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sklearn/manifold/_t_sne.py:793: FutureWarning: The default learning rate in TSNE will change from 200.0 to 'auto' in 1.2.
FutureWarning,
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sklearn/manifold/_t_sne.py:827: FutureWarning: 'square_distances' has been introduced in 0.24 to help phase out legacy squaring behavior. The 'legacy' setting will be removed in 1.1 (renaming of 0.26), and the default setting will be changed to True. In 1.3, 'square_distances' will be removed altogether, and distances will be squared by default. Set 'square_distances'=True to silence this warning.
FutureWarning,
[t-SNE] Computed neighbors for 17632 samples in 4.931s...
[t-SNE] Computed conditional probabilities for sample 1000 / 17632
[t-SNE] Computed conditional probabilities for sample 2000 / 17632
[t-SNE] Computed conditional probabilities for sample 3000 / 17632
[t-SNE] Computed conditional probabilities for sample 4000 / 17632
[t-SNE] Computed conditional probabilities for sample 5000 / 17632
[t-SNE] Computed conditional probabilities for sample 6000 / 17632
[t-SNE] Computed conditional probabilities for sample 7000 / 17632
[t-SNE] Computed conditional probabilities for sample 8000 / 17632
[t-SNE] Computed conditional probabilities for sample 9000 / 17632
[t-SNE] Computed conditional probabilities for sample 10000 / 17632
[t-SNE] Computed conditional probabilities for sample 11000 / 17632
[t-SNE] Computed conditional probabilities for sample 12000 / 17632
[t-SNE] Computed conditional probabilities for sample 13000 / 17632
[t-SNE] Computed conditional probabilities for sample 14000 / 17632
[t-SNE] Computed conditional probabilities for sample 15000 / 17632
[t-SNE] Computed conditional probabilities for sample 16000 / 17632
[t-SNE] Computed conditional probabilities for sample 17000 / 17632
[t-SNE] Computed conditional probabilities for sample 17632 / 17632
[t-SNE] Mean sigma: 0.179141
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/sklearn/manifold/_t_sne.py:986: FutureWarning: The PCA initialization in TSNE will change to have the standard deviation of PC1 equal to 1e-4 in 1.2. This will ensure better convergence.
FutureWarning,
[t-SNE] KL divergence after 250 iterations with early exaggeration: 77.060295
[t-SNE] KL divergence after 400 iterations: 2.758283
Demo¶
You can find the most similar artists to a specified artist (contained in Last.FM) using the artist_neighbors()
function. Similarly, you can find the top-k recommendations for a particular userID [0 to 1891] using the user_recommendations()
function. The first argument specifies the desired model, the second the userID, and the third the number of top-k recommendations. The fourth argument is the similarity measure, either the DOT or COSINE constant (default = DOT).
user_recommendations(reg_model, 234, 10, COSINE)
| | score | name | most assigned tag |
|---|---|---|---|
| 126491 | 0.925 | Bandas Gaúchas - www.DownsMtv.com | N/A |
| 126539 | 0.922 | Menstruação Anarquika | N/A |
| 126554 | 0.921 | Moreira da Silva | N/A |
| 126582 | 0.901 | Validuaté | N/A |
| 126400 | 0.900 | The Vibrators | punk |
| 126583 | 0.840 | The Saints | punk |
| 126513 | 0.825 | Graforréia Xilarmônica | rock |
| 126540 | 0.814 | The Exploited | uk |
| 103973 | 0.807 | The Animals | rock |
| 126451 | 0.807 | Tim Maia | pop |
To further demonstrate the robustness of the system and measure the serendipity of our model, we incorporate the top artists that we listen to on Spotify (i.e. an unknown user). Note that these artists also have to be in the Last.FM dataset. The recommendation system should output similar artists based on its artist embeddings. The Spotipy library is used to interact with Spotify’s API; the similarity measure used is the dot product. Due to the short-lived nature of the Spotify token, and the fact that you have to sign in via a pop-up to retrieve the authentication token, we simply list our top 5 artists manually. If we did not, Jupyter Book would stall when building, as it would wait for our response. However, we provide the code used to retrieve the short-lived token for verification purposes.
"""
import spotipy
from spotipy.oauth2 import SpotifyOAuth
client_id = <insert_your_client_id>
client_secret = <insert your client secret>
redirect_url = '<insert your redirect uri>
scope = "user-top-read user-read-playback-state streaming ugc-image-upload playlist-modify-public"
authenticate_manager = spotipy.oauth2.SpotifyOAuth(client_id = client_id,client_secret = client_secret,redirect_uri =redirect_url,scope =scope,show_dialog = True)
sp = spotipy.Spotify(auth_manager=authenticate_manager)
artists_long = sp.current_user_top_artists(limit=5, time_range="long_term")
"""
top_5_artists =[
'Coldplay',
'Paramore',
'Arctic Monkeys',
'Lily Allen',
'Miley Cyrus'
]
spotify_recommendations_df = pd.DataFrame()
for artist in top_5_artists:
    similar_artist_df = artist_neighbors(reg_model, artist)[['name', 'dot score']]
    spotify_recommendations_df = pd.concat([spotify_recommendations_df, similar_artist_df])
spotify_recommendations_df.sort_values('dot score', ascending=False).head(10)
Nearest neighbors of : Coldplay.
[Found more than one matching artist. Other candidates: Jay-Z & Coldplay, Coldplay/U2]
Nearest neighbors of : Paramore.
[Found more than one matching artist. Other candidates: Paramore攀]
Nearest neighbors of : Arctic Monkeys.
[Found more than one matching artist. Other candidates: Arctic Monkeys vs The Killers]
Nearest neighbors of : Lily Allen.
Nearest neighbors of : Miley Cyrus.
[Found more than one matching artist. Other candidates: Miley Cyrus攀, Demi Lovato Ft. Miley Cyrus Ft. Selena Gomez Ft. Jonas Brothers, Miley Cyrus and Billy Ray Cyrus, Miley Cyrus and John Travolta, Hannah Montana and Miley Cyrus]
| | name | dot score |
|---|---|---|
| 3259 | Coldplay | 3.705 |
| 12363 | Muse | 3.574 |
| 37842 | Paramore | 3.550 |
| 24447 | Lily Allen | 3.520 |
| 17832 | Green Day | 3.485 |
| 6543 | Lady Gaga | 3.485 |
| 6543 | Lady Gaga | 3.474 |
| 17278 | Kings of Leon | 3.471 |
| 17472 | The Killers | 3.466 |
| 6543 | Lady Gaga | 3.464 |
We believe these recommendations are reasonable: when our model was given an artist from the top five, it frequently recommended other artists from that same top-five list.
Evaluation Code¶
This is the code needed to produce the in-depth model comparison. As we decided to use different notebooks for different models, the results of this code will be combined and explained later in the book.
## create holdout test set for each user (15 items)
user_artists = pd.read_csv('data/user_artists.dat', sep='\t')
user_ids = []
holdout_artists = []
for user_id in user_artists.userID.unique():
    # Take each user's 15 most-listened-to artists (highest weight) as the holdout set.
    top_15_artists = user_artists[user_artists.userID == user_id].sort_values(
        by='weight', ascending=False).head(15).artistID.tolist()
    if len(top_15_artists) == 15:
        holdout_artists.append(top_15_artists)
        user_ids.append(user_id)
holdout_df = pd.DataFrame(data={'userID': user_ids, 'holdout_artists': holdout_artists})
holdout_df.to_csv('data/evaluation/test-set.csv', index=False)
## Find the vanilla and regularized models' predictions for each user.
def get_top_15_model_predictions(model, measure):
    """Computes the top 15 predictions for a given model.
    Args:
      model: the model object to query.
      measure: a string specifying the similarity measure to be used. Can be
        either DOT or COSINE.
    Returns:
      predicted_df: a dataframe containing userIDs, their top 15 artists
        according to the model, and the corresponding scores.
    """
    artist_name_id_dict = dict(zip(artists['name'], artists['artistID']))
    user_ids = []
    predicted_artists = []
    scores_list = []
    for new_user_id, orginal_user_id in orginal_user_ids.items():
        # Compute the recommendations once and reuse them for both names and scores.
        recs = user_recommendations(model, new_user_id, k=15, measure=measure)
        top_15_names = recs['name'].values
        top_15_scores = recs['score'].values.tolist()
        artist_ids = [artist_name_id_dict[name] for name in top_15_names]
        predicted_artists.append(artist_ids)
        user_ids.append(orginal_user_id)
        scores_list.append(top_15_scores)
    predicted_df = pd.DataFrame(data={'userID': user_ids,
                                      'predictions_artists': predicted_artists,
                                      'score': scores_list})
    return predicted_df
# save the recommended artists into dataframes in the data/evaluation folder
vanilla_dot_pred = get_top_15_model_predictions(vanilla_model, measure=DOT)
vanilla_cos_pred = get_top_15_model_predictions(vanilla_model, measure=COSINE)
reg_dot_pred = get_top_15_model_predictions(reg_model, measure=DOT)
reg_cos_pred = get_top_15_model_predictions(reg_model, measure=COSINE)
vanilla_dot_pred.to_csv('data/evaluation/vannila_dot_pred.csv',index=False)
vanilla_cos_pred.to_csv('data/evaluation/vanila_cos_pred.csv',index=False)
reg_dot_pred.to_csv('data/evaluation/reg_dot_pred.csv',index=False)
reg_cos_pred.to_csv('data/evaluation/reg_cos_pred.csv',index=False)
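Although the comparison itself happens later in the book, a minimal sketch of the kind of metric these files enable is mean recall@15: the average fraction of each user's held-out artists that appear in the model's top-15 predictions (the function name is ours, not from the evaluation notebook):

```python
def mean_recall_at_k(holdout_lists, predicted_lists):
    """Mean fraction of held-out items recovered in the top-k predictions.

    holdout_lists: list of per-user held-out artistID lists.
    predicted_lists: list of per-user predicted artistID lists (same order).
    """
    recalls = []
    for held, pred in zip(holdout_lists, predicted_lists):
        hits = len(set(held) & set(pred))
        recalls.append(hits / len(held))
    return sum(recalls) / len(recalls)
```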