Forecasting Likes: Exploring the Algorithms of a Basic Recommendation Engine

A system for recommending items, often called a recommendation engine, enables algorithm creators to predict user preferences from a set of options. Unlike search bars, these engines help users discover items they might not have found otherwise, making them valuable for platforms like Facebook, YouTube, and Amazon.

Two common approaches power recommendation engines. One analyzes the characteristics of items a user likes to suggest similar ones. The other leverages the preferences of other users, calculating similarity scores to recommend items accordingly. These methods can also be combined for a more robust engine. However, choosing an algorithm suitable for the specific problem remains crucial.

This tutorial guides you through building a collaborative, memory-based recommendation engine using basic set operations, math, and Node.js/CoffeeScript. This engine, similar to the second approach mentioned earlier, will suggest movies based on user likes and dislikes. The complete source code is available here.

Sets and Equations

Before diving into implementation, let’s understand the underlying concept. Our engine treats items and users as simple identifiers, disregarding additional item attributes. User similarity is represented by a similarity index, a decimal between -1.0 and 1.0. Similarly, the likelihood of a user liking a movie is expressed as a decimal within the same range.

Our algorithm employs various sets: sets of movies liked and disliked by each user, and sets of users who liked or disliked each movie. Unions and intersections of these sets, along with ordered lists of suggestions and similar users, are generated during recommendation.

To quantify similarity, we’ll use a modified Jaccard index, originally called “coefficient de communauté” by Paul Jaccard. This formula compares two sets, producing a statistic between 0 and 1.0:

It divides the count of shared elements by the total element count in both sets (counting each element only once). Identical sets have a Jaccard index of 1, while sets with no shared elements result in 0.

To compare users, we treat them as identifiers with liked and disliked movie sets. If we focused solely on liked movies, the Jaccard index would suffice:

Here, U1 and U2 represent the users, while L1 and L2 represent their respective sets of liked movies. However, shared dislikes also indicate similarity, leading to a modification:

We now include shared dislikes in the numerator and consider all liked or disliked items in the denominator.

Furthermore, we must account for opposing preferences. Users with opposite tastes shouldn’t have a similarity index of 0:

This extended formula subtracts conflicting likes and dislikes from shared ones, resulting in a range of -1.0 to 1.0. Identical preferences yield 1.0, while completely opposing tastes result in -1.0.

Before implementing our engine, let’s examine one more formula:

This formula calculates the probability (P(U,M)) of user U liking movie M. ZL and ZD represent the sum of similarity indices between U and users who liked or disliked M, respectively. |ML|+|MD| denotes the total number of users who rated M. The result falls between -1.0 and 1.0.

With these formulas, we can start building our recommendation engine.

Building the Recommendation Engine

We’ll create a simple Node.js application with a basic HTML/Bootstrap frontend. The backend, written in CoffeeScript, will handle GET and POST requests. While users exist, we won’t implement complex registration or login mechanisms. We’ll use the Bourne package for persistency, storing data in JSON files, and Express.js to manage routes and handlers.

If you’re new to Node.js, cloning the GitHub repository might be helpful. As with any Node.js project, we begin by creating a package.json file and installing dependencies listed in the “package.json” file (execute “$ npm install”).

Required Node.js packages:

We’ll structure the engine with four CoffeeScript classes under “lib/engine”: Engine, Rater, Similars, and Suggestions. Engine provides a unified API, binding the other three. Rater tracks likes and dislikes (two separate instances). Similars and Suggestions handle similar user calculations and recommended items, respectively.

Tracking Likes and Dislikes

Let’s start with the Rater class:

1
2
3
4
5
6
class Rater
	constructor: (@engine, @kind) ->
	add: (user, item, done) ->
	remove: (user, item, done) ->
	itemsByUser: (user, done) ->
	usersByItem: (item, done) ->

We’ll have separate instances for likes and dislikes. “Rater#add()” records user preferences, while “Rater#remove()” removes them.

Using Bourne, ratings are stored in “./db-#{@kind}.json” (“likes” or “dislikes”). The database is opened when instantiating Rater:

1
2
constructor: (@engine, @kind) ->
	@db = new Bourne "./db-#{@kind}.json"

This simplifies adding ratings:

1
@db.insert user: user, item: item, (err) =>

Removing ratings is similar (“db.delete” replaces “db.insert”). We’ll check for existing entries before adding or removing. After updates, we’ll recalculate similarity indices and generate new suggestions. The “Rater#add()” and “Rater#remove()” methods resemble:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
add: (user, item, done) ->
	@db.find user: user, item: item, (err, res) =>
		if res.length > 0
			return done()

		@db.insert user: user, item: item, (err) =>
			async.series [
				(done) =>
					@engine.similars.update user, done
				(done) =>
					@engine.suggestions.update user, done
			], done

remove: (user, item, done) ->
	@db.delete user: user, item: item, (err) =>
		async.series [
			(done) =>
				@engine.similars.update user, done
			(done) =>
				@engine.suggestions.update user, done
		], done

For simplicity, error handling is omitted here, but it’s crucial in real-world code.

The remaining methods, “Rater#itemsByUser()” and “Rater#usersByItem()”, retrieve items rated by a user and users who rated an item, respectively. For instance, when Rater is instantiated with kind = “likes”, “Rater#itemsByUser()” returns items the user liked.

Finding Similar Users

Next is the Similars class, responsible for computing and tracking similarity indices. It utilizes Rater instances to fetch rated items and calculates the index using our formula.

Similar to Rater, data is stored in “./db-similars.json”, opened during instantiation. The “Similars#byUser()” method retrieves users similar to a given user:

1
@db.findOne user: user, (err, {others}) =>

The core method, “Similars#update()”, takes a user as input, computes similar users, and stores them with their indices. It begins by retrieving the user’s likes and dislikes:

1
2
3
4
5
6
7
async.auto
	userLikes: (done) =>
		@engine.likes.itemsByUser user, done
	userDislikes: (done) =>
		@engine.dislikes.itemsByUser user, done
, (err, {userLikes, userDislikes}) =>
	items = _.flatten([userLikes, userDislikes])

Then, it identifies users who rated these items:

1
2
3
4
5
6
7
8
async.map items, (item, done) =>
	async.map [
		@engine.likes
		@engine.dislikes
	], (rater, done) =>
		rater.usersByItem item, done
	, done
, (err, others) =>

Finally, it calculates and stores the similarity index for each user pair:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
async.map others, (other, done) =>
	async.auto
		otherLikes: (done) =>
			@engine.likes.itemsByUser other, done
		otherDislikes: (done) =>
			@engine.dislikes.itemsByUser other, done
	, (err, {otherLikes, otherDislikes}) =>
		done null,
			user: other
			similarity: (_.intersection(userLikes, otherLikes).length+_.intersection(userDislikes, otherDislikes).length-_.intersection(userLikes, otherDislikes).length-_.intersection(userDislikes, otherLikes).length) / _.union(userLikes, otherLikes, userDislikes, otherDislikes).length

, (err, others) =>
	@db.insert
		user: user
		others: others
	, done

The code snippet above incorporates our Jaccard index variant for calculating similarity.

Generating Recommendations

The Suggestions class handles predictions. It utilizes a Bourne database (“db-suggestions.json”) opened during construction.

The “Suggestions#forUser()” method retrieves computed suggestions:

1
2
3
forUser: (user, done) ->
	@db.findOne user: user, (err, {suggestions}={suggestion: []}) ->
		done null, suggestions

The “Suggestions#update()” method, similar to “Similars#update()”, takes a user as input. It retrieves similar users and unrated items:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
@engine.similars.byUser user, (err, others) =>
	async.auto 
		likes: (done) =>
			@engine.likes.itemsByUser user, done
		dislikes: (done) =>
			@engine.dislikes.itemsByUser user, done
		items: (done) =>
			async.map others, (other, done) =>
				async.map [
					@engine.likes
					@engine.dislikes
				], (rater, done) =>
					rater.itemsByUser other.user, done
				, done
			, done
	, (err, {likes, dislikes, items}) =>
		items = _.difference _.unique(_.flatten items), likes, dislikes

Then, it iterates through each item, calculating the likelihood of the user liking it based on existing data:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
@db.delete user: user, (err) =>
async.map items, (item, done) =>
		async.auto
			likers: (done) =>
				@engine.likes.usersByItem item, done
			dislikers: (done) =>
				@engine.dislikes.usersByItem item, done
		, (err, {likers, dislikers}) =>
			numerator = 0
			for other in _.without _.flatten([likers, dislikers]), user
				other = _.findWhere(others, user: other)
				if other?
				numerator += other.similarity

				done null,
					item: item
					weight: numerator / _.union(likers, dislikers).length

	, (err, suggestions) =>

The updated recommendations are saved back to the database:

1
2
3
4
@db.insert
	user: user
	suggestions: suggestions
, done

Exposing the Library API

The Engine class combines these components into a user-friendly API:

1
2
3
4
5
6
class Engine
	constructor: ->
		@likes = new Rater @, 'likes'
		@dislikes = new Rater @, 'dislikes'
		@similars = new Similars @
		@suggestions = new Suggestions @

After instantiating an Engine object:

1
e = new Engine

We can manage likes and dislikes:

1
2
e.likes.add user, item, (err) ->
e.dislikes.add user, item, (err) ->

And update similarity indices and suggestions:

1
2
e.similars.update user, (err) ->
e.suggestions.update user, (err) ->

Finally, we export the Engine class and other classes from their respective “.coffee” files:

1
module.exports = Engine

Then, we export Engine from the package by creating an “index.coffee” file containing:

1
module.exports = require './engine'

Creating the User Interface

To interact with the engine, we’ll build a simple web interface. Our “web.iced” file will spawn an Express app and handle routes:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
movies = require './data/movies.json'

Engine = require './lib/engine'
e = new Eengine 

app = express()

app.set 'views', "#{__dirname}/views"
app.set 'view engine', 'jade'

app.route('/refresh')
.post(({query}, res, next) ->
	async.series [
		(done) =>
			e.similars.update query.user, done

		(done) =>
			e.suggestions.update query.user, done

	], (err) =>
		res.redirect "/?user=#{query.user}"
)

app.route('/like')
.post(({query}, res, next) ->
	if query.unset is 'yes'
		e.likes.remove query.user, query.movie, (err) =>
			res.redirect "/?user=#{query.user}"

	else
		e.dislikes.remove query.user, query.movie, (err) =>
			e.likes.add query.user, query.movie, (err) =>
				if err?
					return next err

				res.redirect "/?user=#{query.user}"
)

app.route('/dislike')
.post(({query}, res, next) ->
	if query.unset is 'yes'
		e.dislikes.remove query.user, query.movie, (err) =>
			res.redirect "/?user=#{query.user}"

	else
		e.likes.remove query.user, query.movie, (err) =>
			e.dislikes.add query.user, query.movie, (err) =>
				res.redirect "/?user=#{query.user}"
)

app.route('/')
.get(({query}, res, next) ->
	async.auto
		likes: (done) =>
			e.likes.itemsByUser query.user, done

		dislikes: (done) =>
			e.dislikes.itemsByUser query.user, done

		suggestions: (done) =>
			e.suggestions.forUser query.user, (err, suggestions) =>
				done null, _.map _.sortBy(suggestions, (suggestion) -> -suggestion.weight), (suggestion) =>
					_.findWhere movies, id: suggestion.item

	, (err, {likes, dislikes, suggestions}) =>
		res.render 'index',
			movies: movies
			user: query.user
			likes: likes
			dislikes: dislikes
			suggestions: suggestions[...4]
)

The app handles four routes. The index route ("/") serves the frontend HTML, rendering a Jade template using movie data, username, user preferences, and top suggestions. The template source code is available in the GitHub repository.

The “/like” and “/dislike” routes handle POST requests for recording user preferences, removing conflicting ratings if necessary (e.g., liking a previously disliked item). Users can also “unlike” or “un-dislike” items.

Lastly, the “/refresh” route triggers recommendation regeneration on demand, although this happens automatically after each rating.

Test-drive

To test the application, create a “data/movies.json” file containing movie data:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
[
	{
		"id": "1",
		"name": "Transformers: Age of Extinction",
		"thumb": {
			"url": "//upload.wikimedia.org/wikipedia/en/7/7f/Inception_ver3.jpg"
		}
	},
// …
]

You can copy the pre-populated example from GitHub repository.

Once everything is set up, start the server:

1
$ npm start

If successful, you should see:

1
Listening on 5000

The prototype relies on a chosen username (visit “http://localhost:5000”) due to the lack of user authentication. After submitting a username, you’ll be redirected to a page with “Recommended Movies” and “All Movies” sections. Initially, recommendations won’t appear due to the lack of data.

Open another browser window, access “http://localhost:5000”, and log in as a different user. Rate some movies, including a few rated by the first user. Return to the first user’s window and rate movies as well. Once both users have rated common items, recommendations should appear.

Improvements

Our prototype engine has room for improvement, especially for large-scale use. While robust solutions already exist, this section highlights areas for enhancement, focusing on conceptual aspects rather than specific implementation details.

Using a real database like Redis instead of our file-based solution is crucial for scalability. Redis’s speed and special capabilities make it suitable for managing set-like data.

Instead of recalculating recommendations in real-time after each rating, we can queue updates and process them in the background, potentially on a timed interval.

Strategically selecting a subset of users for recommendation generation can enhance performance. For instance, in a restaurant recommendation engine, limiting similar users to those in the same geographical area can be beneficial.

Adopting a hybrid approach, combining collaborative and content-based filtering, can improve accuracy. Platforms like Netflix use this strategy, considering both user behavior and movie attributes.

Conclusion

Collaborative memory-based recommendation engines are powerful tools. Our simple prototype demonstrates the fundamental concepts behind them. While not perfect, it lays the groundwork for more sophisticated implementations like Recommendable.

Like many data-driven problems, achieving accurate recommendations hinges on selecting the right algorithms and utilizing relevant content attributes. Hopefully, this article has shed light on the inner workings of collaborative memory-based recommendation engines.