#jsDisabledContent { display:none; } My Account | Register | Help

# Netflix Prize

Article Id: WHEBN0009399111
Reproduction Date:

 Title: Netflix Prize Author: World Heritage Encyclopedia Language: English Subject: Collection: Publisher: World Heritage Encyclopedia Publication Date:

### Netflix Prize

The Netflix Prize was an open competition for the best collaborative filtering algorithm to predict user ratings for films, based on previous ratings without any other information about the users or films, i.e. without the users or the films being identified except by numbers assigned for the contest.

The competition was held by Netflix, an online DVD-rental and online video streaming service, and was open to anyone not connected with Netflix (current and former employees, agents, close relatives of Netflix employees, etc.) or a resident of Cuba, Iran, Syria, North Korea, Burma or Sudan.[1] On 21 September 2009, the grand prize of US\$1,000,000 was given to the BellKor's Pragmatic Chaos team which bested Netflix's own algorithm for predicting ratings by 10.06%.[2]

## Contents

• Problem and data sets 1
• Prizes 2
• Progress over the years 3
• 2007 Progress Prize 3.1
• 2008 Progress Prize 3.2
• 2009 3.3
• Cancelled sequel 4
• Privacy concerns 5
• References 7

## Problem and data sets

Netflix provided a training data set of 100,480,507 ratings that 480,189 users gave to 17,770 movies. Each training rating is a quadruplet of the form . The user and movie fields are integer IDs, while grades are from 1 to 5 (integral) stars.[3]

The qualifying data set contains over 2,817,131 triplets of the form , with grades known only to the jury. A participating team's algorithm must predict grades on the entire qualifying set, but they are only informed of the score for half of the data, the quiz set of 1,408,342 ratings. The other half is the test set of 1,408,789, and performance on this is used by the jury to determine potential prize winners. Only the judges know which ratings are in the quiz set, and which are in the test set—this arrangement is intended to make it difficult to hill climb on the test set. Submitted predictions are scored against the true grades in terms of root mean squared error (RMSE), and the goal is to reduce this error as much as possible. Note that while the actual grades are integers in the range 1 to 5, submitted predictions need not be. Netflix also identified a probe subset of 1,408,395 ratings within the training data set. The probe, quiz, and test data sets were chosen to have similar statistical properties.

In summary, the data used in the Netflix Prize looks as follows:

• Training set (99,072,112 ratings not including the probe set, 100,480,507 including the probe set)
• Probe set (1,408,395 ratings)
• Qualifying set (2,817,131 ratings) consisting of:
• Test set (1,408,789 ratings), used to determine winners
• Quiz set (1,408,342 ratings), used to calculate leaderboard scores

For each movie, title and year of release are provided in a separate dataset. No information at all is provided about users. In order to protect the privacy of customers, "some of the rating data for some customers in the training and qualifying sets have been deliberately perturbed in one or more of the following ways: deleting ratings; inserting alternative ratings and dates; and modifying rating dates".[2]

The training set is such that the average user rated over 200 movies, and the average movie was rated by over 5000 users. But there is wide variance in the data—some movies in the training set have as few as 3 ratings,[4] while one user rated over 17,000 movies.[5]

There was some controversy as to the choice of RMSE as the defining metric. Would a reduction of the RMSE by 10% really benefit the users? It has been claimed that even as small an improvement as 1% RMSE results in a significant difference in the ranking of the "top-10" most recommended movies for a user.[6]

## Prizes

Prizes were based on improvement over Netflix's own algorithm, called Cinematch, or the previous year's score if a team has made improvement beyond a certain threshold. A trivial algorithm that predicts for each movie in the quiz set its average grade from the training data produces an RMSE of 1.0540. Cinematch uses "straightforward statistical linear models with a lot of data conditioning".[7]

Using only the training data, Cinematch scores an RMSE of 0.9514 on the quiz data, roughly a 10% improvement over the trivial algorithm. Cinematch has a similar performance on the test set, 0.9525. In order to win the grand prize of \$1,000,000, a participating team had to improve this by another 10%, to achieve 0.8572 on the test set.[2] Such an improvement on the quiz set corresponds to an RMSE of 0.8563.

As long as no team won the grand prize, a progress prize of \$50,000 was awarded every year for the best result thus far. However, in order to win this prize, an algorithm had to improve the RMSE on the quiz set by at least 1% over the previous progress prize winner (or over Cinematch, the first year). If no submission succeeded, the progress prize was not to be awarded for that year.

To win a progress or grand prize a participant had to provide source code and a description of the algorithm to the jury within one week after being contacted by them. Following verification the winner also had to provide a non-exclusive license to Netflix. Netflix would publish only the description, not the source code, of the system. A team could choose to not claim a prize, in order to keep their algorithm and source code secret. The jury also kept their predictions secret from other participants. A team could send as many attempts to predict grades as they wish. Originally submissions were limited to once a week, but the interval was quickly modified to once a day. A team's best submission so far counted as their current submission.

Once one of the teams succeeded to improve the RMSE by 10% or more, the jury would issue a last call, giving all teams 30 days to send their submissions. Only then, the team with best submission was asked for the algorithm description, source code, and non-exclusive license, and, after successful verification; declared a grand prize winner.

The contest would last until the grand prize winner was declared. Had no one received the grand prize, it would have lasted for at least five years (until October 2, 2011). After that date, the contest could have been terminated at any time at Netflix's sole discretion.

## Progress over the years

The competition began on October 2, 2006. By October 8, a team called WXYZConsulting had already beaten Cinematch's results.[8]

By October 15, there were three teams who had beaten Cinematch, one of them by 1.06%, enough to qualify for the annual progress prize.[9] By June 2007 over 20,000 teams had registered for the competition from over 150 countries. 2,000 teams had submitted over 13,000 prediction sets.[3]

Over the first year of the competition, a handful of front-runners traded first place. The more prominent ones were:[10]

• WXYZConsulting, a team of Wei Xu and Yi Zhang. (A front runner during Nov-Dec 2006.)
• ML@UToronto A, a team from the University of Toronto led by Prof. Geoffrey Hinton. (A front runner during parts of Oct-Dec 2006.)
• Gravity, a team of four scientists from the Budapest University of Technology (A front runner during Jan-May 2007.)
• BellKor, a group of scientists from AT&T Labs. (A front runner since May 2007.)

On August 12, 2007, many contestants gathered at the KDD Cup and Workshop 2007, held at San Jose, California.[11] During the workshop all four of the top teams on the leaderboard at that time presented their techniques. The team from IBM Research — Yan Liu, Saharon Rosset, Claudia Perlich, and Zhenzhen Kou — won the third place in Task 1 and first place in Task 2.

Over the second year of the competition, only three teams reached the leading position:

• BellKor, a group of scientists from AT&T Labs. (front runner during May 2007 - Sept 2008.)
• BigChaos, a team of Austrian scientists from commendo research & consulting (single team front runner since Oct 2008)
• BellKor in BigChaos, a joint team of the two leading single teams (A front runner since Sept. 2008)

### 2007 Progress Prize

On September 2, 2007, the competition entered the "last call" period for the 2007 Progress Prize. Teams had thirty days to tender submissions for consideration. At the beginning of this period the leading team was BellKor, with an RMSE of 0.8728 (8.26% improvement). followed by Dinosaur Planet (RMSE = 0.8769; 7.83% improvement), and Gravity (RMSE = 0.8785; 7.66% improvement). In the last hour of the last call period, an entry by "KorBell" took first place. This turned out to be an alternate name for Team BellKor.require('Module:No globals')

local p = {}

-- articles in which traditional Chinese preceeds simplified Chinese local t1st = { ["228 Incident"] = true, ["Chinese calendar"] = true, ["Lippo Centre, Hong Kong"] = true, ["Republic of China"] = true, ["Republic of China at the 1924 Summer Olympics"] = true, ["Taiwan"] = true, ["Taiwan (island)"] = true, ["Taiwan Province"] = true, ["Wei Boyang"] = true, }

-- the labels for each part local labels = { ["c"] = "Chinese", ["s"] = "simplified Chinese", ["t"] = "traditional Chinese", ["p"] = "pinyin", ["tp"] = "Tongyong Pinyin", ["w"] = "Wade–Giles", ["j"] = "Jyutping", ["cy"] = "Cantonese Yale", ["poj"] = "Pe̍h-ōe-jī", ["zhu"] = "Zhuyin Fuhao", ["l"] = "literally", }

-- article titles for wikilinks for each part local wlinks = { ["c"] = "Chinese language", ["s"] = "simplified Chinese characters", ["t"] = "traditional Chinese characters", ["p"] = "pinyin", ["tp"] = "Tongyong Pinyin", ["w"] = "Wade–Giles", ["j"] = "Jyutping", ["cy"] = "Yale romanization of Cantonese", ["poj"] = "Pe̍h-ōe-jī", ["zhu"] = "Bopomofo", }

-- for those parts which are to be treated as languages their ISO code local ISOlang = { ["c"] = "zh", ["t"] = "zh-Hant", ["s"] = "zh-Hans", ["p"] = "zh-Latn-pinyin", ["tp"] = "zh-Latn", ["w"] = "zh-Latn-wadegile", ["j"] = "yue-jyutping", ["cy"] = "yue", ["poj"] = "hak", ["zhu"] = "zh-Bopo", }

local italic = { ["p"] = true, ["tp"] = true, ["w"] = true, ["j"] = true, ["cy"] = true, ["poj"] = true, } -- Categories for different kinds of Chinese text local cats = { ["c"] = "", ["s"] = "", ["t"] = "", }

function p.Zh(frame) -- load arguments module to simplify handling of args local getArgs = require('Module:Arguments').getArgs local args = getArgs(frame) return p._Zh(args) end function p._Zh(args) local uselinks = not (args["links"] == "no") -- whether to add links local uselabels = not (args["labels"] == "no") -- whether to have labels local capfirst = args["scase"] ~= nil

```        local t1 = false -- whether traditional Chinese characters go first
local j1 = false -- whether Cantonese Romanisations go first
local testChar
if (args["first"]) then
for testChar in mw.ustring.gmatch(args["first"], "%a+") do
if (testChar == "t") then
t1 = true
end
if (testChar == "j") then
j1 = true
end
end
end
if (t1 == false) then
local title = mw.title.getCurrentTitle()
t1 = t1st[title.text] == true
end
```

-- based on setting/preference specify order local orderlist = {"c", "s", "t", "p", "tp", "w", "j", "cy", "poj", "zhu", "l"} if (t1) then orderlist[2] = "t" orderlist[3] = "s" end if (j1) then orderlist[4] = "j" orderlist[5] = "cy" orderlist[6] = "p" orderlist[7] = "tp" orderlist[8] = "w" end -- rename rules. Rules to change parameters and labels based on other parameters if args["hp"] then -- hp an alias for p ([hanyu] pinyin) args["p"] = args["hp"] end if args["tp"] then -- if also Tongyu pinyin use full name for Hanyu pinyin labels["p"] = "Hanyu Pinyin" end if (args["s"] and args["s"] == args["t"]) then -- Treat simplified + traditional as Chinese if they're the same args["c"] = args["s"] args["s"] = nil args["t"] = nil elseif (not (args["s"] and args["t"])) then -- use short label if only one of simplified and traditional labels["s"] = labels["c"] labels["t"] = labels["c"] end local body = "" -- the output string local params -- for creating HTML spans local label -- the label, i.e. the bit preceeding the supplied text local val -- the supplied text -- go through all possible fields in loop, adding them to the output for i, part in ipairs(orderlist) do if (args[part]) then -- build label label = "" if (uselabels) then label = labels[part] if (capfirst) then label = mw.language.getContentLanguage():ucfirst(

On November 13, 2007, team KorBell (aka BellKor) was declared the winner of the \$50,000 Progress Prize with an RMSE of 0.8712 (8.43% improvement).[12] The team consisted of three researchers from AT&T Labs, Yehuda Koren, Robert Bell, and Chris Volinsky.[13] As required, they published a description of their algorithm.[14]

### 2008 Progress Prize

The 2008 Progress Prize was awarded to the team BellKor. Their submission combined with a different team, BigChaos achieved an RMSE of 0.8616 with 207 predictor sets.[15] The joint-team consisted of two researchers from commendo research & consulting GmbH, Andreas Töscher and Michael Jahrer (originally team BigChaos) and three researchers from AT&T Labs, Yehuda Koren, Robert Bell, and Chris Volinsky (originally team BellKor).[16] As required, they published a description of their algorithm.[17][18]

This was the final Progress Prize because obtaining the required 1% improvement over the 2008 Progress Prize would be sufficient to qualify for the Grand Prize.

### 2009

On June 26, 2009 the team "BellKor's Pragmatic Chaos", a merger of teams "Bellkor in BigChaos" and "Pragmatic Theory", achieved a 10.05% improvement over Cinematch (a Quiz RMSE of 0.8558). The Netflix Prize competition then entered the "last call" period for the Grand Prize. In accord with the Rules, teams had thirty (30) days, until July 26, 2009 18:42:37 UTC, to make submissions that will be considered for this Prize.[19]

On July 25, 2009 the team "The Ensemble", a merger of the teams "Grand Prize Team" and "Opera Solutions and Vandelay United", achieved a 10.09% improvement over Cinematch (a Quiz RMSE of 0.8554).,[20][21]

On July 26, 2009, Netflix stopped gathering submissions for the Netflix Prize contest.[22]

The final standing of the Leaderboard at that time showed that two teams met the minimum requirements for the Grand Prize. "The Ensemble" with a 10.10% improvement over Cinematch on the Qualifying set (a Quiz RMSE of 0.8553), and "BellKor's Pragmatic Chaos" with a 10.09% improvement over Cinematch on the Qualifying set (a Quiz RMSE of 0.8554).[23] The Grand Prize winner was to be the one with the better performance on the Test set.

On September 18, 2009, Netflix announced team "BellKor's Pragmatic Chaos" as the prize winner (a Test RMSE of 0.8567), and the prize was awarded to the team in a ceremony on September 21, 2009.[24] "The Ensemble" team had in fact succeeded to match BellKor's result, but since BellKor submitted their results 20 minutes earlier, the rules award the prize to them.,[21][25]

The joint-team "BellKor's Pragmatic Chaos" consisted of two Austrian researchers from Commendo Research & Consulting GmbH, Andreas Töscher and Michael Jahrer (originally team BigChaos), two researchers from AT&T Labs, Robert Bell, and Chris Volinsky, Yehuda Koren from Yahoo! (originally team BellKor) and two researchers from Pragmatic Theory, Martin Piotte and Martin Chabbert.[26] As required, they published a description of their algorithm.[27]

The team reported to have achieved the "dubious honors" (sic Netflix) of the worst RMSEs on the Quiz and Test data sets from among the 44,014 submissions made by 5,169 teams was "Lanterne Rouge", led by J.M. Linacre, who was also a member of "The Ensemble" team.

## Cancelled sequel

On March 12, 2010, Netflix announced that it would not pursue a second Prize competition that it had announced the previous August. The decision was in response to a lawsuit and Federal Trade Commission privacy concerns.[28]

## Privacy concerns

Although the data sets were constructed to preserve customer privacy, the Prize has been criticized by privacy advocates. In 2007 two researchers from the University of Texas were able to identify individual users by matching the data sets with film ratings on the Internet Movie Database.[29]

On December 17, 2009, four Netflix users filed a class action lawsuit against Netflix, alleging that Netflix had violated U.S. fair trade laws and the Video Privacy Protection Act by releasing the datasets.[30] There was public debate about privacy for research participants. On March 19, 2010, Netflix reached a settlement with the plaintiffs, after which they voluntarily dismissed the lawsuit.[31]

## References

-- Module:Hatnote -- -- -- -- This module produces hatnote links and links to related articles. It -- -- implements the and meta-templates and includes -- -- helper functions for other Lua hatnote modules. --

local libraryUtil = require('libraryUtil') local checkType = libraryUtil.checkType local mArguments -- lazily initialise Module:Arguments local yesno -- lazily initialise Module:Yesno

local p = {}

-- Helper functions

local function getArgs(frame) -- Fetches the arguments from the parent frame. Whitespace is trimmed and -- blanks are removed. mArguments = require('Module:Arguments') return mArguments.getArgs(frame, {parentOnly = true}) end

local function removeInitialColon(s) -- Removes the initial colon from a string, if present. return s:match('^:?(.*)') end

function p.findNamespaceId(link, removeColon) -- Finds the namespace id (namespace number) of a link or a pagename. This -- function will not work if the link is enclosed in double brackets. Colons -- are trimmed from the start of the link by default. To skip colon -- trimming, set the removeColon parameter to true. checkType('findNamespaceId', 1, link, 'string') checkType('findNamespaceId', 2, removeColon, 'boolean', true) if removeColon ~= false then link = removeInitialColon(link) end local namespace = link:match('^(.-):') if namespace then local nsTable = mw.site.namespaces[namespace] if nsTable then return nsTable.id end end return 0 end

function p.formatPages(...) -- Formats a list of pages using formatLink and returns it as an array. Nil -- values are not allowed. local pages = {...} local ret = {} for i, page in ipairs(pages) do ret[i] = p._formatLink(page) end return ret end

function p.formatPageTables(...) -- Takes a list of page/display tables and returns it as a list of -- formatted links. Nil values are not allowed. local pages = {...} local links = {} for i, t in ipairs(pages) do checkType('formatPageTables', i, t, 'table') local link = t[1] local display = t[2] links[i] = p._formatLink(link, display) end return links end

function p.makeWikitextError(msg, helpLink, addTrackingCategory) -- Formats an error message to be returned to wikitext. If -- addTrackingCategory is not false after being returned from -- Module:Yesno, and if we are not on a talk page, a tracking category -- is added. checkType('makeWikitextError', 1, msg, 'string') checkType('makeWikitextError', 2, helpLink, 'string', true) yesno = require('Module:Yesno') local title = mw.title.getCurrentTitle() -- Make the help link text. local helpText if helpLink then helpText = ' (help)' else helpText = end -- Make the category text. local category if not title.isTalkPage and yesno(addTrackingCategory) ~= false then category = 'Hatnote templates with errors' category = string.format( '%s:%s', mw.site.namespaces[14].name, category ) else category = end return string.format( '%s', msg, helpText, category ) end

-- Format link -- -- Makes a wikilink from the given link and display values. Links are escaped -- with colons if necessary, and links to sections are detected and displayed -- with " § " as a separator rather than the standard MediaWiki "#". Used in -- the template.

-- Hatnote -- -- Produces standard hatnote text. Implements the template.

function p.hatnote(frame) local args = getArgs(frame) local s = args[1] local options = {} if not s then return p.makeWikitextError( 'no text specified', 'Template:Hatnote#Errors', args.category ) end options.extraclasses = args.extraclasses options.selfref = args.selfref return p._hatnote(s, options) end

function p._hatnote(s, options) checkType('_hatnote', 1, s, 'string') checkType('_hatnote', 2, options, 'table', true) local classes = {'hatnote'} local extraclasses = options.extraclasses local selfref = options.selfref if type(extraclasses) == 'string' then classes[#classes + 1] = extraclasses end if selfref then classes[#classes + 1] = 'selfref' end return string.format( '
%s
', table.concat(classes, ' '), s )

end

return p-------------------------------------------------------------------------------- -- Module:Hatnote -- -- -- -- This module produces hatnote links and links to related articles. It -- -- implements the and meta-templates and includes -- -- helper functions for other Lua hatnote modules. --

local libraryUtil = require('libraryUtil') local checkType = libraryUtil.checkType local mArguments -- lazily initialise Module:Arguments local yesno -- lazily initialise Module:Yesno

local p = {}

-- Helper functions

local function getArgs(frame) -- Fetches the arguments from the parent frame. Whitespace is trimmed and -- blanks are removed. mArguments = require('Module:Arguments') return mArguments.getArgs(frame, {parentOnly = true}) end

local function removeInitialColon(s) -- Removes the initial colon from a string, if present. return s:match('^:?(.*)') end

function p.findNamespaceId(link, removeColon) -- Finds the namespace id (namespace number) of a link or a pagename. This -- function will not work if the link is enclosed in double brackets. Colons -- are trimmed from the start of the link by default. To skip colon -- trimming, set the removeColon parameter to true. checkType('findNamespaceId', 1, link, 'string') checkType('findNamespaceId', 2, removeColon, 'boolean', true) if removeColon ~= false then link = removeInitialColon(link) end local namespace = link:match('^(.-):') if namespace then local nsTable = mw.site.namespaces[namespace] if nsTable then return nsTable.id end end return 0 end

function p.formatPages(...) -- Formats a list of pages using formatLink and returns it as an array. Nil -- values are not allowed. local pages = {...} local ret = {} for i, page in ipairs(pages) do ret[i] = p._formatLink(page) end return ret end

function p.formatPageTables(...) -- Takes a list of page/display tables and returns it as a list of -- formatted links. Nil values are not allowed. local pages = {...} local links = {} for i, t in ipairs(pages) do checkType('formatPageTables', i, t, 'table') local link = t[1] local display = t[2] links[i] = p._formatLink(link, display) end return links end

function p.makeWikitextError(msg, helpLink, addTrackingCategory) -- Formats an error message to be returned to wikitext. If -- addTrackingCategory is not false after being returned from -- Module:Yesno, and if we are not on a talk page, a tracking category -- is added. checkType('makeWikitextError', 1, msg, 'string') checkType('makeWikitextError', 2, helpLink, 'string', true) yesno = require('Module:Yesno') local title = mw.title.getCurrentTitle() -- Make the help link text. local helpText if helpLink then helpText = ' (help)' else helpText = end -- Make the category text. local category if not title.isTalkPage and yesno(addTrackingCategory) ~= false then category = 'Hatnote templates with errors' category = string.format( '%s:%s', mw.site.namespaces[14].name, category ) else category = end return string.format( '%s', msg, helpText, category ) end

-- Format link -- -- Makes a wikilink from the given link and display values. Links are escaped -- with colons if necessary, and links to sections are detected and displayed -- with " § " as a separator rather than the standard MediaWiki "#". Used in -- the template.

-- Hatnote -- -- Produces standard hatnote text. Implements the template.

function p.hatnote(frame) local args = getArgs(frame) local s = args[1] local options = {} if not s then return p.makeWikitextError( 'no text specified', 'Template:Hatnote#Errors', args.category ) end options.extraclasses = args.extraclasses options.selfref = args.selfref return p._hatnote(s, options) end

function p._hatnote(s, options) checkType('_hatnote', 1, s, 'string') checkType('_hatnote', 2, options, 'table', true) local classes = {'hatnote'} local extraclasses = options.extraclasses local selfref = options.selfref if type(extraclasses) == 'string' then classes[#classes + 1] = extraclasses end if selfref then classes[#classes + 1] = 'selfref' end return string.format( '
%s
', table.concat(classes, ' '), s )

end

return p
1. ^
2. ^ a b c
3. ^ a b
4. ^
5. ^
6. ^
7. ^
8. ^
9. ^
10. ^
11. ^
12. ^
13. ^
14. ^
15. ^
16. ^
17. ^
18. ^
19. ^
20. ^
21. ^ a b
22. ^
23. ^
24. ^
25. ^
26. ^
27. ^
28. ^
29. ^
30. ^ http://www.wired.com/threatlevel/2009/12/netflix-privacy-lawsuit/ Netflix Spilled Your Brokeback Mountain Secret, Lawsuit Claims
31. ^ Netflix Form 10-Q Quarterly Report. Filed Apr 28, 2010