
How Much Does iTunes Like My Five-Star Songs?

After hearing one artist played over and over during a shuffled play of your entire music library in iTunes, you may think your player has a preference of its own. Apple claims the iTunes shuffle algorithm is completely random.1 The shuffle algorithm chooses songs "without replacement." In other words, much like going through a shuffled deck of cards, you will hear each song only once until you have heard them all, or until you have stopped the player or selected a different playlist.

iTunes Party Shuffle2 is a different matter. Its algorithm selects songs "with replacement," meaning the entire deck of cards is reshuffled after each song is played. The "play higher rated songs more often" option does exactly what it says, but how much preference is given to higher rated songs?
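The difference between the two selection models can be sketched in a few lines of Python. This is only an illustration of "without replacement" versus "with replacement"; the function names are hypothetical and nothing here is taken from iTunes itself.

```python
import random

def shuffle_without_replacement(songs, rng):
    """Regular shuffle model: every song plays once before any can repeat."""
    order = list(songs)
    rng.shuffle(order)
    return order

def party_shuffle(songs, weights, n_plays, rng):
    """Party Shuffle model: each pick is an independent weighted draw,
    so the same song can come up again at any time."""
    return rng.choices(songs, weights=weights, k=n_plays)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
songs = ["A", "B", "C", "D", "E", "F"]

once_each = shuffle_without_replacement(songs, rng)
with_repeats = party_shuffle(songs, [1] * len(songs), 12, rng)

print(once_each)     # a permutation: each song exactly once
print(with_repeats)  # 12 picks from 6 songs, so repeats are unavoidable
```

Equal weights give the unbiased case; the rating option would correspond to passing unequal weights.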

To test the option’s preference for 5-stars, I created a short playlist of six songs: one for each star rating and one left unrated. The songs were from the same genre and artist, and each was trimmed to one second in duration. After resetting the play counts to zero, I hit play and left my desk for the weekend. To satisfy a little more curiosity, I ran the same songs once more on a different weekend without selecting the option to play higher rated songs more often. On Monday morning the play counts were as shown in Table 1.


Table 1. The play higher rated songs more often option showed a distinct change in rating play counts.

The play counts in the random trial were very close to each other, as can be expected with a random selection. For the rating-biased trial the preference algorithm appears to be linear from 12% to 27% for the rated songs. Moving from the 5-star rating downward, the linear preference declines around 4% with each step down in rating, but doubles over the drop from 1-star to unrated with a fall of 8%. While one star may seem like the lowest rating, no-rating proved the black sheep of the lot.


Figure 1. The unrated play count dropped well below the linear bias iTunes showed for rated songs.

Changing the number of songs within each rating will change these probabilities. With multiple songs of each rating, the chance of a song with rating r coming up next in the ratings-biased party shuffle can be calculated using the equation in Figure 2.


Figure 2. The chance of a song of certain rating playing next; where x = number of songs with each rating, P = rating biased preference, and subscript = star rating.
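The figure itself is not reproduced here. Going only by the caption's definitions, the equation presumably has the following form; this is a reconstruction under that assumption, not the original figure. The index i runs over the six ratings, with 0 standing for unrated.

```latex
P(r) = \frac{x_r \, P_r}{\sum_{i=0}^{5} x_i \, P_i}
```

With one song per rating, as in the trial, every x is 1 and the expression reduces to the observed preference for that rating.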

With iTunes’ preference probabilities for each rating determined from the trial, the resulting equation is:


Figure 3. The chance of a song of certain rating playing next in iTunes Party Shuffle.

Although the higher rated songs are given preference, you will not necessarily hear more 5-star rated songs than all other ratings. Most people follow a bell shaped curve for their ratings, with the 3-star rating being the most common. Table 2 displays a hypothetical iTunes library with this bell shaped curve for the rating song count. Figure 4 displays the resulting probabilities after running these hypothetical numbers through the equations above.


Table 2. Song counts within a typical rating distribution.


Figure 4. Probability of a rating playing next is greatly determined by song count.

As you can see in Figure 4, the chance of a rating coming up next in the playlist is greatly determined by the song count within the rating. The iTunes preference for higher rated songs and dislike for lower rated songs only slightly lowers or raises the probability determined first from the song count.
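The effect can be illustrated with a short Python sketch. The preference values are the observed play shares from the six-song trial as quoted in the reader discussion of this article. The song counts are made up for illustration, since the values of Table 2 are not reproduced here, and the function name is hypothetical.

```python
# Observed play shares from the six-song trial, keyed by star rating
# (0 = unrated), as quoted in the reader discussion of this article.
PREFERENCE = {0: 0.039, 1: 0.118, 2: 0.154, 3: 0.189, 4: 0.230, 5: 0.270}

# Hypothetical bell-shaped library. These counts are illustrative only
# and are NOT the values from Table 2.
SONG_COUNTS = {0: 50, 1: 100, 2: 400, 3: 900, 4: 400, 5: 150}

def rating_probabilities(counts, preference):
    """Chance that the next song carries each rating:
    x_r * P_r divided by the sum of x_i * P_i over all ratings."""
    weights = {r: counts[r] * preference[r] for r in counts}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}

probs = rating_probabilities(SONG_COUNTS, PREFERENCE)
for rating in sorted(probs):
    print(f"{rating}-star: {probs[rating]:.1%}")
```

With these assumed counts the 3-star group remains the most likely to play next, despite the bias toward 5-star songs, because it holds by far the most songs.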

These chances of hearing a certain rating can be applied to find the chances of hearing a particular song. If we remove the song count from the numerator in Figure 3 we can calculate the chance of a certain song coming up next, not just the rating.


Figure 5. The chance of one particular song playing next.
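The figure is not reproduced here. Following the description in the text (the rating equation with the song count removed from the numerator), the per-song chance would presumably read as below, where s is one song with rating r. This is a reconstruction from the caption definitions, not the original figure.

```latex
P(s) = \frac{P_r}{\sum_{i=0}^{5} x_i \, P_i}
```

Multiplying by the number of songs with rating r recovers the chance of that rating playing next.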

About a month after running these tests, I noticed my iTunes party shuffle at work played the same song two times in a row. This was the first time I had noticed a consecutive repeat and I checked the playlist. Not only did I find Nirvana’s Territorial Pissings listed twice in a row, but AFI’s Death of Seasons was listed twice in a row three tracks later. I use the play higher rated songs more often option, but these were each middle-of-the-road 3-star songs in my song library of nearly 4000. The odds may seem outrageous at first, but not if you consider just how many songs you hear throughout a workday. If I average ten hours at work each day and a 3.5 minute song duration, odds say I should hear another consecutive repeat in less than a month.
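A rough version of that estimate can be sketched as follows. It assumes every song is equally likely on each pick and that picks are independent, which ignores the rating bias, so treat it as a back-of-the-envelope check only. The variable and function names are hypothetical.

```python
# Assumptions (not taken from iTunes): uniform, independent picks.
LIBRARY_SIZE = 4000   # "nearly 4000" songs
HOURS_PER_DAY = 10
SONG_MINUTES = 3.5

songs_per_day = HOURS_PER_DAY * 60 / SONG_MINUTES

# After any given song, the chance that the very next pick is the same song.
p_back_to_back = 1 / LIBRARY_SIZE

# Average number of workdays between consecutive repeats.
expected_workdays = 1 / (songs_per_day * p_back_to_back)

def chance_of_repeat_within(workdays):
    """Chance of at least one consecutive repeat in the given number of workdays."""
    picks = int(songs_per_day * workdays)
    return 1 - (1 - p_back_to_back) ** picks

print(f"{songs_per_day:.0f} songs per day")
print(f"about {expected_workdays:.1f} workdays between consecutive repeats")
print(f"chance within 20 workdays: {chance_of_repeat_within(20):.0%}")
```

Under these assumptions the average gap comes out to roughly 23 workdays. Any bias toward particular songs can only shorten it, since uneven pick probabilities make an immediate repeat more likely than uniform ones do.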

Many claim to still see patterns as iTunes rambles through their music collection, but the majority of these patterns are simply multiple songs from the same artist. Think of it this way: If you have 2000 songs and 40 of them are from the same artist, there is always a 2% chance of hearing them next with random play. So right after one of their songs finishes, odds show a 50% chance they will play again within the next 35 songs and a 64% chance they will be played again within the next 50 songs. This can be calculated using the following equation:


Figure 6. The chance of one particular artist playing within song count n.
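The figure is not reproduced here, but the numbers quoted above are consistent with the standard "at least once in n independent picks" calculation, sketched below. The function name is hypothetical, and the form of the equation is inferred from the text rather than copied from the figure.

```python
def chance_artist_within(n, artist_songs, library_size):
    """Chance that at least one of the next n random picks is by the artist,
    assuming independent, unbiased picks: 1 - (1 - a/N)**n."""
    p_next = artist_songs / library_size
    return 1 - (1 - p_next) ** n

within_35 = chance_artist_within(35, 40, 2000)
within_50 = chance_artist_within(50, 40, 2000)

print(f"within 35 songs: {within_35:.1%}")
print(f"within 50 songs: {within_50:.1%}")
```

For 40 songs out of 2000 this gives about 51% within 35 songs and about 64% within 50, matching the figures in the text.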

It’s simply the mind’s tendency to find a pattern that makes you think iTunes has a preference.

1 Levy, Steven. "Does Your iPod Play Favorites." 31 January 2005. http://msnbc.msn.com/id/6854309/site/newsweek/ Accessed 4 June 2005.

2 Hofferth, Jerrod. "Using Party Shuffle in iTunes." 22 August 2004. http://ipodlounge.com/index.php/articles/comments/using-party-shuffle-in-itunes/ Accessed 4 June 2005.

did you do this? by bradsmith

Brian, is this an article that you posted… or did you run this experiment? I heard a while back about somebody determining that the algorithm Apple used was indeed random in design. But man, if you ran this experiment, kudos. You got some time on your hands up in Dallas, bro!

I’ll tell you how "anal" I have configured my iTunes using smart playlists. I have rated my entire collection. First let me establish the meanings of the star ratings: 0 = OnTheGo Exiled, 1 = Yet to be rated, 2 = Low Rotation, 3 = Medium Rotation, 4 = Super Rotation, 5 = OnTheGo Rated. The idea behind it: all imported music gets 1 star. When on the road, if I like the tune, I update the rating to 5 stars. If I absolutely hate a tune, I’ll exile it from playback by updating the rating to 0. When I get home and the stats are synchronised, I have a smart playlist called ‘OnTheGo Rated’. I can then review the ‘approved’ songs and assign a more appropriate rating. This results in 3 smart playlists for Low Rotation, Medium Rotation and Super Rotation, much like a typical radio station or music video TV channel. In the Party Shuffle mode, I assign another smart playlist, which 90% of the time is set to ‘Rotation Low .. Super’. This will make sure that my Party Shuffle only plays approved tracks. If I only want "major hits", I select a more narrow selection using the smart playlist ‘Rotation Medium .. Super’ or just ‘Rotation Super’. Furthermore, I’ve set up ‘Best Of Ggenre’ smart playlists, which enable me to narrow the selection to a specific genre. The option ‘Play higher rated songs more often’ is used throughout the Party Shuffle mode, ensuring that more popular tracks are played more often. However, by utilizing the 0-star and 5-star for OnTheGo rating and using the 1-star as an initial rating, the Party Shuffle mode only utilizes the 2..4 star rated tracks. The majority of the tracks reside in the 2-star category, which in regards to this statistical experiment is a big shift, as the article claims that ‘3-star’ is the center of gravitation. I’m not a big math-freak, so perhaps you could tell me whether or not this has a positive influence on the ‘Play higher rated songs more often’ option.
Oh, to make the playlist mania complete: using this strategy, I’ve also created these smart playlists: ‘Rated but never played’, which lists all tracks having a 2..4 rating that have never been played before, either in iTunes or on the iPod itself, and an ‘OTG to-be-rated’ playlist, which features all 1-star tracks (initial import rating) and makes a handy tool to force you to review tracks and assign ratings, either in the comfort of home or on the road using the OTG-exiled (0 stars) and OTG-rated (5 stars) ratings for easy rating ;). Oh, and to complete it all, I’ve also set up an ‘Airplay top 100’. I’d like to hear all your comments on my implementation and the mathematical implications.

I question the validity of assuming a bell curve distribution for the song ratings.

It’s a self-selected group… why would you import/purchase songs you don’t like? I suspect the curve is skewed significantly.

I have no ratings for any of my 4363 songs in my library. Using regular shuffle itunes will play the same songs twice within a couple hours. I do not have any duplicates. Also, the next day or the next time I restart itunes, itunes will play many of the same songs it played the previous day or time. I have watched and checked and watched over the last 6 months and this random thing is not so random.

What a waste... by Anonymous

…of time and man-power. The math behind it is simple and plain, the article just proves that itunes functions work.

I personally don’t understand why people praise itunes so much as if it were some artefact. I think itunes is not geeky at all.

The formula in the paper is more than a bit unnecessarily complex. The evidence points to the following explanation provided by Bert 690 in the slashdot discussion for this story:

OK, after a bit more thinking, you were indeed very close. It appears the actual formula is:

points(0 stars)=1
points(1 stars)=3
points(2 stars)=4
points(3 stars)=5
points(4 stars)=6
points(5 stars)=7

probability(X stars) = points(X stars) / 26

This yields the following probabilities, listed alongside the observed values from the article along with 95% confidence intervals.

p(5 star)=.2692 [.270 +- .0038]
p(4 star)=.2308 [.230 +- .0036]
p(3 star)=.1923 [.189 +- .0033]
p(2 star)=.1538 [.154 +- .0031]
p(1 star)=.1154 [.118 +- .0027]
p(0 star)=.0385 [.039 +- .0016]

As you can see, each computed probability falls within the 95% confidence interval, so there’s a good chance this is the correct formula.

Boy do I have too much time on my hands today.
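The points scheme quoted above is easy to check numerically. The sketch below recomputes the predicted probabilities and compares them with the observed values and interval half-widths exactly as listed in the comment. The variable names are hypothetical, and the scheme itself is the commenter's hypothesis, not a documented iTunes rule.

```python
# Points per star rating, as proposed in the quoted comment (0 = unrated).
POINTS = {0: 1, 1: 3, 2: 4, 3: 5, 4: 6, 5: 7}

# Observed share and 95% interval half-width, copied from the quoted list.
OBSERVED = {
    5: (0.270, 0.0038),
    4: (0.230, 0.0036),
    3: (0.189, 0.0033),
    2: (0.154, 0.0031),
    1: (0.118, 0.0027),
    0: (0.039, 0.0016),
}

total = sum(POINTS.values())
predicted = {stars: pts / total for stars, pts in POINTS.items()}

for stars in sorted(OBSERVED, reverse=True):
    observed, half_width = OBSERVED[stars]
    gap = abs(predicted[stars] - observed)
    print(f"{stars} star: predicted {predicted[stars]:.4f}, "
          f"observed {observed:.3f}, gap {gap:.4f}, half-width {half_width:.4f}")
```

The points sum to 26, as in the comment. With the rounded figures as quoted, the 3-star prediction sits right at the edge of its interval, so that case is sensitive to rounding in the reported values.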

Back to Basics by mr.kurtz

All of which reminds me of a question I found interesting in Information Theory…

What do we mean by random?

What definition do people use? Please make it as mathematically exact as you can…

Does the "play count" number affect the weight of the songs at all? Can another test be run with several tracks of the same rating, but with varying play counts?

One thing I noticed was that you only used six songs in your test. What I wonder is, did you name the songs 1, 2, 3, 4, 5, 6 corresponding to the number of stars, 1 being a zero star and 6 being a five star? The reason I mention this is that I had thought that perhaps preference might be given on the basis of song title/artist as it is listed alphabetically. I also wonder if there might be some collinearity related to the number of times a song was previously played. The effect of this would be that, initially, the first six songs might be played rather randomly due to a small sample size. From there on out, the play count might influence how often a song is played in the future.

Another point I would like to make is that when you use a sample size of six, it is very hard to get a statistically significant outcome. Not that I’m hating on what you did, but I would like to see/do a study of, say, 300 samples randomly titled, randomly rated, and then arbitrarily played thousands of times to get a better description of the data.

From my understanding most users have at least a thousand songs, and from my statistics classes in college, it is quite evident that you need a much larger sample size to actually represent the real population.

Shot Down by Anonymous

Well… today iTunes can help you out, my friend. There is a new iTunes in town and it will help you lose that rating system of yours, which I feel bad about. I mean, I’ve gone through my whole library naming every song to perfection and I know that was a pain. But the ratings too? Damn, man, I feel for you. Anyway, I hope you enjoy the new iTunes!


It’s possible, on a Mac at least, to have ratings between 0 and 100; where 0 corresponds to no star, 20 to one star, and so on up to five stars.

What I’d like to see is this experiment repeated with 101 tracks, each with a different rating, just to see if this is taken into account by iTunes…

Any takers?

I just put my iPod on shuffle on a playlist I made. I might just believe in the same voodoo he was trying to put to death in the article, but I think the selection algorithm might have certain extra criteria. If it doesn’t, then at the very least it’s possible that a program to create playlists could.

With the 18 song playlist on shuffle, the songs seemed to follow a few trends:

- tracks got slower, and then faster (a decrease, and then an increase in BPM)

- the only instrumental track on the list ("Bean-E-Man" by DJ Logic), and a track with sparse lyrics that are mixed in almost in the background ("Pulk/Pull Revolving Doors" by Radiohead) occurred next to each other, around number 12 out of 18 for the track.

- both instances where an artist appeared twice on the playlist, the two songs were played successively. In both instances, both songs came from the same album. These instances were "Beautiful" and "Batman and Robin" from Paid tha Cost to be Da Bo$$ by Snoop Dogg, and "Award Tour" and "Electric Relaxation" from Midnight Marauders by A Tribe Called Quest.

This might not just be a coincidence: the BPM for each of these albums remains somewhat constant, and the tones/frequencies recorded are also consistent; Snoop Dogg’s voice is very distinctive, as are those of Q-Tip and Phife Dawg. The title track from Midnight Marauders even says the entire album is "Bass Heavy".

Keep in mind as well that each and every studio-recorded mp3 file was mixed in stereo, with each instrumental and vocal track given a unique distribution between right and left, most likely using digital equipment. If a computer can put something like this together, then it most likely can take something like this apart.

Though computers are not capable of things like "mood" or "preference", they can and have been used to recognize things like audible frequency, beats per minute. I’m not sure whether or not iTunes, or the software on the iPod uses these things to compute the optimal order for songs to occur in- I do know, however, that it’s possible.

I’ll say this, too: I liked my iPod’s order for the songs more than mine.

testing methods by Anonymous

I’m wondering what was the method? Did you set up a system of programmed multivariate testing? I’ve seen similar tests run for vending machines, attempting to determine preference by location. They wrote a vm ware model and set up a javascript to test scenarios. You could probably get a company to provide network virtualization to really simulate this – if you have a few hundred thousand $ to throw around.

This is a funny study, similar to a friend of mine, who was trying to optimize seat preferences to sell Masters golf tickets. I don’t think he clocked any dollaz though

The Gumball Man... by Anonymous

i really don’t know if there is such a tracking on iTunes or Ipods… seems like the whole situation is like as random as bouncing gumballs. Speaking of gumballs, i’m hungry… maybe i’ll go to lunch with my ipod.

Your sample size is too small to properly identify anomalies
