Spotify Wrapped project

JSON | SQL queries | Python dashboard
Q4 2022
Spotify Wrapped: interactive Python dashboard preview

Foreword

This article presents the user problem I wanted to address and the dashboard I built to solve it. To dive deeper, you can visit my GitHub repository and read the underlying Python and SQL code.
Tech stack: Python | SQL

GitHub | Spotify Wrapped

Project summary

Every December since 2016, Spotify users have received their "Spotify Wrapped": a compilation of data about their activity on the platform over the past year (top artists, top songs, top genres, etc.).

As a paying Spotify customer, my objective here was to reproduce most of the insights from Spotify Wrapped using Python (data import and visualization) and SQL (data analysis). To do so, I requested all my Spotify user data in November 2022.

Dataflow illustration

User goals

In my case study, I wanted to answer the following questions:

Data provided by Spotify

JSON format of Spotify data: an example

I received 8 different JSON files that provided the following information:
. ts: Timestamp when the track stopped playing (UTC format)
. platform: Device type
. ms_played: Number of milliseconds the stream was played
. Song information: track name, artist name, album name, track unique identifier (Spotify resource identifier)
. Podcast information: episode name, show name
. Reasons why the stream started and ended
. User actions: shuffle, skipped, offline mode, incognito mode
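
To make the record structure concrete, here is a minimal sketch of how such records could be loaded into an in-memory SQLite table with Python's json and sqlite3 modules. Only ts, platform and ms_played are named in the list above. The other key names (track_name, episode_name, skipped), the table name streams and the sample values are assumptions made for illustration, not the exact export schema.

```python
import json
import sqlite3

# Hypothetical sample mimicking two records of the export (values are invented).
SAMPLE_JSON = """
[
  {"ts": "2022-03-01T07:15:30Z", "platform": "android", "ms_played": 201000,
   "track_name": "Song A", "episode_name": null, "skipped": false},
  {"ts": "2022-03-01T21:40:00Z", "platform": "windows", "ms_played": 1500000,
   "track_name": null, "episode_name": "Episode 1", "skipped": false}
]
"""

# Assumed key names; adapt them to the keys found in the real files.
COLUMNS = ["ts", "platform", "ms_played", "track_name", "episode_name", "skipped"]


def load_streams(conn, json_text):
    """Insert every record of a JSON array into the `streams` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS streams ("
        "ts TEXT, platform TEXT, ms_played INTEGER, "
        "track_name TEXT, episode_name TEXT, skipped INTEGER)"
    )
    records = json.loads(json_text)
    # Missing keys become NULL; booleans are stored as 0/1.
    rows = [tuple(rec.get(col) for col in COLUMNS) for rec in records]
    conn.executemany("INSERT INTO streams VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
n_loaded = load_streams(conn, SAMPLE_JSON)
print(n_loaded, "records loaded")
```

With the real export, the same function could be called once per file, passing the text of each JSON file in turn.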

Methodology

Data preparation

Data analysis

For each question listed in the user goals, I:


Screenshot Python code (SQL query and interactive graph)

The code above contains a simple SQL query that returns:
. the total number of sessions (i.e. number of music tracks and podcast episodes)
. the average number of sessions (i.e. number of music tracks and podcast episodes) per day
. the average number of hours played on Spotify per month
. the average number of hours played on Spotify per day
The results are grouped by year, month and media type (music vs podcasts).
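
Since the query itself is only shown as a screenshot, here is a minimal sketch of what such an aggregation could look like with sqlite3. It is not the query from the repository: the table name streams, the media_type column and the sample rows are assumptions, and the per-day averages are computed over days with at least one stream, which is one possible interpretation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: one row per stream.
conn.execute("CREATE TABLE streams (ts TEXT, ms_played INTEGER, media_type TEXT)")
conn.executemany(
    "INSERT INTO streams VALUES (?, ?, ?)",
    [
        # Invented rows: 3 music streams over 2 days, 1 podcast episode.
        ("2022-03-01T07:15:30Z", 3_600_000, "music"),
        ("2022-03-01T08:00:00Z", 1_800_000, "music"),
        ("2022-03-02T09:00:00Z", 1_800_000, "music"),
        ("2022-03-02T21:00:00Z", 3_600_000, "podcast"),
    ],
)

QUERY = """
SELECT
    strftime('%Y', ts)                          AS year,
    strftime('%m', ts)                          AS month,
    media_type,
    COUNT(*)                                    AS sessions,
    COUNT(*) * 1.0 / COUNT(DISTINCT date(ts))   AS sessions_per_day,
    SUM(ms_played) / 3600000.0                  AS hours_played,
    SUM(ms_played) / 3600000.0
        / COUNT(DISTINCT date(ts))              AS hours_per_day
FROM streams
GROUP BY year, month, media_type
ORDER BY year, month, media_type
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Dividing ms_played by 3,600,000 converts milliseconds to hours, and the multiplication by 1.0 forces a floating-point division in SQLite.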

I then loaded the results of the SQL query into an interactive dataframe. From that dataframe, I created an interactive scatterplot that reacts to 2 widgets: a year slider and a radio button that selects the variable to plot on the Y axis.
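
The selection logic behind those two widgets can be sketched in plain Python, independently of any dashboard library. This is an illustration only, not the dashboard's code: the row layout, the column names, the sample values and the assumption that the slider selects a single year are all hypothetical.

```python
# One dict per (year, month, media type), as an aggregation query might
# return them. All values below are invented for illustration.
RESULTS = [
    {"year": 2021, "month": 1, "media_type": "music", "sessions": 2400, "hours_per_day": 4.0},
    {"year": 2021, "month": 1, "media_type": "podcast", "sessions": 50, "hours_per_day": 1.0},
    {"year": 2022, "month": 1, "media_type": "music", "sessions": 3100, "hours_per_day": 5.5},
]

# Options a radio button could offer for the Y axis (hypothetical names).
Y_CHOICES = ("sessions", "hours_per_day")


def scatter_points(rows, year, y_column):
    """Return (month, y value, media type) triples for one year and one Y variable."""
    if y_column not in Y_CHOICES:
        raise ValueError(f"unknown Y variable: {y_column!r}")
    return [
        (r["month"], r[y_column], r["media_type"])
        for r in rows
        if r["year"] == year
    ]


# Equivalent to moving the slider to 2021 and picking "hours_per_day".
points_2021 = scatter_points(RESULTS, year=2021, y_column="hours_per_day")
print(points_2021)
```

In an interactive dashboard, the widget values would take the place of the year and y_column arguments, and the plot would be redrawn whenever either one changes.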

Overview in Python:

Screenshot interactive scatterplot in Python

Python dashboard: Video preview

Main insights

Spotify knows a lot about me! Over the past two and a half years, Spotify has accumulated 100,000 data points about my listening habits. It can of course identify my top genres, artists, tracks and podcasts. But it may also guess my daily habits, such as commuting, working and even my hours of sleep!

I started using Spotify in February 2020, just before the first French Covid-19 lockdown was implemented. During the lockdown, I used the service about 4 hours a day, and then almost completely stopped using it during summer 2020 (end of the first lockdown). However, at the end of 2020, a second lockdown took place in France and I resumed using Spotify (up to 5 or 6 hours a day). After that, I never stopped using the app. In 2022, I listened to Spotify for 4.5 to 7 hours a day on average.

Between 2020 and 2022, I more than doubled the total number of tracks I listened to (from 19,931 to 44,371). This increase is also reflected in the total number of hours played (from 957 to 1,979 hours, i.e. from 3.4 to 6 hours/day). To understand why, it is useful to break the change down into two periods.

Between 2020 and 2021, the total number of tracks I listened to increased by 69%. Two main factors explain this jump. First, for music, I listened to the same number of songs (ca. 2,600 tracks) but played each song more often (12.7 times instead of 7.5 times). Second, my podcast consumption surged: I almost tripled the number of podcasts I listened to and played far more episodes (from 45 to 622 episodes, i.e. +1280%).
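
The growth figures quoted so far can be checked with a simple relative-change calculation. The helper name pct_change is only illustrative. The inputs are the counts given in the text.

```python
def pct_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100


# Total tracks, 2020 vs 2022: above +100%, i.e. more than doubled.
tracks_growth = pct_change(19_931, 44_371)

# Podcast episodes, 2020 vs 2021: about +1282%, quoted as +1280% in the text.
episodes_growth = pct_change(45, 622)

print(f"tracks: +{tracks_growth:.0f}%")
print(f"episodes: +{episodes_growth:.0f}%")
```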

Between 2021 and 2022, my total number of tracks increased by a further 32% (although 2022 is not a complete year in the dataset). I continued listening to more podcasts. Above all, I started listening to more music playlists, which helped me discover new artists and tracks. As a result, in 2022, I listened to 1,400+ different artists (versus 700 on average in previous years) and 4,800 different music tracks (instead of 2,600 before). I am thus far more versatile and curious. Still, I listened to each song 9 times on average, which is higher than in 2020 (8 times).

I seem to be more versatile and curious over time thanks to Spotify playlists that make me discover new artists and tracks. Still, I am a heavy Kpop fan: my top songs and artists belong mostly to this genre.

I use Spotify mostly for music. For instance, my podcast consumption was almost nonexistent in 2020 (I used to listen to podcasts on apps other than Spotify). In 2021 and 2022, I listened to podcasts on Spotify about 1 hour a day, regardless of the month. My music consumption, on the other hand, varies much more over time: from 2 hours/day to 7 hours/day on average.

Spotify is really part of my daily life. I use it throughout the day, between 6 AM and 10 PM. My music consumption is fairly constant between 7 AM and 8 PM (about 15 tracks played per hour). I tend to listen to podcasts early in the morning before going to work (6 and 7 AM) and late in the evening (9, 10 and 11 PM) before going to sleep.

There is no real difference between the work week and the weekend. My podcast consumption is flat regardless of the day. My music consumption peaks at the beginning of the week (Monday to Thursday, with 100+ tracks per day). On Fridays and at the weekend, my number of tracks is down by about 10%, which I believe is because I hang out more on those days.

I mostly use 2 types of device: the Spotify app on my smartphone (Android OS) and the Spotify desktop app (Windows 10). I very rarely use the web player (usually when the desktop app does not run properly on my work laptop).

My device habits changed a lot over the past 3 years. The number of tracks played on the desktop app stayed constant over time (about 15,000 tracks per year), but the number of tracks played on the smartphone app exploded (from 5,000 to 30,000 tracks, i.e. a sixfold increase in only two and a half years). In 2020, I was mostly a desktop user (75% of my total tracks). One year later, I used my smartphone slightly more than my personal computer to listen to music on Spotify. In 2022, I use the smartphone app twice as much as the desktop app.

I skip about 20% of the music tracks over the whole period.

This statistic varies a lot depending on the platform. On the desktop app, I almost never skip a music track, regardless of the year. This may be because I mostly use the desktop app at work: I am focused on my work and do not bother changing the music in the background.

On the Spotify mobile app, by contrast, I tend to skip a lot of music tracks. I usually skip within the first 10 seconds. Past that threshold, I seem to listen to the whole song. Interestingly, I tend to skip songs mostly when i) using my smartphone, ii) at the end of the day. The peak seems to be between 7 PM and 8 PM (i.e. when I am finishing work and going back home). At those hours, the number of skipped songs is about the same as the number of non-skipped songs. Perhaps I am adjusting my mood as I switch from work to home.
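
A skip analysis of this kind could be sketched with a grouped SQL query. This is a hedged example on invented rows, not the query from the repository: the table name streams, the platform labels and the encoding of skipped as 0/1 are assumptions, and the 10-second threshold is expressed as ms_played < 10000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE streams (platform TEXT, ms_played INTEGER, skipped INTEGER)")
conn.executemany(
    "INSERT INTO streams VALUES (?, ?, ?)",
    [
        # Invented rows: half of the smartphone tracks are skipped early,
        # none of the desktop tracks are skipped.
        ("smartphone", 4_000, 1),
        ("smartphone", 7_000, 1),
        ("smartphone", 210_000, 0),
        ("smartphone", 185_000, 0),
        ("desktop", 200_000, 0),
        ("desktop", 195_000, 0),
    ],
)

SKIP_QUERY = """
SELECT
    platform,
    COUNT(*) AS tracks,
    ROUND(100.0 * SUM(skipped) / COUNT(*), 1) AS skip_rate_pct,
    ROUND(100.0 * SUM(CASE WHEN skipped = 1 AND ms_played < 10000
                           THEN 1 ELSE 0 END) / COUNT(*), 1) AS early_skip_pct
FROM streams
GROUP BY platform
ORDER BY platform
"""

skip_rows = conn.execute(SKIP_QUERY).fetchall()
for row in skip_rows:
    print(row)
```

Adding an hour column derived from the timestamp to the GROUP BY clause would give the same rates per hour of the day.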

My top 10 tracks are mostly Kpop songs every year. Still, these songs do not last in the rankings: no song appears in the top 10 in more than one year. However, some artists appear quite consistently in my top 10 songs (e.g. Wonho appears 4 times, K/DA 3 times and Dreamcatcher twice).

More surprisingly, my top Kpop songs are not necessarily sung by the most famous Kpop artists (e.g. no BTS or Blackpink). It seems that I fell in love with specific songs produced by smaller artists.

Some titles in the top 10 may be misleading. Even though some songs do not belong to the Kpop genre, I discovered them through Kpop dance. For instance, I discovered Post Malone's 'Motley Crew' (2nd most listened song in 2021) thanks to Hyunjin's dance cover on Studio Choom. I fell in love with BewhY's 'Side by Side' (3rd most listened song in 2021) because of the Netflix 1M dance cover. Finally, regarding K/DA, I do not play League of Legends but I really like the MVs featuring famous female K-rappers.

Motley Crew (Hyunjin dance cover)
Side by Side (Netflix Korea dance)

My top 10 artists differ quite a lot from the artists behind my top 10 songs.

First, very famous Kpop groups like BTS, TWICE, Stray Kids or EXO are in the top artists ranking (which was not the case in the top 10 songs ranking). For these artists, I tended to listen to whole albums, not just one or two songs on loop. For instance, in 2020, I listened to 94 BTS songs, with an average of 8 plays per song. The same year, I listened to only 5 K/DA songs, with an average of 44 plays per song. That is why K/DA is in my top 10 songs in 2020 and BTS is not, although BTS is my 2nd most played artist that year.
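
Using the averages quoted above, BTS totals roughly 94 × 8 ≈ 750 plays against 5 × 44 = 220 for K/DA, yet each K/DA song is played far more often. The sketch below reproduces this effect on invented data: one artist with many songs played once each, another with a single song on loop. The table layout and the artist and track names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE streams (artist TEXT, track TEXT)")

# Invented data: "Artist A" has 6 songs played once each,
# "Artist B" has a single song played 4 times.
plays = [("Artist A", f"Song {i}") for i in range(1, 7)]
plays += [("Artist B", "Hit")] * 4
conn.executemany("INSERT INTO streams VALUES (?, ?)", plays)

# Ranking by artist: total plays decide the order.
ARTIST_QUERY = """
SELECT artist,
       COUNT(*)                                AS plays,
       COUNT(DISTINCT track)                   AS songs,
       COUNT(*) * 1.0 / COUNT(DISTINCT track)  AS plays_per_song
FROM streams
GROUP BY artist
ORDER BY plays DESC
"""

# Ranking by song: plays of each individual track decide the order.
SONG_QUERY = """
SELECT artist, track, COUNT(*) AS plays
FROM streams
GROUP BY artist, track
ORDER BY plays DESC, track
LIMIT 1
"""

top_artist = conn.execute(ARTIST_QUERY).fetchone()
top_song = conn.execute(SONG_QUERY).fetchone()
print("top artist:", top_artist)
print("top song:", top_song)
```

The top artist and the artist of the top song differ because the first ranking rewards breadth (many songs) while the second rewards repetition of a single track.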

One thing to notice in 2022 is that there are fewer Kpop artists in the top 10. There are some artists in the 'epic' genre like 2WEI and Two Steps from Hell, as well as some popular western artists like Tiësto and David Guetta. I believe it is because I am listening to more Spotify playlists, which made me discover new songs outside my Kpop comfort zone.

It turns out that I listen mostly to French-language podcasts. Many are business or career oriented, like "Vocation" (a guest is interviewed to present their job to an audience of students and young graduates), "Dans la tête d'un PM" (interviews of product managers), "Guerres de Business" (storytelling about business wars between giant companies), "Case Interview Prep & management Consulting" (management consulting), etc. I also listen to stories ("Les Pieds sur terre"), news ("La Story") and humor ("Le Moment Meurice").

One thing to notice is that my top 5 podcast shows are quite consistent over time: I tend to listen to the same shows year after year (with some exceptions). Moreover, I listened to far more episodes in 2022 than in 2020. For instance, "Maker" and "Generation XX" were in my top 5 in 2020 even though I listened to each show only twice!

Conclusion

Possible next steps
  • Connect to the Spotify API to retrieve and analyze additional track metadata (such as genres)
What I learnt in this project
  • Using SQL queries in a Python project thanks to the sqlite3 package
  • Creating an interactive dashboard in JupyterLab thanks to the panel and hvplot packages and using the Terminal (disclaimer: I still prefer Rshiny and Tableau for creating interactive dashboards ;-)

