This article presents in a non technical way the business issue I wanted to address. To dive deeper, I kindly invite you to visit my Github repositories to read the underlying code.
Tech stack : Python | dbt Cloud | GCP BigQuery | Power BI
The world is obsessed with chocolate. Recently, the movie "Wonka" topped the worldwide box office (more than $600 million revenues) with Thimothée Chalamet embodying the young chocolatier inventor Willy Wonka. Had the latter been a real person, he would probably have been astonished by the recent surge in prices of cocoa beans (x3 in a few months) due to bad weather and an ever-increasing global demand.
As an avid chocolate lover, I created this project to monitor the effect of inflation on chocolate products. I was interested in studying the differences in pricing between distributors and brands in France. Indeed, the market is highly segmented with highly processed food produced by industrial giants such as Ferrero (Kinder, Nutella, Ferrero Rocher), Mars, Mondelez on the one hand, and luxury chocolates produced by chefs like Alain Ducasse on the other hand. How do pricing strategies differ across such companies ?
To answer this question, I collected pricing information for 1,200+ chocolate products available for online purchase in France. I cleaned, harmonized and aggregated the data and finally produced a dashboard to summarize insights. I will detail below my methodology as well as the results.
Providing an exhaustive overview of the chocolate market in France is highly difficult for several reasons:
I was able to collect data for 1200+ chocolate products from 7 websites:
To do so, I created at least 1 web crawler for each website (depending on the difficulty to scrape all products at once). Spiders were coded in Python using either the scrapy or requests library. They produced dataframes in a .csv format.
Illustration of the results from scraping the Alain Ducasse website
The goal here was to import all dataframes resulting from the spiders, clean them a little bit in Python so that the data are consistent, before loading everything in GCP using the bigquery client.
Examples of cleaning actions performed:
Illustration of some cleaning actions performed in the Jupyter Notebook:
The Python code described in the section above sent automatically the cleansed data into BigQuery, as SQL tables in a dedicated folder ("dbt_chocolate_setup").
Still, those data were quite raw. They had to be aggregated as a chocolate dataset so that it can be used for further analysis, which is why I used dbt as a next step.
Illustration of GCP:
I used dbt Cloud to implement a rigorous data transformation process that is highlighted by the data lineage illustration above. In details:
There are several advantages of building such schema in dbt :
Illustration of the documentation automatically created by dbt for the Chocolates dataset
Illustration of datasets (SQL views and tables) stored in BigQuery following the dbt Cloud transformations:
I constructed a Power BI dashboard to provide a comprehensive view of the Chocolate market, using a BigQuery connector to plug the "Chocolates" and "Brand statistics" data. My objective was to tackle the following questions:
The Power BI dashboard helped uncover the following insights:
Chocolate products in France are dominated by 2 product categories: bars ("tablettes", 36% of the market) and baskets ("ballotins", "boites", "coffret", ...: 30% of the market). Aside those, the market is quite evenly distributed between nuts, ice creams, sweets, snacking, special products etc. Disclaimer: the category "Other" is quite substantial (8%) as I could not derive the product category from either the product name or its URL.
As chocolate products have very different formats, the price range tends to be wide (from €0.90 to 227€ (!), with a median price of €6.50). Chocolate bars, sweets, snacking and biscuits are the least expensive products (3/4 of chocolate bars and 2/3 of sweets and biscuits products are priced under €5). At the other end of the spectrum, boxes are the most expensive since they tend to contain more chocolate quantities (only 8% of boxes are priced under €5, and 15% of products are priced more than €40).
Because the format and the weight varies so much across products, I computed the price per kilogram as a comparison metric. The median price becomes €38/kg, with wide disparities across product categories.
The cheapest products at face value may not be the cheapest products when their weight is considered. For instance, the sweets, biscuits and nuts categories have the highest median prices per kg (between €45/kg and €55/kg, although product prices tend to low (<5€). On the other hand, some products like boxes and special collections have certainly a high product price but so does the median price per kg.
Not surprisingly, mass market players tend to have way cheaper prices than niche players. For instance, Franprix and Lindt supply products for a median price per kilogram of ca. €25/kg. On the contrary, the Michelin star chef Alain Ducasse and the chocolate specialist Maison du Chocolat sell products at a median price of respectively €136/kg and €190/kg. Chocolate specialists like Jeff de Bruges and Chocolat Castelain stand in the middle of the market, with a median price of €63/kg. Overall, the brand positioning and pricing seem consistent. Abtey is the only exception: it is a niche player and has low prices per kilogram (€22/kg) because it sells a lot of chocolates in bulk ("en vrac").
A good example to look at is the pricing disparities across brands for the chocolate bars category. Alain Ducasse sells 75g chocolate bars for €9, i.e. €120/kg (!) whereas Leader Price (one of the brands distributed by Franprix) sells 100g chocolate for a price between €1.99 and €3.09, i.e. maximum €31/kg. That is a 4x ratio for a product that is quite standardized.
Lindt provides an interested pricing case study. Indeed, its products are sold via 2 distribution channels: Lindt's own website, and the grocery store chain Franprix. Generally speaking, Lindt product sold on Franprix.fr are a bit more expensive (€32 vs €27), most probably because Franprix is an intermediate and Lindt wants to keep its margins.
Franprix applies higher prices for all product categories. For instance, the median price/kg for moulages is €44.50 on Franprix.fr vs €39.50 on Lindt's website. Quite surprisingly, some products (mainly "assortiments" and "bouchées") are sold only by Franprix.
This project has been quite a technical challenge for various reasons:
Some aspects of the project are not perfect and could be improved, for instance:
Thank you for reading so far ! I hope you enjoyed the project as much as I did
Send me a message!