Introduction
The purpose of this report is to analyze an open source dataset containing customer sales data, idenfity any notable findings and present insights which can be used to inform marketing efforts.
The dataset used contains transaction data from customer purchases from multiple countries during the period of Jan 2015 to July 2016 and includes data such as gender, age, product category, quantity. It also includes the cost price and retail price of each item purchase.
Note: Some details are not included in this dataset such as the merchant, the exact brand/item and a way to identify repeat customers or instances where a customer has bought more than one different item.1 This report works on the assumption the sales data pertains to one company. This conclusions of this report are therefore limited.
Dataset Source Url
Importing and making sense of data
I imported the data into a local MySQL server and accessed it with MySQL workbench. As the data was already clean there was no need to preprocess it. It could be analyzed straight away.
Here shows all columns, showing the first 5 results:
To begin the analysis, I separated the data into different demographics – distinguishing by country, age group and gender. I added columns to show the total number of customers in each demographic, along with the average revenue and profit, and the total revenue and profit.
SELECT Country, Gender,
CASE WHEN age between 17 and 24 then '17-24'
when age between 25 and 34 then '25-34'
when age >= 35 then '35+' END AS Age_group,
ROUND(SUM(Revenue)/1000,1) AS `Total_Revenue_(k)`,
ROUND(SUM(Revenue-Cost)/1000) AS `Total_Profit_(k)`,
ROUND(AVG(Revenue),1) AS `AvgRevenue`,
ROUND(AVG(100*(Revenue-Cost)/Revenue),1)
AS `Avg_Profit_margin_%`,
COUNT(*) AS Total_customers
FROM Sales_data
GROUP BY Country, Gender, Age_group
ORDER BY COUNTRY ASC
Insights
Analyzing the data we can find the following insights.
Germany is the most profitable region
Germany generates the most profit (959k) with an average profit margin of 26.6% (more than double the second most profitable, UK with 12.4%). This is true even though it generates less than half the revenue of the United States (Germany 4.2K, United States 10.4K).
It has around the same number of customers as France (5.2k) while generating 3.5x the profit.
Having the highest average spent per purchase supports the stereotype that German customers prefer to spend more to obtain higher quality. Emphasizing quality should therefore be a prominent theme in the marketing messaging.
Bikes are making money in Germany but not anywhere else
Analyzing the profit of different product categories we can see that bikes overall have an average profit of 1% compared to 14.8% or 17% for clothing or accessories.
Delving deeper it becomes clear there is a clear discrepancy between countries – Germany is maintaining a healthy profit margin(16.4%) while the UK, France and the United States are not (0% to -3.5%).
The United States is the only region to have lost money (-109K), with France and the UK making a measly profit ( 15K and 16K respectively – compared to 416K for Germany).
This suggests that products are being sold at heavily discounted prices in all regions except Germany.
Aside from the lack of profitability, as the other countries have comparable sales volume to Germany it suggests the market for purchasing bikes still exists in the less profitable regions. It would be recommended to experiment with lower priced brands/models as the data suggests France, the UK and the US are prepared to pay less than Germany to buy a bike.
Sales rise in the summer, and slump in the autumn
Visualizing in Tableau the total revenue by month for each product category shows a notable spike in bike sales during May and June. It’s followed by a lull and slow recovery, increasing momentum towards Christmas. Sales of accessories and clothing follow suit but to a lesser extent.
In the below chart, the Y-axis shows revenue while the thickness of the line shows the relative quantity.
Looking deeper into subcategories – many stay more or less the same while some (e.g fenders and gloves) are affected by the seasonal trend, as shown in this heat map:
It would be recommended therefore to reduce marketing budget for bikes, fenders and gloves during the low months August to October and instead allocate this budget to the other months, in order to maximise the momentum of the natural trend.
Footnotes
- We can see this as for each data entry where there is quantity greater than one, the total revenue of the purchase is an exact multiple of the price, confirming the entry is referring to the same product. We can confirm this by the following query, which should produce a zero results:
SELECT * FROM Sales_data
WHERE Revenue NOT IN (
SELECT unit_price * quantity
FROM Sales_data)
This presents numerous results. I noticed it was caused by each one being not quite an exact match – the unit price being a third or two thirds of the revenue but rounded to 3 decimal places. Amended, the query below accounts for this:SELECT * FROM Sales_data
WHERE Revenue NOT BETWEEN (unit_price * quantity) - 0.1 AND (unit_price * quantity) + 0.1
The result is 0 rows returned. Interesting, as the unit price is showing a very specific decimal it makes likely these items are sold as sets of three.
Leave a Reply