Analysis of sold new cars in Estonia¶
Data is taken from official source - https://www.mnt.ee/et/ametist/statistika/soidukite-statistika
Period - from Ferbruary 2018 till Janurary 2019(due to wrong format in January 2018).
Only new cars where used in this analysis.
Special script was created to extract needed data from official xls files.
Findings¶
- 26133 cars were sold for this period.
- ~67% petrol, ~25% diesel, ~6% hybrid cars...
- ~24% has engine capacity of ~2.0 liters, ~16% ~1.6 liters, ~13% ~1.5 liters...
- Most popular brands/marks in Estonia: TOYOTA, SKODA, RENAULT, VOLKSWAGEN, KIA...
- ~42% of all cars are SUVs
- Corporate clients buy cheaper cars with smaller engines(ex. Renault Clio), private clients buy more expensive cars with bigger engines(ex. Toyota Rav4)
- People from different cities buy different cars.
- People don't buy what they want, people buy what dealers have in stock (Most popular petrol/diesel cars: TOYOTA AVENSIS, VOLKSWAGEN TIGUAN)
In [125]:
%reload_ext autoreload
%autoreload 0
import os
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
In [126]:
df = pd.read_csv('mnt.csv')
df.describe(include='all')
Out[126]:
In [127]:
df.head()
Out[127]:
In [128]:
for col in "Kategooria KeretΓΌΓΌp VΓ€ljalaske aasta CO2 TΓ€ismass TΓΌhimass Telgi kokku Istekohti Maakond".split('\t'):
df = df.drop(columns=col.strip())
df.head()
Out[128]:
Adding additional data¶
In [129]:
df['model'] = df['Mark'] + ' ' + df['Mudel']
df.head()
Out[129]:
In [130]:
# taken from https://www.auto24.ee/new/list.php?d[class]=8,9,11&d[type]=0&d[fuel_id]=0&d[transmission_id]=0&d[drives_id]=0&d[start]=0
SUVS = 'stelvio q2 q3 q5 q7 q8 x1 x2 x3 x4 x5 x6 escalade xt5 aircross duster fullback ecosport edge kuga ranger cr-v hr-v cross kona santa tucson d-max cherokee compass renegade niro sorento sportage stonic nx rx cx g-klass gla glc gle gls x-klass countryman asx l200 outlander pajero juke navara qashqai x-trail grandland mokka 2008 3008 5008 cayenne macan alaskan captur kadjar koleos arona ateca karoq kodiaq rexton forester xv jimny sx4 vitara c-hr hilux cruiser rav4 amarok t-roc tiguan toureg xc40 xc60 xc90'.split()
df['SUV'] = df['Mudel'].apply(
lambda x: any([suv.lower() in x.lower() for suv in SUVS])
)
df.head()
Out[130]:
In [131]:
from decimal import *
def round(number, digit=0):
return float(Decimal(number).quantize(Decimal(str(1/10**digit)), ROUND_HALF_UP))
df['full model'] = df['Mark'] + ' ' + df['Mudel'] + '(' \
+ df['Mootori tΓΌΓΌp'].map(lambda x: x.replace('kat.', '').strip()).map(str) + ', ' \
+ df['Mootori maht'].map(lambda x: round(x / 1000, 1)).map(str) + ' cm3, ' \
+ df['Mootori vΓ΅imsus'].map(lambda x: int(x)).map(str) + ' kW)'
df.head()
Out[131]:
Total sold cars¶
In [132]:
df['Arv'].sum()
Out[132]:
Popular fuels¶
In [133]:
def show_pie(dataframe, groupby, index = None):
dataframe = dataframe.groupby(
[groupby],
as_index=False
)['Arv'].sum().sort_values('Arv', ascending=False).reset_index(drop=True)
if index:
dataframe.index = index
else:
dataframe = dataframe.set_index(groupby)
dataframe['Arv'].plot.pie(autopct='%1.2f%%')
print(dataframe)
show_pie(df, 'Mootori tΓΌΓΌp')
Engine capacities¶
In [134]:
bins = np.linspace(0, 3000, 31, endpoint=True)
capacities = pd.cut(df['Mootori maht'], bins, right=False).value_counts()
capacities.plot(kind='pie', autopct='%1.2f%%')
None
Top 10 popular manufacturers¶
Two charts might be displayed:
- 'model' - data is grouped by mark and model
- 'full model' - data is grouped by mark, model and engine
In [135]:
LIMIT = 10
def group_and_sort(data_frame, selection):
return data_frame.groupby(
selection,
as_index=False
)['Arv'].sum().sort_values('Arv', ascending=False).reset_index(drop=True)
def group_sort_and_plot(data_frame, selection, x='Arv', y=None, limit=LIMIT, plot=True):
data = group_and_sort(data_frame, selection)
data.index += 1
y = selection[0] if y is None else y
if plot:
sns.barplot(x, y, data=data[:limit], orient='h')
print(data.head(limit))
group_sort_and_plot(df, ['Mark'])
Top 10 popular car models¶
In [136]:
group_sort_and_plot(df, ['model'])
In [137]:
group_sort_and_plot(df, ['full model'])
Top 10 popular diesel models¶
In [138]:
df_diesel = df[df['Mootori tΓΌΓΌp'] == 'Diisel']
group_sort_and_plot(df_diesel, ['model'])
In [139]:
group_sort_and_plot(df_diesel, ['full model'])
Top 10 popular petrol models¶
In [140]:
df_petrol = df[df['Mootori tΓΌΓΌp'] == 'Bensiin kat.']
group_sort_and_plot(df_petrol, ['model'])
In [141]:
group_sort_and_plot(df_petrol, ['full model'])
Top 10 popular hybrid models(diesel and petrol)¶
In [142]:
df_hybrids = df[df['Mootori tΓΌΓΌp'].apply(lambda x: 'hΓΌbriid' in x)]
group_sort_and_plot(df_hybrids, ['model'])
In [143]:
group_sort_and_plot(df_hybrids, ['full model'])
Top 10 popular cars by private owners¶
In [144]:
df_private = df[df['TΓΌΓΌp'].apply(lambda x: 'FΓΌΓΌsiline' in x)]
group_sort_and_plot(df_private, ['model'])
In [145]:
group_sort_and_plot(df_private, ['full model'])
Top 10 popular cars by company owners¶
In [146]:
df_corporate = df[df['TΓΌΓΌp'].apply(lambda x: 'FΓΌΓΌsiline' not in x)]
group_sort_and_plot(df_corporate, ['model'])
In [147]:
group_sort_and_plot(df_corporate, ['full model'])
SUV vs non SUV¶
In [148]:
show_pie(df, 'SUV', index=['non-SUV', 'SUV'])
Popular fuels in SUV¶
In [149]:
df_suv = df[
df['Mudel'].apply(
lambda x: any([suv.lower() in x.lower() for suv in SUVS])
)
]
show_pie(df_suv, 'Mootori tΓΌΓΌp')
Top 10 popular SUV¶
In [150]:
group_sort_and_plot(df_suv, ['model'])
In [151]:
group_sort_and_plot(df_suv, ['full model'])
Top 10 popular Diesel SUV¶
In [152]:
df_diesel_suv = df_diesel[
df_diesel['Mudel'].apply(
lambda x: any([suv.lower() in x.lower() for suv in SUVS])
)
]
group_sort_and_plot(df_diesel_suv, ['model'])
In [153]:
group_sort_and_plot(df_diesel_suv, ['full model'])
Top 10 popular Petrol SUV¶
In [154]:
df_petrol_suv = df_petrol[
df_petrol['Mudel'].apply(
lambda x: any([suv.lower() in x.lower() for suv in SUVS])
)
]
group_sort_and_plot(df_petrol_suv, ['model'])
In [155]:
group_sort_and_plot(df_petrol_suv, ['full model'])
Top 10 popular hybrid SUV¶
In [156]:
df_hybrid_suv = df_hybrids[
df_hybrids['Mudel'].apply(
lambda x: any([suv.lower() in x.lower() for suv in SUVS])
)
]
group_sort_and_plot(df_hybrid_suv, ['model'])
In [157]:
group_sort_and_plot(df_hybrid_suv, ['full model'])
Most popular models per city¶
In [158]:
# Popular car per each city
data = df
data.replace('Narva- JΓ΅esuu','Narva-JΓ΅esuu', inplace=True)
cities = set(data['Linn'].tolist())
for city in cities:
print()
print(city)
group_sort_and_plot(data[data['Linn'] == city], ['full model'], limit=3, plot=False)
Top 3 for each mark¶
In [159]:
marks = ['ALFA ROMEO', 'ALPINE', 'ASTON MARTIN',
'AUDI', 'BENTLEY', 'BMW', 'BMW I',
'CADILLAC', 'CHEVROLET', 'CHRYSLER', 'CITROEN',
'DACIA', 'DODGE', 'DS', 'FERRARI', 'FIAT', 'FORD',
'HONDA', 'HYUNDAI', 'INFINITI', 'JAGUAR', 'JEEP', 'KIA',
'LAMBORGHINI', 'LAND ROVER', 'LEXUS', 'LOTUS', 'MASERATI',
'MAZDA', 'MERCEDES-AMG', 'MERCEDES-BENZ', 'MINI',
'MITSUBISHI', 'NISSAN', 'OPEL', 'PEUGEOT',
'PORSCHE', 'RENAULT', 'SEAT', 'SKODA', 'SSANGYONG',
'SUBARU', 'SUZUKI', 'TESLA MOTORS', 'TOYOTA', 'VOLKSWAGEN', 'VOLVO']
for mark in marks:
models = df[df['Mark'] == mark]
print("Top 3 {} models".format(mark))
group_sort_and_plot(models, ['full model'], limit=3, plot=False)
print()
Comments
Post a Comment