Data Analytics
Data Wrangling: A simple study using Netflix
The hallmark of Data Science is the ability to explore your data properly. You could have robust data with plenty of latent information, but if it is not properly explored, that information goes largely unharnessed. In Data Analytics, data is explored in depth through wrangling to provide insight into the past, to answer questions of why and how, and to make more accurate predictions about the future by finding relationships within the dataset.
No data can be properly explored without first being properly cleaned. Truthfully, some tidiness and quality issues may escape your attention during the initial cleaning, but you will find them as you go on exploring. It is therefore normal to iterate on the cleaning exercise, which is why I usually make a copy of my original dataset to work with.
dfCopy = df.copy()
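To illustrate why the copy matters, here is a minimal sketch. The tiny DataFrame and its `title` and `country` columns are hypothetical stand-ins for the real dataset, and the `fillna` step is just one example of a cleaning operation, not necessarily one I ran.

```python
import pandas as pd

# Hypothetical miniature stand-in for the Netflix dataset.
df = pd.DataFrame({
    "title": ["Show A", "Show B", "Show C"],
    "country": ["United States", None, "India"],
})

# DataFrame.copy() makes a deep copy by default, so cleaning the copy
# leaves the original frame untouched.
dfCopy = df.copy()
dfCopy["country"] = dfCopy["country"].fillna("Unknown")

original_missing = int(df["country"].isna().sum())
copy_missing = int(dfCopy["country"].isna().sum())
print(original_missing, copy_missing)
```

If a cleaning step goes wrong, you can always start again from `df` without reloading the file.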
Data exploration is both objective and subjective: it depends on the individual doing the exploring as well as on the focus of the research. Hence, the first step in data exploration after cleaning is to write down the questions you want to answer. This helps you remain focused, because data exploration can be distracting and overwhelming at times.
For my data exploration, I used a dataset that contains the TV shows and movies available on Netflix as of 2019. It shows the data according to country. I will attempt to provide more insight into this and some other observations through this exploration. You can find a copy of my wrangling exercise using Jupyter notebook here
The dataset contains two types of shows: Movies and TV shows. The graph below shows their overall ratio in the dataset.

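From the pie chart, movies lead TV shows overall by a margin of 38.2 percentage points, which is quite significant. Since these are the only two categories, that works out to roughly 69.1% movies against 30.9% TV shows.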
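The shares behind a pie chart like this can be computed roughly as follows. This is a sketch on made-up rows, and the column name `type` with the labels `Movie` and `TV Show` is an assumption about how the dataset is laid out.

```python
import pandas as pd

# Made-up rows; the column name "type" and its labels are assumptions.
dfCopy = pd.DataFrame({"type": ["Movie", "Movie", "Movie", "TV Show"]})

# normalize=True returns fractions instead of raw counts.
shares = dfCopy["type"].value_counts(normalize=True) * 100

# Margin between the two categories, in percentage points.
margin = shares["Movie"] - shares["TV Show"]
print(shares.round(1))
print(f"Margin: {margin:.1f} percentage points")
```

The same `shares` series can be passed to a plotting call to draw the pie chart.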
Shows per country
I selected the top 10 countries with the highest number of shows using the code below:
dfCopy.country.value_counts().head(10)

The United States (USA) has the highest number of shows, followed by India, the UK, Japan, South Korea, etc. Eight of the countries in our top 10 have some of the largest and oldest movie industries in the world, which is likely why they account for distinctly higher shares of the shows available on Netflix.
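One caveat with the count above: `value_counts()` treats each distinct string as its own value and skips missing entries. If the `country` field ever holds several countries in one cell (an assumption here, shown with made-up rows), a variant like this sketch counts each country separately.

```python
import pandas as pd

# Made-up rows; a comma-separated country cell is an assumption.
dfCopy = pd.DataFrame({
    "country": ["United States", "India", "United States, India",
                "United Kingdom", None, "United States"],
})

# Same call as in the article: one count per distinct string, NaN skipped.
top_raw = dfCopy.country.value_counts().head(10)

# Variant: split co-productions so each country is counted once per title.
per_country = (
    dfCopy["country"]
    .dropna()
    .str.split(",")
    .explode()
    .str.strip()
    .value_counts()
    .head(10)
)
print(top_raw)
print(per_country)
```

Whether the split version is the better measure depends on the question being asked, so treat it as an option rather than a correction.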
Shows per Genre


One of Netflix’s strongest selling points is its documentaries archive. These are sometimes disguised as mere movies and other times presented as plain old documentaries, but they are engaging nonetheless. Examples include Inside Bill’s Brain: Decoding Bill Gates and Homecoming by Beyoncé. Next come the Netflix originals that have kept people coming back for more, titles like Money Heist, Lionheart and Bridgerton. This is in line with the genre statistics, which show that Documentaries, Dramas and Comedies have the highest slots. It is further buttressed by the ratings statistics, with TV-MA, TV-14 and TV-PG covering genres such as Comedies, Documentaries and Dramas.
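For anyone who wants to reproduce genre and rating counts like these, here is a rough sketch. The column names `listed_in` (comma-separated genres) and `rating`, along with the sample rows, are assumptions for illustration and not taken from my notebook.

```python
import pandas as pd

# Made-up rows; "listed_in" and "rating" are assumed column names.
dfCopy = pd.DataFrame({
    "listed_in": ["Documentaries", "Dramas, Comedies",
                  "Documentaries, Dramas", "Comedies"],
    "rating": ["TV-MA", "TV-14", "TV-MA", "TV-PG"],
})

# One row per (title, genre) pair; reset the index so it is unique again.
genres = (
    dfCopy.assign(genre=dfCopy["listed_in"].str.split(", "))
    .explode("genre")
    .reset_index(drop=True)
)

genre_counts = genres["genre"].value_counts()

# Cross-tabulate genres against ratings to see which ratings cover which genres.
rating_by_genre = pd.crosstab(genres["genre"], genres["rating"])
print(genre_counts)
print(rating_by_genre)
```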
Classification by Years
I decided to explore the data by the year each show was released or created. I also classified the shows by the year they were added to Netflix in order to measure the growth and expansion of the platform over the years.
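Both classifications can be sketched along these lines. The column names `release_year` and `date_added`, the date format, and the sample rows are all assumptions made for the example.

```python
import pandas as pd

# Made-up rows; column names and the date format are assumptions.
dfCopy = pd.DataFrame({
    "release_year": [2016, 2018, 2018, 2019, 2020],
    "date_added": ["January 1, 2018", "March 15, 2019", "July 4, 2019",
                   "December 31, 2019", "April 9, 2020"],
})

# Shows per release year, ordered chronologically.
released_per_year = dfCopy["release_year"].value_counts().sort_index()

# Parse the date strings, then count shows by the year they were added.
# errors="coerce" turns unparseable entries into NaT instead of raising.
added = pd.to_datetime(dfCopy["date_added"], format="%B %d, %Y", errors="coerce")
added_per_year = added.dt.year.value_counts().sort_index()

print(released_per_year)
print(added_per_year)
```

Either series can then be plotted as a bar chart to compare the years side by side.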


From the graph on the left, the number of releases increases progressively year after year until 2019. That year should have had a higher frequency than 2018 but is instead almost on par with 2017. The decline continues into 2020, which has the steepest drop, recording numbers lower than 2016. This decline can be attributed to the global COVID-19 pandemic, which hampered workflow globally and affected the world’s financial system and every other industry. The pandemic is also reflected in the graph on the right, where 2019 has the highest number of shows added, followed by 2020. These two years have the highest patronage of Netflix due to the lockdown that forced people to stay at home.
In addition, I did a little analysis of the movies added to Netflix, splitting them into those released within the last decade (2011 till now) and those released before 2011. Movies created before 2011 make up 6.2%, while movies created in the last decade make up 93.8%.
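A split like this can be computed with a boolean mask. The sketch below uses made-up rows and assumes the columns are named `type` and `release_year`, so its percentages will not match the 6.2% and 93.8% above.

```python
import pandas as pd

# Made-up rows; "type" and "release_year" are assumed column names.
dfCopy = pd.DataFrame({
    "type": ["Movie", "Movie", "Movie", "Movie", "TV Show"],
    "release_year": [1999, 2012, 2015, 2019, 2005],
})

movies = dfCopy[dfCopy["type"] == "Movie"]

# True for movies released in 2011 or later; the mean of a boolean
# series is the fraction of True values.
recent = movies["release_year"] >= 2011
pct_recent = recent.mean() * 100
pct_older = 100 - pct_recent

print(f"2011 or later: {pct_recent:.1f}%")
print(f"Before 2011:   {pct_older:.1f}%")
```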
In Summary,
The following observations were made from the Netflix dataset:
- Most of the shows on Netflix are Movies and not TV shows, as shown by the 38.2 percentage point difference between the two categories.
- The countries with the highest number of shows are mostly those with the oldest and largest movie industries in the world.
- Documentaries, Comedies and Dramas are the most frequent genres of shows in the dataset, which is mostly due to Netflix original series that have gained critical acclaim globally.
- The global COVID-19 pandemic affected production in the film industry, which caused a steep decline in the number of shows released in recent years and also affected the number of shows added.
- Most of the movies available on Netflix were released in 2011 or later.
You can connect with me on Twitter