Simple Data Analysis

Bi
2 min read · Mar 20, 2021


The most effective way to beat cancer is early detection: identifying cancerous cells in the body in time. According to records, 10–20% of people with cancer are misdiagnosed, and 28% of 583 cases were life-threatening or life-altering (Google). Therefore, to win this war against cancer, it is important that we greatly reduce the rate of misdiagnosis.

In this article, I will analyze the common traits of cancerous and non-cancerous cells using a sample breast cancer dataset from Kaggle (cancer and non-cancer classification).

The data contains cell characteristics such as radius, concavity, texture, perimeter, smoothness and symmetry, and it also labels which cells are cancerous and which are not. I used pandas and NumPy to analyze the data and Matplotlib for visualization. My Jupyter notebook containing this analysis can be accessed here:

https://github.com/lufunmbi/Data-Analysis

I used the info() function in pandas to get an overall view of the data and the describe() function to get its statistical summary. The outcome column contains two values, 0 and 1, which indicate whether a cell is cancerous or not. I plotted a pie chart showing the percentage of cancerous and non-cancerous cells in the dataset.
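To make this step concrete, here is a minimal sketch run on a tiny made-up table rather than the Kaggle file. The column names other than outcome, all the values, and the helper names outcome_percentages and plot_outcome_pie are my own assumptions for illustration; the notebook linked above may do this differently.

```python
import pandas as pd

# Toy stand-in for the Kaggle data. These values and feature names are
# invented for illustration and are NOT taken from the real dataset.
toy = pd.DataFrame({
    "mean radius": [12.1, 14.3, 18.0, 11.5, 20.2],
    "mean texture": [17.0, 21.5, 19.8, 15.2, 23.1],
    "outcome": [0, 1, 1, 0, 1],
})

toy.info()                # column names, dtypes and non-null counts
summary = toy.describe()  # count, mean, std, min, quartiles and max per column


def outcome_percentages(df, column="outcome"):
    """Return the percentage of rows in each outcome class (hypothetical helper)."""
    return df[column].value_counts(normalize=True).sort_index() * 100


def plot_outcome_pie(df, column="outcome"):
    """Draw a pie chart of the outcome classes (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without Matplotlib

    shares = outcome_percentages(df, column)
    fig, ax = plt.subplots()
    ax.pie(
        shares.to_numpy(),
        labels=[f"outcome = {value}" for value in shares.index],
        autopct="%1.1f%%",
    )
    ax.set_title("Share of each outcome class")
    return fig
```

Calling plot_outcome_pie(toy) returns a Matplotlib figure that can be shown in a notebook or saved with savefig.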

I made a copy of the dataset, renamed some of the columns to prevent errors during usage, and dropped the columns that I would not be using for this analysis. I then used the describe() function again to get the statistical summary of the new dataset.
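The copy, rename and drop steps could look roughly like the sketch below. The column names (including the id column being dropped) and the values are hypothetical placeholders, not the actual columns I changed.

```python
import pandas as pd

# Toy stand-in for the original data; names and values are invented.
raw = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "mean radius": [12.1, 14.3, 18.0, 11.5],
    "mean perimeter": [78.0, 92.4, 118.6, 73.9],
    "outcome": [0, 1, 1, 0],
})

# Work on a copy so the original DataFrame stays untouched.
df = raw.copy()

# Replace spaces in column names so they are easier to use without errors.
df = df.rename(columns={
    "mean radius": "mean_radius",
    "mean perimeter": "mean_perimeter",
})

# Drop columns that are not needed for the analysis (hypothetical choice).
df = df.drop(columns=["id"])

# Statistical summary of the cleaned copy.
summary = df.describe()
```

Because rename and drop return new DataFrames by default, raw keeps its original columns and can be reused if a dropped column is needed later.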

I compared some values of the dataset based on the outcome column as seen below:

From the figures above, the non-cancerous cells (0) have larger dimensions than the cancerous cells (1).
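One common way to make this kind of comparison in pandas is to group by the outcome column and average each feature. The sketch below assumes that approach and uses invented numbers, so its output says nothing about the real dataset; the column names are also placeholders.

```python
import pandas as pd

# Toy data with invented values, used only to show the mechanics.
cells = pd.DataFrame({
    "mean_radius": [10.0, 12.0, 14.0, 18.0],
    "mean_texture": [20.0, 22.0, 15.0, 17.0],
    "outcome": [0, 0, 1, 1],
})

# Average of every feature for each outcome class (0 and 1).
means_by_outcome = cells.groupby("outcome").mean()

# Difference between the two classes, feature by feature:
# positive means class 1 has the larger average, negative means class 0 does.
difference = means_by_outcome.loc[1] - means_by_outcome.loc[0]

print(means_by_outcome)
print(difference)
```

The same groupby call works with median() or describe() in place of mean() if the averages are skewed by a few extreme cells.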
