
This project is part of the Accenture North America Virtual Experience Programs, which provide a deep dive into the evolving world of data from an analytics and visualization perspective with Navigating Numbers. The focus is to equip participants with essential data skills such as data cleaning, modeling, visualization and storytelling.
Problem Statement:
In this project, I worked as part of Accenture’s Data Team to analyze our client’s content categories and highlight the top 5 categories with the largest aggregate popularity.
Approach and Solution:
In pursuit of our objectives, we adopt a technical mindset that involves critical thinking and breaking down complex challenges into smaller, manageable tasks. Our focus remains dedicated to achieving our goals by progressing one step at a time.
We will outline the recommended steps for our analyses.
A vital task for a data analyst is developing a good understanding of the business. We start by engaging with stakeholders to learn about their business challenges and framing the hypotheses that will guide an efficient analysis. A simple plan that sketches the end result of the analysis is also necessary, as it gives us an idea of how to build effective visualizations to communicate our findings to the stakeholders.
A careful evaluation of the client brief shows that they want an analysis of their content categories showing the top 5 categories by popularity. To measure popularity, we add up the reaction scores for each content category and rank the categories by their totals.
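The popularity calculation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the method used in the project (which was done in Excel). The category names and scores below are hypothetical sample values, and `top_categories` is a helper name made up for this sketch.

```python
from collections import Counter

# Hypothetical sample rows of (category, score) pairs.
# The real values come from the client's datasets and are not shown here.
reactions = [
    ("animals", 70),
    ("science", 60),
    ("animals", 45),
    ("food", 30),
    ("travel", 50),
    ("science", 15),
    ("technology", 65),
    ("cooking", 20),
]


def top_categories(rows, n=5):
    """Sum the score per category and return the n largest totals."""
    totals = Counter()
    for category, score in rows:
        totals[category] += score
    # most_common sorts by total, largest first
    return totals.most_common(n)


top5 = top_categories(reactions)
for category, total in top5:
    print(f"{category}: {total}")
```

In Excel, the same result can be reached with a pivot table that sums the score per category and sorts the totals in descending order.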
I was provided with 7 datasets. I made a final selection of 3 datasets containing information on the Contents, Reactions and Reaction types. I also made a list of the relevant columns required to reach our analytics goal.

List of Columns
Data Acquisition
To begin the analysis, we will access the company datasets using Microsoft Excel. Each dataset is provided as a CSV file. CSV (comma-separated values) is a plain-text format that stores tabular data one record per line, with the values in each record separated by commas. It is a popular and widely used format that can be read by a wide range of other data manipulation tools.
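To make the CSV structure concrete, here is a minimal sketch that parses a small piece of CSV text with Python's built-in `csv` module. The column names ("Content ID", "Category") and the sample values are assumptions for illustration and may not match the actual dataset files. `read_rows` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical CSV text standing in for one of the dataset files.
# The column names and values are assumptions, not the client's data.
sample_csv = """Content ID,Category
c1,animals
c2,science
c3,"food, drink"
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(sample_csv)
for row in rows:
    print(row["Content ID"], row["Category"])
```

Note that a value containing a comma is wrapped in double quotes so that it is still read as a single field.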
Importing the Datasets:
For this project, we will launch the Microsoft Excel App and open the 3 selected datasets.

Opening the Datasets
Data Cleaning: