Data Science Portfolios
Over the course of my Data Science course we had to complete four portfolios. Each one involved analysing a dataset and then manipulating the data to gain valuable insights from it.
Portfolio 1
This was the first task we undertook. We had to remove missing data (null values) from the file, investigate the data and create descriptive statistics. We also had to plot and analyse the data and give an overall summary of my findings. Finally, we learned how to detect and remove outliers that affected the skew of the data.
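As a rough illustration of these steps, here is a minimal sketch using pandas. The column names and values are made up for the example and are not the course dataset, and the 1.5 * IQR rule is just one common way to flag outliers, not necessarily the one we used.

```python
import pandas as pd

# Hypothetical toy data standing in for the course dataset.
df = pd.DataFrame({
    "age": [23, 35, None, 41, 29, 38, 31],
    "spend": [120.0, 135.0, 128.0, None, 131.0, 2500.0, 126.0],
})

# Remove rows that contain missing (null) values.
clean = df.dropna()

# Descriptive statistics for every numeric column.
summary = clean.describe()
print(summary)

# Flag outliers with the 1.5 * IQR rule (an assumed choice for this sketch).
q1 = clean["spend"].quantile(0.25)
q3 = clean["spend"].quantile(0.75)
iqr = q3 - q1
in_range = clean["spend"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
no_outliers = clean[in_range]

# Compare the skew before and after dropping the outliers.
skew_before = clean["spend"].skew()
skew_after = no_outliers["spend"].skew()
print("skew before:", skew_before)
print("skew after:", skew_after)
```

In this toy example the single extreme "spend" value dominates the skew, so removing it brings the distribution much closer to symmetric.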
Portfolio 2
This was the second task we completed. We began by exploring the data with .head() and .info() to see what we were dealing with. We then viewed the correlations between all the columns and analysed them. Next, we had to form a hypothesis before splitting the data into training and testing sets. After that, we trained a Linear Regression model, then visualised, compared and analysed the results.
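The workflow above can be sketched roughly as follows with pandas and scikit-learn. The data here is synthetic and the column names (x1, x2, y) are placeholders I chose for the example, so treat it as an outline of the steps rather than the actual portfolio code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: y depends linearly on x1 and x2 plus noise.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 200)
x2 = rng.uniform(0, 5, 200)
data = pd.DataFrame({
    "x1": x1,
    "x2": x2,
    "y": 3.0 * x1 - 2.0 * x2 + rng.normal(0, 1, 200),
})

# First look at the data.
print(data.head())
data.info()

# Pairwise correlations between all columns.
corr = data.corr()
print(corr)

# Hold out 20% of the rows for testing.
X = data[["x1", "x2"]]
y = data["y"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the model on the training set and evaluate on the test set.
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)
r2 = r2_score(y_test, pred)
mse = mean_squared_error(y_test, pred)
print("R^2:", r2, "MSE:", mse)
```

Checking the correlation table first is a quick way to decide which columns are worth including as features before fitting anything.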
Portfolio 3
This was our final assigned task, dealing with the E-Commerce Dataset. We started by exploring the data and cleaning it up. We then had to convert the object features into numeric features using an encoder. Next, we created a heat map to study the correlation between all the features. After this, we split the dataset to train a logistic regression model to predict the ‘rating’ from the other features, and analysed the results. Finally, we trained a KNN model to see how its accuracy compared and discussed the results.
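Here is a minimal sketch of the encode, split and compare steps. It assumes scikit-learn's OrdinalEncoder as the encoder (the one we used may have been different), and the columns and the binary "rating" target are invented for the example rather than taken from the E-Commerce Dataset. The heat map is left out; the correlation table it would be drawn from is computed instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in data with two object (string) features.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "gender": rng.choice(["F", "M"], n),
    "category": rng.choice(["Books", "Games", "Movies"], n),
    "helpfulness": rng.integers(0, 5, n),
})
# Made-up binary rating that loosely follows helpfulness.
df["rating"] = (df["helpfulness"] + rng.normal(0, 1, n) > 2).astype(int)

# Convert the object features into numeric codes.
cat_cols = ["gender", "category"]
encoder = OrdinalEncoder()
encoded = pd.DataFrame(
    encoder.fit_transform(df[cat_cols]), columns=cat_cols, index=df.index
)
X = pd.concat([encoded, df[["helpfulness"]]], axis=1)
y = df["rating"]

# Correlation table that a heat map would visualise.
corr = pd.concat([X, y], axis=1).corr()
print(corr)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train both models on the same split so their accuracies are comparable.
log_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
knn_model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

log_acc = accuracy_score(y_test, log_model.predict(X_test))
knn_acc = accuracy_score(y_test, knn_model.predict(X_test))
print("logistic regression accuracy:", log_acc)
print("KNN accuracy:", knn_acc)
```

Using the same train/test split for both models keeps the comparison fair, since any difference in accuracy then comes from the model and not from the rows it happened to see.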
Portfolio 4
After building up our skill set with the previous portfolios, it was time to tackle a task of our own. For this, I set myself the following 7 questions.
1. What is the distribution of purchase amounts in different categories?
2. How does the review rating distribution differ between males and females?
3. How does the purchase amount correlate with the age of the customers?
4. Convert categorical variables (e.g., Gender, Category, Location, etc.) to numerical values.
5. Split the dataset into training and testing sets, and standardize the features.
6. Build a logistic regression model to predict the ‘Subscription Status’ based on other features. Report the accuracy and classification report.
7. Build a k-nearest neighbors (KNN) classifier to predict the ‘Category’ based on other features. Determine the optimal number of neighbors using cross-validation.
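To give a feel for the last few questions, here is a small sketch of standardizing the features and choosing the number of neighbors with cross-validation. It uses a synthetic dataset from scikit-learn's make_classification in place of my real data, and the pipeline plus GridSearchCV setup is one reasonable way to do it, not necessarily what my notebook does.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data standing in for the real features and 'Category'.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, n_classes=3, random_state=0
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling lives inside the pipeline so each CV fold is scaled
# using only its own training portion.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Try k = 1..20 with 5-fold cross-validation on the training set.
search = GridSearchCV(
    pipe, {"knn__n_neighbors": list(range(1, 21))}, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_acc = search.score(X_test, y_test)
report = classification_report(y_test, search.predict(X_test))

print("best k:", best_k)
print("test accuracy:", test_acc)
print(report)
```

Putting the scaler inside the pipeline matters for KNN in particular, because the model is distance based and scaling on the full dataset before cross-validation would leak information from the validation folds.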