From the course: Delivering Data-Driven Decisions with AWS: Applying Machine Learning, Data Engineering, and Generative AI
Navigating Amazon CodeWhisperer for EDA and PCA
- [Instructor] Now let's open VS Code. First we'll select a language, so we'll just type the word python, and once you see that icon appear, click on it. The first thing we'd like to do today is perform principal component analysis, or PCA, with Amazon CodeWhisperer in VS Code. So we'll start typing a few words to import the library: import numpy, and then you can see that suggestion. When you see that balloon, tab to the right and press Enter on your keyboard to accept the code suggestion. We'd also like to import a standard data visualization library called matplotlib, and you can see the suggestion matplotlib.pyplot as plt. So press the Tab key, tab to the right, and press Enter or Return on your keyboard to accept that code suggestion. Then we'd like to import the sklearn library for PCA. So from sklearn import, and just start typing a few letters of decomposition. You can see that word appearing, decomposition. We'd like to accept that code suggestion, so tab to the right and press Return on your keyboard. Also from sklearn, we'd like to import datasets as well. So yes, tab to the right and press Return to accept that suggestion. Then we'll set a random seed, in case we generate a random dataset. And then we want to load the wine dataset from sklearn. So type wine equals, and you can see that it's suggesting that we load a dataset, the wine dataset from sklearn. Press Return to accept that code suggestion. It's also learning our intent right now, so it's asking whether we'd like to accept the code suggestion for our features, X, which is the wine data, and also y, which is the target variable. So tab to the end and then press Return to accept that code suggestion. And then we'd also like to understand the shape of our features.
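As a rough sketch, the setup dictated so far might end up looking like the code below. The exact text CodeWhisperer suggests can differ from run to run, the seed value is an assumption (none is given in the video), and the matplotlib import is left out here because nothing is plotted yet.

```python
import numpy as np
from sklearn import datasets, decomposition

# Assumed seed value; the video does not say which seed is used.
np.random.seed(42)

# Load the wine dataset that ships with scikit-learn.
wine = datasets.load_wine()
X = wine.data    # feature matrix
y = wine.target  # target variable (wine class labels)

# Inspect the shape of the features: (number of samples, number of features).
print(X.shape)
```

The wine dataset has 178 samples and 13 features, so the printed shape is `(178, 13)`.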
And then next, we'd like to perform PCA. So we just type pca equals, and it's also suggesting, yep, decomposition. Would you like to perform principal component analysis with two components? So tab to the end, tab to the right, and press Return to accept that code suggestion from CodeWhisperer. Then we'd like to create X_r, so we'd like to fit PCA and transform our dataset using the fit function. Next, we'd like to plot the result and have a look at the components as well. It's already suggesting our intent, so it's plot. So yes, we'd like to accept that suggestion, tab to the right, press Return. And then, let's say we'd like to do another plot as well. We could also add further information to our plot, like the labels. So tab to the right and press Return to accept that code suggestion. It's learning our intent, and it's asking whether we'd like to make a scatterplot this time, so you can tab to the right and press Return. And now it's suggesting a legend as well, which is great. So it's learning our intent in natural language as we complete PCA to reduce our number of features.
So next, what we can do together is open another file in VS Code, and then we'll select the language Python again. So just type Python, and once you see that icon appear, click on it. This time, we'll perform exploratory data analysis, or EDA. So we'll just write import pandas, and it's also learning our intent. It says pandas as pd, so tab to the right and press Return to accept that code suggestion. Then we want to import numpy, which is a scientific computing package in Python. So we'll just tab to the right, accept that suggestion from CodeWhisperer, press Enter, and then we'll also import some more.
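The PCA steps described in this part of the walkthrough could be sketched as follows. This is an illustration, not the exact code from the video: it assumes "the fit function" refers to `fit_transform`, and the axis labels and title are made up for the example. The `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook or IDE
import matplotlib.pyplot as plt
from sklearn import datasets, decomposition

wine = datasets.load_wine()
X, y = wine.data, wine.target

# Reduce the 13 wine features to 2 principal components.
pca = decomposition.PCA(n_components=2)
X_r = pca.fit_transform(X)

# Scatterplot of the two components, one colour per wine class.
fig, ax = plt.subplots()
for label, name in enumerate(wine.target_names):
    mask = y == label
    ax.scatter(X_r[mask, 0], X_r[mask, 1], label=name)

ax.set_xlabel("Principal component 1")  # illustrative label
ax.set_ylabel("Principal component 2")  # illustrative label
ax.set_title("PCA of the wine dataset")  # illustrative title
ax.legend()
```

In an interactive session, calling `plt.show()` afterwards displays the figure.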
And then from sklearn, for example, let's say we'd like to perform machine learning, and yes, we do want to split, or partition, our dataset into a training set and a test set. So we'll use that function, train_test_split, and then we'll tab to the right and press Return on the keyboard to accept. Then from sklearn, we also want to perform linear regression, for example. So .linear, linear_model. And then we want to import LinearRegression. So we can just tab to the right to accept that code suggestion and then press Enter. And then, let's say you have some missing values, so we'll just start typing a few words. Sometimes when you have missing values, you need to impute them. So from sklearn.impute. Sometimes you might want to impute them by the mean or by the median. So impute import SimpleImputer. Just tab to the right to accept that code suggestion. We also might want to import sklearn.metrics, our evaluation metrics. So .metrics, just start typing a few letters, and there's R-squared, for linear regression. That's one of our evaluation metrics. And let's say you want to import something else, so you could type mean_squared_error and then tab to the right and press Enter. Now we want to import a few more libraries. Import the data visualization library matplotlib, as we saw before, tab to the right, press Enter. And then we want to import seaborn as well, perhaps, another data visualization package in Python. So tab to the right to accept that suggestion and press Return. And now CodeWhisperer is asking whether we'd like to import our dataset. So yes, tab to the right. Then we want to inspect our data, so let's say we start writing a few Python comments. So inspect data. CodeWhisperer also reads and learns from our Python comments.
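To show how the imports above usually fit together, here is a minimal sketch. The video only imports these names and does not show them being used, so everything past the import lines is an assumption: the DataFrame is synthetic, and the column names `x1`, `x2` and `y` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical; the video's dataset is not named).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 3.0 * df["x1"] - 2.0 * df["x2"] + rng.normal(scale=0.1, size=100)
df.loc[::10, "x1"] = np.nan  # inject some missing values into one feature

# Partition the data into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(
    df[["x1", "x2"]], df["y"], test_size=0.2, random_state=0
)

# Impute missing values with the column mean (strategy="median" also works).
imputer = SimpleImputer(strategy="mean")
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)

# Fit a linear regression and evaluate it on the held-out test set.
model = LinearRegression().fit(X_train_imp, y_train)
pred = model.predict(X_test_imp)
print("MSE:", mean_squared_error(y_test, pred))
print("R-squared:", r2_score(y_test, pred))
```

Fitting the imputer on the training set only, and then reusing it on the test set, keeps information from the test rows out of the imputed values.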
So we just click on the next row, and then we type df.head, which helps us examine the first five rows of our data frame, and then tab to the right to accept that code suggestion. The next step in exploratory analysis might be to check for null values. So write the comment, check for null values. You can see that CodeWhisperer is already making a suggestion: do we want to find and sum the number of null values? So yes, tab to the right to accept that suggestion. The next step of exploratory analysis might be to have a look at descriptive statistics. So df.describe. Yep, accept that suggestion. And the next step might be, let's check for outliers. Click on the blank row. We might want to check via a box plot. You can see how smart Amazon CodeWhisperer is, making a suggestion for us, so we can visualize our outliers via the box plot. And then let's say we would like to plot a histogram, for when we're checking the distribution of our data. So what would the code be? Let's type in our comments that we'd like to generate a histogram, and then we'll just wait for CodeWhisperer to make a suggestion. Yep, it's already produced a suggestion. So just tab to the right and press Enter. And then one more thing. Recently, I forgot the code for a correlation matrix. So you can type in the comments plot correlation matrix, and you can see how CodeWhisperer is learning our intent. We just start typing a few characters, corr equals, and it's already making that suggestion for us. You can see how CodeWhisperer is our friend, and that's true. Hit Enter. So there you have it, that's how easy it is to write a few prompts in natural language to help with an exploratory data analysis project.
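The EDA steps above can be sketched roughly as follows, with each comment playing the role of the prompt typed in the video. The dataset used for EDA is not named, so this assumes scikit-learn's wine data as a stand-in, and the plotted columns are picked arbitrarily. It uses pandas and matplotlib only; seaborn's `boxplot` and `heatmap` are a common alternative for the plots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook or IDE
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets

# Stand-in dataset (assumption): the wine data as a pandas DataFrame.
df = datasets.load_wine(as_frame=True).frame

# Inspect data
print(df.head())

# Check for null values
null_counts = df.isnull().sum()
print(null_counts)

# Descriptive statistics
print(df.describe())

# Check for outliers with a box plot (columns chosen arbitrarily)
fig_box, ax_box = plt.subplots()
df.boxplot(column=["alcohol", "malic_acid", "ash"], ax=ax_box)

# Generate a histogram to check the distribution of one feature
fig_hist, ax_hist = plt.subplots()
df["alcohol"].hist(bins=20, ax=ax_hist)

# Plot correlation matrix
corr = df.corr()
fig_corr, ax_corr = plt.subplots()
image = ax_corr.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax_corr.set_xticks(range(len(corr.columns)))
ax_corr.set_xticklabels(corr.columns, rotation=90)
ax_corr.set_yticks(range(len(corr.columns)))
ax_corr.set_yticklabels(corr.columns)
fig_corr.colorbar(image, ax=ax_corr)
```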