Psyliq Internship Completion - Internship Python or R
1) Stock Price Prediction using LSTM:
i. Data Collection:
Use the yfinance library in Python to fetch historical stock price data. Example: import yfinance as yf and data = yf.download('AAPL', start='2022-01-01', end='2023-01-01').
ii. Data Preprocessing:
Extract the closing prices and reshape them into a single-column array suitable for scaling.
iii. Feature Engineering:
Derive extra features from the price series, for example a 7-day moving average of the closing price.
iv. Normalization:
Normalize numerical features to a scale between 0 and 1. You can use the Min-Max scaling method for this; a short sketch of the mapping follows this list.
v. Data Splitting:
Split the dataset into training and testing sets using the train_test_split function from
sklearn.model_selection.
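As referenced in step iv, here is a minimal sketch of what Min-Max scaling computes (the prices are made-up illustration values; the full code below uses sklearn's MinMaxScaler, which applies the same mapping):

import numpy as np

# Hypothetical closing prices, purely for illustration.
prices = np.array([148.0, 150.0, 155.0, 160.0])
# Min-Max scaling maps each value x to (x - min) / (max - min),
# so every result lies between 0 and 1.
scaled = (prices - prices.min()) / (prices.max() - prices.min())
print(scaled)  # [0.         0.16666667 0.58333333 1.        ]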
Code:-
import yfinance as yf
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
# Data Collection
data = yf.download('AAPL', start='2022-01-01', end='2023-01-01')
# Data Preprocessing
data = data['Close'].values.reshape(-1, 1)
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(data)
# Feature Engineering
# Example: Creating a 7-day moving average
data_scaled = pd.DataFrame(data_scaled, columns=['Close'])
data_scaled['MA_7'] = data_scaled['Close'].rolling(window=7).mean()
# Normalization
# The 7-day average of already-scaled prices also lies in [0, 1], so a second
# scaling pass is unnecessary (the scaler above was fitted on a single column
# and cannot transform two); just drop the NaN rows left by the rolling window.
data_normalized = data_scaled.dropna().values
# Data Splitting
# Features are today's (Close, MA_7); the target is the next day's scaled Close.
X_train, X_test, y_train, y_test = train_test_split(
    data_normalized[:-1], data_normalized[1:, 0], test_size=0.2, shuffle=False)
# Model Building
model = Sequential()
# Treat the two features (Close, MA_7) as a length-2 input sequence.
model.add(LSTM(50, input_shape=(X_train.shape[1], 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
# Model Training
# Reshape to (samples, timesteps, features), the 3-D input the LSTM expects.
model.fit(X_train.reshape(X_train.shape[0], X_train.shape[1], 1), y_train,
          epochs=50, batch_size=32)
# Model Evaluation
loss = model.evaluate(X_test.reshape(X_test.shape[0], X_test.shape[1], 1), y_test)
# Prediction
predicted_prices = model.predict(X_test.reshape(X_test.shape[0], X_test.shape[1], 1))
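Since the model works on scaled values, a natural follow-up (a sketch continuing from the variables above) is to map the predictions back to dollar prices with the fitted scaler:

# The scaler was fitted on the single Close column, so the (n, 1) prediction
# array matches the shape its inverse_transform expects.
predicted_dollars = scaler.inverse_transform(predicted_prices)
actual_dollars = scaler.inverse_transform(y_test.reshape(-1, 1))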
2) Titanic Classification:
Data Collection:
Download the Titanic dataset from a source such as Kaggle.
Data Exploration:
Use pandas for exploratory data analysis (EDA). Check for missing values, data types, and
summary statistics.
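A minimal EDA sketch (assuming the same titanic.csv file used in the sample implementation below):

import pandas as pd

titanic_data = pd.read_csv('titanic.csv')
titanic_data.info()                 # column dtypes and non-null counts
print(titanic_data.isnull().sum())  # missing values per column
print(titanic_data.describe())      # summary statistics for numeric columns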
Data Preprocessing:
Handle missing values by imputing or removing them.
Encode categorical variables using one-hot encoding for algorithms that require numerical
input.
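One way to realize both steps (a sketch; the sample implementation below takes the simpler route of dropping incomplete rows and label-mapping Sex):

import pandas as pd

titanic_data = pd.read_csv('titanic.csv')
# Impute missing ages with the median instead of dropping those rows.
titanic_data['Age'] = titanic_data['Age'].fillna(titanic_data['Age'].median())
# One-hot encode a categorical column (Embarked takes the values 'S', 'C', 'Q').
titanic_data = pd.get_dummies(titanic_data, columns=['Embarked'])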
Data Splitting:
Split the dataset into training and testing sets using train_test_split.
Model Selection:
Choose a classification model such as logistic regression, decision trees, or random forests.
Model Training:
Train the selected model on the training dataset. Example: from sklearn.linear_model
import LogisticRegression and model = LogisticRegression().fit(X_train, y_train).
Model Evaluation:
Evaluate the model using metrics like accuracy, precision, recall, and F1 score on the testing
dataset.
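Beyond plain accuracy, each metric can be computed directly (a sketch using the y_test and predictions variables from the sample code below; classification_report prints the same numbers in tabular form):

from sklearn.metrics import precision_score, recall_score, f1_score

precision = precision_score(y_test, predictions)
recall = recall_score(y_test, predictions)
f1 = f1_score(y_test, predictions)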
Feature Importance:
If using a tree-based model, analyze feature importance via the feature_importances_ attribute (a short sketch follows the sample code below).
Sample implementation for the above task:
Code:-
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
# Data Collection
titanic_data = pd.read_csv('titanic.csv')
# Data Preprocessing
titanic_data.dropna(subset=['Age', 'Embarked'], inplace=True)
titanic_data['Sex'] = titanic_data['Sex'].map({'male': 0, 'female': 1})
X = titanic_data[['Pclass', 'Sex', 'Age']]
y = titanic_data['Survived']
# Data Splitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Model Selection
model = RandomForestClassifier()
# Model Training
model.fit(X_train, y_train)
# Model Evaluation
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
report = classification_report(y_test, predictions)
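The feature-importance step from the outline is not in the sample code; continuing from the fitted model above, a minimal sketch:

# Pair each importance score with its column name and rank them.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))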
3) Number Recognition using a Neural Network and the MNIST dataset:
Data Collection:
Download the MNIST dataset, which is often available through TensorFlow or PyTorch
datasets.
Data Preprocessing:
Normalize pixel values to a scale between 0 and 1. Reshape the images to a format suitable
for neural networks.
Data Splitting:
Split the dataset into training and testing sets (MNIST ships pre-split; mnist.load_data() returns both).
Model Building:
Build a neural network using TensorFlow or PyTorch. For TensorFlow, use the Sequential
model and add dense layers.
Model Training:
Train the neural network on the training dataset, specifying the number of epochs and batch
size.
Model Evaluation:
Evaluate the model on the testing dataset using accuracy or other relevant metrics.
Prediction:
Use the trained model to predict the digit in new handwritten images.
Code:-
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical
# Data Collection
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Data Preprocessing
X_train, X_test = X_train / 255.0, X_test / 255.0
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)
# Model Building
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))  # flatten each 28x28 image to a 784-vector
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
# Model Training
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, batch_size=32)
# Model Evaluation
loss, accuracy = model.evaluate(X_test, y_test)
# Prediction
predictions = model.predict(X_test)
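predictions holds one 10-way softmax distribution per test image; to read off the predicted digits (a short sketch continuing from the code above):

import numpy as np

# argmax over the class axis picks the most probable digit for each image.
predicted_digits = np.argmax(predictions, axis=1)
print(predicted_digits[:10])  # predicted digits for the first ten test images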