UNIT CODE: BIT3227 UNIT TITLE: DATABASE SCIENCE

REVISION QUESTIONS/AREAS

1. List and explain three techniques used in exploratory data analysis (EDA). (5 marks)

2. Given a dataset, explain how you would use Python's Pandas library to read, clean, and analyze the data.
Provide example code snippets for each step. (10 marks)

3. Discuss the differences between supervised and unsupervised learning models. Provide examples of each.
(5 marks)

4. Describe how SciPy can be used for optimization tasks. Provide an example of a simple optimization problem
and its solution using SciPy. (5 marks)

5. How do you use Matplotlib to create informative data visualizations? Provide an example of a line plot and a
subplot. (5 marks)

6. Highlight the features of Jupyter Notebook and its advantages. (6 marks)

7. Describe the steps to perform data manipulation using Pandas. (10 marks)

8. Using Python, demonstrate how to perform a basic statistical analysis on a dataset using NumPy and SciPy.
Include code snippets. (10 marks)

9. How does the ANOVA technique help in understanding data variance? Provide an example of its application in
data analysis. (5 marks)

10. Describe the basic operations and array attributes in NumPy. Provide examples of creating arrays and
performing operations on them. (5 marks)

11. Explain how to use Pandas for SQL operations. Provide example code to demonstrate reading data from a SQL
database and performing a simple query. (10 marks)

12. Define data science and explain its basic concepts.

13. List and describe three industries that utilize data science.

14. Outline the steps involved in the data analytics process.

15. What are the key techniques for exploring and visualizing data?

16. Explain descriptive statistics and their importance in data analysis.

17. Describe different data types used for plotting and visualization.

18. Differentiate between statistical and non-statistical analysis methods.

19. Compare and contrast descriptive and inferential statistics.

20. Define population, sample, and sampling techniques in statistics.

21. Provide an overview of regression analysis and ANOVA.

22. Explain the concept of S Square and its significance in correlation analysis.

23. Describe practical applications and interpretation of regression and ANOVA.

24. Discuss the uses of Python in data science and analytics.

25. Explain how to set up a Python environment and the role of Jupyter Notebook.

26. List and describe Python data types, operators, and functions.

27. Describe the uses of R in data analysis and visualization.

28. Explain the different data sets and data structures available in R.

29. Discuss data manipulation and visualization techniques in R.

30. What is NumPy and why is it important for numerical computing?

31. Describe how to create and manipulate Ndarrays in NumPy.

32. List basic operations, mathematical functions, and array attributes in NumPy.

33. Provide an introduction to SciPy and its sub-packages.

34. Discuss applications of SciPy in integration, optimization, and statistics.

35. Explain how SciPy is used for scientific computing tasks.

36. What is Pandas and how is it used for data manipulation?

37. Describe how to work with DataFrames, including data operations and indexing.

38. Explain how to read and write data files and perform SQL operations with Pandas.

39. Give an overview of machine learning approaches and models.

40. How do you identify problem types and select appropriate learning models?

41. Discuss the process of training, testing, and optimizing machine learning models with Scikit-Learn.

42. What is NLP and what are its key applications?

43. Introduce Scikit-Learn NLP libraries and techniques.

44. Describe how to train NLP models and perform grid search for optimization.

45. What are the key principles of data visualization?

46. Explain how to plot with Matplotlib, including line properties, (x,y) plots, and subplots.

47. Discuss techniques for creating visually appealing and informative data visualizations with Matplotlib.

1. List and explain three techniques used in exploratory data analysis (EDA). (5 marks)
Techniques in EDA:
1. Descriptive Statistics: Summarizing the main features of a dataset quantitatively. This includes measures like
mean, median, mode, standard deviation, and quartiles.
2. Data Visualization: Using graphical representations to see patterns, trends, and outliers. Common
visualizations include histograms, box plots, scatter plots, and heatmaps.
3. Data Cleaning: Identifying and correcting errors or inconsistencies in the data. This includes handling missing
values, correcting data types, and dealing with duplicates.

2. Given a dataset, explain how you would use Python's Pandas library to read, clean,
and analyze the data. Provide example code snippets for each step. (10 marks)
Reading Data:
import pandas as pd

# Reading a CSV file
df = pd.read_csv('data.csv')

Cleaning Data:

# Handling missing values
df = df.dropna() # Drop rows with missing values
# df = df.fillna(0) # Alternatively, fill missing values with 0

# Converting data types
df['date'] = pd.to_datetime(df['date'])

# Removing duplicates
df = df.drop_duplicates()

Analyzing Data:
# Descriptive statistics
print(df.describe())

# Grouping and aggregating (numeric_only avoids errors on text columns in recent pandas)
grouped = df.groupby('category').mean(numeric_only=True)

# Visualizations (using Matplotlib)
import matplotlib.pyplot as plt

df['column'].hist()
plt.show()

3. Discuss the differences between supervised and unsupervised learning models. Provide
examples of each. (5 marks)
Supervised Learning:
 Definition: Models are trained using labeled data.
 Examples: Regression, Classification.
 Regression: Predicting house prices.
 Classification: Identifying spam emails.

Unsupervised Learning:

 Definition: Models are trained using unlabeled data.
 Examples: Clustering, Dimensionality Reduction.
 Clustering: Customer segmentation.
 Dimensionality Reduction: Principal Component Analysis (PCA) for feature reduction.
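
A minimal sketch contrasting the two approaches with Scikit-Learn (the arrays are made up for illustration; any small numeric data would do):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

X = np.array([[1], [2], [3], [4]])
y = np.array([2, 4, 6, 8])  # labels are available, so supervised learning applies

# Supervised: learn the X -> y mapping from labeled examples
reg = LinearRegression().fit(X, y)
print(reg.predict([[5]]))  # approximately 10

# Unsupervised: find structure in X without any labels
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)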

4. Describe how SciPy can be used for optimization tasks. Provide an example of a
simple optimization problem and its solution using SciPy. (5 marks)
Optimization with SciPy:

 Usage: SciPy's optimize module provides functions for optimization and root finding.
 Example:
from scipy.optimize import minimize

# Objective function
def objective(x):
    return x**2 + 5*x + 4

# Initial guess
x0 = 0

# Minimization
result = minimize(objective, x0)
print('Optimal value:', result.x)

5. How do you use Matplotlib to create informative data visualizations? Provide an
example of a line plot and a subplot. (5 marks)
Line Plot:
import matplotlib.pyplot as plt

# Data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Line plot
plt.plot(x, y)
plt.title('Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

Subplot:
# Data
x = [1, 2, 3, 4, 5]
y1 = [2, 3, 5, 7, 11]
y2 = [1, 4, 6, 8, 10]

# Subplot
fig, axs = plt.subplots(2)
axs[0].plot(x, y1)
axs[0].set_title('First Subplot')

axs[1].plot(x, y2)
axs[1].set_title('Second Subplot')

plt.show()

6. Highlight the features of Jupyter Notebook and its advantages. (6 marks)
Features:

1. Interactive Coding: Supports code execution in cells, allowing step-by-step analysis.
2. Rich Media: Embed images, videos, and interactive widgets.
3. Markdown Support: Combine code with documentation.
4. Visualization: Integrate plots and graphs directly in the notebook.
5. Extensions: Enhance functionality with various plugins and extensions.
6. Export Options: Export notebooks in various formats like HTML, PDF, and slides.

Advantages:

1. Ease of Use: Intuitive interface for beginners and experts.
2. Reproducibility: Document code and results together.
3. Collaboration: Shareable notebooks for collaborative work.
4. Versatility: Suitable for data analysis, machine learning, and more.
5. Integration: Supports multiple languages and tools via kernels.

7. Describe the steps to perform data manipulation using Pandas. (10 marks)
Steps:

1. Loading Data:
import pandas as pd
df = pd.read_csv('data.csv')

2. Inspecting Data:
df.head()
df.info()
df.describe()

3. Handling Missing Values:
df = df.dropna() # or df.fillna(0)

4. Filtering Data:
filtered_df = df[df['column'] > value]

5. Grouping and Aggregating:
grouped_df = df.groupby('category').sum()

6. Merging DataFrames:
df2 = pd.read_csv('data2.csv')
merged_df = pd.merge(df, df2, on='key')

7. Creating New Columns:
df['new_column'] = df['column1'] + df['column2']

8. Applying Functions:
df['new_column'] = df['column'].apply(lambda x: x * 2)

9. Sorting Data:
df = df.sort_values(by='column')
10. Saving Data:
df.to_csv('cleaned_data.csv', index=False)

8. Using Python, demonstrate how to perform a basic statistical analysis on a dataset
using NumPy and SciPy. Include code snippets. (10 marks)
Example:
import numpy as np
from scipy import stats

# Sample data
data = np.array([1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 9, 10])

# Basic statistics with NumPy
mean = np.mean(data)
median = np.median(data)
std_dev = np.std(data)
print(f'Mean: {mean}, Median: {median}, Standard Deviation: {std_dev}')

# Statistical tests with SciPy
t_stat, p_value = stats.ttest_1samp(data, 5)
print(f'T-statistic: {t_stat}, P-value: {p_value}')

9. How does the ANOVA technique help in understanding data variance? Provide an
example of its application in data analysis. (5 marks)
ANOVA (Analysis of Variance):

 Purpose: Compares means of three or more groups to see if at least one is significantly different.
 Application:
 Example: Testing if different teaching methods affect student performance.
import pandas as pd
from scipy import stats

# Sample data
data = {
    'Method A': [85, 90, 88, 92],
    'Method B': [78, 80, 82, 84],
    'Method C': [90, 92, 94, 96]
}
df = pd.DataFrame(data)

# ANOVA
f_stat, p_value = stats.f_oneway(df['Method A'], df['Method B'], df['Method C'])
print(f'F-statistic: {f_stat}, P-value: {p_value}')

10. Describe the basic operations and array attributes in NumPy. Provide examples of
creating arrays and performing operations on them. (5 marks)
Array Creation:
import numpy as np

# Creating arrays
array1 = np.array([1, 2, 3])
array2 = np.zeros((2, 3))
array3 = np.ones((3, 3))
array4 = np.arange(0, 10, 2)
array5 = np.linspace(0, 1, 5)

print(array1, array2, array3, array4, array5)

Array Operations:
# Element-wise operations
sum_array = array1 + array4
prod_array = array1 * array4

# Matrix operations
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

matrix_sum = matrix1 + matrix2
matrix_product = np.dot(matrix1, matrix2)

print(matrix_sum, matrix_product)

Attributes:
# Array attributes
print(f'Shape: {array1.shape}')
print(f'Data type: {array1.dtype}')
print(f'Number of dimensions: {array1.ndim}')

11. Explain how to use Pandas for SQL operations. Provide example code to
demonstrate reading data from a SQL database and performing a simple query. (10 marks)
Example:
import pandas as pd
import sqlite3

# Connect to the database
conn = sqlite3.connect('database.db')

# Reading data from SQL
df = pd.read_sql_query("SELECT * FROM table_name", conn)

# Performing a simple query
query_result = pd.read_sql_query("SELECT column1, column2 FROM table_name WHERE column3 > value", conn)

# Closing the connection
conn.close()

print(query_result)

12. Define data science and explain its basic concepts.
Data Science:

 Definition: An interdisciplinary field focused on extracting insights from data using techniques from statistics,
computer science, and domain knowledge.
 Basic Concepts:
 Data Collection: Gathering data from various sources.
 Data Cleaning: Preparing and cleaning data for analysis.
 Exploratory Data Analysis (EDA): Understanding data patterns and characteristics.
 Modeling: Applying statistical and machine learning models to make predictions.
 Validation: Assessing model performance.
 Communication: Presenting insights through reports and visualizations.

13. List and describe three industries that utilize data science.
Industries:

1. Healthcare: Predictive modeling for patient outcomes, personalized treatment plans, and medical image
analysis.
2. Finance: Fraud detection, risk management, algorithmic trading, and customer segmentation.
3. Retail: Inventory management, recommendation systems, and customer behavior analysis.

14. Outline the steps involved in the data analytics process.
Steps:

1. Define Objectives: Identify business goals and data requirements.
2. Data Collection: Gather relevant data from various sources.
3. Data Cleaning: Handle missing values, outliers, and inconsistencies.
4. Exploratory Data Analysis (EDA): Understand data patterns and relationships.
5. Modeling: Build and train predictive models.
6. Validation: Evaluate model performance and fine-tune as needed.
7. Deployment: Implement the model in a production environment.
8. Monitoring: Continuously monitor model performance and update as needed.
9. Communication: Present findings and insights to stakeholders.

15. What are the key techniques for exploring and visualizing data?
Techniques:

1. Descriptive Statistics: Summarize data using mean, median, mode, and standard deviation.
2. Data Visualization: Use charts and plots to represent data visually (e.g., histograms, scatter plots, box plots).
3. Correlation Analysis: Identify relationships between variables using correlation coefficients (see the sketch after this list).
4. Dimensionality Reduction: Simplify data using techniques like PCA (Principal Component Analysis).
5. Clustering: Group similar data points together to identify patterns (e.g., K-means clustering).
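
As a brief illustration of correlation analysis, a sketch with made-up columns (Pandas computes pairwise correlation coefficients directly):

import pandas as pd

df = pd.DataFrame({'height': [150, 160, 170, 180],
                   'weight': [50, 62, 69, 81]})  # made-up data
print(df.corr())  # pairwise correlation coefficients (close to 1 here)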

16. Explain descriptive statistics and their importance in data analysis.
Descriptive Statistics:

 Definition: Statistics that summarize and describe the main features of a dataset.
 Importance:
 Data Summarization: Provide a quick overview of data.
 Central Tendency: Measures like mean, median, and mode indicate the central point of the data.
 Dispersion: Measures like range, variance, and standard deviation show data spread.
 Shape: Skewness and kurtosis describe data distribution shape.
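
A quick sketch of these measures on made-up data (scipy.stats provides skew and kurtosis):

import numpy as np
from scipy import stats

data = np.array([1, 2, 2, 3, 4, 7, 9])
print(np.mean(data), np.median(data), np.std(data))  # central tendency and dispersion
print(stats.skew(data), stats.kurtosis(data))        # distribution shape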

17. Describe different data types used for plotting and visualization.
Data Types:

1. Numerical Data: Continuous or discrete values, used in line plots, histograms, and scatter plots.
2. Categorical Data: Distinct categories, used in bar charts and pie charts.
3. Time-Series Data: Data points indexed by time, used in line plots and area plots.
4. Geospatial Data: Data with geographic components, used in maps and heatmaps.

18. Differentiate between statistical and non-statistical analysis methods.
Statistical Analysis:

 Definition: Involves mathematical calculations to derive insights from data.
 Examples: Hypothesis testing, regression analysis, ANOVA.

Non-Statistical Analysis:

 Definition: Relies on qualitative methods to analyze data.
 Examples: Text analysis, thematic analysis, case studies.

19. Compare and contrast descriptive and inferential statistics.
Descriptive Statistics:

 Purpose: Summarize and describe the main features of a dataset.
 Scope: Limited to the given data.
 Examples: Mean, median, mode, standard deviation.

Inferential Statistics:

 Purpose: Draw conclusions about a population based on a sample.
 Scope: Generalize beyond the given data.
 Examples: Hypothesis testing, confidence intervals, regression analysis.
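
For instance, a confidence interval generalizes from a sample to the population mean; a minimal sketch with SciPy (the sample values are made up):

import numpy as np
from scipy import stats

sample = np.array([4.8, 5.1, 5.0, 4.9, 5.2, 5.3])
# 95% confidence interval for the population mean, using the t-distribution
ci = stats.t.interval(0.95, df=len(sample) - 1,
                      loc=np.mean(sample), scale=stats.sem(sample))
print(ci)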

20. Define population, sample, and sampling techniques in statistics.
Population:

 Definition: The entire group of individuals or instances about whom we want to draw conclusions.

Sample:

 Definition: A subset of the population selected for analysis.

Sampling Techniques (illustrated in the sketch after this list):

1. Random Sampling: Each member of the population has an equal chance of being selected.
2. Stratified Sampling: Population divided into subgroups (strata), and samples are taken from each.
3. Cluster Sampling: Population divided into clusters, and a whole cluster is sampled.
4. Systematic Sampling: Every nth member of the population is selected.
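
A sketch of random and systematic sampling with NumPy (an illustrative population of 100 members):

import numpy as np

population = np.arange(1, 101)  # population of 100 members
rng = np.random.default_rng(0)

random_sample = rng.choice(population, size=10, replace=False)  # random sampling
systematic_sample = population[::10]                            # every 10th member
print(random_sample)
print(systematic_sample)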

21. Provide an overview of regression analysis and ANOVA.
Regression Analysis:

 Purpose: Understand the relationship between dependent and independent variables.
 Types: Linear regression, multiple regression, logistic regression.
 Example: Predicting house prices based on features like size, location, and age.

ANOVA (Analysis of Variance):

 Purpose: Compare means of three or more groups to see if at least one is significantly different.
 Types: One-way ANOVA, two-way ANOVA.
 Example: Testing if different fertilizers affect crop yield.
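
A minimal regression sketch with scipy.stats.linregress (the numbers are made up); Question 9 shows the matching ANOVA code:

from scipy import stats

size = [50, 60, 80, 100, 120]      # illustrative house sizes
price = [150, 180, 240, 310, 360]  # illustrative prices

result = stats.linregress(size, price)
print(result.slope, result.intercept, result.rvalue**2)  # fit and R-squared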

22. Explain the concept of S Square and its significance in correlation analysis.
S Square (Sum of Squares):

 Definition: A measure of the total variability in the data.
 Types:
 Total Sum of Squares (TSS): Total variability in the data.
 Explained Sum of Squares (ESS): Variability explained by the model.
 Residual Sum of Squares (RSS): Variability not explained by the model.
 Significance: Used to calculate the coefficient of determination (R²) in regression analysis, indicating the
proportion of variance explained by the model.
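
A small sketch computing these quantities by hand (made-up observed and predicted values), showing how R² follows from the sums of squares:

import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])        # observed values
y_pred = np.array([2.8, 5.1, 7.2, 8.9])   # model predictions

tss = np.sum((y - y.mean()) ** 2)  # total sum of squares
rss = np.sum((y - y_pred) ** 2)    # residual sum of squares
r_squared = 1 - rss / tss          # coefficient of determination
print(tss, rss, r_squared)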

23. Describe practical applications and interpretation of regression and ANOVA.
Regression Applications:

 Predictive Modeling: Forecasting future values based on past data.
 Risk Assessment: Estimating the impact of different factors on risk.

ANOVA Applications:

 Experimental Design: Comparing group means in controlled experiments.
 Quality Control: Assessing differences in manufacturing processes.

Interpretation:

 Regression: Coefficients indicate the direction and strength of relationships between variables.
 ANOVA: F-statistic and p-value indicate if group means are significantly different.

24. Discuss the uses of Python in data science and analytics.
Uses:

1. Data Manipulation: Libraries like Pandas for data cleaning and transformation.
2. Statistical Analysis: Libraries like SciPy and Statsmodels for statistical tests and models.
3. Machine Learning: Libraries like Scikit-Learn for building and evaluating models.
4. Data Visualization: Libraries like Matplotlib and Seaborn for creating plots and charts.
5. Big Data: Libraries like Dask and PySpark for handling large datasets.

25. Explain how to set up a Python environment and the role of Jupyter Notebook.
Setting Up Python Environment:

1. Install Python: Download and install from the official Python website.
2. Package Manager: Install pip or conda for managing packages.
3. Virtual Environment: Create a virtual environment to manage dependencies.
# Using venv
python -m venv myenv
source myenv/bin/activate

# Using conda
conda create --name myenv python=3.8
conda activate myenv

Installing Packages:
pip install pandas numpy matplotlib jupyter

Role of Jupyter Notebook:

 Interactive Coding: Execute code in cells, visualize outputs immediately.
 Documentation: Combine code with markdown for comprehensive analysis.
 Visualization: Integrate plots directly within the notebook.
 Sharing: Easily share notebooks with others for collaboration.

26. List and describe Python data types, operators, and functions.
Data Types:

1. int: Integer numbers.
2. float: Floating-point numbers.
3. str: Strings.
4. bool: Boolean values (True/False).
5. list: Ordered, mutable sequence of elements.
6. tuple: Ordered, immutable sequence of elements.
7. dict: Key-value pairs.
8. set: Unordered collection of unique elements.

Operators:

1. Arithmetic Operators: +, -, *, /, //, %, **.
2. Comparison Operators: ==, !=, >, <, >=, <=.
3. Logical Operators: and, or, not.
4. Assignment Operators: =, +=, -=, *=, /=.
5. Bitwise Operators: &, |, ^, ~, <<, >>.

Functions:

1. Built-in Functions: len(), sum(), max(), min(), print(), type().
2. User-defined Functions:
def my_function(param1, param2):
    return param1 + param2

27. Describe the uses of R in data analysis and visualization.
Uses:

1. Statistical Analysis: Extensive libraries for statistical tests and models.
2. Data Manipulation: Libraries like dplyr for data cleaning and transformation.
3. Data Visualization: Libraries like ggplot2 for creating high-quality plots.
4. Machine Learning: Packages like caret for building and evaluating models.
5. Report Generation: Tools like RMarkdown for creating dynamic reports.

28. Explain the different data sets and data structures available in R.
Data Sets:

 Built-in datasets: mtcars, iris, airquality, etc.

Data Structures:

1. Vector: Ordered collection of elements of the same type.
v <- c(1, 2, 3)

2. Matrix: 2-dimensional array of elements of the same type.
m <- matrix(1:9, nrow = 3)

3. Data Frame: Tabular data structure with columns of different types.
df <- data.frame(Name = c("A", "B"), Age = c(25, 30))

4. List: Ordered collection of elements of different types.
lst <- list(Name = "A", Age = 25)

29. Discuss data manipulation and visualization techniques in R.
Data Manipulation:

 dplyr Package:
library(dplyr)
df <- df %>%
  filter(Age > 25) %>%
  mutate(Salary = Age * 1000) %>%
  arrange(Name)

Data Visualization:
 ggplot2 Package:
library(ggplot2)
ggplot(df, aes(x = Age, y = Salary)) +
  geom_point() +
  theme_minimal()

30. What is NumPy and why is it important for numerical computing?
NumPy:

 Definition: A fundamental package for numerical computing in Python.
 Importance:
 Efficient Arrays: Provides N-dimensional array objects for efficient computation.
 Mathematical Functions: Offers a wide range of mathematical functions for operations on arrays.
 Linear Algebra: Includes functions for linear algebra, Fourier transforms, and random number
generation.
 Interoperability: Works seamlessly with other libraries like SciPy, Pandas, and Matplotlib.

31. Describe how to create and manipulate Ndarrays in NumPy.
Creating Ndarrays:
import numpy as np

# Creating arrays
array1 = np.array([1, 2, 3])
array2 = np.zeros((2, 3))
array3 = np.ones((3, 3))
array4 = np.arange(0, 10, 2)
array5 = np.linspace(0, 1, 5)

Manipulating Ndarrays:
# Reshaping
reshaped = array1.reshape((3, 1))

# Indexing
element = array1[0]

# Slicing
subset = array1[1:3]

# Broadcasting
broadcasted = array1 * 2

# Aggregation
sum_array = array1.sum()
mean_array = array1.mean()

32. List basic operations, mathematical functions, and array attributes in NumPy.
Basic Operations:

1. Element-wise Operations: +, -, *, /, **.
2. Matrix Operations: np.dot(), np.matmul().
3. Comparison Operations: ==, !=, <, >, <=, >=.

Mathematical Functions:

1. Sum: np.sum()
2. Mean: np.mean()
3. Standard Deviation: np.std()
4. Min/Max: np.min(), np.max()
5. Sin/Cos/Tan: np.sin(), np.cos(), np.tan()

Array Attributes:

1. Shape: array.shape
2. Data Type: array.dtype
3. Number of Dimensions: array.ndim
4. Size: array.size

33. Provide an introduction to SciPy and its sub-packages.
SciPy:

 Definition: An open-source Python library used for scientific and technical computing.
 Sub-packages:
1. scipy.optimize: Functions for optimization and root finding.
2. scipy.stats: Statistical functions and tests.
3. scipy.integrate: Numerical integration routines.
4. scipy.linalg: Linear algebra routines.
5. scipy.signal: Signal processing tools.
6. scipy.sparse: Sparse matrix operations.
7. scipy.fftpack: Fast Fourier Transform routines.

34. Discuss applications of SciPy in integration, optimization, and statistics.
Applications:

1. Integration:

 Example: Numerical integration of a function.
from scipy.integrate import quad

result, error = quad(lambda x: x**2, 0, 1)
print(result)

2. Optimization:

 Example: Finding the minimum of a function.
from scipy.optimize import minimize

def objective(x):
    return x**2 + 5*x + 4

result = minimize(objective, 0)
print(result.x)

3. Statistics:

 Example: Performing a t-test.
from scipy import stats

data = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 9, 10]
t_stat, p_value = stats.ttest_1samp(data, 5)
print(t_stat, p_value)

35. Explain how SciPy is used for scientific computing tasks.
SciPy in Scientific Computing:

 Optimization: Solve problems involving finding maxima, minima, and roots.
 Integration: Perform numerical integration and solve differential equations.
 Signal Processing: Analyze and filter signals.
 Linear Algebra: Solve linear systems, eigenvalue problems, and matrix decompositions (see the sketch after this list).
 Statistics: Conduct hypothesis tests, estimate distributions, and perform regression analysis.
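
For example, a linear-algebra sketch solving a small system Ax = b with scipy.linalg (the matrix and vector are made up):

import numpy as np
from scipy import linalg

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)  # solve the linear system Ax = b
print(x)  # [2. 3.]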

36. What is Pandas and how is it used for data manipulation?
Pandas:

 Definition: A powerful open-source data analysis and manipulation library for Python.
 Usage:
 Data Structures: Provides DataFrames and Series for handling structured data.
 Data Cleaning: Handle missing values, duplicates, and inconsistent data.
 Data Transformation: Apply functions, group data, and perform aggregations.
 Data Analysis: Perform statistical analysis, merge/join datasets, and filter data.
 Input/Output: Read and write data from/to various file formats (CSV, Excel, SQL).

37. Describe how to work with DataFrames, including data operations and indexing.
Working with DataFrames:

 Creating DataFrames:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

 Indexing:
# Selecting a column
ages = df['Age']

# Selecting multiple columns
subset = df[['Name', 'Age']]

# Selecting rows by index
first_row = df.iloc[0]

# Selecting rows by condition
adults = df[df['Age'] > 30]

 Data Operations:
# Adding a new column
df['Salary'] = [50000, 60000, 70000]

# Applying functions
df['Age in 10 Years'] = df['Age'].apply(lambda x: x + 10)

# Grouping and aggregating
grouped = df.groupby('Name').mean()

# Merging DataFrames
data2 = {'Name': ['Alice', 'Bob'], 'Department': ['HR', 'IT']}
df2 = pd.DataFrame(data2)
merged_df = pd.merge(df, df2, on='Name')

38. Explain how to read and write data files and perform SQL operations with Pandas.
Reading and Writing Data Files:

 Reading:
df = pd.read_csv('data.csv')
df = pd.read_excel('data.xlsx')
df = pd.read_json('data.json')

 Writing:
df.to_csv('output.csv', index=False)
df.to_excel('output.xlsx', index=False)
df.to_json('output.json')
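
SQL Operations (a minimal sketch with SQLite; 'database.db' and 'table_name' are placeholders, as in Question 11):

import pandas as pd
import sqlite3

conn = sqlite3.connect('database.db')

# Read query results into a DataFrame
df = pd.read_sql_query("SELECT * FROM table_name", conn)

# Write a DataFrame back to a SQL table
df.to_sql('table_copy', conn, if_exists='replace', index=False)

conn.close()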
