Project 15 3 Final 1
Student ID:
Student Name:
Module Name:
Module ID:
Course Name:
Intake:
University Name:
Mail ID:
Ph No:
Original Code:
python
def pr(n)                # Missing colon
    for j in range(2,n)  # Missing colon and indentation
        if n%j==0:       # Incorrect indentation
            return False
In-Depth Critique of the Original Code
Missing Colons and Indentation: Python depends on proper indentation for code block
structure. The absence of colons (:) following def and for statements, along with
improper indentation patterns, would immediately cause syntax errors or create
logical inconsistencies.
Missing Increment: The while loop lacks any increment operation for the loop variable
i. This creates an infinite loop if the code executes at all.
Logical Errors
Incorrect Loop Start: The iteration begins at 1, however 1 is not considered prime.
Prime numbers are mathematically defined as integers greater than 1 having no
positive divisors except 1 and the number itself.
Premature Return: The function f returns the list p after just one iteration due to
incorrect placement of the return statement.
Input Handling: The input function produces a string value; this requires conversion to
integer format before processing.
Loop Bounds: The prime search should encompass the limiting value (using i <= l),
rather than terminating before reaching it.
Poor Naming: Function and variable identifiers are unclear (pr, f, p), creating difficulties
for code maintenance and debugging for other developers or even the original
programmer later.
Redundant Boolean Check: The expression if pr(i) == True can be simplified to just if
pr(i).
Inefficient Prime Test: For each number up to n, all potential divisors from 2 to n-1 are
examined; this approach becomes extremely slow for larger numbers.
python
def is_prime(n):
    """
    Check if a number is prime.

    Args:
        n (int): Number to check

    Returns:
        bool: True if n is prime, False otherwise
    """
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for divisor in range(3, int(n**0.5) + 1, 2):
        if n % divisor == 0:
            return False
    return True

def find_primes(limit):
    """
    Find all prime numbers up to and including the given limit.

    Args:
        limit (int): Upper bound

    Returns:
        list: List of primes
    """
    primes = []
    for number in range(2, limit + 1):
        if is_prime(number):
            primes.append(number)
    return primes

try:
    user_limit = int(input("Enter the limit for prime search: "))
    if user_limit < 2:
        print("Please enter a number greater than or equal to 2.")
    else:
        prime_numbers = find_primes(user_limit)
        print(f"Prime numbers up to {user_limit}: {prime_numbers}")
except ValueError:
    print("Please enter a valid integer.")
Output :
Syntax and Structure Corrected: All necessary colons and proper indentation implemented.
Loop Bounds: The range has been adjusted to include the upper boundary.
Variable Naming: Functions and variables use descriptive, meaningful names.
Prime Test Optimization: Only odd numbers beyond 2 are tested, and checking is limited to the square root of n.
Documentation: Functions include comprehensive docstrings for future users/developers.
User Input Validation: Ensures user input is a valid, positive integer.
No Redundant Boolean Checks: Direct Boolean evaluation is employed.
(Part b) Introduction to Algorithmic Optimization
In computational number theory, efficiently generating prime numbers for large ranges
represents a classical challenge. The naive approach (testing all divisors up to n-1 for each
candidate) proves practical only for small datasets. Optimization becomes essential rather
than optional when dealing with scale.
1. Baseline "Brute Force": Examine all integers from 2 to n-1 as potential divisors for
each candidate number.
2. Improved "Trial Division to √n": For odd numbers, test only divisors up to the square
root. This exploits the mathematical principle that if n isn't prime, at least one factor
must be ≤ √n.
3. Sieve of Eratosthenes: The optimal solution for generating multiple primes. Rather
than testing each number individually, it systematically marks all multiples of each
discovered prime, so composite numbers are eliminated in bulk rather than tested
one at a time.
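As a minimal standalone sketch of approach 3 (the function name and the demo limit of 30 are mine, not the assignment's exact listing):

```python
def sieve_of_eratosthenes(limit):
    """Return all primes <= limit by marking multiples of each prime."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit**0.5) + 1):
        if is_prime[i]:
            # Start at i*i: smaller multiples were already marked by smaller primes
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i, prime in enumerate(is_prime) if prime]

print(sieve_of_eratosthenes(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Starting the inner loop at i*i is the key saving: every composite below i*i has a prime factor smaller than i and was therefore marked in an earlier pass.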
Performance Comparison
Interpretation
import time
import math

def time_function(func, *args):
    """Time the execution of a function."""
    start_time = time.time()
    result = func(*args)
    end_time = time.time()
    return result, end_time - start_time

def basic_prime_check(n):
    """Brute-force primality test from the original approach."""
    if n < 2:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

def optimized_prime_check(n):
    """Optimized primality test."""
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True

def sieve_of_eratosthenes(limit):
    """
    Sieve of Eratosthenes up to limit.
    Most efficient when generating multiple primes.
    """
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    # Sieve process
    for i in range(2, int(limit**0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    # Collect prime numbers
    return [i for i in range(limit + 1) if is_prime[i]]

def basic_find_primes(limit):
    """Find primes using basic prime checking."""
    primes = []
    for i in range(2, limit + 1):
        if basic_prime_check(i):
            primes.append(i)
    return primes

def optimized_find_primes(limit):
    """Find primes using better prime checking."""
    primes = []
    for i in range(2, limit + 1):
        if optimized_prime_check(i):
            primes.append(i)
    return primes

# Performance comparison
limits = [100, 1000, 10000]
print("Performance Comparison:")
print(f"{'Limit':<8} {'Basic':<12} {'Optimized':<15} {'Sieve':<12} {'Speedup':<10}")
print("-" * 60)
for limit in limits:
    primes_basic, time_basic = time_function(basic_find_primes, limit)
    primes_optimized, time_optimized = time_function(optimized_find_primes, limit)
    primes_sieve, time_sieve = time_function(sieve_of_eratosthenes, limit)
    # Calculate speedup of the sieve over the basic approach
    speedup = time_basic / time_sieve if time_sieve > 0 else float('inf')
    print(f"{limit:<8} {time_basic:<12.6f} {time_optimized:<15.6f} "
          f"{time_sieve:<12.6f} {speedup:<10.2f}x")
    # Verify that all three methods produce identical results
    assert primes_basic == primes_optimized == primes_sieve, f"Mismatch at limit {limit}"

print("\nOptimization Techniques Applied:")
print("1. Square root optimization: test divisors only up to sqrt(n)")
print("2. Even skipping: reject all even numbers")
print("3. Sieve: mark multiples instead of testing individual primes")
Output :
Performance Comparison:
============================================================
100      ~0          ~0          ~0          inf x
1000     0.0059      ~0          ~0          inf x
Reduce Computational Complexity: Standard trial division operates at O(n²). Square root
optimization reduces this to O(n√n). The Sieve achieves O(n log log n).
Early Exit: Once evidence indicates a number isn't prime (first divisor discovered),
computation immediately terminates for that candidate.
Even Number Skipping: All even candidates beyond '2' can be immediately rejected.
Sieve Innovation: For batch prime generation, marking all multiples of each identified
prime eliminates redundant calculations.
Broader Lessons
Algorithm selection proves crucial. The performance difference isn't marginal; inefficient
implementations become unusable for large inputs.
In professional environments (cryptography, scientific computing), these performance
gains aren't merely "desirable"—they're absolutely essential.
Constructing a Blackjack game through programming involves more than coding mechanics.
It requires requirements gathering (What constitutes the game flow? How should edge
cases be managed?) and system design (function architecture, data representation).
The objective is a text-based game where one player competes against a computerized
dealer, both striving to achieve a score as close to 21 as possible without exceeding it.
Functional Specification
Solution Architecture
Implementation Walkthrough
import random

def create_deck():
    """Create a standard 52-card deck."""
    suits = ['Hearts', 'Diamonds', 'Clubs', 'Spades']
    ranks = ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K', 'A']
    deck = []
    for suit in suits:
        for rank in ranks:
            deck.append(f"{rank} of {suit}")
    return deck

def get_card_value(card):
    """Return the blackjack value of a single card (Ace counted as 11 here)."""
    rank = card.split(' of ')[0]
    if rank in ['J', 'Q', 'K']:
        return 10
    elif rank == 'A':
        return 11  # Adjusted later if needed
    else:
        return int(rank)

def calculate_hand_value(hand):
    """Total a hand, downgrading Aces from 11 to 1 while the hand would bust."""
    total = 0
    aces = 0
    for card in hand:
        value = get_card_value(card)
        if value == 11:
            aces += 1
        total += value
    while total > 21 and aces > 0:
        total -= 10  # Convert one Ace from 11 to 1
        aces -= 1
    return total

def display_hand(hand, owner, hide_first=False):
    """Print a hand; optionally hide the first (dealer's hole) card."""
    print(f"\n{owner}'s hand:")
    if hide_first:
        print("  Hidden card")
        for card in hand[1:]:
            print(f"  {card}")
        visible_value = calculate_hand_value(hand[1:])
        print(f"  Visible value: {visible_value}")
    else:
        for card in hand:
            print(f"  {card}")
        total_value = calculate_hand_value(hand)
        print(f"  Total value: {total_value}")

def deal_initial_cards(deck, player_hand, dealer_hand):
    """Deal two cards each to player and dealer."""
    for _ in range(2):
        player_hand.append(deck.pop())
        dealer_hand.append(deck.pop())

def player_turn(deck, player_hand):
    """Let the player hit or stand; return False if the player busts."""
    while True:
        player_value = calculate_hand_value(player_hand)
        if player_value > 21:
            return False  # Busted
        choice = input("\nHit or stand? (h/s): ").strip().lower()
        if choice == 'h':
            player_hand.append(deck.pop())
            new_card = player_hand[-1]
            print(f"You drew: {new_card}")
            display_hand(player_hand, "Player")
        elif choice == 's':
            break
        else:
            print("Please enter 'h' or 's'.")
    return calculate_hand_value(player_hand) <= 21

def dealer_turn(deck, dealer_hand):
    """Dealer reveals the hidden card and hits until reaching at least 17."""
    print("\nDealer's turn:")
    display_hand(dealer_hand, "Dealer")
    while calculate_hand_value(dealer_hand) < 17:
        new_card = deck.pop()
        dealer_hand.append(new_card)
        print(f"\nDealer draws: {new_card}")
        display_hand(dealer_hand, "Dealer")
    dealer_value = calculate_hand_value(dealer_hand)
    if dealer_value > 21:
        print(f"Dealer busts with {dealer_value}!")
    else:
        print(f"Dealer stands with {dealer_value}")

def determine_winner(player_hand, dealer_hand, player_busted=False):
    """Compare hands and return 'player', 'dealer' or 'tie'."""
    player_value = calculate_hand_value(player_hand)
    dealer_value = calculate_hand_value(dealer_hand)
    print(f"\n{'='*40}")
    print("GAME RESULT")
    print(f"{'='*40}")
    if player_busted:
        print(f"Player busts with {player_value}! Dealer wins.")
        return "dealer"
    elif dealer_value > 21:
        print(f"Dealer busts with {dealer_value}! Player wins.")
        return "player"
    elif player_value > dealer_value:
        print(f"Player wins with {player_value} vs {dealer_value}!")
        return "player"
    elif dealer_value > player_value:
        print(f"Dealer wins with {dealer_value} vs {player_value}!")
        return "dealer"
    else:
        print(f"Tie at {player_value}!")
        return "tie"

def play_blackjack():
    print("Welcome to Blackjack!")
    game_count = 0
    player_wins = 0
    dealer_wins = 0
    ties = 0
    while True:
        game_count += 1
        print(f"\n{'='*50}")
        print(f"GAME {game_count}")
        print(f"{'='*50}")
        # Initialize game
        deck = create_deck()
        random.shuffle(deck)
        player_hand = []
        dealer_hand = []
        deal_initial_cards(deck, player_hand, dealer_hand)
        display_hand(player_hand, "Player")
        display_hand(dealer_hand, "Dealer", hide_first=True)
        player_value = calculate_hand_value(player_hand)
        dealer_value = calculate_hand_value(dealer_hand)
        # Check for naturals (21 on the deal)
        if player_value == 21 and dealer_value == 21:
            print("Both have blackjack - tie!")
            ties += 1
        elif player_value == 21:
            print("Blackjack! Player wins!")
            player_wins += 1
        elif dealer_value == 21:
            display_hand(dealer_hand, "Dealer")
            print("Dealer has blackjack - dealer wins!")
            dealer_wins += 1
        else:
            # Player's turn
            player_standing = player_turn(deck, player_hand)
            if player_standing:
                # Dealer's turn
                dealer_turn(deck, dealer_hand)
                result = determine_winner(player_hand, dealer_hand, False)
            else:
                # Player busted
                result = determine_winner(player_hand, dealer_hand, True)
            # Update score
            if result == "player":
                player_wins += 1
            elif result == "dealer":
                dealer_wins += 1
            else:
                ties += 1
        # Ask to play again
        again = input("\nPlay again? (y/n): ").strip().lower()
        if again != 'y':
            print(f"\nFinal score after {game_count} games:")
            print(f"Player wins: {player_wins}")
            print(f"Dealer wins: {dealer_wins}")
            print(f"Ties: {ties}")
            print("Thanks for playing!")
            return

play_blackjack()
==================================================
GAME 1
==================================================

Player's hand:
  7 of Hearts
  K of Spades
  Total value: 17

Dealer's hand:
  Hidden card
  5 of Diamonds
  Visible value: 5

Dealer's turn:
Dealer's hand:
  Q of Hearts
  5 of Diamonds
  Total value: 15
Dealer stands with 19

========================================
GAME RESULT
========================================
Dealer wins with 19 vs 17!
Testing can be performed manually (running and playing with various inputs and outcome
scenarios) and supported by unit tests (for hand value calculations and deck logic).
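The hand-value logic is the natural target for unit tests. The sketch below inlines a copy of `calculate_hand_value` (equivalent to the one in the game listing) so it runs standalone; the specific test hands are my own examples:

```python
import unittest

def get_card_value(card):
    """Value of a single card; Aces count as 11 here (adjusted later)."""
    rank = card.split(' of ')[0]
    if rank in ('J', 'Q', 'K'):
        return 10
    if rank == 'A':
        return 11
    return int(rank)

def calculate_hand_value(hand):
    """Total a hand, downgrading Aces from 11 to 1 while it would bust."""
    total = sum(get_card_value(card) for card in hand)
    aces = sum(1 for card in hand if card.startswith('A '))
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total

class TestHandValues(unittest.TestCase):
    def test_face_cards(self):
        self.assertEqual(calculate_hand_value(['K of Spades', '7 of Hearts']), 17)

    def test_soft_ace(self):
        self.assertEqual(calculate_hand_value(['A of Clubs', '6 of Hearts']), 17)

    def test_ace_downgrade(self):
        # A + 9 + 5: the Ace must drop from 11 to 1 to avoid busting
        self.assertEqual(calculate_hand_value(['A of Clubs', '9 of Hearts', '5 of Spades']), 15)

if __name__ == '__main__':
    unittest.main()
```

Because the deck and shuffle are the only sources of randomness, pure functions like this can be tested exhaustively without playing the game.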
3.1 Introduction
Air pollution affects millions globally, making precise monitoring and timely interventions
essential for public health protection. Analyzing air quality data (including PM2.5, PM10,
NOx, SO2) enables stakeholders to identify trends, detect anomalous events, and assess
policy effectiveness. The proliferation of affordable sensors and accessible government
data facilitates community-driven or official monitoring initiatives worldwide.
Time series analysis involves examining pollutant measurements across consistent time
intervals (hourly, daily, monthly). It can reveal patterns such as daily cycles, seasonal
variations, and unexpected spikes ("episodes").
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import os
import warnings
warnings.filterwarnings('ignore')
def load_anage_data(filepath='anage_data.txt'):
    """
    Load the AnAge database from the downloaded tab-delimited file.
    The file should be downloaded from:
    https://genomics.senescence.info/
    """
    try:
        # Check if file exists first
        if not os.path.exists(filepath):
            print(f"File {filepath} not found in current directory")
            print(f"Available files: {[f for f in os.listdir('.') if f.endswith(('.txt', '.csv'))]}")
            return None
        df = pd.read_csv(filepath, sep='\t')
        print(f"Successfully loaded {len(df)} records from AnAge database")
        print(f"Columns: {df.columns.tolist()}")
        print(f"Shape: {df.shape}")
        return df
    except Exception as e:
        print(f"Error loading data: {e}")
        return None
def explore_data_structure(df):
    """
    Familiarize with the data frame structure and contents
    """
    print("\n" + "="*70)
    print("DATA STRUCTURE EXPLORATION")
    print("="*70)
    print("\nData Types:")
    print(df.dtypes)
    print("\nKingdoms in dataset:")
    if 'Kingdom' in df.columns:
        print(df['Kingdom'].value_counts())
def task_a_species_by_class_analysis(df):
    """
    Task (a): Summarise the number of species within each animal Class
    for which maximum longevity information exists
    """
    print("\n" + "="*70)
    print("TASK (A): SPECIES COUNT BY CLASS ANALYSIS")
    print("="*70)
    animalia_df = df[df['Kingdom'] == 'Animalia'].copy()
    print(f"Records in Kingdom Animalia: {len(animalia_df)}")
# Bar plot
colors = plt.cm.Set3(np.linspace(0, 1, len(species_by_class)))
bars = ax1.bar(range(len(species_by_class)), species_by_class.values, color=colors,
edgecolor='black', linewidth=0.8)
ax1.set_xlabel('Animal Class', fontsize=12, fontweight='bold')
ax1.set_ylabel('Number of Species', fontsize=12, fontweight='bold')
ax1.set_title('Number of Species with Longevity Data by Class\n(Kingdom:
Animalia)', fontsize=14, fontweight='bold', pad=20)
ax1.set_xticks(range(len(species_by_class)))
ax1.set_xticklabels(species_by_class.index, rotation=45, ha='right')
ax1.grid(axis='y', alpha=0.3)
# Pie chart of the top 10 classes, with the remainder grouped as "Others"
top_10 = species_by_class.head(10)
others_count = species_by_class[10:].sum()
pie_data = top_10.tolist()
pie_labels = top_10.index.tolist()
if others_count > 0:
    pie_data.append(others_count)
    pie_labels.append(f'Others ({len(species_by_class) - 10} classes)')
plt.tight_layout()
plt.show()
print(f"Median species per class: {species_by_class.median():.1f}")
return species_by_class, filtered_df
if not weight_cols:
    print("Error: No weight column found")
    print("Available columns:", df.columns.tolist())
    return
weight_col = weight_cols[0]
print(f"Using weight column: '{weight_col}'")
# Filter data for top 4 classes with both weight and longevity data
analysis_df = filtered_df[
    (filtered_df['Class'].isin(top_4_classes)) &
    (filtered_df[weight_col].notna()) &
    (filtered_df[longevity_col].notna())
].copy()
# Convert to numeric
analysis_df[weight_col] = pd.to_numeric(analysis_df[weight_col], errors='coerce')
analysis_df[longevity_col] = pd.to_numeric(analysis_df[longevity_col], errors='coerce')
if len(analysis_df) == 0:
    print("No data available for weight vs longevity analysis")
    return
print("\nAnalysis Results by Class:")
print("-" * 50)
outliers_info = []
for i, class_name in enumerate(top_4_classes):
    if i >= 4:  # Safety check
        break
    class_data = analysis_df[analysis_df['Class'] == class_name]
    if len(class_data) == 0:
        axes[i].text(0.5, 0.5, f'No data available\nfor {class_name}',
                     ha='center', va='center', transform=axes[i].transAxes)
        axes[i].set_title(f'{class_name}\n(n=0)')
        continue
    # Extract data
    weights = class_data[weight_col]
    longevities = class_data[longevity_col]
    # Calculate correlation
    correlation = weights.corr(longevities)
    # Fit a linear trend in log-log space and plot it
    p = np.poly1d(np.polyfit(np.log10(weights), np.log10(longevities), 1))
    x_trend = np.logspace(np.log10(weights.min()), np.log10(weights.max()), 100)
    y_trend = 10**p(np.log10(x_trend))
    axes[i].plot(x_trend, y_trend, 'r--', alpha=0.8, linewidth=2)
    # Flag records outside the 1st-99th percentile range as extreme outliers
    weight_q99, weight_q01 = weights.quantile(0.99), weights.quantile(0.01)
    longevity_q99, longevity_q01 = longevities.quantile(0.99), longevities.quantile(0.01)
    outliers = class_data[
        (class_data[weight_col] > weight_q99) |
        (class_data[weight_col] < weight_q01) |
        (class_data[longevity_col] > longevity_q99) |
        (class_data[longevity_col] < longevity_q01)
    ]
    print(f"\n{class_name}:")
    print(f"  Sample size: {len(class_data)}")
    print(f"  Weight range: {weights.min():.2e} - {weights.max():.2e} g")
    print(f"  Longevity range: {longevities.min():.1f} - {longevities.max():.1f} years")
    print(f"  Correlation (weight vs longevity): {correlation:.3f}")
    if len(outliers) > 0:
        print(f"  Extreme outliers identified: {len(outliers)}")
        for _, outlier in outliers.head(3).iterrows():  # Show top 3
            name = outlier['Common name'] if pd.notna(outlier['Common name']) else 'Unknown'
            weight = outlier[weight_col]
            longevity = outlier[longevity_col]
            outliers_info.append({
                'class': class_name,
                'name': name,
                'weight': weight,
                'longevity': longevity
            })
            print(f"  - {name}: {weight:.2e}g, {longevity:.1f} years")
# Load data
df = load_anage_data('anage_data.txt')
if df is None:
    print("Failed to load data. Please ensure 'anage_data.txt' is in the current directory.")
    return
print("\n" + "="*70)
print("ANALYSIS COMPLETE")
print("="*70)
print("This analysis addresses both requirements:")
print("(a) Species count by animal class with longevity data")
print("(b) Longevity vs weight analysis for top 4 classes")
print("All visualizations use appropriate scales and highlight key insights.")
Output :
ANAGE DATABASE ANALYSIS
======================================================================
Analysis for Animal Longevity Data
Dataset: AnAge Database (genomics.senescence.info)
Successfully loaded 4645 records from AnAge database
Columns: ['HAGRID', 'Kingdom', 'Phylum', 'Class', 'Order', 'Family', 'Genus', 'Species',
'Common name', 'Female maturity (days)', 'Male maturity (days)', 'Gestation/Incubation
(days)', 'Weaning (days)', 'Litter/Clutch size', 'Litters/Clutches per year',
'Inter-litter/Interbirth interval', 'Birth weight (g)', 'Weaning weight (g)', 'Adult weight (g)',
'Growth rate (1/days)', 'Maximum longevity (yrs)', 'Source', 'Specimen origin', 'Sample
size', 'Data quality', 'IMR (per yr)', 'MRDT (yrs)', 'Metabolic rate (W)', 'Body mass (g)',
'Temperature (K)', 'References']
Shape: (4645, 31)
======================================================================
DATA STRUCTURE EXPLORATION
======================================================================
Dataset Shape: 4645 rows × 31 columns
Column Names:
1. HAGRID
2. Kingdom
3. Phylum
4. Class
5. Order
6. Family
7. Genus
8. Species
9. Common name
10. Female maturity (days)
11. Male maturity (days)
12. Gestation/Incubation (days)
13. Weaning (days)
14. Litter/Clutch size
15. Litters/Clutches per year
16. Inter-litter/Interbirth interval
17. Birth weight (g)
18. Weaning weight (g)
19. Adult weight (g)
20. Growth rate (1/days)
21. Maximum longevity (yrs)
22. Source
23. Specimen origin
24. Sample size
25. Data quality
26. IMR (per yr)
27. MRDT (yrs)
28. Metabolic rate (W)
29. Body mass (g)
30. Temperature (K)
31. References
Data Types:
HAGRID int64
Kingdom object
Phylum object
Class object
Order object
Family object
Genus object
Species object
Common name object
Female maturity (days) float64
Male maturity (days) float64
Gestation/Incubation (days) float64
Weaning (days) float64
Litter/Clutch size float64
Litters/Clutches per year float64
Inter-litter/Interbirth interval float64
Birth weight (g) float64
Weaning weight (g) float64
Adult weight (g) float64
Growth rate (1/days) float64
Maximum longevity (yrs) float64
Source object
Specimen origin object
Sample size object
Data quality object
IMR (per yr) float64
MRDT (yrs) float64
Metabolic rate (W) float64
Body mass (g) float64
Temperature (K) float64
References
Kingdoms in dataset:
Kingdom
Animalia 4636
Plantae 4
Fungi 4
Monera 1
Name: count, dtype: int64
... Source Specimen origin Sample size Data quality IMR (per yr) \
0 ... 1466 wild medium acceptable NaN
1 ... 652 wild small acceptable NaN
2 ... 1467 wild small acceptable NaN

MRDT (yrs) Metabolic rate (W) Body mass (g) Temperature (K) References
0 NaN NaN NaN NaN 1466
1 NaN NaN NaN NaN 652
2 NaN NaN NaN NaN 1467
[3 rows x 31 columns]
======================================================================
TASK (A): SPECIES COUNT BY CLASS ANALYSIS
======================================================================
Records in Kingdom Animalia: 4636
Using longevity column: 'Maximum longevity (yrs)'
Records with longevity data: 4135
Aves 1394
Mammalia 1029
Teleostei 798
Reptilia 526
Amphibia 162
Chondrichthyes 116
Bivalvia 42
Cephalaspidomorphi 16
Chondrostei 14
Insecta 10
Holostei 4
Polychaeta 3
Dipnoi 3
Actinopterygii 3
Chromadorea 2
Echinoidea 2
Rhabditophora 1
Malacostraca 1
Demospongiae 1
Hexactinellida 1
Gastropoda 1
Coelacanthi 1
Cladistei 1
Cephalopoda 1
Branchiopoda 1
Ascidiacea 1
Trepaxonemata 1
SUMMARY STATISTICS:
Total animal classes with longevity data: 27
Total species with longevity data: 4135
Most represented class: Aves (1394 species)
Average species per class: 153.1
Median species per class: 3.0
======================================================================
TASK (B): LONGEVITY vs ADULT WEIGHT ANALYSIS
======================================================================
Using weight column: 'Adult weight (g)'
Using longevity column: 'Maximum longevity (yrs)'
Top 4 classes by species count: ['Aves', 'Mammalia', 'Teleostei', 'Reptilia']
Records available for analysis: 3112
Records per class:
Aves: 1375
Mammalia: 1023
Teleostei: 346
Reptilia: 368
Aves:
Sample size: 1375
Weight range: 2.60e+00 - 1.11e+05 g
Longevity range: 0.6 - 83.0 years
Correlation (weight vs longevity): 0.257
Extreme outliers identified: 108
- Cinereous vulture: 9.62e+03g, 39.0 years
- Eastern imperial eagle: 3.26e+03g, 56.0 years
- Black-shouldered kite: 2.66e+02g, 3.5 years
Mammalia:
Sample size: 1023
Weight range: 2.10e+00 - 1.36e+08 g
Longevity range: 2.1 - 211.0 years
Correlation (weight vs longevity): 0.523
Extreme outliers identified: 131
- Streaked tenrec: 1.80e+02g, 2.7 years
- Bowhead whale: 1.00e+08g, 211.0 years
- Southern right whale: 4.50e+07g, 70.0 years
Teleostei:
Sample size: 346
Weight range: 1.10e+00 - 3.76e+05 g
Longevity range: 3.0 - 205.0 years
Correlation (weight vs longevity): 0.100
Extreme outliers identified: 292
- Shortfin eel: 4.10e+03g, 32.0 years
- African longfin eel: 4.12e+02g, 20.0 years
- Long-finned eel: 1.10e+04g, 15.0 years
Reptilia:
Sample size: 368
Weight range: 1.48e+00 - 4.20e+05 g
Longevity range: 1.3 - 152.0 years
Correlation (weight vs longevity): 0.381
Extreme outliers identified: 13
- Saltwater crocodile: 2.00e+05g, 57.0 years
- Tuatara: 4.30e+02g, 90.0 years
- Labord's chameleon: 8.73e+00g, 1.3 years
======================================================================
ANALYSIS INSIGHTS AND DISCUSSION
======================================================================
======================================================================
ANALYSIS COMPLETE
======================================================================
This analysis addresses both requirements:
(a) Species count by animal class with longevity data
(b) Longevity vs weight analysis for top 4 classes
All visualizations use appropriate scales and highlight key insights.
1. Data Acquisition: Data can be obtained from open APIs (such as OpenAQ, government
portals) or CSV files from local sensor networks.
Example:
python
import pandas as pd
data = pd.read_csv('air_quality_data.csv', parse_dates=['timestamp'])
2. Preprocessing: Handle missing values and resample to a consistent frequency
(hourly/daily). Example:
python
data = data.set_index('timestamp').resample('D').mean()
data = data.interpolate()
3. Exploratory Data Analysis
import matplotlib.pyplot as plt
plt.figure(figsize=(15,5))
plt.plot(data.index, data['PM2.5'], label='PM2.5')
plt.title('PM2.5 over Time')
plt.xlabel('Date')
plt.ylabel('Concentration (µg/m³)')
plt.legend()
plt.show()
4. Identifying Patterns
5. Advanced Analyses
Threshold Analysis: Comparing daily mean values against WHO and local air quality
standards.
Source Attribution: (with supporting data) using correlations and regression analysis
to identify sources, such as diurnal NOx peaks from traffic patterns.
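A threshold analysis like the one described can be sketched in pandas. The series values, the column name, and the 15 µg/m³ 24-hour guideline figure below are illustrative assumptions, not results from the project's data:

```python
import pandas as pd

# Hypothetical daily-mean PM2.5 series (µg/m³); real data would come from a sensor feed
daily = pd.Series(
    [8.0, 12.5, 22.3, 9.1, 31.0, 14.9, 16.2],
    index=pd.date_range('2024-01-01', periods=7, freq='D'),
    name='PM2.5',
)

WHO_PM25_24H = 15  # assumed WHO 24-hour guideline value, µg/m³

# Days whose daily mean exceeds the guideline
exceedances = daily[daily > WHO_PM25_24H]
print(f"Days exceeding the guideline: {len(exceedances)} of {len(daily)}")
print(exceedances)
```

The same boolean-mask pattern extends to local standards: compare against a different constant, or against a per-season threshold Series aligned on the same index.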
Effectively communicating results proves as crucial as the analysis itself. Visualization helps
interpret large datasets and identify trends.
Minimize Personal Data: Collect and process only data necessary for analysis purposes.
Anonymize Sensitive Information: If datasets could be traced back to individuals
(precise locations, device identifiers), this information should be removed or
generalized.
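One way to apply these two principles in pandas; the column names, sample values, and the rounding granularity are illustrative assumptions, not part of the original project:

```python
import pandas as pd

# Hypothetical sensor readings with personally traceable fields
readings = pd.DataFrame({
    'device_id': ['sensor-A1', 'sensor-B2'],
    'owner_email': ['alice@example.com', 'bob@example.com'],
    'lat': [51.50123, 51.50987],
    'lon': [-0.12456, -0.13321],
    'pm25': [12.4, 18.9],
})

# Minimize: keep only the columns the analysis actually needs
minimal = readings[['lat', 'lon', 'pm25']].copy()

# Generalize: round coordinates to ~1 km so exact addresses are not identifiable
minimal['lat'] = minimal['lat'].round(2)
minimal['lon'] = minimal['lon'].round(2)

print(minimal)
```

Dropping identifier columns before any analysis step, rather than filtering at publication time, keeps the raw identifiers out of intermediate files as well.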
4.1.2 Security
Adhere to relevant data protection legislation (such as GDPR, IT Act, local privacy laws).
Technical Safeguards:
# Example of privacy-preserving techniques in genomic research
def implement_privacy_safeguards():
    """
    Demonstrate privacy-preserving techniques for genomic data.
    """
    safeguards = {
        'Data Anonymization': {
            'techniques': ['K-anonymity', 'L-diversity', 'Differential privacy'],
            'implementation': 'Remove direct identifiers and add statistical noise',
            'challenges': 'Genomic data is inherently identifiable'
        },
        'Access Controls': {
            'techniques': ['Role-based access', 'Multi-factor authentication', 'Audit logging'],
            'implementation': 'Granular permissions based on research needs',
            'challenges': 'Balancing security with research collaboration'
        },
        'Secure Computation': {
            'techniques': ['Homomorphic encryption', 'Secure multi-party computation'],
            'implementation': 'Analyze encrypted data without decryption',
            'challenges': 'Computational overhead and complexity'
        },
        'Federated Learning': {
            'techniques': ['Distributed model training', 'Data stays at source'],
            'implementation': 'Share model updates, not raw data',
            'challenges': 'Coordination complexity and potential information leakage'
        }
    }
    return safeguards
def ethical_governance_framework():
    """
    Outline ethical governance framework for genomic research.
    """
    framework = {
        'Ethics Committees': {
            'composition': 'Independent experts, patient representatives, ethicists',
            'responsibilities': 'Review research proposals, monitor ongoing studies',
            'powers': 'Approve, reject, or require modifications to research'
        },
        'Data Access Committees': {
            'composition': 'Senior researchers, data protection officers, ethicists',
            'responsibilities': 'Evaluate data access requests, ensure appropriate use',
            'powers': 'Grant or deny access, impose usage conditions'
        },
        'Public Engagement': {
            'methods': 'Citizen panels, public consultations, patient advisory groups',
            'frequency': 'Regular engagement throughout research lifecycle',
            'outcomes': 'Inform research priorities and governance policies'
        }
    }
    return framework
def display_framework_details():
    """
    Display the complete privacy and governance framework.
    """
    print("=== PRIVACY-PRESERVING TECHNIQUES IN GENOMIC RESEARCH ===\n")
    safeguards = implement_privacy_safeguards()
    for category, details in safeguards.items():
        print(f"{category.upper()}:")
        print(f"  Techniques: {', '.join(details['techniques'])}")
        print(f"  Implementation: {details['implementation']}")
        print(f"  Challenges: {details['challenges']}")
        print()
    framework = ethical_governance_framework()
    for component, details in framework.items():
        print(f"{component.upper()}:")
        for key, value in details.items():
            print(f"  {key.title()}: {value}")
        print()
DATA ANONYMIZATION:
Techniques: K-anonymity, L-diversity, Differential privacy
Implementation: Remove direct identifiers and add statistical noise
Challenges: Genomic data is inherently identifiable

ACCESS CONTROLS:
Techniques: Role-based access, Multi-factor authentication, Audit logging
Implementation: Granular permissions based on research needs
Challenges: Balancing security with research collaboration

SECURE COMPUTATION:
Techniques: Homomorphic encryption, Secure multi-party computation
Implementation: Analyze encrypted data without decryption
Challenges: Computational overhead and complexity

FEDERATED LEARNING:
Techniques: Distributed model training, Data stays at source
Implementation: Share model updates, not raw data
Challenges: Coordination complexity and potential information leakage

PUBLIC ENGAGEMENT:
Methods: Citizen panels, public consultations, patient advisory groups
Frequency: Regular engagement throughout research lifecycle
Outcomes: Inform research priorities and governance policies
4.3 Summary
In air quality and other domains, responsible data management proves critical not only for
regulatory compliance but for maintaining public trust. Ethics and data protection in
data-driven projects must be integrated from project inception, not added as
afterthoughts.
5. REFERENCES
Sedgewick, R., & Wayne, K. (2011). Algorithms (4th ed.). Addison-Wesley.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to Algorithms (3rd ed.). MIT Press.
Python Software Foundation. Python 3 Documentation. https://docs.python.org/3/
Real Python. The Sieve of Eratosthenes. https://realpython.com/python-sieve-of-eratosthenes/
Wikipedia. Blackjack. https://en.wikipedia.org/wiki/Blackjack
McKinney, W. (2017). Python for Data Analysis (2nd ed.). O'Reilly.
World Health Organization. Ambient (outdoor) air quality and health. https://www.who.int/news-room/fact-sheets/detail/ambient-(outdoor)-air-quality-and-health
European Union. General Data Protection Regulation (GDPR). https://gdpr.eu/
OpenAQ (public air quality data platform). https://openaq.org/