Lecture 1: Introduction to Cyber Security
Topics
Access Control
Security Risk Management
Attack Defense trees
Web security
Cryptography
Security Protocols
Privacy
Modulus
Remainder Function
A mod B
“If I divide A by B, how much will be left?”
A = (B*k) + R
k is the whole number of times B can go into A
R is the number left over
How to modulus
A mod B
A < B?
- If A is less than B, it is just A.
How many times can B go into A? (call this k)
What is k*B?
Now subtract k*B from A (A - k*B)
This is the remainder
Repetitive method
Keep subtracting B from A until you get a number smaller than B
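Both methods can be sketched in Python for checking your own answers. This is only an illustration for non-negative A and positive B; the function names are made up for these notes.

```python
def mod_by_division(a: int, b: int) -> int:
    """Remainder of a divided by b, for a >= 0 and b > 0."""
    if a < b:
        return a          # B does not fit into A even once, so the answer is A
    k = a // b            # whole number of times B goes into A
    return a - k * b      # what is left over


def mod_by_subtraction(a: int, b: int) -> int:
    """Same remainder, found by the repetitive method."""
    while a >= b:         # keep subtracting B until the number is smaller than B
        a -= b
    return a


print(mod_by_division(14, 4))     # 2
print(mod_by_subtraction(14, 4))  # 2
```

The two functions should always agree; the division method is simply faster for large A.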
Modulus is circular
What is 10 mod 4?
10 = 2*4 + 2
What is 14 mod 4?
14 = 3*4 + 2
What is 18 mod 4?
18 = 4*4 + 2
Why?
It does not matter how many times a number can divide
All that matters is the remainder
Modulus value will ALWAYS be lower than B
Why?
Modulus Range
A mod B will always be between 0 and B-1
Why?
What if A is negative?
A = B*k + R
K can be negative…
R can NOT be negative
Actually it can be, but for us it can NOT be
-41 mod 10?
-41 = 10*(-5) + 9
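A quick check of the negative case in Python, as a sketch. Python's floor division rounds k down, which gives the non-negative R used in this course; some other languages (C, for example) round k toward zero and would give -1 instead. The function name is made up for these notes.

```python
def split_k_and_r(a: int, b: int):
    """Return (k, r) with a == b*k + r and 0 <= r < b, for b > 0."""
    k = a // b        # floor division: -41 // 10 is -5, not -4
    r = a - b * k     # the remainder, never negative when b > 0
    return k, r


k, r = split_k_and_r(-41, 10)
print(k, r)        # -5 9
print(-41 % 10)    # 9
```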
Modulus is Important!
It’s the basis of all encryption
You need to have a strong understanding of Modulus
We won’t use it until we get to encryption, but you need to start understanding
modulus NOW!
Use the practice quiz on Brightspace to check your understanding
Ungraded and random every time
What is Cyber Security?
Cyber security is the practice of protecting systems, networks, and programs from
cyber/digital attacks.
These cyber attacks are usually aimed at accessing, changing, or destroying sensitive
information; extorting money from users; or interrupting normal business processes.
Cyber Security is multidisciplinary
This is as much a legal, governance, or psychology domain as a technical one
You do NOT need to be an expert on technology to be a cyber security professional
You do need to be aware of the technical aspects
A good technical cyber security expert should be aware of the legal/psychological
aspects
We are primarily focused on the technical aspects in this course
You should always be working to integrate this new technical knowledge with
your existing government knowledge
Security (disambiguation)
Information security: often referred to as InfoSec, refers to the processes and tools
designed and deployed to protect sensitive business information from modification,
disruption, destruction, and inspection.
Computer security: The protection afforded to an automated information system in
order to attain the applicable objectives of preserving the integrity, availability, and
confidentiality of information system resources (includes hardware, software,
firmware, information/data, and telecommunications).
Information security focuses on data
Computer security focuses on computer systems
Network security focuses on connections
Cybersecurity focuses on interconnected systems
“Cybersecurity” is the general catch-all term
Cyber Security or Cybersecurity
It literally doesn’t matter
No difference in meaning
In general:
US: cybersecurity
UK: cyber security
Try to be consistent
For your assignments, pick one and then only use that version
“Ciber” is just wrong
This comes from overextending “cipher”
Cipher vs Cypher is similar
Both correct
- Cyber Security/cybersecurity -> be consistent
What is cyberspace? (Discussion)
Different definitions depending on what’s convenient
See the notes from last year’s cyber course.
Try to include different areas (transportation, health, etc.) and different aspects
(governance, technology)
What is a system?
1. A product or component, such as a cryptographic protocol, a smartcard or the
hardware of a PC
2. A collection of the above plus an operating system, communications and other things
that go to make up an organization’s infrastructure
3. Any or all of the above plus one or more applications (media player, browser, word
processor, accounts / payroll package, and so on)
4. Any or all of the above plus IT staff
5. Any or all of the above plus internal users and management
6. Any or all of the above plus customers and other external users
CIA
Confidentiality: No disclosure to non-authorized users
Integrity: No modification of information by non-authorized users
Availability: Authorized users always have access
CIA – AAA
Authenticity: Being able to be verified and trusted
Accountability: Actions are traceable to those responsible for them
Accuracy: Information is fit for purpose
The Security Framework
Policy – what you’re supposed to achieve
Mechanism – the ciphers, access controls, hardware tamper-resistance and other
machinery that you assemble in order to implement the policy
Assurance – the amount of reliance you can place on each particular mechanism
Incentives – the motive that the people guarding and maintaining the system have to
do their job properly, and also the motive that the attackers have to try to defeat
your policy
Some more definitions
Asset - anything worth protecting
Vulnerability – A property of a system which can lead to a security failure
Risk – The overall probability of bad things happening x how bad the bad thing is
(danger * consequence)
Trusted -
<…>
Security
(see slides)
Security: When the system satisfies the security requirements.
Maximum security?
Minimum function
Minimum security
Maximum function
(see slides for illustration of the spectrum) -> Max security = minimum usability and Max
usability = minimum security
Security Considerations
(see slides for diagram)
- Security level should match security requirements
Perfect Security?
(Video)
- The most secure system will be the least usable one.
Attackers
This table shows how attackers attack. The 4 different types of attackers display their goals.
           Passive                                Active
           (hard to detect; aim for prevention)   (hard to prevent; hard to know when they will happen)
Insider
Outsider   Spies:
           - Requires resources.
           - Wait to see if something interesting happens.
4 types of attackers:
1. Spies (state actors, for example national governments)
2. Crooks (cybercrime)
3. Bullies (people who try to do psychological damage)
4. Geeks
Spies
Five Eyes: US, UK, CAN, AUS, NZ
China: Different from the Five Eyes in terms of its goals. It has a huge global
reach and means as well (for example, the Great Firewall). Internally it has a
huge amount of control; externally it aims for defense, which is starting to
change. It lacks the global platform advantage (companies from the Five Eyes such
as Facebook); however, it has a platform advantage internally.
Russia: Has an internal platform advantage. A lot of crime is aimed outside Russia,
not within their own country: “You can operate within our borders as long as you
don’t attack”
Crooks
Big business
Cyber crime
Nearly half of the crime is cybercrime
Less than 1% of global police force are assigned to counter cyber crime
Hard to fight
Easy to get into
“Identity Theft”
How is this different from impersonation at a bank? Responsibility: at a bank, it is
the bank’s responsibility. In cyberspace, it is the user’s responsibility.
Bullies
Aim to harm
Goal is abuse
<…>
Geeks
Have the ability
Interest
Being paid
They do it because they can and are interested in doing it. Also to fulfil bounties,
for example.
Jobs in the Security Industry
(see slides)
Lecture 2: Security Risk Management
Security Management in an organization
Operations Management
Daily operations to prevent, detect, and respond to cyber incidents
Day-to-day planning
Governance
Policies and processes that determine how organisations detect, prevent, and
respond to cyber incidents
Long-term strategic Planning
Cyber Security Management
Security Planning
Managing the security policy
Writing
Updating
Communication
Enforcement
Risk Management
Threat identification
Intelligence
Implementation of security controls
Goal: Continuity of the Organization
Security Plan
Full description
Updated Regularly
Easily Accessible
Enforceable
Contains:
Security Policies
Security Status
Security Requirements
Accountability Details
Business Continuity Plan
Schedule
Security Policy
- What we are trying to achieve
From before: “What we’re trying to achieve”
Definition: Formal statement of the rules to which people and technology must
conform
Specifies what is protected and why
Describes the used procedures, controls, and standards
Assigns responsibility
Defines timeline
Security Policies
Level of detail
Organizational security policy
Business Continuity Plan
Issue-specific security policy
Incident Response Plan
System-specific security policies
Security policy dependency
Often overlap
Often rely on each other
Topics in the Security Policy
Physical security
Employee management
Data protection
Data classification used, encryption usage guidelines
System security (hardware/software/OS/application)
Not only the technology! Also how to choose and how to manage
Communication security
Perimeter controls, Web and email usage
Privacy
What privacy levels are employees entitled to
Authentication and access control
Availability
Accountability and responsibilities
Probability (Math)
Probability
“The likelihood for an event to occur”
For us, probability = likelihood
Equation:
P(event) = Outcomes where event occurs/Total outcomes
Number between 0 and 1
0 = event never occurs, impossibility
1 = event always occurs
P(A) = |A| / |Ω| (number of outcomes in A divided by the number of all outcomes Ω)
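As a small worked example of the counting formula, here is a sketch for one fair six-sided die. The die and the event (rolling an even number) are assumptions chosen only for illustration.

```python
from fractions import Fraction

omega = list(range(1, 7))                   # all outcomes of one die roll
event = [x for x in omega if x % 2 == 0]    # outcomes where the event occurs

p_even = Fraction(len(event), len(omega))   # outcomes with event / total outcomes
print(p_even)   # 1/2
```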
Conditional Probabilities
Probabilities based on some other element
P(A|B)
Probability of A given B
We take B as a given (B has definitely happened)
What is the probability that A will happen?
Independent/Dependent Probabilities
Two probabilities are independent if the events don’t depend on each other
Independent probabilities can be multiplied to find the joint probability of the events
If P(A|B) = P(A), then A and B are independent
Combining Probabilities
P(A or B) = P(A) + P(B) – P(A and B)
P(A and B) = P(A|B)*P(B) = P(B|A)*P(A)
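The combination rules can be checked by counting outcomes. A sketch for one fair die, where A = "roll is even" and B = "roll is greater than 3" are events picked only for illustration:

```python
from fractions import Fraction

omega = set(range(1, 7))                 # outcomes of one die roll
A = {x for x in omega if x % 2 == 0}     # {2, 4, 6}
B = {x for x in omega if x > 3}          # {4, 5, 6}


def p(event):
    """Probability of an event by counting outcomes."""
    return Fraction(len(event), len(omega))


p_a_given_b = Fraction(len(A & B), len(B))   # restrict the outcomes to B

assert p(A | B) == p(A) + p(B) - p(A & B)    # P(A or B)
assert p(A & B) == p_a_given_b * p(B)        # P(A and B)
independent = (p_a_given_b == p(A))          # independence test

print(p(A | B), p(A & B), p_a_given_b, independent)   # 2/3 1/3 2/3 False
```

Here P(A|B) = 2/3 differs from P(A) = 1/2, so these two events are dependent.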
Risk Management Process
Definitions
Asset – something of value
Vulnerability – a weakness of an asset/control that can be exploited to cause damage
Threat – exploitation of a vulnerability that leads to damage to an asset
Threat source – who is doing the exploiting of the vulnerability
Threat event – when the threat occurs/the situation when it occurs
Risk – quantification of a threat (probability x impact)
Control – a measure that reduces risk
Risk Management Process
1. Establish Context
2. Risk Identification
3. Risk Analysis
4. Risk Evaluation
Steps 2-4 together are Risk Assessment
5. Risk Treatment
Establish Context
- What is the current security situation?
- The context is based on what is relevant to a specific threat
Risk Identification
Intelligence Operation
What is the risk?
Where does the risk come from?
Who is the attacker?
What is the vulnerability?
Intelligence
What is it?
Gathering information
Only for governments?
No
Companies routinely have intelligence operations
“Trade Secrets”
Risk Analysis
What is the likelihood of occurrence?
What is the magnitude of impact?
What is the risk?
Qualitative Risk Analysis
Threat Modeling
We’ll do this in a few weeks
Quantitative Risk Analysis
Generally put everything in the same units (money)
Tabular Risk Analysis
NIST 800-30
1. Identify Threat Event
2. Identify Threat Source
3. Identify Threat Source Characteristics
4. Determine Likelihood
5. Determine Impact
6. Determine Risk
Risk Evaluation
What is the cost of this risk?
How does it relate to our acceptability of risk?
- The impact is given in a monetary cost
- Probability is based on reducing this monetary cost?
Risk Appetite
High to very high risk
Unacceptable risk – need to address risk immediately
Moderate-high risk
Lower bound of unacceptable risk – will need to address risk sooner
Moderate-low risk
Upper bound of acceptable risk – may need to address risk at some point
Little to no risk
Well within our appetite to accept risk
Risk Treatment
Impact (horizontal) and Probability (vertical)
        Low                                          High
Low     Accept                                       Share (insurance companies for example)
High    Mitigate (high probability and low impact)   Avoid
Fundamentally, we
Reduce Probability
Reduce Impact
Until we’re comfortable accepting risk
Or until we’re unwilling to spend more money on treating risk
- What has been done is context driven.
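The treatment quadrants can be written as a simple lookup. This is a sketch only: the "low"/"high" labels and the function name are assumptions, and deciding what counts as low or high depends on the context.

```python
# Keys are (probability, impact); values are the treatment from the quadrants.
TREATMENT = {
    ("low", "low"): "accept",
    ("low", "high"): "share",      # e.g. insurance
    ("high", "low"): "mitigate",
    ("high", "high"): "avoid",
}


def treat(probability: str, impact: str) -> str:
    """Look up the suggested treatment for a risk."""
    return TREATMENT[(probability, impact)]


print(treat("high", "low"))   # mitigate
```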
Risk Management Process
Risk Management Process (Disambiguation)
1. Identify context: (sub-)system and its environment
2. Identify assets
3. Identify threat scenarios
a. Identify threat agents (attackers)
b. Identify vulnerabilities
4. Assess risks
5. Treat risks
Qualitative Risk Assessment
Likelihoods: Qualitative
Rating  Likelihood      Description
1       Rare            May occur only in exceptional circumstances which may be deemed as “unlucky” or very unlikely
2       Unlikely        Could occur at some time but not expected given current controls, circumstances, and recent events
3       Possible        Might occur at some time, but just as likely as not. It may be difficult to control its occurrence due to external influences.
4       Likely          Will probably occur in some circumstance and one should not be surprised if it occurred.
5       Almost Certain  Is expected to occur in most circumstances and certainly sooner or later.
Impacts: Qualitative
Rating  Impact         Description (Broad Cybersecurity Context)
1       Insignificant  Result in minor security breach in a single area, impact less than several days, no tangible detriment to organization.
2       Minor          Result of a security breach in one or two areas, impact less than a week, can be dealt with without management intervention, may show lost opportunities.
3       Moderate       Limited systemic and possibly ongoing security breaches, impact up to 2 weeks, requires management intervention, some ongoing costs, customers may be indirectly aware.
4       Major          Ongoing systemic security breaches, impact 4-8 weeks, require significant management interventions, substantial costs, customers are aware, possible loss of business
5       Catastrophic   Major systemic security breach impact 3 months, senior management intervention, very substantial costs, significant harm to the business, loss of confidence, possible criminal action against personnel
6       Doomsday       Multiple instances of major systemic security breaches, long-term impact, major restructuring, criminal proceedings against senior management, substantial loss, liquidation likely
Qualitative classification of risk
Likelihood (columns) and Impact (rows)
               Rare  Unlikely  Possible  Likely  Almost Certain
Insignificant  L     L         L         M       H
Minor          L     M         M         H       H
Moderate       M     M         H         H       E
Major          H     H         E         E       E
Catastrophic   H     E         E         E       E
Doomsday       E     E         E         E       E
Low (L): Routine Procedures
Medium (M): Employee level, simple controls
High (H): Team leaders, regular monitoring, controls, costs
Extreme (E): Management level, substantial controls, monitoring, high costs
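The classification grid is just a lookup by likelihood and impact. A minimal sketch, with the grid typed in from the table above; the variable and function names are made up for these notes.

```python
LIKELIHOODS = ["Rare", "Unlikely", "Possible", "Likely", "Almost Certain"]

# One string per impact row; each character is the cell for one likelihood column.
MATRIX = {
    "Insignificant": "LLLMH",
    "Minor":         "LMMHH",
    "Moderate":      "MMHHE",
    "Major":         "HHEEE",
    "Catastrophic":  "HEEEE",
    "Doomsday":      "EEEEE",
}

LEVELS = {"L": "Low", "M": "Medium", "H": "High", "E": "Extreme"}


def classify(likelihood: str, impact: str) -> str:
    """Return the qualitative risk level for one likelihood/impact pair."""
    column = LIKELIHOODS.index(likelihood)
    return LEVELS[MATRIX[impact][column]]


print(classify("Possible", "Moderate"))   # High
```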
Quantitative Risk Assessment (Math)
Basic Quantitative Example
Likelihood (columns) and Impact (rows)
                         Low (< 10^-6)   Medium (10^-6 - 10^-2)   High (> 10^-2)
Low (< $10^4)            $0.01           $100                     $100
Medium ($10^4 - $10^8)   $100            $1M                      $1M
High (> $10^8)           $1M             $10B                     $100B
- Figuring out what the impact and likelihood are is what intelligence does.
Formulae
Risk = Probability * Impact
Benefit of a control = Original risk – Risk with the control
Value of a control = Benefit – Cost of the control
All of this should be expressed in the same monetary units
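A worked example of the three formulae, as a sketch. All figures are hypothetical (a 10% yearly probability, an impact of 200,000, a control costing 5,000 that lowers the probability to 2%), and this control is assumed to change only the probability, not the impact.

```python
def risk(probability: float, impact: float) -> float:
    """Risk = Probability * Impact, in monetary units."""
    return probability * impact


def control_value(p_before, p_after, impact, control_cost):
    """Value of a control = Benefit - Cost of the control."""
    original_risk = risk(p_before, impact)
    risk_with_control = risk(p_after, impact)
    benefit = original_risk - risk_with_control
    return benefit - control_cost


# Hypothetical figures, all in the same monetary unit.
value = control_value(p_before=0.10, p_after=0.02, impact=200_000, control_cost=5_000)
print(round(value))   # 11000
```

A positive value means the control saves more expected loss than it costs; a negative value means it is not worth buying on these numbers.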
Estimating Impact
Impact = Cost
Business Impact
Impact on business operations
Loss of service
Loss of sales
Future business impacts
Technical Impact
Possible damage to facilities and systems
Cost of repair
Cost of controls
Estimating Probability
Probability, likelihood, frequency: We need an estimation
Data from past events
Expert consultants
Difficult to estimate accurately
“Zero day exploits”
No data: you do not know what to expect
Expensive
Intelligence can help inform the probability estimate
Selecting Controls
Controls are a form of treatment
Controls affect each other
Not simple addition
Diminishing returns
Treatment also includes doing nothing
Accepting Risk
Doing nothing is not a control
Evaluating Controls
Benefit of the control
Benefit of the control (horizontal) and Cost of the control (vertical)
          Low   Medium   High
Low
Medium
High
BCP and IRP
Business Continuity Plan (BCP)
Plans to support key business functions after disruptive events
Focuses on business goals
Long-term, big-picture, strategic thinking
Contains:
Business impact assessment
Key business process
Critical assets
Potential causes for disruptions
Targets for business continuity – maximum tolerable downtime (MTD) for each
business process
Strategies for avoiding disruptions
Planning for continuity
Plans likely include IRPs
Incident Response Plan
Plan to respond to security violations (incidents)
Focuses on systems
Remember what “systems” include
Day to day operations
Contains:
Definition of the incident
Responsibilities
Action plans
Handling
Recovery
Other aspects:
Legal
Record keeping/evidence preserving
PR
Follow up
BCP Example – This Course
Key business processes
Study process
Course grading process
Critical assets
Physical: Lecture room, Nathan’s Laptop, Projector
Digital: Brightspace, lecture slides and videos, grade files, exam and assignment files
Management: study guide, accreditation
People: Nathan, Cesar, Jayme, Stephen, Dr. Gadyatskaya, students
Possible disruptions
Brightspace failures: course files are deleted, assignment submissions are not saved,
grading is not saved, Turnitin failures
Teacher failures: Nathan is sick, TAs are sick, teachers are too busy with other
courses
Targets
MTD 1 week for both processes
Strategies
Have Service Level Agreements (SLA) with external suppliers (D2L), have redundancy
for internally managed platform (Brightspace), have back-ups for all files and
systems, have support faculty to replace the lecturer, control teaching load
Planning
Sign SLA with D2L (university administrator)
Have redundancy for Brightspace installation (system administrator)
Set-up a backing-up process for Brightspace (system administrator)
Back-up lecture slides (Nathan)
- BCP is high level thinking trying to be sure nothing bad occurs
IRP Example – Brightspace Failure
- The question is not whether the incident can be prevented, but what to do if it occurs.
Incident
Brightspace does not work
Failure 1: Unable to connect to Brightspace server (service not available)
Consequences
Online course materials are not accessible
Students who have yet to submit homework assignments are unable to do so
Students may face grade penalty
Students may not be able to access needed resources
Responsible
Lecturer
University Administrator
Plan of action
Handling
Notify administrator (Lecturer)
Notify students via email (Lecturer)
Recovery
Extend deadlines to account for failure (Lecturer)
Contact D2L and review SLA (university administrator)
Other issues
Legal: Administrators check if SLA violation warrants legal action
Record keeping/evidence preserving: Lecturer makes a screenshot/video recording of
Brightspace not working
PR: University administrators send an email to students and faculty about the
situation, saying they are dealing with the problem
Follow-up: Teacher checks with the students during the next class; administrators
check with Lecturer within 1 week that the class is continuing. IRP is reviewed for
improvements
Final Remarks
BCP focuses on high-level, strategic views: business processes, operations. It is about
preparation (have back-ups) and evaluating alternatives.
IRP focuses on tactics and technical activities. It is about recovery (restore from
back-ups) and a precise plan of action.
It’s important to have a precise definition of the incident
Lecture 3: Web Security (part 1)
Web Basics (part 1)
Telephones
Telephones were directly connected
Exchanges (telefooncentrale)
Operators + Switchboard
You call the operator, and they directly connect you to the person you’re calling
Direct wired connection between two telephones
Call blocks the line
Can’t connect multiple (normal) phones
Long distance connections?
Connected on few wires between exchanges
Chained between different exchanges
Cost?
Numbers eventually replace operators
Automatic network routing based on number
The number you dial tells the exchange where to connect the call
Still requires direct wired connection
One connection still blocks the line
Long distance calls still expensive
1969
ARPA
Advanced Research Projects Agency
US Military Scientists
Give grants to universities to purchase computers
Those universities decide to connect them…
ARPANET
The Telephone Problem
Interlace the data being sent over wire
Wire doesn’t need to be blocked by one connection
Packets: this is where packets come from. They are little pieces of
information
ARPANET 1969
First test of packet switching
ARPANET 1970
ARPANET 1973
- Email invented in 1971
- FTP invented and still in use
ARPANET 1982
- These are nodes. Computers were smaller and more widespread in the 1980s, so
people would connect to their local node, similarly to how they connected to their
local telephone exchange.
TCP/IP
Everything controlled by one central authority
US Military
New “rival” networks being created
Especially internationally
“Exchanges” between these networks
Replaced centralized control of ARPANET with TCP/IP
Centralized control limits expansion
Now the internet would be controlled by private parties
Everyone would “speak the same language”
Transmission Control Protocol/Internet Protocol
Governs how packets are sent over the network
Many different protocols are within TCP/IP
January 1, 1983
- ARPANET was entirely controlled by the US Military, and a centrally controlled
internet is limited.
- Switch to TCP/IP on Jan 1, 1983, which is the day the internet could be said to start
existing. Now there was a connection standard that operators had to abide by.
TCP/IP
Application Layer
Actual data
Transport Layer
How information is carried between different machines
Network Layer
The actual packets being transferred
Physical Layer
The hardware doing the transferring
TCP/IP (Post Example)
TCP/IP
TCP/IP + Packets
Packets allow the same physical hardware to carry multiple users worth of data
Interlaced information
TCP/IP allows multiple network/multiple owners to all interface seamlessly
Many different protocols within TCP/IP that allow for the transmission of different
traffic
Now we have what’s needed to build the internet
Packets
Global trade has long adopted shipping containers
90% of global trade goes through some kind of shipping container
We used to just load cargo directly onto whatever vessel carried it
Why don’t we do this anymore?
Expected input
We know what size the container will be
Trucks and Trains can be built to a specific size
Data over the internet is not much different
One unit of data sent over the web
Packets can vary in size
Depending on transportation medium and type of data being sent
Different protocols within TCP have different requirements
Generally between 1KB and 64KB
Max size is 64KB
How packets are routed
We can see where packets are going around the world
Traceroute
Packet Structure
Three main components
Header
Outlines how the packet is structured
Payload
The data
The whole point of the packet existing
Footer/Trailer
Concludes the packet
Lets the transmitter/receiver know that this is the end of the packet
- IP header includes destination information
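The header / payload / trailer split can be sketched with a toy packet. This is not a real IP packet: the layout below (source address, destination address, payload length, then a CRC32 checksum as the trailer) is a simplified format invented for illustration.

```python
import struct
import zlib

# Hypothetical toy layout, not a real protocol.
HEADER = struct.Struct("!4s4sH")   # source address, destination address, payload length
TRAILER = struct.Struct("!I")      # CRC32 checksum of the payload


def build_packet(src: bytes, dst: bytes, payload: bytes) -> bytes:
    """Wrap the payload in a header and a trailer."""
    header = HEADER.pack(src, dst, len(payload))
    trailer = TRAILER.pack(zlib.crc32(payload))
    return header + payload + trailer


def parse_packet(packet: bytes):
    """Read the header first; it says how long the payload is."""
    src, dst, length = HEADER.unpack_from(packet, 0)
    payload = packet[HEADER.size:HEADER.size + length]
    (checksum,) = TRAILER.unpack_from(packet, HEADER.size + length)
    if checksum != zlib.crc32(payload):
        raise ValueError("payload was damaged in transit")
    return src, dst, payload


packet = build_packet(bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]), b"hello")
print(parse_packet(packet)[2])   # b'hello'
```

The receiver needs the header to find the payload, and uses the trailer to confirm that the packet arrived complete.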
IP Address
IPv4
- A bit is either a 1 or a 0.
32 bit addresses
2^32 possible addresses (~4.3 billion)
There are more devices than addresses
What can we do?
Sub-addressing
Apartment lettering with same number address
~70% of all internet traffic
IPv6
128 bit addresses
2^128 possible addresses
More addresses than we will ever need
We hope
Every device has a unique address
Why isn’t the internet using IPv6 exclusively?
Cheap ISPs
[Link]
- It is expensive to switch over from IPv4 to IPv6, and ISPs are cheap (they avoid the cost).
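The size of the two address spaces can be checked directly. A short sketch using Python's standard ipaddress module; the two addresses are documentation example addresses, not real hosts.

```python
import ipaddress

print(2 ** 32)    # 4294967296 (about 4.3 billion IPv4 addresses)
print(2 ** 128)   # 340282366920938463463374607431768211456

v4 = ipaddress.ip_address("192.0.2.1")      # example IPv4 address
v6 = ipaddress.ip_address("2001:db8::1")    # example IPv6 address

print(v4.version, v4.max_prefixlen)   # 4 32
print(v6.version, v6.max_prefixlen)   # 6 128
print(int(v4))                        # 3221225985, the same address as one 32-bit number
```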
Web Basics (part 2)
Web 1.0
- We start today with a new topic: web security.
- Web applications are mostly written in memory-safe languages and thus they do not
suffer from buffer overflow vulnerabilities.
- However, many underlying issues in web security are similar: untrusted, uncontrolled
data can be treated by a misguided web app as code.
Read-Only Web
First 30 years of the internet
Similar to having a book posted on the internet
You can access
You can’t change
Blogs in Web 1.0
You host your own blog
You make things available for others to download/read
No comments
No “live” posting
Web 2.0
- The modern web, in addition to those old issues, has brought some new ones: we now
have mobile code executed in the context of the browser. This increases the attack
surface.
“Interactive” Web
Post 2004-ish
What enabled the rise of social media
Blogs in Web 2.0
Someone is hosting a blogging service
You can post to the blogging service
People can post comments
Web 3.0
“Semantic” Web
Refers to making the web machine readable
Internet of Things
Allowing smart items to interface with various data available on the web
Out of scope for this class
Web Interaction
- Server often maintains a database, that is normally a separate entity logically, and
could also be located on a different machine.
- Private data on the client can be stored in the browser, or as files that browser has
access to.
Resources: web pages, interactive code, other content
URL: Uniform Resource Locator
How resources are identified
[Link] (for example; the link is the part that is interesting for your browser)
Protocol
Hostname/server: this says where on the internet this server is located
The hostname is translated into an IP address by DNS
Parameters: how information comes back to you?
- In our case the path to the resource is an HTML file called science
- Some resources are static (HTML files) and some are dynamic (like PHP files): the
server generates them dynamically based on the current database status.
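The parts of a URL can be pulled apart with Python's standard library. A sketch only: the URL below is a made-up example on a reserved example domain, not the link from the slides.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.example.org/en/science?page=2&lang=en"   # hypothetical URL
parts = urlsplit(url)

print(parts.scheme)            # https  (the protocol)
print(parts.hostname)          # www.example.org  (translated to an IP address by DNS)
print(parts.path)              # /en/science  (path to the resource)
print(parse_qs(parts.query))   # {'page': ['2'], 'lang': ['en']}  (the parameters)
```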
HTTP Request
HyperText Transfer Protocol: Application level protocol to exchange data
Request contains
The URL of the resource
Headers that describe what the browser expects
Requests can be GET or POST
- POST is used for example when filling online forms, it may cause the state of the
server to change.
- GET is used to just fetch an html page from the URL and the page may be generated
dynamically by the arguments supplied in the URL but it will not change the state of
the server itself
Example Request
- This is a GET example
- GET resource /en/science using http version 1.1
- User agent fields will tell the server what browser and OS are used. This is how a web
site may determine what version of the software to propose for download.
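A GET request is plain text. The sketch below builds a minimal one for the resource /en/science; the host name and the User-Agent value are placeholders, not the ones from the slide.

```python
def build_get_request(host: str, path: str, user_agent: str) -> str:
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",        # method, resource, protocol version
        f"Host: {host}",               # which server the request is for
        f"User-Agent: {user_agent}",   # tells the server the browser and OS
        "Accept: text/html",           # what the browser expects back
        "",                            # an empty line ends the headers
        "",
    ]
    return "\r\n".join(lines)


request = build_get_request("www.example.org", "/en/science", "ExampleBrowser/1.0")
print(request)
```

A GET request like this has no body; everything the server needs is in the request line and the headers.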
Example Request 2
- This is a POST example
- I clicked on “reject cookies” and this generated a POST request. Here content of the
request is included in the URL, but it could also be included as part of the data of the
request.
- Google analytics and other hidden trackers generate post requests too.
HTTP Response
Response contains
Status code
Headers
Data
Cookies
Example Response
- 200 is a status code and OK is a reason phrase.
- Other status code classes start at 100 (informational, like “continue”), 300
(redirection), 400 (client error: the client sent a bad request) and 500 (server
error, a sought-after status when hacking servers)
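The first digit of the status code gives its class. A small sketch that reads a status line and names the class; the function and dictionary names are made up for these notes.

```python
from http import HTTPStatus

CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(status_line: str) -> str:
    """Split a status line into version, code and reason phrase."""
    version, code, reason = status_line.split(" ", 2)
    return f"{code} {reason}: {CLASSES[int(code) // 100]}"


print(describe("HTTP/1.1 200 OK"))   # 200 OK: success
print(HTTPStatus(404).phrase)        # Not Found
```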
Email Basics
- Email is different: sending email pushes information, while retrieving email pulls it.
- A little different from the web, since you are pushing information.
Email is push based where web is pull based
You push information to another server
You request web information be sent to you
POP/IMAP
Post Office Protocol
Internet Message Access Protocol
Protocols to retrieve email
Web Security
Dolev-Yao Attacker Model
Attacker owns the network: you are transferring data across a hostile network
Attacker can do everything except break encryption
We will discuss encryption later
Assume that anything unencrypted can be read
More on this model later
Secure Email Requirements
Message Confidentiality
Only sender and receiver can read a message
Sender Authenticity
Message comes from address it claims to be from
Message Integrity
Message has not been modified in transit
Non-Repudiation
Sender can verify message received
Receiver can confirm message was sent
Malware + Spam Protection
PGP
Pretty Good Privacy
Public-key cryptography with email
Secure Email Infrastructure
HTTPS
Encrypted Payloads
Instead of sending plaintext over web, we encrypt the payload first and then send the
encrypted payload
Decrypted by client
HTTPS is to HTTP what PGP is to SMTP
- HTTPS encrypts the payload
- Problem: Need a different kind of secure protocol for every type of payload
TCP/IP
- Protocols are hard to update, which poses problems for protecting data. Instead, we
wrap the data to make it secure with TLS, making a more secure transport layer.
TLS
Transport Layer Security
Encrypt the payload irrespective of what the payload is
TLS Handshake
How we establish a secure connection with a server
Requirements:
Establish identity of participants
Establish shared encryption key
Verify information
Fast
Done in 4 messages
Lecture 4: Web Security part 2
Exponent Rules (math!)
TLS + Encryption
We will revisit TLS after we address encryption
Last year encryption was much earlier
Encryption includes A LOT of math
Students found this very difficult
Most common feedback was to spread out the math
Allow people to get comfortable with the more basic math before moving on
This year I implemented that feedback
We did modulus in Lecture 1
We are doing exponent rules today
Get comfortable with this math so that the encryption math isn’t as daunting
TLS requires DHKE
DHKE requires discussing discrete logarithms
Discrete logarithms require discussing asymmetric encryption
Please make sure you understand the mathematical topics discussed
They will be necessary when we get to encryption in a few weeks!
Terminology
Multiplying values with same base
a^b * a^c = a^(b+c)
- Add b and c together.
Dividing values with same base
a^b / a^c = a^(b-c)
- Subtract c from b.
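The two same-base rules can be checked with small numbers. A sketch with arbitrarily chosen values (a = 3, b = 7, c = 4); the last check combines the multiplication rule with modulus, where the modulus n = 11 is also an arbitrary choice.

```python
a, b, c = 3, 7, 4   # arbitrary base and exponents for illustration

# Multiplying values with the same base: add the exponents.
assert a**b * a**c == a**(b + c)

# Dividing values with the same base: subtract the exponents.
assert a**b // a**c == a**(b - c)

# The multiplication rule still holds when every result is taken mod n.
n = 11
assert pow(a, b + c, n) == (pow(a, b, n) * pow(a, c, n)) % n

print(a**(b + c), a**(b - c))   # 177147 27
```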
Web Security
Certificates
How do we know people are who they say they are on the internet?
What’s stopping a malicious actor from pretending to be [Link]?
How does your computer know if a website is not official? Certificates
Certificates
Checks certification against other certificates
Keeps going up to the root certificate
Installed on browser
Certificates are used to sign certificates
Root certificates = highest level of trust
Chain of Trust
Each certificate authority (CA) verifies lower level CAs or individual websites
If a website is found to be using the certificate maliciously, it can be revoked by the
CA.
If a CA is behaving maliciously, its own CA can revoke its certificate.
This may affect legitimate websites that have certificates issued by an illegitimate CA.
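Walking the chain of trust can be sketched as following "issued by" links up to a trusted root. This toy model uses made-up names and does no real signature checking; it only shows the walk and the effect of revocation.

```python
# Hypothetical certificates: each name maps to the name of its issuer.
ISSUED_BY = {
    "www.example.org": "Example Intermediate CA",
    "Example Intermediate CA": "Example Root CA",
    "Example Root CA": "Example Root CA",   # root certificates sign themselves
}
TRUSTED_ROOTS = {"Example Root CA"}         # installed with the browser


def is_trusted(name, revoked=frozenset()):
    """Follow the issuers upward until a trusted root or a problem is found."""
    seen = set()
    while name not in seen:
        if name in revoked or name not in ISSUED_BY:
            return False        # revoked or unknown certificate breaks the chain
        if name in TRUSTED_ROOTS:
            return True         # reached a root certificate we trust
        seen.add(name)
        name = ISSUED_BY[name]  # move one level up the chain
    return False                # a loop that never reaches a trusted root


print(is_trusted("www.example.org"))                                       # True
print(is_trusted("www.example.org", revoked={"Example Intermediate CA"}))  # False
```

The second call shows why revoking a CA also affects every website below it in the chain.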
Protecting Packets
- Last week we discussed protecting packets by using a secure protocol within HTTP and by using TLS
VPN
Payload might be encrypted (potentially doubly with TLS), but where the packet is
going is not
Everyone can see WHO you’re communicating with, but not WHAT
How do we stop this?
- There will be a small reduction in speed while using VPN.
Hides who you’re communicating with
Need to validate with VPN
Overhead before connection
Need to ensure packets get to the right place
May require overhead with payload
5-10% reduction in speed using a VPN
TOR
The Onion Router
Passes data through a bunch of different computers
Nodes
- Popular option for dissidents
Very Slow
No node has complete information
- Each node can be separately encrypted, which is a benefit of using TOR; this is why
TOR is so secure.
- A node can also send regular traffic.
- Being a node means you receive traffic and send it on.
Cookies and Coffee (JavaScript)
Static Website
Web 1.0
What’s the difference between sending static HTML and a picture?
Effectively nothing
Why was flash such a big deal?
Flash was basically just a moving, interactive picture
- Having a book on the internet (read only).
Dynamic Websites
- This is why we don’t use flash anymore (also because of security issues)
- Dynamic codes
Web 2.0
Now interactive code can be sent over the web
This is usually JavaScript
This code is a program that will be run by the client
JavaScript
JavaScript programs are executed in the browser
They can:
Alter web page contents (by modifying the DOM, the data structure that represents
the web page)
Track user-side events (clicks, mouse movements, etc.)
Issue web requests and read replies
Access cookies and other browser data
Maintain persistent but invisible connections (AJAX)
Web-based State
We want to have an HTTP session:
Client connects to the server
Client issues a request
Server responds
Client issues a request for something in the response
…
Client disconnects
HTTP is stateless
There is no well-defined way to identify “the same client”
JavaScript could maintain state
Would have to be really intrusive
Maintaining State
DB maintains a long-lived state
Web application maintains ephemeral state transferred back and forth
Intermediate results aren’t long-lived
The ephemeral state is what’s shared with the client
Network Activity
To explain the next couple of concepts, we’re going to play them out
I’m the server with a website
A volunteer will be the computer requesting the website
Everyone else will be the network, carrying packets back and forth
Maintaining State (Network Activity)
1. I get a request for my website, so I send my website to them
a. This takes quite a lot of sent information
b. Computers need to process that site
i. Assemble the puzzle
2. Now the computer makes a modification to the site
a. What are the different states?
3. Now the computer closes the web browser
a. This clears any sites stored on the computer
b. What are the different states?
Cookies part 2
Cookies
Server maintains state
State is indexed by cookies
Cookies are exchanged with clients
Clients store them locally and will send cookies back on each interaction
Cookies are part of HTTP headers
Sometimes also included in web page text
Cookies are usually key-value pairs
Cookies are like dictionaries in Python
We will get there in 2 weeks
Cookies are the key
The state is the value
- Cookies are just strings of text that tell the server which state a request belongs to.
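The dictionary analogy can be sketched in Python. This is only an illustration, not how any particular web server works: `sessions`, `new_session`, and `handle_request` are hypothetical names, and the state is a toy shopping cart.

```python
import secrets

# Server-side dictionary: cookie (key) -> state (value).
sessions = {}

def new_session():
    """Create an empty state and return the cookie that indexes it."""
    cookie = secrets.token_hex(16)  # random label, hard to guess
    sessions[cookie] = {"cart": []}
    return cookie

def handle_request(cookie, item):
    """The client sends only the cookie; the server looks up the state."""
    state = sessions.get(cookie)
    if state is None:
        raise KeyError("unknown cookie: no state stored for this client")
    state["cart"].append(item)
    return state

cookie = new_session()
handle_request(cookie, "book")
print(handle_request(cookie, "pen"))
```

The client never sends the cart itself, only the short label that lets the server find it.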
Cookie-less (Network Activity)
Life without Cookies
Slow and inefficient: every time the website needs to be updated, the entire state
has to be sent back, which requires a lot of data
I need to get the full state from the user for every interaction
A lot of data needs to get sent
I also need to keep separate copies of the LL-state for every user
This will require me to have a LOT of data storage
Cookies (Network Activity)
- Cookies save a lot of network traffic, since the cookie tells the server which website
state to refer to (a cookie is just a label).
Cookies simplify web traffic
Don’t need to send entire ephemeral state
Can just send a marker about what the ephemeral state is
This marker lets both the server and the computer know what website state they are
talking about
The server contains a dictionary of all states with associated cookies
The computer knows what cookie is associated with the state it has (ephemeral
state)
Example Request
- GET resource /en/science using http version 1.1
- User agent fields will tell the server what browser and OS are used. This is how a
website may determine what version of the software to propose to download.
Cookie-less Authorization (Network Activity)
Cookies for authorization
When the user logs in, I give them an authorization cookie
Now the user can send me that cookie instead of sending their username and
password every time.
This authorization cookie is usually not sufficient on its own
Authorization cookie has to come from a specific address
Often has associated metadata that the server will use to verify this cookie
For example, the authorization cookie might be tied to the type of web browser
sending the request (among other things)
Often regularly re-issued
Example Request
- This is the authorization cookie
Cookies
Cookies are used to
Keep the website state as a session identifier
Personalize web pages
Track users
Ad networks keep their own third-party cookies and can link users across
different websites
Instead of needing to send updated, personalized website to everyone
If full website is 500MB, then we need a separate 500 MB for every user
1 million users = 500TB of server storage space
Cookie can keep track of the small changes for an individual user
Full website could still be 500MB
3GB to store our cookie dictionary
Each set of cookies is only 5KB
In reality less – I used this number for the example
The “_ga” cookie in the example request is <55 Bytes
1 million users means < 10GB of storage space
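The storage comparison can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 KB = 1000 bytes) and the example sizes from the slide; the variable names are made up for illustration.

```python
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12
users = 1_000_000

# One full personalized copy of a 500 MB website per user.
full_copies = 500 * MB * users

# One 5 KB set of cookies per user (the slide's deliberately generous figure).
cookie_sets = 5 * KB * users

print(full_copies / TB, "TB for per-user copies")
print(cookie_sets / GB, "GB for per-user cookies")
```

Under these assumptions the cookie approach needs a few gigabytes where per-user copies need hundreds of terabytes.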
Cookies are everywhere
Check the cookies in your web browser right now
There are at least a dozen for every site
Different Types of Cookies
Session/Functional cookies (These cookies are always allowed)
These maintain state
Allow websites to “remember” what we’re doing when interacting with a single site
Remember a shopping cart while shopping
Authorization cookies
Persistent/Analytical cookies (GDPR requires sites to ask for permission to issue
these cookies)
Analyze how users use your site
Remember user preferences for their next visit
Language settings, username, etc.
Advertising/Third-Party/Tracking cookies (GDPR requires sites to ask for
permission to issue these cookies)
Cookies installed by other parties
Web Attacks
Attacks on Cookies
Cookies are used to keep track of authenticated users
During the first visit the user will authenticate normally (e.g. via password)
Upon authentication, the server will issue a cookie
Subsequently, the client will show this cookie as proof of authentication
A stolen cookie can hijack a session
Someone doesn’t need a username and password
They need an authentication cookie
Holder of a session cookie can access the site with the privileges of an authenticated
user
Session hijacking consequences:
Perform actions as if done by the user
Possibly stealing/corrupting sensitive data
Where cookies can be stolen
- Cookies are a vulnerability, yet we need them.
Cross Site Request Forgery
Security Issues: Javascript
- The entire internet runs on this.
Browsers need to confine JavaScript programs
A script from [Link] should not be able to
Modify [Link] webpage or access its cookies
Read the user's keystrokes
Cross Site Scripting Attacks (XSS)
Scenario
[Link] sends a malicious script
Tricks the browser into believing it came from [Link]
Profit!
Same Origin Policy
Browsers apply Same Origin Policy to confine JavaScript programs
All web page elements (layout, cookies, events, scripts) are associated with an origin
Origin is defined by the domain name ([Link]) that provided the web page (also sent in
the cookie)
Scripts can only access elements with the same origin as themselves
- Everything Javascript does stays in (…)
Network-based Attacks
Compromise the server or the client machine/browser
Predict cookies based on known information
Eavesdrop on the network communication
Redirect network traffic/mislead the client
Dolev-Yao Attacker Model
Preventing Cookie Theft
Make cookies unpredictable
Long and random
Use encrypted communication (HTTPS)
Websites can mark cookies as “secure”
“Secure” cookies will only be sent over HTTPS
Server side defenses
Set sensitive cookies to be very short-lived
How long do you stay logged into a bank vs your email?
Invalidate the cookie when a user logs-out
Delete the cookie when the session ends
Make cookies different from session to session
Collect separate information and check whether the present interaction is a valid
and legitimate request, based on past activity
Monitoring fields
Checking patterns
Checking other data
New login location?
- Cookies are also attached to packets.
- Cookies are part of the HTTP layer.
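Several of the defenses above (long random value, "secure" flag, short lifetime) can be sketched with Python's standard library. The cookie name `session`, the helper `make_session_cookie`, and the 15-minute lifetime are assumptions for illustration; the `HttpOnly` flag is an extra defense not mentioned on the slides (it hides the cookie from JavaScript).

```python
import secrets
from http.cookies import SimpleCookie

def make_session_cookie(max_age_seconds=900):
    """Build a Set-Cookie header for a short-lived, unpredictable session cookie."""
    cookie = SimpleCookie()
    cookie["session"] = secrets.token_urlsafe(32)  # long and random
    cookie["session"]["secure"] = True             # only sent over HTTPS
    cookie["session"]["httponly"] = True           # not readable from JavaScript
    cookie["session"]["max-age"] = max_age_seconds # short-lived
    return cookie

cookie = make_session_cookie()
print(cookie.output())
```

A fresh call produces a different value each time, which also covers "make cookies different from session to session".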
SQL: Structured Query Language
- Language of databases
- Language of getting access to databases
Server-side Code
Server-side code will interact with the DB using SQL queries
Javascript used by website to take input
This passes an SQL command to DB
SQL Injection
Will pass through the following SQL command:
SELECT * FROM Users WHERE (email='e@m' OR 1=1);
Get all users
Potentially log in with the first user
There’s not necessarily printing of the “result”, so there’s no way to see this
information from the login screen
It might show up in the html of the website
Depends on how the other code on the site works
Will pass through the following SQL command:
SELECT * FROM Users WHERE (email='e@m' OR 1=1);
DROP TABLE Users;
Delete all Users
Still no printing, so there’s no real way to know this works (from here)
SQL Injection: Application?
SQL Injection: Webcomic
SQL Injection: Context
SQL Injection: Core Issue
Fundamental issue
Mixing of code and data
Data = database
Code = the part of the website that does anything
Similar to buffer overflow
Corrupt data by overflowing a buffer
SQL Injection: Fixing
Input validation
Do not trust that input is valid
VERIFY!
Same as in buffer overflows
Make sure that input is data – NOT CODE
Check it and reject invalid input
Sanitize it by making it conform to safe input
Sanitization
Blacklisting
Delete illegal characters
' : ; -
Escaping
Replacing problematic characters with safe ones
' becomes \'
; becomes \;
Prepared Statements
Treat incoming values as data – regardless of whether or not they contain code
Prevents hijacking
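The difference between pasting input into the query and using a prepared statement can be tried out with Python's built-in `sqlite3` module. This is a sketch on an in-memory toy database; the table contents and the injected string are made up, and a real site would use its own database driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?)",
    [("alice@example.com", "Alice"), ("bob@example.com", "Bob")],
)

# What the attacker types into the email field.
user_input = "e@m' OR 1=1) --"

# Vulnerable: the input becomes part of the SQL code.
unsafe_sql = "SELECT * FROM Users WHERE (email='" + user_input + "')"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Prepared statement: the ? placeholder keeps the input as data only.
safe_rows = conn.execute(
    "SELECT * FROM Users WHERE (email = ?)", (user_input,)
).fetchall()

print(len(unsafe_rows), "rows leaked by the concatenated query")
print(len(safe_rows), "rows returned by the prepared statement")
```

With the placeholder, the database looks for a user whose email is literally the attacker's string and finds none.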
Limiting impact
Goes back to risk management
Input validation is a must
Mitigate the effects of an attack
Defence in depth
“Delay not prevent”
Limit server privileges on a database
Limit what commands can be run/what access the interface has
Encrypt sensitive data
Decrypt at the server side if needed
Lecture 5: Authentication and Access Control
- Cookies are strings.
- GDPR does not ban cookies
- GDPR requires user permission for non-functional cookies such as tracking cookies.
Logarithms
Exponents to a power (from last week)
Logarithms
Logarithm multiplication
Logarithm division
Logarithm exponents
Logarithm Identities
Logarithms more generally
b should never be 1 or 0
On the exam, you’re specifically not allowed a calculator with a “log” button
I will not ask you to calculate logarithms on the exam
You do have to calculate the logs on this homework assignment though
However, I may ask you to manipulate a logarithm
Apply some of the rules on the previous slides in order to get a logarithm into a form
that’s applicable to a certain scenario
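The formulas from the logarithm slides did not survive in these notes. The standard identities that the slide titles refer to are presumably the following (for a base b > 0 with b ≠ 1, and x, y > 0); treat this as a reconstruction from the titles, not a copy of the slides.

```latex
\begin{align*}
\log_b(x) = y \quad &\Longleftrightarrow \quad b^{y} = x \\
\log_b(x \cdot y) &= \log_b(x) + \log_b(y) \\
\log_b(x / y) &= \log_b(x) - \log_b(y) \\
\log_b(x^{k}) &= k \cdot \log_b(x) \\
\log_b(x) &= \frac{\log_c(x)}{\log_c(b)}
\end{align*}
```

For example, the last two rules give 96^8 = 2^(8 · log_2 96), which is how a count of passwords is turned into a power of 2.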
Authentication
Terminology
Authentication ensures user identity
Determination of identity
Authorization makes sure an identity can access resources (has privileges)
Determination of access
Authentication
You go to a bank to withdraw money
People used to actually do this
How does the bank know you are who you say you are?
Appearance?
Identity card?
PIN?
Small town bank
Is this secure?
Is authentication being kept up?
- There is a difference between person, user and identity.
Users are not identities.
Impersonation
How physical banking is supposed to work:
Identity Theft
How online banking is supposed to work:
- Identity theft is seen as the fault of the victim; banks pushed this idea. Once banking
moved to the online space, impersonation was no longer treated as the bank's problem.
"Identity theft" = propaganda term
(this is debatable)
Identity Theft is propaganda
Passwords
Most common authentication scheme in cyberspace
This is ubiquitous
I could do an entire lecture on passwords
If you are interested, Anderson Textbook pages 31 – 60
Secret code that only you know, and if you turn over this code, systems can
determine identity
As only one identity knows this code
If this code gets stolen?
“Choose a password you can’t remember, and don’t write it down”
Password Criteria
1. Will the user enter the password correctly with a high enough probability?
Will the user make a lot of mistakes?
Don’t want a legitimate user locked out of their account
How can the user check their password before submitting?
2. Will the user remember the password, or will they have to either write it down or
choose one that’s easy for the attacker to guess?
Will the user create a security vulnerability because our authentication criteria are too
poorly designed?
Is there semantic meaning behind the password, such as some word or sentence that
makes it easier to remember?
Is this meaning easy for a third party to find out?
3. Will the user break the system security by disclosing the password to a third party,
whether accidentally, on purpose, or as a result of deception?
If a password is written down, then that object becomes a vulnerability
If the scheme is too obvious, a third party can guess
We want a password that a third party cannot brute force.
PIN
Personal Identification Number
PIN Number = Personal Identification Number Number
Often used to confirm identity quickly
PIN to access an app on your phone
PIN to use an ATM
PIN to use payment terminal
Usually 4 - 6 digits
How many possible combinations?
10^4 - 10^6
This is not a lot
Are PINs secure?
Yes
Require different factors (more on this later)
Limit attempts
Cracking Passwords
Total Exhaustion Time
Amount of time needed to test all possible passwords
= (number of possible passwords given the schema) / (how quickly passwords can be tried)
Only if it is allowed to try all passwords
It often isn’t, which makes passwords more secure
PIN Exhaustion Time hard to calculate
10,000 possible 4 digit pins
Banks usually only allow 3-4 tries before locking
More than 10 tries usually results in locking account and needing a human employee
to unlock account
iPhone quadratically increases the wait time after failed attempts, which makes it really secure
Assume you can try all passwords
Now it’s a function of all possible passwords given a schema
Schema: 8 characters, any character allowed
96 possible characters
96^8 possible passwords ≈ 2^53 ≈ 10^16
7213895789838336 possible passwords
Schema: 8 characters, at least one lowercase, one uppercase, and one number
96 possible characters for 5 of the characters
10 possible digits for the number
26 possible characters each for the lowercase and the uppercase letter
96^5 * 10 * 26^2 ≈ 2^46 ≈ 10^14
55119194357760 possible passwords
This is less secure
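The counts above, and the total exhaustion time, can be reproduced in Python. Two assumptions are labeled in the code: the 96-character alphabet is split as 26 lowercase + 26 uppercase + 10 digits + 34 others, and the guess rate is a made-up figure. The slide's estimate fixes which positions hold the required characters; the sketch also computes the count with the required characters allowed anywhere (inclusion-exclusion), which is larger but still below the unrestricted total.

```python
import math

ALPHABET = 96   # assumed split: 26 lower + 26 upper + 10 digits + 34 others
LENGTH = 8
LOWER, UPPER, DIGIT = 26, 26, 10

total = ALPHABET ** LENGTH                        # any character anywhere
lecture_estimate = ALPHABET ** 5 * 10 * 26 ** 2   # the slide's simplified count

def count_without(excluded):
    """Passwords that avoid `excluded` of the 96 characters entirely."""
    return (ALPHABET - excluded) ** LENGTH

# Inclusion-exclusion: at least one lowercase, one uppercase, one digit, anywhere.
exact = (
    total
    - count_without(LOWER) - count_without(UPPER) - count_without(DIGIT)
    + count_without(LOWER + UPPER) + count_without(LOWER + DIGIT)
    + count_without(UPPER + DIGIT)
    - count_without(LOWER + UPPER + DIGIT)
)

def exhaustion_seconds(possible_passwords, guesses_per_second):
    """Total exhaustion time = possible passwords / guess rate."""
    return possible_passwords / guesses_per_second

RATE = 10**9  # hypothetical attacker speed, guesses per second
for name, n in [("any 8 chars", total), ("slide estimate", lecture_estimate),
                ("exact with rules", exact)]:
    print(name, n, "about 2^%.1f" % math.log2(n),
          "exhausted in %.0f s" % exhaustion_seconds(n, RATE))
```

Either way of counting supports the slide's point: adding composition rules shrinks the space of allowed passwords.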
Password Policies
If requiring specific characters is less secure, then why do systems require this?
Because humans are dumb, and humans will never use a completely random
password
Most humans would rather have an 8 character word
Schema: 8 character words
~80,000 words in English with 8 characters
80000 possible passwords ≈ 2^16 ≈ 10^4.9
Schema: 8 - 10 character words
80k + 41k + 35k words in English with 8, 9 or 10 letters
156000 passwords ≈ 2^17 ≈ 10^5.2
Double the possible passwords!
At least twice as secure
Still significantly less than requiring at least one number
Requiring users to change passwords at regular intervals (required by some common
password policies)
This is horrible policy
Users do not want to remember new passwords, they want to use their old
passwords
Users just repeat passwords with extra characters on the end
01 for january, 02 for february, ...
Specific password length requirements
Requiring passwords to be a specific length makes them so much easier to crack
- A minimum password length is a good policy (the math above does not argue against it).
However, the attacker's first step is to try passwords of exactly the minimum length
Multi-Factor Authentication
Instead of solely relying on passwords
Which can be badly secured
Authentication (ensuring identity) based on multiple components
Factors:
Knowledge
Possession
Inherence
Location
Behavior
MFA: Knowledge
“Something you know”
Determination of identity based on something only you would know
Examples:
Password
Username
PIN
Mother’s Maiden Name
Location of your high school
- Security questions could be a good practice; however, minor spelling mistakes could
make it worse.
MFA: Possession
“Something you have”
Determination of identity based on an item that only you could have
Examples:
USB Key
ID Card
Smartphone
- MFA is nebulous, there is never a good answer whether it is knowledge or
possession.
MFA: Inherence
“Something you are”
Determination of identity based on characteristics that are unique to you
Examples:
Fingerprint
Any kind of biometric authentication system
MFA: Location
“Someplace you are”
Determining authentication based on where you purport to be
Examples:
Network-specific access
GPS systems
- You could argue that a GPS security key is possession or location (more general
product to prove location which anyone can possess but only tests the location), it is
context dependent.
MFA: Behavior
“The way you are”
Determination of identity based on how you are acting (you could argue that this is
inherence)
Examples:
Hidden CAPTCHAs (Completely Automated Public Turing test to tell Computers and
Humans Apart) that use mouse movements to determine if you are human
Multi MFA
GPS-enabled security key
Can be both location and possession
Dependent factors: one factor is dependent on the other
Can’t prove location without the item
Overall, they function like MFA, but if one factor is missing then it doesn’t work
Losing your phone is a nightmare when it comes to DigID
Multi-Channel Authentication
Authentication based on the same factor is NOT MFA
Ultimately the same factor is vulnerable to the same types of attacks
This does increase security, but not as much as MFA
Multi-Channel Authentication
Asking for authentication across two mediums
Password and then a phone confirmation
Can have Multi Channel, Multi-Factor authentication
MFA is usually Multi-Channel
But not always
MFA in practice (DigID)
0. Set up DigID
Letter sent to your home
Possession dependent on Location
1. Enter PIN on your phone
Dependent Factors
Knowledge dependent on possession
2. Enter code from phone into site
Not a factor
Sets up next step, but this step does not do anything to authenticate you
3. Scan site generated QR Code (generated based on provided code)
Possession
4. Confirm on phone
Possession
This is MFA, but it’s heavily bent toward possession.
This is also multi channel.
It’s not as secure as most people think it is.
MFA in practice (DB)
1. Login to website
Knowledge
2. Enter Payment information
Possession (can argue Knowledge)
3. They mail me a letter
Not a factor, this sets up the next step
4. That letter contains a confirmation code that I enter
Possession (can argue possession dependent on location)
Password Managers
- Unique password for every website generated by the password manager.
Password Managers
Store all passwords in one place
Gives you a strong incentive to use unique passwords for all accounts
Eliminates weak points
Counterpoint: “Doesn’t that just make the password manager the weak point?”
Password managers are orders of magnitude more secure than a weakly secured
website
Stronger weak point
Obviously memorizing a unique password for every site would be more secure than a
PM
This is also impossible
Almost all PMs have MFA
Kind of like adding MFA to all accounts
Operating System Access Control
Operating System Access Control
r – read access
w – write access
x – execution access
Access Control Lists (ACL)
- An ACL (access control list) defines access rights per resource (file, directory)
- It is simple to implement; harder to check access rights at runtime
Advantages:
Quick lookup per item
Changing access per item
Changing file access en masse
Issues:
Adding users
Changing user permission en masse
Assessing user access
Capabilities List
- A capability defines access rights per user
- Runtime checks are easier, delegation is easy to implement; harder to modify access
rights per file.
Advantages:
Assessing user access
Changing user permissions
Adding/Removing users
Issues:
Slower to lookup individual permissions per file
Cannot change file permissions en masse
Access Control Matrix
- Good for theoretical structure
- This isn’t what’s used - it’d be a ridiculous structure to maintain and hold in memory
- If one user has access to something, they could also have access to something else.
Advantages:
Great theoretical structure
Issues:
WAY too big to be useful in any kind of machine
Jack of all trades, master of none
Though in this case, practically useless
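The relationship between the three structures can be sketched with nested dictionaries: the matrix holds everything, a capability list is one row (per user), and an ACL is one column (per resource). The users, files, and function names below are invented for illustration.

```python
# Access control matrix: user -> resource -> set of rights (r, w, x).
matrix = {
    "alice": {"notes.txt": {"r", "w"}, "run.sh": {"r", "x"}},
    "bob": {"notes.txt": {"r"}},
}

def capability_list(user):
    """One row of the matrix: everything a single user may do."""
    return matrix.get(user, {})

def acl(resource):
    """One column of the matrix: who may do what to a single resource."""
    return {
        user: rights[resource]
        for user, rights in matrix.items()
        if resource in rights
    }

def allowed(user, resource, right):
    """Access check against the full matrix."""
    return right in matrix.get(user, {}).get(resource, set())

print(acl("notes.txt"))
print(capability_list("bob"))
print(allowed("bob", "notes.txt", "w"))
```

In a real system most cells of the matrix would be empty, which is one reason only the rows or the columns are stored in practice.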
Access Control Models
- Your own computer is DAC, as it is not a university-controlled system.
Discretionary Access Control (DAC)
The owner of a resource (typically the system administrator) controls who has access
Limited access given to subsequent accounts
Typically implemented with ACLs
Issues:
Assumes every item has an owner
Mistakes
Example - your own computer
Mandatory Access Control (MAC)
System-wide policy determines who can access a resource based on a sensitivity label
Each user and item has a sensitivity label, access is allowed if that sensitivity matches
Issues:
Difficult to implement
Rigid AF
Limited ability for different types of access
Example:
Military systems (security clearance = sensitivity level)
Role Based Access Control (RBAC)
Roles have access to objects
Users can have multiple roles
Access control is maintained per role, not per user
VERY common scheme in distributed systems
Issues:
Difficult to implement (but way easier than MAC)
Less variability between user access (again, way easier than MAC)
Example: Brightspace
The lecturer has more access than students, based on their role within Brightspace.
Role Hierarchy
Inherit role access from a “lesser” role
Hierarchy example: staff < medical staff < nurse < doctor < dean of medicine; visitors
< patients
staff can go to the staff canteen; visitors and patients can not
medical staff can read patient files; staff and visitors can not; patients can only read
their own files
doctors can take decisions about treatments; nurses can not
dean of medicine can hire doctors
- I have access to the student role - why is that useful?
- Permissions add up along the chain: a higher role also gets the permissions of the roles below it.
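A role hierarchy can be sketched as a chain of roles, where each role inherits everything granted to the role below it. The sketch follows the staff chain from the hospital example; the user names, the dictionaries, and the function names are hypothetical.

```python
# Each role points to the "lesser" role it inherits from.
INHERITS = {
    "medical staff": "staff",
    "nurse": "medical staff",
    "doctor": "nurse",
    "dean of medicine": "doctor",
}

# Permissions granted directly to a role.
PERMISSIONS = {
    "staff": {"use staff canteen"},
    "medical staff": {"read patient files"},
    "doctor": {"decide treatment"},
    "dean of medicine": {"hire doctors"},
}

# Users get roles, never permissions directly.
USER_ROLES = {"dana": {"doctor"}, "nico": {"nurse"}}

def role_permissions(role):
    """Collect the role's own permissions plus everything inherited."""
    perms = set()
    while role is not None:
        perms |= PERMISSIONS.get(role, set())
        role = INHERITS.get(role)
    return perms

def can(user, permission):
    """Access is decided per role, not per user."""
    return any(permission in role_permissions(r)
               for r in USER_ROLES.get(user, ()))

print(can("dana", "read patient files"))
print(can("nico", "decide treatment"))
```

Adding a new doctor only requires one entry in `USER_ROLES`; no permissions need to be copied.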
Role Constraints
Used to limit abuse / insider attacks
Examples:
A user may not simultaneously have roles r1 and r2 (ever or during a session)
A user in role r1 must also have role r2
At most/at least k users must have role r1
Four eyes principle; two signatures principle
Concrete Examples:
At least 2 professors evaluate a bachelor thesis
A student cannot be a student assistant in a course they take.
Attribute-based Access control (ABAC)
Attributes are properties of entities in the system (users, resources, environment,
etc.)
User attributes (u): role, group, age, geo-location, authentication factors, etc.
Resource attributes (r): security level, type (file type/program type), size, age, etc.
Environment attributes (e): time, system load, security level/alert level, etc.
Access control decisions (user u can access resource r in environment e) are made
based on the present set of attributes of <u, r, e>
Examples:
students can submit exam solutions only during the allotted exam time
if system load is high, access will be granted only to users who
solved CAPTCHAs.
- Not mutually exclusive
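The two ABAC examples can be written as small policy functions over the attributes of u, r, and e. The attribute names (`role`, `type`, `exam_start`, `system_load`, `solved_captcha`) and the load threshold are assumptions made for this sketch.

```python
from datetime import datetime

def can_submit_exam(user, resource, env):
    """Students can submit exam solutions only during the allotted exam time."""
    return (
        user["role"] == "student"
        and resource["type"] == "exam submission"
        and env["exam_start"] <= env["now"] <= env["exam_end"]
    )

def can_access_under_load(user, env, load_threshold=0.8):
    """If system load is high, only users who solved a CAPTCHA get access."""
    if env["system_load"] < load_threshold:
        return True
    return user.get("solved_captcha", False)

student = {"role": "student", "solved_captcha": False}
submission = {"type": "exam submission"}
env = {
    "now": datetime(2024, 1, 15, 10, 30),
    "exam_start": datetime(2024, 1, 15, 9, 0),
    "exam_end": datetime(2024, 1, 15, 12, 0),
    "system_load": 0.95,
}
print(can_submit_exam(student, submission, env))
print(can_access_under_load(student, env))
```

Note that a role is just one user attribute here, which is why ABAC and RBAC are not mutually exclusive.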
Sandbox, Virtual Machines, and Containers (oh my)
All are environments
Sandbox
Allows code to be run in a restricted environment
Virtual Machines
VMs are entire virtual computers run inside other computers
Effectively a full computer inside another computer
Containers
Newer (only appearing in the 2010s)
Can be thought of as halfway between a Sandbox and a VM
Consists of a limited VM with only the components needed for the purpose of the
container
MLS
Multi-Level Security (MLS)
Self descriptive
A system with different levels of security
Each level of security can have different security design
MLS-ception
Previous Access Control Slides were primarily about security for a single system
MLS is about distributed systems
Not necessarily exclusive with the previous slides
- Having more strict control between different levels of security.
- You can have MLS inside MLS; this happens with governments, for example.
Bell LaPadula Model
- Controlling confidentiality. Confidentiality: users should only access the
information they are allowed to access.
- With this model, they can use the information at (or below) their security level.
Built for the US military
Rules:
No read up
No write down
Focused on confidentiality
- Writing is different than reading
- Reading is about confidentiality, you can read down, but not read up
- Writing is the opposite direction, you cannot write down but you can write up
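The two Bell-LaPadula rules reduce to comparing levels. A minimal sketch, assuming four example levels and made-up function names:

```python
# Higher number = more sensitive. The level names are only examples.
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2, "top secret": 3}

def blp_can_read(subject_level, object_level):
    """No read up: a subject may read at its own level or below."""
    return LEVELS[subject_level] >= LEVELS[object_level]

def blp_can_write(subject_level, object_level):
    """No write down: a subject may write at its own level or above."""
    return LEVELS[subject_level] <= LEVELS[object_level]

print(blp_can_read("secret", "top secret"))     # read up
print(blp_can_write("secret", "confidential"))  # write down
print(blp_can_write("secret", "top secret"))    # write up
```

Together the rules ensure information can only flow upward, so secrets cannot leak to lower levels.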
In-Class Discussion
Bell-LaPadula is designed around confidentiality
Can you propose an MLS system that is designed around integrity?
Discuss with the people next to you, and then we’ll discuss as a group
I recommend starting with levels, and then proposing what rules you’d impose
Biba Model
- Integrity: no modification by unauthorized people
Rules:
No read down
No write up
- It is the opposite of the previous model.
Focused on integrity
- You cannot give orders to the people above you (military example). Information flowing up
is not trusted.
- Subjects can read from higher integrity levels and write down to lower ones.
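The Biba rules are the mirror image of Bell-LaPadula and can be sketched the same way. The integrity levels and function names below are assumptions for illustration (military ranks, matching the example in the notes).

```python
# Higher number = more trusted (higher integrity). Example levels only.
INTEGRITY = {"private": 0, "sergeant": 1, "general": 2}

def biba_can_read(subject_level, object_level):
    """No read down: a subject may only read data at its own level or above."""
    return INTEGRITY[subject_level] <= INTEGRITY[object_level]

def biba_can_write(subject_level, object_level):
    """No write up: a subject may only write data at its own level or below."""
    return INTEGRITY[subject_level] >= INTEGRITY[object_level]

print(biba_can_write("private", "general"))  # giving orders upward
print(biba_can_write("general", "private"))  # giving orders downward
print(biba_can_read("general", "private"))   # trusting low-integrity input
```

Here information can only flow downward, so untrusted data cannot contaminate trusted data.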
MLS Issues – Composability
How do different security levels interact?
Top Secret + Secret = ?
MLS Issues – Covert Channels
How easy is the MLS system to circumvent?
Can someone pass information to a lower security clearance?
Can someone from lower clearance gain access to a higher clearance item?
Think of the access control matrix from earlier
Multi-Level Security Issues
Composability
Covert Channels
Polyinstantiation
Practicality
Cost
Complexity
Upgradability
MLS Issues – Polyinstantiation
Accidentally revealing information
Two systems for dealing with it:
US: Just lie
UK: “It’s classified”
MLS Issues – Practicality
Cost
Extremely expensive
Only really available to governments/military
Complexity
Ridiculously complex to both set up and operate
Requires extensive system administration capabilities
Adds to cost
Can become less effective over time
Upgradability
Due to complexity, often impractical or impossible to add features
Time to upgrade often means next upgrade is available before previous one was
installed
Violating Access Control
Memory
Buffer Overflow Attack
Long input overflows through the buffer and puts code into the executable code
section
This executable code then gets run on the system
Fix: Sanitize input
Fix: Canaries in the buffer
Race Conditions
Other
Principle of Least Privilege
Systems should be designed such that the default configuration is one of least access
Security Driven Design
Systems being built around security
Too often security is a secondary consideration, not a primary one
We’ll talk about this more later
Psychological Considerations
Windows users always ignore pop-ups
Because they’re annoying
The boy who cried wolf
Lecture 6: Security Protocols & Distributed Systems
Violating Access Control (from Last Week)
Memory
- Memory is volatile which is different from long-term storage
Buffer Overflow Attack
Long input overflows through the buffer and puts code into the executable code
section
This executable code then gets run on the system
Fix: Sanitize input (we talked about this with SQL injection), for example by chopping off
the excess
Fix: Canaries in the buffer: put a known value next to the buffer and check whether it is
still there. If it is there, the buffer has not been overwritten and is safe; if it is not
there, it has been overwritten. (Canaries are more vulnerable than humans: if this value
goes away, something bad is happening.)
- There is a gap (buffer) between memory chunks to make sure there is enough space between
them, and other memory is stored next to it for other purposes
- The CPU executes whatever is in the executable code section?
- The attack works by giving the buffer too much information (buffer overflow)
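Python itself checks array bounds, so a real buffer overflow cannot be shown directly, but both fixes can be simulated on a `bytearray` that stands in for raw memory. Everything here (the memory layout, the sizes, the function names) is an assumption made for the sketch; real canaries are inserted by the compiler.

```python
import secrets

BUFFER_SIZE = 8   # bytes reserved for user input
CANARY_SIZE = 4   # guard value placed right after the buffer

def make_memory():
    """Lay out: [input buffer][canary][stand-in for executable code]."""
    canary = secrets.token_bytes(CANARY_SIZE)
    memory = bytearray(BUFFER_SIZE) + canary + b"CODE"
    return memory, canary

def unchecked_copy(memory, data):
    """Mimics an unsafe copy: writes every input byte, ignoring the buffer size."""
    for i, byte in enumerate(data):
        memory[i] = byte

def checked_copy(memory, data):
    """Fix 1 (sanitize input): chop the input off at the buffer size."""
    unchecked_copy(memory, data[:BUFFER_SIZE])

def canary_intact(memory, canary):
    """Fix 2 (canary): if the guard value changed, the buffer was overflowed."""
    return bytes(memory[BUFFER_SIZE:BUFFER_SIZE + CANARY_SIZE]) == canary

memory, canary = make_memory()
checked_copy(memory, b"A" * 20)
print("after checked copy, canary intact:", canary_intact(memory, canary))
unchecked_copy(memory, b"A" * 10)
print("after unchecked copy, canary intact:", canary_intact(memory, canary))
```

The canary does not prevent the overflow; it only lets the system notice it before the overwritten region is used.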
Race Conditions
- Transactions are full sets of instructions done by the computer.
- In between two transactions, another computer (or process) can come along
- This breaks access control
- Authorization is a different process from access
- Scanning a badge is one transaction; going into the building is another transaction.
Apply this to computers.
- The fix is treating both transactions as one transaction
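The fix can be sketched with a lock that turns "check" and "act" into one indivisible step. The bank-account example and all names are invented for illustration; the same pattern applies to "check authorization, then grant access".

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw_racy(self, amount):
        """Check and act are separate steps; another thread can slip in between."""
        if self.balance >= amount:   # transaction 1: check
            self.balance -= amount   # transaction 2: act
            return True
        return False

    def withdraw_atomic(self, amount):
        """The lock makes check-and-act a single transaction."""
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False

account = Account(100)
outcomes = []
workers = [
    threading.Thread(target=lambda: outcomes.append(account.withdraw_atomic(30)))
    for _ in range(10)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(outcomes.count(True), "withdrawals succeeded, balance:", account.balance)
```

With `withdraw_racy`, two threads could both pass the check before either subtracts, which is exactly the gap an attacker exploits.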
Other
Principle of Least Privilege
Systems should be designed such that the default configuration is one of least access
Security Driven Design
Systems being built around security
Too often security is a secondary consideration, not a primary one
We’ll talk about this more later
Psychological Considerations
Windows users always ignore pop-ups
Because they’re annoying
The boy who cried wolf
- Windows trained you to ignore pop-ups (lmao)
Authentication Protocols
Protocols
Protocols are communication models
Used to represent interactions of agents (humans/machines) and to analyze security
of these interactions
models include:
participants and their capabilities
messages sent and received
their behavior depending on the received messages
their knowledge/trust expectations
Dolev-Yao Attacker Model
The Dolev-Yao attacker model represents the capabilities of an attacker
We can use this to assess whether our security protocol meets our security goals
Dolev-Yao attacker model was proposed in “On the Security of Public Key Protocols”
by D. Dolev and A. Yao (1981)
We make the following assumptions about attackers:
Attacker can eavesdrop on all communications
Attacker can record all messages
Attacker can alter messages and create new messages
Attacker is constrained by the cryptographic scheme (cannot read or forge encrypted
messages without the key)
Trust vs Trustworthy
A trusted component is a component whose failure can break the security policy
A trustworthy component is a component that won’t fail
In Dolev-Yao, we assume that only encryption is trustworthy
Simple Authentication Protocol
- Terrible protocol, attacker can easily pretend to be Alice
Protocol Goals
Encryption
We can verify identity
- Remember, the Dolev-Yao attacker model says that the attacker owns the network,
so the protocol alone has to be such that we can accomplish this goal.
Encryption
We have two whole lectures on encryption
Symmetric Encryption
The same key is used for encryption and decryption
Both participants use the same key
Requires both participants to have the same key
Fast
- Same key for encryption and decryption
Simple Authentication Protocol
Protocol Goals
Authentication
Message Confidentiality
Messages can only be read by authorized participants
Simple Authentication Protocol
Nonce
“A term coined for one occasion”
With security protocols, it’s a random number that’s used one time
That way it can’t be replayed
If the nonce is reused, that gives the indication that something wrong is happening
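How a nonce stops replay can be sketched as a challenge-response exchange. The slides encrypt the nonce; this sketch uses an HMAC over the nonce as a stand-in for "proving you hold the shared key", and the class and function names are hypothetical.

```python
import hashlib
import hmac
import secrets

class Verifier:
    """Issues one-time nonces and accepts each of them at most once."""

    def __init__(self, shared_key):
        self.shared_key = shared_key
        self.outstanding = set()

    def challenge(self):
        nonce = secrets.token_bytes(16)  # random number used one time
        self.outstanding.add(nonce)
        return nonce

    def verify(self, nonce, response):
        if nonce not in self.outstanding:
            return False                 # unknown or already used: replay
        self.outstanding.discard(nonce)
        expected = hmac.new(self.shared_key, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)

def respond(shared_key, nonce):
    """What the legitimate participant computes from the challenge."""
    return hmac.new(shared_key, nonce, hashlib.sha256).digest()

key = secrets.token_bytes(32)
bob = Verifier(key)
nonce = bob.challenge()
answer = respond(key, nonce)
print("first use accepted:", bob.verify(nonce, answer))
print("replay accepted:", bob.verify(nonce, answer))
```

An eavesdropper who records the first exchange gains nothing, because the next challenge uses a different nonce.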
Simple Protocol
Encryption
Asymmetric Encryption
Everyone has a public key and a private key
The public key is known to everyone
The private key is known only to the keyholder
Only one key is needed to encrypt
Either key
The other key of the pair is needed to decrypt
Slow and very secure
Asymmetric Encryption
- pkA = public key
- skA = secret key
pkA: The public key of Alice
This is known to everyone
skA: The secret key of Alice
Known only to Alice
pkB: The public key of Bob
This is known to everyone
skB: The secret key of Bob
Known only to Bob
We are still using the same encrypted message format from before (with the curly
brackets)
Simple Asymmetric Encryption Authentication Protocol
Needham-Schroeder Authentication Protocol
- Alice creates a nonce
- Afterwards, Bob creates a nonce and sends both nonces back to Alice.
Alice knows Bob was able to read her message, and she sends back Bob’s nonce.
(Confirms Alice has the secret key)
Needham-Schroeder Authentication Protocol: MITM (Man In The Middle attack)
- Charlie (the man in the middle) pretends to be Alice towards Bob.
- Charlie does not have Alice’s secret key, so Charlie is not able to read Bob’s reply.
- Charlie did not create a nonce (Bob did), so Charlie can only forward the message,
encrypted with Alice’s public key and containing the nonces, to Alice?
Needham-Schroeder-Lowe Authentication Protocol: MITM
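The protocol diagrams are not reproduced in these notes, so the sketch below uses the textbook message flow of the Needham-Schroeder public-key protocol and of Lowe's fix (Bob adds his own name to the second message). Encryption is purely symbolic: `enc` and `dec` are made-up helpers where only the named key owner can open a message, and the exact message layout should be treated as an assumption.

```python
def enc(owner, *fields):
    """Symbolic {fields}pk_owner: a tagged tuple, no real cryptography."""
    return ("enc", owner, fields)

def dec(me, message):
    """Only the owner of the matching secret key can open the message."""
    _, owner, fields = message
    if owner != me:
        raise PermissionError(f"{me} does not hold sk{owner}")
    return fields

def run_attack(lowe_fix):
    """Alice starts a session with Charlie (C); Charlie relays it to Bob.
    Returns True if Bob ends up convinced that he is talking to Alice."""
    na, nb = "Na", "Nb"
    # 1.  A -> C: {Na, A}pkC   Alice genuinely wants to talk to Charlie.
    msg1 = enc("C", na, "A")
    # 1'. C(A) -> B: {Na, A}pkB   Charlie re-encrypts it for Bob.
    msg1_for_bob = enc("B", *dec("C", msg1))
    # 2.  B -> A: {Na, Nb}pkA, or {Na, Nb, B}pkA with Lowe's fix.
    got_na, claimed_sender = dec("B", msg1_for_bob)
    if lowe_fix:
        msg2 = enc(claimed_sender, got_na, nb, "B")
    else:
        msg2 = enc(claimed_sender, got_na, nb)
    # Charlie cannot open msg2, so he forwards it to Alice unchanged.
    fields = dec("A", msg2)
    if fields[0] != na:
        return False
    if lowe_fix and fields[2] != "C":
        return False  # Alice expected Charlie's name, sees Bob's, and aborts
    # 3.  A -> C: {Nb}pkC   Alice answers her partner Charlie, leaking Nb.
    (leaked_nb,) = dec("C", enc("C", fields[1]))
    # 3'. C(A) -> B: {Nb}pkB   Bob sees his nonce and accepts "Alice".
    return dec("B", enc("B", leaked_nb)) == (nb,)

print("original protocol fooled:", run_attack(lowe_fix=False))
print("with Lowe's fix fooled:", run_attack(lowe_fix=True))
```

The attack never breaks the encryption; it only exploits the fact that the second message does not say who sent it.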
Session Protocols
Symmetric Key Exchange
Asymmetric encryption is slow!
It’s more secure, but it’s slow
Symmetric encryption can be secure enough, but
We need to establish the shared key
How can we establish a shared key?
Simple Shared-Key Protocol
Protocol Goals
Authentication
Message Confidentiality
Integrity
Messages cannot be altered
Why not encrypt the shared key?
Symmetric encryption requires a shared key
This key needs to be known to both parties
If the key is communicated unencrypted, then everyone can know what the key is
So we need a way to encrypt the key
But if both parties need to have the same key in order to encrypt, then we’d need to
send the key needed to encrypt the key that we need to encrypt the message we
want to send
Which the attacker can still see
Simple Shared-Key Protocol
Key sharing protocols
Two ways to solve this problem
Symmetric Encryption
Using only symmetric encryption, we can work with a trusted third party, that has
pre-established symmetric keys with all parties
This only works if the third party can be trusted
Needham-Schroeder Symmetric Key Protocol
This is different from the Authentication protocol!
Asymmetric Encryption
Encrypt the shared key asymmetrically, so only the two participants are able to
decrypt the full key
Diffie-Hellman Key Exchange
Different from the Merkle-Hellman Knapsack problem!
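A toy sketch of the Diffie-Hellman idea. In Diffie-Hellman the shared key itself is never sent: each side combines its own secret with the other's public value and both arrive at the same number. The parameters p = 23 and g = 5 are tiny illustrative values chosen here, not values from the lecture, and the function names are hypothetical.

```python
import secrets
from typing import Optional, Tuple

P = 23  # toy public prime modulus (far too small for real use)
G = 5   # toy public generator


def keypair(secret: Optional[int] = None) -> Tuple[int, int]:
    """Return (secret exponent, public value G^secret mod P)."""
    if secret is None:
        secret = secrets.randbelow(P - 2) + 1  # random value in 1 .. P-2
    return secret, pow(G, secret, P)


def shared_key(their_public: int, my_secret: int) -> int:
    """Combine the other side's public value with our own secret."""
    return pow(their_public, my_secret, P)


# Worked example with fixed secrets so the numbers can be checked by hand.
a, A = keypair(secret=6)    # Alice publishes A = 5^6 mod 23 = 8
b, B = keypair(secret=15)   # Bob publishes B = 5^15 mod 23 = 19
assert shared_key(B, a) == shared_key(A, b) == 2
```

An eavesdropper sees 23, 5, 8 and 19, but not 6 or 15; with realistically large numbers, recovering the secrets from the public values is believed to be infeasible.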
Needham-Schroeder Symmetric Key Protocol
Needham-Schroeder Symmetric Key Protocol Fixed
(Little mistake on this slide)
- The idea is that Alice is supposed to forward to Trent, who decrypts the message and
sends the time in plain text; Alice then sends the message to Bob encrypted with the
timestamp, so Bob can verify that he talked to Alice?
Needham-Schroeder Symmetric Key Protocol
By adding a timestamp nonce at the beginning, we introduce “freshness” to our protocol
A replay attack would use stale values, which would indicate that the keys are either in
need of renewal or are being used by an attacker
A legitimate participant would easily renew the key
An attacker would be thwarted
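A freshness check can be sketched as a simple age test on the timestamp. The 30-second window and the name is_fresh are assumptions made for this example; the lecture does not give a concrete value.

```python
import time
from typing import Optional

MAX_AGE_SECONDS = 30.0  # hypothetical freshness window, not a value from the lecture


def is_fresh(timestamp: float, now: Optional[float] = None,
             max_age: float = MAX_AGE_SECONDS) -> bool:
    """Accept a message only if its timestamp is recent and not in the future."""
    if now is None:
        now = time.time()
    age = now - timestamp
    return 0 <= age <= max_age
```

A replayed message carries its original, stale timestamp, so it fails this test once the window has passed.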
Protocol Goals
Authentication
Message Confidentiality
Integrity
Freshness
Cannot just replay a key after some time
Security is maintained
Regular key renewal
Diffie-Hellman Key Exchange (Shared Key)
Man-in-the-middle Attacks
Protocol Goals
Authentication
Message Confidentiality
Integrity
Freshness
Non-repudiation
Sender cannot deny sending
Receiver cannot deny receiving
Participant Confidentiality (who sent the message should be confidential)
Identity of participants should not be disclosed
Unlinkability
Should not be able to determine participants from previous session
- We want every session to be from completely random people?
Plain DHKE
- How do we fix Diffie-Hellman?
Check:
Confidentiality?
Integrity?
Non-repudiation?
How can we prevent this confusion?
How can Alice know she is talking to Bob and not anyone else?
We need an authenticated key exchange
Encrypt the messages being sent with known public keys
Where are these known public keys stored?
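Why plain DHKE needs authentication can be shown with toy numbers. This sketch assumes the same tiny parameters as any textbook example (p = 23, g = 5) and reuses Charlie as the attacker; all function names are hypothetical. Charlie replaces both public values in transit with his own.

```python
P, G = 23, 5  # toy public parameters, far too small for real use


def public_value(secret: int) -> int:
    return pow(G, secret, P)


def shared_key(their_public: int, my_secret: int) -> int:
    return pow(their_public, my_secret, P)


def mitm_run(alice_secret: int, bob_secret: int, charlie_secret: int) -> dict:
    """Charlie swaps both public values for his own while they are in transit."""
    alice_pub = public_value(alice_secret)
    bob_pub = public_value(bob_secret)
    charlie_pub = public_value(charlie_secret)
    return {
        # Alice and Bob each receive Charlie's value instead of the other's.
        "alice": shared_key(charlie_pub, alice_secret),
        "bob": shared_key(charlie_pub, bob_secret),
        # Charlie computes both keys from the values he intercepted.
        "charlie_with_alice": shared_key(alice_pub, charlie_secret),
        "charlie_with_bob": shared_key(bob_pub, charlie_secret),
    }
```

Alice and Bob each end up sharing a key with Charlie rather than with each other, and nothing in the unauthenticated exchange tells them so.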
In practice, Plain DHKE is sufficiently secure for most uses
It’s used everywhere, constantly
DHKE is the asymmetric protocol used to establish the symmetric key
Then the symmetric key is used for secure communication
Diffie-Hellman Key Exchange (Shared Key)
Man-in-the-middle Attacks
Secure Public Directories
A mechanism to attach public keys to identities
What are the requirements of these public directories?
Need to know that the public keys can’t be modified
Need immutability between the public key and associated user
What happens if I lose my private key?
Need a way to be able to inform the secure public directory of this
- We need a mechanism to share public keys.
Forward Secrecy
- Forward secrecy: everything sent before the compromise remains secure.
Protocol Goals
Authentication
Message Confidentiality
Integrity
Freshness
Non-repudiation
Participant Confidentiality
Unlinkability
Forward Secrecy
Previous messages don’t get revealed as a result of a compromise
Post Compromise Security
- We want a protocol to be secure even after the compromise.
Protocol Goals
Authentication
Message Confidentiality
Integrity
Freshness
Non-repudiation
Participant Confidentiality
Unlinkability
Forward Secrecy
Post Compromise Security
Able to maintain/re-establish a secure environment post compromise
Perfect Protocol
Doesn’t exist
Lecture 1: Perfect security = perfect unusability
The most secure system is one in which no one can do anything
The list of protocol goals is endless
Build the best protocol for the circumstance
Distributed Systems
TLS
Transport Layer Security
We’ve already discussed TLS
Implemented with TLS Handshake
TLS Handshake
Distributed Systems
Web security focuses on how to keep connections secure
Protocols focus on keeping communications secure
Access control is about how to keep devices secure
Distributed systems compound all of these
Components on different networked computers
Can think of individual computers as being a distributed system
Can also think of individual components within a computer as being a distributed
system
How do we keep these systems secure? That is the big question
Concurrency
Ability to run things in parallel
At the same time
Concurrent vs Sequential
Doing taxes?
Sequential (doing taxes is a sequential process, doing one after the other)
Things build on themselves
Calculating Fibonacci sequence?
Sequential
Need previous results (you need the previous two results)
Vector/Matrix multiplication?
Concurrent (the individual vectors do not depend on each other)
Can do different parts of the calculation at the same time
Two Generals Problem
- Two small armies on the tops of the hills, one big army in the valley between them
- If only one army attacks, they both lose
- Messengers have to go through the valley, and thus the message can be
intercepted.
- No matter what, there is no way to solve it: you can never confirm that the message
got across.
Two Generals Problem?
How do we organize the time to attack?
There isn’t a solution to this
This is a problem when designing concurrent systems
Fundamentally, there’s never going to be an assured way of knowing what every
other system is doing
We can have smart assumptions
Increase the likelihood of a success
But can never be assured of it
Dining Philosophers Problem
- All the 5 philosophers got the same instructions
- All eating with one chopstick
Deadlock
Process 1 has resource A and needs resource B
Process 2 has resource B and needs resource A
Neither process can do anything until one of them releases their resource
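One common way to avoid this situation is to make every process take its locks in the same fixed order. The lecture does not prescribe a fix, so treat this as one possible approach; the names LOCK_ORDER, worker and run_both are hypothetical.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
LOCK_ORDER = [lock_a, lock_b]  # every thread must take locks in this order


def worker(name: str, needed: list, done: list) -> None:
    # Sort the requested locks into the global order before acquiring them,
    # so no thread can hold B while waiting for A.
    ordered = sorted(needed, key=LOCK_ORDER.index)
    for lock in ordered:
        lock.acquire()
    try:
        done.append(name)  # the "work" that needs both resources
    finally:
        for lock in reversed(ordered):
            lock.release()


def run_both() -> list:
    """Process 1 asks for A then B, process 2 asks for B then A."""
    done = []
    t1 = threading.Thread(target=worker, args=("process 1", [lock_a, lock_b], done))
    t2 = threading.Thread(target=worker, args=("process 2", [lock_b, lock_a], done))
    t1.start()
    t2.start()
    t1.join(timeout=5)
    t2.join(timeout=5)
    return sorted(done)
```

Without the sorting step, the two threads could each grab their first lock and then wait forever for the other's.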
Locking
A mechanism used to prevent inconsistent updates
If multiple computers/processes/cores/etc. are working on the same thing, how do
you make sure everyone is looking at the same information?
Say that everyone can read information whenever, but only one person can write at a
time; reading does not change anything
No one can write over anyone else; they have to wait for the current writer to finish.
Transactions
Scenario:
You want to transfer 1000€ from one account to another
How do you ensure your 1000€ doesn’t disappear?
How does your bank ensure your 1000€ doesn’t disappear/double?
Transactions
A relatively secure way of making changes to a database without integrity violations,
by linking all these things together (access and changes are part of the same process?)
Transaction process:
Start transaction
Basically lock everything
Do all the changes
If no errors, make changes permanent
If errors, undo everything
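The transaction process above can be sketched with Python's built-in sqlite3 module, whose connection object commits on success and rolls back on an exception when used as a context manager. The table layout, account names and starting balances are made up for the example.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory bank with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    with conn:  # commits the setup if nothing fails
        conn.execute(
            "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
            "balance INTEGER NOT NULL CHECK (balance >= 0))"
        )
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)", [("alice", 1500), ("bob", 200)]
        )
    return conn


def transfer(conn: sqlite3.Connection, sender: str, receiver: str, amount: int) -> bool:
    """Move money as one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commit if no errors, roll back everything otherwise
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the balance rule was broken, so the credit was undone too


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

If the sender cannot cover the amount, the second update violates the balance rule, and the credit that was already applied to the receiver is rolled back, so the 1000€ neither disappears nor doubles.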
ACID: acronym for Atomic, Consistent, Isolated, Durable
ACID
Atomic
All or nothing
Either make all changes or rollback all changes
Consistent
Doesn’t break the rules of the system
This is what we’re checking for with errors
Isolated
There should be no difference between concurrent and serial transactions
The outcome of multiple transactions should not change based on if they were run in
parallel or serially
This means that transactions that run on the same resources must be run in
series
Durable
Transactions can’t be undone once they are complete; they can only be reversed by
doing another transaction
Resilience vs Redundancy
Resilience
The ability to resist an attack and keep functioning
Graceful degradation
Being able to still provide a reasonable level of service in the event of failure or attack
Redundancy
Having a replacement that can function as the original
Example:
Resiliency - Nathan giving lecture after traveling for 14 hours
Redundancy - Finding someone to replace him because he’s tired after 14 hours of
travel
Lecture 7: Symmetric Cryptography
Cryptography
What is Cryptography?
“Where mathematics meets security engineering”
Cryptography
The study of creating ciphers
Techniques used to make messages non-understandable to third parties
Cryptanalysis
The study of breaking ciphers
Techniques used to make encrypted messages readable
Cryptology
The study of cryptography and cryptanalysis
Cypher is wrong
Cypher is a misspelling
It generally comes from people thinking “cyber” and “cipher” are similar
words/concepts
Other than pronunciation, they aren’t at all similar
Basic Terminology
Symmetric Cryptography
Encryption and decryption keys are the same
Asymmetric Cryptography
Different encryption and decryption keys
Also called Public/Private key cryptography
Defining Cryptosystems
Let
P be a finite set of plaintexts
C be a finite set of ciphertexts
K be a finite set of keys
A cryptosystem has three functions:
Security of Cryptosystems
Kerckhoffs’s Principle (1883):
Security of a system should depend on its key, not on its design remaining obscure
Security through obscurity doesn’t work
Secrecy of a message should depend only on secrecy of the shared key, and not on
the secrecy of the encryption/decryption algorithms
Modeling Security
What makes a cipher good? two main models
Concrete Security
How much work an adversary actually has to do
How long to have a 50% + 1 chance of cracking a key?
Probability given what resources?
Very similar to how we evaluated the security of passwords
Standard Model
- This one is usually used
Indistinguishability
Comparison of encryptions based on the properties that are important
Length is not a hidden property - thus we don’t evaluate based on length
Something is indistinguishable if a random(ish) algorithm cannot find any usable
(non-negligible) information about the key or plaintext
Perfect Secrecy (Unconditional Security)
A cryptographic system is unconditionally secure (perfectly secure) if even an
adversary with unbounded computational capabilities (and resources) cannot break
the system
Given any ciphertext, all plaintexts of the same length are equally likely
No analysis can provide the plaintext
Disadvantage?
Too complex to be practically useful
Does NOT protect integrity (does not protect against modification by third parties)
There is a key to decipher a given ciphertext into any message
Ciphers
Letters as Numbers
Cryptography is about mathematics
We need to represent letters as numbers and numbers as letters
“f” function
If given a CAPITAL letter between A and Z, it will output a number between 0 and 25
f(A) = 0
f(B) = 1
... (the f function turns letters into numbers)
f(Z) = 25
Inverse “f” function (f-1) expects a number as input and returns a letter
If given a number between 0 and 25, it will output a CAPITAL letter between A and Z
f-1(4) = E
f-1(18) = S
Substitution Cipher
- The example above rotates each letter by 13 places?
All letters are replaced by some other letter
The replacement letter is always the same
In the ROT13 example, letters are flipped A <--> N
Also in the BPP cipher example from a few weeks ago
Substitution ciphers don’t necessarily need to do this
The following could be the case (for example):
A --> N
N --> Z
Z --> W
W --> A
The key is the substitution table
Caesar Cipher
- All the letters are shifted down
Shifts all letters down by some amount
This amount is the key of this cipher
The same key is used to decrypt as encrypt
Mathematically:
Enck(P) = for each letter p in P: f-1((f(p) + k) mod 26) = C (encryption)
We convert every letter to a number
Add the key to that number
Then convert that modified number back to a letter
Deck(C) = for each letter c in C: f-1((f(c) - k) mod 26) = P (decryption)
We convert every letter to a number
Subtract the key from that number
Then convert the modified number back to a letter
Decrypt the following message (key = 7)
PZ AOPZ AOL YLHS SPML
- = Is this the real life
- Once you know a letter, you can use it again
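The Caesar formulas translate almost directly into code. This is a small sketch; the function names other than f are chosen for the example, and characters outside A to Z (such as spaces) are passed through unchanged, which is an assumption about how the exercise treats them.

```python
def f(letter: str) -> int:
    """Map a capital letter A..Z to a number 0..25."""
    return ord(letter) - ord("A")


def f_inv(number: int) -> str:
    """Map a number 0..25 back to a capital letter A..Z."""
    return chr(number + ord("A"))


def caesar(text: str, shift: int) -> str:
    """Shift every capital letter by `shift` places, wrapping around mod 26."""
    return "".join(
        f_inv((f(ch) + shift) % 26) if "A" <= ch <= "Z" else ch
        for ch in text
    )


def enc(plaintext: str, key: int) -> str:
    return caesar(plaintext, key)


def dec(ciphertext: str, key: int) -> str:
    return caesar(ciphertext, -key)


print(dec("PZ AOPZ AOL YLHS SPML", 7))  # IS THIS THE REAL LIFE
```

Python's % operator always returns a value between 0 and 25 here, even when the shift is negative, which matches the rule that the remainder is never negative.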
Substitution Ciphers as Modular Arithmetic
- Z is a fancy way of naming the integers
- For the key, some numbers between 0-25
Cryptanalysis of Substitution Ciphers
Frequency Analysis
frequency (a) = frequency(Enc(a))
Simplest form of cryptanalysis
Can deduce the message by figuring out the frequency of characters in the ciphertext
- If someone has a ciphertext, they can plot the frequencies of its letters and compare
them with typical letter frequencies to figure out the plaintext.
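That a monoalphabetic cipher preserves letter frequencies can be checked directly. The sketch below uses a Caesar shift as the substitution and the short message from the earlier exercise; the helper names are hypothetical.

```python
from collections import Counter


def caesar_encrypt(text: str, key: int) -> str:
    """Shift capital letters by `key`; other characters are left alone."""
    return "".join(
        chr((ord(ch) - ord("A") + key) % 26 + ord("A")) if "A" <= ch <= "Z" else ch
        for ch in text
    )


def letter_counts(text: str) -> Counter:
    """Count how often each letter occurs, ignoring spaces and punctuation."""
    return Counter(ch for ch in text if ch.isalpha())


plain = "IS THIS THE REAL LIFE"
cipher = caesar_encrypt(plain, 7)

# Every plaintext letter occurs exactly as often as the letter it is replaced by.
for letter, count in letter_counts(plain).items():
    assert letter_counts(cipher)[caesar_encrypt(letter, 7)] == count
```

On a text this short the counts are not distinctive enough to break the cipher, but on longer texts the most frequent ciphertext letters tend to line up with the most frequent letters of the language.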
Mono vs Poly Alphabetic
Monoalphabetic
Letters in plaintext are mapped to a single letter in the ciphertext
Key is generally a single number
Can also be a substitution table
Example: Substitution ciphers
Polyalphabetic
What a letter is mapped to depends on both what letter it is, and its position in the
ciphertext
Key is generally a word or phrase
Example: Stream Cipher
- Harder to break, if you figure out one letter you do not know the next one
Vigenere Tableau
Enc(P) = f-1((f(P) + f(K))mod 26) = C
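The formula above, applied letter by letter with a repeating key, can be sketched as follows. The plaintext SECURITY is a made-up example, not the message from the slides, and non-letters are passed through without consuming a key letter, which is an assumption.

```python
from itertools import cycle


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Add (or subtract) the repeating key to each capital letter, mod 26."""
    sign = -1 if decrypt else 1
    key_stream = cycle(key)  # the key is repeated over and over
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            k = ord(next(key_stream)) - ord("A")
            out.append(chr((ord(ch) - ord("A") + sign * k) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


print(vigenere("SECURITY", "HELLO"))                 # ZINFFPXJ
print(vigenere("ZINFFPXJ", "HELLO", decrypt=True))   # SECURITY
```

Each step is the same lookup the tableau performs: S (18) plus H (7) is 25, which is Z.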
Stream Cipher
Frequency Analysis
Can figure out what the key is by looking for patterns
- Take the key and repeat it over and over, that is how stream cipher works.
Decrypt the following message (key = HELLO)
- Use the Vigenere Tableau (this table will be provided during the exam)
- Look where the plain text letter and key letter intersect.
- F is the cipher text letter and the H is the key letter
what row does it represent?
- = Motel California
Problem with Stream Cipher
- Vulnerable to pattern analysis
One-Time Pad
What if the key sequence was as long as the message itself?
One time encryption key that is regularly rotated
Once per message
Once per day
Decrypt the following message (key = ONETWOSIX)
- = Bye Bye Bye
Only encryption scheme that is capable of perfect security
Only scheme in which it is possible to create a truly random ciphertext
Key as long as the plaintext
This creates major issues
How do you transmit the key?
How do you keep the key secure?
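A letter-based sketch of the one-time pad, using the key and answer from the exercise above. It also illustrates why the scheme can be perfectly secret: for any ciphertext there is a key that turns it into any other message of the same length. The alternative message ATTACKNOW and the function names are made up for the example.

```python
def shift_text(text: str, key: str, sign: int) -> str:
    """Add (sign=1) or subtract (sign=-1) the key letters from the text, mod 26."""
    if len(key) != len(text):
        raise ValueError("a one-time pad key must be as long as the message")
    return "".join(
        chr((ord(t) - 65 + sign * (ord(k) - 65)) % 26 + 65)
        for t, k in zip(text, key)
    )


def otp_encrypt(plaintext: str, key: str) -> str:
    return shift_text(plaintext, key, 1)


def otp_decrypt(ciphertext: str, key: str) -> str:
    return shift_text(ciphertext, key, -1)


def key_for(ciphertext: str, wanted_plaintext: str) -> str:
    """Find the key under which `ciphertext` decrypts to `wanted_plaintext`."""
    return shift_text(ciphertext, wanted_plaintext, -1)


ciphertext = otp_encrypt("BYEBYEBYE", "ONETWOSIX")
other_key = key_for(ciphertext, "ATTACKNOW")
assert otp_decrypt(ciphertext, "ONETWOSIX") == "BYEBYEBYE"
assert otp_decrypt(ciphertext, other_key) == "ATTACKNOW"
```

Without the real key, an attacker has no way to tell which of the possible plaintexts was meant.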
Issues with ciphers so far
Patterns appear
Because positions of letters haven’t changed
The longer the key, the harder the pattern is to find
But that requires transmitting a very long key
What would make this better:
Shorter key
Able to move around position of letters
Transposition Ciphers
Take a permutation of length n:
e.g., {1, 2, …, n} -> {3, 1, n, .., 2}
Plaintext is written into n columns and columns are ordered according to the
permutation
- Note that n = 1 gives no encryption, and n = 25 is a random permutation of plaintext
Decrypt the following message (key = 2,5,1,4,3)
- 19 characters
- 5 numbers in key
- Columns of 4 letters (and column 5 of 3 letters)
- Number of characters divided by the key length?
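A sketch of a columnar transposition with the key 2,5,1,4,3. Conventions differ between textbooks, so the one used here is an assumption: the plaintext is written row by row into n columns, and the key lists which column is read out first, second, and so on. The 19-letter message is made up, since the exercise's ciphertext is not reproduced in these notes.

```python
def transpose_encrypt(plaintext: str, key: list) -> str:
    """Write the text into len(key) columns, then read the columns in key order."""
    n = len(key)
    columns = [plaintext[i::n] for i in range(n)]  # column i gets every n-th letter
    return "".join(columns[k - 1] for k in key)


def transpose_decrypt(ciphertext: str, key: list) -> str:
    """Cut the ciphertext back into columns, then read it row by row."""
    n = len(key)
    full_rows, extra = divmod(len(ciphertext), n)
    # The first `extra` columns are one letter longer than the rest.
    lengths = [full_rows + (1 if i < extra else 0) for i in range(n)]
    columns = [""] * n
    pos = 0
    for k in key:
        columns[k - 1] = ciphertext[pos:pos + lengths[k - 1]]
        pos += lengths[k - 1]
    rows = full_rows + (1 if extra else 0)
    return "".join(
        columns[c][r] for r in range(rows) for c in range(n) if r < len(columns[c])
    )


KEY = [2, 5, 1, 4, 3]
MESSAGE = "MEETMEATTHEOLDTOWER"  # 19 letters: four columns of 4 and one of 3
assert transpose_decrypt(transpose_encrypt(MESSAGE, KEY), KEY) == MESSAGE
```

With 19 characters and 5 columns, the division leaves a remainder of 4, which is why four columns hold 4 letters and the last holds 3.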
Playfair Cipher
1) Draw a 5x5 square and fill it with letters (omitting J)
Preferably not in alphabetical order...
2) Encrypt pairs of letters
Letters create a square, pick the two letters at the opposite end of the square
Pair ordered by row
3) Special cases
Double letters - add X in between
Odd number of letters - add Z at the end
Letters in the same column - take the two letters below (wrapping to the top)
Letters in the same row - take the two letters to the right (wrapping to the left)
Decrypt the following message:
- = Gangnam Style
Block Ciphers
Block ciphers break the plaintext into blocks of data and then encrypt those blocks
Repeated process per block
Playfair Cipher is an example of a block cipher
Block length of 2
Stream cipher is a block cipher with blocks that are the same length as the key
Edge cases:
Substitution ciphers have a block length of 1
One time pad has a block length the same length as the plaintext
Modern Ciphers
Base 10 (Decimal)
- 9 0’s, 0 10’s, …
- 10 is the base to the power of the next number
Base-2 (Binary)
Base-16 (Hexadecimal)
Base-10 to Base-2
Base-10 to Base-16
Different Bases
You can only have as many unique digits as the base
10 unique digits in Base-10 (0 - 9)
2 unique digits in Binary/Base-2 (0, 1)
3 unique digits in Base-3 (0, 1, 2)
16 unique digits in Hexadecimal/Base-16 (0-9, A-F)
Each digit position is worth the base raised to the next higher power
You will need to know how to convert to/from Binary/Base-2
This will definitely be on the exam
You should also know how to convert to/from Base-16
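Base conversion by repeated division can be sketched as below. The function names are chosen for the example; Python's built-in int(text, base) does the reverse direction and is used in the checks.

```python
DIGITS = "0123456789ABCDEF"


def to_base(number: int, base: int) -> str:
    """Convert a non-negative integer to a string in base 2..16."""
    if number < 0 or not 2 <= base <= 16:
        raise ValueError("need number >= 0 and 2 <= base <= 16")
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)  # remainder is the next digit
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))  # remainders come out lowest digit first


def from_base(text: str, base: int) -> int:
    """Convert a digit string in base 2..16 back to an integer."""
    value = 0
    for ch in text.upper():
        digit = DIGITS.index(ch)
        if digit >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        value = value * base + digit
    return value


print(to_base(10, 2))     # 1010
print(to_base(255, 16))   # FF
print(from_base("1010", 2))  # 10
```

The remainder step is the same modulus operation introduced at the start of the course: 10 mod 2 gives the last binary digit, then 5 mod 2 the next, and so on.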
Block Cipher
Data Encryption Standard (DES) uses a block of 64 bits
Advanced Encryption Standard (AES) uses a block of 128 bits
Most modern encryption focuses on encrypting binary values
Computers use bits
We are focused on encryption for bits
Bits and Bytes
A bit is the smallest unit of information for a computer
1
0
A byte consists of 8 bits
Historically, it was the number of bits needed to encode a single character
5Mbps
5 Megabits per second
Connection capable of sending 5,000,000 bits every second
5 MB
5 Megabytes
5,000,000 bytes
Consists of 40,000,000 bits
Encrypting Binaries
- XOR
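XOR outputs 1 exactly when the two input bits differ, and applying the same key twice gives back the original value. A minimal sketch, with the helper name xor_bytes and the one-byte key chosen only for illustration:

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of `data` with the key, repeating the key as needed."""
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))


# Bit by bit: 1010 XOR 0110 = 1100
assert 0b1010 ^ 0b0110 == 0b1100

secret = xor_bytes(b"HELLO", b"\x5a")
assert xor_bytes(secret, b"\x5a") == b"HELLO"  # the same operation decrypts
```

Because encryption and decryption are the same operation, XOR with key material is a convenient building block for ciphers that work on bits.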
SP Networks
Substitution Permutation Networks
This is self descriptive
First you substitute
Then you permute
Almost all modern encryption is some form of SP-Network
Substitution box
- Correction 1001 (it becomes a 9 and replace it with a 14) 13
Permutation Box
We need to break our input into blocks of 4 bits.
1) Take the first half, and substitute it based on the table.
2) Take the second half, substitute it based on the table
3) Combine these back together and then permute the whole input
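The three steps above can be sketched for one 8-bit input. The substitution and permutation tables here are arbitrary illustrative choices and not necessarily the ones on the slides; all names are hypothetical.

```python
# Hypothetical 4-bit substitution table: input value i is replaced by SBOX[i].
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
# Hypothetical permutation: bit i of the input moves to bit position PBOX[i].
PBOX = [1, 5, 2, 0, 3, 7, 4, 6]


def substitute(byte: int, sbox: list) -> int:
    """Steps 1 and 2: substitute the first and second 4-bit halves separately."""
    first_half, second_half = byte >> 4, byte & 0xF
    return (sbox[first_half] << 4) | sbox[second_half]


def permute(byte: int, pbox: list) -> int:
    """Step 3: move every bit of the combined value to its new position."""
    out = 0
    for i, target in enumerate(pbox):
        bit = (byte >> i) & 1
        out |= bit << target
    return out


def invert(table: list) -> list:
    """Build the table that undoes a substitution or permutation table."""
    inverse = [0] * len(table)
    for i, value in enumerate(table):
        inverse[value] = i
    return inverse


def sp_round(byte: int) -> int:
    return permute(substitute(byte, SBOX), PBOX)


def sp_round_inverse(byte: int) -> int:
    return substitute(permute(byte, invert(PBOX)), invert(SBOX))
```

With this table the half 1001 (9) is replaced by 1110 (14). Anyone who knows both tables can undo the round with sp_round_inverse, since no key is involved.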
Substitution Permutation
Longer input
- Break it into blocks of 8 bits
SP-Round
SP-Networks
What’s the problem here?
Strength of encryption is based entirely on the obscurity of the
substitution/permutation table
Kerckhoffs’s Principle
We need a key
AES
- One of the most common encryption schemes?
Advanced Encryption Standard (AES)
Block Cipher
Blocks of 128 bits
Keys of
128 bits
192 bits
256 bits
We take 128 bits and do something to turn it into ciphertext
8 bits = 1 byte
128 bits = 16 bytes
How AES Works
1) XOR with part of key
2) Substitute bytes
3) Shift Rows
4) Mix Columns
5) Repeat
AES
- Blocks of 128 bits = 16 Bytes
- You need to be able to describe the process on the exam (but not actually do it).
Cryptanalysis attacks on block ciphers
Known plaintext attack: attacker has some plaintexts together with their matching
ciphertexts
The known-plaintext attack (KPA) is an attack model for cryptanalysis where the
attacker has access to both the plaintext (called a crib), and its encrypted version
(ciphertext).
Chosen plaintext attack: attacker submits specifically chosen plaintexts
A chosen-plaintext attack (CPA) is an attack model for cryptanalysis which presumes
that the attacker can obtain the ciphertexts for arbitrary plaintexts.
Chosen ciphertext attack: attacker asks to decipher some chosen ciphertexts
A chosen-ciphertext attack (CCA) is an attack model for cryptanalysis where the
cryptanalyst can gather information by obtaining the decryptions of chosen
ciphertexts.
Related keys attack: attacker submits queries with keys related to the target key
In cryptography, a related-key attack is any form of cryptanalysis where the attacker
can observe the operation of a cipher under several different keys whose values are
initially unknown, but where some mathematical relationship connecting the keys is
known to the attacker.
One final remark
Symmetric cryptography uses a shared secret key
Any cryptographic scheme should be secure if and only if the key is secret
Secrecy of the algorithm is not needed and should be avoided
Obscurity is not a security trait
Lecture 8: Threat Modeling
Risk Identification
- Risk Identification: threat modeling takes place in this stage
How do we identify risks
Intelligence Operation
External
Finding new threat actors
Estimating when threat events will occur
Ultimately an impossible task to do perfectly; it takes a lot of guesswork and intuition
Identifying possible risks + how those possible risks manifest
House threat assessment
Threat modeling
Understand what risk we’re facing
Break down those risks into more manageable quantities
Share our understanding with others
Structuring Attack Information
Why threat models exist
Models are meant to simplify
Help guide analysis
Means of analysis
Tell analysts/others what to focus on
Risk identification model helps identify risks
Help explain information
Means of communication: structuring information in a way someone expects it
Simplifies situation in such a way that others
Cyber Kill Chain
Not a “traditional” threat model, but a common one
An approach to structure attack information
Accomplishes similar goals
Generic way of structuring information about attacks
Created by Lockheed Martin
Traces the stages of a cyber attack
Comes from “Kill Chain” (the Kill Chain in cyberspace)
Military term for describing the stages of an attack
Cyber Kill Chain
Reconnaissance
Weaponization
Delivery
Exploitation
Installation
C&C (Command & Control)
Actions (continue to repeat this process)
STRIDE
- Acronym where every single letter in the acronym is a type of attack which targets
a specific security goal.
- Idea: the analyst takes a system and checks it against each of these.
- Are you protected against all these types of attacks? If so, you are secure
Attack Graphs (not on the exam?)
Set Theory
Collection of objects
Some rules for sets:
Sets can only contain unique objects
Sets/Items can be combined - A∪B
Sets/Items can be removed - A\B
Sets can be sized - |A|
∈ - “Element of”
∉ - “Not element of”
∅ - Empty Set
Graph Theory
Mathematical Structure
G = (V,E)
V - Set of vertices
E - Set of edges
Edge = (v1 ∈ V, v2 ∈ V)
Finite State Machines (FSM)
Consist of states and state transitions
Used to model/describe behavior of simple automation machines
Attack Graph
V = States that represent the system
E = Transitions between states
AG = (S, τ, S0, Ss)
S = Set of states
τ = Set of state transitions
S0 = Set of starting states
Ss = Set of solution states
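An attack graph in this form can be searched mechanically: start from the starting states, follow transitions, and see whether a solution state is reachable. The sketch below uses breadth-first search; the state names and transitions are invented for the example.

```python
from collections import deque

# Hypothetical attack graph: each state maps to the states it can transition to.
EXAMPLE_TRANSITIONS = {
    "outside": {"on_network", "locked_out"},
    "on_network": {"user_shell"},
    "user_shell": {"root_shell"},
}
EXAMPLE_START = {"outside"}         # S0
EXAMPLE_SOLUTION = {"root_shell"}   # Ss


def find_attack_path(transitions: dict, start_states: set, solution_states: set):
    """Return the shortest list of states from a start to a solution state, or None."""
    queue = deque([state] for state in sorted(start_states))
    visited = set(start_states)
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current in solution_states:
            return path
        for nxt in sorted(transitions.get(current, ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # no solution state can be reached


print(find_attack_path(EXAMPLE_TRANSITIONS, EXAMPLE_START, EXAMPLE_SOLUTION))
```

Removing a transition (for instance by patching the step from user_shell to root_shell) and rerunning the search shows whether the attacker still has a route.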
Attack Graph Example
- Each state represents the configuration of the system as a whole
Attack Trees
- It is a structured way of representing attack information.
Attack Tree Example
- The attacker wants a free lunch.
- A structured way of representing a lot of information.
Attack Trees
Graphical tree structure
Rooted, acyclic graph
Nodes consist of an attack goal or attack component
Nodes have children that are components
These children have one of three kinds of relationship
OR: one of the components needs to be accomplished
AND: all of the components need to be accomplished
Definition (technical; won’t be tested on this)
An attack tree (t) is given with the following formal recursive definition:
t ::= b | bΔOR(t, ..., t) | bΔAND(t, ..., t)
Where b is said to be an action or a goal.
This is defined recursively (refers to itself).
Semantics (no test on this)
“How do we determine if two attack trees are equal?”
We need to represent the ATs mathematically
Then make determinations of how they can be modified and manipulated
Multiset Semantics
Represents attack trees as multisets
Multisets are sets that can contain the same item multiple times
The refinements (OR/AND) are represented in how nodes are stored within the
multiset structure
Attack Tree Example 2 exam
Radicals/Refinements
OR
AND: all of these actions need to be completed in order for the higher one to be
completed.
OR Refinement/Radical
Subgoals are in an OR relationship. One of these subgoals needs to be accomplished
in order to complete the overall goal.
This structure is used when there are multiple independent means of accomplishing
the overall goal, and they all need to be presented in the model.
AND Refinement/Radical
Subgoals are in an AND relationship. All of these subgoals need to be
accomplished in order to complete the overall goal.
This structure is used when there are multiple parts of the goal that all need to be
accomplished to complete the overall goal.
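The OR and AND refinements can be sketched as a small recursive check: given the set of basic actions an attacker has completed, is the overall goal reached? The Node class and the free-lunch tree below are made up for illustration and are not the tree from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    op: str = "LEAF"  # "LEAF", "OR", or "AND"
    children: list = field(default_factory=list)


def achieved(node: Node, done: set) -> bool:
    """Return True if `node` is accomplished given the completed leaf actions."""
    if node.op == "LEAF":
        return node.label in done
    results = [achieved(child, done) for child in node.children]
    # OR needs one subgoal, AND needs all of them.
    return any(results) if node.op == "OR" else all(results)


# Hypothetical example tree.
FREE_LUNCH = Node("Get free lunch", "OR", [
    Node("Steal a lunch"),
    Node("Get invited", "AND", [
        Node("Befriend the cook"),
        Node("Invent a sad story"),
    ]),
])

print(achieved(FREE_LUNCH, {"Befriend the cook"}))                        # False
print(achieved(FREE_LUNCH, {"Befriend the cook", "Invent a sad story"}))  # True
```

Note that the check says nothing about the order in which the actions happen, only about which components are in place.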
Abstraction
How do we figure out what to put on each level?
There isn’t a set way to determine this
Couple of different strategies:
1) Top down
Start from overall goal
Think of components and add them to the tree
2) Bottom up
Start from components and think of higher level node
3) Combination
If you can’t figure out an intermediate node, skip a level
Level of Abstraction
In general, it’s when we add another level of detail
If we break down a goal into sub-goals, or a component into sub-components
We move to another level of abstraction
Levels of Abstraction
In general, each level is a new level of abstraction
Attack Vector
One way ATs can be read
Reading the chain of events (components) that lead to the overall goal
Effectively a complete attack scenario
Can think of it as an attack tree but with everything besides the 1 scenario removed
No extra components or goals in a vector
Self-contained
Can compare different attack vectors to compare different scenarios
Attack Vectors
Get Phone Number +Invite to Social Function -> Befriend Administrator + Invent
Need -> Exploit Admin -> Get Root Access
In-Class Activity
Draw an attack tree – use any scenario you like
Common Error
Using Attack Trees as time based representations of attacks
These structures are designed to model the components of an attack
And how those components relate to each other
Attack Defense Trees (ADTs)
AD Trees
Attack-Defense Trees
Attack trees that also list how different elements can be defended against
Countermeasures + Defense Nodes
ADTs add a new type of edge
The countermeasure edge
Also adds a new type of node
The defense node (usually a node in a different color)
Up to now, all nodes have been goals/components in service of an attack
From the attacker’s perspective
Now we add the ability to model defenses
Counters a given node
That’s not a level of abstraction: it is not adding a new level of detail to the model
It does not introduce a new level of detail, it reverses what is being modeled
If an attack vector contained a node with a defense, then that attack vector is likely
defended against
One way to use ADTs: look for attack vectors without defense nodes, then come up
with missing defenses
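Looking for attack vectors without defense nodes can be sketched as follows. The tree reuses the root-access vector from the attack vector example; the alternative branch "Guess Root Password" and the defense "Security awareness training" are invented for illustration, as are the class and function names.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    label: str
    op: str = "LEAF"  # "LEAF", "OR", or "AND"
    children: list = field(default_factory=list)
    defenses: list = field(default_factory=list)  # defense nodes countering this node


def attack_vectors(node: Node) -> list:
    """List every complete scenario as a list of nodes, components first, goal last."""
    if node.op == "LEAF":
        return [[node]]
    child_vectors = [attack_vectors(child) for child in node.children]
    if node.op == "OR":  # any one child scenario is enough
        return [vec + [node] for vectors in child_vectors for vec in vectors]
    # AND: combine one scenario from every child
    return [
        [n for vec in combo for n in vec] + [node]
        for combo in product(*child_vectors)
    ]


def undefended(vectors: list) -> list:
    """Keep only the vectors in which no node has a defense attached."""
    return [vec for vec in vectors if not any(n.defenses for n in vec)]


TREE = Node("Get Root Access", "OR", [
    Node("Exploit Admin", "AND", [
        Node("Befriend Administrator", "AND",
             [Node("Get Phone Number"), Node("Invite to Social Function")],
             defenses=["Security awareness training"]),  # hypothetical defense
        Node("Invent Need"),
    ]),
    Node("Guess Root Password"),  # hypothetical extra branch
])

for vector in undefended(attack_vectors(TREE)):
    print(" -> ".join(n.label for n in vector))  # Guess Root Password -> Get Root Access
```

The remaining vector is the one that still needs a defense; the social-engineering vector is covered because one node on it is countered.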
AD Tree Example
AD Tree Example 2
- This one is more complex; it is used to model defenses
- From a defender’s perspective
ADT Specifics
Can have a defense node as the overall goal
Then we’re checking if our overall goal is being kept
Can think of “defense vectors”
The components that build to a successful defense
Attack vectors but in reverse
Can go back and forth between attack and defense nodes
Still not new levels of abstraction
But a specific defense node can have a specific attack that counters it
And vice versa
In-Class Discussion 2
Think about how to add Defense nodes to your Attack Tree from the first activity
- If you defend one threat within an AND relationship, you just have to add a defense
node to one of them.
ADT Research
ADTs Formally
Looking into the semantics of ADTs
How are they defined?
How are they equivalent?
Several proposed formalisms
We discussed many of these
No generally accepted formalism
ADT Expansions – SAND
SAND = Sequential AND (adds AND relationships in a specific order?)
Normal AND relationships are unordered
All of the components need to occur
The order that these components occur is not set
SAND relationships prescribe an order
Usually drawn as an arrow under the component
ADT Expansions – Bayesian ADTs
Adds probability to the edges
Probabilities on each level need to add up to 1
Allows for expressing attack components by likelihood, for example emphasizing which
scenario is the most likely.
ADT Conversion
Converting ADTs to different models
ADTs were created from Fault Trees (ADTs are more general/more applicable)
Model of failure states in a system
Very technical
Generalized into Attack Trees
ADTs <-> Attack Graphs
Attack Trees are component based attack representations
Attack Graphs are state based attack representations
Similar information represented in different ways
Automated ADT Generation
Major area of research
Generating ADTs from a text description
Very hard because of how to understand the semantics of intermediate nodes
How difficult was it for you to invent the intermediate nodes?
Imagine a computer doing this
This is why this is so challenging
ADTs in Practice
This is my area of research!
These structures have been around for 20+ years
First proposed in 1999
Thoroughly developed by academics
Despite all this work, this model is not widespread in industry
Practitioners prefer to use “temporal” models
Models that describe steps of an attack in order
Component-based representations are not popular
Why?
Is it a question of background?
Many models tend to be used by non-technical practitioners
This is part of what this assignment aims to answer
Computer science students will do the exact same assignment
Is it a question of application?
Maybe ADTs are not as good of a tool for analysis, but would be good for
communication
Or vice versa
ADT Practice Quiz
There is a practice quiz on Brightspace
This is part of my research
I would like to use your responses to develop our understanding of how people
interpret these structures
The first question is about consent to use your anonymized responses
You do NOT need to provide consent
The quiz is not for a grade
It is there to help check your understanding of ADTs for the project and for the exam
You have 1 attempt for the quiz
It’s part of the study
Solutions to the questions are provided after completing it
ADT Project
You have to do it
It’s part of the course
It’s 16% of your overall grade
If you want more than a 0, you need to do the project
Part of the “Assignments” category of your grade
40% of the Assignments category
16% of your overall grade
You need 50% in the Assignments category in order to pass
It’s genuinely hard to pass if you don’t do the project
You can NOT drop the project
The included drop is for one of the 5 regular homework assignments, NOT the
project
Late work is accepted (with increasing penalty, please see the table on Brightspace or
in the Lecture 1 slides)
“Perception” questions
Questions where I ask what you think about ADTs
Really important as part of the study
It’s one of the primary research questions
They are graded on a “minimum effort” standard
If you put in a minimum amount of effort into your answers, you’ll get full
marks
Easiest part of the rubric, whether or not you choose to participate in the
study
(see other slides for more information)
Lecture 9: Asymmetric Cryptography
Hashing
Old Banking
A test key is a value calculated from the transaction; if the transaction has been
changed, the test key no longer lines up with it?
Test Keys
The color code was an example of a test key
It’s a way of determining if changes have been made
Check if the message aligns with the code
If it doesn’t, then the message was likely modified.
Can we figure out what the message was based on this code?
No. It’s a one way function.
Cannot rebuild the message from a test key, only check if the message was modified.
Check if the message aligns with the test key; that is the idea of test keys.
Hash functions
Modern test keys are hash values or message authentication codes
Does not protect confidentiality. You cannot rebuild a message from a hash value;
you cannot reverse it.
Hash functions generally can’t be used to rebuild the message afterwards
Protects integrity and authenticity: only the author (of the message?) could
generate the original hash value.
Since modifying the message would require knowledge of the original message
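Both kinds of modern test key are available in Python's standard library. The sketch below uses SHA-256 as the hash and HMAC-SHA-256 as the message authentication code; the messages, the key and the helper names are made up for the example.

```python
import hashlib
import hmac


def fingerprint(message: bytes) -> str:
    """Hash value of a message: anyone can compute it, nobody can reverse it."""
    return hashlib.sha256(message).hexdigest()


def make_mac(key: bytes, message: bytes) -> str:
    """Message authentication code: only someone who knows the key can compute it."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def check_mac(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the code and compare it in constant time."""
    return hmac.compare_digest(make_mac(key, message), tag)


key = b"shared secret key"  # hypothetical key known to sender and receiver
tag = make_mac(key, b"pay 1000 to Bob")

print(check_mac(key, b"pay 1000 to Bob", tag))  # True
print(check_mac(key, b"pay 9000 to Bob", tag))  # False: the message was modified
```

A plain hash detects accidental changes, but an attacker who alters the message can simply recompute it; the keyed code is what ties the check to the author.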
Birthday Paradox
For a class of 23 people, probability that two persons will have the same birthday is
about 50%.
So let’s try it
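The 50% figure can be checked with a short calculation: multiply the chances that each new person misses all earlier birthdays, then take the complement. The function name is chosen for the example, and leap years are ignored.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    p_all_different = 1.0
    for i in range(people):
        # Person i must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different


print(round(shared_birthday_probability(23), 3))  # 0.507
```

The same calculation with the number of possible hash values in place of 365 explains why collisions turn up much sooner than intuition suggests.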
Hash Collision
Hash functions are only useful so long as every unique input has a unique output
Hash collision - two inputs result in the same output value; two different inputs are
authenticated with the same info, which is a problem.
Can’t verify integrity if two plaintexts result in the same hash value
Modern Message Authentication Codes (MACs) are used until a collision is found;
once one is found, the scheme is abandoned.
SHA-256/SHA-512 are the two most common ones for certificate hashing on the web
Digital Signatures
A way to digitally attribute a document to ensure its authenticity, integrity and
non-repudiability
Asymmetric cryptography is used for the signing
Private key used for signing (encrypting the hash value)
Public key used for verification (decrypting the hash value)
Hash values need to be different, that is really important
Asymmetric Cryptography
Public Key Cryptography
Your public key is/can be known to the world
This is not secret
Someone could use your public key to encrypt a message
M -> C
To decrypt, we need to know the private key
It is supposedly impossible to know the private key from the public key
We can also encrypt values with our private key
Only decryptable with our public key
Why would we do this?
Good general scheme:
Encrypt the message with our private key to prove our identity
Encrypt that ciphertext with the recipient’s public key; the recipient then decrypts
with their private key and then decrypts with our public key.
Why Asymmetric Cryptography
Symmetric Cryptography is not scalable
Every authorized person needs to have the key
One person gives away their key and the cipher is broken
Internet
Couldn’t work with only symmetric cryptography
Billions of devices
Symmetric cryptography is fast, and the internet wants to work fast
So ideally we could use asymmetric cryptography to establish a symmetric key
This fixes the scalability issues of key sharing
Basic Idea
- like a mailbox — everyone can drop a letter to me; only I have a key to open the
mailbox and get that letter out.
- public key — only encrypt; secrecy is not required for security
- private key — only decrypt; this should be secret
Requirements for Asymmetric Cryptosystems
We want a system such that
we can easily encrypt messages knowing public key pk
Polynomial Time
we can easily decrypt messages knowing private key sk
Polynomial Time
it is impossible (very hard) to decrypt a ciphertext (e.g., find sk) knowing only pk
Non-Polynomial Time
How can we find an implementation of such system that will probably satisfy our
requirements?
Idea: rely on math (known difficulty of some tasks & the nice properties of modular
arithmetic).
Computational Complexity
|x| is size of entry for x in our computer system (e.g., number of bytes we need to
store x).
Very basic intuitions:
We say that it is “easy” to compute some function f(x) if f is polynomial in the size of
|x|
E.g., f(x) = |x|^2; f(x) = 24|x|^5 + 7|x|^3; etc.
O(n^2); O(n^5)
We say it is “hard” to compute function f(x) if f is exponential in the size of |x|
e.g., f(x) = 2^|x|, f(x) = |x|!
O(2^n); O(n!)
This is Big-O notation
An upper bound
Some functions are not defined arithmetically, but instead represent problems
solvable by algorithms (e.g., compute the greatest common divisor of numbers a and
b; find the shortest route in the network, etc).
algorithm’s complexity (easy/hard) can be measured in the number of steps it takes
to solve the problem for |x|.
we can discuss whether problems are easy or hard to solve, by taking the fastest
algorithm we know for this problem.
we currently know that some problems are hard (i.e. the best algorithm we have to
solve them is exponential in |x|)
Asymmetric Cryptosystem Design
Implementation:
Easy to compute f(x)
Hard to find x for a given f(x)
Easy to find x given f(x) and a hint (key)
Basic functions used:
Prime Factorization
Discrete Logarithm
Identity Property
Identity Property:
∀m: m*1 = m
For all m: m*1 = m
Expanded:
If pk*sk = 1, then m*pk*sk = m*1 = m
Trivial Asymmetric Cryptography
Why the identity property is so important
It enables this idea of being able to split apart the encryption/decryption
Since the keys combine to produce an identity (1), the keys can then be separated
Imagine two functions: A and B
A(x) = y
B(x) = z
A(z) = x
B(y) = x
A(B(x)) = B(A(x)) = x
So we can make A public, anyone can use it. And then keep B private.
A and B are just the keys in asymmetric crypto
Key Requirements
Can’t figure out the plaintext
Can’t find x from A(x) = y
Can’t figure out the other key
Can’t find B from A
This is why modulus is so important
Modulus is VERY hard to reverse
Identity Property
Selling Amazon Goods
Alice sells goods on Amazon
Bob wants to buy things from Alice
Bob doesn’t want to send a public list of items he wants to buy
We need a way for Bob to privately send his selections to Alice
- Easy to understand
- Only works from public to Alice
Selling Amazon Goods – Knapsack
Now we’ll try attaching random codes to every product
Bob can select his products, sum the codes and send this value to Alice
He sends the value of 61
What is Bob ordering?
Any issues with this?
Collisions: What if two different sets of products add up to 61?
Knapsack Problem: Figuring out which products add up to 61 is an NP-hard problem, so
it could take Alice a long time to figure out which products Bob selected
Solution: We need a mathematical way of selecting numbers so there are no
collisions, and we need a way of being able to quickly figure out the numbers
Selling Amazon Goods – Greedy
Alice selects codes in a superincreasing sequence.
Every number is greater than the sum of all numbers before
Now Bob sends the code 23
We can easily figure out what Bob wants with a Greedy Algorithm
Find the largest value that is less than or equal to the remaining code, subtract that from the code
Repeat until you reach 0
We are guaranteed not to have collisions and figuring out what is being ordered is an
easy problem
Greedy algorithm runs in polynomial time
Any issues with this?
Anyone can use this method to figure out what Bob ordered
We need a way to encrypt this data
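The greedy decoding step can be sketched in Python as follows. The product codes are hypothetical (the slide's actual table is not in these notes); they were only chosen to be superincreasing.

```python
def greedy_decode(code: int, item_codes: list) -> list:
    """Return the item codes that sum to `code`; item_codes must be superincreasing."""
    chosen = []
    remaining = code
    # Walk from the largest code down, taking every code that still fits.
    for value in sorted(item_codes, reverse=True):
        if value <= remaining:
            chosen.append(value)
            remaining -= value
    if remaining != 0:
        raise ValueError("code is not a sum of the given item codes")
    return chosen


# Hypothetical superincreasing codes: each one exceeds the sum of all before it.
codes = [1, 3, 5, 10, 20, 40]
order = greedy_decode(23, codes)   # [20, 3]
```

The loop makes a single pass over the codes, which is why decoding is easy here, and also why anyone who sees the code 23 and the public list can decode it.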
Merkle-Hellman Knapsack Cryptosystem
These codes were generated using asymmetric cryptography
Specifically,
pk = 23
sk = 22
p = 101
The public codes are generated by taking the private codes, multiplying them by the
public key and modding with the mod value
m*pk mod p
Bob sends the code 120
What is Bob ordering?
Merkle-Hellman Knapsack Cryptosystem
Alice receives 120 from Bob
Alice can follow a simple process to figure out the combination
120 * sk
120 * 22 = 2640
2640 mod 101 = 14
Use the Greedy Algorithm to figure out what Bob ordered
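The whole exchange can be sketched in Python. The values pk = 23, sk = 22 and p = 101 come from the notes; the private codes are an assumption (the slide's table is not reproduced here), chosen to be superincreasing with a sum below p and to be consistent with the 120 -> 14 calculation above.

```python
PK, SK, P = 23, 22, 101            # from the notes; PK * SK mod P == 1

# Hypothetical private (superincreasing) codes, one per product.
private_codes = [1, 3, 5, 10, 20, 40]

# Public codes: m * pk mod p
public_codes = [(m * PK) % P for m in private_codes]


def bob_order(selected_indexes):
    """Bob sums the public codes of the products he wants."""
    return sum(public_codes[i] for i in selected_indexes)


def alice_decode(received):
    """Alice multiplies by sk, reduces mod p, then runs the greedy algorithm."""
    remaining = (received * SK) % P
    chosen = []
    for index in range(len(private_codes) - 1, -1, -1):
        if private_codes[index] <= remaining:
            chosen.append(index)
            remaining -= private_codes[index]
    if remaining != 0:
        raise ValueError("not a valid order code")
    return sorted(chosen)


sent = bob_order([0, 1, 3])        # 23 + 69 + 28 = 120
decoded = alice_decode(sent)       # [0, 1, 3]
```

Multiplying by sk undoes the multiplication by pk because pk * sk mod p = 1, which is the identity property at work. The decoding is only guaranteed to be correct because the private codes sum to less than p.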
Discrete Factorization Encryption
Discrete vs continuous?
What is factorization?
We use prime numbers to generate the key pair
The prime numbers used to generate the key pair
p and q
Not given to the public
It’s easy to generate a key pair if the prime factors are known
It is easy to encrypt with the public key
It is easy to decrypt with the private key
It is hard to decrypt without the private key
Brute force
RSA is an example of discrete factorization
RSA
RSA stands for its inventors, R. Rivest, A. Shamir and L. Adleman.
Provides public-key encryption based on the factorization problem:
For two given prime numbers p and q, it is easy to compute p*q
Given p*q, it is hard to find p and q (in the size |p| and |q|)
And given p*q mod t for some large t, it is even harder to find p and q.
Fermat’s Little Theorem
gcd is greatest common divisor
Given two numbers, the gcd is the largest number that divides into both numbers
No remainder
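A prime number is such that the only divisors are 1 and itself
The gcd between a prime and any positive number smaller than it is 1
For all a and n such that gcd(a,n)=1
a^((p-1)(q-1)) = 1 (mod n), where n = p*q and p and q are prime numbers
(Strictly, this two-prime form is Euler's generalization; Fermat's Little Theorem itself
states a^(p-1) = 1 (mod p) for a prime p with gcd(a,p)=1)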
Fermat’s Little Theorem
Idea: This gives us a way to generate public-private key pairs
From p and q
Remember the identity property
RSA Protocol
1. Select two large primes p and q; set n = p*q
2. Choose e:
1 < e < (p-1)(q-1) with gcd(e, (p-1)(q-1)) = 1
An easy way to guarantee this: pick e prime and not a factor of (p-1)(q-1)
3. Find d:
e*d mod ((p-1)(q-1)) = 1
Can also be found separately for p-1 and q-1, and combined
4. Public parameters: n and e
5. Private parameters: p, q, d
Prime Factorization (RSA)
Select p=7, q=13
Compute n
= p*q
= 7*13
= 91 = n
Compute (p-1)*(q-1)
= 6*12
= 72
Choose e such that gcd(e,(p-1)(q-1))=1:
One way: choose e such that 1 < e < (p-1)(q-1), e is prime, and e is not a factor of (p-1)(q-1)
e.g., choose e=5
Find d such that e*d = 1 (mod (p-1)(q-1)):
d=29
check d*e = 5*29 = 145 = 1 (mod 72)
RSA Encryption
RSA Decryption
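The content of these two slides is not reproduced in the notes. As a hedged sketch, the toy key from the worked example (p=7, q=13, e=5, d=29) can be exercised in Python. The message m = 10 is an arbitrary illustration value, and this is textbook RSA without padding, so it is for understanding only.

```python
from math import gcd

p, q = 7, 13
n = p * q                      # 91
phi = (p - 1) * (q - 1)        # 72
e = 5
assert gcd(e, phi) == 1        # e must be coprime to (p-1)(q-1)
d = pow(e, -1, phi)            # modular inverse (Python 3.8+); gives 29


def encrypt(m: int) -> int:
    """c = m^e mod n, using the public values e and n."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """m = c^d mod n, using the private value d."""
    return pow(c, d, n)


m = 10                         # hypothetical message, must satisfy 0 <= m < n
c = encrypt(m)                 # 82
recovered = decrypt(c)         # 10
```

The three-argument pow performs modular exponentiation without ever building the huge intermediate number m^e, which is what keeps encryption and decryption "easy".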
RSA
What was the public-private key pair?
e-d
e is the public key - everyone knows this
n is also public
d is the private key - only you know this
RSA Example
Prime Factorization Cryptography
Taking advantage of this mathematical property:
a^((p-1)(q-1)) = 1 (mod n)
Identity Property
p and q are the prime factors
They are used to generate the keys
Very hard to figure out from the public key (no efficient factoring algorithm is known)
Allows us to find two numbers such that (m^e)^d mod n = m
e-d are a public-private key pair
Created from p and q
Then we can let everyone know what e and n are
Keep d private
If we transmit c = m^e mod n publicly, it is not feasible to figure out m without brute
forcing it
Try all possible m to see which c is created
Easy to decrypt if you know d
Also could know p and q and then figure out d
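Both attack routes can be sketched in Python on the toy key n = 91, e = 5. The function names are illustrative, and trial division only works here because n is tiny; with primes hundreds of digits long, both loops become infeasible.

```python
def brute_force_message(c: int, e: int, n: int) -> int:
    """Try all possible m until m^e mod n equals the observed ciphertext."""
    for m in range(n):
        if pow(m, e, n) == c:
            return m
    raise ValueError("no message found")


def recover_private_key(e: int, n: int) -> int:
    """Factor n by trial division, then derive d from p and q."""
    p = next(k for k in range(2, n) if n % k == 0)   # smallest prime factor
    q = n // p
    return pow(e, -1, (p - 1) * (q - 1))             # needs Python 3.8+


n, e = 91, 5
c = pow(10, e, n)                        # ciphertext of the hypothetical message 10
found_m = brute_force_message(c, e, n)   # 10
found_d = recover_private_key(e, n)      # 29
```

The second route shows why p and q must stay private: whoever factors n can rebuild d exactly as the key owner did.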
Asymmetric Key Exchange
Asymmetric Crypto is So Slow
Asymmetric cryptography is arguably more secure
Given that there never needs to be a shared key
There is a mechanism for authenticity which doesn’t exist with symmetric crypto
Asymmetric cryptography also takes a while
The backbone of asymmetric crypto is mathematics with mind-bogglingly huge
numbers
Hundreds of digits
Prime Factorization and RSA is no longer sufficient
We need ridiculously long primes in order to be secure enough in the current security
space
Takes too long to find valid primes and to encrypt/decrypt
What’s a potential solution?
Use asymmetric crypto to establish identity and a shared key
Then use symmetric crypto (AES) to do actual encryption
Asymmetric crypto gives us an “easy” way to share keys
And maintain authenticity
Can change keys often to maintain security
Asymmetric Symmetric Key Establishment
We want to take advantage of the commutative property:
(a * b) * c = (a * c) * b
The order of multiplication does not change the result
Also holds for other processes
Like exponents
(b^a)^c = (b^c)^a
With regular asymmetric crypto, we use the identity property
The idea being that we can split apart the encryption and come to original value
With asymmetric key establishment, we use the commutative property
We don’t want to come to the original value.
We want the original value to be unfindable
We want to come to a new shared value
Diffie-Hellman Key Exchange (w/ Basic Multiplication)
Asymmetric Symmetric Key Establishment
Alice and Bob never knew each other’s original value
But both reached the same shared value
Multiplication is too simple, can be easily reversed
Again, modular arithmetic is going to save us
Much harder to reverse
On the next slide, we take advantage of the following:
(a * p mod n) * b mod n = (b * p mod n) * a mod n
Diffie-Hellman Key Exchange (w/ Modular Multiplication)
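The slide itself is not reproduced in the notes. A small Python sketch of the exchange under the equation above, with made-up toy values for n, p, a and b:

```python
n = 101          # public modulus (hypothetical toy value)
p = 7            # public starting value (hypothetical)
a = 15           # Alice's private value (hypothetical)
b = 42           # Bob's private value (hypothetical)

alice_sends = (a * p) % n              # Alice transmits a * p mod n
bob_sends = (b * p) % n                # Bob transmits b * p mod n

alice_shared = (bob_sends * a) % n     # (b * p mod n) * a mod n
bob_shared = (alice_sends * b) % n     # (a * p mod n) * b mod n

# Caveat: modular multiplication can still be undone with a modular inverse,
# so an eavesdropper who sees alice_sends can recover Alice's private value.
recovered_a = (alice_sends * pow(p, -1, n)) % n
```

Both sides arrive at the same shared value without sending a or b. The last line illustrates why this variant is only for explanation: the step that replaces it in practice has to be hard to invert.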
Discrete Logarithm Cryptography
Mathematically more secure than prime factorization
Identity property vs commutative property
Slightly different encryption equation
It’s harder to brute force
We have a public generator g
We have a public modulus operator n
We calculate Enc(m) = g^m mod n
g cannot be just any number
It has to be a “primitive root” or generator
Discrete Logarithm vs Prime Factorization
Prime factorization and discrete logarithm use very similar encryption equations
Enc(m) = m^a mod b = c (Prime Factorization)
Enc(m) = a^m mod b = c (Discrete Logarithm)
With prime factorization, a and b are such that there is another value, d such that
c^d mod b = m
With discrete logarithm, this does not exist
Instead, the base of the exponent is chosen very specifically (next slide)
The idea being that we can take the base exponent to another power, mod it, and
then take it to another power and mod it and get the same result
Discrete logarithms are not intended to be decrypted
Rather, they are used to establish a shared symmetric key
Then symmetric encryption can be used for secure messaging
Discrete Logarithm Fundamentals
g^x mod p
Want some g and p such that:
(g^a mod p)^b mod p = (g^b mod p)^a mod p
The equality itself holds for any g and p, but not every g and p gives a secure exchange
Just like the prime factorization equation does not work for any e and d
We need a special g for p, and we call this g a primitive root of p
Primitive Root
Intuition
Need a g that will encrypt all messages to different values given the modulus p
How to check?
Checking for primitive roots
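A brute-force check that follows the intuition above can be sketched in Python: g is a primitive root of p if the powers g^1 ... g^(p-1) mod p are all different. The function name is illustrative, p is assumed to be prime, and this direct approach is only practical for small p.

```python
def is_primitive_root(g: int, p: int) -> bool:
    """True if g^1, g^2, ..., g^(p-1) mod p are all different (p assumed prime)."""
    seen = {pow(g, x, p) for x in range(1, p)}
    return len(seen) == p - 1


roots_of_7 = [g for g in range(1, 7) if is_primitive_root(g, 7)]   # [3, 5]
```

For p = 7, the value 2 fails because its powers cycle through only 2, 4, 1, so several different exponents would map to the same result.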
Discrete Logarithms
given g (primitive root) and x, it is easy to find g^x mod p.
given g (primitive root) and C, it is hard to find x = log_g C (mod p)
thus function f(x) = g^x mod p is a one-way function
The commutative property!
(g^a)^b = (g^b)^a
This works for any g, a and b
(g^a mod p)^b mod p = (g^b mod p)^a mod p
This also holds for any g and p, but it is only secure for specific g and p
Primitive roots can be larger than the mod value
Mod value can never be non-prime
Diffie-Hellman Key Exchange
Application of discrete logarithm
Modern DHKE uses elliptic-curves
Does not require decryption
Both participants encrypt using their private keys, and then transmit an intermediary
value, which is then encrypted using their private keys again
Doesn’t necessarily need to use discrete logarithm
It would work with basic addition
Basic addition would be incredibly non-secure
Diffie-Hellman refers to this “shared calculation” scheme
Not to how the calculation is actually performed
As we already saw DHKE with basic and modular multiplication
Those are never used in practice, that was just for explanation
Diffie-Hellman Key Exchange (w/ Exponents)
Diffie-Hellman Key Exchange (w/ Discrete Logarithm)
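The slide is not reproduced in the notes. A minimal Python sketch of the exchange, assuming the toy public values p = 23 and g = 5 (5 is a primitive root of 23); real deployments use far larger parameters or elliptic curves.

```python
import secrets

p = 23          # public prime modulus (toy size, hypothetical)
g = 5           # public generator; 5 is a primitive root of 23

a = secrets.randbelow(p - 2) + 1     # Alice's private key, 1 <= a <= p-2
b = secrets.randbelow(p - 2) + 1     # Bob's private key, 1 <= b <= p-2

A = pow(g, a, p)     # Alice sends A = g^a mod p
B = pow(g, b, p)     # Bob sends B = g^b mod p

alice_key = pow(B, a, p)     # (g^b mod p)^a mod p
bob_key = pow(A, b, p)       # (g^a mod p)^b mod p
```

Only g, p, A and B travel over the network. Recovering a from A is the discrete logarithm problem, and alice_key always equals bob_key because both are g^(a*b) mod p.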
Man-in-the-middle Attacks
Man-in-the-Middle Attacks
Elliptic-curve cryptography
Probably the most common symmetric key establishment protocol
You can check what protocol is being used
And even see the website’s public key
Math is more complex than discrete logarithm
Same idea
Not used to encrypt messages
Used to establish a shared symmetric key
Lecture 10: Secure Software Engineering
What is Secure Software Engineering?
Software is developed or engineered in such a way that its operations and
functionalities continue as normal even when subjected to malicious attacks.
Why bother?
There are roughly 4000 cyberattacks every day
Cybercrime can cost organizations millions of dollars in damages
Cybersecurity builds trust: it allows users to trust you
Our identities and our data need protection; privacy and security cannot be
separated, so privacy needs to be taken into account.
Every organization has vulnerabilities: it is impossible to create totally secure software
Bill Gates: Trustworthy Computing
This is the e-mail Bill Gates sent to employees at Microsoft, in which he describes the
company’s new strategy emphasizing security in its products.
Use security at every stage?
Historically, this message was an important step in promoting security.
- Culture of security started at that time.
Famous cyberattacks
The name | The price | The scale
WannaCry ransomware attack (2017) | $300 in Bitcoin within three days, or $600 within seven days, to decrypt the data | Around 200,000 computers were infected across 150 countries
Microsoft Exchange Server data breach (2021) | N/A | 250,000 servers, including servers belonging to around 30,000 organizations in the United States and 7,000 servers in the United Kingdom
Amazon attack (2018) | Customer churn, and damage to a brand’s reputation | DDoS attack to knock websites and applications offline
- The Microsoft Exchange breach began in January 2021, after four zero-days were
discovered in on-premises Microsoft Exchange Servers, giving attackers full access to
user emails and passwords on affected servers, administrator privileges on the server,
and access to connected devices on the same network.
- In 2018 Amazon witnessed a DDoS attack aimed at knocking offline the websites and
applications hosted on the cloud platform. It was later found that the attack was
1.7 Tbps in strength, meaning traffic amounting to 1.7 terabits per second
was witnessed flooding the AWS servers at a [Link] attack. Technically
speaking, hackers launch denial-of-service attacks by compromising large numbers of
connected devices with malware and then using them as ghost machines to flood
servers hosting websites and operations with fake web traffic, disrupting them
temporarily or, on rare occasions, permanently.
How can we create secure software?
SDLC: Software Development Lifecycle
Feasibility analysis: whether we can do the project or whether it is impossible from
the security point of view.
Security requirements
Threat modeling
Secure coding practices
Security testing
Penetration testing
Project inception
The main goal:
Resource allocation (both human and materials)
Capacity planning
Project scheduling
Cost estimation
- Project inception is about money: understanding whether you will make a profit and
whether you have enough money
- This is not about security from a security point of view.
Involved security practices
Define metrics and compliance reporting
Determine whether the application is covered by the Security Development Lifecycle
Assign the Security Advisor
Build the Security Leadership team
Analysis & Requirements
The main goal is to align security requirements with functional requirements.
- The customer often does not care about security at all, and that is the problem. Maybe
the customer will specify security requirements, which makes it easier; otherwise decide
the security requirements yourself. Do not expect them from the customer.
Involved security practices
Establish security requirements in the analysis and requirements phase, not in the
implementation phase like most people do.
Security and privacy risk assessment: you do not have an unlimited budget for
every project, so you should assess every possible risk. Privacy is also covered here.
The example:
Functional requirements: (see above?)
- It is essential to define the minimum acceptable levels of security quality and to hold
engineering teams accountable to meeting that criteria. Defining these early helps a
team understand risks associated with security issues, identify and fix security
defects during development, and apply the standards throughout the entire project.
Architectural and detailed design
The main goal is to create a fundamental structure of the software (user interfaces,
system interfaces, network requirements, databases)
- See the data flow to build the software and see how it should work.
Involved security practices:
Perform threat modeling. Threats can be enumerated using STRIDE (Spoofing,
Tampering, Repudiation, Information Disclosure, Denial of service, Elevation of
privilege)
Establish design requirements: here we have a model of our software (for example, the
way a user logs in to our system), so this is where to put design requirements, i.e.
extra security requirements which you skipped in the previous phase.
- Architectural and detail design are both different designs?
Implementation
The main goal is to write code and build the application according to the earlier
design documents and outlined specifications.
The main practices are:
Use approved tools
Use cryptographic standards
Use static analysis
- We do not live in an ideal world therefore standards do not always work.
Verification and testing
The main goal is to verify that the entire application works according to the customer
requirement.
The security practices are:
Perform dynamic analysis security testing
Identify, implement, and perform security tests
Fuzz Testing
When you see problems here, send them to the developers, who decide what to do with
them.
Release and maintenance
The main goal is to release the software and check for deployment issues if any. After
that customers start using the developed system.
The security practices are:
Penetration testing
Final security review
Execute incident response plan
Static analysis
In general: analysis of run-time properties of a program without executing the code
In contrast to dynamic analysis
Key idea: abstraction
Transforming a real, concrete program into a simpler, abstract program that retains
some key properties
Over-approximation: abstract program has more behaviors than real program
Under-approximation: abstract program has fewer behaviors than real program
Key types:
Syntactic pattern matching
Control & Data flow analysis
Constraint based analysis
Manual code review is also static analysis; many people see it as a
separate type of testing, but it belongs to static analysis.
- Used in implementation phase as you do not have to execute the program?
Static Code Analysis involves two major steps:
1. Transform the code into an Abstract Syntax Tree (AST)
- "Abstract" because it abstracts away low-level insignificant details like parentheses,
indentation, etc., allowing the user to focus only on the logical structure of the
program, which is what makes it the most suitable choice for conducting static
analysis on.
2. Apply analysis rules to find potential issues
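The two steps can be sketched with Python's standard-library ast module. The sample source and the rule (flag calls to eval or exec) are assumptions made for illustration; real static analysis tools apply many more rules on top of control and data flow analysis.

```python
import ast

# Hypothetical program under analysis. It is only parsed, never executed.
SOURCE = '''
user_input = input("expression: ")
result = eval(user_input)
print(result)
'''

RISKY_CALLS = {"eval", "exec"}   # illustrative rule set, not a complete list


def find_risky_calls(source: str):
    """Step 1: parse the code into an AST. Step 2: apply a rule to every call node."""
    tree = ast.parse(source)
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in RISKY_CALLS:
                findings.append((node.lineno, node.func.id))
    return findings


findings = find_risky_calls(SOURCE)   # [(3, 'eval')]
```

This is syntactic pattern matching, the simplest of the key types listed above: it reports where eval is called but cannot tell whether attacker-controlled data actually reaches it.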
Control Flow
Control flow analysis: a technique to model a flow of control within a program,
making analysable all individual execution paths.
A control flow graph has nodes representing basic blocks, and edges representing
flow of control.
Data flow
Data flow analysis: a technique to compute a set of possible values/analysis facts at
every point of a program.
- Example of how static analysis works.
- Because given a control-flow graph, we are computing facts about data/variables and
propagating these facts over the control flow graph
Dynamic analysis
- You need developed software that works; then you execute it.
Degree of automation
Manual analysis
Automated analysis
Goal
Testing (verification and validation): testing the functionality of the program (not
from a security perspective)
Security analysis
Collecting information
At the execution system
Evasion-resistance
Instrumentation
Evasion-resistance
- Technical
- Dynamic analysis is the testing and evaluation of an application during runtime.
- Manual dynamic analysis: traditional security analysis; often aided with debuggers
and bare-metal/transparent analysis systems
- Automated dynamic analysis: a more recent analysis; often associated with
sandboxing
Static analysis vs. Dynamic analysis
- Many software defects that cause memory and threading errors can be detected
both dynamically and statically. The two approaches are complementary because no
single approach can find every error.
Static analysis
The testing and evaluation of an application by examining the code without executing
the application.
The role of static analysis is especially valuable in security assurance, because
security attacks often exercise an application in unforeseen and untested ways.
Dynamic analysis
The testing and evaluation of an application during runtime.
Reveals subtle defects or vulnerabilities whose cause is too complex to be discovered
by static analysis.
Dynamic analysis can play a role in security assurance, but its primary goal is finding
and debugging errors.
Why does the number of vulnerabilities continue to grow?
- The number of vulnerabilities is growing according to the graph.
Cyber Security by Integrated Design
Microsoft SDL
NIST 800-218
Secure software development methodologies
How the table looks like
Closer look at the table
Auxiliary practices: beyond technology
Security metrics
Building a culture of security
Understanding human behaviour
Policies, strategies, standards and conventions
Auxiliary practices of incident or vulnerability response
Communication process and customer responsibilities
Ethics
Privacy
- Some of these practices, such as privacy, are not the work of technical people.
Why are developers unable to develop secure software from the beginning?
- People are not robots and therefore they make mistakes.
Human, Organizational, and Technological Challenges
Human:
Role of Personality
Attitudes and Perceptions: developers find security difficult to grasp.
Knowledge: lack of security knowledge, for example
Organizational:
Security culture
Requirements and Policies
Support and Strategies
Technological:
Security APIs/Libraries and Protocols
Analysis Tool
Updates and upgrades
Developers Are Not the Enemy!
Modern security practice has created an adversarial relationship between security
software designers and developers.
Developers are not free from human error!
How to support developers?
Include testing mode
Make defaults safe and unambiguous
APIs should be easy to use
Developers Need Support, Too
Problems with searching the information about how to write secure code:
Some advice is outdated
No concrete examples or exercises
Some critical topics are not well represented.
To remedy these problems, pedagogical experts are required to generate exercises and
tutorials, and human-centred security experts and legal experts are required to deal
with social engineering and regulations.
Why Don’t Software Developers Use Static Analysis Tools to Find Bugs?
Tool Output
Collaboration
Customizability
Result
Understandability
Users are not the enemy!
- Users pay money for your software.
Insufficient communication with users produces a lack of user-centered design in
security mechanisms.
- How to create user friendly software?
Users are never motivated to behave in a secure manner!
Conclusion
Develop a security culture in the organization: explain to people that security exists
and that we should know about it. The tasks assigned depend on the role within the
project, but the culture, of which security is a part, is the same for everyone.
Create security policy in the organization
User-centered design
Technology should adapt to its users rather than require users to adapt to technology
Take into account that developers aren’t free from human error
Support developers while using security tools (for example, having intuitive defect
presentation and detailed explanation of defects with automatic fixes where
appropriate, and including easy and useful configuration options for the tool)
Lecture 11: Cyber Threat Intelligence (CTI)
Agenda
Open source intelligence
Types of data & processing approaches
Examples
<..>
Intelligence
Any kind of data & information we can collect about potential attackers and future
attacks
Can be open source (abbreviated OSINT)
Can be closed-source/restricted (collection requires clandestine operations or
contractual relationships)
Organisations are interested <..>
OSINT
OSINT
Any relevant information that can be collected from publicly available resources
Can be focused on attacks and attackers (threat intelligence)
Attack attribution
Proactive threat landscape monitoring
But also can be focused on organization itself (attack surface)
Representing what might be available to an attacker about this organization
Not necessarily free, but available <..>
Example data sources
(see slide)
OSINT categories
(see slide)
Open Source Data
Raw Information and Data
Open Source Information
* Processed Information such as journalistic content
Validated Open Source Intelligence
* OSINT which has a high degree of certainty of its truthfulness
Open Source Intelligence
* Compiled data that addresses a specific query
Advanced Google search
- This is an example of Open Source Intelligence?
Google Dorking
- Exploiting common vulnerabilities
- List of comments interesting to look at it.
- See slide for link
Meta data
ExifTool: extracts and modifies meta-data information from files
- Be aware of the metadata of a picture, for example: as a manager, be sure about the
data you send, because of the metadata it carries
- You can also manipulate the meta data creating opportunities to be attacked?
- Metadata creates opportunities for attackers
Shodan
Search engine for Internet-connected devices
- Look for internet connected cameras for example.
- This shows that we are really unprotected.
Domain name, etc.
Blockchain OSINT
A cryptocurrency address is a public key that the user shares if they wish to receive
coins to this address
(Most) cryptocurrencies like Bitcoin are public ledgers: transactions among addresses
can be tracked
- Blockchain data open source information?
GDPR and OSINT?
How has GDPR impacted OSINT investigations?
- Did not change what they do (according to the study).
Ethical & legal considerations
OSINT is publicly accessible, but it does not mean it is 100% legal to store and
process this data (PII, trade secrets, etc), or act upon it (access to publicly open web
cams, etc).
There is no consent
Trespassing
Ethical considerations are important: aggregating data from various public sources
we can uncover something that is private and was not intended for sharing
- Therefore do not click on the links provided during the lecture.
Threat intelligence feeds
Threat Intelligence
Strategic
High-level information on changing risk
The board
Tactical
Attacker methodologies, tools and tactics
Architects and sysadmins
Operational
Details of a specific incoming attack
Defenders
Technical
Indicators of specific malware
SOC staff / IR
(see slide for the cycle)
- Ransomware became a big issue for organizations today.
Indicators of Compromise (IoCs)
Tactics, Techniques and Procedures (TTPs): high-level representation of attackers’
actions (see MITRE ATT&CK)
Tools: specific technological solutions that attackers are using (e.g. WannaCry
ransomware)
Artefacts: files, log entries, and other traces left in the attacked systems
Domain names: Web domains related to the attack (e.g. <..>)
IP addresses: Internet Protocol addresses (e.g., [Link]) related to the attack
Hash values: Hash values of files left by the attackers
Hash functions (SHA2, MD5, etc.) compute a fixed-length string for any given input
string (e.g. a file); have very <…>
- Hash values providing authenticity?
(see slide)
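How hash values serve as indicators can be sketched in Python: hash a file's bytes and look the result up in a list of known-bad hashes. The sample bytes and the indicator list are made up for illustration; in practice the hashes would come from a CTI feed.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Fixed-length (64 hex characters) fingerprint of arbitrary input."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical indicator list built from a pretend malware sample.
known_bad_sample = b"pretend this is a malware file"
ioc_hashes = {sha256_hex(known_bad_sample)}


def matches_ioc(file_bytes: bytes) -> bool:
    """True if the file's hash appears in the indicator list."""
    return sha256_hex(file_bytes) in ioc_hashes
```

Changing a single byte of the file gives a completely different hash, so an attacker can evade this kind of indicator with a trivial modification.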
Cyber threat intelligence
Cyber threat intelligence (CTI) feeds are specialized sources of indicators about
adversaries
Techniques, tactics, procedures, and symptoms
[pyramid of pain]
<…>
CTI feeds
CTI feeds exist in a variety of flavors:
Lists with indicators (blacklists of IPs, domain names, malware file hashes, bitcoin
addresses, etc.)
<..>
MISP Platform
Threat intelligence sharing platform and infrastructure <..>
CTI feed quality
CTI feed quality
24 open source CTI feeds with domain names and IP addresses evaluated based on:
Timeliness: how quickly an active IP is flagged
Sensitivity: how much activity is required until flagging
Originality: how many unique indicators
Impact: are the indicators useful or will we block legit services
Tier 1 network operator data is used to check the IP traffic
Lecture 12: Privacy
Definitions (exam material!)
Confidentiality
Only authorized agents should have access to this data
Secrecy
Only authorized agents should know about this data: others should not even notice that the data exists
Privacy
Only agents authorized by a person should have access to the personal data of a
person
Privacy
Privacy is a capability and a fundamental right
Privacy is control over your own information
Privacy exists only for living people; corporations can have secrecy, not privacy
Practically, we can only talk about privacy of some specific individual and their
specific data considering some specific adversary
Strong Privacy
Anonymity
A person is not identifiable within a set of subjects
Unlinkability
A person’s actions/attributes cannot be traced to a person’s identity (actions online,
for example)
Unobservability
Non-authorized party cannot distinguish if a person is using some system or protocol
(see the web security lecture: nobody should know where the traffic is going).
Web Fingerprinting
Your behavior/tools online give a LOT of details about who you are
This information is tracked with a wide array of tools
Information used by businesses to make money
Sell information directly
Use information to change your behavior
Check your fingerprint: [Link]
What can you do?
Use a more private search engine: DuckDuckGo
Use a more private browser: Brave
Personal Information Disclosure
- What is personal information?
Method: Never disclose personal information
Problem: What counts as “personal information”?
Are functional cookies “personal information”?
If functional cookies are “personal information”, then Web 2.0 stops working, and then
the Internet stops working.
Ensuring Privacy
Two methods
Personal information not disclosed at all
Can conflict with functional requirements
Personal information is disclosed in a way that the data subject cannot be identified
among others
Small sacrifices in personal privacy for “the greater good”: if we maintain
anonymity, the amount of privacy lost by being disclosed <..>
Can be difficult to determine what data is releasable
Ensuring Privacy Example
ADT Study
Data Management Plan (DMP)
What data is collected?
Special rules if personal data is collected
How is the data collected?
How is this data safeguarded?
What data is released?
Open Data
Ethics Approval
Cannot get approval without DMP
Ethics Board strictly controls this
Information Release Models
Why release information at all?
Example 1
A company has a data breach, which is also an information release, although it is not
intentional
Has to be able to say who was affected
This is an information release
Example 2
Research is funded as a part of a project
The project follows open science/open data rules
Making that data accessible for other researchers
Example 3
Information Releases
Sharing (a part of) a database, or providing an interface to query a database (the
ability to ask for or get information from that database).
An information release can be insecure/leak private information if individuals can still
be identified and their private information revealed (the overall goal is privacy), e.g.:
we can find the record of a specific individual and understand that this is the right
record
Want to be able to release information while maintaining privacy
Maintaining anonymity
Two concepts:
k-anonymity
Differential anonymity (more commonly called differential privacy)
k-Anonymity
Given a database with n records and m columns (attributes)
Intuition: given this database with personal information, produce an information
release such that no individual can be identified with certain guarantees, but the data
is still useful
An information release from this database satisfies k-anonymity for a given set of
attributes, if for each individual in the released data there are at least k-1 other
individuals that have the same values for those attributes.
Methods:
Suppression: removing certain attribute values
Generalization: replacing some attribute values with broader categories
Separation: release data in separate chunks
k-Anonymity
Name       Age  Gender  State of domicile  Religion   Disease
Ramsha     30   Female  Tamil Nadu         Hindu      Cancer
Yadu       24   Female  Kerala             Hindu      Viral infection
Salima     28   Female  Tamil Nadu         Muslim     Tuberculosis
Sunny      27   Male    Karnataka          Parsi      No illness
Joan       24   Female  Kerala             Christian  Heart-related
Bahuksana  23   Male    Karnataka          Buddhist   Tuberculosis
Rambha     19   Male    Kerala             Hindu      Cancer
Kishor     29   Male    Karnataka          Hindu      Heart-related
Johnson    17   Male    Kerala             Christian  Heart-related
John       19   Male    Kerala             Christian  Viral infection
What is the k-anonymity of this dataset?
- This dataset contains a lot of private information
- For some attributes on their own it is more than 1. For gender the k-anonymity is 4:
with 4 females and 6 males, every entry shares its gender with at least 3 other entries
(how many entries have the same <..>).
- The dataset as a whole has a k-anonymity of 1.
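The counts above can be checked with a short Python sketch. It treats k as the size of the smallest group of records that agree on the chosen attributes. The function name `k_anonymity` and the tuple layout are illustrative assumptions, not something given in the lecture.

```python
from collections import Counter

# Records from the table above; the column order is an assumption of this sketch.
COLUMNS = ("name", "age", "gender", "state", "religion", "disease")
RECORDS = [
    ("Ramsha", 30, "Female", "Tamil Nadu", "Hindu", "Cancer"),
    ("Yadu", 24, "Female", "Kerala", "Hindu", "Viral infection"),
    ("Salima", 28, "Female", "Tamil Nadu", "Muslim", "Tuberculosis"),
    ("Sunny", 27, "Male", "Karnataka", "Parsi", "No illness"),
    ("Joan", 24, "Female", "Kerala", "Christian", "Heart-related"),
    ("Bahuksana", 23, "Male", "Karnataka", "Buddhist", "Tuberculosis"),
    ("Rambha", 19, "Male", "Kerala", "Hindu", "Cancer"),
    ("Kishor", 29, "Male", "Karnataka", "Hindu", "Heart-related"),
    ("Johnson", 17, "Male", "Kerala", "Christian", "Heart-related"),
    ("John", 19, "Male", "Kerala", "Christian", "Viral infection"),
]


def k_anonymity(records, attributes):
    """Size of the smallest group of records with equal values for `attributes`."""
    positions = [COLUMNS.index(a) for a in attributes]
    group_sizes = Counter(tuple(r[p] for p in positions) for r in records)
    return min(group_sizes.values())


print(k_anonymity(RECORDS, ["gender"]))           # 4
print(k_anonymity(RECORDS, ["gender", "state"]))  # 2
print(k_anonymity(RECORDS, COLUMNS))              # 1
```

Every name is unique, so any attribute set that includes the name gives k = 1.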
k-Anonymity: Suppression
Suppression: Remove information
Information being released may not be necessary
If this data was released as a medical study, would religion or the name help other
(medical) researchers?
What is the k-anonymity of this dataset?
- Age is too specific in this case.
Suppression to increase k-anonymity?
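A rough sketch of suppression on the same table, assuming suppressed values are replaced by `*` and that the disease column is the sensitive value researchers need (so it is left out of the k computation). The helper names `suppress` and `k_anonymity` are hypothetical.

```python
from collections import Counter

FIELDS = ["name", "age", "gender", "state", "religion", "disease"]
ROWS = [dict(zip(FIELDS, values)) for values in [
    ("Ramsha", 30, "Female", "Tamil Nadu", "Hindu", "Cancer"),
    ("Yadu", 24, "Female", "Kerala", "Hindu", "Viral infection"),
    ("Salima", 28, "Female", "Tamil Nadu", "Muslim", "Tuberculosis"),
    ("Sunny", 27, "Male", "Karnataka", "Parsi", "No illness"),
    ("Joan", 24, "Female", "Kerala", "Christian", "Heart-related"),
    ("Bahuksana", 23, "Male", "Karnataka", "Buddhist", "Tuberculosis"),
    ("Rambha", 19, "Male", "Kerala", "Hindu", "Cancer"),
    ("Kishor", 29, "Male", "Karnataka", "Hindu", "Heart-related"),
    ("Johnson", 17, "Male", "Kerala", "Christian", "Heart-related"),
    ("John", 19, "Male", "Kerala", "Christian", "Viral infection"),
]]
# Attributes an adversary could use to single someone out (assumed for this sketch).
QUASI_IDENTIFIERS = ["name", "age", "gender", "state", "religion"]


def suppress(rows, fields):
    """Return copies of the rows with the given fields replaced by '*'."""
    return [{f: ("*" if f in fields else v) for f, v in row.items()} for row in rows]


def k_anonymity(rows, attributes):
    """Size of the smallest group of rows with equal values for `attributes`."""
    return min(Counter(tuple(row[a] for a in attributes) for row in rows).values())


no_name_religion = suppress(ROWS, {"name", "religion"})
no_name_religion_age = suppress(ROWS, {"name", "religion", "age"})

print(k_anonymity(ROWS, QUASI_IDENTIFIERS))                  # 1
print(k_anonymity(no_name_religion, QUASI_IDENTIFIERS))      # 1, age is still too specific
print(k_anonymity(no_name_religion_age, QUASI_IDENTIFIERS))  # 2
```

Suppressing age as well raises k to 2, but it also removes information that medical researchers would probably want to keep.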
k-Anonymity: Generalization
What is the k-anonymity of this dataset?
- Gender is 4
- The k-anonymity for the whole set is 2
- Find the size of the smallest group of identical entries
Generalization: Make information less specific
Don’t need to provide specific information when a general overview makes sense
Still allows other researchers to analyze the data
But without compromising identity
- The higher the k-anonymity, the harder to find a specific individual
Generalization to increase k-anonymity?
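A minimal sketch of generalization, assuming name and religion are already suppressed and that age is replaced by a range. The notes do not record which ranges the lecture used, so the bucket boundaries in `age_range` are an assumption.

```python
from collections import Counter

# (age, gender, state) for each record, with name and religion already suppressed.
ROWS = [
    (30, "Female", "Tamil Nadu"), (24, "Female", "Kerala"),
    (28, "Female", "Tamil Nadu"), (27, "Male", "Karnataka"),
    (24, "Female", "Kerala"), (23, "Male", "Karnataka"),
    (19, "Male", "Kerala"), (29, "Male", "Karnataka"),
    (17, "Male", "Kerala"), (19, "Male", "Kerala"),
]


def age_range(age):
    """Replace an exact age by a broader category (assumed bucket boundaries)."""
    if age <= 20:
        return "<=20"
    if age <= 30:
        return "21-30"
    return ">30"


def k_anonymity(rows):
    """Size of the smallest group of identical rows."""
    return min(Counter(rows).values())


generalized = [(age_range(age), gender, state) for age, gender, state in ROWS]

print(k_anonymity(ROWS))         # 1
print(k_anonymity(generalized))  # 2
```

With these buckets every released row is shared by at least two people, while researchers can still analyze the data by age group.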
k-Anonymity: Separation
Allows releasing information that is split, so other researchers could still use this for
analysis, but hopefully without being able to discover too much about the
participants
What is the k-anonymity of these datasets?
- Relational information is interesting
Separation to increase k-anonymity?
Differential Anonymity
Is there a problem here?
- There is only 1 such entry, so we can find more private information about the
individual. We can reconstruct a partial picture of the full dataset. Based on this, we
know that there is a male in Karnataka who follows the Parsi religion.
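The reconstruction described above can be sketched in Python. The notes do not show how the lecture split the table, so the two chunks below (one with gender and state, one with state and religion) are an assumption made for illustration, as is the helper name `infer_gender`.

```python
from collections import defaultdict

# Hypothetical chunk A: (gender, state)
RELEASE_A = [
    ("Female", "Tamil Nadu"), ("Female", "Kerala"), ("Female", "Tamil Nadu"),
    ("Male", "Karnataka"), ("Female", "Kerala"), ("Male", "Karnataka"),
    ("Male", "Kerala"), ("Male", "Karnataka"), ("Male", "Kerala"),
    ("Male", "Kerala"),
]
# Hypothetical chunk B: (state, religion), released in a different row order
RELEASE_B = [
    ("Kerala", "Christian"), ("Karnataka", "Parsi"), ("Tamil Nadu", "Hindu"),
    ("Kerala", "Hindu"), ("Karnataka", "Hindu"), ("Kerala", "Christian"),
    ("Tamil Nadu", "Muslim"), ("Kerala", "Hindu"), ("Karnataka", "Buddhist"),
    ("Kerala", "Christian"),
]


def infer_gender(release_a, release_b):
    """Combine both chunks on the shared state column.

    If everyone in a state has the same gender in chunk A, that gender can be
    attached to every row of chunk B for that state.
    """
    genders_by_state = defaultdict(set)
    for gender, state in release_a:
        genders_by_state[state].add(gender)

    inferred = []
    for state, religion in release_b:
        genders = genders_by_state[state]
        if len(genders) == 1:
            inferred.append((next(iter(genders)), state, religion))
    return inferred


for row in infer_gender(RELEASE_A, RELEASE_B):
    print(row)
```

The output includes `('Male', 'Karnataka', 'Parsi')`. Because only one record has the Parsi religion, the combined releases describe a single individual even though neither chunk contains all three attributes.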
Anonymity
What is the k-anonymity of this dataset?
- It is 1.
k-Anonymity: Suppression
What is the k-anonymity of this dataset?
k-Anonymity: Generalization
What is the k-anonymity of this dataset?
- Still 1
Differential Anonymity
Would this be ok to release?
Differential Anonymity
Intuition: multiple datasets reveal extra information (differential information gain)
Basic idea: for all individuals, the differential information gain between processing
the dataset that contains the record about this individual and processing the dataset
that does not contain this record is capped at some small level (ε)
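To make the idea concrete, here is a small Python sketch. It first shows the differential gain from an exact count computed with and without one record, then caps that gain by adding noise. The noise step is the standard Laplace mechanism, which the lecture does not name; it is included here as an assumption about how the ε cap is usually achieved. The function names and the choice of query are illustrative.

```python
import random

# Disease column of the k-anonymity example table, keyed by name.
DISEASES = {
    "Ramsha": "Cancer", "Yadu": "Viral infection", "Salima": "Tuberculosis",
    "Sunny": "No illness", "Joan": "Heart-related", "Bahuksana": "Tuberculosis",
    "Rambha": "Cancer", "Kishor": "Heart-related", "Johnson": "Heart-related",
    "John": "Viral infection",
}


def count_cancer(data):
    """Exact answer to the query 'how many people have cancer?'."""
    return sum(1 for disease in data.values() if disease == "Cancer")


def laplace_noise(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponential samples."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count_cancer(data, epsilon, rng):
    """Counting query with Laplace noise.

    Adding or removing one record changes the count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    return count_cancer(data) + laplace_noise(1 / epsilon, rng)


without_ramsha = {name: d for name, d in DISEASES.items() if name != "Ramsha"}

# Exact answers: the difference of 1 reveals that Ramsha has cancer.
print(count_cancer(DISEASES) - count_cancer(without_ramsha))  # 1

# Noisy answers: the difference no longer pins down Ramsha's record.
rng = random.Random(0)
print(noisy_count_cancer(DISEASES, 0.5, rng)
      - noisy_count_cancer(without_ramsha, 0.5, rng))
```

A smaller ε means more noise and stronger privacy, at the cost of less accurate answers.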
Differential Anonymity
What is the problem here?
- An individual is identifiable if you line these two datasets up together.
Comparison of Privacy Models
k-anonymity is a function of a dataset: it transforms the dataset into new dataset(s)
that satisfy this property
How easy is it to find a unique entry in a single data set?
Differential privacy is a function of a data processing mechanism: no matter how
sensitive the original dataset is, this mechanism will ensure that a certain level of
privacy is achieved.
How easy is it to establish a unique identity by looking at multiple datasets?
- Hard to answer.
Privacy by Design
Privacy, like security, should be built into a system
It shouldn’t be implemented as a function of information releases
Since not all information releases are intentional
Should be implemented as a function of the system as a whole
So that accidental information releases don’t reveal too much
Permissions
Are everywhere
Permissions on the web/mobile are a DAC
You are the administrator
You decide what kind of access to grant
Android Permissions
Access to sensitive platform APIs is guarded by permission checks
Permissions have levels:
normal: granted to all apps automatically (for example, being able to save files, store
its own files, or access some data)
dangerous: granted if the user approves at run-time (for example, location sharing,
which cannot be shared by default)
signature: granted to trusted packages* (controlled by certificate checks); this comes
from Google, and Google is the one who decides this
can be system/third-party
Issues with Permissions
Access can be gained through intermediaries: an app without a permission can get access
through another app that already has it, so something you gave permission to can in
effect pass that permission on to something else.
Confused Deputy Attack
S. Bugiel et al. “Towards taming privilege-escalation attacks on Android” at
NDSS’2012
app interaction model in Android allows low-privileged apps to ask those with
more privileges for services
This is actually encouraged by Google in secure coding guidelines for developers
[Link]
Apps can be overprivileged
Request more permissions than needed
Wei et al. “Permission evolution in the Android ecosystem” in ACSAC’2012
45% of popular apps are overprivileged
Currently the ecosystem is so complex that excessive privilege is the norm.
Users do not understand permissions
A. Felt et al. “Android permissions: User attention, comprehension, and behavior” at
SOUPS’2012:
17% of users read permissions
even fewer comprehend them
Permissions constantly evolve
developers are confused with permission usage too
Zhauniarovich and Gadyatskaya “Small changes, big changes: An updated view on the
Android permission system” in RAID’2016
number of permissions grows
permissions change their security levels
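The confused deputy issue listed above can be illustrated with a toy Python model. This is not Android code: the classes `App`, `LocationService`, and `MapsApp`, the `LOCATION` permission string, and the `check_requester` flag are all hypothetical names chosen for the sketch.

```python
class App:
    """An installed app with the set of permissions the user granted to it."""

    def __init__(self, name, permissions):
        self.name = name
        self.permissions = set(permissions)


class LocationService:
    """Stands in for a sensitive platform API guarded by a permission check."""

    def read(self, caller):
        if "LOCATION" not in caller.permissions:
            raise PermissionError(f"{caller.name} lacks LOCATION")
        return "location-data"


class MapsApp(App):
    """A privileged app that offers a service to other apps (the deputy)."""

    def handle_request(self, requester, service, check_requester=False):
        # Mitigation: the deputy verifies the requester before acting for it.
        if check_requester and "LOCATION" not in requester.permissions:
            raise PermissionError(f"{requester.name} lacks LOCATION")
        # The platform check only sees the deputy's own permissions.
        return service.read(self)


service = LocationService()
maps = MapsApp("maps", ["LOCATION"])
flashlight = App("flashlight", [])

try:
    service.read(flashlight)
except PermissionError as err:
    print("direct access denied:", err)

# The low-privileged app gets the data through the deputy anyway.
print(maps.handle_request(flashlight, service))  # location-data
```

In this model the leak disappears only when the deputy itself checks who is asking (`check_requester=True`), which is why intermediaries are a weak point of a purely per-app permission check.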
Data Privacy
Mobile apps often leak sensitive data from devices
Lu et al. “Checking more and alerting less: Detecting privacy leakages via enhanced
data-flow analysis and peer voting” in NDSS’2015
44% of apps leak data
Location, device ID, contacts, phone number, browse history, call log,
SMS, cookies…
Small Concepts
These are concepts worth mentioning, so you are aware of their existence
Not enough to fill an entire lecture
Safety Engineering: Critical Systems
Critical systems are designed to prevent a specific class of failure if at all possible
Security-critical
In which security violations are to be avoided
NSA
Business-critical
In which the continuity of business is to be maintained
Most companies
Safety-critical
In which safety is to be maintained
Prevent loss of life/major loss of property
Safety Engineering: Safety-critical systems
Fail-safe
System becomes safe if it fails
System fails in a way that doesn’t create a dangerous situation
Railway systems
Fail-secure systems
System becomes secure if it fails
Electronic safe
Fail-operational
System continues to function if it fails
Generally maintains previous status
Thermostat
Fail-soft
System offers a lower level of service if it fails
Graceful degradation
Spare tire
EU as a Standard Setter
California’s Prop 65 in 1986 caused labels to change everywhere in the US
It was easier to print one common label on all products than print special labels just
for California
The EU may become a world standard setter for data privacy/security
It will be easier to just have all data comply with EU law than have two systems
Security Patching
Process of updating software
Historically required
Publishing new software
Sending installation disks
Remote updates
Requiring manual updates?
What’s the best practice?
Security Patching: Stakeholders
Customers
Would prefer to not have to patch (lazy customers)
Vendors
Expense and inconvenience of patching
Security researcher
Wants some reward for their security work
Intelligence agencies
Want to abuse security flaws before they’re patched
Security software companies
Want to sell security solutions
Governments/Large corporations (that are involved)
Don’t want security flaws
Don’t like large necessary patches
Security Patching: Timing
On Demand
Patching when a patch is finished/necessary
Can be disruptive
Pushing a smart TV patch near a major sporting event
Requires stable connections to be maintained
Requires checking if a patch is available
Bad internet connection/bandwidth
Regularly Scheduled
Technically less disruptive
Can be irritating if the schedule is too frequent
Update schedule clashes with an event
Can build the update schedule into the software
Instead of needing to constantly check if a patch is available...
Inflexible
Unable to respond (or the response is hindered) when a major security threat appears
outside the schedule
Lecture 13: Review
Exam
See the PowerPoint.
Recap
Basics
What is cybersecurity?
Cyberspace
Attackers (different types, 3 sets of 4 types)
Types – passive vs active, insider vs outsider
Who?
Security Framework
Access Control
Operating System Access Control
ACLs
Capability Lists
ACMs
rwx
Access Control Models
DAC
MAC
ABAC
RBAC
- Which one allows write up/write down, etc.
Environments
Security Risk Management
All of the steps of the Risk Management Process
Different types of risk analysis
Tabular
Graphical
Risk evaluation
Risk treatment
- High impact/low impact etc.
Controls: calculating their benefit
Symmetric Cryptography
Encrypt/Decrypt/Explain
Substitution Ciphers
Caesar Cipher
Stream Cipher
One-Time Pad
Transposition Ciphers
Playfair Ciphers
SP-Networks
Binary Values
Understand the process of AES
Don’t need to actually encrypt something with AES
- No cryptanalysis on the exam (there won’t be brute forcing)
- Decrypting something using the key can be asked
Asymmetric Cryptography
Merkle-Hellman Knapsack Cryptosystem
Encrypt/decrypt with RSA
Discrete Logarithm
Encrypt
Check for primitive root
Diffie-Hellman Key Exchange: make sure you do what he tells you to do (check the
map?)
Hashing: what is hashing used for (not specific hashing algorithms)
Web Security
- Three layers of the web
TCP/IP: what do these protocols mean?
Packets: their basic structure
HTTP Request/Response
Security
HTTPS
TLS: why does it make things more secure, and how does it work?
Digital Certificates
VPNs
TOR
Cookies
Attacks
XSS
MitM
SQL Injection
Protocols
Needham-Schroeder Authentication Protocol: contains an error and is therefore
vulnerable to MitM attacks
Needham-Schroeder-Lowe Authentication Protocol
Needham-Schroeder Symmetric Key Protocol
Concurrency
Transaction
ACID: what is that type of transaction, and why is it important?
Resilience vs Redundancy
- No transposition ciphers on the exam.