Summary of the first part of Stanford’s web security course taught by Feross. Topics include: cookies, XSS, CSRF, SOP, CSP, fingerprinting, and phishing.
Lecture 1 (HTML, JS review)
- Uniform Resource Locators (URL) -> protocol (http://), hostname (example.com), port (:80 / :443), path (/), query (?), fragment (#)
- Relative URL ("asdf") -> resolved from the current path; absolute-path URL ("/asdf") -> resolved from the root path of the current origin
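To make the URL parts above concrete, here is a small sketch using the standard URL class (available in browsers and Node). The example URLs are made up for illustration.

```javascript
// Split a URL into the parts listed above.
const url = new URL('https://example.com:8443/docs/page?q=1#intro');

const parts = {
  protocol: url.protocol, // 'https:'
  hostname: url.hostname, // 'example.com'
  port: url.port,         // '8443' (empty string when the default port is used)
  path: url.pathname,     // '/docs/page'
  query: url.search,      // '?q=1'
  fragment: url.hash,     // '#intro'
};

// Relative references are resolved against a base URL (the current page).
const base = 'https://example.com/a/b/page.html';
const fromCurrentPath = new URL('asdf', base).href; // 'https://example.com/a/b/asdf'
const fromRoot = new URL('/asdf', base).href;       // 'https://example.com/asdf'
```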
Lecture 2
What happens when you enter a URL in the address bar?
Lecture 20a (Browser Architecture)
Readings
Browsers separate the rendering engine (which deals with untrusted input) from the browser kernel (which runs without a sandbox and with privileged authority, such as filesystem access) to limit damage. These two processes communicate with each other over IPC (inter-process communication).
The browser has a single browser process, while each tab runs its own renderer process (parallel processing benefits, yay!). So if a tab crashes, has bugs, or is compromised, the damage that can be done is limited.
Browser Process
- Location bar
- Cookie, history, password db
- Network stack, SSL / TLS
- Download manager, clipboard, window management, disk cache
Renderer Process
- HTML/CSS/XML parsing
- Image decoding / SVG
- rendering
- regex
- layout
- JS interpreting
Lecture 3 (Session Attacks)
- To store state, we use sessions on the server, because HTTP itself is stateless
- The safe way is to maintain a session table (sessionId -> username) in the backend. Generate each session ID as a sufficiently large random number, so that it is practically impossible to guess or tamper with it and get into someone else's session
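A minimal sketch of such a session table in Node, assuming an in-memory Map is good enough for illustration (a real backend would likely use a database or cache with expiry). The function names are my own, not from the lecture.

```javascript
const crypto = require('node:crypto');

// Session table: sessionId -> username (in-memory, for illustration only).
const sessions = new Map();

function createSession(username) {
  // 16 bytes = 128 bits from a cryptographically secure random source,
  // so guessing another user's session ID is not practical.
  const sessionId = crypto.randomBytes(16).toString('hex');
  sessions.set(sessionId, username);
  return sessionId;
}

function getUser(sessionId) {
  // Unknown or tampered IDs simply do not resolve to a user.
  return sessions.get(sessionId) ?? null;
}

function destroySession(sessionId) {
  sessions.delete(sessionId);
}
```

On login the server would call createSession and send the ID back in a cookie; on every later request it looks the ID up with getUser.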
Cookies
Cookie attributes:
HttpOnly -> the cookie cannot be read from JavaScript (document.cookie)
SameSite : ‘Strict’ | ‘Lax’ | ‘None’ -> to prevent CSRF (Cross-Site Request Forgery)
‘Strict’ only allows cookies to be sent on same-site requests -> safest, usually used for sensitive cookies / cookies with elevated rights
‘Lax’ allows cookies to be sent cross-site only on top-level navigation. Cookies will NOT be attached to cross-site subresource requests (images etc.). -> Usage: to save a logged-in status for websites like facebook.com (user experience), so that when we navigate to facebook.com from the URL bar or a link, it knows that we have logged in before. This setting can also be used for cookies that carry non-sensitive information and no authority to modify data
‘None’ -> can only be used over HTTPS -> must specify ‘Secure’ when setting the cookie. Cookies will be attached to cross-site requests. Usage: who would use this if it is the least safe option? Answer: tracking cookies
Secure -> the cookie is only sent over HTTPS
Max-Age or Expires -> setting Expires to a past date (or Max-Age to 0) deletes the cookie
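The attributes above end up in a Set-Cookie response header. A rough sketch of a helper that builds one; buildSetCookie and its option names are hypothetical, not a library API.

```javascript
// Build a Set-Cookie header value from the attributes discussed above.
function buildSetCookie(name, value, opts = {}) {
  const parts = [`${name}=${encodeURIComponent(value)}`];
  if (opts.maxAge !== undefined) parts.push(`Max-Age=${opts.maxAge}`);
  if (opts.path) parts.push(`Path=${opts.path}`);
  if (opts.httpOnly) parts.push('HttpOnly');
  // SameSite=None is only accepted together with Secure, so force it on.
  if (opts.secure || opts.sameSite === 'None') parts.push('Secure');
  if (opts.sameSite) parts.push(`SameSite=${opts.sameSite}`);
  return parts.join('; ');
}

// A session cookie that JS cannot read, sent only over HTTPS.
const sessionCookie = buildSetCookie('sessionId', 'abc123', {
  httpOnly: true, secure: true, sameSite: 'Lax', maxAge: 3600, path: '/',
});

// Deleting a cookie: same name and path, Max-Age=0.
const deleteCookie = buildSetCookie('sessionId', '', { maxAge: 0, path: '/' });
```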
Ambient Authority
Definition: the browser ‘helpfully’ attaches all of site X's cookies whenever we make an HTTP request to site X
CSRF (Cross-site Request Forgery)
Abusing ambient authority. Example: a hidden form on attacker.com that transfers money. If you are still logged in to your bank and the bank’s cookies are not protected (e.g. with SameSite), money can be transferred automatically every time you visit attacker.com, because the browser attaches the bank’s cookies to the POST request sent to the bank’s API
Lecture 4 (Same Origin Policy)
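Same origin = same (protocol, hostname, port) tuple
- a subdomain can access its parent site’s cookies (if they are set with the Domain attribute; cookies are scoped by domain, not by origin)
- two different subdomains of the same site are different origins and are isolated from each other
- cannot read the response of a cross-origin ‘fetch’ -> blocked by SOP unless the other server opts in via CORS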
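A small sketch of the origin comparison, using the standard URL class. sameOrigin is an illustrative helper of mine; in practice comparing the .origin property of two URL objects does the same job.

```javascript
// Two URLs are same-origin when protocol, hostname, and port all match.
function sameOrigin(a, b) {
  const ua = new URL(a);
  const ub = new URL(b);
  // URL normalizes default ports (80 for http, 443 for https) to '',
  // so 'https://x.com' and 'https://x.com:443' compare equal.
  return ua.protocol === ub.protocol
    && ua.hostname === ub.hostname
    && ua.port === ub.port;
}
```

For example, sameOrigin('https://axess.stanford.edu/', 'https://login.stanford.edu/') is false because the hostnames differ, even though both are subdomains of the same site.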
Relaxing SOP
Sometimes the SOP is too strict, e.g. we want axess.stanford.edu to talk to login.stanford.edu to find out who is logged in.
Two examples:
- Fragment Identifier Communication
- the parent site adds a fragment identifier to the embedded page's URL
- the embedded site reads (polls) its fragment identifier
- postMessage API
- What is it? Secure cross-origin communication between cooperating origins
- Specify the intended recipient: e.g. window.parent.postMessage(data, 'https://axess.stanford.edu')
- Validate the expected sender: check that event.origin is the origin you expect messages from. Add
if (event.origin !== 'target origin') return;
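Putting the two postMessage rules together, a sketch of the receiving side might look like this. TRUSTED_ORIGIN and acceptMessage are names I made up; the check is kept in a plain function so it can be exercised outside a browser.

```javascript
// Origin we expect messages from (assumption for this example).
const TRUSTED_ORIGIN = 'https://login.stanford.edu';

// Returns the message data if the sender is trusted, otherwise null.
function acceptMessage(event) {
  if (event.origin !== TRUSTED_ORIGIN) return null; // drop unexpected senders
  return event.data;
}

// In the parent page (axess.stanford.edu) this would be wired up as:
//   window.addEventListener('message', (event) => {
//     const data = acceptMessage(event);
//     if (data !== null) { /* use data */ }
//   });
//
// And the embedded login page would send with an explicit recipient:
//   window.parent.postMessage({ user: 'alice' }, 'https://axess.stanford.edu');
```

Passing '*' as the target origin would skip the recipient check, so the explicit origin string matters on the sending side too.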
Potential pitfalls to take note of
- pages can embed images, styles, scripts from other origins
- pages can submit forms to other origins (backward compatibility)
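- pages can embed other pages and, when the framed page counts as same-origin, read its cookies (e.g. cookies set on cs106a.stanford.edu can be read by cs253.stanford.edu that iframes cs106a.stanford.edu by calling iframe.contentDocument.cookie; across different subdomains this only works if both pages relax their origin, e.g. via document.domain, otherwise contentDocument is blocked)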
Lecture 5 (Exceptions on SOP)
Can we prevent a site from linking to our site?
- We cannot prevent linking, but we can reject certain requests (say, from competitor.com) by looking at the ‘Referer’ header.
- Related: Referrer-Policy HTTP header
- Example: Google Docs does not want to send the full URL
Can we prevent other sites from embedding our site?
- Why do this? -> prevent clickjacking
- Solution: X-Frame-Options HTTP header
- Related: Content-Security-Policy (frame-ancestors directive)
Can we prevent other sites from submitting forms to our site?
- Why do this? -> prevent CSRF
- Solution: check the ‘Origin’ header (added automatically by the browser on CORS requests and form POSTs) against an allowlist (rejects the form submission), or use SameSite cookies (cookies are not attached)
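A rough sketch of the Origin allowlist check on the server. The allowlist content and the function name are hypothetical, and headers is assumed to be an object with lower-cased header names (as Node's http module provides).

```javascript
// Origins allowed to send state-changing requests (example value).
const ALLOWED_ORIGINS = new Set(['https://bank.example']);

function isRequestAllowed(method, headers) {
  // Safe methods should not change state, so they are not checked here.
  if (method === 'GET' || method === 'HEAD') return true;
  const origin = headers['origin'];
  // Reject when the header is missing or not on the allowlist.
  return origin !== undefined && ALLOWED_ORIGINS.has(origin);
}
```

Rejecting requests with no Origin header at all is the strict choice; it may break unusual clients, so treat it as a design decision rather than a rule.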
Can we prevent other sites from embedding images / scripts that are hosted on our site?
- Why do this?
- prevent hotlinking (steals bandwidth)
- prevent the logged-in user’s avatar from showing up on other sites
- Solution: check the ‘Referer’ header, or use SameSite cookies
Lecture 6 (XSS) = Code Injection.
- Interesting story: Samy worm
- BIG IDEA: Never trust client data!
Common elements defense:
- HTML elements: Change all < to &lt; and & to &amp;
- HTML attributes: Change all ' and " to &#39; and &quot;
- Always quote HTML attributes
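The escaping rules above can be sketched as one helper that is safe for both element content and quoted attribute values. This is a minimal version written for these notes; in a real project a templating library that escapes by default is the usual choice.

```javascript
// Escape user data before placing it in HTML text or a quoted attribute.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Always quote the attribute, otherwise a space in the data ends the value.
function renderGreeting(userName) {
  return `<p title="${escapeHtml(userName)}">Hello, ${escapeHtml(userName)}</p>`;
}
```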
"data:" or "javascript:" URLs
- Text editor in the browser: type data:text/html,<html contenteditable></html> in the navigation bar
- You can run JavaScript in the context of the page you are visiting by prefixing it with "javascript:"
- "data:" can be used for loading small images inline: <img src='data:image/png;base64,iVBORw0KGgoAAA....' />
- DEFENSE: Always validate that USER_DATA is a valid URL (with an allowed scheme such as https:) and does not contain quotes before it is put into src or href attributes
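One way to sketch that defense: parse the user data as a URL, allow only http(s), and refuse anything that still contains a quote. safeUrl and the allowlist are my own illustration, not code from the lecture.

```javascript
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

// Returns a normalized URL string that is safe to put in href/src, or null.
function safeUrl(userData) {
  let url;
  try {
    url = new URL(userData); // throws on anything that is not an absolute URL
  } catch {
    return null;
  }
  // Blocks javascript:, data:, and any other unexpected scheme.
  if (!ALLOWED_PROTOCOLS.has(url.protocol)) return null;
  // Quotes could break out of the attribute, so reject them outright.
  if (/['"]/.test(url.href)) return null;
  return url.href;
}
```

The result should still go through normal attribute escaping when rendered; this check only decides whether the URL is acceptable at all.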
on* attributes
- USER_DATA can close the function call early and add its own code, e.g. ); alert(document.cookie
id attributes
- The DOM basically creates a new JS global variable named after that id.
Script elements
- Avoid relying on backslash escaping
- Fix 1: Hex encode user data to produce a string with characters 0-9, A-F, include it inside a JavaScript string, then decode the hex string
- Fix 2: use a <template> tag to store data that won’t visibly render, then use the escape rules for HTML elements
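A sketch of Fix 1, assuming a Node server (Buffer) and a browser-compatible decoder. The helper names are hypothetical, and the hex digits come out lower-case here, which is equivalent for decoding.

```javascript
// Server side: hex-encode the UTF-8 bytes, so the output contains only 0-9 and a-f
// and cannot close the string or the <script> element.
function hexEncode(text) {
  return Buffer.from(text, 'utf8').toString('hex');
}

// Client side: turn the hex string back into the original text.
function hexDecode(hex) {
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return new TextDecoder().decode(bytes);
}

// Rendering: only the hex string is interpolated into the script source.
// The page is assumed to ship hexDecode itself.
function renderScript(userData) {
  return `<script>const data = hexDecode('${hexEncode(userData)}');</script>`;
}
```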
Lecture 7 (XSS Defense)
- Escape on the way out (when rendering), because on the way in we do not yet know the context in which the data will be used
- XSS Auditor in Chrome: why was it deprecated? It could be abused to snipe legitimate scripts (e.g. validation scripts) and to brute-force what is in the page, and there were still many ways to bypass it.
Content Security Policy
- Inverse of SOP. SOP limits which pages can post forms to / load images, scripts, and styles from our site. CSP limits which servers our site can talk to.
- Goal: only take content from our site
- no inline JS or HTML-attribute JS is allowed
- question: how to test our policy? (read more on: base-uri)
- BIG PROBLEM: CSP breaks sites. How to resolve? We can allowlist sites, but if our trusted sites load from other sites, those must go on the allowlist as well. In the end we may be forced to use ‘unsafe-inline’ -> defeating the purpose of CSP
- read more on ‘strict-dynamic’ -> idea: use a unique identifier called a ‘nonce’ that changes on every page load. If you use this, don't cache the HTML
- Reasonable “starter” CSP header (from Feross)
Content-Security-Policy:
default-src 'self' data:;
img-src *;
object-src 'none';
script-src 'strict-dynamic' 'nonce-NONCE_GOES_HERE' * 'unsafe-inline';
style-src 'self' 'unsafe-inline';
base-uri 'none';
frame-ancestors 'none';
form-action 'self';
- use textContent over innerHTML to prevent DOM-based XSS. Look forward to ‘Trusted Types’, factory methods to create HTML
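The nonce idea can be sketched like this in Node: generate a fresh random value per response, put it in the CSP header, and stamp it on every script tag you trust. The helper names are mine and the policy is shortened to a few directives for readability.

```javascript
const crypto = require('node:crypto');

// A new unguessable nonce for every page load (never reuse or cache it).
function newNonce() {
  return crypto.randomBytes(16).toString('base64');
}

// Shortened policy: only scripts carrying this nonce (and what they load) may run.
function cspHeader(nonce) {
  return [
    "default-src 'self'",
    "object-src 'none'",
    `script-src 'strict-dynamic' 'nonce-${nonce}'`,
    "base-uri 'none'",
  ].join('; ');
}

// Trusted script tags get the same nonce; injected ones will not have it.
function scriptTag(nonce, src) {
  return `<script nonce="${nonce}" src="${src}"></script>`;
}
```

Per request, the server would call newNonce once and pass the same value to both cspHeader and scriptTag.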
Lecture 8 (Fingerprinting and Privacy on the Web - Pete Snyder from Brave)
Topics Discussed:
- Why websites track?
- “Classic tracking” -> use cookie (originally used for authentication purposes)
- “Fingerprinting” -> uniquely identify you from a combination of parameters. Example: screen height / width, browser used (user agent string), ad blocker used, installed fonts, canvas / WebGL (hardware such as the graphics card and color depth makes the output subtly different), other Web APIs (WebRTC, Device Memory, Web Audio etc.) that identify hardware (number of cores, audio channels, shaders, device memory, network etc.).
- Test your browser fingerprinting depth
- Search for fingerprint2.js (open source fingerprinting library)
- Possible defenses: remove functionality, consistency (all browsers report pretty much the same thing), restrict access, noise (injecting different data every time / steganography)
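To see why many weak signals add up to an identifier, here is a toy sketch that hashes a set of attributes into one ID. The attribute names and values are invented; real libraries collect them from browser APIs, and this is not how any particular library is implemented.

```javascript
const crypto = require('node:crypto');

// Combine attributes into a stable identifier.
function fingerprint(attributes) {
  // Sort keys so the same attributes always produce the same string.
  const canonical = Object.keys(attributes).sort()
    .map((key) => `${key}=${JSON.stringify(attributes[key])}`)
    .join('|');
  return crypto.createHash('sha256').update(canonical).digest('hex');
}

// Made-up example values.
const visitor = {
  userAgent: 'ExampleBrowser/1.0',
  screen: [1920, 1080],
  fonts: ['Arial', 'Helvetica'],
  cores: 8,
};
const visitorId = fingerprint(visitor);
```

The "noise" defense works against exactly this step: if one attribute changes on every visit, the hash changes too and visits can no longer be linked this way.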
Lecture 9 (DOS, Phishing)
You should actually watch this lecture yourself. It was super fun.
UI Attacks
- TabNabbing -> a site opened in a new tab tampers with the tab that opened it and redirects it elsewhere (can be used for phishing)
- Solution to tabnabbing: use rel="noopener" if you use target="_blank"
Phishing
- Beware of site URLs that look the same but have different underlying character codes. They are visually indistinguishable. Uh oh. (IDN homograph attack)
- Picture in picture attacks
- Defenses: password manager / hardware security key
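A quick sketch of the homograph problem: two hostnames that render almost identically but are different strings. The domain is just an illustration. Parsing with the URL class exposes the Punycode form, which is what a password manager effectively compares against.

```javascript
const real = 'apple.com';
const fake = '\u0430pple.com'; // first letter is CYRILLIC SMALL LETTER A (U+0430)

// They look alike on screen, but the code points differ.
const areDifferent = real !== fake;

// The URL parser converts non-ASCII hostnames to Punycode ('xn--...').
const realHost = new URL(`https://${real}/`).hostname;
const fakeHost = new URL(`https://${fake}/`).hostname;
const isPunycode = fakeHost.startsWith('xn--');
```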
Side Channel Attacks
- example: CSS history leak (whether you have visited a link before), learning whether you are logged in or not based on the width of some buttons :))
- Sensor data leaks from web APIs that do not require user interaction to activate, such as ambient-light APIs