mozilla-services/shavar-prod-lists

Name: shavar-prod-lists

Owner: Mozilla Services

Description: Shavar/tracking protection lists used in prod

Created: 2015-08-20 21:42:18.0

Updated: 2018-04-25 11:28:42.0

Pushed: 2018-05-18 20:10:16.0

Homepage:

Size: 203

Language: Python

README

shavar-prod-lists

This repo serves as a staging area for shavar/tracking protection lists prior to production deployment to Firefox. This repo gives Mozilla a chance to manually review all updates before they go live, a fail-safe to prevent accidental deployment of a list that could break Firefox.

Lists

These two JSON files power Tracking Protection in Firefox.

These lists are processed, transformed, and sent to Firefox via Shavar.

Blacklist

The blacklist is the core of tracking protection in Firefox. Firefox 42 ships a single processed version of the blacklist, and that list excludes the “Content” category URLs. This is the “Basic protection” list. Firefox 43 adds a second “Strict protection” list which includes the “Content” category URLs for blocking.
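The Basic/Strict split can be pictured as a filter over categories. The sketch below assumes a Disconnect-style layout in which `categories` maps each category name to a list of `{company: {url: [domains]}}` entries. That layout, the sample domains, and the function names `domains_by_category` and `build_lists` are assumptions for illustration, not the actual Shavar processing code.

```python
import json

# Hypothetical sample in an assumed Disconnect-style layout.
SAMPLE = json.loads("""
{
  "categories": {
    "Advertising": [{"AdCo": {"http://adco.example/": ["adco.example"]}}],
    "Content": [{"VidCo": {"http://vidco.example/": ["vidco.example", "cdn.vidco.example"]}}]
  }
}
""")


def domains_by_category(blacklist):
    """Return {category: set_of_domains} from the (assumed) blacklist layout."""
    result = {}
    for category, entries in blacklist["categories"].items():
        domains = result.setdefault(category, set())
        for entry in entries:
            for company_urls in entry.values():
                for _url, value in company_urls.items():
                    # Non-list values (for example a DNT flag) are skipped.
                    if isinstance(value, list):
                        domains.update(value)
    return result


def build_lists(blacklist):
    """Basic protection excludes the Content category; Strict keeps every category."""
    by_cat = domains_by_category(blacklist)
    strict = set().union(*by_cat.values())
    basic = set().union(*(d for cat, d in by_cat.items() if cat != "Content"))
    return basic, strict


basic, strict = build_lists(SAMPLE)
print(sorted(basic))   # ['adco.example']
print(sorted(strict))  # ['adco.example', 'cdn.vidco.example', 'vidco.example']
```

Because Strict is just Basic plus the Content category, both lists can be derived from the same source file in one pass.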

A vestige of the list is the “Disconnect” category, which contains Facebook, Twitter, and Google domains. Per Disconnect's guidance, we remap the Facebook and Twitter domains to the Social category. The google_mapping.json file is used to remap the individual Google domains to their respective categories. This remapping is temporary, until the list itself is updated to fix these issues.
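A minimal sketch of the remapping step follows. It assumes the same `{company: {url: [domains]}}` entry layout and treats google_mapping.json as a plain `{domain: category}` dictionary. Both formats, and the function name `remap_disconnect`, are assumptions; the real file and processing code may be structured differently. Domains it cannot map are simply dropped here.

```python
import copy

# Per the README, Facebook and Twitter domains from "Disconnect" move to Social.
SOCIAL_COMPANIES = {"Facebook", "Twitter"}


def remap_disconnect(blacklist, google_mapping):
    """Return a copy of blacklist with the "Disconnect" category redistributed.

    Assumptions: entries look like {company: {url: [domains]}}, and
    google_mapping is a plain {domain: category} dict.
    """
    result = copy.deepcopy(blacklist)
    categories = result["categories"]
    for entry in categories.pop("Disconnect", []):
        for company, urls in entry.items():
            for url, domains in urls.items():
                if not isinstance(domains, list):
                    continue  # skip non-domain keys such as a DNT flag
                for domain in domains:
                    if company in SOCIAL_COMPANIES:
                        target = "Social"
                    elif company == "Google" and domain in google_mapping:
                        target = google_mapping[domain]
                    else:
                        continue  # this sketch drops anything it cannot map
                    categories.setdefault(target, []).append({company: {url: [domain]}})
    return result
```

Working on a deep copy leaves the input untouched, which makes it easy to compare the lists before and after remapping during review.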

Entity list

Tracking Protection technically works by blocking loads from domains on the blacklist, but the Entity list shifts the concept from domains to companies. When a user visits a website, they are engaging one-on-one with that site's owner; Tracking Protection blocks the other companies that the user may not even realize are present and did not explicitly intend to interact with.

Tracking Protection blocks loads from domains on the blacklist only when they are third-party. The Entity list whitelists groups of domains that are wholly owned by the same company. For example, if abcd.com owns efgh.com and efgh.com is on the blacklist, efgh.com will not be blocked on abcd.com; instead, it is treated as first-party there, since the same company owns both. Because efgh.com is on the blacklist, however, it will still be blocked as a third party on sites that are not owned by the same parent company.
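The abcd.com/efgh.com example can be expressed as a small decision function. This sketch assumes each entity is recorded with `properties` (sites it owns) and `resources` (domains it serves); that format, the entity name "ABCD Inc.", and the helper names are hypothetical. The first-party test is deliberately simplified to a suffix match, whereas a browser would compare registrable domains.

```python
def matches(host, domain):
    """True if host is domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def same_entity(site, resource, entities):
    """True if a single entity owns the top-level site and also lists the resource."""
    for entity in entities.values():
        owns_site = any(matches(site, d) for d in entity.get("properties", []))
        owns_resource = any(matches(resource, d) for d in entity.get("resources", []))
        if owns_site and owns_resource:
            return True
    return False


def should_block(site, resource, blacklist, entities):
    """Block a blacklisted resource only when it is third-party to the site."""
    if not any(matches(resource, d) for d in blacklist):
        return False  # not on the blacklist at all
    if matches(resource, site) or same_entity(site, resource, entities):
        return False  # first-party, either directly or via the Entity list
    return True


# Hypothetical entity owning both domains from the example above.
ENTITIES = {"ABCD Inc.": {"properties": ["abcd.com", "efgh.com"],
                          "resources": ["abcd.com", "efgh.com"]}}
BLACKLIST = {"efgh.com"}

print(should_block("abcd.com", "efgh.com", BLACKLIST, ENTITIES))          # False
print(should_block("news.example", "efgh.com", BLACKLIST, ENTITIES))      # True
print(should_block("www.abcd.com", "cdn.efgh.com", BLACKLIST, ENTITIES))  # False
```

Note that the Entity list never removes a domain from the blacklist; it only changes whether a given load counts as third-party.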

Updating

This repo is configured with Travis CI builds that run the scripts/json_verify.py script to verify that all changes to the lists in pull requests are valid.

This Travis CI status check must pass before any commit can be merged or pushed to master.

Making changes to the format

When making changes to the list formats, corresponding changes to the scripts/json_verify.py script must also be made.

To help validate the validator (such meta!), use the list fixtures in the tests directory. Run the script against a specific file like this:

$ ./scripts/json_verify.py -f <filename>
$ ./scripts/json_verify.py -f tests/disconnect_blacklist_valid.json

tests/disconnect_blacklist_valid.json : valid

$ ./scripts/json_verify.py -f tests/disconnect_blacklist_invalid.json

tests/disconnect_blacklist_invalid.json : invalid
Facebook has bad DNT value: bogus

$ ./scripts/json_verify.py -f tests/google_mapping_invalid.json

tests/google_mapping_invalid.json : invalid
Error: Expecting property name: line 1 column 2 (char 1)
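For readers changing the list format, here is a minimal sketch of a validator with the same `-f <filename>` interface and the same "valid"/"invalid" report style. It is not the contents of scripts/json_verify.py: the DNT rule, the accepted values in `VALID_DNT`, and all function names are assumptions. Under Python 3 the JSON parse error text also differs slightly from the output above (it reads "Expecting property name enclosed in double quotes").

```python
import argparse
import json

# Assumed set of acceptable DNT markers; the real script's rules may differ.
VALID_DNT = {"w3c", "eff"}


def check_dnt(data):
    """Return one error string per company entry with an unexpected "dnt" value."""
    errors = []
    for entries in data.get("categories", {}).values():
        for entry in entries:
            for company, details in entry.items():
                if not isinstance(details, dict):
                    continue
                dnt = details.get("dnt")
                if dnt is not None and dnt not in VALID_DNT:
                    errors.append(f"{company} has bad DNT value: {dnt}")
    return errors


def validate(text):
    """Return a list of problems in the JSON text; an empty list means valid."""
    try:
        data = json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return [f"Error: {exc}"]
    if not isinstance(data, dict):
        return ["Error: top-level value is not a JSON object"]
    return check_dnt(data)


def verify(path):
    """Print '<path> : valid' or '<path> : invalid' plus problems; return True if valid."""
    with open(path, encoding="utf-8") as f:
        problems = validate(f.read())
    print(f"{path} : {'invalid' if problems else 'valid'}")
    for problem in problems:
        print(problem)
    return not problems


def main(argv=None):
    """Command-line entry point mirroring the -f <filename> usage; returns an exit code."""
    parser = argparse.ArgumentParser(description="Sketch of a list validator")
    parser.add_argument("-f", "--file", required=True, help="JSON list file to check")
    args = parser.parse_args(argv)
    return 0 if verify(args.file) else 1
```

Calling `sys.exit(main())` under an `if __name__ == "__main__":` guard would turn an invalid file into a non-zero exit code, which is what lets a CI status check such as the Travis build fail a pull request.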

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.