Reference: G. Acar, M. Juarez, N. Nikiforakis, C. Diaz, S. Gürses, F. Piessens and B. Preneel. FPDetective: Dusting the Web for Fingerprinters. In Proceedings of CCS 2013, Nov. 2013.
Visit FPDetective on GitHub for source code and releases.
FPDetective is designed as a flexible, general-purpose framework that can be used to conduct large-scale web privacy studies. The framework is written in Python, C++ (for the browser modifications) and JavaScript, with MySQL as the database.
Crawler: The crawler features two instrumented browsers, PhantomJS and Chromium. CasperJS and Selenium were used to drive the browsers to websites and navigate through the pages. To build instrumented versions of the browsers, we modified parts of the WebKit source code, which was the rendering engine used by both Chromium and PhantomJS.
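The crawl loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not FPDetective's crawler: `crawl` and the `visit` callable are hypothetical names, and `visit` stands in for an instrumented PhantomJS or Chromium instance driven by CasperJS or Selenium that returns the log lines recorded for one page load.

```python
from typing import Callable, Dict, List


def crawl(domains: List[str], visit: Callable[[str], List[str]]) -> Dict[str, dict]:
    """Visit each homepage once and keep the log lines the browser produced.

    `visit` is a hypothetical stand-in for an instrumented browser driven by
    CasperJS or Selenium: it takes a URL and returns that visit's log lines.
    """
    results: Dict[str, dict] = {}
    for domain in domains:
        # Alexa-style lists contain bare domains, so add a scheme when missing.
        url = domain if domain.startswith(("http://", "https://")) else "http://" + domain
        try:
            results[domain] = {"url": url, "ok": True, "log": visit(url)}
        except Exception as exc:  # record the failure and move on to the next site
            results[domain] = {"url": url, "ok": False, "log": [], "error": str(exc)}
    return results
```

Recording failures per site, rather than letting one exception stop the loop, matters when visiting a million homepages, where some sites will inevitably fail to load.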
Parser: The parser is used to extract relevant data from the logs generated by the crawler, and to store them in the database. It also tags sites with a label if a known fingerprinting script is found in the HTTP requests made for this visit.
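Tagging a visit against known fingerprinting scripts amounts to matching the URLs of its HTTP requests against a list of signatures. In the sketch below the provider labels and URL patterns are placeholders, not the signatures FPDetective actually used, and `tag_visit` is a hypothetical helper.

```python
import re
from typing import Dict, Iterable, List

# Placeholder signatures for illustration only; the real list of known
# fingerprinting scripts is not reproduced here.
KNOWN_FP_SCRIPTS: Dict[str, re.Pattern] = {
    "provider-a": re.compile(r"//cdn\.provider-a\.example/fp[^/]*\.js"),
    "provider-b": re.compile(r"//provider-b\.example/.*fingerprint"),
}


def tag_visit(request_urls: Iterable[str]) -> List[str]:
    """Return the sorted labels of known fingerprinting scripts requested during one visit."""
    labels = set()
    for url in request_urls:
        for label, pattern in KNOWN_FP_SCRIPTS.items():
            if pattern.search(url):
                labels.add(label)
    return sorted(labels)
```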
Intercepting Proxy: In order to obtain Flash files for static analysis, we redirected traffic through mitmproxy, an SSL-capable intercepting proxy. We used the mitmdump module to log all the HTTP traffic passing through the proxy, and the libmproxy library to parse and extract Flash files based on content sniffing.
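Content sniffing for Flash can rely on the SWF header: a SWF file begins with the three-byte signature `FWS` (uncompressed), `CWS` (zlib-compressed) or `ZWS` (LZMA-compressed), followed by a version byte and the uncompressed file length as a little-endian 32-bit integer. The sketch below checks only these header fields; `sniff_swf` is a hypothetical helper, and how it would be hooked into libmproxy to receive response bodies is deliberately left out.

```python
import struct
from typing import Optional, Tuple

# The first three bytes of a SWF file identify how the rest of it is compressed.
SWF_SIGNATURES = {b"FWS": "uncompressed", b"CWS": "zlib", b"ZWS": "lzma"}


def sniff_swf(body: bytes) -> Optional[Tuple[str, int, int]]:
    """Return (compression, version, declared_length) if `body` looks like a SWF file.

    Inspecting the payload itself, instead of the Content-Type header, also
    catches Flash files served with a missing or misleading content type.
    """
    if len(body) < 8 or body[:3] not in SWF_SIGNATURES:
        return None
    version = body[3]
    (length,) = struct.unpack("<I", body[4:8])  # file length, little-endian uint32
    return SWF_SIGNATURES[body[:3]], version, length
```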
Decompiler: We used the JPEXS Free Flash Decompiler to decompile Flash files and obtain the ActionScript source code. The source code is then searched for fingerprinting-related function calls (e.g. enumerateFonts and getFontList) to obtain a binary occurrence vector.
See Appendix B of the paper for the full set of methods and properties searched for in the decompiled source code.
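Building the binary occurrence vector can be sketched as a word-boundary search for each API name in the decompiled ActionScript. Only `enumerateFonts` and `getFontList` come from the text above; `Capabilities.os` is an assumed extra entry, `occurrence_vector` is a hypothetical helper, and the authoritative list is the one in Appendix B of the paper.

```python
import re
from typing import List, Sequence

# Illustrative subset only. The first two names come from the description above;
# "Capabilities.os" is an assumed addition, not taken from the paper's list.
FP_API_NAMES: Sequence[str] = ("enumerateFonts", "getFontList", "Capabilities.os")


def occurrence_vector(source: str, names: Sequence[str] = FP_API_NAMES) -> List[int]:
    """Return a 0/1 vector whose entry i is 1 if names[i] appears in the source."""
    return [
        1 if re.search(r"\b" + re.escape(name) + r"\b", source) else 0
        for name in names
    ]
```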
Central Database: We ran crawls using several machines, but used a central database to store, combine, and analyze the results of different crawls with minimal effort. The stored data include the set of JavaScript function calls, the list of HTTP requests and responses, and the list of loaded or requested fonts. For the Flash experiments, we also stored a binary vector that represents the occurrence of ActionScript API calls that might be related to fingerprinting.
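One possible layout for such a database is sketched below. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server, and every table, column and helper name is hypothetical rather than FPDetective's actual schema. The `crawl_id` column is one way to keep results from different machines apart while still querying them together.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema; sqlite3 stands in for MySQL in this sketch.
SCHEMA = """
CREATE TABLE visit (id INTEGER PRIMARY KEY, site TEXT NOT NULL, crawl_id TEXT NOT NULL);
CREATE TABLE js_call (visit_id INTEGER REFERENCES visit(id), function_name TEXT);
CREATE TABLE http_request (visit_id INTEGER REFERENCES visit(id), url TEXT, status INTEGER);
CREATE TABLE font (visit_id INTEGER REFERENCES visit(id), font_name TEXT, loaded INTEGER);
CREATE TABLE flash_file (visit_id INTEGER REFERENCES visit(id), swf_url TEXT, api_vector TEXT);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_visit(conn: sqlite3.Connection, site: str, crawl_id: str,
              js_calls: Iterable[str], fonts: Iterable[Tuple[str, bool]]) -> int:
    """Store one visit with its JavaScript calls and (font, loaded?) pairs."""
    visit_id = conn.execute(
        "INSERT INTO visit (site, crawl_id) VALUES (?, ?)", (site, crawl_id)
    ).lastrowid
    conn.executemany("INSERT INTO js_call VALUES (?, ?)",
                     [(visit_id, name) for name in js_calls])
    conn.executemany("INSERT INTO font VALUES (?, ?, ?)",
                     [(visit_id, name, int(loaded)) for name, loaded in fonts])
    conn.commit()
    return visit_id


def fonts_per_site(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Count distinct fonts requested per site, across all crawls."""
    return conn.execute(
        """SELECT v.site, COUNT(DISTINCT f.font_name)
           FROM visit v JOIN font f ON f.visit_id = v.id
           GROUP BY v.site ORDER BY v.site"""
    ).fetchall()
```

A per-site font count like `fonts_per_site` is the kind of cross-crawl query that helps spot font probing, since a probing script requests many fonts on a single page.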
Here we present a summary of the results; please consult the paper for details.
Table: Prevalence of Fingerprinting with JavaScript Based Font Probing on Top 1M Alexa sites
With FPDetective we found 404 sites in the Alexa top million that fingerprint visitors on their homepages using JavaScript-based font probing. These scripts are served by 13 different fingerprinting providers, of which only one had been identified in prior research.
Table: Flash Fingerprinting objects with font enumeration, found on Top 10K Alexa websites
Flash-based fingerprinting was present on the homepages of 145 of the top 10,000 sites, indicating that Flash-based fingerprinting is more prevalent than JavaScript-based font probing. This is possibly due to Flash's extended capabilities for font enumeration and proxy detection, and its widespread browser support. Note that the table only includes the Flash fingerprinters that use font enumeration (95 of them).
We found that local fonts loaded by @font-face CSS rules are exempt from the Tor Browser's per-document font cap, and that an unlimited number of system fonts can be loaded using the local() value of the @font-face rule's src descriptor.
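The rules involved look like the ones generated below: each candidate system font gets its own @font-face rule whose src descriptor uses local(), so a page can then render text in each alias and compare the resulting sizes. This is a hedged sketch of the idea only; the `probe-N` family names and the `font_face_probe_css` helper are hypothetical, and the in-browser measurement step is not shown.

```python
from typing import List


def font_face_probe_css(font_names: List[str]) -> str:
    """Build one @font-face rule per candidate system font, each sourced via local()."""
    rules = []
    for i, name in enumerate(font_names):
        # Escape backslashes and double quotes so the name is a valid CSS string.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        rules.append(f'@font-face {{ font-family: "probe-{i}"; src: local("{escaped}"); }}')
    return "\n".join(rules)
```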
Visit http://jsfiddle.net/C4t7w/13/ for a demo and explanation.
Firegloves is a proof-of-concept browser extension for Mozilla Firefox that was created for research purposes. To confuse fingerprinting scripts, Firegloves returns randomized values when queried for certain attributes, limits the number of fonts that a single browser tab can load, and reports false dimension values for the offsetWidth and offsetHeight properties of HTML elements to evade JavaScript-based font detection.
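Two of the Firegloves countermeasures can be modeled in a few lines of Python. This is a simulation of the ideas, not Firegloves' code (which runs inside Firefox as an extension): `TabFontBudget`, `reported_width`, the cap value and the random jitter are all assumptions made for illustration, and the extension may falsify dimensions differently.

```python
import random
from typing import Set


class TabFontBudget:
    """Hypothetical model of a per-tab cap on the number of distinct fonts a page may load."""

    def __init__(self, max_fonts: int) -> None:
        self.max_fonts = max_fonts
        self.loaded: Set[str] = set()

    def request(self, font: str) -> bool:
        """Allow fonts already in use; refuse new ones once the cap is reached."""
        if font in self.loaded:
            return True
        if len(self.loaded) >= self.max_fonts:
            return False
        self.loaded.add(font)
        return True


def reported_width(true_width: int, rng: random.Random, jitter: int = 3) -> int:
    """Return a perturbed offsetWidth (assumed strategy: uniform random jitter)."""
    return max(0, true_width + rng.randint(-jitter, jitter))
```

With the cap in place, fonts probed beyond the budget are refused, and with perturbed widths a size difference against the fallback font no longer reliably signals that a font is installed.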
We set the Do-Not-Track header to 1 in the PhantomJS browser and revisited the websites identified as performing JavaScript-based fingerprinting in our previous experiments. For all of these pages we obtained the same results as before, showing a complete disregard for Do-Not-Track.
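The Do-Not-Track experiment reduces to sending a `DNT: 1` request header and comparing the results with and without it. The sketch below uses `urllib.request` only to show the header; in the study the header was set in PhantomJS. `make_request` and `dnt_ignored` are hypothetical helpers, and representing each site's result as a set of detected fingerprinting scripts is an assumption.

```python
import urllib.request
from typing import Dict, Set


def make_request(url: str, dnt: bool) -> urllib.request.Request:
    """Build a request that carries DNT: 1 only when `dnt` is True."""
    headers = {"DNT": "1"} if dnt else {}
    return urllib.request.Request(url, headers=headers)


def dnt_ignored(without_dnt: Dict[str, Set[str]], with_dnt: Dict[str, Set[str]]) -> Set[str]:
    """Sites that still fingerprint with DNT: 1, exactly as they did without it."""
    return {site for site, found in with_dnt.items() if found and found == without_dnt.get(site)}
```

Note that `urllib` normalizes header names with `str.capitalize()`, so the stored header is looked up as `"Dnt"`.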
This study was performed by KU Leuven researchers from the iMinds security department: Gunes Acar (COSIC), Marc Juarez (COSIC & IIIA-CSIC), Nick Nikiforakis (DistriNet), Claudia Diaz (COSIC), Seda Gürses (COSIC & NYU), Frank Piessens (DistriNet), Bart Preneel (COSIC).