What A Website Knows About You

All the cool web sites you visit need to know the properties of your “user agent” (today, that is your web browser) in order to optimize the pages and transactions they deliver to your device. This information (browser and version, operating system and version, etc.) is sent to the web site in the User-Agent header field of each request. It helps web application developers improve the end-user experience, but like a lot of conveniences, this type of information about your system can also be a privacy concern.
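To make this concrete, here is a rough Python sketch of how a server-side script might pull the browser and operating system out of a User-Agent string. The sample string and the regular expressions are assumptions for illustration only; real User-Agent strings are messy, and production sites usually rely on dedicated parsing libraries rather than a few patterns like these.

```python
import re

# A typical desktop Chrome User-Agent string (sample value, for illustration only).
SAMPLE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)


def parse_user_agent(ua: str) -> dict:
    """Very rough extraction of platform and browser from a User-Agent string."""
    # The first parenthesised group usually describes the platform/OS.
    platform_match = re.search(r"\(([^)]*)\)", ua)
    platform = platform_match.group(1) if platform_match else "unknown"

    # Check the more specific tokens first: Edge and Chrome strings also
    # contain "Safari/", so the order of these patterns matters.
    browser, version = "unknown", ""
    for name, pattern in [
        ("Edge", r"Edg/([\d.]+)"),
        ("Chrome", r"Chrome/([\d.]+)"),
        ("Firefox", r"Firefox/([\d.]+)"),
        ("Safari", r"Version/([\d.]+).*Safari/"),
    ]:
        m = re.search(pattern, ua)
        if m:
            browser, version = name, m.group(1)
            break

    return {"platform": platform, "browser": browser, "version": version}


print(parse_user_agent(SAMPLE_UA))
# {'platform': 'Windows NT 10.0; Win64; x64', 'browser': 'Chrome', 'version': '120.0.0.0'}
```

Even this crude parser shows how much a single header gives away before any script runs on the page.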

As it turns out, your browser and associated add-ons can provide a lot of information to the web site that wants to request and use it. Here are some of the data that can be acquired during your visit to a web site:

  • Browser & version
  • Operating system and version
  • Device manufacturer and model of your device
  • Screen resolution
  • System fonts
  • Current viewport (size of the useful browser window)
  • Color depth (bits per pixel)
  • JavaScript, Java, Flash, Silverlight, etc. (available)
  • Cookies (available)
  • Super cookies (available)
  • IP Address
  • Geo-location (latitude, longitude)
  • Language (language identification code)
  • Time zone
  • Facebook, Twitter, Google+, Google login status
  • Ad blockers (active)
  • Connection speed (measured on the server)
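  • MAC address (physical network device address; normally visible only on your local network, although older IPv6 addressing schemes could embed it in your IP address)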
  • Plugins (a list of installed plugins)
  • Proxy (is a proxy being used?)

Because mobile browsers have far fewer capabilities than desktop versions, and because mobile device manufacturers learned from the security and privacy problems that desktop systems suffered, less information is available to sites about mobile systems. But that does not stop many stakeholders from trying to track your visits (see the UIDH notes below).

If you would like to see what data your system is potentially exposing to sites, you can visit BrowserSpy or the Electronic Frontier Foundation’s Panopticlick website.

It is pretty clear that by combining information such as IP address, geo-location, device model, language code and time zone, web sites can readily identify your device uniquely. This is called a digital fingerprint, and we will discuss it further later in this article.
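Here is a minimal sketch of the idea in Python, assuming the site has already collected a handful of attributes: serialise them in a fixed order and hash the result, so the same combination always yields the same identifier. The attribute names and values are hypothetical, and commercial fingerprinting products use their own (usually proprietary) algorithms rather than a single plain hash.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of device attributes into a stable identifier."""
    # sort_keys=True makes the serialisation independent of dict insertion
    # order, so identical attributes always produce an identical hash.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attributes a site might gather during one visit.
visit = {
    "ip": "203.0.113.7",
    "device_model": "Pixel 7",
    "language": "en-US",
    "time_zone": "America/New_York",
    "screen": "1080x2400",
}

print(fingerprint(visit)[:16])  # first 16 hex characters of the identifier
```

A plain hash like this changes completely if any single attribute changes (a new IP address, for instance); coping with that requires the fuzzier matching discussed under Resurrecting Cookies.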

Cookies

In addition to the above information, a browser can store a number of cookies. Cookies are small data records that a website writes to the browser so that it can identify you during and after your visit. Cookies have practical uses. Because HTTP itself does not remember the “state” of a session (for example, which step of a purchase you have reached), sites use cookies to keep track of your activity. In a banking transaction, for example, the site needs cookies to keep track of where you are in a workflow. This is why, when you disable cookies in the browser, you cannot use applications like banking or shopping. Cookies used for authentication and banking can be encrypted and signed by the web site so that they cannot be read or tampered with.
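Here is a minimal Python sketch of the signing half of that idea, using an HMAC so the site can detect a cookie that has been altered in the browser. The secret key, cookie name and session value are hypothetical. Signing alone stops tampering but not reading; a site that also wants to hide the contents would encrypt the value as well, which is not shown here.

```python
import hashlib
import hmac
from http.cookies import SimpleCookie
from typing import Optional

SECRET_KEY = b"server-side-secret"  # hypothetical; a real key is random and kept private


def sign(value: str) -> str:
    """Append an HMAC-SHA256 tag so the server can detect tampering."""
    tag = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{value}.{tag}"


def verify(signed: str) -> Optional[str]:
    """Return the original value if the tag is valid, otherwise None."""
    value, _, tag = signed.rpartition(".")
    expected = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    if hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return value
    return None


# Build the Set-Cookie header a server might send to track a checkout step.
cookie = SimpleCookie()
cookie["session"] = sign("user42:checkout-step-3")
cookie["session"]["httponly"] = True  # not readable from page JavaScript
cookie["session"]["secure"] = True    # only sent over HTTPS
print(cookie.output())
```

If a user (or malware) edits the cookie to skip a step, `verify` returns `None` and the server can reject the request.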

Of course, businesses also use cookies for advertising, tracking your visits to their sites so they can serve up specific ads and choices. As a result, cookies can often hold information you would like to keep private.

Private Browsing

Most browsers now have an option for private browsing. The browser vendors make a big deal about this, but the only difference is that the browser does not keep data such as cookies, and it clears the browsing history once the private browsing session has ended.

Except for cookies, private browsing does nothing to limit the information listed above. So if sites are using digital fingerprinting, private browsing does not affect them.

Super Cookies

Because advertisers and others still want to track visitors who use private browsing, they have been working around browsers’ privacy settings ever since those settings were introduced, using what have been termed “super cookies”. These mechanisms are not implemented as ordinary cookies, but they have the same objective: tracking users’ activity on sites.

The first versions of super cookies stored their tracking information outside the browser’s cookie storage on the user’s system, which put the tracking data outside the control of the browser. As an example, in 2009 Adobe Flash could store large and comprehensive user behavior data in Local Shared Objects (LSOs) that persisted in the Flash plug-in. These super cookies were purposely stored in areas that are difficult to find and delete, which appealed to marketers because ordinary users did not have the technical skills to use utilities to remove them. In 2010 Adobe released a version of Flash that supported the privacy modes of IE, Firefox, Chrome, and Safari: LSOs created in privacy mode are discarded at the end of the session, and those created in a regular session are not accessible in privacy mode. Some games use LSOs to store your game data, so disabling them entirely is not an option for some users. The Firefox add-on BetterPrivacy-signed tracks and selectively manages this type of super cookie.

Perma-Cookies (aka Unique Identifier Header)

In 2014 the EFF found that some mobile carriers, notably Verizon, were inserting an HTTP header field carrying a unique identifier header (UIDH) that identified the device as belonging to a Verizon mobile customer. The Verizon network added this field to mobile internet traffic after it left the device, so users could not stop it. Although Verizon claimed the data was not in itself personally identifiable information (PII), it certainly revealed information about the device and the data plan being used, which advertisers could exploit. In 2016 the FCC fined Verizon for not informing customers about the injection of the UIDH, and there is now a process for opting out of the scheme.
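The mechanism is simple enough to sketch. The Python function below plays the part of a network middlebox that rewrites an unencrypted HTTP request on its way to a web server by adding one extra header line. The header name X-UIDH matches what was reported for Verizon’s scheme, but the identifier value is made up, and this is an illustration of the idea, not the carrier’s actual implementation. A middlebox can only do this to plain HTTP; HTTPS traffic is encrypted end to end and cannot be rewritten this way.

```python
def inject_header(raw_request: bytes, name: str, value: str) -> bytes:
    """Insert an extra header into a raw HTTP/1.1 request (illustration only)."""
    # Headers end at the first blank line (CRLF CRLF); any body follows it.
    head, sep, body = raw_request.partition(b"\r\n\r\n")
    extra = f"\r\n{name}: {value}".encode("ascii")
    return head + extra + sep + body


request = (
    b"GET /news HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"User-Agent: ExampleBrowser/1.0\r\n"
    b"\r\n"
)

# Hypothetical identifier value; the device that sent the request never sees it.
tagged = inject_header(request, "X-UIDH", "hypothetical-id-123")
print(tagged.decode("ascii"))
```

Every site the phone visits over plain HTTP receives the same extra line, which is what makes it usable as a tracking identifier.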

Because a device’s UIDH stayed the same for a period of time before it was changed, sites that mapped UIDHs to their own browser cookies could “resurrect” those cookies (even after private browsing) and detect your return to their sites. This is why the UIDH is a personal privacy concern.

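The only sure way to avoid this type of tracking scheme is to use a VPN, which encrypts your traffic as it crosses the carrier’s network. Because the header could only be injected into unencrypted HTTP traffic, sites reached over HTTPS did not receive it.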

Resurrecting Cookies

We mentioned fingerprinting above: a technique that extracts various parameters from a device visiting a site and uses them to create a unique fingerprint of that device. Security companies develop these techniques and supply them to credit card processors, banks and retailers to detect online fraud. Each of these software packages has its own algorithm for creating the fingerprints.

In most cases, when you shop online with your credit card, your transaction will eventually pass through a system that fingerprints your device and matches your card against the fingerprints of the devices you normally use. If you are using a different device or are on a new network, the system may raise the risk score of your transaction. A higher risk score may trigger a requirement for additional authentication factors, such as entering a PIN or a code sent to you via SMS. These systems also track the fingerprints of devices used in fraudulent and attempted fraudulent transactions. For example, if they see a device with the same fingerprint making several transactions on different cards within a short period of time, they will raise an alert.
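A toy version of this kind of risk scoring, as a hedged Python sketch: it raises the score when a card is used from an unfamiliar device fingerprint or network, asks for an extra factor above a threshold, and flags a fingerprint that uses several different cards within a short window. Every weight, threshold and name here is a hypothetical choice for illustration; real fraud systems use their own, far richer models.

```python
from collections import defaultdict

STEP_UP_THRESHOLD = 50   # hypothetical: above this, ask for a PIN or SMS code
VELOCITY_WINDOW = 600    # hypothetical: 10-minute window, in seconds
VELOCITY_MAX_CARDS = 3   # hypothetical: more distinct cards than this raises an alert

known_devices = defaultdict(set)   # card -> fingerprints seen before
known_networks = defaultdict(set)  # card -> networks seen before
recent_use = defaultdict(list)     # fingerprint -> [(timestamp, card), ...]


def assess(card: str, fp: str, network: str, now: float) -> dict:
    """Score one transaction; `now` is a timestamp in seconds."""
    score = 0
    if fp not in known_devices[card]:
        score += 40  # unfamiliar device for this card
    if network not in known_networks[card]:
        score += 20  # unfamiliar network for this card

    # Velocity check: how many different cards has this device used recently?
    recent = [(t, c) for t, c in recent_use[fp] if now - t <= VELOCITY_WINDOW]
    recent.append((now, card))
    recent_use[fp] = recent
    alert = len({c for _, c in recent}) > VELOCITY_MAX_CARDS

    return {"score": score, "step_up": score > STEP_UP_THRESHOLD, "alert": alert}


def remember(card: str, fp: str, network: str) -> None:
    """Record a device and network once the cardholder has been verified."""
    known_devices[card].add(fp)
    known_networks[card].add(network)


remember("card-A", "fp-laptop", "home")
print(assess("card-A", "fp-laptop", "home", now=0.0))  # familiar device: no step-up
print(assess("card-A", "fp-phone", "cafe", now=10.0))  # new device and network: step-up
```

Keeping the velocity history per fingerprint rather than per card is what lets the system notice one device cycling through many cards.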

This same technology can be used by web sites to “rebuild” a cookie when you visit without any of your old cookies. They fingerprint your device, send you a cookie and keep a copy of that cookie on their side. When you visit again, they fingerprint you again, look up your previous cookie in their database and use it while you are on their site. The only way to avoid these zombie cookies is to change the parameters that the sites use to create the fingerprint enough that your system looks different. This can be difficult, because these algorithms can use data analytics tools that identify you with a high degree of confidence even if a couple of items, such as your IP address, have changed. This is similar to facial recognition systems that can recognize your face in profile even if they started from a full-face photo. Changing some of the parameters reduces their confidence, but it is not a full disguise.
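The following hedged Python sketch illustrates the “rebuild a cookie” idea and why changing one or two parameters is not enough. Instead of requiring an exact fingerprint match, it compares a cookieless visitor’s attributes with stored profiles and reuses the old cookie ID if a large enough fraction of attributes agree. The 0.8 threshold, the attribute names and the simple equal-weight matching rule are assumptions; real systems weight attributes differently and use their own algorithms.

```python
import uuid
from typing import Optional

MATCH_THRESHOLD = 0.8  # hypothetical: fraction of attributes that must agree

profiles = {}  # cookie ID -> attributes seen when that cookie was issued


def similarity(a: dict, b: dict) -> float:
    """Fraction of attribute names (from either profile) whose values are equal."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(1 for k in keys if a.get(k) == b.get(k)) / len(keys)


def cookie_for(visitor: dict, cookie: Optional[str] = None) -> str:
    """Return the visitor's cookie ID, resurrecting an old one if possible."""
    if cookie in profiles:
        return cookie
    # No known cookie: look for the closest stored profile.
    best_id, best_score = None, 0.0
    for cid, attrs in profiles.items():
        score = similarity(visitor, attrs)
        if score > best_score:
            best_id, best_score = cid, score
    if best_id is not None and best_score >= MATCH_THRESHOLD:
        return best_id  # "zombie" cookie: same ID as before
    new_id = uuid.uuid4().hex
    profiles[new_id] = dict(visitor)
    return new_id


visit = {"ip": "203.0.113.7", "device_model": "Pixel 7", "language": "en-US",
         "time_zone": "America/New_York", "screen": "1080x2400"}
first = cookie_for(visit)                            # first visit: new ID issued
again = cookie_for({**visit, "ip": "198.51.100.9"})  # cookies cleared, new IP
print(first == again)  # True: 4 of 5 attributes still match
```

Changing the IP address alone leaves four of five attributes intact, so the old ID comes back; only when enough attributes differ does the visitor look like a new device.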

One way to resist this kind of tracking is to prevent JavaScript from running on your machine, since JavaScript is often used to collect the fingerprinting parameters. Disabling JavaScript is a tough choice because legitimate sites use it heavily. The NoScript Firefox extension lets you block JavaScript, Java and plugin execution by default and enable it selectively for sites that you trust. The Tor Browser also includes NoScript, along with a number of other extensions that deal with this type of active content.
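NoScript’s own rules are richer than this, but the basic policy of default-deny with a per-site allowlist can be sketched in a few lines of Python. The site names are hypothetical, and the rule that a trusted domain also covers its subdomains is an assumption of this sketch, not a description of how NoScript matches sites.

```python
from urllib.parse import urlsplit

# Hypothetical sites the user has chosen to trust; everything else is blocked.
TRUSTED_SITES = {"mybank.example", "docs.example.org"}


def scripts_allowed(url: str) -> bool:
    """Default-deny: allow scripts only for trusted sites and their subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == site or host.endswith("." + site) for site in TRUSTED_SITES)


for url in [
    "https://mybank.example/login",
    "https://www.mybank.example/accounts",
    "https://tracker.example.net/fp.js",
    "https://mybank.example.evil.test/",
]:
    print(url, "->", "allow" if scripts_allowed(url) else "block")
```

The last URL shows why the match is done on whole domain labels rather than a plain substring test: a look-alike host that merely contains a trusted name is still blocked.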

Keep Current

The search for internet privacy is an ongoing arms race between you and those who want to know what you are doing, and it will not end in the near future. The EFF does a good job of tracking the latest trends, and this article from them describes some of the research techniques touched on here, which are used at both ends of the privacy spectrum.

4 thoughts on “What A Website Knows About You”

  1. Hello,
    Excuse me for asking this on this page.
    What is your opinion about Instapaper?
    I use it for bookmarking and reading.
    Is there another, more privacy-friendly website?
    Thank you very much 🙂

    1. It seems like a reasonable company, and I’ve used it in the past, but I won’t put anything confidential on Instapaper. I recommend Wallabag, which is an open source Instapaper replacement.

  2. Well, the webmasters undoubtedly know a lot when we visit their website. To be frank, private browsing is a lie. I mean, yes, it does help you browse sites without storing them in your browser history, but it doesn’t prevent webmasters from knowing those crucial details. But that’s the way the web works, and there’s very little we can do about it.

    1. I agree that maintaining privacy is tough work, but it is important not to think of it as a losing proposition. The article explains that there are tools, like Firefox add-ons, that stop JavaScript from running in your browser unless you allow it for specific sites. This single utility can greatly reduce the capabilities of digital fingerprinting. Using these supported tools makes you more difficult to track; they are actively improved, and they are transparent about what they can and cannot do.
