Pandora’s Checkbox

The Information Age brought with it a cliché: that unread agreement you dismiss to get to the software you need to use. There’s no way you’re going to read it. For example, macOS High Sierra comes with a software license agreement that runs to 535 pages in PDF form and contains (by my count) 280,599 words of intensely detailed yet maddeningly vague legal language. On that operating system, Apple Music has another license, and the App Store has yet another, and so on.

Making a fully informed decision would take thousands of dollars in consulting fees with a lawyer; the alternative is to proceed regardless. So you proceed. You always have. Each little app, website, or gizmo peppers you with a new set of terms and conditions. Each upgrade throws in a few extra clauses, and you agree again.

You’re not a fool. You assume you’re signing away rights and control you’d rather keep. That comes with the bargain. You try to skim the terms and conditions, and the deal feels a bit more Faustian all the time: mandatory binding arbitration, data collection, disclaimers of liability, and so on.

None of this is really news to you if you’ve dug into it. You’re not really in possession of your software; you’ve merely licensed the use of it. You can’t really hold them responsible for flaws; you agreed to accept the software as is. You can’t really control what information they collect about you; you hand that over and get a free or discounted product in return.

However, where things get slippery is that a company with which you’ve entered into a transaction has also signed agreements with yet other companies. Worked into those overwrought terms and conditions you clicked through, with their vague-yet-precise language, are clauses ensuring that you’ve already agreed to these subsequent proxy agreements as well.

What the T&C often allow is for your data to commingle at some broker whose name you’ve never heard of. This commonly happens whenever a transaction involves an entity responsible for handling money.

Say that you learn about a subscription service called Company A. You find them in your web browser or your mobile app, and you sign up, agreeing to their T&C. Then you ask to subscribe to Scarf Facts, a daily e-mail about scarves, or whatever it is Company A does. They in turn ask for your credit card info, your billing address, and maybe a few other demographic details about you.

Company A turns to Company B to determine how risky you are. To do this, they ship off some information about you. If you used a mobile app, they’re possibly reading off what Wi-Fi networks are nearby, what Bluetooth devices are nearby, what apps are installed on your phone, what IP addresses you’re using, what fonts you have installed, and a wealth of other information. If you used a browser, the information is similar but more limited. You’re being geographically located in either case. The headers from your browser are sent. The last website you were at before visiting Company A is probably sent.

Company B collects this information and compares it to all the other data it has on millions of other requests it’s collected from other companies. It has no real duty to sequester Company A’s data from Company Z’s (neither of which knows anything about the other), and by putting it all together, it can detect patterns better. For example, it may have the ability to know where you are, even if you are behind a proxy. It may be able to track your traffic across the Internet as you move from Company A to Company Z and so on, because the details it receives are usually enough to uniquely identify you. It needs no cookies or other storage on your end for this.
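
To make that concrete, here’s a minimal sketch of the technique. It’s my own illustration rather than any real broker’s code, and the handful of signals it reads are only examples, but every API it touches is an ordinary browser API available to any script a page loads:

```typescript
// A hypothetical fingerprinting sketch. Every API used here is an
// ordinary browser API, available to any script a page loads.
async function fingerprint(): Promise<string> {
  const signals = [
    navigator.userAgent,                                      // browser and OS
    navigator.language,                                       // preferred language
    Intl.DateTimeFormat().resolvedOptions().timeZone,         // coarse location hint
    `${screen.width}x${screen.height}x${screen.colorDepth}`,  // display shape
    String(navigator.hardwareConcurrency),                    // CPU core count
  ].join("|");

  // Hash the combined signals into a compact, stable identifier.
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(signals)
  );
  return Array.from(new Uint8Array(digest))
    .map((byte) => byte.toString(16).padStart(2, "0"))
    .join("");
}

// The same browser on the same device tends to produce the same ID
// on every site running this code; no cookie is ever written.
fingerprint().then((id) => console.log(id));
```

Change devices or browsers and the ID changes, but the same setup tends to follow you from site to site.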

This means that Company B has the role of an invisible data broker whose job it is to assess fraud risk on behalf of companies. The more clients it has feeding it data, the stronger its signals become, so Company B is incentivized to gather as many sources of data as possible, and it wants those data to be as rich and as frequently updated as possible.

Company A gets back something like a score from Company B indicating how much risk you pose: whether you’re likely to try to scam them out of free services, or whether you’re even human at all. Assuming you’re fine, Company A then sends your info off to Company C, a credit card processor, which is the party actually responsible for charging you money and passing it along to Company A.
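
No real broker’s API is being described here; the endpoint, names, and fields below are all invented. But a sketch of Company A’s side of the exchange might look something like this:

```typescript
// Hypothetical request and response shapes for a fraud-scoring broker.
// The endpoint "broker.example" and every field name are invented.
interface RiskRequest {
  email: string;
  ipAddress: string;
  userAgent: string;
  referrer?: string;
  billingPostalCode?: string;
}

interface RiskResponse {
  score: number;      // e.g. 0 (benign) through 100 (almost certainly fraud)
  likelyBot: boolean; // whether the visitor appears automated
}

// Company A's side of the exchange: send what it knows, get a score back.
async function checkRisk(details: RiskRequest): Promise<RiskResponse> {
  const response = await fetch("https://broker.example/v1/score", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(details),
  });
  return (await response.json()) as RiskResponse;
}
```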

Company C collects data as well because they bear the greatest risk in this transaction. They gather data themselves, and they’re almost certainly using a data broker of some kind too: either Company B or, more likely, some other Company D.

These interactions happen quite quickly and, usually, smoothly. In a few seconds, enough info about you to identify your browsing patterns and correlate you with your purchase of Scarf Facts has now been aggregated by one or two data brokers.

These brokers sell their services to companies hoping to prevent fraud, and they make money because they are able to draw from ever larger sources of traffic and gain a clearer picture of the Internet. You agreed to this, but I doubt it was clear to you that entities other than you and Company A were involved.

If you’re wondering whether this is really happening, it is: this sort of collection has become increasingly common as businesses have tried to compete with one another by reducing friction around their sign-up processes. Simple CAPTCHAs have not been enough to hold back the tide of automated and human attempts to overwhelm businesses large and small that sell services and goods online, and those businesses have turned to data-based solutions to fight back. We can’t wind back the clock to a simpler time.

Unfortunately, most people are uninvolved and have become bycatch in the vast nets we’ve spun. It is likely, as time goes on, that the brokers who collect and analyze the data collected this way will try to sell them, or analyses of them, to profit in other ways. The value of these data increases as they become more representative of the traffic of the Internet as a whole.

I’m not asking you to stop and read the T&C on the next website you sign up for. That’s never going to be practical. But now you know about another piece of your soul you’re possibly chipping off in return for clicking “Accept.”

Privacy Policy Updates: Data Storage

I updated WordPress today to version 4.9.6. I noticed this version comes with support for implementing privacy policies throughout the site. I seem to have been ahead of the curve in implementing my own, but when the General Data Protection Regulation (GDPR) comes into effect in the EU this month, it will clarify and simplify data privacy for much of Europe. This implies enforcement will become a more direct matter as well. Any web service that is accessible from Europe and does business there has now updated its privacy policy to ensure it complies with the GDPR, which is why everyone has received a raft of privacy policy updates.

Most of these privacy policy updates pertain to what rights customers or users have to their own data. Often, they grant new rights or clarify existing rights. This week’s new version of WordPress is yet another GDPR accommodation.

Today, I have to announce my own GDPR update. Yes, I’m just a tiny website no one reads, and I provide no actual services. But having already committed to a privacy policy, which I promised to keep up to date (and to announce any changes), I’m here to make another update.

One nice thing that came with the WordPress update is a raft of suggestions on a good privacy policy (and in what ways WordPress and its plugins may cause privacy concerns). I found that I had covered most of them, but one thing I needed to revisit was a piece of functionality in Wordfence.

I use Wordfence for security: It monitors malware probes and uses some static blacklists of known bad actors. It also, by default, sent cookies to browsers in order to track which visitors were recurring ones and which were automated clients. The tracking consisted only of an anonymous, unique token which distinguished visitors from one another. Unfortunately, this functionality had no opt-out and did not respect Do Not Track.
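
For contrast, here’s a sketch of what honoring those signals could look like. This is my own illustration, not Wordfence’s code, and hasConsented() is a hypothetical stand-in for however a site records a visitor’s consent:

```typescript
// A sketch of consent-respecting visitor tracking, run in the browser.
// hasConsented() is a hypothetical stand-in for however a site records
// that the visitor agreed to be tracked.
declare function hasConsented(): boolean;

function maybeSetVisitorCookie(): void {
  // Honor the browser's Do Not Track signal before anything else.
  const dnt = (navigator as Navigator & { doNotTrack?: string | null }).doNotTrack;
  if (dnt === "1") return;

  // Under the GDPR, a tracking cookie also requires prior consent.
  if (!hasConsented()) return;

  // Only then assign an anonymous, unique token to this browser.
  const token = crypto.randomUUID();
  document.cookie = `visitor=${token}; Max-Age=31536000; SameSite=Lax; Secure`;
}
```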

Although my tracking was only for security purposes, not for advertising, and although it did not store any personal information, nor did I share it with anyone else, I realized I would have to disable it.

I had made explicit mention of this tracking in my previous revision of my privacy policy:

I run an extra plugin for security which tracks visits in the database for the website, but these are, again, stored locally, and no one has access to these.

This is unfortunately more vague than it should have been, since it doesn’t mention cookies. It also makes no provision for consent. It merely states the consequences of visiting my site.

The GDPR makes it clear that all tracking techniques (and specifically cookies) require prior consent. Again, I’m not a company, and I don’t provide any service. I’m not even hosted in the EU’s jurisdiction. My goal, though, is to exist as harmoniously with my visitors as possible, whoever they may be, and to have the lightest possible touch.

So I’ve disabled Wordfence’s cookie tracking. I’ve added a couple of points to my privacy policy which clarify more precisely which data is logged and under which circumstances cookies may be sent to the browser.

This interferes with my analytics, unfortunately; it’s no longer possible to be sure which visitors are human. I think it’s worth it, regardless.

I also made a couple of other changes based on WordPress’s suggestions. I moved a few bullet points around to bring together points which are more logically grouped. I also added a point which specifies which URL my site uses (meaning the policy would be void if viewed in an archived format, within a frame, or copied elsewhere).

Privacy Policy Update: No Mining

I got a weird spam e-mail overnight asking if I wanted to embed someone’s cryptocurrency miner into my website. They purport to be opt-in only, but all the other examples I’ve read about online up to now have been surreptitious, hijacking the browser for their own ends without asking. The end user only notices when their computer’s fans switch on or the machine gets too hot.

Such mining scripts have been highly contentious on other websites. They exert excessive and unilateral control over the visitor’s system. I certainly had such things in mind when I promised never to embed ads and the like in my website, but I had never spelled out that I had no intention of hijacking the browser for my own ends (ad or not).

This morning, I added a new point to my privacy policy.

  • This website does not load software in the user agent (your browser) which serves any purpose beyond displaying the website and its assets—meaning it does not use your browser to mine cryptocurrency, for example.

Most of my privacy policy describes what the website does without mentioning the browser. This point adds a clear expectation for browsers which visit.
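
A promise like this can also be backed up technically. One option, shown here as a sketch using Express (a common Node.js web framework) rather than anything this site necessarily runs, is a Content-Security-Policy header that makes the browser refuse scripts from any origin but the site’s own:

```typescript
// A sketch using Express to attach a Content-Security-Policy header to
// every response. With this policy, the browser itself refuses to run
// scripts from any other origin, so an embedded third-party miner
// script would simply never load.
import express from "express";

const app = express();

app.use((_req, res, next) => {
  res.setHeader(
    "Content-Security-Policy",
    "default-src 'self'; script-src 'self'"
  );
  next();
});

app.use(express.static("public")); // serve the site's own files only
app.listen(8080);
```

The nice property of this approach is that the visitor’s browser does the enforcing: even a compromised plugin that injected a foreign script tag would be stopped at load time.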

I generalized the point a bit to include things which aren’t just cryptocurrency miners. It might be tempting to grab a few of my users’ cycles for SETI@home or the like, but if a user wants to contribute to a project like that, they can do so themselves. I’ll have to rely on persuasive words to bring people around to a cause like that.

Adding a Privacy Policy

I’ve decided to give my website a privacy policy. It’s maybe more of a privacy promise.

It might sound strange to make a privacy policy for a website with which I don’t intend users to interact, but I’ve realized that even browsing news websites or social media has privacy implications for users who visit them. So I wanted to state what assurances users can have when visiting my website—and set a standard for myself to meet when I make modifications to my website.

Most of the points in it boil down to one thing: if you visit my site, that fact remains between you and my site. No one else will know, not Google, not Facebook, not your ISP, not the airplane Wi-Fi you’re using, not some ad network.

I went to some trouble to make these assurances. For example, I had to create a WordPress child theme which prevents loading the stylesheets for the Google Fonts the theme uses by default. Then, since I still wanted to use some of those fonts, I needed to check the licensing on them, download them, convert them to a form I could host locally, and incorporate them into a stylesheet on my own server.

I also needed to audit the source code for all the WordPress plugins I use to see what requests they make, if any, to other parties (and I’ll have to repeat this process if I ever add a new plugin). This was more challenging than I realized.

I needed to ensure I had no malware present and that my website would remain free of malware. I began with WordPress’s hardening guide. I found a very thorough plugin for comparing file versions against known-good versions (Wordfence, which I found recommended in the hardening guide). I also made additional checks of file permissions, excised unused plugins, made sure all server software was up to date, and incorporated additional protections into the web server configuration to limit my attack surface.

Finally, I had to browse my website for a while using the developer tools built into my browser, both to see if any requests went to a domain other than my own and to inspect what cookies, local storage, and session storage data were created. This turned up a plugin that brought in icons from a third-party site, which I had to replace.
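
An audit like this can also be scripted. Here’s a minimal sketch assuming Puppeteer, a Node.js library that drives a headless browser; “https://example.com/” is a placeholder for whatever site is being audited:

```typescript
import puppeteer from "puppeteer";

// "https://example.com/" is a placeholder for the site under audit.
const SITE = new URL("https://example.com/");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Record the host of every request that leaves the site's own domain.
  const thirdPartyHosts = new Set<string>();
  page.on("request", (request) => {
    const host = new URL(request.url()).hostname;
    if (host !== SITE.hostname) thirdPartyHosts.add(host);
  });

  // Load the page and wait for network activity to settle.
  await page.goto(SITE.href, { waitUntil: "networkidle0" });
  console.log("Third-party hosts contacted:", [...thirdPartyHosts]);

  // List any cookies the visit created.
  for (const cookie of await page.cookies()) {
    console.log(`cookie: ${cookie.name} (domain ${cookie.domain})`);
  }

  await browser.close();
})();
```

A script like this is easy to rerun after each plugin update.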

After all that, I feel sure I can make the assurances my privacy policy makes.

Minimizing Your Trust Footprint

I originally published the following last year as an article in The Recompiler, a magazine focused on technology through the lens of feminism. I present it here with as few modifications from its original publication as possible.


For everyone who chooses to engage with the Internet, it poses a conflict between convenience and control of our identities and of our data. However trivially we interact with online services—playing games, finding movies or music, connecting to others on social media—we leave identifying information behind, intentionally or not. In addition, we relinquish some or even all rights to our own creations when we offer our content to share with others, such as whenever we write on Medium.

Most of us give this incongruity some cursory thought, even if we don’t frame it as a conflict, such as when we set our privacy settings on Facebook. With major data breaches (of identifying, health, financial, or personal info) and revelations of widespread, indiscriminate government surveillance in the news over the last few years, probably more of us are thinking about it these days. In some way or another, we all must face up to the issue.

At one extreme, it’s possible to embrace convenience completely. Doing so means handing over information about ourselves without regard for how it will be used or by whom. At the other extreme, there’s a Unabomber-like strategy of complete disconnection. This form of non-participation comes along with considerable economic and social disenfranchisement.

The rest of us walk a line between the two, maybe hewing nearer to one extreme or the other as our circumstances allow. This includes me; as time passes, I try to exert more control over my online life, but I still trade it off for convenience or access. I use an idea I call my trust footprint to make this decision on a case-by-case basis.

For example, I realized I began to distrust Google because the core of their business model is based on advertising. I wrote a short post on my personal website about my motives and process, but to sum up, I didn’t want to be beholden to a collection of services that made no promises about my privacy or their functionality or availability in the future. I felt powerless using Google, and I knew this wouldn’t change because they have built their empire on advertising, a business model which puts the customers’ privacy and autonomy at odds with their success.

Before I began to distrust Google, I didn’t give my online privacy or autonomy as much thought as I do today. When I began getting rid of my Google account and trying to find ways to replace its functionality, I had to examine my motives, in order to clarify the intangible problem Google posed for me.

I concluded that companies which derive their income from advertising necessarily pit themselves adversarially against their customers in a zero-sum game to control those customers’ personal information. So I try to avoid companies whose success is based on selling the customer instead of a product.

Facebook, as another example, needs to learn more about their users and the connections between them in order to charge advertisers more and, in turn, increase revenue. To do so, they encourage staying in their ecosystem with games and attempt to increase connections among users with suggestions and groups. As noted in this story about Facebook by The Consumerist last year:

Targeted ads are about being able to charge a premium to advertisers who want to know exactly who they’re reaching. Unfortunately, in order to do so, Facebook has to compromise the privacy of its hundreds of millions of users.

Most social networks, Twitter included, engage in similar practices.

Consequently, my first consideration when gauging my trust footprint is to ask who benefits from my business: What motivates them to engage with users, and what will motivate them in the future? This includes thinking about the business model under which the online services I choose operate, to the extent this information is available and accurate, of course.

Of course, this information often isn’t clear, up front, available, or permanent, so it’s really a lot of guessing. The “trust” part is quite literal—I don’t actually know what’s going to happen or if my information will eventually be leaked, abused, or sold. Some reading and research can inform my guesses, but they remain guesses. I don’t trust blindly, but it is still something of an act of faith.

It’s for that reason that my goal isn’t to completely avoid online services or to use only those that are fully and radically transparent. I only want to minimize the risk I take with my information, to reduce the scale of the information I provide, and to limit my exposure to events I can’t control.

The second consideration I make in keeping my trust footprint in check is to question whether a decision I make actually enlarges it. For instance, when I needed a new calendaring service after leaving Google, I realized that I could use iCloud to house and sync my information because I had already exposed personal information to iCloud. I didn’t have to sign up for a new account anywhere, so my trust footprint wasn’t affected.

The tricky part about that last consideration is that online services have tendrils that themselves creep into yet more services. In the case of Dropbox, which provides file storage and synchronization, they essentially resell Amazon’s Simple Storage Service (AWS S3), so if you don’t trust Amazon or otherwise wish to boycott them, then avoiding Dropbox comes along in the bargain. The same goes for a raft of other services, like Netflix and Reddit, which all use Amazon Web Services to drive their technology.

That means it’s not just home users who are storing their backups and music on servers they don’t control. Whether you call it software-as-a-service or just the “cloud,” services have become interconnected in increasingly technological and political ways.

It doesn’t end with only outsourcing the services themselves. All these online activities generate vast amounts of data which must be refined into information—for which there is copious value, even for things as innocuous as who’s watching what on TV. Nielsen’s business model of asking what customers are watching has already become outdated. Nowadays, the media companies know what you watch; the box you used to get the content has dutifully reported it back, and in turn, they’ve handed that data over to another company altogether to mine it for useful information. This sort of media analytics has become an industry in its own right.

As time passes, it will become harder to avoid interacting with unknown services. Economies of scale have caused tech stacks to trend more and more toward centralization. It makes sense for companies because, if Amazon controls all their storage, as an example, then storage becomes wholly Amazon’s problem, and they can offer it even more cheaply than companies which go out and build their own reliable storage.

Centralization doesn’t have to be bad, of course. It’s enabled companies to spring up which may not have been viable in the past. For example, Simple is an online bank which began with the realization that to start an entirely new online bank, “pretty much all you need is a license from the Fed and a few computers.”

The upshot is that keeping your online life entirely within your control becomes increasingly fraught as centralization proceeds. When you back up to the “cloud,” try to imagine whether your information is sitting on a hard disk drive in northern Virginia, or maybe a high-density tape in the Oregon countryside.

It’s not even necessary to go online yourself to interact with these business-to-business services. Small businesses have always relied upon vendors for components of their business they simply can’t provide on their own, and those vendors have learned they can resell other bulk services in turn. The next time you see the doctor, ask yourself, into which CRM system did your doctor just input your health information? Where did the CRM store that information? Maybe in some cosmic coincidence, it’s sitting alongside your backups on the same disk somewhere in a warehouse. Probably not, but it could happen.

My trust footprint, just like my carbon footprint, is a fuzzy but useful idea for me, which acknowledges that participation in the online world carries inevitable risk—or at least an inevitable cost. It helps me gauge whether I’m closer or further away from my ideal privacy goals. And just the same way that we can’t all become carbon neutral overnight without destroying the global economy, it’s not practical to run around telling everyone to unplug or boycott all online services.

Next time you’re filling out yet another form online, opening yet another service, trying out one more new thing, remember that you’re also relinquishing a little control over what you create and even a small part of who you are. And if this thought at all gives you pause, see if there’s anything you can do to reduce your trust footprint a little. Maybe you can look into hosting your own blog for your writing, getting network-attached storage for your home instead of using a cloud service, limiting what you disclose on social media, or investing in technology that takes privacy seriously.