Google logo against a backdrop of filing cabinets

This item has been flagged as inappropriate - What Google reads in your private docs

David Rutland
Privacy

Yes, I know. If you’re using Google, or any cloud provider, you can’t assume that anything is private at all. The cloud is just someone else’s computer, and if you’re not paying for it, you’re the product. You’ve heard it all before, and I’m sure you’re as tired of the old truisms as I am.

But they’re truisms for a reason, and it’s accepted as fact that Big G mines everything for clues as to your sexual orientation, pet ownership status, and how often you brush your teeth. They do this so they can aim advertising at you and sell the data to other companies to sell you advertising.

But there’s an expectation of privacy, regardless, especially if you’re using it for work-related purposes. I know it doesn’t make sense, but literally billions of people use Google services. They email their friends, relations, and co-workers using Gmail or GSuite; they write novels, formulate plans, share works-in-progress, and conduct business using Google’s (frankly excellent) online office suite.

Because it works so very very well, it’s easy to ignore that all Google products exist for the sole purpose of mining information.

Until they remind you.

Content scanning is business as usual

A pair of scary red eyes

One of this author's part-time freelance gigs is writing for a VPN affiliate company masquerading as a privacy website. I’m not proud of it, but it helps pay for my financially crippling shortbread addiction, and keeps the Crow-mobile’s wheels from falling off.

While most of the content is SEO-buffed VPN articles that are refined for any possible VPN-related query (e.g. How can I watch Venezuelan league matches in Arkansas), I’m not involved in that part of the business - I do genuine(ish) articles on privacy, encryption, and security.

The article in question was about PayPal scams and involved, among other things, phishing emails - the kind which look like this.

A made up scam email from Amazon

This article is part of a team effort, and needs to be shared in order to further optimise SEO, do layout, formatting and inject affiliate links.

It’s a legit and useful guide, so imagine my surprise when I was greeted with this warning:

A message stating This item has been flagged as inappropriate and can no longer be shared

"This item has been flagged as inappropriate and can no longer be shared"? What the hell does that mean?

Colleagues who needed to access the document to add their own ha’pennyworth saw this instead.

A notice from Google denying access to the document

I requested a review and access was restored within a few hours.

Targeting speech as well as ads

The language I used for the phishing email was flagged up as dangerous by an algorithm. Which is weird, because I wrote the text for the supposedly phony email myself. To the best of my knowledge, it doesn’t actually exist in the wild; it’s new.

It doesn’t take a huge leap of imagination to realise this means that Google must be doing some kind of AI analysis and the AI grabbed the wrong end of the stick. It’s pretty freaking scary.

Again, there’s an assumption from users that what they write is, apart from data mining for ads and tracking, to a certain extent, private.

Writing about phishing emails isn’t illegal, and sending phishing emails isn’t in itself illegal - although it doesn’t take much for it to become illegal in most places.

Google is blocking me from sharing my perfectly legal content. Worse, it was actually looking through my perfectly legal content - for purposes other than spamming me with crappy ads (which I block with PiHole and uBlock Origin) and renting out my data to shady companies.

It was a side effect of Google scanning every single document on every single Google account for content that Google deemed dangerous or contrary to the public good.

There is a huge difference between scanning content for advertising and scanning content for… well… content.

The rules can change arbitrarily

OK. So if I had been preparing a phishing email to send out, that would have been pretty dodgy. Phishing undermines confidence in Google’s services and harms their ad revenue - especially since Gmail’s spam filters are notoriously awful.

So it’s reasonable, from a business point of view, for them to prevent me using their software to create phishing emails (which, again, I was not doing).

But Google has been invading people’s privacy in the name of public interest for a while now. Who can forget that the company removed users’ access to their personal copies of Judy Mikovits’ ‘Plandemic’, which were stored in their personal drives?

Yes, it’s a shitty video which probably was the indirect cause of thousands of deaths, but it belonged to the users whose drive it was in. There was no warning.

It’s a well known fact that Google will also pass on private documents and emails to law enforcement agencies who ask nicely, and sometimes, they will proactively report users to authorities - even if they don’t ban the user.

Again, these tend to be shitty people, but not cool, Google.

Don’t store anything even remotely sensitive on Google servers.

Back in the late ’90s, The Crow used to have a copy of Jolly Roger’s Cookbook on floppy disk. I used it as a handy reference and used to joyfully blow stuff up in my back garden. That particular reference work is illegal to possess in the UK under Section 58 of the Terrorism Act 2000. It’s counted as ‘information likely to be useful to a terrorist.’

I don’t have a copy now, and the Cookbook link above won’t take you through to the free electronic version, because anyone in the UK following that link would be committing an offence under the same act. You’d be arrested, and I’d feel bad about it.

In the event I wanted to store a personal copy on Google Drive, there’s a pretty good chance that Big G would rat me out to the local fuzz.

Google operates worldwide, and things which are completely lawful in one place are often illegal elsewhere. Consider what Google may be willing to tell authorities in China, Russia, Saudi Arabia, Italy, or indeed, the UK.

I wouldn’t even think about organising or planning demonstrations or protests using any Google services.

armed police at a riot. There is fire

Don’t Use Google Docs. Use these alternatives

I only use Google Drive / Docs to deal with one client. For everything else I use either software on my own teeny tiny laptop, or more often, self-hosted software running on the Raspberry Pi 4B, which is located on the radiator behind my couch.

Whether you’re a business which requires the ability to share and edit documents between team members, or an individual who values their privacy, there are solutions which do not require you to share all of your information with Google, Microsoft, or anyone else.

There are others out there, but these are ones I have used and would recommend - mainly because they both run flawlessly on my Raspberry Pi home server set-up.

Nextcloud with Collabora

Nextcloud is probably the first thing you install on your self-hosted box. It’s pretty easy to do (although there are pitfalls) and I’ll probably get around to writing my own step by step guide at some point. In the meantime, just ̶G̶o̶o̶g̶l̶e̶ search for it.

Nextcloud is Free and Open Source Software, and its core functionality is as a file sync and storage solution similar to Dropbox or Google Drive, but there’s so much more to it. Extra features can be added by installing apps from the Nextcloud App store. These include things like the Cookbook Recipe Manager, Online Radio, Music Players, text editors, video conferencing and, of course, a handful of office suites.

OnlyOffice integration doesn’t work on the Raspberry Pi, but Collabora does.

To use it on a Raspberry Pi, you’ll need the ARM64 version of the CODE document server and the Collabora Online app itself.

This very document open in Collabora Online on Nextcloud

It works well for me, as you can see in this screenshot, but if you’re going to have more than a handful of users, you probably need something with a little more horsepower than a humble Pi. Intel NUCs are pretty good low-power, high-performance machines, but they are a bit pricier than a Pi.

I spun up Cryptpad in a Docker container before ARM support for Collabora was available - it’s easier to set up than an entire Nextcloud instance, and it’s a little more responsive.

Cryptpad

Cryptpad is also free, open source and nearly as fully featured as a full-on office suite. It can be deployed in minutes and allows anonymous collaborative creation and editing of documents, spreadsheets, presentations, and polls – all running on your own Pi-based server in your own house. All documents are encrypted, and cannot be accessed or read by anyone without authorisation.

I ditched it as soon as I was able, because using any of its features will result in incessant nagging for users to create accounts - so anonymous sharing was a painful experience. These intrusions can be turned off through configuring some server settings with a text editor, but if you do intend to limit access using accounts, the credentials are stored on and authenticated by Cryptpad servers, rather than your own. You don’t have full control, and if the Cryptpad servers go offline for any reason, you’ve lost access to everything.

A blank document open in the Cryptpad editor

Etherpad

Etherpad is the bedrock on which Google Docs was built, and in their magnanimity Google has left the software open source under an Apache License. It was later forked as Etherpad Lite. Etherpad Lite is basic, but supports images, links and rich text. Access to pad creation and editing is controlled through APIs, and it is super easy to use and share.
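
For the curious, here is roughly what driving Etherpad through its HTTP API looks like. This is a minimal sketch rather than a definitive recipe: it only builds the request URL for a `createPad` call and does not touch the network. The base URL, the API key and the pad name are all made up for illustration, and it assumes an install that authenticates with the `apikey` query parameter (some newer setups may be configured differently, so check your own instance).

```python
from urllib.parse import urlencode

# Assumption: the instance exposes version 1 of the HTTP API.
API_VERSION = "1"


def etherpad_api_url(base_url, function, api_key, **params):
    """Build the URL for a single Etherpad HTTP API call.

    base_url -- where your Etherpad lives (hypothetical in the example below)
    function -- API function name, e.g. "createPad" or "getText"
    api_key  -- the instance's API key (assumes apikey-based authentication)
    params   -- any extra arguments the function takes, e.g. padID, text
    """
    query = urlencode({"apikey": api_key, **params})
    return f"{base_url.rstrip('/')}/api/{API_VERSION}/{function}?{query}"


# Hypothetical values throughout: swap in your own host, key and pad name.
url = etherpad_api_url(
    "http://localhost:9001",
    "createPad",
    "hypothetical-key",
    padID="shortbread-notes",
    text="Hello",
)
print(url)
```

Fetching that URL with the HTTP client of your choice would ask the server to create the pad. Since the key ends up in the URL, it's best kept out of logs, browser history and anything else you wouldn't hand to a stranger.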

And if you can’t self-host at home…

Self-hosting is fun and a bit of a hobby in The Crow’s nest, but it can be a complete ballache sometimes - case in point, the patch cable linking the Pi to my router failed for no apparent reason 30 minutes after posting this article.

If you don’t want to have the hardware in your house, along with the backup and bandwidth responsibilities, there are a couple of options available to you.

Rent a Virtual Private Server

By renting a VPS, you have complete control over the software deployed on it, without the hassle of maintaining the hardware. You can install any amount of web-facing software, including the options mentioned above. Bluehost will pay me a trifling sum if you buy hosting through this link.

Dedicated Nextcloud Hosting

A number of VPS providers have teamed up with Nextcloud to provide dedicated hosting for Nextcloud instances with any number of users. Check out this list for more deets.

Dedicated Nextcloud Hosting is the simplest option for large organisations, or if you simply don’t have the time, expertise, or inclination to do it yourself. I have no affiliation with any of these Nextcloud hosts.

TL;DR

Google spies on you for reasons other than advertising. Ditch Big G and use something else.