95 - We don't know what people do with our data

March 29, 2020

I’ve had a busy week where I’ve spent a lot of time writing data protection impact assessments and privacy policy type stuff. It’s felt a little like fiddling while Rome burns, to be honest.

One of the things that annoys me is that privacy policies are so rarely written in the depth that I need to make sensible decisions about the system. Software as a Service companies frequently have wide open privacy policies that actively conflate the privacy impacts of ad tracking on the site, my personal data as the administrator/staff user of the service, and personal data that might be uploaded as part of the use of the service.

For example, take Atlassian’s privacy policy for Trello (and other Atlassian products). This defines information they collect to include my details when I sign up, content I submit to the site, ad tracking cookies and analytics amongst others. The next section down says that it uses information it collects to provide the service, for research and development, to market the product and so on.

But there’s no mapping in there between the sources of information and what they do with each of them. If I create a Trello board called “My Doctors Appointments” and start storing cards in there about various doctor’s appointments, would I expect that they could use that content for marketing of products? I don’t think anybody would think that was reasonable, and yet that is what the privacy policy says is acceptable.
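To make the missing mapping concrete, here is a minimal sketch of what a policy could spell out: each source of data is tied to the purposes it may be used for. The source names, purposes and the mapping itself are my own illustrative assumptions, not anything taken from Atlassian’s policy.

```python
from enum import Enum


class Source(Enum):
    ACCOUNT_DETAILS = "account details"
    USER_CONTENT = "user content"
    TRACKING_COOKIES = "ad tracking cookies and analytics"


class Purpose(Enum):
    PROVIDE_SERVICE = "provide the service"
    RESEARCH = "research and development"
    MARKETING = "marketing"


# Hypothetical mapping: which purposes each source of data may be used for.
# A real policy would need to justify every entry in this table.
ALLOWED = {
    Source.ACCOUNT_DETAILS: {Purpose.PROVIDE_SERVICE, Purpose.MARKETING},
    Source.USER_CONTENT: {Purpose.PROVIDE_SERVICE},
    Source.TRACKING_COOKIES: {Purpose.RESEARCH, Purpose.MARKETING},
}


def is_allowed(source: Source, purpose: Purpose) -> bool:
    """Return True only if this source is explicitly mapped to this purpose."""
    return purpose in ALLOWED.get(source, set())
```

With a table like this, the doctor’s appointments question has a direct answer: `is_allowed(Source.USER_CONTENT, Purpose.MARKETING)` is false, because board content is only mapped to providing the service.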

How about sharing of the data? Clearly they need a legal basis for enabling the features of the application to let me add other third parties to a board, so they must be able to share the data with a third party. Atlassian’s privacy policy is actually one of the better ones here, in that they are quite specific about who they share with, but it’s still a little hard to squint and work out if it’s only shared with people I approve of, or whether it’s shared with anyone that Atlassian decides is a collaborator. Several other SaaS services I’ve reviewed recently have privacy policies that allow them to share your data with anyone else who uses the service.

The problem here is that we tend to lump data into a couple of fairly black and white buckets: personal data and non-personal data. Privacy policies are written mostly to protect the company, which means that the privacy policy is written to allow as much as legally possible while sounding like they are safe guardians of our data. But we don’t really know what happens to our data.

I’ve seen people online asking this week whether it’s a breach of GDPR for the UK Government to send everyone a text message (it’s not for numerous reasons, too many to recount, I should write a blog post why), and I’ve seen the online debate about whether we should use mobile phone tracking data to track our movements during an outbreak like this.

I don’t have any good answers to this stuff because it’s nowhere near as simple as it’s all made out to be. What actually is our personal data, what do we mean when we say we want privacy, and how can it be used when there is a legitimate threat to life are all tough questions in this policy area.

If we are lucky, we’ll come out of this with a better, more nuanced view on what data we give up, whether it’s worth it, and what access to that data is appropriate for the state. But given the current level of public discourse, I worry that we’ll end up with one side yelling about abuse of power and data, and the other side yelling about public health emergencies, with little grey in between the two positions.

The clocks change tonight for one of the last times ever. Who cares? | WIRED UK

https://www.wired.co.uk/article/clocks-change-uk-2020-daylight-saving-time

When you wake up on Sunday, you might not even realise it’s Sunday. Or the weekend. On Monday morning, when you stumble into your kitchen and see the oven clock reads 08:00, you can go back to bed for another hour. Except you can’t, because it’s actually 09:00. The oven clock hasn’t updated and you’re now late for work. Except you’re not, because you can work from bed. Result.
What’s the point of time, after all, when each day is an excruciating repeat of the one that preceded it, stuck on a loop of endless Zoom calls and Houseparty drinking sessions? Or to put it another way: the clocks changing means even less than it has before. At least the clocks jumping forward an hour takes us (artificially) one hour closer to the end of lockdown.
But, soon enough, that horological delight will be taken away from us. Soon, the clocks will stop changing once and for all. Last March, the European Parliament voted to scrap the twice-a-year change from either March or October 2021. At this point, member states will have to choose whether to remain on permanent summer time or permanent winter time.

GitHub - security-prince/Application-Security-Engineer-Interview-Questions: Some of the questions which i was asked when i was giving interviews for Application/Product Security roles. I am sure this is not an exhaustive list but i felt these questions were important to be asked and some were challenging to answer

https://github.com/security-prince/Application-Security-Engineer-Interview-Questions

Some of the questions/topics which i was asked when i was giving interviews for Application/Product Security Engineering roles. I am sure this is not an exhaustive list but i felt these questions were important to be asked and some were challenging to answer. I tried to include the reference resource for some of the questions/topics

This is a lovely set of security questions that I’d expect security engineers to be able to at least have a go at. There are a lot of topics covered, from cross site scripting to DNS exfiltration, file compression to file upload exploitation. I wouldn’t really expect very many people to be able to answer every single one of these authoritatively, but they make a good base for interview questions.

China’s spy Wang Liqiang defects to Australia, offers ASIO trove of information on CCP espionage tactics

https://amp.theage.com.au/national/defecting-chinese-spy-offers-information-trove-to-australian-government-20191122-p53d1l.html?__twitter_impression=true

Mr Wang said he was part of an intelligence operation hidden within a Hong Kong-listed company, China Innovation Investment Limited (CIIL), which infiltrated Hong Kong’s universities and media with pro-Chinese Communist Party operatives who could be activated to counter the democracy movement. He says he had personal involvement in an October 2015 operation to kidnap and abduct to the Chinese mainland a Hong Kong bookseller, Lee Bo, and played a role in a clandestine organisation that also directed bashings or cyber attacks on Hong Kong dissidents.

This is an interesting insight into nation state espionage operations. How do they set up operatives in another country? What passports do they provide? How do they actually work?

I’m slightly suspicious of this story on the basis that it sounds like Mr Wang has gone to multiple media outlets to share this information while he is waiting for the Australian Security Intelligence Organisation (ASIO) to actually provide him protection.

Booz Allen analyzed 200+ Russian hacking operations to better understand their tactics | ZDNet

https://www.zdnet.com/article/booz-allen-analyzed-200-russian-hacking-operations-to-better-understand-their-tactics/

GRU ATTACKS CAN BE PREDICTED WITH RUSSIA’S MILITARY DOCTRINE
According to the Booz Allen report, the cyber operations conducted by both groups cannot be viewed in isolation. They are almost exclusively conducted in a broader political context.
The GRU being a military-run operation, all actions follow a set of patterns. Booz Allen says it analyzed more than 200 unique cyber incidents publicly attributed to the GRU and found that pattern.
According to the US intelligence contractor, that pattern perfectly fits the principles described in a Russian government document called “The Military Doctrine of the Russian Federation,” which the Russian Army publishes at regular intervals.
The last version of this document was published in 2014 and lists 23 security risks to the Russian Federation to which the Russian Army must reply in one form or fashion.

This is a good reminder that the GRU aren’t just cyber bogeymen who attack anything and everything. They are an organised military/state unit that has specific objectives in mind. If you aren’t in their sights, you should stop using them as cardboard cutout enemies and concentrate on realistic threat actors instead. I know that it’s exciting to be able to claim that you are worried about foreign intelligence, but most of us would be far better off investing in basic cyber defences and worrying about realistic attackers.

You can sign up with Booz Allen Hamilton to read the full PDF if you want. It’s got a lot of detail in it, but does a good job of justifying the interesting headline.

The unreasonable importance of data preparation – O’Reilly

https://www.oreilly.com/radar/the-unreasonable-importance-of-data-preparation/

Collecting the right data requires a principled approach that is a function of your business question. Data collected for one purpose can have limited use for other questions. The assumed value of data is a myth leading to inflated valuations of start-ups capturing said data. John Myles White, data scientist and engineering manager at Facebook, wrote: “The biggest risk I see with data science projects is that analyzing data per se is generally a bad thing. Generating data with a pre-specified analysis plan and running that analysis is good. Re-analyzing existing data is often very bad.” John is drawing attention to thinking carefully about what you hope to get out of the data, what question you hope to answer, what biases may exist, and what you need to correct before jumping in with an analysis[1].

This is a good reminder that a lot of these advertising programs are running with a model of “collect as much data as possible”, whereas data science is increasingly suggesting that you need to be carefully selecting what data you want to gather because it matches the algorithms you want to run.

There is value in the “collect it all” approach when you want to do exploratory data science, but generally speaking, focused data collection is better.

How Zoom, Netflix, and Dropbox are Staying Online During the Pandemic

https://www.datacenterknowledge.com/uptime/how-zoom-netflix-and-dropbox-are-staying-online-during-pandemic

Zoom uses a combination of its own data centers and public cloud (by Amazon Web Services) for its compute infrastructure. While it’s had some challenges quickly scaling compute in its own data centers, due to the lockdown-related “supply chain issues” (details of which Guerrero did not disclose), scaling compute in the cloud hasn’t been a problem.
Other than having to scale “a lot faster” than anticipated, “everything is kind of in our standard operating procedure,” he said.

There’s a common theme here among all three companies, who are managing to stay online despite huge surges in load at this time. All three use AWS for their cloud computing needs. Sure, Dropbox and Zoom do some of their work in their own data centers, but they also use AWS. You’ll notice that those who are managing their own data centers have had issues with those, and have shifted more load into AWS to compensate.

Zoom Removes Code That Sends Data to Facebook - VICE

https://www.vice.com/en_us/article/z3b745/zoom-removes-code-that-sends-data-to-facebook

[The app sent information such as] when a user opened the app, their timezone, city, and device details to the social network giant.
When Motherboard analyzed the app, Zoom’s privacy policy did not make the data transfer to Facebook clear.
“Zoom takes its users’ privacy extremely seriously. We originally implemented the ‘Login with Facebook’ feature using the Facebook SDK in order to provide our users with another convenient way to access our platform. However, we were recently made aware that the Facebook SDK was collecting unnecessary device data," Zoom told Motherboard in a statement on Friday.

Zoom claims that using Facebook’s SDK included some automatic code that sent data over to Facebook without them realising. This sort of thing is why privacy is so hard. How can you sensibly write a realistic privacy policy in the face of this sort of complexity and have any hope that it will be accurate?

Doc Searls Weblog · Zoom needs to clean up its privacy act

https://blogs.harvard.edu/doc/2020/03/27/zoom/

There’s too much to cover here, so I’ll narrow my inquiry down to the “Does Zoom sell Personal Data?” section of the privacy policy, which was last updated on March 18. The section runs two paragraphs, and I’ll comment on the second one, starting here:
… Zoom does use certain standard advertising tools which require Personal Data…
What they mean by that is adtech. What they’re also saying here is that Zoom is in the advertising business, and in the worst end of it: the one that lives off harvested personal data. What makes this extra creepy is that Zoom is in a position to gather plenty of personal data, some of it very intimate (for example with a shrink talking to a patient) without anyone in the conversation knowing about it. (Unless, of course, they see an ad somewhere that looks like it was informed by a private conversation on Zoom.)

This is a full-bore attack on Zoom’s privacy policy and the way it operates. I take a less stringent approach to this. We could assume that Zoom’s privacy policy enables it to do all kinds of nasty things, but it also needs that policy in place to do less nasty things. Having the freedom, legally speaking, to do something doesn’t necessarily mean that someone is doing it. However, as a consumer, there’s a question about whether you need to be consulted or informed if this changes.

Zoom’s privacy policy would allow them to analyse your audio and video and send that analysis to ad companies. However, it also says that it doesn’t do that. It does want to sell adverts in various ways, and so has a policy that lets it do that.

https://www.theguardian.com/technology/2020/feb/20/uk-google-users-to-lose-eu-gdpr-data-protections-brexit

“Nothing about our services or our approach to privacy will change, including how we collect or process data, and how we respond to law enforcement demands for users’ information,” Google said in a statement. “The protections of the UK GDPR will still apply to these users.”
Ireland, where Google and other US tech companies have their European headquarters, is staying in the EU, which has one of the world’s most aggressive data protection rules, the GDPR.

(Joel) You have likely received an email from Google about the changes to their terms, the crux of which is the effective movement of the Data Controller from Google Ireland to Google US. (If you don’t like it, you can opt out of this change… by deleting your Google account!)

While I am not a lawyer, my reading is that this only impacts UK consumers (for now…?) and some of the Google services. It does not impact Google G-Suite or GCP. You should keep an eye on your Google contracts to see if they intend to make this change.

My personal view is that Google has no intention of being stuck within multiple (potentially opaque) data authorities and jurisdictions, and has no idea how/when/if the UK will handle data protection once past the transition periods. This move puts them on the front foot if the UK ends up with a weaker data protection position than the EU27’s GDPR.

Why Don’t We Just Ban Targeted Advertising? | WIRED

https://www.wired.com/story/why-dont-we-just-ban-targeted-advertising/

Imagine Congress passed a law tomorrow morning that banned companies from doing any ad microtargeting whatsoever. Close your eyes and picture what life would be like if the leading business model of the internet were banished from existence. How would things be different?
Many of the changes would be subtle. You could buy a pair of shoes on Amazon without Reebok ads following you for months.
It’s true, the ads you came across while browsing might be for things you’re less inclined to buy. But a ban on targeted advertising wouldn’t mean the end of personalization. Spotify could still suggest Marvin Gaye based on your enjoyment of Sam Cooke. Bumble could still monitor your swipes to figure out your type. Netflix could still surmise that your life has felt empty ever since you finished season 7 of the Great British Baking Show, and suggest the appropriate spinoffs. (For example.) What companies couldn’t do anymore is share their dossiers about you with adtech companies and advertisers. The geyser of behavioral data currently gathered for marketing purposes would slow to a trickle. As a result, a lot less of your personal information would end up in the hands of data brokers and, from there, third parties like insurance companies, potential employers, or law enforcement agencies.

I don’t really buy this. It’s an interesting thought experiment: what if we banned the use of personal data for advertising? GDPR went halfway there. It required users to give consent for the use of their personal data, and has principles of data minimisation that mean companies are supposed to only track what they actually need. But the implementation has been fairly poor. So many companies claim that they can collect this data for legitimate purposes, when it’s hard to weigh that against the invasion of privacy, and so many consent forms don’t meaningfully allow a consumer to make a choice.

But even if you did, it’s really hard to actually work out what privacy really means. The author says that Spotify could still suggest Marvin Gaye based on your enjoyment of Sam Cooke. But it only knows that because other people have listened to both, and that builds a connection in their systems. That connection uses your personal data for the enrichment of Spotify, so would it be banned? I think that the assumptions built into the article fail to understand that there isn’t a clear black and white of “personal data” and “non-personal data”. It’s mostly just a mass of foggy greyness.
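As a rough illustration of why that suggestion depends on other people’s data, here is a minimal co-listening sketch. The listening histories and function names are made up for the example, and this is certainly not how Spotify actually builds recommendations.

```python
from collections import Counter
from itertools import permutations

# Hypothetical listening histories, keyed by user. Every entry is
# somebody's personal data.
HISTORIES = {
    "alice": {"Sam Cooke", "Marvin Gaye"},
    "bob": {"Sam Cooke", "Marvin Gaye", "Otis Redding"},
    "carol": {"Sam Cooke", "Otis Redding"},
    "dave": {"Sam Cooke", "Marvin Gaye"},
}


def co_listens(histories):
    """Count how many users listened to each ordered pair of artists."""
    counts = Counter()
    for artists in histories.values():
        for a, b in permutations(sorted(artists), 2):
            counts[(a, b)] += 1
    return counts


def suggest(artist, histories):
    """Suggest the artist most often listened to alongside `artist`."""
    counts = co_listens(histories)
    candidates = {b: n for (a, b), n in counts.items() if a == artist}
    if not candidates:
        return None
    # Ties are broken alphabetically because candidates are sorted first.
    return max(sorted(candidates), key=candidates.get)
```

Calling `suggest("Sam Cooke", HISTORIES)` only produces an answer because other users’ histories link the two artists. With just one person’s data, `suggest("Sam Cooke", {"me": {"Sam Cooke"}})` has nothing to go on and returns `None`.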

Tweet showing people spreading out from Florida spring break

https://twitter.com/TectonixGEO/status/1242628347034767361?s=20

Want to see the true potential impact of ignoring social distancing? Through a partnership with @xmodesocial, we analyzed secondary locations of anonymized mobile devices that were active at a single Ft. Lauderdale beach during spring break. This is where they went across the US:

This is a fascinating visualisation. The claim of billions of anonymised mobile phone location data points shows just how well we are tracked through mobile phone operators, and the sort of analysis that can be done on that data.

We Need A Massive Surveillance Program (Idle Words)

https://idlewords.com/2020/03/we_need_a_massive_surveillance_program.htm

I am a privacy activist who has been riding a variety of high horses about the dangers of permanent, ubiquitous data collection since 2012.
But warning people about these dangers today is like being concerned about black mold growing in the basement when the house is on fire. Yes, in the long run the elevated humidity poses a structural risk that may make the house uninhabitable, or at least a place no one wants to live. But right now, the house is on fire. We need to pour water on it.
In our case, the fire is the global pandemic and the severe economic crisis it has precipitated. Once the initial shock wears off, we can expect this to be followed by a political crisis, in which our society will fracture along pre-existing lines of contention.
But once the initial outbreak is contained, we will face a dilemma. Do we hurt people by allowing the economy to collapse entirely, or do we hurt people by letting the virus spread again? How do we reconcile the two?
One way out of the dilemma would be some kind of medical advance—a vaccine, or an effective antiviral treatment that lowered the burden on hospitals. But it is not clear how long the research programs searching for these breakthroughs will take, or whether they will succeed at all.
Without these medical advances, we know the virus will resume its spread as soon as the harsh controls are lifted.
Doctors and epidemiologists caution us that the only way to go back to some semblance of normality after the initial outbreak has been brought under control will be to move from population-wide measures (like closing schools and making everyone stay home) to an aggressive case-by-case approach that involves a combination of extensive testing, rapid response, and containing clusters of infection as soon as they are found, before they have a chance to spread.
That kind of case tracking has traditionally been very labor intensive. But we could automate large parts of it with the technical infrastructure of the surveillance economy. It would not take a great deal to turn the ubiquitous tracking tools that follow us around online into a sophisticated public health alert system.

This is a strong argument from Maciej, who has been a vocal privacy advocate for about as long as I can remember.

But his analysis is right: there is quite a risk right now that after the harshest quarantine measures are relaxed, people will go back out and reinfection will occur. We can contain that by not relaxing the quarantine until vaccines are available, but that means months if not years of lockdown, which should be clearly unacceptable.

His argument is that repurposing the existing surveillance capabilities that mobile phone operators, law enforcement and data scrapers have on human movements could allow us to build a strong public health alerting system.

The risk of course is whether it would be dismantled later, or whether it would then be used for other purposes, and how much you worry about that.
