173 - Scaling a community

October 31, 2021

Over the last few weeks, there’s been a lot of discussion around whether Facebook is a net positive for society or not.

Sadly, the stories of families who can keep in touch, or old university friends who support one another are never newsworthy, and so we tend to see just the negative reporting around these things.

Bringing people together is often viewed as its own good, whether it be the mission of Facebook to connect the world, or whether it just be bringing together your community in your organisation. But people are people, and that means that they have their own lives that impact on everything that they do. From people who are just bad apples, to people who are having a difficult time and are short tempered, argumentative or irritable with their coworkers, we always have to keep an eye on the health of the community.

It can be easy to excuse bad behaviour. We often naturally want to empathise with people and assume they didn’t intend negative consequences, or that their actions have causes that are difficult to unpick, and the reality is that this is often true. But it shouldn’t be an excuse, and as a community we need to hold each other to high standards, so that the community is a nicer and better place to work, to exist and to spend time.

Whether we think Facebook is an evil privacy-abusing megacorp that seeks to profit off of people’s data, or a panacea to enable human connection around the globe, what’s clear is that global networking has brought significant change to our society. While trying to avoid using the new “m word”, the reality is that our social dynamics have changed fundamentally in the last few decades, and our social regulations, restrictions and communal consensus have not kept up with the level of interconnectedness that we see today. The next decade will see more of this kind of thing, and it won’t just be Facebook that is behind it all.

Facebook Papers: ‘History Will Not Judge Us Kindly’ - The Atlantic

https://www.theatlantic.com/ideas/archive/2021/10/facebook-papers-democracy-election-zuckerberg/620478/

The documents are astonishing for two reasons: First, because their sheer volume is unbelievable. And second, because these documents leave little room for doubt about Facebook’s crucial role in advancing the cause of authoritarianism in America and around the world. Authoritarianism predates the rise of Facebook, of course. But Facebook makes it much easier for authoritarians to win.
Again and again, the Facebook Papers show staffers sounding alarms about the dangers posed by the platform—how Facebook amplifies extremism and misinformation, how it incites violence, how it encourages radicalization and political polarization. Again and again, staffers reckon with the ways in which Facebook’s decisions stoke these harms, and they plead with leadership to do more.
[…]
I’ve been covering Facebook for a decade now, and the challenges it must navigate are novel and singularly complex. One of the most important, and heartening, revelations of the Facebook Papers is that many Facebook workers are trying conscientiously to solve these problems. One of the disheartening features of these documents is that these same employees have little or no faith in Facebook leadership. It is quite a thing to see, the sheer number of Facebook employees—people who presumably understand their company as well as or better than outside observers—who believe their employer to be morally bankrupt.
I spoke with several former Facebook employees who described the company’s metrics-driven culture as extreme, even by Silicon Valley standards. (I agreed not to name them, because they feared retaliation and ostracization from Facebook for talking about the company’s inner workings.) Facebook workers are under tremendous pressure to quantitatively demonstrate their individual contributions to the company’s growth goals, they told me. New products and features aren’t approved unless the staffers pitching them demonstrate how they will drive engagement. As a result, Facebook has stoked an algorithm arms race within its ranks, pitting core product-and-engineering teams, such as the News Feed team, against their colleagues on Integrity teams, who are tasked with mitigating harm on the platform. These teams establish goals that are often in direct conflict with each other.

Facebook is something of a victim of its own success here. It has made a system that enables people to connect with one another, and that means people around the world can connect with others going through the same experiences as them, giving them a support network and sense of community. But people will use those communities for their own purposes as well. This is inevitable at scale, where humans tend towards average, meaning that you’ll always have some bad actors in your system, and you need to consider how to resist the negative without harming the positive actions that your system enables.

Before the “Whistleblowers” there were “Goodbye” posts: leaving #Facebook Engineering in 2016 because of #China, User Content, and #EndToEndEncryption – dropsafe

https://alecmuffett.com/article/15058

The suggestion that “anything that allows us to connect more people more often is de facto good” can be restated as “the end justifies the means” – a consequentialist argument which not only ignores the impact of the “means” but also presupposes the value of the “ends” to all parties.
My perspective is different to Boz’s. I don’t feel that the goal is for Facebook to connect people; I feel that the end should be for people to be connected. “Connection” should not be a “Gotta Catch ‘Em All” Pokemon game, it should be about making peoples’ lives better.
The question then becomes: what constitutes “better”?
I can see a massive upside of China as a target for Facebook: the potential for growth, the space to connect 1.35 billion more people, the opportunity to create (and be seen to create) change for good.
That sounds like “better”, especially if you believe that “more == better”.
But China is not a liberal democracy and to be permitted to operate there involves compromise, and to operate there we would have to compromise a lot.
[…]
Now consider us. Our volume of user-generated content. How many posts we would have to censor? How many photos we would have to block? Think of the volume of appeals and erroneous takedowns which we currently perform, but at least our existing content-owners don’t risk state censure by appealing.
For instance: if there is an earthquake and some buildings collapse, or if there is a civil rights protest, do we want to put ourselves in the way of people sharing photos of that? [one may ask the same question of those people who want to ‘minimise harm’ in E2EE systems by (e.g.) restricting the number of people to whom an image can be sent]
It’s common to present implementation of such controls as “compliance with applicable local laws and regulations”, but at some point this becomes “collusion”, and to collude in inhibiting such connectedness strikes me as unattractive.

Alec wrote this essay a number of years back, when he left Facebook, and only made it public recently, when it was leaked.

There are some interesting deep questions for all western tech firms in here, primarily the question of “when does compliance with laws become collusion” for western firms operating with and alongside illiberal autocracies. There’s a second question for tech firms about whether it is their responsibility to care, and there’s an assumption in here that “western democracy == good, anything else == bad” that I think is an overreduction of a quite complex philosophical point. It’s almost back to the trolley problem: is it better to allow harm through inaction than to cause reduced deliberate harm through action?

NOBELIUM targeting delegated administrative privileges to facilitate broader attacks - Microsoft Security Blog

https://www.microsoft.com/security/blog/2021/10/25/nobelium-targeting-delegated-administrative-privileges-to-facilitate-broader-attacks/

Microsoft has observed NOBELIUM targeting privileged accounts of service providers to move laterally in cloud environments, leveraging the trusted relationships to gain access to downstream customers and enable further attacks or access targeted systems. These attacks are not the result of a product security vulnerability but rather a continuation of NOBELIUM’s use of a diverse and dynamic toolkit that includes sophisticated malware, password sprays, supply chain attacks, token theft, API abuse, and spear phishing to compromise user accounts and leverage the access of those accounts. These attacks have highlighted the need for administrators to adopt strict account security practices and take additional measures to secure their environments.
In the observed supply chain attacks, downstream customers of service providers and other organizations are also being targeted by NOBELIUM. In these provider/customer relationships, customers delegate administrative rights to the provider that enable the provider to manage the customer’s tenants as if they were an administrator within the customer’s organization. By stealing credentials and compromising accounts at the service provider level, NOBELIUM can take advantage of several potential vectors, including but not limited to delegated administrative privileges (DAP), and then leverage that access to extend downstream attacks through trusted channels like externally facing VPNs or unique provider-customer solutions that enable network access. To reduce the potential impact of this NOBELIUM activity, Microsoft encourages all of our partners and customers to immediately review the guidance below and implement risk mitigations, harden environments, and investigate suspicious behaviors that match the tactics described in this blog. MSTIC continues to observe, monitor, and notify affected customers and partners through our nation-state notification process. Microsoft Detection and Response Team (DART) and Microsoft Threat Experts have also engaged directly with affected customers to assist with incident response and drive better detection and guidance around this activity.

How secure is your managed service provider? Did they retain some global administrator privileges to your system to manage account recovery, support activity or just because they never got round to removing those privileges?

You should be auditing who has access to your cloud productivity environments, especially your Microsoft 365 and GSuite accounts, since access to admin in those systems almost certainly gives someone access to all emails and documents in your company, giving lateral access to everything else you do.
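As a rough sketch of that kind of audit, assume you have exported your admin role assignments into a simple list of records. The field names, role names and domains below are hypothetical, not any vendor’s real export format. You could then flag privileged accounts that sit outside your own domains:

```python
from dataclasses import dataclass

# Hypothetical role names; substitute whatever your platform calls
# its top-level administrative roles.
PRIVILEGED_ROLES = {"Global Administrator", "Super Admin"}


@dataclass(frozen=True)
class RoleAssignment:
    account: str  # e.g. "alice@example.com"
    role: str


def external_admins(assignments, own_domains):
    """Return privileged assignments whose account is outside our own domains."""
    own = {d.lower() for d in own_domains}
    flagged = []
    for a in assignments:
        # Everything after the last "@" is treated as the account's domain.
        domain = a.account.rsplit("@", 1)[-1].lower()
        if a.role in PRIVILEGED_ROLES and domain not in own:
            flagged.append(a)
    return flagged


# Made-up sample data, purely for illustration.
SAMPLE = [
    RoleAssignment("alice@example.com", "Global Administrator"),
    RoleAssignment("support@msp.example.net", "Global Administrator"),
    RoleAssignment("bob@example.com", "User"),
    RoleAssignment("helpdesk@msp.example.net", "Helpdesk Reader"),
]

if __name__ == "__main__":
    for a in external_admins(SAMPLE, {"example.com"}):
        print(f"{a.account} holds {a.role}")
```

Delegated partner access may not show up as an ordinary account in an export like this, so it is worth checking separately in your provider’s admin console.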

APT trends report Q3 2021 | Securelist

https://securelist.com/apt-trends-report-q3-2021/104708/

While the TTPs of some threat actors remain consistent over time, relying heavily on social engineering as a means of gaining a foothold in a target organization or compromising an individual’s device, others refresh their toolsets and extend the scope of their activities. Our regular quarterly reviews are intended to highlight the key developments of APT groups.
Here are the main trends that we’ve seen in Q3 2021:
We continue to see supply-chain attacks, including those of SmudgeX, DarkHalo and Lazarus.
In this quarter we focused on researching and dismantling surveillance frameworks following malicious activities we detected. These include FinSpy and the use of advanced and highly capable payloads staged by a commercial post-exploitation framework known as Slingshot. These tools contain powerful covert capabilities, such as the use of bootkits for persistence. Bootkits remain an active component of some high profile APT attacks, notwithstanding various mitigations Microsoft has added to make them much less easy to deploy on the Windows operating system.
We observed an abnormal spike in activity coming from what is widely known to be Chinese-speaking threat groups this quarter, particularly when compared to the start of the year. By contrast, we have seen a decrease in activity in the Middle East this quarter.

Good general roundup of the APT trends for the last quarter. There’s more evidence in here that supply chains are coming under scrutiny by a wider variety of actors, so anything you do now to defend against them will also protect you as less advanced actors develop the same capabilities in the future.

2 Widespread Attacks on Your Containerized Environment and 7 Rules to Prevent it. | by Boris Zaikin | ITNEXT

https://itnext.io/2-widespread-attacks-on-your-containerized-environment-and-7-rules-to-prevent-it-957aa7dfa5e0

The container itself is a small OS that can be susceptible to attack with malicious code. In this article, we talk about
What you need to do to secure your app in the container.
Security tools.
Security rules, policies.
How to apply common security principles to prevent attacks.
I consider using Docker and Kubernetes as two current leaders in container engines and orchestration.

This is a nice overview of some of the tools and techniques you’ll need to get in place if you need to secure a Kubernetes infrastructure.

Recent Attack Uses Vulnerability on Confluence Server | FortiGuard Labs

https://www.fortinet.com/blog/threat-research/recent-attack-uses-vulnerability-on-confluence-server

In August 2021, Atlassian published a security advisory about CVE-2021-26084 that could enable a threat actor to run arbitrary code on unpatched Confluence Server and Data Center instances. FortiGuard Labs analyzed the situation and published a Threat Signal with relevant information. After releasing the advisory, there occur massive scanning and proof-of-concept exploit code in public. We also collect a lot attacking traffic. In this blog we will analyze the payloads leveraging this vulnerability, deep dive into the attack and summarize the IOCs for these suspicious activities that may hint the network was affected by CVE-2021-26084.
In September, we observed numerous threat actors targeting this vulnerability whose goal was to download a malicious payload that would install a backdoor or miner in a user’s network. These threats include Cryptojacking, Setag backdoor, Fileless attack that uses PowerShell in a system to execute shell without file dropped and Muhstik botnet; we will elaborate each of them in this analysis.

We’re looking at a few weeks at most to go from vulnerability release to mass scanning in this case.

For many companies, their Atlassian infrastructure may well not be core IT. There’s a good possibility that it might even be shadow IT, set up by the development team or business unit that needs it, and not monitored or managed. Even shadow IT needs patching, otherwise it will become increasingly vulnerable and subject to attacks like this one.

AuthZ: Carta’s highly scalable permissions system | Building Carta

https://medium.com/building-carta/authz-cartas-highly-scalable-permissions-system-782a7f2c840f

Permissions, also known as authorization, is the process of granting access to resources in your system. For any team, it’s crucial to get permissions right. At Carta, where we are working with financial data all day, it’s the most important thing.
But we had a problem. Instead of maintaining one legacy system, we were maintaining five. The permissions could conflict — and they were impossible to extend. Our business needs were growing, and we had several new products in the funnel.
It was obvious we had to build a new authorization system to create leverage for engineering and product. We knew we needed it to be three things:
Scalable
Fast
Generic enough for any new product needs
Sounds simple enough, but in reality it’s not that easy. In my career, I’ve seen permission systems that are too simple. They lack the features to support fine-grained access on single resources. I’ve also seen them too complex. One small change might unravel a whole policy of attribute-based permissions.
In this article, we’ll look at how my team — Identity and Access Management — took a creative approach to avoid those pitfalls by rebuilding Carta’s permissions system based on Google Zanzibar.

I love a good story on how a security product is developed and this shows a complex domain, migration from legacy systems and a core security policy makeover.

gitoops/blog.md at main · ovotech/gitoops · GitHub

https://github.com/ovotech/gitoops/blob/main/docs/blog.md

With the proliferation of CI/CD integrations and dynamic checks in Version Control System providers, users can directly and indirectly run code in a variety of contexts by pushing changes to repositories. The most common example is running code in a CI/CD runner by triggering software test suits from a feature branch when opening a pull request.
Combined with lax access controls to repositories and their branches, this can offer easy paths for lateral movement and privilege escalation within an organization.
Here are a some common scenarios:
The lack of production branch protections on a repository with a production continuous deployment pipeline could allow anyone with write access to the repository to deploy malicious changes to production.
The lack of branch-based access controls for secrets in CI/CD systems like CircleCI and, historically, GitHub Actions, means that an untrusted (unreviewed) feature branch may have access to production secrets when running a build in a pull request context.
Running a Terraform production plan on an untrusted (unreviewed) feature branch may give untrusted infrastructure code and Terraform providers access to production and production secrets.
An excessive number of administrators on a critical repository can increase the chances of branch protections being disabled to enable an attack via CI/CD pipelines.
As organizations grow to have thousands of repositories, hundreds of users and teams, use dozens of CI tools, and empower teams with autonomy, it is unreasonable to expect security teams to manually investigate and keep tabs on these attack paths.

A nice tool for introspecting your CI/CD pipeline for vulnerabilities, enabling defending teams to regularly check whether secrets are inappropriately exposed by the pipeline.
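To make a couple of the scenarios quoted above concrete, here is a minimal sketch of the sort of check you might run over a repository inventory. This is not how GitOops itself works. The `Repo` fields and the admin threshold are hypothetical, and you would need to populate the inventory from your own version control and CI systems:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repo:
    name: str
    production_branch_protected: bool
    has_production_deploy: bool  # a CD pipeline deploys from this repo
    admin_count: int


def findings(repo, max_admins=3):
    """Return a list of human-readable concerns for one repository."""
    issues = []
    # Anyone with write access could push straight to production.
    if repo.has_production_deploy and not repo.production_branch_protected:
        issues.append("production deploy pipeline without branch protection")
    # More admins means more accounts able to switch protections off.
    if repo.admin_count > max_admins:
        issues.append(f"{repo.admin_count} admins (threshold {max_admins})")
    return issues


if __name__ == "__main__":
    # Made-up inventory, purely for illustration.
    inventory = [
        Repo("payments", False, True, 2),
        Repo("docs", True, False, 1),
        Repo("infra", True, True, 9),
    ]
    for repo in inventory:
        for issue in findings(repo):
            print(f"{repo.name}: {issue}")
```

A flat check like this only catches the obvious cases; following indirect paths between users, teams, repositories and secrets is where a dedicated tool earns its keep.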

GitHub - securitywithoutborders/hardentools: Hardentools simply reduces the attack surface on Microsoft Windows computers by disabling low-hanging fruit risky features.

https://github.com/securitywithoutborders/hardentools

Hardentools is a collection of simple utilities designed to disable a number of “features” exposed by operating systems (Microsoft Windows, for now), and primary consumer applications. These features, commonly thought for enterprise customers, are generally useless to regular users and rather pose as dangers as they are very commonly abused by attackers to execute malicious code on a victim’s computer. The intent of this tool is to simply reduce the attack surface by disabling the low-hanging fruit. Hardentools is intended for individuals at risk, who might want an extra level of security at the price of some usability. It is not intended for corporate environments.

This is a lovely looking tool for normal users. It’s designed to remove features that the vast majority of average users don’t use, but that are common malware vectors. From Office macros to Adobe Reader features and PowerShell, some of these are used to remotely manage computers, but shouldn’t be enabled by default on personal versions of Windows.

It would be nice to see Microsoft learn from this, and start shipping Windows with many of these things disabled, and only enabled if the user needs them.

How Pokémon GO scales to millions of requests? | Google Cloud Blog

https://cloud.google.com/blog/topics/developers-practitioners/how-pok%C3%A9mon-go-scales-millions-requests

Priyanka: What happens when I hunt a Pokémon down and catch it?
James: When you catch the Pokémon, we send an event from the GKE frontend to Spanner via the API and when that write request from the frontend to spanner is complete. When you do something to update the map like gyms and PokéStops, that request sends a cache update and is forwarded to the spatial query backend.
Spanner is eventually consistent: once the update is received, the spatial data is updated in memory, and then used to serve future requests from the frontend. Then the frontend retrieves information from the spatial query backend and sends it back to the user. We also write the protobuf representation of each user action into Bigtable for logging and tracking data with strict retention policies. We also publish the message from the frontend to a Pub/Sub topic that is used for the analysis pipeline.
Priyanka: A massive amount of data must be generated during the game. How does the data analytics pipeline work and what are you analyzing?
James: You are correct, 5-10TB of data per day gets generated and we store all of it in BigQuery and BigTable. These game events are of interest to our data science team to analyze player behavior, verify features like making sure the distribution of pokemon matches what we expect for a given event, marketing reports, etc.
We use BigQuery - it scales and is fully managed, we can focus on analysis and build complex queries without worrying too much about the structure of the data or schema of the table. Any field we want to query against is indexed in a way that allows us to build all sorts of dashboards, reports, and graphs that we share across the team. We use Dataflow as our data processing engine, so we run a Dataflow batch job to process the player logs stored in Bigtable.
We also have some streaming jobs for cheat detection, looking for and responding to improper player signals. Also for setting up Pokétops and gyms and habitat information all over the world we take in information from various sources, like OpenStreetMap, the US Geological Survey, and WayFarer, where we crowdsource our POI data, and combine them together to build a living map of the world.

The amount of data shipped around in modern multiplayer gaming systems is orders of magnitude bigger than most of us ever deal with. This interview is a lovely insight into one of the architectures that has had to scale immensely over the years, and has done so with very few outages or customer complaints (about scaling anyway).
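To put the quoted 5-10TB per day into perspective, here is a quick back-of-the-envelope conversion to a sustained average rate. It assumes decimal units (1 TB = 10^12 bytes) and a perfectly even spread across the day, which real traffic certainly won’t have:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_mb_per_second(terabytes_per_day):
    """Convert a daily volume in decimal TB to an average rate in MB/s."""
    bytes_per_day = terabytes_per_day * 10**12
    return bytes_per_day / SECONDS_PER_DAY / 10**6


if __name__ == "__main__":
    for tb in (5, 10):
        rate = average_mb_per_second(tb)
        print(f"{tb} TB/day is about {rate:.0f} MB/s on average")
```

That works out at roughly 58 to 116 MB/s of analytics events, sustained around the clock.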

Rogers Chairman Fires Board for Firing Him for Firing CEO - Bloomberg

https://www.bloomberg.com/opinion/articles/2021-10-25/rogers-chairman-fires-board-for-firing-him-for-firing-ceo

Basically the chairman tried to fire the CEO, then the board instead fired the chairman, and then the chairman (as controlling family shareholder) tried to fire the board. Also, some of the family tried to fire the chairman from his job as controlling family shareholder. It is not clear which of those things worked; as far as I can tell the most likely answer is that Edward Rogers lost at the board but won at the trust, meaning that he will eventually get to kick his sisters and mother off the board, reinstate himself as chairman and fire the CEO, but meanwhile his sisters and mother and the CEO probably have a few months to run the company and try to head him off.

The details in here around the goings on at Rogers, Canada’s largest telecoms provider, have reminded me that Succession is back on our screens, and this story sounds straight out of it, just, you know, in real life.
