CrappyNet - Simulate a crappy network
a development tool for those who give a crap about software quality
version 0.003 (download the latest code from GitHub)

CrappyNet is a development and QA tool for helping teams create products that work well in the real world (not just the idealized world simulated in a programmer’s development environment).

Overview:

“The hardest part about fixing a problem is reproducing the problem.”
- Countless Quality-Concerned Developers

CrappyNet is guided by the “hurt-fast” philosophy: it tries to bring problems to light early in the development cycle, when they’re easy to fix, rather than late in the process, when fixing them is costly (in schedule, money, or reputation). “A little bit of hurt now saves a whole lotta hurt later.”

The problem CrappyNet solves

The problem CrappyNet solves typically goes something like this:

A team is creating an application that consists of a client interacting with a web-based (probably RESTful) server. The team designs a clever client/server API, then implements and tests it in their development environment. Everything works perfectly: the experience is smooth and fast, and all tests pass. Then (in what they don’t yet know is a worst-case scenario) the product ships. Soon reports start coming back from a small but significant portion of real-world users: the application performs terribly and crashes, sometimes so badly that its data is corrupted and it will no longer run at all. These reports are the worst kind because they offer little clue as to what is going on. In the office no one can reproduce the problem, or if it does happen it is so rare that there’s no reasonable way to fix it, or even to know whether it’s fixed.

Eventually someone in the office realizes that the problem arises from real-world problems in the network. Out in the real world, networks are slow, they drop packets, they deliver some packets faster than others, they have brief outages, they reroute. But the team has been developing and testing on a near-perfect office network and so never experienced these problems during development.

So they start a QA process of testing the application in real-world scenarios: mobile phones walking the streets and driving the roads through tunnels and behind hills; laptops on the weak, overloaded wifi networks of hotel lobbies; and office computers where someone crawls under the desk plugging and unplugging the ethernet cables.

Lo and behold, the team does start to see the kinds of problems reported by real-world users. But even though QA now reports the problems with some frequency, they are still hard to reproduce, so turnaround time is long. To make matters worse, when a problem is infrequent and hard to reproduce, QA can never confirm with certainty that it has been fixed, only that it has been longer than usual since that particular problem last appeared.

Finally the team realizes the fundamental problem: they based all of their application’s communication on HTTP but forgot to consider, early on, that HTTP over a real network is unreliable (and always has been; nothing in the protocol guarantees end-to-end delivery). HTTP runs on top of TCP, which retransmits lost packets and delivers data in order, but only for as long as the connection survives. When a connection stalls, times out, or is reset partway through a request, nothing at the HTTP level tells the client whether the request ever reached the server, or whether the server acted on it and only the response was lost. It was easy to forget this during development because the office network was so reliable that delivery problems never occurred; HTTP worked perfectly in the idealized world of the office.

“You can fix a problem at step one for a dollar, or fix it at step ten for thirty dollars.”
- management guru W. Edwards Deming (maybe)

This late in the game, fixing the problem is not going to be easy. By this stage a great deal of code has been written on the wrong assumption that A happens, then B, then C, then D, flawlessly every time. In the real world it could be A, then C, then half of D, then C again, and B never happens at all. Damn! While trying to fix the code, the team realizes that the client/server API itself needs fixing; hopefully it’s not too late to change the API, and they won’t have to resort to kludgy workarounds that never really work well.
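To make the lost-and-repeated-request scenario above concrete: one common way to tolerate it (a general technique, not something CrappyNet prescribes) is to tag each request with a unique ID, retry when no response arrives, and have the server remember which IDs it has already applied. The sketch below simulates this in plain Node.js; `Server`, `FlakyChannel` and `sendWithRetry` are hypothetical names, and the "network" is a scripted list of drop decisions rather than a real connection.

```javascript
// Hypothetical sketch: idempotent retries over a lossy channel.
// Nothing here is CrappyNet code; the network is simulated.

class Server {
  constructor() {
    this.balance = 0;
    this.seen = new Map(); // requestId -> response already produced
  }
  // Apply a deposit at most once per request ID.
  handle(request) {
    if (this.seen.has(request.id)) {
      return this.seen.get(request.id); // duplicate: replay earlier answer
    }
    this.balance += request.amount;
    const response = { ok: true, balance: this.balance };
    this.seen.set(request.id, response);
    return response;
  }
}

// Each send() consumes one scripted outcome:
// "ok"           = delivered both ways
// "dropRequest"  = the server never sees the request
// "dropResponse" = the server acts on it, but the reply is lost
class FlakyChannel {
  constructor(server, outcomes) {
    this.server = server;
    this.outcomes = [...outcomes];
  }
  send(request) {
    const outcome = this.outcomes.shift() ?? "ok";
    if (outcome === "dropRequest") return null;
    const response = this.server.handle(request);
    if (outcome === "dropResponse") return null;
    return response;
  }
}

let nextId = 0;
function newRequestId() {
  nextId += 1;
  return `req-${nextId}`;
}

// Retry the SAME request (same ID) until a response arrives.
function sendWithRetry(channel, request, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = channel.send(request);
    if (response !== null) return { response, attempts: attempt };
  }
  throw new Error(`no response after ${maxAttempts} attempts`);
}

const server = new Server();
const channel = new FlakyChannel(server, ["dropResponse", "dropRequest", "ok"]);
const result = sendWithRetry(channel, { id: newRequestId(), amount: 10 });
console.log(result.attempts, server.balance); // prints "3 10"
```

The key design choice is that a retry reuses the same ID, so when the first response is lost the server replays its earlier answer instead of applying the deposit twice.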

Recent real-world examples of the problem:

  • If my wife dares to send or reply to email from her Android email app while we’re driving through a weak cell area (hello, Nevada), there is no guarantee that the mail will go out. Sometimes the app thinks her email has been sent and removes it from its “outbox” folder, but the recipient never receives anything and there’s no way to recover the message. There’s no indication that the mail wasn’t sent until the intended recipient complains, “how come you never sent me email?”
  • I use a popular music-streaming app that often gets stuck mid-song if the wifi/cable network it’s using has one of its frequent micro-blackouts. The rest of that song will not be heard, so I must press the “next” button to get the app playing again. This does not mean I don’t like the current song, but the service may interpret it that way. On some occasions this problem is bad enough to corrupt the local data storage, and the app must be reinstalled.
  • I use a GPS run-tracking app to measure my runs while I use another app to listen to music. I have learned not to bring up the run-tracking app in certain weak-cell parts of my run, because then it is likely to crash. And I know never to bring up the app at the end of my run (the most natural time to do so), when my phone is transitioning from the cell network to wifi, because if it’s communicating during that transition it is almost certain to crash.
  • I’m in a hotel with many guests sharing an underpowered wifi network and using the world’s most popular search engine. Under these circumstances, it’s not unusual for some of that search engine’s page resources (e.g. CSS, scripts, images) to load late or not at all. The result is a whacked-out page that may or may not be usable. So I cross my fingers and reload.
  • I am the head of a world power that has just launched a nuclear strike, in error. I use my smartphone to send the cancel-nuclear-strike sequence but due to EMP effects that sequence is delivered out-of-sequence. Armageddon follows.
I confess to fabricating that last example, but the rest of them are real and frequent.

How developing with CrappyNet avoids that problem

CrappyNet acts as a gateway, or proxy, between the client and the server that simulates most of the problems found in real-world networks. In this controlled environment, CrappyNet’s adjustable parameters let the team simulate network problems of any severity.
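The gateway idea can be sketched in a few dozen lines of Node.js. This is a hypothetical illustration, not CrappyNet’s actual source: it assumes an upstream server at `target.host:target.port`, and the option names `dropRate`, `delayMs` and `jitterMs` are invented here. Each request is either dropped (the connection is closed with no response) or forwarded after a base delay plus random jitter.

```javascript
// Hypothetical sketch of a crappy gateway, NOT CrappyNet's actual source.
// Option names (dropRate, delayMs, jitterMs) are invented for illustration.
const http = require("http");

// Decide the fate of one request. `rand` is injectable so tests can be
// deterministic; by default it is Math.random.
function planRequest(opts, rand = Math.random) {
  if (rand() < opts.dropRate) {
    return { drop: true, delayMs: 0 };
  }
  // Base latency plus random jitter, so requests on separate connections
  // can be forwarded out of order.
  const delayMs = opts.delayMs + Math.floor(rand() * opts.jitterMs);
  return { drop: false, delayMs };
}

// Returns an http.Server (not yet listening) that forwards each request
// to target.host:target.port after a delay, or drops it outright.
function createCrappyGateway(target, opts) {
  return http.createServer((clientReq, clientRes) => {
    const plan = planRequest(opts);
    if (plan.drop) {
      // Simulate a blackout: close the connection with no response at all.
      clientReq.socket.destroy();
      return;
    }
    setTimeout(() => {
      const hostHeader =
        target.port === 80 ? target.host : `${target.host}:${target.port}`;
      const upstreamReq = http.request(
        {
          host: target.host,
          port: target.port,
          method: clientReq.method,
          path: clientReq.url,
          headers: { ...clientReq.headers, host: hostHeader },
        },
        (upstreamRes) => {
          clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
          upstreamRes.pipe(clientRes);
        }
      );
      // If the real server is unreachable, fail the way a network would.
      upstreamReq.on("error", () => clientReq.socket.destroy());
      clientReq.pipe(upstreamReq); // forward any request body
    }, plan.delayMs);
  });
}
```

For example, `createCrappyGateway({ host: "127.0.0.1", port: 3000 }, { dropRate: 0.05, delayMs: 300, jitterMs: 700 }).listen(8080)` would drop about 5% of requests and add between 300 and 1000 ms of latency in front of a development server on port 3000. A fuller tool would also stall or cut off responses mid-stream, not just delay requests.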

Very early in the process, as soon as an API is being used between client and server, a CrappyNet gateway should be placed between them (even if they’re running on the same machine, as often happens during development). From the beginning, client and server developers should usually develop and test through the CrappyNet gateway, forcing problems to happen frequently. Yes, this means a little pain during development, taking the time to fix these real-world problems early on, but that pain is soon over, unlike the long-lasting pain of trying to fix these problems late in the process. And with real-world network speeds enabled, the developers may find that the application runs slower than they’d like, so they’ll be forced to figure out ways to make it faster!

Installation

CrappyNet is a small server built on the Node.js platform, and requires no other tools.

Any computer can act as a CrappyNet gateway. If you are a developer, you probably want that server to run on your local machine.

Installing CrappyNet:
  1. Install Node.js if it is not already installed. See the Node.js installation instructions for quick ways to install it.
  2. Download the latest version of CrappyNet from GitHub.
  3. That’s all. See the User Guide (next section) for how to run CrappyNet.

User Guide

From the directory where you unzipped crappynet.zip, run the following command:
node crappynet <admin-port> <gateway-port> <settings-filespec>
where <admin-port> is a port for administering this crappynet gateway, <gateway-port> is the port that will act as a gateway, and <settings-filespec> is the name of a file that will hold configuration settings.
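As an illustration of what the three arguments mean, here is a hypothetical sketch of how they could be validated; `parseCrappyArgs` is an invented helper, not CrappyNet’s code. It assumes both ports must be valid TCP ports and must differ, since two servers cannot normally listen on the same port.

```javascript
// Hypothetical sketch: validating the positional arguments of
//   node crappynet <admin-port> <gateway-port> <settings-filespec>
// Not taken from CrappyNet's source.
function parseCrappyArgs(args) {
  if (args.length !== 3) {
    throw new Error(
      "usage: node crappynet <admin-port> <gateway-port> <settings-filespec>"
    );
  }
  const [adminArg, gatewayArg, settingsFile] = args;
  const adminPort = toPort(adminArg, "admin-port");
  const gatewayPort = toPort(gatewayArg, "gateway-port");
  if (adminPort === gatewayPort) {
    // The admin screen and the gateway each need their own listening port.
    throw new Error("admin-port and gateway-port must be different ports");
  }
  return { adminPort, gatewayPort, settingsFile };
}

// Convert a command-line string to a TCP port number, or throw.
function toPort(text, name) {
  const n = Number(text);
  if (!Number.isInteger(n) || n < 1 || n > 65535) {
    throw new Error(`${name} must be a TCP port (1-65535), got "${text}"`);
  }
  return n;
}
```

It would be called as `parseCrappyArgs(process.argv.slice(2))`, skipping the `node` and script entries of `process.argv`.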

A first example: browsing google

For example, suppose you wanted to experience the Google search engine as seen by users in a bad hotel lobby or on a flaky cell connection, you might run this command:
node crappynet 9090 8080 google_crappynet.json
If you now open your browser to “http://127.0.0.1:9090” you’ll see the administration screen for this crappynet gateway instance. If you set the first two fields (“host” and “port”) to “google.com” and “80” and press “Submit”, then open another browser window to “http://127.0.0.1:8080” you’ll see what it’s like to use Google from a slow, unreliable network.

You can adjust the parameters in the administration screen (e.g. “http://127.0.0.1:9090”) and see the immediate effect in the gateway screen (e.g. “http://127.0.0.1:8080”).
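A browser shows the effect of the settings qualitatively; to put rough numbers on it, a small script can send a batch of requests through the gateway and count how they ended. This is a hypothetical helper, not part of CrappyNet, and it assumes Node.js 18 or later for the built-in `fetch()` and `AbortSignal.timeout()`.

```javascript
// Hypothetical helper, not part of CrappyNet. Needs Node.js 18+ for the
// built-in fetch() and AbortSignal.timeout().

// Send `count` sequential GET requests and record how each one ended.
async function probe(url, count, timeoutMs) {
  const results = [];
  for (let i = 0; i < count; i++) {
    const start = Date.now();
    try {
      const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
      await res.arrayBuffer(); // wait for the whole body, not just headers
      results.push({ status: res.ok ? "ok" : "http-error", ms: Date.now() - start });
    } catch (err) {
      // Depending on the Node version, a timeout surfaces as TimeoutError
      // or AbortError; anything else (reset, refused) counts as "failed".
      const timedOut = err.name === "TimeoutError" || err.name === "AbortError";
      results.push({ status: timedOut ? "timeout" : "failed", ms: Date.now() - start });
    }
  }
  return results;
}

// Count outcomes and report the median latency of successful requests.
function summarize(results) {
  const counts = { ok: 0, "http-error": 0, timeout: 0, failed: 0 };
  for (const r of results) counts[r.status] += 1;
  const okMs = results
    .filter((r) => r.status === "ok")
    .map((r) => r.ms)
    .sort((a, b) => a - b);
  // Upper median for even counts; null if nothing succeeded.
  const medianMs = okMs.length ? okMs[Math.floor(okMs.length / 2)] : null;
  return { ...counts, medianMs };
}
```

Running `probe("http://127.0.0.1:8080/", 20, 5000).then((r) => console.log(summarize(r)))` once, changing a parameter in the administration screen, and running it again gives a before/after comparison.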

Configuring your setup:

Unless you’re a Google employee, you don’t care about the previous example. You want to simulate problems on your own web site, or in your RESTful API.

You’ll want to launch “node crappynet” with ports and settings that fit your needs. In most cases you will also need to change a development version of your software to point to your CrappyNet gateway instead of your live server. This may include switching to http instead of https (until we fix this problem) or altering your hosts table.

Complications:

  • CrappyNet does not (yet) support https, so you may need to change your developer version to use http instead of https.
  • If your service relies on multiple servers at different addresses or ports, you may run multiple crappynet instances, each on a different pair of admin/gateway ports. These may all run on the same machine or even from the same directory.
  • Depending on how much control you have over your environment, making all communications go through CrappyNet may require altering your hosts table or whatever service resolves DNS.
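Pointing a development build at the gateway is easiest if the build reads its server addresses from configuration instead of hard-coding them. A minimal sketch, assuming a JavaScript client; the environment variable names `CRAPPY_API_URL` and `CRAPPY_MEDIA_URL` and the example.com hosts are invented for illustration.

```javascript
// Hypothetical sketch, not part of CrappyNet: choose each service's base
// URL from an environment variable so a development build can be pointed
// at a CrappyNet gateway without editing code. Variable names and hosts
// are invented examples.

function resolveBaseUrl(override, fallback) {
  if (!override) return fallback; // normal build: talk to the real server
  const url = new URL(override); // throws on a malformed URL
  if (url.protocol !== "http:") {
    // CrappyNet does not (yet) support https, so require plain http.
    throw new Error(`gateway URL must use http://, got ${url.protocol}`);
  }
  return url.origin; // e.g. "http://127.0.0.1:8080"
}

// One gateway per backend service, each on its own admin/gateway port pair.
function serviceUrls(env = process.env) {
  return {
    api: resolveBaseUrl(env.CRAPPY_API_URL, "https://api.example.com"),
    media: resolveBaseUrl(env.CRAPPY_MEDIA_URL, "https://media.example.com"),
  };
}
```

Running the client with `CRAPPY_API_URL=http://127.0.0.1:8080` would then route API traffic through a local gateway while the media traffic keeps using its real server.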

How to solve the problems shown by CrappyNet

Our goal with CrappyNet is to raise awareness of the problem, not to tell anyone how to fix it. In our experience, it is not hard to design and code around this problem as long as the issues are understood and tackled head-on and early, and CrappyNet remains part of continuous development and testing.

See the “Related Links” section.

Q&A

Q: Why did you bother building CrappyNet instead of joining project XXX?
A: We were unaware of any project XXX offering a hurt-fast solution to the common problem of real-world networks. If there is in fact already a project XXX, we want to do what we can to help. Netflix's Latency Monkey may be similar, but they haven't released it yet, so we don't know how much the two overlap. Charles Proxy also appears to have some similar features.


Q: Do you really expect an average developer to work on something that makes their software run worse instead of better?
A: No. We don’t expect average developers to use this at all. Average developers will hate it. Only superior developers who want to make superior software for their users will understand the value of CrappyNet, and why they would want to set their project back by two days early on to save lots of time and frustration later.
   
Q: How can I help with this project?
A: Just use the thing and tell us whether it does what you need and where it should go next (be kind). See the “Collaborators” section at the end of this document for how to send us feedback. If you want to get into the code, join us on GitHub (crappynet).
   
Q: Why require Node.js? I prefer <insert language/platform here>.
A: We’d never used Node.js before and were looking for an excuse to try it out. A good argument could also be made that CrappyNet should be part of the server infrastructure itself rather than a separate gateway (e.g. a CrappyNet layer for Apache). If you really think it’s important to have CrappyNet on a different platform, let’s talk about it.
   
Q: My service uses SSL/HTTPS, how do I use CrappyNet with SSL?
A: Um, I dunno. We’ll try to figure out how to crappify https connections soon, assuming the fact that we’re acting as a gateway isn’t an SSL deal breaker. (Who knows about this stuff?) If you’re the developer of the software, for now it’s probably easiest to disable HTTPS in a temporary developer build of your software and debug crappy-network issues in that setup.
   
Q: What are your future CrappyNet plans?
A: CrappyNet still needs a lot of work, both in the code itself and in evangelizing the need for such a tool. If it works out, then CrappyNet will likely be followed by other Crappy tools.

We're also considering whether this would work better as a cloud service, instead of expecting each user to run it themselves. That would require us to set up and maintain a server <sad>, but could provide revenue <happy>.

Related Links

History

2012-02-06  v0.003  First public release
2012-04-26          Code moved to GitHub

Collaborators