I too built a rather decent deep learning rig for 900 quid


Robert Heinlein’s 1957 novel The Door into Summer returns throughout to a theme of knowing when it’s “time to railroad.” Loosely speaking, this is the idea that one’s success comes as much from historical context as it does from innate ability, hard work, and luck (though much of the latter can be attributed to historical context).

Many of the concepts driving our modern AI renaissance are decades old at least, but the field lost steam: computers were too slow, and the Amazookles of the world had yet to use these techniques to power their recommendation engines and so on. In the meantime computers have gotten much faster and much better at beating humans at very fancy games. Modern computers are now fast enough to make deep learning feasible, and it works well for many problems while also providing insight into how our own minds might work.

I too have seen the writing on the wall in recent years. I can say with some confidence that now is the time to railroad, and by “railroad” I mean revolutionize the world with artificial intelligence. A lot of things changed in big ways during the original “time to railroad,” the industrial revolution: for some this meant fortune and progress, and for others, ruin. I would like to think that we are all a bit brighter than our old-timey counterparts and that we have the benefit of their history to learn from, so I’m rooting for an egalitarian utopia rather than an AI apocalypse. In any case, collective stewardship of the sea changes underway is important: the more people learn about AI, the less likely it is that the future will be decided solely by today’s technocratic elites.

I’ve completed a few MOOCs on machine learning in general and neural networks in particular, coded up some of the basic functions from scratch, and I’m beginning to use some of the major libraries to investigate more interesting ideas. As I moved on from toy examples like MNIST and housing-price prediction, one thing became increasingly clear: my laptop was not up to the task.

It took me a week of training a vision model meant to mimic cuttlefish perception on my laptop to realize I was totally on the wrong track. That sort of wasted time really adds up, so I decided to go deeper and build my own GPU-enhanced deep learning rig.

Luckily there are lots of great guides out there as everyone and their grandmother is building their own DL rig at the moment. Most of the build guides have something along the lines of “. . . for xxxx monies” in the title, which makes it easier to match budgets. Build budgets run the gamut from the surprisingly capable $800 machine by Nick Condo to the serious $1700 machine by Slav Ivanov all the way up to the low low price of “under $5000” by Kunal Jain. I did not even read the last one because I am not made of money.

I am currently living in the UK, so that means I have to buy everything in pounds. The prices for components in pounds sterling are. . . pretty much the same as they are in greenbacks. The exchange rate to the British pound can be a bit misleading, even now that Brexit has crushed the pound sterling as well as our hopes and dreams. In my experience it seems like you can buy about the same for a pound at the store as for a dollar in the US or a euro on the continent. It seems like the only thing they use the exchange rate for is calculating salaries.

I’d recommend first visiting Tim Dettmers’ guide to choosing the right GPU for you. I’m at a stage of life where buying the second-cheapest appropriate option is usually best. With a little additional background reading and following Tim’s guide, I selected the Nvidia GTX 1060 GPU with 6GB of memory. This was from Tim’s “I have little money” category, one up from the “I have almost no money” category, and in keeping with my life philosophy of the second-cheapest. Going up a tier is often close to twice as costly but not close to twice as good, and that holds for GPUs: a single 1070 is about twice the cost of a 1060 but only around 50% faster. Two 1060s, however, get you pretty close to twice the performance, and that’s what I went with. As we’ll see in the benchmarks, TensorFlow makes it pretty easy to take advantage of both GPUs, but doubling the capacity of this rig again by adding more GPUs later won’t be plausible.

My upgradeability is somewhat limited by the number of threads (4) and PCIe lanes (16) of the modest i3 CPU I chose; if a near-term upgrade were a higher priority, I should have left out the second 1060 GPU and diverted that part of the budget to a better CPU (e.g. the Intel Xeon E5-1620 v4 recommended by Slav Ivanov). But if you’re shelling out that much for a higher-end system you’ll probably want a bigger GPU to start with, and it’s easy to see how one can go from an $800 budget to $1700 rather quickly.

The rest of the computer’s job is to quickly dump data into the GPU memory without messing things up. I ended up using almost all the same components as those in Nick’s guide because, again, my physical makeup is meat rather than monetary in nature.

Here’s the full list of components. I sourced what I could from Amazon Warehouse Deals to try and keep the cost down.

GPU (x2): Gigabyte Nvidia GTX 1060 6GB (£205.78 each)
Motherboard: MSI Intel Z170 KRAIT-GAMING (£99.95)
CPU: Intel Core i3 6100 Skylake Dual-Core 3.7 GHz Processor (£94.58)
Memory: Corsair CMK16GX4M2A2400C14 Vengeance 2x8GB (£105.78)
PSU: Corsair CP-9020078-UK Builder Series 750W CS750M ATX/EPS Semi-Modular 80 Plus Gold Power Supply Unit (£77.25)
Storage: SanDisk Ultra II SSD 240 GB SATA III (£72.18)
Case: Thermaltake Versa H23 (£27.10)

Total: £888.40

I had never built a PC before and didn’t have any idea what I was doing. Luckily, YouTube did, and I didn’t even break anything when I slotted all the pieces together. I had an Ubuntu 16.04 install thumb drive ready to go, so I was up and running relatively quickly.

The next step was installing the Nvidia drivers and the CUDA Toolkit for the GPUs. I’ve been working mainly with TensorFlow lately, so I followed their installation guide to get everything ready to take advantage of the new setup. I’m using Anaconda to manage Python environments for now, so I made one environment with the tensorflow package and another with tensorflow_gpu.
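As a quick sanity check that the GPU environment can actually see both cards, you can ask TensorFlow to enumerate its devices. A minimal sketch, assuming the TensorFlow 1.x API that was current at the time:

# List the devices TensorFlow can see; with the drivers and CUDA installed
# correctly, the two 1060s should show up as /gpu:0 and /gpu:1.
from tensorflow.python.client import device_lib

for device in device_lib.list_local_devices():
    print(device.name, device.physical_device_desc)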

I decided to train on the CIFAR10 image classification dataset using this tutorial to test out the GPUs. I also wanted to see how fast training progresses on a project of mine, a two-category classifier for quantitative phase microscope images.

The CIFAR10 image classification tutorial from tensorflow.org was perfect because you can flag whether training runs on one GPU, two GPUs, or the CPU alone. It takes ~1.25 hours to train the first 10000 steps on the CPU, but only about 4 minutes for the same training on one 1060. That’s a weeks-to-days/days-to-hours/hours-to-minutes level of speedup.
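The multi-GPU version of the tutorial uses a data-parallel “tower” pattern: each GPU builds a replica of the model sharing one set of variables, computes gradients on its own piece of the data, and the gradients are averaged before the update. A toy sketch of the idea against the TensorFlow 1.x API (the dense model here is a hypothetical stand-in for the tutorial’s convolutional net, not its actual code):

# Data-parallel "towers": one model replica per GPU, shared variables,
# gradients averaged across towers before the update.
import tensorflow as tf

NUM_GPUS = 2
BATCH_SIZE = 128

def tower_loss(images, labels):
    # Hypothetical stand-in model: flatten and classify into 10 classes.
    flat = tf.reshape(images, [-1, 24 * 24 * 3])
    logits = tf.layers.dense(flat, 10, name='logits')
    return tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits))

images = tf.placeholder(tf.float32, [BATCH_SIZE, 24, 24, 3])
labels = tf.placeholder(tf.int64, [BATCH_SIZE])
optimizer = tf.train.GradientDescentOptimizer(0.1)

tower_grads = []
image_shards = tf.split(images, NUM_GPUS)
label_shards = tf.split(labels, NUM_GPUS)
for i in range(NUM_GPUS):
    with tf.device('/gpu:%d' % i):
        loss = tower_loss(image_shards[i], label_shards[i])
        tower_grads.append(optimizer.compute_gradients(loss))
        # Subsequent towers reuse the variables the first tower created.
        tf.get_variable_scope().reuse_variables()

# Average each variable's gradient across the towers and apply the update.
averaged = [(tf.reduce_mean(tf.stack([g for g, _ in gv]), axis=0), gv[0][1])
            for gv in zip(*tower_grads)]
train_op = optimizer.apply_gradients(averaged)

The benchmark numbers below are from the tutorial scripts themselves, not this sketch.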

# CPU 10000 steps
2017-06-18 12:56:38.151978: step 0, loss = 4.68 (274.9 examples/sec; 0.466 sec/batch)
2017-06-18 12:56:42.815268: step 10, loss = 4.60 (274.5 examples/sec; 0.466 sec/batch)

2017-06-18 14:12:50.121319: step 9980, loss = 0.80 (283.0 examples/sec; 0.452 sec/batch)
2017-06-18 14:12:54.652866: step 9990, loss = 1.03 (282.5 examples/sec; 0.453 sec/batch)

# One GPU
2017-06-18 15:50:16.810051: step 0, loss = 4.67 (2.3 examples/sec; 56.496 sec/batch)
2017-06-18 15:50:17.678610: step 10, loss = 4.62 (6139.0 examples/sec; 0.021 sec/batch)
2017-06-18 15:50:17.886419: step 20, loss = 4.54 (6197.2 examples/sec; 0.021 sec/batch)

2017-06-18 15:54:00.386815: step 10000, loss = 0.96 (5823.0 examples/sec; 0.022 sec/batch)

# Two GPUs
2017-06-25 14:48:43.918359: step 0, loss = 4.68 (4.7 examples/sec; 27.362 sec/batch)
2017-06-25 14:48:45.058762: step 10, loss = 4.61 (10065.4 examples/sec; 0.013 sec/batch)

2017-06-25 14:51:06.100071: step 3800, loss = 0.96 (7407.2 examples/sec; 0.017 sec/batch)

2017-06-25 14:52:28.510590: step 6000, loss = 0.91 (8172.5 examples/sec; 0.016 sec/batch)

2017-06-25 14:54:56.087587: step 9990, loss = 0.90 (6167.8 examples/sec; 0.021 sec/batch)

That’s about a 21-32x speedup on the GPUs. It’s not quite double the speed on two GPUs, because the model isn’t big enough to fully utilize both of them, as we can see in the output from nvidia-smi:

[Screenshots: nvidia-smi output while training on one GPU and on two GPUs]

My own model saw a similar speedup, going from training about one 79-image minibatch per second to training more than 30 per second. Trying to train this model on my laptop, a Microsoft Surface Book, I was getting about 0.75 steps a second. [Aside: the laptop does have a discrete GPU, a variant of the GeForce 940M, but no Linux driver that I’m aware of :/]

# Training on CPU only
INFO:tensorflow:global_step/sec: 0.981465
INFO:tensorflow:loss = 0.673449, step = 173 (101.889 sec)
INFO:tensorflow:global_step/sec: 0.994314
INFO:tensorflow:loss = 0.64968, step = 273 (100.572 sec)

# Dual GPUs
INFO:tensorflow:global_step/sec: 30.3432
INFO:tensorflow:loss = 0.317435, step = 90801 (3.296 sec)
INFO:tensorflow:global_step/sec: 30.6238
INFO:tensorflow:loss = 0.272398, step = 90901 (3.265 sec)
INFO:tensorflow:global_step/sec: 30.5632
INFO:tensorflow:loss = 0.327474, step = 91001 (3.272 sec)
INFO:tensorflow:global_step/sec: 30.5643
INFO:tensorflow:loss = 0.43074, step = 91101 (3.272 sec)
INFO:tensorflow:global_step/sec: 30.6085
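
For what it’s worth, log lines in this global_step/sec format come from TensorFlow’s Estimator machinery. A minimal sketch of the kind of training loop that produces them, with a hypothetical stand-in model_fn rather than my actual phase-microscopy classifier:

# Train a tiny two-category classifier with tf.estimator; loss and
# global_step/sec are logged every 100 steps by default.
import numpy as np
import tensorflow as tf

def model_fn(features, labels, mode):
    # Hypothetical stand-in model: flatten and classify into 2 classes.
    flat = tf.reshape(features['x'], [-1, 64 * 64])
    logits = tf.layers.dense(flat, 2)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer(1e-4).minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn=model_fn)
# Random dummy data, in 79-image minibatches to match the runs above.
train_input = tf.estimator.inputs.numpy_input_fn(
    x={'x': np.random.rand(790, 64, 64).astype(np.float32)},
    y=np.random.randint(2, size=790),
    batch_size=79, num_epochs=None, shuffle=True)
estimator.train(input_fn=train_input, steps=1000)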

Overall I’m pretty satisfied with the setup, and I’ve got a lot of cool projects to try out on it. Getting the basics of machine learning is pretty easy with all the great MOOCs and tutorials out there (I completed Geoffrey Hinton’s and Andrew Ng’s Coursera courses earlier this year), but the learning curve slopes sharply upward after the first few toy examples. Working directly on real projects, with a machine that can train big models before the heat death of the universe, becomes essential for gaining intuition and building cool solutions.

Originally published at thescinder.com on June 25, 2017.
