Photo by Rafael Pol on Unsplash
You’ve heard about deep learning. You’ve heard it is changing the world. You’ve also heard it requires a pretty nice GPU.
Naturally, you take a look at your computer to see what kind of hardware you have. You will probably discover one of three things:
- You don’t have a dedicated GPU at all (common in laptops).
- You have a GPU, but it isn’t an Nvidia GPU, which is pretty much required for deep learning.
- You have an Nvidia GPU, but it isn’t nearly powerful enough for your deep learning needs.
So — now what? How do you get access to the hardware necessary to get your hands dirty with deep learning? Let’s take a look.
Kaggle notebooks (also known as kernels) are a free compute environment provided by Kaggle in which you can run your code. Here are the technical specs:
- 9 hours of execution time
- 5 Gigabytes of auto-saved disk space (/kaggle/working)
- 16 Gigabytes of temporary, scratchpad disk space (outside /kaggle/working)
CPU notebook:
- 4 CPU cores
- 16 Gigabytes of RAM
GPU notebook:
- 2 CPU cores
- 13 Gigabytes of RAM
You can see above that you have two options: a CPU based notebook or a GPU based notebook. Since we are discussing deep learning, we will assume the GPU notebook. Note: if the demand for GPU notebooks is high, you might be placed in a queue and have to wait for one.
For the GPU notebook, you get a single NVIDIA Tesla P100. This GPU has 12GB of RAM and 4.7 teraFLOPS of double-precision performance.
Kaggle also pre-installs almost all the libraries you would need to run your deep learning experiments, making setup extremely easy. Really, all you have to do is turn on a Kaggle notebook with a GPU and start coding.
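Once a notebook is running, a quick check like the one below can confirm that a GPU is actually attached. This is only a sketch: it assumes the `nvidia-smi` command-line tool is on the PATH whenever an Nvidia driver is installed, and the helper name `list_nvidia_gpus` is made up for this example rather than being part of any Kaggle API.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return (name, total_memory) pairs for the Nvidia GPUs that nvidia-smi reports.

    Returns an empty list when nvidia-smi is missing or exits with an error.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        return []

    gpus = []
    for line in result.stdout.splitlines():
        if line.strip():
            # Each line looks like "<gpu name>, <memory> MiB"
            name, _, memory = line.partition(",")
            gpus.append((name.strip(), memory.strip()))
    return gpus


gpus = list_nvidia_gpus()
if gpus:
    for name, memory in gpus:
        print(f"Found GPU: {name} ({memory})")
else:
    print("No Nvidia GPU detected")
```

If the list comes back empty in a Kaggle notebook, the accelerator is probably not switched on in the notebook settings.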
In my opinion, this is an amazing option, but it does have some downsides. First, you only get 6 hours of execution time when committing code. Committing code is how you save it. It’s not uncommon for deep learning experiments to take days to train, so a 6-hour limit can be pretty limiting if you start working on more complex problems.
Second, while the hardware is amazing given its zero cost, the single GPU can be too small for models that require a lot of memory or a lot of training. Examples include deep learning models trained on video or on large corpora of text, both of which require a lot of GPU memory. They would also almost certainly take longer than 6 hours to train on a single GPU.
Both of these downsides only affect advanced users attempting to train fairly large deep learning models. For that reason, if you are just getting started with deep learning, I would strongly recommend that you start with Kaggle Notebooks. They cost nothing, get you access to a good single GPU, come pre-loaded with basically all the necessary libraries, and allow you to focus on just learning how to leverage deep learning.
If you continue down the deep learning path, though, at some point you will likely outgrow your Kaggle Notebooks. So — what then?
The Allure of the Cloud
At this point, I see a lot of people turn to the cloud. It is pretty easy to get started, you only pay for what you use, and you can get access to some really powerful machines.
For example, for about $31 an hour, you can get access to a machine with 8 Tesla V100s and a total GPU memory of 256GB. That is some serious compute. You can also get a single K80 GPU for $0.90 an hour. The K80 GPU isn’t the best (it’s about 2.5x slower than an Nvidia 1080 GPU), but it does have 24GB of RAM!
That being said, I would advise most people against training their deep learning models in the cloud.
The main reason is simple — if you always have to pay to try a new model or run a new experiment, it will mentally eat at you. You will have to decide every single time if the model you want to run is worth the cost. And that mental battle will prevent you from learning and experimenting as much as possible.
Also, if you stay with the cheaper machines, you are not getting much more hardware than you get from Kaggle. The biggest benefit would be unlimited time to run your models, but since you pay per hour, that starts to add up. And if you upgrade significantly beyond Kaggle’s hardware, it gets pretty expensive.
So — what option is left to you?
If you have already cut your teeth on deep learning using Kaggle Notebooks and you know you want to go bigger, skip the mental anguish of the cloud, and build your own deep learning rig.
Let’s run through some numbers.
Let’s assume for now that you will be okay with 1 GPU, but want the freedom of longer running models that you don’t get on Kaggle as well as the ability to upgrade to more GPUs later. That is exactly where I was when I built my machine.
I went with the following core components:
- Intel Core i7-6850K 3.6 GHz 6-Core Processor. I chose this processor because it gets you 40 PCIe lanes, which allow you to run 2 GPUs to their maximum potential. Also, having 6 cores is really nice for parallelizing data processing.
- Corsair Vengeance LPX 32 GB (2 x 16 GB) DDR4-3200 Memory, because in my opinion 16GB just isn’t enough these days.
- Samsung 970 PRO 512 GB M.2-2280 NVMe Solid State Drive, to get the fastest possible data loading from disk. I also bought a 3 TB spinning disk to store data that I didn’t currently need to access for training.
- Gigabyte GeForce GTX 1080 Ti 11 GB AORUS Video Card. When I purchased my machine this was really the best option for desktop-based deep learning. If I were to buy today, I would look closely at the 2080 Ti.
Right now, you could build the above system (with all the other necessary components) for about $2,400.
For a conservative comparison, let’s compare this to the cost of a p2.xlarge on AWS, which is a worse machine (except that it has more GPU memory). This machine costs $0.90 an hour. Adding 512GB of storage to that will probably cost you another $0.07 an hour. At $0.97 an hour, $2,400 buys about 2,474 hours, so you could run this machine for about 103 days.
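The arithmetic behind that comparison can be sketched in a few lines. The dollar amounts are the ones quoted above; the constant and function names are made up for this example, and the calculation assumes the cloud machine runs around the clock.

```python
BUILD_COST_USD = 2400.00       # approximate cost of the build described above
INSTANCE_USD_PER_HOUR = 0.90   # p2.xlarge rate quoted above
STORAGE_USD_PER_HOUR = 0.07    # rough estimate for 512GB of storage quoted above


def break_even_days(build_cost, hourly_rate):
    """Days of round-the-clock cloud use that cost the same as the build."""
    hours = build_cost / hourly_rate
    return hours / 24


hourly_rate = INSTANCE_USD_PER_HOUR + STORAGE_USD_PER_HOUR
days = break_even_days(BUILD_COST_USD, hourly_rate)

print(f"Cloud rate: ${hourly_rate:.2f} per hour")
print(f"Break-even: about {days:.0f} days of continuous use")
```

Swapping in a different hourly rate or build cost shows how quickly the break-even point moves.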
That sounds amazing, right?
Except that it took me probably 2 weeks of compute just to figure out reinforcement learning well enough to play Pong at a superhuman level. If each project you attempt takes around 14 days of training to work out all the bugs and optimize, that gets you about 7 projects before you hit $2,400.
Even worse, imagine finding a bug after 2 days of training. That bug cost you almost $50. For me, having to pay every time I want to train a model is just too painful. I’d rather sink the cost of building my own machine and then feel free to go crazy. I know I have learned much more this way because I am not afraid to experiment. In fact, since I spent so much upfront, I am now incentivized to run as many experiments as possible.
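To make those per-project and per-bug figures concrete, here is a small sketch using the numbers from the text: $0.97 an hour for the cloud machine plus storage, 14 days of training per project, and a bug discovered after 2 days. The names are invented for illustration, and the sketch assumes the instance is billed for every hour of training.

```python
HOURLY_RATE_USD = 0.97     # p2.xlarge plus storage, from the figures in the text
BUILD_COST_USD = 2400.00   # approximate cost of building the machine
DAYS_PER_PROJECT = 14      # assumed training time per project
DAYS_BEFORE_BUG = 2        # training time spent before the bug shows up


def cloud_cost(days, hourly_rate=HOURLY_RATE_USD):
    """Cost in dollars of running the cloud machine nonstop for the given days."""
    return days * 24 * hourly_rate


cost_per_project = cloud_cost(DAYS_PER_PROJECT)
projects_before_break_even = BUILD_COST_USD / cost_per_project
bug_cost = cloud_cost(DAYS_BEFORE_BUG)

print(f"Cost per 14-day project: ${cost_per_project:.2f}")
print(f"Projects before reaching the build cost: {projects_before_break_even:.1f}")
print(f"Cost of a bug found after 2 days: ${bug_cost:.2f}")
```

Under these assumptions, each project costs a little over $300 in cloud time, which is where the figure of about 7 projects comes from.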
More Than Money
Building your own machine has benefits that are not monetary. I learned a ton by choosing all the components for my machine and then putting it together. Sure, it was frustrating at times, but in the end, it gave me a huge sense of accomplishment.
I was also able to learn how to install and optimize all the necessary libraries, drivers, and packages to run deep learning. And if I ever want to upgrade my system, I have a really strong foundation on which to build. For example, just this month I added a larger 1 TB NVMe drive. I now have 1.5TB of really fast data storage for about a $150 investment. If I ever wanted to upgrade my GPU to the 2080 Ti, I wouldn’t have to buy an entirely new machine. I would just buy the new GPU and put it in. Once you’ve built your own machine, this ability to upgrade it piecemeal over time is amazing!
Hopefully, I have convinced you of the benefits of building your own machine once you’ve outgrown Kaggle Notebooks. $2,400 is not a small investment, but I think most people would be better off continuing to use Kaggle Notebooks while saving to build a machine. That is not to say you should never touch cloud computing. If, while saving, you have a large-scale model you want to run, give it a go on AWS or Google Cloud. I just don’t think the cloud is a good long-term solution for someone looking to do many deep learning projects.
No matter which path you choose, though, I hope you have a blast building deep learning models!
Note: I use affiliate links when linking out to products, but these are products I have actually purchased and used.