Resources for getting started with Deep Learning
About two years ago I decided to fix a huge gap in my education and finally understand what this buzzword "deep learning" is all about. I got so excited that I ended up changing the direction of my PhD research and started working on a project that applies reinforcement learning to quantum error correction. My physics friends often ask for advice on how to get started in machine learning, so I finally compiled a list of the resources that I found most useful. My own path was very diffusive – I went through much more than what I selected here, and in a different order – so hopefully this post will help others focus their time on the right things and be more productive.
This list will be most useful for people with some knowledge of linear algebra, calculus, and programming, although some of these resources assume a more general audience. For example, resources 1 and 2 even explain what a derivative is!
Acronyms that I will use for brevity:
- AI = Artificial Intelligence
- ML = Machine Learning
- RL = Reinforcement Learning
- TF = TensorFlow
- Ng = Andrew Ng’s last name (not an acronym)
So here we go:
1. Michael Nielsen's online book/tutorial Neural Networks and Deep Learning is by far the best starting point in the whole observable universe! It will take you from "I don't know what a neural network is, even though everyone is talking about them" to "Neural networks are simple, what's next?". He selects one problem in computer vision (recognizing handwritten digits) and guides you step by step through solving it with neural networks. He starts simple and gradually adds bells and whistles to the network until it achieves classification accuracy comparable to the state of the art on the MNIST dataset. Most importantly, all of this is supplemented with Python code, which he also explains in detail. Nowadays no one (except researchers) needs to write their own neural net implementation, because everything is available in high-level libraries such as Keras. However, it's worth doing at least once in your life: implementing back-propagation for a neural net from scratch is the best way to really understand the inner workings of the training process, and there is no better guide than Michael Nielsen! And yes, this is the same Michael Nielsen who co-authored "Quantum Computation and Quantum Information"!
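To give a flavor of what you end up building there, here is a minimal from-scratch sketch in the same spirit (my own toy code, not Nielsen's): a tiny fully-connected network with sigmoid activations, trained on XOR with hand-written back-propagation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny network: 2 inputs -> 3 hidden -> 1 output, cross-entropy cost.
rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((3, 2)), np.zeros((3, 1))
W2, b2 = rng.standard_normal((1, 3)), np.zeros((1, 1))

# Toy dataset: learn XOR; each column is one training example.
X = np.array([[0., 0., 1., 1.], [0., 1., 0., 1.]])
Y = np.array([[0., 1., 1., 0.]])
lr, n = 1.0, X.shape[1]

for step in range(5000):
    # Forward pass, keeping intermediate activations for backprop.
    a1 = sigmoid(W1 @ X + b1)
    a2 = sigmoid(W2 @ a1 + b2)

    # Backward pass: apply the chain rule layer by layer.
    delta2 = a2 - Y                            # dC/dz2 for cross-entropy + sigmoid
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # dC/dz1

    # Gradient-descent step, averaged over the batch.
    W2 -= lr * (delta2 @ a1.T) / n; b2 -= lr * delta2.mean(1, keepdims=True)
    W1 -= lr * (delta1 @ X.T) / n;  b1 -= lr * delta1.mean(1, keepdims=True)

print(a2.round(2))  # should be close to [[0. 1. 1. 0.]]
```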
2. Once you roughly know what a neural network is, the next best resource is the Coursera Deep Learning specialization. It is the second most popular specialization on Coursera, surpassed only by "Python for Everybody", and taught by Andrew Ng – the founder of Coursera himself! He is an amazing teacher who explains things with crystal clarity. He also seems like just a very kind dude, literally radiating kindness from the screen! The specialization is composed of five courses which guide you through all the major topics in deep learning. The thing I like most about it is its breadth. While Nielsen's tutorial lets you dive deep into one particular problem, this specialization covers a ton of material in less detail, pointing you to the seminal papers in the field. It was after taking it that I was able to read research papers in ML. Reading papers is ultimately the best approach, because the field is so new and advances so fast. Many of the lessons simply reference some paper, explain the main idea in a few slides, and leave it to you to read the details. Despite this, the presentation is very accessible – as I said, he even explains what a derivative is!
3. Comparable to the previous one in quality, although probably less known at this point, is the Deep Learning Lecture Series 2020 taught by DeepMind employees at UCL. This very timely lecture series incorporates a lot of the knowledge that has been added to the deep learning pile in recent years. Each lecture is presented by a different researcher from DeepMind. The clarity of their explanations and the quality of the teaching and slides impress me so much that I would put it on par with Andrew's course! Eventually they move on to more advanced topics, such as attention mechanisms in neural nets, transformers, AlphaZero, etc., which are not covered in Andrew's course. Importantly, they also provide a huge number of references to guide your learning. That said, I don't think it's a good idea to jump to this resource directly after resource 1. The series works really well as an overview of more advanced topics that became popular in recent years, but not as an introductory course. Of course, they start from the basics to get everyone on the same page, but it's not enough of an introduction to feel comfortable with the topics discussed later on.
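As a taste of one of those later topics: the scaled dot-product attention at the heart of transformers boils down to a few lines of linear algebra. This is my own illustrative sketch, not code from the lectures:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted average of the values

# 3 queries attending over 5 key/value pairs, embedding dimension 4.
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 4)), rng.standard_normal((5, 4)), rng.standard_normal((5, 4))
print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 4)
```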
The resources above will teach you a lot about deep supervised learning, but not about another branch of ML that has been gaining momentum in recent years – deep reinforcement learning. This branch hasn't yet had time to make as huge an industrial impact as deep supervised learning, so there seem to be fewer resources available on the topic.
4. I don't know of any alternative starting point for RL other than the book Reinforcement Learning: An Introduction by Sutton & Barto. The first edition appeared in 1998, and the book has been polished and revised over the years, so the second edition from 2018 is written in an extremely clear and enjoyable way. This book will equip you with the basics of the more conventional RL that existed before the deep learning revolution. It was my RL equivalent of Andrew's course – after reading it I was ready for research papers. One thing the book doesn't provide, however, is a broad overview of more modern methods.
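The workhorse algorithm of that conventional era, and a centerpiece of the book, is tabular Q-learning. Here is a minimal sketch on a toy environment of my own invention (not code from the book):

```python
import numpy as np

# Toy 1-D chain: states 0..4, actions 0 (left) / 1 (right).
# Reaching state 4 gives reward +1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(s, a):
    s_next = min(max(s + (1 if a == 1 else -1), 0), GOAL)
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward, s_next == GOAL

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Q-learning update: bootstrap from the greedy value of the next state.
        target = r + (0.0 if done else gamma * np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))  # non-terminal states should prefer action 1 (go right)
```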
5. There is also a Reinforcement Learning course taught by DeepMind at UCL. It is mostly structured around the Sutton & Barto textbook, so it might be a good idea to do the two in parallel. I watched it afterwards, and it was a nice review of things that I should have remembered from the book but had successfully forgotten. I liked this course and Hado van Hasselt's teaching, but the slides are often pretty lame, so it's also a good resource for training your willpower by trying not to skip them. There is also another version of this course taught by David Silver. At the end of the course they introduce some of the more modern algorithms, such as AlphaZero and various DQN variants.
6. For a more practical introduction to modern deep RL, I recommend the Deep RL Bootcamp by Berkeley. Again, the lecture series is taught by multiple people, all of whom are big shots in the field. I found it very satisfying to watch this after reading the Sutton & Barto book – you already have a grip on the basic RL concepts, and they glue them together with deep learning (see the sketch below). They cover popular modern algorithms such as DQN and PPO, but also some more exotic topics such as inverse RL and applications in robotics. The lecture "Nuts and Bolts of Deep RL Experimentation" by John Schulman is invaluable!
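To illustrate what that gluing looks like, here is a rough sketch of the core DQN training step in TensorFlow. This is my own illustrative code, not material from the bootcamp: the Q-table from the previous example is replaced by a neural network, and the same TD target drives a regression loss.

```python
import tensorflow as tf

# The tabular update from before, with the Q-table replaced by a neural network.
n_actions, gamma = 2, 0.99
q_net = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(4,)),  # e.g. 4-d states
    tf.keras.layers.Dense(n_actions),
])
target_net = tf.keras.models.clone_model(q_net)  # separate copy for stable TD targets
optimizer = tf.keras.optimizers.Adam(1e-3)

def train_step(states, actions, rewards, next_states, dones):
    # TD target bootstraps from the target network, just like tabular
    # Q-learning bootstraps from max_a' Q(s', a').
    next_q = tf.reduce_max(target_net(next_states), axis=1)
    targets = rewards + gamma * (1.0 - dones) * next_q
    with tf.GradientTape() as tape:
        q_values = q_net(states)
        chosen_q = tf.reduce_sum(q_values * tf.one_hot(actions, n_actions), axis=1)
        loss = tf.reduce_mean(tf.square(targets - chosen_q))  # squared TD error
    grads = tape.gradient(loss, q_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_net.trainable_variables))
    return loss

# In a full DQN you would sample these batches from a replay buffer and
# periodically copy q_net's weights into target_net.
```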
7. To get hands-on with RL, there is a great resource from OpenAI: Spinning Up in Deep RL. It has self-contained educational implementations of many basic algorithms, plus other educational materials. For my first RL toy project, I simply looked through their code and adapted it to my problem. Implementing low-level things yourself is the best way to gain understanding, although you might not want to do it forever. Luckily, there are several RL libraries that provide high-quality, ready-to-go implementations of RL algorithms which you can take and throw at your own problem. I am a big fan of TF-Agents, but there are many more (OpenAI Baselines and Stable Baselines among the most popular, although judging by their commit history, it seems they will soon go extinct). My reasons for using TF-Agents are that it is extremely modular, very well documented, and regularly maintained. It is also compatible with TensorFlow 2, which is a big advantage over the libraries that haven't migrated to TF2 yet.
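For a feel of the "ready-to-go" style, here is roughly how assembling a DQN agent in TF-Agents looks, based on my reading of the library's documentation. Treat it as a sketch: exact module paths and signatures may differ between versions.

```python
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network

# Wrap a Gym environment so the agent can consume it.
env = tf_py_environment.TFPyEnvironment(suite_gym.load("CartPole-v1"))

# The library builds the Q-network for you from a few hyperparameters.
q_net = q_network.QNetwork(env.observation_spec(),
                           env.action_spec(),
                           fc_layer_params=(64, 64))

agent = dqn_agent.DqnAgent(env.time_step_spec(),
                           env.action_spec(),
                           q_network=q_net,
                           optimizer=tf.keras.optimizers.Adam(1e-3))
agent.initialize()

# From here, a replay buffer and a driver handle data collection,
# and agent.train(experience) performs the updates.
```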
I also want to mention a few more resources that I liked less, but that other people may find more interesting:
8. For a very long time the book Artificial Intelligence: A Modern Approach by Russell & Norvig was considered the bible of AI. The book is huge (it also works as a self-defense weapon) – it contains all of the AI knowledge accumulated during the 20th century! When deep learning came along, it swept away most of this material, so it is not a very practical book to spend time on if you just want a quick jump-start in deep learning. However, it is a classic of the field, and it's definitely worth suffering through if you plan to get involved in AI research more seriously. It is also extremely well written – it has been polished over many years (the 1st edition came out in 1995). In the next edition, the authors promise to add a whole new chapter on modern deep learning by Ian Goodfellow (see next).
9. Another popular textbook, which I see as a younger sibling of the previous one, is Deep Learning by Ian Goodfellow. I just didn't fall in love with it.
On a different note:
- For general cultural development in AI, I recommend the Artificial Intelligence Podcast by Lex Fridman, which grew out of the course he was teaching at MIT before he set himself loose on the world: leaving his position, heading off on a road trip across the US, and launching a startup. He has recorded over 100 conversations with AI industry leaders, Nobel laureates, physicists, neuroscientists, etc. This podcast is a huge source of inspiration for me, and a way to keep up with the cutting-edge developments in science and technology related to AI. I absolutely love Lex, and I hope he continues the podcast as his career takes this new turn!
So this is my small list, but going through the references in these resources will keep you busy for a long time. Enjoy!