“Hello Deep Learning”—Understanding Deep Learning from the Ground Up

johnwalker · 1 April 2023 12:56

Many introductions to deep learning technology either start from theory, which can be forbidding to the non-mathematically oriented, or else build on top of complex frameworks such as PyTorch, which provide so much ready-made infrastructure that it’s difficult to understand the basics of what is going on under the bonnet. To give programmers a grasp of the basics of deep learning by actually writing and working with code that solves a real-world problem, Bert Hubert has developed and published “Hello Deep Learning”, a self-paced tutorial which develops programs in standard C++, without any deep learning libraries or development environments. The tutorial develops programs which ultimately implement recognition of handwritten characters, introducing learning from a training data set, back propagation, automatic differentiation, multiple-layer neural networks, and more. The programs can all be compiled and run on a modest computer, and no graphics processing unit (GPU) is required. Here is the description of the project from the home page.

A from scratch GPU-free introduction to modern machine learning. Many tutorials exist already of course, but this one aims to really explain what is going on, from the ground up. Also, we’ll develop the demo until it is actually useful on real life data which you can supply yourself.

Other documents start out from the (very impressive) PyTorch environment, or they attempt to math it up from first principles. Trying to understand deep learning via PyTorch is like trying to learn aerodynamics from flying an Airbus A380.

Meanwhile the pure maths approach (“see it is easy, it is just a Jacobian matrix”) is probably only suited for seasoned mathematicians.

The goal of this tutorial is to develop modern neural networks entirely from scratch, but where we still end up with really impressive results.

Here are the chapter titles from the tutorial, linked to the chapters on-line. This is a work in progress, and a few of the more advanced chapters have not yet been posted.

Hello Deep Learning

Introduction (which you can skip if you want)

Chapter 1: Linear combinations

Chapter 2: Some actual learning, backward propagation

Chapter 3: Automatic differentiation

Chapter 4: Recognizing handwritten digits using a multi-layer network: batch learning SGD

Chapter 5: Neural disappointments, convolutional networks, recognizing handwritten letters

Chapter 6: Inspecting and plotting what is going on, hyperparameters, momentum, ADAM

Chapter 7: Dropout, data augmentation and weight decay, quantisation

Chapter 8: An actual 1700 line from scratch handwritten letter OCR program

Chapter 9: Gated Recurring Unit / LSTM: Some language processing, DNA scanning

Chapter 10: Attention, transformers, how does this compare to ChatGPT?

Chapter 11: Further reading & worthwhile projects

Chapter 12: What does it all mean?

Complete source code for all of the examples and the training data they use are posted on GitHub at https://github.com/berthubert/hello-dl.

This looks to be an excellent way for working programmers familiar with C++ to get started in machine learning and obtain an understanding of what is going on inside the black boxes provided by the large toolkits upon which large systems are built.

Once you’re ready to build large production systems, for example to optimise the output of your paperclip factory, you’ll want to use one of the heavy-duty frameworks such as PyTorch or TensorFlow. These provide much of the basics pre-built and allow you to use hardware accelerators and distributed processing without having to implement all of the machinery yourself or extensively modify your code. Starting with this tutorial, though, will provide insight into what the frameworks are doing for you and thus how to use them effectively to solve your problem.

pturmel · 1 April 2023 13:42

This is crucial for programming in general, not just for deep learning.