What books to read for technical understanding of AI
Recommending AI textbooks and courses in a rapidly evolving field is risky.
But let that not deter us. If you want the least technical introduction to the central issues of cognition, alongside classic papers on the subject from some of the best minds in AI, grab a copy of Mind Design II, edited by John Haugeland. It was the first book we were assigned at the AI institute where I did my master's. (ai.uga.edu)
Read on for more technical material; there's also a summary at the end if you just want the gist.
The Textbook (AIMA for short)
The 4th edition of "Artificial Intelligence: A Modern Approach" by Stuart Russell and Peter Norvig was published in 2020 (the "Global Edition" sometimes carries a 2022 copyright because of regional printing and release schedules, but the core content is the 2020 revision). Does it introduce LLMs and Transformers?
Yes, to a significant extent, but not as deeply as a dedicated, cutting-edge NLP or LLM book.
* The 4th edition definitely includes expanded coverage of Deep Learning, which is the foundation for LLMs and Transformers.
* It has dedicated chapters on:
* Chapter 21: Deep Learning (covers general neural networks, CNNs, RNNs, and fundamental deep learning concepts).
* Chapter 23: Natural Language Processing (traditional NLP methods).
* Chapter 24: Deep Learning for Natural Language Processing (this chapter introduces neural networks for NLP and likely touches on the concepts leading to Transformers, even if it doesn't do a deep dive into the Transformer architecture itself with code examples).
However, here's the nuance:
* Transformers (2017) and the LLM explosion (2020 onwards) happened very rapidly. While the 4th edition incorporated significant updates covering deep learning and neural NLP, the nuances of the Transformer architecture, attention mechanisms, large-scale pre-training paradigms (like BERT and GPT-3), fine-tuning strategies, prompt engineering, and the full scope of generative AI that defines modern LLMs may not be covered with the same depth as a specialized textbook published after the generative AI boom. (A toy sketch of the attention computation follows this list.)
* AIMA aims to cover the entire breadth of AI, so each topic gets foundational coverage rather than an exhaustive deep dive. It's a fantastic book for understanding the principles and history that led to LLMs, but not necessarily a practical guide to building and deploying them. So, it introduces the concepts and foundations necessary to understand LLMs and Transformers, but it won't be the most up-to-date or hands-on resource for the very latest advancements in the field.
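To make the attention mechanism mentioned above concrete, here is a minimal toy sketch of scaled dot-product attention, the core operation of the Transformer, in plain NumPy. This is an illustration written for this article (the function name, shapes, and data are invented), not code from AIMA or from the original paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of value vectors

# Three query/key/value vectors of dimension 4, filled with random toy data
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```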
For a progression from gentle to deeply technical learning of Natural Language Generation (NLG) and LLMs, combining foundational knowledge with practical application is key.
1. Foundational (Gentle to Medium Technical):
* "Speech and Language Processing" by Dan Jurafsky and James H. Martin (3rd Edition Draft):
* Why: Often considered the "Bible" of NLP. While it's a behemoth, it provides an incredibly comprehensive and rigorous foundation in traditional NLP, statistical methods, and then progresses to deep learning for NLP. The online draft of the 3rd edition is continuously updated to incorporate modern topics like transformers and attention.
* Progression: Starts gentle, gets very technical. It's excellent for understanding the why and how behind NLP models before diving into specific architectures.
* Online Courses (Coursera, edX, fast.ai, and more):
* Coursera's DeepLearning.AI NLP Specialization (Andrew Ng's team): Excellent for a gradual introduction to deep learning for NLP, including sequence models, attention, and transformers.
* fast.ai's Practical Deep Learning for Coders: While not NLP-specific, their "top-down" approach quickly gets you building models, and their NLP modules are great for practical application.
* Progression: Typically gentle introduction with hands-on coding, then progresses to more complex models.
2. Deep Technical Progression & Practical (for NLG/LLMs):
* "Natural Language Processing with Transformers" by Lewis Tunstall, Leandro von Werra, and Thomas Wolf (O'Reilly):
* Why: This is arguably the most recommended practical book for understanding and working with Transformers and modern LLMs using the Hugging Face ecosystem. It covers the architecture, pre-training, fine-tuning, and various applications.
* Progression: Assumes some Python and machine learning basics, then dives deep into Transformers and their practical application. It's very hands-on with code examples (a short pipeline sketch follows this entry).
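To give a flavor of the book's hands-on style, here is a minimal sketch using the Hugging Face pipeline API. It assumes the transformers library and a backend such as PyTorch are installed; the pipeline downloads a default sentiment model on first run, and the exact scores will vary by model version:

```python
from transformers import pipeline

# Load a ready-made sentiment-analysis pipeline
# (the library's default model is downloaded on first run)
classifier = pipeline("sentiment-analysis")

# Classify a sentence; returns a list of {label, score} dicts
result = classifier("Natural Language Processing with Transformers is a great read.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```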
* "Applied Text Analysis with Python" by Benjamin Bengfort, Tony Ojeda, Rebecca Bilbro, and Susan McGregor (O'Reilly):
* Why: Provides a practical guide to building real-world NLP applications with Python. While not exclusively about generative models, it covers techniques like topic modeling, text classification, and information extraction that are foundational for understanding how text data is processed and used by LLMs.
* Progression: Medium technical, very practical (a tiny scikit-learn sketch follows this entry).
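As a taste of the classical techniques the book covers, here is a tiny text-classification sketch with scikit-learn. The corpus and labels are invented toy data for illustration; this is not code from the book:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy corpus with binary labels (1 = about AI/NLP, 0 = not)
texts = [
    "transformers changed natural language processing",
    "large language models generate fluent text",
    "the recipe calls for two cups of flour",
    "the hike to the summit took four hours",
]
labels = [1, 1, 0, 0]

# TF-IDF features feeding a simple linear classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["transformers for natural language generation"]))  # likely [1]
```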
* Official Documentation and Blog Posts (Hugging Face, Google AI Blog, OpenAI Blog):
* Why: For the absolute cutting edge, paper implementations, and practical examples, the documentation from the major AI labs and the Hugging Face ecosystem is unparalleled.
* Hugging Face Transformers Library Documentation & Tutorials: Essential for hands-on work with LLMs.
* Hugging Face Blog: Great for recent developments and practical guides.
* Google AI Blog / OpenAI Blog: For insights directly from the researchers.
* Progression: Varies from gentle intros to highly technical deep dives.
* "Designing Machine Learning Systems" by Chip Huyen:
* Why: While not purely NLP, it's excellent for understanding the engineering aspects of building and deploying large-scale AI systems, including those involving LLMs. It helps bridge the gap from theory to practical application.
* Progression: Medium to deep technical, with a focus on system design.
SUMMARY
* Start with foundational concepts: Either a good online course on Deep Learning for NLP (like DeepLearning.AI's specialization) or selective chapters from Jurafsky & Martin's "Speech and Language Processing" to build a strong theoretical base.
* Move to practical application: "Natural Language Processing with Transformers" is the go-to for hands-on work with modern generative models.
* Supplement with engineering/system design: "Designing Machine Learning Systems" or relevant online resources to understand deployment and scalability.
This combination should give you a robust understanding from the theoretical underpinnings to the practical implementation of modern NLG and LLM systems.
The following list of additional resources was generated by AI to accompany my article (please verify before relying on it):
Looking at your comprehensive resource list, here are some key resources that are notably missing:
## **Core AI/ML Theory & Mathematics**
- **"The Elements of Statistical Learning" by Hastie, Tibshirani & Friedman** - Essential for understanding the statistical foundations
- **"Pattern Recognition and Machine Learning" by Christopher Bishop** - Fundamental ML theory
- **"Deep Learning" by Ian Goodfellow, Yoshua Bengio & Aaron Courville** - The definitive deep learning textbook
## **Recent LLM/Transformer-Specific Resources**
- **"Understanding Deep Learning" by Simon J.D. Prince (2023)** - Very recent, covers transformers extensively
- **"Build a Large Language Model (From Scratch)" by Sebastian Raschka (2024)** - Hands-on LLM implementation
- **Andrej Karpathy's "Neural Networks: Zero to Hero" video series** - Extremely popular for understanding transformers from first principles
## **Research Paper Collections & Archives**
- **Papers With Code** (paperswithcode.com) - State-of-the-art tracking
- **Distill.pub** - Visual explanations of ML concepts
- **The Gradient** - AI research publication
- **Key seminal papers**: "Attention Is All You Need" (the Transformer paper), the GPT series papers, and the BERT paper (the Transformer paper's core equation is reproduced below)
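For reference, the central equation of "Attention Is All You Need" is scaled dot-product attention over queries Q, keys K, and values V, with key dimension d_k:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```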
## **Practical Implementation Resources**
- **"Hands-On Machine Learning" by Aurélien Géron** - Very practical ML implementation
- **Sebastian Ruder's NLP blog** (ruder.io) - Excellent NLP insights
- **Jay Alammar's blog** (jalammar.github.io) - Visual explanations of transformers
## **Advanced/Specialized Topics**
- **"Reinforcement Learning: An Introduction" by Sutton & Barto** - For RLHF understanding
- **"Information Theory, Inference, and Learning Algorithms" by David MacKay** - Mathematical foundations
- **Stanford CS224N, CS229, CS231N course materials** - Freely available, high-quality
## **Industry/Engineering Perspectives**
- **"Machine Learning Engineering" by Andriy Burkov** - ML systems in production
- **Google's "Rules of Machine Learning" by Martin Zinkevich** - Practical ML engineering
The list you provided is excellent for NLP/LLM focus, but these additions would round out the theoretical foundations, recent developments, and practical implementation aspects.
😲 Wow!
Do you have something to suggest? Please add your suggestions or experiences in the comments section.

