I'm happy to announce that I've started working on standalone paid courses, so you can support my work and get affordable educational material. These courses will be quite different from this one: they'll have more theoretical depth and a narrower focus, and will feature challenging projects, quizzes, exercises, video lectures and supplementary materials. Publishing dates are uncertain, so stay tuned! (25.03.2025)

☝️ FYI

This course was revised in March 2025. However, there are still quite a few gaps and some bland narration. I'm still working hard to make this course more engaging, so please be patient.

Here you can find tutorials on machine learning, data science and several other fields, organized as a sequential, in-depth theoretical course divided into modules. I created it single-handedly by studying and summarizing a huge amount of material, and launched it as open source together with this blog.

I also plan to make video tutorials based on Research posts for my YouTube channel @avheuristics in the future.

So far, the course is entirely theoretical and covers a wide range of advanced topics drawn from online lectures, textbooks, articles and my university classes, including sophisticated concepts of AI algorithms, data analysis and statistics. It's a fully fledged textbook designed to help students, but it doesn't include homework or quizzes.

Fundamental concepts are covered only briefly. For a better understanding, it's recommended (though not required) to first familiarize yourself with the basics of linear algebra, calculus, statistics and programming.

I decided to integrate the course into this personal website so it reaches a wider audience, but it will probably be moved to a separate site in the future.

The course content is largely compiled from Research texts or pulled in automatically as posts; however, entries in the contents may also link to other sources.

📋 CONTENTS

Where the course began: an introduction to AI and research

Essentials

1. Computer science essentials for data science and machine learning ~1.5 h

2. Linux for data scientists and machine learning engineers ~50 min

3. Web tools and other essentials for data scientists: cloud platforms, Docker, REST & more ~1.5 h

4. Intro to MLOps/ModelOps ~1 h

Mathematics

5. Linear algebra for machine learning ~1 h

6. Introduction to statistics: probability theory & distributions ~50 min

7. Introduction to statistics, pt. 2: sampling, correlation, covariance, likelihood function, confidence intervals, quantiles, density estimation ~50 min

8. Statistical distributions: types, parameters, characteristics ~50 min

9. Hypothesis testing: basics, test statistics, t-test, chi-squared test ~50 min

10. Hypothesis testing, pt. 2: other types of tests, A/B testing, ANOVA, power analysis ~50 min

11. Calculus for machine learning and data science ~50 min

12. Group theory for ML, pt. 1 ~1 h

13. Group theory for ML, pt. 2 ~1 h

14. Information theory for ML ~1.5 h

Programming

15. Algorithms and data structures ~50 min

Basic ML theory & techniques

16. Introduction to machine learning ~50 min

17. Semi-supervised learning ~1.5 h

18. Self-supervised learning ~1.5 h

19. Online machine learning ~1 h

20. Active learning ~1.5 h

21. Improving ML models ~1.5 h

Mathematical optimization

22. Gradient optimization ~50 min

23. Advanced optimizers ~50 min

Regression

24. Linear regression ~50 min

25. Regularization ~50 min

26. Regression analysis ~50 min

Classification basics & ensembling

27. Logistic regression ~40 min

28. Classification metrics ~1 h

29. Decision trees ~50 min

30. Ensemble methods ~1 h

31. K-nearest neighbors ~1 h

32. Support vector machines ~1 h

33. Kernel (in-depth look) ~1 h

Clustering basics

34. Clustering & K-means ~50 min

35. Clustering metrics ~1 h

36. Mean shift algorithm ~50 min

37. DBSCAN & OPTICS ~1 h

38. Hierarchical clustering ~1 h

Working with data

39. Exploratory data analysis ~50 min

40. Data collection techniques ~1 h

41. SQL and databases for DS ~1.5 h

42. Intro to Big Data ~1 h

43. Playing with PySpark ~1 h

44. Data engineering zone ~1.5 h

Data visualization

45. Data visualization is an art ~40 min

46. t-SNE ~1 h

Data analytics

47. Introduction to data analytics ~10 min

48. Basics of BI ~10 min

49. Everybody needs a dashboard ~10 min

50. Cloud analytics ~1.5 h

Doing better experiments

51. Synthetic data ~1.5 h

52. Experimental design ~10 min

53. Advanced A/B testing ~1.5 h

Probabilistic models & Bayesian methods

54. Sequential models ~1.5 h

55. Markov models ~1 h

56. Bayesian models ~1 h

57. Gaussian processes ~50 min

58. Gaussian mixture models ~1 h

59. Sampling, in-depth ~1 h

60. Monte Carlo methods ~1 h

61. Partition function (a closer look) ~1 h

62. Graphical models ~1.5 h

63. Bayesian networks ~1.5 h

Deep learning basics

64. Intro to TensorFlow & Keras ~1 h

65. Intro to PyTorch ~1 h

66. Neural network concepts, pt. 1 ~40 min

67. Neural network concepts, pt. 2 ~40 min

68. Neural network concepts, pt. 3 ~40 min

69. Batch normalization ~1.5 h

Fundamental NN architectures

70. CNN architecture, pt. 1 ~1 h

71. CNN architecture, pt. 2 ~1 h

72. ResNet architecture ~1.5 h

73. Inception and DenseNet ~1 h

74. RNN architecture ~1 h

75. Autoencoder architecture ~1 h

Generative models

76. Generative models ~1.5 h

77. GAN architecture ~1.5 h

78. Diffusion models ~1.5 h

79. Energy-based models ~1.5 h

80. Normalizing flows ~1 h

Transformers

81. Attention mechanism ~1.5 h

82. Transformer architecture, pt. 1 ~1 h

83. Transformer architecture, pt. 2 ~1 h

84. BERT model ~1 h

85. Sentence transformer ~1 h

Natural language processing

86. Intro to NLP ~1 h

87. Word embeddings ~1 h

88. Dialogue systems ~1.5 h

89. Topic modeling ~1 h

LLM engineering

90. Intro to LLMs, pt. 1 ~1.5 h

91. Intro to LLMs, pt. 2 ~1.5 h

92. LLM engineering ~1.5 h

93. Tuning LLMs ~1 h

94. LLM inference optimization ~1.5 h

95. Retrieval-augmented generation ~1 h

Computer vision

96. Image processing ~1 h

97. Video processing ~1 h

98. Intro to Computer Vision ~1 h

99. Img-to-img translation ~1 h

100. NST algorithm ~1 h

101. Image object detection ~1 h

102. Image object segmentation ~1.5 h

103. Optical character recognition ~1.5 h

104. Image blending ~1 h

105. Inpainting ~1 h

106. Depth map ~1 h

107. Pose estimation ~1 h

108. Geometry estimation, pt. 1 ~1.5 h

109. Geometry estimation, pt. 2 ~1.5 h

110. Vision transformers ~1.5 h

Audio analysis

111. Speech recognition ~1.5 h

112. Speech synthesis ~1.5 h

113. Music generation ~1 h

Specialized & advanced architectures

114. RvNN architecture ~1 h

115. Siamese neural network ~1.5 h

116. MoE architecture ~1.5 h

117. PixelRNN & PixelCNN ~1 h

118. DBN architecture ~1 h

119. Neural ODEs ~1 h

120. Deep probabilistic models ~50 min

121. Spiking neural network ~10 min

Time series & applications

122. Time series ~1 h

123. Econometrics for DS ~1.5 h

Graph theory in ML

124. Graph neural networks ~1 h

125. Social network analysis ~1.5 h

Quantum machine learning

126. Intro to quantum computing ~10 min

127. Quantum algorithms ~10 min

128. Intro to QML, pt. 1: VQC, HHL, QNN, gradients ~10 min

129. Intro to QML, pt. 2: quantum SVM ~10 min

130. Intro to QML, pt. 3: quantum neural network ~10 min

131. Intro to QML, pt. 4: variational quantum eigensolver ~10 min

Prompt engineering

143. Basics of prompting: how to instruct models properly ~2.5 h

AI theory

144. Intro to AI theory, pt. 1 ~1.5 h

145. Intro to AI theory, pt. 2 ~2 h

146. Intelligent agents ~1 h

147. AI search ~1.5 h

148. AI logic ~1.5 h

149. AI planning ~1.5 h

150. AI reasoning & uncertainty, pt. 1 ~1.5 h

151. AI reasoning & uncertainty, pt. 2 ~1.5 h

152. Swarm intelligence ~10 min

153. Causal representation learning ~1.5 h

154. Knowledge representation ~1 h

AI engineering

155. Intro to AI engineering ~1 h

156. RAG for LLMs ~50 min

157. Deploying LLMs ~50 min

158. Vector databases & ANN ~1.5 h

159. Advanced RAG ~1 h

160. Adversarial ML ~1.5 h

Reinforcement learning

161. Intro to RL ~1.5 h

162. AI-driven navigation ~1.5 h

Scaling & distributed learning

163. Multithreading in ML ~1 h

164. Training models at scale, pt. 1 ~1 h

165. Training models at scale, pt. 2 ~1.5 h

166. Approximate inference ~1.5 h

AI web agents

167. AI web agents ~1.5 h

🌱 UPDATES & CONTRIBUTION

The course keeps expanding while outdated information gets revised. Writing new chapters is a fairly time-consuming process, so if you'd like to see this educational project evolve, you're welcome to contribute. There are plenty of ways:

Report bugs, typos, grammatical and syntax errors (LaTeX, Markdown, code blocks), semantic and logical inaccuracies, narrative mistakes, etc.
Suggest improvements or ideas (code optimization, readability, UX, content to include, etc.)
Modify the source code by opening a pull request (expand topics, add details, fix something, etc.)
Create new pages, sub-chapters, practical exercises, code notebooks, homework, quizzes, etc.

Please use GitHub Issues for reports and suggestions. You can open a new PR here and check accepted changes here. For more information, see the repository's README.md.

Your contributions are important in building accessible education in AI (and beyond). GitHub profiles of people who contribute significantly will be listed on this page (and, probably, in the Hall of fame).

Questions? Text me!

Adding new pages

If you're going to add a new article (lesson), you may link to a third-party resource. The article's content doesn't have to be part of Research, but it must respect copyright and fit coherently into the course as a whole.

Research posts are written in .mdx format and located here, while images (banners and post content) are stored here. The Gatsby frontmatter must include the indexCourse, titleCourse and courseCategoryName fields for your page to be displayed properly in the table of contents.
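For orientation, here is a minimal sketch of what the frontmatter of a new .mdx lesson might look like. Only the three field names come from the requirement above. The values, their types (a number for indexCourse, strings for the other two) and the assumption that courseCategoryName should match a category heading from the contents exactly are all hypothetical, and any other fields a Research post needs are omitted. Existing posts in the repository are the authoritative reference for the format.

```yaml
---
# Hypothetical values for illustration only; check existing Research posts for the real format.
indexCourse: 168                       # assumed: the lesson's position in the table of contents
titleCourse: "My new lesson"           # assumed: the title shown in the table of contents
courseCategoryName: "AI engineering"   # assumed: must match an existing category heading
# ...other frontmatter fields used by Research posts are omitted here
---
```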

Licensing

TL;DR: you're free to use, distribute and modify [only] the course-related content of this website, as long as you give attribution.

The course material is distributed under a separate CC BY-SA 4.0 license, a sublicense that extends the permissions of the CC BY-NC-ND 4.0 license protecting the rest of the website's content. This sublicense allows you to copy and redistribute the course material in any medium or format for any purpose, even commercially, and to adapt it (remix, transform and build upon it) for any purpose, even commercially, as long as you give attribution and distribute your adaptations under the same license (CC BY-SA 4.0).

The course material comprises all internal pages referenced in the table of contents above (i.e., the .mdx files of the referenced posts located in the website's GitHub repository, including all /research category posts covered in the course), as well as the current page (/course). The sublicense applies to the text of these pages and to the media files on them (except for media whose copyright belongs to third parties).

To summarize, you may not distribute the content of this website's posts commercially or in modified form, but this restriction does not apply to course-related content (as long as you give credit, of course).

Please note that the website's source code (i.e., essentially everything in the website's GitHub repository except the .mdx files and the /course page) is distributed under a separate software license. For more information, refer to the repository's README.md.

Plans & to-do

Finalize unfinished tutorials (marked with yellow notices)
Add quizzes (as a Quiz component)
Add homework/practice sections to each tutorial
Add Jupyter notebooks to each chapter (or most chapters)
Integrate executable code blocks
Add more chapters related to AI theory, information theory, statistics, big data processing, geometry estimation, knowledge representation, semi-supervised learning, and beyond!

❤️ SUPPORT

The course is free forever. I've put a huge amount of my free time into this work, wanting to create something big and helpful for people, and I've never sought to profit from it, because I believe educational materials should be open to everyone. I also believe that earning people's love is the only right way to earn money from projects like this, so if you've found the course useful, please consider supporting it beyond contributing.

The easiest way to thank me and speed up development of the course is to donate right here.

🙏 ACKNOWLEDGMENTS

Many nights, by the light of a dim lamp, I explored a discipline that was completely new to me and at first seemed out of reach. I must, therefore, express my gratitude to those who guided me through this arduous journey of learning:

Andrew Ng — for the greatest classic ML course, the excellent Deep Learning Specialization on Coursera, and online lectures at Stanford
Peter Norvig and Stuart Russell — for my favorite & one of the greatest textbooks on AI
Victor Kantor, Evgeniy Riabenko, Evgeniy Sokolov, Emeli Dral — my first lecturers who got me into data science
Josh Starmer — for being the best statistics teacher ever
Radoslav Neychev and Vladislav Goncharenko — for excellent and clear lectures and practices at MIPT
Ian Goodfellow, Yoshua Bengio and Aaron Courville — for their clear deep learning textbook
Marc Deisenroth, Aldo Faisal and Cheng Soon Ong — for their clear textbook on mathematics for ML
Konstantin Vorontsov — for in-depth knowledge in machine learning
Sergey Balakirev — for the clearest CS/ML tutorials I've ever seen
Alexander Dyakonov — for his courses and his captivating, easy-to-read data science blog
Yury Kashnitsky — for his ML course platform with its inspiring community
Alexandr Boyko & Daniel Bourke — for making awesome DS/ML roadmaps I've been using
Wiki of ITMO students' notes, Department of Computer Science — for in-depth knowledge in various ML-related topics
machinelearning.ru — for plenty of good classic ML and data mining literature
and many other teachers at MIT, MIPT and Harvard — for giving free online lectures

Thanks for educating, inspiring, or both.

UPDATED ON FEB 7, 2025