

🎓 12/167
This post is a part of the Mathematics educational series from my free course. Please keep in mind that the correct sequence of posts is outlined on the course page, while the order in Research may be arbitrary.
I'm also happy to announce that I've started working on standalone paid courses, so you can support my work and get affordable educational material. These courses will be of a completely different quality, with more theoretical depth and a narrower focus, and will feature challenging projects, quizzes, exercises, video lectures, and supplementary material. Stay tuned!
Machine learning has seen tremendous growth over the last decade, in part because researchers have steadily incorporated an ever-expanding set of mathematical tools designed to capture the underlying structures and symmetries present in real-world data. One of the most powerful and elegant among these mathematical tools is group theory. Group theory plays a critical role in modeling the ways that data can be transformed, rotated, reflected, permuted, or otherwise manipulated, while preserving essential structure or meaning. The capacity to handle symmetries effectively is central to many of the leading-edge machine learning architectures — especially in computer vision, robotics, molecular modeling, and other areas where transformations of the data (including rotations, translations, or permutations of elements) need to be taken into account in a mathematically principled way.
Motivation for group theory in machine learning
Group theory is sometimes seen as an esoteric branch of abstract algebra, but it has deep relevance to machine learning, especially in situations where one might expect or require certain invariances or equivariances of a model's output with respect to transformations of the input data. A typical example is in computer vision, where we want a classification system for images to be robust to certain transformations such as rotations or flips. If a system is trained to detect objects in images, we usually want it to classify an object the same way whether the image is rotated by 90 degrees or mirrored along some axis — unless there is a meaningful semantic distinction introduced by such a transformation (e.g., the difference between left-facing and right-facing text in optical character recognition could matter). The existence of these symmetries often means that we can reduce the effective complexity of a model, or that we can incorporate more knowledge about the data into the architecture of the network itself, thereby improving sample efficiency and generalization.
The reason that group theory lies at the heart of such approaches is that a group is a set equipped with an operation that satisfies certain axioms (closure, associativity, identity, inverse). In simpler terms, a group formalizes the notion of a "transformation set" that can be performed on an object, such that combining transformations in succession is still a valid transformation, there is a notion of doing "nothing" (the identity transformation), and each transformation has a corresponding inverse. This precisely corresponds to how we usually think about transformations like "rotate by 20° then rotate back by -20°", or "flip about a horizontal axis, then flip again about that same axis to recover the original". By modeling the transformations as elements of a group, we can exploit the rich theory of group actions, group representations, and related concepts from representation theory and harmonic analysis. These frameworks unify how transformations behave when applied to data, and how that behavior can be reflected in neural architectures or other machine learning models.
Why symmetries and transformations are central in modern ML
In many tasks, especially in vision and signal processing, data has inherent symmetries: an image of a cat remains an image of the same cat when the camera is slightly rotated or moved, a speech signal might be shifted in time without fundamentally changing the content, a 3D protein structure might be rotated in space, and so on. A hallmark of many successful ML architectures is leveraging such domain-specific symmetries:
- Convolutional neural networks (CNNs) inherently incorporate translational symmetry by applying convolution filters across spatial or temporal dimensions.
- Group-equivariant neural networks (G-CNNs) extend this idea to additional transformations such as rotations and reflections (Cohen and Welling, ICML 2016).
- Steerable filters (Freeman and Adelson, 1991) let a filter change orientation in a predictable way, which is closely related to group theory since the transformations of orientations can be modeled by certain subgroups of the continuous rotation group.
When the architecture aligns with the transformations that a dataset may undergo, the model can learn more efficiently, require fewer samples, and generalize in a more robust manner. From this perspective, group theory is the natural mathematical language to describe these transformations and to systematically exploit them.
Examples of rotation/reflection invariance in image tasks
Computer vision applications often showcase how invariance to rotations or reflections can be vital. In digit classification on the MNIST dataset, a digit rotated by 30° or 90° is still the same digit. If we train a classifier from scratch on every possible rotation, we might need far more training examples, whereas a carefully crafted architecture that encodes rotational symmetry from the outset can generalize to those rotations right away.
Even more complex tasks, like detecting tumors in medical images that might be oriented in arbitrary ways, benefit from the same principle. In such contexts, reflection invariance can also matter (e.g., flipping a scan left-to-right, which might not change the clinical significance of a tumor). These real-world examples underscore why machine learning researchers frequently turn to group theory: it offers a unified theoretical framework to encode the notion of symmetry, invariance, and related properties that can be exploited to reduce complexity and improve performance.
Fundamentals of group theory
To properly appreciate how group theory helps in ML contexts, it is helpful to have a thorough grounding in the basic concepts. Because many advanced and specialized results (like group representations and group convolutions) hinge on definitions of groups, subgroups, and group actions, I want to outline the core elements that define group theory and highlight the specific forms of groups commonly used in machine learning (ML) applications.
Basic definitions
A group is a set $G$ together with a binary operation, often denoted $\cdot$, that combines any two elements $a, b \in G$ to form another element $a \cdot b$ of $G$. The operation might also be expressed with juxtaposition ($ab$) or with some symbol (like $+$ in the case of an additive group). For instance, in a rotation group, the operation is composition of rotations — rotating by 30° followed by 45° is the same as a single rotation by 75°.
Formally, a structure $(G, \cdot)$ is a group if it satisfies the following axioms:
- Closure: For all $a, b \in G$, $a \cdot b$ is also in $G$.
- Associativity: For all $a, b, c \in G$, $(a \cdot b) \cdot c = a \cdot (b \cdot c)$.
- Identity element: There exists an identity element $e \in G$ such that for all $a \in G$, $e \cdot a = a \cdot e = a$.
- Inverse elements: For every $a \in G$, there exists an inverse $a^{-1} \in G$ such that $a \cdot a^{-1} = a^{-1} \cdot a = e$.
Conceptually, one can think of a group as the mathematical encapsulation of the idea of "transformations" or "symmetries" that can be composed in a consistent manner, with a do-nothing transformation ($e$) as the identity, and the ability to undo every transformation ($a^{-1}$).
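To make these axioms tangible, here is a minimal sketch (my own illustrative example in plain Python, not part of the formal development) that verifies closure, associativity, identity, and inverses for the set of rotations by multiples of 90°, with composition given by angle addition mod 360°:
# Hedged sketch: verify the group axioms for rotations by multiples of 90 degrees.
# The group operation is angle addition mod 360; the element 0 is the identity.
elements = [0, 90, 180, 270]

def compose(a, b):
    """Group operation: apply rotation a, then rotation b (angles add mod 360)."""
    return (a + b) % 360

# Closure: composing any two elements stays inside the set.
assert all(compose(a, b) in elements for a in elements for b in elements)

# Associativity: (a * b) * c == a * (b * c).
assert all(compose(compose(a, b), c) == compose(a, compose(b, c))
           for a in elements for b in elements for c in elements)

# Identity: 0 composed with anything leaves it unchanged.
assert all(compose(0, a) == a and compose(a, 0) == a for a in elements)

# Inverses: every element can be composed with some element to give the identity.
assert all(any(compose(a, b) == 0 for b in elements) for a in elements)

print("All four group axioms hold for {0, 90, 180, 270} under composition of rotations.")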
Subgroups and cosets (brief mention)
A subgroup $H$ of a group $G$ is a subset of $G$ that itself forms a group under the inherited operation. For example, in the group of integers under addition $(\mathbb{Z}, +)$, the set of even integers $2\mathbb{Z}$ is a subgroup. Subgroups are important because they often represent restricted sets of transformations that still maintain closure under the operation.
A related concept is the notion of a coset. Given a subgroup $H$ of $G$ and an element $g \in G$, the left coset of $H$ with respect to $g$ is $gH = \{\, gh : h \in H \,\}$. Cosets arise often in group theory when analyzing factor groups, but here, it suffices to keep in mind that cosets let you partition a group $G$ into distinct equivalence classes under the subgroup $H$. While subgroups and cosets are not the focus of typical ML-based group theory applications, they do play a role in certain advanced constructions (e.g., factor-group-based representations).
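To make cosets slightly more concrete, the following sketch (my own toy example, not something required later in the series) partitions the additive group $\mathbb{Z}/6\mathbb{Z}$ into the cosets of the subgroup {0, 2, 4}:
# Hedged sketch: cosets of a subgroup inside the additive group Z_6 = {0, ..., 5}.
n = 6
G = list(range(n))   # the group Z_6 under addition mod 6
H = [0, 2, 4]        # a subgroup (the "even" elements of Z_6)

def left_coset(g, H, n):
    """Return the left coset g + H (written additively) as a sorted list."""
    return sorted((g + h) % n for h in H)

# Collect the distinct cosets; together they partition G into equivalence classes.
cosets = []
for g in G:
    c = left_coset(g, H, n)
    if c not in cosets:
        cosets.append(c)

print("Distinct cosets of H in Z_6:", cosets)   # expected: [[0, 2, 4], [1, 3, 5]]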
Examples of groups
There are many classes of groups, but a few show up constantly in ML. One can categorize them by whether they are finite or infinite, abelian (commutative) or non-abelian (non-commutative), discrete or continuous (Lie groups), etc. Let me highlight some representative examples.
Cyclic groups (e.g., $C_n$)
A cyclic group is generated by a single element. A popular instance is $C_n$ (often written $\mathbb{Z}/n\mathbb{Z}$), the group of rotations by multiples of $360°/n$. If we imagine rotating a 2D object in increments of $360°/n$, the group has exactly $n$ elements: $0°$, $360°/n$, $2 \cdot 360°/n$, …, $(n-1) \cdot 360°/n$. The group operation is composition of these rotations; closure follows from how angles add up mod $360°$ (or mod $2\pi$ in radians). This is a finite, abelian group.
Cyclic groups are conceptually among the easiest to understand. In ML contexts, one might consider $C_4$ to represent the 90°-rotation symmetries of an image. If a network is designed to be invariant or equivariant under these transformations, it might be effectively encoding or learning a representation of the group $C_4$.
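As a quick sanity check of this picture (my own example, using NumPy's np.rot90 as the 90° rotation), the four rotations of an image array compose exactly like the elements of $C_4$:
import numpy as np

# Hedged sketch: np.rot90 realizes the generator of C4 acting on image arrays.
image = np.arange(16).reshape(4, 4)          # a toy "image"

# The four group elements: rotations by 0, 90, 180, 270 degrees.
orbit = [np.rot90(image, k) for k in range(4)]

# Applying the generator (90-degree rotation) four times gives back the identity,
# which is exactly the cyclic structure of C4.
assert np.array_equal(np.rot90(image, 4), image)

# Composition matches addition of rotation counts mod 4.
assert np.array_equal(np.rot90(np.rot90(image, 1), 3), image)
print("np.rot90 realizes the C4 rotation symmetries of an image.")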
Dihedral groups ($D_n$)
Dihedral groups can be viewed as the symmetry groups of a regular n-gon, consisting of both rotations and reflections. $D_n$ has $2n$ elements: $n$ rotations and $n$ reflections. If you imagine a square, $D_4$ includes the identity (do nothing), rotation by 90°, rotation by 180°, rotation by 270°, and then reflections about four axes (vertical, horizontal, main diagonal, and secondary diagonal in the square case). This group is finite but not abelian (a reflection followed by a rotation does not produce the same transformation as the same rotation followed by that reflection).
In ML, dihedral group symmetries appear when both rotations and flips matter for a system (like an image classification model that needs to treat a shape or pattern the same under certain reflections). Depending on the domain, dihedral transformations can significantly reduce the effective variability in the data.
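That non-commutativity is easy to verify directly with 2 x 2 matrices; the short sketch below is my own illustration (a fuller geometric construction of $D_4$ acting on a square appears in the code at the end of this article):
import numpy as np

# Hedged sketch: rotation by 90 degrees and reflection about the x-axis,
# written as 2x2 matrices acting on column vectors.
R90 = np.array([[0.0, -1.0],
                [1.0,  0.0]])      # rotation by +90 degrees
Fx  = np.array([[1.0,  0.0],
                [0.0, -1.0]])      # reflection about the x-axis

# "Reflect, then rotate" vs. "rotate, then reflect" (matrices compose right-to-left).
reflect_then_rotate = R90 @ Fx
rotate_then_reflect = Fx @ R90

print(reflect_then_rotate)
print(rotate_then_reflect)
print("Equal?", np.allclose(reflect_then_rotate, rotate_then_reflect))  # False: D4 is non-abelian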
Special orthogonal groups SO(2), SO(3)
A special orthogonal group, $SO(n)$, consists of all $n \times n$ orthogonal matrices with determinant $+1$. For instance, $SO(2)$ is the group of all 2D rotation matrices
$$R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},$$
where $\theta$ ranges over all real values (usually taken mod $2\pi$, but it is effectively a continuous parameter). $SO(3)$ is the group of all $3 \times 3$ rotation matrices with determinant $+1$. These groups are compact Lie groups and are extremely important in 2D and 3D geometry, robotics, and computer vision (where orientation and rotation in space are crucial).
Machine learning systems that involve rotational invariance in images or point clouds often revolve around $SO(2)$ or $SO(3)$. These are infinite (and uncountable) groups, so their representation theory and analysis are necessarily more advanced than for finite groups.
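Here is a brief sketch (my own example, with helper names chosen just for illustration) that constructs elements of $SO(2)$ and $SO(3)$ with NumPy and checks the defining properties $R^\top R = I$ and $\det R = +1$:
import numpy as np

def rot2d(theta):
    """2D rotation matrix, an element of SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

def rot3d_z(theta):
    """3D rotation about the z-axis, one family of elements of SO(3)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

for R in (rot2d(0.3), rot3d_z(1.2)):
    # Defining properties of the special orthogonal groups:
    assert np.allclose(R.T @ R, np.eye(R.shape[0]))   # orthogonality
    assert np.isclose(np.linalg.det(R), 1.0)          # determinant +1

# Composition of rotations in SO(2) is addition of angles (the group is abelian).
assert np.allclose(rot2d(0.3) @ rot2d(0.5), rot2d(0.8))
print("Checked SO(2)/SO(3) properties: orthogonal, det = +1, angles add in SO(2).")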
Orthogonal group O(2)
$O(2)$ is the group of all 2D orthogonal transformations, which include both rotations and reflections. Formally, $O(2)$ is the set of all $2 \times 2$ matrices $A$ such that $A^\top A = I$, with determinant $\pm 1$. The subset with determinant $+1$ is $SO(2)$, so the remainder of $O(2)$ consists of reflection transformations (determinant $-1$). The presence of reflections means $O(2)$ is also non-abelian, while $SO(2)$ is abelian (rotations commute in 2D).
In computer vision, a 2D reflection might correspond to flipping an image about an axis. If a problem demands invariance or equivariance to flips and rotations of the plane, $O(2)$ captures that entire continuous set of transformations.
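The reflection part of $O(2)$ can be sketched similarly (again my own toy example): a reflection matrix has determinant $-1$, and composing two reflections lands back in $SO(2)$:
import numpy as np

def reflection(phi):
    """Reflection about the line through the origin at angle phi: an element of O(2) with det -1."""
    c, s = np.cos(2 * phi), np.sin(2 * phi)
    return np.array([[c,  s],
                     [s, -c]])

F1 = reflection(0.0)          # reflection about the x-axis
F2 = reflection(np.pi / 6)    # reflection about a line at 30 degrees

assert np.isclose(np.linalg.det(F1), -1.0)        # reflections have determinant -1
assert np.isclose(np.linalg.det(F1 @ F2), 1.0)    # two reflections compose to a rotation (det +1)

# The resulting rotation turns by twice the angle between the two reflection axes.
print("F1 @ F2 =\n", np.round(F1 @ F2, 3))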
Representation theory basics
One of the crowning achievements of modern algebra is representation theory, which lets one encode abstract groups as linear transformations on vector spaces. Representation theory provides a powerful handle on how to exploit symmetries and group structures in practical scenarios — especially relevant when building neural networks that need to respect certain invariances.
What is a representation?
A representation of a group $G$ on a vector space $V$ (over the real or complex field, though complex fields are more standard in representation theory) is a group homomorphism
$$\rho : G \to GL(V),$$
where $GL(V)$ is the group of all invertible linear transformations on $V$. Intuitively, each element of $G$ is mapped to an invertible linear operator on $V$. The group homomorphism condition requires
$$\rho(g_1 g_2) = \rho(g_1)\,\rho(g_2) \quad \text{for all } g_1, g_2 \in G,$$
meaning that composing two group transformations corresponds to composing the linear transformations on the vector space.
If $V$ is finite-dimensional, these linear transformations can be represented by invertible matrices, so we can think of a representation as a map from group elements to invertible matrices that respects the group operation. For infinite-dimensional scenarios (e.g., function spaces), the idea is analogous, but the mathematics becomes more intricate.
Dimension, matrix representations, and change of basis
If $V$ has dimension $d$, then $\rho(g)$ is a $d \times d$ invertible matrix for each $g \in G$. One must note that if we pick a different basis for $V$, the matrices describing the same representation will look different, but they still represent the same homomorphism. In representation theory, a key concept is that two representations $\rho_1$ and $\rho_2$ are equivalent (or isomorphic) if there is a change of basis in $V$ that transforms one into the other.
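As an illustrative sketch (my own example, not drawn from a specific reference), the map sending $k \in C_4$ to rotation by $k \cdot 90°$ is a two-dimensional real representation, and conjugating it by any invertible matrix yields an equivalent representation:
import numpy as np

# Hedged sketch: a 2D real representation of C4 = {0, 1, 2, 3} (addition mod 4),
# sending the element k to rotation by k * 90 degrees.
def rho(k):
    theta = k * np.pi / 2
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

# Homomorphism property: rho((g + h) mod 4) == rho(g) @ rho(h).
for g in range(4):
    for h in range(4):
        assert np.allclose(rho((g + h) % 4), rho(g) @ rho(h))

# Change of basis: T rho(g) T^{-1} defines an equivalent representation.
T = np.array([[2.0, 1.0],
              [1.0, 1.0]])            # any invertible matrix works here

def rho_equiv(k):
    return T @ rho(k) @ np.linalg.inv(T)

for g in range(4):
    for h in range(4):
        assert np.allclose(rho_equiv((g + h) % 4), rho_equiv(g) @ rho_equiv(h))

print("rho is a representation of C4; conjugating by T gives an equivalent one.")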
Key types of representations
Several fundamental types of representations appear throughout mathematics and have direct analogies in ML contexts as well.
Trivial representation
The trivial representation maps every group element to the identity transformation on $V$. In matrix form, that means $\rho(g) = I$ for all $g \in G$. This representation effectively "ignores" the group structure, but it is often a building block for more interesting representations or used as a baseline reference.
Regular representation
The regular representation is especially important for finite groups. Suppose $G$ is finite with $|G|$ elements, and we consider $V = \mathbb{R}^{|G|}$ (or $\mathbb{C}^{|G|}$). Label the coordinates of a vector in this space by the elements of $G$. Then the regular representation is defined by letting $g \in G$ permute the coordinates according to left multiplication (or right multiplication) on the set $G$. In other words, if $e_h$ is the basis vector that is 1 in position $h$ and 0 elsewhere, then
$$\rho_{\text{reg}}(g)\, e_h = e_{gh}.$$
This representation is typically high-dimensional, but crucially it contains every possible irreducible representation of $G$ as a sub-representation. In ML, the regular representation can appear when we think of feature vectors indexed by group elements or data transformations.
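For a concrete picture, the sketch below (my own example) builds the regular representation of the cyclic group $\mathbb{Z}/4\mathbb{Z}$ as permutation matrices and checks the homomorphism property:
import numpy as np

n = 4  # the cyclic group Z_4 = {0, 1, 2, 3} under addition mod 4

def regular_rep(g):
    """Permutation matrix of the regular representation: sends basis vector e_h to e_{(g+h) mod n}."""
    P = np.zeros((n, n))
    for h in range(n):
        P[(g + h) % n, h] = 1.0
    return P

# The homomorphism property holds: reg(g) @ reg(h) == reg((g + h) mod n).
for g in range(n):
    for h in range(n):
        assert np.allclose(regular_rep(g) @ regular_rep(h), regular_rep((g + h) % n))

print("Regular representation of Z_4 as 4x4 permutation matrices, e.g. for g = 1:")
print(regular_rep(1))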
Irreducible representations (irreps)
A representation is irreducible (or simple) if it contains no nontrivial sub-representations. By Maschke's theorem (for finite groups over fields such as $\mathbb{R}$ or $\mathbb{C}$, and in a broad class of other settings), every finite-dimensional representation of a finite group can be decomposed as a direct sum of irreducible representations (often abbreviated as irreps). This implies that understanding all irreps is paramount to understanding every possible representation of a group.
From a machine learning vantage point, irreps are sometimes viewed as the fundamental "building blocks" of any representation. For continuous groups (like $SO(2)$, $SO(3)$), the classification of irreps can be more complicated but still follows a systematic theory (e.g., the spherical harmonics, which furnish the irreps of $SO(3)$).
Equivalence of representations
As mentioned, two representations $\rho_1 : G \to GL(V_1)$ and $\rho_2 : G \to GL(V_2)$ are called equivalent if there is an invertible linear map $T : V_1 \to V_2$ such that
$$T\,\rho_1(g) = \rho_2(g)\,T$$
for all $g \in G$. This is effectively a change-of-basis transformation. If two representations are equivalent, they are essentially the "same" from the standpoint of group theory, just expressed in different coordinate systems.
Isomorphic vs. non-isomorphic irreps
If two irreps are equivalent, we say they are isomorphic. If they are not, we say they are non-isomorphic. Identifying all non-isomorphic irreps of a group is a major goal in classical representation theory and also emerges in advanced ML frameworks that utilize group-based transformations. Indeed, implementing group-equivariant architectures can involve selecting appropriate irreps for the group at hand so that one's network operations transform in a consistent manner.
Group actions on functions
The most ubiquitous way that a group can "show up" in a functional or ML context is by acting on functions. In classical analysis, we often think of functions as living in a vector space (like an $L^2$ space) and letting the group elements transform the input of these functions, thus yielding a new function. This perspective eventually ties into the concept of group convolution, group Fourier transforms, and more.
Left-regular action
For a finite group $G$, define the space of real-valued (or complex-valued) functions on $G$ by
$$\mathcal{F}(G) = \{\, f : G \to \mathbb{R} \,\}.$$
An element $g \in G$ can act on a function $f \in \mathcal{F}(G)$ by
$$(g \cdot f)(h) = f(g^{-1} h) \quad \text{for all } h \in G.$$
This is known as the left-regular action. In effect, we have a representation $\lambda$ of $G$ on $\mathcal{F}(G)$ given by
$$[\lambda(g) f](h) = f(g^{-1} h).$$
One can check that $\lambda(g_1 g_2) = \lambda(g_1)\,\lambda(g_2)$ holds. This is the function-space version of the regular representation described earlier.
Functions as vectors
When $G$ is finite, $\mathbb{R}^{|G|}$ is isomorphic to the vector space of functions on $G$ (since a function on $G$ can be identified with a vector of length $|G|$ whose entries are $f(g)$ for each $g \in G$). The left-regular action then becomes a permutation of coordinates, precisely the same as the regular representation in matrix form. This viewpoint is extremely helpful when building group-equivariant layers in neural networks, where each channel or index might correspond to a group element.
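This identification can be checked directly in code; the following sketch (my own example, continuing the $\mathbb{Z}/4\mathbb{Z}$ theme from above) shows that acting on a function by the left-regular action is the same as applying a permutation matrix to its vector of values:
import numpy as np

n = 4
f = np.array([3.0, 1.0, 4.0, 1.5])   # a function on Z_4, stored as the vector (f(0), f(1), f(2), f(3))
g = 1                                 # the group element we act with

# Left-regular action on functions: (g . f)(h) = f(g^{-1} h) = f((h - g) mod n).
gf_function_view = np.array([f[(h - g) % n] for h in range(n)])

# Matrix view: the same action as a permutation matrix applied to the vector of values.
P = np.zeros((n, n))
for h in range(n):
    P[h, (h - g) % n] = 1.0
gf_matrix_view = P @ f

assert np.allclose(gf_function_view, gf_matrix_view)
print("Left-regular action agrees with the permutation-matrix (regular representation) view:", gf_matrix_view)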
Extension to continuous groups
For $SO(2)$, $SO(3)$, $O(2)$, and other continuous (Lie) groups, the same idea holds conceptually:
$$[\lambda(g) f](x) = f(g^{-1} x),$$
except now $f$ is usually defined on a continuous domain and $x$ might be a point in $\mathbb{R}^n$ or on a manifold. Summations are replaced by integrals, and one has to deal with measure theory, completeness, or other functional-analytic details (which can become very rich). Nonetheless, the overarching concept — a group transforming the argument of a function — remains consistent. From the perspective of ML, if an image is a function $f : \mathbb{R}^2 \to \mathbb{R}$ (or $f : \mathbb{R}^2 \to \mathbb{R}^3$ for an RGB image), then a rotation $R \in SO(2)$ produces a new image $f'$ such that $f'(x) = f(R^{-1} x)$.
Fourier theory on groups (Peter–Weyl theorem)
Motivation and intuition
The classical Fourier transform decomposes periodic functions on the circle (equivalently, on $SO(2)$, the unit circle in the complex plane) into sums of sines and cosines. Likewise, the discrete Fourier transform decomposes functions on a finite cyclic group $\mathbb{Z}/n\mathbb{Z}$. But these ideas generalize dramatically: Fourier theory on groups is the framework by which one can decompose functions into "harmonic components" that correspond to irreducible representations of the group. This is often called non-commutative harmonic analysis when dealing with non-abelian groups.
In machine learning, especially in architectures that strive to be equivariant under group transformations, one frequently uses the fact that $G$-equivariant operations can be expressed in the "Fourier domain" on $G$. For finite abelian groups, this is straightforward. For more complex or continuous groups, it can be more challenging but also quite powerful. The Peter–Weyl theorem states that for compact groups $G$, one obtains an orthonormal basis of $L^2(G)$ from the matrix entries of irreps of $G$. This is a continuous analog of the idea that sums of exponentials (sines and cosines) can be used to expand periodic functions.
Finite groups
For a finite group $G$, the group algebra $\mathbb{C}[G]$ is finite-dimensional, and its decomposition into irreps is effectively a direct sum of sub-representations. The Fourier transform on $G$ is, in that sense, the map taking a function on $G$ to its coefficients in each irreducible representation's coordinate system. Formally, for a function $f : G \to \mathbb{C}$,
its group Fourier transform is a collection of matrices $\hat{f}(\rho) = \sum_{g \in G} f(g)\,\rho(g)$ (one per irreducible representation $\rho$) capturing how $f$ projects onto that irrep.
An inverse Fourier transform then reconstructs $f$ from these matrix coefficients. Orthogonality relations of characters (the traces of irreps) and other representation-theoretic facts supply the explicit formula. The net result is that for a function on a finite group $G$, one can shift from the "time domain" (or direct domain) into the "frequency domain" given by irreps — exactly paralleling classical discrete Fourier analysis on $\mathbb{Z}/n\mathbb{Z}$.
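For the abelian case $G = \mathbb{Z}/n\mathbb{Z}$, the irreps are one-dimensional characters and the group Fourier transform reduces to the ordinary DFT. The sketch below (my own example, relying only on np.fft) checks the convolution theorem, i.e. that group convolution becomes pointwise multiplication in the Fourier domain:
import numpy as np

# Hedged sketch: on the cyclic group Z_n, the group Fourier transform is the ordinary DFT,
# and group convolution becomes pointwise multiplication of Fourier coefficients.
n = 8
rng = np.random.default_rng(0)
f = rng.normal(size=n)
k = rng.normal(size=n)

def group_convolve(f, k):
    """Group convolution on Z_n: (f * k)(g) = sum_h f(h) k((g - h) mod n)."""
    return np.array([sum(f[h] * k[(g - h) % n] for h in range(n)) for g in range(n)])

conv_direct = group_convolve(f, k)
conv_fourier = np.fft.ifft(np.fft.fft(f) * np.fft.fft(k)).real   # convolution theorem

assert np.allclose(conv_direct, conv_fourier)
print("Group convolution on Z_8 matches pointwise multiplication in the Fourier domain.")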
Continuous (compact) groups
For compact Lie groups like $SO(2)$, $SO(3)$, or more general compact groups, the Peter–Weyl theorem states that the matrix coefficients of irreps span an orthonormal basis for $L^2(G)$. Concretely, consider $SO(2)$: it is isomorphic to the unit circle, so the irreps are the standard one-dimensional representations $\theta \mapsto e^{i n \theta}$ (with $n \in \mathbb{Z}$). Then the expansions become familiar Fourier series in sines and cosines. For $SO(3)$, the irreps are described by spherical harmonics on the 2-sphere in a certain manner, giving expansions in terms of spherical harmonics. All these expansions can be employed to handle integrals, convolutions, or define group-equivariant layers in neural networks.
Idea of band-limited representations
One intriguing notion in continuous groups is that of band limitation: practically speaking, one never expands a function into all irreps up to infinite dimension but instead truncates at some finite set of frequencies or representation degrees. This approach is parallel to standard Fourier series, where one only keeps frequencies $|n| \le N$. In some ML contexts, especially in steerable CNNs for $SO(2)$ or spherical CNNs for $SO(3)$, only a finite set of irreps is used, effectively applying a band-limited approach that often suffices for many tasks.
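A band-limited expansion on $SO(2)$ can be sketched numerically as follows (my own illustrative example, using the DFT of a sampled function on the circle as a stand-in for the continuous Fourier series):
import numpy as np

# Hedged sketch: band-limited Fourier reconstruction of a smooth function on the circle (SO(2)).
M = 256                                        # number of sample points on the circle
theta = np.linspace(0.0, 2 * np.pi, M, endpoint=False)
f = np.exp(np.cos(theta)) * np.sin(3 * theta)  # a smooth periodic test function

coeffs = np.fft.fft(f) / M                     # Fourier coefficients c_n (one per irrep of SO(2))

def band_limited_reconstruction(coeffs, N):
    """Keep only frequencies |n| <= N and reconstruct the function from them."""
    kept = np.zeros_like(coeffs)
    freqs = np.fft.fftfreq(M, d=1.0 / M)       # integer frequencies 0, 1, ..., -2, -1
    kept[np.abs(freqs) <= N] = coeffs[np.abs(freqs) <= N]
    return np.real(M * np.fft.ifft(kept))

for N in (2, 5, 10):
    err = np.max(np.abs(f - band_limited_reconstruction(coeffs, N)))
    print(f"max reconstruction error keeping |n| <= {N}: {err:.2e}")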
Examples
A canonical example is the Fourier series expansion on $SO(2)$. If $f(\theta)$ is a function on the circle, it can be expanded as:
$$f(\theta) = \sum_{n=-\infty}^{\infty} c_n e^{i n \theta},$$
where $c_n$ are the Fourier coefficients. This is precisely the decomposition into 1D irreps of $SO(2)$ (which is abelian), and these irreps are the exponentials $\theta \mapsto e^{i n \theta}$. For the group $O(2)$ (which includes reflections), the expansions become more complicated because it is non-abelian and reflection changes orientation, but the underlying principle remains that the irreps or certain sub-representations can be used to expand $f$.
Distinction between $SO(2)$ and $O(2)$
$SO(2)$ is isomorphic to the circle group, so it is abelian. $O(2)$ has a reflection component, making it non-abelian. For abelian groups, all complex irreps are 1-dimensional. For non-abelian groups, some irreps have dimension greater than 1, so the group Fourier transform leads to matrix-valued coefficients rather than purely scalar exponentials. This difference has profound implications for how one builds group-equivariant neural networks, because the feature spaces can become multidimensional subspaces that transform according to these non-abelian irreps.
Connecting to machine learning (theoretical preview)
Symmetries and data
In a typical supervised learning problem, we might collect a training set $\{(x_i, y_i)\}_{i=1}^N$. If there is a group $G$ of transformations such that $g \cdot x_i$ has the same label as $x_i$ (invariance) or at least transforms in a predictable manner (equivariance), then the learning algorithm can exploit this knowledge. For instance, if $G$ is a group of rotations of the plane and $f$ is a classifier, then:
- Invariance would mean $f(g \cdot x) = f(x)$ for all $g \in G$.
- Equivariance would mean $f(g \cdot x) = g \cdot f(x)$ for some action of $G$ on the output space as well.
These properties reflect the notion that the group transformations either do not alter the underlying semantic content or structure at all, or alter it in a known manner. In simpler terms, if a dog remains a dog upon rotation, then a dog classifier can be made rotation-invariant. Or if rotating a vector field in the input domain also rotates some vector features in the output domain, that is an equivariance property.
Data augmentation vs. group-based model design
One might achieve a certain invariance or robustness to transformations in a more brute-force way: data augmentation. This is where one artificially enlarges the training set by including transformations of the data. While this approach often works well in practice, it can be data-intensive and might not fully leverage the underlying group structure in a theoretically optimal way.
In contrast, group-based model design tries to encode the transformations directly into the structure of the model. Convolutional neural networks, for instance, are automatically equivariant to translations. Similarly, $G$-CNNs (Cohen and Welling, 2016) build in equivariance to a broader group (like rotations + flips, or more exotic transformations). Such an approach can lead to more parameter-efficient architectures, less reliance on extensive augmentation, and potentially better generalization to transformations unseen in the training set. This is a major reason why group theory has gained prominence in advanced ML research.
Preview of equivariance and invariance
- Invariance: A function $f$ is invariant to $G$ if $f(g \cdot x) = f(x)$ for all $g$ in $G$ and for all $x$ in the input space. In classification tasks, the model output is the same no matter how the input is transformed, which is often desired for transformations that do not alter semantic class identity.
- Equivariance: A function $f$ is equivariant if there is a compatible group action on the output such that $f(g \cdot x) = g \cdot f(x)$. This means transformations in input space map to corresponding transformations in the feature or output space. In certain tasks (e.g., segmentation or keypoint detection), it can be crucial for the model to maintain alignment under transformations of the input. Both notions are illustrated in the sketch below.
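Both properties can be checked numerically on a toy example (my own sketch, taking np.rot90 as the action of $C_4$ on images): the total intensity of an image is an invariant feature, while a pixelwise nonlinearity is an equivariant one:
import numpy as np

# Hedged sketch: invariance vs. equivariance under the group C4 of 90-degree rotations,
# realized by np.rot90 acting on image arrays.
rng = np.random.default_rng(0)
image = rng.random((8, 8))

def invariant_feature(img):
    """Total intensity: unchanged by any rotation of the image."""
    return img.sum()

def equivariant_feature(img):
    """A pixelwise nonlinearity: rotating the input rotates the output in the same way."""
    return np.tanh(img)

def g(img):
    """One group element: rotation by 90 degrees."""
    return np.rot90(img)

# Invariance: f(g . x) == f(x)
assert np.isclose(invariant_feature(g(image)), invariant_feature(image))

# Equivariance: f(g . x) == g . f(x)
assert np.allclose(equivariant_feature(g(image)), g(equivariant_feature(image)))

print("Total intensity is C4-invariant; the pixelwise map is C4-equivariant.")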
In the upcoming part of this course (pt. 2), these notions are made operational in neural networks using the language of representation theory. One typically picks an appropriate set of irreps or representation spaces in which convolutional kernels or filters are defined, ensuring the entire network is $G$-equivariant by design.
Foundations for part 2
The second part will dive deeper into how representation theory underpins group convolutions, as well as the concept of steerable CNNs. Steerability is a property wherein filters can be "steered" to arbitrary orientations or transformations by applying a small set of basis filters and combining their responses appropriately. This approach is deeply connected to the idea that a filter might transform according to some (usually low-dimensional) representation of the group. The subspace spanned by those basis filters forms an irrep or direct sum of irreps. Then, to get a rotated version of the filter, one applies a matrix from the representation. This is a practical manifestation of advanced group theoretic ideas.
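The steering idea can already be sketched in a few lines (my own minimal example in the spirit of Freeman and Adelson's first-order steerable filters, with all array and function names chosen just for illustration): the two Gaussian-derivative basis filters $G_x$ and $G_y$ span a space in which the filter oriented at any angle $\theta$ is exactly $\cos\theta\, G_x + \sin\theta\, G_y$:
import numpy as np

# Hedged sketch: first-order steerable filters. The oriented Gaussian-derivative filter at
# angle theta is an exact linear combination of the two basis filters G_x and G_y.
size, sigma = 15, 2.0
xs = np.arange(size) - size // 2
X, Y = np.meshgrid(xs, xs, indexing="xy")
G = np.exp(-(X**2 + Y**2) / (2 * sigma**2))    # isotropic Gaussian envelope

G_x = -X / sigma**2 * G                        # basis filter: derivative along x
G_y = -Y / sigma**2 * G                        # basis filter: derivative along y

def oriented_filter(theta):
    """Gaussian derivative along the direction (cos theta, sin theta), built directly."""
    return -(X * np.cos(theta) + Y * np.sin(theta)) / sigma**2 * G

theta = np.deg2rad(30)
steered = np.cos(theta) * G_x + np.sin(theta) * G_y   # steering: combine the basis filters

assert np.allclose(steered, oriented_filter(theta))
print("Filter at 30 degrees equals cos(30)*G_x + sin(30)*G_y: steerability in action.")
In this sketch, the pair $(G_x, G_y)$ transforms under rotation like the standard two-dimensional representation of $SO(2)$, which is one way to see the representation-theoretic content of steerability.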
Hence, everything from building group equivariant layers to analyzing how data transforms in these spaces relies on the fundamental knowledge covered in the present article: the concept of groups, their representations, and how they act on functions. Understanding these building blocks will make advanced group-based architectures more intuitive and reveal how they can drastically reduce the demands on data augmentation or brute-force enumerations of transformations.
Summary and transition
This article has explored the basic building blocks of group theory and representation theory, setting the stage for their application in modern machine learning contexts. By reviewing core definitions — closure, identity, inverses, associativity — and examining canonical examples like cyclic groups, dihedral groups, special orthogonal groups, and orthogonal groups, I have illustrated how symmetries and transformations naturally map onto group-theoretic structures. Representation theory then shows how to translate these symmetries into linear algebra operations, which is precisely the language of neural network layers and transformations in vector spaces. The discussion of group actions on functions further underscores how one passes from an abstract group definition to an actual operation on data.
Finally, a brief tour of Fourier theory on groups (both finite and continuous) demonstrates how these transformations can be analyzed in a "frequency domain", analogously to classical Fourier analysis but generalized to non-abelian, higher-dimensional irreps. This perspective is central to certain advanced techniques in ML, such as building group-equivariant convolutional networks with a direct tie-in to irreps and non-commutative harmonic analysis.
Moving forward, the next steps (part 2) will expand these concepts to show how they concretely translate into group-based neural architectures, including group convolutional networks, steerable CNNs, and group equivariant autoencoders. The underlying theme remains that if the problem domain has known symmetries, harnessing group theory can fundamentally reshape how a model learns and generalizes, often leading to elegant, highly structured solutions that require fewer data and exhibit greater stability against transformations. This is one of the most exciting frontiers in theoretical and applied machine learning research, as recognized by many leading publications (e.g., Cohen and Welling, ICML 2016; Weiler and Cesa, NeurIPS 2019; Kondor and Trivedi, ICML 2018; Esteves, CVPR 2018) focusing on group equivariant network design.
I recommend reflecting carefully on these theoretical underpinnings, because a deeper appreciation for the group-theoretic viewpoint unlocks a range of advanced architectural insights and fosters the creativity needed to design novel ML models aligned to domain symmetries.

[Image missing: Illustration of the elements of D4 acting on a square, showing rotations and reflections.]

[Image missing: Representations map group elements to matrices that act on a vector space.]
import numpy as np

def rotate_point_2d(x, y, theta):
    """
    Rotate a point (x, y) by angle theta (in radians) about the origin.
    Returns the new coordinates (x', y').
    """
    cos_t = np.cos(theta)
    sin_t = np.sin(theta)
    x_new = x * cos_t - y * sin_t
    y_new = x * sin_t + y * cos_t
    return x_new, y_new

def dihedral_group_actions(square_points):
    """
    Given the four corner points of an axis-aligned square (in CCW order),
    return all the transformations from the dihedral group D4
    (the symmetry group of a square).
    Each transformation is represented as a list of points after transformation.
    """
    transformations = []
    # Rotations: 0, 90, 180, 270 degrees
    for r in [0, np.pi/2, np.pi, 3*np.pi/2]:
        rot_points = [rotate_point_2d(px, py, r) for (px, py) in square_points]
        transformations.append(rot_points)
    # Reflections: the four reflection axes of an axis-aligned square are
    # the x-axis, the y-axis, y = x, and y = -x.
    # x-axis reflection
    refl_x = [(px, -py) for (px, py) in square_points]
    transformations.append(refl_x)
    # y-axis reflection
    refl_y = [(-px, py) for (px, py) in square_points]
    transformations.append(refl_y)
    # reflection over y = x
    refl_diag1 = [(py, px) for (px, py) in square_points]
    transformations.append(refl_diag1)
    # reflection over y = -x
    refl_diag2 = [(-py, -px) for (px, py) in square_points]
    transformations.append(refl_diag2)
    return transformations

# Example usage:
square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]  # corners of a square, in CCW order
d4_actions = dihedral_group_actions(square)
print("All D4 transformations of the square (each element is a list of points):")
for i, t in enumerate(d4_actions):
    # Round to suppress floating-point noise from the trigonometric rotations.
    print(f"Transformation {i}: {[(round(px, 6), round(py, 6)) for (px, py) in t]}")
In the snippet above, I provide a quick demonstration of how one might generate all transformations in the dihedral group $D_4$ (the symmetries of a square) in a simple Python setting. Although this example is purely geometric, it reflects the fundamental operations that more advanced group-aware ML models will rely upon — but often in a more algebraic or representation-theoretic guise. By bridging geometry and algebra, group theory remains a unifying theme in the modern machine learning toolbox.