

🎓 151/167
This post is part of the AI theory educational series from my free course. Please keep in mind that the correct sequence of posts is outlined on the course page; the order in Research can be arbitrary.
I'm also happy to announce that I've started working on standalone paid courses, so you can support my work and get affordable educational material. These courses will be of a completely different quality, with more theoretical depth and a narrower focus, and will feature challenging projects, quizzes, exercises, video lectures, and supplementary material. Stay tuned!
Probabilistic reasoning over time
...or why time introduces additional layers of uncertainty
Time is a funny thing in machine learning, isn't it? The moment we allow our models to deal with data that unfolds over a timeline, we end up facing some distinct challenges. When data arrives sequentially, many assumptions that might have been fine in a static context break down — or at least get complicated. Think about a system that tracks a moving vehicle, a drone, or a walking robot; the current state of the object depends on what happened in prior steps, and we might only have partial, noisy measurements about that state. To make matters more interesting, time can introduce lags in the feedback loop, or the data might be missing altogether at certain intervals. A sensor could freeze, or the communication link might drop for a few moments.
When we model such a real-time system, we must handle not only the possibility of measurement noise but also the possibility that the underlying state has evolved in ways we haven't directly observed yet. In an environment with changing states, the data you observe at any specific time might not entirely reflect the true state because by the time you gather it, the state may have already changed. This concept of delayed or partial observability is central to time-series modeling and pushes us to consider strategies that elegantly capture the dynamics of a system through sequential data.
In real-world scenarios like self-driving cars, we have a continuous data feed from sensors like LIDAR, cameras, radar, and GPS. Each sensor reading can be noisy in different ways. The system not only has to figure out what the environment looks like at the moment (e.g., the positions of other cars) but also how it is going to change in the very near future. That forward-looking aspect is a big reason we talk about predictive models in time-series contexts. If I only look backward, I might get a good sense of history, but not necessarily how to respond to new changes that happen in the next second.
markov property and its central role in time-series modeling
The Markov property states that the future state of a process depends only on its current state, not on its entire history. Formally, if we have a sequence of states $X_1, X_2, \ldots, X_T$, the Markov property for a first-order chain says:

$$P(X_{t+1} \mid X_1, X_2, \ldots, X_t) = P(X_{t+1} \mid X_t)$$

Here, $X_t$ is the state at time step $t$. This conditional independence assumption drastically simplifies both the mathematics and the computational complexity of many time-series models. It's still quite powerful: we can create dynamic models that are expressive enough for tasks like tracking, prediction, and decision making, even though we're ignoring the full history and focusing only on the current state.
If you think about a simple weather model, it's not totally off to say, at least as a first approximation, that tomorrow's weather depends heavily on today's weather rather than the exact temperature and humidity from, say, five days ago. Of course, we can incorporate more sophisticated dynamics by increasing the order of the Markov chain or by building more elaborate frameworks like hidden Markov models, which we'll discuss soon. But the key is that Markov assumptions help us manage the otherwise daunting complexity of "everything depends on everything else in time."
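To make the weather example concrete, here's a minimal sketch of a first-order Markov chain in Python; the states and transition probabilities are made up purely for illustration:

```python
import numpy as np

# Toy first-order Markov chain for the weather example; probabilities are illustrative only.
states = ["sunny", "rainy"]
# transition[i, j] = P(tomorrow = states[j] | today = states[i])
transition = np.array([
    [0.8, 0.2],   # sunny -> sunny, sunny -> rainy
    [0.4, 0.6],   # rainy -> sunny, rainy -> rainy
])

def simulate_weather(start_state, num_days, seed=0):
    """Sample a weather sequence; tomorrow depends only on today (the Markov property)."""
    rng = np.random.default_rng(seed)
    current = states.index(start_state)
    sequence = [start_state]
    for _ in range(num_days - 1):
        current = rng.choice(len(states), p=transition[current])
        sequence.append(states[current])
    return sequence

print(simulate_weather("sunny", 7))
```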
examples in robotics, finance, and weather forecasting
- Robotics (tracking moving objects): Suppose you have a robot in a room, trying to localize itself based on sensor readings of walls, furniture, or markers. Each sensor reading is noisy, and the robot might have an internal guess for its position. Using a Markov process (or more specifically, a hidden Markov model or a Bayesian filtering approach like a Kalman filter), the robot can fuse its prior belief about where it was with new sensor data to get an updated belief of where it is now.
- Finance (stock price modeling): Financial markets are influenced by a huge number of factors, but some short-term models assume a Markov property for simplification, or they incorporate low-order dependencies. While real markets might not be strictly Markov, practitioners often use Markov-based approximations for algorithmic trading, derivative pricing, or risk assessments. Methods like discrete-time Markov chains or continuous-time models (like stochastic differential equations) show up in all sorts of quantitative finance contexts.
- Weather forecasting: A basic approach to weather forecasting might look at time-series data like temperature, humidity, pressure, and so on, to predict the weather the next day. The underlying model might be Markovian to keep it tractable. Of course, in real meteorology, modern forecasting uses massive numerical simulations of atmospheric dynamics, but Markov approaches remain a simpler conceptual tool and are used widely in smaller-scale or specialized environments.
overview of filtering, prediction, and smoothing tasks
In time-series analysis, we often talk about three key tasks:
- Filtering: Estimating the hidden (or latent) state at the current time step, given all observations up to now. For instance, you have sensor measurements up to time $t$, and you want to produce an estimate of the hidden state at time $t$.
- Prediction: Estimating a future latent state, say at time $t+1$ or $t+k$, using observations up to time $t$. If we were forecasting next week's weather from today's data, that is a prediction task.
- Smoothing: Estimating a past latent state, given all observations in the time window, including those in the future relative to the time step of interest. For instance, if I want to refine my estimate of the state at time $k < t$ by taking advantage of the data at times $k+1, \ldots, t$, that's smoothing.
All three tasks pop up in real-world applications, and algorithms like the forward-backward algorithm (in HMMs) or Kalman filtering (in linear Gaussian systems) systematically address these challenges.
time and uncertainty
discrete-time vs. continuous-time approaches
Time-series problems can be approached with a discrete-time mindset — where you assume the data arrives in discrete intervals and you update your model step by step — or a continuous-time one, in which states evolve in an unbroken continuum and you track changes as they happen. Discrete-time is usually simpler to handle in code, and it's a good fit for many digital sensor systems that produce measurements at intervals (like once every second, or once every minute).
On the other hand, certain domains demand a continuous-time framework. In quantitative finance, for instance, we often see models like Geometric Brownian Motion for stock prices:
$$dS_t = \mu S_t\, dt + \sigma S_t\, dW_t$$

where $S_t$ is the stock price, $\mu$ is the drift, $\sigma$ is the volatility, and $W_t$ is a Wiener process (or Brownian motion). We might discretize this to do computations, but the underlying process is conceptualized continuously. Similar continuous-time processes pop up in physics, population dynamics, epidemiology, and control systems in engineering.
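If you want to play with this numerically, here's a small sketch that simulates a GBM path by discretizing time and applying the exact log-normal update at each step; the parameter values are arbitrary:

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, dt, num_steps, seed=0):
    """Simulate geometric Brownian motion with its exact log-normal update:
    S_{t+dt} = S_t * exp((mu - 0.5 * sigma**2) * dt + sigma * sqrt(dt) * Z),  Z ~ N(0, 1).
    """
    rng = np.random.default_rng(seed)
    prices = [s0]
    for _ in range(num_steps):
        z = rng.standard_normal()
        prices.append(prices[-1] * np.exp((mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z))
    return np.array(prices)

path = simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, dt=1 / 252, num_steps=252)
print(path[-1])  # simulated price after roughly one trading year
```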
handling streaming data in real-time inference
In streaming scenarios, data arrives in real time, and we don't necessarily have the luxury of waiting until the entire sequence is available (like we might in offline batch processing). Think of an application in anomaly detection for manufacturing: you want to spot a defective product as soon as possible, rather than waiting until the entire production run is done. Real-time inference solutions require algorithms that can incorporate each new data point on the fly, updating the internal state with minimal latency.
Particle filters, online versions of the forward-backward algorithm, or real-time variants of Kalman filtering serve as typical examples of streaming inference. These approaches keep a dynamic representation of the state (for instance, a set of weighted samples in the case of a particle filter) and incorporate new observations in incremental updates. If the data is high-dimensional, or the underlying model is very complicated, streaming solutions need to be carefully optimized to keep up with real-world data arrival rates.
memory-based models vs. markov models
If you suspect that your time-series data has significant long-term memory, you might consider "memory-based models" that keep track of a longer span of history rather than a single Markov state. Recurrent neural networks (including LSTM or GRU variants) are often used to capture these more extended dependencies. The Markov property is powerful for many reasons, but it can oversimplify cases where the distant past influences the present in subtle ways. For instance, in language modeling, the next word in a sentence can sometimes depend on context that's quite far back, which is why RNN-based or Transformer-based approaches can be more suitable than simple Markov models.
practical issues: sensor noise, delayed signals, and partial observability
Even in well-controlled environments, sensors may produce unreliable or delayed readings. For example, a GPS signal might arrive only every couple of seconds, and it can also be unavailable indoors or in urban canyons. Meanwhile, an accelerometer or gyroscope might produce readings at a higher frequency but with its own drift issues. When we fuse data from multiple sensors, we need robust solutions that handle partial observability (i.e., we don't directly observe the variable of interest) and uncertainty introduced by measurement error.
Sometimes you have a multi-rate system: a sensor updates quickly, while another updates more slowly. Handling asynchronous data arrival is another practical headache. Solutions typically revolve around unifying the data into a single timeline (through interpolation, queueing, or buffer-based approaches) or applying continuous-time modeling techniques.
inference in temporal models
forward-backward algorithm basics for hidden state estimation
In a classic hidden Markov model (HMM) setting, you assume there's an underlying sequence of hidden states $X_1, \ldots, X_T$ that generate observations $Y_1, \ldots, Y_T$. The forward-backward algorithm is a dynamic programming procedure that allows you to compute the posterior distribution $P(X_t \mid Y_{1:T})$ for each time step $t$. It does so by combining the forward messages:

$$\alpha_t(x) = P(Y_{1:t}, X_t = x)$$

and the backward messages:

$$\beta_t(x) = P(Y_{t+1:T} \mid X_t = x)$$

The forward pass computes $\alpha_t$ recursively from left to right, and the backward pass computes $\beta_t$ from right to left. After that, the posterior for state $x$ at time $t$ is proportional to $\alpha_t(x)\,\beta_t(x)$. This approach can be used for both filtering (the distribution of the hidden state at time $t$ given data up to $t$) and smoothing (the distribution of the hidden state at time $t$ given all data from $1$ to $T$). Although the forward-backward algorithm is efficient for HMMs, it does have a runtime of $O(TK^2)$ if there are $K$ possible states per time step. This can become a bottleneck when $K$ or $T$ is huge.
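Below is a minimal sketch of these recursions for a discrete HMM, assuming you already have a transition matrix A, an emission matrix B, and an initial distribution pi as NumPy arrays; messages are normalized at each step for numerical stability:

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """Smoothed posteriors P(x_t | y_{1:T}) for a discrete HMM.
    pi: [K] initial distribution, A: [K, K] transition matrix,
    B: [K, M] emission probabilities, obs: sequence of observation indices."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    beta = np.zeros((T, K))

    # Forward pass: alpha_t ∝ P(x_t, y_{1:t})
    alpha[0] = pi * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        alpha[t] /= alpha[t].sum()

    # Backward pass: beta_t ∝ P(y_{t+1:T} | x_t)
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()

    # Combine: posterior at each step is proportional to alpha_t * beta_t
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```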
particle filtering for nonlinear/non-gaussian processes
When the state space is continuous or the process is highly nonlinear and not necessarily Gaussian, particle filters (also called Sequential Monte Carlo methods) are a go-to approach. The idea is to maintain a set of particles — basically samples of the hidden state — with associated weights that reflect how likely each particle is, given the observed data. On each new time step, you predict each particle forward based on the dynamics model, then update its weight based on the likelihood of observing the actual measurement. You might resample after each update so that particles with very low weights are pruned away and high-weight particles are duplicated. Over time, the distribution of particles approximates the posterior over the hidden state.
Here's a minimal Python-like snippet to illustrate a particle filter framework:
import numpy as np

def compute_likelihood(obs, predicted_obs, obs_noise):
    # Helper assumed here: a simple Gaussian likelihood with scalar std dev obs_noise
    # (the original snippet referenced this function without defining it).
    diff = np.atleast_1d(obs - predicted_obs)
    return np.exp(-0.5 * np.dot(diff, diff) / obs_noise**2)

def particle_filter(prior_particles, observations,
                    transition_func, observation_func,
                    process_noise, obs_noise, num_particles):
    """
    prior_particles: np.array of shape [num_particles, state_dim]
    observations: list or array of observations
    transition_func: function for state transition
    observation_func: function for observation generation
    process_noise: std dev or covariance for process model
    obs_noise: std dev or covariance for observation model
    num_particles: int
    """
    particles = prior_particles
    weights = np.ones(num_particles) / num_particles
    estimated_states = []

    for obs in observations:
        # Predict step: propagate each particle through the dynamics model
        for i in range(num_particles):
            particles[i] = transition_func(particles[i], process_noise)

        # Update step: reweight each particle by how well it explains the observation
        for i in range(num_particles):
            predicted_obs = observation_func(particles[i])
            likelihood = compute_likelihood(obs, predicted_obs, obs_noise)
            weights[i] *= likelihood

        # Normalize weights
        weights += 1e-12  # avoid division by zero
        weights /= np.sum(weights)

        # Resample: prune low-weight particles, duplicate high-weight ones
        indices = np.random.choice(range(num_particles), size=num_particles, p=weights)
        particles = particles[indices]
        weights = np.ones(num_particles) / num_particles

        # Estimate state as the (now uniform-weight) particle mean
        estimated_state = np.average(particles, axis=0)
        estimated_states.append(estimated_state)

    return np.array(estimated_states)
Of course, this is a toy example. Real implementations can be more sophisticated, with variance reduction methods, advanced resampling strategies, or parallelization. Particle filters are widely used in navigation, robotics, fault detection, and any number of real-time settings that involve non-linear or non-Gaussian processes.
online vs. offline inference: when each is appropriate
- Online inference means you process data in a sequential, real-time manner. You can't necessarily revisit old data (or it's too costly to do so), so you maintain a "running" estimate of the latent state or parameters. Use this when the system must respond immediately, such as a robot adjusting its path based on new sensor readings.
- Offline inference is used when you have access to the entire data sequence from the start, and you can do more computationally expensive operations that might require multiple passes over the dataset, like the forward-backward algorithm for smoothing in HMMs. This is helpful in post-hoc analysis, e.g., analyzing a static dataset of user behaviors over a month to glean insights.
scalability challenges and approximate methods
As the dimensionality of your state space grows, or as the complexity of your system increases (e.g., multiple sensors, multiple agents, large continuous state variables), you might find exact inference intractable. Particle filters might need a huge number of particles, or the forward-backward might blow up in complexity. In these scenarios, people often turn to approximate methods:
- Variational inference: Where you propose a certain family of distributions to approximate the posterior, then optimize the parameters of that approximation.
- Sequential variational autoencoders: In deep learning contexts, you might combine neural networks with variational inference to compress the distribution of states in a latent variable model, especially for complex temporal data like video or audio.
- Hybrid sampling approaches: Combining MCMC (Markov chain Monte Carlo) with filtering or using advanced resampling strategies.
example use cases: speech recognition, tracking user behavior over time
Temporal models pop up everywhere. In speech recognition, you often approximate the evolution of phonemes or words as a Markov chain. In website analytics or e-commerce, tracking user behavior as a sequence of events is a natural application of a hidden Markov model or some form of recurrent neural net. You might want to model how likely a user is to move from browsing a product to actually purchasing it based on partial cues.
kalman filters
detailed look at predict-update equations in linear gaussian settings
The Kalman filter is a specialized (yet extremely famous) algorithm for linear, Gaussian state-space models. The standard setting is:
$$x_t = A\,x_{t-1} + B\,u_t + w_t, \qquad y_t = H\,x_t + v_t$$

where $x_t$ is the hidden state, $y_t$ is the observation at time $t$, $u_t$ is a known control input, and $w_t, v_t$ are Gaussian noise terms. The noise is often assumed to be zero-mean with covariance matrices $Q$ and $R$, respectively. The matrix $A$ defines how the system transitions from one state to the next, $B$ how the control input influences the state, and $H$ how the state is mapped to the observation space.
A Kalman filter performs two steps at each time step:
- Predict (also called the time update):

$$\hat{x}_{t|t-1} = A\,\hat{x}_{t-1|t-1} + B\,u_t, \qquad P_{t|t-1} = A\,P_{t-1|t-1}\,A^\top + Q$$

Here, $\hat{x}_{t-1|t-1}$ is the state estimate at time $t-1$, given observations up to $t-1$. $\hat{x}_{t|t-1}$ is the estimate for time $t$ before we incorporate the new observation at time $t$. $P$ is the covariance of the estimate.

- Update (or measurement update):

$$K_t = P_{t|t-1}\,H^\top\,(H\,P_{t|t-1}\,H^\top + R)^{-1}$$

$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t\,(y_t - H\,\hat{x}_{t|t-1}), \qquad P_{t|t} = (I - K_t H)\,P_{t|t-1}$$

$K_t$ is the Kalman gain, which determines how much to trust the new measurement relative to the current prediction. The term $y_t - H\,\hat{x}_{t|t-1}$ is the measurement residual, i.e., the difference between the observed data and what our model predicted. The updated estimate is a blend of the prediction and the measurement (a minimal code sketch follows).
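As a rough illustration, here's a single predict-update cycle written out in NumPy; the matrices A, B, H, Q, R correspond to the model above, and this is a sketch rather than a production-grade filter:

```python
import numpy as np

def kalman_step(x_est, P_est, u, y, A, B, H, Q, R):
    """One predict-update cycle of a linear Gaussian Kalman filter."""
    # Predict (time update)
    x_pred = A @ x_est + B @ u
    P_pred = A @ P_est @ A.T + Q

    # Update (measurement update)
    S = H @ P_pred @ H.T + R                 # residual covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    residual = y - H @ x_pred                # measurement residual
    x_new = x_pred + K @ residual
    P_new = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred
    return x_new, P_new
```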
use in control systems, signal processing, and localization
Kalman filters are widely employed in control systems: consider a drone or a robot that needs a stable estimate of its position, velocity, orientation, etc. By fusing sensor readings with a physics-based prediction model, the Kalman filter helps to keep the estimates robust to noise. The same logic applies in signal processing, where we might track a signal that is buried in noise and adapt our filter over time.
In localization tasks (e.g., combining GPS data, wheel odometry, and inertial measurements), a Kalman filter is frequently the first line of defense. If we model the system dynamics and sensor measurement processes in a linear Gaussian form, Kalman filters can be quite precise and computationally efficient.
extensions: extended kalman filter (ekf) and unscented kalman filter (ukf)
Real systems are often nonlinear. Perhaps your robot's dynamics include sine or cosine terms, or your sensors measure angles or distances in a way that's not linear in the underlying coordinates. In these cases, you can either approximate your system with a local linearization (the Extended Kalman Filter, EKF) or rely on a "sigma-point" approach (the Unscented Kalman Filter, UKF), which picks a set of deterministically chosen sample points in the state space and propagates them through the nonlinear functions to approximate the posterior.
strengths, limitations, and typical real-world performance
Strengths:
- Fast and relatively easy to implement.
- Well-studied, so their behavior is predictable in linear Gaussian domains.
- Provide good performance in many real-time control and tracking tasks.
Limitations:
- Strictly linear Gaussian assumptions might be unrealistic for complex systems.
- Sensitive to tuning parameters (covariance matrices, etc.).
- Nonlinear versions (EKF, UKF) introduce approximations that can be unstable if the nonlinearities are very severe or if the initial linearization is poor.
Nevertheless, Kalman-based methods remain foundational and are widely used in robotics, aerospace, finance (for simplified tracking of indicators), and more.
dynamic bayesian networks
how dbns generalize hidden markov models and kalman filters
A dynamic Bayesian network (DBN) is basically a Bayesian network that unfolds over time. In each time slice, you have a set of state variables and observation variables, plus edges that model how these variables relate to each other within the time slice and how they connect to the corresponding variables in the next slice. Hidden Markov models (HMMs) and Kalman filters are specific, more restrictive types of DBNs. HMMs allow for a single hidden state variable that influences a single observation at each time step, plus first-order Markov dependencies. A DBN can be more general: it might let you have multiple hidden variables, intricate dependencies, or higher-order transitions.
representation of temporal transitions and emission probabilities
In a DBN, you often define two network fragments (sometimes called templates): one for the initial time slice, and one for the transition from time $t$ to time $t+1$. Then, to build a model of length $T$, you replicate the transition fragment $T-1$ times and connect them in a chain. Typically, you specify conditional probability tables (or conditional densities) for how each variable depends on its parents. For example, you might have:
$X_t^{(1)}$ depends on $X_{t-1}^{(1)}$ and $X_{t-1}^{(2)}$.
$Y_t$ depends on $X_t^{(1)}$ and $X_t^{(2)}$.
This is a more flexible structure than an HMM, which only allows $X_t$ to depend on $X_{t-1}$ alone and $Y_t$ to depend on $X_t$.
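As a toy illustration of what such a transition template can look like in code, here's a sketch with two binary hidden variables and one binary observation; the conditional probability tables are made up, and the brute-force enumeration is only meant to expose the structure:

```python
import numpy as np

# P(X1_t = 1 | X1_{t-1}, X2_{t-1}), indexed [x1_prev, x2_prev]
p_x1 = np.array([[0.1, 0.4],
                 [0.6, 0.9]])
# P(X2_t = 1 | X2_{t-1}), indexed [x2_prev]
p_x2 = np.array([0.3, 0.8])
# P(Y_t = 1 | X1_t, X2_t), indexed [x1, x2]
p_y = np.array([[0.05, 0.5],
                [0.4, 0.95]])

def transition(belief):
    """Propagate a joint belief over (X1, X2) through one time slice.
    belief[x1, x2] = P(X1_{t-1} = x1, X2_{t-1} = x2)."""
    new_belief = np.zeros((2, 2))
    for x1p in (0, 1):
        for x2p in (0, 1):
            for x1 in (0, 1):
                for x2 in (0, 1):
                    px1 = p_x1[x1p, x2p] if x1 == 1 else 1 - p_x1[x1p, x2p]
                    px2 = p_x2[x2p] if x2 == 1 else 1 - p_x2[x2p]
                    new_belief[x1, x2] += belief[x1p, x2p] * px1 * px2
    return new_belief

def observe(belief, y):
    """Condition the joint belief on an observation y in {0, 1}."""
    lik = p_y if y == 1 else 1 - p_y
    posterior = belief * lik
    return posterior / posterior.sum()

print(observe(transition(np.full((2, 2), 0.25)), y=1))
```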
parameter learning strategies (em algorithm, gradient-based methods)
If you have latent variables in your DBN (which is usually the case), you might use the Expectation-Maximization (EM) algorithm to estimate the model parameters. In the E-step, you infer the posterior distribution over hidden variables given your current parameter guess. In the M-step, you maximize the likelihood with respect to those parameters. This can be done iteratively until convergence. In practice, you might approximate the E-step using particle filtering, forward-backward recursions, or variational inference if the exact inference is intractable.
For more complex networks, or if you incorporate neural network components, you might resort to gradient-based methods, sampling-based methods, or other advanced techniques. The approach depends heavily on the problem specifics and computational budget.
applications in speech recognition, activity recognition, robotics, and beyond
Speech recognition systems historically used hidden Markov models, which can be seen as a specialized DBN. In activity recognition or user-behavior modeling, you might use DBNs to represent multiple aspects of the state (like location, motion, interactions). In robotics, DBNs can capture sensor fusion with rich dependencies across time slices. Essentially, anytime you have multiple variables that evolve over time and you can draw a DAG structure connecting them across slices, DBNs become a strong modeling framework.
practical software tools for constructing and inferring dbns
There are libraries like pgmpy in Python, or specialized Bayesian network toolkits like OpenBayes or BayesPy. Some general-purpose probabilistic programming frameworks (e.g., PyMC, Stan, Edward) can also handle DBNs with the right modeling code. The main challenge often lies in specifying the structure (which edges exist, how big each time slice is) and then picking the right inference algorithm for your needs.
making simple decisions under uncertainty
bridging probabilistic beliefs with decision-making (decision-theoretic approach)
So you've got these carefully built probabilistic models that tell you how likely certain events or states are. How do you use these probabilities to decide what to do? Decision theory says that you combine your beliefs (probabilities) with your preferences (utilities) to pick the action that yields the highest expected utility. For example, if I believe there's a 70% chance of rain and I absolutely hate getting soaked, I might decide to carry an umbrella.
scenario analysis and sensitivity analysis for uncertain outcomes
In more sophisticated settings, you might enumerate or simulate scenarios, each with some probability, then compute the utility for each scenario. That's scenario analysis. Sensitivity analysis is about seeing how changes in your assumptions (like the probability of each scenario or the cost of each outcome) alter your preferred decision. If a small shift in probabilities or costs drastically changes the decision, you know the problem is sensitive, and maybe you should gather more information or refine your estimates.
single-step decisions vs. multi-step decisions in uncertain environments
A single-step decision under uncertainty is a simpler scenario: you look at the immediate payoff. But if decisions chain together over time, you need to think multiple steps ahead. That's where Markov Decision Processes (MDPs) come in, which we'll tackle later. In real life, many problems are multi-step. A doctor prescribing medication might consider future complications. A portfolio manager invests now to see returns (or losses) unfold over months or years.
real-world examples: medical diagnoses, finance portfolios, route planning
- Medical diagnoses: A doctor decides which tests to order based on how uncertain they are about a condition and how beneficial or risky further testing might be.
- Finance portfolios: You might weigh the expected returns of an investment against its volatility and your personal tolerance for risk.
- Route planning: When picking a path in a city with traffic, you weigh the likelihood of congestion, potential time savings, and any cost (like tolls).
the basis of utility theory
axioms of rational preference (von neumann–morgenstern framework)
The von Neumann–Morgenstern utility theorem states that if an agent's preferences satisfy certain rationality axioms (completeness, transitivity, independence, continuity), then we can represent that agent's preferences with a utility function $U$ such that the agent compares lotteries (i.e., probabilistic outcomes) by their expected utility:

$$EU(L) = \sum_i p_i\, U(o_i)$$

where the lottery $L$ yields outcome $o_i$ with probability $p_i$, and the agent prefers whichever lottery has the higher expected utility.
how utility captures subjective valuation of outcomes
A utility function is personal; two individuals might value the same outcome differently, leading them to make different decisions even if they have the same probabilistic beliefs. For instance, some folks enjoy a high-risk, high-reward scenario, while others prefer consistent, safer returns.
risk aversion, risk neutrality, and risk-seeking behaviors (with examples)
- Risk-averse: A person might prefer a guaranteed $50 over a 50/50 chance at $100 or $0.
- Risk-neutral: A risk-neutral agent is indifferent between those two.
- Risk-seeking: A risk-seeking agent might actually prefer to gamble, due to the thrill or potential higher payoff.
Mathematically, a concave utility function (like a logarithmic $U(x) = \log x$) models risk aversion; a linear utility function is risk-neutral, and a convex utility function would indicate risk-seeking.
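A quick worked example of the guaranteed-$50-versus-gamble comparison with a logarithmic utility (using a $1 floor so the logarithm stays finite):

```python
import math

# Guaranteed $50 vs. a 50/50 gamble between $100 and (approximately) $0.
# Both options have the same expected monetary value, but not the same expected log utility.
guaranteed = math.log(50)
gamble = 0.5 * math.log(100) + 0.5 * math.log(1)  # $1 stands in for $0 to keep log finite

print(guaranteed, gamble)  # ~3.91 vs ~2.30: the log-utility (risk-averse) agent takes the sure $50
```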
contrasts between economic vs. ai perspectives on utility
Economists might interpret utility in terms of real-life preferences (consumption, money, etc.), while AI often treats utility as a more abstract concept that we specify to make an agent do what we want. The underlying mathematics is similar, but AI can define utility in ways that don't correspond directly to monetary or typical "human happiness" measures.
utility functions
common functional forms (linear, exponential, logarithmic) and their implications
- Linear: $U(x) = a x + b$. No diminishing returns, risk-neutral.
- Logarithmic: $U(x) = \log x$. Strongly diminishing returns, typical example of risk-aversion.
- Exponential: $U(x) = -e^{-\alpha x}$ (with some parameter $\alpha > 0$). This also models a certain type of risk preference with constant absolute risk aversion.
constructing multiattribute utilities for complex decisions
Often, a decision involves multiple attributes: cost, time, safety, environmental impact, etc. Multiattribute utility might be represented as:
$$U(x_1, x_2, \ldots, x_n) = \sum_i w_i\, U_i(x_i)$$

if we assume the attributes are additive or at least partially separable. But real synergy or trade-off effects might complicate this. Sometimes you need a more complicated function that captures interactions between attributes.
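Here's a tiny sketch of the additive case; the attribute weights and scores are invented for illustration:

```python
# Additive multiattribute utility: U(x) = sum_i w_i * U_i(x_i). All numbers are illustrative.
weights = {"cost": 0.5, "time": 0.3, "safety": 0.2}

def utility(option):
    """option maps each attribute to a normalized score in [0, 1], higher is better."""
    return sum(weights[attr] * score for attr, score in option.items())

route_a = {"cost": 0.9, "time": 0.4, "safety": 0.8}
route_b = {"cost": 0.6, "time": 0.9, "safety": 0.7}
best_name, best_option = max([("route_a", route_a), ("route_b", route_b)],
                             key=lambda kv: utility(kv[1]))
print(best_name, utility(best_option))  # route_a, 0.73
```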
dealing with trade-offs and constraints (budget, time, safety)
In practice, you might face constraints that disqualify certain options. For example, if you're scheduling tasks, you can't exceed 24 hours in a day. Or if you're planning a robot's route, you can't drive off a cliff. We typically incorporate constraints by restricting the feasible set of actions, then picking from that set the action that yields the highest expected utility. Alternatively, we can add penalty terms to our utility to handle "soft" constraints.
approximation methods for utility elicitation in large-scale or high-dimensional settings
Eliciting utility from real users can be challenging, especially if the space of outcomes is huge. Techniques include:
- Pairwise comparisons: Show users pairs of outcomes and ask which they prefer, then infer a partial ordering.
- Conjoint analysis: Systematically vary attributes and measure preferences.
- Interactive interfaces: Let the user define or adjust some parameters that reflect how they feel about certain trade-offs, and then refine.
In complex AI systems, we might approximate the user's utility with a parametric model and learn those parameters from user data. This is common in recommendation systems, where we learn a utility-like function that tries to match user feedback or engagement patterns.
decision networks (influence diagrams)
chance nodes, decision nodes, and utility nodes: how they interact
A decision network (also known as an influence diagram) extends a Bayesian network by adding decision nodes (where you choose an action) and utility nodes (which define the payoff). Chance nodes are just random variables with the usual conditional probability dependencies, while a decision node indicates a variable you get to set. Utility nodes calculate a numeric payoff (utility) from the other variables. Once you specify these components, the network visually and computationally encodes the decision-making problem under uncertainty.
algorithmic approaches to solve decision networks (backward induction, dynamic programming)
To solve a decision network, you basically want to find the policy (mapping from states or observations to actions) that maximizes expected utility. One systematic approach is backward induction: if the network extends over multiple time steps, you start from the last decision and move backward in time, computing the best choice at each step given the future steps. Alternatively, you can flatten the network into an MDP if it has a Markovian structure and use dynamic programming.
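For a single decision, "solving" the network reduces to enumerating the actions and taking expectations over the chance nodes. Here's a toy umbrella-versus-weather sketch with made-up probabilities and utilities:

```python
# One-shot influence diagram: chance node Weather, decision node TakeUmbrella, one utility node.
p_rain = 0.7

utilities = {
    # (take_umbrella, it_rains): utility
    (True, True): 20, (True, False): 10,
    (False, True): -50, (False, False): 30,
}

def expected_utility(take_umbrella):
    return (p_rain * utilities[(take_umbrella, True)]
            + (1 - p_rain) * utilities[(take_umbrella, False)])

best_action = max([True, False], key=expected_utility)
print(best_action, expected_utility(best_action))  # True (take the umbrella), EU = 17.0
```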
example applications: medical treatment planning, supply chain management
- Medical treatment: Suppose you can choose to give a patient a particular medication or run a particular test. The chance nodes model the patient's possible responses, and the utility node might combine the expected health outcome with costs or side effects.
- Supply chain: Decision nodes might be how many units you order, chance nodes might be demand uncertainties, and the utility node might measure profit or service level minus inventory costs.
value iteration methods adapted for influence diagrams
In a simple MDP, we might run classical value iteration or policy iteration. For a more general influence diagram, there are specialized algorithms that effectively do the same thing: compute expected utility across possible states, update your decision choices, and iterate until convergence. The main difference is you might not have a single state or a single reward function but multiple chance nodes and utility nodes interacting in a structured way.
tools for modeling and solving influence diagrams in practice
Many of the same Bayesian network libraries can handle influence diagrams or at least can be extended. GeNIe and SMILE (from the University of Pittsburgh) are one example of a toolset for Bayesian networks and influence diagrams. Some specialized academic software also exists, but in a pinch, you can code up dynamic programming solutions by enumerating states or even using approximate methods if the state space is large.
the value of information
cost-benefit analysis of acquiring additional data or performing extra tests
Sometimes you can pay money, time, or some resource to get additional information that reduces uncertainty. Is it worth it? Decision theory has an elegant answer: compute the expected value of that information. If it's higher than the cost of acquiring it, go for it.
evpi (expected value of perfect information) vs. evsi (sample information)
- EVPI: The difference in expected utility between (1) knowing the actual outcome in advance (perfect info) and (2) making a decision without that knowledge. This is an upper bound on how valuable any additional data could be (see the numeric sketch after this list).
- EVSI: The expected value of partial or sample information. Usually less than or equal to EVPI. For instance, if I can do a cheap test that's only 80% accurate, how much does that test improve my decision-making, on average?
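To make EVPI concrete, here's a small numeric sketch using the same kind of one-shot decision as the umbrella example earlier; all numbers are invented:

```python
# Expected value of perfect information (EVPI) for a one-shot decision.
p = {"rain": 0.7, "dry": 0.3}
payoff = {
    ("umbrella", "rain"): 20, ("umbrella", "dry"): 10,
    ("no_umbrella", "rain"): -50, ("no_umbrella", "dry"): 30,
}
actions = ["umbrella", "no_umbrella"]

# Best we can do without extra information: commit to one action up front.
eu_no_info = max(sum(p[w] * payoff[(a, w)] for w in p) for a in actions)

# With perfect information we learn the weather first, then pick the best action for it.
eu_perfect_info = sum(p[w] * max(payoff[(a, w)] for a in actions) for w in p)

evpi = eu_perfect_info - eu_no_info
print(eu_no_info, eu_perfect_info, evpi)  # 17.0, 23.0, 6.0
```

If a weather report cost less than 6 utility units here, it would be worth buying; any imperfect report (EVSI) would be worth at most that amount.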
active learning scenarios: deciding when to query new labels or measurements
In machine learning, active learning tries to figure out which data points are worth labeling next to reduce uncertainty in the most cost-effective way. The same "value of information" logic applies. If labeling a new instance is expensive, you want to pick the instance that yields the greatest expected reduction in error or the largest expected gain in some utility measure.
balancing information costs against potential decision improvements
If new information is free, obviously you gather it. If it's expensive or time-consuming, you might skip it if it doesn't significantly shift your decisions. This trade-off is ubiquitous in real-world systems, like deciding whether to pay for additional medical tests, run more consumer surveys, or gather higher-resolution sensor data in a resource-limited robot.
classic examples: medical testing, sensor placement, market research
- Medical testing: Additional tests might clarify a diagnosis, but each test has a cost, can be invasive, etc.
- Sensor placement: If you're building a sensor network, where do you put the sensors to get the most valuable data about your environment (for instance, to predict wildfires, traffic patterns, or environmental hazards)?
- Market research: A company might spend money on consumer surveys or focus groups to see how a product might sell, but they want to be sure that the research cost is worth the added confidence in the product's design or marketing strategy.
unknown preferences
eliciting user or stakeholder preferences when they are not explicitly known
An AI system often has to make decisions on someone else's behalf, but that person (the user or stakeholder) might not have a crisp utility function spelled out. Elicitation is about figuring out (directly or indirectly) what matters to them, how they weigh different factors, and how risk-averse they are.
interactive techniques (pairwise comparisons, ranking-based, or question-based)
You might show the user two potential scenarios and ask which they prefer, or you might ask them to rank a set of options. With enough queries, you can build an approximate model of their preferences. Another approach is question-based, e.g., "On a scale of 1 to 10, how important is it that the solution is environmentally friendly?" Combined with a bit of background data, you can shape a utility function.
handling inconsistency or evolution of preferences over time
Users might give inconsistent answers (humans, as we know, are messy), or they might change their minds over time. Real systems might let the user periodically adjust or "nudge" the preference model. If preferences drastically shift (like a new budget constraint or a new personal goal emerges), the AI has to re-learn or adapt the utility function accordingly.
preference learning and recommendation systems (high-level ideas)
In recommendation systems, the user's utility might be correlated with how much they enjoy or engage with the recommended items. We can treat the user's clicks, watch time, or rating as a proxy for utility. Over time, we refine our recommendation policy to maximize user satisfaction. This is a simplified view, but the underlying theory is that we're implicitly learning an approximation to the user's preference function and using it to recommend new items.
sequential decision problems
setting up markov decision processes (mdps) for multi-step planning
An MDP is defined by:
- A set of states $S$
- A set of actions $A$
- A transition function $P(s' \mid s, a)$
- A reward function $R(s, a)$ or $R(s, a, s')$
- A discount factor $\gamma$ (if we care about infinite or indefinite horizons)
At each step, the agent observes state $s_t$, picks an action $a_t$, transitions to a new state $s_{t+1}$ with some probability, and receives a reward. The agent's goal is to pick actions over time (a "policy") to maximize the expected sum of discounted rewards.
finite vs. infinite horizon problems and discount factors
- Finite horizon: The process ends after a fixed number of steps, so we can solve it by dynamic programming from the last step backward.
- Infinite horizon: The process might continue indefinitely, but we typically apply a discount factor so that future rewards are worth less than immediate ones, ensuring that sums converge.
policy representation (deterministic vs. stochastic)
A deterministic policy is a function $\pi : S \to A$, mapping each state to an action. A stochastic policy is $\pi(a \mid s)$, giving a probability distribution over possible actions in each state. Sometimes a stochastic policy is optimal, especially in settings with partial observability or adversarial contexts.
real-world examples: automated customer interactions, robotics, resource management
- Automated help desk: The system sees a user's query (state) and picks a response or action. Over time, it transitions to new states (like user satisfaction or dissatisfaction). The reward might measure success or some performance metric.
- Robotics: A robot's state is its configuration or location, and it picks an action that changes that state, receiving reward if it moves toward a goal.
- Resource management: Maybe you have a data center with multiple servers. You're deciding how to allocate tasks or spin up new machines. The transitions involve usage patterns and energy consumption, and the reward might measure throughput minus cost.
algorithms for mdps
value iteration, policy iteration, and q-learning (conceptual overview)
- Value iteration: Repeatedly apply the Bellman update

$$V_{k+1}(s) = \max_a \sum_{s'} P(s' \mid s, a)\,\big[R(s, a, s') + \gamma\,V_k(s')\big]$$

Stop when changes are below a threshold. You can get a greedy policy from $V$ (see the sketch after this list).
- Policy iteration: Alternate between policy evaluation (compute the value of a policy) and policy improvement (update the policy greedily based on the current value function). Converges in a finite number of iterations for an MDP.
- Q-learning: A model-free reinforcement learning approach. The Q-function is updated from experience:

$$Q(s, a) \leftarrow Q(s, a) + \alpha\,\big[r + \gamma \max_{a'} Q(s', a') - Q(s, a)\big]$$

Over time, if each state-action pair is explored sufficiently, $Q$ converges to $Q^*$.
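Here's the promised value iteration sketch for a tiny tabular MDP; the transition and reward arrays are made up, and R is treated as the expected immediate reward per state-action pair:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """Tabular value iteration.
    P: [num_actions, num_states, num_states] transition probabilities P[a, s, s'].
    R: [num_actions, num_states] expected immediate reward R[a, s].
    Returns the optimal value function and a greedy policy."""
    num_actions, num_states, _ = P.shape
    V = np.zeros(num_states)
    while True:
        # Bellman optimality backup: Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Tiny two-state, two-action MDP with made-up numbers.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # transitions under action 0
              [[0.5, 0.5], [0.0, 1.0]]])   # transitions under action 1
R = np.array([[1.0, 0.0],                  # rewards per state under action 0
              [0.0, 2.0]])                 # rewards per state under action 1
V, policy = value_iteration(P, R)
print(V, policy)
```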
convergence properties and time complexity trade-offs
Value iteration can take many iterations, especially for large state spaces. Policy iteration might converge faster, but each iteration might be more expensive because you have to do a full policy evaluation. Q-learning can handle unknown transition probabilities because it doesn't need a model, but it needs enough exploration to converge and is sensitive to hyperparameters like the learning rate $\alpha$.
approximate dynamic programming for large state spaces
When the state space is huge (think billions of states), you can't store a separate value or Q-value for each state. Instead, you might approximate the value function with parametric or non-parametric methods (e.g., neural networks). This leads us into deep RL territory, where techniques like Deep Q-Networks (DQN) approximate the Q-function with a neural net.
practical ties to reinforcement learning (function approximation, deep rl)
In modern AI, MDP-based frameworks are central to reinforcement learning. Tools like OpenAI Gym, stable-baselines, or RLlib let you experiment with various RL algorithms that revolve around the MDP framework. The difference from classical MDP solutions is that in RL, you typically don't know $P$ or $R$ upfront; you discover them (or approximate them) through interaction with the environment.
common implementations and toolkits
- OpenAI Gym: A standard environment library for RL.
- Stable Baselines3: A Python library implementing popular RL algorithms (PPO, A2C, DQN, etc.).
- RLlib: A scalable RL library built on Ray.
partially observable mdps (pomdps)
belief states: probability distributions over hidden states
In a POMDP, you don't directly observe the true underlying state; you only get noisy measurements. The usual trick is to track a belief state, $b_t(s)$, which is a probability distribution over all possible states at time $t$. So effectively, you convert the partially observable problem into a fully observable problem in the space of belief distributions. The downside is that this belief space can be enormous, especially if the number of underlying states is large.
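Here's a minimal sketch of a single discrete belief update (a Bayes filter step), assuming you have the transition matrix for the chosen action and the observation likelihoods as NumPy arrays; the numbers are illustrative:

```python
import numpy as np

def belief_update(belief, action_T, obs_likelihood):
    """One POMDP belief update over discrete states.
    belief: [K] current distribution over states.
    action_T: [K, K] transition matrix for the action taken, action_T[s, s'] = P(s' | s, a).
    obs_likelihood: [K] likelihood P(observation | s') for the observation received."""
    predicted = belief @ action_T            # push the belief through the dynamics
    posterior = predicted * obs_likelihood   # weight by how well each state explains the observation
    return posterior / posterior.sum()       # renormalize

# Toy three-state example with made-up numbers.
b = np.array([0.6, 0.3, 0.1])
T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.0, 0.3, 0.7]])
z = np.array([0.9, 0.2, 0.05])  # P(obs | state)
print(belief_update(b, T, z))
```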
exact pomdp solutions vs. approximate methods (point-based value iteration, etc.)
Solving POMDPs exactly can be computationally intractable for all but the smallest problems. Point-based value iteration is a popular approximate technique that focuses on a set of sampled belief points instead of the entire belief space. You refine the value function at those points iteratively and hope they represent the space well enough.
applications in robotics (navigation with uncertain sensors), dialogue systems, healthcare
- Robotics: If your robot's sensors are noisy, you might not know exactly which corridor you're in, so your state is partially observable. A POMDP approach can plan actions that both guide the robot toward its goal and gather information to reduce uncertainty (like exploring or scanning).
- Dialogue systems: The user's intent might be partially observable. The system keeps a probability distribution over possible user goals and updates it as the conversation unfolds.
- Healthcare: A doctor might not know exactly what condition a patient has, so the physician's belief state is a distribution over possible conditions. Treatments might double as tests to reduce uncertainty.
computational challenges and heuristics for larger pomdps
Because POMDPs can blow up in size quickly, we use heuristics like:
- Point-based methods: E.g., SARSOP, PERSEUS.
- Hierarchical approaches: Breaking down the problem.
- Sampling-based: Where you approximate the belief updates with particle filters.
methods for belief compression or state abstraction
Sometimes we compress the belief state into a smaller set of parameters or cluster states that behave similarly from a decision-making perspective. This can drastically improve computational feasibility.
multiagent decision making
game-theoretic concepts: zero-sum vs. non-zero-sum games, nash equilibria
In multiagent settings, each agent might have its own utility function, leading to strategic interactions. A zero-sum game is one where one agent's gain is exactly another's loss (like many competitive games). A non-zero-sum or general-sum game is more cooperative or has potential for synergy. A Nash equilibrium is a strategy profile where no agent can unilaterally deviate and increase its utility.
cooperative vs. competitive multiagent environments (team-based vs. adversarial)
In cooperative environments, agents share some common goal or reward function (fully or partially). They might coordinate actions for a better group outcome. In competitive or adversarial settings, each agent tries to maximize its utility possibly at the expense of others (e.g., poker, certain aspects of finance).
mechanism design and auction theory in ai (high-level references)
Mechanism design is about designing rules or incentives in multiagent systems so that rational agents, each pursuing its own self-interest, end up producing a desirable global outcome. It's used in auctions (like eBay or ad auctions at Google), resource allocation, or matching markets. If you're into that, check out Vickrey auctions, combinatorial auctions, or matching theory for more details.
communication, coordination, and negotiation among agents
When multiple agents can share signals or partial state information, they might coordinate to solve tasks more efficiently. Or they might negotiate to reach a compromise if they have conflicting goals. Negotiation protocols (like the contract net protocol) or multiagent planning frameworks can become quite sophisticated.
real-world examples: autonomous driving fleets, multi-robot systems, distributed sensor networks
- Autonomous driving fleets: Cars might communicate with each other about traffic conditions to optimize global flow.
- Multi-robot: Drones coordinating to map a disaster area.
- Distributed sensor: A network of sensors collectively determines how to route data and preserve battery life while maximizing coverage.
Sometimes these multiagent systems can be framed as Dec-POMDPs (Decentralized Partially Observable MDPs) if each agent has limited local observations. Solving Dec-POMDPs is even more challenging than standard POMDPs. Approximate or heuristic solutions are the norm.
At this point, I've walked through a great deal of material regarding probabilistic reasoning over time, Markov processes, Kalman filters, dynamic Bayesian networks, decision-making under uncertainty, and multiagent considerations. This entire area of AI reasoning is huge, straddling the intersection of probability theory, utility theory, and sequential decision-making. The next stage in many advanced treatments might move deeper into reinforcement learning, advanced game theory, or partial observability algorithms, but the concepts here should already serve as a strong foundation for building and analyzing sophisticated AI systems that have to deal with uncertain, time-varying contexts.
I encourage you to explore these frameworks further in practical implementations, testing them in real data scenarios or simulations. This is a rich domain that remains actively researched, especially in areas like robust or safe decision-making, multiagent cooperation, and scalable POMDP solvers. Recent work presented at venues such as NeurIPS and ICML continues to push forward new approximate methods for large-scale sequential models and multiagent reinforcement learning, so if you're hungry for state-of-the-art expansions, be sure to dive into recent conference proceedings and journals.
Either way, I hope the explanations and examples here give you plenty of starting points to navigate the complexities of AI reasoning under uncertainty — especially when that uncertainty morphs and grows over time.