About

Column

Probability Distributions with Julia

Probability Distributions with Julia:

Beta Distribution - Example

** Here’s a worked example of the Beta distribution using Julia.**

First, let’s install the required packages if you don’t already have them:

using Pkg
Pkg.add("Distributions")
Pkg.add("Plots")

Now, let’s dive into the example:

Import the necessary packages:

using Distributions
using Plots

Define the parameters of the Beta distribution:

Let’s choose parameters \(\alpha = 2\) and \(\beta = 5\).

alpha = 2
beta = 5

Create a Beta distribution object:

beta_dist = Beta(alpha, beta)

Calculate the mean and variance:

mean_beta = mean(beta_dist)
variance_beta = var(beta_dist)
println("Mean: ", mean_beta)
println("Variance: ", variance_beta)

Generate random samples from the Beta distribution:

Let’s generate 1000 random samples.

samples = rand(beta_dist, 1000)

Plot the probability density function (PDF):

We will plot the PDF of the Beta distribution and the histogram of the samples.

x = range(0, 1, length=100)
pdf_values = pdf.(beta_dist, x)

plot(x, pdf_values, label="Beta(2, 5) PDF", linewidth=2)
histogram!(samples, normalize=true, alpha=0.5, label="Sample Histogram")

The complete code looks like this:

using Distributions
using Plots

# Define the parameters of the Beta distribution
alpha = 2
beta = 5

# Create a Beta distribution object
beta_dist = Beta(alpha, beta)

# Calculate the mean and variance
mean_beta = mean(beta_dist)
variance_beta = var(beta_dist)
println("Mean: ", mean_beta)
println("Variance: ", variance_beta)

# Generate random samples from the Beta distribution
samples = rand(beta_dist, 1000)

# Plot the probability density function (PDF)
x = range(0, 1, length=100)
pdf_values = pdf.(beta_dist, x)

plot(x, pdf_values, label="Beta(2, 5) PDF", linewidth=2)
histogram!(samples, normalize=true, alpha=0.5, label="Sample Histogram")

This code will generate a plot showing the probability density function of the Beta(2, 5) distribution and a histogram of 1000 random samples drawn from this distribution.

Power Law Distributions

Column

Power Law Distributions

# Power Law Distribution Tutorial in Julia

# Load necessary packages
using Distributions, Plots, StatsPlots

# 1. Defining a Power Law Distribution

# The Distributions.jl package doesn't have a built-in Power Law distribution.
# We can define it ourselves.  Here's a simple implementation:

struct PowerLaw{T<:Real} <: ContinuousUnivariateDistribution
    xmin::T  # Minimum value (must be > 0)
    alpha::T # Exponent (must be > 1 for a proper distribution)
end

# Define a method to generate random samples.
import Base.rand
function rand(rng::AbstractRNG, d::PowerLaw)
    u = rand(rng)
    return d.xmin * (1 - u)^(-1/(d.alpha - 1))
end

# Define other necessary methods for Distributions.jl compatibility (optional but good practice).
import Distributions: pdf, cdf

function pdf(d::PowerLaw, x::Real)
    if x < d.xmin
        return 0.0
    else
        return (d.alpha - 1) * d.xmin^(d.alpha - 1) * x^(-d.alpha)
    end
end

function cdf(d::PowerLaw, x::Real)
    if x < d.xmin
        return 0.0
    else
        return 1 - (d.xmin / x)^(d.alpha - 1)
    end
end

# Example usage:
xmin = 1.0  # Minimum value
alpha = 2.5 # Exponent
dist = PowerLaw(xmin, alpha)

# 2. Generating Random Samples

n_samples = 10000
samples = rand(dist, n_samples)  # Generate random samples

# 3. Visualizing the Distribution

# Histogram
histogram(samples, 
          bins = : FreedmanDiaconis, #Good automatic bin selection
          normalize = true,
          xlabel = "x",
          ylabel = "Probability Density",
          title = "Power Law Distribution (xmin=$xmin, α=$alpha)")

# Overlay the theoretical PDF (for comparison)
x_vals = range(xmin, maximum(samples), length = 200)
plot!(x_vals, pdf.(dist, x_vals), 
      label = "Theoretical PDF", 
      linewidth = 2, 
      color = :red)


# Log-log plot (essential for power laws)
histogram(samples,
          bins = :log, # Logarithmic bins
          normalize = true,
          xlabel = "x (log scale)",
          ylabel = "Probability Density (log scale)",
          title = "Power Law Distribution (Log-Log Scale)",
          xscale = :log10, yscale = :log10)

plot!(x_vals, pdf.(dist, x_vals), 
      label = "Theoretical PDF", 
      linewidth = 2, 
      color = :red)

# 4. Parameter Estimation (Optional)

# Estimating parameters from data is more complex for power laws.
# There are specialized methods, but a simple (though not always best)
# approach is to use a linear fit on the log-log histogram.

# (Simplified example – more robust methods exist)
log_x = log.(samples)
log_counts = log.(fit(Histogram, samples, :log).weights)

# Fit a line to the log-log data (you would typically exclude the tail of the distribution).
# Note: This is a simplified example and might not be the most accurate estimation method.
#       More robust methods like maximum likelihood estimation (MLE) are recommended.
#       Packages like `PowerLaw` may be helpful for more advanced analysis.
#       However, MLE for power-law is a complex topic and requires specialized treatment.

# 5. Key Considerations for Power Laws

# - **Minimum Value (xmin):**  The minimum value is a crucial parameter.  Accurately determining xmin is important.
# - **Exponent (alpha):**  The exponent determines the shape of the distribution.
# - **Log-log plots:**  Essential for visualizing and analyzing power laws.  The power law appears as a straight line on a log-log plot.
# - **Parameter Estimation:**  Can be challenging.  Simple linear fits on log-log plots are a starting point but may not be accurate.  MLE is preferred but has its own complexities.  Be careful about binning effects.
# - **Heavy Tails:**  Power laws have "heavy tails," meaning extreme values are more likely than in many other distributions.  This has important implications for analysis and simulation.

println("Tutorial Completed!")

Explanation and Key Improvements:

Custom PowerLaw Type: Since Distributions.jl doesn’t have a built-in power law, we define our own PowerLaw struct and implement the necessary methods (rand, pdf, cdf). This makes it behave like other Distributions.jl distributions.
Random Number Generation: The rand method uses the inverse transform sampling method, which is the standard way to generate power-law random variates.
PDF and CDF: The pdf and cdf methods are implemented for completeness and to allow comparison with the theoretical distribution.
Visualization:
- The code now includes a regular histogram and a log-log histogram, which is absolutely essential for visualizing and confirming that your data follows a power law. Power laws appear as straight lines on a log-log plot.
- The theoretical PDF is overlaid on the histogram for comparison.
- FreedmanDiaconis is used for automatic bin selection which is generally a good choice.
Parameter Estimation (Simplified): The code provides a very basic example of parameter estimation using a linear fit on the log-log histogram. Important: This is a simplified example and may not be the most accurate method. Power-law parameter estimation is a complex topic. Maximum likelihood estimation (MLE) is generally preferred, but it’s more involved. The PowerLaw package may be helpful.
Key Considerations: The tutorial emphasizes the importance of xmin, alpha, log-log plots, and the challenges of parameter estimation for power laws.
Clearer Comments and Structure: The code is well-commented and organized, making it easier to follow.
Package Loading: Explicitly loads the necessary packages.

This improved tutorial provides a more complete and informative introduction to working with power-law distributions in Julia, covering definition, random number generation, visualization, and basic (though simplified) parameter estimation. It also highlights the important considerations when dealing with power laws. Remember to install the necessary packages using ] add Distributions Plots StatsPlots in the Julia REPL.

Extreme Value Theory

Column

Extreme Value Theory (EVT) with Julia

Extreme Value Theory (EVT) is primarily concerned with modeling the tails of probability distributions, focusing on extreme events. Here are the main probability distributions used in EVT:

1. Generalized Extreme Value (GEV) Distribution:

Central to EVT: The GEV distribution is the cornerstone of EVT for modeling block maxima (the largest or smallest values in a set of data).
Unified Family: It combines three distributions into a single family:
- Gumbel: Used for unbounded distributions (e.g., maximum temperatures).
- Fréchet: Used for heavy-tailed distributions (e.g., financial losses).
- Weibull: Used for bounded distributions (e.g., material strength).
Parameters:
- μ (location): Shifts the distribution.
- σ (scale): Controls the spread.
- ξ (shape): Determines the tail behavior and which of the three types it is.

2. Generalized Pareto Distribution (GPD):

Peaks Over Threshold (POT): The GPD is used to model exceedances over a high threshold (values above a certain level).
Focus on Tails: It specifically models the behavior of the distribution beyond the threshold.
Parameters:
- σ (scale): Controls the spread of the tail.
- ξ (shape): Determines the tail’s heaviness.

Key Points:

Extreme Value Theorem: The foundation of EVT states that under certain conditions, the distribution of extreme values (block maxima or exceedances) will converge to one of these distributions (GEV or GPD).
Choice of Distribution: The choice between GEV and GPD depends on the specific problem and data:
- GEV: For modeling block maxima.
- GPD: For modeling peaks over thresholds.
Parameter Estimation: Accurate estimation of the parameters is crucial for making reliable predictions about extreme events.

Applications:

These distributions are widely used in various fields:

Finance: Risk management, estimating extreme losses.
Insurance: Assessing the probability of large claims.
Environmental Science: Modeling extreme weather events, floods, droughts.
Engineering: Assessing the reliability of structures under extreme loads.

Understanding these distributions and their properties is essential for applying EVT effectively and making informed decisions about extreme events.