QML Power Spectrum Estimation Pipeline
The QML (Quadratic Maximum Likelihood) power spectrum estimation pipeline provides unbiased, minimum-variance power spectrum estimates from observed data. This script orchestrates the complete analysis workflow from data input through statistical inference, for any spin-0 or spin-2 field on the sphere.
Overview
The main_qml.py script implements a comprehensive QML estimation pipeline
optimized for power spectrum analysis of spherical fields. Key features include:
Unbiased power spectrum estimation with exact error propagation
Support for auto-correlation and cross-correlation analyses
MPI parallelization for computational efficiency
Integration with Fisher information matrix results
Comprehensive likelihood data for cosmological parameter fitting
The pipeline handles both simulated and observed data, providing production-ready power spectrum estimates suitable for cosmological inference.
Mathematical Background
QML Estimator Theory
For each multipole \(\ell\), the QML estimator provides the minimum-variance unbiased estimate:
where: - \(\mathbf{x}\) is the observed data vector (e.g. temperature and polarization maps) - \(\mathbf{E}_\ell = \mathbf{C}^{-1} \frac{\partial \mathbf{C}}{\partial C_\ell} \mathbf{C}^{-1}\) is the estimator matrix - \(\mathbf{C}\) is the total covariance matrix (signal + noise + systematics) - \(\mathbf{S}\) is the signal covariance matrix
Statistical Properties
The QML estimator achieves optimal statistical properties:
Unbiasedness: \(\langle \hat{C}_\ell \rangle = C_\ell\) for all multipoles
Minimum Variance: Saturates the Cramér-Rao bound given by Fisher information
Gaussianity: Asymptotically Gaussian for large sky coverage
Correlation Structure: Full covariance matrix available for likelihood analysis
Usage
Command Line Interface
# Single-process execution (small analyses)
python main_qml.py
# MPI parallel execution (recommended)
mpirun -n 16 python main_qml.py
# With custom configuration file
python main_qml.py --config qml_analysis.yaml
Script Workflow
The execution pipeline consists of the following stages:
Environment Initialization: Set up MPI communicator and process management
Configuration Parsing: Load and validate YAML configuration parameters
Data Loading: Read sky maps, noise covariance, and auxiliary data files
Geometry Setup: Define pixel selection, multipole binning, and field combinations
Matrix Preparation: Compute or load estimator and normalization matrices
QML Computation: Execute parallel power spectrum estimation
Result Gathering: Collect and synchronize estimates across MPI processes
Output Generation: Save power spectra, covariance matrices, and likelihood data
Validation: Perform sanity checks and quality assessment
Configuration Format
The script requires a YAML configuration file specifying analysis parameters:
# QML Analysis Configuration
# Data specification
mapfiles1: ["map_T.fits", "map_Q.fits", "map_U.fits"]
covmatfile1: "noise_covariance.bin"
maskfile: "analysis_mask.fits"
# Cross-correlation (optional)
do_cross: false
mapfiles2: ["map2_T.fits", "map2_Q.fits", "map2_U.fits"] # if do_cross=true
covmatfile2: "noise_covariance2.bin" # if do_cross=true
# Analysis parameters
nside: 512
lmin: 2
lmax: 2000
nbands: 40 # Number of bandpowers
nfields: 3 # T, Q, U
physical_labels: ["T", "Q", "U"]
# Fiducial model
clfile: "fiducial_power_spectra.txt"
beamfile: "beam_transfer_function.fits"
# Output files
outfile: "qml_power_spectra.dat"
outfilelhood: "likelihood_data.dat"
outfilecov: "bandpower_covariance.dat"
# Computation options
compute_fisher: true # Include Fisher information
save_intermediate: false # Save estimator matrices (large files)
Implementation Features
Core Algorithms
The pipeline implements several sophisticated algorithms:
Iterative Matrix Methods: Conjugate gradient for large matrix operations
Bandpower Binning: Optimal multipole averaging for noise reduction
Cross-Spectrum Handling: Proper treatment of cross-correlation estimates
Numerical Stability: Regularization and conditioning for robust computation
Performance Optimization
Key optimization strategies include:
Distributed Memory: MPI parallelization across estimator computation
Memory Management: Efficient handling of large covariance matrices
I/O Optimization: Parallel file operations and memory mapping
Computational Kernels: Optimized linear algebra operations (BLAS/LAPACK)
Quality Control
The pipeline includes comprehensive validation:
Input Verification: File format, dimensions, and consistency checking
Numerical Stability: Matrix conditioning and eigenvalue analysis
Statistical Tests: Unbiasedness checks and chi-squared validation
Convergence Monitoring: Iterative solver diagnostics
Output Products
The format of power spectrum outputs depends on the normalization mode selected
when calling get_power_spectra(mode=...). The default "deconvolved" mode
returns estimates of the true \(C_\ell\); "decorrelated" returns
uncorrelated bandpowers with unit-variance errors; "convolved" returns raw
estimates with the window matrix for forward-modelling comparisons. See
Power Spectrum Estimation for full details.
Primary Outputs
Power Spectrum File (qml_power_spectra.dat):
# QML Power Spectrum Estimates
# Column 1: Effective multipole
# Column 2: TT power (μK²)
# Column 3: EE power (μK²)
# Column 4: BB power (μK²)
# Column 5: TE cross-power (μK²)
# Column 6: TB cross-power (μK²)
# Column 7: EB cross-power (μK²)
2.5 5247.3 4.2 0.015 -142.1 0.002 0.001
7.5 4923.8 12.8 0.023 -129.4 0.003 0.002
...
Likelihood Data (likelihood_data.dat):
Contains full statistical information for cosmological parameter fitting: - Bandpower central values and uncertainties - Complete covariance matrix between bandpowers - Window functions and effective multipoles - Calibration and beam uncertainties
Covariance Matrix (bandpower_covariance.dat):
Full covariance matrix enabling proper likelihood construction for parameter fitting.
Diagnostic Outputs
Analysis Log: Detailed execution information including: - Timing and memory usage statistics - Convergence diagnostics for iterative methods - Quality metrics and validation results - Parameter summary and configuration record
Intermediate Products (optional): - Estimator matrices for reanalysis - Normalization factors and window functions - Pixel weights and effective areas
Examples
Standard Auto-Correlation Analysis
"""
Example: QML power spectrum estimation from CMB maps
"""
import subprocess
import yaml
# Configure standard analysis
config = {
'mapfiles1': ['planck_T.fits', 'planck_Q.fits', 'planck_U.fits'],
'covmatfile1': 'planck_noise_cov.bin',
'maskfile': 'planck_analysis_mask.fits',
'nside': 1024,
'lmin': 2,
'lmax': 2500,
'nbands': 50,
'clfile': 'planck2018_fiducial.txt',
'beamfile': 'planck_beam.fits',
'outfile': 'planck_qml_spectra.dat',
'compute_fisher': True
}
# Save and run
with open('qml_config.yaml', 'w') as f:
yaml.dump(config, f)
subprocess.run(['mpirun', '-n', '16', 'python', 'main_qml.py',
'--config', 'qml_config.yaml'])
Cross-Correlation Analysis
"""
Example: Cross-correlation between two CMB datasets
"""
# Configure cross-correlation
cross_config = {
'do_cross': True,
'mapfiles1': ['dataset1_T.fits', 'dataset1_Q.fits', 'dataset1_U.fits'],
'mapfiles2': ['dataset2_T.fits', 'dataset2_Q.fits', 'dataset2_U.fits'],
'covmatfile1': 'dataset1_noise.bin',
'covmatfile2': 'dataset2_noise.bin',
'maskfile': 'common_mask.fits',
'nside': 512,
'lmin': 2,
'lmax': 2000,
'nbands': 30,
'outfile': 'cross_correlation_spectra.dat'
}
Multi-Scale Analysis Pipeline
# Progressive resolution analysis
# Low-resolution initial estimate
python main_qml.py --config configs/qml_nside256.yaml
# Medium resolution
mpirun -n 8 python main_qml.py --config configs/qml_nside512.yaml
# High resolution (production)
mpirun -n 32 python main_qml.py --config configs/qml_nside1024.yaml
Systematic Error Studies
"""
Example: Impact of analysis choices on power spectrum estimates
"""
# Test different multipole ranges
lmax_values = [1500, 2000, 2500, 3000]
for lmax in lmax_values:
config = base_config.copy()
config['lmax'] = lmax
config['outfile'] = f'qml_spectra_lmax{lmax}.dat'
with open(f'config_lmax{lmax}.yaml', 'w') as f:
yaml.dump(config, f)
subprocess.run(['mpirun', '-n', '16', 'python', 'main_qml.py',
'--config', f'config_lmax{lmax}.yaml'])
Performance Guidelines
Computational Scaling
QML estimation scales with system size as:
Memory: \(O(N_{ell} \times N_{pix}^2)\) for estimator matrices
Computation: \(O(N_{ell} \times N_{pix}^2)\) for quadratic forms
Communication: \(O(N_{ell} \times N_{pix})\) for MPI coordination
Resource Requirements
Typical resource needs for different analysis configurations:
Configuration |
Memory (GB) |
Compute Time |
Storage (GB) |
Recommended Cores |
|---|---|---|---|---|
Quick test (nside=128) |
~4 |
~1 hour |
~1 |
4-8 |
Standard (nside=512) |
~64 |
~8 hours |
~10 |
16-32 |
High-res (nside=1024) |
~512 |
~48 hours |
~100 |
32-64 |
Ultimate (nside=2048) |
~4000 |
~2 weeks |
~1000 |
64-128 |
Optimization Strategies
For large-scale analyses:
Iterative Solvers: Use preconditioned conjugate gradient for matrix operations
Block Processing: Process multipole ranges in blocks to reduce memory
Checkpointing: Save intermediate results for fault tolerance
Load Balancing: Distribute estimator computation optimally across processes
Validation and Quality Assessment
Statistical Tests
The pipeline includes several validation checks:
- Unbiasedness Test:
Compare estimates from multiple simulations to input spectra
- Chi-squared Test:
Verify statistical consistency of bandpower residuals
- Cross-Validation:
Compare estimates from different sky regions or data splits
- Null Tests:
Analyze difference maps and systematic-dominated modes
Error Analysis
Comprehensive error characterization includes:
Statistical Errors: From Fisher information matrix and Monte Carlo
Systematic Errors: From pipeline variations and null tests
Calibration Uncertainties: Propagated through covariance matrices
Theoretical Modeling: Impact of fiducial spectrum assumptions
Troubleshooting
Common Issues and Solutions
- Convergence Problems:
Check covariance matrix conditioning
Verify input map and mask consistency
Examine numerical precision requirements
Consider iterative solver tolerances
- Memory Limitations:
Reduce nside or multipole range
Use more MPI processes for distribution
Enable out-of-core computation modes
- Unexpected Results:
Validate input files and units
Run on known simulations for comparison
Check for systematic contamination
Verify calibration and beam corrections
- Performance Issues:
Profile MPI communication patterns
Optimize I/O and memory access patterns
Consider alternative solver algorithms
Balance computation vs. communication
See Also
Power Spectrum Estimation : Spectra class API documentation
Fisher Matrix Analysis Pipeline : Fisher matrix analysis pipeline
Mock Data Generation : Mock data generation utilities
cosmoforge.cosmocore.harmonic: Spherical harmonic utilities