# Spectroscopy

**`spectro.py` module**

It provides the `SpectrumSimulator` class and the `QuantitativeAnalysis` class.

## Simulation of VUV spectra by TDDFT calculations

`SpectrumSimulator` turns the output of TD-DFT quantum chemical calculations into spectra that can be compared with experimental data. It applies Gaussian broadening to the discrete electronic transitions and converts dimensionless oscillator strengths into molar absorption coefficients ($\varepsilon$) and absorbance ($A$).

* [Principle](#principle)
* [From Quantum Chemistry to UV-Vis Spectra](#from-quantum-chemistry-to-uv-vis-spectra)
* [Practical examples](#practical-examples)

---

<a id="principle"></a>

### Principle

#### Pre-requisite: the Jablonski energy diagram

The Jablonski Diagram is a conceptual energy diagram that illustrates the various electronic states of a molecule and the different pathways by which a molecule can absorb energy (usually light) and subsequently dissipate that energy. It's an indispensable tool in photochemistry and photophysics for understanding light-matter interactions.

It arranges electronic states vertically by energy and groups them by spin multiplicity, mainly singlet states (S) and triplet states (T).
- Singlet states (S0, S1, S2...): All electron spins are paired. The ground state is almost always S0.
- Triplet states (T1, T2...): Two electrons have parallel spins. These states are generally lower in energy than their corresponding singlet excited states (e.g., T1 is typically below S1).

It also shows transitions between states, drawn as vertical lines or horizontal wavy lines.
- Vertical lines denote radiative transitions, i.e. absorption and emission of a photon:
    - **Fluorescence**: Emission of a photon from an excited singlet state (S1) back to the ground singlet state (S0). It is typically fast (~10<sup>-9</sup> to 10<sup>-7</sup> s) and occurs at a longer wavelength than absorption (Stokes shift).
    - **Phosphorescence**: Emission of a photon from an excited triplet state (T1) back to the ground singlet state (S0). Because the transition is spin-forbidden, it is a much slower process (~10<sup>-3</sup> to 10 s, or even hours), giving rise to a long-lived afterglow.

- Horizontal wavy lines denote non-radiative transitions, in which energy is lost without emitting light:
    - **Vibrational Relaxation (VR)**: Rapid energy loss within an electronic state as the molecule dissipates excess vibrational energy to its surroundings.
    - **Internal Conversion (IC)**: A non-radiative transition between electronic states of the same spin multiplicity (e.g., S2 to S1 or S1 to S0). This is also very fast.
    - **Intersystem Crossing (ISC)**: A non-radiative transition between electronic states of different spin multiplicity (e.g., S1 to T1). This involves a spin flip and is typically slower than IC.

#### Vertical transitions

In computational chemistry, a vertical transition refers to an electronic excitation that occurs without any change in the positions of the nuclei in a molecular system. This concept is rooted in the Born-Oppenheimer approximation and the Franck-Condon principle. An illustration is given in **Figure a)** for a transition from the singlet ground state (*S*<sub>0</sub>) to the first singlet excited state (*S*<sub>1</sub>). Because electrons are much lighter and move much faster than nuclei, an electronic transition happens so rapidly that the molecular geometry remains "frozen" during the process.

Electronic vertical transitions can be calculated in the framework of the so-called Time-Dependent Density Functional Theory (TDDFT). While standard Density Functional Theory (DFT) is designed for ground-state properties, TDDFT is a standard tool for calculating electronic excited states. It allows us to determine:
- **Vertical Transition Energies** (*T<sub>e</sub>*): The energy required to excite an electron from the ground state to an excited state.
- **Oscillator Strengths** (*f*): A dimensionless quantity that indicates the probability (intensity) of the transition.

In the gas phase or in solution, electronic transitions are not infinitely sharp lines like the vertical red transition in **Figure b)**: they are broadened by vibrational effects and solvent interactions. The `SpectrumSimulator` models this by applying a Gaussian convolution to each transition, with a default width parameter &sigma; = 0.3 eV. This follows the convention of tools such as GaussView, so the generated spectra can be compared directly with theirs.

<div style="display: flex; gap: 20px; align-items: flex-end; justify-content: center; max-width: 800px; margin: 0 auto;">
  <figure style="flex: 1; text-align: center; margin: 0;">
    <img src="../_static/spectra/S0-S1-FranckCondon-classik.svg" alt="FranckCondon" style="width: 300px;">
    <figcaption style="margin-top: 8px;"><b>Figure a)</b>. Franck-Condon principle within the Born-Oppenheimer approximation applied to the ground state to first singlet excited state electronic transition</figcaption>
  </figure>
  <figure style="flex: 1; text-align: center; margin: 0;">
    <img src="../_static/spectra/TDDFT-Spectrum.svg" alt="Spectrum" style="width: 400px;">
    <figcaption style="margin-top: 8px;"><b>Figure b)</b>. TDDFT calculation and its relation to an experimental band (cyan curve)</figcaption>
  </figure>
</div>


<br> 
<div class="rqE">

In electronic states calculations, it is essential to distinguish between computational efficiency and absolute accuracy. While TDDFT is the most widely used tool for large-scale vertical transitions, it is often viewed as a "semi-quantitative" method due to its inherent approximations.

The comparison below between TDDFT and Wave Function Theory (WFT) clarifies their respective roles in calculating electronic transitions. TDDFT is the default method for excited states because it offers a good balance between computational speed and predictive power for most organic molecules.
- **TDDFT**
    - Speed: Its computational cost scales similarly to ground-state DFT. This allows it to handle systems with hundreds of atoms (e.g., proteins, polymers, large dyes).
    - Accuracy: For "well-behaved" valence transitions (like π → π<sup>*</sup>), TDDFT usually gives errors within 0.1–0.3 eV, which is sufficient for interpreting experimental UV-Vis spectra.
- **WFT**. When high precision is required, or when TDDFT fails (e.g., for charge-transfer (CT) states or double excitations), we turn to WFT methods. Examples: CC2, CCSD(T), CASPT2, or ADC(2).
    - Why they are more accurate: These methods approximate the solution of the Schrödinger equation in a systematically improvable way by treating electron correlation explicitly. They can handle complex transitions (like double excitations) that TDDFT misses entirely.
    - But these methods are extremely expensive. While TDDFT can handle 10<sup>2</sup> atoms, a high-level WFT method like CCSD(T) might be restricted to 10–20 atoms because its cost grows very steeply with system size (formally as $N^7$ for CCSD(T), where $N$ measures the size of the system).
</div>

---

<a id="FromQuantumChem2VUV"></a>

### From Quantum Chemistry to UV-Vis Spectra

#### The Nature of TDDFT Results

Time-Dependent Density Functional Theory (TDDFT) provides a "stick spectrum". For each electronic transition i, it calculates:
- An excitation energy, *T<sub>e,i</sub>*
- An oscillator strength, *f<sub>i</sub>*

In practice, we do not observe sharp lines (sticks) but broad bands. This broadening is due to:
- Vibrational effects: each electronic state has associated vibrational and rotational levels.
- Environmental effects: interactions with solvent molecules or thermal fluctuations.

To simulate this envelope, each "stick" is broadened with a Gaussian function that spreads its intensity *f<sub>i</sub>* over a range of wavenumbers around the transition energy *T<sub>e,i</sub>*.
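As a minimal sketch of the unit handling this implies, the snippet below converts a transition wavelength to an energy and a wavenumber, and a band width from eV to cm$^{-1}$. The function names are illustrative and are not part of `pyphyschemtools`; the 353.53 nm excitation and the 0.3 eV width are example values.

```python
# Conversion factors derived from CODATA constants
HC_EV_NM = 1239.841984     # h*c in eV.nm
EV_TO_CM1 = 8065.543937    # 1 eV expressed in cm^-1


def nm_to_ev(lambda_nm):
    """Wavelength in nm -> photon energy in eV."""
    return HC_EV_NM / lambda_nm


def nm_to_wavenumber(lambda_nm):
    """Wavelength in nm -> wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1.0e7 / lambda_nm


def ev_to_wavenumber(energy_ev):
    """Energy in eV -> wavenumber in cm^-1."""
    return energy_ev * EV_TO_CM1


lam = 353.53                              # illustrative excitation wavelength, nm
energy_ev = nm_to_ev(lam)                 # about 3.51 eV
wavenumber = nm_to_wavenumber(lam)        # about 28286 cm^-1
sigma_cm1 = ev_to_wavenumber(0.3)         # about 2420 cm^-1
print(f"{lam} nm = {energy_ev:.3f} eV = {wavenumber:.0f} cm^-1")
print(f"sigma = 0.3 eV = {sigma_cm1:.0f} cm^-1")
```

Working in wavenumbers keeps the band width constant on the energy axis; once converted to a wavelength axis, the same band appears wider at long wavelengths.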

#### Basic equations

Each transition *i* is modelled by a Gaussian band centred at its excitation wavenumber $\bar{\nu}_{i}$:

$$\varepsilon_{i}(\bar{\nu})=\varepsilon_{i}^{\mathrm{max}}\exp\left[-\left(\frac{\bar{\nu}-\bar{\nu}_{i}}{\sigma}\right)^{2}\right]$$

The [Gaussian whitepaper](https://gaussian.com/uvvisplot/) shows that, in the cgs unit system, this expression becomes:

$$\varepsilon_{i}(\bar{\nu})=\frac{\sqrt{\pi}e^{2}N_{\mathrm{A}}}{1000\ln(10)c^{2}m_{e}}\frac{f_{i}}{\sigma}\exp\left[-\left(\frac{\bar{\nu}-\bar{\nu}_{i}}{\sigma}\right)^{2}\right]$$

where:
- $\varepsilon_i$ is the molar absorption coefficient, in units of L∙mol$^{-1}$∙cm$^{-1}$
- $f_i$ is the dimensionless oscillator strength
- $\sigma$ is the half-width of the Gaussian band at height $\varepsilon_i^{\mathrm{max}}/\mathrm{e}$ (with $\mathrm{e} \approx 2.718$, Euler's number), in cm$^{-1}$
- $c$ is the speed of light in cm∙s$^{-1}$
- $N_\mathrm{A}$ is the Avogadro constant, in mol$^{-1}$
- $m_e$ is the electron mass in g
- $e$ is the elementary charge, in electrostatic unit of charge (esu)
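To check the orders of magnitude, the prefactor can be evaluated numerically from the constants listed above. This is a sketch with hypothetical names (`PREFACTOR`, `epsilon_band`), not the library's implementation; the oscillator strength, band position and width are illustrative values.

```python
import math

# Physical constants in the cgs (Gaussian) unit system
E_ESU = 4.80320471e-10    # elementary charge, esu
N_A = 6.02214076e23       # Avogadro constant, mol^-1
C_CM_S = 2.99792458e10    # speed of light, cm/s
M_E_G = 9.1093837e-28     # electron mass, g

# sqrt(pi) e^2 N_A / (1000 ln(10) c^2 m_e), in L mol^-1 cm^-2
PREFACTOR = (math.sqrt(math.pi) * E_ESU**2 * N_A) / (
    1000.0 * math.log(10.0) * C_CM_S**2 * M_E_G
)


def epsilon_band(nu, nu_i, f_i, sigma):
    """Molar absorption coefficient of one Gaussian band (all wavenumbers in cm^-1)."""
    return PREFACTOR * (f_i / sigma) * math.exp(-(((nu - nu_i) / sigma) ** 2))


# Illustrative band: f = 1.0775 at 28286 cm^-1 with sigma = 2420 cm^-1
nu_i, f_i, sigma = 28286.0, 1.0775, 2420.0
eps_max = epsilon_band(nu_i, nu_i, f_i, sigma)
eps_at_sigma = epsilon_band(nu_i + sigma, nu_i, f_i, sigma)
print(f"prefactor = {PREFACTOR:.4e}")
print(f"eps_max = {eps_max:.0f} L mol^-1 cm^-1")
print(f"eps(nu_i + sigma) / eps_max = {eps_at_sigma / eps_max:.4f}")
```

The prefactor evaluates to about $1.306 \times 10^{8}$, so that $\varepsilon_i^{\mathrm{max}} \approx 1.306 \times 10^{8}\, f_i/\sigma$, and the band drops to $1/\mathrm{e}$ of its maximum at a distance $\sigma$ from its centre.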

In most cases, there will be more than one electronic excitation in the region of interest. The overall spectrum is obtained from the sum of all the individual bands:

$$\varepsilon(\bar{\nu}) = \sum_{i}^{N}\varepsilon_{i}(\bar{\nu})$$
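The sum can be sketched in a few lines of plain Python. The version below is an illustration under stated assumptions rather than the `SpectrumSimulator` code: it takes (wavelength, oscillator strength) pairs, uses the numerical prefactor $1.3062974 \times 10^{8}$ quoted in the Gaussian whitepaper, and assumes a width of 0.3 eV converted to cm$^{-1}$. The function name `simulate_epsilon` and the three transitions are hypothetical example inputs.

```python
import math

PREFACTOR = 1.3062974e8    # cgs prefactor of the band equation, L mol^-1 cm^-2
EV_TO_CM1 = 8065.543937    # 1 eV expressed in cm^-1


def simulate_epsilon(sticks, lambdas_nm, sigma_ev=0.3):
    """Sum the Gaussian bands of all transitions on a wavelength grid.

    sticks: iterable of (wavelength in nm, oscillator strength) pairs.
    Returns one molar absorption coefficient per grid wavelength.
    """
    sigma = sigma_ev * EV_TO_CM1              # band width in cm^-1
    spectrum = []
    for lam in lambdas_nm:
        nu = 1.0e7 / lam                      # grid point in cm^-1
        total = 0.0
        for lam_i, f_i in sticks:
            nu_i = 1.0e7 / lam_i              # band centre in cm^-1
            total += PREFACTOR * (f_i / sigma) * math.exp(-(((nu - nu_i) / sigma) ** 2))
        spectrum.append(total)
    return spectrum


# Illustrative transitions: (lambda in nm, oscillator strength)
sticks = [(386.34, 0.0612), (353.53, 1.0775), (325.70, 0.0048)]
lambdas = list(range(200, 801))               # 200 to 800 nm, 1 nm step
eps = simulate_epsilon(sticks, lambdas)
lam_max = lambdas[eps.index(max(eps))]
print(f"maximum of the envelope near {lam_max} nm, eps = {max(eps):.0f}")
```

The broadening is applied on the wavenumber axis and only the sampling grid is expressed in wavelength, which is why each grid point is converted with $\bar{\nu} = 10^{7}/\lambda$ before the bands are evaluated.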

---

### Practical examples

As explained above, the simulator does not simply draw lines; it simulates the vibronic envelope by applying a Gaussian convolution to each vertical transition.

The following examples show how a single call to the `SpectrumSimulator` API produces a complete figure. File paths are handled with `pathlib`.

#### A. Data Input Format

The `SpectrumSimulator` is designed to parse `.dat` files generated from Gaussian TD-DFT calculations. To be processed correctly, your input file must follow this specific tabular structure:

| iState | State | lambda/nm | fe | S^2 |
| :--- | :--- | :--- | :--- | :--- |
| 1: | Singlet-A | 386.34 | 0.0612 | 0.000 |
| 2: | Singlet-A | 353.53 | 1.0775 | 0.000 |
| 3: | Singlet-A | 325.70 | 0.0048 | 0.000 |

- iState: Transition index (e.g., 1:)
- State: Electronic state symmetry (e.g., Singlet-A)
- lambda/nm: Vertical excitation wavelength in nanometers
- fe: Oscillator strength (dimensionless)
- S^2: Spin contamination (usually 0.000 for singlets)
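For readers who want to inspect such a file outside the simulator, a hand-rolled reader could look like the sketch below. It assumes whitespace-separated columns and a single header line, which the table above does not confirm; `parse_tddft_table` is a hypothetical helper, not a `pyphyschemtools` function.

```python
def parse_tddft_table(text):
    """Parse rows such as '2: Singlet-A 353.53 1.0775 0.000' into dictionaries."""
    transitions = []
    for line in text.splitlines():
        fields = line.split()
        # Keep only data rows: five columns, the first one ending with ':'
        if len(fields) != 5 or not fields[0].endswith(":"):
            continue
        transitions.append(
            {
                "index": int(fields[0].rstrip(":")),
                "state": fields[1],
                "lambda_nm": float(fields[2]),
                "f": float(fields[3]),
                "s2": float(fields[4]),
            }
        )
    return transitions


# Assumed file content, mirroring the table above
sample = """iState State lambda/nm fe S^2
1: Singlet-A 386.34 0.0612 0.000
2: Singlet-A 353.53 1.0775 0.000
3: Singlet-A 325.70 0.0048 0.000
"""

rows = parse_tddft_table(sample)
strongest = max(rows, key=lambda row: row["f"])
print(f"{len(rows)} transitions, strongest at {strongest['lambda_nm']} nm")
```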

#### B. Basic TD-DFT Simulation

This workflow reads a `.dat` file, simulates the spectrum, and saves a 300 DPI figure.

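```python
import pyphyschemtools as t4pPC
# Assumption: get_ppct_data is exposed at the package level, like SpectrumSimulator
from pyphyschemtools import SpectrumSimulator, get_ppct_data

t4pPC.centerTitle("Basic simulation. Superposition of the vertical transitions and the simulated spectrum")

# Input TD-DFT data file, output figure path and plot title
file = get_ppct_data("Spectra/DBA-syn-syn-TDDFT_ethanol_ExcStab.dat")
fig = "fig_examples/Spectra/DBA-syn-syn-TDDFT_ethanol.png"
title = "DBA in ethanol"

# Plot epsilon as a function of wavelength between 200 and 800 nm, then save the figure
sim = SpectrumSimulator(plotWH=(8, 6))
sim.plotEps_lambda_TDDFT(file, lambdamin=200, lambdamax=800, titles=title, tP=10, ylog=False, save_img=fig)
```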

<figure>
<img src="../_static/spectra/DBA-syn-syn-TDDFT_ethanol.png" alt="TDDFT spectrum of DBA" style="width:600px;">
<figcaption>TDDFT spectrum of DBA</figcaption>
</figure>

#### C. Comparative Absorbance Analysis

The `plotAbs_lambda_TDDFT` method superposes the simulated spectra of several chemical species, converting each one to absorbance with the Beer-Lambert law.

```python
import pyphyschemtools as t4pPC
# Assumption: get_ppct_data is exposed at the package level, like SpectrumSimulator
from pyphyschemtools import SpectrumSimulator, get_ppct_data

t4pPC.centerTitle("Superpose several simulated spectra")

# One TD-DFT data file and one legend title per species
files = ["DBA-syn-syn-TDDFT_ethanol_ExcStab.dat", "DBA6-syn-syn-TDDFT_ethanol_ExcStab.dat"]
files_TDDFT = [get_ppct_data(f, main_folder="data_examples/Spectra") for f in files]
titles = ["DBA in ethanol", "DBA6 in ethanol"]
fig = "fig_examples/Spectra/DBAxx_ethanol.png"

sim = SpectrumSimulator(
    plotWH=(9, 6),
    fontSize_axisLabels=12,
    fontSize_axisText=12,
    fontsize_peaks=10,
    fontSize_legends=8,
)
lambdaMin = 200  # nm
lambdaMax = 500  # nm
Amax = 1.9
C0theo = [2e-5, 2.5e-5]  # one concentration per species
sim.plotAbs_lambda_TDDFT(files_TDDFT, C0theo, lambdaMin, lambdaMax, Amax, titles, save_img=fig)
```

<figure>
<img src="../_static/spectra/DBAxx_ethanol.png" alt="TDDFT simulated spectra" style="width:600px;">
<figcaption>TDDFT simulated absorbance spectra of DBA and DBA6</figcaption>
</figure>
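The Beer-Lambert step itself is simple: $A(\lambda) = \varepsilon(\lambda)\,\ell\,C$. The sketch below assumes a 1 cm optical path and concentrations in mol∙L$^{-1}$, neither of which is confirmed by the text above; the function names and the $\varepsilon$ values are illustrative.

```python
def absorbance(epsilon, concentration, path_length_cm=1.0):
    """Beer-Lambert law: A = epsilon * l * C.

    epsilon in L mol^-1 cm^-1, concentration in mol/L, path length in cm.
    """
    return epsilon * path_length_cm * concentration


def transmittance(a):
    """Fraction of the incident light transmitted for an absorbance a."""
    return 10.0 ** (-a)


# Illustrative values: two species, each with its own epsilon and concentration
eps_values = [58000.0, 42000.0]        # L mol^-1 cm^-1 at a given wavelength
concentrations = [2e-5, 2.5e-5]        # mol/L (assumed unit)
for eps, c0 in zip(eps_values, concentrations):
    a = absorbance(eps, c0)
    print(f"A = {a:.3f}, T = {100.0 * transmittance(a):.1f} %")
```

Because absorbance is proportional to concentration, two species with similar $\varepsilon$ can still give clearly different curves when their concentrations differ.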

## `QuantitativeAnalysis` Class Documentation

The `QuantitativeAnalysis` class is a core component of the `pyphyschemtools` library. It provides a complete workflow for analytical chemistry, from raw data ingestion to the final quantification of unknown samples with statistical confidence.

### Initialization & Data Ingestion

The class can be initialized manually with arrays or loaded directly from spreadsheets. It features built-in support for both `.xlsx` (Excel) and `.ods` (LibreOffice) formats using `pathlib` for robust path management.

```python
from pyphyschemtools.spectro import QuantitativeAnalysis

# Loading data from a LibreOffice Calc file
analysis = QuantitativeAnalysis.from_excel(file_path="calibration_data.ods")
```

If your data is already in Python variables, you can instantiate the class directly without using a file. This is ideal for quick tests or manual data entry.

```python
from pyphyschemtools.spectro import QuantitativeAnalysis

# Experimental data as lists
concentrations = [1.0, 2.0, 3.0, 4.0, 5.0]
absorbance = [0.12, 0.25, 0.38, 0.49, 0.62]

# Direct initialization
analysis = QuantitativeAnalysis(
    x=concentrations, 
    y=absorbance, 
    x_label="Concentration (mol/L)", 
    y_label="Absorbance"
)
```

### Method Fitting & Diagnostics

The `fit_linear()` method performs an Ordinary Least Squares (OLS) linear regression ($y = ax + b$). It automatically prints a formatted summary table to the console to validate the analytical method.

```python
analysis.fit_linear()
```

**Key Statistical Indicators**:
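* $R^2$: Coefficient of determination; evaluates the linearity of the response.
* MAE (Mean Absolute Error): Average absolute deviation between measured and fitted signals, in signal units.
* LOD (Limit of Detection): Lowest concentration detectable ($3\sigma / \text{slope}$).
* LOQ (Limit of Quantification): Lowest concentration quantifiable ($10\sigma / \text{slope}$).

In the LOD and LOQ expressions, $\sigma$ denotes the standard deviation of the response, not the Gaussian band width used for the spectra.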
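These indicators can be reproduced by hand, which helps when checking the printed table. The sketch below is not the `fit_linear()` implementation: it is a plain-Python OLS fit that assumes $\sigma$ is the residual standard deviation with $n - 2$ degrees of freedom (the library may use another estimate, such as the standard deviation of a blank). The calibration values are those of the direct-initialization example.

```python
import math


def fit_linear(x, y):
    """Ordinary least squares fit of y = a*x + b with basic diagnostics."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    # Assumption: sigma is the residual standard deviation (n - 2 degrees of freedom)
    sigma = math.sqrt(ss_res / (n - 2))
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(r) for r in residuals) / n,
        "lod": 3.0 * sigma / abs(slope),
        "loq": 10.0 * sigma / abs(slope),
    }


concentrations = [1.0, 2.0, 3.0, 4.0, 5.0]
absorbance = [0.12, 0.25, 0.38, 0.49, 0.62]
fit = fit_linear(concentrations, absorbance)
for name, value in fit.items():
    print(f"{name:>9s} = {value:.4g}")
```

For this data set the slope is 0.124 and the intercept is essentially zero, with an MAE of 0.004 absorbance units.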

### Advanced Visualization

The `plot_calibration()` method creates a dual-panel figure designed for scientific reporting.
- Top Panel: Displays experimental points, the regression line, and a 95% prediction interval (shaded area), i.e. the range in which a new measurement is expected to fall with 95% probability.
- Bottom Panel: Displays residuals ($y_{exp} - y_{calc}$). A random distribution of residuals supports the validity of the linear model, while a pattern (e.g., a "U" shape) suggests non-linearity.

```python
# Generate and save the diagnostic plot
analysis.plot_calibration(save_img="calibration_report.png")
```

### Sample Quantification

Once the model is validated, the `predict()` method allows you to convert experimental signals into concentrations. It is designed to handle replicates (e.g., triplicates) and provides the mean result with an associated uncertainty.

```python
# Measuring an unknown sample in triplicate
signals = [943, 986, 1021]
result = analysis.predict(signals, sample_name="Paracetamol_Batch_A")
```
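The calculation behind such a prediction can be sketched with the classical inverse-calibration formula, $s_{x_0} = \frac{s_{y/x}}{|a|}\sqrt{\frac{1}{m} + \frac{1}{n} + \frac{(\bar{y}_0 - \bar{y})^2}{a^2 S_{xx}}}$, where $m$ is the number of replicates and $n$ the number of calibration points. Whether `predict()` uses exactly this expression is an assumption; the helper names, the calibration data (taken from the direct-initialization example) and the replicate signals are illustrative.

```python
import math


def calibrate(x, y):
    """OLS calibration; returns the quantities needed for inverse prediction."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((yi - slope * xi - intercept) ** 2 for xi, yi in zip(x, y))
    s_res = math.sqrt(ss_res / (n - 2))       # residual standard deviation
    return slope, intercept, s_res, n, y_mean, sxx


def predict_concentration(signals, calibration):
    """Mean concentration and its standard uncertainty from replicate signals."""
    slope, intercept, s_res, n, y_mean, sxx = calibration
    m = len(signals)
    y0 = sum(signals) / m                     # mean of the replicates
    x0 = (y0 - intercept) / slope             # invert y = a*x + b
    s_x0 = (s_res / abs(slope)) * math.sqrt(
        1.0 / m + 1.0 / n + (y0 - y_mean) ** 2 / (slope ** 2 * sxx)
    )
    return x0, s_x0


calibration = calibrate([1.0, 2.0, 3.0, 4.0, 5.0], [0.12, 0.25, 0.38, 0.49, 0.62])
x0, s_x0 = predict_concentration([0.30, 0.31, 0.32], calibration)
print(f"concentration = {x0:.3f} +/- {s_x0:.3f} (standard uncertainty)")
```

Multiplying `s_x0` by the Student $t$ value for $n - 2$ degrees of freedom would turn this standard uncertainty into a confidence interval.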

### Summary of Methods

| Method | Description |
| :--- | :--- |
| `from_excel()` | **Classmethod**: Loads data from `.xlsx` or `.ods` files. |
| `fit_linear()` | Performs OLS regression and prints the diagnostic table. |
| `plot_calibration()` | Generates the dual-panel fit and residual plot. |
| `predict()` | Calculates concentration and uncertainty from raw signals. |