[CHNOSZ-commits] r906 - in pkg/CHNOSZ: . vignettes
noreply at r-forge.r-project.org
noreply at r-forge.r-project.org
Mon May 26 05:46:15 CEST 2025
Author: jedick
Date: 2025-05-26 05:46:14 +0200 (Mon, 26 May 2025)
New Revision: 906
Modified:
pkg/CHNOSZ/DESCRIPTION
pkg/CHNOSZ/vignettes/anintro.Rmd
pkg/CHNOSZ/vignettes/vig.bib
Log:
Revise CRISPR-Cas section of anintro.Rmd with AI assistance
Modified: pkg/CHNOSZ/DESCRIPTION
===================================================================
--- pkg/CHNOSZ/DESCRIPTION 2025-05-24 07:25:53 UTC (rev 905)
+++ pkg/CHNOSZ/DESCRIPTION 2025-05-26 03:46:14 UTC (rev 906)
@@ -1,6 +1,6 @@
-Date: 2025-05-24
+Date: 2025-05-26
Package: CHNOSZ
-Version: 2.1.0-77
+Version: 2.1.0-78
Title: Thermodynamic Calculations and Diagrams for Geochemistry
Authors at R: c(
person("Jeffrey", "Dick", , "j3ffdick at gmail.com", role = c("aut", "cre"),
Modified: pkg/CHNOSZ/vignettes/anintro.Rmd
===================================================================
--- pkg/CHNOSZ/vignettes/anintro.Rmd 2025-05-24 07:25:53 UTC (rev 905)
+++ pkg/CHNOSZ/vignettes/anintro.Rmd 2025-05-26 03:46:14 UTC (rev 906)
@@ -255,7 +255,7 @@
The `affinity()` function calculates chemical affinities over ranges of *T*, *P*, and activities:
```{r diagram, fig.cap = "Eh--pH (Pourbaix) diagram for S"}
-# Set up the C-H-N-O-S basis system with electron
+# Setup the C-H-N-O-S basis system with electron
basis("CHNOSe")
# Define aqueous sulfur species
species(c("SO4-2", "HSO4-", "HS-", "H2S"))
@@ -679,7 +679,7 @@
IS = NaCl_res$IS
m_Clminus = NaCl_res$m_Clminus
-# Set up basis species
+# Setup basis species
basis(c(metal, "H2S", "Cl-", "oxygen", "H2O", "H+"))
basis("H2S", log10(Stot))
basis("Cl-", log10(m_Clminus))
@@ -1140,7 +1140,7 @@
Some buffers constrain multiple basis species simultaneously:
```{r buffer_3}
-# Set up basis species
+# Setup basis species
basis(c("FeS2", "H2S", "O2", "H2O"), c(0, -10, -50, 0))
# Associate both H2S and O2 with the PPM buffer
basis(c("H2S", "O2"), c("PPM", "PPM"))
@@ -1156,7 +1156,7 @@
This example reproduces part of Fig. 6 in @SS95:
```{r buffer_4, fig.cap = "Equilibrium log H<sub>2</sub> fugacity for 10^-6^ activity of HCN or formaldehyde with water, 1 bar of N<sub>2</sub> and 10 bar of CO<sub>2</sub>"}
-# Set up a system in terms of gases and liquid water
+# Setup a system in terms of gases and liquid water
basis(c("hydrogen", "carbon dioxide", "nitrogen", "water"))
# Use 10 bar of CO2 and 1 bar of other gases (default)
basis("CO2", 1)
@@ -1179,7 +1179,7 @@
Redox buffers like QFM, HM, and PPM can be used as inputs for subsequent calculations:
```{r buffer_5, fig.cap = "Gold solubility at 300 °C with PPM buffer for *f*O<sub>2</sub> and *a*H<sub>2</sub>S"}
-# Set up system for gold solubility calculation
+# Setup system for gold solubility calculation
basis(c("Au", "Fe", "H2S", "H2O", "oxygen", "H+"))
# Apply PPM buffer for fO2 and H2S
basis("O2", "PPM")
@@ -1232,7 +1232,7 @@
```{r buffer_8}
# Define the KMQ buffer
mod.buffer("KMQ", c("quartz", "muscovite", "K-feldspar"), "cr", 0)
-# Set up the system
+# Setup the system
basis(c("Al2O3", "quartz", "K+", "H2O", "oxygen", "H+"))
# Set K+ molality for KCl solution
basis("K+", log10(0.5))
@@ -1596,7 +1596,7 @@
Zc_values <- canprot::Zc(aa)
col <- col_in[rank(Zc_values)]
-# Set up chemical system with H+ and e-
+# Setup chemical system with H+ and e-
basis("CHNOSe")
# Calculate affinities as a function of redox potential
a <- affinity(Eh = c(-0.6, 0), iprotein = ip)
@@ -1650,18 +1650,26 @@
normalization of protein formulas and optimizing physicochemical parameters.
The metastable equilibrium model provides a theoretical framework for predicting how chemical conditions influence relative protein abundances.
-### Comparing evolutionary branches: Groupwise relative stabilities of CRISPR-associated (Cas) proteins
+### Environmental controls on protein evolution: A case study with CRISPR-Cas systems
-CRISPR-associated (Cas) proteins have important functions in microbial immunity to viruses and in biotechnology for gene editing.
-Plotting their relative stabilities is a way of relating their evolution to environmental variables.
+Evolution doesn't happen in a vacuum – organisms and their molecular machinery must cope with changing environmental conditions over geological time.
+Just as geochemists use relative stability diagrams to predict which minerals are stable under different temperature and pressure conditions,
+we can apply similar thermodynamic principles to understand protein evolution.
+The central hypothesis is that environmental variables, particularly temperature and redox conditions (Eh),
+have shaped the amino acid compositions of proteins throughout Earth's history.
+This approach becomes especially powerful when comparing not individual proteins, but entire evolutionary lineages.
-Here, the scope of the comparison is not single molecules but rather sets of related sequences that belong to evolutionary classes or types.
-Specifically, each type of CRISPR-Cas system (numbered I--VI) is represented by various numbers of genomes in the classification presented by @MWI_20.
-This is visualized in the following diagram, showing the `r zc` and number of amino acids in the effector modules.
-(The effector module combines with CRISPR RNA (crRNA) to form the effector complex that targets a specific DNA sequence.)
-The larger size of effector modules of many Class 1 systems is associated with multiple Cas proteins, which were combined to calculate `r zc`.
+CRISPR-Cas systems are molecular scissors that bacteria and archaea use as immune systems against viruses,
+and which biotechnologists have adapted for precise gene editing.
+These systems evolved into six major types (I--VI), each representing an evolutionary branch with multiple representatives across different genomes [@MWI_20].
-```{r Cas_Zc, echo=FALSE, fig.cap = "Carbon oxidation state and size of CRISPR effector modules (Cas proteins)", cache=TRUE, small_mar = TRUE}
+Rather than comparing individual proteins, we can ask:
+which types of CRISPR-Cas systems would be most thermodynamically stable under different environmental conditions?
+The following diagram introduces the players by showing the carbon oxidation state (`r zc`) and size of the effector modules;
+the effector module combines with CRISPR RNA (crRNA) to form the effector complex that does the actual cutting work on a target DNA sequence.
+Class 1 systems tend to have larger effector modules made up of multiple proteins, which were combined to calculate `r zc`.
+
+```{r Cas_Zc, echo=FALSE, fig.cap = "Carbon oxidation state and size of CRISPR-Cas effector modules", cache=TRUE, small_mar = TRUE}
# Read data table
file <- system.file("extdata/protein/Cas/Cas_uniprot.csv", package = "CHNOSZ")
dat <- read.csv(file)
@@ -1673,7 +1681,7 @@
dat$ID <- ID
# Remove missing IDs
dat <- subset(dat, ID != "")
-# Keep proteins in effector complexes
+# Keep proteins in effector complexes - the functional units that cut DNA
dat <- subset(dat, Effector)
# Read amino acid compositions
@@ -1680,56 +1688,56 @@
aafile <- system.file("extdata/protein/Cas/Cas_aa.csv", package = "CHNOSZ")
aa <- read.csv(aafile)
-# Loop over subtypes (I-A, I-B etc.)
+# Loop over evolutionary subtypes (I-A, I-B etc.)
subtypes <- unique(dat$Subtype)
effector_aa_list <- lapply(subtypes, function(subtype) {
- # Get the IDs for this subtype
+ # Get the IDs for this evolutionary subtype
idat <- dat$Subtype == subtype
ID <- dat$ID[idat]
- # Sum the amino acid compositions of all effector proteins in this subtype
+ # Combine amino acid compositions of all effector proteins in this subtype
iaa <- aa$ref %in% ID
all_aa <- aa[iaa, ]
summed_aa <- sum_aa(all_aa)
- # Put in all gene names and IDs
+ # Store gene names and IDs for reference
summed_aa$protein <- paste(all_aa$protein, collapse = ",")
summed_aa$ref <- paste(all_aa$ref, collapse = ",")
- # Put in the subtype
+ # Label with the evolutionary subtype
summed_aa$abbrv <- subtype
- # Return the amino acid composition for this subtype
+ # Return the combined amino acid composition for this evolutionary branch
summed_aa
})
-# Make a data frame of amino acid compositions of effector proteins
+# Create data frame of amino acid compositions for each evolutionary branch
effector_aa <- do.call(rbind, effector_aa_list)
-# List each type (I to VI)
+# Identify the six major evolutionary types (I to VI)
type_names <- c("I", "II", "III", "IV", "V", "VI")
-# Find which subtypes belong to each type
+# Map subtypes to their parent types
subtype_type <- sapply(strsplit(effector_aa$abbrv, "-"), "[", 1)
-# Get class for each subtype
+# Assign evolutionary classes (1 or 2) based on system architecture
class <- ifelse(subtype_type %in% c("I", "III", "IV"), 1, 2)
-# Assign colors for classes
+# Color scheme to distinguish evolutionary classes
col1 <- hcl.colors(5, "Peach")[1:3]
col2 <- hcl.colors(5, "Purp")[1:3]
type_col <- c(
- # Class 1
+ # Class 1 - multi-protein complexes
I = col1[1], III = col1[2], IV = col1[3],
- # Class 2
+ # Class 2 - single-protein effectors
II = col2[1], V = col2[2], VI = col2[3]
)
-# Assign point size for classes
+# Point sizes to show types in each class
type_cex <- c(
I = 1, III = 1.5, IV = 2,
II = 1, V = 1.5, VI = 2
)
-# Get colors and sizes for subtypes
+# Apply colors and sizes to subtypes
subtype_col <- type_col[subtype_type]
subtype_cex <- type_cex[subtype_type]
-# Calculate Zc and number of amino acids in effector complexes
+# Calculate chemical features of effector complexes
Zc <- canprot::Zc(effector_aa)
naa <- protein.length(effector_aa)
-# Make plot
+# Plot chemical features
plot(naa, Zc, pch = class + 20, cex = subtype_cex, col = subtype_col, bg = subtype_col, xlab = "Number of amino acids", ylab = quote(italic(Z)[C]))
legend("topright", c("I ", "III", "IV"), pch = 21, pt.cex = type_cex[1:3], col = col1, pt.bg = col1, title = "Class 1 ", cex = 0.95)
legend("topright", c("II", "V", "VI"), pch = 22, pt.cex = type_cex[4:6], col = col2, pt.bg = col2, title = "Class 2", bty = "n", cex = 0.95)
@@ -1741,43 +1749,46 @@
```
</div>
-Plotting the relative stabilies of Cas effector modules from all `r length(Zc)` genomes
-(the number of points in the previous diagram) would result in a complex, hard-to-interpret diagram.
-Instead, here we visualize relative stabilities of groups, one for each of the CRISPR-Cas types (I--VI).
-This is done by first calculating the formation affinities for all effector modules, ranking them, then finding the average rank for each type.
+Each type (I--VI) represents an evolutionary branch containing multiple genome representatives – some branches have just a few members, others have more.
+This creates a methodological challenge: how do we fairly assess the relative stability of groups with unequal membership?
-The function used for this, `rank.affinity()`, includes a rescaling step to handle groups with different numbers of members.
-We can see that rescaling is necessary because the average rank of a group with one member is bounded by 1 and 42,
+The solution lies in CHNOSZ's `rank.affinity()` function, which calculates formation energies for all individual proteins,
+ranks them, then finds the average rank for each evolutionary group.
+A rescaling step ensures that groups with different numbers of members can be compared fairly.
+To see why rescaling is necessary, consider that the average rank of a group with one member is bounded by 1 and `r length(Zc)`
+(the total number of genomes in this calculation),
but the average rank of a group with three members is bounded by 2 (the average of 1, 2, and 3) and 41 (the average of 40, 41, and 42).
-Rather than representing maximum affinity as in previous diagrams, the stability fields here represent maximum average rank of affinity after rescaling.
-```{r Cas_stability, echo=FALSE, fig.cap = "Groupwise relative stabilities of Cas effector modules in different types of CRISPR-Cas systems as a function of Eh and temperature; dashed line is water stability limit", cache=TRUE}
-# Setup plot
+Rather than representing maximum affinity as in previous diagrams, the resulting stability fields represent maximum average rank of formation affinity after rescaling.
+This provides a thermodynamic framework for predicting which CRISPR-Cas types would predominate under different environmental conditions.
+
+```{r Cas_stability, echo=FALSE, fig.cap = "Groupwise relative stabilities of effector modules in different types of CRISPR-Cas systems as a function of Eh and temperature; dashed line is lower water stability limit", cache=TRUE}
+# Setup figure
par(mfrow = c(1, 2))
-# Load amino acid compositions to CHNOSZ
+# Load amino acid compositions into CHNOSZ thermodynamic database
iprotein <- add.protein(effector_aa)
-# Calculate affinities as a function of Eh and T
+# Calculate formation affinities across environmental gradients
basis("QEC+")
swap.basis("O2", "e-")
aout <- affinity(T = c(0, 120), Eh = c(-0.8, 0), iprotein = iprotein)
-# Identify types
+# Group proteins by evolutionary type for comparative analysis
types <- lapply(type_names, `==`, subtype_type)
names(types) <- type_names
-# Calculate average rank for each type
+# Calculate average rank of formation affinity for each evolutionary group
arank <- rank.affinity(aout, groups = types)
-# Make first diagram
+# Create stability diagram showing thermodynamically favored types
d <- diagram(arank, fill = type_col[type_names])
water.lines(d, lty = 2)
title("Stable types", font.main = 1)
-# Take out stable types
+# Identify and remove dominant types to reveal secondary patterns
istable <- unique(as.numeric(d$predominant))
stable_type_names <- type_names[istable]
itype_stable <- subtype_type %in% stable_type_names
-# Get metastable types and subtypes
+# Focus on metastable evolutionary branches
type_names <- type_names[-istable]
effector_aa <- effector_aa[!itype_stable, ]
subtype_type <- sapply(strsplit(effector_aa$abbrv, "-"), "[", 1)
@@ -1784,7 +1795,7 @@
types <- lapply(type_names, `==`, subtype_type)
names(types) <- type_names
-# Make second diagram
+# Generate second diagram for metastable types
iprotein <- add.protein(effector_aa)
aout <- affinity(T = c(0, 120), Eh = c(-0.8, 0), iprotein = iprotein)
arank <- rank.affinity(aout, groups = types)
@@ -1799,12 +1810,15 @@
```
</div>
-The first diagram shows the stable types.
-These were then removed to show the metastable types in the second diagram.
+The stability diagrams reveal a compelling pattern: Type III systems dominate at reducing conditions (low Eh), particularly at higher temperatures.
+This finding gains evolutionary significance when we consider that Type III was likely the first CRISPR-Cas system to evolve in Class 1 [@MWK22].
+The thermodynamic preference for reducing conditions aligns with the hypothesis that these ancient immune systems arose
+when Earth's atmosphere and oceans were more reducing than today.
-An interesting result is the relative stability of Type III effector modules at reducing conditions (low Eh).
-Notably, the Type III system was proposed to have evolved first in Class 1 [@MWK22].
-The result is therefore consistent with the possibility that amino acid compositions of Type III Cas sequences were shaped by reducing conditions on early Earth.
+The key innovation here – using `rank.affinity()` to compare groups rather than individuals –
+opens up possibilities for analyzing any system where evolutionary lineages contain multiple representatives, from enzyme families to entire metabolic pathways.
+This groupwise approach to relative stability analysis enriches the geobiologist's toolkit with methods
+for mapping the environmental niches where different protein families achieve maximum thermodynamic stability.
## Further Resources - Demos
Modified: pkg/CHNOSZ/vignettes/vig.bib
===================================================================
--- pkg/CHNOSZ/vignettes/vig.bib 2025-05-24 07:25:53 UTC (rev 905)
+++ pkg/CHNOSZ/vignettes/vig.bib 2025-05-26 03:46:14 UTC (rev 906)
@@ -838,8 +838,6 @@
pages = {67--83},
volume = {18},
doi = {10.1038/s41579-019-0299-x},
- issn = {1740-1534},
- refid = {Makarova2020},
}
@@ -852,5 +850,4 @@
chapter = {2},
pages = {13--38},
doi = {10.1002/9781683673798.ch2},
- isbn = {9781683673798},
}
More information about the CHNOSZ-commits
mailing list