Machine learning and deep learning models were used to find correlations between calculated and experimental data

We present a machine learning model **to predict the stability of missense mutations in TP53**, using as an example a combination of several physical experiments in which the unfolding of p53 mutants was studied as a function of denaturant concentration.

The thermodynamic parameters of common missense mutations were determined by calculation and compared with those obtained in experiments on the native p53 DNA-binding domain.

The effect of common cancer mutations on the thermodynamic stability of wild type p53 has been well studied by urea denaturation.

Calculated and experimental values to which machine learning and clustering methods will be applied in order to develop a method for predicting the stability of mutant proteins

- lg(cond(W)): stability parameter, approximately equal to Kd
- lg[Kd]: calculated dissociation constant (logarithm)
- dG(D-N)(H2O): experimental value
- TdS: measure of the change in differential entropy
- [Urea]50%, M: denaturant concentration required to denature 50% of the protein
- ddG(D-N)(H2O): experimental value, more precisely its change relative to the wild type
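The definition of ddG(D-N)(H2O) as a change relative to the wild type can be sketched as a simple difference. The sign convention (mutant minus wild type) and the numbers below are assumptions for illustration, not values from the study.

```python
def ddg(dg_mutant: float, dg_wild_type: float) -> float:
    """Change in unfolding free energy relative to the wild type.

    Assumed convention: ddG = dG(mutant) - dG(wild type), so a negative
    result means the mutant is less stable than the wild type.
    """
    return dg_mutant - dg_wild_type


# Hypothetical dG(D-N)(H2O) values in kcal/mol, for illustration only.
dg_wt = 10.0
dg_mut = 7.5
print(ddg(dg_mut, dg_wt))  # -2.5
```

With the opposite convention (wild type minus mutant) the sign flips, so the convention should be fixed before the values are compared across studies.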

Thermodynamic stability of wild-type and mutant p53 core

Data Analysis for human p53 DNA-binding domain (amino acids 94–312) and its mutants

[Thermodynamic stability of wild-type and mutant p53 core domain]

[Semirational design of active tumor suppressor p53 DNA binding domain with enhanced stability]

[Mechanism of rescue of common p53 cancer mutations by second-site suppressor mutations]

[Structures of oncogenic, suppressor and rescued p53 core-domain variants: mechanisms of mutant p53 rescue]

p53 protein denaturation curves obtained in 4 different experiments

Equilibrium denaturation of p53 core domain

Comparison of experimental data from four studies

- Combined data from four experimental studies; the extended data set from these studies is used in our study.

Table of experimental values and calculated values for p53 mutations

T123A

V143A

H168R

G245S

R249S

V143A/N268D

G245S/N239Y

R249S/T123A

R249S/H168R

T123A/H168R

R249S/T123A/H168R

V157F

V157F/N235K

V157F/N235K/N239Y

R175H

C242S

R248Q

R273H

Q104P

Q104H

A129D

A129E

A129S

M133L

D148E

D148S

T150P

Q165K

Q165E

R174K

C182S

L201P

V203A

L206S

D228E

N239Y

S260P

N268D

p53 mutations:

N239Y/N268D

M133L/V203A

M133L/N239T/N268D

V203A/N239Y/N268D

M133L/V203A/N239Y/N268D

The nature of the dependencies sought is shown in the following diagrams.

[Figure: values calculated with our software plotted against experimental values; correlation coefficients of the panels: 0.9, 0.78, -0.66, -0.63, 0.62, -0.67]

The maximum correlation for lg(cond(W)) vs. [Urea]50%

The strongest correlation between the calculated and experimental values under p53 denaturation conditions was found between **lg(cond(W))** and the denaturant concentration **[Urea]50%** in the region of *elevated concentrations required* for denaturation of 50% of the protein in solution, starting from C1 = 2.8 M and above.

**The correlation coefficient between the values reached 0.90**
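A minimal sketch of this kind of thresholded correlation analysis is given below. It assumes the correlation measure is the Pearson coefficient (the source does not say which measure was used). The function names and the data points are hypothetical, chosen only to show how points below the 2.8 M threshold are excluded before the coefficient is computed.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_above(urea50, calculated, threshold=2.8):
    """Correlate only the points whose [Urea]50% is at or above threshold (M)."""
    pairs = [(u, c) for u, c in zip(urea50, calculated) if u >= threshold]
    if len(pairs) < 2:
        raise ValueError("fewer than two points at or above the threshold")
    us, cs = zip(*pairs)
    return pearson(us, cs), len(pairs)


# Hypothetical ([Urea]50% in M, lg(cond(W))) pairs, for illustration only.
urea50 = [2.3, 2.6, 2.9, 3.1, 3.5, 3.9]
lg_cond_w = [5.801, 5.799, 5.800, 5.802, 5.805, 5.808]
r, n_points = correlation_above(urea50, lg_cond_w)
print(n_points, round(r, 2))
```

Returning the number of retained points alongside the coefficient makes it easier to judge how much of the sample supports a reported correlation.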

The maximum correlation for TdS vs. [Urea]50%

The strongest correlation between the calculated and experimental values under p53 denaturation conditions was found between **TdS** and the denaturant concentration **[Urea]50%** in the region of *elevated concentrations required* for denaturation of 50% of the protein in solution, starting from C1 = 2.8 M and above.

**The correlation coefficient between the values reached 0.78**

Overall diagram of the dependencies found.

List of p53 mutations whose denaturation requires an elevated denaturant concentration [Urea]50%

Correlation between calculated lg(cond(W)) and experimental [Urea]50% at elevated denaturant concentrations

I

p53 mutations

Q104P

Q104H

A129S

M133L

T150P

R174K

C182S

L201P

V203A

L206S

N239Y

S260P

N268D

Dependencies between calculated and experimental data taking into account denaturant concentration.

I

II

Different physical quantities should be used in the subsequent analysis of the correlation between experimental and calculated data.

[Figure panels: TdS > 0 and TdS < -1; labels: entropy change, stability change]

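| Mutation | [Urea]50%, M | lg(cond(W)) |
|---|---|---|
| Q104H | 3.255 | 5.8041 |
| A129D | 2.74 | 5.8076 |
| A129E | 2.66 | 5.8001 |
| A129S | 2.93 | 5.8002 |
| Q165E | 2.53 | 5.8019 |
| C182S | 3.06 | 5.7978 |
| N268D | 3.505 | 5.8061 |
| N239Y/N268D | 3.925 | 5.8077 |
| R249S | 2.625 | 5.802 |
| R175H | 2.265 | 5.8008 |
| C242S | 2.295 | 5.7919 |
| R273H | 3.175 | 5.8002 |
| R249S(aver) | 2.625 | 5.802 |
| N268D(aver) | 3.53 | 5.8061 |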

| Mutation | [Urea]50%, M | TdS (entropy change) |
|---|---|---|
| M133L | 3.275 | 0.728543 |
| Q165K | 2.695 | 0.940725 |
| R174K | 3.085 | 0.305402 |
| V203A | 3.345 | 1.015973 |
| N239Y | 3.265 | 0.61455 |
| M133L/V203A | 3.405 | 1.742776 |
| M133L/V203A/N239Y/N268D | 4.145 | 0.667285 |
| T123A | 3.125 | 0.833603 |
| V143A | 2.095 | 1.050555 |
| H168R | 2.285 | 2.416545 |
| R249S/H168R | 2.615 | 1.251111 |
| T123A/H168R | 2.165 | 3.248702 |
| R249S/T123A/H168R | 2.555 | 2.084351 |
| V157F/N235K | 2.54 | 2.237484 |
| V157F/N235K/N239Y | 2.61 | 2.804485 |
| N239Y(aver) | 3.55 | 0.61455 |

Features of the correlation between the **experimental ddG value** and the **calculated Kd**. Note that we take the logarithm of the dissociation constant, so the value can be negative.
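The remark about negative values follows directly from the logarithm: lg(Kd) is negative whenever Kd is below 1 in the chosen units. A short sketch, with a hypothetical helper `lg` and hypothetical Kd values used only for illustration:

```python
import math


def lg(kd: float) -> float:
    """Base-10 logarithm of a dissociation constant (must be positive)."""
    if kd <= 0:
        raise ValueError("Kd must be positive")
    return math.log10(kd)


# Hypothetical dissociation constants in arbitrary units, for illustration only.
for kd in (100.0, 1.0, 0.01):
    sign = "negative" if lg(kd) < 0 else "non-negative"
    print(kd, sign)
```

Because the sign depends on the units in which Kd is expressed, the same units have to be used for every mutant before the logarithms are compared.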

[Figure: calculated value vs. experimental value; positive region of entropy change]

M133L

D148E

D148S

Q165K

R174K

V203A

D228E

N239Y

M133L/V203A

M133L/V203A/N239Y/N268D

T123A

V143A

H168R

R249S/H168R

T123A/H168R

R249S/T123A/H168R

V157F/N235K

V157F/N235K/N239Y

N239Y(aver)

The graph shows the relationship between the experimental data and the entropy change over its entire range

Application of machine learning methods for data clustering

Selection of segments for correlation analysis

M133L 3.275 0.0315

D148E 3.04 0.0602

D148S 3.28 0.0963

Q165K 2.695 0.0407

V203A 3.345 0.0440

D228E 3.22 0.0603

N239Y 3.265 0.02665

M133L/V203A 3.405 0.075

M133L/V203A/N239Y/N268D

T123A 3.125 0.036149

H168R 2.285 0.104794

R249S/H168R 2.615 0.0542

R249S/T123A/H168R

V157F/N235K 2.54 0.097

V157F/N235K/N239Y

N239Y(aver) 3.55 0.026

Creativity is to discover a question that has never been asked. If one brings up an idiosyncratic question, the answer he gives will necessarily be unique as well.

Q104H 3.25 5.804

A129D 2.74 5.807

A129E 2.66 5.800

A129S 2.93 5.800

Q165E 2.53 5.801

C182S 3.06 5.797

N268D 3.505 5.806

R249S 2.625 5.802

R175H 2.265 5.800

C242S 2.295 5.791

R273H 3.175 5.800

R249S(aver) 2.625 5.802

N268D(aver) 3.53 5.806

Various clustering methods were used to automatically partition the resulting sample of points.
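The source does not name the clustering methods. As one possible illustration, the sketch below applies k-means (Lloyd's algorithm) to scalar values such as calculated TdS. The choice of k-means, the function name `kmeans_1d`, the deterministic initialization and the sample values are all assumptions made for this example.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means on scalar values with deterministic initial centers."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Initial centers: k values spread evenly through the sorted sample.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: abs(v - centers[j])) for v in values
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
        if new_labels == labels:
            break  # assignments stopped changing, so the result is stable
        labels = new_labels
    return labels, centers


# Hypothetical TdS values, for illustration only.
tds_values = [-2.1, -1.8, -1.5, 0.4, 0.7, 1.1]
labels, centers = kmeans_1d(tds_values, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

For two-dimensional points (experimental vs. calculated value) the same scheme applies with a Euclidean distance, but the axes would first need to be brought to a comparable scale.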

As can be seen from the graphs:

- in the positive region of entropy change TdS, different calculated physical quantities should be used to predict the experimental data;
- in the region near zero, we do not present correlation graphs;
- in the **negative region of entropy change**, the calculated value characterizing stability, lg(cond(W)), should be used rather than the one characterizing disorder.

[Figure labels: positive region of entropy change; region near zero; negative region of entropy change]
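The division into three regions can be sketched as a small classification step that runs before any correlation analysis. The thresholds 0 and -1 are taken from the TdS > 0 and TdS < -1 labels on the earlier slide. Treating everything between them as the region near zero, and the names `tds_region` and `REGION_NOTE`, are assumptions of this sketch.

```python
def tds_region(tds: float, upper: float = 0.0, lower: float = -1.0) -> str:
    """Classify a calculated TdS value into one of three regions."""
    if tds > upper:
        return "positive"
    if tds < lower:
        return "negative"
    # Assumed: values from lower to upper inclusive count as near zero.
    return "near zero"


# What the slides say about each region (paraphrased).
REGION_NOTE = {
    "positive": "use other calculated quantities",
    "near zero": "no correlation graph presented",
    "negative": "use lg(cond(W))",
}

# Hypothetical TdS values, for illustration only.
for tds in (0.7, -0.4, -1.6):
    region = tds_region(tds)
    print(tds, region, REGION_NOTE[region])
```

Keeping the thresholds as parameters makes it easy to check how sensitive the resulting correlations are to where the region boundaries are drawn.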

Analysis of the different stability regions of the calculated lg(cond(W)) values

[Figure panels 1 and 2: lg(cond(W)) > lg(cond(W))wt; lg(cond(W)) < lg(cond(W))wt]