Supplement to: “HaDeX: Analysis and Visualisation of Hydrogen/Deuterium Exchange Mass Spectrometry Data”

Weronika Puchała, Michał Burdukiewicz, Michał Kistowski, Katarzyna A. Dąbrowska, Aleksandra E. Badaczewska-Dawid, Dominik Cysewski, Michał Dadlez

20.06.2019

1 About HaDeX

HaDeX is a novel tool for processing, analysis, and visualization of HDX-MS experiments. HaDeX covers the final parts of the analytic process, including comparison of experiments, quality control, and generation of publication-quality figures. To make the HaDeX R package accessible to users less fluent in R, we complemented it with a comprehensive Graphical User Interface, the HaDeX GUI. The reproducibility of the whole procedure is ensured by advanced reporting functions.

The GUI is available online: http://mslab-ibb.pl/shiny/HaDeX/ or can be installed locally on Windows systems: https://sourceforge.net/projects/hadex/files/HaDeX_setup.exe/download. Alternatively, R-fluent users can access the GUI through the HaDeX_gui() function.

This document covers the main functionalities of both the R package and the GUI.

1.1 Comparison of the existing HDX-MS software

To show the novelty of HaDeX, we compare its functionalities with other relatively new software for analysis of HDX-MS data: MEMHDX (Hourdel et al. 2016) and Deuteros (Lau et al. 2019).

We have not considered HDX Workbench (Pascal et al. 2012) as it deals with the preliminary steps of the analysis.

Web server: the software is available as a web server.

Programmatic access: analytic functionalities are documented and available from the command line.

Desktop software: the software can be installed locally.

Multi-state analysis: the software supports comparisons of more than two states.

ISO-based uncertainty: analytic functions produce ISO-compatible uncertainty intervals.

Coverage and peptide overlap: overview of experimental sequence coverage is available in a user-friendly way.

Quality control: additional information about the course of the experiment.

N- and C-terminal length corrections: manual correction of sequence length.

Global visualization of deuterium uptake: deuterium uptake for different states is shown together for comparison.

Woods plot: deuteration difference between chosen states shown in the format of a Woods plot.

Zooming of the Woods plot: Woods plot can be zoomed in.

Customizable label names and colors: Labels and colors on the plot can be changed by the user.

Peptide kinetics chart: kinetic plots (deuteration change in time) are available for each peptide.

3D structure visualization: structure of the protein is visualized in 3D.

Downloadable charts: charts are downloadable, preferably in a vector format (.eps, .svg or .pdf).

Deuterium uptake download: data shown on the Woods plot is downloadable (e.g. as a CSV file).

Downloadable results of intermediate computations: results of intermediate computations (e.g. pure deuteration data) are downloadable.

Report generation: generates a report (e.g. in HTML format) with the results of the analysis and its parametrization.

PyMOL export: exports data to the PyMOL format.

HDX Data Summary: summary of the experimental data (Masson et al. 2019).

2 HaDeX functionalities

2.1 Data import

The HaDeX web server accepts only data in the DynamX data file format (Waters Corp.). Data from other sources can also be adapted to the format accepted by HaDeX, provided they contain the following columns:

Although data can be imported into R using other tools, we strongly advise relying on the read_hdx() function:

Currently, read_hdx() supports .csv, .tsv and .xls files fulfilling the data structure described above.

2.2 Computation of deuteration levels

The computation of the level of deuteration involves several pre-processing steps, all of which are described in this section. These steps are performed automatically in the GUI or by the prepare_dataset() function in the console.

2.2.1 Measured data into overall peptide mass

The results of HDX-MS measurements in the DynamX data files are reported as the mass-to-charge ratio of the protonated peptide (\(Center\)), that is, the peptide mass divided by its charge, plus the proton mass. For later use, this value has to be transformed into the overall mass of a peptide measured after a specific time point for a protein in a specific state, as shown in equation 1:

\[pepMass = z \times (Center-protonMass)\tag{1}\] where:

  • \(pepMass\) - expected mass of the peptide after incubation (Da),

  • \(protonMass\) - the mass of the proton (Da),

  • \(z\) - charge of the peptide,

  • \(Center\) - experimentally measured peptide mass plus proton mass to charge ratio \(\left(\frac{m}{z}\right)\).
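Equation 1 can be sketched in a few lines of base R. This is an illustration only, not the package's internal code: the helper name `center_to_mass()` and the rounded proton mass constant are assumptions introduced here, while the charge and \(Center\) values are the two example rows shown in this section.

```r
# Proton mass in Da (rounded value; an assumption of this sketch, not read from HaDeX).
proton_mass <- 1.00727647

# Hypothetical helper implementing equation 1:
# pepMass = z * (Center - protonMass)
center_to_mass <- function(center, z, proton_mass = 1.00727647) {
  z * (center - proton_mass)
}

# Charge states and Center values of the example peptide.
z <- c(3, 4)
center <- c(901.7158, 676.4348)

pep_mass <- center_to_mass(center, z)
print(pep_mass)
```

Both charge states lead to nearly the same peptide mass, which is why they can be aggregated into a single value.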

HDX-MS experiments are often repeated (as a rule of thumb, at least three times). Thus, we aggregate the results of replicates into a single result per peptide as an intensity-weighted mean mass using equation 2:

\[aggMass = \frac{\sum_{k = 1}^{N} Inten_k \times pepMass_k}{\sum_{k = 1}^{N} Inten_k}\tag{2}\] where:

  • \(aggMass\) - weighted mean mass of the peptide (Da),

  • \(k\) - replicate index,

  • \(Inten\) - intensity,

  • \(N\) - number of replicates.

This aggregation follows from the structure of the original data. Each repetition of the measurement yields, for a given peptide at a given time, one record per observed value of \(z\), as shown in the example below. We need the value of \(pepMass\) from equation 1 but also want to keep the information about the original measurements, which is why we use the intensity-weighted mass.

##                 Sequence                            File z       RT  Inten
## 1 KQFEHLNHQNPDTFEPKDLDML KD_190119_gg_Nucb2_CaCl2_10s_01 3 5.396196  95419
## 2 KQFEHLNHQNPDTFEPKDLDML KD_190119_gg_Nucb2_CaCl2_10s_01 4 5.392023 194073
##     Center
## 1 901.7158
## 2 676.4348
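The aggregation can be illustrated on the two rows above. The sketch below is not taken from the package source; it assumes that the intensity weights are normalised by their sum (as a weighted mean expressed in Da requires) and uses a rounded proton mass constant.

```r
# Proton mass in Da (rounded; assumed for this sketch).
proton_mass <- 1.00727647

# The two example records: one per charge state of the same peptide.
measurements <- data.frame(
  z      = c(3, 4),
  Inten  = c(95419, 194073),
  Center = c(901.7158, 676.4348)
)

# Equation 1: peptide mass for each record.
measurements$pepMass <- measurements$z * (measurements$Center - proton_mass)

# Intensity-weighted mean mass; stats::weighted.mean() divides by sum(w).
agg_mass <- weighted.mean(measurements$pepMass, w = measurements$Inten)
print(agg_mass)
```

The aggregated mass lies between the two per-charge masses and is closer to the record with the higher intensity.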

The uncertainty of a measurement is the variability associated with the precision of the measuring instrumentation. We present here a novel derivation of uncertainty formulas for HDX-MS data according to the ISO guidelines (Joint Committee for Guides in Metrology 2008). Input files always contain the results of more than one measurement. We assume that replicates are uncorrelated, as they come from different samples. Therefore, we average the measurements of replicates for each time point and for each protein state, and compute the peptide mass uncertainty \(u\) as the uncertainty of the aggregate estimate using the formula for the standard deviation of the mean:

\[u(x) = \sqrt{\frac{ \sum_{i=1}^n \left( x_{i} - \overline{x} \right)^2}{n(n-1)}}\tag{3}\] where:

  • \(x_{i}\) - measurement,

  • \(\overline{x}\) - mean value,

  • \(n\) - number of measurements.
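A minimal base R sketch of equation 3 follows. The helper name `mean_uncertainty()` and the replicate masses are made up for illustration; the result is compared with the equivalent expression `sd(x) / sqrt(n)`.

```r
# Hypothetical helper implementing equation 3 (standard deviation of the mean).
mean_uncertainty <- function(x) {
  n <- length(x)
  sqrt(sum((x - mean(x))^2) / (n * (n - 1)))
}

# Made-up masses (Da) of one peptide from three replicates.
replicate_masses <- c(2701.82, 2701.85, 2701.88)

u <- mean_uncertainty(replicate_masses)
print(u)

# Same quantity via the sample standard deviation.
print(sd(replicate_masses) / sqrt(length(replicate_masses)))
```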

After obtaining the mass of the peptide, we can compute the deuteration level, which depends on the chosen maximum deuteration level. The maximum deuteration can be computed in two different ways: either as theoretical (where the maximum deuteration depends on the theoretical deuteration levels) or as experimental (where the maximum deuteration is assumed to be equal to the deuteration measured at the last time point).

2.2.2 Experimental deuteration level

The experimental deuteration level is computed as the deuteration level of the peptide from a protein in a specific state and after incubation time \(t\) compared to the deuteration level measured at the start of the incubation (\(t_0\)). It yields a value for the chosen state and chosen time \(t\).

\[D = D_{t} - D_{t_0}\tag{4}\]

where:

  • \(D\) - deuteration level (Da),

  • \(D_{t}\) - deuteration measured at time \(t\) (Da),

  • \(D_{t_0}\) - experimentally measured deuteration at the beginning of the incubation (0 or close to 0).

Equation 4 produces only absolute deuteration levels. The computation of relative deuteration levels follows a similar logic, but the result is normalized by the difference in deuteration between the start (\(t_0\)) and the end of the experiment (\(t_{out}\)), as shown in equation 4a:

\[D = \frac{D_{t} - D_{t_0}}{D_{t_{out}} - D_{t_0}}\tag{4a}\]

All functions in the HaDeX package contain the logical parameter \(relative\) to determine if they should return absolute or relative deuteration levels.
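Equations 4 and 4a can be sketched as a single helper with a `relative` switch. The function name `deuteration_level()`, its argument names, and the input masses are hypothetical and serve only to illustrate the arithmetic; they do not reproduce the signature of any HaDeX function.

```r
# Hypothetical helper: equation 4 (absolute) or equation 4a (relative).
deuteration_level <- function(d_t, d_t0, d_out = NULL, relative = TRUE) {
  if (!relative) {
    return(d_t - d_t0)                 # equation 4
  }
  (d_t - d_t0) / (d_out - d_t0)        # equation 4a
}

# Made-up aggregated masses (Da) at t_0, t and t_out.
d_t0  <- 2701.85
d_t   <- 2704.35
d_out <- 2706.85

abs_level <- deuteration_level(d_t, d_t0, relative = FALSE)
rel_level <- deuteration_level(d_t, d_t0, d_out, relative = TRUE)
print(c(absolute = abs_level, relative = rel_level))
```

With these inputs the peptide gained 2.5 Da, which is half of the uptake observed at \(t_{out}\).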

2.2.2.1 Uncertainty calculations

We describe the methodology of the uncertainty calculations for relative deuteration levels. The uncertainty for absolute deuteration levels is computed similarly, but without scaling.

To calculate the uncertainty of a function of more than one variable (e.g., equation 4), we use the law of propagation of uncertainty, defined by equation 5:

\[u_{c}(y) = \sqrt{\sum_{k} \left[ \frac{\partial y}{\partial x_{k}} u(x_{k}) \right]^2}\tag{5}\]

As the variable of interest is \(D\), we apply the general formula to the deuteration level \(D\) (equation 6):

\[u_{c}(D) = \sqrt{\sum_{k} \left[ \frac{\partial D}{\partial D_{k}} u(D_{k}) \right]^2 }\tag{6}\]

where:

  • \(k \in \{t_0, t, t_{out}\}\),

  • \(D_{k}\) - deuteration at time point \(k\) (Da),

  • \(u(D_{k})\) - the uncertainty associated with \(D_{k}\), computed as the standard deviation of the mean.

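Expanding equation 6 with the partial derivatives of equation 4a gives:

\[u_{c}(D) = \sqrt{ \left[ \frac{1}{D_{t_{out}}-D_{t_0}} u(D_{t}) \right]^2 + \left[ \frac{D_{t} - D_{t_{out}}}{(D_{t_{out}}-D_{t_0})^2} u(D_{t_0}) \right]^2 + \left[ \frac{D_{t_0} - D_{t}}{(D_{t_{out}}-D_{t_0})^2} u (D_{t_{out}}) \right]^2}\tag{7}\]

As expected, the uncertainty associated with \(D_{t}\) has the biggest impact on \(u_{c}(D)\): relative to its coefficient, the coefficients of \(u(D_{t_0})\) and \(u(D_{t_{out}})\) are scaled by \(1 - D\) and \(D\), respectively, both of which lie between 0 and 1 when \(D_{t_0} \le D_{t} \le D_{t_{out}}\).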
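The propagation in equation 7 can be written directly in base R. This is a sketch under stated assumptions: the helper name `relative_uncertainty()` and all numeric inputs are made up, and the three terms are kept separate so that their individual contributions can be inspected.

```r
# Hypothetical helper implementing equation 7.
# Returns the three squared contributions and the combined uncertainty.
relative_uncertainty <- function(d_t, d_t0, d_out, u_t, u_t0, u_out) {
  span <- d_out - d_t0
  terms <- c(
    t     = (u_t / span)^2,
    t_0   = ((d_t - d_out) / span^2 * u_t0)^2,
    t_out = ((d_t0 - d_t) / span^2 * u_out)^2
  )
  list(terms = terms, u_c = sqrt(sum(terms)))
}

# Made-up masses (Da) and equal uncertainties (Da) at t_0, t and t_out.
res <- relative_uncertainty(
  d_t = 2704.35, d_t0 = 2701.85, d_out = 2706.85,
  u_t = 0.02, u_t0 = 0.02, u_out = 0.02
)
print(res$terms)
print(res$u_c)
```

With equal input uncertainties, the term belonging to \(D_{t}\) is the largest of the three, in line with the remark following equation 7.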

2.2.3 Theoretical deuteration level

As opposed to the experimental deuteration level, the theoretical deuteration level depends only partially on the experimental data. Here, the maximum deuteration level is based on a hypothetical peptide in which all exchangeable hydrogens were replaced by deuterium, as shown in equation 8:

\[D = \frac{D_{t}-MHP}{MaxUptake \times protonMass}\tag{8}\]

where:

  • \(D_{t}\) - deuteration measured in a chosen time point (Da),

  • \(MHP\) - theoretical mass of the peptide (constant) (Da),

  • \(MaxUptake\) - the maximum proton uptake for the peptide (theoretical constant) (Da),

  • \(protonMass\) - mass of a proton (constant) (Da).

The absolute deuteration level is calculated as in equation 8 but without scaling (equation 8a):

\[D = D_{t} - MHP\tag{8a}\]

2.2.3.1 Uncertainty calculations

For functions of one variable, the law of propagation of uncertainty (equation 5) reduces to:

\[u(y) = \left| \frac{dy}{dx} u(x) \right|.\tag{9}\]

Substituting \(D\) from equation 8, we have

\[u(D) = \left|\frac{1}{MaxUptake \times protonMass} u(D_{t}) \right|\tag{10}\]

For the absolute values, \(u(D)\) is identical to \(u(D_{t})\), based on equations 8a and 9.
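The theoretical variant (equations 8, 8a and 10) can be sketched as follows. The helper names, the rounded proton mass, and all input values are assumptions made for this illustration; in particular, \(MaxUptake\) is treated here as a plain number multiplied by the proton mass, as in equation 8.

```r
# Proton mass in Da (rounded; assumed for this sketch).
proton_mass <- 1.00727647

# Hypothetical helper: equation 8 (relative) or 8a (absolute).
theoretical_level <- function(d_t, mhp, max_uptake, relative = TRUE) {
  if (relative) (d_t - mhp) / (max_uptake * proton_mass) else d_t - mhp
}

# Hypothetical helper: equation 10 (relative) or u(D) = u(D_t) (absolute).
theoretical_uncertainty <- function(u_t, max_uptake, relative = TRUE) {
  if (relative) abs(u_t / (max_uptake * proton_mass)) else u_t
}

# Made-up inputs: measured mass, theoretical mass, maximum uptake, uncertainty.
d_t <- 2705.0
mhp <- 2700.0
max_uptake <- 20
u_t <- 0.02

rel_level <- theoretical_level(d_t, mhp, max_uptake)
abs_level <- theoretical_level(d_t, mhp, max_uptake, relative = FALSE)
rel_u <- theoretical_uncertainty(u_t, max_uptake)
abs_u <- theoretical_uncertainty(u_t, max_uptake, relative = FALSE)
print(c(rel_level = rel_level, abs_level = abs_level, rel_u = rel_u, abs_u = abs_u))
```

Because \(MHP\), \(MaxUptake\) and the proton mass are constants, only \(u(D_{t})\) enters the uncertainty.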

2.3 Difference of deuteration levels between two states

Differences in deuteration levels between two states are associated with different levels of protection of hydrogens. Therefore, we are especially interested in the differential analysis of deuteration levels. The deuteration level in one state \((D_{2})\) is subtracted from the deuteration level in the other state \((D_{1})\):

\[diff = D_{1} - D_{2}\tag{11}\]

and the uncertainty is that of a function of two variables (based on equations 5 and 11):

\[u_{c}(diff) = \sqrt{u(D_{1})^2 + u(D_{2})^2}\tag{12}\]
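Equations 11 and 12 amount to a subtraction and a sum in quadrature. The sketch below uses a hypothetical helper `state_difference()` and made-up relative deuteration levels with their uncertainties.

```r
# Hypothetical helper implementing equations 11 and 12.
state_difference <- function(d1, u1, d2, u2) {
  c(diff = d1 - d2, uncertainty = sqrt(u1^2 + u2^2))
}

# Made-up relative deuteration levels and uncertainties for two states.
res <- state_difference(d1 = 0.679, u1 = 0.003, d2 = 0.615, u2 = 0.004)
print(res)
```

Since both partial derivatives of equation 11 have an absolute value of 1, the two uncertainties enter equation 12 with equal weight.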

2.4 Visual data analysis

2.5 Woods plot

Woods plots show the difference in deuteration of all peptides between two states at a specific time point, as described by equation 11. Similarly to the comparison plot, HaDeX provides both experimental and theoretical deuteration levels using either relative or absolute values:

– theoretical:

– experimental:

2.5.1 Confidence limit in Woods plot

The function calculate_confidence_limit_values() calculates the confidence limit values as described by Houde, Berkowitz, and Engen (2011).

## [1] -0.01619004  0.01619004

The function add_stat_dependency() returns the data extended by a column describing the relation of each peptide to the confidence limit.

## # A tibble: 108 x 29
##    Sequence Start   End Med_Sequence frac_exch_state~ err_frac_exch_s~
##    <chr>    <int> <int>        <dbl>            <dbl>            <dbl>
##  1 VPIDID      17    22         19.5          NaN            NaN      
##  2 KTKVKGE~    23    44         33.5          NaN            NaN      
##  3 YYDEY       45    49         47              0.975          0.00984
##  4 YYDEYL      45    50         47.5            0.679          0.00549
##  5 YLRQVID     49    55         52              0.453          0.00309
##  6 YLRQVIDV    49    56         52.5            0.428          0.00341
##  7 YLRQVID~    49    57         53              0.417          0.00271
##  8 LRQVID      50    55         52.5            0.520          0.00660
##  9 LRQVIDV     50    56         53              0.450          0.00248
## 10 LRQVIDVL    50    57         53.5            0.414          0.00346
## # ... with 98 more rows, and 23 more variables: frac_exch_state_2 <dbl>,
## #   err_frac_exch_state_2 <dbl>, diff_frac_exch <dbl>,
## #   err_frac_exch <dbl>, abs_frac_exch_state_1 <dbl>,
## #   err_abs_frac_exch_state_1 <dbl>, abs_frac_exch_state_2 <dbl>,
## #   err_abs_frac_exch_state_2 <dbl>, abs_diff_frac_exch <dbl>,
## #   err_abs_diff_frac_exch <dbl>, avg_theo_in_time_1 <dbl>,
## #   err_avg_theo_in_time_1 <dbl>, avg_theo_in_time_2 <dbl>,
## #   err_avg_theo_in_time_2 <dbl>, diff_theo_frac_exch <dbl>,
## #   err_diff_theo_frac_exch <dbl>, abs_avg_theo_in_time_1 <dbl>,
## #   err_abs_avg_theo_in_time_1 <dbl>, abs_avg_theo_in_time_2 <dbl>,
## #   err_abs_avg_theo_in_time_2 <dbl>, abs_diff_theo_frac_exch <dbl>,
## #   err_abs_diff_theo_frac_exch <dbl>, valid_at_0.98 <lgl>
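One plausible reading of the added logical column (valid_at_0.98 in the output above) is that a peptide is flagged when its deuteration difference falls outside the confidence interval. The sketch below assumes exactly this rule; it is not taken from the package source, and the difference values are made up. The interval is the one printed above.

```r
# Confidence limit values as printed above.
confidence_limit <- c(-0.01619004, 0.01619004)

# Made-up differences in relative deuteration for three peptides.
diff_frac_exch <- c(-0.052, 0.004, 0.021)

# Assumed rule: flag differences lying outside the interval.
outside_limit <- diff_frac_exch < confidence_limit[1] |
  diff_frac_exch > confidence_limit[2]
print(outside_limit)
```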

2.6 Kinetic plots

By kinetics we mean the change in deuteration over time. The calculate_kinetics() function calculates deuteration values per peptide for different time points. It relies on the calculate_state_deuteration() function.

## # A tibble: 5 x 15
##   Protein Sequence Start   End State time_chosen frac_exch_state
##   <chr>   <chr>    <int> <int> <chr> <fct>                 <dbl>
## 1 db_Nuc~ YYDEYL      45    50 gg_N~ 0.167                 0.209
## 2 db_Nuc~ YYDEYL      45    50 gg_N~ 1                     0.377
## 3 db_Nuc~ YYDEYL      45    50 gg_N~ 10                    0.741
## 4 db_Nuc~ YYDEYL      45    50 gg_N~ 25                    0.862
## 5 db_Nuc~ YYDEYL      45    50 gg_N~ 60                    0.939
## # ... with 8 more variables: err_frac_exch_state <dbl>,
## #   abs_frac_exch_state <dbl>, err_abs_frac_exch_state <dbl>,
## #   avg_theo_in_time <dbl>, err_avg_theo_in_time <dbl>,
## #   abs_avg_theo_in_time <dbl>, err_abs_avg_theo_in_time <dbl>,
## #   Med_Sequence <dbl>

## # A tibble: 5 x 15
##   Protein Sequence Start   End State time_chosen frac_exch_state
##   <chr>   <chr>    <int> <int> <chr> <fct>                 <dbl>
## 1 db_Nuc~ YYDEYL      45    50 gg_N~ 0.167                 0.164
## 2 db_Nuc~ YYDEYL      45    50 gg_N~ 1                     0.340
## 3 db_Nuc~ YYDEYL      45    50 gg_N~ 10                    0.607
## 4 db_Nuc~ YYDEYL      45    50 gg_N~ 25                    0.679
## 5 db_Nuc~ YYDEYL      45    50 gg_N~ 60                    0.790
## # ... with 8 more variables: err_frac_exch_state <dbl>,
## #   abs_frac_exch_state <dbl>, err_abs_frac_exch_state <dbl>,
## #   avg_theo_in_time <dbl>, err_avg_theo_in_time <dbl>,
## #   abs_avg_theo_in_time <dbl>, err_abs_avg_theo_in_time <dbl>,
## #   Med_Sequence <dbl>

The calculated data can be shown side by side on one plot for comparison. To visualize kinetic data, we recommend the plot_kinetics() function:

– theoretical:

– experimental:

2.7 Additional tools

HaDeX provides additional tools for assessment of experiments.

2.7.1 Peptide coverage

The sequence of the protein(s) is reconstructed from the peptides in the input file. Amino acids not covered by any peptide are therefore marked as x, following the IUPAC convention for an unspecified residue. The sequence is reconstructed using the reconstruct_sequence() function.

## [1] "xxxxxxxxxxxxxxxxVPIDIDKTKVKGEGHVEGEKIENPDTGLYYDEYLRQVIDVLETDKHFREKLQTADIEEIKSGKLSRELDLVSHHVRTRLDELKRQEVARLRMLIKAKMDSVQDTGIDHQALLKQFEHLNHQNPDTFEPKDLDMLIKAATSDLENYDKTRHEEFKKYEMxxxxxxxxxxxxLDEEKRQREESKFGEMxxxxxxxxxxxxxxxxxxxKEVWEEADGLDPNEFDPKTFFKLHDVNNDRFLDEQELEAxFTKELEKVYDPKNEEDDMVEMEEERLxxxxHVMNEVDINKDRLVTLEEFLRATEKKEFLEPDSWETLDQQQLFTEDELKEFESHISQQEDELRKKAEELQKQKEELQRQHDQLQAQEQELQQVVKQMEQKKLQQANPPAGPAGELK"
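The idea behind the reconstruction can be sketched in base R. The helper `reconstruct_from_peptides()` is hypothetical and does not reproduce reconstruct_sequence(); it only assumes that each peptide comes with its Start and End positions. The peptides used are those listed with full sequences in the add_stat_dependency() output in this document, so positions 23-44 (covered there by a peptide whose sequence is truncated in the printout) remain marked as x in this sketch.

```r
# Hypothetical helper: place each peptide at its position, mark the rest with "x".
reconstruct_from_peptides <- function(sequence, start, end) {
  residues <- rep("x", max(end))
  for (i in seq_along(sequence)) {
    residues[start[i]:end[i]] <- strsplit(sequence[i], "")[[1]]
  }
  paste(residues, collapse = "")
}

peptides <- data.frame(
  Sequence = c("VPIDID", "YYDEY", "YYDEYL", "YLRQVID", "LRQVIDVL"),
  Start    = c(17, 45, 45, 49, 50),
  End      = c(22, 49, 50, 55, 57),
  stringsAsFactors = FALSE
)

reconstructed <- reconstruct_from_peptides(
  peptides$Sequence, peptides$Start, peptides$End
)
print(reconstructed)
```

Overlapping peptides simply overwrite each other with identical residues, so the order of the rows does not matter as long as the input is consistent.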

Additionally, the coverage of peptides can be presented on a chart using the plot_coverage() and plot_position_frequency() functions.

The user can choose which state (or states) should be included in these plots. If this parameter is not provided, the first possible state is chosen. If a given peptide is available in more than one state, it is shown only once.

2.7.2 Quality control

The function quality_control() plots the change in uncertainty of deuteration levels as a function of incubation time. The uncertainty is averaged over all peptides available at a given time point in a selected state. Therefore, the user can detect the time point after which the decrease in deuteration uncertainty becomes too marginal to justify prolonging the measurements. This function is most useful when the same or very similar proteins are measured repeatedly, because it helps to optimize the duration of the incubation. The result of this function can be easily visualized.

Example:

This example is based on relative values. Although HaDeX can provide results in absolute values, be aware that absolute calculations do not encompass time out, so the uncertainty does not change with \(D_{t_{out}}\).
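The dependence of the relative uncertainty on the choice of \(t_{out}\) can be illustrated for a single peptide. This sketch is not the quality_control() implementation (which averages over all peptides): the helper name, the masses, and the uncertainties are made up, and each later time point is treated in turn as \(t_{out}\) in equation 7.

```r
# Hypothetical helper implementing equation 7 (vectorised over d_out and u_out).
relative_uncertainty <- function(d_t, d_t0, d_out, u_t, u_t0, u_out) {
  span <- d_out - d_t0
  sqrt((u_t / span)^2 +
       ((d_t - d_out) / span^2 * u_t0)^2 +
       ((d_t0 - d_t) / span^2 * u_out)^2)
}

# Made-up kinetics of one peptide: time (min), mass (Da), uncertainty (Da).
kinetics <- data.frame(
  time = c(0, 1, 5, 25, 120),
  mass = c(2701.85, 2703.10, 2704.35, 2705.60, 2706.85),
  u    = c(0.02, 0.02, 0.02, 0.02, 0.02)
)

# Start of incubation (t_0) and the time point of interest (1 min).
d_t0 <- kinetics$mass[1]; u_t0 <- kinetics$u[1]
d_t  <- kinetics$mass[2]; u_t  <- kinetics$u[2]

# Treat every later time point as a candidate t_out.
candidates <- kinetics[kinetics$time > 1, ]
candidates$u_rel <- relative_uncertainty(
  d_t, d_t0, candidates$mass, u_t, u_t0, candidates$u
)
print(candidates[, c("time", "u_rel")])
```

In this made-up series the relative uncertainty shrinks as \(t_{out}\) grows, but by less at each step, which is the pattern the quality control plot is meant to reveal.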

3 HaDeX Graphical User Interface

The HaDeX Shiny app can be launched with the HaDeX_gui() function or accessed on the MS Lab website: http://mslab-ibb.pl/shiny/HaDeX/.

4 Examples

4.1 Example 1: CD160-HVEM

The interaction between HVEM and the CD160 receptor was measured with HDX-MS.

First, we read the input data, exactly as provided by DynamX 3.0 (Waters Corp.).

Then, we reconstruct the protein sequence from the peptides measured during the experiment. We observe that the region from amino acid 107 to amino acid 124 is not covered by any peptide.

## [1] "INITSSASQEGTRLNLICTVWHKKEEAEGFVVFLCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQLMFTISQVTPLHSGTYQCCARSQKSGIRLQGHFFSILFxxxxxxxxxxxxxxxxxxFSHNEGTL"

The theoretical plot reveals regions that exchange quickly (the N-terminus and the region between amino acids 30 and 70) and regions that exchange slowly (peptides 15-24 and 95-110). We can also see the differences in exchange between the two states, indicating regions that changed upon binding to the other protein.

The experimental plot likewise shows regions that exchange quickly (the N-terminal part and the region between amino acids 30 and 70) and regions that exchange slowly (peptides 15-24, 30-35 and 95-110).

The plots below are equivalent to the plots above but use absolute values, for users who prefer them.

The plot below shows peptides for which the levels of deuteration were significantly lower upon binding to the other protein (red) and peptides for which the levels of exchange were significantly higher upon binding to the other protein (blue). The plot also shows peptides in which no significant changes in deuteration between states are visible (grey). The biggest changes on the theoretical plot can be observed in 3 peptides (15-24, 30-35, 75-90).

The biggest changes on the experimental plot can be observed in peptides 15-24 and 110-115.

The plot below shows the results in absolute values.

##   out_time avg_err_state_first sd_err_state_first avg_err_state_second
## 1        5         0.002925674        0.002207625          0.003811229
## 2       25         0.002086275        0.001636480          0.003375989
## 3      120         0.002124354        0.001284390          0.002759393
## 4     1440         0.001841383        0.001032972          0.002532581
##   sd_err_state_second avg_err_theo_state_first sd_err_theo_state_first
## 1         0.004561764             0.0007087017            0.0005185554
## 2         0.003690424             0.0007087017            0.0005185554
## 3         0.001976257             0.0007087017            0.0005185554
## 4         0.001120593             0.0007087017            0.0005185554
##   avg_err_theo_state_second sd_err_theo_state_second    avg_diff
## 1               0.001236524             0.0005534176 0.004920756
## 2               0.001236524             0.0005534176 0.004053767
## 3               0.001236524             0.0005534176 0.003470818
## 4               0.001236524             0.0005534176 0.003200211
##       sd_diff avg_theo_diff sd_theo_diff
## 1 0.004952401   0.001476534 0.0006500359
## 2 0.003949308   0.001476534 0.0006500359
## 3 0.002088545   0.001476534 0.0006500359
## 4 0.001369362   0.001476534 0.0006500359

4.2 Example workflow 2

## [1] "xxxxxxxxxxxxxxxxVPIDIDKTKVKGEGHVEGEKIENPDTGLYYDEYLRQVIDVLETDKHFREKLQTADIEEIKSGKLSRELDLVSHHVRTRLDELKRQEVARLRMLIKAKMDSVQDTGIDHQALLKQFEHLNHQNPDTFEPKDLDMLIKAATSDLENYDKTRHEEFKKYEMxxxxxxxxxxxxLDEEKRQREESKFGEMxxxxxxxxxxxxxxxxxxxKEVWEEADGLDPNEFDPKTFFKLHDVNNDRFLDEQELEAxFTKELEKVYDPKNEEDDMVEMEEERLxxxxHVMNEVDINKDRLVTLEEFLRATEKKEFLEPDSWETLDQQQLFTEDELKEFESHISQQEDELRKKAEELQKQKEELQRQHDQLQAQEQELQQVVKQMEQKKLQQANPPAGPAGELK"

##   out_time avg_err_state_first sd_err_state_first avg_err_state_second
## 1       60         0.005530984        0.002672025          0.007094412
## 2     1440         0.005308684        0.002471285          0.006814869
##   sd_err_state_second avg_err_theo_state_first sd_err_theo_state_first
## 1         0.002096056              0.001396896            0.0006199273
## 2         0.002019402              0.001396896            0.0006199273
##   avg_err_theo_state_second sd_err_theo_state_second    avg_diff
## 1               0.002539802             0.0007520063 0.009074865
## 2               0.002539802             0.0007520063 0.008631836
##       sd_diff avg_theo_diff sd_theo_diff
## 1 0.002499717   0.002982639 0.0006774071
## 2 0.002732523   0.002982639 0.0006774071

References

Houde, Damian, Steven A. Berkowitz, and John R. Engen. 2011. “The Utility of Hydrogen/Deuterium Exchange Mass Spectrometry in Biopharmaceutical Comparability Studies.” Journal of Pharmaceutical Sciences 100 (6): 2071–86. https://doi.org/10.1002/jps.22432.

Hourdel, Véronique, Stevenn Volant, Darragh P. O’Brien, Alexandre Chenal, Julia Chamot-Rooke, Marie-Agnès Dillies, and Sébastien Brier. 2016. “MEMHDX: An Interactive Tool to Expedite the Statistical Validation and Visualization of Large HDX-MS Datasets.” Bioinformatics 32 (22): 3413–9. https://doi.org/10.1093/bioinformatics/btw420.

Joint Committee for Guides in Metrology. 2008. “JCGM 100: Evaluation of Measurement Data - Guide to the Expression of Uncertainty in Measurement.” JCGM.

Lau, Andy M. C., Zainab Ahdash, Chloe Martens, and Argyris Politis. 2019. “Deuteros: Software for Rapid Analysis and Visualization of Data from Differential Hydrogen Deuterium Exchange-Mass Spectrometry.” Bioinformatics (Oxford, England), January. https://doi.org/10.1093/bioinformatics/btz022.

Masson, Glenn R., John E. Burke, Natalie G. Ahn, Ganesh S. Anand, Christoph Borchers, Sébastien Brier, George M. Bou-Assaf, et al. 2019. “Recommendations for Performing, Interpreting and Reporting Hydrogen Deuterium Exchange Mass Spectrometry (HDX-MS) Experiments.” Nature Methods 16 (7): 595–602. https://doi.org/10.1038/s41592-019-0459-y.

Pascal, Bruce D., Scooter Willis, Janelle L. Lauer, Rachelle R. Landgraf, Graham M. West, David Marciano, Scott Novick, Devrishi Goswami, Michael J. Chalmers, and Patrick R. Griffin. 2012. “HDX Workbench: Software for the Analysis of H/D Exchange MS Data.” Journal of the American Society for Mass Spectrometry 23 (9): 1512–21. https://doi.org/10.1007/s13361-012-0419-6.