Applying decision tree algorithms to early differential diagnosis between different clinical forms of acute Lyme borreliosis and tick-borne encephalitis
- Authors: Ilyinskikh E.N., Filatova E.N., Samoylov K.V., Semenova A.V., Axyonov S.V.
- Affiliation: Siberian State Medical University
- Issue: Vol 28, No 5 (2023)
- Pages: 275-288
- Section: Original study articles
- Submitted: 01.10.2023
- Accepted: 17.10.2023
- Published: 05.11.2023
- URL: https://rjeid.com/1560-9529/article/view/601806
- DOI: https://doi.org/10.17816/EID601806
- ID: 601806
Abstract
Background: Tick-borne encephalitis and Lyme borreliosis are the most common natural focal infections in Russia and often occur as a mixed infection. At disease onset, a mixed infection is clinically difficult to distinguish from a monoinfection, and laboratory verification of the diagnosis is often delayed, so fundamentally new approaches to the early differential diagnosis of tick-borne infections are needed.
Aim: To develop decision tree algorithms for early differential diagnosis between the mono- and mixed forms of acute Lyme borreliosis and tick-borne encephalitis with a prevailing febrile syndrome in the clinical picture, based on clinical and laboratory data.
Materials and methods: We retrospectively analyzed 55 clinical and laboratory parameters obtained in the first week of the disease from 291 hospitalized patients with tick-borne infections, with or without erythema migrans at the site of the ixodid tick bite, who were included in a single-center study from 2010 to 2023. In 211 patients without erythema, the analysis distinguished three classes by diagnosis: mixed infection of non-erythematous Lyme borreliosis and tick-borne encephalitis, monoinfection of non-erythematous Lyme borreliosis, or monoinfection of tick-borne encephalitis. The other two classes comprised 80 patients with erythema, who had either mixed infection of acute erythematous Lyme borreliosis and tick-borne encephalitis or erythematous Lyme borreliosis monoinfection. The Python programming language was used to develop two decision tree models. Feature importance was assessed for all predictors. Each patient class was randomly divided into training (70%) and testing (30%) datasets. Accuracy evaluation of the models was based on ROC analysis.
Results: The decision tree algorithm for early differential diagnosis among the tick-borne infection patients without erythema migrans included the following most important predictors: maximal fever rise, chills, neutrophil-to-monocyte ratio, ESR, absolute number of reactive lymphocytes and immature granulocytes, and percentage of eosinophils. The model for differential diagnosis between the patients with erythema migrans included the following predictors: maximal fever rise, the absolute number of reactive lymphocytes and immature granulocytes, and the percentage of basophils. Both decision tree models showed excellent predictive values based on sensitivity, specificity, precision, accuracy, and F1 scores, as well as areas under the ROC curve, which were higher than 0.90.
Conclusions: Based on clinical and laboratory parameters, two decision tree models with high sensitivity have been developed, which can be easily applied in clinical practice for early differential diagnosis of the tick-borne infections with prevailing fever syndrome.
Full Text
BACKGROUND
Among infections transmitted by ixodid ticks, tick-borne encephalitis and ixodid tick-borne borreliosis (Lyme borreliosis) are the most common natural focal diseases in Russia and often occur as a mixed infection [1–3]. The two diseases have very similar clinical pictures, especially at the initial stage, which is dominated by a febrile intoxication syndrome [4–6]. Laboratory verification of the diagnosis is therefore often delayed, because specific antibodies against Borrelia appear late. This delay can lead to inadequate etiotropic therapy, which potentially contributes to progression of the infectious process and the development of clinically more severe and/or chronic forms of the disease [7].
To date, no simple tool is available for differentiating between isolated and mixed forms of ixodid tick-borne borreliosis and tick-borne encephalitis in the first days of illness from clinical data, before the results of specific laboratory tests are known. Machine learning methods are currently used for outcome prediction and early differential diagnosis of infectious diseases such as dengue fever [8, 9] and COVID-19 [10, 11]. Among them, decision tree models are efficient classification algorithms that identify non-linear relationships between predictors; their main advantages are transparency and ease of interpretation in practical applications [12].
The aim of the study is to develop decision tree algorithms for early differential diagnosis between isolated and mixed forms of acute ixodid tick-borne borreliosis and tick-borne encephalitis with a predominant febrile intoxication syndrome in the clinical picture, based on the analysis of clinical and laboratory data at the onset of the disease.
MATERIALS AND METHODS
Study design
The single-centre retrospective randomised observational study included 291 patients who met the inclusion criteria and had either a mixed infection of acute ixodid tick-borne borreliosis with the febrile form of tick-borne encephalitis or a monoinfection with one of these diseases.
Eligibility criteria
Inclusion criteria: patients with tick-borne infections aged 20 to 75 years, hospitalised no later than 7 days from the onset of the disease with clinical manifestations dominated by a febrile intoxication syndrome, who had clinical, epidemiological and laboratory confirmation of a diagnosis of mixed infection or monoinfection of acute ixodid tick-borne borreliosis and/or the febrile form of tick-borne encephalitis.
Exclusion criteria were pregnancy, lactation, meningeal syndrome confirmed by characteristic laboratory changes in the cerebrospinal fluid, encephalitis, other acute and chronic infectious diseases (tuberculosis, chronic viral hepatitis B and C, HIV infection, etc.), as well as oncological or severe somatic pathology.
Study setting
The study was conducted at the Infectious Diseases Clinic of the Siberian State Medical University (FGBOU VO SibGMU of the Ministry of Health of Russia), Tomsk.
Duration of the study
Patients with tick-borne infections hospitalised during the spring and summer epidemic seasons from 2010 to 2023 were included in the study.
Study outcomes
Clinical and laboratory data were analysed that had been obtained once during hospitalisation from the case histories of patients admitted in the first 7 days from the onset of the disease with diagnoses of mixed infection or monoinfection of acute ixodid tick-borne borreliosis and/or the febrile form of tick-borne encephalitis. In addition, to verify the final diagnosis, specific laboratory tests were performed in paired serum samples on the day of hospitalisation and at follow-up after 14 and 21 days and after 3 and 6 months.
Five variants of final diagnosis were verified in the examined patients. They were divided into two categories depending on the presence or absence of erythema migrans, a pathognomonic sign of ixodid tick-borne borreliosis, at the site of the ixodid tick bite, so that differential diagnostic decision tree models could be built from the clinical and laboratory parameters.
Using machine learning, two decision tree models were built for differential diagnosis in the first week of illness, before the results of specific laboratory tests become available. The first model covers patients without erythema at the tick bite site, who had three variants of final diagnosis: mixed infection of the non-erythematous form of acute ixodid tick-borne borreliosis with the febrile form of tick-borne encephalitis, monoinfection with the non-erythematous form of ixodid tick-borne borreliosis, or monoinfection with the febrile form of tick-borne encephalitis. The second model covers patients with erythema, who had two variants of diagnosis: mixed infection of the erythematous form of ixodid tick-borne borreliosis with the febrile form of tick-borne encephalitis, or monoinfection with the erythematous form of ixodid tick-borne borreliosis.
Subgroup analyses
To build the decision tree models, each of the initial 5 classes of patients with different final diagnoses was randomly divided into two samples, a training sample and a test sample, in a ratio of 70% to 30% [12].
Subsequently, only the training samples were used to build the decision tree algorithms, while the test samples were used to validate the resulting algorithms and to rule out overfitting, i.e. a situation in which the model describes the training data well but performs poorly on the test data.
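The per-class 70/30 split described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the patient identifiers, the seeds and the helper name `split_class` are hypothetical, and rounding 0.7·n to the nearest integer is an assumption, chosen because it reproduces the training and test counts reported in Table 1.

```python
import random

# Class sizes as reported in the Results (classes 1-5).
CLASS_SIZES = {1: 38, 2: 93, 3: 80, 4: 37, 5: 43}


def split_class(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split the patients of one class into training and test lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle keeps the split reproducible
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Hypothetical patient identifiers: (class label, running index).
splits = {
    label: split_class([(label, i) for i in range(size)], seed=label)
    for label, size in CLASS_SIZES.items()
}

for label, (train, test) in splits.items():
    print(f"class {label}: train={len(train)}, test={len(test)}")
```

Splitting each class separately, rather than the pooled sample, keeps the class proportions the same in the training and test sets.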
Methods of recording outcomes
The final diagnosis in all patients was made on the basis of clinical and epidemiological data and the results of specific laboratory tests, which were evaluated over time by experienced clinicians during hospitalisation and after discharge. Diagnoses of isolated and mixed infection with ixodid tick-borne borreliosis and/or the febrile form of tick-borne encephalitis were formulated in accordance with their clinical classifications [1–3].
For laboratory confirmation of the diagnoses of ixodid tick-borne borreliosis and/or tick-borne encephalitis, enzyme-linked immunosorbent assay (ELISA) was used to determine specific class M and G immunoglobulins (IgM and IgG) to Borrelia burgdorferi s. l. in diagnostic titres, as well as tick-borne encephalitis virus antigen and IgM and IgG to tick-borne encephalitis virus, using test systems of Vector-Best JSC (Russia). In addition, human granulocytic anaplasmosis and ehrlichioses (Anaplasma phagocytophilum, Ehrlichia muris, Ehrlichia chaffeensis), as well as tick-borne fever caused by Borrelia miyamotoi, were excluded by polymerase chain reaction (PCR) using RealBest kits (Vector-Best JSC, Russia).
The study analysed 29 clinical parameters, including the maximum elevation of body temperature and other manifestations of the febrile intoxication syndrome in the first week of illness. In addition, the analysis included 20 indices of the general and biochemical blood tests on admission to the hospital, including standard and advanced haemogram indices such as IG (absolute number of immature granulocytes) and RE-LYMP (absolute number of reactive lymphocytes), determined using a Sysmex XN-1000 automatic haematological analyser (Sysmex Corp., Japan). It also included 6 leukocytic intoxication indices (LII): the neutrophil-to-monocyte ratio index (ISNM), lymphocyte-monocyte ratio index (LMRI), neutrophil-lymphocyte index (NLI), leukocyte to erythrocyte sedimentation rate index (ESRSI), lymphocyte-granulocyte index (LGI) and the LII according to V.K. Ostrovsky (LIIO) [13, 14].
Ethical review
Ethical expertise
Informed consent was obtained from all patients in writing. The conduct of the study was approved by the Local Ethical Committee of FGBOU VO SibGMU of the Ministry of Health of Russia (protocols No. 7939 dated 21.10.2019 and No. 9119/1 dated 30.05.2022).
Statistical analysis
Principles of sample size calculation: sample size was not pre-calculated.
Methods of statistical data analysis: statistical analyses were performed using STATISTICA 12.0 (StatSoft, USA) and Epi Info version 7.2.1.0 (CDC, USA). Categorical variables, expressed as percentages (%), were compared using Pearson's chi-square (χ²) test or Fisher's exact test, as appropriate [15]. The odds ratio (OR) with its 95% confidence interval (95% CI) was used to assess the effect of a feature at the nodes of the models. Statistical significance was set at p <0.05.
The work was performed in the Python programming language, version 3.7.13, using libraries (modules) for working with data arrays, machine learning and visualisation [12, 16]. To tune the decision tree algorithm, the following hyperparameters were selected: the entropy Info(D) was chosen as the partitioning criterion and measure of uncertainty for the dataset D in a decision tree node; the maximum tree depth, i.e. the maximum number of levels between the root and a leaf, was 15; and the threshold value of the partitioning criterion for splitting the data in a node was 0.1.
The entropy Info(D) is calculated using the formula:

$$\mathrm{Info}(D) = -\sum_{i=1}^{n} p(i)\,\log_2 p(i),$$

where D is a dataset containing n different classes C_i (i = 1, 2, 3, ..., n), and p(i) is the probability, or fraction, of samples belonging to class C_i at node t [16, 17].
In the model for patients with erythema, n=2 (classes: patients with mixed infection of the erythematous form of borreliosis with tick-borne encephalitis, and patients with monoinfection with the erythematous form of borreliosis). In the decision tree for patients without erythema, n=3 (classes: patients with mixed infection of the non-erythematous form of ixodid tick-borne borreliosis with tick-borne encephalitis, patients with monoinfection with the non-erythematous form of ixodid tick-borne borreliosis, and patients with monoinfection with the febrile form of tick-borne encephalitis).
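As a worked illustration, the node entropy can be computed from the class counts of a node. The sketch below assumes the usual Shannon entropy with a base-2 logarithm; under that assumption, the training-sample class sizes from Table 1 give root entropies that match the values of 1.50 and 1.0 reported in the Results. The function and variable names are illustrative only.

```python
import math


def entropy(class_counts):
    """Shannon entropy (base 2) of a node, given its per-class patient counts."""
    total = sum(class_counts)
    return -sum(
        (c / total) * math.log2(c / total) for c in class_counts if c > 0
    )


# Training-sample class sizes from Table 1.
root_without_erythema = [27, 65, 56]  # classes 1, 2, 3
root_with_erythema = [26, 30]         # classes 4, 5

print(round(entropy(root_without_erythema), 2))
print(round(entropy(root_with_erythema), 2))
```

A node containing a single class has an entropy of 0, and n equally represented classes give the maximum, log2(n).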
In the ID3 algorithm we used, the decision tree is built recursively: at each node, the predictor and the split are selected according to the information gain criterion.
The information gain depends on the degree of uncertainty, or impurity, of the classes in the node and represents the expected decrease in entropy, Info(D) − InfoA(D), obtained by splitting on feature A. The lower the entropy, the less heterogeneity or randomness in the system. At each node, the tree algorithm selects the split point (split value) with the highest information gain, i.e. the one that most reduces the uncertainty of the outcome, and splits the data according to whether the predictor value is less than or equal to, or greater than, a threshold value. When no further partitioning can be done, the tree is considered fully grown and each terminal node, or leaf, contains records of a single class, which in this study is a variant of the diagnosis.
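The information gain of a single split can be illustrated on the root split of the model for patients with erythema. In the sketch below, the child-node counts are inferred from the numbers of patients and percentages given in the Results (22 of 26 patients with borreliosis monoinfection at or below 37.40 °C, 22 of 30 with mixed infection above it); the base-2 entropy and the function names are assumptions made for illustration.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of a node from its per-class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Counts are [mixed infection, borreliosis monoinfection], training sample.
parent = [26, 30]
at_or_below_37_4 = [4, 22]   # inferred: 22 of 26 patients with monoinfection
above_37_4 = [22, 8]         # inferred: 22 of 30 patients with mixed infection

gain = information_gain(parent, [at_or_below_37_4, above_37_4])
print(f"information gain at the root split: {gain:.2f}")
```

An ID3-style algorithm evaluates this quantity for every candidate predictor and threshold and keeps the split with the largest gain.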
The input variables, i.e. the raw data included in the models, were either categorical with a binary response (“Yes”=1 or “No”=0) or quantitative (numerical) [16, 17].
Tree models allow the importance of predictors to be assessed. Feature importance (FI) indicates which variables have the strongest effect in classification models. To determine feature importance, the scikit-learn (sklearn) Python library was used; its “feature_importances_” attribute, available in the “forests of trees” models, shows which variables are most effective for splitting the data to distinguish between the classes studied [18].
The accuracy and performance of the models were assessed by the area under the receiver operating characteristic curve (AUC) and by sensitivity (recall), specificity, accuracy, precision (positive predictive value) and the F1 score, which were calculated from the confusion matrix, i.e. the table of predicted and actual values of the classifier [18, 19]. AUC scores are presented as M±SD, where M is the mean and SD is the standard deviation.
RESULTS
Objects (participants) of the study
Among 211 patients with tick-borne infections without erythema, we searched for clinical and laboratory predictors to distinguish between three classes of patients depending on the final diagnosis: mixed infection of the non-erythematous form of ixodid tick-borne borreliosis with the febrile form of tick-borne encephalitis (class 1, n=38, mean age: 48.53±3.41 years), monoinfection with the non-erythematous form of ixodid tick-borne borreliosis (class 2, n=93, mean age: 47.10±3.12 years) or monoinfection with the febrile form of tick-borne encephalitis (class 3, n=80, mean age: 45.33±3.42 years). Among the 80 patients with erythema, analysis of predictor variables allowed a differential diagnosis between two classes of patients: mixed infection of the erythematous form of ixodid tick-borne borreliosis with the febrile form of tick-borne encephalitis (class 4, n=37, mean age: 47.36±4.1 years) and monoinfection with the erythematous form of ixodid tick-borne borreliosis (class 5, n=43, mean age: 49.21±3.20 years).
Each of the patient classes was divided into training and test samples in a ratio of 70% to 30% (Table 1). In each class, the training and test samples were comparable in the sex and age of the patients.
Table 1. Number and sex distribution of patients in the training and test datasets, by clinical form of tick-borne infection with or without erythema migrans at the site of the ixodid tick bite
| Dataset | Sex | Number of patients, abs. (%) | | | | |
| | | Tick-borne infections without erythema | | | Tick-borne infections with erythema | |
| | | Class 1, SI (n=38) | Class 2, MI LB (n=93) | Class 3, MI TBE (n=80) | Class 4, SI (n=37) | Class 5, MI LB (n=43) |
| Training | Total | 27 (71.0) | 65 (70.0) | 56 (70.0) | 26 (70.3) | 30 (69.8) |
| Training | Men | 15 (55.6) | 31 (47.7) | 30 (53.6) | 16 (61.5) | 15 (50.0) |
| Training | Women | 12 (44.4) | 34 (52.3) | 26 (46.4) | 10 (38.5) | 15 (50.0) |
| Test | Total | 11 (29.0) | 28 (30.0) | 24 (30.0) | 11 (29.7) | 13 (30.2) |
| Test | Men | 6 (54.5) | 13 (46.4) | 13 (54.2) | 6 (54.5) | 7 (53.8) |
| Test | Women | 5 (45.5) | 15 (53.6) | 11 (45.8) | 5 (45.5) | 6 (46.2) |
Note: SI: mixed infection; MI: monoinfection; LB: ixodid tick-borne borreliosis (Lyme borreliosis); TBE: tick-borne encephalitis. Percentages in the Total rows are shares of the class; percentages for men and women are shares of the corresponding dataset.
The main results of the study
When building the decision tree models for early differential diagnosis in patients with suspected isolated and mixed forms of tick-borne infections, with or without erythema migrans at the site of the ixodid tick bite, the analysis included 55 input predictor variables obtained during clinical and laboratory examination in the first week of the disease, before verification of the final diagnosis. As a result, non-linear relationships were established between the values of certain predictors and the classes, i.e. the variants of the patients' final diagnoses. The decision trees were constructed from the predictor variables with the highest importance values for the differential diagnosis of mixed infection and monoinfections in the groups of patients with tick-borne infections.
By determining the relative importance score of each feature, the following predictors were found to have the strongest discriminatory power for early differential diagnosis between the three classes of patients without erythema migrans (mixed infection of the non-erythematous form of ixodid tick-borne borreliosis with tick-borne encephalitis, monoinfection with the non-erythematous form of ixodid tick-borne borreliosis, or monoinfection with the febrile form of tick-borne encephalitis): maximum fever height (FI=0.26), ESR (FI=0.16), neutrophil-to-monocyte ratio, ISNM (FI=0.15), chills (FI=0.14), absolute IG count (FI=0.14), absolute RE-LYMP count (FI=0.11) and relative eosinophil (EO) count (FI=0.05) in the haemogram.
The most important variables for differential diagnosis between patients with the erythematous form of isolated borreliosis and those with mixed infection of ixodid tick-borne borreliosis with tick-borne encephalitis were the following four attributes, in descending order of importance: maximum fever height (FI=0.34), absolute IG count (FI=0.28), relative basophil (BASO) count (FI=0.21) and absolute RE-LYMP count (FI=0.19) in the haemogram.
A decision tree model for early differential diagnosis in patients with suspected tick-borne infections who do not have erythema migrans at the site of the ixodid tick bite is shown in Figure 1.
Fig. 1. Decision tree model for early clinical differential diagnosis between the following three classes of patients without erythema migrans at the site of the ixodid tick bite: mixed infection of the non-erythematous form of Lyme borreliosis and tick-borne encephalitis, monoinfection of the non-erythematous form of Lyme borreliosis, or monoinfection of tick-borne encephalitis. Labels in the figure: ИКБ: monoinfection of Lyme borreliosis; КЭ: monoinfection of tick-borne encephalitis; СИ: mixed infection of Lyme borreliosis and tick-borne encephalitis; ИСНМ: neutrophil-to-monocyte ratio, units; СОЭ: erythrocyte sedimentation rate, mm/h; EO: number of eosinophils, %; RE-LYMP: absolute number of reactive lymphocytes, ×10⁹/L; IG: absolute number of immature granulocytes, ×10⁹/L.
The root node of the training sample of this model contained all three classes of patients (mixed infection of the non-erythematous form of ixodid tick-borne borreliosis with tick-borne encephalitis, monoinfection with the non-erythematous form of ixodid tick-borne borreliosis, and monoinfection with the febrile form of tick-borne encephalitis), so the entropy at the first node was 1.50. The maximum fever height, with a threshold of ≤38.0 °C versus >38.0 °C, was chosen as the initial predictor variable at the root node. This decision tree model therefore had two main branches for classifying patients without erythema migrans. In the branch with maximum body temperature ≤38.0 °C at the split point, the next four nodes of the model used the following predictors: the neutrophil-to-monocyte ratio (ISNM) with a threshold of 9.50 units, ESR with a threshold of 9.50 mm/h, the absolute RE-LYMP count with a threshold of 0.075×10⁹/L, and the relative EO count with a threshold of 1.60%.
At each node, the statistical significance of the differences was determined and the OR was calculated for the predominant class compared with the remaining patients, who had the other two variants of tick-borne infection diagnosis.
Among the 95 patients with subfebrile or normal body temperature, all three classes were represented, but patients diagnosed with monoinfection with the non-erythematous form of borreliosis predominated: 61 patients, or 64.2% (OR=21.98 (7.30–66.17), p <0.001). When the two following rules were also met, ISNM ≤9.50 units and ESR ≤9.50 mm/h, 43 patients, or 86.0% (OR=7.29 (2.58–20.62), χ²=15.74, p <0.001), had this diagnosis at the terminal node, and entropy decreased to 0.58.
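The odds ratio reported for the root split can be illustrated with a short calculation. The sketch below is an assumption-laden reconstruction rather than the authors' procedure: the 2×2 table is inferred from the text and Table 1 (61 of the 95 patients at or below 38.0 °C and, by subtraction from the 65 training patients of class 2, 4 of the 53 patients above it had borreliosis monoinfection), and the confidence interval uses the Woolf (logit) method, which may differ from the method used in Epi Info. Under these assumptions the result should closely match the reported OR=21.98 (7.30–66.17).

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio of the 2x2 table [[a, b], [c, d]] with a Woolf (logit) 95% CI."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Rows: maximum temperature <=38.0 C vs >38.0 C.
# Columns: borreliosis monoinfection (class 2) vs the other two classes.
or_value, lower, upper = odds_ratio_ci(a=61, b=34, c=4, d=49)
print(f"OR={or_value:.2f} ({lower:.2f}-{upper:.2f})")
```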
If the following rule was met in patients with subfebrile body temperature: ISNM ≤9.50 units, ESR >9.50 mm/h and EO ≤1.60%, then 100% of patients in the terminal node had a diagnosis of the febrile form of tick-borne encephalitis, and entropy decreased to 0. If the relative EO count in peripheral blood was greater than 1.60%, the most likely diagnosis was monoinfection with the non-erythematous form of ixodid tick-borne borreliosis, which 16 (88.89%) patients had, and entropy was 0.50. Among the patients with maximum body temperature not exceeding 38.0 °C who had an ISNM >9.50 units, patients with mixed infection of the non-erythematous form of ixodid tick-borne borreliosis and tick-borne encephalitis predominated, accounting for 8 (80.0%) patients in the node (OR=52.0 (8.94–301.64), p <0.001). The combination of subfebrile fever, ISNM >9.50 units and RE-LYMP ≤0.075×10⁹/L was characteristic exclusively of patients diagnosed with mixed infection of the non-erythematous form of ixodid tick-borne borreliosis and tick-borne encephalitis, i.e. 8 (100%) patients in the terminal node. A RE-LYMP value >0.075×10⁹/L corresponded in 100% of cases to the class of patients with monoinfection with the non-erythematous form of ixodid tick-borne borreliosis. The entropy at these two terminal nodes was therefore 0.
The branch with maximum fever above the threshold of 38.0 °C at the split point included two further predictor variables: the presence or absence of chills, and the absolute IG count in the haemogram with a threshold of 0.025×10⁹/L. Overall, of the 53 patients with febrile fever, 36 (67.92%) were diagnosed with monoinfection with the febrile form of tick-borne encephalitis (OR=7.94 (3.71–16.96), χ²=31.78, p <0.001). If febrile fever was accompanied by chills, this diagnosis was present in 25 patients, or 96.15%, that is, in the vast majority (OR=36.36 (4.27–309.43), p <0.001), and the entropy at the terminal node decreased to 0.24. When patients with febrile temperature had no chills and IG ≤0.025×10⁹/L, the dominant class was mixed infection of the non-erythematous form of ixodid tick-borne borreliosis with tick-borne encephalitis, accounting for 12 patients, or 80.0%; the entropy at this node was 0.72. If IG was >0.025×10⁹/L, the terminal node was dominated by patients with monoinfection with the febrile form of tick-borne encephalitis: 11 patients, or 91.67%; the entropy at this node was 0.40. Thus, this decision tree model had eight terminal nodes: in three of them the predominant class was monoinfection with the non-erythematous form of ixodid tick-borne borreliosis, in three others it was monoinfection with the febrile form of tick-borne encephalitis, and in the remaining two the probability of mixed infection of non-erythematous ixodid tick-borne borreliosis with febrile tick-borne encephalitis was increased.
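The eight decision rules described above can be written as nested conditions. The sketch below is only a reading aid that transcribes the thresholds from the text; the function name, argument names and label strings are hypothetical, each leaf returns its predominant class rather than a certain diagnosis, and the code is not a validated clinical tool.

```python
MIXED = "mixed infection"
LB_MONO = "borreliosis monoinfection"
TBE_MONO = "TBE monoinfection"


def classify_without_erythema(max_temp_c, chills, isnm, esr, re_lymp, ig, eo_pct):
    """Return the predominant class of the leaf reached by one patient.

    Units: temperature in degrees C, ISNM in units, ESR in mm/h,
    RE-LYMP and IG in 10^9/L, eosinophils in %.
    """
    if max_temp_c <= 38.0:
        if isnm <= 9.50:
            if esr <= 9.50:
                return LB_MONO
            return TBE_MONO if eo_pct <= 1.60 else LB_MONO
        return MIXED if re_lymp <= 0.075 else LB_MONO
    if chills:
        return TBE_MONO
    return MIXED if ig <= 0.025 else TBE_MONO


# Hypothetical patient: 37.6 C, no chills, ISNM 11.2, RE-LYMP 0.05 x10^9/L.
print(classify_without_erythema(37.6, False, 11.2, 12.0, 0.05, 0.01, 1.0))
```

The tree tests at most four predictors for any one patient, so several arguments go unused on each path.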
Fig. 2. Decision tree model for early clinical differential diagnosis between the following two classes of patients with erythema migrans at the site of the ixodid tick bite: mixed infection of the erythematous form of Lyme borreliosis and tick-borne encephalitis, or monoinfection of the erythematous form of Lyme borreliosis. Labels in the figure: BASO: number of basophils, %; IG: absolute number of immature granulocytes, ×10⁹/L; RE-LYMP: absolute number of reactive lymphocytes, ×10⁹/L.
Figure 2 shows the decision tree model, obtained on the training sample, for early clinical differential diagnosis between patients with monoinfection with the erythematous form of ixodid tick-borne borreliosis and patients with mixed infection of the erythematous form of ixodid tick-borne borreliosis with tick-borne encephalitis.
As in the model for tick-borne infection patients without erythema, maximum fever height was selected as the predictor variable at the root node. The entropy at the first node was 1.0. If the value of this predictor was less than or equal to the threshold of 37.40 °C at the split point, patients with erythema had 15.13 (3.97–57.64) times greater odds of being diagnosed with monoinfection with the erythematous form of ixodid tick-borne borreliosis (22 patients, or 84.61%, at the node) than with its combined course with tick-borne encephalitis (p <0.001). In this branch of the tree, the subsequent predictor was the IG count with a threshold of 0.005×10⁹/L.
When the rule was met: body temperature ≤37.40 °C and IG ≤0.005×10⁹/L, 100% of patients were diagnosed with monoinfection with the erythematous form of ixodid tick-borne borreliosis, and entropy was 0. When the IG value was above 0.005×10⁹/L, the most likely diagnosis, in 80.0% of patients, was mixed infection of the erythematous form of ixodid tick-borne borreliosis with the febrile form of tick-borne encephalitis, and entropy was 0.72.
Among patients with fever above 37.40 °C, the diagnosis of mixed infection of the erythematous form of ixodid tick-borne borreliosis with tick-borne encephalitis was dominant (22 patients, or 73.33%, in the node), indicating a relatively more severe febrile syndrome in this group. In this branch of the algorithm, the discriminatory variables at the second and third nodes were the relative BASO count and the absolute RE-LYMP count in peripheral blood. Under the rule BASO ≤0.25%, the only class in the terminal node was the 14 patients diagnosed with mixed infection of the erythematous form of ixodid tick-borne borreliosis with the febrile form of tick-borne encephalitis, and the entropy was 0. If the value of this indicator at the split point was greater than 0.25%, patients were equally likely (50%) to have a diagnosis of mixed infection or monoinfection, which required another variable at the third node of the decision tree: the absolute RE-LYMP count. Under the rule BASO >0.25% and RE-LYMP ≤0.21×10⁹/L, the class with the highest probability was patients diagnosed with mixed infection of the erythematous form of ixodid tick-borne borreliosis with the febrile form of tick-borne encephalitis, i.e. 8 patients, or 80.0%. If RE-LYMP exceeded 0.21×10⁹/L, the only class in the terminal node was patients with monoinfection with the erythematous form of ixodid tick-borne borreliosis. The entropy was 0.72 in the first case and 0 in the second. As a result, the decision tree algorithm had five terminal nodes: in three of them the prevailing class was mixed infection of erythematous ixodid tick-borne borreliosis with the febrile form of tick-borne encephalitis, and in the other two the probability of monoinfection with erythematous ixodid tick-borne borreliosis was increased.
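The five terminal nodes of this second model can likewise be transcribed as nested conditions. In the sketch below, each leaf returns its predominant class together with the share of that class in the training sample, as stated in the text; the function name, argument names and label strings are hypothetical, and the code is a reading aid rather than a clinical tool.

```python
MIXED = "mixed infection"
LB_MONO = "borreliosis monoinfection"


def classify_with_erythema(max_temp_c, ig, baso_pct, re_lymp):
    """Return (predominant class, its training-sample share in the leaf).

    Units: temperature in degrees C, IG and RE-LYMP in 10^9/L, basophils in %.
    """
    if max_temp_c <= 37.40:
        return (LB_MONO, 1.00) if ig <= 0.005 else (MIXED, 0.80)
    if baso_pct <= 0.25:
        return (MIXED, 1.00)
    return (MIXED, 0.80) if re_lymp <= 0.21 else (LB_MONO, 1.00)


# Hypothetical patient: 38.1 C, basophils 0.4%, RE-LYMP 0.15 x10^9/L.
label, share = classify_with_erythema(38.1, ig=0.01, baso_pct=0.4, re_lymp=0.15)
print(label, share)
```

Returning the leaf share alongside the label makes it visible which rules end in a pure node and which leave some residual uncertainty.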
Additional findings from the study
The predictive value of the predictors and rules of both decision tree algorithms for differential diagnosis of tick-borne infections was validated on the test sample. Table 2 shows the test-sample confusion matrix used to evaluate the accuracy of the decision tree models for early differential diagnosis of tick-borne infection patients with or without erythema migrans at the site of the ixodid tick bite.
The performance of the constructed decision tree models after validation on the test sample is presented in Table 3. Both models showed high predictive performance on metrics such as sensitivity, specificity, accuracy, precision and F1 score. In particular, on the test sample the sensitivity and specificity of decision tree model 1 for the three classes of patients without erythema were: mixed infection of non-erythematous borreliosis with tick-borne encephalitis (72.72 and 92.31%), monoinfection with non-erythematous borreliosis (89.29 and 91.43%) and monoinfection with febrile tick-borne encephalitis (83.33 and 92.31%). For model 2, which differentiates between the two classes of patients with erythema, they were: mixed infection of erythematous borreliosis with tick-borne encephalitis (81.81 and 84.61%) and monoinfection with erythematous borreliosis (84.62 and 81.81%). These values indicated that there was no overfitting and that the resulting decision tree algorithms are applicable to other comparable samples.
Table 2. Confusion matrix of the decision tree models for early differential diagnosis of patients with different clinical forms of tick-borne infections with or without erythema migrans at the site of the ixodid tick bite (test sample)
| Predicted class | Actual class | | | | |
| | Model 1 for tick-borne infections without erythema | | | Model 2 for tick-borne infections with erythema | |
| | Class 1, SI | Class 2, MI LB | Class 3, MI TBE | Class 4, SI | Class 5, MI LB |
| SI (Class 1 or 4) | 8 | 2 | 2 | 9 | 2 |
| MI LB (Class 2 or 5) | 1 | 25 | 2 | 2 | 11 |
| MI TBE (Class 3) | 2 | 1 | 20 | – | – |
| Total | 11 | 28 | 24 | 11 | 13 |
Note: SI: mixed infection; MI: monoinfection; LB: ixodid tick-borne borreliosis (Lyme borreliosis); TBE: tick-borne encephalitis. Rows give the predicted class and columns the actual class; the column totals equal the test-sample size of each class.
To evaluate the quality of the trained models, the diagnosis of each patient in the test samples was predicted from the predictors. ROC curves were plotted from the actual and predicted values, and the AUC was calculated. Both decision tree algorithms showed excellent discriminative ability, i.e. the ability to assign patients correctly to their respective diagnostic classes, as the AUC was greater than 0.90 in all cases (see Table 3).
Table 3. Accuracy evaluation of decision tree models for early differential diagnosis of patients with different clinical forms of tick-borne infections with or without erythema migrans at the site of the ixodid tick bite
| Evaluation criterion | Model 1, Class 1 SI (n=11) | Model 1, Class 2 MI ICBM (n=28) | Model 1, Class 3 MI KE (n=24) | Model 2, Class 4 SI (n=11) | Model 2, Class 5 MI ICBM (n=13) |
| Sensitivity, % | 72.72 | 89.29 | 83.33 | 81.81 | 84.62 |
| Specificity, % | 92.31 | 91.43 | 92.31 | 84.61 | 81.81 |
| Accuracy, % | 88.89 | 90.48 | 88.89 | 83.33 | 83.33 |
| Precision, % | 66.67 | 89.29 | 86.96 | 81.81 | 84.62 |
| F1 score, % | 69.56 | 89.29 | 85.11 | 81.81 | 84.62 |
| AUC, M±SD | 0.91±0.15 | 0.93±0.12 | 0.95±0.13 | 0.94±0.06 | 0.94±0.06 |
Note: Model 1: tick-borne infections without erythema; model 2: tick-borne infections with erythema. SI: mixed infection; MI: monoinfection; ICBM: ixodes tick-borne borreliosis; KE: tick-borne encephalitis.
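For readers less familiar with ROC analysis, the AUC for one class against the rest can be read as the probability that a randomly chosen patient of that class receives a higher predicted score than a randomly chosen patient of another class. The sketch below illustrates that definition only. It is not the authors' pipeline, the function name `auc_one_vs_rest` is an assumption, and the scores and labels are made-up values rather than study data.

```python
# AUC computed as the share of correctly ranked (positive, negative) pairs.
# Scores and labels are hypothetical and do not come from the study.

def auc_one_vs_rest(scores, labels, positive):
    """Return the AUC for `positive` versus all other labels; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of the class "mixed" for eight test patients.
scores = [0.90, 0.80, 0.75, 0.40, 0.60, 0.30, 0.20, 0.10]
labels = ["mixed", "mixed", "mixed", "mixed", "mono", "mono", "mono", "mono"]

print(auc_one_vs_rest(scores, labels, "mixed"))  # 15 of 16 pairs ranked correctly -> 0.9375
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the values above 0.90 are interpreted.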
DISCUSSION
Summary of the main result of the study
Using clinical and laboratory parameters, two decision tree algorithms with high sensitivity, specificity and accuracy were developed for early clinical differential diagnosis between isolated and mixed forms of acute ixodes tick-borne borreliosis and tick-borne encephalitis in which febrile intoxication syndrome predominates in the clinical picture. The first algorithm differentiates, at disease onset, between patients without erythema who were diagnosed with mixed infection of the non-erythematous form of borreliosis with tick-borne encephalitis, monoinfection with the non-erythematous form of borreliosis, or monoinfection with the febrile form of tick-borne encephalitis. Its most important predictors were maximum fever height, chills, the ISNM index, erythrocyte sedimentation rate (ESR), absolute counts of reactive lymphocytes (RE-LYMP) and immature granulocytes (IG), and eosinophil percentage. The second model differentiates between patients with erythema who were diagnosed with mixed infection of erythematous borreliosis with tick-borne encephalitis or monoinfection with erythematous borreliosis. Its predictors were maximum fever height, absolute RE-LYMP and IG counts, and basophil percentage.
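To illustrate how a decision tree turns a continuous predictor such as maximum fever height into a diagnostic rule, the sketch below searches for the cut-off that minimises the weighted Gini impurity of the two resulting groups. This is a generic illustration under stated assumptions: Gini impurity is a common split criterion but the criterion used in the study is not stated here, the function names are illustrative, and the temperatures and labels are invented toy values, not the study's data or cut-offs.

```python
# Choosing a single split threshold by minimising weighted Gini impurity.
# Toy data only: the values below are hypothetical and not taken from the study.

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best rule `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no cut-off can separate identical values
        threshold = (low + high) / 2
        left = [y for v, y in pairs if v <= threshold]
        right = [y for v, y in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical maximum body temperatures (°C) and diagnoses for eight patients.
fever = [37.4, 37.6, 37.8, 38.0, 38.6, 38.9, 39.2, 39.5]
diagnosis = ["mono"] * 4 + ["mixed"] * 4

print(best_threshold(fever, diagnosis))
```

A full tree repeats this search over all predictors at every node, which is why predictors that yield purer groups rank higher in importance.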
Discussion of the main result of the study
The clinical picture of mixed infection of ixodes tick-borne borreliosis with tick-borne encephalitis is extremely diverse and depends largely on the classification applied and on the form and stage of these diseases. The differential diagnosis of tick-borne infections can therefore be difficult, especially in the first week of illness, and requires fundamentally new approaches [3–5]. Only a few published studies have built multiple linear or logistic regression models to predict the outcome of ixodes tick-borne borreliosis (Lyme borreliosis) from clinical or immunological data [20, 21].
Decision tree algorithms are currently used for clinical differential diagnosis and medical triage strategies for patients with various infectious diseases accompanied by fever syndrome. In particular, clinical and laboratory data, including haemogram parameters, are used in decision tree models for early differential diagnosis of dengue fever and COVID-19 coronavirus infection from other diseases [8, 10]. In addition, decision tree models have been developed to predict the severe course and/or fatal outcome of dengue fever and COVID-19, which is of particular importance in epidemics not only for clinical decision support but also for planning the allocation of health care resources [9, 11].
Mixed infection of ixodes tick-borne borreliosis with tick-borne encephalitis has been shown to have no specific symptoms in the initial period of the disease. As a rule, however, it runs a clinically more severe course than monoinfection with acute ixodes tick-borne borreliosis, owing to more pronounced fever and intoxication syndromes, but a milder course than the isolated form of tick-borne encephalitis [4–6]. The decision tree algorithms we constructed confirmed this, since maximum fever and/or chills were among the most important predictors. In addition, most researchers consider that the non-erythematous form of acute ixodes tick-borne borreliosis has a more severe course than the erythematous form [22, 23].
Among the laboratory parameters that reflect the intensity of the general inflammatory syndrome, the ISNM index, erythrocyte sedimentation rate (ESR) and absolute counts of IG and/or RE-LYMP had differential significance in our decision tree models. Most patients with either mixed infection of ixodes tick-borne borreliosis with tick-borne encephalitis or monoinfection with the febrile form of tick-borne encephalitis are known to show, at the onset of the disease, a tendency towards a moderate increase in ESR and in the neutrophil count, including band neutrophils and immature granulocytes in the peripheral blood leukocyte differential [2, 4, 23], which is consistent with our data. In addition, the lower RE-LYMP count in patients with mixed infection compared with ixodes tick-borne borreliosis monoinfection probably reflects a decrease in the number of activated T lymphocytes during the first days of the disease [3, 24]. An increased immature granulocyte (IG) count and leukocyte ISNM index, together with a decreased RE-LYMP count, are regarded as important markers of severity and/or increased risk of mortality in patients with COVID-19 [25–27] or dengue fever [28].
Changes in the other two haemogram parameters in our models, the eosinophil and basophil counts, appear to be related to differences in the immune response to the virus and to borrelia. In tick-borne infections these parameters remain within normal limits in most cases. In ixodes tick-borne borreliosis monoinfection, however, especially in the erythematous form, they tend to increase significantly in the first days of the disease compared with healthy people and with patients with mixed infection or tick-borne encephalitis monoinfection [22, 23], which was confirmed in the present study.
Limitations of the study
Limitations of this study are that the decision tree models were derived from relatively small samples and trained on data from only one cohort. In addition, these models can only be applied to adult patients and to patients with isolated or mixed forms of acute ixodes tick-borne borreliosis and/or febrile tick-borne encephalitis who were hospitalised in the first week of illness and had no signs of nervous system involvement.
CONCLUSION
Tick-borne encephalitis and ixodes tick-borne borreliosis, the most widespread vector-borne natural focal infections in Russia, often occur as a mixed infection. At the onset of the disease, the mixed infection is often clinically difficult to distinguish from a monoinfection, which may also be due to late laboratory verification of the diagnosis, and this calls for a fundamentally new approach to the early differential diagnosis of tick-borne infections. One such approach is decision tree models, whose main advantages over regression analysis are clarity and ease of interpretation in practical application. Decision tree algorithms built on clinical and laboratory examination data are currently used for early differential diagnosis and for predicting a severe course and/or fatal outcome of various infectious diseases, including dengue fever and COVID-19.
In this study, two decision tree models were built for the first time, using machine learning in the Python programming language, for the early differential diagnosis of patients with isolated or mixed forms of acute ixodes tick-borne borreliosis and tick-borne encephalitis with predominant febrile syndrome in the clinical picture. The models are based on the most important clinical and laboratory predictors in the first week of the disease. One decision tree algorithm is designed for clinical differential diagnosis among patients without erythema migrans at the site of the ixodid tick bite: mixed infection of the non-erythematous form of borreliosis with tick-borne encephalitis, monoinfection with the non-erythematous form of borreliosis, or monoinfection with the febrile form of tick-borne encephalitis. The other is designed for differential diagnosis between patients with erythema who have mixed infection of the erythematous form of borreliosis with tick-borne encephalitis or monoinfection with the erythematous form of borreliosis. Both decision tree models demonstrated high performance in terms of sensitivity, specificity, accuracy, precision and F1 score, and the area under the ROC curve (AUC) exceeded 0.90.
This study suggests that classification algorithms using simple clinical and haematological parameters may be of practical use in determining the tactics of patient management and justifying the indications for choosing adequate etiotropic therapy for tick-borne infections in the first days of illness before the results of laboratory verification of the diagnosis are available.
ADDITIONAL INFORMATION
Funding source. The study was supported by the grant of the Russian Science Foundation No. 22-15-20010, https://rscf.ru/project/22-15-20010/ and the Tomsk Region Administration.
Competing interests. The authors declare that they have no competing interests.
Authors’ contribution. All authors made a substantial contribution to the conception of the work, acquisition, analysis, interpretation of data for the work, drafting and revising the work, final approval of the version to be published and agree to be accountable for all aspects of the work. E.N. Ilyinskikh — formation of the concept and design of the study, editing manuscript, formulation of conclusions and interpretation of the results of the study; E.N. Filatova — selection and approval of clinical research material in accordance with ethical standards, processing of archival patient records, statistical processing of materials, writing the manuscript; K.V. Samoylov — collecting and analyzing literature data, writing the manuscript; A.V. Semenova — processing of archival patient records; S.V. Axyonov — development of mathematical models, statistical processing of materials, writing the manuscript.
About the authors
Ekaterina N. Ilyinskikh
Siberian State Medical University
Author for correspondence.
Email: infconf2009@mail.ru
ORCID iD: 0000-0001-7646-6905
SPIN-code: 5245-5958
Scopus Author ID: 6602611268
ResearcherId: P-1653-2016
MD, Dr. Sci. (Med.), Associate Professor
Russian Federation, 2 Moskovsky trakt, 634050 Tomsk

Evgenia N. Filatova
Siberian State Medical University
Email: synamber@mail.ru
ORCID iD: 0000-0001-9951-8632
SPIN-code: 8094-3417
ResearcherId: AEQ-2635-2022
MD
Russian Federation, 2 Moskovsky trakt, 634050 Tomsk

Kirill V. Samoylov
Siberian State Medical University
Email: samoilov.krl@gmail.com
ORCID iD: 0000-0002-8477-8551
SPIN-code: 4710-0894
ResearcherId: HGC-9557-2022
MD
Russian Federation, 2 Moskovsky trakt, 634050 Tomsk

Alina V. Semenova
Siberian State Medical University
Email: wind_of_change95@mail.ru
ORCID iD: 0000-0001-5195-3897
SPIN-code: 2690-1166
ResearcherId: ACK-7745-2022
MD
Russian Federation, 2 Moskovsky trakt, 634050 Tomsk

Sergey V. Axyonov
Siberian State Medical University
Email: axyonov@tpu.ru
ORCID iD: 0000-0002-1251-7133
SPIN-code: 2229-4552
Scopus Author ID: 55543000900
ResearcherId: F-8210-2017
Cand. Sci. (Eng.), Associate Professor
Russian Federation, 2 Moskovsky trakt, 634050 Tomsk

References
- Lobzin YuV, Uskov AN, Kozlov SS. Laim-borrelioz: iksodovye kleshchevye borreliozy. Saint Petersburg: Foliant; 2000. 156 p. (In Russ).
- Ierusalimsky AP. Kleshchevoi entsefalit: rukovodstvo dlya vrachei. Novosibirsk: Gosudarstvennaya meditsinskaya akademiya MZ RF; 2001. 360 p. (In Russ).
- Bondarenko AL, Zykova IV, Abbasova SV, Tikhomolova EG, Nekhoroshkina EL. Mixed infection of tick-borne encephalitis and ixodes tick-borne borrelioses. Infektsionnye Bolezni. 2011;9(4):54–63. (In Russ).
- Subbotin AV, Semenov VA, Etenko DA. Problema sovremennykh smeshannykh neiroinfektsii, peredayushchikhsya iksodovymi kleshchami. The Russian Archives of Internal Medicine. 2012;(2(4)):35–39. (In Russ).
- Konkova-Reidman AB, Zlobin VI. Clinical polymorphism of ixodes tick-borne borrelioses (mixed infection with tick-borne encephalitis) on the territory of South-Ural region of Russia. Sibirskii meditsinskii zhurnal (Irkutsk). 2011;100(1):17–20. (In Russ).
- Minoranskaya NS, Minoranskaya EI. Clinical and epidemiologic characteristics of Lyme borreliosis and tick-borne encephalitis mixed infection in Krasnoyarsk kray. Kazan Medical Journal. 2013;94(2):211–215. (In Russ).
- Andronova NV, Minoranskaya NS, Minoranskaya EI. The specific immune response and some remote results in the acute course of tick-borne borreliosis and mixed-infection of tick-borne encephalitis and tick-borne borreliosis. Sibirskii meditsinskii zhurnal (Irkutsk). 2011;100(1):54–57. (In Russ).
- Tanner L, Schreiber M, Low JG, et al. Decision tree algorithms predict the diagnosis and outcome of dengue fever in the early phase of illness. PLoS Negl Trop Dis. 2008;2(3):e196. doi: 10.1371/journal.pntd.0000196
- Tamibmaniam J, Hussin N, Cheah WK, Ng KS, Muninathan P. Proposal of a clinical decision tree algorithm using factors associated with severe dengue infection. PLoS One. 2016;11(8):e0161696. doi: 10.1371/journal.pone.0161696
- Dobrijević D, Andrijević L, Antić J, Rakić G, Pastor K. Hemogram-based decision tree models for discriminating COVID-19 from RSV in infants. J Clin Lab Anal. 2023;37(6):e24862. doi: 10.1002/jcla.24862
- Kumaran M, Pham TM, Wang K, et al. Predicting the risk factors associated with severe outcomes among COVID-19 patients-decision tree modeling approach. Front Public Health. 2022;10:838514. doi: 10.3389/fpubh.2022.838514
- Bruce P, Bruce A. Practical Statistics for Data Scientists. Saint Petersburg: BKhV-Peterburg; 2018. 304 p. (In Russ).
- Kowalska-Kępczyńska A, Mleczko M, Domerecka W, Krasowska D, Donica H. Assessment of immune cell activation in pemphigus. Cells. 2022;11(12):1912. doi: 10.3390/cells11121912
- Grebennikova IV, Lidokhova OV, Makeeva AV, et al. Age-depended changes of leukocyte indices in COVID-19. Naučno-medicinskij vestnik Central’nogo Černozem’â. 2022;(87):9–15. (In Russ).
- Petrie A, Sabin K. Medical statistics at a glance. Leonov VP, editor. Moscow: GEOTAR-Media; 2015. 216 p. (In Russ).
- Huang Y, Li S, Lin B, et al. Early detection of college students’ psychological problems based on decision tree model. Front Psychol. 2022;13:946998. doi: 10.3389/fpsyg.2022.946998
- Mukhopadhyay S. Advanced data analytics using Python: with machine learning, deep learning and NLP examples. Kolkata: Apress Berkeley; 2018. 186 p. doi: 10.1007/978-1-4842-3450-1
- Kalafi EY, Nor NAM, Taib NA, et al. Machine learning and deep learning approaches in breast cancer survival prediction using clinical data. Folia Biol (Praha). 2019;65(5-6):212–220.
- Lakshmanan V, Robinson S, Munn M. Machine learning. Design patterns. Saint Petersburg: BKhV-Peterburg; 2022. 448 p. (In Russ).
- Turk SP, Lumbard K, Liepshutz K, et al. Post-treatment Lyme disease symptoms score: Developing a new tool for research. PLoS One. 2019;14(11):e0225012. doi: 10.1371/journal.pone.0225012
- Minoranskaya NS, Uskov AN, Sarap PV. Importance of immune status for prognosis chronic borreliosis infections. Jurnal infektologii. 2014;6(1):35–40. (In Russ).
- Suzdaltcev AA, Karavashkin NV, Kulagina AP. Clinical and epidemiological aspects of ixodic tick-borne borreliosis in the Samara region. Meditsinskii vestnik Bashkortostana. 2021;16(3(93)):27–32. (In Russ).
- Bondarenko AL, Sapozhnikova VV. Analysis of clinical-epidemiological, laboratory parameters and cytokine status in patients with erythematous and non-erythematous forms of ixodes tick borreliosis. Infektsionnye Bolezni. 2018;16(2):34–42. (In Russ). doi: 10.20953/1729-9225-2018-2-34-42
- Blom K, Cuapio A, Sandberg JT, et al. Cell-mediated immune responses and immunopathogenesis of human tick-borne encephalitis virus-infection. Front Immunol. 2018;9:2174. doi: 10.3389/fimmu.2018.02174
- Rutkowska E, Kwiecień I, Kulik K, et al. Usefulness of the new hematological parameter: reactive lymphocytes RE-LYMP with flow cytometry markers of inflammation in COVID-19. Cells. 2021;10(1):82. doi: 10.3390/cells10010082
- Georgakopoulou VE, Makrodimitri S, Triantafyllou M, et al. Immature granulocytes: Innovative biomarker for SARS-CoV-2 infection. Mol Med Rep. 2022;26(1):217. doi: 10.3892/mmr.2022.12733
- Rizo-Téllez SA, Méndez-García LA, Flores-Rebollo C, et al. The neutrophil-to-monocyte ratio and lymphocyte-to-neutrophil ratio at admission predict in-hospital mortality in Mexican patients with severe SARS-CoV-2 Infection (COVID-19). Microorganisms. 2020;8(10):1560. doi: 10.3390/microorganisms8101560
- Oehadian A, Michels M, de Mast Q, et al. New parameters available on Sysmex XE-5000 hematology analyzers contribute to differentiating dengue from leptospirosis and enteric fever. Int J Lab Hematol. 2015;37(6):861–868. doi: 10.1111/ijlh.12422