Table 3.

Prediction models for oral cancer

Part A: Logistic regression, all patients
Variable | OR (95% CI) | P | AUC | Rescaled R²
Univariate models (150 cases/150 controls)
 log2 solCD44 | 2.036 (1.552–2.671) | <0.0001 | 0.681 | 0.137
 Protein | 2.159 (1.288–3.617) | 0.003 | 0.590 | 0.042
Multivariable modelᵃ (149 cases/148 controls)
 log2 solCD44 | 2.684 (1.797–4.010) | <0.0001 | 0.763 | 0.276
 Protein | 0.646 (0.301–1.386) | 0.262 | |
Part B: Logistic regression stratified by HPV status
Variable | OR (95% CI) | P | AUC | Rescaled R²
HPV negative (48 cases/150 controls)
Univariate
 log2 solCD44 | 2.311 (1.561–3.422) | <0.0001 | 0.689 | 0.146
 Protein | 1.838 (0.888–3.807) | 0.101 | 0.562 | 0.020
Multivariable modelᵇ (48 cases/148 controls)
 log2 solCD44 | 4.017 (2.124–7.597) | <0.0001 | 0.771 | 0.275
 Protein | 0.179 (0.052–0.620) | 0.006 | |
HPV positive (31 cases/150 controls)
Univariate
 log2 solCD44 | 2.001 (1.291–3.102) | 0.002 | 0.667 | 0.096
 Protein | 1.882 (0.789–4.492) | 0.154 | 0.567 | 0.018
Multivariable modelᶜ (148 controls)
 log2 solCD44 | 3.079 (1.486–6.378) | 0.003 | 0.773 | 0.221
 Protein | 0.384 (0.080–1.833) | 0.230 | |
Part C: Logistic regression analysis of risk groups derived by multivariate recursive partitioning
Univariate modelᵈ of risk groups based on CD44 and protein levels
Risk level (n = cases + controls) | SolCD44 (ng/mL; level description) | Protein (mg/mL) | OR (95% CI) | Prediction | P | AUC | Rescaled R²
 Low (102 = 29 + 73) | <2.22 (low) | <1.23 (low–medium) | Reference | Control | | 0.722 | 0.227
 Medium (116 = 54 + 62) | ≥2.22 and <5.33 (medium) | ≥0.558 (medium–high) | 2.192 (1.247–3.854) | Control | 0.006 | |
 High (5 = 4 + 1) | <2.22 (low) | ≥1.23 (high) | 10.069 (1.079–93.93) | Case | 0.043 | |
 High (20 = 16 + 4) | ≥2.22 and <5.33 (medium) | <0.558 (low) | 10.069 (3.103–32.672) | Case | 0.0001 | |
 High (57 = 47 + 10) | ≥5.33 (high) | | 11.830 (5.279–26.508) | Case | <0.0001 | |
Multivariable modelᵉ of risk groups based on CD44 and protein levels
Risk level (n) | SolCD44 (ng/mL) | Protein (mg/mL) | OR (95% CI) | Prediction | P | AUC | Rescaled R²
 Low (102) | <2.22 (low) | <1.23 (low–medium) | Reference | Control | | 0.790 | 0.325
 Medium (116) | ≥2.22 and <5.33 (medium) | ≥0.558 (medium–high) | 2.755 (1.483–5.117) | Control | 0.001 | |
 High (5) | <2.22 (low) | ≥1.23 (high) | 5.905 (0.591–59.053) | Case | 0.131 | |
 High (20) | ≥2.22 and <5.33 (medium) | <0.558 (low) | 11.860 (3.312–42.472) | Case | <0.0001 | |
 High (57) | ≥5.33 (high) | | 14.489 (5.973–35.145) | Case | <0.0001 | |
Covariate | OR (95% CI) | P
 SES, high vs. low | 0.577 (0.304–1.094) | 0.092
 White Non-Hispanic vs. Black, age <60 | 7.885 (2.372–26.206) |
 White Hispanic vs. Black, age <60 | 1.767 (0.636–4.907) |
 White Non-Hispanic vs. Black, age ≥60 | 0.799 (0.216–2.956) |
 White Hispanic vs. Black, age ≥60 | 0.382 (0.124–1.175) |
 Age ≥60 vs. <60, Black | 3.099 (0.838–11.457) |
 Age ≥60 vs. <60, White Non-Hispanic | 0.314 (0.111–0.892) |
 Age ≥60 vs. <60, White Hispanic | 0.669 (0.324–1.383) |
 Alcohol, ever vs. never, male | 1.615 (0.713–3.660) |
 Alcohol, ever vs. never, female | 0.202 (0.056–0.726) |
 Male vs. female, alcohol = never | 0.216 (0.062–0.757) |
 Male vs. female, alcohol = ever | 1.723 (0.695–4.273) |
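
The risk groups in Part C are defined by cut-points on solCD44 (ng/mL) and protein (mg/mL), so the assignment can be written as a short decision rule. The following is a minimal sketch, not the authors' code: the function name `risk_group` is hypothetical, and treating solCD44 ≥5.33 as high risk regardless of protein is an assumption based on the empty protein cell in that row.

```python
from typing import Tuple


def risk_group(solcd44_ng_ml: float, protein_mg_ml: float) -> Tuple[str, str]:
    """Return (risk level, predicted class) from the Part C cut-points.

    Hypothetical helper for illustration; thresholds are taken from the table.
    """
    if solcd44_ng_ml >= 5.33:
        # High solCD44: assumed high risk for any protein level
        # (the table lists no protein condition for this group).
        return ("High", "Case")
    if solcd44_ng_ml < 2.22:
        # Low solCD44: risk depends on whether protein is high.
        if protein_mg_ml >= 1.23:
            return ("High", "Case")
        return ("Low", "Control")
    # Medium solCD44 (>= 2.22 and < 5.33): low protein marks high risk.
    if protein_mg_ml < 0.558:
        return ("High", "Case")
    return ("Medium", "Control")


# Illustrative inputs only; these are not patient measurements.
for solcd44, protein in [(1.0, 0.5), (3.0, 0.9), (3.0, 0.3), (1.0, 1.5), (6.0, 0.8)]:
    print(solcd44, protein, risk_group(solcd44, protein))
```

Writing the rule out this way makes the two marker-discordant high-risk groups explicit: low solCD44 with high protein, and medium solCD44 with low protein.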
  • NOTE: Rescaled R², coefficient of determination measuring the dispersion explained by the model. ORs are per 1-unit increase for the continuous variables log2 solCD44, protein, and age, unless categories are specified: race/ethnicity (White Non-Hispanic [WNH] and Black vs. White Hispanic [WH]), gender (male vs. female), smoking and alcohol (ever vs. never), and SES (high vs. low).

  • ᵃ Adjusted for age (P = 0.132), race/ethnicity (P = 0.004), age × race/ethnicity (P = 0.006), gender (P = 0.030), alcohol (P = 0.032), gender × alcohol (P = 0.020), smoking (P = 0.527), and SES (P = 0.042). The model with markers plus covariates (AUC = 0.763) predicted significantly better than the reduced model that included only the potential risk factors and excluded both markers (AUC = 0.686; P = 0.003), indicating that the markers improve prediction beyond what the risk factors alone provide.

  • ᵇ Adjusted for age (P = 0.020), gender (P = 0.009), age × gender (P = 0.008), race/ethnicity (P = 0.740), alcohol (P = 0.183), smoking (P = 0.487), and SES (P = 0.047).

  • ᶜ Adjusted for age (P = 0.052), gender (P = 0.104), age × gender (P = 0.096), race/ethnicity (P = 0.298), alcohol (P = 0.537), smoking (P = 0.131), and SES (P = 0.070).

  • ᵈ The AUC of 0.722 for the risk group model (based on CD44 and protein) differs significantly from the AUC of 0.681 for the univariate log2 solCD44 model (P = 0.025).

  • ᵉ The logistic regression model included CD44-protein risk groups (5 categories, P < 0.0001), age (≥60 vs. <60, P = 0.090), gender (P = 0.017), race/ethnicity (P = 0.001), alcohol (P = 0.014), SES (P = 0.092), and the interactions age × race/ethnicity (P = 0.029) and gender × alcohol (P = 0.007). Smoking (ever vs. never, P = 0.700) and teeth removed (6 or more or all vs. 5 or fewer, P = 0.485) were tested for inclusion in the model (AUC = 0.791) but were removed because they did not improve model fit.
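
The unadjusted odds ratios in the univariate risk-group model (Part C) can be checked directly from the case and control counts given in the "Risk level" column, with the Low group (29 cases, 73 controls) as the reference. The sketch below assumes a standard Wald interval on the log odds ratio; the table does not state how its intervals were computed, although the values obtained this way agree with the printed ones to within rounding. The function name `odds_ratio_wald` is hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_wald(cases, controls, ref_cases, ref_controls, level=0.95):
    """Odds ratio vs. a reference group with a Wald confidence interval.

    Assumption: interval is exp(log(OR) +/- z * SE), where
    SE = sqrt(1/a + 1/b + 1/c + 1/d) over the four cell counts.
    """
    odds_ratio = (cases / controls) / (ref_cases / ref_controls)
    se = math.sqrt(1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# (cases, controls) per risk group, as listed in Part C of the table.
reference = (29, 73)  # Low
groups = {
    "Medium": (54, 62),
    "High (low solCD44, high protein)": (4, 1),
    "High (medium solCD44, low protein)": (16, 4),
    "High (high solCD44)": (47, 10),
}

for name, (cases, controls) in groups.items():
    or_, lo, hi = odds_ratio_wald(cases, controls, *reference)
    print(f"{name}: OR {or_:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

The two smaller high-risk groups share the same point estimate because both have a 4:1 case-to-control ratio (4/1 and 16/4); the group of five simply has a much wider interval.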