Table 1.

Distribution of population variables in the training and test sets by smoking and disease status

| Variable | Training: current smokers, no lung cancer | Training: current smokers, lung cancer | Training: former smokers, no lung cancer | Training: former smokers, lung cancer | Test: current smokers, no lung cancer | Test: current smokers, lung cancer | Test: former smokers, no lung cancer | Test: former smokers, lung cancer | Missing (%) |
|---|---|---|---|---|---|---|---|---|---|
| Total | 73,677 (98.9) | 820 (1.1) | 77,328 (99.61) | 304 (0.39) | 8,187 (98.89) | 92 (1.11) | 8,593 (99.61) | 34 (0.39) | |
| Number of genotyped individuals, n (%) | 508 (44.8) | 626 (55.2) | 678 (74.34) | 234 (25.66) | 57 (47.9) | 62 (52.1) | 72 (72.73) | 27 (27.27) | 98.83 |
| Sociodemographic | | | | | | | | | |
| Sex, n (%) | | | | | | | | | 0 |
| Male | 45,647 (99.23) | 354 (0.77) | 43,178 (99.76) | 104 (0.24) | 5,073 (99.24) | 39 (0.76) | 4,870 (99.75) | 12 (0.25) | |
| Female | 70,169 (98.84) | 820 (1.16) | 74,009 (99.59) | 304 (0.41) | 7,774 (98.83) | 92 (1.17) | 8,225 (99.59) | 34 (0.41) | |
| Dead at censoring, n (%)ᵃ | 3,508 (100) | 0 (0) | 3,319 (100) | 0 (0) | 413 (100) | 0 (0) | 368 (100) | 0 (0) | 0 |
| Age, mean (SD), y | 57.5 (8.8) | 62 (7.3) | 60.5 (9.5) | 65.6 (8.8) | 57.4 (8.8) | 60.6 (8.2) | 60.5 (9.6) | 65.8 (8.5) | 0 |
| Age at recruitment, mean (SD), y | 49.5 (8.7) | 57.2 (7.2) | 52.5 (9.4) | 60.8 (8.4) | 49.5 (8.8) | 56.1 (8.1) | 52.5 (9.5) | 60.8 (8.3) | 0 |
| Follow-up, mean (SD), y | 7.9 (2) | 4.8 (2.6) | 8 (2) | 4.8 (2.6) | 7.9 (2) | 4.5 (2.9) | 8 (2) | 4.9 (2.3) | 0 |
| BMI, mean (SD), kg/m² | 25.7 (4.1) | 25.7 (4.5) | 26.4 (4.1) | 26.7 (3.6) | 25.7 (4.2) | 26 (4.3) | 26.3 (4.1) | 27 (2.5) | 12.56 |
| Education, n (%) | | | | | | | | | 4.68 |
| High school and below | 60,415 (98.84) | 710 (1.16) | 56,190 (99.54) | 259 (0.46) | 6,642 (98.74) | 85 (1.26) | 6,201 (99.57) | 27 (0.43) | |
| Greater than high school | 11,817 (99.32) | 81 (0.68) | 17,532 (99.89) | 20 (0.11) | 1,390 (99.57) | 6 (0.43) | 1,984 (99.7) | 6 (0.3) | |
| Medical history | | | | | | | | | |
| Hay fever, n (%) | | | | | | | | | 73.53 |
| No | 15,925 (99.1) | 145 (0.9) | 16,526 (99.73) | 45 (0.27) | 1,784 (98.84) | 21 (1.16) | 1,815 (99.78) | 4 (0.22) | |
| Yes | 2,664 (99.48) | 14 (0.52) | 3,963 (99.67) | 13 (0.33) | 301 (99.34) | 2 (0.66) | 467 (99.57) | 2 (0.43) | |
| Asthma, n (%) | | | | | | | | | 61.46 |
| No | 30,788 (99.16) | 260 (0.84) | 27,690 (99.73) | 76 (0.27) | 3,393 (99.09) | 31 (0.91) | 3,071 (99.71) | 9 (0.29) | |
| Yes | 1,521 (98.51) | 23 (1.49) | 2,103 (98.92) | 23 (1.08) | 161 (98.17) | 3 (1.83) | 269 (99.26) | 2 (0.74) | |
| Family history of cancer, n (%) | | | | | | | | | 84.02 |
| No | 9,730 (98.81) | 117 (1.19) | 10,880 (99.32) | 74 (0.68) | 1,098 (99.01) | 11 (0.99) | 1,235 (99.2) | 10 (0.8) | |
| Yes | 1,051 (98.32) | 18 (1.68) | 1,409 (99.09) | 13 (0.91) | 121 (99.18) | 1 (0.82) | 157 (98.74) | 2 (1.26) | |
| Smoking exposures | | | | | | | | | |
| Smoking intensity, mean (SD), cigarettes per day | 13.5 (7.5) | 17.6 (7.4) | 13.1 (9.2) | 17.8 (10.9) | 13.5 (7.4) | 17.2 (7.9) | 13 (9.2) | 17.1 (11.6) | 0 |
| Smoking duration, mean (SD), y | 30.3 (9.7) | 39.5 (8.2) | 19 (10.7) | 31.4 (12.1) | 30.2 (9.8) | 38.8 (8.4) | 18.9 (10.7) | 31.7 (11) | 0 |
| Quit time, mean (SD), y | 0 (0.1) | 0 (0.1) | 15 (9.9) | 11.9 (9.3) | 0 (0.1) | 0 (0.1) | 15 (9.9) | 10.8 (7.4) | 0 |
| Age start smoking, mean (SD), y | 27.2 (6.1) | 22.5 (5.1) | 26.5 (4.8) | 22.3 (5.2) | 27.2 (6.2) | 21.8 (4.2) | 26.5 (4.9) | 23.2 (5.9) | 0 |
| Cigarettes per day, n (%) | | | | | | | | | 0 |
| ≤15 | 28,222 (98.21) | 515 (1.79) | 29,174 (99.37) | 186 (0.63) | 3,203 (98.31) | 55 (1.69) | 3,242 (99.42) | 19 (0.58) | |
| >15 | 45,455 (99.33) | 305 (0.67) | 48,154 (99.76) | 118 (0.24) | 4,984 (99.26) | 37 (0.74) | 5,351 (99.72) | 15 (0.28) | |
| Occupational exposures | | | | | | | | | |
| Silica, n (%) | | | | | | | | | 42.06 |
| Not exposed | 44,057 (98.64) | 609 (1.36) | 43,604 (99.51) | 216 (0.49) | 4,944 (98.58) | 71 (1.42) | 4,870 (99.47) | 26 (0.53) | |
| Exposed | 1,125 (97.74) | 26 (2.26) | 1,153 (99.31) | 8 (0.69) | 118 (98.33) | 2 (1.67) | 104 (99.05) | 1 (0.95) | |
| PAH, n (%) | | | | | | | | | 42.06 |
| Not exposed | 39,750 (98.72) | 514 (1.28) | 39,620 (99.51) | 194 (0.49) | 4,419 (98.55) | 65 (1.45) | 4,412 (99.53) | 21 (0.47) | |
| Exposed | 5,432 (97.82) | 121 (2.18) | 5,137 (99.42) | 30 (0.58) | 643 (98.77) | 8 (1.23) | 562 (98.94) | 6 (1.06) | |
| Metal, n (%) | | | | | | | | | 46.19 |
| Not exposed | 32,596 (98.65) | 447 (1.35) | 34,035 (99.56) | 150 (0.44) | 3,661 (98.79) | 45 (1.21) | 3,831 (99.56) | 17 (0.44) | |
| Exposed | 6,827 (98) | 139 (2) | 7,521 (99.31) | 52 (0.69) | 760 (97.31) | 21 (2.69) | 806 (99.02) | 8 (0.98) | |
| Asbestos, n (%) | | | | | | | | | 42.06 |
| Not exposed | 39,734 (98.73) | 513 (1.27) | 38,974 (99.54) | 182 (0.46) | 4,425 (98.75) | 56 (1.25) | 4,334 (99.54) | 20 (0.46) | |
| Exposed | 5,448 (97.81) | 122 (2.19) | 5,783 (99.28) | 42 (0.72) | 637 (97.4) | 17 (2.6) | 640 (98.92) | 7 (1.08) | |
ᵃ Numbers are taken at the time of censoring; cases were censored at diagnosis, before death.
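The n (%) entries read as row percentages: within each smoking group and data split, the no-lung-cancer and lung-cancer shares of a category sum to 100% (for example, 820 / (73,677 + 820) ≈ 1.1% in the training current-smoker total). The minimal Python sketch below, assuming that reading, recomputes a few of the reported percentages from the counts; the function name `row_percentages` is hypothetical and not part of the original analysis.

```python
# Sketch: recompute the row percentages reported in Table 1.
# Assumption: each "n (%)" pair is the count in a category and its share of
# that category's total within one smoking group and data split
# (no lung cancer + lung cancer = 100%).

def row_percentages(no_cancer: int, cancer: int, digits: int = 2) -> tuple[float, float]:
    """Return (% without lung cancer, % with lung cancer) for one category."""
    total = no_cancer + cancer
    if total == 0:
        raise ValueError("category has no individuals")
    return (round(100 * no_cancer / total, digits),
            round(100 * cancer / total, digits))


# Counts copied from the training-set columns of the table.
rows = {
    "Total, current smokers": (73_677, 820),
    "Total, former smokers": (77_328, 304),
    "Male, current smokers": (45_647, 354),
    "Asbestos exposed, current smokers": (5_448, 122),
}

for label, (no_lc, lc) in rows.items():
    pct_no, pct_lc = row_percentages(no_lc, lc)
    print(f"{label}: {pct_no}% without / {pct_lc}% with lung cancer")
```

Reading the table this way, the percentages compare case proportions across categories (for instance, exposed versus not exposed) rather than describing the composition of each column.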