Author, year | Data modality | Data set size (train/valid/test) | Inclusion and exclusion criteria (if any) | AI type | Labeling procedure | Pre-processing | Augmentation | Model structure | Performance measurements | Outcome |
---|---|---|---|---|---|---|---|---|---|---|
Akay, G., et al. 2023 [27] | Cephalograms | 588 (447/141) | Inclusion: patients aged 8–22 years, clear C2, C3 and C4, images with no artifacts or distortions | Deep learning | By two dentomaxillofacial radiologists; final decision by agreement among observers | Cropping and labeling, size reduction | NA | CNN | Kappa coefficient | 0.88 |
Precision (per class) | (0.47–0.82) | |||||||||
Recall (per class) | (0.37–0.74) | |||||||||
F1 score (per class) | (0.44–0.76) | |||||||||
Accuracy | 0.57 | |||||||||
Khazaei M. et al. 2023 [28] | Cephalograms | 1846 (1477/-/369) | Inclusion: age between 5–18 years, clear C2, C3 and C4, no trauma or surgery in head and neck area, no orthodontic treatment, no medical condition affecting bone development, no systemic disease, no growth delay, no craniofacial anomalies, no growth disorders and no growth hormone therapy | Deep learning | By two orthodontists | Cropping | Translation, rotation, zoom, intensity shift, normalization | ConvNextBase-296 | Accuracy | 0.82 |
F-score | 0.81 | |||||||||
EfficientNetB3-386 | Accuracy | 0.81 | ||||||||
F-score | 0.80 | |||||||||
DenseNet-121 | Accuracy | 0.80 | ||||||||
F-score | 0.80 | |||||||||
DenseNet-169 | Accuracy | 0.67 | ||||||||
F-score | 0.66 | |||||||||
VGG-16 | Accuracy | 0.75 | ||||||||
F-score | 0.74 | |||||||||
VGG-19 | Accuracy | 0.68 | ||||||||
F-score | 0.67 | |||||||||
ResNet-101 | Accuracy | 0.65 | ||||||||
F-score | 0.65 | |||||||||
ResNet50 | Accuracy | 0.65 | ||||||||
F-score | 0.65 | |||||||||
Li H., et al. 2022 [29] | Cephalograms | 10,200 (7111/1544/1545) | Inclusion: no congenital or acquired malformation of the cervical vertebrae, no trauma or operation in the head and neck area, no disorder affecting bone development, no systemic disease, no growth and development retardation, no congenital or acquired malformations in the head and neck region, clear C2, C3, and C4 | Deep learning | By two orthodontists; in case of disagreement, a third orthodontist was consulted | Automatic ROI extraction using YOLOv3 and shape recognition network | NA | ConvNet | Precision (per class) | (0.57–0.85) |
Recall (per class) | (0.63–0.81) | |||||||||
F1 score (per class) | (0.60–0.81) | |||||||||
Accuracy | 0.70 | |||||||||
AUC | 0.94 | |||||||||
ICC | 0.94 | |||||||||
Makaremi, M., et al. (2019) [30] | Cephalograms | 600 (300/200/100) and 900 and 1900 | NA | Deep learning | By a radiologic technician | Cropping, Sobel filtering, entropy filter | NA | CNN | Precision (per class) 6 layers | (0.59–0.99) |
Recall (per class) 6 layers | (0.67–0.99) | |||||||||
F1 score (per class) 6 layers | (0.74–0.92) | |||||||||
Recall (per class) 7 layers | (0.67–0.99) | |||||||||
Precision (per class) 7 layers | (0.59–0.99) | |||||||||
F1 score (per class) 7 layers | (0.74–0.92) | |||||||||
Zhou, J., et al. 2021 [21] | Cephalograms | 1080 (980/-/100) | Inclusion: clear contour of C2, C3, C4, 6–22 years | Deep learning | By two examiners; in case of disagreement, a third examiner was consulted | Cropping, extracting and crafting the features (measurement between landmarks) | NA | CNN | ICC | 0.98 |
Exclusion: congenital disease | Accuracy | 0.71 | ||||||||
Kim, E.-G et al. 2021 [31] | Cephalograms | 600 (fivefold cross validation) | Inclusion: 6–18 years | Deep learning | By two specialists | Automated ROI extraction using U-net | Rotation, horizontal and vertical flip, changes in brightness, saturation, contrast and hue | CNN | Accuracy | 0.62 |
Makaremi, M., et al. (2020) [32] | Cephalograms | 600 (300/ 200/ 100) and (200/200/200) | NA | Deep learning | By an expert | Cropping, Sobel filtering | NA | CNN | Accuracy | 0.90 |
Kok, H., et al. 2020 [33] | Cephalograms | 360 (fivefold cross validation) | Inclusion: 8–17 years | Deep learning, Machine learning | By an orthodontist | Extracting and crafting the features (measurement between landmarks) | NA | ANN | Accuracy | 0.94 |
Precision (per class) | (0.83–1.0) | |||||||||
Recall (per class) | (0.83–1.0) | |||||||||
F1 score (per class) | (0.83–0.97) | |||||||||
Kappa coefficient | 0.95 | |||||||||
Naïve Bayes model | Accuracy | 0.68 | ||||||||
Exclusion: disease preventing bone development, systemic diseases and syndromes, growth and development retardation, an anomaly with prevention of craniofacial growth, endocrine disorders or malnutrition, long-term infectious disease | Kappa coefficient | 0.61 | ||||||||
Precision (per class) | (0.25–1.0) | |||||||||
Recall (per class) | (0.05–1.0) | |||||||||
F1 score (per class) | (0.08–0.90) | |||||||||
Kok, H., et al. 2021 [34] | Cephalograms | 419 patients (293/ 63/63) | Inclusion: 8–17 years | Deep learning | By an experienced researcher | Extracting and crafting the features (measurement between landmarks) | NA | ANN | Accuracy | 0.94 |
Sensitivity (per class) | (0.88–1.0) | |||||||||
Specificity (per class) | (0.97–1.0) | |||||||||
F1 score (per class) | (0.90–1.0) | |||||||||
Amasya, H., et al. 2020 [35] | Cephalograms | 647 (498/-/149) | Inclusion: no congenital or acquired malformation of the cervical vertebrae, proper visualization of C2, C3, C4 and C5, age between 10 and 30 | Deep learning, Machine learning | By software and two radiologists | Extracting and crafting the features (measurement between landmarks) | NA | ANN | Agreement | 0.86 |
Kappa coefficient (wk) | 0.92 | |||||||||
LR | Agreement | 0.78 | ||||||||
Kappa coefficient (wk) | 0.86 | |||||||||
SVM | Agreement | 0.81 | ||||||||
Kappa coefficient (wk) | 0.87 | |||||||||
RF | Agreement | 0.82 | ||||||||
Kappa coefficient (wk) | 0.90 | |||||||||
DT | Agreement | 0.85 | ||||||||
Kappa coefficient (wk) | 0.92 | |||||||||
Amasya H. et al. 2020 [36] | Cephalograms | 647 | Inclusion: age between 10 and 30, no congenital or acquired malformation of the cervical vertebrae, good quality of C2, C3, C4 and C5; Exclusion: current orthodontic treatment, permanent incisors or first molars missing, erupted or supernumerary teeth overlapping incisor apex, obvious skeletal asymmetry | Deep learning | By software and two radiologists | NA | NA | ANN | Agreement with observers | 0.58 |
Mohammad-Rahimi, H., et al. 2022 [37] | Cephalograms | 890 (692/99/99) | Inclusion: cephalograms with visible C2 to C4 | Deep learning | By two orthodontists | Cropping | Random cropping, random color jitter, random affine, random Gaussian noise | ResNet 101 | Accuracy | 0.61 |
Precision (per class) | (0.25–0.88) | |||||||||
Exclusion: images of patients wearing items, non-standard images, low quality images | Recall (per class) | (0.33–0.78) | ||||||||
F1 score (per class) | (0.29–0.82) | |||||||||
Liao, N., et al. 2022 [38] | Cephalograms | 900 (fivefold cross-validation) | Inclusion: 7–25 years | Deep learning | By three orthodontists and radiologists | Cropping | Random horizontal flipping, color jittering, random rotation | Resnet 50- iCVM | Accuracy (CVM-900, CVM-900-subset) | (0.69, 0.84) |
Kappa coefficient (CVM-900, CVM-900-subset) | (0.94, 0.96) | |||||||||
MAE (CVM-900, CVM-900-subset) | (0.33, 0.16) | |||||||||
Li, H., et al. 2022 [39] | Cephalograms | 6079 (4255/912/912) | Inclusion: complete medical record, qualified cephalograms, age less than 18 years old | Deep learning | By two experienced orthodontists; in case of disagreement, a third orthodontist was consulted | Cropping | Random translation, random rotation, adaptive histogram equalization | Resnet 152 | Kappa coefficient | 0.82 |
AUC | 0.93 | |||||||||
Accuracy | 0.67 | |||||||||
Precision (per class) | (0.52–0.77) | |||||||||
Recall (per class) | (0.52–0.84) | |||||||||
F1 score (per class) | (0.52–0.81) | |||||||||
VGG16 | Kappa coefficient | 0.79 | ||||||||
AUC | 0.92 | |||||||||
Accuracy | 0.61 | |||||||||
GoogLeNet | Kappa coefficient | 0.81 | ||||||||
AUC | 0.92 | |||||||||
Accuracy | 0.64 | |||||||||
Exclusion: syndromes, metabolic disease, special drugs, disease affecting growth and development | DenseNet161 | Kappa coefficient | 0.81 | |||||||
AUC | 0.92 | |||||||||
Accuracy | 0.64 | |||||||||
Seo, H., et al. 2021 [24] | Cephalograms | 600 (480/ -/120) | Inclusion: 6–19 years | Deep learning | By a radiologist | Cropping | Rotation, horizontal and vertical translation, horizontal and vertical scaling | Inception-Resnet v2 | Accuracy | 0.94 ± 0.018 |
Precision | 0.84 ± 0.064 | |||||||||
Recall | 0.84 ± 0.061 | |||||||||
F1 score | 0.84 ± 0.051 | |||||||||
ResNet-18 | Accuracy | 0.92 ± 0.025 | ||||||||
Precision | 0.80 ± 0.094 | |||||||||
Recall | 0.80 ± 0.065 | |||||||||
F1 score | 0.80 ± 0.074 | |||||||||
MobileNet-v2 | Accuracy | 0.91 ± 0.022 | ||||||||
Precision | 0.77 ± 0.111 | |||||||||
Recall | 0.77 ± 0.040 | |||||||||
F1 score | 0.77 ± 0.070 | |||||||||
ResNet-50 | Accuracy | 0.92 ± 0.025 | ||||||||
Precision | 0.80 ± 0.096 | |||||||||
Recall | 0.80 ± 0.068 | |||||||||
F1 score | 0.80 ± 0.075 | |||||||||
ResNet-101 | Accuracy | 0.93 ± 0.020 | ||||||||
Precision | 0.82 ± 0.113 | |||||||||
Recall | 0.83 ± 0.096 | |||||||||
F1 score | 0.82 ± 0.054 | |||||||||
Inception-v3 | Accuracy | 0.93 ± 0.027 | ||||||||
Precision | 0.82 ± 0.119 | |||||||||
Recall | 0.83 ± 0.100 | |||||||||
F1 score | 0.82 ± 0.082 | |||||||||
Atici, S. F., et al. 2023 [40] | Cephalograms | 1012 (823/-/189) | Inclusion: clear and visible C2, C3 and C4 | Deep learning | By an orthodontist | Segmentation and cropping | Random translation, rotation and auto contrast | Aggregate net | Accuracy | (male: 0.75, female: 0.82) |
Exclusion: abnormalities of head and neck, low image quality | Intra-examiner agreement (wk) | 0.95 | ||||||||
Inter-examiner agreement (wk) | 0.90 | |||||||||
Atici, SF., et al. 2022 [25] | Cephalograms | 1018 (761/-/257) | Inclusion: age between 4 and 29, adequate quality, clear C2/C3/C4 | Deep learning, Machine learning | By an expert orthodontist scientist and by an oral and maxillofacial surgeon | Automatic ROI extraction by Aggregate channel features object detector | Not needed | CNN | Accuracy | 0.84 |
Intra-examiner agreement (wk) | 0.95 | |||||||||
Inter-examiner agreement | 0.90 | |||||||||
Recall (per class) | (0.52–0.77) | |||||||||
Precision (per class) | (0.55–0.78) | |||||||||
F1 score (per class) | (0.55–0.76) | |||||||||
MobileNet V2 | Accuracy (with directional filters) | 0.69 | ||||||||
Exclusion: poor quality, head and neck malformation | ResNet101 | Accuracy (with directional filters) | 0.68 | |||||||
Xception | Accuracy (with directional filters) | 0.71 | ||||||||
SVM | Accuracy (with directional filters) | 0.60 | ||||||||
Radwan, M., et al. 2022 [41] | Cephalograms | 1501 (1201/150/150) | Inclusion: patients between 7–25 years | Deep learning | By an orthodontic resident | Automatic ROI extraction using U-net | NA | Alex-net | ICC | 0.97 |
Exclusion: artifacts, incomplete C2, C3 or C4, syndromes affecting the maxillofacial region, incorrect head position | Kappa coefficient | 0.87 ± 0.027 | ||||||||
Accuracy (per class) | (0.80–0.91) | |||||||||
Sensitivity (per class) | (0.45–0.98) | |||||||||
Specificity (per class) | (0.75–0.94) | |||||||||
F1 score (per class) | (0.57–0.90) | |||||||||
Xie, L., et al. 2021 [42] | CBCT | 231 | Inclusion: no history of systemic or physiological disorders, no history of trauma or surgery in the dentofacial region and reliable CBCT scans, female, 7–17 years old | Machine learning | By three orthodontists | Reorientation, MPR mode, extracting and crafting the features (measurement between landmarks) | NA | LR | Accuracy | 0.87 |
AUC | 0.94 | |||||||||
Kok, H., et al. 2019 [13] | Cephalograms | 300 (fivefold cross validation) | Inclusion: 8–17 years, balance quality, clear C2/C3/C4, no trauma, operation, congenital or acquired malformations in the head and neck area, no history of orthodontic treatment, no disorder interfering with bone development, no systemic disease or growth and development retardation | Deep learning, Machine learning | By an orthodontist | Extracting and crafting the features (measurement between landmarks) | NA | DT | Classification accuracy (per class) | (0.83–0.99) |
AUC (per class) | (0.71–0.98) | |||||||||
F1 score (per class) | (0.42–0.97) | |||||||||
Precision (per class) | (0.40–0.97) | |||||||||
Recall (per class) | (0.40–0.98) | |||||||||
kNN | Classification accuracy (per class) | (0.81–0.92) | ||||||||
AUC (per class) | (0.81–0.95) | |||||||||
F1 score (per class) | (0.44–0.82) | |||||||||
Precision (per class) | (0.48–0.78) | |||||||||
Recall (per class) | (0.38–0.86) | |||||||||
SVM | Classification accuracy (per class) | (0.88–0.95) | ||||||||
AUC (per class) | (0.90–0.99) | |||||||||
F1 score (per class) | (0.50–0.91) | |||||||||
Precision (per class) | (0.51–0.84) | |||||||||
Recall (per class) | (0.50–0.98) | |||||||||
RF | Classification accuracy (per class) | (0.83–0.97) | ||||||||
AUC (per class) | (0.84–0.99) | |||||||||
F1 score (per class) | (0.39–0.95) | |||||||||
Precision (per class) | (0.40–0.91) | |||||||||
Recall (per class) | (0.38–0.98) | |||||||||
Neural network | Classification accuracy (per class) | (0.85–0.97) | ||||||||
AUC (per class) | (0.90–0.99) | |||||||||
F1 score (per class) | (0.48–0.95) | |||||||||
Precision (per class) | (0.47–0.93) | |||||||||
Recall (per class) | (0.50–0.97) | |||||||||
Naïve Bayes | Classification accuracy (per class) | (0.83–0.95) | ||||||||
AUC (per class) | (0.85–0.98) | |||||||||
F1 score (per class) | (0.38–0.88) | |||||||||
Precision (per class) | (0.44–0.92) | |||||||||
Recall (per class) | (0.33–0.85) | |||||||||
LR | Classification accuracy (per class) | (0.81–0.90) | ||||||||
AUC (per class) | (0.81–0.96) | |||||||||
F1 score (per class) | (0.25–0.75) | |||||||||
Precision (per class) | (0.36–0.75) | |||||||||
Recall (per class) | (0.19–0.98) | |||||||||
Sokic, E., et al. 2012 [43] | Cephalograms | 211 | Inclusion: 8–16 years | Machine learning | By orthodontists | Prescaling, bilinear projective transformation, special markers, extracting and crafting the features (measurement between landmarks) | NA | Fuzzy C means | Accuracy | 0.70 |
Xie, L., et al. 2022 [44] | CBCT | 709 (447/-/262) | Inclusion: 7–19 years, no history of systemic or physiological syndromes, no history of trauma or surgery in the dentofacial area, dependable and suitable CBCT scans | Statistical modeling | By three orthodontists | Reorientation, MPR mode, extracting and crafting the features (measurement between landmarks) | NA | LR | Agreement percentage | 0.88 |
Kappa coefficient | 0.90 | |||||||||
AUC | 0.96 | |||||||||
ICC (range) | (0.94–0.99) | |||||||||
Yang, Y. M., et al. (2014) [45] | CBCT | 121 | Inclusion: 6–18 years | Statistical modeling | By an investigator | NA | NA | Regression models | \(R^2\) (female–male) | (0.84–0.9) |
Exclusion: cleft lip and/or palate, trauma, or syndromes | ||||||||||
Baptista, R. S., et al. 2012 [46] | Cephalograms | 188 (tenfold cross validation) | NA | Machine learning | By a specialist in orthodontics and radiology and a specialist in orthodontics, then by an examiner using software | Extracting and crafting the features (measurement between landmarks), cropping | NA | Naïve Bayes 1 | Kappa coefficient | 0.99 ± 0.019 |
Accuracy | 0.90 | |||||||||
Feng, X., et al. (2021) [47] | Cephalograms and CBCT | 60 | Inclusion: 8–16 years, in the age of growth and development; Exclusion: unclear CBCT, incomplete C2 and C4, history of craniofacial deformity, syndrome affecting the shape of the cervical spine, intense systemic STDs | Rule-based AI | By a researcher with three years of experience in CVM assessment | Otsu’s method, three-dimensional least squares method, superpixel segmentation, marking the selected points automatically with a morphological algorithm and manually, extracting and crafting the features (measurement between landmarks) | NA | Decision Tree | Kappa coefficient | 0.87 |
Gamma value | 0.99 |
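
Most of the performance measurements in the table above (accuracy, per-class precision, recall and F1 score, and the weighted kappa coefficient) can be derived from a single confusion matrix of reference stage against predicted stage. The following is a minimal Python sketch of those calculations. The function name `classification_metrics` and the 3-class toy matrix are hypothetical and do not come from any included study. Linear disagreement weights are assumed for the weighted kappa, since the included studies do not all state which weighting scheme they used.

```python
def classification_metrics(cm):
    """Compute summary metrics from a square confusion matrix.

    cm[i][j] is the number of samples whose reference class is i
    and whose predicted class is j (classes are assumed ordinal).
    """
    k = len(cm)
    if k < 2 or any(len(row) != k for row in cm):
        raise ValueError("cm must be a square matrix with at least 2 classes")

    total = sum(sum(row) for row in cm)
    row_sums = [sum(cm[i]) for i in range(k)]                        # reference counts
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # predicted counts

    # Overall accuracy: fraction of samples on the diagonal.
    accuracy = sum(cm[i][i] for i in range(k)) / total

    # Per-class precision, recall and F1 (one-vs-rest).
    per_class = []
    for c in range(k):
        tp = cm[c][c]
        precision = tp / col_sums[c] if col_sums[c] else 0.0
        recall = tp / row_sums[c] if row_sums[c] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append({"precision": precision, "recall": recall, "f1": f1})

    # Linearly weighted Cohen's kappa (ASSUMPTION: linear weights).
    # Disagreement weight grows with the distance between stages.
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            observed += w * cm[i][j] / total
            expected += w * row_sums[i] * col_sums[j] / total ** 2
    weighted_kappa = 1.0 - observed / expected if expected else 1.0

    return {
        "accuracy": accuracy,
        "per_class": per_class,
        "weighted_kappa": weighted_kappa,
    }


# Hypothetical 3-class example (illustrative counts only).
toy_cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 3, 7],
]
result = classification_metrics(toy_cm)
print(round(result["accuracy"], 3))
print([round(m["f1"], 3) for m in result["per_class"]])
print(round(result["weighted_kappa"], 3))
```

Because the weighted kappa gives partial credit to predictions that miss by one stage, it can be high even when accuracy is moderate, which may help explain why several studies in the table pair an accuracy of about 0.6 to 0.7 with a kappa above 0.8.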