This retrospective analysis of data from Zhejiang Provincial People’s Hospital (Hangzhou, Zhejiang, China) was approved by the Zhejiang Provincial People’s Hospital Medical Ethics Committee, and all methods were conducted in accordance with the principles of the Declaration of Helsinki. A total of 155 nodules from 148 patients, described in the US reports from 2018 to 2021 as complex C-SNs of ACR BI-RADS category 4a or above, were collected. These retrospectively collected data and images formed the training group. In addition, data and images for 76 nodules from 72 patients were collected prospectively as a test group, following the same requirements as for the training group. Pathological results for all nodules were obtained after surgery or percutaneous puncture. Written information was provided to all subjects, and informed consent was obtained from each of them.
Inclusion and exclusion criteria
Inclusion criteria: (a) the breast nodule had a BI-RADS classification of 4a or above; (b) the nodule was described as a C-SN in the ultrasound report; (c) the nodule was operated on and complete pathological results were available.
Exclusion criteria: (a) unclear pathological findings; (b) incomplete or missing clinical or imaging information; (c) radiotherapy, chemotherapy, or puncture biopsy before the US examination; (d) breastfeeding at the time of the US examination.
After selection, 177 nodules from 170 patients were included in the study. From the retrospective data in the ultrasound picture archiving and communication system (PACS), 109 nodules were selected for the training group, including 74 benign and 35 malignant nodules. From the prospective data, 68 nodules were selected for the test group, including 38 benign and 30 malignant nodules; images of only 59 of these nodules were used in the DL test group because some images did not meet the requirements. Details are shown in Fig. 1. According to the ACR BI-RADS classification, 121 nodules (72%) were category 4a, 23 nodules (14%) were category 4b, 15 nodules (9%) were category 4c, and 9 nodules (5%) were category 5.
Clinical data and collection of ultrasound features
Clinical features included age, height, weight, breastfeeding history, menopausal history, and family history. Age, height, and weight were continuous variables; breastfeeding history, menopausal history, and family history were categorical variables. Color Doppler ultrasound images were obtained from several ultrasound diagnostic devices: the Philips EPIQ 5 ultrasound system (Philips Medical Systems, Bothell, WA, USA), the SuperSonic Aixplorer ultrasound system (SuperSonic Imagine, Aix-en-Provence, France), and the Mindray Resona 7 and Mindray DC-8 (Mindray, Shenzhen, China). A high-frequency linear array probe with a center frequency ≥ 12 MHz was used in color Doppler mode only (other blood flow imaging modes, such as power Doppler, were excluded). Images were required to show clear color signals with little noise and color overflow, a red/blue color scale, and a velocity scale of 4–8 cm/s. Ultrasound features were extracted and assessed independently by 2 breast ultrasound experts, each with more than 5 years of experience in breast ultrasound diagnosis. The extracted features were margin, lesion shape, distribution, aspect ratio, cystic-solid component distribution, cystic fluid transfer, cystic-solid intersection, presence of spongy structures/capsules, microcalcification, and internal vascularity/BF. Each feature was coded as a dichotomous variable (negative as 0 and positive as 1). Any disagreement between the two experts was resolved by consensus through discussion. The training group features were used as independent variables, and the pathological outcome (benign or malignant) was used as the dependent variable. Multiple logistic regression was used to build the traditional statistical model and to screen for independent predictors; the test group was used to assess model accuracy and to calculate ROC curves.
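As an illustration of the ROC analysis on the test group, the area under the ROC curve (AUC) can be computed directly from a model's predicted probabilities. The sketch below is not the authors' code; the function name roc_auc and the toy labels and scores are hypothetical. It uses the rank-based definition of the AUC: the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one, with ties counted as one half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of malignant (1) and benign (0) scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # malignant case ranked above benign case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = malignant, 0 = benign.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of cases, which is acceptable for a test group of a few dozen nodules.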
Retrospective data and images from January 2018 to June 2021 were used as the training group, of which a randomly selected 25% served as a validation group to guide hyperparameter selection. Prospectively collected data and images from July 2021 to August 2022 were used as the independent test group. ResNet50 was used as the pre-trained model.
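The random 25% hold-out for validation can be sketched as follows. This is a minimal example under stated assumptions: the text does not say whether the split was stratified by pathology or made per nodule or per image, so the sketch simply shuffles hypothetical nodule identifiers with a fixed seed. The function name split_train_validation and the seed are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def split_train_validation(
    ids: Sequence[int], val_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle the identifiers and hold out a fraction for validation."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)      # reproducible shuffle
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (training, validation)


# 109 retrospective nodules, as reported for the training group.
nodule_ids = list(range(109))
train_ids, val_ids = split_train_validation(nodule_ids)
print(len(train_ids), len(val_ids))
```

Splitting by nodule rather than by image keeps all images of a nodule on the same side of the split, which avoids leakage between training and validation data.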
Input images were cropped to 224 × 224 pixels and normalized; the batch size was 64, and training ran for 30 epochs. To reduce overfitting and the effect of sample imbalance, the training group images were augmented by scaling, random rotation, random cropping, and adjustment of contrast, hue, and saturation. Augmentation substantially increased the number of training samples, to 1260 images in the malignant group and 1404 images in the benign group after expansion. Model parameters were updated iteratively by forward computation of the loss function and backpropagation. The trained model was evaluated on images from the independent test group, and ROC curves, PR curves, and a confusion matrix were produced.
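The confusion matrix mentioned above summarizes how predicted classes agree with pathology at a chosen probability cutoff. The following sketch shows one way to derive it, together with sensitivity, specificity, and accuracy. The 0.5 threshold, the function names, and the toy data are assumptions made for illustration and are not taken from the study.

```python
from typing import Dict, Sequence


def confusion_counts(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, int]:
    """Count true/false positives and negatives (1 = malignant)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["tn" if y == 0 else "fn"] += 1
    return counts


def summary_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy from confusion counts."""
    total = c["tp"] + c["fp"] + c["tn"] + c["fn"]
    return {
        "sensitivity": c["tp"] / (c["tp"] + c["fn"]),
        "specificity": c["tn"] / (c["tn"] + c["fp"]),
        "accuracy": (c["tp"] + c["tn"]) / total,
    }


# Hypothetical pathology labels and DL output probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.92, 0.81, 0.35, 0.10, 0.62, 0.20, 0.05, 0.55]
counts = confusion_counts(y_true, y_prob)
metrics = summary_metrics(counts)
print(counts, metrics)
```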
Clinical and ultrasound features combined with DL
The predicted values of the traditional statistical model and the predicted values of the DL model were entered as the two independent variables of a new logistic regression equation, giving the combined model (CM). We calculated the predicted values of the CM and plotted its ROC curve. The areas under the ROC curves of the three models were compared to assess their accuracy and to identify the best-performing model. The model building process is shown in Fig. 6.
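Combining two model outputs in a second logistic regression can be sketched as below. This is a minimal illustration and not the authors' implementation: the fit uses plain batch gradient descent in pure Python, and the learning rate, number of iterations, function names, and toy predicted values are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of malignancy; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(
    X: Sequence[Tuple[float, float]], y: Sequence[int],
    lr: float = 0.5, epochs: int = 5000,
) -> List[float]:
    """Fit intercept and coefficients by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi     # gradient of the log-loss
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Hypothetical inputs: (traditional model prediction, DL prediction).
X = [(0.2, 0.1), (0.3, 0.4), (0.4, 0.2), (0.6, 0.3), (0.6, 0.7),
     (0.5, 0.7), (0.7, 0.6), (0.8, 0.9), (0.9, 0.8), (0.5, 0.4)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # 1 = malignant, 0 = benign

weights = fit_logistic(X, y)
cm_scores = [predict_proba(weights, xi) for xi in X]
print([round(s, 2) for s in cm_scores])
```

In practice a statistics package would also report coefficient standard errors and P values, which this sketch omits.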
Study of supporting functions of CM
In addition, two sonographers with 3 and 5 years of breast ultrasound experience classified the nodules in the test group as benign or malignant, independently in the first round and with reference to the CM diagnosis in the second round. Their diagnostic accuracy with and without CM assistance was then compared.
The data were divided into training and test groups, each of which was subdivided into benign and malignant groups, and the baseline clinical and ultrasound characteristics were compared. Quantitative data were compared using the t test or the Mann-Whitney U test, and qualitative data were compared using the chi-square test. Agreement between the assessments of the two breast ultrasound experts was evaluated using the Kappa statistic. The traditional statistical model was built using multiple logistic regression. AUC values were used to compare the diagnostic performance of the three models. The Hanley & McNeil method was used to compare the diagnostic efficacy of the two sonographers before and after using the CM. P values less than 0.05 were considered statistically significant. All statistical analyses were performed using SPSS (SPSS 23.0, SPSS Inc., Chicago, IL), RStudio (based on R 4.2.1), and Anaconda 3 (Python 3.9).
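For reference, the Hanley & McNeil comparison of two AUCs measured on the same cases can be sketched as follows. The standard error formula is the one published by Hanley & McNeil (1982). In their 1983 method, the correlation r between the two AUCs is read from a published table, so here it is passed in as an input. The AUC values and r = 0.5 in the example are hypothetical; only the group sizes (30 malignant and 38 benign test nodules) come from the text.

```python
import math
from typing import Tuple


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Standard error of an AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(var)


def compare_correlated_aucs(
    auc_a: float, auc_b: float, n_pos: int, n_neg: int, r: float
) -> Tuple[float, float]:
    """z statistic and two-sided P value for two AUCs on the same cases."""
    se_a = hanley_mcneil_se(auc_a, n_pos, n_neg)
    se_b = hanley_mcneil_se(auc_b, n_pos, n_neg)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2 - 2.0 * r * se_a * se_b)
    z = (auc_a - auc_b) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal P value
    return z, p


# Hypothetical AUCs for one sonographer with and without CM assistance.
z, p = compare_correlated_aucs(0.90, 0.80, n_pos=30, n_neg=38, r=0.5)
print(f"z = {z:.2f}, P = {p:.3f}")
```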